#13 | Discovering New Galaxies
+ improving AI's factual accuracy, world tuberculosis day, and more
Hello fellow curious minds!
Welcome back to another edition of The Aurorean.
First off, thank you once again to the readers who completed the survey we shared last week. ❤️ Your participation will allow us to craft the most impactful resources possible for you as we continue to grow our operations.
To demonstrate our thanks and celebrate our growing community of thousands of STEM enthusiasts, at the end of this month we will host a raffle giveaway!
We will share more details about this next week, though the only people eligible for our giveaway will be subscribers who have completed our survey.
If you haven’t had the chance to complete our survey yet and want to participate in the raffle, click the link below.
Also, we mentioned last week we have been interviewing domain experts for our neuroscience deep dive! As we continue with this work, we want to understand what our audience thinks of complementary audio and video content we could also produce in the future. Here is the first question we have for you. You can expect another poll from us next week!
My preferred type of content consumption for STEM topics is
With that said, on to the news. Wondering what STEM discovered last week?
Let’s find out.
Quote of the Week 💬
Astronomers Serendipitously Discover 49 New Galaxies
“I did not expect to find almost fifty new galaxies in such a short time.”
⌛ The Seven Second Summary: An international team of astronomers from the International Centre for Radio Astronomy discovered 49 new galaxies while they were studying star-forming gas within a single galaxy.
🔬 How It Was Done:
The researchers used a specialized telescope to study the gases that create stars in a radio galaxy.
As they were analyzing their data, they realized the telescope detected gases in other galaxies as well, and these galaxies were previously not known to science.
💡 Why This May Matter: While NASA’s James Webb Space Telescope rightly receives acclaim for the invaluable contributions it has already made to science, other next-gen telescopes specialized in different kinds of astronomical observations continue to make significant discoveries in their own right. As engineering breakthroughs continue to compound, new telescopes will showcase staggering displays of fidelity and precision to help us understand the universe.
🔎 Elements To Consider: While the discovery of new galaxies is meaningful, these types of discoveries tend to raise more questions than they answer. They often reveal unknown aspects of cosmic phenomena, and astronomers may need to refine the parameters of their models to account for the additional complexity.
📚 Learn More: ICRAR. Royal Astronomical Society.
Stat of the Week 📊
Evaluating The Factuality Of Long-Form Responses By AI Models
72%
⌛ The Seven Second Summary: Researchers from Google DeepMind and Stanford University introduced SAFE, an open-source method designed to fact-check the output of Large Language Models (LLMs).
🔬 How It Was Done:
The researchers first created LongFact, a dataset consisting of 2,280 prompts covering 38 different topics. These prompts allow people to ask an LLM a variety of fact-based questions, such as ‘What is the Eiffel Tower?’
Then, the researchers designed their SAFE method to evaluate the factual accuracy of LLM responses to their dataset of prompt questions.
It does this by breaking down the long-form response into the individual facts it contains. For example, in response to ‘What is the Eiffel Tower?’, an LLM might generate an answer with 4 different components worth fact checking, such as:
When the Eiffel Tower was built.
Where the Eiffel Tower is located.
How the Eiffel Tower got its name.
What material the Eiffel Tower is made of.
Once the long-form response has been broken down into individual components, the SAFE method queries Google Search to find supporting evidence for each fact, and scores the overall factuality of the LLM’s response to its prompt.
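For readers who like to see the idea in code, here is a minimal Python sketch of the split-then-check flow described above. It is not the researchers' implementation: the function names are hypothetical, sentence splitting stands in for the LLM-based breakdown step, and a small hard-coded set of statements stands in for Google Search.

```python
from typing import Callable, Dict, List


def split_into_facts(response: str) -> List[str]:
    """Naive stand-in for the breakdown step: treat each sentence as one fact."""
    return [part.strip() for part in response.split(".") if part.strip()]


def score_response(response: str, is_supported: Callable[[str], bool]) -> Dict[str, float]:
    """Check every fact with the supplied lookup and summarize the result."""
    facts = split_into_facts(response)
    supported = [fact for fact in facts if is_supported(fact)]
    share = len(supported) / len(facts) if facts else 0.0
    return {"facts": len(facts), "supported": len(supported), "share_supported": share}


# Toy evidence set: an assumption that replaces a real search engine query.
toy_evidence = {
    "The Eiffel Tower is in Paris",
    "It was completed in 1889",
}

response = "The Eiffel Tower is in Paris. It was completed in 1889. It is made of solid gold."
result = score_response(response, lambda fact: fact in toy_evidence)
print(result)
```

In this toy run, two of the three statements find support, so the made-up gold claim lowers the response's score. A real system would need a far more careful way to split a response into facts and to judge search evidence.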
🧮 Key Results:
The researchers benchmarked SAFE against human annotators on a dataset of roughly 16,000 facts and found SAFE’s assessments matched the human ratings 72% of the time.
In a random subset of 100 disagreement cases, SAFE was correct 76% of the time, whereas the human annotators were only correct 19% of the time.
Larger language models generally achieve better long-form factuality. The study benchmarked 13 models across four families (Gemini, Chat GPT, Claude, and PaLM-2) and found model size correlates with factual accuracy.
💡 Why This May Matter: Developments to improve model reliability are much needed because LLMs are already infamous for providing plausible-sounding yet incorrect information. Until a breakthrough emerges, LLM-generated content will continue to demand significant manual scrutiny, which diminishes its overall utility.
🔎 Elements To Consider: The authors compared their SAFE method against “crowdsourced human annotators”. However, they are not specific about the annotators’ qualifications, fact-checking process and other relevant details. The human evaluators may have answered questions within their domain of expertise, but they may have also answered questions they are not particularly qualified to answer.
Furthermore, since their system queries Google Search results to determine if a fact is true or not, the quality of this system is dependent on Google Search’s reliability. This type of dependency exists for any ‘source of truth’ dataset an AI model is trained on, which is why data quality is crucial for advancing AI reasoning capabilities.
AI x Science 🤖
Credit: David Travis on Unsplash
Chat GPT Produces Fast & Accurate Medical Record Notes
In a small pilot study, a team of researchers from Uppsala University tested the quality of Chat GPT’s medical record notes with orthopedic professionals.
They created health records of 6 fictional orthopedic cases. Then, they had junior orthopedic surgeons, experienced orthopedic residents and Chat GPT-4 create discharge notes based on the health records from their fictional orthopedic cases. Afterwards, a panel of 15 experts conducted blind assessments on the quality of the discharge notes.
The expert panel found Chat GPT-4 generated medical record notes of similar quality to the physicians. The average score of discharge summaries from Chat GPT-4 was 80.7, whereas the physicians received an average score of 74.3. Similarly, they received a comparable number of assessments deemed suitable for clinical use without needing any corrections: 38 for Chat GPT-4 and 32 for the physicians. Furthermore, both Chat GPT-4 and the physicians received 2 assessments each that were deemed unsuitable for clinical use.
The researchers also measured how long it took for Chat GPT-4 and the physicians to generate their discharge notes. They found the AI model was 10x faster than the humans. The next step in this research is to expand the study to include 1,000 authentic patient records. It will be interesting to see if these performance results persist at a larger scale. If so, it will serve as another example of how AI can be utilized to manage time-consuming, administrative tasks. Acta Orthopaedica.
Our Full AI Index
Automating Cellular Annotations: Researchers from Columbia and Duke University demonstrated Chat GPT-4 performed comparably to human experts at interpreting and annotating hundreds of different tissue and cell types vital for single-cell RNA sequencing analysis. Similar to the medical record notetaking study mentioned above, this represents yet another example of how AI systems are fast approaching expert-level capabilities to manage repetitive operational burdens for people. Columbia. Nature. Github.
Stroke Assessments: Researchers from the American Academy of Neurology used Chat GPT-4 in a study to locate brain lesions in patients after a stroke. Chat GPT-4 processed text from health histories and neurologic exams, and achieved a sensitivity of 74% and specificity of 87% in identifying the side of the brain with lesions. The model was even better at identifying the region of the brain with lesions, with a sensitivity of 85% and specificity of 94%. While the model demonstrated promising consistency, its accuracy across all questions and participants was just 41%. Another step in the right direction, with far more work to be done. AAN. Neurology.
AI Leaderboards: For the first time in ~1 year, an AI model aside from a Chat GPT version is considered the best in the world. The new leader is none other than Claude 3 Opus. It was released last month, and while it is currently more expensive to use than Chat GPT, Opus’ cheaper, faster and lightweight version, Haiku, is a mighty model in its own right, claiming the 7th spot in the Chatbot Leaderboard. Hugging Face.
AI Glasses: The New York Times reported Meta is planning to add AI capabilities like language translation and object identification to smart glasses it will release next week. NYT.
Policy: In an effort to minimize the risks of AI, all US federal agencies are now required to have a Chief AI Officer and AI governance boards to oversee AI systems and ensure responsible use. White House.
Other Observations 📰
Mycobacterium tuberculosis, the bacterium that causes TB. Credit: National Institute of Allergy and Infectious Diseases on Unsplash
World Tuberculosis Day & Progress On The Disease
Last week was World Tuberculosis (TB) Day, and several organizations celebrated the occasion by sharing global progress made to eradicate the disease.
TB is caused by the bacterium Mycobacterium tuberculosis. The bacterium spreads through respiratory particles in the air and causes symptoms such as coughing, fatigue and night sweats. It primarily affects the lungs, though it can also damage the brain, kidneys and other organs, all of which can be fatal if untreated.
While TB is still one of the most common causes of death globally, the world is trending in the right direction to eradicate the disease. For example, the Prime Minister of Cambodia shared the country has seen a 45% decline in both TB deaths and new infections since 2000. Furthermore, the WHO shared the continent of Africa is now diagnosing and treating ~70% of TB cases in the region, and exemplary countries like Cabo Verde, Eswatini and South Africa have achieved at least a 50% reduction in TB cases from recent highs.
In 2015, the WHO established a program to end the epidemic of TB by 2035. In order to stay on track for this goal, the world needs to cut TB deaths by 90% and cases by 80% from their 2015 levels by 2030. You can learn more about global data related to TB by visiting Our World In Data.
Our Full Science Index
Clean Energy: Renewables accounted for ~40% of Australia’s total energy generation in 2023. This is more than 2x the country’s renewable energy generation in 2017. Clean Energy Council.
Cancer Clinical Trial: Novocure met the primary goal of its Phase III clinical trial: to delay the time it takes for cancer to progress within the brains of patients who have brain metastases originating from non-small cell lung cancer. Their treatment extended this time period from a median of 11.3 months to 21.9 months. However, the trial did not achieve its secondary goals, such as improving overall patient survival. Novocure.
Brain Growth: Researchers from UC Davis studied 3,226 participants and found that human brains progressively grew larger between the 1930s and the 1970s. It remains unclear what is causing this brain growth, and if human brains have continued to grow into the present day. UC Davis. JAMA Network.
Medical Approvals: The U.S. Food and Drug Administration approved a therapy to treat a rare, progressive disease called pulmonary arterial hypertension. This disease is a specific type of high blood pressure that affects the heart and lungs by clogging blood vessels. Adding this new therapy to traditional treatments reduced the risk of death from any cause or a worsening event from the disease by 84% after a median time of 33 weeks when compared to traditional treatments alone. Merck.
Toxic Chemical Declines: The U.S. Environmental Protection Agency released an analysis highlighting the country has seen a 21% decrease in toxic chemical emissions since 2013. EPA.
Media of the Week 📸
Watch A Mouse Placenta Grow By The Day
Duke researchers designed an imaging tool to view and track the development of a placenta inside a mouse during pregnancy. This imaging technique creates some beautiful colorways. More importantly, it’s a method to understand how genetic, environmental and other lifestyle factors may impact the healthy development of an organ. Duke. Science Advances.
Autonomous Robots Handling Empty Boxes
Another robot video from Agility Robotics. This time you can watch their fleet of machines autonomously carry and manipulate empty crates and boxes in a factory-like environment. There’s a big difference between robots that can move empty boxes and robots that can control the weight distribution of boxes with stuff moving around inside of them, but these developments are promising. There’s a reason why Amazon expanded their partnership with this company.
A Polarized Pic Of A Black Hole At The Center Of The Milky Way
Credit: EHT Collaboration.
A team of international researchers uncovered previously unseen magnetic fields spiraling from the supermassive black hole Sagittarius A*. These magnetic characteristics are similar to those seen around M87*, the black hole at the center of the galaxy M87, and astronomers are now wondering if magnetic fields are a common characteristic in all black holes. Event Horizon Telescope. Paper 1. Paper 2.
This Week In The Cosmos 🪐
April 8: A total solar eclipse will be visible throughout North America. It will be the last total solar eclipse visible to the region until 2045. Other parts of the world will experience a solar eclipse in 2026.
This night will be a new moon everywhere else in the world.
Credit: Mathew Schwartz on Unsplash
That’s all for this week! Thanks for reading.