#27 | Frontier AI Models Get 20x Cheaper

+ a milestone in exoplanet discoveries, clean steel and more

Hello fellow curious minds!

Welcome back to another edition of The Aurorean. If this email was forwarded to you, click here to subscribe to the newsletter.

Heads up! We have exciting news and developments to share in this week’s edition. It is the next step in our journey to provide the best possible value for our growing audience, and we can’t wait to see the impact it will have. ❤️

With that said, wondering what STEM discovered last week?

Let’s find out.

Quote of the Week 💬

GPT-4o mini: 20x Cheaper Yet Nearly As Performant As GPT-4o

“The cost per token of GPT-4o mini has dropped by 99% since text-davinci-003, a less capable model introduced in 2022”

Press Release Statement From OpenAI

⌛ The Seven Second Summary: The team at OpenAI released their latest small model, GPT-4o mini. This model is reportedly around 8 billion parameters in size and appears to rival Google's Gemini 1.5 Flash model and Meta's newly released Llama 3.1 8B model in terms of cost and performance.

🔬How It Was Done: Unfortunately, OpenAI did not release a paper to explain how they developed their model in-depth. However, we can approximate their methodologies thanks to past research papers from OpenAI, Google and Meta’s newly minted Llama 3.1 paper.

  • Train the model on lots of high-quality, self-generated data. High-quality training data has an outsized influence on model performance, so this first step is critical to success.

  • Incorporate a feedback mechanism and verifier models to help the smaller model identify and correct reasoning mistakes it may make.

  • Design the model to reason through problems in a step-by-step format, and use other models to check for good reasoning. In this case, the larger GPT-4o model can act as a teacher for the mini version to learn from and emulate.

  • Filter out low-quality data and poor reasoning skills whenever they are identified to reinforce better reasoning abilities. Have human reviewers provide feedback to help the smaller model produce responses people consider to be high quality.

  • When necessary, use Monte Carlo Tree Search to find more expansive ideas and potential solutions to reason through difficult problems.
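The generate-verify-filter loop in the steps above can be sketched in a few lines of Python. Everything below is a toy illustration on arithmetic problems: the teacher, the verifier and the data are stand-in functions we invented for demonstration, not OpenAI's actual pipeline.

```python
# Toy sketch of self-generated data with verifier filtering.
# All "model" behavior is faked with simple functions; names are illustrative.

def teacher_solve(a, b):
    """Stand-in for a large teacher model emitting several candidate
    (reasoning, answer) pairs per problem, some deliberately wrong."""
    return [
        (f"{a} + {b}: add units, then tens", a + b),   # sound reasoning chain
        (f"{a} + {b}: rough guess", a + b + 1),        # flawed reasoning chain
        (f"{a} + {b}: count up from {a}", a + b),      # sound reasoning chain
    ]

def verifier(a, b, answer):
    """Stand-in verifier model: re-derive the answer independently."""
    return answer == a + b

def build_training_set(problems):
    """Keep only candidates the verifier accepts -- the filtered,
    'high quality' data a smaller student model would then train on."""
    dataset = []
    for a, b in problems:
        for reasoning, answer in teacher_solve(a, b):
            if verifier(a, b, answer):
                dataset.append(
                    {"prompt": f"{a}+{b}", "reasoning": reasoning, "answer": answer}
                )
    return dataset

data = build_training_set([(2, 3), (10, 7)])  # 2 verified candidates per problem
```

In a real pipeline the verifier would itself be a learned model and human reviewers would supply an extra quality signal, but the filtering structure is the same.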

🧮Key Results: The cost per token of GPT-4o mini is 20x cheaper compared to GPT-4o and an astonishing 99% cheaper compared to a 2022 model from OpenAI.

💡Why This May Matter: Just last week we highlighted the dramatic decline in the industry's recent cost curves, and this is yet another piece of evidence to underscore the point. The latest model releases may not be making major strides in the quality of their responses, but they are getting cheaper by the week, which is still an indication of exponential improvement.

📚 Learn More: OpenAI. Artificial Analysis.

Stat of the Week 📊

Astronomers Discover 5,500th Exoplanet

5,502

⌛ The Seven Second Summary: Scientists announced the discovery of 6 new exoplanets, bringing the total to 5,502 just 31 years after the first exoplanet was identified in 1992.

🔬How It Was Done: Advancements in imaging instruments and detection methods are primarily driving the remarkable explosion in exoplanet discoveries.

  • For example, five separate teams used a range of ground and space-based instruments to make their 6 exoplanet discoveries, such as NASA's TESS, Spitzer, Hubble, and James Webb Space Telescope. The teams also used a combination of detection methods to find these exoplanets. Two specific methods were of note:

    • Radial Velocity Method: Measures a star's subtle wobble caused by an orbiting exoplanet's gravitational pull. By analyzing periodic shifts in the star's light spectrum, scientists can infer the presence of nearby planets.

    • Transit Method: Detects exoplanets as they pass in front of their host stars. When this happens, the exoplanet temporarily dims the star’s light, like a miniature eclipse, which helps scientists determine the exoplanet's size and orbit.
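The arithmetic behind the transit method is pleasingly simple: the fractional dip in starlight is roughly (planet radius / star radius)², so the planet's size falls straight out of the measured depth. The sketch below uses standard solar and Earth radii and ignores real-world corrections such as limb darkening.

```python
import math

SUN_RADIUS_KM = 695_700.0    # nominal solar radius
EARTH_RADIUS_KM = 6_371.0    # mean Earth radius

def planet_radius_from_transit(depth, star_radius_km=SUN_RADIUS_KM):
    """Transit depth ~ (Rp / Rs)**2, so Rp = Rs * sqrt(depth)."""
    return star_radius_km * math.sqrt(depth)

# A 1% dip around a Sun-like star implies a roughly Jupiter-sized planet.
rp_km = planet_radius_from_transit(0.01)
rp_in_earth_radii = rp_km / EARTH_RADIUS_KM   # ~10.9 Earth radii
```

A deeper dip means a bigger planet; repeated dips at regular intervals reveal the orbital period as well.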

💡Why This May Matter: Missions like the Habitable Worlds Observatory are searching for signs of life on planets outside our solar system. These missions rely on exoplanet discoveries to investigate and analyze if there are planets with Earth-like characteristics in other parts of the universe for future research.

📚 Learn More: NASA. 

AI x Science 🤖

Credit: Stephen Ellis on Unsplash

The Importance Of Spatial Intelligence To Advance AI Reasoning

Large language models (LLMs) like GPT, Gemini and Claude are incredibly useful tools with significant limitations. They are primarily trained on text data, which can be riddled with summaries instead of comprehensive explanations, factual errors, and subtle knowledge gaps that degrade the performance of LLM systems. In the GPT-4o mini story above, we referenced techniques highlighted by Microsoft and Google to improve the quality of LLM training data and ground these systems in informative and reliable information, yet even Meta’s Llama 3.1 technical paper notes there is not enough suitable data on the Internet to train these models. This is why, in recent months, research labs have been trying to produce high-quality, self-generated synthetic data to train newer models.

While synthetic data generation is a useful research endeavor for the field, AI systems may struggle to reach and surpass human-level general intelligence if they primarily learn about the world through textual descriptions. As Mike Tyson once said, “everyone has a plan until they get punched in the mouth.” This quote illustrates the limitation of theoretical knowledge: a student may theoretically learn all there is to know about boxing by reading every boxing book in human history, but they will not earn a championship belt until they demonstrate their skills by sparring in the ring. Similarly, AI systems may need to experience the physical world in order to ground themselves in enough reliable data to push their reasoning skills beyond a certain level of naive artificial intelligence. Textual data and computer vision may only advance the field so far.

Thus, physical intelligence appears to be the next chasm for research labs to cross. They want their models to experience the physics of our world so they can better understand the people, animals and objects within it. The initial demos for GPT-4o and Google’s Project Astra certainly signal their plans to move in this direction, but these features have not been publicly released yet.

This is where robotics models come into play. We have previously mentioned how the field of robotics is transitioning away from specialized models and towards general models like Google Deepmind’s RT-X, and a related news story caught our attention.

Stanford’s Fei-Fei Li raised $100 million USD for her new spatial intelligence startup. Professor Li is world-renowned for her computer science contributions, and while few other details about the company have been released, we found it notable that she is taking a leave from teaching to pursue this type of AI bet. Time will tell how the bet pays off.

Our Full AI Index
  • Stored Data As A Way To Boost Model Performance: Researchers from the University of Washington built a 1.4 trillion-token datastore called MassiveDS to study the effects of scaling data storage on LLM inference performance. They discovered that increasing the size of their datastore improved language modeling and downstream tasks. For example, a smaller model with significant datastore access outperformed models 2x its size or more when they did not have any datastore access. arXiv. GitHub.

  • Astronomers Spot Deepfakes: Researchers at the University of Hull developed a method to detect deepfake images by analyzing the reflections in a person's eyeballs. The study found real images typically show consistent reflections in both eyes, whereas fake images often lack consistency. The team made this assessment by using a Gini coefficient measurement to calculate the difference in lighting between deepfakes and real images. This type of methodology is typically used to measure light distributions in galaxy images, so it is interesting to see it employed in a completely different context. Another useful tool in the toolkit to determine real from fake in our digital world. Royal Astronomical Society.

  • Simulating Patient Outcomes To Cancer Treatment: Researchers at Brighton and Sussex Medical School developed a personalized simulation system by using genomic sequencing data, clinical prognostic indicators and a machine learning model to predict treatment effectiveness for a specific type of cancer patient. Their model identified clusters of patients with a 15x lower survival time than others, suggesting a much worse prognosis than standard indicators imply. This insight can help clinicians tailor alternative treatment strategies for these high-risk patients. University of Sussex. Nature.
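The Gini coefficient mentioned in the deepfake-detection item above measures how unevenly a set of values is distributed: 0 means perfectly even, values near 1 mean the total is concentrated in a few entries. A hypothetical toy version of the eye-reflection consistency check, using made-up pixel intensity lists rather than real image data, might look like this:

```python
def gini(values):
    """Gini coefficient of non-negative values, via the sorted-weights formula:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with 1-based ranks i."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

def reflections_consistent(left_eye, right_eye, tolerance=0.1):
    """Flag an image as suspicious when the light distributions in the two
    eyes' reflections differ by more than `tolerance` in Gini terms."""
    return abs(gini(left_eye) - gini(right_eye)) <= tolerance

# Made-up intensities: matching distributions pass, mismatched ones are flagged.
real_like = reflections_consistent([3, 5, 8, 9], [3, 5, 8, 9])   # True
fake_like = reflections_consistent([0, 0, 0, 9], [3, 5, 8, 9])   # False
```

The actual study applies this statistic to the highlight patterns in each eyeball; the toy threshold and intensity lists here are our own inventions.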

Other Observations 📰

An aerial view of the Sydney Harbour Bridge in Australia. Credit: Henry on Unsplash

Progress On The World’s Transition To Clean Steel

The steel and iron industry is responsible for 7% of global greenhouse gas emissions and 11% of carbon dioxide emissions. These emissions are primarily driven by coal-based furnaces, and they must be phased out for the world to reach its net-zero emissions target. Thankfully, important progress is underway.

According to Global Energy Monitor's (GEM) annual Pedal to the Metal report, 93% of new steel production capacity announced in 2024 will use electric arc furnaces (EAFs), up from 33% just 2 years ago. This represents over 2.2 billion tons of steel production per year, although most of the announced new capacity projects will not start their construction for many more months.

Electric arc furnaces make steel from scrap metal and electricity, and this electrothermal process is fantastic for several reasons. First, it consumes far less energy than its coal-based alternative. Second, since it recycles and reuses scrap steel there is no need to mine additional iron ore, which can be dangerous for workers and harm the environment. Third, the electricity to heat an arc furnace and melt scrap metal can be sourced from renewable energy sources, which can lower the industrial emissions footprint even more. One estimate from Dr. Waldram, a material scientist from Swansea University, projects EAF methods reduce the emissions from scrap steel production by ~71%.
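To make that ~71% figure concrete, here is some back-of-the-envelope arithmetic. Note the baseline emissions intensity below is an assumed round number for coal-based production that we picked for illustration, not a figure from the report; only the 71% reduction comes from the estimate above.

```python
BASELINE_T_CO2_PER_T_STEEL = 2.0   # assumed coal-based (blast furnace) intensity
EAF_REDUCTION = 0.71               # ~71% cut cited for scrap-fed EAF production

def annual_co2_saved(tonnes_steel, baseline=BASELINE_T_CO2_PER_T_STEEL,
                     reduction=EAF_REDUCTION):
    """Tonnes of CO2 avoided by producing `tonnes_steel` with electric arc
    furnaces instead of coal-based furnaces, under the assumptions above."""
    return tonnes_steel * baseline * reduction

# Under these assumptions, one million tonnes of EAF steel avoids ~1.42 Mt CO2.
saved = annual_co2_saved(1_000_000)
```

Real intensities vary with the electricity mix feeding the furnace, which is exactly why pairing EAFs with renewable power matters.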

The GEM report also estimates 36% of all global steel production will use electric arc furnaces by 2030, which is close to the International Energy Agency's suggested target of 37% to achieve net zero by 2050 and limit the rise in global temperatures to 1.5 °C.

An important caveat to mention is that there will not be enough scrap metal to recycle to meet the world’s steel production needs until demand stagnates or declines for many consecutive years. Thus, coal-based steel production will continue until that day arrives, or until all steel, or an alternative material, can be made without any emissions.

This may be a topic we delve into in a future deep dive article. Let us know if this sounds like something you want to read!

Our Full Science Index
  • Progress To Reduce Tetanus Mortality: The United States’ Centers for Disease Control and Prevention released a new report documenting how tetanus vaccination efforts have significantly improved public health outcomes for mothers and their infants. Between 2000 and 2022, 47 out of 59 targeted countries eliminated the disease. This global effort resulted in an 84% decline in neonatal tetanus deaths, saving ~39,000 lives in the process. CDC.

  • New Insight Into Early Life On Earth: A research team from the University of Bristol identified what is now considered to be the earliest common ancestor of all living organisms on Earth. It was a microbe that lived 4.2 billion years ago, over 400 million years earlier than previous estimates, that fed on hydrogen gas and carbon dioxide and had a surprisingly large genome. University of Bristol. Nature.

  • Immunotherapy Is Transforming Cancer Therapy: Last month we shared several jaw-dropping clinical trial results from meetings at the largest cancer conference in the world. A number of these treatments are immunotherapies, and a recent New York Magazine article tells a fantastic story about how one such therapy treatment is giving a person with a brain tumor a second lease on life. Incredible. New York Magazine.

Media of the Week 📸

Robot Soccer Teams Compete In RoboCup 2024

A few months ago we shared a video of two humanoid robots playing 1-on-1 soccer. Last week, the RoboCup held its annual tournament where teams of robots played against one another. Which recent competition was your favorite: the Euro Cup, Copa America or this?

The Wonderful Colors Of Jupiter

Credit: Image data: NASA/JPL-Caltech/SwRI/MSSS
Image processing by Gary Eason

Earlier this year, NASA's Juno spacecraft captured a photo of Jupiter's northern hemisphere from just 18,000 miles (29,000 kilometers) away. After a series of color enhancements, the chaotic clouds and cyclonic storms of the planet seem to resemble a funky, iridescent soap bubble. Or is that just us? NASA.

Imaging Human Hearts In Unprecedented Detail

Credit: Siemens Healthineers 2024; Data UCL led ESRF Beamtime 1290.

Scientists from University College London and the European Synchrotron Radiation Facility created one of the most detailed atlases of the human heart we have ever seen. They imaged two adult hearts at a precision of 20 micrometers, less than half the width of a human hair. This level of detail and clarity can help researchers examine minute differences between healthy and diseased hearts and develop more precise organ simulation models for surgical training, among other use cases. University College London. RSNA Radiology.

That’s all for this week! Thanks for reading.