#36 | Speeding Up DNA Data Storage
+ studying cancer rates across species, the AI agent era and more
Hello fellow curious minds!
Welcome back to another edition of The Aurorean.
It’s been a month since we published our last issue, and we’ve missed sharing our love and enthusiasm for STEM with you all.
Before we delve into this week’s topics, we have a few announcements to share with you.
The Aurorean is fast approaching its one-year anniversary (!!!) and we are incredibly excited to reach this milestone. Over the course of the year we have received hundreds, if not thousands, of heartwarming messages from our audience, and they motivate us to keep producing a valuable service for STEM professionals and enthusiasts around the world.
The most common and consistent feedback we have received all year is about the news and research we do and do not showcase in the newsletter. We are well aware that our audience is a lovely mixture of STEM professionals and enthusiasts with a wide variety of interests, and science and technology consist of many fascinating fields worthy of attention. There are often dozens of worthwhile stories and studies we do not have the space to cover in our emails. We have done our best to make editorial decisions that will resonate, but we know there are better ways to solve this problem.
Our plan to address this feedback in our 2nd year of existence is to build out the operations and infrastructure to support many more STEM newsletters that are focused on subject-specific topics. This will allow our audience to learn about the subjects they care the most about — from robotics and cancer research to space exploration, public health and more. Our goal has always been to distill significant STEM stories of progress to the world, and this is a natural next step.
In order to set ourselves up for success, we need to temporarily reduce the frequency of the current newsletter so we can make progress on our mission. Instead of our typical weekly posts, we plan to publish once at the end of November and once at the end of December, when the news and research cycles slow down as a result of the holiday season. We will re-evaluate returning to a weekly schedule at the beginning of 2025; we don’t want you to miss any notable news at the start of the new year, and we want to accelerate our ability to run many subject-specific STEM newsletters simultaneously.
We’ll strike the perfect balance in due time. In the meantime, we cherish your continued support and feedback as we embark on this journey. You mean the world to us, and we plan to make you proud with what we produce in the weeks and months ahead.
With that said, wondering what STEM discovered last week?
Let’s find out.
Quote of the Week 💬
A New Technique To Speed Up DNA Data Storage
"It's a really nice proof of concept and a significant improvement over previous DNA data storage approaches... It gets around a barrier of DNA data storage that requires synthesizing DNA from scratch."
⌛ The Seven Second Summary: Researchers from Peking University developed a system to encode digital data into strands of DNA, and they were able to achieve writing speeds that may be up to 6,000x faster compared to other DNA data storage methods.
🔬 How It Was Done:
In order to store digital information in DNA, scientists must know and control the exact sequence of bases along a DNA strand. This has traditionally required teams to synthesize new DNA strands from scratch, chemically adding one base at a time to ensure the correct sequence is formed.
The team took a faster approach by creating a library of standardized, reusable DNA fragments to supply the bases in a sequence, along with standardized DNA templates that act as blank pages for their technology to print digital information onto. Each fragment was designed with a unique chemical sequence that would only bind to a specific location on the DNA template.
They used an enzyme called methyltransferase to attach chemical markers to some DNA fragments to represent 1s; the DNA fragments they did not modify represent 0s. Because each DNA fragment has a unique chemical sequence, it functions like a puzzle piece that fits only one spot on the template. This allows the fragments to automatically self-assemble into the correct positions of a DNA sequence and enables many bits of information to be written simultaneously instead of one at a time.
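To make the scheme concrete, here is a minimal conceptual sketch in Python of how methylation-based encoding and readout might work at the data level. It is illustrative only, not the Peking University team’s actual protocol: the field names and template size are invented for the example, and the real system operates chemically rather than in software.

```python
# Conceptual sketch: each template position has its own dedicated fragment;
# a methyl marker on that fragment means 1, an unmodified fragment means 0.

def encode(bits):
    """Map a bit string onto a set of fragment records for one DNA template."""
    fragments = []
    for position, bit in enumerate(bits):
        fragments.append({
            "binds_to_site": position,   # unique sequence targets one template site
            "methylated": bit == "1",    # methyltransferase marker encodes a 1
        })
    return fragments

def decode(fragments):
    """Read bits back by checking which assembled fragments carry a methyl marker."""
    ordered = sorted(fragments, key=lambda f: f["binds_to_site"])
    return "".join("1" if f["methylated"] else "0" for f in ordered)

message = "1011001110001111"      # 16 bits of toy data
library = encode(message)         # in the lab, all fragments self-assemble in parallel
assert decode(library) == message
print(f"Stored and recovered {len(message)} bits:", decode(library))
```

The property the sketch captures is that every bit is carried by its own fragment, which is why the chemistry can write many bits in parallel rather than one base at a time.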
🧮 Key Results:
The team successfully encoded and retrieved images containing 269,337 bits of data (0.03 megabytes) with 97.47% accuracy.
The system was able to write at speeds up to 40 bits per second, and the team believes they can scale up production to eventually write 2 terabytes (2 million megabytes) of information per day.
In a real-world test, 60 student volunteers with no biolab experience were able to successfully store and retrieve text data with 98.58% accuracy using the system, a worthwhile indication of how intuitive it is to use.
💡 Why This May Matter: Between recent exponential growth curves in AI training and usage, and the fact that hundreds of millions of people have yet to enter the Internet age, humanity will have a persistent need for more data storage for years on end. Since a single gram of DNA can store up to 215 petabytes of data (215 billion megabytes) — enough for 10 million hours of HD video — a breakthrough in writing speed and accessibility may eventually make DNA data storage a dense, durable, and energy-efficient alternative to electronic storage.
🔎 Elements To Consider: While the team’s DNA data storage system is promising, it needs to improve its scale and accuracy by several orders of magnitude before it may be considered for commercial use. For reference, hard drives typically maintain higher accuracy rates than this storage system after many decades of storage, so there are still quite a few problems to solve for the technology to be reliable enough to trust with any data of actual value.
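As a rough sense of that gap, the figures quoted above can be plugged into a few lines of Python. This is back-of-the-envelope arithmetic based only on the numbers in this issue (40 bits per second demonstrated, 2 terabytes per day aspired to), not an analysis from the study itself.

```python
# Back-of-the-envelope comparison of demonstrated vs. aspirational write speeds.
demo_bits = 269_337                      # bits encoded in the image experiment
demo_rate = 40                           # bits per second, peak demonstrated speed

target_bytes_per_day = 2e12              # 2 terabytes per day, the team's scaling goal
target_rate = target_bytes_per_day * 8 / 86_400   # convert to bits per second

print(f"Demo took roughly {demo_bits / demo_rate / 3600:.1f} hours")        # ~1.9 hours
print(f"Target rate is about {target_rate / 1e6:.0f} million bits/second")  # ~185 million
print(f"Gap: ~{target_rate / demo_rate:,.0f}x the demonstrated speed")
```

That multi-million-fold gap is what “several orders of magnitude” means in practice here.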
Stat of the Week 📊
Large-Scale Study Maps Cancer Rates Across Vertebrate Species
292
⌛ The Seven Second Summary: Researchers from UC Santa Barbara and other leading institutions completed the largest-ever study of cancer prevalence across vertebrates, analyzing necropsy records from 292 species to understand why different animals get cancer at vastly different rates.
🔬 How It Was Done:
The team collected 16,049 necropsy records from 99 zoological institutions, analyzing deaths in adult animals across amphibians, reptiles, birds, and mammals to identify cancer patterns.
Veterinary pathologists examined tissue samples to confirm cancer diagnoses, and 94% of analyses were conducted in a single pathology lab to ensure consistent results.
The researchers used various statistical models to control for shared ancestry between species to reveal how factors like body mass, gestation time, and average lifespan influence cancer rates.
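For intuition, the headline prevalence figures start with a straightforward tally before any phylogenetic modeling happens. Below is a minimal sketch in Python of that first step on an invented record format; the species, counts, and field names are hypothetical, and the study’s actual analysis additionally controls for shared ancestry, body mass, gestation time, and lifespan.

```python
from collections import defaultdict
from statistics import median

# Hypothetical necropsy records: (taxonomic class, species, tumor found at death?)
records = [
    ("Mammalia", "ferret", True), ("Mammalia", "ferret", True), ("Mammalia", "ferret", False),
    ("Aves", "black-footed penguin", False), ("Aves", "black-footed penguin", False),
    ("Reptilia", "corn snake", False), ("Reptilia", "corn snake", True),
]

# Step 1: cancer prevalence per species = fraction of necropsied adults with tumors.
by_species = defaultdict(list)
for taxo_class, species, has_tumor in records:
    by_species[(taxo_class, species)].append(has_tumor)
prevalence = {key: sum(vals) / len(vals) for key, vals in by_species.items()}

# Step 2: summarize each taxonomic class by the median prevalence across its species.
by_class = defaultdict(list)
for (taxo_class, _species), p in prevalence.items():
    by_class[taxo_class].append(p)

for taxo_class, values in by_class.items():
    print(f"{taxo_class}: median prevalence {median(values):.1%} across {len(values)} species")
```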
🧮 Key Results:
Mammals had the highest cancer rates with a median of 12% dying with tumors, followed by birds and reptiles at 4%, and amphibians at 1.2%.
Larger animals had slightly higher cancer rates when controlling for gestation periods, with every 10x increase in body mass corresponding to a 2.9% increase in cancer prevalence.
Species with longer gestation times showed dramatically lower cancer rates, with each 10x increase in gestation times corresponding to an 18.6% decrease in cancer prevalence.
💡 Why This May Matter: Cancer affects virtually all multicellular life, yet some species like black-footed penguins (<0.4%) show incredible resistance while ferrets (63%) and other species are highly susceptible. Understanding these natural variations could reveal new strategies for cancer prevention and treatment in both humans and animals.
🔎 Elements To Consider: While the study analyzed 16,000+ vertebrate records, some species had as few as 20 individuals examined while others had up to 477. This range in sample sizes likely affected the precision of the team’s cancer rate estimates for the less-represented species. Furthermore, domesticated animals and animals in zoos and other managed care settings may experience different cancer rates than their wild counterparts, and this was not fully accounted for in the data.
📚 Learn More: UC Santa Barbara. AACR Journal.
AI x Science 🤖
Credit: Mohamed Nohassi on Unsplash
What To Watch As We Enter The Age Of AI Agents
Last week Anthropic released an upgraded version of their Claude 3.5 Sonnet model and we have two main takeaways from the news.
First, this continues the ongoing trend of major research labs releasing small to medium scale models rather than updates to their largest possible models. There are many potential reasons for this, but we think the most plausible explanation is twofold: the largest state-of-the-art models are bottlenecked by the energy constraints we have highlighted throughout the year, and knowledge distillation and other efficiency gains amassed in 2024 have enabled research teams to build fast, cheap models with performance relatively similar to megasized models that require an order of magnitude more energy and compute. Putting these two explanations together leads us to believe Anthropic’s mid-sized model update encapsulates the most significant AI themes of the year.
Second, reliability performance metrics will be one of the most useful indicators of model progress moving forward. The headline from Anthropic’s model update is that their new Claude 3.5 Sonnet becomes the first model from the major research labs capable of autonomously navigating and using a computer. It is worth noting that other agentic AI systems have existed long before today; in fact, in one of our first newsletter issues we highlighted research from a team at Ohio State University on this very topic. But while Anthropic may not be releasing the first agentic system to the public, its brand recognition and market position likely mean AI agent usage will soon skyrocket and become a major thematic area to follow over the year ahead. This raises the question: what is an appropriate metric to measure how AI agents are advancing over time?
Back in June, the Sierra Research team published a paper outlining a framework to evaluate AI agents through their Tau-Bench system. The objective of this evaluation framework is to assess how consistent a model is at completing tasks over multiple attempts. With this paper, the Sierra Research team introduced a crucial distinction between two metrics: "pass at K" versus "pass to the power of K". "Pass at K" only requires a model to succeed once in K attempts and measures whether or not a model is at all capable of completing an agentic task, such as successfully cancelling an order and requesting a refund online at least once when given 8 attempts. In contrast, "pass to the power of K" demands that the model succeed in all K attempts in order to pass its evaluation; in this example, failing to cancel the order or failing to request the refund at any point across all 8 separate evaluation attempts counts as a failure.
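To make the distinction concrete, here is a minimal sketch in Python of how the two metrics could be computed from logged attempt outcomes. This is not the Tau-Bench implementation; the success rate and task counts below are invented, and the simulated attempts are independent, which real agents are not (their successes tend to be correlated across attempts).

```python
import random

def pass_at_k(attempts):
    """Lenient metric: a task passes if at least one of its k attempts succeeded."""
    return sum(any(task) for task in attempts) / len(attempts)

def pass_pow_k(attempts):
    """Strict metric: a task passes only if every one of its k attempts succeeded."""
    return sum(all(task) for task in attempts) / len(attempts)

# Hypothetical evaluation log: 100 tasks, each attempted k = 8 times,
# with an invented 70% chance of success on any single attempt.
random.seed(0)
attempts = [[random.random() < 0.7 for _ in range(8)] for _ in range(100)]

print(f"pass@8 = {pass_at_k(attempts):.2f}")   # close to 1.0 under these assumptions
print(f"pass^8 = {pass_pow_k(attempts):.2f}")  # collapses toward 0.7 ** 8, about 0.06
```

Under the independence assumption, the strict metric collapses to roughly 0.7 to the power of 8, or about 6%, which is exactly why correlated, genuinely repeatable behavior is what pass-to-the-power-of-K rewards.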
The slope illustrates how model performance degrades when it needs to successfully complete the same task many times in a row. Credit: Anthropic, Model Card Addendum: Claude 3.5 Haiku and Upgraded Claude 3.5 Sonnet
This distinction is vital for evaluating agentic systems because real-world applications in customer service, coding, transaction processing and many other use cases require reliable, repeatable performance to be trustworthy enough to use. For instance, it may be interesting to discover that an AI agent can complete the aforementioned cancellation and refund request task in a single random instance with 70% reliability. But if there is only a 40% chance the model can perform this same task correctly 8 times in a row, then how valuable is it for your personal or professional use? In most cases, people expect automated tasks to be > 90% reliable over the course of many attempts before they appear trustworthy. In many cases, > 99% is mandatory, particularly at increasingly larger scales involving ever more important tasks. Even a system that is 99.99% reliable still produces roughly 100,000 errors over the course of 1 billion interactions, and this is an unacceptable failure rate for manufacturing processes, public health settings and other critical domains in society.
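The scale math is easy to verify in a couple of lines. The reliability levels below are the ones mentioned above, and 1 billion interactions is used purely as an illustrative volume.

```python
# Expected failures at different reliability levels over a fixed interaction volume.
interactions = 1_000_000_000            # 1 billion interactions, illustrative volume

for reliability in (0.90, 0.99, 0.9999):
    failures = (1 - reliability) * interactions
    print(f"{reliability:.2%} reliable -> ~{failures:,.0f} expected failures")
```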
Nonetheless, it is encouraging to see Anthropic share the results of its "pass to the power of K" evaluations. Hopefully their transparency will establish an industry expectation to focus on reliability performance metrics as the field grapples with the forthcoming wave of agentic AI systems. There is still a long way to go to fulfill many of AI’s ambitious promises, but as our long-time readers know, progress moves quickly. Where do you think the field of AI research will be this time next year?
Other Observations 📰
A flower that grew from concrete. Credit: Ted Balmer on Unsplash
Discovering, Preserving & Engineering Biological Resilience
We frequently highlight how scientific breakthroughs are reshaping our understanding of life's adaptability, and recent discoveries and stories are painting a compelling picture of resilience in unexpected places. Last week, the Schmidt Ocean Institute published their findings from their investigation of hydrothermal vents 1.56 miles beneath the Pacific Ocean surface. They discovered thriving communities of giant tubeworms, carnivorous bristle worms, and sediment-eating snails living 4 inches beneath the seafloor. This marks the first time scientists have found animal life, rather than just microbes, living within the ocean crust itself. We’ve highlighted the Schmidt Ocean Institute before, and it seems like nearly every deep sea exploration they embark on leads to the discovery of dozens or hundreds of new species.
The discovery of life persisting in extreme conditions arrives as other scientists are racing to engineer more resilience into our agricultural and ecological systems. For example, in central Missouri, researchers at Pivot Bio have modified soil bacteria’s DNA to maintain nitrogen production even when chemical fertilizer is present, so less product is needed to grow the same crop yields on a per-acre basis. The New York Times reported that the team’s treated seeds are already used on 5% of American corn crops, and customers are reportedly using 20% less fertilizer per acre with Pivot Bio’s technology. This matters because fertilizer production and use generates more greenhouse gas emissions than all U.S. coal power plants combined, so building a renewable society also requires innovations that make the agricultural systems feeding the world sustainable. Meanwhile in New York, American Castanea is using AI-driven bioinformatics and techniques from the cannabis industry in an attempt to revitalize the chestnut tree. The species is considered functionally extinct, but the right set of genetic modifications, conservation initiatives and policy positions may allow experts to restore it.
Similarly, researchers at Colossal Biosciences recently announced they assembled the most complete Tasmanian tiger genome to date from a 110-year-old preserved head. The techniques scientists can now use to preserve and analyze artifacts of history are remarkable. For example, the team mentioned they are able to recover individual strands of RNA from the preserved Tasmanian tiger and determine which genes were active and inactive in various tissues at the time of the animal’s death. Colossal Biosciences’ ultimate aim is to restore fully extinct species like the woolly mammoth, and society may reach this milestone sooner than expected because of continued advancements across science and technology.
These discoveries and efforts surrounding life's persistence are particularly relevant this week because they coincide with both the latest climate change progress report by the United Nations and one of the largest seed deposits to the Svalbard Global Seed Vault. Life is exceptionally diverse and resilient on its own, and it’s increasingly possible for humanity to engineer even more diversity, resiliency and adaptation into nature to improve and maintain our cohabitation with the Earth and other life. The process to discover, preserve, analyze, engineer and restore nature is not just about survival — it's also about learning how humanity may become better stewards of life's capacity to adapt and thrive in the midst of rapid planetary changes.
Media of the Week 📸
Boston Dynamics Robot Practicing For A Factory Setting
Boston Dynamics released a video of their Atlas robot performing autonomous tasks to move engine covers between containers without pre-programmed movements. The video shows the robot utilizing three key systems at once: a machine learning vision system to understand its surroundings, a specialized set of grasping algorithms to handle different parts, and multiple sensors (visual, force, and body-position) to detect and recover from mistakes like trips or collisions. That adaptive behavior is the key takeaway here: it shows the field is getting closer to robots that can be useful in commercial and consumer settings.
Mouse Brain Tumor Cells In Astounding Detail
Mouse brain tumor cells. Credit: Dr. Bruno Cisterna and Dr. Eric Vitriol, Augusta University, Nikon Small World Competition
Dr. Bruno Cisterna and Dr. Eric Vitriol from Augusta University captured the winning image in the 2024 Nikon Small World Competition. The picture shows mouse brain tumor cells and was made using advanced microscopy techniques and months of practice to perfect the staining process, allowing the cells' nuclei, cytoskeleton and microtubule networks to be visualized in such intricate detail. The research demonstrates how a specific protein maintains cellular transport pathways and may have implications for understanding and treating neurodegenerative diseases like Alzheimer's in the future. Nikon Small World. Journal of Cell Biology.
JWST Discovers First Brown Dwarfs Beyond Our Galaxy
Credit: NIRCam and MIRI image
Astronomers from the European Space Agency and other agencies used the Webb telescope to identify the first brown dwarf candidates outside our galaxy, roughly 200,000 light-years away. Brown dwarfs are known as “failed stars” because they cannot sustain nuclear fusion reactions in their core like normal stars. Contrary to what the name implies, these particular dwarfs are fairly massive: they range from 13 to 75 Jupiter masses (roughly 4,000 to 24,000 times the mass of Earth) and share characteristics with giant exoplanets, such as their atmospheric composition and storm patterns. European Space Agency.
This Week In The Cosmos 🪐
November 1: A new moon. The best time to stargaze!
Credit: Akbar Nemati on Unsplash
That’s all for this week! Thanks for reading.