Revolutionizing Genome Analysis: Evo 2 AI Trained on Trillions of DNA Bases

Evo 2, an open-source AI model, has been trained on genomes from all three domains of life: bacteria, archaea, and eukaryotes. After processing trillions of base pairs of DNA, it has developed internal representations of key features of complex genomes, such as regulatory DNA and splice sites, that can be difficult for humans to identify.

Bacterial genomes are organized in a relatively straightforward way. Eukaryotic genomes are more complex: their coding sections are interrupted by introns, and the sequences that regulate them are scattered. This complexity makes eukaryotic genomes harder to interpret, and specialized tools are often error-prone when analyzing large genomes.
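To make "coding sections interrupted by introns" concrete, here is a minimal Python sketch. The sequence, the exon coordinates, and the function names are all made up for illustration; real gene annotations are far longer and messier, and this is not how Evo 2 processes DNA.

```python
# Toy illustration only: GENE and EXONS are hypothetical, not real annotations.

def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) into one coding sequence."""
    return "".join(gene[start:end] for start, end in exons)


def introns(gene: str, exons: list[tuple[int, int]]) -> list[str]:
    """Return the segments lying between consecutive exons."""
    return [
        gene[prev_end:next_start]
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


# Two short exons separated by one intron. The intron starts with GT and
# ends with AG, the most common boundary signals at eukaryotic splice sites.
GENE = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
EXONS = [(0, 6), (18, 24)]

coding = splice(GENE, EXONS)       # exons joined, intron removed
removed = introns(GENE, EXONS)     # what was cut out

print(coding)
print(removed)
```

In a real genome the exon coordinates are not given in advance. Finding them from the raw sequence is the hard part that gene-finding tools, and models like Evo 2, have to solve.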

Neural networks, which excel at recognizing subtle patterns, have shown great promise in addressing these challenges. The team behind Evo 2 developed a convolutional neural network called StripedHyena 2 and trained it in two stages. The first stage focused on identifying important genome features; the second fed the system longer sequences so that it could pick up large-scale genome features.
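As a rough picture of what feeding a model shorter and then longer stretches of sequence means, the sketch below cuts one toy sequence into fixed-size windows at two lengths. The window sizes and the `windows` helper are hypothetical and chosen only for readability; they are not Evo 2's actual settings or data pipeline.

```python
# Toy illustration only: sizes and names are assumptions, not Evo 2's pipeline.

def windows(seq: str, size: int) -> list[str]:
    """Split seq into consecutive, non-overlapping windows of at most `size` bases."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [seq[i:i + size] for i in range(0, len(seq), size)]


GENOME = "ACGT" * 8  # 32-base stand-in for a genome

# Stage 1 (hypothetical): short windows, where local features dominate.
short_stage = windows(GENOME, 4)

# Stage 2 (hypothetical): longer windows, so distant parts of the
# sequence appear together in a single training example.
long_stage = windows(GENOME, 16)

print(len(short_stage), len(long_stage))
```

The same sequence yields many small examples or a few large ones. Only the larger windows let a model see relationships between features that sit far apart.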

Evo 2 could change how genomes are analyzed, helping researchers better understand the complexity of eukaryotic genomes and uncover new insights into the evolution of life on Earth.
