Study shows that Artificial Intelligence can decipher the function of unknown proteins
published on 03/09/2024

This is the first study to demonstrate that these tools allow the classification of previously unknown functions with a high level of detail.
This collaborative work between CABD and IBE makes it possible to identify genes and explore proteins that may be of biomedical and biotechnological interest, among other avenues of research.

A study by the Andalusian Center for Developmental Biology (CABD: CSIC-UPO-JA) together with the Institute of Evolutionary Biology (IBE: CSIC-UPF) in Barcelona has employed advanced artificial intelligence techniques for protein analysis. Using this methodology, the research team has demonstrated that it is possible to identify and describe in detail what proteins do, even without prior information. This work enables the large-scale application of these methods to understand proteins in less-studied organisms, identify new gene functions and explore which proteins may be of biomedical and biotechnological interest with much greater precision than traditional methods.

In nature, the information contained in DNA is converted into proteins, which carry out the work of the cell. In this project, led by CABD researchers Ildefonso Cases and Ana M. Rojas together with IBE's Rosa Fernández, two deep learning methods were used to analyze proteins in several model organisms, such as yeast, mice and fruit flies. The comparison showed that language models (Transformers) are more effective than convolutional networks, yielding more accurate and more informative predictions about the proteins of the species studied. In addition, language models can retrieve functional information from RNA data (RNA is the molecule that carries the instructions from DNA to make proteins in cells).



Ana Rojas, Patricia Medina and Ildefonso Cases, some of the authors of the work.

“We are at a critical moment due to the huge number of sequencing projects of unknown organisms that produce millions of sequences whose function we cannot predict using traditional methods,” explains Ana Rojas (CABD). This work opens up new avenues of research toward greater precision in protein function analysis and classification models.

This new study, published in the journal 'NAR Genomics and Bioinformatics', lays the groundwork for the use of artificial intelligence in other applications. “These deep learning tools will make it possible to tackle new problems in computational biology. We are working on applying these techniques to other targets, such as on-demand promoters, single-cell cluster annotation, or protein engineering.”

For her part, IBE researcher Rosa Fernández emphasizes that this research is fundamental in the field of biodiversity, where new protein sequences of unknown function are published every day, because it makes it possible to address the problem of annotating the Dark Proteome. “To do this we are using these tools on thousands of transcriptomes from the animal kingdom, work that is under review. The more information we have on the functions of new sequences, the faster we will decipher the molecular mechanisms of biological processes that occur in the field of biodiversity and regeneration, with potential biotechnological (food industry) and biomedical (pharmaceutical industry) applications,” concludes the researcher.

Israel Barrios-Núñez, Gemma I Martínez-Redondo, Patricia Medina-Burgos, Ildefonso Cases, Rosa Fernández, Ana M Rojas, Decoding functional proteome information in model organisms using protein language models, NAR Genomics and Bioinformatics, Volume 6, Issue 3, September 2024, lqae078,
 https://doi.org/10.1093/nargab/lqae078

This press release was prepared in collaboration with "Comunicación del CSIC Andalucía y Extremadura".

