AI Study Finds Half of Cellular Contents Unknown

Summary: New artificial intelligence-driven methods reveal previously unrecognized cellular structures. These discoveries could improve our understanding of human development and the origins of disease.

Source: UCSD

Most diseases arise when parts of a cell fail to function correctly. For example, a tumor may grow when a gene is not accurately translated into its protein, and a metabolic disorder may occur when mitochondria malfunction. To pinpoint which cellular components malfunction in disease, researchers first need a comprehensive map of those components.

A team at the University of California San Diego School of Medicine, together with international collaborators, combined microscopy, biochemical interaction data and machine learning to generate a multi-scale map of cell structure. Their method, called Multi-Scale Integrated Cell (MuSIC), was reported in Nature on November 24, 2021.

“When you imagine a cell, you likely picture textbook organelles such as mitochondria, the endoplasmic reticulum and the nucleus. But that representation omits many layers of organization,” said Trey Ideker, PhD, professor at UC San Diego School of Medicine and Moores Cancer Center. “MuSIC gives us a practical way to look beyond the familiar components and discover previously hidden cellular systems.”

Led by Ideker and Emma Lundberg, PhD (KTH Royal Institute of Technology and Stanford University), the pilot MuSIC study examined a human kidney cell line and identified roughly 70 distinct subcellular components, about half of which had not been documented before. One notable finding was a previously unknown protein assembly that binds RNA. Follow-up experiments indicated that this complex likely contributes to RNA splicing, a key step that enables genes to be translated into proteins and helps determine which genes are activated at which times.

Traditionally, scientists study cell interiors using either imaging or biochemical association methods. Fluorescent imaging tags proteins with colored markers and tracks their localization and co-occurrence in cells. Biochemical approaches, such as affinity purification, use antibodies or tagged baits to pull a protein out of the cell and identify its interaction partners. Each approach provides valuable but different views of cellular organization.

MuSIC’s innovation is to fuse these complementary datasets through deep learning. The platform treats both fluorescence images and protein interaction measurements as measures of “distance” between proteins, trains a machine learning model to calibrate these measures, and integrates them into a unified hierarchical map of cell architecture.
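The idea of treating two data types as protein-to-protein distances and merging them into one hierarchy can be illustrated with a toy sketch. This is not the authors' pipeline: the protein names and distance values are made up, and the min-max rescaling, the fixed `weight` and the average-linkage clustering are simple stand-ins assumed here in place of the trained calibration model and map-building step used in the study.

```python
from itertools import combinations

# Hypothetical proteins and made-up pairwise distances (not real data).
proteins = ["A", "B", "C", "D"]
image_dist = {("A", "B"): 0.2, ("A", "C"): 0.9, ("A", "D"): 0.8,
              ("B", "C"): 0.85, ("B", "D"): 0.9, ("C", "D"): 0.3}
interaction_dist = {("A", "B"): 1.0, ("A", "C"): 6.0, ("A", "D"): 5.0,
                    ("B", "C"): 6.0, ("B", "D"): 5.5, ("C", "D"): 2.0}


def rescale(dist):
    """Min-max rescale distances to [0, 1] so both data types are comparable."""
    lo, hi = min(dist.values()), max(dist.values())
    return {pair: (d - lo) / (hi - lo) for pair, d in dist.items()}


def fuse(d1, d2, weight=0.5):
    """Weighted average of two rescaled distance tables (assumed stand-in
    for a learned calibration between the two measurement types)."""
    a, b = rescale(d1), rescale(d2)
    return {pair: weight * a[pair] + (1 - weight) * b[pair] for pair in a}


def average_linkage(names, dist):
    """Agglomerative clustering; returns merges as (cluster, cluster, distance)."""
    def pair_d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    def cluster_d(c1, c2):
        return sum(pair_d(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters and record the step.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_d(*p))
        merges.append((c1, c2, cluster_d(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


fused = fuse(image_dist, interaction_dist)
hierarchy = average_linkage(proteins, fused)
for left, right, d in hierarchy:
    print(left, "+", right, f"merged at distance {d:.2f}")
```

In this toy case the small, tight groups merge first and the larger grouping forms last, which mirrors the nesting of small assemblies inside larger compartments that a hierarchical map is meant to capture.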

“Bringing measurements that operate at such different physical scales into a single model is what makes MuSIC powerful,” said Yue Qin, the study’s first author and a graduate student in Ideker’s lab. “Microscopy resolves features down to microns — large organelles and assemblies — while biochemical methods access nanometer-scale interactions. Machine learning bridges that scale gap.”

The MuSIC system does not force components into fixed textbook-like positions. Instead, it represents subcellular systems and their relationships in a fluid hierarchy, reflecting the reality that component locations and associations vary by cell type, state and context. In this pilot, the team analyzed 661 proteins in a single cell line as a proof of concept.

UC San Diego researchers introduce Multi-Scale Integrated Cell (MuSIC), a technique that combines microscopy, biochemistry and artificial intelligence, revealing previously unknown cell components that may provide new clues to human development and disease. (Artist’s conceptual rendering.) Credit: UC San Diego Health Sciences

Ideker emphasized that MuSIC is at an early stage: “This was a pilot study. The next objective is to scale MuSIC to cover the full human proteome, extend it to many cell types and individuals, and ultimately compare healthy and diseased cells to identify molecular differences that underlie disease.”

The study includes contributions from researchers across UC San Diego, Harvard Medical School, KTH Royal Institute of Technology, Université Libre de Bruxelles, Peking University and other institutions. Key co-authors are Maya L. Gosztyla, Marcus R. Kelly, Steven M. Blue, Fan Zheng, Michael Chen and many others who contributed to data generation, analysis and validation experiments.

Disclosures: Trey Ideker is a co-founder, board member and equity holder in Data4Cure, Inc., and has advisory roles and sponsored research arrangements with Ideaya BioSciences, Inc. Gene Yeo holds positions and equity in Locanabio and Eclipse BioInnovations and serves as a visiting professor at the National University of Singapore. Emma Lundberg serves on scientific advisory boards and holds equity in several biotechnology companies. J. Wade Harper is a co-founder and advisor of Caraway Therapeutics and Interline Therapeutics. These relationships were reviewed and managed under UC San Diego conflict-of-interest policies.

About this AI research news

Author: Heather Buschman
Contact: Heather Buschman – UCSD
Image: The image is credited to UC San Diego

Original Research: Closed access. “A multi-scale map of cell structure fusing protein images and interactions” by Yue Qin et al., published in Nature.


Abstract

A multi-scale map of cell structure fusing protein images and interactions

Cells are organized across multiple physical scales, spanning at least four orders of magnitude, and their architecture is modular. Two primary experimental approaches — protein fluorescent imaging and protein biophysical association — each produce extensive datasets with different resolutions and characteristics. Historically these datasets have been analyzed separately.

This work integrates immunofluorescence image data from the Human Protein Atlas with affinity purification interaction data from BioPlex to build a unified, hierarchical map of human cell architecture. Integration is achieved by recasting each data type as a measure of protein proximity and then using machine learning to align and calibrate those measures.

The resulting map, MuSIC 1.0, resolves 69 subcellular systems, roughly half of which appear to be previously undocumented. The authors carried out 134 additional affinity purifications and validated subunit associations for the majority of systems. The map reveals a pre-ribosomal RNA processing assembly and suggests functional roles for SRRM1 and FAM120C in chromatin and for RPS3A in splicing. By combining data across scales, MuSIC sharpens imaging resolution and assigns spatial context to protein interactions, enabling the integration of diverse data types into proteome-wide maps of cell organization.