DirectContacts2: A network of direct physical protein interactions derived from high-throughput mass spectrometry experiments
What this paper is about
Every function in our body relies on proteins working together in groups called protein complexes. However, not every protein in a complex directly interacts with every other protein; some proteins only connect indirectly through other proteins. Knowing which proteins physically touch each other is like having the wiring diagram of the cell.
This paper presents DirectContacts2, a machine learning model that uses data from over 25,000 mass spectrometry experiments to identify direct protein–protein contacts in human cells. Testing all ~200 million possible protein pairs with structure prediction tools would be computationally impractical, so the authors use the model to narrow the search to the most likely direct interactions.
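To see where the ~200 million figure comes from, here is a minimal arithmetic sketch. It assumes a round number of about 20,000 human proteins (one per protein-coding gene); that count is an illustrative assumption, not a figure from the paper. The ~26 million scored pairs is the number quoted in the abstract.

```python
from math import comb

# Assumption for illustration: roughly 20,000 human proteins
# (about one per protein-coding gene). Not a figure from the paper.
N_PROTEINS = 20_000


def count_unordered_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n items: n * (n - 1) / 2."""
    return comb(n, 2)


all_pairs = count_unordered_pairs(N_PROTEINS)

# Approximate number of pairs the model was applied to, per the abstract.
scored_pairs = 26_000_000

print(f"possible pairs:  {all_pairs:,}")
print(f"fraction scored: {scored_pairs / all_pairs:.1%}")
```

Under this assumption, the pair count grows quadratically with the number of proteins, which is why an all-by-all structural screen becomes expensive and a prioritized list of likely direct contacts is useful.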
Why is it important?
- Better structural biology: DirectContacts2 helps tools like AlphaFold build more accurate 3D models of protein complexes. The authors also generated ~2,500 new AlphaFold3 models using these predictions.
- Disease insights: Many diseases (like cancer, developmental disorders, or Parkinson’s) happen when protein complexes fall apart or misassemble. With this map, scientists can trace how mutations disrupt direct interactions. One example is the orofacial digital syndrome (OFDS) complex.
- Drug discovery: If we know the exact proteins touching at a disease-relevant interface, drug developers can design molecules to block or stabilize those interactions.
Abstract
Cellular function is driven by the activity of proteins in stable complexes. Protein complex assembly depends on the direct physical association of component proteins. Advances in macromolecular structure prediction with tools like AlphaFold and RoseTTAFold have greatly improved our ability to model these interactions in silico, but an all-by-all analysis of the human proteome’s ~200M possible pairs remains computationally intractable. A comprehensive cellular map of direct protein interactions will therefore be an invaluable resource to direct screening efforts. Here, we present DirectContacts2, a machine learning model that distinguishes direct from indirect protein interactions using features derived from over 25,000 mass spectrometry experiments. Applied to ~26 million human protein pairs, our model outperforms previous resources in identifying direct physical interactions and enriches for accurate structural models, including ~2,500 new AlphaFold3 models. Our framework enables structural modeling of disease-relevant complexes (e.g., the orofacial digital syndrome (OFDS) complex), offering insights into the molecular consequences of pathogenic mutations (OFD1), and, more broadly, establishes a highly accurate protein wiring diagram of the cell.
