Unlocking new discoveries through a step change in data scale and rigorous, real-world-relevant evaluations

About PINDER

Protein interactions are at the heart of biology, and predicting them enables breakthroughs in science and medicine. However, progress in deep learning is currently rate-limited by unrealistic, saturated evaluations and by small datasets that neglect protein dynamics.

PINDER is an academic-industry collaboration to address this, driven by VantAI, NVIDIA, and MIT. We aim to provide a gold-standard dataset and evaluations to push the field forward.

PINDER Provides

>500x more data

Realistic Evaluations

Highly Diverse Data

Predicted and Unbound structures


How PINDER Was Created

Ingest

  • Fully automated and reproducible pipeline
  • 2,319,564 systems with 9,430 unique ECOD domain pairs across 6,529 families
  • 100+ annotations
  • Interface quality assessment with 10+ metrics

Expand

  • Paired unbound and AlphaFold2-predicted monomers
  • PPI interfaces stratified by flexibility

Split

  • Foldseek- and MMseqs-based interface similarity comparison, followed by transitive graph clustering and additional deleaking via iAlign
  • Extensive orthogonal leakage validation via ECOD and Pfam overlap and other metrics
  • Maximized validation and test set quality with minimal leakage
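
The split step relies on transitive clustering. If system A resembles B and B resembles C, all three go into the same cluster, even when A and C are not directly similar. Below is a minimal sketch of that idea, assuming pairwise similarity scores (for example from Foldseek or MMseqs searches) have already been computed and a single threshold decides whether two systems are linked. The function names, input format, and threshold are hypothetical and are not PINDER's actual API. Connected components are found with union-find, whole clusters are assigned to one split, and a final check lists any similar pair that straddles train and test.

```python
from collections import defaultdict


def transitive_clusters(systems, edges, threshold):
    """Connected components of the similarity graph (hypothetical helper).

    systems: iterable of system IDs
    edges: iterable of (id_a, id_b, similarity) tuples (assumed format)
    threshold: minimum similarity that links two systems (assumed value)
    """
    parent = {s: s for s in systems}

    def find(x):
        # Path halving keeps union-find trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, sim in edges:
        if sim >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two components

    clusters = defaultdict(list)
    for s in parent:
        clusters[find(s)].append(s)
    return sorted(sorted(c) for c in clusters.values())


def assign_splits(clusters, test_seeds):
    # A whole cluster goes to test if it contains any chosen seed system,
    # so no above-threshold edge can connect train and test.
    split_of = {}
    for cluster in clusters:
        label = "test" if any(s in test_seeds for s in cluster) else "train"
        for s in cluster:
            split_of[s] = label
    return split_of


def leaked_pairs(split_of, edges, threshold):
    # Similar pairs whose members ended up on opposite sides of train/test
    return [(a, b) for a, b, sim in edges
            if sim >= threshold and {split_of[a], split_of[b]} == {"train", "test"}]


edges = [("A", "B", 0.8), ("B", "C", 0.7), ("D", "E", 0.3)]
clusters = transitive_clusters(["A", "B", "C", "D", "E"], edges, 0.5)
print(clusters)  # [['A', 'B', 'C'], ['D'], ['E']]
split = assign_splits(clusters, {"C"})
print(leaked_pairs(split, edges, 0.5))  # []
```

Assigning splits at the cluster level is what prevents leakage here. A naive per-system split that puts only C in test would leave the pair (B, C) straddling train and test.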

Evaluate

  • Large split (XL) with a smaller subset (S)
  • XL subset (AF2) filtered by the AlphaFold2 training cutoff and deleaked by interface structure
  • Automated evaluation harness with 38 CASP/CAPRI-compatible metrics and an included leaderboard
  • Leaderboards for holo, apo, and predicted inputs across protein flexibility levels (easy, medium, hard)
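
DockQ is one widely used CAPRI-compatible measure. It combines the CAPRI quantities Fnat (fraction of native contacts), interface RMSD, and ligand RMSD into a single score between 0 and 1. The sketch below assumes those three inputs have already been computed from a model-to-native alignment. It implements the published DockQ formula and its CAPRI quality cutoffs for illustration only. It is not PINDER's evaluation harness, and the function names are hypothetical.

```python
def dockq(fnat, irms, lrms):
    """DockQ from CAPRI measures (illustrative sketch).

    fnat: fraction of native interface contacts reproduced, in [0, 1]
    irms: interface RMSD in angstroms
    lrms: ligand RMSD in angstroms
    """
    def scaled(rms, d0):
        # Maps an RMSD to (0, 1]; equals 0.5 when rms == d0
        return 1.0 / (1.0 + (rms / d0) ** 2)

    return (fnat + scaled(irms, 1.5) + scaled(lrms, 8.5)) / 3.0


def capri_class(score):
    """Map a DockQ score to a CAPRI-style quality class."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


print(dockq(1.0, 0.0, 0.0))               # 1.0 (perfect model)
print(capri_class(dockq(0.5, 1.5, 8.5)))  # medium
```

A single bounded score like this makes it straightforward to rank models on a leaderboard and to bin them into CAPRI classes.
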

Upcoming

  • Prediction challenge: PINDER will be used for the PPI challenge at the 2024 NeurIPS ML in Structural Biology (MLSB) workshop
  • More data: Higher-order oligomers
  • More data: Binding affinity and other interaction metrics
  • More data: Data augmentation strategies, including holo minimization to expand apo coverage
  • Leaderboard: Multiple upcoming state-of-the-art works have already adopted PINDER
  • Regular updates: Data ingestion is fully automated and backed by extensive metrics, so the dataset will be updated at regular intervals