CyclicMPNN adapts sequence design to cyclic backbones
Why this work matters
Cyclic peptides are attractive scaffolds because their closed backbone can promote structural rigidity and proteolytic stability. Yet assigning sequences that reliably fold into a given cyclic backbone remains a key challenge. In this study, the authors introduce CyclicMPNN, a fine-tuned version of ProteinMPNN designed specifically to improve sequence design for short cyclic peptides.
Background
While backbone generation methods can now produce large numbers of cyclic peptide geometries, sequence design has lagged behind. ProteinMPNN performs well on many fixed-backbone design problems, but its training data is dominated by globular proteins, and short cyclic peptides occupy a distinct structural regime. The authors hypothesized that fine-tuning ProteinMPNN on cyclic-specific data would improve structural agreement between designed sequences and their intended cyclic backbones.
The new approach
CyclicMPNN was created by fine-tuning ProteinMPNN on a dataset enriched for cyclic peptides. This training set combined experimentally determined cyclic peptide structures with in silico generated cyclic backbones.
For the synthetic backbones, sequences were first designed and then evaluated by structure prediction. Only designs whose predicted structures closely matched the intended backbone, as judged by strict backbone RMSD cutoffs, were retained. The curated dataset was then clustered to reduce redundancy and used to fine-tune the model for 200 epochs, and the checkpoint with the lowest perplexity was kept.
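The retain-or-discard step and the checkpoint choice described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes predicted and target backbones are already available as NumPy arrays of matched backbone atoms, it uses Kabsch superposition to compute RMSD, and the 1.0 Å cutoff, the function names, and the per-checkpoint perplexity dictionary are all hypothetical. Perplexity is computed in the conventional way, as the exponential of the mean negative log-likelihood of the native residues; that it is measured on held-out data is also an assumption.

```python
import math

import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) arrays of matched backbone atoms after optimal superposition."""
    P = P - P.mean(axis=0)                      # center both coordinate sets
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                 # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def filter_designs(designs: dict, cutoff: float = 1.0) -> list:
    """Keep design names whose predicted backbone is within `cutoff` Å of its target.

    `designs` maps a name to a (predicted_coords, target_coords) pair.
    The default cutoff is a hypothetical value; the preprint's cutoffs are not given here.
    """
    return [name for name, (pred, target) in designs.items()
            if kabsch_rmsd(pred, target) <= cutoff]


def perplexity(native_log_probs: list) -> float:
    """exp(mean negative log-likelihood) of the native residues."""
    return math.exp(-sum(native_log_probs) / len(native_log_probs))


def best_checkpoint(ppl_by_epoch: dict) -> int:
    """Return the epoch whose checkpoint has the lowest (assumed held-out) perplexity."""
    return min(ppl_by_epoch, key=ppl_by_epoch.get)
```

A rigidly rotated and translated copy of a backbone scores an RMSD of essentially zero, so superposition ensures that only genuine shape differences count against a design.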
The architecture itself was left unchanged; the improvement comes from adapting the training distribution.
Key results
The authors first tested CyclicMPNN on four experimentally determined cyclic peptide X-ray structures. In all cases, redesigned sequences produced predicted structures within 1.0 Å backbone RMSD of the experimental backbone. Compared to ProteinMPNN, CyclicMPNN achieved comparable or lower deviations in most cases and outperformed HighMPNN on several targets.
They next evaluated large sets of de novo cyclic backbones of different lengths. For 8-residue cyclic peptides, CyclicMPNN reduced the median backbone RMSD from 1.55 Å to 1.13 Å relative to ProteinMPNN. Improvements were also observed for 6- and 10-residue backbones. Structure prediction confidence scores were modestly higher with CyclicMPNN, particularly for shorter peptides.
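To put the 8-residue result in perspective, the change in median RMSD can be expressed as an absolute and a relative drop. The short Python sketch below does this arithmetic; the helper names are hypothetical, and the only numbers it uses are the two medians reported above.

```python
from statistics import median


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional drop from `baseline` to `new` (positive means `new` is lower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


def compare_median_rmsd(baseline_rmsds, new_rmsds):
    """Median RMSD for each method, plus the absolute and relative change between them."""
    m_base = median(baseline_rmsds)
    m_new = median(new_rmsds)
    return m_base, m_new, m_base - m_new, relative_reduction(m_base, m_new)


# Reported medians for 8-residue backbones: ProteinMPNN 1.55 Å, CyclicMPNN 1.13 Å.
drop = 1.55 - 1.13
print(f"absolute drop: {drop:.2f} Å, relative drop: {relative_reduction(1.55, 1.13):.1%}")
```

This prints an absolute drop of 0.42 Å, roughly a 27% reduction. The study reports medians; unlike a mean, a median is not dominated by a few badly mispredicted designs.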
For 14-residue cyclic backbones generated with RFPeptide, CyclicMPNN again improved backbone agreement, lowering the median RMSD relative to both ProteinMPNN and HighMPNN.
In a motif-constrained design test, an Nrf2-derived “EETG” motif was embedded into 8-residue cyclic scaffolds. CyclicMPNN produced more designs that met the authors’ stability criterion: across 100 scaffolds, 12 CyclicMPNN designs passed the threshold versus 6 for ProteinMPNN. CyclicMPNN designs also included more outliers with high-quality folding funnels.
Validation
All validation was computational. Designed sequences were evaluated by structure prediction to measure backbone RMSD and model confidence. The authors also performed energy landscape analysis using PNear, a metric that estimates from sampled conformations how likely a sequence is to occupy its designed structure. For a selected set of 6-residue designs, CyclicMPNN showed a substantially higher median PNear than ProteinMPNN, indicating more pronounced folding funnels toward the intended structure.
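PNear is commonly defined as a Boltzmann-weighted average of a Gaussian closeness term over sampled conformations: PNear = Σ_i exp(-rmsd_i²/λ²) exp(-E_i/kT) / Σ_j exp(-E_j/kT), where λ sets how close a state must be to the design to count as native-like and kT sets how strongly low-energy states dominate. The Python sketch below implements that formula, assuming the per-sample RMSDs and energies have already been obtained from a conformational sampling run. The function name and the λ = 1.5 Å and kT = 0.62 defaults are illustrative assumptions, not settings taken from the preprint.

```python
import math


def pnear(rmsds, energies, lambda_=1.5, kbt=0.62):
    """Boltzmann-weighted probability of occupying the designed (native-like) state.

    rmsds    -- RMSD (Å) of each sampled conformation to the designed structure
    energies -- energy of each sampled conformation (same order as rmsds)
    lambda_, kbt -- illustrative defaults (assumptions), not the authors' values
    """
    if not rmsds or len(rmsds) != len(energies):
        raise ValueError("rmsds and energies must be non-empty and equal in length")
    e_min = min(energies)  # shift energies so the largest Boltzmann weight is exp(0) = 1
    numerator = 0.0
    denominator = 0.0
    for rmsd, energy in zip(rmsds, energies):
        weight = math.exp(-(energy - e_min) / kbt)               # Boltzmann weight
        numerator += math.exp(-(rmsd / lambda_) ** 2) * weight   # weight times native-likeness
        denominator += weight
    return numerator / denominator
```

PNear ranges from 0 to 1. It approaches 1 when the lowest-energy samples lie close to the design (a clean folding funnel) and approaches 0 when a distant conformation is lower in energy. Shifting by the minimum energy leaves the ratio unchanged while avoiding overflow for large energy gaps.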
Broader significance
The authors conclude that fine-tuning on cyclic-enriched datasets improves sequence design performance for short cyclic peptides. They suggest that this strategy may be useful for adapting pretrained sequence design models to other underrepresented structural regimes.
This work was recently posted as a bioRxiv preprint by Hosseinzadeh and colleagues.
