Methodology for De Novo Peptide Binder Design Targeting RBX1
- Design Rationale: Peptide Binders
We chose de novo peptide binders to address RBX1’s heterogeneous structural landscape, which combines an intrinsically disordered N-terminus with a rigid C-terminal RING-H2 domain. Unlike bulky scaffolds, short peptides offer the conformational flexibility needed to adapt to these dynamic interfaces while significantly narrowing the combinatorial search space for efficient design. Mechanistically, these peptides are optimized to disrupt CRL-mediated degradation by competitively occupying the E2-enzyme recruitment site on the RING domain. This approach balances high-specificity targeting with the practical advantages of rapid synthesis and high-throughput validation.
- Generative Model: PepMind
We introduce PepMind, a multimodal discrete diffusion framework designed to avoid the limitations of traditional "sequence-first, structure-later" design pipelines. Decoupled methods often suffer from structural mismatch between the designed sequence and its eventual fold. PepMind instead treats the RBX1 target as a latent conditioning signal and jointly generates peptide sequences and 3D coordinates in a single end-to-end process. The architecture has three stages (target encoding, discrete latent diffusion, and joint decoding) that together capture intrinsic sequence–structure correlations. Because both modalities are modeled simultaneously, candidate binders are generated with sequence and structure already matched to the RBX1 interface rather than reconciled after the fact.
- Peptide Generation Strategy
To ensure both diversity and adequate coverage of the peptide design space, we adopt a length-stratified sampling strategy. Peptide lengths are varied from 5 to 40 amino acids, and multiple candidates are generated with PepMind for each length range, yielding approximately 1000 candidate binders in total. This allows the model to explore a wide spectrum of binding modes, from short motif-like binders to longer peptides capable of forming secondary structural elements. No aggressive filtering is applied at this stage, since the primary objective is to maximize diversity and avoid prematurely discarding potentially viable candidates. Each generated peptide comes with a predicted 3D structure, enabling direct downstream structural evaluation.
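The length stratification can be illustrated with a minimal sketch. It assumes one stratum per integer length and an even split of the sampling budget; the text only states the 5 to 40 range and the total of roughly 1000, so the per-length allocation and the helper name `allocate_samples` are hypothetical.

```python
def allocate_samples(total=1000, min_len=5, max_len=40):
    """Spread `total` samples as evenly as possible over every peptide length.

    Assumption: one stratum per integer length; the real pipeline may use
    wider length bins or uneven weights.
    """
    lengths = list(range(min_len, max_len + 1))
    base, extra = divmod(total, len(lengths))
    # The first `extra` lengths receive one additional sample so that the
    # per-length counts sum exactly to `total`.
    return {n: base + (1 if i < extra else 0) for i, n in enumerate(lengths)}


plan = allocate_samples()
```

Each entry of `plan` would then be the number of PepMind samples requested at that length.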
- Screening and Ranking Pipeline
Following peptide generation, we employ a multi-stage screening pipeline that refines candidates by sequence novelty, structural plausibility, energetic favorability, and predicted binding confidence. To strictly satisfy the de novo design requirement of the competition, we first perform sequence similarity filtering with MMseqs2 against the UniRef50 database, removing any peptide with more than 25% sequence identity to a known protein. A separate MMseqs2 clustering run at 90% sequence identity confirms that each remaining peptide forms its own cluster, indicating high sequence diversity among the generated candidates. Placing the novelty filter first ensures that all downstream computational resources are spent only on valid, novel candidates.
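As an illustration of the novelty filter, the sketch below post-processes tab-separated search hits. It assumes the hits are in the BLAST-style tabular layout with the query ID in column 1 and a fractional sequence identity (0 to 1) in column 3; if the search was run with a percentage identity column instead, the threshold would be 25 rather than 0.25. The function name, peptide IDs, and target IDs are placeholders, not taken from the actual run.

```python
def novel_queries(all_ids, hit_lines, max_identity=0.25):
    """Return the peptide IDs that have no hit above `max_identity`.

    Assumption: each hit line is tab-separated with the query ID in
    column 1 and a fractional identity (0..1) in column 3.
    """
    flagged = set()
    for line in hit_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        query, identity = fields[0], float(fields[2])
        if identity > max_identity:
            flagged.add(query)
    # Peptides with no hits at all are kept: nothing similar was found.
    return [pid for pid in all_ids if pid not in flagged]


# Hypothetical hits; target names are placeholders.
hits = [
    "pep_001\tUniRef50_EXAMPLE1\t0.310\t16",
    "pep_002\tUniRef50_EXAMPLE2\t0.200\t15",
    "",
]
kept = novel_queries(["pep_001", "pep_002", "pep_003"], hits)
```

Here `pep_001` is dropped for exceeding the threshold, while `pep_002` (below threshold) and `pep_003` (no hits) are retained.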
The filtered peptides are then docked against RBX1 using RAPiDock to generate peptide–protein complex conformations, which serve as the structural basis for evaluation. These complexes undergo energy-based filtering with two complementary scoring functions. Rosetta assesses detailed interaction energetics, and candidates scoring above −10 (Rosetta energy units) are discarded. FoldX then estimates binding stability, and candidates scoring above −5 (kcal/mol) are discarded as weak binders. The remaining peptides are ranked by their energy scores, and the top 200 candidates are passed to AlphaFold3 for high-confidence structural validation. For each complex, we compute pLDDT, PAE, and ipTM to evaluate structural reliability and interface quality. Finally, candidates are ranked primarily by peptide-level pLDDT, and the top 100 peptides form the final submission set. Every peptide in this set has therefore passed the novelty filter, both energy thresholds, and the structure-confidence ranking.
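The filtering and ranking cascade can be summarized in a short sketch. It assumes the Rosetta, FoldX, and peptide-level pLDDT values have already been extracted into plain numbers. The `Candidate` record and both function names are hypothetical, and sorting by Rosetta score with FoldX as a tie-break is an assumption, since the text does not say how the two energies are combined for the top-200 ranking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    rosetta: float                          # Rosetta score (lower is better)
    foldx: float                            # FoldX score (lower is better)
    peptide_plddt: Optional[float] = None   # filled in after AlphaFold3


def energy_shortlist(cands, rosetta_max=-10.0, foldx_max=-5.0, top_n=200):
    """Apply both energy thresholds, then keep the `top_n` lowest-energy hits."""
    kept = [c for c in cands if c.rosetta <= rosetta_max and c.foldx <= foldx_max]
    # Assumed ordering: Rosetta first, FoldX as tie-break.
    kept.sort(key=lambda c: (c.rosetta, c.foldx))
    return kept[:top_n]


def final_selection(shortlist, top_n=100):
    """Rank structurally validated candidates by peptide-level pLDDT."""
    scored = [c for c in shortlist if c.peptide_plddt is not None]
    return sorted(scored, key=lambda c: c.peptide_plddt, reverse=True)[:top_n]


# Toy data; pLDDT is pre-filled here only to keep the example short.
pool = [
    Candidate("a", -15.0, -7.0, 82.0),
    Candidate("b", -9.0, -8.0, 90.0),   # fails the Rosetta threshold
    Candidate("c", -12.0, -4.0, 88.0),  # fails the FoldX threshold
    Candidate("d", -11.0, -6.0, 91.0),
]
shortlist = energy_shortlist(pool)
best = final_selection(shortlist, top_n=1)
```

In the toy data, only `a` and `d` survive both thresholds, and `d` ranks first on pLDDT despite its weaker Rosetta score. This shows that the final ordering is driven by structural confidence, not energy.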