The Protein Engineering Team at Nostrum Biodiscovery proposes combining several current state-of-the-art (SotA) generative AI methods for binder design with a common validation pipeline.
Generative AI Design Methods
The team used three main generative AI approaches:
- BoltzGen Design:
  - Bispecific VNARs Design: The BoltzGen pipeline was adapted to generate variable new antigen receptors (VNARs) that address the structural complexity of the RBX1 protein. VNARs are single heavy-chain domains from sharks that offer therapeutic advantages such as extreme stability (resisting denaturation up to 80°C) and versatility. Bispecific VNARs joined by a G4S linker were designed. Hotspot residues on RBX1 (Trp33, Trp35, Glu100, Glu102, and Tyr106) were identified using MaSIF and pyDockODA, and a library of 2,500 new models was generated by redesigning the CDR1 and CDR3 sequences.
  - Bipartite Small Protein Design: A parallel approach engineered 20,000 de novo small bipartite proteins (80 to 120 amino acids) to target both the globular RING domain of RBX1 and its unstructured N-terminal tail. Structural constraints were applied to force the binder's N-terminal region into a beta-strand structure that physically complements and "locks" the RBX1 tail, mirroring native complexes (PDB IDs 1LDJ and 4P5O).
- RFdiffusion3 + ProteinMPNN: RFdiffusion3 generated candidate binder backbones targeting interactions with selected surface residues (55, 57, 87, and 91). ProteinMPNN was then applied to design compatible amino acid sequences, optimizing structural stability while preserving the intended geometry. The sequences were evaluated using structure prediction to ensure they maintained interactions with the selected GLMN residues and recapitulated the target binding mode.
- Protein Hunter Modified Version: Based on successful results in the previous Nipah Binder Competition (where the team ranked #2 in the computational design phase), a modified version of the Protein Hunter software was developed (https://doi.org/10.1101/2025.10.10.681530). The key improvement is the ability to design only the CDR loops of scFvs, nanobodies, and VNARs. This version includes a custom scFv constructor for various immune architectures (including VNARs), dynamically builds linkers, and automates CDR loop detection. It integrates NostrumAbMPNN, an in-house inverse folding model fine-tuned from ProteinMPNN on structural antibody data. Models with higher ipTM, ipSAE, and pLDDT scores were selected for validation.
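Two ideas recur in the design methods above: joining two binding domains with a G4S linker, and restricting redesign to CDR loop positions. The following is a minimal Python sketch of both, not the team's implementation. The function names, the linker repeat count, and the toy sequences and CDR index ranges are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names, defaults, and toy inputs are assumptions,
# not part of BoltzGen, Protein Hunter, or NostrumAbMPNN.

G4S = "GGGGS"  # one repeat of the flexible Gly4-Ser linker


def build_bispecific(domain_a, domain_b, linker_repeats=1):
    """Concatenate two domain sequences with a (G4S)n linker in between.

    The number of linker repeats is an assumption; the text only says
    that a G4S linker was used.
    """
    if linker_repeats < 1:
        raise ValueError("linker_repeats must be >= 1")
    return domain_a + G4S * linker_repeats + domain_b


def design_mask(length, cdr_ranges):
    """Return a list of booleans, True where a residue may be redesigned.

    cdr_ranges holds 0-based, end-exclusive (start, end) index pairs,
    for example the CDR1 and CDR3 loops of a VNAR.
    """
    mask = [False] * length
    for start, end in cdr_ranges:
        if not (0 <= start < end <= length):
            raise ValueError(f"invalid CDR range: ({start}, {end})")
        for i in range(start, end):
            mask[i] = True
    return mask


if __name__ == "__main__":
    # Toy placeholder strings, not real VNAR sequences.
    construct = build_bispecific("AAAAAAAA", "CCCCCCCC", linker_repeats=2)
    mask = design_mask(len(construct), [(2, 5), (20, 24)])
    designable = [i for i, open_ in enumerate(mask) if open_]
    print(construct)
    print(designable)
```

A mask of this kind is one common way to tell an inverse folding model which positions to keep fixed (framework) and which to resample (CDR loops).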
Validation and Scoring Pipeline
The generated sequences undergo a two-step validation process:
- AI-Based Validation: Sequences are first run through a BLAST search against UniRef50 or SAbDab to evaluate their minimum edit distance to known sequences. Passing sequences are submitted for structure prediction with Protenix. Models are selected for physics-based validation based on confidence metrics (Ranking Score, pLDDT, ipTM, pTM) and contacts with hotspot regions. The team notes that current AI metrics, despite transforming design, are often overconfident and do not consistently correlate with experimental assay results.
- Physics-Based Validation: Physics-based methods are preferred for predicting binding energies and structural motions, and for scoring the probability of a true binder. The best models from all design methods are submitted to the proprietary PELE software, Nostrum Biodiscovery’s core simulation technology. PELE combines a Monte Carlo stochastic approach with protein structure prediction algorithms to solve molecular recognition problems.
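The selection step between AI-based and physics-based validation can be pictured as a filter on confidence metrics and hotspot contacts. The sketch below is an assumption-laden illustration: the `PredictedModel` class, the threshold values, the 0 to 100 pLDDT scale, and the minimum number of hotspot contacts are hypothetical placeholders, since the text does not state the cutoffs actually used. Only the RBX1 hotspot residue numbers are taken from the text.

```python
from dataclasses import dataclass, field

# RBX1 hotspot residue numbers as listed in the text
# (Trp33, Trp35, Glu100, Glu102, Tyr106).
RBX1_HOTSPOTS = frozenset({33, 35, 100, 102, 106})


@dataclass
class PredictedModel:
    """Hypothetical container for one predicted complex."""
    name: str
    iptm: float
    ptm: float
    plddt: float  # assumed to be on a 0-100 scale
    contacted_target_residues: frozenset = field(default_factory=frozenset)


def passes_ai_filter(model, min_iptm=0.7, min_plddt=80.0, min_hotspot_contacts=2):
    """Return True if the model clears every (placeholder) threshold."""
    hotspot_hits = len(model.contacted_target_residues & RBX1_HOTSPOTS)
    return (
        model.iptm >= min_iptm
        and model.plddt >= min_plddt
        and hotspot_hits >= min_hotspot_contacts
    )


if __name__ == "__main__":
    # Made-up example values, not results from the pipeline.
    candidates = [
        PredictedModel("design_001", 0.82, 0.85, 88.0, frozenset({33, 35, 60})),
        PredictedModel("design_002", 0.90, 0.88, 91.0, frozenset({10, 11})),
    ]
    selected = [m.name for m in candidates if passes_ai_filter(m)]
    print(selected)
```

Requiring hotspot contacts in addition to confidence scores reflects the caveat above that AI confidence metrics alone tend to be overconfident.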
Scoring:
The final ranking is obtained by normalizing the scores from both the AI and physics-based metrics with Min-Max normalization. The normalized values are summed for each model, the models are ranked by this sum, and the top 90 are selected for submission to the competition.
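Min-Max normalization rescales each metric to the range 0 to 1 via x' = (x - min) / (max - min), so that metrics on different scales can be summed. The sketch below illustrates the ranking described above under stated assumptions: the function names, the metric names, and the example values are hypothetical, and negating "lower is better" metrics (such as an energy) before normalizing is an assumption not spelled out in the text.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; constant lists map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_models(scores, lower_is_better=(), top_n=90):
    """Rank models by the sum of their Min-Max normalized metrics.

    scores: dict mapping model name -> dict of metric name -> raw value.
            Every model is expected to report every metric.
    lower_is_better: metric names whose sign is flipped before normalizing
            (an assumption, e.g. for binding energies).
    Returns a list of (name, total) pairs, best first, at most top_n long.
    """
    names = list(scores)
    metrics = sorted({m for per_model in scores.values() for m in per_model})
    totals = dict.fromkeys(names, 0.0)
    for metric in metrics:
        raw = [scores[n][metric] for n in names]
        if metric in lower_is_better:
            raw = [-v for v in raw]
        for name, norm in zip(names, min_max(raw)):
            totals[name] += norm
    ranked = sorted(names, key=lambda n: totals[n], reverse=True)
    return [(n, totals[n]) for n in ranked[:top_n]]


if __name__ == "__main__":
    # Made-up values for illustration only.
    example = {
        "design_a": {"iptm": 0.9, "energy": -50.0},
        "design_b": {"iptm": 0.5, "energy": -30.0},
        "design_c": {"iptm": 0.7, "energy": -40.0},
    }
    for name, total in rank_models(example, lower_is_better={"energy"}):
        print(name, round(total, 3))
```

Because Min-Max normalization depends on the observed minimum and maximum, a single outlier in any metric compresses the remaining values, which is worth keeping in mind when pooling models from different design methods.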