Our technology generates de novo binder sequences against a target of interest using a three-stage framework: (1) Foundational Model Training, (2) Controlled Structure Generation, and (3) Iterative Hallucination. During the training stage, the foundational model learns from three design procedures so that it generalizes across design scenarios. In flexible-target training, the model learns to co-generate the target and binder structures, using only the target's internal 3D inter-residue distances as a soft constraint. In contrast, rigid-target training enforces a hard constraint by fixing the target at its true coordinates, which suits targets with well-defined pockets. Finally, epitope-specific training provides fine-grained control by guiding the model to bind specific "hotspot" residues. Hotspots are identified at the target-binder interface using two criteria, a spatial distance cutoff (e.g., < 8 Å) and the molecular entity IDs that distinguish target from binder, and are fed to the model as a conditional feature. To handle targets whose binding site is unknown, we also train a novel epitope prediction head that lets the model infer the most probable epitope and condition generation on that prediction. Together, these procedures equip the model to generate de novo binder structures against any target of interest.
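The hotspot-labeling rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production pipeline: residues are represented by a hypothetical `Residue` record carrying an entity ID and atom coordinates, a target residue counts as a hotspot when any of its atoms lies within the cutoff of any binder atom (8 Å is the example value from the text), and the conditional feature is assumed to be a simple per-residue 0/1 mask.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    # Hypothetical record: which molecular entity the residue belongs to,
    # its index within that entity, and its atom coordinates in Å.
    entity_id: str
    index: int
    coords: tuple


def min_atom_distance(a: Residue, b: Residue) -> float:
    """Smallest atom-atom distance (Å) between two residues."""
    return min(math.dist(p, q) for p in a.coords for q in b.coords)


def find_hotspots(residues, target_entity, binder_entity, cutoff=8.0):
    """Sorted indices of target residues within `cutoff` Å of any binder residue.

    Entity IDs separate target from binder; the distance cutoff decides
    which target residues sit at the interface.
    """
    target = [r for r in residues if r.entity_id == target_entity]
    binder = [r for r in residues if r.entity_id == binder_entity]
    return sorted(
        t.index
        for t in target
        if any(min_atom_distance(t, b) < cutoff for b in binder)
    )


def hotspot_mask(n_target_residues, hotspots):
    """Per-residue 0/1 feature (an assumed encoding of the conditional feature)."""
    chosen = set(hotspots)
    return [1 if i in chosen else 0 for i in range(n_target_residues)]


if __name__ == "__main__":
    complex_residues = [
        Residue("target", 0, ((0.0, 0.0, 0.0),)),
        Residue("target", 1, ((20.0, 0.0, 0.0),)),
        Residue("binder", 0, ((5.0, 0.0, 0.0), (6.0, 1.0, 0.0))),
    ]
    hs = find_hotspots(complex_residues, "target", "binder")
    print(hs, hotspot_mask(2, hs))
```

In this toy complex only the first target residue (5 Å from the binder) is labeled; the second (about 14 Å away) is not. A real implementation would vectorize the distance computation, but the all-pairs loop keeps the rule explicit.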
In the generation stage, we employ a two-step diffusion process: the model first decodes the critical hotspot residues (either user-defined or predicted) and then generates the remaining binder structure. The process is made robust by stochastic sampling of the hotspot features, which prioritizes core interface residues while retaining flexibility. The number of retained hotspots is sampled from an exponential decay distribution (peaking at 4 residues, with a range of 5-20), which prevents over-reliance on a fixed epitope and reflects the variability encountered in practical applications. The diffusion process outputs a novel 3D backbone, which is then passed to LigandMPNN for inverse folding to design a de novo amino acid sequence compatible with that backbone.
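The hotspot-retention step can be sketched as follows. Two parts of this sketch are our assumptions. First, the text specifies an exponential decay with a peak at 4 and a range of 5-20; because an exponential decay is highest at its lower bound, the sketch reads 4 as the most likely count and 20 as the upper bound, and the decay rate of 0.3 is purely illustrative. Second, "prioritizing core interface residues" is modeled with hypothetical per-residue `core_scores` (for example, contact counts) and Efraimidis-Spirakis weighted sampling without replacement, a standard technique substituted here, not necessarily the method used in the pipeline.

```python
import math
import random


def sample_num_hotspots(rng, k_min=4, k_max=20, decay=0.3):
    """Draw k with P(k) proportional to exp(-decay * (k - k_min)), k_min <= k <= k_max.

    This truncated discrete exponential is most likely at k_min and
    decays toward k_max.
    """
    ks = list(range(k_min, k_max + 1))
    weights = [math.exp(-decay * (k - k_min)) for k in ks]
    return rng.choices(ks, weights=weights, k=1)[0]


def sample_hotspots(rng, hotspots, core_scores, k_min=4, k_max=20, decay=0.3):
    """Keep a random subset of hotspots, biased toward residues with high core_scores.

    Weighted sampling without replacement (Efraimidis-Spirakis): each
    hotspot gets the key u ** (1 / w) with u ~ Uniform[0, 1) and weight w > 0,
    and the k hotspots with the largest keys are kept.
    """
    k = min(sample_num_hotspots(rng, k_min, k_max, decay), len(hotspots))
    keys = [rng.random() ** (1.0 / w) for w in core_scores]
    ranked = sorted(zip(keys, hotspots), reverse=True)
    return sorted(h for _, h in ranked[:k])


if __name__ == "__main__":
    rng = random.Random(0)
    interface = list(range(30))                  # hypothetical hotspot indices
    scores = [1.0 + (i % 5) for i in interface]  # hypothetical "core" weights
    print(sample_hotspots(rng, interface, scores))
```

When fewer hotspots are available than the sampled count, the sketch simply keeps all of them; every call still returns a different, core-biased subset, which is what discourages the model from locking onto one fixed epitope.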
Finally, the sequences from the generation stage serve as high-quality starting points for the third stage, an iterative hallucination and optimization module. This module is a sequence-structure co-optimization framework that maximizes an interface confidence metric, ipSAE. Starting from the binder produced by the generation stage, in complex with its target, each round folds the complex with AlphaFold3 and records its ipSAE score. The top-scoring structure is then redesigned by inverse folding, and the resulting sequences are refolded. This loop repeats until the ipSAE score converges. In this workflow, the design module supplies a good starting point, and the hallucination module refines it into an optimized, high-confidence binder.
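The stage-3 loop can be summarized as a greedy fold, inverse-fold, refold cycle with a convergence test. In this minimal sketch, `fold` and `inverse_fold` are hypothetical callables standing in for AlphaFold3 (returning a structure and its ipSAE score) and LigandMPNN (proposing sequences for a structure); neither tool's real interface is shown, and the tolerance and round cap are assumed values.

```python
from typing import Any, Callable, Sequence, Tuple

# fold(sequence) -> (structure, score); inverse_fold(structure) -> candidate sequences
FoldFn = Callable[[str], Tuple[Any, float]]
InverseFoldFn = Callable[[Any], Sequence[str]]


def hallucinate(seed_seq: str, fold: FoldFn, inverse_fold: InverseFoldFn,
                tol: float = 1e-3, max_rounds: int = 20) -> Tuple[str, float]:
    """Greedy sequence-structure co-optimization that stops when the score converges."""
    structure, score = fold(seed_seq)
    best = (score, seed_seq, structure)
    for _ in range(max_rounds):
        # Redesign the current best structure, then refold every candidate.
        candidates = inverse_fold(best[2])
        if not candidates:
            break
        results = []
        for seq in candidates:
            new_structure, new_score = fold(seq)
            results.append((new_score, seq, new_structure))
        top = max(results, key=lambda r: r[0])
        # Converged: the best candidate no longer improves the score by at least tol.
        if top[0] - best[0] < tol:
            break
        best = top
    return best[1], best[0]


if __name__ == "__main__":
    # Toy stand-ins: the "structure" is the sequence itself and the score is
    # the fraction of alanines, so the optimum is all-alanine.
    def toy_fold(seq):
        return seq, sum(c == "A" for c in seq) / len(seq)

    def toy_inverse_fold(structure):
        return [structure[:i] + "A" + structure[i + 1:] for i in range(len(structure))]

    print(hallucinate("CCCC", toy_fold, toy_inverse_fold))
```

Passing the folding and inverse-folding wrappers in as callables keeps the loop independent of how those tools are run, and makes it testable with toy stand-ins like the alanine-counting scorer above.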