De Novo Design of RBX1 Inhibitory Protein Ligands via Multi-Round Iterative Evolution with SeedFold
Email: zhuzhuiyu@stu.xjtu.edu.cn
- Experimental Tools and Methods
After evaluating established protein design tools, including RFdiffusion, BindCraft, and the short-peptide design tool PocketXmol, we selected SeedFold, an inverse diffusion model developed by TikTok and based on AlphaFold2 (1). Compared with these tools, SeedFold offers a more user-friendly interface, runs entirely online, and makes parameter configuration and visualization more convenient, which suits undergraduate teams working on small-scale protein ligand development.
However, SeedFold imposes certain constraints: users must specify the length of the designed protein and a docking core sequence on the target (≤10 amino acids). To comply with the competition rule of "no self-designed active sites," we adopted a scanning strategy. Targeting the RING domain of RBX1 (≥40 amino acids), we used the maximum core length of 10 amino acids allowed by SeedFold as a sliding window and traversed the protein in six consecutive windows: 41-50, 51-60, ..., 91-100.
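The window scan described above can be sketched in a few lines of Python. This is only an illustration of the tiling arithmetic: the function name `hotspot_windows` and its parameters are hypothetical and are not part of SeedFold, which was operated through its online interface.

```python
def hotspot_windows(start, end, width=10):
    """Tile residues start..end (inclusive, 1-based) with consecutive
    non-overlapping windows of at most `width` residues."""
    windows = []
    first = start
    while first <= end:
        last = min(first + width - 1, end)
        windows.append((first, last))
        first = last + 1
    return windows


# Residues 41-100 of the target, in windows of 10 (the maximum core length).
windows = hotspot_windows(41, 100)
for first, last in windows:
    print(f"{first}-{last}")
```

Each tuple corresponds to one docking core region that would be entered by hand for a design run.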
The experiment was conducted in three rounds. In the first round, we covered the full range of ligand lengths allowed by SeedFold, 70-120 amino acids, with a step size of 10 amino acids. This gave 180 ligands across 6 length nodes and 6 displacement regions, of which 44 were usable. The first-round results showed that high-quality ligands were concentrated around 80 amino acids. Consequently, the second round covered 80-100 amino acids with a step size of 5 amino acids, yielding 150 ligands, of which 37 were usable. This round further revealed that although 80-amino-acid ligands showed the best potential in affinity and other metrics, their larger RMSD values hindered subsequent predictive screening. Therefore, the third round covered 85-95 amino acids with a step size of 2 amino acids, yielding 90 ligands, of which 29 were usable.
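As a rough check on the round design, the following Python sketch enumerates the length nodes and length-by-window combinations for each round and computes the usable fraction from the reported counts. The names `ROUNDS`, `REPORTED`, and `design_grid` are hypothetical. The sketch assumes all six windows were scanned in every round; the text states this only for the first round, so the third-round combination count in particular is an assumption.

```python
from itertools import product

# Six target windows, 41-50 through 91-100 (inclusive residue ranges).
WINDOWS = [(s, s + 9) for s in range(41, 100, 10)]

# Ligand length nodes per round: (min, max, step) as given in the text.
ROUNDS = {
    1: range(70, 121, 10),
    2: range(80, 101, 5),
    3: range(85, 96, 2),
}

# Reported (designed, usable) counts per round.
REPORTED = {1: (180, 44), 2: (150, 37), 3: (90, 29)}


def design_grid(lengths, windows=WINDOWS):
    """All (ligand_length, window) combinations for one round."""
    return list(product(lengths, windows))


for rnd, lengths in ROUNDS.items():
    grid = design_grid(lengths)
    designed, usable = REPORTED[rnd]
    print(f"round {rnd}: lengths={list(lengths)} combos={len(grid)} "
          f"designed={designed} usable={usable} ({usable / designed:.1%})")
```

Under this assumption, rounds 1 and 2 work out to five designs per combination (180/36 and 150/30). The 90 third-round designs do not divide evenly over 36 combinations, so that round was presumably distributed differently.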
Usable sequences were screened on the following metrics: ipTM (I) ≥ 0.6, confidence rating (Q; Great = 1, Good = 0.8, Borderline = 0.5, Bad = 0.2), pLDDT (P) ≥ 70, min PAE interaction (M) ≤ 2.0, pTM binder (B) ≥ 0.75, and RMSD (R) ≤ 2.0. To quantify and rank these metrics, we defined a weighted score: Score = 0.25(1 - M) + 0.20B + 0.15(P/100) + 0.15I + 0.20(1 - R) + 0.05Q. Usable sequences were then uploaded to HDock for affinity screening and to AlphaFold3 for structural screening (2-7). High-confidence structures output by AlphaFold3 were manually inspected. Sequences that passed the screening proceeded to molecular dynamics (MD) simulation using OpenMM. Finally, an online BLAST comparison against the UniRef50 database provided by UniProt ensured at least 25% sequence novelty.
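The screening thresholds and the weighted score can be expressed as a minimal Python sketch. The `Design` class, its field names, and the example values are hypothetical and are not output from SeedFold. The metric values are used exactly as they appear in the formula, without extra normalization, so the terms (1 - M) and (1 - R) become negative when M or R exceeds 1.

```python
from dataclasses import dataclass

# Numeric values for the confidence rating Q, as given in the text.
CONFIDENCE = {"Great": 1.0, "Good": 0.8, "Borderline": 0.5, "Bad": 0.2}


@dataclass
class Design:
    name: str
    iptm: float        # I
    rating: str        # label mapped to Q
    plddt: float       # P, on a 0-100 scale
    min_pae: float     # M, min PAE interaction
    ptm_binder: float  # B
    rmsd: float        # R


def is_usable(d):
    """Apply the five numeric thresholds; Q has no threshold in the text."""
    return (d.iptm >= 0.6 and d.plddt >= 70 and d.min_pae <= 2.0
            and d.ptm_binder >= 0.75 and d.rmsd <= 2.0)


def score(d):
    """Weighted score; the six weights sum to 1."""
    q = CONFIDENCE[d.rating]
    return (0.25 * (1 - d.min_pae) + 0.20 * d.ptm_binder
            + 0.15 * d.plddt / 100 + 0.15 * d.iptm
            + 0.20 * (1 - d.rmsd) + 0.05 * q)


def rank(designs):
    """Keep usable designs and sort them by descending score."""
    return sorted((d for d in designs if is_usable(d)), key=score, reverse=True)


# Hypothetical example values, for illustration only.
designs = [
    Design("a", iptm=0.80, rating="Great", plddt=90, min_pae=1.0, ptm_binder=0.80, rmsd=1.0),
    Design("b", iptm=0.70, rating="Good", plddt=85, min_pae=1.5, ptm_binder=0.78, rmsd=1.8),
    Design("c", iptm=0.50, rating="Bad", plddt=65, min_pae=3.0, ptm_binder=0.60, rmsd=4.0),
]
for d in rank(designs):
    print(d.name, round(score(d), 3))
```

Filtering before scoring keeps designs that fail any threshold out of the ranking altogether, matching the two-step description above.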
- Limitations and Prospects
This experiment demonstrates that the workflow of "design → docking prediction → screening → guided redesign" can progressively converge toward optimal target ligands. Nevertheless, the number of candidate chains remains relatively limited because of tool and resource constraints. Furthermore, the docking prediction and screening phases still rely heavily on manual inspection. In principle, AI could be integrated throughout to perform large-scale screening, with all stages unified into a single algorithm for autonomous AI evolution. Given sufficient time, wet-lab experimental data from each round could also drive AI iteration, potentially yielding more accurate designs.
Regarding the RBX1 protein in this competition, multi-round AI design evolution indicates that binding proteins of around 85 amino acids targeting the 41-50 amino acid region of RBX1 are particularly effective. Combined with known RBX1 active site information, we believe that further drug development research should focus on exploring these parameter ranges.
- Yi Z, Chan L, Ma Y, Wei Q, Fei Y, Kexin Z, et al. SeedFold: Scaling Biomolecular Structure Prediction. 2025.
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23(21):2947-8.
- Martí-Renom MA, Stuart AC, Fiser A, Sánchez R, Melo F, Sali A. Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct. 2000;29:291-325.
- Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A. 1988;85(8):2444-8.
- Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2011;9(2):173-5.
- Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011;7:539.
- Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493-500.