I started from the structural information of 8XPY and focused on the biologically relevant ephrin-binding patch (A490–570), including the key hotspot residues 496, 504, 507, 530, and 533. In the first phase I generated diverse backbone candidates with RFdiffusion (both core and extended windows around the patch) and then sequence-designed them with ProteinMPNN. After folding the designs with ColabFold and evaluating the interfaces with Boltz-2 and ipSAE, I collected the best-performing binders to serve as positive examples for the fine-tuning stage.
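As a rough sketch of what that collection step can look like, the snippet below filters a per-design score table by monomer pLDDT and interface ipSAE. The file name, column names, and thresholds are illustrative assumptions, not my exact values; the RFdiffusion, ProteinMPNN, ColabFold, and Boltz-2 runs themselves happen upstream as separate jobs.

```python
import pandas as pd

# Illustrative: assumes a scores.csv with one row per designed candidate,
# aggregating ColabFold pLDDT and Boltz-2 / ipSAE interface scores.
scores = pd.read_csv("scores.csv")  # columns: design_id, plddt, ipsae

# Hypothetical cutoffs; the real thresholds depend on the score distributions.
positives = scores[
    (scores["plddt"] >= 80.0)
    & (scores["ipsae"] >= scores["ipsae"].quantile(0.75))
].sort_values("ipsae", ascending=False)

positives.to_csv("positives_for_finetuning.csv", index=False)
print(f"kept {len(positives)} / {len(scores)} designs as positive examples")
```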
In parallel, I prepared a cleaner, more targeted version of the Propedia dataset, trimming out low-quality complexes and keeping only structures whose interfaces meaningfully resemble the kind we want for Nipah-G. I merged these curated Propedia examples with my RFdiffusion/MPNN hits and used the combined high-quality dataset to fine-tune ESM2 (650M) in a PepMLM-style masked-modeling setup. To help the model focus on strong interfaces, I added a simple quality-weighted loss so that complexes with more hotspot engagement and cleaner geometry had slightly more influence during training. The idea was to push the model to internalize what a good binder to this Nipah-G epitope should look like.
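To make the quality-weighted loss concrete, here is a minimal sketch of the idea, assuming each complex already comes with a scalar quality weight derived from hotspot engagement and interface geometry. The checkpoint name is the standard HuggingFace ESM2 650M identifier; the function itself is an illustration of the weighting scheme, not my exact training code.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

# ESM2 650M; PepMLM-style training masks the binder side of a (target, binder)
# pair and asks the model to reconstruct it.
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = AutoModelForMaskedLM.from_pretrained("facebook/esm2_t33_650M_UR50D")

def weighted_mlm_loss(input_ids, attention_mask, labels, sample_weights):
    """Cross-entropy over masked positions, scaled per complex by a quality weight.

    labels: -100 everywhere except the masked binder positions (standard MLM convention).
    sample_weights: shape (batch,), larger for complexes with better hotspot
    engagement / cleaner geometry (how the weights are derived is up to you).
    """
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    # Per-token loss with no reduction, so each complex can be weighted separately.
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
        reduction="none",
    ).view(labels.shape)
    masked = (labels != -100).float()
    per_sample = (per_token * masked).sum(dim=1) / masked.sum(dim=1).clamp(min=1)
    return (per_sample * sample_weights).sum() / sample_weights.sum()
```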
After training, I moved to a controlled generation phase. I sampled binders at fixed lengths (70 and 80 aa), used temperature plus top-k and top-p sampling to increase diversity, and ranked the sequences by pseudo-perplexity. From these, I folded the monomers, filtered by pLDDT, built complexes with the target, and passed everything through Boltz-2 and ipSAE again. Once I had a shortlist, I added an ipSAE-guided local refinement step in which I mutated only interface residues and re-evaluated the candidates, to see whether small, local changes could further improve interface quality. This step worked well and produced several improved variants.
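For the ranking criterion, this is a minimal sketch of masked-LM pseudo-perplexity as I use the term: mask each residue in turn, score the true token under the fine-tuned model, and exponentiate the mean negative log-likelihood. The function name and the per-position loop are illustrative; batched scoring would be faster.

```python
import math
import torch

@torch.no_grad()
def pseudo_perplexity(sequence, model, tokenizer, device="cpu"):
    """Mask each residue in turn and score the true token; lower is better.

    Assumes `model` is the fine-tuned ESM2 masked LM and `sequence` is a plain
    amino-acid string (the generated binder).
    """
    enc = tokenizer(sequence, return_tensors="pt").to(device)
    input_ids = enc["input_ids"]
    total_nll, n_scored = 0.0, 0
    # Positions 0 and -1 are the CLS/EOS special tokens; skip them.
    for pos in range(1, input_ids.size(1) - 1):
        masked = input_ids.clone()
        true_id = masked[0, pos].item()
        masked[0, pos] = tokenizer.mask_token_id
        logits = model(input_ids=masked, attention_mask=enc["attention_mask"]).logits
        log_probs = torch.log_softmax(logits[0, pos], dim=-1)
        total_nll += -log_probs[true_id].item()
        n_scored += 1
    return math.exp(total_nll / max(n_scored, 1))

# Rank sampled binders, lowest pseudo-perplexity first (candidates is hypothetical):
# ranked = sorted(candidates, key=lambda s: pseudo_perplexity(s, model, tokenizer))
```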
In the final stage, I gathered all Boltz-2 and ipSAE results (original and refined), computed structural metrics directly from the PDBs (interface pLDDT, buried surface area, hydrogen bonds, hotspot engagement, and target-patch specificity), and used these to build a final ranking. The goal was not simply to take the single highest ipSAE score, but to select a balanced set: high ipSAE, good geometry, correct epitope targeting, and some diversity in shape and interface size. After this full triage, I ended up with four strong final candidates that consistently hit the right region, show solid interface confidence, and score well in ipSAE.
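As an illustration of one of these metrics, the sketch below computes hotspot engagement straight from a predicted complex PDB with Biopython. The chain IDs ("A" for the target, "B" for the binder), the 5 Å heavy-atom cutoff, and the engagement definition are assumptions for the example, not the only reasonable choices.

```python
from Bio.PDB import PDBParser, NeighborSearch, Selection

# Hotspots on the Nipah-G ephrin-binding patch (residue numbering on the target chain).
HOTSPOTS = {496, 504, 507, 530, 533}
CONTACT_CUTOFF = 5.0  # Angstroms; an illustrative heavy-atom contact cutoff

def hotspot_engagement(pdb_path, target_chain="A", binder_chain="B"):
    """Fraction of hotspot residues with at least one binder atom within the cutoff."""
    structure = PDBParser(QUIET=True).get_structure("complex", pdb_path)
    model = structure[0]
    binder_atoms = Selection.unfold_entities(model[binder_chain], "A")
    search = NeighborSearch(binder_atoms)
    engaged = 0
    for residue in model[target_chain]:
        if residue.id[1] not in HOTSPOTS:
            continue
        # A hotspot counts as engaged if any of its atoms sees a binder atom in range.
        if any(search.search(atom.coord, CONTACT_CUTOFF) for atom in residue):
            engaged += 1
    return engaged / len(HOTSPOTS)

# Example: score = hotspot_engagement("candidate_01_complex.pdb")
```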
Overall, the pipeline now covers every step: dataset building, targeted fine-tuning, generative sampling, structure prediction, ipSAE evaluation, refinement, and final structural triage. It is a proper closed loop in which the output of each stage feeds and filters the next, and it produced a set of binders I consider solid enough for submission to the competition.