The idea is to leverage the NCBI Virus database to map variants of Glycoprotein G to their host organisms. This makes it possible to build a virus-host paired MSA: Glycoprotein G sequences are aligned together and the corresponding host Ephrin-B3 receptor sequences are aligned together (both with MAFFT), and each Glycoprotein G sequence is paired with the Ephrin-B3 of its host. Many MSAs are then derived from this "main" paired MSA. Each one keeps all the columns corresponding to Glycoprotein G and a different subset of the columns corresponding to a binding region of Ephrin-B3. I selected regions starting between residues 107 and 115 (numbering of UniProt sequence Q15768), with lengths between 10 and 18 amino acids.
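The window selection can be sketched as follows. This is a minimal illustration, not the original pipeline: it assumes both bounds are inclusive (giving 9 start positions × 9 lengths = 81 windows), that each paired row is stored as one string with the Glycoprotein G columns first, and that residue numbers are mapped to columns through a reference Ephrin-B3 row. The helper names `ephrin_windows`, `residue_to_column` and `sub_msa` are hypothetical.

```python
from itertools import product


def ephrin_windows(starts=range(107, 116), lengths=range(10, 19)):
    """All (start, end) windows, 1-based and inclusive, in Q15768 numbering."""
    return [(s, s + n - 1) for s, n in product(starts, lengths)]


def residue_to_column(ref_ephrin_row, first_residue=1):
    """Map residue numbers of the reference Ephrin-B3 row to alignment columns.

    Columns where the reference has a gap ('-') get no residue number,
    so they are dropped when a window is extracted.
    """
    mapping, res = {}, first_residue
    for col, aa in enumerate(ref_ephrin_row):
        if aa != "-":
            mapping[res] = col
            res += 1
    return mapping


def sub_msa(paired_rows, g_len, col_of_residue, window):
    """Keep every Glycoprotein G column plus the Ephrin-B3 columns of one window."""
    start, end = window
    cols = [col_of_residue[r] for r in range(start, end + 1)]
    return [row[:g_len] + "".join(row[g_len + c] for c in cols) for row in paired_rows]
```

For example, with toy rows `["MKVAB-CD", "MRVAX-CE"]`, `g_len=3` and the mapping built from the reference segment `"AB-CD"`, the window `(2, 3)` yields `["MKVBC", "MRVXC"]`.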
Then, similarly to what had previously been done for multimer prediction with AF2 (i.e. predicting complexes with a model trained only on single chains), EvoDiff was used to design thousands of peptide sequences. Its inputs were these MSAs (for each MSA, 10 inputs obtained by subsampling to 64 sequences), with only the Ephrin-B3 segment masked: the model sees the paired MSA and the Glycoprotein G sequence, and generates new segment sequences. About 5k sequences were generated with this protocol.
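The input preparation could look like the sketch below. It does not call EvoDiff itself; it only builds the subsampled, masked MSAs. It assumes the first row is the query (Glycoprotein G plus the native Ephrin-B3 segment) and is always kept, and that `#` stands in for the mask token; the actual token depends on the tokenizer of the EvoDiff model used. `make_masked_inputs` is a hypothetical name.

```python
import random

# Placeholder mask character (assumption): check the tokenizer of the
# EvoDiff model you use for the real mask token.
MASK = "#"


def make_masked_inputs(msa, g_len, n_inputs=10, depth=64, seed=0):
    """Build n_inputs subsampled, masked copies of one sub-MSA.

    Assumes msa[0] is the query row and is always kept; the remaining
    depth - 1 rows are drawn at random without replacement.
    """
    rng = random.Random(seed)
    query, others = msa[0], msa[1:]
    # Hide only the Ephrin-B3 segment of the query; Glycoprotein G stays visible.
    masked_query = query[:g_len] + MASK * (len(query) - g_len)
    inputs = []
    for _ in range(n_inputs):
        picked = rng.sample(others, min(depth - 1, len(others)))
        inputs.append([masked_query] + picked)
    return inputs
```

If every start/length combination is used, this gives 81 × 10 = 810 inputs, so about 5k designs would correspond to roughly six sequences per input on average.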
The sequences were screened by AF3 ipTM, and the best ones were kept for further optimization. Note that this is the first step that relies on structure prediction: the sequences generated by EvoDiff were not explicitly optimized for any confidence metric of a structure prediction model.
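A screening step of this kind can be sketched as below, assuming the ipTM of each design is read from the top-level `iptm` field of an AlphaFold 3 summary-confidences JSON file. The `k` and `threshold` cut-offs are hypothetical parameters, since the actual cut-off used is not stated.

```python
import json


def load_iptm(path):
    """Read ipTM from an AF3 summary-confidences JSON file.

    Assumes a top-level "iptm" field, as in the *_summary_confidences_*.json
    files written by AlphaFold 3.
    """
    with open(path) as fh:
        return float(json.load(fh)["iptm"])


def screen_by_iptm(scores, k=None, threshold=None):
    """Rank designs by ipTM (best first), optionally applying a threshold and/or top-k.

    scores: dict mapping design name -> ipTM.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if threshold is not None:
        ranked = [name for name in ranked if scores[name] >= threshold]
    return ranked if k is None else ranked[:k]
```

For instance, `screen_by_iptm({"a": 0.5, "b": 0.8, "c": 0.7}, threshold=0.6)` returns `["b", "c"]`.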
For the optimization, the AF3 webserver was used to iteratively mutate or trim the most promising peptides, guided by global and per-residue confidence metrics and by chemical/structural rationale. For example, solvent-exposed residues could be mutated to charged amino acids to increase solubility, provided the AF3 confidence metrics did not decrease.
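The solubility example above amounts to proposing point mutants at exposed positions and then re-predicting each one. The sketch below only generates the candidates. The per-residue relative SASA input, the 0.5 exposure cut-off and the choice of K/E as replacements are all assumptions for illustration, and `charged_mutants` is a hypothetical helper.

```python
CHARGED = set("DEKR")


def charged_mutants(peptide, rel_sasa, exposed_cutoff=0.5, new_residues="KE"):
    """Propose single mutants placing a charged residue at exposed positions.

    rel_sasa: per-residue relative solvent accessibility (0 to 1) of the
    peptide in the predicted complex (hypothetical input, e.g. computed from
    the AF3 model with a SASA tool). Positions already charged are skipped.
    Returns (mutation label, mutant sequence) pairs with 1-based positions.
    """
    if len(peptide) != len(rel_sasa):
        raise ValueError("one SASA value per residue is required")
    mutants = []
    for i, (aa, sasa) in enumerate(zip(peptide, rel_sasa)):
        if sasa < exposed_cutoff or aa in CHARGED:
            continue
        for new in new_residues:
            mutants.append((f"{aa}{i + 1}{new}", peptide[:i] + new + peptide[i + 1:]))
    return mutants
```

Each proposed mutant would then be submitted to AF3 and kept only if its confidence metrics do not drop relative to the parent peptide.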
After filtering out sequences with too many hydrophobic residues, the final candidates were selected for submission based on a consensus of in silico metrics, including ipTM, pDockQ, pDockQ2 and ipSAE. All submitted candidates result from the "artificial evolution" and optimization of the Ephrin-B3 segment KFQEYSPNLWGHEFRS.
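One simple way to form such a consensus is a mean rank across metrics after the hydrophobicity filter. The sketch below assumes the four metrics were already computed elsewhere and that higher is better for each; the hydrophobic residue set, the 0.5 cut-off and the mean-rank rule are assumptions, not the exact criteria used for the submission.

```python
# Assumed definition of "hydrophobic"; adjust to the scale you prefer.
HYDROPHOBIC = set("AVILMFWY")


def hydrophobic_fraction(seq):
    """Fraction of residues in seq that belong to HYDROPHOBIC."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def consensus_rank(candidates, metrics=("iptm", "pdockq", "pdockq2", "ipsae"),
                   max_hydrophobic=0.5):
    """Drop overly hydrophobic designs, then order the rest by mean rank.

    candidates: dict name -> {"seq": str, "iptm": float, ...}; for every
    metric, higher is better. Rank 0 is best, so a lower mean rank is better.
    """
    kept = {n: c for n, c in candidates.items()
            if hydrophobic_fraction(c["seq"]) <= max_hydrophobic}
    mean_rank = {n: 0.0 for n in kept}
    for m in metrics:
        order = sorted(kept, key=lambda n: kept[n][m], reverse=True)
        for r, n in enumerate(order):
            mean_rank[n] += r / len(metrics)
    return sorted(kept, key=lambda n: mean_rank[n])
```

A mean rank is insensitive to the different scales of ipTM, pDockQ, pDockQ2 and ipSAE, which is one reason to prefer it over averaging the raw scores.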

id: silent-ibis-cloud | Nipah Virus Glycoprotein G | 0.79 | 66.99 | -- | 1.3 kDa | 10
id: frozen-boar-bronze | Nipah Virus Glycoprotein G | 0.79 | 62.25 | -- | 1.3 kDa | 10
id: scarlet-dove-ivy | Nipah Virus Glycoprotein G | 0.76 | 63.78 | -- | 1.3 kDa | 10
id: amber-toad-ice | Nipah Virus Glycoprotein G | 0.76 | 61.48 | -- | 1.2 kDa | 9
id: rough-owl-orchid | Nipah Virus Glycoprotein G | 0.76 | 62.54 | -- | 1.4 kDa | 11
id: brisk-panther-quartz | Nipah Virus Glycoprotein G | 0.75 | 63.15 | -- | 1.3 kDa | 10
id: violet-owl-quartz | Nipah Virus Glycoprotein G | 0.74 | 61.27 | -- | 1.3 kDa | 10
id: lunar-gecko-onyx | Nipah Virus Glycoprotein G | 0.74 | 60.74 | -- | 1.4 kDa | 11
id: quick-zebra-birch | Nipah Virus Glycoprotein G | 0.73 | 61.57 | -- | 1.4 kDa | 10
id: steady-falcon-cedar | Nipah Virus Glycoprotein G | 0.72 | 62.46 | -- | 1.3 kDa | 10