This time around, I wanted to tackle the challenge using diverse models rather than sticking to structure-based optimization. I finally decided to work with three types: a language model (ESM3), a structure-based model (a custom BindCraft), and a diffusion-based model (BoltzGen).
Language models are computationally less expensive, so they are easier to experiment with. The idea was that if the glycoprotein binds Ephrin-B2, why not modify Ephrin-B2 to make it more potent, so that the viral protein binds the modified Ephrin-B2 with higher affinity than the native protein itself? With this scenario in mind, we began experimenting with modifying residues at the tail ends of the protein as well as at the hotspots where the viral protein binds.
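To make the "tail ends plus hotspots" idea concrete, here is a minimal sketch of how the positions to redesign could be marked before handing the sequence to a language model. Everything in it is an assumption for illustration: the function name, the 1-based hotspot numbering, the tail length, the toy sequence, and the "_" mask character are mine, not taken from our actual pipeline.

```python
def mask_positions(sequence, hotspots, n_tail=5, mask_char="_"):
    """Mask both tail ends and the given 1-based hotspot positions.

    Hypothetical helper: the masked string is what a language model
    would be asked to fill in; unmasked residues stay fixed.
    """
    length = len(sequence)
    to_mask = set(range(min(n_tail, length)))               # N-terminal tail
    to_mask |= set(range(max(length - n_tail, 0), length))  # C-terminal tail
    for pos in hotspots:
        if not 1 <= pos <= length:
            raise ValueError(f"hotspot {pos} is outside the sequence")
        to_mask.add(pos - 1)                                 # convert to 0-based
    return "".join(mask_char if i in to_mask else aa
                   for i, aa in enumerate(sequence))


# Toy 20-residue sequence, not a real Ephrin-B2 fragment.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
masked = mask_positions(toy_sequence, hotspots=[10], n_tail=3)
print(masked)
```

Keeping the masking step separate from the model call makes it cheap to try different tail lengths and hotspot sets, which is where most of the experimentation happened.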
Because of the large number of residues in the target protein, it is tough to make BindCraft work with the whole structure. Even if you could, glycoproteins are very hard to target. There were multiple ways in which the target protein could be trimmed. But if the sequence is cut sequentially, you leave out many hotspots where the protein binds. The best way we found was to keep only the hotspot residues and their neighbouring residues within a distance of 10 Å.
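The trimming rule can be sketched roughly as follows. This is not our actual script: the function name and the input layout (a dict from residue number to a list of atom coordinates) are hypothetical, and in practice the coordinates would be parsed from the target's structure file first.

```python
import math


def residues_near_hotspots(residue_atoms, hotspots, cutoff=10.0):
    """Return sorted residue ids with any atom within `cutoff` Å of a hotspot atom.

    residue_atoms: dict of residue id -> list of (x, y, z) tuples (assumed layout).
    hotspots: residue ids of the hotspot residues, which are always kept.
    """
    hotspot_atoms = [atom for res in hotspots for atom in residue_atoms[res]]
    keep = set(hotspots)
    for res_id, atoms in residue_atoms.items():
        if res_id in keep:
            continue
        # Keep the residue if any of its atoms is close to any hotspot atom.
        if any(math.dist(a, h) <= cutoff for a in atoms for h in hotspot_atoms):
            keep.add(res_id)
    return sorted(keep)


# Toy example: one pseudo-atom per residue, spaced 4 Å apart along the x axis.
toy = {i: [(4.0 * i, 0.0, 0.0)] for i in range(1, 11)}
kept = residues_near_hotspots(toy, hotspots=[5], cutoff=10.0)
print(kept)
```

Selecting by distance rather than by sequence position is what keeps spatially close but sequence-distant residues in the trimmed target, which a sequential cut would throw away.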
But trimming the protein reduces the model's ability to see context, resulting in many lower-quality structures. Nonetheless, it makes a huge difference in run time. We generated designs against the trimmed structure but ran the filters separately on the whole complex. Interestingly, many designs were promising, with high-confidence AF3 metrics (ipTM and pTM).
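The separate filtering step on the whole complex can be sketched like this. The thresholds, class name, and toy values below are assumptions for illustration only; they are not the cutoffs or scores from our runs, and reading ipTM and pTM out of the AF3 output is left out.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Full-complex confidence metrics for one design (hypothetical container)."""
    design_id: str
    iptm: float
    ptm: float


def passes_filter(pred, min_iptm=0.7, min_ptm=0.7):
    # Both thresholds are illustrative assumptions, not recommended values.
    return pred.iptm >= min_iptm and pred.ptm >= min_ptm


def rank_passing(preds, **thresholds):
    """Keep designs that pass the filter, best ipTM first."""
    passing = [p for p in preds if passes_filter(p, **thresholds)]
    return sorted(passing, key=lambda p: p.iptm, reverse=True)


# Toy values only, made up to exercise the filter.
toy_preds = [
    Prediction("toy-a", iptm=0.82, ptm=0.85),
    Prediction("toy-b", iptm=0.55, ptm=0.90),
    Prediction("toy-c", iptm=0.74, ptm=0.71),
]
ranked = rank_passing(toy_preds)
print([p.design_id for p in ranked])
```

The point of the split is that the expensive design step only ever sees the trimmed target, while the pass/fail decision is made on metrics computed with the full context restored.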
We generated many structures using BoltzGen (~20,000). The generated binders were structurally diverse. Although the binders looked good, their AF3 metrics did not. I feel that diffusion-based methods still lack the granularity needed to understand the target protein. It will be interesting to see how others have applied BoltzGen and how successful they have been experimentally.
id: brisk-kiwi-cedar     Nipah Virus Glycoprotein G   0.83   81.54   --    8.8 kDa    72
id: mellow-vole-cypress  Nipah Virus Glycoprotein G   0.81   87.14   --   13.4 kDa   124
id: silent-owl-topaz     Nipah Virus Glycoprotein G   0.77   83.55   --   15.7 kDa   137
id: radiant-bat-flint    Nipah Virus Glycoprotein G   0.77   85.87   --   15.6 kDa   137
id: vast-panther-reed    Nipah Virus Glycoprotein G   0.73   85.93   --   15.5 kDa   137
id: hollow-raven-vine    Nipah Virus Glycoprotein G   0.68   83.23   --   15.7 kDa   137
id: silent-cobra-snow    Nipah Virus Glycoprotein G   0.66   79.40   --   16.9 kDa   144
id: vast-lion-topaz      Nipah Virus Glycoprotein G   0.65   84.79   --   15.8 kDa   137
id: young-panda-granite  Nipah Virus Glycoprotein G   0.61   83.86   --   15.7 kDa   137
id: silver-ox-bronze     Nipah Virus Glycoprotein G   0.31   82.02   --   15.4 kDa   137