Forge is a sequence-based generative model for binder design built on latent flow matching. We frame binder generation analogously to text-to-image generation, treating it as a conditional generation task from target sequence to binder sequence. Instead of operating directly in sequence space, Forge generates in the latent space of Raygun, an autoencoder-based model whose embeddings capture semantic and functional relationships. Using a standard flow-matching formulation with classifier-free guidance, we generated binder sequences by conditioning on the NiV G C-terminus sequence.
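To make the sampling step concrete, here is a minimal sketch of classifier-free-guided flow-matching sampling in a latent space. It is an illustration under stated assumptions, not Forge's implementation: `toy_velocity` is a hypothetical closed-form stand-in for a learned velocity network, the conditioning vector stands in for an embedding of the target sequence, and the guidance scale and step count are arbitrary choices. Decoding the final latent back to a sequence (the autoencoder's job) is not shown.

```python
import numpy as np

def toy_velocity(z, t, cond=None):
    """Hypothetical stand-in for a learned velocity network v(z, t, c).

    For the straight-line path z_t = (1 - t) * z_0 + t * z_1, the velocity
    toward a fixed endpoint z_1 is (z_1 - z_t) / (1 - t). Here the
    conditional endpoint is `cond`; the unconditional endpoint is the origin.
    """
    target = np.zeros_like(z) if cond is None else cond
    return (target - z) / max(1.0 - t, 1e-3)

def sample_latent(cond, guidance_scale=2.0, n_steps=50, seed=0):
    """Integrate the guided velocity field from noise (t=0) to t=1 with Euler steps."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(cond.shape)  # start from Gaussian noise
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        v_cond = toy_velocity(z, t, cond)    # conditioned on the target
        v_uncond = toy_velocity(z, t, None)  # conditioning dropped
        # Classifier-free guidance: extrapolate from the unconditional
        # velocity toward the conditional one.
        v = v_uncond + guidance_scale * (v_cond - v_uncond)
        z = z + dt * v
    return z

if __name__ == "__main__":
    target_embedding = np.array([1.0, -2.0, 0.5])  # assumed toy conditioning vector
    print(sample_latent(target_embedding, guidance_scale=1.0))
    print(sample_latent(target_embedding, guidance_scale=2.0))
```

With a guidance scale of 1 the update reduces to the purely conditional velocity. In this toy field a scale of w moves the endpoint to w times the conditioning vector, which illustrates how larger scales push samples past the plain conditional prediction.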
TRAINING DATA
Rather than relying on protein complexes from the PDB, which reflect a small and structurally biased subset of known interactions, we train Forge on a subset of the STRING protein-interaction database (~10 million interacting protein pairs). The scale and diversity of these interactions, which include interactions involving disordered or structurally uncharacterized proteins, allow Forge to learn from a broader distribution of natural binding "strategies". By training on sequence data, we aim to recover the diversity and flexibility of nature’s binders rather than being constrained to the well-explored space of solved structures.
id: hollow-panther-rose | Nipah Virus Glycoprotein G | None | 31.80 | False | 17.6 kDa | 155
id: crimson-hawk-cedar | Nipah Virus Glycoprotein G | None | 32.64 | True | 15.6 kDa | 135
id: calm-ram-sand | Nipah Virus Glycoprotein G | None | 31.51 | True | 15.6 kDa | 135
id: lunar-moth-wave | Nipah Virus Glycoprotein G | None | 35.89 | False | 20.6 kDa | 179
id: soft-crane-frost | Nipah Virus Glycoprotein G | None | 36.42 | False | 23.3 kDa | 207
id: misty-moth-birch | Nipah Virus Glycoprotein G | None | 32.54 | False | 18.1 kDa | 155
id: quiet-mole-ruby | Nipah Virus Glycoprotein G | None | 32.40 | False | 14.5 kDa | 130
id: rapid-otter-crystal | Nipah Virus Glycoprotein G | None | 31.22 | False | 23.0 kDa | 200
id: lunar-eagle-leaf | Nipah Virus Glycoprotein G | None | 34.31 | True | 20.5 kDa | 180
id: brisk-cat-orchid | Nipah Virus Glycoprotein G | 0.76 | 25.68 | -- | 15.9 kDa | 146