Diffusion Sequence Model (DSM) is a protein language model (pLM) trained with masked diffusion to support both high-quality representation learning and generative protein design. DSM builds on the ESM2 architecture by adding a masked forward diffusion process inspired by the LLaDA framework. After training, DSM generates diverse, biomimetic sequences that match expected amino acid compositions, secondary structures, and predicted functions, even at 90% or higher token corruption. DSM can be used to generate protein binders by pairing it with scoring functions such as the predicted probability of protein-protein interaction (PPI) and predicted binding affinity.

Here we highlight a common pipeline used by Synthyra to narrow potential binders down to promising submissions. When existing binders are known, DSM-650 (https://huggingface.co/GleghornLab/DSM_650) can be used to create hundreds of thousands to millions of diverse variants. Without known binders, we recommend DSM-ppi (https://huggingface.co/Synthyra/DSM_ppi_full) to hallucinate them from scratch.

These variants are then put through a multi-step in silico screening process. First, they are evaluated for their potential to bind the target and for their binding strength via the Synthyra PPI API (https://synthyra.com/). The top 1,000 candidates are then ranked by a composite score that considers sequence likelihood and the quality and confidence of the predicted 3D structure, computed with ESM2 and ESMFold. From this refined list, the top 100 are selected, and their structures are modeled with an AlphaFold3 equivalent such as Chai-1. The final candidates for laboratory testing are chosen based on the quality of these modeled structures. The method also allows a few exceptions: variants with exceptionally high predicted binding affinity or other interesting characteristics may be included for testing even if they do not pass every filtering step.
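The screening funnel described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Synthyra's implementation: the `Candidate` fields, the `screen` function, the `n_final` cutoff, and the `rescue_ppi` threshold are all hypothetical names, and every score is assumed to be a precomputed number where higher is better (in practice the later, more expensive scores would be computed only for the survivors of each stage).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Candidate:
    """One generated variant with hypothetical, precomputed scores (higher is better)."""
    name: str
    ppi_score: float        # stand-in for the predicted binding score
    composite_score: float  # stand-in for structure confidence + sequence likelihood
    structure_score: float  # stand-in for the quality of the modeled structure


def top_k(cands: List[Candidate], key: Callable[[Candidate], float], k: int) -> List[Candidate]:
    """Return the k best candidates by `key`, best first."""
    return sorted(cands, key=key, reverse=True)[:k]


def screen(
    candidates: List[Candidate],
    n_ppi: int = 1000,        # "top 1,000" stage from the text
    n_composite: int = 100,   # "top 100" stage from the text
    n_final: int = 10,        # assumption: the text gives no final count
    rescue_ppi: Optional[float] = None,  # assumption: threshold for the "exceptions"
) -> List[Candidate]:
    # Stage 1: keep the best candidates by predicted binding.
    by_ppi = top_k(candidates, lambda c: c.ppi_score, n_ppi)
    # Stage 2: rank the survivors by composite score and keep the best.
    by_composite = top_k(by_ppi, lambda c: c.composite_score, n_composite)
    # Stage 3: choose final candidates by modeled-structure quality.
    final = top_k(by_composite, lambda c: c.structure_score, n_final)
    # Exceptions: add back variants with very high predicted binding
    # even if they were dropped by a later filter.
    if rescue_ppi is not None:
        chosen = {c.name for c in final}
        final += [
            c for c in candidates
            if c.ppi_score >= rescue_ppi and c.name not in chosen
        ]
    return final
```

Calling `screen(candidates)` with the defaults mirrors the 1,000 and 100 cutoffs from the description; setting `rescue_ppi` models the exception rule by appending high-scoring variants after the filtered list.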

| ID | Target | Binding | Affinity | (unlabeled) | Mass | Length |
|---|---|---|---|---|---|---|
| mellow-kiwi-granite | PD-L1 | None | -- | True | 13.7 kDa | 120 |
| frozen-fox-granite | PD-L1 | Strong | 8.3e-9 M | True | 13.5 kDa | 120 |
| jade-dove-quartz | PD-L1 | None | -- | True | 13.8 kDa | 120 |
| quick-yak-sand | PD-L1 | None | -- | True | 13.7 kDa | 120 |
| misty-owl-lotus | PD-L1 | Medium | 7.8e-7 M | True | 13.4 kDa | 120 |
| strong-tiger-flint | PD-L1 | Strong | 2.8e-8 M | True | 13.5 kDa | 120 |
| steady-owl-cedar | PD-L1 | None | -- | True | 13.5 kDa | 120 |
| soft-ant-oak | PD-L1 | Weak | 1.1e-6 M | True | 13.8 kDa | 120 |
| lunar-dove-lotus | PD-L1 | Medium | 9.1e-7 M | True | 13.4 kDa | 120 |
| violet-vole-maple | PD-L1 | Medium | 9.1e-8 M | True | 13.5 kDa | 120 |
| radiant-bee-willow | PD-L1 | Medium | 7.8e-7 M | True | 13.7 kDa | 120 |
| noble-vole-moss | PD-L1 | Medium | 1.8e-7 M | True | 13.6 kDa | 120 |
| misty-zebra-cypress | PD-L1 | Weak | 5.6e-7 M | True | 13.5 kDa | 120 |
| brisk-shark-clay | PD-L1 | Medium | 3.1e-7 M | True | 13.7 kDa | 120 |
| rapid-ox-reed | PD-L1 | Medium | 2.1e-7 M | True | 13.4 kDa | 120 |
| bright-wolf-sand | PD-L1 | Weak | 1.3e-6 M | True | 13.5 kDa | 120 |
| shy-shark-dust | PD-L1 | None | -- | True | 13.6 kDa | 120 |
| dark-crane-lava | PD-L1 | Medium | 1.7e-7 M | True | 13.7 kDa | 120 |
| strong-hawk-bronze | PD-L1 | Medium | 5.8e-7 M | True | 13.6 kDa | 120 |
| noble-panda-granite | PD-L1 | Strong | 1.4e-8 M | True | 13.6 kDa | 120 |
| deep-kiwi-birch | EGFR | None | -- | False | 26.3 kDa | 241 |