ProTrek employs contrastive learning with three core alignment strategies: (1) using structure as the supervision signal for amino acid (AA) sequences and vice versa, (2) mutual supervision between sequences and functions, and (3) mutual supervision between structures and functions. This tri-modal alignment training enables ProTrek to tightly associate sequence, structure, and function (SSF) by pulling genuine sample pairs (sequence-structure, sequence-function, and structure-function) closer together in the latent space while pushing negative samples farther apart.
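The tri-modal alignment described above can be illustrated with a generic symmetric InfoNCE objective summed over the three modality pairs. This is only a minimal sketch: the text does not specify ProTrek's exact loss, temperature, or encoders, so the function names (`info_nce`, `tri_modal_loss`), the temperature of 0.07, and the random stand-in embeddings are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss (generic form, not confirmed as ProTrek's).

    Row i of `a` and row i of `b` are treated as a genuine pair; every
    other row in the batch serves as a negative sample.
    """
    logits = l2_normalize(a) @ l2_normalize(b).T / temperature

    def diagonal_cross_entropy(z):
        # Numerically stable log-softmax over each row, then pick the
        # entry that corresponds to the genuine pair (the diagonal).
        z = z - z.max(axis=1, keepdims=True)
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Supervise in both directions (a -> b and b -> a).
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


def tri_modal_loss(seq, struct, func, temperature=0.07):
    """Sum the pairwise losses for the three modality pairs."""
    return (
        info_nce(seq, struct, temperature)
        + info_nce(seq, func, temperature)
        + info_nce(struct, func, temperature)
    )


# Hypothetical stand-in embeddings: 8 proteins, 16 dimensions per modality.
rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(8, 16))
struct_emb = seq_emb + 0.1 * rng.normal(size=(8, 16))  # roughly aligned
func_emb = seq_emb + 0.1 * rng.normal(size=(8, 16))    # roughly aligned

print("aligned loss:   ", tri_modal_loss(seq_emb, struct_emb, func_emb))
print("misaligned loss:", tri_modal_loss(seq_emb, np.roll(struct_emb, 1, axis=0), func_emb))
```

Minimizing a loss of this kind raises the similarity of genuine pairs relative to all other pairings in the batch, which is the "closer together, farther apart" behavior described in the text. Shifting one modality's rows so that pairs no longer match should yield a noticeably higher loss than the aligned case.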
ProTrek achieves over 30x and 60x improvements in sequence-function and function-sequence retrieval, respectively, is 100x faster than Foldseek and MMseqs2 in protein-protein search, and outperforms ESM-2 in 9 of 11 downstream prediction tasks.
id: violet-yak-dust

EGFR
None
--
True
18.4 kDa
159
id: gentle-crane-moss

EGFR
None
--
True
17.4 kDa
150
id: misty-fox-fern

EGFR
None
--
True
20.8 kDa
176
id: soft-goat-moss

EGFR
None
--
True
6.8 kDa
56
id: brisk-ram-clay

EGFR
Medium
2.2e-7 M
True
6.8 kDa
56
id: lunar-panther-onyx

EGFR
None
--
True
6.6 kDa
56