A protein language model from the ProtGPT family was fine-tuned on a synthetic dataset of 10,000 Ephrin B variants. Random mutations ranging from 8 to 140 amino acids were introduced to enable broad exploration of the local sequence landscape. ProtRL was then applied to align the model toward generating sequences with high IPSAE and favorable in-silico properties, using a multi-objective optimization framework. After reinforcement learning, the aligned model was used to generate a library of 5,000 variants. The 50 sequences with the highest implicit reward were folded and evaluated with structural scoring pipelines, and the top 10 performing designs are submitted here.

Alex Vicente and colleagues, Noelia Ferruz Lab, Centre for Genomic Regulation
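The final selection step described above (sample a library from the aligned model, score each candidate against several in-silico objectives, keep the top-ranked designs) can be illustrated with a minimal sketch. This is not the authors' ProtRL pipeline: it assumes the public nferruz/ProtGPT2 checkpoint on Hugging Face as the generator, and the scoring terms in composite_reward are hypothetical placeholders standing in for the real IPSAE and structural metrics computed after folding.

```python
# Minimal sketch (assumptions): sample candidate sequences from a ProtGPT2-style
# model with Hugging Face transformers, then rank them with a hypothetical
# multi-objective reward. The placeholder scores below would be replaced by
# real in-silico metrics (e.g. IPSAE from a structure-prediction pipeline).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "nferruz/ProtGPT2"  # public ProtGPT2 checkpoint (assumed stand-in)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def generate_variants(n: int, max_len: int = 160) -> list[str]:
    """Sample candidate sequences from the (fine-tuned) language model."""
    inputs = tokenizer("<|endoftext|>", return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,
            top_k=950,            # sampling settings suggested on the model card
            temperature=1.0,
            max_length=max_len,
            num_return_sequences=n,
            pad_token_id=tokenizer.eos_token_id,
        )
    return [
        tokenizer.decode(out, skip_special_tokens=True).replace("\n", "")
        for out in outputs
    ]


def composite_reward(seq: str) -> float:
    """Hypothetical multi-objective reward: a weighted sum of per-objective
    scores. The interface and solubility terms are placeholders; only the
    length penalty is actually computed here."""
    ipsae_proxy = 0.0        # placeholder: interface score from a folding pipeline
    solubility_proxy = 0.0   # placeholder: sequence-based solubility predictor
    length_penalty = -abs(len(seq) - 140) / 140.0
    return 0.6 * ipsae_proxy + 0.3 * solubility_proxy + 0.1 * length_penalty


# Generate a small library, rank by the composite reward, keep the top designs.
candidates = generate_variants(n=50)
ranked = sorted(candidates, key=composite_reward, reverse=True)
top_designs = ranked[:10]
```

In the described workflow the highest-reward sequences would then be folded and passed through structural scoring before submission; the sketch stops at the ranking step.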
id: deep-owl-orchid
Target: Nipah Virus Glycoprotein G
Scores: 0.75 / 83.20 / --
Molecular weight: 16.0 kDa
Length: 140 aa

id: dark-otter-lotus
Target: Nipah Virus Glycoprotein G
Scores: 0.74 / 76.45 / --
Molecular weight: 15.8 kDa
Length: 140 aa

id: soft-swan-plume
Target: Nipah Virus Glycoprotein G
Scores: 0.70 / 82.54 / --
Molecular weight: 15.9 kDa
Length: 139 aa