Method II: Supervised Fine-Tuning and Reinforcement Learning

Leveraging the high-quality synthetic data generated in Method I, we advanced to a sequence-based approach using a large proprietary Protein Language Model (pLM) developed at the Ferruz Lab.

Supervised Fine-Tuning (SFT): We first performed supervised fine-tuning on the pLM using the curated sequences obtained from the agent-guided structural design loop. This stage adapted the model's sampling distribution to reflect the structural and functional motifs identified in the previous method.

Reinforcement Learning (RL): To further align the model with functional requirements, we applied reinforcement learning to the SFT version of the model. We utilized a reward function derived from the key structural metrics established in Method I (including iPAE and pDockQ). This process directed the model to prioritize the generation of sequences that maximize these stability and binding-affinity indicators.
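A minimal sketch of the SFT stage is shown below. It assumes the pLM exposes a Hugging Face-style causal-LM interface; the checkpoint name "plm-checkpoint" and the file "curated_sequences.txt" are placeholders for the proprietary model and the Method I output, and all hyperparameters are illustrative.

```python
# SFT sketch: fine-tune a causal protein language model on curated sequences.
# Assumptions: the proprietary pLM loads via the Hugging Face interface;
# "plm-checkpoint" and "curated_sequences.txt" are placeholders.
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

checkpoint = "plm-checkpoint"                          # placeholder for the proprietary pLM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Curated sequences from the Method I design loop, one amino-acid sequence per line.
dataset = load_dataset("text", data_files={"train": "curated_sequences.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="plm-sft", num_train_epochs=3,
                           per_device_train_batch_size=8, learning_rate=1e-5),
    train_dataset=tokenized["train"],
    # Causal-LM collator: labels are the input tokens, shifted internally by the model.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```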
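The RL stage can be sketched as a simple REINFORCE-style loop on top of the SFT model. This is an illustration under the same assumptions as above, not the project's exact setup: `score_structure` stands in for the Method I structural-evaluation pipeline that returns iPAE and pDockQ for a candidate sequence, and the reward weighting is illustrative (the actual work may have used PPO or another policy-gradient variant).

```python
# REINFORCE-style RL sketch, reusing `model` and `tokenizer` from the SFT step.
# `score_structure` is a hypothetical placeholder for the Method I structural pipeline.
import torch

def reward_fn(ipae: float, pdockq: float) -> float:
    # Lower iPAE and higher pDockQ are both favourable; the weights are illustrative.
    return pdockq - 0.05 * ipae

optimizer = torch.optim.Adam(model.parameters(), lr=1e-6)
bos_id = tokenizer.bos_token_id or tokenizer.eos_token_id
prompt = torch.tensor([[bos_id]])                      # unconditional generation seed
num_rl_steps = 1000

for step in range(num_rl_steps):
    # Sample candidate binder sequences from the current policy.
    seqs = model.generate(prompt, do_sample=True, max_new_tokens=128,
                          num_return_sequences=8, pad_token_id=bos_id)

    # Score each sequence with the structural metrics from Method I (placeholder call).
    rewards = []
    for ids in seqs:
        aa_seq = tokenizer.decode(ids, skip_special_tokens=True)
        ipae, pdockq = score_structure(aa_seq)         # hypothetical structural evaluation
        rewards.append(reward_fn(ipae, pdockq))
    rewards = torch.tensor(rewards).to(seqs.device)
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-6)   # simple baseline

    # Per-sequence log-probability under the current model (padding not masked here).
    logits = model(seqs).logits[:, :-1]
    logp_tok = torch.log_softmax(logits, dim=-1).gather(
        -1, seqs[:, 1:].unsqueeze(-1)).squeeze(-1)
    logp_seq = logp_tok.sum(dim=-1)

    # REINFORCE objective: increase the probability of high-reward sequences.
    loss = -(rewards * logp_seq).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the structural evaluation is the expensive step, so batching the folding calls and caching scores for previously seen sequences would dominate the design of such a loop.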