ProtRL was applied to a fine-tuned protein language model (pLM), using epirin B as the starting point. Training used the REINFORCE algorithm, with a reward function that combined sequence length, PAE, shape complementarity, LIS, IPTM_D0chn, dRMSD, IPSAE, and the number of clusters. Candidates were selected first by their implicit reward and then by in-silico metrics.
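To make the training signal concrete, the following is a minimal sketch of a REINFORCE update driven by a composite reward. It is not the ProtRL implementation. The policy here is a toy, position-independent distribution over amino acids rather than a pLM, and the two metrics (`frac_A`, `frac_W`), their weights, the mean-reward baseline, and all function names are assumptions chosen for illustration. In the actual setup the reward terms would be the structural metrics listed above.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical weights; the source does not state how the terms are combined.
WEIGHTS = {"frac_A": 1.0, "frac_W": -0.5}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_metrics(sample):
    """Stand-ins for real per-sequence scores (PAE, LIS, ...): residue fractions."""
    n = len(sample)
    return {
        "frac_A": sample.count(AMINO_ACIDS.index("A")) / n,
        "frac_W": sample.count(AMINO_ACIDS.index("W")) / n,
    }


def composite_reward(metrics, weights):
    """Weighted sum of metrics (an assumed way of combining reward terms)."""
    return sum(weights[name] * metrics[name] for name in weights)


def reinforce_step(logits, rng, batch_size=32, seq_len=10, lr=0.5):
    """One REINFORCE update: sample sequences, score them, ascend the gradient."""
    probs = softmax(logits)
    samples = [
        rng.choices(range(len(logits)), weights=probs, k=seq_len)
        for _ in range(batch_size)
    ]
    rewards = [composite_reward(toy_metrics(s), WEIGHTS) for s in samples]
    baseline = sum(rewards) / len(rewards)  # mean-reward baseline reduces variance

    grad = [0.0] * len(logits)
    for sample, reward in zip(samples, rewards):
        advantage = reward - baseline
        for i in range(len(logits)):
            # d log p(sequence) / d logit_i = count_i - seq_len * p_i
            grad[i] += advantage * (sample.count(i) - seq_len * probs[i])

    return [l + lr * g / batch_size for l, g in zip(logits, grad)]


def train(steps=200, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(AMINO_ACIDS)  # start from a uniform policy
    for _ in range(steps):
        logits = reinforce_step(logits, rng)
    return softmax(logits)
```

Calling `train()` returns the final residue probabilities. Because the toy reward favors alanine and penalizes tryptophan, the policy should shift probability mass toward `A` and away from `W`, the same mechanism by which a reward built from structural metrics would steer a pLM toward higher-scoring sequences.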