There is significant discussion in the academic literature about RL making models better at pass@1 and *worse* at pass@N (or related claims).
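For readers unfamiliar with the metrics, here is a minimal sketch of how pass@k is commonly estimated: draw n samples per problem, count the c correct ones, and compute 1 - C(n-c, k) / C(n, k). The function names and the per-problem counts below are hypothetical and chosen only to show what the "better at pass@1, worse at pass@N" pattern would look like; they are not results from any real run.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem pass@k estimate over a set of problems."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical counts of correct samples out of n = 10, for two problems.
# "before": occasionally solves both problems.
# "after": always solves the first problem, never the second.
n = 10
before = [3, 3]
after = [10, 0]

for name, counts in (("before", before), ("after", after)):
    p1 = mean_pass_at_k(counts, n, 1)
    pn = mean_pass_at_k(counts, n, n)
    print(f"{name}: pass@1={p1:.2f} pass@{n}={pn:.2f}")
```

In this made-up case pass@1 goes up while pass@10 goes down, which is the shape of the claim under discussion.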
We run a lot of RL runs at Cursor and don't see this issue.