Lilian Weng @lilianweng
Rule-based rewards (RBRs) use model to provide RL signals based on a set of safety rubrics, making it easier to adapt to changing safety policies wo/ heavy dependency on human data. It also enables us to look at safety and capability in a more unified lens as a more capable
OpenAI @OpenAI
We’ve developed Rule-Based Rewards (RBRs) to align AI behavior safely without needing extensive human data collection, making our systems safer and more reliable for everyday use. https://t.co/54IgGtwKdv
