News

In the post-training phase, in addition to aligning the model with human preferences for dialogue scenarios, techniques such as rejection sampling and reinforcement learning were used specifically to enhance ...