Hacker News
Batched reward model inference and Best-of-N sampling
33 points by rawsh 4 days ago