Hacker News

Batched reward model inference and Best-of-N sampling

33 points by rawsh 4 days ago
