VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO
342 points | by timhigins | 16 hours ago | 175 comments