Study identifies weaknesses in how AI systems are evaluated
416 points by pseudolus 4 weeks ago | 192 comments