Fastest LLM decode engine on Apple Silicon: 658 tok/s on M4 Max, beats MLX by 19%
5 points by sanchitmonga 15 hours ago | 3 comments