One click next to any ChatGPT or Gemini response → 2 seconds → full quality profile with a Balance score (STABLE/DRIFTING/DOM).
Some results that surprised me:
- "How are you?" → DRIFTING. High emotion, zero facts, zero depth.
- "Why don't antibiotics work on viruses?" → STABLE, Fact=0.95, Depth=0.75.
- Persuasive prompts → Bias=+0.72. The model doesn't pretend to be neutral.
- Philosophical answers → Fact=0.40 even with citations. Citing Kant doesn't make unfalsifiable claims verifiable.
The Fact axis uses a three-step calibration: classify the question as falsifiable or not → apply a score ceiling accordingly → score within that ceiling. This calibration transfers across models at r=0.96.
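The three-step calibration can be sketched like this. The function names, the regex classifier stub, and the 0.5 ceiling for unfalsifiable questions are my illustrative assumptions, not the extension's actual code (the real classification is done by the LLM judge):

```javascript
// Step 1: classify the question (stubbed here; the extension uses an LLM judge).
function isFalsifiable(question) {
  // Toy heuristic: normative/philosophical markers -> not falsifiable.
  return !/should|meaning|ethic|beautiful/i.test(question);
}

// Step 2: pick a ceiling from the classification (0.5 is an assumed cap).
function factCeiling(question) {
  return isFalsifiable(question) ? 1.0 : 0.5;
}

// Step 3: score within the ceiling.
function calibratedFact(rawScore, question) {
  return Math.min(rawScore, factCeiling(question));
}
```

Under this toy rule, a philosophical answer scored 0.9 raw would still be capped at 0.5, matching the "citing Kant doesn't make unfalsifiable claims verifiable" result above.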
Interesting negative finding: RLHF-trained models compensate for shallow prompts by adding unsolicited explanations. The Depth axis rubric works (5/5 on controlled responses) but in practice models over-explain everything.
Stack: Manifest V3, vanilla JS, Gemini Flash API as the judge, Balance computed client-side. Uses your own API key; no data is stored.
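A minimal sketch of what the client-side Balance step could look like. The thresholds and the decision rule below are assumed stand-ins chosen to reproduce the example results above, not the extension's actual formula:

```javascript
// Map per-axis scores (each in [0, 1]) to a Balance label.
// The cutoffs (0.6, 0.5) and the rule itself are illustrative assumptions.
function balanceLabel({ fact, depth, emotion }) {
  if (fact >= 0.6 && depth >= 0.5) return "STABLE";         // grounded and substantive
  if (emotion > fact && emotion > depth) return "DRIFTING"; // affect outweighs content
  return "DOM";                                             // remaining case in this toy rule
}
```

With these stand-in cutoffs, the antibiotics answer (Fact=0.95, Depth=0.75) lands on STABLE and "How are you?" (high emotion, zero facts, zero depth) lands on DRIFTING, consistent with the results listed above.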
Research paper with full methodology and 100-prompt validation available on request.