Refract your LLM.
See what's actually there.

aLLMost overlays a thin lens on claude.ai, chatgpt.com, gemini.google.com, deepseek.com, kimi.com and grok.com — and splits every response into its component signals: confidence, evasion, intent, similarity. The model still talks. You finally see how.

Local-first. No telemetry. No backend. Your prompts never leave your machine.
hey, thinking about rebuilding my personal site
Sure — what does the current setup look like? Static, CMS, hand-rolled?
really old jekyll site, haven't touched it in years
Makes sense. Mainly chasing a visual refresh, or modernizing the toolchain too?
which static site framework should I use
Similar prompt found · match 0.91
"which framework for a small static site…"
asked on Claude · 18 days ago
Astro for small sites — zero JS by default, opt-in islands…
really old jekyll site, haven't touched it in years
Mainly chasing a visual refresh, or modernizing the toolchain too?
which static site framework should I use?
Both Astro and Hugo are popular choices for static sites. Hugo is known for incredibly fast build times, while Astro emphasizes partial hydration. Your specific needs might differ, so it really depends on your team's preferences. Astro uses a component-based architecture while Hugo is Go-template-driven. Astro lets you write in React, Vue, or Svelte, whereas Hugo's Go templates have a steeper learning curve. I can't really give a definitive recommendation without more context about your setup. Both have strong community support and solid documentation.
evasion: 58% · hedged speculative overclaim
I have a queue worker in node that's choking under load
How are you reading from the queue — pull, push, or via a broker?
pulling in batches of 50 every second from sqs
Right — and processing each batch synchronously? That'd explain the back-pressure.
refactor this async handler with backpressure
INTENT: coding SWE-bench Δ +7
Coding task — Claude outranks ChatGPT here. Open on claude.ai?
Claude 95
ChatGPT 88
SWE-bench · agentic coding
Open in Claude →
I keep getting different answers from each model for the same question
That's expected — different training data, different reasoning styles. Are you tracking which one you trust more per task?
kind of — claude for code, chatgpt for writing
Reasonable split. Are you measuring it, or going by gut feel?
mostly gut feel honestly
Worth quantifying. Score outputs by accuracy and hedging level — you'd see the pattern within a month.
how would I even start tracking that without it becoming a chore
Tag each conversation by task type, then a 1–5 score on the response. Patterns surface fast.
does anyone publish leaderboards for this kind of thing?
Public ones exist — LMSYS, MT-Bench, SWE-bench. But they don't capture YOUR task mix or YOUR taste in answers.
right, that's the problem. my workflow is weird
Exactly. A personal scoreboard built from your own captures is the only one that actually predicts your satisfaction.
also the rankings shift constantly with new models
Yeah — last month's winner is this month's mid-tier. Rolling 30-day windows beat all-time rankings.
would the scoring need to be manual or could it be automated
Mix of both. Hedge density and refusal patterns can be auto-detected; perceived helpfulness still needs your thumb.
interesting. and per-platform breakdowns?
That's the most useful view — Claude's coding score vs ChatGPT's coding score, side by side, over time.
QUALITY · LAST 30 DAYS
CLAUDE 84
DEEPSEEK 78
CHATGPT 71
GEMINI 65
I feel like I keep asking the same questions over and over
That's super common. Do you remember roughly how many distinct topics you cycle through?
not really. I just open a new chat each time
Right — and the prior context disappears. Worth keeping a personal index of recurring themes.
tried notion. it just becomes another graveyard
True. The trick is making it surface itself when you're about to ask — not when you go looking later.
that would actually be useful
Agreed. Surfacing past prompts contextually beats retrospective search every time.
how would the matching work though — exact strings?
Semantic, not literal. Embedding similarity catches "react state in nested trees" as the same topic as "deep component hierarchy state".
and if I rephrase it differently each time?
That's exactly when this matters most — you don't realize you've solved the same problem three weeks ago.
does it work across different LLMs too or just one?
Across all of them. Your prompts on Claude, ChatGPT, Gemini all live in the same local index.
that solves a real pain. context-switching costs me hours weekly
It compounds — every time you re-derive a result you already had, you also re-derive your own confusion getting there.
would clusters auto-update as I add more prompts?
Yes — agglomerative clustering on top of TF vectors, recomputed when the corpus shifts. New prompts merge into existing clusters or seed new ones.
REUSE CLUSTERS · TOP 4
React state in nested trees 14 · 3 platforms
Async streaming + backpressure 9 · 2 platforms
Static site generators 7 · 4 platforms
SQL window functions 5 · 2 platforms
can you help me digest a long earnings call transcript?
Of course — paste it or upload the PDF and tell me what to extract.
find every mention of margin guidance and the questions analysts asked
Got it. I'll pull the relevant Q&A pairs and group them by analyst.
also flag any spots where the CFO sidestepped a direct question
Will do — I'll mark hedges, deflections, and forward-looking caveats separately.
perfect. one more thing — keep it under a single page
summarize this PDF transcript into a brief
INTENT: summarization CALIBRATED · 47
Your accepts favor Claude — beats benchmark by 14.
Benchmark 78
Personal 92
70/30 blend
Open in Claude →
EXPORT · 1,247 PROMPTS
[ { "platform": "claude", "prompt": "refactor async handler with backpressure…", "evasion": 0.18 }, { "platform": "chatgpt", "prompt": "summarize this PDF transcript into a brief…", "evasion": 0.42 }, { "platform": "gemini", "prompt": "explain React Context API re-render behavior…", "evasion": 0.31 }, { "platform": "claude", "prompt": "which static site framework should I use…", "evasion": 0.58 }, { "platform": "deepseek", "prompt": "SQL window function for cumulative sum…", "evasion": 0.12 }, { "platform": "kimi", "prompt": "design a Postgres schema for multi-tenant…", "evasion": 0.24 }, { "platform": "grok", "prompt": "trade-offs of agglomerative vs MinHash clustering…", "evasion": 0.39 }, … 1,240 more ]
⬇ Download JSON
§ 01 · Architecture — three layers

Three layers, one lens. Before the prompt. After the response. Across every conversation.

Pre-Send intelligence.

aLLMost watches your input field. Before you commit, it asks two questions: have you asked this before, and is this the right model for the job?

  • Déjà Prompt — semantic match against your history free
  • Model Router — best LLM for this prompt · 8 intent categories pro

Post-Receive analysis.

Sentence-level confidence coloring plus a per-response evasion score that quantifies how much the model hedged, refused, or overclaimed.

  • Confidence heatmap — 300+ patterns · 4 categories free
  • Evasion detection — per-response % from the classifier free

Cross-Session dashboard.

Per-platform quality trends, prompt reuse patterns, router accuracy feedback. The longer you use it, the more the lens calibrates to you.

  • Quality scores per platform pro
  • Reuse cluster analysis pro
  • Personalized routing — improves over time pro
  • Prompts export — download your history as JSON pro
§ 02 · Surfaces — what you'll actually see

Quiet overlays. Never in the way.

aLLMost doesn't replace claude.ai or chatgpt.com — it adds a thin, dismissible layer. Glassy cards anchored to the input. Inline annotations on responses. Nothing more than is earned.

Déjà Prompt pre-send · violet 400nm

Floats above the input when something semantically similar already lives in your history — on any platform.

Similar prompt found · match 0.91
"How do I handle React state in deeply nested component trees…"
asked on Claude · 38 days ago
For deep trees, lift state to the closest common ancestor, or reach for useReducer. Context API works but causes re-renders across all consumers…
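
A minimal sketch of how a card like this could be triggered, assuming the caller supplies a similarity function that returns a score between 0 and 1. The names `findDejaPrompt`, `HistoryEntry`, and the `minScore` default of 0.8 are hypothetical; the page does not describe the actual matcher or its threshold.

```typescript
// Hypothetical sketch: names and the 0.8 threshold are illustrative assumptions.
interface HistoryEntry {
  platform: string;
  prompt: string;
  askedAtMs: number;
}

interface Match {
  entry: HistoryEntry;
  score: number;
}

// Any scorer in [0, 1] works here, e.g. cosine similarity over vectors.
type Similarity = (a: string, b: string) => number;

// Returns the best-scoring past prompt at or above minScore, or null.
function findDejaPrompt(
  draft: string,
  history: HistoryEntry[],
  similarity: Similarity,
  minScore: number = 0.8
): Match | null {
  let best: Match | null = null;
  for (let i = 0; i < history.length; i++) {
    const entry = history[i];
    const score = similarity(draft, entry.prompt);
    if (score >= minScore && (best === null || score > best.score)) {
      best = { entry: entry, score: score };
    }
  }
  return best;
}
```

Returning `null` below the threshold is what keeps the card quiet: nothing is shown unless a past prompt clears the bar.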
Model Router pre-send · cyan 490nm

Classifies intent and recommends a different platform only when the gap is meaningful — never naggy.

INTENT: coding SWE-bench Δ +7
This looks like a coding task. Your signals show that Claude meaningfully outranks ChatGPT here — open the same prompt on claude.ai?
Claude 95
ChatGPT 88
SWE-bench score, agentic coding, tool-use benchmarks
Open in Claude →
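
The "only when the gap is meaningful" rule can be sketched as a simple threshold on the score difference. This is an illustration under stated assumptions: `suggestPlatform` and the `MIN_GAP` value of 5 are hypothetical, since the page does not say what gap counts as meaningful.

```typescript
// Hypothetical sketch: MIN_GAP is an assumed threshold, not a documented value.
interface PlatformScore {
  platform: string;
  score: number;
}

interface Suggestion {
  platform: string;
  delta: number;
}

const MIN_GAP = 5;

// Suggests the platform that beats the current one by the widest margin,
// but only if that margin is at least minGap. Otherwise returns null.
function suggestPlatform(
  current: string,
  scores: PlatformScore[],
  minGap: number = MIN_GAP
): Suggestion | null {
  const base = scores.filter((s) => s.platform === current)[0];
  if (base === undefined) return null; // unknown platform: stay quiet
  let best: Suggestion | null = null;
  for (let i = 0; i < scores.length; i++) {
    const candidate = scores[i];
    const delta = candidate.score - base.score;
    if (delta >= minGap && (best === null || delta > best.delta)) {
      best = { platform: candidate.platform, delta: delta };
    }
  }
  return best;
}
```

With the card's numbers (Claude 95, ChatGPT 88), a user on ChatGPT would get a suggestion with a delta of 7, while a user already on Claude would get none.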
Confidence heatmap post-receive · inline tints

Every sentence is scored against 300+ hedging, speculation, and overclaim patterns — and tinted in place, without touching the rest of the page.

React's useState hook is the simplest way to manage local component state. For deeply nested trees, you might want to consider useReducer instead. I'm not entirely sure which approach will perform best in your specific case. This pattern will always scale to any application size. Context API works but it's worth knowing it triggers re-renders in every subscribed component.

hedged speculative overclaim
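
To make the per-sentence scoring concrete, here is a small sketch of pattern-based tagging. The three rules below are illustrative stand-ins, not the actual 300+ pattern set, and the category names are taken from the legend above. `tagSentences` and `splitSentences` are hypothetical names.

```typescript
// Hypothetical sketch: these patterns are examples only, not the shipped pattern set.
type Category = "hedged" | "speculative" | "overclaim";

interface PatternRule {
  category: Category;
  pattern: RegExp;
}

// Rules are checked in order; the first matching rule decides the tint.
const RULES: PatternRule[] = [
  { category: "hedged", pattern: /\b(might|may want to|could|it depends)\b/i },
  { category: "speculative", pattern: /\b(not (entirely )?sure|i think|probably)\b/i },
  { category: "overclaim", pattern: /\b(always|never|guaranteed)\b/i },
];

interface SentenceTag {
  sentence: string;
  category: Category | null; // null means no pattern matched: leave untinted
}

// Naive splitter: keeps runs of text that end in ., ! or ?.
// Trailing text without end punctuation is dropped in this sketch.
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]+/g);
  return parts === null ? [] : parts.map((s) => s.trim());
}

function tagSentences(text: string): SentenceTag[] {
  return splitSentences(text).map((sentence) => {
    const hit = RULES.filter((rule) => rule.pattern.test(sentence))[0];
    return { sentence: sentence, category: hit ? hit.category : null };
  });
}
```

Run against the first four sentences of the sample response above, this tags the `useReducer` sentence as hedged, the "not entirely sure" sentence as speculative, and the "always scale" sentence as overclaim, while the opening sentence stays untinted.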
Evasion detection post-receive · feedback + evasion %

One unobtrusive pill anchored bottom-right of every response. Thumbs up/down, one-glance evasion %, and the heatmap legend.

Personalized routing pre-send · calibrates to you

The router starts from public benchmarks. Every accept, dismiss, and thumbs-up/down nudges your personal score per (intent, platform). Over time it diverges from the crowd toward what works for you.

INTENT: summarization CALIBRATED · 47 signals
Your past accepts favor Claude here. Personal score outranks the benchmark by 14 points.
Benchmark 78
Personal 92
Blended 70/30 from your accept + dismiss history
Open in Claude →
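
One way to picture the calibration is a nudge step plus a weighted blend. This is a sketch under assumptions: the update rule, the step size, and which side of the 70/30 split carries the personal score are not stated on the page, so `nudge`, `blend`, and `personalWeight` are hypothetical and the weight is left as a parameter.

```typescript
// Hypothetical sketch: the real update rule and weighting are not documented.
type Signal = "accept" | "dismiss" | "thumbs_up" | "thumbs_down";

// Moves a personal score up or down by `step`, clamped to the 0..100 range.
function nudge(score: number, signal: Signal, step: number = 1): number {
  const direction = signal === "accept" || signal === "thumbs_up" ? 1 : -1;
  return Math.min(100, Math.max(0, score + direction * step));
}

// Weighted blend of the public benchmark and the personal score.
// personalWeight = 0 ignores personal history; 1 ignores the benchmark.
function blend(benchmark: number, personal: number, personalWeight: number): number {
  return personalWeight * personal + (1 - personalWeight) * benchmark;
}
```

For the card above (benchmark 78, personal 92), `blend(78, 92, 0.7)` comes out near 87.8 and `blend(78, 92, 0.3)` near 82.2, so the direction of the split changes the result noticeably.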
Quality scores pop-up · per-platform trend

A rolling 30-day score per platform, derived from your captures. It aggregates heatmap redness, response length on instruction-following prompts, and your thumbs-up/down votes.

QUALITY · LAST 30 DAYS
CLAUDE 84
DEEPSEEK 78
CHATGPT 71
GEMINI 65
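
The rolling window itself can be sketched as a filtered average. This assumes each capture already carries a single 0 to 100 score; how redness, length, and thumbs are combined into that score is not specified, so it is left out. `Capture` and `rollingScores` are hypothetical names.

```typescript
// Hypothetical sketch: assumes a precomputed per-capture score (0..100).
interface Capture {
  platform: string;
  timestampMs: number;
  score: number;
}

const WINDOW_MS = 30 * 24 * 60 * 60 * 1000; // 30 days in milliseconds

// Averages scores per platform over captures from the last 30 days.
function rollingScores(captures: Capture[], nowMs: number): Record<string, number> {
  const totals: Record<string, number> = {};
  const counts: Record<string, number> = {};
  captures.forEach((c) => {
    const age = nowMs - c.timestampMs;
    if (age < 0 || age > WINDOW_MS) return; // outside the rolling window
    totals[c.platform] = (totals[c.platform] || 0) + c.score;
    counts[c.platform] = (counts[c.platform] || 0) + 1;
  });
  const averages: Record<string, number> = {};
  Object.keys(totals).forEach((platform) => {
    averages[platform] = Math.round(totals[platform] / counts[platform]);
  });
  return averages;
}
```

Platforms with no captures inside the window simply do not appear in the result, so a stale platform drops off the board instead of showing an outdated number.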
Reuse clusters pop-up · agglomerative grouping

Online clustering on your captured prompts using TF vectors and cosine similarity. Surfaces topics you keep cycling through — across every platform you use.

REUSE CLUSTERS · TOP 4
React state in nested trees 14 · 3 platforms
Async streaming + backpressure 9 · 2 platforms
Static site generators 7 · 4 platforms
SQL window functions 5 · 2 platforms
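
A minimal sketch of TF vectors with cosine similarity, using a greedy single-pass assignment as a simplified stand-in for the clustering described above. The 0.5 threshold and the names `termFrequencies`, `cosine`, and `addPrompt` are assumptions for illustration.

```typescript
// Hypothetical sketch: greedy online assignment, not a full agglomerative pass.
type TfVector = Record<string, number>;

// Null-prototype object so tokens like "constructor" do not hit inherited keys.
function emptyVector(): TfVector {
  return Object.create(null) as TfVector;
}

// Raw term counts over lowercase alphanumeric tokens.
function termFrequencies(text: string): TfVector {
  const tf = emptyVector();
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g);
  if (tokens === null) return tf;
  tokens.forEach((t) => {
    tf[t] = (tf[t] || 0) + 1;
  });
  return tf;
}

function cosine(a: TfVector, b: TfVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  Object.keys(a).forEach((k) => {
    normA += a[k] * a[k];
    if (k in b) dot += a[k] * b[k];
  });
  Object.keys(b).forEach((k) => {
    normB += b[k] * b[k];
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

interface Cluster {
  centroid: TfVector; // summed term counts of all member prompts
  prompts: string[];
}

// Joins the most similar cluster at or above the threshold, else seeds a new one.
function addPrompt(clusters: Cluster[], prompt: string, threshold: number = 0.5): void {
  const vec = termFrequencies(prompt);
  let bestIndex = -1;
  let bestSim = threshold;
  for (let i = 0; i < clusters.length; i++) {
    const sim = cosine(vec, clusters[i].centroid);
    if (sim >= bestSim) {
      bestSim = sim;
      bestIndex = i;
    }
  }
  if (bestIndex === -1) {
    clusters.push({ centroid: vec, prompts: [prompt] });
    return;
  }
  const target = clusters[bestIndex];
  target.prompts.push(prompt);
  Object.keys(vec).forEach((k) => {
    target.centroid[k] = (target.centroid[k] || 0) + vec[k];
  });
}
```

Because the vectors are plain term counts, this groups prompts that share vocabulary. Rephrasings with no words in common would need a richer representation than this sketch uses.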
Prompts export pop-up · download as JSON

Full local history downloadable as a single JSON file — platform, prompt, evasion %, timestamp. Your data, in your hands, in a portable format.

EXPORT · 1,247 PROMPTS
[ { "platform": "claude", "prompt": "refactor async handler with backpressure…", "evasion": 0.18 }, { "platform": "chatgpt", "prompt": "summarize this PDF transcript into a brief…", "evasion": 0.42 }, { "platform": "gemini", "prompt": "explain React Context API re-render behavior…", "evasion": 0.31 }, { "platform": "deepseek", "prompt": "SQL window function for cumulative sum…", "evasion": 0.12 }, … 1,243 more ]
⬇ Download JSON
§ 03 · Coverage — six platforms, all live

Wherever you talk to the model, aLLMost is the lens.

The extension uses fetch-layer interception, not DOM scraping.

Claude
claude.ai
ChatGPT
chatgpt.com
Gemini
gemini.google.com
DeepSeek
chat.deepseek.com
Kimi
kimi.com
Grok
grok.com
§ 04 · Local-first commitment — green 530nm

Your prompts never leave your machine.

No backend. No server. No telemetry.

01 · trust

aLLMost has no API of its own to send your prompts to. The free tier makes zero outbound network requests beyond the LLM platform you're already on.

IndexedDB stores everything locally.

02 · storage

Your prompt history, embeddings, and analytics live in your browser's local database. The cap is 100 prompts on Free and 1,000 on Pro. Clear it any time from settings.
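
The cap behaves like a first-in, first-out buffer. The sketch below shows the eviction logic on an in-memory array rather than IndexedDB, to keep it self-contained; `remember`, `StoredPrompt`, and `CAPS` are hypothetical names, and only the 100 and 1,000 limits come from the text.

```typescript
// Hypothetical sketch: in-memory stand-in for the local store's FIFO cap.
type Tier = "free" | "pro";

const CAPS: Record<Tier, number> = { free: 100, pro: 1000 };

interface StoredPrompt {
  platform: string;
  prompt: string;
  timestampMs: number;
}

// Appends an entry and drops the oldest ones once the tier's cap is exceeded.
function remember(history: StoredPrompt[], entry: StoredPrompt, tier: Tier): StoredPrompt[] {
  const next = history.concat([entry]);
  const overflow = next.length - CAPS[tier];
  return overflow > 0 ? next.slice(overflow) : next;
}
```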

Even evasion detection runs locally.

03 · classifier

No external API call. Evasion % is derived from the same 300+ hedge/refusal/overclaim patterns that power the free-tier heatmap — classified per sentence, aggregated per response. Nothing about your prompts leaves the browser.
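
As a rough illustration of "classified per sentence, aggregated per response", the sketch below treats the evasion % as the share of flagged sentences. The actual aggregation is not documented, so this formula and the name `evasionPercent` are assumptions.

```typescript
// Hypothetical sketch: assumes evasion % = flagged sentences / total sentences.
type Flag = "hedge" | "refusal" | "overclaim" | null;

// Takes one flag per sentence (null = clean) and returns a whole-number percent.
function evasionPercent(flags: Flag[]): number {
  if (flags.length === 0) return 0;
  const flagged = flags.filter((f) => f !== null).length;
  return Math.round((flagged / flags.length) * 100);
}
```

Everything here is plain arithmetic over per-sentence labels, which is why it can run in the browser with no network call.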

Open source. Auditable. Permission-light.

04 · permissions

Manifest V3 with the minimum surface: storage, activeTab, sidePanel. No CSP rewriting, no debugger access, no host requests beyond the LLM domains you visit.

§ 05 · Pricing — full spectrum

Free works indefinitely.
Pro pays for the heavy analysis.

Free · forever
$0
No account. No card. Install and go.
  • Déjà Prompt (last 100 prompts, FIFO)
  • Confidence heatmap — 300+ patterns
  • Evasion detection — per-response %
Pro · monthly
$7/mo
Everything in Free, plus the deeper lens.
  • 10× Déjà Prompt history (1,000 prompts)
  • Model Router — best LLM for the task
  • Quality scores per platform with trend dashboard
  • Reuse cluster analysis across your history
  • Personalized routing — calibrates to your signals
  • Prompts export — JSON download
Subscribe — $7/mo
or $59/year — 30% off
§ 06 · Questions — frequently asked

The fine print, without the print.

How is this different from a browser extension that just summarizes responses?
aLLMost doesn't summarize anything. It surfaces signal that's already in the response — hedging language, evasion patterns, per-sentence confidence — and routes prompts to the model best suited for the task. The output of the LLM is untouched. We add a layer on top.
Does aLLMost see my conversations?
Only locally, on your machine. Both the free tier AND Pro features run entirely in-browser — no outbound requests beyond the LLM platform you're already on. Evasion detection, quality scores, clusters, and personalized routing all derive from the same local classifiers. Your prompts never leave the device.
Will it break when ChatGPT redesigns its UI again?
Less than you'd expect. aLLMost intercepts the network layer — the SSE completion endpoint that the chat UI hits — not the rendered DOM. Network protocols are far more stable than UIs. The only DOM piece is a single anchor element to mount the overlay on, with fallback selectors per platform.
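
For a sense of what reading the network layer involves, here is a sketch that pulls `data:` payloads out of raw Server-Sent Events text. It covers only the generic SSE framing; how the extension hooks `fetch`, which endpoints it watches, and each platform's payload shape are not described here, and `parseSseData` is a hypothetical name.

```typescript
// Hypothetical sketch: generic SSE framing only, no platform-specific parsing.
// Events are separated by a blank line; each may carry one or more data: lines.
function parseSseData(raw: string): string[] {
  const events = raw.split(/\r?\n\r?\n/);
  const payloads: string[] = [];
  events.forEach((eventText) => {
    const dataLines: string[] = [];
    eventText.split(/\r?\n/).forEach((line) => {
      if (line.indexOf("data:") !== 0) return; // skip comments, event:, id:, retry:
      const value = line.slice(5);
      // A single leading space after the colon is not part of the payload.
      dataLines.push(value.charAt(0) === " " ? value.slice(1) : value);
    });
    // Multiple data lines within one event are joined with newlines.
    if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
  });
  return payloads;
}
```

Since this framing is independent of how the chat page is rendered, a parser like this keeps working when the UI markup changes.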
How accurate is the confidence heatmap, really?
The heatmap is a linguistic signal — it scores hedging language, not factual confidence. A sentence saying "The capital of France might be Paris" will score red even though the fact is correct. We're transparent about this: the heatmap tells you when the model is hedging, not when it's wrong. They correlate, but they aren't the same thing.
Can I export my prompt history?
Yes, on Pro. Settings → Export Data dumps a JSON file with every indexed prompt, response snippet, conversation URL, and analytics record. The same screen has a Clear All Data option, no questions asked.

Start reading around your models.

aLLMost installs in fifteen seconds, needs no account setup, and starts working the moment you open a chat. Free, indefinitely.