Published · 25 May 2026 · 8 min read

How to check if your Claude prompt is any good.

Most tools that show up for "Claude prompt checker" are actually generators in disguise. They rewrite your prompt instead of evaluating it. Here's the six-lever check a real checker uses, with a free in-browser version.

Searching "Claude prompt checker" right now is one of the more dishonest experiences in the AI tools space. You type the word "checker" meaning "tell me whether the prompt I already wrote is any good." Google hands you back ten generators instead: tools that ignore your prompt entirely and rewrite it from your topic, with no scoring and no diagnosis. They are not solving the problem you asked them to solve. They are selling you a different product.

This post is the real version. It is the checklist a working marketer (or anyone using Claude seriously) actually runs in their head before sending a prompt. There is a free in-browser tool at the end that automates the scoring, but the framework matters more than the tool, so read the framework first.

Why generators dominate the SERP

"Prompt checker" gets indexed by tools that contain the word checker but do not actually check anything. A generator (a wrapper around the Claude API that rewrites your prompt) is easier to build than a real evaluator (which has to define what good means, score against it, and surface diagnoses you can act on).

If the tool you tried did not return a numerical score, did not tell you which dimension was missing, and did not give you a path to fix it, it was a generator, not a checker. You can stop using it. It is not failing you because you used it wrong.

What a real prompt checker is testing for

A useful Claude prompt has six structural levers. A bad prompt is missing one or more of them. A "checker" that does not score these six is doing something else entirely. Here they are. You can run the check by hand in two minutes, on any prompt you have written.

Lever 1 of 6

Output specification

Does the prompt tell Claude the exact shape of the output? Columns, word count, sections, format, structure. "Write me a brief" leaves shape undefined, so Claude defaults to the average shape, which is the generic shape.

PASS: The prompt says "respond with a 4-column table" or "give me three sections labeled X, Y, Z, each 100-200 words."

FAIL: The prompt asks for "a brief" or "an analysis" or "thoughts" with no structural constraint.

Lever 2 of 6

Refuse-condition

Does the prompt tell Claude when NOT to answer? Without one, Claude defaults to being maximally helpful. When the data cannot support an answer, that looks like a confident answer to a question that does not have one.

PASS: The prompt includes "refuse if X" or "do not invent" or "if there is not enough data, say so."

FAIL: No refusal language. The prompt assumes any answer will be a good answer.

Lever 3 of 6

Real source material

Does the prompt require you to paste actual input in? A prompt that works with no inputs produces generic output by definition. The classic failure: "act as a senior marketer and tell me about Q3 priorities."

PASS: The prompt has placeholders like {{BRAND_VOICE}} or instructions like "paste the following: brand guidelines, last quarter's report, ICP doc."

FAIL: Nothing to paste. The prompt can be run with no inputs and still return something.
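
This lever is one of the easier ones to check mechanically. Here is a minimal Python sketch, assuming the double-brace placeholder style from the PASS example. The function name `requires_source_material` and the paste-phrase list are hypothetical illustrations, not the audit tool's actual rules.

```python
import re

# Matches placeholders such as {{BRAND_VOICE}}. The double-brace style comes
# from the PASS example; other template styles would need other patterns.
PLACEHOLDER = re.compile(r"\{\{\s*[A-Za-z0-9_ ]+\s*\}\}")

# Hypothetical phrases suggesting the prompt expects pasted input.
PASTE_HINTS = ("paste the following", "paste below")


def requires_source_material(prompt: str) -> bool:
    """Return True if the prompt appears to demand real input."""
    if PLACEHOLDER.search(prompt):
        return True
    lowered = prompt.lower()
    return any(hint in lowered for hint in PASTE_HINTS)
```

A match only tells you the prompt leaves room for input. Whether what you paste in is any good is still your call.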

Lever 4 of 6

Conversational register

Is the prompt written like a brief to a colleague, or like a search query? Claude mirrors what you give it. Command-shaped input produces command-shaped output. Terse in, terse out.

PASS: The prompt is 25+ words, written in full sentences, with context for why you are asking. Reads like a memo, not a Google search.

FAIL: Under 25 words. No sentence structure. All caps or all lowercase. Reads like you are paying per character.
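
If you want to rough this check out in code, here is a minimal Python sketch. The 25-word threshold and the caps check come from the criteria above. Treating sentence punctuation as a proxy for "full sentences" is an assumption, and `register_hint` is a hypothetical name.

```python
import re

MIN_WORDS = 25  # threshold taken from the PASS/FAIL criteria above


def register_hint(prompt: str) -> bool:
    """Rough hint for the conversational-register lever."""
    words = prompt.split()
    letters = [c for c in prompt if c.isalpha()]
    if len(words) < MIN_WORDS or not letters:
        return False
    # All caps or all lowercase reads like a query, not a memo.
    if all(c.isupper() for c in letters) or all(c.islower() for c in letters):
        return False
    # Assumption: at least one sentence-ending mark stands in for
    # "written in full sentences".
    return bool(re.search(r"[.?!]", prompt))
```

This cannot tell whether the prompt explains why you are asking. It only catches the obvious search-query shape.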

Lever 5 of 6

Uncertainty flagging

Does the prompt ask Claude to mark what is inferred vs. what is grounded in your input? If every claim comes back at the same confidence, you over-trust the output or you have to re-verify every claim by hand.

PASS: The prompt says "mark which claims are inferred" or "flag any assumptions" or "cite the source for each point."

FAIL: No instruction to distinguish inferred from grounded. The output is one undifferentiated block of confident statements.

Lever 6 of 6

Skip condition

Does the prompt give Claude permission to tell you the task itself is a bad idea? Prompts written by people who actually use them include a way to push back. Prompts written to look impressive never do.

PASS: The prompt includes "tell me if this is not worth doing" or "if the angle is unwinnable, say so instead of writing the brief."

FAIL: The prompt assumes the task is correct. Claude can only execute, not advise.

A worked example: scoring a real prompt

Here is a prompt I see almost every week. It comes from a junior marketer who has been told to use Claude. It looks reasonable. It scores 1 out of 6.

The original prompt

Act as a senior marketing strategist. Write me a Q3 content marketing strategy that drives engagement and builds brand awareness.

Lever-by-lever:

Output spec: FAIL. "Strategy" has no shape. Three paragraphs? A table? A 5,000-word doc? Claude will guess.
Refuse-condition: FAIL. No refusal anywhere. Claude will produce a strategy whether the inputs justify one or not.
Real source material: FAIL. No inputs. No brand. No ICP. No KPIs. No past performance. This is a generator prompt.
Conversational register: PASS. It is 20 words, short of the 25-word guideline, but it is written in full sentences with normal capitalization. A borderline pass.
Uncertainty flagging: FAIL. Nothing asks Claude to distinguish what it knows from what it guessed.
Skip condition: FAIL. Claude has no way to say "I cannot write a strategy without seeing the brand."

Score: 1 out of 6. The output is going to be confident, generic, and useless. The fix is not a different model or a better tool. It is rewriting the prompt to address the five missing levers.
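
The tally itself is simple enough to write down. Here is a small Python sketch of how the verdicts above roll up into the score. The dictionary layout is just one convenient representation, not a prescribed format.

```python
# Verdicts copied from the lever-by-lever walkthrough above.
verdicts = {
    "Output spec": False,
    "Refuse-condition": False,
    "Real source material": False,
    "Conversational register": True,
    "Uncertainty flagging": False,
    "Skip condition": False,
}

# True counts as 1 and False as 0, so summing gives the number of passes.
score = sum(verdicts.values())
missing = [lever for lever, passed in verdicts.items() if not passed]

print(f"Score: {score} out of {len(verdicts)}")
print("Missing:", ", ".join(missing))
```

The `missing` list is the useful part: it is the to-do list for the rewrite.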

The same prompt, scored 6 out of 6

I am the marketing lead at a B2B SaaS for mid-market HR teams. Q3 starts in 6 weeks. I am about to plan content. I need a working brief, not a strategy deck.

Paste below: our voice guide, last quarter's content performance (CSV), our top 5 competitor blog posts, our ICP one-pager.

Based on those four inputs, produce:

1. A 4-row table of recommended content themes for Q3, with for each: theme, target query, why now (cite which input drove it), kill criterion.
2. A "do not write" list of 3 themes that would underperform, with the reason from the inputs.

Rules:
- If the inputs do not support a recommendation, say so. Do not invent.
- For each theme, mark which parts are grounded in the inputs vs. inferred.
- If the entire brief is not worth writing because the inputs are too thin, tell me what is missing and stop.

That version specifies the output shape, includes a refuse-condition, requires real source material, reads as a brief, asks for uncertainty flagging, and has a skip condition. Same model, same task. The difference is the prompt.

Why I built a tool that does this in the browser

You do not need a tool to run this check. You can do it on paper. But doing it on paper requires you to remember six things every time, and most people are running the check after they have already mostly written the prompt, when their attention has moved on.

So I built the free prompt audit tool. Paste your prompt, click run, get a score out of 6 with each lever flagged as present or missing. It runs entirely in your browser. Nothing is sent anywhere. There is no email gate. View source if you do not believe me: the detection heuristics are right there in the page.

It is honest about its limits: each lever uses a transparent keyword heuristic to pre-set a hint, and you make the final call. The tool does not pretend to be smarter than you. It is a checklist with memory.
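
To make "keyword heuristic plus your final call" concrete, here is a rough Python sketch of the idea. It is not the tool's actual source. The phrase lists are assumptions loosely drawn from the PASS examples in this post, and the names `LEVER_KEYWORDS`, `hint`, and `audit` are hypothetical.

```python
from typing import Optional

# Illustrative phrase lists, loosely based on the PASS examples in this post.
# These are assumptions, not the tool's real detection rules.
LEVER_KEYWORDS = {
    "Output specification": ["table", "sections", "words", "columns"],
    "Refuse-condition": ["refuse if", "do not invent", "not enough data"],
    "Real source material": ["{{", "paste the following", "paste below"],
    "Uncertainty flagging": ["inferred", "flag any assumptions", "cite the source"],
    "Skip condition": ["not worth doing", "not worth writing", "say so instead"],
}


def hint(prompt: str) -> dict[str, bool]:
    """Pre-set a present/missing hint for each lever from keywords."""
    lowered = prompt.lower()
    hints = {
        lever: any(phrase in lowered for phrase in phrases)
        for lever, phrases in LEVER_KEYWORDS.items()
    }
    # Register is judged by length rather than keywords (25+ words).
    hints["Conversational register"] = len(prompt.split()) >= 25
    return hints


def audit(prompt: str, overrides: Optional[dict[str, bool]] = None) -> tuple[int, dict[str, bool]]:
    """Combine keyword hints with the user's final calls; return score and verdicts."""
    verdicts = hint(prompt)
    verdicts.update(overrides or {})  # the human makes the final call
    return sum(verdicts.values()), verdicts
```

Run on the weak prompt from the worked example, this sketch hints every lever as missing, because the prompt falls under the 25-word threshold. Passing `overrides={"Conversational register": True}` records the human judgment call and brings the score to 1 out of 6. Substring matching will also misfire (for instance "table" inside "notable"), which is exactly why the hint is a starting point and not a verdict.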

The honest caveat

A score of 6 out of 6 does not guarantee a good output. It guarantees only that, if the output is bad, the structural failure modes are not the reason. The remaining ways to fail are: bad inputs (a voice guide that does not represent the voice), wrong task (using Claude for something a calculator would do better), or bad judgment on what to ship (the slop tax: confidently shipping mediocre output because the prompt scored well).

The check is necessary, not sufficient. It tells you whether the prompt itself is sound. The rest is on you, which is the right place for it to be.

Audit your own prompt right now (free, in your browser).

Paste any prompt. Get a 0 to 6 score against the six levers above. See exactly which levers are missing. Copy the result text or screenshot the card to share. No email gate. No server. The detection heuristics are visible in the page source.

Open the audit tool
