Claude prompt dos and don'ts that business teams actually need

Chris Tyson found that 90% of his production prompts broke when upgrading to Claude 4.x. He had 17 instances of MUST and 11 of ALWAYS. The fix was not better prompts. It was better context. Here is what works and what does not for non-technical teams using Claude.

Chris Tyson was running production prompts that worked perfectly with Claude Sonnet 3.7. Then he upgraded to 4.5 and 90% of them broke. He counted: 17 instances of MUST. 11 instances of ALWAYS. His conclusion? “When everything is critical, nothing is critical.”

Anthropic’s migration guide explains why. Claude 4+ was trained for “more precise instruction following” but lost what they call the “above and beyond” behaviour. Previous versions would read between the lines. Current versions take you literally. If you write “MUST use formal tone” and then also “MUST sound conversational,” Claude 4.x doesn’t pick the one that makes more sense. It tries to do both and produces something that sounds like neither.

This isn’t a story about prompts getting worse. It’s about the gap between how business teams talk to Claude and how Claude actually processes instructions. Most “Claude tips” articles give you generic LLM advice with Claude inserted where it says ChatGPT. This post covers what is genuinely different.

How Claude actually differs from ChatGPT

This matters more than most people realise, and it's the first thing that trips up anyone switching from ChatGPT.

The core difference is this: Claude is built to follow instructions precisely and push back when something doesn't make sense, while ChatGPT is built to be agreeable and produce quick output. Claude handles long documents better, refuses requests more thoughtfully, and takes your constraints literally rather than interpreting them loosely. ChatGPT is faster for brainstorming and casual tasks but worse at sustained analysis over 50+ pages.

For business teams, this means Claude excels when you need careful, structured work on complex documents, and ChatGPT excels when you need rapid creative iteration. Using ChatGPT prompts in Claude is like giving detailed instructions to someone who follows them too literally. Using Claude prompts in ChatGPT is like giving careful guidelines to someone who improvises anyway. Neither works.

Zapier’s 2026 comparison described Claude as “a meticulous senior editor” versus ChatGPT as “fast, friendly, dynamic.” That tracks with what I see in practice. Claude follows tone and style instructions more precisely. It takes constraints more literally. This is a strength when your instructions are clear and a nightmare when they contradict each other.

Refusals work differently. ChatGPT gives you an abrupt “I cannot do that.” Claude explains why it won’t, cites the principle involved, and offers alternatives. For business teams, this is genuinely useful because you can understand and work around the boundary.

Long documents are where Claude shines. The 200K token context window, as G2 noted, “changes the game” for multi-document synthesis. Uploading a 50-page contract plus three competitor proposals and asking for a comparison? That is Claude territory.

Prompting style matters. Portkey.ai’s analysis found that Claude works better with open-ended, flexible prompts while ChatGPT needs precise instructions. You can say “look at this document and tell me what is interesting” to Claude and get genuinely useful analysis. ChatGPT needs you to specify exactly what “interesting” means.

Sycophancy is lower. Anthropic explicitly designed Claude 4.5 models to have “significantly lower sycophancy and less encouragement of user delusion.” Claude will push back if your plan has obvious holes. ChatGPT tends to agree first and add caveats later.

Stop copy-pasting ChatGPT prompts into Claude. They work differently.

Five dos that actually matter

1. Context before question. The 80/20 rule applies everywhere: 80% of good output comes from context, 20% from the question itself. Before any prompt, paste your company context document: who you are, what you do, who you serve, your competitors, your voice. This turns generic output into something useful.
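To make the pattern concrete, here is a minimal sketch of a reusable context-first prompt builder. The company details are entirely hypothetical placeholders; the point is that the same context block rides in front of every question.

```python
# A minimal sketch of the 80/20 pattern: one reusable company context
# document, pasted before every question. All details are hypothetical.
COMPANY_CONTEXT = """\
Company: Acme Logistics (hypothetical example)
What we do: freight brokerage for mid-size manufacturers
Audience: operations managers; plain, direct tone, no jargon
Competitors: regional brokers and large 3PLs
"""

def build_prompt(question: str) -> str:
    # Context first, question last: the question rides on the context.
    return f"{COMPANY_CONTEXT}\n{question}"

prompt = build_prompt("Draft a two-line follow-up email to a prospect.")
```

Write the context document once, store it somewhere the whole team can reach, and treat the question as the small variable part.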

2. Use XML tags for structure. DreamHost tested 25 Claude techniques and found only 5 that consistently improved output. XML formatting was number one. Wrap sections in <context>, <instructions>, and <output_format> tags. This probably works because Anthropic trains Claude’s system prompts using tags like <behavior_instructions>.
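A small helper makes the tagging habit automatic. This is a sketch of the structure the article describes, not an official Anthropic utility:

```python
def xml_prompt(context: str, instructions: str, output_format: str) -> str:
    # Wrap each section in the XML tags Claude is trained to recognise.
    return (
        f"<context>\n{context}\n</context>\n\n"
        f"<instructions>\n{instructions}\n</instructions>\n\n"
        f"<output_format>\n{output_format}\n</output_format>"
    )

p = xml_prompt(
    "Acme sells widgets to mid-size manufacturers.",  # hypothetical
    "Write a product blurb for the homepage.",
    "Two sentences, plain text, no headings.",
)
```

The tag names themselves are flexible; what matters is that each section is clearly delimited so instructions never bleed into context.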

3. Put long documents first, questions last. Anthropic’s own guidance says placing queries at the end of long context improves quality by up to 30%. Load the document first. Ask about it second. Not the other way around.

4. Ask Claude to quote relevant parts first. When working with long documents, add “Quote the relevant sections before answering.” This forces Claude to ground its answer in specific text instead of generating from its general training. Cuts hallucination dramatically on document-specific questions.
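Rules 3 and 4 combine naturally into one template: document first, a quote-first instruction, then the question at the very end. A sketch, with illustrative inputs:

```python
def document_prompt(document: str, question: str) -> str:
    # Long document first, query last (rule 3), with a quote-first
    # instruction to ground the answer in the actual text (rule 4).
    return (
        f"<document>\n{document}\n</document>\n\n"
        "Quote the relevant sections before answering.\n"
        f"{question}"
    )

p = document_prompt(
    "Clause 4.2: Either party may terminate with 30 days written notice.",
    "Which clauses cover termination?",
)
```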

5. Set up a Claude Project for recurring work. Anthropic has 18.9 million monthly active users, 300,000+ business customers, and 70% of the Fortune 100 on paid plans. Claude Projects lets you store 30MB per file, unlimited files, with a 200K token context window. Upload your company context, voice profile, and common documents once. They persist across conversations.

Five don'ts that save you hours

1. Do not use ALL CAPS MUST ALWAYS. Tyson’s fix was IF/THEN structures instead of shouted imperatives. “IF the user asks about pricing, THEN respond with the approved pricing table” works. “YOU MUST ALWAYS USE THE PRICING TABLE” breaks when context conflicts. Claude 4.x made three key shifts: commands become suggestions when context conflicts, the model resolves ambiguity via inference rather than literal compliance, and context overrides structure.
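One way to keep a team's rules in IF/THEN form is to store them as data and render them consistently. A sketch of that idea, with rule wording that is illustrative rather than taken from Tyson's actual prompts:

```python
# Conditional rules as data, rendered as IF/THEN lines instead of
# shouted imperatives. Rule text is illustrative, not from production.
RULES = [
    ("the user asks about pricing", "respond with the approved pricing table"),
    ("the user asks for a discount", "refer them to their account manager"),
]

def render_rules(rules) -> str:
    return "\n".join(f"IF {cond}, THEN {action}." for cond, action in rules)

rules_text = render_rules(RULES)
```

Keeping rules in one list also makes it easy to audit them for contradictions before they reach the prompt.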

2. Do not paste 50 pages and ask a vague question. Particula.tech’s research found the sweet spot is 500-2,000 tokens of context. Above 4,000 tokens, response time increases 40-80% with only 2-3% accuracy gain. Stanford’s “Lost in the Middle” research showed a U-shaped curve: AI handles information at the beginning and end well but loses 30%+ of what is buried in the middle. Be selective about what you include.
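A rough way to stay inside that sweet spot is to budget context by paragraph before pasting. The four-characters-per-token figure is a crude heuristic for English prose, not an exact tokenizer:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return len(text) // 4

def trim_context(paragraphs, budget=2000):
    # Keep whole paragraphs, in order, until the token budget is hit.
    kept, used = [], 0
    for p in paragraphs:
        cost = approx_tokens(p)
        if used + cost > budget:
            break
        kept.append(p)
        used += cost
    return kept
```

Cutting at paragraph boundaries keeps the included context readable; curating which paragraphs go in at all matters even more.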

3. Do not ask Claude to be creative AND follow strict rules simultaneously. These are competing objectives. “Write a creative blog post that MUST include these 12 keywords, follow this exact structure, stay under 500 words, and use our brand voice” produces robotic output because the constraints kill the creativity. Pick one: creative with loose guidelines, or structured with tight rules.

4. Do not expect Claude to remember previous conversations. Each conversation starts fresh. The fix is Claude Projects, where your context persists. If you find yourself re-explaining your business every session, you need a Project.

5. Do not ignore what Claude does badly. Claude over-qualifies everything. “It is important to note…” appears constantly. It adds disclaimers you didn’t ask for. It sometimes lectures instead of answering. Prompt around it: “Give me a direct answer without qualifications or disclaimers. Do not explain why this matters unless I ask.”

When to use which model

I get asked this constantly in consulting. My guess is most teams will end up using two or three models, but here’s the honest breakdown.

Claude excels at: long documents, analysis, structured writing, code review, following complex instructions. Best for sustained reasoning over long context. If you need to upload three contracts and find the discrepancies, use Claude.

ChatGPT excels at: creative brainstorming, image generation, casual conversation, general research, web browsing. Best for speed and breadth. If you need ten marketing tagline options in 30 seconds, use ChatGPT.

Gemini excels at: current events, Google Workspace integration, multimodal tasks (video and audio analysis). Best for work that touches the Google ecosystem. If you need to analyse a YouTube video and summarise it against your Google Docs, use Gemini.

The real answer for most teams: use two or three models for different tasks. Don’t force one tool to do everything. That’s a proper nightmare waiting to happen.

The enterprise adoption picture is sobering. Zapier surveyed 532 C-suite leaders at companies with 1,000+ employees in September 2025. 78% are struggling to integrate AI with existing systems. 95% of generative AI pilots yielded no business impact. The barriers? Integration challenges (78%), skill gaps (35%), and data quality issues (29%).

The Stack Overflow 2025 survey tells the adoption gap story. Developers: 84% adoption, 51% daily use. All employees: 56% adoption, only 9% daily use. Individual contributors: 16% regular use. Leaders: 33%. There is a massive gap between people who code with AI and people who do everything else with AI.

Teaching your team without the frustration

The mindset shift that works: treat Claude like a brilliant new hire, not a magic oracle. It knows a lot but it doesn’t know your business, your customers, or your preferences. Onboard it the way you would onboard a person.

Five challenging questions from my teaching that improve Claude output every time:

  1. “How do you know this is true?”
  2. “What are you missing?”
  3. “Why did you not consider this alternative?”
  4. “Will someone disagree with you, and why?”
  5. “How might you be wrong?”

Andrew Ng said at the LangChain Interrupt conference that guiding AI is “a deeply intellectual exercise.” He is spot on. The people who get the best results from Claude are not the ones with the best prompts. They are the ones who push back, challenge the output, and iterate.

Anthropic’s own case study showed a Fortune 500 company achieving 20% accuracy improvement through optimized prompting plus subject matter expert integration. The subject matter expert piece is key. AI doesn’t replace domain knowledge. It amplifies it.

For specific prompting techniques, chain-of-thought prompting for business users goes deeper on one of the most useful patterns. And if you want to understand what Claude features are real versus viral myths, I tested every viral Claude cheat code and separated fact from folklore. Building a voice profile is probably the highest-ROI prompt investment you can make.

Practical first step for any team: set up one Claude Project with your company context document, your voice profile, and 10 of your most common task templates. Test it for two weeks. Let people experiment. Collect what works and what doesn’t. Then expand.

Turns out, the prompt is the specification for a single task. The context document is the specification for every task. Invest in the specifications.

About the Author

Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience and as the founder of Tallyfy (raised $3.6m), he helps mid-size companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding.

Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.