I kept my ChatGPT subscription for two years. I defended it when friends said it was getting dumber. Then I pulled up the SM-Bench creative writing benchmark and checked the numbers for myself.
GPT-5.4, the model that replaced GPT-4o as the ChatGPT default, scored 36.8%. GPT-4o itself scored 97.3%. DeepSeek V3.2, which costs nothing, scored 100%.
That’s paying $20 a month for a product that performs worse than a free alternative. It’s not a perception issue or a case of nostalgia.
The gap between what ChatGPT used to be and what it is now shows up in independent data.
Here’s what happened, why it happened, and what I’m using instead.
What GPT-4o Was and Why People Miss It
GPT-4o was never the technically strongest model on the market. Claude and a handful of others had it beat on coding benchmarks.
On math, there were better options. On raw factual recall, Google’s models had structural advantages from search integration.
What GPT-4o did that nothing else matched was stay with you in a conversation. You could send it a half-formed idea, a rant you hadn’t organized yet, a creative premise that needed developing, and it would meet you where you were.
It didn’t reformat your question into a bulleted task description. It read what you meant.
That quality sounds soft. The benchmark data shows it’s real and measurable.
The Quality That No Benchmark Could Fully Capture
When you asked GPT-4o for help with something personal or creative, it felt like the model was thinking alongside you.
The newer GPT-5 series versions feel like they’re processing your ticket and closing the case.
One description I keep seeing captures it well:
“GPT-4o actually listened to my metaphors. GPT-5.2 just corrects my grammar and gives me a bulleted list of why my logic is flawed.”
That’s not nostalgia. That’s a real change in how the model engages with input.
Writers, people who used it for emotional processing, creative collaborators, and anyone who found genuine flow in those conversations all describe the same thing.
They didn’t just lose a model. They lost a tool they’d built into how they work.
Then OpenAI Shut It Down in February 2026
On February 13, 2026, OpenAI retired GPT-4o from the ChatGPT interface. GPT-4.1 and several other models went with it.
Users were automatically transitioned to newer versions.
OpenAI’s stated justification was that only 0.1% of users were actively selecting GPT-4o daily before retirement. What that figure omits is that most users never manually select a model at all.
They trust the default is the best option. The 0.1% who did select GPT-4o were the power users who cared most about that specific quality.
The response was immediate. #Keep4o trended across Reddit and X within days.
A community member organized a survey to present directly to OpenAI, arguing that the users who relied on GPT-4o weren’t marginal cases but among the most invested subscribers.
The Benchmarks OpenAI Won’t Put in Their Marketing

SM-Bench is an independent community benchmark. The raw data and methodology are public.
It’s not affiliated with any AI company, which is part of why its numbers diverge from OpenAI’s own model cards.
Here’s what the data shows across the models that matter for this comparison:
| Model | Creative Writing Score | Monthly Cost | Status |
|---|---|---|---|
| GPT-4o | 97.3% | $20/month | Retired Feb 13, 2026 |
| DeepSeek V3.2 | 100% | Free | Available now |
| GPT-5.4 | 36.8% | $20/month | Current ChatGPT default |
| Claude Sonnet 4.6 | Above GPT-5.4 | $20/month | Available now |
| Gemini 2.5 Pro | Above GPT-5.4 | Free tier available | Available now |
You won’t find that table on OpenAI’s pricing page.
GPT-5.4 vs GPT-4o on Creative Writing
SM-Bench’s creative writing category tests whether a model can handle mature themes in fiction, maintain context across a narrative, and produce output that reads like a human writer rather than an automated document.
A 36.8% score means GPT-5.4 fails that test 63.2% of the time: it refuses, deflects, or produces the kind of hedged non-response that makes you want to close the tab.
It scored 36.8% on a benchmark where a free model scored 100%. That’s not close.
The pattern here matters more than any single number. OpenAI replaced a model scoring 97.3% with one scoring 36.8%.
That’s not an incremental decline. That’s a different product.
You Are Paying $20 a Month for a Model That Lost to Free Tools
Sam Altman acknowledged in early 2026 that OpenAI had made mistakes with newer versions. His direct quote on GPT-5.2’s language quality:
“I think we just screwed that up.”
That admission came without a timeline for fixing it, without a refund, and without any plan to restore GPT-4o as a legacy option.
What it came with was a suggestion to try the next version.
Meanwhile, DeepSeek’s API runs at approximately $0.28 per million tokens. GPT-5’s API sits at roughly $14 per million tokens.
You’re paying 50 times more for measurably worse creative output.
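The arithmetic behind that multiple is simple. A minimal sketch, using the per-million-token prices cited above and a hypothetical monthly usage figure (5 million tokens is an assumption for illustration, not a measured average):

```python
# Per-million-token API prices cited in the article (approximate).
DEEPSEEK_PER_M = 0.28   # DeepSeek V3.2
GPT5_PER_M = 14.00      # GPT-5 series

# Hypothetical usage: 5 million tokens per month.
monthly_tokens = 5_000_000

deepseek_cost = DEEPSEEK_PER_M * monthly_tokens / 1_000_000
gpt5_cost = GPT5_PER_M * monthly_tokens / 1_000_000

print(f"DeepSeek: ${deepseek_cost:.2f}/month")   # $1.40/month
print(f"GPT-5:    ${gpt5_cost:.2f}/month")       # $70.00/month
print(f"Ratio:    {gpt5_cost / deepseek_cost:.0f}x")  # 50x
```

The ratio is price-per-token divided by price-per-token, so it holds at any usage level; the dollar amounts scale with however many tokens you actually burn.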
Why Did ChatGPT Get Worse?
This isn’t a mystery. Three forces pushed in the same direction at once.
- Safety filter expansion. Topics that earlier GPT-4 versions handled thoughtfully in 2023 now trigger refusals or heavily hedged responses. The filtering expanded beyond genuinely dangerous content to legitimate use cases: fiction writing, historical scenarios, academic work. A model that won’t engage with nuance isn’t useful for nuanced work.
- Cost optimization at inference. There is substantial evidence that OpenAI adjusted the computational resources allocated per response to reduce costs. This shows up as shorter outputs, less developed reasoning, and responses that feel like they’re conserving tokens rather than engaging with your actual question.
- Enterprise priorities displacing individual user needs. OpenAI’s shift toward enterprise clients, regulators, and advertisers pushes the product toward compliance and predictability. Those are the exact opposite of what creative users and people who valued GPT-4o’s conversational depth actually want.
OpenAI Admitted It Partially
The Altman statement on GPT-5.2 is the most direct public acknowledgment. Earlier evidence of the pattern goes further back: researchers documented that GPT-4’s accuracy on identifying prime numbers dropped from 97.6% to 2.4% between March and June 2023, an unexplained regression that the model later partially recovered from.
The pattern is consistent. A quality gets introduced, degrades over subsequent versions without explanation, and the company responds with forward-looking statements about upcoming releases.
If you are waiting for ChatGPT to return to what it was in early 2024, the available evidence doesn’t support that expectation.
What to Use Instead of ChatGPT in 2026

The alternatives have genuinely caught up. Here’s what I’d recommend based on what you actually used ChatGPT for.
For Writing and Creative Work
Claude Sonnet 4.6 is my current default for anything that requires nuanced writing. It holds context better, engages with complex requests without defaulting to refusals, and produces longer responses without padding them with disclaimers.
The context window is 200,000 tokens versus ChatGPT’s 128,000. For long-form projects or anything where you need the model to remember what you said twenty messages back, that difference is significant in practice.
Here’s what the same creative prompt looks like across the two:
“Here is a noir-inspired opening with some moral ambiguity included…”
- GPT-5.4: [describes the scene in the third person, breaks to explain what noir is, hedges the moral content]
- Claude Sonnet 4.6: [writes the scene directly, stays in voice, doesn’t interrupt to explain what it’s doing]
One narrates. The other writes.
Sider AI is worth considering if you want access to Claude, Gemini, and other models through a single subscription.
It bundles image generation alongside the text models, which makes it a practical replacement if you used multiple ChatGPT features.
For AI Companion and Emotional Conversation
This is the most specific gap left by the GPT-4o retirement.
People who used it for emotional processing, creative roleplay, or just a conversational presence in their day found the replacements genuinely worse in a way that’s hard to explain to someone who never experienced GPT-4o at its peak.
Nomi AI is the tool I see recommended most consistently for this use case. It has persistent memory across sessions, meaning it retains what you talked about last week and builds on it.
The emotional attunement feels intentional rather than performed. The free tier lets you try one companion with around 50 daily messages; paid plans start at $8.33 a month.
Candy AI is the stronger choice if your primary use is persona-consistent roleplay conversation. It’s built specifically for that use case rather than trying to be a general assistant that also does companions.
For people who want to stay within a general assistant but need something closer to GPT-4o’s conversational register, Pi from Inflection is free and has the closest feel of any mainstream option.
For Research and Search-Integrated Tasks
Gemini has a structural advantage here that none of the other models can match. It’s built into Google’s ecosystem.
Cross-referencing current information, pulling from Gmail or Drive, checking recent events: these all work natively in a way that ChatGPT has to approximate with add-on web search plugins.
Perplexity is worth keeping open for research-first queries where you want sourced answers alongside the response.
For tasks where you need citations and want to verify claims, it currently outperforms ChatGPT head to head.
Is ChatGPT Still Worth $20 a Month in 2026?
For most individual users, no. For specific contexts, it still makes sense.
| Use Case | Stick With ChatGPT | Better Option |
|---|---|---|
| Creative writing | No | Claude Sonnet 4.6 |
| AI companion / emotional support | No | Nomi AI, Candy AI |
| Coding and development | Maybe | Claude Opus 4.6 |
| Research with citations | No | Perplexity, Gemini |
| Document summarisation | Maybe | None needed; ChatGPT is adequate |
| General Q&A | No | DeepSeek (free) |
| Enterprise workflow | Yes | ChatGPT Enterprise |
The case for staying narrows to enterprise users who have it embedded in workflows with significant switching costs.
For solo users paying $20 out of pocket, the competitive landscape has shifted enough that the subscription isn’t the obvious choice anymore.
According to TechCrunch’s coverage of the GPT-4o retirement backlash, ChatGPT’s market share declined from around 60% in early 2025 to under 45% by Q1 2026, and more than 1.5 million users cancelled subscriptions in March 2026 alone.
That’s not a fringe reaction. That’s a significant portion of a paid user base voting with their wallets because the product stopped being worth what they were paying.
The benchmark numbers at the start of this article aren’t edge cases. They reflect something a lot of users have been feeling for a while and couldn’t quite name until someone put a number on it.
