I kept my ChatGPT subscription for two years. I defended it when friends said it was getting dumber. Then I pulled up the SM-Bench creative writing benchmark and checked the numbers for myself.
GPT-5.4, the model that replaced GPT-4o as the ChatGPT default, scored 36.8%. GPT-4o itself scored 97.3%. DeepSeek V3.2, which costs nothing, scored 100%.
That’s paying $20 a month for a product that performs worse than a free alternative. It’s not a perception issue or a case of nostalgia.
The gap between what ChatGPT used to be and what it is now shows up in independent data.
Here’s what happened, why it happened, and what I’m using instead.
What GPT-4o Was and Why People Miss It
GPT-4o was never the technically strongest model on the market. Claude and a handful of others had it beat on coding benchmarks.
On math, there were better options. On raw factual recall, Google’s models had structural advantages from search integration.
What GPT-4o did that nothing else matched was stay with you in a conversation. You could send it a half-formed idea, a rant you hadn’t organized yet, a creative premise that needed developing, and it would meet you where you were.
It didn’t reformat your question into a bulleted task description. It read what you meant.
That quality sounds soft. The benchmark data shows it’s real and measurable.
The Quality That No Benchmark Could Fully Capture
When you asked GPT-4o for help with something personal or creative, it felt like the model was thinking alongside you.
The newer GPT-5 series versions feel like they’re processing your ticket and closing the case.
One description I keep seeing captures it well:
“GPT-4o actually listened to my metaphors. GPT-5.2 just corrects my grammar and gives me a bulleted list of why my logic is flawed.”
That’s not nostalgia. That’s a real change in how the model engages with input.
Writers, people who used it for emotional processing, creative collaborators, and anyone who found genuine flow in those conversations all describe the same thing.
They didn’t just lose a model. They lost a tool they’d built into how they work.
Then OpenAI Shut It Down in February 2026
On February 13, 2026, OpenAI retired GPT-4o from the ChatGPT interface. GPT-4.1 and several other models went with it.
Users were automatically transitioned to newer versions.
OpenAI’s stated justification was that only 0.1% of users were actively selecting GPT-4o daily before retirement. What that figure omits is that most users never manually select a model at all.
They trust the default is the best option. The 0.1% who did select GPT-4o were the power users who cared most about that specific quality.
The response was immediate. #Keep4o trended across Reddit and X within days.
A community member organized a survey to present directly to OpenAI, arguing that the users who relied on GPT-4o weren’t marginal cases but among the most invested subscribers.
The Benchmarks OpenAI Won’t Put in Their Marketing

SM-Bench is an independent community benchmark. The raw data and methodology are public.
It’s not affiliated with any AI company, which is part of why its numbers diverge from OpenAI’s own model cards.
Here’s what the data shows across the models that matter for this comparison:
| Model | Creative Writing Score | Monthly Cost | Status |
|---|---|---|---|
| GPT-4o | 97.3% | $20/month | Retired Feb 13, 2026 |
| DeepSeek V3.2 | 100% | Free | Available now |
| GPT-5.4 | 36.8% | $20/month | Current ChatGPT default |
| Claude Sonnet 4.6 | Above GPT-5.4 | $20/month | Available now |
| Gemini 2.5 Pro | Above GPT-5.4 | Free tier available | Available now |
You won’t find that table on OpenAI’s pricing page.
GPT-5.4 vs GPT-4o on Creative Writing
SM-Bench’s creative writing category tests whether a model can handle mature themes in fiction, maintain context across a narrative, and produce output that reads like a human writer rather than an automated document.
A 36.8% score means GPT-5.4 fails that test 63.2% of the time: it refuses, deflects, or produces the kind of hedged non-response that makes you want to close the tab.
It scored 36.8% on a benchmark where a free model scored 100%. That’s not close.
The pattern here matters more than any single number. OpenAI replaced a model scoring 97.3% with one scoring 36.8%.
That’s not an incremental decline. That’s a different product.
You Are Paying $20 a Month for a Model That Lost to Free Tools
Sam Altman acknowledged in early 2026 that OpenAI had made mistakes with newer versions. His direct quote on GPT-5.2’s language quality:
“I think we just screwed that up.”
That admission came without a timeline for fixing it, without a refund, and without any plan to restore GPT-4o as a legacy option.
What it came with was a suggestion to try the next version.
Meanwhile, DeepSeek’s API runs at approximately $0.28 per million tokens. GPT-5’s API sits at roughly $14 per million tokens.
You’re paying 50 times more for measurably worse creative output.
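The arithmetic behind that multiple is simple. A minimal sketch, using the per-million-token prices cited above and a hypothetical monthly usage figure (5 million tokens is an assumption for illustration, not a measured average):

```python
# Per-million-token API prices cited in the article (approximate).
DEEPSEEK_PER_M = 0.28   # DeepSeek V3.2
GPT5_PER_M = 14.00      # GPT-5 series

# Hypothetical usage: 5 million tokens per month.
monthly_tokens = 5_000_000

deepseek_cost = DEEPSEEK_PER_M * monthly_tokens / 1_000_000
gpt5_cost = GPT5_PER_M * monthly_tokens / 1_000_000

print(f"DeepSeek: ${deepseek_cost:.2f}/month")   # $1.40/month
print(f"GPT-5:    ${gpt5_cost:.2f}/month")       # $70.00/month
print(f"Ratio:    {gpt5_cost / deepseek_cost:.0f}x")  # 50x
```

The ratio is price-per-token divided by price-per-token, so it holds at any usage level; the dollar amounts scale with however many tokens you actually burn.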
Why Did ChatGPT Get Worse?
This isn’t a mystery. Three forces pushed in the same direction at once.
- Safety filter expansion. Topics that earlier GPT-4 versions handled thoughtfully in 2023 now trigger refusals or heavily hedged responses. The filtering expanded beyond genuinely dangerous content to legitimate use cases: fiction writing, historical scenarios, academic work. A model that won’t engage with nuance isn’t useful for nuanced work.
- Cost optimization at inference. There is substantial evidence that OpenAI adjusted the computational resources allocated per response to reduce costs. This shows up as shorter outputs, less developed reasoning, and responses that feel like they’re conserving tokens rather than engaging with your actual question.
- Enterprise priorities displacing individual user needs. OpenAI’s shift toward enterprise clients, regulators, and advertisers pushes the product toward compliance and predictability. Those are the exact opposite of what creative users and people who valued GPT-4o’s conversational depth actually want.
OpenAI Admitted It Partially
The Altman statement on GPT-5.2 is the most direct public acknowledgment. Earlier evidence of the pattern goes further back: researchers documented that GPT-4’s accuracy on identifying prime numbers dropped from 97.6% to 2.4% between March and June 2023, an unexplained regression that the model later partially recovered from.
The pattern is consistent. A quality gets introduced, degrades over subsequent versions without explanation, and the company responds with forward-looking statements about upcoming releases.
If you are waiting for ChatGPT to return to what it was in early 2024, the available evidence doesn’t support that expectation.
What to Use Instead of ChatGPT in 2026

The alternatives have genuinely caught up. Here’s what I’d recommend based on what you actually used ChatGPT for.
For Writing and Creative Work
Claude Sonnet 4.6 is my current default for anything that requires nuanced writing. It holds context better, engages with complex requests without defaulting to refusals, and produces longer responses without padding them with disclaimers.
The context window is 200,000 tokens versus ChatGPT’s 128,000. For long-form projects or anything where you need the model to remember what you said twenty messages back, that difference is significant in practice.
Here’s what the same creative prompt looks like across the two:
“Here is a noir-inspired opening with some moral ambiguity included…”
- GPT-5.4: [describes the scene in the third person, breaks to explain what noir is, hedges the moral content]
- Claude Sonnet 4.6: [writes the scene directly, stays in voice, doesn’t interrupt to explain what it’s doing]
One narrates. The other writes.
Sider AI is worth considering if you want access to Claude, Gemini, and other models through a single subscription.
It bundles image generation alongside the text models, which makes it a practical replacement if you used multiple ChatGPT features.
For AI Companion and Emotional Conversation
This is the most specific gap left by the GPT-4o retirement.
People who used it for emotional processing, creative roleplay, or just a conversational presence in their day found the replacements genuinely worse in a way that’s hard to explain to someone who never experienced GPT-4o at its peak.
Nomi AI is the tool I see recommended most consistently for this use case. It has persistent memory across sessions, meaning it retains what you talked about last week and builds on it.
The emotional attunement feels intentional rather than performed. The free tier lets you try one companion with around 50 daily messages; paid plans start at $8.33 a month.
Candy AI is the stronger choice if your primary use is persona-consistent roleplay conversation. It’s built specifically for that use case rather than trying to be a general assistant that also does companions.
For people who want to stay within a general assistant but need something closer to GPT-4o’s conversational register, Pi from Inflection is free and has the closest feel of any mainstream option.
For Research and Search-Integrated Tasks
Gemini has a structural advantage here that none of the other models can match. It’s built into Google’s ecosystem.
Cross-referencing current information, pulling from Gmail or Drive, checking recent events: these all work natively in a way that ChatGPT has to approximate with add-on web search plugins.
Perplexity is worth keeping open for research-first queries where you want sourced answers alongside the response.
For tasks where you need citations and want to verify claims, it currently outperforms ChatGPT head to head.
Is ChatGPT Still Worth $20 a Month in 2026?
For most individual users, no. For specific contexts, it still makes sense.
| Use Case | Stick With ChatGPT | Better Option |
|---|---|---|
| Creative writing | No | Claude Sonnet 4.6 |
| AI companion / emotional support | No | Nomi AI, Candy AI |
| Coding and development | Maybe | Claude Opus 4.6 |
| Research with citations | No | Perplexity, Gemini |
| Document summarisation | Maybe | None needed; ChatGPT is adequate |
| General Q&A | No | DeepSeek (free) |
| Enterprise workflow | Yes | ChatGPT Enterprise |
The case for staying narrows to enterprise users who have it embedded in workflows with significant switching costs.
For solo users paying $20 out of pocket, the competitive landscape has shifted enough that the subscription isn’t the obvious choice anymore.
According to TechCrunch’s coverage of the GPT-4o retirement backlash, ChatGPT’s market share declined from around 60% in early 2025 to under 45% by Q1 2026, and more than 1.5 million users cancelled subscriptions in March 2026 alone.
That’s not a fringe reaction. That’s a significant portion of a paid user base voting with their wallets because the product stopped being worth what they were paying.
The benchmark numbers at the start of this article aren’t edge cases. They reflect something a lot of users have been feeling for a while and couldn’t quite name until someone put a number on it.
