What’s Changed: Janitor AI now runs reasoning models like DeepSeek R1, DeepSeek V4, and GLM 5.2 by default, and they write a hidden thinking block before the reply. In longer chats that block leaks into the visible message. The cleanest fix is a provider setting that hides the reasoning, not a prompt that tries to kill it.
If Janitor AI keeps putting its thinking in the response, you are running a reasoning model, and the think box is bleeding into your roleplay.
People describe it the same way every time. A short chat looks fine, then a few hundred messages in, half the actual reply ends up trapped inside the thinking block even with its visibility turned off.
The reason this feels random is that it is tied to chat length, not to anything you did. Reasoning models write a private chain of thought first, then the answer, and on a long chat they sometimes forget to close that thinking section cleanly.
I will explain what the think box really is, why the toggle stops working as the chat grows, and the fix order that gets clean replies back without making your bot dumber. There is a real trap here where the obvious fix makes the writing worse, so the order matters.
What is a reasoning model: A reasoning model like DeepSeek R1 or GLM 5.2 writes a step-by-step thinking pass inside hidden tags before it writes the visible answer, which improves quality but adds a block that can leak.

Why Janitor AI Puts Its Thinking in the Response
Janitor AI puts its thinking in the response because reasoning models emit a hidden chain-of-thought inside think tags, and in long chats the model stops closing that tag cleanly, so the interface can no longer tell where the thinking ends and the reply begins.
It is a tag failure, not a settings failure.

The way I read the research, the think tags are structural markers, nothing more. An academic analysis of distilled DeepSeek R1 models found that the answer tokens pay almost no attention to the and tags themselves, they attend to the reasoning content between them.
That detail matters because it means stripping the tags after the fact is mechanically safe, the model does not need them to write the prose.
There is a length effect baked in. Models always attend heavily to the very start of the conversation, an attention-sink pattern that persists no matter how long the chat gets, and as the context fills the model gets sloppier about emitting a clean closing tag.
The academic study on DeepSeek R1 traces the whole reasoning-to-answer flow that breaks here.
The think box toggle only hides text the interface can label as thinking. When the closing tag goes missing, the interface treats your narrative as more thinking and hides or mangles it.
If your replies are also getting cut off, our guide on Janitor AI incomplete responses covers the related truncation side.
| Symptom | Likely cause | Fix |
|---|---|---|
| Short replies are clean, long chats leak | Model stops closing the think tag as context grows | Add a closing tag stop sequence or hide reasoning at the provider |
| Half the reply is stuck in the think box | Missing closing tag, interface mislabels narrative | Edit the response to pull the narrative out, then set a stop sequence |
| Hitting message limits fast | Verbose thinking burns output tokens | Lower reasoning effort or switch to a non-reasoning model |
| Replies got bland after a regex fix | Reasoning was suppressed, not hidden | Stop suppressing, hide instead |
| Thinking shows even with toggle off | Toggle cannot label unclosed reasoning | Use a provider that excludes reasoning from the response |
Why Hiding the Thinking Beats Killing It
Hiding the thinking is the right move because suppressing the reasoning entirely makes the model write worse, with accuracy on reasoning-heavy output dropping by up to about 10 percent when the thinking steps are forced off.
The thinking is doing real work even when you do not want to see it.
The way I see it, this is the trap most people fall into. They paste a regex or an out-of-character command to kill the thinking, the leak stops, and the replies quietly go bland and stubborn. Mechanistic testing shows answer tokens depend on the reasoning content to shape the final text, so cutting it off nerfs the bot mid-roleplay.
There is a clean distinction worth holding onto. Blocking the reasoning stops the model from thinking at all, while backgrounding it lets the model think fully and only sends you the final answer. Backgrounding is what you want, and several providers do it for you at the request level rather than in the chat.
What are reasoning tokens: Reasoning tokens are the words the model generates inside its thinking block, and they count toward your output limit even though you may never want to see them.
That token cost is the other reason long chats feel worse. The thinking block is often more detailed than the actual reply, so a reasoning model can chew through your message cap far faster than a plain model.
If the bot is also looping, the steps in our guide on stopping Janitor AI repeating responses stack neatly on top of this.
How to Stop Janitor AI Showing Thinking in Replies
The fix order is to hide reasoning at the provider level first, then add a closing-tag stop sequence, then edit any leaked block out by hand, and switch models only if the leak keeps returning.
Provider settings beat prompts because they act before the text ever reaches the chat.

What I would do first is hide the reasoning where it is generated, not where it is displayed. On OpenRouter you can exclude the reasoning from the response entirely, and on GLM models you can turn the thinking output off in the request. Those are the cleanest because they never let the block leak in the first place.
Before: A normal request to a reasoning model returns a long
block that streams straight into your roleplay, and a missing closing tag traps half the reply with it.After: Add
"reasoning": { "exclude": true }on OpenRouter (or setenable_thinkingto false on GLM), and only the clean narrative comes back while the model still thinks in the background.
Here is the order I would run, easiest and least destructive first.
- Turn on the think-box visibility toggle, which hides the clean cases on its own.
- Hide reasoning at the provider: on OpenRouter set the reasoning exclude option, on GLM set enable thinking to false.
- Add
as a Stop Sequence in Janitor AI so a leaked second thinking block gets cut off. - Lower the reasoning effort setting if your provider supports it, which shrinks the block instead of removing it.
- Edit the response with the pencil to pull any trapped narrative out of a leaked block.
- Switch to a non-reasoning model if the leak keeps coming back in long chats.
The decision table below is how I pick between methods rather than firing all of them at once.
| Method | What it does | Where to set it | Keeps quality |
|---|---|---|---|
| Think-box toggle | Hides reasoning the UI can label | Janitor AI chat settings | Yes, but fails on long chats |
| Reasoning exclude | Strips reasoning from the response | OpenRouter request settings | Yes, model still thinks |
| Disable thinking | Turns the GLM thinking output off | GLM chat template setting | Lower on complex scenes |
| Stop sequence | Cuts a leaked second block | Janitor AI stop sequences | Yes, surgical |
| Edit response | Pulls trapped narrative out by hand | The pencil on the reply | Yes, safe after the fact |
| Switch model | Avoids the thinking block entirely | Janitor AI model field | Different feel, clean prose |
One thing I would not do is regex or argue the thinking away, since that suppresses the reasoning and drags the writing down. For getting the most out of these backends without the leak, our DeepSeek V4 on Janitor AI guide covers what is working right now.
Which Models Skip the Thinking Block Entirely
The models that skip the thinking block are the non-reasoning ones, and they give clean prose by design with no think tags to leak.
If you never want to manage a reasoning block again, this is the surest route.
What I would reach for here are the plain chat models rather than the reasoning variants. DeepSeek V3 and the V3 0324 build write straight prose, and community roleplayers point to LLaMA 4 Maverick, LLaMA 3.1 Nemotron, and Dolphin 3.0 R1 Mistral as free or cheap options that never open a think tag.
The tradeoff is honest. A non-reasoning model will not plan a complex multi-character scene as tightly as a reasoning model that thinks first, so for heavy plotting you may still prefer to background the reasoning instead of dropping it.
For squeezing more chat out of whichever model you land on, the unlimited messages on DeepSeek guide is the companion piece.
When You Just Want Clean Replies With No Setup
If wrangling provider settings, stop sequences, and model names is the part you hate, a self-contained companion app gives clean replies with no reasoning trace and nothing to configure.
You trade open model choice for never seeing a think tag again.
I will be straight about the tradeoff. Janitor AI’s appeal is the open model and proxy choice, and for tinkerers the settings above are a one-time fix worth doing. For anyone who just wants to talk without touching a request body, a bundled app removes the entire reasoning-leak problem.
If that sounds like you, Candy AI runs its own model and never exposes a thinking block, so replies come back clean by default.
For users who want strong memory alongside that simplicity, Nectar AI is the other one I would point to. Both skip the proxy, the stop sequences, and the think-tag babysitting entirely.
Frequently Asked Questions
Why is Janitor AI showing the bot’s thinking in the reply?
You are using a reasoning model like DeepSeek R1 or GLM 5.2 that writes a hidden chain of thought before the answer. In long chats it stops closing the think tag cleanly, so the reasoning leaks into the visible message.
Why does the think-box visibility toggle stop working?
The toggle only hides text the interface can identify as thinking. When the model fails to write a closing think tag, the interface cannot tell where reasoning ends, so it mislabels your narrative and the block leaks.
Will removing the thinking make the bot write worse?
Yes, if you suppress it rather than hide it. Forcing the reasoning off can drop quality by around 10 percent because the answer depends on the thinking content. Hide or background the reasoning instead of killing it.
How do I get clean replies without losing quality?
Hide the reasoning at the provider level. On OpenRouter use the reasoning exclude option, on GLM disable thinking in the request, or add a closing think tag as a stop sequence. The model still thinks, you just stop seeing it.
Which models do not show a thinking block?
Non-reasoning models like DeepSeek V3, DeepSeek V3 0324, LLaMA 4 Maverick, and Dolphin 3.0 R1 Mistral write plain prose with no think tags. They are the simplest way to avoid the leak entirely.
Quick Takeaways
- Janitor AI shows its thinking because reasoning models write a hidden block and stop closing the tag in long chats.
- The think-box toggle fails because it cannot label reasoning the model never closed.
- Hide the reasoning at the provider with the OpenRouter exclude option or by disabling GLM thinking, do not suppress it with regex.
- Suppressing the thinking can cut reply quality by about 10 percent, so background it instead.
- For clean replies with zero setup, a companion app like Candy AI never exposes a thinking block.
