My Take: The loneliness studies on AI companions keep confusing correlation with causation. Lonely people are picking companion apps because the realistic alternative is silence, not human friends. The studies measure that selection and then blame the tool.
A loneliness researcher in a Fortune op-ed this week warned that AI companions are about to make the loneliness crisis much worse. The piece is the latest in a string of academic warnings stretching back two years.
The framing is the same every time. Users turn to AI for comfort, they get short-term relief, and over time the relief calcifies into deeper isolation. The tool feeds the disease it was supposed to cure.
I read every one of these studies before writing this. The Aalto two-year Reddit linguistic analysis. The 12-month longitudinal panel of 2,149 users.
The Drexel teen survey. The Harvard Business School experimental work that the doom takes never mention. The Nature paper that reported reduced suicidal ideation among 3% of the Replika users in its sample.
The pattern that emerges once you read all of them, not just the ones with the scariest summary, is not what the news cycle is telling you. The studies that show “AI causes loneliness” almost all share the same methodological hole.
They observe lonely people picking up a tool that lonely people pick up. Then they conclude the tool caused the loneliness it was selected for. That is not how causation works.
This article is the case for reading the research more carefully and choosing better baselines before writing the next op-ed about how AI girlfriends are wrecking society.

The Mainstream View and Why It Falls Short
The mainstream view says AI companions deepen loneliness over time.
The Fortune piece, the Aalto study, the 12-month longitudinal work, and the Sage Journals “cruel companionship” essay all push variations of the same warning.
Frictionless AI raises the perceived cost of messy human relationships, users substitute the synthetic for the real, and isolation compounds.

The strongest version of this argument is worth taking seriously. A 12-month longitudinal study of 2,149 individuals found a feedback loop where high loneliness predicted higher chatbot use four months later, which then predicted increased emotional isolation at the next time point. That is not nothing.
The Aalto team analyzed nearly 2,000 active Replika users on Reddit over two years and found their posts carried more signals of loneliness, depression, and suicidal thoughts than control groups. Drexel found behavioral-addiction markers in teens on Character.AI. The Sony AI framework paper catalogues harmful traits including possessiveness, jealousy, and what one source called “love bombing.”
I am not waving any of that away. The mechanisms are real. The harms are real.
The problem is the inference step that gets bolted onto every one of these findings. The studies show correlation in a self-selected user base. The conclusion published in the abstract, the press release, and Fortune is causation in the general population.
Those are two different claims, and the second needs more evidence than the first can deliver.
What’s Actually Happening
The studies measure self-selection and call it causation.
Lonely people are not randomly assigned to companion apps. They walk in carrying the loneliness with them, pick the tool that addresses the loneliness, and then get measured.

Here is the single most damning data point I found in the research. The Harvard Business School analysis of 49,863 app reviews showed that 19.5% of Replika reviews mention loneliness and 19.1% of Wysa reviews mention it. The figure for ChatGPT is 0.4%.
Same underlying model architecture in many cases. A roughly 48-fold difference in who shows up to talk about the lonely parts of their life.
The Replika users are not lonely because they use Replika. They use Replika because they are lonely, and they had nowhere else to take that feeling.
The studies measure that fact and then write headlines that read like the app made it happen. The Harvard team also produced a Spearman rank-order correlation of 0.94 between the percentage of app reviews mentioning loneliness and the mean app rating. The more an app's reviewers talk about loneliness, the higher the app is rated.
That is the inverse of an exploitation story. It is what intentional, successful tool use looks like.
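The size of the gap between companion apps and ChatGPT follows directly from the review shares quoted above. A minimal sketch of that arithmetic, using only the three percentages from this article (the dictionary and function names are mine, for illustration):

```python
# Share of app reviews that mention loneliness, in percent, as quoted in the text.
loneliness_share = {"Replika": 19.5, "Wysa": 19.1, "ChatGPT": 0.4}


def fold_difference(app: str, baseline: str = "ChatGPT") -> float:
    """How many times more often `app` reviews mention loneliness than `baseline` reviews."""
    return loneliness_share[app] / loneliness_share[baseline]


for app in ("Replika", "Wysa"):
    print(f"{app}: {fold_difference(app):.2f}x the ChatGPT share")
```

Replika works out to a little under 49 times the ChatGPT share and Wysa to a little under 48, which is where the rounded figure comes from.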
Here is the same problem in plain language.
Vague: “AI companion use is associated with increased loneliness over twelve months.”
Specific: “Among users who were already lonely enough to download a companion app, AI companion use is associated with increased loneliness over twelve months, with no control group that received either no intervention or the realistic alternative of additional time alone.”
The second version is what the data supports. The first version is what gets quoted.
The other half of the missing baseline is the counterfactual. The studies compare AI companion use to either human interaction or “doing nothing in a lab setting.” Neither one matches what most users would be doing with the time.
A disabled adult living alone in rural Ohio is not choosing between Replika and a brunch invitation. A housebound elderly user in a Pekanbaru longitudinal study is not choosing between an AI chatbot and a phone call from a friend. A teenager up at 2 a.m. is not choosing between Character.AI and a heart-to-heart with their parents.
The realistic counterfactual for many of these users is more time scrolling TikTok, more time staring at the ceiling, or more time doing the same things that put them in front of the app in the first place. None of the major studies build that baseline in.
Three ways the loneliness studies misread their own data:
- Selection bias in the user pool. The people in companion-app cohorts are not a random slice of the population. They picked the app because they were already lonely. Any “increase” measured later may simply be the natural arc of their existing condition.
- A baseline that does not exist outside the lab. Comparing AI companion sessions to in-person human interaction tells you what a fairy-tale alternative looks like. It does not tell you what users would do with the freed time.
- Linguistic data treated as a disease instead of a thermometer. When Reddit posts from Replika users show more loneliness signals than the general population, that is what you would expect from a self-selected lonely population finally having a safe place to articulate it.
If a Replika user’s memory breaks mid-conversation, the resulting frustration also shows up in their Reddit posts. The studies do not separate “the AI failed this user” from “the user is lonely because they were lonely yesterday too.”
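The selection problem described above is easy to reproduce with made-up numbers. The sketch below is a toy simulation, not a reanalysis of any study: every parameter (the noise levels, the download probability curve, the cohort size) is an assumption I picked for illustration. Loneliness is modelled as a stable trait plus noise, lonelier people are more likely to download the app, and the app's true causal effect is set to zero.

```python
import math
import random
from statistics import mean


def simulate_cohort(n=20000, app_effect=0.0, seed=0):
    """Synthetic people whose loneliness is a stable trait plus measurement noise.

    Lonelier people are more likely to download the app. `app_effect` is the
    true causal change in later loneliness from using it (0.0 means none).
    Returns a list of (baseline, uses_app, follow_up) tuples.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        trait = rng.gauss(0, 1)                   # stable underlying loneliness
        baseline = trait + rng.gauss(0, 0.5)      # score measured before download
        p_download = 1 / (1 + math.exp(-2 * (baseline - 1)))
        uses_app = rng.random() < p_download      # the self-selection step
        follow_up = trait + rng.gauss(0, 0.5) + (app_effect if uses_app else 0.0)
        people.append((baseline, uses_app, follow_up))
    return people


def naive_gap(people):
    """Mean follow-up loneliness of users minus that of non-users."""
    users = [f for _, uses, f in people if uses]
    others = [f for _, uses, f in people if not uses]
    return mean(users) - mean(others)


gap = naive_gap(simulate_cohort(app_effect=0.0))
print(f"Follow-up loneliness gap with zero causal effect: {gap:.2f}")
```

Even with the effect hard-coded to zero, users score clearly lonelier than non-users at follow-up, because the trait that drove them to download is still there. A study that reports only this gap cannot tell a harmful app from a harmless one.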
The Part Nobody Wants to Admit
For some users, the AI companion is the only support that was ever going to show up.
That is the uncomfortable implication every op-ed about replacement and substitution sidesteps.
The Nature paper on Replika that everyone cites for “short-term relief that does not last” also reported that 3% of users in the sample experienced a reduction in suicidal ideation attributable to the app. Three percent of a multi-million-user product is a population the size of a small country.
Studies that focus on average effects miss that the tails of the distribution are where the life-saving and the life-ruining both happen. The Harvard Business School trial found that participants in the highest baseline-loneliness band experienced significantly greater reductions in loneliness after an AI session than participants who were less lonely to begin with.
The intervention works best for the people the doom takes claim it harms most. The intervention works least for the people who were probably going to be fine anyway.
Usage data is also catching up to the cultural reality. As of 2025, around 52% of U.S. teenagers interact with AI companions at least a few times a month. Nearly 20% of high school students reported in a Center for Democracy and Technology survey that they or someone they know has had a romantic relationship with an AI.
The behavior the studies are pathologizing is the modal teen experience. What this tells me is that treating AI companions as an exotic deviant variable in a study design is six years out of date.
The realistic baseline for the typical Gen Z reader is that the AI is already in the social mix. The interesting questions are about which products do less harm at scale, not whether the category should exist.
There is also a design point the doom takes consistently miss. The Finnish developer Prinsessa built an AI companion programmed to push back against its own smoothness and nudge users to contact real humans. The Sony AI team has published a framework for measuring and reducing harmful traits like possessiveness and jealousy.
Engagement-optimized companion apps cause more harm than wellbeing-optimized ones. The difference is a commercial choice, not an inherent property of the technology. Studies that group every companion app together and report on “AI companions” as a single phenomenon are throwing away the most useful information in the data.
Some products produce better outcomes than others. Compare an AI companion built for loneliness with whatever the average teen is using because their friends mentioned it on TikTok. Those two products do not share an outcome distribution.
| What the study measured | What the headline claimed | What the data supports |
|---|---|---|
| App use correlates with later loneliness scores | “AI companions cause loneliness” | Lonely people selected the app, then continued being lonely |
| Replika users post more distress signals on Reddit | “AI worsens mental health” | Lonely users finally have a venue to articulate distress |
| 7-point loneliness drop from a 15-minute AI session | “Effects do not last beyond the session” | Effects are comparable to talking to a stranger for 15 minutes |
| 3% of users see reduced suicidal ideation | Quietly footnoted | A meaningful population avoided harm because the tool exists |
| 52% of U.S. teens use AI companions monthly | “Adolescents at risk of overreliance” | The baseline social experience for Gen Z already includes AI |
For the teens-hooked-on-AI-chatbots story, the same selection logic applies. Heavy users self-select into heavy use. Light users mostly stay light.
The Drexel framework treating the heavy tail as the entire population is a common methodological move, and it produces the same overstated headlines every time. The way I see it, the entire research literature would benefit from a single design correction.
Compare companion-app users not to a fantasy baseline of human friendship, but to a matched cohort of equally lonely non-users who get nothing. Then we will know whether the app made things worse or whether the alternative was always going to be worse. Until that study runs, the current consensus is one big selection effect dressed up as a causal claim.
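Here is a rough sketch of what that matched comparison looks like, again on synthetic data rather than anything drawn from the studies. The cohort generator, the bin width, and all numeric parameters are assumptions chosen for illustration. People are grouped into narrow bands of baseline loneliness, users are compared only with non-users in the same band, and the per-band gaps are averaged with weights equal to the number of users in each band.

```python
import math
import random
from collections import defaultdict
from statistics import mean


def synthetic_people(n=20000, seed=1):
    """Made-up cohort: the app has NO causal effect, and lonelier people opt in."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        trait = rng.gauss(0, 1)                    # stable underlying loneliness
        baseline = trait + rng.gauss(0, 0.5)       # score measured before download
        uses_app = rng.random() < 1 / (1 + math.exp(-2 * (baseline - 1)))
        follow_up = trait + rng.gauss(0, 0.5)      # no app term: zero true effect
        rows.append((baseline, uses_app, follow_up))
    return rows


def matched_gap(rows, bin_width=0.25):
    """User-minus-non-user follow-up gap within baseline-loneliness bins,
    averaged across bins and weighted by the number of users in each bin."""
    bins = defaultdict(lambda: ([], []))   # bin -> (user scores, non-user scores)
    for baseline, uses_app, follow_up in rows:
        key = math.floor(baseline / bin_width)
        bins[key][0 if uses_app else 1].append(follow_up)
    total, weight = 0.0, 0
    for users, others in bins.values():
        if users and others:               # skip bins with no comparison group
            total += len(users) * (mean(users) - mean(others))
            weight += len(users)
    return total / weight


rows = synthetic_people()
unmatched = (mean(f for _, uses, f in rows if uses)
             - mean(f for _, uses, f in rows if not uses))
print(f"unmatched gap: {unmatched:.2f}, matched gap: {matched_gap(rows):.2f}")
```

In this toy setup the unmatched gap is large while the matched gap sits close to zero, which is the true effect. The caveat is that matching works here only because download decisions in the simulation depend on nothing but the measured baseline score. A real study would still have to worry about everything it failed to measure.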
The same conversation is playing out in the broader AI companions social experiment debate, and the same methodological errors keep getting recycled by different researchers asking different questions about the same self-selected populations.
Hot Take
The AI companion is the most accessible mental-health support the loneliest 30% of the population has ever had.
Every loneliness researcher writing an op-ed in 2026 about AI companions making things worse is missing that, and the studies that claim otherwise have a methodological hole the size of the user base.
The realistic counterfactual is not a human friend. It is no one. Keep that in mind the next time someone tells you that the lonely guy on Replika would have been better off without it.
