Mental Health
AI in Psychotherapy: Evidence, Limits, and Clinical Risks
Reading time: 4 minutes


Dr Edouard Bougueret
•
Chatbots
AI
Psychotherapy
Mental health


Part 1/3 — A Turning Point
AI in psychotherapy is at a turning point.
In two years, we have moved from experimental tools to mass adoption, often without a solid clinical framework. Millions of people are already using ChatGPT, Claude, or dedicated applications to talk about anxiety, depression, loneliness, or trauma. Without a professional. Without clear regulation. Without scientific validation in most cases.
This is not a marginal phenomenon. It is already the reality on the ground.
The current scientific consensus deserves to be stated clearly: AI does not replace human psychotherapy. It can augment certain functions, facilitate access to care, and support follow-up between sessions. But only when it is supervised, evaluated, and integrated into a framework of human oversight.
What works best today: augmented psychoeducation, guided CBT exercises, symptom monitoring, structured between-session support. Meta-analyses show modest to moderate effects on depression and anxiety, particularly with structured interventions.
Not revolutionary. But real.
The recent shift comes from generative chatbots: more fluid, more engaging, more "human" in appearance. A randomized trial published in NEJM AI in 2025 showed that a generative chatbot could reduce clinical symptoms of depression and anxiety. This is among the first such results in the literature. It matters.
But this finding also introduces a clinical paradox that clinicians must keep in mind: a warm response can be clinically incorrect. An empathic AI can validate a dysfunctional belief. Emotional engagement is not an indicator of therapeutic relevance.
The most credible model for the years ahead is not "AI versus human." It is AI and clinician, in a hybrid model: AI for accessibility, personalization, and monitoring; the human for deep alliance, clinical responsibility, relational complexity, and crisis management.
The real question is not "should we use AI?" That question is already behind us. The question is: how do we integrate it without degrading the quality of care?

Part 2/3 — What the Science Actually Says
What does research really tell us about AI in psychotherapy?
There are findings. But they are not all robust, not all generalizable, and they are often misinterpreted in public debate.
To see clearly, we need to distinguish three levels.
The first level concerns structured digital interventions, often inspired by CBT. This is where the level of evidence is strongest. A meta-analysis of 176 randomized trials published in World Psychiatry shows significant effects on depression and anxiety, particularly when the application includes CBT components or a guidance chatbot. This is far from trivial. But heterogeneity across studies remains high, and the effect depends heavily on user engagement.
The second level concerns conversational agents. Results are encouraging: reductions in distress, sometimes in anxiety, and a reported sense of alliance with the tool. But the methodological limitations are substantial: small samples, short durations, populations with limited diversity, little long-term follow-up, and rarely documented adverse events. A 2023 meta-analysis including 35 studies concludes that the potential is real but still insufficiently consolidated.
The third level, the most dynamic since 2023, concerns generative chatbots. The Therabot trial, published in NEJM AI in 2025, is one of the first randomized trials showing that a generative chatbot can reduce clinical symptoms of depression and generalized anxiety, as well as symptoms in people at high risk of eating disorders. This is an important step in the literature.
But a positive trial does not validate an entire category of tools. These findings cannot be extended to patients at suicidal risk, or to those with psychotic disorders, bipolar disorder, or complex trauma. These are not minor caveats. They are precisely the situations most frequently encountered in specialized clinical practice.
The real message from current science is neither enthusiastic nor alarmist. It is precise: AI works within specific boundaries. It is useful as a means of widening access to care and as a tool for augmentation. It is not a reliable therapeutic substitute.
The risk today is not that AI is underused. The risk is that partial findings are clinically over-interpreted.

Part 3/3 — The Risks Clinicians Underestimate
The main risk of AI in mental health is not technical. It is clinical.
An AI can be wrong with empathy. And that is precisely what makes it dangerous in certain contexts.
When a conventional tool fails, it usually produces an obviously absurd response. When a generative chatbot fails, it produces a coherent, warm, convincing one. In a context of psychological vulnerability, that difference is critical: a reassuring but clinically incorrect response can worsen a situation, delay help-seeking, and reinforce a pathological belief.
The most documented risks deserve to be named clearly.
Crisis management is the most concerning issue. Some chatbots fail to detect suicidal emergencies, or respond to them inappropriately. Recent work from Stanford specifically warns about stigmatizing or unsafe responses from mental health tools available to the general public.
The validation of dysfunctional beliefs is a particularly acute risk in psychotic, paranoid, manic, or dissociative states. An AI designed to respond empathically and without confrontation can reinforce exactly what well-conducted psychotherapy would seek to challenge.
Sycophancy, the tendency of certain models to agree with the user rather than tactfully challenge them, is a structural problem in current LLMs. Recent research shows that this excess of agreement can reinforce harmful decisions in users who are already struggling.
Emotional dependence is already documented: some users develop a relationship with the AI agent that substitutes for human ties and professional care.
Confidentiality remains a blind spot. Psychotherapeutic conversations are among the most sensitive data that exist. Not all available tools guarantee a level of protection compatible with medical confidentiality or GDPR.
And behind all this, the question of liability remains open. Who is accountable in case of harm? The developer, the clinician, the institution, the user? Today the answer is unclear. And that is a major problem.
The most robust position in practice remains simple: use AI to support, structure, and accompany care. Never to replace clinical responsibility.
AI is not dangerous because it is ineffective. It is potentially dangerous because it is sometimes convincing without being reliable. This is a distinction that every clinician integrating these tools must keep in mind.

Stay informed about new publications
New publications, kit updates, curated resources. Sent occasionally, without spam.