Mental Health Chatbots in Clinical Settings: What the Evidence Actually Supports


Mental health chatbots are having a moment. Woebot, Wysa, Youper, and a growing list of others are being used not just as consumer wellness apps, but increasingly in clinical settings — GP practices, community mental health services, hospital outpatient programs, and university counselling centres.

The promise is obvious. Australia is facing a mental health workforce crisis. Wait times for psychologists routinely stretch to 8-12 weeks in metropolitan areas and significantly longer in regional communities. The Better Access scheme expanded eligibility, but the supply of clinicians hasn’t kept pace with demand. Something needs to fill the gap while people wait.

But “something needs to fill the gap” is a poor basis for clinical decision-making. What does the evidence actually show about the effectiveness and safety of mental health chatbots in clinical contexts?

What These Tools Do

First, let’s be specific about what we’re discussing. Mental health chatbots in clinical settings fall into roughly three categories.

Structured therapeutic programs. These deliver evidence-based therapy content — typically cognitive behavioural therapy (CBT) — through a conversational interface. The user works through modules, completes exercises, and the chatbot provides feedback and encouragement. Woebot is the best-known example. The content is clinician-designed and the therapeutic approach is well-established; the chatbot is essentially an interactive delivery mechanism.

Symptom monitoring and triage. These chatbots check in with patients between clinical appointments, ask about symptoms, mood, sleep, and functioning, and flag deterioration to the treating clinician. They’re less about providing therapy and more about extending the clinician’s visibility into the patient’s day-to-day experience (a minimal sketch of what this flagging can look like follows the three categories).

Crisis support. Some chatbots are designed to recognise signs of acute distress or suicidal ideation and respond with safety planning, grounding techniques, and escalation to human crisis services. This is the highest-stakes application and the one that generates the most controversy.
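
To make the second and third categories concrete, here is a minimal, hypothetical sketch of the flagging logic — illustrative only, written in Python, and not based on any vendor’s actual implementation. It assumes a PHQ-9-style depression score is collected at each check-in; the names (CheckIn, triage, Flag) and the thresholds are invented for the example. The key design point is that the chatbot classifies and notifies; it does not make treatment decisions.

    # Hypothetical between-appointment check-in with clinician escalation.
    # Thresholds are illustrative; a real deployment would set them clinically.
    from dataclasses import dataclass
    from enum import Enum


    class Flag(Enum):
        ROUTINE = "routine"              # no action needed
        DETERIORATION = "deterioration"  # notify the treating clinician
        CRISIS = "crisis"                # hand over to a human crisis pathway


    @dataclass
    class CheckIn:
        phq9_score: int       # 0-27 depression screen score for this check-in
        previous_score: int   # score recorded at the last check-in
        item9_positive: bool  # PHQ-9 item 9 asks about thoughts of self-harm


    def triage(check_in: CheckIn) -> Flag:
        """Classify a check-in; the clinician, not the chatbot, decides what happens next."""
        # Any positive response on the self-harm item bypasses the chatbot entirely.
        if check_in.item9_positive:
            return Flag.CRISIS
        # A 5-point jump, or a score in the moderately severe range, gets flagged.
        if (check_in.phq9_score - check_in.previous_score >= 5
                or check_in.phq9_score >= 15):
            return Flag.DETERIORATION
        return Flag.ROUTINE


    if __name__ == "__main__":
        print(triage(CheckIn(phq9_score=17, previous_score=10, item9_positive=False)))
        # Flag.DETERIORATION

Even a sketch this small makes the governance question visible: someone has to decide who receives the crisis flags, how quickly, and what happens out of hours.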

The Evidence For

The strongest evidence exists for structured CBT-based chatbots treating mild to moderate anxiety and depression.

A meta-analysis published in the Journal of Medical Internet Research covering 22 randomised controlled trials found that chatbot-delivered CBT produced small to moderate effect sizes for depression (Hedges’ g = 0.42) and anxiety (Hedges’ g = 0.38) compared to waitlist controls. These effects are smaller than face-to-face therapy but clinically meaningful, particularly for people who would otherwise receive no treatment.
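For readers less familiar with the metric: Hedges’ g is a standardised mean difference — in rough terms,

    g ≈ (chatbot-group improvement − control-group improvement) / pooled standard deviation,

with a small correction for sample size. A g of 0.42 means the average chatbot-group participant ended up a little over four-tenths of a pooled standard deviation better off than the average waitlist participant.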

Woebot has the most extensive evidence base, with multiple RCTs showing reductions in depression and anxiety symptoms over 2-8 week periods. Importantly, these trials have shown effects in clinically referred populations, not just self-selected app users — a meaningful distinction.

For symptom monitoring, the evidence is less about therapeutic effect and more about clinical utility. Several Australian pilot programs have found that chatbot-based check-ins between appointments provide clinicians with richer data about symptom trajectories, improving treatment planning. Clinicians report being able to identify deterioration earlier and adjust treatment accordingly.

Engagement rates are also notable. People tend to be more honest with chatbots about certain topics — substance use, suicidal thoughts, medication non-adherence — than they are with human clinicians in face-to-face settings. This isn’t because the chatbot is better; it’s because the absence of perceived judgement lowers the barrier to disclosure.

The Evidence Against

The case against is substantial and deserves equal weight.

Limited efficacy for severe conditions. The evidence for chatbots treating moderate-to-severe depression, psychotic disorders, complex PTSD, personality disorders, or eating disorders is essentially non-existent. These are the conditions that constitute the bulk of public mental health caseloads. Deploying chatbots for conditions where there’s no evidence of benefit — or worse, potential for harm — is clinically irresponsible.

Dropout rates are high. Across studies, chatbot engagement drops off sharply after the first few sessions. A typical pattern: 70-80% of users complete session one, 40-50% reach session four, and fewer than 25% complete a full 8-12 session program. If the therapeutic benefit depends on completing the program, these dropout rates significantly limit real-world effectiveness.
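
A rough illustration of why this matters (the numbers are assumptions for arithmetic only, not findings from any study): if the roughly 25% who finish a program experience a full effect of around g ≈ 0.4, and those who drop out experience little or none, the average effect across everyone who starts is closer to 0.25 × 0.4 ≈ 0.1 — a level most clinicians would regard as negligible. Completion, not enrolment, is where the benefit lives.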

The therapeutic relationship problem. Decades of psychotherapy research consistently show that the therapeutic alliance — the quality of the relationship between client and therapist — is one of the strongest predictors of treatment outcome, regardless of therapeutic modality. Chatbots, by definition, can’t form a genuine therapeutic relationship. They can simulate warmth and empathy through language, but the user knows they’re talking to software. For some people that’s fine. For others — particularly those with attachment difficulties or relational trauma — the absence of genuine human connection may be counterproductive.

Safety concerns are real. Several high-profile incidents have raised questions about chatbot safety. Cases where chatbots provided inappropriate responses to users expressing suicidal ideation, or where chatbots failed to recognise psychotic symptoms and continued delivering standard CBT content, illustrate the limits of current natural language understanding in high-stakes clinical contexts.

The Australian Context

Australia’s healthcare system creates specific conditions that shape how chatbots should be evaluated.

The Medicare-funded mental health system is built around a stepped care model. Mild conditions should receive low-intensity interventions; moderate conditions get structured psychological therapy; severe conditions need specialist services. In theory, chatbots fit neatly into the low-intensity step — and some Australian PHNs (Primary Health Networks) are beginning to fund chatbot programs at this level.

The challenge is that the stepped care model assumes accurate assessment and appropriate allocation. If someone with moderate-to-severe depression is directed to a chatbot because the wait for a psychologist is too long, that’s not stepped care — it’s rationed care dressed up as clinical strategy.

The Australian Digital Health Agency has published frameworks for evaluating digital mental health tools, but these are guidance documents rather than regulatory requirements. There’s no mandatory approval process for mental health chatbots equivalent to TGA approval for medical devices. A chatbot can be deployed in a clinical setting without demonstrating efficacy to any regulatory body.

A Responsible Path Forward

Here’s what I think a responsible approach looks like.

Use chatbots where the evidence supports them. For mild anxiety and depression, as an adjunct to (not replacement for) clinical care, the evidence is reasonable. For waitlist support — helping people manage symptoms while they wait for a clinician — the case is stronger still, because the alternative is nothing.

Don’t use them where the evidence doesn’t support them. Severe mental illness, complex presentations, active suicidality without immediate human backup — these are not appropriate contexts for chatbot intervention. The pressure to deploy technology because of workforce shortages doesn’t override clinical evidence.

Maintain clinician oversight. Chatbots in clinical settings should feed data back to treating clinicians. The chatbot monitors; the clinician decides. Autonomous chatbot-driven treatment decisions for clinical populations aren’t supported by current evidence.

Be honest with patients. People using mental health chatbots should understand clearly what they’re getting. It’s not therapy. It’s a structured self-help tool delivered through a conversational interface. The language around these tools matters — calling them “AI therapists” creates misleading expectations.

Invest in the workforce too. Chatbots are not a substitute for training more psychologists, psychiatrists, and mental health nurses. They’re a supplement during a workforce crisis. If the response to the mental health workforce shortage is “deploy chatbots” without simultaneously investing in training pipelines, we’ve made a structural problem worse while appearing to address it.

Mental health chatbots aren’t the solution to Australia’s mental health access crisis. They’re one small tool in what needs to be a much larger response. Used appropriately, with honest expectations and proper clinical governance, they can help. Used inappropriately — as a cheap substitute for the care people actually need — they risk causing real harm.