Catch Unhappy Customers Before They Churn

AI agents analyze every inbound email for sentiment, tag urgent messages, and route escalations — before a frustrated customer becomes a lost account.


Why this matters

Support teams process hundreds of emails daily. A frustrated customer who waited three weeks for a refund sits in the same queue as a routine status check. By the time a human spots the tone, the customer may already have posted a negative review or initiated a chargeback. Sentiment signals are present in every email; the problem is reading them at scale, in real time.


How MultiMail solves this

MultiMail delivers inbound emails as structured markdown via webhook and stores them so they can be queried via the API. An AI agent subscribes to inbound webhooks, reads the email body via read_email, scores sentiment using your model of choice, and calls tag_email to mark urgency level. High-negative emails can trigger an immediate reply draft or route to a priority queue, all without human intervention on the detection step.

1. Receive inbound email via webhook

MultiMail fires an inbound webhook to your endpoint when a message arrives. The payload includes the email ID, sender, subject, and a markdown-rendered body. Your agent receives this event and kicks off the analysis pipeline immediately — no polling required.
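
As a rough illustration of what the handler works with, the sketch below validates a payload and turns it into a typed record. Only `email_id` appears in the handler code on this page; the other field names (`from`, `subject`, `body`) are assumptions based on the description above, so check them against the actual webhook schema. `InboundEvent` and `parse_inbound_event` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundEvent:
    """Typed view of an inbound webhook payload (field names partly assumed)."""
    email_id: str
    sender: str
    subject: str
    body_markdown: str


def parse_inbound_event(payload: dict) -> InboundEvent:
    """Validate a webhook payload and convert it to an InboundEvent.

    Only email_id is required; the remaining keys are assumed names
    and default to an empty string when absent.
    """
    email_id = payload.get("email_id")
    if not isinstance(email_id, str) or not email_id:
        raise ValueError("payload is missing email_id")
    return InboundEvent(
        email_id=email_id,
        sender=str(payload.get("from", "")),
        subject=str(payload.get("subject", "")),
        body_markdown=str(payload.get("body", "")),
    )
```

Rejecting payloads without an ID up front keeps malformed events out of the scoring pipeline.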

2. Fetch full email content

Use read_email to retrieve the full message body. MultiMail stores emails as clean markdown, stripping HTML noise and normalizing quoted reply chains — so your model sees the customer's actual words, not a soup of style tags and duplicate quoted text.

3. Score sentiment and detect escalation signals

Pass the subject and body to your LLM or classification model. Classify tone as positive, neutral, negative, or hostile. Also detect escalation language: mentions of refunds, cancellations, legal threats, or explicit time pressure. The normalized markdown makes token usage predictable.
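
The escalation signals listed above can also be approximated with a cheap keyword pass that runs alongside the model. This is a swapped-in technique (regex matching, not model classification), and the patterns and signal names are illustrative assumptions to tune against your own support vocabulary.

```python
import re

# Illustrative patterns only; none of these come from MultiMail.
ESCALATION_PATTERNS = {
    "refund": re.compile(r"\b(?:refund\w*|money back|chargeback\w*)\b", re.IGNORECASE),
    "cancellation": re.compile(r"\bcancel\w*", re.IGNORECASE),
    "legal": re.compile(r"\b(?:lawyer|attorney|legal action|lawsuit|sue)\b", re.IGNORECASE),
    "time_pressure": re.compile(
        r"\b(?:immediately|asap|by (?:today|tomorrow)|within \d+ (?:hours?|days?))\b",
        re.IGNORECASE,
    ),
}


def detect_escalation_signals(subject: str, body: str) -> list[str]:
    """Return the sorted names of every pattern found in the subject or body."""
    text = f"{subject}\n{body}"
    return sorted(
        name for name, pattern in ESCALATION_PATTERNS.items() if pattern.search(text)
    )
```

A keyword pass like this will miss paraphrases and sarcasm, so it works best as an extra input to the model prompt or as a sanity check on the model's label, not as a replacement for it.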

4. Tag the email for priority handling

Call tag_email with structured labels like sentiment:negative and urgency:critical. Tags are queryable via the API, so your support tooling, dashboards, and routing rules can filter on sentiment without any schema changes or separate data stores on your end.
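
Because the labels follow a `key:value` shape, downstream code can treat them as lightweight structured data. The sketch below assumes email records are dicts with an optional `tags` list, as in the backfill script on this page; `parse_tags` and `filter_by_tag` are hypothetical helpers, not part of the MultiMail API.

```python
def parse_tags(tags: list[str]) -> dict[str, str]:
    """Split 'key:value' tags into a dict, ignoring tags without both parts."""
    parsed: dict[str, str] = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep and key and value:
            parsed[key] = value
    return parsed


def filter_by_tag(emails: list[dict], key: str, value: str) -> list[dict]:
    """Keep only the emails whose tags contain key:value."""
    return [e for e in emails if parse_tags(e.get("tags", [])).get(key) == value]
```

For example, `filter_by_tag(emails, "urgency", "critical")` would select the messages a dashboard or routing rule cares about from an already-fetched inbox page.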

5. Route or escalate high-priority messages

For emails classified as high-urgency or hostile, trigger downstream actions: draft a holding reply via reply_email for human review, push a Slack alert, or move the thread to a priority queue. The agent handles detection autonomously; humans handle resolution language.
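
One way to keep routing decisions testable is to separate "what should happen" from "doing it". The sketch below maps tags to a list of planned actions; the specific mapping (critical triggers an alert and the priority queue, hostile adds a holding-reply draft) is an assumption for illustration, and `Action` and `plan_actions` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    DRAFT_HOLDING_REPLY = "draft_holding_reply"
    SLACK_ALERT = "slack_alert"
    PRIORITY_QUEUE = "priority_queue"


def plan_actions(tags: list[str]) -> list[Action]:
    """Decide which downstream actions to take for a tagged email.

    The mapping here is an example policy, not MultiMail behavior.
    """
    tag_set = set(tags)
    actions: list[Action] = []
    if "urgency:critical" in tag_set:
        actions += [Action.SLACK_ALERT, Action.PRIORITY_QUEUE]
    elif "urgency:high" in tag_set:
        actions.append(Action.PRIORITY_QUEUE)
    if "sentiment:hostile" in tag_set:
        # The draft still goes to a human for review before sending.
        actions.append(Action.DRAFT_HOLDING_REPLY)
    return actions
```

The caller can then execute each action with whatever client it uses for replies, chat alerts, or queues, and the policy itself stays a pure function.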


Implementation

Inbound webhook handler with sentiment scoring
typescript
import express from 'express';
import Anthropic from '@anthropic-ai/sdk';

const app = express();
app.use(express.json());

const MM_API_KEY = process.env.MM_API_KEY!;
const BASE_URL = 'https://api.multimail.dev';
const anthropic = new Anthropic();

async function readEmail(emailId: string): Promise<{ subject: string; body: string; from: string }> {
  const res = await fetch(`${BASE_URL}/emails/${emailId}`, {
    headers: { Authorization: `Bearer ${MM_API_KEY}` },
  });
  if (!res.ok) throw new Error(`read_email failed: ${res.status}`);
  return res.json();
}

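async function tagEmail(emailId: string, tags: string[]): Promise<void> {
  const res = await fetch(`${BASE_URL}/emails/${emailId}/tags`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${MM_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ tags }),
  });
  // Surface tagging failures instead of silently dropping them
  if (!res.ok) throw new Error(`tag_email failed: ${res.status}`);
}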

async function scoreSentiment(text: string): Promise<{ label: string; score: number }> {
  const msg = await anthropic.messages.create({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 64,
    messages: [{
      role: 'user',
      content: `Classify the sentiment of this email. Reply with JSON only, no prose: {"label": "positive|neutral|negative|hostile", "score": 0.0-1.0}\n\n${text}`,
    }],
  });
  const raw = msg.content[0].type === 'text' ? msg.content[0].text : '{}';
  return JSON.parse(raw);
}

app.post('/webhooks/inbound', async (req, res) => {
  const { email_id } = req.body;
  res.sendStatus(200); // ack immediately before processing

  try {
    const email = await readEmail(email_id);
    const sentiment = await scoreSentiment(`Subject: ${email.subject}\n\n${email.body}`);

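    const tags: string[] = [`sentiment:${sentiment.label}`];
    // Treat score as confidence in the label: escalate hostile mail,
    // or negative mail scored above 0.8
    if (sentiment.label === 'hostile' || (sentiment.label === 'negative' && sentiment.score > 0.8)) {
      tags.push('urgency:critical');
    } else if (sentiment.label === 'negative') {
      tags.push('urgency:high');
    }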

    await tagEmail(email_id, tags);
    console.log(`[${email_id}] from=${email.from} label=${sentiment.label} score=${sentiment.score} tags=${tags.join(',')}`);
  } catch (err) {
    console.error(`[${email_id}] sentiment pipeline failed:`, err);
  }
});

app.listen(3000, () => console.log('webhook listener up on :3000'));

Express handler that receives MultiMail inbound webhooks, fetches the full email, scores sentiment, and tags urgency in real time.
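
Models occasionally wrap JSON in extra prose or return an unexpected label, and a bare JSON parse will then throw. The sketch below shows one defensive way to parse the reply; it is written in Python for consistency with the other scripts on this page, and `parse_sentiment_reply` is a hypothetical helper, not part of any SDK.

```python
import json
import re

VALID_LABELS = {"positive", "neutral", "negative", "hostile"}


def parse_sentiment_reply(raw: str) -> dict:
    """Extract {"label", "score"} from a model reply, tolerating stray prose.

    Raises ValueError when no JSON object is found, the JSON is invalid,
    or the label is outside the expected set.
    """
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(match.group(0))  # JSONDecodeError is a ValueError
    label = data.get("label")
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    # Clamp the score into [0, 1]; a missing or null score becomes 0.0.
    score = min(1.0, max(0.0, float(data.get("score") or 0.0)))
    return {"label": label, "score": score}
```

Catching the `ValueError` in the handler lets you tag the email for manual review instead of dropping it.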

Batch backfill with check_inbox
python
import os
import json
import requests
import anthropic

MM_API_KEY = os.environ['MM_API_KEY']
BASE = 'https://api.multimail.dev'
HEADERS = {'Authorization': f'Bearer {MM_API_KEY}'}

client = anthropic.Anthropic()

def check_inbox(mailbox: str, limit: int = 50) -> list[dict]:
    r = requests.get(
        f'{BASE}/mailboxes/{mailbox}/inbox',
        headers=HEADERS,
        params={'limit': limit}
    )
    r.raise_for_status()
    return r.json()['emails']

def read_email(email_id: str) -> dict:
    r = requests.get(f'{BASE}/emails/{email_id}', headers=HEADERS)
    r.raise_for_status()
    return r.json()

def tag_email(email_id: str, tags: list[str]) -> None:
    r = requests.post(
        f'{BASE}/emails/{email_id}/tags',
        headers=HEADERS,
        json={'tags': tags}
    )
    # Fail loudly if the tag write was rejected
    r.raise_for_status()

def score_sentiment(text: str) -> dict:
    msg = client.messages.create(
        model='claude-haiku-4-5-20251001',
        max_tokens=64,
        messages=[{
            'role': 'user',
            'content': f'Classify sentiment. Reply JSON only: {{"label": "positive|neutral|negative|hostile", "score": 0.0-1.0}}\n\n{text}'
        }]
    )
    return json.loads(msg.content[0].text)

def process_untagged(mailbox: str) -> None:
    emails = check_inbox(mailbox)
    untagged = [
        e for e in emails
        if not any(t.startswith('sentiment:') for t in e.get('tags', []))
    ]
    print(f'{len(untagged)} untagged emails in {mailbox}')

    for email in untagged:
        full = read_email(email['id'])
        text = f"Subject: {full['subject']}\n\n{full['body']}"
        sentiment = score_sentiment(text)

        tags = [f"sentiment:{sentiment['label']}"]
        # Treat score as confidence in the label: escalate hostile mail,
        # or negative mail scored above 0.85
        if sentiment['label'] == 'hostile' or (
            sentiment['label'] == 'negative' and sentiment['score'] > 0.85
        ):
            tags.append('urgency:critical')
        elif sentiment['label'] == 'negative':
            tags.append('urgency:high')

        tag_email(email['id'], tags)
        print(f"  {email['id']} → {tags}")

if __name__ == '__main__':
    process_untagged('[email protected]')

Poll the inbox for emails missing a sentiment tag and run scoring on them — useful for backfilling history or recovering from webhook gaps.
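
For larger backfills, it can help to work through the untagged list in fixed-size groups so a failure only affects one group. The helper below is a minimal sketch of that chunking; the group size is your choice (the FAQ on this page suggests 50 to 100), and `batched` here is a local hypothetical helper written out so it runs on Python versions that predate `itertools.batched`.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of items, each with at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A backfill loop could then iterate `for group in batched(untagged, 50)` and log progress or pause between groups.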

MCP tool sequence for sentiment triage
python
import os
import anthropic

client = anthropic.Anthropic()

SYSTEM = """
You are a support triage agent. For each inbound email you process:
1. Use read_email to fetch the full message body.
2. Assess sentiment: positive, neutral, negative, or hostile.
   Pay attention to the subject line — it often carries strong signal.
3. Use tag_email to apply two tags: 'sentiment:<label>' and 'urgency:<level>'.
   urgency levels: low (positive/neutral), high (negative), critical (hostile or score >0.85).
4. For any email tagged urgency:critical, use reply_email to draft a short holding reply
   that acknowledges the issue without making promises. Set requires_approval=true.
5. After each email, log: email ID, sender, sentiment label, score, and tags applied.
"""

USER_PROMPT = """
Triage the latest 10 emails in [email protected].
Apply sentiment and urgency tags to all of them.
Draft holding replies for any emails classified as hostile or urgency:critical.
"""

response = client.beta.messages.create(
    model='claude-sonnet-4-6',
    max_tokens=4096,
    system=SYSTEM,
    messages=[{'role': 'user', 'content': USER_PROMPT}],
    # With the mcp-client-2025-04-04 beta, remote MCP servers are passed
    # via mcp_servers rather than as entries in tools
    mcp_servers=[
        {
            'type': 'url',
            'name': 'multimail',
            'url': 'https://mcp.multimail.dev/mcp',
            'authorization_token': os.environ['MM_API_KEY'],
            'tool_configuration': {
                'enabled': True,
                'allowed_tools': ['check_inbox', 'read_email', 'tag_email', 'reply_email'],
            },
        }
    ],
    betas=['mcp-client-2025-04-04'],
)

for block in response.content:
    if hasattr(block, 'text'):
        print(block.text)

Using MultiMail's MCP server in an agent loop to read emails, apply sentiment tags, and draft holding replies for hostile messages — all in one agent turn.

Weekly sentiment trend report
python
import os
import json
import requests
from collections import defaultdict
from datetime import datetime, timedelta, timezone

MM_API_KEY = os.environ['MM_API_KEY']
BASE = 'https://api.multimail.dev'
HEADERS = {'Authorization': f'Bearer {MM_API_KEY}'}

def get_emails_by_tag(mailbox: str, tag: str, since: str) -> list[dict]:
    r = requests.get(
        f'{BASE}/mailboxes/{mailbox}/inbox',
        headers=HEADERS,
        params={'tag': tag, 'since': since, 'limit': 500}
    )
    r.raise_for_status()
    return r.json()['emails']

def sentiment_report(mailbox: str, days: int = 7) -> dict:
    since = (
        datetime.now(timezone.utc) - timedelta(days=days)
    ).strftime('%Y-%m-%dT%H:%M:%SZ')

    counts: dict[str, int] = {}
    sender_negatives: dict[str, int] = defaultdict(int)

    for label in ('positive', 'neutral', 'negative', 'hostile'):
        emails = get_emails_by_tag(mailbox, f'sentiment:{label}', since)
        counts[label] = len(emails)
        if label in ('negative', 'hostile'):
            for e in emails:
                sender_negatives[e['from']] += 1

    total = sum(counts.values())
    denom = total or 1  # avoid division by zero when nothing is tagged
    return {
        'mailbox': mailbox,
        'period_days': days,
        'total_emails': total,
        'sentiment_breakdown': {
            k: {'count': v, 'pct': round(v / denom * 100, 1)}
            for k, v in counts.items()
        },
        'at_risk_senders': [
            {'sender': s, 'negative_count': c}
            for s, c in sorted(sender_negatives.items(), key=lambda x: -x[1])
            if c >= 2
        ],
    }

if __name__ == '__main__':
    report = sentiment_report('[email protected]', days=7)
    print(json.dumps(report, indent=2))

Query tagged emails over a rolling time window to compute sentiment distribution and surface at-risk senders for proactive CSM outreach.


What you get

Zero-lag detection

Sentiment scoring runs on every inbound email via webhook — no batch delays. A frustrated customer gets flagged within seconds of sending, not hours later when a human eventually scans the queue.

Queryable structured tags

tag_email writes labels like sentiment:negative and urgency:critical that your existing tooling can filter on. No ETL pipeline, no separate sentiment database — the signal lives with the email and is accessible via the same API.

Predictable token usage

MultiMail normalizes emails to clean markdown before your agent reads them, stripping HTML noise and deduplicating quoted reply chains. You're not burning tokens on boilerplate — just the customer's actual words.

Trend visibility before churn

Aggregate sentiment tags over a rolling window to surface accounts sending multiple negative emails in a week. That pattern is a churn signal your CSM team can act on before a cancellation request lands.

Human control on outbound language

Monitored mode lets the agent tag and route autonomously. For holding replies to hostile emails, gated_send keeps a human in the loop on the specific language used — detection is automated, escalation tone is not.


Recommended oversight mode

Recommended: monitored
Sentiment tagging and urgency routing are low-risk, fully reversible actions — a mislabeled tag can be corrected without customer impact. Monitored mode lets the agent operate at full speed on every inbound message while keeping your team informed of critical classifications. For outbound replies drafted in response to hostile emails, pair monitored tagging with gated_send on the reply step so a human reviews the specific language before it reaches the customer.

Common questions

How does MultiMail handle threading when analyzing sentiment across a conversation?
Use get_thread to retrieve the full conversation history before scoring. Sentiment in a single message can be misleading — a terse 'fine' reads differently in a thread that started three weeks ago with an unresolved refund request. Passing the full thread to your model gives more accurate classification and avoids false positives on short replies.
Can I run sentiment analysis on historical emails, not just new ones?
Yes. check_inbox returns paginated results with optional tag and date filters. Query for emails that don't have a sentiment: tag, then run them through the same scoring pipeline. Batch in groups of 50–100 emails per request to stay within comfortable read limits.
Which model should I use for sentiment scoring?
claude-haiku-4-5-20251001 is fast and cost-effective for classification-only tasks where you need a label and a confidence score. For nuanced escalation detection — legal threats, chargeback intent, regulatory complaints — claude-sonnet-4-6 is more reliable. Prompt for structured JSON output rather than free text; it's more stable and avoids parsing failures.
How do I reduce false positives on urgency:critical tags?
Combine label and score thresholds: apply urgency:critical only when label is 'hostile' AND score > 0.8, not on label alone. Also, always include the subject line in your scoring prompt — subjects like 'Re: Still waiting for refund' carry strong signal that the body alone may underweight on short messages.
Can sentiment tags trigger downstream automation in other tools?
Yes, via tag-change webhooks. Configure a webhook in MultiMail to fire when urgency:critical is applied. Your downstream system — Zendesk, Linear, PagerDuty, Slack — receives the event and can create a ticket, page on-call, or send an alert without polling the API. The webhook payload includes the email ID, tags, and sender.
Are derived sentiment labels subject to GDPR data retention rules?
GDPR Article 5(1)(e), the storage-limitation principle, says personal data should be kept no longer than necessary for its purpose. Sentiment labels derived from a customer's email are tied to that customer, so treat them with the same retention period as the source message. If you delete emails after 30 days, sentiment tags stored in MultiMail are deleted with the email, so no separate cleanup is needed. If you export sentiment scores to an external analytics store, apply the same retention policy there.

The only agent email with a verifiable sender

Email infrastructure built for AI agents. Verifiable identity, graduated oversight, and a 50-tool MCP server. Formally verified in Lean 4.