n8n Email Automation: Build an AI Classifier and Autoresponder (2026)
Email volume keeps climbing, and businesses that manage it manually are spending hours every week on work that can be fully automated. n8n email automation lets you classify incoming messages by category, pull in the right context for each type, and draft accurate replies without any manual sorting.
This n8n email automation tutorial walks through a complete n8n email automation workflow example: a text classifier that routes emails into nine categories, a Google Sheets-powered context layer that gives the AI the right information for each category, and an AI agent that drafts the reply. You will also see how to set up evaluations so you can measure accuracy before you go live.
If you are new to n8n, the n8n for beginners guide covers the basics before diving into a workflow this size.
See also: n8n for beginners guide
What Is n8n Email Automation?
n8n email automation is the process of using n8n workflows to receive, classify, and respond to emails without manual intervention. Incoming emails trigger the workflow, an AI node categorizes the message, and a reply is drafted and sent based on rules or context you define. n8n email automation capabilities include text classification, context loading from Google Sheets, and AI-powered reply drafting, all configured without writing custom code.
The workflow covered here was built for a real community project and handles nine email categories: pricing, setup, security, HR, escalated sales, escalated support, escalated legal, spam, and misdirected. Each category pulls different context from Google Sheets before a reply is generated.
n8n email management automation handles the full cycle from inbound sorting to outbound reply, a significant step beyond basic email filtering. The full classifier-and-autoresponder stack means every incoming message gets a categorized, context-aware draft reply with no human in the loop.
See also: n8n Gmail setup
How This n8n Email Automation Workflow Is Structured
This n8n email automation workflow has two entry points. A webhook trigger handles live production traffic. An evaluation trigger lets you run the workflow against a test data set during development, so you can measure accuracy before deploying.
Both triggers feed into edit fields nodes that standardize the three key fields: from, subject, and body. The webhook payload already uses these names cleanly. The test data set uses slightly different field names (from_email instead of from), so the edit fields node maps both into the same shape before anything else runs.
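The mapping those edit fields nodes perform can be sketched as a plain function, runnable outside n8n. The field names from, from_email, subject, and body come from the article; the fallback logic and sample values are illustrative:

```javascript
// Sketch of what the two edit fields nodes accomplish: map either payload
// shape (webhook or test data set) into the same { from, subject, body } object.
// Field names come from the workflow; the fallback logic is illustrative.
function normalizeEmail(payload) {
  return {
    from: payload.from ?? payload.from_email ?? '', // test data set uses from_email
    subject: payload.subject ?? '',
    body: payload.body ?? '',
  };
}

// Both payload shapes normalize to the same object.
const fromWebhook = normalizeEmail({ from: 'a@example.com', subject: 'Pricing?', body: 'How much is Pro?' });
const fromTestSet = normalizeEmail({ from_email: 'a@example.com', subject: 'Pricing?', body: 'How much is Pro?' });
console.log(JSON.stringify(fromWebhook) === JSON.stringify(fromTestSet)); // true
```

In n8n itself this is UI configuration rather than code, but keeping both paths mapped to one shape is the point: every downstream node reads the same three fields.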
After standardization, a third edit fields node labeled ‘inputs for classification’ combines everything into a single clean object. This makes the downstream nodes simpler to configure because they always read from the same field names regardless of which trigger fired.
An IF node follows. It checks whether the email body is empty. If it is, the workflow routes to a ‘No Operation’ node and stops. This prevents null inputs from reaching the classifier and causing errors in production.
Setting Up the n8n Text Classifier Node
The Text Classifier node is the core of this workflow. It takes the incoming email text, evaluates it against your defined categories, and outputs which category the email belongs to. Each category the classifier routes to becomes a separate branch in your workflow.
To add it, search for ‘Text Classifier’ in the n8n node panel. Connect a chat model to the classifier node, then add your categories one by one. Each category has a name and an optional description. The description tells the model what belongs in that category, so be specific. Vague descriptions lead to misclassification.
The Nine Email Categories
The categories in this build are: pricing, setup, security, HR, escalated_sales, escalated_support, escalated_legal, spam, and misdirected. Each maps to a different branch, and most branches pull different spreadsheet data before the reply is drafted.
Two settings worth knowing: ‘Allow Multiple Classes to Be True’ should be off for email classification because each email should land in exactly one category. ‘When No Clear Match’ adds a fallback path for real-world builds where ambiguous emails should fall through gracefully, but for structured test data sets it is usually left off.
The Text Classifier node also accepts a system prompt. Use this to reinforce the classification logic, set the tone for how categories should be applied, and handle edge cases the category descriptions alone might miss.
Testing the Classifier
Before wiring up the rest of the workflow, test the classifier in isolation. Paste a few example emails into the webhook body and run the workflow. Confirm that each email routes to the correct category output before building the context layer on top.
Getting the classifier right is the highest-priority step in the entire build. If an email lands in the wrong category, the wrong context gets loaded and the reply will be wrong. The accuracy of every downstream step depends on the classifier firing correctly.
Loading Category-Specific Context From Google Sheets
Once an email is classified, the workflow needs to load the right context before generating a reply. The approach here uses Google Sheets instead of a vector database. Each spreadsheet contains the information the AI agent needs for a specific category: pricing plans, escalation rules, security policies, product integrations, and so on.
The reason to use Google Sheets over a RAG setup comes down to maintainability. Non-technical teams can update a spreadsheet without understanding vector databases or embedding pipelines. When the AI reply is wrong, the fix is editing the spreadsheet. That is a much simpler feedback loop than updating a vector store.
This approach works well when spreadsheets are short. If the context documents grow to thousands of rows, a RAG architecture would be more appropriate. For most small to mid-size n8n email automation workflows, sheets stay manageable.
See also: n8n Google Sheets integration
Mapping Categories to Spreadsheets
Not every category needs the same spreadsheets. Pricing emails need the pricing plans document, the escalation rules for pricing, and the product integrations table. HR emails only need the HR escalation rules. Spam emails do not need any context at all.
Before building this section, map out which spreadsheets each category actually needs. This avoids sending irrelevant context to the LLM, which wastes tokens and can actually reduce reply quality by introducing noise.
In the workflow, each category branch has one or more Google Sheets nodes followed by a code node that transforms the rows into a single document_text string. Here is the JavaScript pattern used for each spreadsheet:
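A minimal sketch of that pattern, written as a plain function so it can run outside n8n. The column names are illustrative; inside the workflow the same logic runs in a Code node over the Google Sheets node's output:

```javascript
// Flatten spreadsheet rows into one document_text string for the AI agent.
// Row keys are assumed to match your sheet's column headers.
function rowsToDocumentText(rows) {
  return rows
    .map(row =>
      Object.entries(row)
        .map(([key, value]) => `${key}: ${value}`)
        .join('\n')
    )
    .join('\n\n');
}

// Hypothetical pricing rows, shaped like Google Sheets node output:
const rows = [
  { plan: 'Starter', price: '$29/mo', seats: '5' },
  { plan: 'Pro', price: '$99/mo', seats: '25' },
];
const document_text = rowsToDocumentText(rows);

// Inside an n8n Code node ("Run Once for All Items"), the same logic would be:
// return [{ json: { document_text: rowsToDocumentText($input.all().map(i => i.json)) } }];
```

The "key: value" line format is one reasonable choice; anything that renders each row as readable text the LLM can quote from will work.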
Handling Multiple Spreadsheets Per Category
When a category needs more than one spreadsheet, use a Merge node set to Append mode to stack all the spreadsheet outputs together. Then use a Summarize node to concatenate the document_text fields into a single string.
After the merge and summarize, an edit fields node standardizes the output to two fields: document_text and category. This matters because categories with a single spreadsheet skip the merge step entirely and already output document_text directly. Having two standardization nodes (one for single-sheet branches, one for multi-sheet branches) ensures both paths produce the same field shape before they reach the AI agent.
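The Merge-then-Summarize sequence boils down to concatenating the per-sheet document_text values and attaching the category label. A plain-function sketch (field names from the workflow; the separator string is an assumption, not an n8n default):

```javascript
// Sketch of the Merge (Append) + Summarize + edit fields sequence:
// stack each spreadsheet branch's document_text into one string,
// then attach the category so every branch emits the same two fields.
function combineContext(sheetOutputs, category) {
  const document_text = sheetOutputs
    .map(output => output.document_text)
    .join('\n\n---\n\n'); // separator is illustrative
  return { document_text, category };
}

const combined = combineContext(
  [
    { document_text: 'plan: Pro\nprice: $99/mo' },
    { document_text: 'rule: discounts over 20% escalate to sales' },
  ],
  'pricing'
);
```

Single-sheet branches effectively call this with a one-element array, which is why both paths can end in the same { document_text, category } shape.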
Drafting the Email Reply With an AI Agent
After the context is loaded, all category branches feed into a single AI agent node that drafts the reply. Because every branch outputs the same document_text and category fields, no merge step is needed here. The paths converge naturally.
The AI agent receives three inputs: the original email body, the category label, and the concatenated document_text from the spreadsheets. Its job is to write a reply that is accurate to the documentation and appropriate for the email category.
Adding Customer Info as a Tool
A customer information spreadsheet can be attached to the agent as a tool. This lets the agent look up account details like plan tier, billing cycle, SLA terms, and any custom notes before writing the reply.
When adding tools to an n8n AI agent, always write a specific tool description. The description tells the agent when to use the tool and what it will return. A vague or empty description means the agent may skip the tool even when the information would improve the reply. Set the description to something like: ‘Use this tool when the email mentions a specific customer or account. Returns name, plan, billing cycle, SLA, and notes for that customer.’
See also: n8n AI agent node
Model Selection
For this type of workflow, a large context window matters more than raw reasoning speed. The context you are feeding in (pricing tables, escalation rules, integration lists) can run to several thousand tokens per request and grows as your spreadsheets do. A model with a large context window handles this without truncation.
Claude Sonnet works well here because it handles long context well and produces clean, professional email drafts. GPT-4o is a reasonable alternative. For production use, run a few test emails through both and compare the output quality against your expected reply format.
Testing and Evaluating Your n8n Email Workflow
Building the workflow is only part of the job. Before this goes to production, you need to know how accurately it classifies emails and how good the replies are. The evaluation trigger and Set Metrics node handle both.
Basic Category Evaluation
The simplest evaluation checks only whether the email landed in the correct category. In the Set Metrics node, select the Categorization option. Wire the expected_category field from your test data set and the output_category from your classifier into the node. If they match, the score is 1. If not, it is 0.
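The scoring rule is exact-match comparison, which a short sketch makes concrete (the expected_category and output_category field names come from the workflow; the accuracy helper is an illustrative addition):

```javascript
// What the Set Metrics node's Categorization option computes:
// 1 when the classifier output matches the expected label, else 0.
function categoryScore(expected_category, output_category) {
  return expected_category === output_category ? 1 : 0;
}

// Aggregate accuracy over a test data set is the mean of those scores.
function accuracy(rows) {
  if (rows.length === 0) return 0;
  const total = rows.reduce(
    (sum, row) => sum + categoryScore(row.expected_category, row.output_category),
    0
  );
  return total / rows.length;
}
```

Note that exact-match scoring means label strings must agree exactly; a classifier outputting ‘escalated_sales’ against an expected ‘escalated sales’ scores zero, so keep the test data set's labels identical to the classifier's category names.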
Run your full test data set through this evaluation first. Focus on getting category accuracy as high as possible before worrying about reply quality. An email in the wrong category gets the wrong context, which means the reply will be wrong regardless of how good the agent is. Category accuracy is the foundation.
Advanced Correctness Evaluation
Once category accuracy is solid, upgrade to a full correctness evaluation. The advanced option in the Set Metrics node uses an LLM to score both the category and the reply together. Wire the expected category plus expected reply into the ‘expected answer’ field, and the actual category plus actual reply into the ‘actual answer’ field.
The evaluation prompt should be explicit. Include a rule that an email in the wrong category automatically scores zero, regardless of reply quality. This matches how the scoring actually works in production: if the classifier fails, the whole response fails.
Filtering for Single-Row Testing
During development, use a Filter node between the evaluation trigger and the rest of the workflow. Set it to pass only the email_id you want to test. This lets you isolate a single row, run the workflow, check the output, and fix issues before running the full data set.
A common debugging pattern: if the classifier is routing a specific email wrong, test that row in isolation, check the text classifier output, and refine the category description or system prompt until it routes correctly. Then remove the filter and run the full set.
Deploying Your n8n Email Automation to Production
When the n8n email automation workflow is tested and evaluation scores are acceptable, publish it. In n8n, click the Publish button in the workflow editor. This activates the production version of the workflow and makes the production webhook URL live.
When testing, you run the workflow from the editor using the test webhook URL. In production, any system sending emails to your workflow (a Zapier hook, an email parser, a form submission) should use the production URL. These are different URLs in n8n, so make sure external systems are pointed at the right one before going live.
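Calling the production webhook from an external system is an ordinary HTTP POST. A sketch using fetch, where the URL is a placeholder (your real production URL comes from the Webhook node) and the payload shape matches the three fields the edit fields nodes expect:

```javascript
// Hypothetical example of an external system posting to the live workflow.
// Replace the URL with your workflow's production webhook URL, not the
// test URL shown in the n8n editor.
const WEBHOOK_URL = 'https://your-n8n-host/webhook/email-autoresponder';

// Build the JSON body the workflow's edit fields nodes expect.
function buildPayload(email) {
  return { from: email.from, subject: email.subject, body: email.body };
}

async function forwardEmail(email, webhookUrl = WEBHOOK_URL) {
  const response = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildPayload(email)),
  });
  if (!response.ok) throw new Error(`Webhook returned ${response.status}`);
  return response.json();
}
```

Whatever system forwards the email (Zapier, an email parser, a form handler) just needs to produce this same POST; anything extra in its payload is dropped by the field mapping.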
After publishing, monitor the first batch of real emails carefully. Production traffic often surfaces edge cases that test data sets miss. Spam emails with unusual formatting, emails in multiple languages, or messages that span two categories will reveal where the classifier needs tuning. Update the category descriptions and system prompt as you learn from real traffic.
Frequently Asked Questions
How does the n8n text classifier node work?
The Text Classifier node in n8n passes each input item to a connected LLM along with the categories you define. The model evaluates the text and assigns it to the most appropriate category. You configure each category with a name and description, and the description is what the model uses to distinguish between categories. The node outputs to a separate branch for each category, letting you build different logic for each.
Can I use n8n to automatically reply to emails?
Yes. The standard pattern is: trigger on new email, classify it with the Text Classifier node, load any relevant context, then use an AI agent or basic LLM node to draft a reply, and finally send it with the Gmail or Send Email node. This guide covers the classifier and agent steps. For the Gmail send step, the n8n Gmail integration handles the actual sending.
Do I need a vector database for n8n email automation?
Not necessarily. For most small to mid-size email workflows, Google Sheets work well as the context source. Sheets are easier to maintain, easier for non-technical teams to update, and produce good results when the context documents are short. A vector database becomes the better choice when context documents are very large, frequently updated from many sources, or require semantic search to retrieve the most relevant chunks.
What AI model works best for email classification in n8n?
For classification accuracy, most modern models perform similarly. GPT-4o mini, Claude Haiku, and Gemini Flash are all fast and cost-effective for classification. For the reply drafting step where context can be thousands of tokens, prefer a model with a large context window. Claude Sonnet and GPT-4o both handle long-context email drafts reliably.
How do I test my n8n email workflow before going live?
Use the evaluation trigger along with a test data set spreadsheet containing example emails and their expected categories and replies. Wire the evaluation trigger through your workflow, then use the Set Metrics node to score output against expected values. Start with category-only scoring to validate the classifier, then add full correctness scoring once categories are accurate.
Is there an n8n email automation tutorial on YouTube?
Yes. The video embedded at the top of this page is a full n8n email automation tutorial that walks through every node in the workflow, from the dual triggers through the text classifier, Google Sheets context loading, AI agent setup, and evaluation scoring. The YouTube walkthrough is especially useful for following the workflow logic visually, since you can see each node connect and fire in real time.
Next Steps
The core pattern here extends to any business that receives high email volume with repeated question types. Pricing emails, support requests, HR inquiries, legal questions: if your team answers the same types of questions repeatedly, a classifier-and-autoresponder workflow removes the manual sorting and drafting work entirely.
Start with the text classifier and get category accuracy above 90% on your test set before building the context layer. Once categories are reliable, add the Google Sheets context nodes one category at a time. Test each category branch independently before connecting them all to the AI agent.
For production, prioritize the categories that get the most volume first. A workflow that handles your top three email types perfectly is more valuable than one that handles all nine types at 70% accuracy.
