n8n Compare Datasets Node: Find Differences Between Two Data Sources
The Compare Datasets node is one of n8n’s most underutilized tools. It takes two sets of data and tells you exactly what’s different between them — which records exist only in the first set, which exist only in the second, and which exist in both but have changed values. If you’ve ever needed to sync two systems, detect new or deleted records, or identify what changed between two snapshots, this is the node built for that job.
In this guide we walk through how the Compare Datasets node works, its output branches, configuration options, and practical workflow examples where it solves real synchronization and change-detection problems.
What the Compare Datasets Node Does
The Compare Datasets node takes two inputs: Input 1 (your “old” or “source” dataset) and Input 2 (your “new” or “target” dataset), which the n8n editor labels Input A and Input B. It pairs items across the two inputs using a key field you specify, then compares the paired items field by field. Each item is routed to one of four output branches depending on what the node found.
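Output 1 contains items that exist only in Input 1 (records that are missing from Input 2, typically deletions). Output 2 contains items that exist only in Input 2 (new records added since Input 1). Output 3 contains items that exist in both inputs but where field values have changed (modified records). Output 4 contains items that are identical in both inputs (unchanged records). In the n8n editor these branches are labeled In A only, In B only, Different, and Same respectively; the Output 1 to 4 numbering is this guide’s shorthand, so go by the labels rather than the position on the node when wiring connections. This four-way split gives you complete visibility into exactly what changed between your two datasets.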
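To make the four-way split concrete, here is a minimal sketch of the same logic in plain Python. It illustrates the idea only and is not n8n's implementation; the function name `compare_datasets`, the group names, and the sample records are hypothetical, and it assumes the key is unique within each input.

```python
from typing import Any

Record = dict[str, Any]


def compare_datasets(
    input_1: list[Record], input_2: list[Record], key: str
) -> dict[str, list[Record]]:
    """Split two record lists into four groups, pairing records by `key`.

    Hypothetical helper for illustration; assumes `key` is unique per list.
    """
    old_by_key = {rec[key]: rec for rec in input_1}
    new_by_key = {rec[key]: rec for rec in input_2}

    result: dict[str, list[Record]] = {
        "only_in_1": [],
        "only_in_2": [],
        "changed": [],
        "unchanged": [],
    }
    for k, old in old_by_key.items():
        if k not in new_by_key:
            result["only_in_1"].append(old)
        elif old == new_by_key[k]:
            result["unchanged"].append(old)
        else:
            # Assumption for this sketch: emit the Input 2 version.
            result["changed"].append(new_by_key[k])
    for k, new in new_by_key.items():
        if k not in old_by_key:
            result["only_in_2"].append(new)
    return result


old = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "c@example.com"},
]
new = [
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "c@new.example.com"},
    {"id": 4, "email": "d@example.com"},
]
split = compare_datasets(old, new, key="id")
```

With this sample data, record 1 lands in the "only in Input 1" group, record 4 in "only in Input 2", record 3 in "changed", and record 2 in "unchanged".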
Configuring the Comparison Key
The most important configuration choice is the Fields to Match On — the field (or fields) that uniquely identify a record across both datasets. This is typically an ID field: a user ID, order number, product SKU, or email address. The Compare Datasets node uses this key to pair items from Input 1 with their counterpart in Input 2 before comparing field values.
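You can match on multiple fields if no single field is a unique identifier, for example matching on both first_name and last_name together when no ID field exists. This only works reliably if the combination is unique within each dataset. The node treats all matched-on fields as the identity of the record; changes to non-key fields are what get reported as modifications in Output 3. A change to a key field itself is not reported as a modification: the old record shows up in Output 1 and the new one in Output 2, because the two no longer pair up.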
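Matching on several fields amounts to building a composite key. The sketch below shows the idea in plain Python; `composite_key` and `pair_records` are hypothetical helper names, not part of n8n, and the sample people are made up.

```python
from typing import Any

Record = dict[str, Any]


def composite_key(record: Record, fields: list[str]) -> tuple:
    """Build a hashable identity from several fields (hypothetical helper)."""
    return tuple(record[f] for f in fields)


def pair_records(
    input_1: list[Record], input_2: list[Record], fields: list[str]
) -> list[tuple[Record, Record]]:
    """Return (old, new) pairs whose matched-on fields are all equal."""
    new_by_key = {composite_key(r, fields): r for r in input_2}
    pairs = []
    for old in input_1:
        k = composite_key(old, fields)
        if k in new_by_key:
            pairs.append((old, new_by_key[k]))
    return pairs


old = [
    {"first_name": "Ada", "last_name": "Lovelace", "city": "London"},
    {"first_name": "Ada", "last_name": "Byron", "city": "London"},
]
new = [
    {"first_name": "Ada", "last_name": "Lovelace", "city": "Paris"},
]
pairs = pair_records(old, new, ["first_name", "last_name"])
```

Only the Ada Lovelace records pair up, since both matched-on fields must agree; the city difference is then what a comparison of the pair would report as a change.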
Output 1: Records Only in Input 1 (Deleted)
Output 1 catches records that were in your old dataset but are missing from the new one. In a sync context, these are records that have been deleted from the source system and should be removed from your destination. In a snapshot comparison context, these represent items that have disappeared since the last check.
Connect Output 1 to whatever logic handles deletions — marking records as inactive in your CRM, sending a “record removed” notification, logging the deletion to an audit table, or actually deleting the record from a downstream system. Having this as a dedicated branch means you never have to manually figure out what’s missing — the node does the set difference for you.
Output 2: Records Only in Input 2 (New)
Output 2 contains records that appear in the new dataset but weren’t in the old one — these are brand new additions. In a sync workflow, these are the records you need to create in your destination system. In a monitoring workflow, these represent new entries since the last check.
Connect Output 2 to a creation flow — insert new rows into a database, create new contacts in a CRM, send a “new record” alert, or trigger a welcome email. Because n8n gives you a dedicated branch for new records, you can apply different logic to new vs. updated records without any IF nodes or complex conditional routing.
Output 3: Changed Records
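Output 3 is often the most valuable output: it contains records that exist in both datasets but where at least one field value has changed. Which version of the record is emitted is controlled by the node’s When There Are Differences setting: you can keep the Input A version, keep the Input B version (the new version in this guide’s setup), use a mix of the two, or include both versions so you can see which fields changed and act specifically on the modifications.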
Use Output 3 to update records in a destination system, trigger change notifications, log audit trails of what changed and when, or send targeted messages based on specific field changes. For example, if a customer’s email address changed, update it in every connected system. If a product’s price changed, notify relevant stakeholders.
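Acting on specific field changes means knowing which fields differ between the old and new version of a record. The following is a minimal sketch of a field-level diff in plain Python; `changed_fields`, `actions_for`, and the action names are hypothetical and only illustrate the routing idea, not how n8n represents differences internally.

```python
from typing import Any

Record = dict[str, Any]


def changed_fields(
    old: Record, new: Record, key_fields: tuple[str, ...] = ("id",)
) -> dict[str, dict[str, Any]]:
    """Map each non-key field whose value differs to its old and new values."""
    names = (old.keys() | new.keys()) - set(key_fields)
    return {
        name: {"old": old.get(name), "new": new.get(name)}
        for name in sorted(names)
        if old.get(name) != new.get(name)
    }


def actions_for(diff: dict[str, dict[str, Any]]) -> list[str]:
    """Pick follow-up actions based on which fields changed (names are made up)."""
    actions = []
    if "email" in diff:
        actions.append("update_email_everywhere")
    if "price" in diff:
        actions.append("notify_stakeholders")
    return actions


old_customer = {"id": 7, "email": "old@example.com", "plan": "pro"}
new_customer = {"id": 7, "email": "new@example.com", "plan": "pro"}
diff = changed_fields(old_customer, new_customer)
actions = actions_for(diff)
```

Here only the email differs, so the diff contains a single entry and only the email-update action is selected.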
Output 4: Unchanged Records
Output 4 contains records that appear in both datasets with identical values — nothing changed. Most workflows don’t need to act on unchanged records, but this output exists for completeness and for specific use cases like confirming that certain records are stable, logging sync confirmations, or troubleshooting why expected changes aren’t showing up.
In practice, it’s common to simply leave Output 4 unconnected unless your workflow specifically needs to process unchanged records. Not connecting an output branch in n8n is perfectly valid: items routed to that branch are simply not processed further, and the workflow continues with the branches that are connected.
Practical Workflow Examples
Here are real scenarios where Compare Datasets is the right tool. In a CRM sync: every hour, load the contact list that the previous run saved to a data table (Input 1) and fetch the current contact list fresh from the source system’s API (Input 2). Compare Datasets tells you who’s new (Output 2 → create in the CRM), who’s changed (Output 3 → update in the CRM), and who’s been deleted (Output 1 → deactivate in the CRM). Finally, store the fresh list back to the data table so it becomes the baseline for the next run.
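The hourly CRM sync boils down to turning the comparison into a list of actions. Below is a rough sketch using set operations on the record keys; `build_sync_plan` and the action names are hypothetical, and in a real workflow each action would be a node connected to the corresponding output branch.

```python
from typing import Any

Record = dict[str, Any]


def build_sync_plan(
    previous: list[Record], current: list[Record], key: str = "id"
) -> list[tuple[str, Record]]:
    """Return (action, record) pairs needed to bring the destination up to date."""
    prev_by_key = {r[key]: r for r in previous}
    curr_by_key = {r[key]: r for r in current}

    plan: list[tuple[str, Record]] = []
    # Keys that disappeared since the previous run: deactivate.
    for k in sorted(prev_by_key.keys() - curr_by_key.keys()):
        plan.append(("deactivate", prev_by_key[k]))
    # Keys that are new in the fresh list: create.
    for k in sorted(curr_by_key.keys() - prev_by_key.keys()):
        plan.append(("create", curr_by_key[k]))
    # Keys present in both lists whose records differ: update.
    for k in sorted(prev_by_key.keys() & curr_by_key.keys()):
        if prev_by_key[k] != curr_by_key[k]:
            plan.append(("update", curr_by_key[k]))
    return plan


previous = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
current = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Cy"}]
plan = build_sync_plan(previous, current)
```

After the plan is applied, `current` would be saved as the baseline for the next run.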
In a price change monitor: scrape product prices daily and compare today’s prices (Input 2) against yesterday’s cached prices (Input 1). Output 3 shows every product whose price changed; send these to a Slack channel or email alert.
In an inventory audit: compare the inventory system’s records (Input 1) against a physical count spreadsheet (Input 2) to identify discrepancies. Output 1 shows items that appear to have gone missing, Output 2 shows unrecorded items, and Output 3 shows quantity mismatches.
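The price change monitor described above follows a compare-then-cache cycle: read yesterday's snapshot, compare it with today's data, then overwrite the snapshot. A minimal sketch of that cycle, assuming the cache is a JSON file that maps SKU to price; `run_price_check`, the file name, and the SKUs are hypothetical stand-ins for whatever storage and data your workflow uses.

```python
import json
import tempfile
from pathlib import Path


def run_price_check(snapshot_path: Path, todays_prices: dict[str, float]) -> list[str]:
    """Compare today's prices with the cached snapshot, then overwrite the cache."""
    if snapshot_path.exists():
        previous = json.loads(snapshot_path.read_text())
    else:
        previous = {}  # First run: nothing to compare against yet.

    alerts = []
    for sku, price in todays_prices.items():
        # Only products present in both snapshots can have a price change.
        if sku in previous and previous[sku] != price:
            alerts.append(f"{sku}: {previous[sku]} -> {price}")

    # Today's data becomes the baseline for the next run.
    snapshot_path.write_text(json.dumps(todays_prices))
    return alerts


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "prices.json"
    first_run = run_price_check(cache, {"SKU-1": 10.0, "SKU-2": 5.5})
    second_run = run_price_check(cache, {"SKU-1": 12.0, "SKU-2": 5.5, "SKU-3": 1.0})
```

The first run produces no alerts because there is no earlier snapshot. On the second run only SKU-1 is reported: SKU-2 is unchanged and SKU-3 is new rather than changed.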
