How to Scrape Reddit with n8n: Two Powerful Approaches
Reddit is a treasure trove of data that businesses and creators rely on every day. It’s a place where people openly share problems, ask for recommendations, discuss products, and look for help — which makes it an incredible source for lead generation, market research, and community monitoring.
The problem? Reddit’s official API has become heavily restricted. Getting API access is now difficult and expensive, leaving most automation builders without a clean path to that data.
In this guide we’ll walk through two practical workarounds that let you pull Reddit data directly into n8n — no official API key required. The first is a lightweight, nearly free approach using RSS feeds via FetchRSS.com. The second is a more powerful approach using the Apify Reddit Scraper actor to pull rich, structured data at scale. We’ll also show how to classify posts by topic using an AI text classifier and generate AI-powered responses for community engagement.
Why Reddit’s API Is Effectively Off-Limits
A few years ago, connecting to Reddit’s API was straightforward — you registered an app, got a key, and could start pulling posts programmatically. That changed significantly when Reddit introduced strict rate limits, paywalls for higher-tier access, and tightened approval requirements for developer applications.
Today, API access for automated scraping or monitoring is either extremely difficult to get approved, prohibitively expensive, or outright blocked for common use cases like competitive monitoring and lead generation.
This is where workarounds come in. Since Reddit’s public content is still viewable without authentication, there are clever ways to access that data — through RSS feeds and through third-party scraping services like Apify that handle the technical complexity for you.
Two Approaches: Which One Is Right for You?
Before diving into the builds, here’s a quick overview of what we’re covering and when to use each:
Approach 1 — FetchRSS (Free/Low Cost): Uses the FetchRSS.com service to convert any subreddit into an RSS feed that n8n can consume natively. Free for up to 5 posts per check, with paid tiers for more. Best for lightweight daily monitoring — getting a digest of the latest posts from a subreddit sent to Telegram, Slack, or email. Limited data fields.
Approach 2 — Apify Reddit Scraper ($45/month actor): Uses the Apify community node for n8n to run a professional Reddit scraper actor. Returns rich structured data including post body, username, upvotes, comment count, and more. Then classifies each post with an AI text classifier and routes it into categorized Google Sheets tabs. Best for lead generation, content research, and community monitoring where you need the full post data.
You can build both workflows and run them in parallel depending on your needs — use FetchRSS for quick daily digests and Apify for deeper weekly research runs.
Approach 1: FetchRSS.com — Lightweight Reddit Monitoring
FetchRSS.com is a service that generates RSS feeds from any web URL — including subreddits that don’t natively support RSS in the format n8n expects. It turns a subreddit page into a clean feed that you can plug directly into n8n’s RSS Trigger node.
Pricing on FetchRSS is very accessible:
- Free: 5 feeds, 5 posts per feed, 24-hour update rate
- Basic ($5/month): 15 posts per feed
- Advanced ($10/month): 20 posts per feed
- Professional ($15/month): 25 posts per feed
One thing to know about the free plan: Reddit typically pins 2–3 “weekly” or “megathread” posts at the top of any subreddit. Those eat into your 5-post limit, meaning you often only see 2–3 fresh posts per check. Upgrading to a paid plan gives you a much more useful window of new content.
To get started, sign up at fetchrss.com, paste in your subreddit URL (e.g. https://www.reddit.com/r/n8n/), and click “Get RSS.” FetchRSS will generate a unique feed URL tied to your account that you’ll use in n8n.
Building the RSS Trigger Workflow in n8n
Once you have your FetchRSS feed URL, setting up the n8n workflow is straightforward. Here’s how to build the Approach 1 workflow:
Step 1 — Add an RSS Feed Trigger node. In n8n, search for “RSS” in the node panel and select the RSS Feed Trigger (the one with the lightning bolt icon — this polls automatically). Set your poll interval to once per day (e.g., hour 14) and paste in your FetchRSS URL.
Step 2 — Run a test. Execute the trigger manually to confirm it pulls data. You’ll see the raw feed items come through — each one has a title, publish date, content, creator (username), and link to the original post.
Note on trigger vs. read node: n8n has two RSS nodes — the RSS Trigger and the RSS Read node. The trigger polls automatically on your schedule, which is the most efficient approach. Alternatively, you can use the RSS Read node connected to a Schedule Trigger if you want more control over when it runs.
Cleaning Up the Reddit Data
The raw RSS output from FetchRSS has a few issues you’ll want to fix before using the data downstream:
The title includes the username. FetchRSS formats the title as "username on: Post Title Here". To extract just the post title, use a Code node with a quick JavaScript split:
```javascript
// Remove the "username on: " prefix (keep everything after the first occurrence)
const cleanTitle = $json.title.split(' on: ').slice(1).join(' on: ') || $json.title;
```
The content is messy HTML. The content field contains the raw Reddit HTML including upvote counts, subreddit links, and other junk. Use a Code node (or ask Claude to write a quick regex cleaner) to strip the HTML tags and keep just the text content.
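If you want to write the cleaner yourself, here is a minimal sketch of the kind of function you might drop into an n8n Code node. The function name and entity list are my own; Reddit's RSS markup varies, so extend the replacements as needed.

```javascript
// Strip HTML tags, decode a few common entities, and collapse whitespace.
function stripHtml(html) {
  return html
    .replace(/<[^>]*>/g, ' ')   // drop all tags
    .replace(/&amp;/g, '&')     // decode the entities Reddit feeds commonly emit
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&#39;/g, "'")
    .replace(/&quot;/g, '"')
    .replace(/\s+/g, ' ')       // collapse runs of whitespace
    .trim();
}

// In an n8n Code node, map this over every incoming item:
// return $input.all().map(item => ({
//   json: { ...item.json, content: stripHtml(item.json.content || '') }
// }));
```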
The fields you actually need: After cleaning, use an Edit Fields node to keep only what matters: title (cleaned), publishDate, content (cleaned), and link (the direct Reddit post URL).
Routing and Delivering Your Reddit Posts
Once the data is clean, you have several options for what to do with it. Here are the most common patterns:
Daily Telegram digest: Add an AI Agent node (with a simple system prompt asking it to format the post as a brief summary) and then connect a Telegram node to send the formatted post to your phone or Telegram channel. This gives you a daily briefing of what’s trending in your subreddit.
Email newsletter: Route the cleaned posts into a Google Doc or email template. Great for sending a weekly “what’s happening in [community]” digest to your team.
Filtering before delivery: You don’t always want every post — just the relevant ones. Add a Filter node or an IF node before the AI agent to apply conditions. For example, filter only posts where the content contains certain keywords, or where the post is above a certain word count. You can also let the AI agent itself classify relevance (you’ll see a full example of this in Approach 2).
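As a sketch of the keyword option, a small Code node filter might look like this. The keyword list and field names are placeholders; adjust them to your own relevance criteria.

```javascript
// Keep only posts whose title or content mentions at least one keyword.
const KEYWORDS = ['recommendation', 'alternative', 'help', 'which tool'];

function isRelevant(post) {
  const text = `${post.title || ''} ${post.content || ''}`.toLowerCase();
  return KEYWORDS.some(kw => text.includes(kw));
}

// In the Code node: return $input.all().filter(item => isRelevant(item.json));
```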
The key insight here is that FetchRSS + n8n gives you a lightweight automation layer on top of Reddit — you get daily awareness of what’s happening in any subreddit without needing official API access.
FetchRSS Limitations to Know About
The FetchRSS approach is fast to set up and cheap to run, but it has real constraints worth understanding before you rely on it:
Post count ceiling: Even on the Professional plan you’re capped at 25 posts per feed. For large subreddits with hundreds of daily posts, this is a small sample.
Limited post body: Depending on the subreddit and your plan, the content field may only include the post title and a truncated snippet rather than the full body text. For link posts, you get almost no content.
No comments data: FetchRSS only surfaces the post itself — there’s no way to get comments, upvote counts, or other engagement data through this method.
24-hour update lag: On the free plan, the feed only refreshes once per day. Paid plans update more frequently, but you’re still working within FetchRSS’s refresh cycle.
If you need more data, more posts, or richer metadata — that’s when Approach 2 becomes necessary.
Approach 2: Apify Reddit Scraper — Full Data at Scale
The Apify Reddit Scraper is a professional-grade web scraper that bypasses Reddit’s API restrictions entirely by scraping the public site directly. It returns a comprehensive dataset for each post — far more than what RSS feeds can provide.
This approach uses the Apify community node for n8n (the same one covered in the Apify Google Maps tutorial). If you haven’t installed it yet, go to Settings → Community Nodes and install @apify/n8n-nodes-apify. Requires n8n 1.57.0+ and Node.js 22.x+.
The Reddit Scraper actor on Apify costs $45/month as a subscription. This is a significant investment, but it gives you unlimited runs (subject to Apify’s platform credit system) with rich, structured data. There is a second, cheaper Reddit scraper available at $20 per run — but if you plan to run this daily, the $45/month subscription is far more cost-effective.
Apify does offer a free trial account with $5 in credits — enough to test the workflow before committing to a paid plan.
Setting Up the Apify Node in n8n
In n8n, add an Apify node and look for the operation “Run Actor and Get Dataset” — this is the operation that both launches the scraper and retrieves the results in a single step (no separate Get Dataset Items node needed for this use case).
Configure the node:
- Credentials: Connect your Apify account via API key (paste your Personal API Token from Apify’s Settings → API & Integrations). OAuth2 is also available for n8n Cloud users.
- Resource: Actor
- Operation: Run Actor and Get Dataset
- Actor: Search for “Reddit Scraper” in the actor dropdown and select the comprehensive one (the $45/month subscription actor)
The workflow can be triggered two ways — use a Manual Trigger for on-demand runs and a Schedule Trigger for automated daily or hourly runs. Both triggers can connect directly to the same Apify node, or you can maintain two separate workflow versions.
Configuring the Reddit Scraper Actor Input
The Reddit Scraper takes a JSON input that controls exactly what gets scraped. Here’s the configuration used in the workflow:
```json
{
  "startUrls": [
    { "url": "https://www.reddit.com/r/n8n/top/?t=day" }
  ],
  "maxItems": 10,
  "skipComments": true,
  "skipUserPosts": false
}
```
Breaking down the key settings:
- startUrls: The subreddit URL to scrape. Appending top/?t=day pulls the top posts from the last 24 hours — ideal for daily runs. Change it to new/ to get the most recent posts instead.
- maxItems: How many posts to retrieve. Start small (10–20) when testing. Remember Apify charges based on platform usage, so scraping 1,000 posts costs significantly more than 10.
- skipComments: Set to true to skip scraping comment threads (much faster and cheaper — you only need post data for most use cases).
Dynamic date filtering: For production use, you can replace the hardcoded t=day with a dynamic expression that builds the URL based on today’s date — ensuring you only capture posts from the current day regardless of when the workflow runs.
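One way to approximate that dynamic filtering is to keep only posts created on today's calendar date in a Code node after the Apify node. This sketch assumes the scraper returns an ISO timestamp per post; the `createdAt` field name is an assumption, so check your actor's actual output.

```javascript
// Return true if the ISO timestamp falls on today's (UTC) calendar date.
function isFromToday(isoTimestamp, now = new Date()) {
  const today = now.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return (isoTimestamp || '').slice(0, 10) === today;
}

// In a Code node after the Apify node:
// return $input.all().filter(item => isFromToday(item.json.createdAt));
```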
What the Reddit Scraper Returns
One of the biggest advantages of Approach 2 over the RSS method is the richness of the data. Each post returned by the Apify Reddit Scraper includes:
- id — Reddit’s internal post ID
- url — Direct URL to the post
- username — The posting user’s handle
- title — The post title (clean, no formatting needed)
- communityName — The subreddit name (e.g. “n8n”)
- body — The full post body text
- html — The raw HTML version of the body
- link — External link if it’s a link post
- numberOfComments — Total comment count
- ratio — Upvote ratio (0–1)
Compare this to the RSS approach where you get a messy title and a partially-parsed content blob. The Apify data is structured, clean, and immediately usable for analysis or AI processing.
After the Apify node runs, use an Edit Fields node to trim down to just the fields you actually need: username, title, link, and body. This keeps downstream nodes from getting overwhelmed with unnecessary data.
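If you prefer doing the trimming in code rather than with an Edit Fields node, the equivalent Code node is a one-line projection. This is a sketch; the four field names follow the list above.

```javascript
// Keep only the four fields used downstream; everything else is dropped.
function pickFields(post) {
  const { username, title, link, body } = post;
  return { username, title, link, body };
}

// In the Code node: return $input.all().map(item => ({ json: pickFields(item.json) }));
```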
Text Classifier: Sorting Posts by Category
With 10–25 posts per run, you don’t want to process all of them the same way. A Text Classifier node (available in n8n) uses an AI model to read each post and route it to different workflow branches based on its topic.
Connect the Text Classifier after Edit Fields and feed it the post’s title and body. Use a Gemini or GPT model (any works) and define your categories. For an n8n community monitoring use case, the categories might be:
- Beginner — Someone new to n8n looking for tutorials, resources, or where to start
- Workflow — Someone sharing a workflow they built (great for content inspiration)
- Tips and Advice — Tips, tricks, or best practices shared by experienced users
- Needs Help — Someone with a technical error or workflow problem (non-content)
- Other — Anything that doesn’t fit the above categories
The classifier creates five output branches. Every post flows into exactly one branch based on its content. The “Other” category is particularly useful — if you notice many posts landing there over time, it’s a signal that you should create a new category to capture that type of content.
Tailor these categories to your specific use case. If you’re using this for lead generation in a product niche, your categories might be things like “Evaluation” (comparing tools), “Problem Report” (pain point that your product solves), “Competitor Mention,” and so on.
Writing Categorized Posts to Google Sheets
After the text classifier, each branch connects to a Google Sheets Append Row node that writes the post to the appropriate tab in your spreadsheet.
Set up your Google Sheet with five tabs — one per category. Each tab gets columns for: Title, Link, Up Votes (ratio), and Comments. This gives you a running log of what’s happening in the subreddit, organized by topic.
Important configuration note: use Append Row (not “Append or Update Row”) for this workflow. Since every post is new each time the workflow runs, there’s no row to match and update — you always want fresh rows appended. Using “Append or Update” without a match column will cause errors.
This categorized spreadsheet becomes a valuable research tool over time. You can see trends in beginner questions (what topics are people confused about?), track which workflows are being shared (content inspiration), and monitor which problems come up repeatedly (product feedback).
The Beginner Branch: AI-Powered Response Generation
The workflow goes one step further for posts in the “Beginner” category. Instead of just logging them to a spreadsheet, it generates a custom response for each post using an AI Agent — ready for you to post directly to Reddit.
After the text classifier routes a post to the Beginner branch, it flows into an AI Agent node configured with:
- Model: Claude Sonnet or similar — you want a capable model that can synthesize resources and write helpful, natural-sounding replies
- System Prompt: Instructs the agent to act as a helpful community member, analyze the question, search a resource spreadsheet for relevant content, and craft a 4-sentence reply
- Tool: Google Sheets read access to a “Resources” tab containing your library of tutorial links, courses, and articles
The key insight here is the Google Sheets resource library. You build a simple spreadsheet with two columns: Topic and URL. You might include your YouTube tutorials, course links, documentation pages, and any other reference material. The AI agent searches this sheet when crafting its reply, ensuring responses include specific, relevant links rather than generic advice.
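Conceptually, the lookup the agent performs can be sketched as a keyword-overlap score over the sheet rows. This is an illustration, not what the AI agent literally executes; the `{ topic, url }` row shape mirrors the two-column sheet described above.

```javascript
// Score each resource row by how many of its Topic keywords appear in the post
// text, then return the best matches.
function findResources(postText, resources, maxResults = 2) {
  const text = postText.toLowerCase();
  return resources
    .map(row => ({
      ...row,
      score: row.topic
        .split(',')
        .filter(kw => text.includes(kw.trim().toLowerCase())).length,
    }))
    .filter(row => row.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxResults);
}
```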
A Structured Output Parser extracts just the response text from the agent’s output, making it easy to pass cleanly to the next node.
Building Your Resource Library for the AI Agent
The AI Agent’s Google Sheets resource lookup is what separates a generic bot reply from a genuinely helpful one. Here’s how to set it up:
Create a “Resources” tab in your Google Sheet with columns: Topic (keywords describing what the resource covers) and URL (link to the resource). Populate it with your best content. Some examples for an n8n community:
- Topic: “getting started, beginner, n8n basics” → URL: your 17-hour n8n course
- Topic: “RAG, embeddings, vector database, AI memory” → URL: your RAG tutorial video
- Topic: “webhooks, HTTP, API integration” → URL: your webhook tutorial
- Topic: “Apify, web scraping, data collection” → URL: your Apify tutorial
The AI agent reads the post title and body, identifies the key topics being asked about, searches the Resources tab for matching rows, and incorporates the relevant URLs into a natural, helpful response.
Over time, as you see what questions come through the Beginner branch, you can expand your resource library to cover more topics. The result is a continuously improving response system that’s grounded in your actual content.
Storing and Managing Responses
The generated response for each beginner post gets written to a separate Google Sheet (or a new tab in the same sheet) with columns for: Title, Reddit Link, Generated Response, and Responded (a checkbox you manually tick when you’ve actually posted the reply on Reddit).
This gives you a simple queue of Reddit posts that need responses. Each morning you can open the sheet, see which new beginner questions came in overnight, review the AI-generated reply, tweak it if needed, and copy-paste it into Reddit.
The “Responded” column serves as your tracking mechanism — once you’ve posted a reply, mark it done. This prevents you from accidentally responding to the same post twice on subsequent workflow runs.
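You can also guard against duplicates in the workflow itself: read the existing rows with a Google Sheets node, then drop any incoming post whose link is already in the sheet before appending. A pure-function sketch (field names assumed):

```javascript
// Filter out posts whose link already exists in the response sheet.
function onlyNewPosts(posts, existingLinks) {
  const seen = new Set(existingLinks);
  return posts.filter(post => !seen.has(post.link));
}
```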
For high-volume use, you can also route the generated response to Telegram for immediate notification — paste the message into Telegram on your phone, copy it, and reply directly from Reddit mobile. This is particularly useful if you’re running the workflow hourly and want to engage while posts are still fresh.
Telegram Notifications (Optional)
Both approaches in this tutorial support Telegram notifications as an optional final step. Here’s when it makes sense:
If running hourly: Telegram notifications are genuinely useful. You get an instant ping whenever a relevant new post hits, allowing you to engage while the post is still gaining traction. Early, helpful replies tend to get upvoted much more than replies posted hours later.
If running once daily: Telegram notifications add less value since you’re already reviewing the spreadsheet each morning. The extra noise may not be worth it.
To set up Telegram in n8n: create a Telegram bot via BotFather, get your Bot Token and your personal Chat ID, add a Telegram node, configure the credentials, and set the message field to the formatted post or response text. The n8n Telegram node is one of the simplest integrations in the platform — setup typically takes under five minutes.
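If you'd rather call the Telegram Bot API from an HTTP Request node instead of the Telegram node, the request shape is simple: a POST to the bot's sendMessage endpoint with a chat_id and text. A sketch (token and chat ID are placeholders):

```javascript
// Build the sendMessage request for the Telegram Bot API.
function buildSendMessage(botToken, chatId, text) {
  return {
    url: `https://api.telegram.org/bot${botToken}/sendMessage`,
    method: 'POST',
    body: { chat_id: chatId, text },
  };
}
```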
Scheduling and Running the Workflows
Both workflows are designed to run on an automated schedule. Here’s the recommended cadence for each:
Approach 1 (FetchRSS): Once daily. Reddit’s top posts don’t change so rapidly that you need more frequent checks, and FetchRSS’s feed refresh rate limits you anyway. A daily run at mid-morning (e.g., 9–10am) catches posts that gained traction overnight.
Approach 2 (Apify): Once daily for research and lead generation use cases. Hourly if you’re actively doing community engagement and want to reply to posts while they’re fresh. Keep in mind that more frequent Apify runs increase your platform credit usage — set maxItems lower (5–10) for hourly runs to keep costs manageable.
In n8n, the Schedule Trigger node lets you set your interval and specific time. Both workflows also include a Manual Trigger so you can kick off a run on demand — useful when you’re testing or when you want an immediate digest outside your normal schedule.
Use Cases for Reddit Scraping with n8n
The workflows in this tutorial are built around monitoring the n8n subreddit, but the same patterns apply to dozens of other use cases:
Lead Generation: If you sell a B2B product, monitor subreddits where your potential customers hang out. When someone posts asking for tool recommendations in your space, your AI agent can generate a personalized, helpful reply that naturally introduces your solution.
Competitor Intelligence: Monitor subreddits where your competitors are mentioned. Track feature requests, complaints, and comparisons to understand what your market cares about.
Content Research: Use the “Workflow” and “Tips” categories to surface content ideas. When you see the same question asked three times across different posts, that’s a video topic waiting to happen.
Customer Support Monitoring: If your product has a subreddit or frequently gets mentioned in related communities, monitor for support questions and generate draft responses for your team to review and post.
Market Research: Aggregate posts from multiple related subreddits into a single spreadsheet. Run trend analysis on what topics are gaining or losing traction over time.
Tips, Troubleshooting, and Cost Management
Control Apify costs from day one. The $45/month Reddit Scraper subscription gives you access, but you still pay Apify platform credits per run. Keep maxItems low (10–20) while testing. Scale up once you’ve confirmed the workflow does what you need. Always set a monthly budget cap in your Apify account settings.
Use “Append Row” not “Append or Update Row” in Google Sheets. For new posts, you always want to append fresh rows. “Append or Update” requires a column to match on — if you don’t configure it, you’ll get errors on every run.
The Other category is a signal, not a waste. Posts that land in “Other” tell you something important — your classifier doesn’t have the right categories yet. Save them, review them weekly, and expand your category list when you notice patterns.
FetchRSS pinned posts will always appear. Most subreddits keep 2–3 megathread or weekly discussion posts pinned at the top. These will appear in every RSS fetch. Add a Filter node to exclude posts with titles matching common pinned post patterns (e.g., “Weekly Thread,” “Megathread,” “Self-Promotion”).
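The title-pattern filter described above can be a small Code node instead of a Filter node. The patterns here are examples from the text; extend the list to match your subreddit's recurring threads.

```javascript
// Match common pinned-post title patterns, case-insensitively.
const PINNED_PATTERNS = [/weekly thread/i, /megathread/i, /self-promotion/i];

function isPinnedStyle(title) {
  return PINNED_PATTERNS.some(re => re.test(title || ''));
}

// In a Code node: return $input.all().filter(item => !isPinnedStyle(item.json.title));
```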
AI responses need human review before posting. The generated responses are drafts — always read them before posting to Reddit. A response that references a slightly wrong resource or misunderstands the question can damage your credibility in the community more than no response at all.
