Claude Code for Data Science: 7 Real Examples (2026)
If you have spent years running the same EDA, cleaning messy data, and writing the same statistical tests on repeat, Claude Code for data science might be the tool that changes how you work. This guide walks through 7 practical examples built specifically for data scientists, from exploratory data analysis all the way to feature engineering.
Each example includes the prompt, the workflow, and what Claude Code actually produced.
What is Claude Code?
Claude Code is an agentic coding assistant built by Anthropic that runs directly inside your terminal or IDE. Unlike a chatbot, it can read files from your project, write and execute code, and iterate on results without you manually copying anything.
For data scientists, this means you can describe a task in plain English and Claude Code will generate the Python, run it, check the output, and fix errors automatically. It works seamlessly in VS Code, which is where most data scientists already live.
For the workflows in this guide, Claude Code differs from Cursor in one key way: in every example below it left behind a reusable Python script alongside the results, so each completed task yields a file you can re-run anytime. For repetitive data workflows, this adds up fast.
Example 1: Exploratory Data Analysis (EDA) with Claude Code
Exploratory data analysis is often the first thing you do with a new dataset and one of the most time-consuming. Claude Code handles the full EDA in a single prompt.
The Prompt
Here is the prompt used for a 260-row e-commerce sales dataset with columns for order ID, customer name, product category, region, order date, revenue, quantity, and status:
I have an e-commerce sales dataset called ecommerce_sales.csv. Please run a full EDA analysis. I want to see: dataset shape, dtypes, and a summary of nulls, descriptive statistics, descriptive plots, correlation heatmap, top 5 products, revenue breakdown, flag any outliers or data quality issues, save all charts as PNG, and write me a summary of key findings.
What Claude Code Produced
Claude Code generated around 150 lines of Python using Pandas, NumPy, and Matplotlib. Within a few minutes it delivered:
- Dataset overview: 260 rows, 11 columns, 68 total null values
- Revenue stats: mean and median, right-skewed distribution, one negative value flagged as a likely refund
- 29 IQR outliers above the upper fence, top outliers near 450 and 900
- Top 5 products: Monitor, Golf Clubs, Bicycle, Sweater, Jeans
- Top categories: Sports, Home, Electronics, Clothing
- Regional breakdown, quantity stats, discount range (0-20%)
- Data quality flags: 1 negative revenue, 48 missing discounts, 17 missing revenue values
- Order status: 9% canceled, 16% pending
- 4 saved PNG charts: distribution plots, correlation heatmap, product categories, revenue by region
- A reusable EDA Python script you can run on updated versions of the same dataset
EDA is a beginner-level skill, but it takes serious time. Claude Code brings it down from 30+ minutes of manual Pandas work to a few minutes of reviewing results. The script it leaves behind means you never have to repeat that setup again. For more, see our guide on working with complex Pandas data structures.
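The null summary and IQR outlier checks listed above can be sketched in a few lines of Pandas. This is a minimal illustration, not the script Claude Code generated: the `revenue` values are made up, and the 1.5 × IQR fences are the conventional rule, assumed here because the results mention an "upper fence".

```python
import pandas as pd

# Made-up sample standing in for the revenue column of ecommerce_sales.csv
df = pd.DataFrame({"revenue": [20, 25, 30, 35, 40, 45, 50, 450, None, -15]})

# Null summary per column
null_counts = df.isna().sum()

# IQR fences: values beyond 1.5 * IQR from the quartiles count as outliers
q1, q3 = df["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

high_outliers = df[df["revenue"] > upper_fence]
negative_revenue = df[df["revenue"] < 0]  # possible refunds, worth flagging

print(null_counts)
print(f"Fences: [{lower_fence}, {upper_fence}]")
print(high_outliers)
print(negative_revenue)
```

Missing values are skipped by `quantile`, so the fences are computed from the non-null rows only.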
Example 2: Data Cleaning in One Prompt
Data cleaning is the part of the job nobody talks about. You get a dataset from a legacy system with mixed-case names, phone numbers in five different formats, dates that look like they came from different countries, and inconsistent boolean values.
Normally, you spend 30 minutes to 2 hours writing Pandas code to clean all of it. Claude Code for data science lets you describe what clean data looks like instead of writing all of that yourself.
The Prompt
I have a messy customer dataset. Please clean it completely. Standardize customer name to title case. Fix invalid email formats and flag invalid ones. Standardize phone numbers to (XXX) XXX-XXXX. Parse all sign-up date values into YYYY-MM-DD. Clean annual revenue: remove $ and commas, convert to float. Standardize country to full country name. Fix age: remove impossible values. Standardize is_active to True/False. Remove duplicate rows. Show a before-and-after comparison and save as cleaned_customers.csv.
Results
On a 51-row dataset with deliberately messy data, Claude Code delivered:
- Cleaned dataset with 49 rows after removing 2 exact duplicates (including ones that were not adjacent in the file)
- Phone numbers all standardized to one format
- Dates parsed from multiple international formats into ISO 8601
- Annual revenue stripped of currency symbols and converted to float
- Country field expanded from abbreviations like US, U.S., USA to United States
- A new email_valid column flagging invalid entries
- A reusable Python cleaning script built with Pandas, regex, and dateutil
The real win here is the Python script. If you have the same data structure updated monthly, you can re-run the cleaner without touching a line of code.
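To give a feel for what such a cleaning script contains, here is a minimal sketch of four of the rules from the prompt, written with the standard library only. The function names, the accepted true/false spellings, and the simple email pattern are assumptions for illustration, not what Claude Code produced.

```python
import re

def clean_phone(raw):
    """Normalize a US phone number to (XXX) XXX-XXXX, or None if unusable."""
    digits = re.sub(r"\D", "", str(raw))
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

def clean_revenue(raw):
    """Strip $, commas, and whitespace, then convert to float."""
    text = re.sub(r"[$,\s]", "", str(raw))
    try:
        return float(text)
    except ValueError:
        return None

# Assumed spellings; extend these sets to match the real data
TRUTHY = {"true", "yes", "y", "1", "active"}
FALSY = {"false", "no", "n", "0", "inactive"}

def clean_bool(raw):
    """Map common yes/no spellings to True/False, or None if unrecognized."""
    text = str(raw).strip().lower()
    if text in TRUTHY:
        return True
    if text in FALSY:
        return False
    return None

# Deliberately loose pattern: something@something.tld with no spaces
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def email_valid(raw):
    """Flag whether a value looks like an email address."""
    return bool(EMAIL_RE.match(str(raw).strip()))
```

With Pandas, each helper can be applied to a column with `Series.map`, and `DataFrame.drop_duplicates()` removes exact duplicate rows whether or not they are adjacent.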
Example 3: Charts and a PowerPoint Deck From Raw Data
After analyzing data, you have to present it. Context-switching between Python and PowerPoint, manually styling charts, and fighting with alignment is one of the most annoying parts of a data scientist’s job. Claude Code for data science handles all of it in one prompt.
The Prompt
I have a monthly revenue CSV. Please create: a line chart, bar chart, heatmap, and summary chart. Make the charts presentation-ready with proper titles, labels, and a consistent color scheme, and save as PNG. Then create a PowerPoint presentation with a title slide, one slide per chart with 2-3 sentences of insight below each, and a final summary slide with 3 key takeaways. Save as revenue.pptx.
Claude Code built the charts in matplotlib, assembled the deck using python-pptx, and delivered a 6-slide PowerPoint with title, four chart slides with auto-generated commentary, and a summary. It also saved a Python file you can re-run whenever the underlying data updates.
If you produce the same deck weekly or quarterly, you can fully automate this. The Python script Claude Code generates is the automation. For interactive visualizations, you can also pair this with a Streamlit bar chart for live dashboards.
Example 4: Writing and Running SQL Queries From Plain English
Every data scientist has a complicated relationship with SQL. Window functions, complex joins, and CTEs are powerful but slow to write, especially when documentation on existing queries is sparse.
Claude Code lets you describe what you want in plain English and it writes the SQL, loads your data, runs the queries, and shows the results.
The Setup
Three CSV files were loaded into an in-memory SQLite database. Claude Code ran five business queries:
- Show total revenue by customer tier
- Which 10 customers have spent the most money in the last 6 months?
- What is the month-over-month revenue growth?
- Find customers who have placed more than 3 orders
- What is the average order value by region?
Claude Code returned the SQL for each query alongside the results. The queries included GROUP BY, JOINs, a CTE for month-over-month growth, and date filtering. It also flagged when a query needed a workaround, noting the dataset had no refund status so it used canceled as the closest proxy.
This approach works especially well when you need to answer ad-hoc business questions fast. Describe the question, get the SQL and the answer together.
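As an illustration of the pattern, here is a minimal sketch of the month-over-month growth question against an in-memory SQLite database. The `orders` table, its columns, and the four rows are invented for the example, and the query is one reasonable way to write it with a CTE, not the SQL Claude Code returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2025-01-05", 100.0),
        (2, "2025-01-20", 100.0),
        (3, "2025-02-11", 250.0),
        (4, "2025-03-02", 200.0),
    ],
)

# The CTE aggregates revenue per month; the self-join pairs each month
# with the one before it so growth can be computed row by row.
MOM_SQL = """
WITH monthly AS (
    SELECT strftime('%Y-%m', order_date) AS month,
           SUM(revenue) AS revenue
    FROM orders
    GROUP BY month
)
SELECT cur.month,
       cur.revenue,
       ROUND(100.0 * (cur.revenue - prev.revenue) / prev.revenue, 1) AS growth_pct
FROM monthly AS cur
LEFT JOIN monthly AS prev
       ON prev.month = strftime('%Y-%m', cur.month || '-01', '-1 month')
ORDER BY cur.month
"""

rows = conn.execute(MOM_SQL).fetchall()
for month, revenue, growth_pct in rows:
    print(month, revenue, growth_pct)
```

The first month has no predecessor, so its growth comes back as NULL (`None` in Python) instead of a misleading zero.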
Example 5: Pulling Data From External APIs
A lot of valuable data is locked behind APIs. Claude Code for data science handles the full workflow: writing the request, handling authentication, loading the response into a DataFrame, and saving it.
CoinGecko Crypto Data Example
This example uses the CoinGecko public API, which requires no login or API key:
Use the CoinGecko API to grab the top 20 cryptocurrencies by market cap. Load the data into a DataFrame, create a bar chart of market cap values, and save the full dataset as crypto_data.csv.
Claude Code wrote a Python script using the requests library, passed the correct parameters (top 20, USD, sorted by market cap descending), loaded the response into a Pandas DataFrame, built a bar chart using matplotlib, and saved the CSV.
The same pattern works with any API. For APIs that need authentication, give Claude Code the endpoint docs and tell it which credentials to use. For APIs with rate limiting or pagination, describe that in the prompt and Claude Code will account for it.
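The request-and-flatten steps can be sketched as below. To stay dependency-free, this version uses `urllib` from the standard library where the generated script used `requests`. It assumes CoinGecko's `/coins/markets` endpoint still accepts unauthenticated requests, and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.coingecko.com/api/v3/coins/markets"

def build_url(top_n=20, currency="usd"):
    """Build the request URL: top N coins, sorted by market cap descending."""
    params = {
        "vs_currency": currency,
        "order": "market_cap_desc",
        "per_page": top_n,
        "page": 1,
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"

def fetch_top_coins(top_n=20, currency="usd", timeout=10):
    """Call the API and return the decoded JSON list (needs network access)."""
    with urllib.request.urlopen(build_url(top_n, currency), timeout=timeout) as resp:
        return json.load(resp)

def to_rows(payload):
    """Keep only the fields needed for the chart and the CSV."""
    keep = ("market_cap_rank", "name", "symbol", "current_price", "market_cap")
    return [{key: coin.get(key) for key in keep} for coin in payload]
```

Calling `to_rows(fetch_top_coins())` yields a list of dicts that `pandas.DataFrame` accepts directly. For a paginated API, the same structure extends to a loop that increments `page` until a response comes back empty.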
Example 6: Web Scraping With Python
When data is not available through an API, you often need to scrape it. Claude Code is strong at writing web scrapers because it handles the HTML parsing, data extraction, and output formatting in a single prompt.
Wikipedia Largest Companies Scraper
Please scrape the Wikipedia page for a list of largest companies by revenue. Extract the main table with company name, rank, revenue, profit, employees, industry, and country. Load it into a DataFrame, show the top 10 companies, create a bar chart, group by country and industry, and save as a CSV. Handle any formatting issues with numbers.
Claude Code used BeautifulSoup and requests to scrape the table, cleaned the numeric formatting, and produced the CSV and horizontal bar chart in one pass. Results showed Amazon and Walmart at the top, with the US and China dominating the list.
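The numeric cleanup is the fiddly part of a scraper like this. A minimal sketch of that step, assuming the table cells contain currency symbols, thousands separators, footnote markers such as `[4]`, and the Unicode minus sign; the function name is illustrative.

```python
import re

def clean_number(raw):
    """Convert a scraped table cell such as '$648,125[4]' to a float, or None."""
    text = re.sub(r"\[[^\]]*\]", "", raw)  # drop footnote markers like [4]
    text = text.replace("\u2212", "-")     # Unicode minus sign to ASCII hyphen
    text = re.sub(r"[^\d.\-]", "", text)   # keep only digits, dot, and sign
    try:
        return float(text)
    except ValueError:
        return None                        # empty or malformed cell

print(clean_number("$648,125[4]"))
print(clean_number("\u22122,300"))
print(clean_number("N/A"))
```

Returning `None` for unparseable cells keeps bad values visible as missing data in the DataFrame instead of silently turning them into zeros.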
For sites with Cloudflare protection, a back-and-forth conversation in Claude Code can usually surface a working approach. It may take a few rounds of debugging, but it gets there.
Example 7: Feature Engineering for Machine Learning
Feature engineering is where most of the real lift in machine learning happens. It requires reasoning about the business problem, not just running a formula. The question is: what hidden signals in this data predict the outcome?
Claude Code for data science can do that reasoning out loud, suggest meaningful features based on domain knowledge, and build them in Python immediately.
Customer Churn Feature Engineering
I have a customer churn dataset. I want to do feature engineering before building a model. Step 1: analyze existing features and their relationship to the target. Step 2: suggest and create at least 8 new engineered features based on SaaS domain knowledge. Step 3: show which new features are most correlated with churn. Step 4: save the enriched dataset.
Claude Code analyzed the raw features, applied SaaS-specific reasoning (customers with many support tickets but low product usage are high churn risk), and created 8+ new columns including ratios, rolling counts, and interaction terms.
This is where Claude Code's data science skills shine. It does not just generate boilerplate; it reasons about which signals matter for the specific problem you are solving.
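A minimal sketch of that kind of feature, assuming a churn table with hypothetical columns `support_tickets`, `monthly_logins`, `tenure_months`, `monthly_fee`, and `churned`. The six rows and the thresholds are made up to show the mechanics, not taken from the dataset used above.

```python
import pandas as pd

# Made-up rows; column names are assumptions for illustration
df = pd.DataFrame({
    "support_tickets": [0, 1, 8, 6, 2, 9],
    "monthly_logins":  [40, 35, 3, 5, 30, 2],
    "tenure_months":   [24, 18, 3, 4, 12, 2],
    "monthly_fee":     [50, 50, 80, 80, 50, 80],
    "churned":         [0, 0, 1, 1, 0, 1],
})

# Ratio: support load relative to product usage (+1 avoids division by zero)
df["tickets_per_login"] = df["support_tickets"] / (df["monthly_logins"] + 1)

# Interaction flag: many tickets combined with low usage
df["high_tickets_low_usage"] = (
    (df["support_tickets"] >= 5) & (df["monthly_logins"] < 10)
).astype(int)

# Rough revenue to date: tenure times monthly fee
df["lifetime_value"] = df["tenure_months"] * df["monthly_fee"]

# Rank the new features by absolute correlation with the target
new_cols = ["tickets_per_login", "high_tickets_low_usage", "lifetime_value"]
corr = (
    df[new_cols + ["churned"]]
    .corr()["churned"]
    .drop("churned")
    .abs()
    .sort_values(ascending=False)
)
print(corr)
```

On real data, correlation with the target is only a first screen; a feature that tracks churn this closely on a small sample deserves a check for leakage before it goes into a model.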
Claude Code vs Cursor for Data Science: Which Should You Use?
Claude Code vs Cursor for data science comes down to what you need from your AI coding tool.
Cursor is a full IDE built on top of VS Code with AI baked in. It is excellent for editing existing code, navigating large codebases, and inline suggestions. If you are a software engineer who also does data work, Cursor fits well.
Claude Code works as a terminal agent and VS Code extension. The key advantage for data scientists: in every example above it produced a reusable Python file alongside the output. Each task you run becomes a script you can re-run, schedule, or share. For data workflows that repeat weekly or monthly, this is the more practical choice.
If you want interactive chat with your codebase, Cursor wins. If you want automated, reproducible data pipelines built from plain English prompts, Claude Code is the better fit for data science workflows at scale.
Frequently Asked Questions
Can Claude Code run Python code for data science directly?
Yes. Claude Code writes Python scripts and executes them in your environment, using whatever libraries are installed there, such as Pandas, NumPy, Matplotlib, scikit-learn, and Seaborn. Beyond having those packages available, there is little to configure: describe the task and point it at your data file.
How do I use Claude Code for data science in VS Code?
Install the Claude Code extension from the VS Code marketplace. Open your project folder, open the Claude Code panel, and type your task in plain English. Claude Code reads your local files, writes the code, runs it, and shows the results in the terminal. You can accept or reject any file changes before they are applied.
Is Claude Code better than ChatGPT for data science?
Claude Code is purpose-built for agentic coding, meaning it can execute code directly in your project rather than just suggesting it. For data science workflows where you want to run analysis on local files, generate outputs, and iterate without copying code manually, Claude Code is the more direct fit. For quick one-off questions, both work well.
What data science tasks is Claude Code best for?
Claude Code handles EDA, data cleaning, visualization, SQL generation, API data pulls, web scraping, feature engineering, and report generation extremely well. It is strongest on tasks that are repetitive and well-defined. Open-ended research questions still benefit from human judgment.
Does Claude Code work with large datasets?
Claude Code processes whatever your local Python environment can handle. For very large datasets (millions of rows), guide it toward chunk-based processing or database-backed approaches. For datasets up to a few hundred thousand rows, it performs standard Pandas operations without issue.
Start Using Claude Code for Data Science Today
Claude Code for data science is not about replacing your skills. It is about eliminating the repetitive parts of the job so you can focus on the analysis that actually requires your judgment.
The 7 examples above (EDA, cleaning, visualizations, SQL, APIs, web scraping, and feature engineering) cover the tasks most data scientists spend the majority of their time on. Each one can be reduced from hours to minutes.
Start with the task you repeat most often. Write a clear, detailed prompt. Run it. Check the output. Adjust. Within a few sessions, you will have a library of reusable scripts built around your specific workflows.
