How to Scrape Instagram Data: 3 Methods with Code Examples
Instagram has 2 billion monthly active users. That's a goldmine of public data—profiles, posts, engagement metrics, hashtags, and comments.
But getting that data isn't straightforward. Instagram's official API is extremely limited. Most developers need to find alternatives.
In this guide, I'll show you 3 proven methods to scrape Instagram data, from DIY scraping to APIs that handle everything for you.
New to scraping? Start with our social media scraping overview to understand the fundamentals.
What Instagram Data Can You Scrape?
Before we dive into methods, here's what's actually accessible:
| Data Type | What You Get |
|---|---|
| Profiles | Username, bio, follower count, following count, post count, profile picture, verified status |
| Posts | Images, videos, captions, likes, comments count, timestamp, location, hashtags |
| Reels | Video URL, views, likes, comments, audio info, duration |
| Comments | Comment text, author, likes, replies, timestamp |
| Hashtags | Post count, top posts, recent posts |
| Stories | Images, videos (public accounts only) |
All of this is public data—the same information anyone can see by visiting an Instagram profile.
Method 1: DIY Scraping with Puppeteer
The hands-on approach. You control everything, but you also handle everything—proxies, rate limits, CAPTCHAs, and Instagram's anti-bot systems.
Setup
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
Basic Profile Scraper
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

async function scrapeInstagramProfile(username) {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  const page = await browser.newPage();

  // Set realistic viewport and user agent
  await page.setViewport({ width: 1366, height: 768 });
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  );

  try {
    await page.goto(`https://www.instagram.com/${username}/`, {
      waitUntil: 'networkidle2',
      timeout: 30000
    });

    // Wait for profile data to load
    await page.waitForSelector('header section', { timeout: 10000 });

    // Extract profile data from the page
    const profileData = await page.evaluate(() => {
      const header = document.querySelector('header section');

      // Get follower/following counts
      const stats = header.querySelectorAll('ul li');
      // Reads only the leading digits, so an abbreviated count such as "1.2M" yields 1
      const getCount = (element) => {
        const text = element?.innerText || '0';
        const match = text.match(/[\d,]+/);
        return match ? parseInt(match[0].replace(/,/g, ''), 10) : 0;
      };

      return {
        username: document.querySelector('header h2')?.innerText,
        fullName: document.querySelector('header section span')?.innerText,
        bio: document.querySelector('header section > div > span')?.innerText,
        posts: getCount(stats[0]),
        followers: getCount(stats[1]),
        following: getCount(stats[2]),
        profilePic: document.querySelector('header img')?.src,
        isVerified: !!document.querySelector('header svg[aria-label="Verified"]'),
        scrapedAt: new Date().toISOString()
      };
    });

    return profileData;
  } catch (error) {
    console.error('Scraping failed:', error.message);
    return null;
  } finally {
    // Always release the browser, even when scraping fails
    await browser.close();
  }
}

// Usage
scrapeInstagramProfile('instagram')
  .then(data => console.log(JSON.stringify(data, null, 2)));
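Instagram often abbreviates large numbers in the page (for example "283M followers"), and the getCount helper above keeps only the leading digits. The sketch below shows one way to expand such values. The function name parseCount and the K/M/B suffix set are assumptions for illustration; localized pages may format counts differently.

```javascript
// Hypothetical helper: convert display strings like "12.5K" or "1,234" to integers.
// Assumes English-style formatting with optional K, M, or B suffixes.
function parseCount(text) {
  const cleaned = String(text || '').replace(/,/g, '');
  // Number, optional suffix directly attached, then a word boundary so that
  // the "m" in "500 media" is not mistaken for a multiplier.
  const match = cleaned.match(/(\d+(?:\.\d+)?)([KMB])?\b/i);
  if (!match) return 0;

  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const factor = match[2] ? multipliers[match[2].toUpperCase()] : 1;
  return Math.round(parseFloat(match[1]) * factor);
}
```

To use it inside page.evaluate, the function would have to be defined within the evaluated callback, since that code runs in the browser context.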
The Problems with DIY Scraping
- Rate limiting - Instagram blocks IPs after ~100-200 requests
- Login walls - Many pages require authentication
- CAPTCHAs - Frequent challenges that break automation
- Proxy management - You need rotating residential proxies ($$$)
- Constant maintenance - Instagram changes its HTML frequently, which breaks CSS selectors
Estimated cost: $200-500/month for proxies alone, plus your development time.
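A common first response to rate limiting is to retry failed requests with exponential backoff and random jitter. The following is a minimal sketch of that pattern, not a way around Instagram's blocks. The names backoffDelay and withRetry, and the default delays, are assumptions chosen for illustration.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// Uses "full jitter": a random value between 0 and the capped exponential ceiling.
function backoffDelay(attempt, baseMs = 1000, capMs = 60000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical wrapper: run an async task, retrying on any thrown error.
async function withRetry(task, { retries = 4, baseMs = 1000, capMs = 60000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= retries) throw err; // give up after the last allowed retry
      const delay = backoffDelay(attempt, baseMs, capMs);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```

For this to help with the scraper above, the scraping function would need to throw on failure rather than return null, for example `withRetry(() => scrapeOrThrow('instagram'))`, where scrapeOrThrow is a placeholder for such a variant.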
Want to avoid these headaches? Learn how to scrape Instagram without getting blocked or skip to Method 3.
Method 2: Instagram's Official API (Graph API)
The "legitimate" approach. Limited but stable.
What You Can Access
With a Facebook Developer account and approved app:
- Your own account's insights (if business/creator)
- Basic profile info of users who authorized your app
- Comments on your own posts
- Media you've published
What You CAN'T Access
- Other users' followers/following lists
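- Other users' post engagement (beyond the limited public metrics that Business Discovery exposes for Business and Creator accounts)
- Hashtag search results beyond a small quota (hashtag search is capped at 30 unique hashtags per 7 days)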
- Reels data
- Comments on others' posts
Setup
// Instagram Graph API - Basic Profile
const accessToken = 'YOUR_ACCESS_TOKEN';
const userId = 'YOUR_USER_ID';

async function getOwnProfile() {
  const response = await fetch(
    `https://graph.instagram.com/${userId}?fields=id,username,account_type,media_count&access_token=${accessToken}`
  );
  return response.json();
}

// Get your own media
async function getOwnMedia() {
  const response = await fetch(
    `https://graph.instagram.com/${userId}/media?fields=id,caption,media_type,media_url,timestamp,like_count,comments_count&access_token=${accessToken}`
  );
  return response.json();
}
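The media endpoint returns results one page at a time. Graph API list responses normally include a `paging.next` URL while more results remain, so collecting everything means following that URL until it disappears. Here is a minimal sketch of that loop. The function name collectAllPages, the maxPages safety limit, and the injectable fetchJson parameter are assumptions for illustration; the default relies on the global fetch available in Node 18+.

```javascript
// Hypothetical helper: follow `paging.next` links and gather every `data` item.
async function collectAllPages(
  firstUrl,
  fetchJson = (url) => fetch(url).then((res) => res.json()),
  maxPages = 10 // safety limit so a bad response cannot loop forever
) {
  const items = [];
  let url = firstUrl;

  for (let page = 0; url && page < maxPages; page++) {
    const body = await fetchJson(url);
    items.push(...(body.data || []));
    // Stop when the response no longer advertises a next page
    url = body.paging && body.paging.next ? body.paging.next : null;
  }

  return items;
}
```

Passing the fetch function as a parameter keeps the loop easy to test with canned responses and leaves room to add error handling in one place.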
Verdict: Only useful if you need your own account data or are building an app where users log in with Instagram.
For a deeper comparison, see: Instagram Official vs Private API
Method 3: Instagram Scraping API (Recommended)
The practical solution. An API handles all the infrastructure—proxies, rate limits, CAPTCHAs, browser automation—and you just make HTTP requests.
Why Use an API?
| DIY Scraping | Scraping API |
|---|---|
| Manage proxies yourself | Proxies included |
| Handle CAPTCHAs | CAPTCHAs handled |
| Fix when Instagram changes | Always maintained |
| 100-200 requests before blocks | Unlimited requests |
| $200-500/month infrastructure | Pay per request |
Getting Instagram Profile Data
const API_KEY = 'your_sociavault_api_key';

async function getInstagramProfile(username) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/profile?username=${username}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json'
      }
    }
  );
  const data = await response.json();
  return data;
}

// Usage (top-level await requires an ES module, e.g. a .mjs file;
// in CommonJS, wrap these lines in an async function)
const profile = await getInstagramProfile('natgeo');
console.log(profile);

/* Response:
{
  "success": true,
  "data": {
    "username": "natgeo",
    "full_name": "National Geographic",
    "biography": "Experience the world through the eyes of National Geographic photographers.",
    "follower_count": 283000000,
    "following_count": 134,
    "media_count": 28947,
    "is_verified": true,
    "is_business_account": true,
    "profile_pic_url": "https://...",
    "external_url": "https://natgeo.com"
  }
}
*/
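The request above places the username directly into the query string and never checks the HTTP status. A small wrapper can encode parameters and surface failed requests early. The sketch below assumes the same base URL used in this article's examples; the names buildRequestUrl and callApi are hypothetical and not part of any SDK.

```javascript
const BASE_URL = 'https://api.sociavault.com/v1/scrape/instagram';

// Hypothetical helper: build an endpoint URL with safely encoded query parameters.
function buildRequestUrl(endpoint, params = {}) {
  const url = new URL(`${BASE_URL}/${endpoint}`);
  for (const [key, value] of Object.entries(params)) {
    // Skip parameters that were not provided
    if (value !== undefined && value !== null) {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// Hypothetical wrapper: perform the request and throw on non-2xx responses.
async function callApi(endpoint, params, apiKey) {
  const response = await fetch(buildRequestUrl(endpoint, params), {
    headers: { Authorization: `Bearer ${apiKey}` }
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

With this in place, a profile lookup becomes `callApi('profile', { username: 'natgeo' }, API_KEY)`.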
Getting Instagram Posts
async function getInstagramPosts(username, limit = 12) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/posts?username=${username}&limit=${limit}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json'
      }
    }
  );
  return response.json();
}

// Get latest 20 posts from a profile
const posts = await getInstagramPosts('nike', 20);

posts.data.forEach(post => {
  console.log({
    type: post.media_type,
    caption: post.caption?.substring(0, 100),
    likes: post.like_count,
    comments: post.comment_count,
    url: post.post_url
  });
});
Scraping Instagram Reels
async function getInstagramReels(username, limit = 10) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/reels?username=${username}&limit=${limit}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`
      }
    }
  );
  return response.json();
}

// Get reels with engagement data
const reels = await getInstagramReels('mrbeast', 10);

reels.data.forEach(reel => {
  console.log({
    views: reel.view_count,
    likes: reel.like_count,
    comments: reel.comment_count,
    duration: reel.duration,
    videoUrl: reel.video_url
  });
});
Getting Post Comments
async function getPostComments(postUrl, limit = 100) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/comments?url=${encodeURIComponent(postUrl)}&limit=${limit}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`
      }
    }
  );
  return response.json();
}

// Analyze comments on a viral post
const comments = await getPostComments('https://instagram.com/p/ABC123');

// Naive keyword-based sentiment estimate
const items = comments.data || [];
const positive = items.filter(c =>
  /love|amazing|great|awesome|❤️|🔥|👏/i.test(c.text || '')
).length;

// Guard against dividing by zero when no comments come back
const share = items.length ? (positive / items.length) * 100 : 0;
console.log(`Positive sentiment: ${share.toFixed(1)}%`);
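The keyword match above only counts positive comments. A slightly fuller version tallies positive, negative, and neutral comments by comparing keyword hits. This is still a rough heuristic rather than real sentiment analysis; the word lists and the names classifyComment and summarizeSentiment are assumptions for illustration.

```javascript
// Hypothetical keyword lists; extend them for your own domain.
const POSITIVE_WORDS = ['love', 'amazing', 'great', 'awesome'];
const NEGATIVE_WORDS = ['hate', 'awful', 'terrible', 'worst'];

// Label one comment by comparing how many words fall in each list.
function classifyComment(text) {
  const words = (text || '').toLowerCase().match(/[a-z']+/g) || [];
  const pos = words.filter(w => POSITIVE_WORDS.includes(w)).length;
  const neg = words.filter(w => NEGATIVE_WORDS.includes(w)).length;
  if (pos > neg) return 'positive';
  if (neg > pos) return 'negative';
  return 'neutral';
}

// Count labels across an array of comment objects with a `text` field.
function summarizeSentiment(comments) {
  const counts = { positive: 0, negative: 0, neutral: 0 };
  for (const comment of comments) {
    counts[classifyComment(comment.text)] += 1;
  }
  return counts;
}
```

Matching whole words avoids false hits such as "glove" counting as "love", at the cost of missing emoji and misspellings.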
Python Example
import requests

API_KEY = 'your_sociavault_api_key'
BASE_URL = 'https://api.sociavault.com/v1/scrape/instagram'


def get_instagram_profile(username):
    response = requests.get(
        f'{BASE_URL}/profile',
        params={'username': username},
        headers={'Authorization': f'Bearer {API_KEY}'}
    )
    return response.json()


def get_instagram_posts(username, limit=12):
    response = requests.get(
        f'{BASE_URL}/posts',
        params={'username': username, 'limit': limit},
        headers={'Authorization': f'Bearer {API_KEY}'}
    )
    return response.json()


def get_hashtag_posts(hashtag, limit=50):
    response = requests.get(
        f'{BASE_URL}/hashtag',
        params={'tag': hashtag, 'limit': limit},
        headers={'Authorization': f'Bearer {API_KEY}'}
    )
    return response.json()


# Usage
profile = get_instagram_profile('cristiano')
print(f"Followers: {profile['data']['follower_count']:,}")

posts = get_instagram_posts('cristiano', 10)
for post in posts['data']:
    # A post can have no caption, so fall back to an empty string before slicing
    caption = post.get('caption') or ''
    print(f"Likes: {post['like_count']:,} | {caption[:50]}...")
Complete Example: Instagram Competitor Analysis
Here's a practical script that scrapes competitor data for analysis:
const API_KEY = process.env.SOCIAVAULT_API_KEY;

async function analyzeCompetitors(competitors) {
  const results = [];

  for (const username of competitors) {
    console.log(`Analyzing @${username}...`);

    // Get profile data
    const profileRes = await fetch(
      `https://api.sociavault.com/v1/scrape/instagram/profile?username=${username}`,
      { headers: { 'Authorization': `Bearer ${API_KEY}` } }
    );
    const profile = await profileRes.json();

    // Get recent posts for engagement calculation
    const postsRes = await fetch(
      `https://api.sociavault.com/v1/scrape/instagram/posts?username=${username}&limit=12`,
      { headers: { 'Authorization': `Bearer ${API_KEY}` } }
    );
    const posts = await postsRes.json();
    const items = posts.data || [];
    const count = items.length || 1; // avoid dividing by zero when no posts come back

    // Calculate average engagement (missing counts are treated as 0)
    const totalLikes = items.reduce((s, p) => s + (p.like_count || 0), 0);
    const totalComments = items.reduce((s, p) => s + (p.comment_count || 0), 0);
    const avgEngagement = (totalLikes + totalComments) / count;
    const engagementRate = profile.data.follower_count
      ? (avgEngagement / profile.data.follower_count) * 100
      : 0;

    // Get posting frequency from the span between the newest and oldest post
    const times = items.map(p => new Date(p.timestamp).getTime());
    const daysBetween = (Math.max(...times) - Math.min(...times)) / (1000 * 60 * 60 * 24);
    const postsPerWeek = daysBetween > 0 ? (items.length / daysBetween) * 7 : 0;

    results.push({
      username: profile.data.username,
      followers: profile.data.follower_count,
      following: profile.data.following_count,
      totalPosts: profile.data.media_count,
      avgLikes: Math.round(totalLikes / count),
      avgComments: Math.round(totalComments / count),
      engagementRate: engagementRate.toFixed(2) + '%',
      postsPerWeek: postsPerWeek.toFixed(1),
      isVerified: profile.data.is_verified
    });

    // Rate limit courtesy
    await new Promise(r => setTimeout(r, 500));
  }

  // Sort by engagement rate
  results.sort((a, b) => parseFloat(b.engagementRate) - parseFloat(a.engagementRate));

  return results;
}

// Analyze fitness influencers
const competitors = ['kaikifit', 'whitneyysimmons', 'brittany_perille'];

analyzeCompetitors(competitors).then(results => {
  console.table(results);

  // Export to CSV
  const csv = [
    Object.keys(results[0]).join(','),
    ...results.map(r => Object.values(r).join(','))
  ].join('\n');

  require('fs').writeFileSync('competitor-analysis.csv', csv);
  console.log('Saved to competitor-analysis.csv');
});
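The CSV export above joins values with commas, which works for the numeric fields in this script but breaks as soon as a field such as a bio or caption contains a comma, quote, or line break. A minimal sketch of RFC 4180 style quoting follows; the function names csvField and toCsv are assumptions for illustration.

```javascript
// Hypothetical helper: quote a single value when it contains a special character.
function csvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  // Double any embedded quotes and wrap the field in quotes when needed
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical helper: turn an array of flat objects into CSV text.
// Column order comes from the keys of the first row.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const lines = [headers.map(csvField).join(',')];
  for (const row of rows) {
    lines.push(headers.map(h => csvField(row[h])).join(','));
  }
  return lines.join('\r\n');
}
```

In the script above, `toCsv(results)` could replace the hand-built `csv` string before the file is written.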
Storing Scraped Data
Once you have the data, you need somewhere to put it:
SQLite (Simple)
const Database = require('better-sqlite3');
const db = new Database('instagram_data.db');

// Create tables
db.exec(`
  CREATE TABLE IF NOT EXISTS profiles (
    username TEXT PRIMARY KEY,
    full_name TEXT,
    biography TEXT,
    follower_count INTEGER,
    following_count INTEGER,
    media_count INTEGER,
    is_verified INTEGER,
    scraped_at TEXT
  );

  CREATE TABLE IF NOT EXISTS posts (
    post_id TEXT PRIMARY KEY,
    username TEXT,
    caption TEXT,
    like_count INTEGER,
    comment_count INTEGER,
    media_type TEXT,
    timestamp TEXT,
    scraped_at TEXT
  );
`);

// Insert profile (placeholders follow the column order of the table)
function saveProfile(profile) {
  const stmt = db.prepare(`
    INSERT OR REPLACE INTO profiles
    VALUES (?, ?, ?, ?, ?, ?, ?, datetime('now'))
  `);
  stmt.run(
    profile.username,
    profile.full_name,
    profile.biography,
    profile.follower_count,
    profile.following_count,
    profile.media_count,
    profile.is_verified ? 1 : 0
  );
}

// Insert posts
function savePosts(username, posts) {
  const stmt = db.prepare(`
    INSERT OR REPLACE INTO posts
    VALUES (?, ?, ?, ?, ?, ?, ?, datetime('now'))
  `);

  // Run the whole batch in one transaction: it is atomic and avoids a commit per row
  const insertAll = db.transaction((rows) => {
    for (const post of rows) {
      stmt.run(
        post.id,
        username,
        post.caption,
        post.like_count,
        post.comment_count,
        post.media_type,
        post.timestamp
      );
    }
  });

  insertAll(posts);
}
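Scraped records are often incomplete, for example a post with no caption. As far as I know, better-sqlite3 accepts only numbers, strings, bigints, buffers, and null as bound values, so an `undefined` field would make the insert throw. The sketch below normalizes a post into a row before it is bound. The function name toPostRow is hypothetical, and the field names are the ones used in this article's examples.

```javascript
// Hypothetical helper: map an API post object to the column order of the posts table,
// replacing missing fields with null so they can be stored as SQL NULL.
function toPostRow(username, post) {
  const orNull = (value) => (value === undefined ? null : value);
  return [
    String(post.id), // post_id is declared as TEXT
    username,
    orNull(post.caption),
    orNull(post.like_count),
    orNull(post.comment_count),
    orNull(post.media_type),
    orNull(post.timestamp)
  ];
}
```

Inside savePosts, the call would then become `stmt.run(...toPostRow(username, post))`.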
Export to Google Sheets
const { google } = require('googleapis');

async function exportToSheets(data, spreadsheetId, range) {
  const auth = new google.auth.GoogleAuth({
    keyFile: 'credentials.json',
    scopes: ['https://www.googleapis.com/auth/spreadsheets']
  });

  const sheets = google.sheets({ version: 'v4', auth });

  // Convert data to rows
  const headers = Object.keys(data[0]);
  const rows = [headers, ...data.map(item => headers.map(h => item[h]))];

  await sheets.spreadsheets.values.update({
    spreadsheetId,
    range,
    valueInputOption: 'RAW',
    resource: { values: rows }
  });

  console.log('Data exported to Google Sheets');
}
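The row conversion above takes its headers from the first record only, so fields that appear later are dropped, and nested objects are passed through as-is. A more defensive conversion might look like the sketch below. The function name toSheetRows is an assumption for illustration.

```javascript
// Hypothetical helper: build a header row plus data rows from an array of objects.
// Headers are the union of all keys, in order of first appearance.
function toSheetRows(records) {
  const headers = [...new Set(records.flatMap(record => Object.keys(record)))];

  // Sheet cells hold scalars, so blank out missing values and serialize nested ones
  const toCell = (value) => {
    if (value === null || value === undefined) return '';
    return typeof value === 'object' ? JSON.stringify(value) : value;
  };

  return [headers, ...records.map(record => headers.map(h => toCell(record[h])))];
}
```

The result can be passed as the `values` array in the update call in place of `rows`.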
Legal Considerations
Scraping public Instagram data is generally legal when you:
- ✅ Only access publicly available information
- ✅ Don't bypass authentication or access controls
- ✅ Respect rate limits and don't overload servers
- ✅ Don't use data for harassment or spam
- ✅ Comply with GDPR/CCPA for personal data
Read our full guide: Is Web Scraping Legal?
Which Method Should You Choose?
| Scenario | Best Method |
|---|---|
| Learning/experimenting | DIY with Puppeteer |
| Need your own account data | Official Graph API |
| Production app | Scraping API |
| Large-scale data collection | Scraping API |
| One-time research | Scraping API |
Getting Started
- Sign up at sociavault.com
- Get 50 free credits to test
- Copy your API key from the dashboard
- Start scraping with the examples above
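Once you have a key, it is safer to read it from the environment than to paste it into source files. The competitor analysis script already reads `SOCIAVAULT_API_KEY` from `process.env`; the sketch below adds a fail-fast check so a missing key is reported before any request is made. The function name requireApiKey is hypothetical.

```javascript
// Hypothetical helper: return the API key from the environment or fail with a clear message.
function requireApiKey(env = process.env) {
  const key = (env.SOCIAVAULT_API_KEY || '').trim();
  if (!key) {
    throw new Error('Set the SOCIAVAULT_API_KEY environment variable first');
  }
  return key;
}
```

A typical run would then look like `SOCIAVAULT_API_KEY=your_key node script.js`, with `const API_KEY = requireApiKey();` at the top of the script.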
Frequently Asked Questions
Is it legal to scrape Instagram data?
Yes, scraping publicly available Instagram data is generally legal in the United States. In hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping publicly accessible data likely does not violate the CFAA. However, you should never bypass login walls or scrape private accounts. See our complete Instagram scraping legal guide.
How do I scrape Instagram followers?
You can scrape Instagram follower lists using an API. Our guide to scraping Instagram followers covers three methods with code examples for exporting follower data.
What's the best Instagram scraping API?
SociaVault is built specifically for social media scraping with Instagram support. See our comparison of the best social media scraping APIs for alternatives.
Can I scrape Instagram Reels?
Yes! You can scrape Instagram Reels including view counts, likes, comments, video URLs, and audio information. The API method shown above handles Reels extraction.