How to Scrape Instagram Data: 3 Methods with Code Examples

Instagram has 2 billion monthly active users. That's a goldmine of public data—profiles, posts, engagement metrics, hashtags, and comments.

But getting that data isn't straightforward. Instagram's official API is extremely limited. Most developers need to find alternatives.

In this guide, I'll show you 3 proven methods to scrape Instagram data, from DIY scraping to APIs that handle everything for you.

New to scraping? Start with our social media scraping overview to understand the fundamentals.

What Instagram Data Can You Scrape?

Before we dive into methods, here's what's actually accessible:

Data Type	What You Get
Profiles	Username, bio, follower count, following count, post count, profile picture, verified status
Posts	Images, videos, captions, likes, comments count, timestamp, location, hashtags
Reels	Video URL, views, likes, comments, audio info, duration
Comments	Comment text, author, likes, replies, timestamp
Hashtags	Post count, top posts, recent posts
Stories	Images, videos (public accounts only)

All of this is public data—the same information anyone can see by visiting an Instagram profile.

Method 1: DIY Scraping with Puppeteer

The hands-on approach. You control everything, but you also handle everything—proxies, rate limits, CAPTCHAs, and Instagram's anti-bot systems.

Setup

npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

Basic Profile Scraper

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

async function scrapeInstagramProfile(username) {
  const browser = await puppeteer.launch({ 
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  
  const page = await browser.newPage();
  
  // Set realistic viewport and user agent
  await page.setViewport({ width: 1366, height: 768 });
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  );
  
  try {
    await page.goto(`https://www.instagram.com/${username}/`, {
      waitUntil: 'networkidle2',
      timeout: 30000
    });
    
    // Wait for profile data to load
    await page.waitForSelector('header section', { timeout: 10000 });
    
    // Extract profile data from the page
    const profileData = await page.evaluate(() => {
      const header = document.querySelector('header section');
      
      // Get follower/following counts
      const stats = header.querySelectorAll('ul li');
      const getCount = (element) => {
        const text = element?.innerText || '0';
        const match = text.match(/[\d,]+/);
        return match ? parseInt(match[0].replace(/,/g, '')) : 0;
      };
      
      return {
        username: document.querySelector('header h2')?.innerText,
        fullName: document.querySelector('header section span')?.innerText,
        bio: document.querySelector('header section > div > span')?.innerText,
        posts: getCount(stats[0]),
        followers: getCount(stats[1]),
        following: getCount(stats[2]),
        profilePic: document.querySelector('header img')?.src,
        isVerified: !!document.querySelector('header svg[aria-label="Verified"]'),
        scrapedAt: new Date().toISOString()
      };
    });
    
    return profileData;
    
  } catch (error) {
    console.error('Scraping failed:', error.message);
    return null;
  } finally {
    await browser.close();
  }
}

// Usage
scrapeInstagramProfile('instagram')
  .then(data => console.log(JSON.stringify(data, null, 2)));
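The getCount helper above keeps only digits and commas, so a count that Instagram displays in abbreviated form, such as "283M", comes back as 283. Here is a minimal sketch of a more forgiving parser. It assumes English-locale text with K/M/B suffixes, and parseCount is an illustrative name, not part of any library:

```javascript
// Parse a displayed Instagram count such as "1,234", "12.5K" or "283M followers".
// Assumption: English-locale text (comma thousands separator, K/M/B suffixes).
function parseCount(text) {
  // A number with optional commas/decimals, then an optional K/M/B suffix at a word boundary
  const match = String(text ?? '').match(/(\d[\d,]*(?:\.\d+)?)([KMB])?\b/i);
  if (!match) return 0;

  const value = parseFloat(match[1].replace(/,/g, ''));
  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const suffix = match[2] ? match[2].toUpperCase() : null;

  // Round to avoid floating-point artifacts from the multiplication
  return Math.round(suffix ? value * multipliers[suffix] : value);
}

console.log(parseCount('12.5K'));          // 12500
console.log(parseCount('283M followers')); // 283000000
```

Functions defined in your Node.js script are not visible inside page.evaluate(), because that callback runs in the browser page. You can either return the raw stat text from page.evaluate() and parse it in Node, or paste the helper into the evaluate callback.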

The Problems with DIY Scraping

Rate limiting - Instagram may block an IP after roughly 100-200 requests
Login walls - Many pages require authentication
CAPTCHAs - Frequent challenges that break automation
Proxy management - You need rotating residential proxies ($$$)
Constant maintenance - Instagram changes its HTML frequently
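Retrying a blocked request immediately only adds to the traffic that triggered the block. A common mitigation is exponential backoff with random jitter. The sketch below is a generic illustration rather than a tested anti-blocking recipe. The delays are arbitrary defaults, not Instagram's actual limits, and the helper names are hypothetical:

```javascript
// "Full jitter" backoff: wait a random time in [0, min(capMs, baseMs * 2^attempt)).
// Default delays are illustrative, not tuned to Instagram's real limits.
function backoffDelay(attempt, baseMs = 1000, capMs = 60000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async task, retrying failed attempts with increasing random delays.
async function retryWithBackoff(task, options = {}) {
  const {
    maxAttempts = 5,
    baseMs = 1000,
    capMs = 60000,
    isRetryable = () => true, // e.g. only retry timeouts or HTTP 429 responses
    wait = sleep,             // injectable so tests can skip real delays
  } = options;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      const isLastAttempt = attempt === maxAttempts - 1;
      if (isLastAttempt || !isRetryable(error)) throw error;
      await wait(backoffDelay(attempt, baseMs, capMs));
    }
  }
  throw new Error('maxAttempts must be at least 1');
}
```

scrapeInstagramProfile above catches its own errors and returns null. To retry it this way, the task would need to rethrow, or throw when it gets null back.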

Estimated cost: $200-500/month for proxies alone, plus your development time.

Want to avoid these headaches? Learn how to scrape Instagram without getting blocked or skip to Method 3.

Method 2: Instagram's Official API (Graph API)

The "legitimate" approach. Limited but stable.

What You Can Access

With a Facebook Developer account and approved app:

Your own account's insights (if business/creator)
Basic profile info of users who authorized your app
Comments on your own posts
Media you've published

What You CAN'T Access

Other users' followers/following lists
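Post engagement for other users' personal accounts (the Business Discovery endpoint only covers business and creator accounts)
Unrestricted hashtag search (the Hashtag Search endpoint, available to business and creator accounts, is capped at 30 unique hashtags per rolling 7-day period)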
Reels data
Comments on others' posts

Setup

// Instagram Graph API - Basic Profile
const accessToken = 'YOUR_ACCESS_TOKEN';
const userId = 'YOUR_USER_ID';

async function getOwnProfile() {
  const response = await fetch(
    `https://graph.instagram.com/${userId}?fields=id,username,account_type,media_count&access_token=${accessToken}`
  );
  
  return response.json();
}

// Get your own media
async function getOwnMedia() {
  const response = await fetch(
    `https://graph.instagram.com/${userId}/media?fields=id,caption,media_type,media_url,timestamp,like_count,comments_count&access_token=${accessToken}`
  );
  
  return response.json();
}
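getOwnMedia only returns the first page of results. Graph API edges such as /media use cursor-based pagination. Each response carries a paging object, and paging.next, when present, is a ready-to-call URL for the next page. The sketch below follows those links. The page cap is a safety assumption, and fetchAllPages is a hypothetical helper:

```javascript
// Collect every item from a paginated Graph API edge by following paging.next.
// Assumes each page looks like { data: [...], paging: { next: "<url>" } }.
async function fetchAllPages(firstUrl, { maxPages = 10, fetchImpl = globalThis.fetch } = {}) {
  const items = [];
  let url = firstUrl;

  for (let page = 0; url && page < maxPages; page++) {
    const response = await fetchImpl(url);
    if (!response.ok) {
      throw new Error(`Graph API request failed with HTTP ${response.status}`);
    }
    const body = await response.json();
    items.push(...(body.data || []));
    url = body.paging?.next; // absent on the last page
  }
  return items;
}

// Usage, reusing userId and accessToken from the snippet above:
// const allMedia = await fetchAllPages(
//   `https://graph.instagram.com/${userId}/media?fields=id,caption,timestamp&access_token=${accessToken}`
// );
```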

Verdict: Only useful if you need your own account data or are building an app where users log in with Instagram.

For a deeper comparison, see: Instagram Official vs Private API

Method 3: Instagram Scraping API (Recommended)

The practical solution. An API handles all the infrastructure—proxies, rate limits, CAPTCHAs, browser automation—and you just make HTTP requests.

Why Use an API?

DIY Scraping	Scraping API
Manage proxies yourself	Proxies included
Handle CAPTCHAs	CAPTCHAs handled
Fix when Instagram changes	Always maintained
Blocks after roughly 100-200 requests	No per-IP blocking to manage
$200-500/month infrastructure	Pay per request

Getting Instagram Profile Data

const API_KEY = 'your_sociavault_api_key';

async function getInstagramProfile(username) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/profile?username=${username}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json'
      }
    }
  );
  
  const data = await response.json();
  return data;
}

// Usage (top-level await requires an ES module, e.g. a .mjs file; built-in fetch needs Node 18+)
const profile = await getInstagramProfile('natgeo');
console.log(profile);

/* Response:
{
  "success": true,
  "data": {
    "username": "natgeo",
    "full_name": "National Geographic",
    "biography": "Experience the world through the eyes of National Geographic photographers.",
    "follower_count": 283000000,
    "following_count": 134,
    "media_count": 28947,
    "is_verified": true,
    "is_business_account": true,
    "profile_pic_url": "https://...",
    "external_url": "https://natgeo.com"
  }
}
*/
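Every endpoint example in this section repeats the same fetch setup. Below is a sketch of a shared helper. It builds the query string with URLSearchParams, which also percent-encodes values such as post URLs, checks the HTTP status, and unwraps the success/data envelope from the sample response above. It assumes that envelope applies to every endpoint. The helper name and error handling are illustrative, not part of SociaVault's documented API:

```javascript
// Hypothetical shared helper for the SociaVault endpoints used in this section.
// Assumes the { success, data } envelope shown in the sample response.
const SOCIAVAULT_BASE = 'https://api.sociavault.com/v1/scrape/instagram';

async function sociavaultGet(endpoint, params, { apiKey, fetchImpl = globalThis.fetch } = {}) {
  // URLSearchParams stringifies and percent-encodes every value
  const query = new URLSearchParams(params).toString();
  const url = `${SOCIAVAULT_BASE}/${endpoint}?${query}`;

  const response = await fetchImpl(url, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!response.ok) {
    throw new Error(`SociaVault ${endpoint} request failed with HTTP ${response.status}`);
  }

  const body = await response.json();
  if (body.success === false) {
    throw new Error(`SociaVault ${endpoint} request returned success: false`);
  }
  return body.data;
}

// Usage (inside an async function or ES module):
// const profile = await sociavaultGet('profile', { username: 'natgeo' }, { apiKey: API_KEY });
// const comments = await sociavaultGet('comments', { url: postUrl, limit: 100 }, { apiKey: API_KEY });
```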

Getting Instagram Posts

async function getInstagramPosts(username, limit = 12) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/posts?username=${username}&limit=${limit}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json'
      }
    }
  );
  
  return response.json();
}

// Get latest 20 posts from a profile
const posts = await getInstagramPosts('nike', 20);

posts.data.forEach(post => {
  console.log({
    type: post.media_type,
    caption: post.caption?.substring(0, 100),
    likes: post.like_count,
    comments: post.comment_count,
    url: post.post_url
  });
});
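Once you have the posts, a common next step is finding the best performers. Below is a minimal sketch that ranks posts by likes plus comments. This is a simple engagement proxy, not an official metric, and the sketch assumes the like_count, comment_count, and post_url fields used in the loop above:

```javascript
// Rank posts by likes + comments, a simple engagement proxy.
// Field names follow the posts example above; missing counts are treated as 0.
function topPostsByEngagement(posts, n = 3) {
  return posts
    .map((post) => ({
      url: post.post_url,
      engagement: (post.like_count || 0) + (post.comment_count || 0),
    }))
    .sort((a, b) => b.engagement - a.engagement) // sorts the mapped copy, not the input
    .slice(0, n);
}

// Example: topPostsByEngagement(posts.data, 5)
```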

Scraping Instagram Reels

async function getInstagramReels(username, limit = 10) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/reels?username=${username}&limit=${limit}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`
      }
    }
  );
  
  return response.json();
}

// Get reels with engagement data
const reels = await getInstagramReels('mrbeast', 10);

reels.data.forEach(reel => {
  console.log({
    views: reel.view_count,
    likes: reel.like_count,
    comments: reel.comment_count,
    duration: reel.duration,
    videoUrl: reel.video_url
  });
});
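Because reels expose view counts, you can normalize engagement by views instead of followers. The sketch below assumes the view_count, like_count, comment_count, and video_url fields from the example above. The metric itself (likes plus comments per view) is one common convention, not a value the API returns:

```javascript
// Likes plus comments per view for each reel; reels with no view count get null
// rather than a division by zero. The metric is a convention, not an API field.
function reelEngagementPerView(reels) {
  return reels.map((reel) => {
    const interactions = (reel.like_count || 0) + (reel.comment_count || 0);
    const views = reel.view_count || 0;
    return {
      videoUrl: reel.video_url,
      views,
      engagementPerView: views > 0 ? interactions / views : null,
    };
  });
}

// Example: reelEngagementPerView(reels.data) with the reels fetched above
```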

Getting Post Comments

async function getPostComments(postUrl, limit = 100) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/comments?url=${encodeURIComponent(postUrl)}&limit=${limit}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`
      }
    }
  );
  
  return response.json();
}

// Analyze comments on a viral post
const comments = await getPostComments('https://instagram.com/p/ABC123');

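// Rough keyword-based sentiment check (not true sentiment analysis)
const positive = comments.data.filter(c => 
  /love|amazing|great|awesome|❤️|🔥|👏/i.test(c.text || '')
).length;

// Guard against posts with no comments to avoid dividing by zero
const share = comments.data.length ? (positive / comments.data.length) * 100 : 0;
console.log(`Positive sentiment: ${share.toFixed(1)}%`);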
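The check above counts a comment as positive whenever a keyword appears anywhere in its text, so "glove" matches "love", and it never counts negative comments. Below is a slightly stricter sketch. It is still keyword-based rather than real sentiment analysis, and the word and emoji lists are illustrative assumptions:

```javascript
// Crude keyword-based sentiment: score each comment by positive and negative keyword hits.
// The word and emoji lists are illustrative assumptions; tune them for your niche.
const POSITIVE_WORDS = new Set(['love', 'amazing', 'great', 'awesome']);
const NEGATIVE_WORDS = new Set(['hate', 'awful', 'terrible', 'worst']);
const POSITIVE_EMOJI = ['❤️', '🔥', '👏'];

function classifyComment(text) {
  const lower = (text || '').toLowerCase();
  // Whole-word matching, so "glove" does not count as "love"
  const words = lower.match(/[a-z']+/g) || [];
  let score = POSITIVE_EMOJI.filter((emoji) => lower.includes(emoji)).length;
  for (const word of words) {
    if (POSITIVE_WORDS.has(word)) score += 1;
    if (NEGATIVE_WORDS.has(word)) score -= 1;
  }
  return score > 0 ? 'positive' : score < 0 ? 'negative' : 'neutral';
}

function sentimentBreakdown(comments) {
  const counts = { positive: 0, negative: 0, neutral: 0 };
  for (const comment of comments) counts[classifyComment(comment.text)] += 1;
  return counts;
}

// Example: sentimentBreakdown(comments.data)
```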

Python Example

import requests

API_KEY = 'your_sociavault_api_key'
BASE_URL = 'https://api.sociavault.com/v1/scrape/instagram'

def get_instagram_profile(username):
    response = requests.get(
        f'{BASE_URL}/profile',
        params={'username': username},
        headers={'Authorization': f'Bearer {API_KEY}'},
        timeout=30,  # don't hang forever on a stalled connection
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()

def get_instagram_posts(username, limit=12):
    response = requests.get(
        f'{BASE_URL}/posts',
        params={'username': username, 'limit': limit},
        headers={'Authorization': f'Bearer {API_KEY}'},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

def get_hashtag_posts(hashtag, limit=50):
    response = requests.get(
        f'{BASE_URL}/hashtag',
        params={'tag': hashtag, 'limit': limit},
        headers={'Authorization': f'Bearer {API_KEY}'},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Usage
profile = get_instagram_profile('cristiano')
print(f"Followers: {profile['data']['follower_count']:,}")

posts = get_instagram_posts('cristiano', 10)
for post in posts['data']:
    caption = post.get('caption') or ''  # caption may be missing or null
    print(f"Likes: {post['like_count']:,} | {caption[:50]}...")

Complete Example: Instagram Competitor Analysis

Here's a practical script that scrapes competitor data for analysis:

const API_KEY = process.env.SOCIAVAULT_API_KEY;

async function analyzeCompetitors(competitors) {
  const results = [];
  
  for (const username of competitors) {
    console.log(`Analyzing @${username}...`);
    
    // Get profile data
    const profileRes = await fetch(
      `https://api.sociavault.com/v1/scrape/instagram/profile?username=${username}`,
      { headers: { 'Authorization': `Bearer ${API_KEY}` } }
    );
    const profile = await profileRes.json();
    
    // Get recent posts for engagement calculation
    const postsRes = await fetch(
      `https://api.sociavault.com/v1/scrape/instagram/posts?username=${username}&limit=12`,
      { headers: { 'Authorization': `Bearer ${API_KEY}` } }
    );
    const posts = await postsRes.json();
    
    // Skip accounts with no posts returned (avoids dividing by zero below)
    const items = posts.data || [];
    if (items.length === 0) {
      console.warn(`No posts returned for @${username}, skipping`);
      continue;
    }
    
    // Calculate average engagement (missing counts treated as 0)
    const totalLikes = items.reduce((sum, post) => sum + (post.like_count || 0), 0);
    const totalComments = items.reduce((sum, post) => sum + (post.comment_count || 0), 0);
    const avgEngagement = (totalLikes + totalComments) / items.length;
    const engagementRate = (avgEngagement / profile.data.follower_count) * 100;
    
    // Posting frequency: span between newest and oldest post, whatever order the API returns
    // (assumes timestamps that new Date() can parse, e.g. ISO 8601 strings)
    const times = items.map(p => new Date(p.timestamp).getTime());
    const daysBetween = (Math.max(...times) - Math.min(...times)) / (1000 * 60 * 60 * 24);
    const postsPerWeek = daysBetween > 0 ? (items.length / daysBetween) * 7 : 0;
    
    results.push({
      username: profile.data.username,
      followers: profile.data.follower_count,
      following: profile.data.following_count,
      totalPosts: profile.data.media_count,
      avgLikes: Math.round(totalLikes / items.length),
      avgComments: Math.round(totalComments / items.length),
      engagementRate: engagementRate.toFixed(2) + '%',
      postsPerWeek: postsPerWeek.toFixed(1),
      isVerified: profile.data.is_verified
    });
    
    // Rate limit courtesy
    await new Promise(r => setTimeout(r, 500));
  }
  
  // Sort by engagement rate
  results.sort((a, b) => parseFloat(b.engagementRate) - parseFloat(a.engagementRate));
  
  return results;
}

// Analyze fitness influencers
const competitors = ['kaikifit', 'whitneyysimmons', 'brittany_perille'];

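analyzeCompetitors(competitors)
  .then(results => {
    // Nothing to show or export if every account was skipped
    if (results.length === 0) {
      console.log('No results to export');
      return;
    }
    console.table(results);
    
    // Export to CSV
    const csv = [
      Object.keys(results[0]).join(','),
      ...results.map(r => Object.values(r).join(','))
    ].join('\n');
    
    require('fs').writeFileSync('competitor-analysis.csv', csv);
    console.log('Saved to competitor-analysis.csv');
  })
  .catch(err => console.error('Analysis failed:', err.message));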
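The CSV export above joins values with commas directly. That works for these numeric fields but would break on a value that itself contains a comma or quote, such as a bio or caption. Below is a small escaping sketch that follows RFC 4180 quoting rules, using "\n" line endings. The helper names are illustrative:

```javascript
// Quote any field containing a comma, double quote, or line break,
// and escape embedded quotes by doubling them (RFC 4180 quoting rules).
function escapeCsvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const lines = [
    headers.map(escapeCsvField).join(','),
    ...rows.map((row) => headers.map((h) => escapeCsvField(row[h])).join(',')),
  ];
  return lines.join('\n');
}

// Example: require('fs').writeFileSync('competitor-analysis.csv', toCsv(results));
```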

Storing Scraped Data

Once you have the data, you need somewhere to put it:

SQLite (Simple)

// npm install better-sqlite3
const Database = require('better-sqlite3');
const db = new Database('instagram_data.db');

// Create tables
db.exec(`
  CREATE TABLE IF NOT EXISTS profiles (
    username TEXT PRIMARY KEY,
    full_name TEXT,
    biography TEXT,
    follower_count INTEGER,
    following_count INTEGER,
    media_count INTEGER,
    is_verified INTEGER,
    scraped_at TEXT
  );
  
  CREATE TABLE IF NOT EXISTS posts (
    post_id TEXT PRIMARY KEY,
    username TEXT,
    caption TEXT,
    like_count INTEGER,
    comment_count INTEGER,
    media_type TEXT,
    timestamp TEXT,
    scraped_at TEXT
  );
`);

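// Insert or update a profile (explicit column list keeps values aligned if the schema changes)
function saveProfile(profile) {
  const stmt = db.prepare(`
    INSERT OR REPLACE INTO profiles
      (username, full_name, biography, follower_count, following_count, media_count, is_verified, scraped_at)
    VALUES (?, ?, ?, ?, ?, ?, ?, datetime('now'))
  `);
  
  stmt.run(
    profile.username,
    profile.full_name ?? null, // coalesce missing fields to NULL explicitly
    profile.biography ?? null,
    profile.follower_count ?? null,
    profile.following_count ?? null,
    profile.media_count ?? null,
    profile.is_verified ? 1 : 0 // SQLite has no boolean type; store 0/1
  );
}

// Insert posts inside one transaction (much faster than committing each row separately)
function savePosts(username, posts) {
  const stmt = db.prepare(`
    INSERT OR REPLACE INTO posts
      (post_id, username, caption, like_count, comment_count, media_type, timestamp, scraped_at)
    VALUES (?, ?, ?, ?, ?, ?, ?, datetime('now'))
  `);
  
  const insertAll = db.transaction((items) => {
    for (const post of items) {
      stmt.run(
        post.id,
        username,
        post.caption ?? null,
        post.like_count ?? null,
        post.comment_count ?? null,
        post.media_type ?? null,
        post.timestamp ?? null
      );
    }
  });
  
  insertAll(posts);
}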

Export to Google Sheets

// npm install googleapis
const { google } = require('googleapis');

async function exportToSheets(data, spreadsheetId, range) {
  if (data.length === 0) return; // nothing to export
  
  // keyFile is assumed to be a service-account key; share the spreadsheet with that account's email
  const auth = new google.auth.GoogleAuth({
    keyFile: 'credentials.json',
    scopes: ['https://www.googleapis.com/auth/spreadsheets']
  });
  
  const sheets = google.sheets({ version: 'v4', auth });
  
  // Convert data to rows: a header row from the first object's keys, then one row per item
  const headers = Object.keys(data[0]);
  const rows = [headers, ...data.map(item => headers.map(h => item[h]))];
  
  await sheets.spreadsheets.values.update({
    spreadsheetId,
    range,
    valueInputOption: 'RAW',
    requestBody: { values: rows }
  });
  
  console.log('Data exported to Google Sheets');
}
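The row conversion in exportToSheets takes its columns from data[0] only. A field that is missing from the first record but present in later ones never reaches the sheet. The sketch below builds the header from every record instead; objectsToRows is a hypothetical helper name:

```javascript
// Convert an array of objects into a 2D array for the Sheets API.
// Columns are the union of keys across all objects, in first-seen order,
// so fields missing from the first record are not silently dropped.
function objectsToRows(data) {
  const headers = [];
  for (const item of data) {
    for (const key of Object.keys(item)) {
      if (!headers.includes(key)) headers.push(key);
    }
  }
  // Missing values become empty strings so every row has the same length
  return [headers, ...data.map((item) => headers.map((h) => item[h] ?? ''))];
}
```

In exportToSheets, the two lines that build headers and rows could then become a single call: const rows = objectsToRows(data);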

Legal Considerations

Scraping public Instagram data is generally legal when you:

✅ Only access publicly available information
✅ Don't bypass authentication or access controls
✅ Respect rate limits and don't overload servers
✅ Don't use data for harassment or spam
✅ Comply with GDPR/CCPA for personal data

Read our full guide: Is Web Scraping Legal?

Which Method Should You Choose?

Scenario	Best Method
Learning/experimenting	DIY with Puppeteer
Need your own account data	Official Graph API
Production app	Scraping API
Large-scale data collection	Scraping API
One-time research	Scraping API

Getting Started

Sign up at sociavault.com
Get 50 free credits to test
Copy your API key from the dashboard
Start scraping with the examples above

Frequently Asked Questions

Is it legal to scrape Instagram data?

Yes, scraping publicly available Instagram data is generally legal. In hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act (CFAA). However, you should never bypass login walls or scrape private accounts. See our complete Instagram scraping legal guide.

How do I scrape Instagram followers?

You can scrape Instagram follower lists using an API. Our guide to scraping Instagram followers covers three methods with code examples for exporting follower data.

What's the best Instagram scraping API?

SociaVault is built specifically for social media scraping with Instagram support. See our comparison of the best social media scraping APIs for alternatives.

Can I scrape Instagram Reels?

Yes! You can scrape Instagram Reels including view counts, likes, comments, video URLs, and audio information. The API method shown above handles Reels extraction.

Related guides:

How to Scrape Instagram Data: 3 Methods with Code Examples (2026)
