How to Scrape Instagram Data: 3 Methods with Code Examples (2026)

February 9, 2026
10 min read
By SociaVault Team
Tags: Instagram Scraping, Data Extraction, Python, JavaScript, API, Web Scraping, Instagram API, Scrape Instagram

Instagram has 2 billion monthly active users. That's a goldmine of public data—profiles, posts, engagement metrics, hashtags, and comments.

But getting that data isn't straightforward. Instagram's official API is extremely limited. Most developers need to find alternatives.

In this guide, I'll show you 3 proven methods to scrape Instagram data, from DIY scraping to APIs that handle everything for you.

New to scraping? Start with our social media scraping overview to understand the fundamentals.

What Instagram Data Can You Scrape?

Before we dive into methods, here's what's actually accessible:

Data Type | What You Get
Profiles | Username, bio, follower count, following count, post count, profile picture, verified status
Posts | Images, videos, captions, likes, comments count, timestamp, location, hashtags
Reels | Video URL, views, likes, comments, audio info, duration
Comments | Comment text, author, likes, replies, timestamp
Hashtags | Post count, top posts, recent posts
Stories | Images, videos (public accounts only)

All of this is public data—the same information anyone can see by visiting an Instagram profile.

Method 1: DIY Scraping with Puppeteer

The hands-on approach. You control everything, but you also handle everything—proxies, rate limits, CAPTCHAs, and Instagram's anti-bot systems.

Setup

npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

Basic Profile Scraper

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

async function scrapeInstagramProfile(username) {
  const browser = await puppeteer.launch({ 
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  
  const page = await browser.newPage();
  
  // Set realistic viewport and user agent
  await page.setViewport({ width: 1366, height: 768 });
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  );
  
  try {
    await page.goto(`https://www.instagram.com/${username}/`, {
      waitUntil: 'networkidle2',
      timeout: 30000
    });
    
    // Wait for profile data to load
    await page.waitForSelector('header section', { timeout: 10000 });
    
    // Extract profile data from the page
    const profileData = await page.evaluate(() => {
      const header = document.querySelector('header section');
      
      // Get follower/following counts
      const stats = header.querySelectorAll('ul li');
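      const getCount = (element) => {
        // Counts can be abbreviated ("1,234", "12.5K", "283M"), so read the suffix too
        const text = element?.innerText || '0';
        const match = text.match(/([\d.,]+)([KMB])?/i);
        if (!match) return 0;
        const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
        const value = parseFloat(match[1].replace(/,/g, ''));
        return Math.round(value * (multipliers[match[2]?.toUpperCase()] || 1)) || 0;
      };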
      
      return {
        username: document.querySelector('header h2')?.innerText,
        fullName: document.querySelector('header section span')?.innerText,
        bio: document.querySelector('header section > div > span')?.innerText,
        posts: getCount(stats[0]),
        followers: getCount(stats[1]),
        following: getCount(stats[2]),
        profilePic: document.querySelector('header img')?.src,
        isVerified: !!document.querySelector('header svg[aria-label="Verified"]'),
        scrapedAt: new Date().toISOString()
      };
    });
    
    return profileData;
    
  } catch (error) {
    console.error('Scraping failed:', error.message);
    return null;
  } finally {
    await browser.close();
  }
}

// Usage
scrapeInstagramProfile('instagram')
  .then(data => console.log(JSON.stringify(data, null, 2)));

The Problems with DIY Scraping

  1. Rate limiting - Instagram blocks IPs after ~100-200 requests
  2. Login walls - Many pages require authentication
  3. CAPTCHAs - Frequent challenges that break automation
  4. Proxy management - You need rotating residential proxies ($$$)
  5. Constant maintenance - Instagram changes its HTML frequently, which breaks selectors

Estimated cost: $200-500/month for proxies alone, plus your development time.
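
Rate limiting (problem 1) is usually softened by retrying with exponential backoff. Here is a minimal sketch, assuming a failed scrape shows up either as a thrown error or as a null result. `withRetry`, `retries`, `baseDelayMs`, and `sleep` are illustrative names, not part of Puppeteer.

```javascript
// Hypothetical helper: retry an async task with exponential backoff and jitter.
// `sleep` is injectable so the delay logic can be exercised without real waiting.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(task, { retries = 3, baseDelayMs = 2000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const result = await task(attempt);
      // Treat null/undefined as a failed attempt
      if (result !== null && result !== undefined) return result;
      lastError = new Error('Task returned no data');
    } catch (error) {
      lastError = error;
    }
    if (attempt < retries) {
      // Delay doubles each attempt (base, 2x base, 4x base), plus up to 25% random jitter
      const delay = baseDelayMs * 2 ** attempt;
      await sleep(delay + Math.random() * delay * 0.25);
    }
  }
  throw lastError;
}
```

Because `scrapeInstagramProfile` returns null on failure, it can be wrapped as `withRetry(() => scrapeInstagramProfile('instagram'))`. Backoff only spaces out attempts; it will not lift an IP block that is already in place.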

Want to avoid these headaches? Learn how to scrape Instagram without getting blocked or skip to Method 3.

Method 2: Instagram's Official API (Graph API)

The "legitimate" approach. Limited but stable.

What You Can Access

With a Facebook Developer account and an approved app, you can access:

  • Your own account's insights (if business/creator)
  • Basic profile info of users who authorized your app
  • Comments on your own posts
  • Media you've published

What You CAN'T Access

  • Other users' followers/following lists
  • Other users' post engagement
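  • Hashtag search at scale (the Graph API's hashtag search is capped at 30 unique hashtags per 7 days and requires a business or creator account)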
  • Reels data
  • Comments on others' posts

Setup

// Instagram Graph API - Basic Profile
const accessToken = 'YOUR_ACCESS_TOKEN';
const userId = 'YOUR_USER_ID';

async function getOwnProfile() {
  const response = await fetch(
    `https://graph.instagram.com/${userId}?fields=id,username,account_type,media_count&access_token=${accessToken}`
  );
  
  return response.json();
}

// Get your own media
async function getOwnMedia() {
  const response = await fetch(
    `https://graph.instagram.com/${userId}/media?fields=id,caption,media_type,media_url,timestamp,like_count,comments_count&access_token=${accessToken}`
  );
  
  return response.json();
}
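
The media call above returns only the first page of results. Graph API list responses include a `paging` object, and its `next` field holds the URL of the following page when one exists. A minimal sketch of following those links; `fetchAllPages`, `fetchImpl`, and `maxPages` are illustrative names introduced here.

```javascript
// Follows the `paging.next` URL in each list response until no further
// page is reported or `maxPages` is reached.
async function fetchAllPages(firstUrl, { fetchImpl = fetch, maxPages = 10 } = {}) {
  const items = [];
  let url = firstUrl;
  for (let page = 0; url && page < maxPages; page++) {
    const response = await fetchImpl(url);
    if (!response.ok) {
      throw new Error(`Graph API request failed with HTTP ${response.status}`);
    }
    const body = await response.json();
    items.push(...(body.data || []));
    // `paging.next` is absent on the last page
    url = body.paging?.next || null;
  }
  return items;
}
```

Pass the same URL that `getOwnMedia` builds as `firstUrl`. The `maxPages` cap is there so a large account does not burn through the app's rate limit in one run.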

Verdict: Only useful if you need your own account data or are building an app where users log in with Instagram.

For a deeper comparison, see: Instagram Official vs Private API

Method 3: Using a Scraping API

The practical solution. An API handles all the infrastructure—proxies, rate limits, CAPTCHAs, browser automation—and you just make HTTP requests.

Why Use an API?

DIY Scraping | Scraping API
Manage proxies yourself | Proxies included
Handle CAPTCHAs | CAPTCHAs handled
Fix when Instagram changes | Always maintained
100-200 requests before blocks | Unlimited requests
$200-500/month infrastructure | Pay per request

Getting Instagram Profile Data

const API_KEY = 'your_sociavault_api_key';

async function getInstagramProfile(username) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/profile?username=${username}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json'
      }
    }
  );
  
  const data = await response.json();
  return data;
}

// Usage (top-level await needs an ES module; in CommonJS, wrap these calls in an async function)
const profile = await getInstagramProfile('natgeo');
console.log(profile);

/* Response:
{
  "success": true,
  "data": {
    "username": "natgeo",
    "full_name": "National Geographic",
    "biography": "Experience the world through the eyes of National Geographic photographers.",
    "follower_count": 283000000,
    "following_count": 134,
    "media_count": 28947,
    "is_verified": true,
    "is_business_account": true,
    "profile_pic_url": "https://...",
    "external_url": "https://natgeo.com"
  }
}
*/
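
The profile example assumes every request succeeds. A sketch of a wrapper that checks both the HTTP status and the `success` flag from the sample response before returning `data`. The shape of a failure body is an assumption here (an optional `error` field), so check the API documentation for the real format. `fetchJsonOrThrow` is an illustrative name.

```javascript
// Hypothetical wrapper: throws on HTTP errors or when `success` is not true,
// otherwise returns the `data` payload.
async function fetchJsonOrThrow(url, apiKey, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    headers: { Authorization: `Bearer ${apiKey}` }
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  const body = await response.json();
  if (body.success !== true) {
    // `body.error` is an assumed field name, not confirmed by the sample response
    throw new Error(`API reported failure: ${body.error || 'unknown error'}`);
  }
  return body.data;
}
```

Calling it with the profile URL and `API_KEY` returns the inner object directly, so `follower_count` is read from the result rather than from `result.data`.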

Getting Instagram Posts

async function getInstagramPosts(username, limit = 12) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/posts?username=${username}&limit=${limit}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json'
      }
    }
  );
  
  return response.json();
}

// Get latest 20 posts from a profile
const posts = await getInstagramPosts('nike', 20);

posts.data.forEach(post => {
  console.log({
    type: post.media_type,
    caption: post.caption?.substring(0, 100),
    likes: post.like_count,
    comments: post.comment_count,
    url: post.post_url
  });
});

Scraping Instagram Reels

async function getInstagramReels(username, limit = 10) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/reels?username=${username}&limit=${limit}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`
      }
    }
  );
  
  return response.json();
}

// Get reels with engagement data
const reels = await getInstagramReels('mrbeast', 10);

reels.data.forEach(reel => {
  console.log({
    views: reel.view_count,
    likes: reel.like_count,
    comments: reel.comment_count,
    duration: reel.duration,
    videoUrl: reel.video_url
  });
});

Getting Post Comments

async function getPostComments(postUrl, limit = 100) {
  const response = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/comments?url=${encodeURIComponent(postUrl)}&limit=${limit}`,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`
      }
    }
  );
  
  return response.json();
}

// Analyze comments on a viral post
const comments = await getPostComments('https://instagram.com/p/ABC123');

// Sentiment analysis example
const positive = comments.data.filter(c => 
  c.text.match(/love|amazing|great|awesome|❤️|🔥|👏/i)
).length;

console.log(`Positive sentiment: ${(positive / comments.data.length * 100).toFixed(1)}%`);
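
The keyword match above counts positives only and divides by zero when a post has no comments. A slightly fuller sketch, still keyword-based and not a real sentiment model: it adds a negative list, treats mixed or unmatched comments as neutral, and guards the empty case. The negative word list and the name `summarizeSentiment` are illustrative assumptions.

```javascript
const POSITIVE = /love|amazing|great|awesome|❤️|🔥|👏/i;
const NEGATIVE = /hate|terrible|awful|worst|scam|👎/i; // illustrative list, tune for your data

// Returns the share of positive, negative, and neutral comments as percentages
function summarizeSentiment(comments) {
  const counts = { positive: 0, negative: 0, neutral: 0 };
  for (const comment of comments) {
    const text = comment.text || '';
    const isPositive = POSITIVE.test(text);
    const isNegative = NEGATIVE.test(text);
    if (isPositive && !isNegative) counts.positive++;
    else if (isNegative && !isPositive) counts.negative++;
    else counts.neutral++; // no keyword matched, or mixed signals
  }
  const total = comments.length;
  const pct = (n) => (total === 0 ? 0 : Number(((n / total) * 100).toFixed(1)));
  return {
    total,
    positive: pct(counts.positive),
    negative: pct(counts.negative),
    neutral: pct(counts.neutral)
  };
}
```

Pass it `comments.data` from `getPostComments`. Keyword matching misses sarcasm and any language other than English, so treat the numbers as a rough signal.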

Python Example

import requests

API_KEY = 'your_sociavault_api_key'
BASE_URL = 'https://api.sociavault.com/v1/scrape/instagram'

def get_instagram_profile(username):
    response = requests.get(
        f'{BASE_URL}/profile',
        params={'username': username},
        headers={'Authorization': f'Bearer {API_KEY}'}
    )
    return response.json()

def get_instagram_posts(username, limit=12):
    response = requests.get(
        f'{BASE_URL}/posts',
        params={'username': username, 'limit': limit},
        headers={'Authorization': f'Bearer {API_KEY}'}
    )
    return response.json()

def get_hashtag_posts(hashtag, limit=50):
    response = requests.get(
        f'{BASE_URL}/hashtag',
        params={'tag': hashtag, 'limit': limit},
        headers={'Authorization': f'Bearer {API_KEY}'}
    )
    return response.json()

# Usage
profile = get_instagram_profile('cristiano')
print(f"Followers: {profile['data']['follower_count']:,}")

posts = get_instagram_posts('cristiano', 10)
for post in posts['data']:
    caption = post.get('caption') or ''  # caption can be missing or None
    print(f"Likes: {post['like_count']:,} | {caption[:50]}...")

Complete Example: Instagram Competitor Analysis

Here's a practical script that scrapes competitor data for analysis:

const API_KEY = process.env.SOCIAVAULT_API_KEY;

async function analyzeCompetitors(competitors) {
  const results = [];
  
  for (const username of competitors) {
    console.log(`Analyzing @${username}...`);
    
    // Get profile data
    const profileRes = await fetch(
      `https://api.sociavault.com/v1/scrape/instagram/profile?username=${username}`,
      { headers: { 'Authorization': `Bearer ${API_KEY}` } }
    );
    const profile = await profileRes.json();
    
    // Get recent posts for engagement calculation
    const postsRes = await fetch(
      `https://api.sociavault.com/v1/scrape/instagram/posts?username=${username}&limit=12`,
      { headers: { 'Authorization': `Bearer ${API_KEY}` } }
    );
    const posts = await postsRes.json();
    
    // Calculate average engagement
    const totalEngagement = posts.data.reduce((sum, post) => {
      return sum + (post.like_count || 0) + (post.comment_count || 0);
    }, 0);
    
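    // Guard against empty post lists and zero follower counts
    const postCount = posts.data.length;
    const avgEngagement = postCount ? totalEngagement / postCount : 0;
    const followers = profile.data.follower_count;
    const engagementRate = followers ? (avgEngagement / followers) * 100 : 0;
    
    // Get posting frequency (uses min/max so the order of returned posts does not matter)
    const times = posts.data.map(p => new Date(p.timestamp).getTime());
    const daysBetween = (Math.max(...times) - Math.min(...times)) / (1000 * 60 * 60 * 24);
    const postsPerWeek = daysBetween > 0 ? (postCount / daysBetween) * 7 : 0;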
    
    results.push({
      username: profile.data.username,
      followers: profile.data.follower_count,
      following: profile.data.following_count,
      totalPosts: profile.data.media_count,
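      // Missing counts default to 0 so one incomplete post does not turn the average into NaN
      avgLikes: Math.round(posts.data.reduce((s, p) => s + (p.like_count || 0), 0) / (posts.data.length || 1)),
      avgComments: Math.round(posts.data.reduce((s, p) => s + (p.comment_count || 0), 0) / (posts.data.length || 1)),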
      engagementRate: engagementRate.toFixed(2) + '%',
      postsPerWeek: postsPerWeek.toFixed(1),
      isVerified: profile.data.is_verified
    });
    
    // Rate limit courtesy
    await new Promise(r => setTimeout(r, 500));
  }
  
  // Sort by engagement rate
  results.sort((a, b) => parseFloat(b.engagementRate) - parseFloat(a.engagementRate));
  
  return results;
}

// Analyze fitness influencers
const competitors = ['kaikifit', 'whitneyysimmons', 'brittany_perille'];

analyzeCompetitors(competitors).then(results => {
  console.table(results);
  
  // Export to CSV
  const csv = [
    Object.keys(results[0]).join(','),
    ...results.map(r => Object.values(r).join(','))
  ].join('\n');
  
  require('fs').writeFileSync('competitor-analysis.csv', csv);
  console.log('Saved to competitor-analysis.csv');
});
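
The CSV export above joins values with plain commas. That works for the fields in this script, but it breaks as soon as a column such as a full name or biography contains a comma, quote, or line break. A sketch of RFC 4180 style quoting; `csvField` and `toCsv` are illustrative names.

```javascript
// Quote a field if it contains a comma, double quote, or line break;
// embedded double quotes are doubled.
function csvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV string from an array of flat objects, using the first row's keys as headers
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const lines = [
    headers.map(csvField).join(','),
    ...rows.map((row) => headers.map((h) => csvField(row[h])).join(','))
  ];
  return lines.join('\n');
}
```

With this in place, the export step becomes `require('fs').writeFileSync('competitor-analysis.csv', toCsv(results))`.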

Storing Scraped Data

Once you have the data, you need somewhere to put it:

SQLite (Simple)

const Database = require('better-sqlite3');
const db = new Database('instagram_data.db');

// Create tables
db.exec(`
  CREATE TABLE IF NOT EXISTS profiles (
    username TEXT PRIMARY KEY,
    full_name TEXT,
    biography TEXT,
    follower_count INTEGER,
    following_count INTEGER,
    media_count INTEGER,
    is_verified INTEGER,
    scraped_at TEXT
  );
  
  CREATE TABLE IF NOT EXISTS posts (
    post_id TEXT PRIMARY KEY,
    username TEXT,
    caption TEXT,
    like_count INTEGER,
    comment_count INTEGER,
    media_type TEXT,
    timestamp TEXT,
    scraped_at TEXT
  );
`);

// Insert profile
function saveProfile(profile) {
  const stmt = db.prepare(`
    INSERT OR REPLACE INTO profiles 
    VALUES (?, ?, ?, ?, ?, ?, ?, datetime('now'))
  `);
  
  stmt.run(
    profile.username,
    profile.full_name,
    profile.biography,
    profile.follower_count,
    profile.following_count,
    profile.media_count,
    profile.is_verified ? 1 : 0
  );
}

// Insert posts
function savePosts(username, posts) {
  const stmt = db.prepare(`
    INSERT OR REPLACE INTO posts 
    VALUES (?, ?, ?, ?, ?, ?, ?, datetime('now'))
  `);
  
  for (const post of posts) {
    stmt.run(
      post.id,
      username,
      post.caption,
      post.like_count,
      post.comment_count,
      post.media_type,
      post.timestamp
    );
  }
}

Export to Google Sheets

const { google } = require('googleapis');

async function exportToSheets(data, spreadsheetId, range) {
  const auth = new google.auth.GoogleAuth({
    keyFile: 'credentials.json',
    scopes: ['https://www.googleapis.com/auth/spreadsheets']
  });
  
  const sheets = google.sheets({ version: 'v4', auth });
  
  // Convert data to rows
  const headers = Object.keys(data[0]);
  const rows = [headers, ...data.map(item => headers.map(h => item[h]))];
  
  await sheets.spreadsheets.values.update({
    spreadsheetId,
    range,
    valueInputOption: 'RAW',
    resource: { values: rows }
  });
  
  console.log('Data exported to Google Sheets');
}

Is Scraping Instagram Legal?

Scraping public Instagram data is generally legal when you:

  • ✅ Only access publicly available information
  • ✅ Don't bypass authentication or access controls
  • ✅ Respect rate limits and don't overload servers
  • ✅ Don't use data for harassment or spam
  • ✅ Comply with GDPR/CCPA for personal data
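
Respecting rate limits can be enforced in code rather than left to discipline. A minimal sketch of a throttle that runs tasks one at a time with a minimum gap between starts. `createThrottle`, `intervalMs`, and the injectable `now` and `sleep` are illustrative names, and the right interval depends on the service you call.

```javascript
// Returns a `schedule(task)` function that runs tasks sequentially,
// starting each one at least `intervalMs` after the previous start.
function createThrottle(
  intervalMs,
  { now = Date.now, sleep = (ms) => new Promise((r) => setTimeout(r, ms)) } = {}
) {
  let lastStart = -Infinity;
  let queue = Promise.resolve();
  return function schedule(task) {
    const run = queue.then(async () => {
      const wait = lastStart + intervalMs - now();
      if (wait > 0) await sleep(wait);
      lastStart = now();
      return task();
    });
    // Keep the chain alive even if a task rejects
    queue = run.catch(() => {});
    return run;
  };
}
```

For example, `const schedule = createThrottle(500)` followed by `schedule(() => getInstagramProfile('natgeo'))` spaces calls at least half a second apart, however many are queued at once.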

Read our full guide: Is Web Scraping Legal?

Which Method Should You Choose?

ScenarioBest Method
Learning/experimentingDIY with Puppeteer
Need your own account dataOfficial Graph API
Production appScraping API
Large-scale data collectionScraping API
One-time researchScraping API

Getting Started

  1. Sign up at sociavault.com
  2. Get 50 free credits to test
  3. Copy your API key from the dashboard
  4. Start scraping with the examples above

Frequently Asked Questions

Is it legal to scrape Instagram?

Yes, scraping publicly available Instagram data is generally legal. In hiQ Labs v. LinkedIn, the US Ninth Circuit held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act (CFAA). However, you should never bypass login walls or scrape private accounts. See our complete Instagram scraping legal guide.

How do I scrape Instagram followers?

You can scrape Instagram follower lists using an API. Our guide to scraping Instagram followers covers three methods with code examples for exporting follower data.

What's the best Instagram scraping API?

SociaVault is built specifically for social media scraping with Instagram support. See our comparison of the best social media scraping APIs for alternatives.

Can I scrape Instagram Reels?

Yes! You can scrape Instagram Reels including view counts, likes, comments, video URLs, and audio information. The API method shown above handles Reels extraction.


Ready to Try SociaVault?

Start extracting social media data with our powerful API. No credit card required.