Social Media Scraping: The Complete Guide for 2026
Social media scraping is the automated extraction of public data from platforms like Instagram, TikTok, Twitter, LinkedIn, YouTube, and Facebook.
This guide covers everything: what data you can scrape, legal considerations, methods, tools, and code examples for each major platform.
What is Social Media Scraping?
Social media scraping extracts publicly visible information:
- Profiles - Usernames, bios, follower counts, profile pictures
- Posts - Images, videos, captions, timestamps, engagement metrics
- Comments - Comment text, authors, likes, replies
- Hashtags - Trending tags, post counts, top content
- Search results - Accounts and posts matching keywords
This is the same data anyone can see by visiting a profile or searching a hashtag—just collected automatically instead of manually.
Why Scrape Social Media Data?
| Use Case | What You Collect | Why It Matters |
|---|---|---|
| Influencer Marketing | Follower counts, engagement rates, audience demographics | Find authentic influencers, detect fake followers |
| Competitor Analysis | Posting frequency, content types, engagement patterns | Understand what works in your niche |
| Market Research | Trending topics, sentiment, conversations | Identify opportunities and pain points |
| Lead Generation | Contact info, company data, decision makers | Build targeted prospect lists |
| Brand Monitoring | Mentions, sentiment, reach | Track brand perception in real-time |
| Content Research | Viral posts, trending formats, hashtags | Create content that resonates |
Is Social Media Scraping Legal?
Short answer: Scraping publicly available data is generally legal.
Key legal precedents:
- hiQ Labs v. LinkedIn (2022) - Ninth Circuit ruled that scraping public LinkedIn profiles likely doesn't violate the CFAA
- Meta v. BrandTotal (2022) - Court ruled that scraping publicly visible data isn't unauthorized access
Stay legal by:
- ✅ Only scraping publicly available data
- ✅ Not bypassing authentication or access controls
- ✅ Respecting robots.txt (recommended but not required)
- ✅ Not overloading servers with requests
- ✅ Complying with GDPR/CCPA for personal data
Avoid:
- ❌ Scraping private/protected content
- ❌ Using scraped data for harassment or spam
- ❌ Selling personal data without consent
- ❌ Creating fake accounts to access data
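The robots.txt item in the checklist above can be checked in code before a scraper requests a path. The following is a minimal sketch, not a full parser: it assumes plain prefix matching (no `*` wildcards or `$` anchors) and merges the `*` group with any group naming your agent. The function names `parseRobots` and `isPathAllowed` and the sample file are hypothetical.

```javascript
// Simplified robots.txt check: User-agent, Allow, and Disallow lines only,
// matched by plain path prefix. Real parsers also handle wildcards.
function parseRobots(robotsTxt, userAgent = '*') {
  const rules = [];
  let applies = false;      // does the current group apply to our agent?
  let inAgentLines = false; // still reading consecutive User-agent lines?

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // strip comments
    const idx = line.indexOf(':');
    if (!line || idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();

    if (field === 'user-agent') {
      // A User-agent line that follows rule lines starts a new group.
      if (!inAgentLines) applies = false;
      inAgentLines = true;
      if (value === '*' || value.toLowerCase() === userAgent.toLowerCase()) {
        applies = true;
      }
    } else if (field === 'allow' || field === 'disallow') {
      inAgentLines = false;
      // An empty Disallow value means "nothing is disallowed", so skip it.
      if (applies && value) rules.push({ allow: field === 'allow', path: value });
    }
  }
  return rules;
}

function isPathAllowed(robotsTxt, path, userAgent = '*') {
  // Longest matching prefix wins; with no matching rule the path is allowed.
  let best = null;
  for (const rule of parseRobots(robotsTxt, userAgent)) {
    if (path.startsWith(rule.path) && (!best || rule.path.length > best.path.length)) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}

const sampleRobots = [
  'User-agent: *',
  'Disallow: /private/',
  'Allow: /private/press-kit'
].join('\n');

console.log(isPathAllowed(sampleRobots, '/private/data'));      // false
console.log(isPathAllowed(sampleRobots, '/private/press-kit')); // true
```

In practice you would fetch `/robots.txt` from the target host once, cache it, and run this check before each request.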
Read our full legal guide: Is Social Media Scraping Legal?
Methods for Scraping Social Media
Method 1: DIY Browser Automation
Use tools like Puppeteer or Playwright to control a browser:
const puppeteer = require('puppeteer');

async function scrapeProfile(url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Extract data from the rendered page. These selectors are illustrative
    // and will break whenever the platform changes its markup.
    return await page.evaluate(() => ({
      username: document.querySelector('header h2')?.innerText,
      bio: document.querySelector('header span')?.innerText,
      // ... more selectors
    }));
  } finally {
    // Always close the browser, even if navigation or extraction throws
    await browser.close();
  }
}
Pros: Full control, free
Cons: Requires proxy management, breaks frequently, high maintenance
Method 2: Official APIs
Each platform has an official API with varying limitations:
| Platform | Free Tier | Paid Tier | Limitations |
|---|---|---|---|
| Twitter/X | 1,500 reads/mo | $100-$5,000/mo | Heavy restrictions |
| Instagram | Business accounts only | N/A | No public data access |
| TikTok | Research API | Limited | Academic use only |
| LinkedIn | Very limited | Enterprise | Extremely restricted |
| YouTube | 10,000 units/day | Pay per use | Relatively open |
Pros: Stable, supported
Cons: Expensive, limited data access, strict quotas
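To see how quickly a quota like YouTube's 10,000 units/day runs out, it helps to budget calls by their unit cost. The sketch below assumes the commonly documented YouTube Data API v3 costs (100 units for `search.list`, 1 unit for basic read calls); check the current quota documentation before relying on them. `dailyQuotaUsage` is a hypothetical helper, not part of any SDK.

```javascript
// Assumed unit costs per call for a few YouTube Data API v3 read methods.
const UNIT_COSTS = {
  'search.list': 100,
  'videos.list': 1,
  'channels.list': 1,
  'commentThreads.list': 1
};

// plannedCalls maps a method name to the number of calls planned per day.
function dailyQuotaUsage(plannedCalls, dailyQuota = 10000) {
  let used = 0;
  for (const [method, calls] of Object.entries(plannedCalls)) {
    if (!(method in UNIT_COSTS)) throw new Error(`Unknown method: ${method}`);
    used += UNIT_COSTS[method] * calls;
  }
  return { used, remaining: dailyQuota - used, withinQuota: used <= dailyQuota };
}

const plan = dailyQuotaUsage({ 'search.list': 50, 'videos.list': 2500 });
console.log(plan); // { used: 7500, remaining: 2500, withinQuota: true }
```

Under these assumptions, 50 searches alone consume half of the daily quota, which is why search-heavy workloads hit official API limits so quickly.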
Method 3: Scraping APIs (Recommended)
A scraping API handles all the complexity—proxies, rate limits, CAPTCHAs, browser automation—and gives you clean JSON:
const response = await fetch(
'https://api.sociavault.com/v1/scrape/instagram/profile?username=nike',
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
const data = await response.json();
console.log(data.data.follower_count); // 300000000
Pros: Reliable, maintained, scalable, pay per use
Cons: Costs money (but less than building infrastructure yourself)
Platform-by-Platform Guide
Instagram Scraping
Instagram is one of the most scraped platforms. Here's what you can extract:
Available data:
- Profiles (username, bio, followers, following, post count)
- Posts (images, videos, captions, likes, comments)
- Reels (video, views, engagement)
- Stories (public accounts)
- Comments and replies
- Hashtag posts
// Get Instagram profile
const profile = await fetch(
`https://api.sociavault.com/v1/scrape/instagram/profile?username=natgeo`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
const data = await profile.json();
console.log({
followers: data.data.follower_count, // 283,000,000
posts: data.data.media_count, // 28,947
isVerified: data.data.is_verified // true
});
// Get Instagram posts
const posts = await fetch(
`https://api.sociavault.com/v1/scrape/instagram/posts?username=natgeo&limit=12`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
const postsData = await posts.json();
postsData.data.forEach(post => {
console.log({
likes: post.like_count,
comments: post.comment_count,
caption: post.caption?.substring(0, 50)
});
});
Related: How to Scrape Instagram Data
TikTok Scraping
TikTok has become essential for trend research and influencer marketing:
Available data:
- User profiles (followers, likes, videos, bio)
- Videos (views, likes, comments, shares, sounds)
- Comments and replies
- Hashtag videos
- Search results
// Get TikTok profile
const profile = await fetch(
`https://api.sociavault.com/v1/scrape/tiktok/profile?username=khaby.lame`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
const data = await profile.json();
console.log({
followers: data.data.follower_count, // 162,000,000
likes: data.data.heart_count, // 2,400,000,000
videos: data.data.video_count // 1,200
});
// Get TikTok videos
const videos = await fetch(
`https://api.sociavault.com/v1/scrape/tiktok/videos?username=khaby.lame&limit=20`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
const videosData = await videos.json();
videosData.data.forEach(video => {
console.log({
views: video.play_count,
likes: video.digg_count,
shares: video.share_count,
description: video.desc?.substring(0, 50)
});
});
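Once the video list is loaded, the per-video counts can be rolled up into summary metrics. This is a minimal sketch that reuses the field names from the example above (`play_count`, `digg_count`, `share_count`); `comment_count` is an assumed field name, and `summarizeVideos` is a hypothetical helper. It defines engagement rate as interactions divided by views, which is one common convention among several.

```javascript
// Summarize a list of video objects into average views and a
// view-based engagement rate (likes + comments + shares per 100 views).
function summarizeVideos(videos) {
  if (videos.length === 0) return { count: 0, avgViews: 0, engagementRate: 0 };
  let views = 0;
  let interactions = 0;
  for (const v of videos) {
    views += v.play_count || 0;
    interactions += (v.digg_count || 0) + (v.comment_count || 0) + (v.share_count || 0);
  }
  return {
    count: videos.length,
    avgViews: views / videos.length,
    // Guard against dividing by zero when no video has any views
    engagementRate: views > 0 ? (interactions * 100) / views : 0
  };
}

// Made-up sample data for illustration only
const sampleVideos = [
  { play_count: 1000, digg_count: 80, comment_count: 10, share_count: 10 },
  { play_count: 3000, digg_count: 150, comment_count: 30, share_count: 20 }
];
console.log(summarizeVideos(sampleVideos));
// { count: 2, avgViews: 2000, engagementRate: 7.5 }
```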
Related: Extract TikTok Data
Twitter/X Scraping
Twitter is valuable for real-time sentiment and trend analysis:
Available data:
- Profiles (followers, following, tweet count, bio)
- Tweets (text, likes, retweets, replies, media)
- Search results
- Followers and following lists
- Trending topics
// Get Twitter profile
const profile = await fetch(
`https://api.sociavault.com/v1/scrape/twitter/profile?username=elonmusk`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
// Search Twitter
const search = await fetch(
`https://api.sociavault.com/v1/scrape/twitter/search?q=${encodeURIComponent('artificial intelligence')}&limit=100`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
Related: Twitter Scraping API
LinkedIn Scraping
LinkedIn is a goldmine for B2B data:
Available data:
- Personal profiles (name, headline, experience, education)
- Company pages (employees, industry, size, posts)
- Job listings
- Posts and articles
// Get LinkedIn profile
const profile = await fetch(
`https://api.sociavault.com/v1/scrape/linkedin/profile?url=${encodeURIComponent(profileUrl)}`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
// Get LinkedIn company
const company = await fetch(
`https://api.sociavault.com/v1/scrape/linkedin/company?url=${encodeURIComponent(companyUrl)}`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
Related: LinkedIn Profile Scraper Guide
YouTube Scraping
YouTube data is valuable for content research:
Available data:
- Channel info (subscribers, videos, views, description)
- Videos (views, likes, comments, duration, description)
- Comments and replies
- Search results
- Transcripts/captions
// Get YouTube channel
const channel = await fetch(
`https://api.sociavault.com/v1/scrape/youtube/channel?handle=MrBeast`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
// Get video comments
const comments = await fetch(
`https://api.sociavault.com/v1/scrape/youtube/comments?videoId=dQw4w9WgXcQ&limit=100`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
Related: YouTube Channel Scraper
Facebook Scraping
Facebook's data is harder to access but still valuable:
Available data:
- Page info (followers, likes, about, posts)
- Public posts and engagement
- Comments on public posts
- Group info (public groups)
// Get Facebook page
const page = await fetch(
`https://api.sociavault.com/v1/scrape/facebook/page?url=${encodeURIComponent(pageUrl)}`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
Related: Facebook Pages Data Extraction
Complete Example: Multi-Platform Analysis
Here's a real-world example that analyzes a brand across platforms:
const API_KEY = process.env.SOCIAVAULT_API_KEY;
async function analyzeBrandPresence(brand) {
const results = {};
// Instagram
try {
const igRes = await fetch(
`https://api.sociavault.com/v1/scrape/instagram/profile?username=${brand.instagram}`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
const ig = await igRes.json();
results.instagram = {
followers: ig.data.follower_count,
posts: ig.data.media_count,
engagement: await calculateInstagramEngagement(brand.instagram)
};
} catch (e) {
results.instagram = { error: e.message };
}
// TikTok
try {
const ttRes = await fetch(
`https://api.sociavault.com/v1/scrape/tiktok/profile?username=${brand.tiktok}`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
const tt = await ttRes.json();
results.tiktok = {
followers: tt.data.follower_count,
likes: tt.data.heart_count,
videos: tt.data.video_count
};
} catch (e) {
results.tiktok = { error: e.message };
}
// Twitter
try {
const twRes = await fetch(
`https://api.sociavault.com/v1/scrape/twitter/profile?username=${brand.twitter}`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
const tw = await twRes.json();
results.twitter = {
followers: tw.data.followers_count,
tweets: tw.data.tweet_count
};
} catch (e) {
results.twitter = { error: e.message };
}
// YouTube
try {
const ytRes = await fetch(
`https://api.sociavault.com/v1/scrape/youtube/channel?handle=${brand.youtube}`,
{ headers: { 'Authorization': `Bearer ${API_KEY}` } }
);
const yt = await ytRes.json();
results.youtube = {
subscribers: yt.data.subscriber_count,
videos: yt.data.video_count,
views: yt.data.view_count
};
} catch (e) {
results.youtube = { error: e.message };
}
// Calculate totals
results.totalFollowers =
(results.instagram?.followers || 0) +
(results.tiktok?.followers || 0) +
(results.twitter?.followers || 0) +
(results.youtube?.subscribers || 0);
return results;
}
async function calculateInstagramEngagement(username) {
  const headers = { 'Authorization': `Bearer ${API_KEY}` };
  const user = encodeURIComponent(username);
  const postsRes = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/posts?username=${user}&limit=12`,
    { headers }
  );
  const posts = await postsRes.json();
  const profileRes = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/profile?username=${user}`,
    { headers }
  );
  const profile = await profileRes.json();
  const followers = profile.data.follower_count;
  // Avoid dividing by zero for accounts with no posts or no followers
  if (posts.data.length === 0 || !followers) return null;
  const totalEngagement = posts.data.reduce((sum, post) => {
    return sum + (post.like_count || 0) + (post.comment_count || 0);
  }, 0);
  // Average interactions per post, as a percentage of follower count
  const avgEngagement = totalEngagement / posts.data.length;
  const rate = (avgEngagement / followers) * 100;
  return rate.toFixed(2) + '%';
}
// Usage
const brand = {
instagram: 'nike',
tiktok: 'nike',
twitter: 'Nike',
youtube: 'nike'
};
analyzeBrandPresence(brand).then(results => {
console.log('Brand Analysis Results:');
console.log(JSON.stringify(results, null, 2));
});
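The platform lookups in the example above are independent of one another, so they can also run concurrently instead of one after another. Here is a minimal sketch using `Promise.allSettled`, which waits for every task and reports failures per task rather than aborting the batch. `collectSettled` and the stand-in tasks are hypothetical names for illustration.

```javascript
// Run named async tasks concurrently. Each result is either the task's
// value or an { error } object, mirroring the try/catch shape used above.
async function collectSettled(tasks) {
  const names = Object.keys(tasks);
  const settled = await Promise.allSettled(
    // Promise.resolve().then(...) also captures synchronous throws
    names.map(name => Promise.resolve().then(() => tasks[name]()))
  );
  const results = {};
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      results[names[i]] = outcome.value;
    } else {
      const reason = outcome.reason;
      results[names[i]] = { error: reason instanceof Error ? reason.message : String(reason) };
    }
  });
  return results;
}

// Demo with stand-in tasks; in practice each task would wrap one fetch call.
collectSettled({
  instagram: async () => ({ followers: 1200 }),
  tiktok: async () => { throw new Error('HTTP 500'); }
}).then(results => console.log(results));
// { instagram: { followers: 1200 }, tiktok: { error: 'HTTP 500' } }
```

Running requests in parallel shortens total time but raises the request rate, so it should be combined with the rate-limiting practices described later in this guide.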
Storing Scraped Data
SQLite (Simple Projects)
const Database = require('better-sqlite3');
const db = new Database('social_data.db');
db.exec(`
CREATE TABLE IF NOT EXISTS profiles (
platform TEXT,
username TEXT,
follower_count INTEGER,
data JSON,
scraped_at TEXT,
PRIMARY KEY (platform, username)
)
`);
function saveProfile(platform, username, data) {
const stmt = db.prepare(`
INSERT OR REPLACE INTO profiles VALUES (?, ?, ?, ?, datetime('now'))
`);
stmt.run(platform, username, data.follower_count, JSON.stringify(data));
}
PostgreSQL (Production)
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
async function saveProfile(platform, username, data) {
await pool.query(`
INSERT INTO profiles (platform, username, follower_count, data, scraped_at)
VALUES ($1, $2, $3, $4, NOW())
ON CONFLICT (platform, username)
DO UPDATE SET follower_count = $3, data = $4, scraped_at = NOW()
`, [platform, username, data.follower_count, JSON.stringify(data)]);
}
Export to CSV
const fs = require('fs');
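function exportToCSV(data, filename) {
  if (data.length === 0) throw new Error('No rows to export');
  const headers = Object.keys(data[0]);
  // Quote every field and double any embedded quotes (standard CSV escaping).
  // JSON.stringify alone escapes quotes with a backslash, which CSV readers reject.
  const toField = value => {
    const text = typeof value === 'object' && value !== null
      ? JSON.stringify(value)
      : String(value ?? '');
    return `"${text.replace(/"/g, '""')}"`;
  };
  const rows = data.map(item => headers.map(h => toField(item[h])).join(','));
  const csv = [headers.map(toField).join(','), ...rows].join('\n');
  fs.writeFileSync(filename, csv);
}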
Best Practices
1. Rate Limiting
Don't hammer APIs. Add delays between requests:
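// `scrape` stands for whatever function you use to fetch a single URL,
// for example a wrapper around the API calls shown earlier.
async function scrapeWithDelay(urls, delayMs = 500) {
  const results = [];
  for (const url of urls) {
    results.push(await scrape(url));
    // Pause before the next request
    await new Promise(r => setTimeout(r, delayMs));
  }
  return results;
}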
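A fixed delay keeps things simple but processes one URL at a time. When you need more throughput, a common alternative is to cap the number of requests in flight and randomize the pause between them. The sketch below shows one way to do that; `mapWithConcurrency` and `jitteredDelay` are hypothetical helpers, and the worker you pass in is assumed to return a promise.

```javascript
// Process items with at most `limit` workers running at once.
// Results keep the same order as the input array.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function run() {
    while (next < items.length) {
      const i = next++; // claim an item; safe because JS is single-threaded
      results[i] = await worker(items[i], i);
    }
  }

  // Start up to `limit` runners that pull from the shared index
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}

// Wait between baseMs and baseMs + spreadMs, chosen at random,
// so requests do not arrive in a perfectly regular pattern.
function jitteredDelay(baseMs, spreadMs) {
  const ms = baseMs + Math.random() * spreadMs;
  return new Promise(resolve => setTimeout(() => resolve(ms), ms));
}

// Usage sketch (fetchOne is your own function):
// const data = await mapWithConcurrency(urls, 3, async url => {
//   await jitteredDelay(300, 400);
//   return fetchOne(url);
// });
```

If any worker throws, the whole call rejects, so wrap the worker in retry or error handling when partial results are acceptable.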
2. Error Handling
Implement retry logic for transient failures:
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await fetch(url, options);
      if (response.status === 429) {
        // Rate limited - exponential backoff (1s, 2s, 4s, ...)
        await new Promise(r => setTimeout(r, Math.pow(2, i) * 1000));
        continue;
      }
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return await response.json();
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await new Promise(r => setTimeout(r, 1000));
    }
  }
  // Every attempt was rate limited; fail loudly instead of returning undefined
  throw new Error(`Rate limited after ${maxRetries} attempts: ${url}`);
}
3. Caching
Don't re-scrape data you already have:
const cache = new Map();
async function getCached(key, fetchFn, ttlMs = 3600000) {
if (cache.has(key)) {
const { data, timestamp } = cache.get(key);
if (Date.now() - timestamp < ttlMs) return data;
}
const data = await fetchFn();
cache.set(key, { data, timestamp: Date.now() });
return data;
}
4. Data Validation
Verify scraped data is valid:
function validateProfile(data) {
const required = ['username', 'follower_count'];
for (const field of required) {
if (data[field] === undefined || data[field] === null) {
throw new Error(`Missing required field: ${field}`);
}
}
if (typeof data.follower_count !== 'number' || data.follower_count < 0) {
throw new Error('Invalid follower_count');
}
return true;
}
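Because a validator like the one above throws on the first bad record, a batch job usually wants to set invalid records aside rather than abort. A minimal sketch of that pattern follows; `partitionByValidity` is a hypothetical helper, and the demo uses a small stand-in validator so the block runs on its own.

```javascript
// Split records into those that pass a throwing validator and those that
// fail, keeping the failure reason for later inspection.
function partitionByValidity(records, validate) {
  const valid = [];
  const rejected = [];
  for (const record of records) {
    try {
      validate(record);
      valid.push(record);
    } catch (error) {
      rejected.push({ record, reason: error.message });
    }
  }
  return { valid, rejected };
}

// Stand-in validator for the demo; swap in your own profile validator.
function requireFollowerCount(profile) {
  if (typeof profile.follower_count !== 'number' || profile.follower_count < 0) {
    throw new Error('Invalid follower_count');
  }
  return true;
}

const batch = partitionByValidity(
  [
    { username: 'a', follower_count: 10 },
    { username: 'b', follower_count: -1 }
  ],
  requireFollowerCount
);
console.log(batch.valid.length, batch.rejected[0].reason); // 1 Invalid follower_count
```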
Frequently Asked Questions
Is it legal to scrape social media?
Yes, scraping publicly available data from social media is generally legal. The hiQ Labs v. LinkedIn case (2022) established that scraping public profiles doesn't violate the Computer Fraud and Abuse Act. However, you must only scrape public data, respect platform terms when possible, and comply with data protection laws like GDPR.
What's the best way to scrape social media data?
APIs are the most reliable method. While DIY browser automation is free, it requires constant maintenance as platforms change their layouts. Official APIs have rate limits and restrictions. Third-party APIs like SociaVault offer the best balance—reliable access to comprehensive data without the maintenance overhead.
Can I scrape Instagram without getting blocked?
Yes, by using residential proxies, rotating user agents, adding random delays between requests, and respecting rate limits. Alternatively, use an API service that handles all anti-detection measures for you, ensuring consistent access without the technical complexity.
How much does it cost to scrape social media?
DIY solutions are free but require significant development and maintenance time. Official APIs range from free (limited) to thousands per month. Third-party scraping APIs like SociaVault start at $20/month for 100,000 credits, making them affordable for most use cases while avoiding technical overhead.
What data can I scrape from TikTok?
You can scrape public profile data (username, bio, follower count), video metadata (views, likes, comments, shares), video transcripts, comments, trending sounds, and hashtag data. All of this is publicly visible information that can be collected programmatically.
Do I need coding skills to scrape social media?
Not necessarily. While custom scraping requires programming knowledge (Python or JavaScript), many tools offer no-code interfaces. However, basic API knowledge helps you integrate scraped data into your workflows and applications effectively.
How do I avoid getting my IP banned when scraping?
Use rotating residential proxies, implement exponential backoff for retries, randomize delays between requests, limit concurrent requests, and respect rate limits. Most scraping APIs handle this automatically, so you don't need to manage it yourself.
Can I scrape private social media accounts?
No. Scraping private accounts violates both platform terms and potentially computer fraud laws. Only scrape publicly accessible data that anyone can view without authentication. Attempting to bypass privacy settings is illegal.
How fast can I scrape social media data?
Speed depends on your method. DIY browser automation: 5-10 requests/minute safely. Official APIs: Varies widely by platform. Third-party APIs: 100+ requests/minute with proper infrastructure. Rate limits exist to prevent server overload and detection.
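To translate those rates into wall-clock time, divide the number of requests by the sustained rate. A small sketch, using the rates quoted above as inputs; `estimateDuration` is a hypothetical helper and the 6,000-request job is a made-up example.

```javascript
// Estimate how long a job takes at a sustained request rate.
function estimateDuration(totalRequests, requestsPerMinute) {
  if (requestsPerMinute <= 0) {
    throw new RangeError('requestsPerMinute must be positive');
  }
  const minutes = totalRequests / requestsPerMinute;
  return { minutes, hours: minutes / 60 };
}

console.log(estimateDuration(6000, 10));  // { minutes: 600, hours: 10 }
console.log(estimateDuration(6000, 100)); // { minutes: 60, hours: 1 }
```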
What's the difference between scraping and using official APIs?
Official APIs provide structured, approved access but have strict rate limits and often exclude valuable data. Scraping accesses the same public data users see but requires technical setup. Third-party scraping APIs combine the reliability of APIs with comprehensive data access.
Getting Started
- Sign up at sociavault.com/auth/sign-up
- Get 50 free credits to test
- Explore the API with our documentation
- Start scraping with the examples above