Social media platforms hold massive amounts of data: profiles, posts, comments, followers, and engagement metrics, all of it valuable for marketing, research, and product development.

But getting that data? That's where it gets complicated.

This guide covers what data exists, how to get it, the legal considerations, and practical code examples.

What Data Can You Extract?

User Profiles

Most platforms store some version of:

Basic info: Username, display name, bio, profile picture
Metrics: Followers, following, post count
Verification: Blue checkmarks, business accounts
Metadata: Account creation date, location, links

Content

Posts/Videos: Text, media URLs, captions
Engagement: Likes, comments, shares, views
Timestamps: When content was posted
Hashtags/Mentions: Tags and user mentions

Engagement Data

Comments: Text, author, timestamp, replies
Reactions: Like types, emoji reactions
Shares/Reposts: Who shared, when

Network Data

Followers: List of accounts following a user
Following: List of accounts a user follows
Connections: Mutual follows, relationships

Platform-by-Platform Breakdown

TikTok

Data Type	Availability	Method
Profiles	✅ Easy	API
Videos	✅ Easy	API
Comments	✅ Easy	API
Followers	⚠️ Limited	API (first 200)
Analytics	❌ Private	Business API only

// Get TikTok profile
const profile = await fetch(
  'https://api.sociavault.com/v1/scrape/tiktok/profile?username=charlidamelio',
  { headers: { 'Authorization': `Bearer ${API_KEY}` } }
).then(r => r.json());

console.log({
  username: profile.data.username,
  followers: profile.data.follower_count,
  likes: profile.data.like_count,
  videos: profile.data.video_count
});

Instagram

Data Type	Availability	Method
Public profiles	✅ Easy	API
Public posts	✅ Easy	API
Reels	✅ Easy	API
Comments	✅ Easy	API
Stories	⚠️ Limited	Requires login
Private accounts	❌ No	Not accessible

// Get Instagram profile
const profile = await fetch(
  'https://api.sociavault.com/v1/scrape/instagram/profile?username=natgeo',
  { headers: { 'Authorization': `Bearer ${API_KEY}` } }
).then(r => r.json());

// Get recent posts
const posts = await fetch(
  'https://api.sociavault.com/v1/scrape/instagram/posts?username=natgeo&count=20',
  { headers: { 'Authorization': `Bearer ${API_KEY}` } }
).then(r => r.json());

YouTube

Data Type	Availability	Method
Channels	✅ Easy	API
Videos	✅ Easy	API
Comments	✅ Easy	API
Transcripts	✅ Easy	API
Analytics	⚠️ Limited	Creator Studio only

// Get YouTube channel
const channel = await fetch(
  'https://api.sociavault.com/v1/scrape/youtube/channel?channelId=UCX6OQ3DkcsbYNE6H8uQQuVA',
  { headers: { 'Authorization': `Bearer ${API_KEY}` } }
).then(r => r.json());

// Get video transcript (great for AI/RAG)
const transcript = await fetch(
  'https://api.sociavault.com/v1/scrape/youtube/transcript?videoId=dQw4w9WgXcQ',
  { headers: { 'Authorization': `Bearer ${API_KEY}` } }
).then(r => r.json());

Twitter/X

Data Type	Availability	Method
Profiles	✅ Available	API
Tweets	✅ Available	API
Replies	✅ Available	API
Followers	⚠️ Limited	Paginated
Analytics	❌ No	Not accessible
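"Paginated" access usually means following a cursor from one page of results to the next. The sketch below shows that loop in Python. It assumes a hypothetical `fetch_page(cursor)` callable that returns a dict with `items` and `next_cursor` keys; neither name comes from the SociaVault API, so adapt them to whatever the real response contains.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_pages(
    fetch_page: Callable[[Optional[str]], Dict],
    max_items: int = 1000,
) -> Iterator[Dict]:
    """Yield items page by page until the cursor runs out or max_items is reached."""
    cursor: Optional[str] = None
    yielded = 0
    while yielded < max_items:
        page = fetch_page(cursor)
        for item in page.get('items', []):
            if yielded >= max_items:
                return
            yield item
            yielded += 1
        # Hypothetical field name: stop when the server sends no further cursor
        cursor = page.get('next_cursor')
        if not cursor:
            return


# Stand-in for a real HTTP call so the sketch runs offline
FAKE_PAGES = {
    None: {'items': [{'id': 1}, {'id': 2}], 'next_cursor': 'a'},
    'a': {'items': [{'id': 3}], 'next_cursor': None},
}

followers = list(iter_pages(lambda cursor: FAKE_PAGES[cursor]))
```

The `max_items` cap keeps a large follower list from turning into an unbounded number of billable requests.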

// Get Twitter user
const user = await fetch(
  'https://api.sociavault.com/v1/scrape/twitter/user?username=elonmusk',
  { headers: { 'Authorization': `Bearer ${API_KEY}` } }
).then(r => r.json());

// Search tweets (the query is URL-encoded because it contains a space)
const tweets = await fetch(
  `https://api.sociavault.com/v1/scrape/twitter/search?query=${encodeURIComponent('AI startups')}&count=50`,
  { headers: { 'Authorization': `Bearer ${API_KEY}` } }
).then(r => r.json());

LinkedIn

Data Type	Availability	Method
Public profiles	⚠️ Limited	API
Companies	✅ Available	API
Posts	⚠️ Limited	API
Connections	❌ No	Private

Reddit

Data Type	Availability	Method
Profiles	✅ Easy	API
Posts	✅ Easy	API
Comments	✅ Easy	API
Subreddits	✅ Easy	API
Upvotes	✅ Easy	API

// Get subreddit posts
const posts = await fetch(
  'https://api.sociavault.com/v1/scrape/reddit/posts?subreddit=programming&sort=hot&count=50',
  { headers: { 'Authorization': `Bearer ${API_KEY}` } }
).then(r => r.json());

Extraction Methods

1. Official APIs

Pros:

Legal and sanctioned
Stable endpoints
Good documentation

Cons:

Expensive (Twitter: $100/mo+)
Limited access
Strict rate limits
Long approval processes
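Strict rate limits typically show up as HTTP 429 responses, and the usual answer is to retry with exponential backoff. Here is a minimal sketch of that pattern. The request itself is passed in as a hypothetical `send` callable returning a `(status, headers)` pair, so the retry logic stays independent of any HTTP library; the retry count and base delay are illustrative assumptions, not values recommended by any platform.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# (status_code, headers) pair returned by the injected request function
Result = Tuple[int, Dict[str, str]]


def backoff_delay(attempt: int, retry_after: Optional[str] = None, base: float = 1.0) -> float:
    """Seconds to wait before the next try; `attempt` is 0-based."""
    # Prefer the server's Retry-After hint when it is a plain number of seconds
    if retry_after and retry_after.isdigit():
        return float(retry_after)
    return base * (2 ** attempt)


def call_with_backoff(
    send: Callable[[], Result],
    max_retries: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Result:
    """Call `send`, waiting and retrying while it reports HTTP 429."""
    for attempt in range(max_retries):
        status, headers = send()
        if status != 429:
            return status, headers
        sleep(backoff_delay(attempt, headers.get('Retry-After')))
    # Final attempt: the caller decides what to do if this is still a 429
    return send()
```

In real use, `send` could wrap a `requests.get(...)` call and return `(response.status_code, response.headers)`.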

2. Third-Party APIs (Recommended)

Pros:

One API for all platforms
No approval wait
Affordable pricing
Handles complexity for you

Cons:

Costs per request
Dependent on provider

// One API for everything
const platforms = ['tiktok', 'instagram', 'youtube', 'twitter'];

const profiles = await Promise.all(
  platforms.map(platform =>
    fetch(`https://api.sociavault.com/v1/scrape/${platform}/profile?username=creator123`, {
      headers: { 'Authorization': `Bearer ${API_KEY}` }
    }).then(r => r.json())
  )
);

3. Web Scraping

Pros:

Full control
No API costs

Cons:

Breaks constantly
Legal gray area
Resource intensive
Requires maintenance

See our Web Scraping vs API comparison.

4. Browser Extensions

Pros:

Visual interface
Works with your session

Cons:

Manual process
Doesn't scale
Limited features

Python Implementation

import os
import requests
from typing import Dict, List
from dataclasses import dataclass

# Read the key from the environment rather than hard-coding it
API_KEY = os.getenv('SOCIAVAULT_API_KEY')
API_BASE = 'https://api.sociavault.com/v1/scrape'

@dataclass
class Profile:
    platform: str
    username: str
    name: str
    followers: int
    following: int
    posts: int
    bio: str
    avatar_url: str
    
class SocialDataExtractor:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.headers = {'Authorization': f'Bearer {api_key}'}
    
    def get_profile(self, platform: str, username: str) -> Profile:
        response = requests.get(
            f'{API_BASE}/{platform}/profile',
            params={'username': username},
            headers=self.headers
        )
        response.raise_for_status()
        data = response.json()['data']
        
        return Profile(
            platform=platform,
            username=username,
            name=data.get('nickname') or data.get('full_name') or data.get('name', ''),
            followers=data.get('follower_count') or data.get('followers', 0),
            following=data.get('following_count') or data.get('following', 0),
            posts=data.get('video_count') or data.get('posts_count', 0),
            bio=data.get('bio') or data.get('description', ''),
            avatar_url=data.get('avatar_url') or data.get('profile_pic_url', '')
        )
    
    def get_posts(self, platform: str, username: str, count: int = 20) -> List[Dict]:
        endpoint = 'videos' if platform == 'tiktok' else 'posts'
        
        response = requests.get(
            f'{API_BASE}/{platform}/{endpoint}',
            params={'username': username, 'count': count},
            headers=self.headers
        )
        response.raise_for_status()
        # Parse the body once; the list may sit under 'posts' or 'videos'
        data = response.json()['data']
        return data.get('posts') or data.get('videos', [])
    
    def search(self, platform: str, query: str, count: int = 50) -> List[Dict]:
        response = requests.get(
            f'{API_BASE}/{platform}/search',
            params={'query': query, 'count': count},
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()['data']

# Usage
extractor = SocialDataExtractor(API_KEY)

# Get profile
profile = extractor.get_profile('tiktok', 'charlidamelio')
print(f"{profile.name}: {profile.followers:,} followers")

# Get recent posts
posts = extractor.get_posts('instagram', 'natgeo', count=10)
for post in posts:
    # Captions can be missing, so fall back to an empty string
    caption = post.get('caption') or ''
    print(f"- {post.get('like_count', 0):,} likes: {caption[:50]}...")

JavaScript/TypeScript Implementation

interface Profile {
  platform: string;
  username: string;
  name: string;
  followers: number;
  following: number;
  posts: number;
  bio: string;
  avatarUrl: string;
}

interface Post {
  id: string;
  caption: string;
  likeCount: number;
  commentCount: number;
  timestamp: string;
  mediaUrl: string;
}

class SocialDataExtractor {
  private apiKey: string;
  private baseUrl = 'https://api.sociavault.com/v1/scrape';
  
  constructor(apiKey: string) {
    this.apiKey = apiKey;
  }
  
  private async fetch<T>(endpoint: string, params: Record<string, string>): Promise<T> {
    const url = new URL(`${this.baseUrl}${endpoint}`);
    Object.entries(params).forEach(([k, v]) => url.searchParams.set(k, v));
    
    const response = await fetch(url.toString(), {
      headers: { 'Authorization': `Bearer ${this.apiKey}` }
    });
    
    if (!response.ok) {
      throw new Error(`API error: ${response.status}`);
    }
    
    const json = await response.json();
    return json.data;
  }
  
  async getProfile(platform: string, username: string): Promise<Profile> {
    const data = await this.fetch<any>(`/${platform}/profile`, { username });
    
    return {
      platform,
      username,
      name: data.nickname || data.full_name || data.name || '',
      followers: data.follower_count || data.followers || 0,
      following: data.following_count || data.following || 0,
      posts: data.video_count || data.posts_count || 0,
      bio: data.bio || data.description || '',
      avatarUrl: data.avatar_url || data.profile_pic_url || ''
    };
  }
  
  async getPosts(platform: string, username: string, count = 20): Promise<Post[]> {
    const endpoint = platform === 'tiktok' ? '/videos' : '/posts';
    const data = await this.fetch<any>(`/${platform}${endpoint}`, {
      username,
      count: count.toString()
    });
    
    return (data.posts || data.videos || []).map((post: any) => ({
      id: post.id || post.post_id,
      caption: post.caption || post.description || '',
      likeCount: post.like_count || post.likes || 0,
      commentCount: post.comment_count || post.comments || 0,
      timestamp: post.timestamp || post.created_at,
      mediaUrl: post.url || post.media_url || ''
    }));
  }
  
  async search(platform: string, query: string, count = 50): Promise<any[]> {
    return this.fetch(`/${platform}/search`, { query, count: count.toString() });
  }
}

// Usage
const extractor = new SocialDataExtractor(process.env.SOCIAVAULT_API_KEY!);

const profile = await extractor.getProfile('tiktok', 'charlidamelio');
console.log(`${profile.name}: ${profile.followers.toLocaleString()} followers`);

Common Use Cases

1. Influencer Marketing

Find and vet creators:

def analyze_influencer(username, platforms=('tiktok', 'instagram')):
    results = {}
    
    for platform in platforms:
        profile = extractor.get_profile(platform, username)
        posts = extractor.get_posts(platform, username, count=30)
        
        # Guard against accounts with no posts to avoid dividing by zero
        total_engagement = sum(p.get('like_count', 0) + p.get('comment_count', 0) for p in posts)
        avg_engagement = total_engagement / len(posts) if posts else 0
        engagement_rate = (avg_engagement / profile.followers) * 100 if profile.followers > 0 else 0
        
        results[platform] = {
            'followers': profile.followers,
            'engagement_rate': round(engagement_rate, 2),
            'posting_frequency': calculate_posting_frequency(posts),
            'top_content': get_top_posts(posts, 3)
        }
    
    return results
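The `analyze_influencer` function calls `calculate_posting_frequency` and `get_top_posts`, which are not defined in this guide. One possible sketch of both follows. It assumes each post is a dict with `like_count`, `comment_count`, and an ISO 8601 `timestamp` without a trailing `Z`; those field names and the timestamp format are assumptions, and the real payload may use Unix epochs or different keys.

```python
from datetime import datetime
from typing import Dict, List


def get_top_posts(posts: List[Dict], n: int = 3) -> List[Dict]:
    """Return the n posts with the most likes plus comments."""
    return sorted(
        posts,
        key=lambda p: p.get('like_count', 0) + p.get('comment_count', 0),
        reverse=True,
    )[:n]


def calculate_posting_frequency(posts: List[Dict]) -> float:
    """Average posts per week, based on the gaps between sampled posts."""
    times = sorted(
        datetime.fromisoformat(p['timestamp']) for p in posts if p.get('timestamp')
    )
    if len(times) < 2:
        return 0.0
    span_days = (times[-1] - times[0]).total_seconds() / 86400
    if span_days == 0:
        return 0.0
    # N posts span N - 1 intervals, so divide the interval count by the span
    return round((len(times) - 1) / span_days * 7, 2)


sample = [
    {'like_count': 10, 'comment_count': 1, 'timestamp': '2025-01-01T12:00:00'},
    {'like_count': 50, 'comment_count': 5, 'timestamp': '2025-01-08T12:00:00'},
    {'like_count': 30, 'comment_count': 0, 'timestamp': '2025-01-15T12:00:00'},
]

top_two = get_top_posts(sample, 2)
per_week = calculate_posting_frequency(sample)
```

With only 30 posts in the sample, the frequency describes recent activity rather than the account's full history.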

2. Market Research

Monitor industry trends:

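def track_hashtag(hashtag, platform='tiktok', days=7):
    # Note: get_hashtag_posts is not defined on the SocialDataExtractor class above;
    # it is assumed here to wrap a hashtag endpoint. `days` is accepted but not yet applied.
    posts = extractor.get_hashtag_posts(platform, hashtag, count=500)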
    
    return {
        'total_posts': len(posts),
        'total_views': sum(p.get('view_count', 0) for p in posts),
        'avg_engagement': calculate_avg_engagement(posts),
        'top_creators': get_top_creators(posts),
        'trending_sounds': extract_sounds(posts),
        'peak_posting_times': analyze_timestamps(posts)
    }
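`track_hashtag` also relies on helpers that this guide does not define. Here is a sketch of two of them, `calculate_avg_engagement` and `get_top_creators`. The post fields used (`author`, `view_count`, `like_count`, `comment_count`, `share_count`) are assumed names for illustration and may differ from what the API returns.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def calculate_avg_engagement(posts: List[Dict]) -> float:
    """Mean of likes + comments + shares per post; 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(
        p.get('like_count', 0) + p.get('comment_count', 0) + p.get('share_count', 0)
        for p in posts
    )
    return total / len(posts)


def get_top_creators(posts: List[Dict], n: int = 5) -> List[Tuple[str, int]]:
    """Rank authors by total views across their posts in the sample."""
    views = defaultdict(int)
    for p in posts:
        views[p.get('author', 'unknown')] += p.get('view_count', 0)
    return sorted(views.items(), key=lambda kv: kv[1], reverse=True)[:n]


sample = [
    {'author': 'ana', 'view_count': 1000, 'like_count': 100, 'comment_count': 10},
    {'author': 'ben', 'view_count': 5000, 'like_count': 300, 'comment_count': 20, 'share_count': 10},
    {'author': 'ana', 'view_count': 2500, 'like_count': 60, 'comment_count': 10},
]

avg = calculate_avg_engagement(sample)
creators = get_top_creators(sample)
```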

3. Competitor Analysis

Compare social performance:

def compare_competitors(usernames):
    results = []
    
    for username in usernames:
        data = {
            'username': username,
            'platforms': {}
        }
        
        for platform in ['tiktok', 'instagram', 'youtube']:
            try:
                profile = extractor.get_profile(platform, username)
                data['platforms'][platform] = {
                    'followers': profile.followers,
                    'posts': profile.posts
                }
            except (requests.RequestException, KeyError):
                # Request failed or the profile is missing on this platform
                data['platforms'][platform] = None
        
        results.append(data)
    
    return sorted(results, key=lambda x: sum(
        p['followers'] for p in x['platforms'].values() if p
    ), reverse=True)

4. Content Research

Find what works:

def analyze_top_content(username, platform='instagram'):
    posts = extractor.get_posts(platform, username, count=100)
    
    sorted_posts = sorted(posts, key=lambda p: p.get('like_count', 0), reverse=True)
    
    top_posts = sorted_posts[:10]
    
    return {
        'top_posts': top_posts,
        'common_themes': extract_themes(top_posts),
        'optimal_length': avg_caption_length(top_posts),
        'best_hashtags': get_common_hashtags(top_posts),
        'best_posting_times': get_posting_times(top_posts)
    }
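The helpers used by `analyze_top_content` are left undefined in this guide. Two of them, `get_common_hashtags` and `avg_caption_length`, can be sketched with the standard library alone. The sketch assumes hashtags appear inline in a `caption` field, which is an assumption about the payload rather than something the API documents here.

```python
import re
from collections import Counter
from typing import Dict, List

# Matches '#' followed by letters, digits, or underscores
HASHTAG_RE = re.compile(r'#(\w+)')


def get_common_hashtags(posts: List[Dict], n: int = 5) -> List[str]:
    """Most frequent hashtags across captions, compared case-insensitively."""
    counts = Counter(
        tag.lower()
        for p in posts
        for tag in HASHTAG_RE.findall(p.get('caption') or '')
    )
    return [tag for tag, _ in counts.most_common(n)]


def avg_caption_length(posts: List[Dict]) -> float:
    """Mean caption length in characters; 0.0 for an empty list."""
    if not posts:
        return 0.0
    return sum(len(p.get('caption') or '') for p in posts) / len(posts)


sample = [
    {'caption': 'Sunrise over the ridge #Nature #hiking'},
    {'caption': 'Trail day #hiking'},
    {'caption': None},
]

hashtags = get_common_hashtags(sample, 2)
```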

Legal Considerations

What's Generally OK

Public data (no login required)
Personal/internal use
Research with consent
Aggregate, anonymized data
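For the "aggregate, anonymized" case, one common approach is to replace usernames with keyed hashes and keep only summary statistics. The sketch below uses HMAC-SHA256 from the Python standard library. The function names and the 16-character token length are illustrative choices. Note that keyed hashing is pseudonymization rather than full anonymization, so the result may still count as personal data under GDPR.

```python
import hashlib
import hmac
from typing import Dict, List


def pseudonymize(username: str, secret: bytes) -> str:
    """Map a username to a stable token that cannot be reversed without the secret."""
    digest = hmac.new(secret, username.lower().encode('utf-8'), hashlib.sha256)
    return digest.hexdigest()[:16]


def aggregate_followers(profiles: List[Dict], secret: bytes) -> Dict:
    """Keep only tokens and summary statistics, dropping the raw usernames."""
    counts = [p['followers'] for p in profiles]
    return {
        'accounts': [pseudonymize(p['username'], secret) for p in profiles],
        'total_followers': sum(counts),
        'avg_followers': sum(counts) / len(counts) if counts else 0,
    }


SECRET = b'replace-with-a-key-from-your-secret-store'  # placeholder value
report = aggregate_followers(
    [
        {'username': 'Creator_A', 'followers': 1200},
        {'username': 'creator_b', 'followers': 800},
    ],
    SECRET,
)
```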

What's Risky

Private/protected data
Violating ToS at scale
Reselling personal data
Scraping after cease & desist

Best Practices

Respect robots.txt - At least read it
Don't scrape private data - Stick to public info
Rate limit requests - Don't hammer servers
Store responsibly - Follow GDPR/CCPA
Use APIs when available - Safer legally
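Two of these practices are easy to automate: checking robots.txt and spacing out requests. The sketch below uses Python's standard `urllib.robotparser` for the first and a small hand-rolled `Throttle` class for the second. The class name, the one-second interval, and the example robots.txt content are all assumptions for illustration.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay:
            self._sleep(delay)
        # Record when the request is actually allowed to go out
        self._last = now + delay
        return delay


EXAMPLE_ROBOTS = """
User-agent: *
Disallow: /private/
"""

robots = build_robots_checker(EXAMPLE_ROBOTS)
throttle = Throttle(min_interval=1.0)
```

Before each request, call `robots.can_fetch('*', url)` and skip the URL if it returns `False`, then call `throttle.wait()` so requests never go out faster than the chosen interval.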

Getting Started

Sign up at sociavault.com
Get 50 free credits - No credit card required
Test in playground - Try endpoints at dashboard/playground
Build your integration - Use the code examples above

Ready to extract social media data?

Get started at sociavault.com.
