Instagram Transcript API: Extract Speech-to-Text from Reels & Videos

February 5, 2026
4 min read
By SociaVault Team
Tags: instagram, transcript, api, speech-to-text

Instagram Transcript API: Convert Video Audio to Text

Need to extract what's being said in Instagram Reels and videos? The Transcript API provides AI-powered speech-to-text conversion for any Instagram video content.

Why Use Video Transcripts?

  • Content Analysis - Analyze spoken content at scale
  • SEO Optimization - Make video content searchable
  • Accessibility - Create captions for hearing-impaired users
  • Content Repurposing - Turn videos into blog posts or quotes
  • Sentiment Analysis - Gauge the sentiment of spoken messaging
  • Brand Monitoring - Track verbal brand mentions

Using the Transcript API

const response = await fetch('https://api.sociavault.com/instagram/transcript', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://www.instagram.com/reel/ABC123xyz/'
  })
});

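// Fail early on HTTP errors instead of parsing an error body as a transcript
if (!response.ok) {
  throw new Error(`Transcript request failed: HTTP ${response.status}`);
}

const transcript = await response.json();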
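
The use cases later in this post call a `getTranscript` helper that is never defined. A minimal sketch of what it might look like, assuming it simply wraps the request above. The helper names, the `apiKey` parameter, and the error handling are assumptions for illustration, not part of the documented API:

```javascript
// Endpoint taken from the request example above.
const TRANSCRIPT_ENDPOINT = 'https://api.sociavault.com/instagram/transcript';

// Build the fetch options for one transcript request (pure, so it is easy to test).
function buildTranscriptRequest(reelUrl, apiKey) {
  return {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url: reelUrl })
  };
}

// Hypothetical helper: sends the request and returns the parsed JSON body.
async function getTranscript(reelUrl, apiKey = 'YOUR_API_KEY') {
  const response = await fetch(
    TRANSCRIPT_ENDPOINT,
    buildTranscriptRequest(reelUrl, apiKey)
  );
  if (!response.ok) {
    // The error body format is not shown in this post, so only the status is reported.
    throw new Error(`Transcript request failed: HTTP ${response.status}`);
  }
  return response.json();
}
```

Keeping the request construction separate from the network call makes it possible to check headers and payload without hitting the API.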

Sample Response

{
  "postId": "3123456789012345678",
  "url": "https://www.instagram.com/reel/ABC123xyz/",
  "duration": 30,
  "transcript": {
    "text": "Hey everyone! Today I'm going to show you how to make the perfect smoothie bowl. First, you need frozen bananas, some berries, and a splash of almond milk...",
    "segments": [
      {
        "start": 0.0,
        "end": 2.5,
        "text": "Hey everyone!"
      },
      {
        "start": 2.5,
        "end": 7.2,
        "text": "Today I'm going to show you how to make the perfect smoothie bowl."
      },
      {
        "start": 7.2,
        "end": 12.8,
        "text": "First, you need frozen bananas, some berries, and a splash of almond milk..."
      }
    ],
    "language": "en",
    "confidence": 0.95
  }
}
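
To illustrate how the `segments` array can be used, here is a small sketch that looks up what is being said at a given moment in the video. It assumes the field names shown in the sample response. `findSegmentAt` is a hypothetical helper, and treating each range as start-inclusive and end-exclusive is a choice made for this example:

```javascript
// Return the segment whose time range contains `seconds`, or null if none does.
function findSegmentAt(transcript, seconds) {
  const match = transcript.segments.find(
    seg => seconds >= seg.start && seconds < seg.end
  );
  return match || null;
}

// Segments copied from the sample response above.
const sample = {
  segments: [
    { start: 0.0, end: 2.5, text: 'Hey everyone!' },
    { start: 2.5, end: 7.2, text: "Today I'm going to show you how to make the perfect smoothie bowl." }
  ]
};

console.log(findSegmentAt(sample, 1.0).text); // prints "Hey everyone!"
console.log(findSegmentAt(sample, 20));       // prints null
```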

Use Cases

Content Analysis at Scale

Analyze what creators are talking about:

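// Placeholder URLs; replace with real Reel links
const reelUrls = ['url1', 'url2', 'url3'];

// getTranscript is assumed to wrap the POST request shown under "Using the Transcript API"
const transcripts = await Promise.all(
  reelUrls.map(url => getTranscript(url))
);

// Find common topics (analyzeWordFrequency is a helper you supply)
const allText = transcripts.map(t => t.transcript.text).join(' ');
const wordFrequency = analyzeWordFrequency(allText);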
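
The snippet above calls `analyzeWordFrequency` without defining it. One possible sketch, assuming a plain word count with a small stop-word list is enough. The function body, the `topN` parameter, and the stop-word list are illustrative choices, not part of the API:

```javascript
// Very small stop-word list for illustration; extend it for real use.
const STOP_WORDS = new Set([
  'the', 'a', 'an', 'and', 'or', 'to', 'of', 'in', 'is', 'it', 'you', 'i', 'for', 'on'
]);

// Count word occurrences and return the topN most frequent words.
function analyzeWordFrequency(text, topN = 10) {
  const counts = new Map();
  // Match runs of letters, digits, and apostrophes in any script.
  const words = text.toLowerCase().match(/[\p{L}\p{N}']+/gu) || [];

  for (const word of words) {
    if (STOP_WORDS.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }

  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .slice(0, topN)
    .map(([word, count]) => ({ word, count }));
}
```

Note that `Promise.all` starts every request at once; for large batches you may want to process URLs in smaller groups.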

Brand Mention Detection

Find videos that mention your brand:

async function checkBrandMention(reelUrl, brandKeywords) {
  const { transcript } = await getTranscript(reelUrl);
  
  const mentioned = brandKeywords.some(keyword =>
    transcript.text.toLowerCase().includes(keyword.toLowerCase())
  );
  
  return mentioned;
}

// Check multiple videos
// getProfileReels is not defined in this post; it is assumed to return objects with a url field
const reels = await getProfileReels('influencer');
for (const reel of reels) {
  const mentions = await checkBrandMention(reel.url, ['YourBrand', 'your brand']);
  if (mentions) {
    console.log(`Brand mention found: ${reel.url}`);
  }
}
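
Because each segment carries timestamps, the same check can also report when in the video the brand was mentioned. A sketch, assuming the segment shape from the sample response; `findBrandMentions` is a hypothetical name:

```javascript
// Return every segment that contains one of the keywords (case-insensitive).
function findBrandMentions(transcript, brandKeywords) {
  const keywords = brandKeywords.map(k => k.toLowerCase());

  return transcript.segments
    .filter(seg => {
      const text = seg.text.toLowerCase();
      return keywords.some(k => text.includes(k));
    })
    .map(seg => ({ start: seg.start, end: seg.end, text: seg.text }));
}

// Example with made-up segment data
const demoTranscript = {
  segments: [
    { start: 0.0, end: 2.5, text: 'Hey everyone!' },
    { start: 2.5, end: 7.2, text: 'I have been using YourBrand for a month.' }
  ]
};

console.log(findBrandMentions(demoTranscript, ['YourBrand', 'your brand']));
```

Like the function above, this uses substring matching, so a short keyword can also match inside a longer word.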

Create Video Summaries

Use transcripts to generate content summaries:

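async function summarizeVideo(url) {
  const { transcript } = await getTranscript(url);

  // Extract key points (simplified example): keep the first three sentences.
  // The regex keeps each sentence's own punctuation; trim() removes the
  // leading space left over from the split.
  const sentences = (transcript.text.match(/[^.!?]+[.!?]*/g) || [])
    .map(s => s.trim())
    .filter(Boolean);

  return sentences.slice(0, 3).join(' ');
}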

Generate Captions

Create accessibility captions from transcripts:

async function generateCaptions(url) {
  const { transcript } = await getTranscript(url);

  // Format as SRT: cue number, time range, text, with a blank line between cues
  const srt = transcript.segments.map((seg, i) => {
    const startTime = formatTime(seg.start);
    const endTime = formatTime(seg.end);

    return `${i + 1}\n${startTime} --> ${endTime}\n${seg.text}\n`;
  }).join('\n');

  return srt;
}

// Convert seconds to an SRT timestamp (HH:MM:SS,mmm)
function formatTime(seconds) {
  // Round to whole milliseconds first; taking (seconds % 1) * 1000 and
  // flooring can lose a millisecond to floating-point error (4.35 -> 349)
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n, width = 2) => String(n).padStart(width, '0');

  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

Sentiment Analysis

Estimate the sentiment of the transcribed speech:

async function analyzeVideoSentiment(url) {
  const { transcript } = await getTranscript(url);
  
  // Simple sentiment keywords (use proper NLP in production)
  const positiveWords = ['amazing', 'love', 'great', 'awesome', 'perfect'];
  const negativeWords = ['bad', 'hate', 'terrible', 'worst', 'awful'];
  
  // Extract bare words so trailing punctuation ("amazing!") does not block a match
  const words = transcript.text.toLowerCase().match(/[a-z']+/g) || [];
  const positive = words.filter(w => positiveWords.includes(w)).length;
  const negative = words.filter(w => negativeWords.includes(w)).length;
  
  return {
    sentiment: positive > negative ? 'positive' : negative > positive ? 'negative' : 'neutral',
    positiveCount: positive,
    negativeCount: negative
  };
}

Frequently Asked Questions

How accurate are the transcripts?

Transcripts typically achieve 90-95% accuracy for clear speech in English. Accuracy may vary with background music, multiple speakers, or non-English content.
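
The sample response includes a `confidence` value (0.95). Assuming it is a 0 to 1 score for the whole transcript, which this post does not state explicitly, a batch job could set low-confidence results aside for manual review. The threshold and the `splitByConfidence` name below are illustrative:

```javascript
// Assumed threshold; tune it for your content.
const MIN_CONFIDENCE = 0.85;

// Split API results into those we trust and those that need a second look.
function splitByConfidence(results, minConfidence = MIN_CONFIDENCE) {
  const accepted = [];
  const needsReview = [];

  for (const result of results) {
    // Treat a missing confidence value as 0 so it lands in the review pile.
    const confidence = result.transcript?.confidence ?? 0;
    if (confidence >= minConfidence) {
      accepted.push(result);
    } else {
      needsReview.push(result);
    }
  }

  return { accepted, needsReview };
}
```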

What languages are supported?

The API auto-detects and transcribes content in 50+ languages. The language field in the response indicates the detected language.
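
When a batch mixes languages, the `language` field makes it easy to route transcripts to language-specific processing. A sketch, assuming the field holds a short code such as "en" as in the sample response; `groupByLanguage` is a hypothetical helper:

```javascript
// Group API results by detected language code.
function groupByLanguage(results) {
  const groups = {};

  for (const result of results) {
    // Fall back to "unknown" if the field is missing.
    const lang = (result.transcript && result.transcript.language) || 'unknown';
    if (!groups[lang]) groups[lang] = [];
    groups[lang].push(result);
  }

  return groups;
}
```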

How long does transcription take?

Most videos are transcribed within 10-30 seconds. Longer videos may take proportionally more time.

Can I get transcripts for any video length?

Yes, the API handles videos of any length. Very long videos (10+ minutes) are processed in segments.

Are timestamps included?

Yes, the segments array includes start and end timestamps for each sentence/phrase, enabling caption generation.

What if the video has no speech?

Videos without detectable speech return an empty transcript with a note indicating no speech was detected.
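
The exact shape of a no-speech response, including the name of the note field, is not shown in this post. A defensive sketch that assumes only that `text` and `segments` come back empty or missing; `hasSpeech` is a hypothetical helper:

```javascript
// Return true only if the result contains some transcribed text or segments.
function hasSpeech(result) {
  const transcript = result && result.transcript;
  if (!transcript) return false;

  const text = (transcript.text || '').trim();
  const segments = transcript.segments || [];

  return text.length > 0 || segments.length > 0;
}
```

Calling this before the brand, summary, or caption functions above avoids processing empty transcripts.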

Get Started

Create your account and start extracting Instagram video transcripts.

Documentation: /docs/api-reference/instagram/transcript
