
Instagram Transcript API: Extract Speech-to-Text from Reels & Videos

February 5, 2026
By SociaVault Team
Tags: instagram, transcript, api, speech-to-text

Instagram Transcript API: Convert Video Audio to Text

Need to extract what's being said in Instagram Reels and videos? The Transcript API provides AI-powered speech-to-text conversion for any Instagram video content.

Why Use Video Transcripts?

  • Content Analysis - Analyze spoken content at scale
  • SEO Optimization - Make video content searchable
  • Accessibility - Create captions for hearing-impaired users
  • Content Repurposing - Turn videos into blog posts or quotes
  • Sentiment Analysis - Analyze spoken tone and messaging
  • Brand Monitoring - Track verbal brand mentions

Using the Transcript API

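// Request a transcript for a single reel URL (replace YOUR_API_KEY with your own key)
const response = await fetch(
  'https://api.sociavault.com/v1/scrape/instagram/transcript?url=' +
    encodeURIComponent('https://www.instagram.com/reel/ABC123xyz/'),
  {
    method: 'GET',
    headers: {
      'x-api-key': 'YOUR_API_KEY'
    }
  }
);

// Stop early on HTTP errors instead of parsing an error body as a transcript
if (!response.ok) {
  throw new Error(`Transcript request failed with HTTP ${response.status}`);
}

const result = await response.json();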

Sample Response

The API runs the video through AI for transcription, so expect responses in 10-30 seconds. For carousel posts, a transcript is returned for each item.

{
  "success": true,
  "data": {
    "success": true,
    "transcripts": {
      "0": {
        "id": "3675185678007938307",
        "shortcode": "DMA4eb1RC0D",
        "text": "Hey everyone! Today I'm going to show you how to make the perfect smoothie bowl. First, you need frozen bananas, some berries, and a splash of almond milk."
      }
    }
  },
  "credits_used": 1,
  "endpoint": "instagram/transcript",
  "note": "AI-powered transcript (can take 10-30 seconds)"
}

The transcripts object is keyed by index ("0", "1", etc.), with one entry per video in the post. Each entry includes the post id, the shortcode, and the full text of the spoken content.
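As a rough illustration of walking this structure, the sketch below collects the text of every entry in index order. The helper name extractTranscriptTexts is hypothetical, and the null checks are an assumption about how a missing transcript might appear, not documented behavior.

```javascript
// Hypothetical helper: pull the spoken text out of a transcript response.
// Assumes the response shape shown in the sample response.
function extractTranscriptTexts(result) {
  const transcripts = result?.data?.transcripts ?? {};

  // JavaScript iterates integer-like keys ("0", "1", ...) in ascending order,
  // so Object.values returns the entries in index order.
  return Object.values(transcripts)
    // Assumption: an entry, or its text, may be null when nothing was transcribed.
    .filter(entry => entry && typeof entry.text === 'string')
    .map(entry => entry.text);
}
```

Joining the returned array with a space gives one string per post, which is the form the use cases in this post work with.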

Use Cases

Content Analysis at Scale

Analyze what creators are talking about:

async function getTranscript(url) {
  const res = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/transcript?url=${encodeURIComponent(url)}`,
    { headers: { 'x-api-key': 'YOUR_API_KEY' } }
  );
  return res.json();
}

const reelUrls = ['url1', 'url2', 'url3'];

const results = await Promise.all(reelUrls.map(url => getTranscript(url)));

// Combine all transcript text
const allText = results
  .map(r => Object.values(r.data.transcripts).map(t => t.text).join(' '))
  .join(' ');

// analyzeWordFrequency is not built in; supply your own word-counting helper
const wordFrequency = analyzeWordFrequency(allText);
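The snippet calls analyzeWordFrequency without defining it. One possible version is sketched here; the option names minLength and limit are made up for illustration, and the word pattern only covers ASCII letters and digits, so non-English transcripts would need a different pattern.

```javascript
// Hypothetical implementation of the analyzeWordFrequency helper.
// Returns [word, count] pairs sorted by descending count.
function analyzeWordFrequency(text, { minLength = 3, limit = 20 } = {}) {
  const counts = new Map();

  // Lowercase, then treat runs of letters, digits, and apostrophes as words.
  const words = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];

  for (const word of words) {
    if (word.length < minLength) continue; // skip very short words
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }

  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}
```

A real analysis would probably also filter stop words such as "the" and "and", which this sketch leaves in.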

Brand Mention Detection

Find video mentions of your brand:

async function checkBrandMention(reelUrl, brandKeywords) {
  const result = await getTranscript(reelUrl);
  const transcripts = Object.values(result.data.transcripts);

  const allText = transcripts.map(t => t.text).join(' ').toLowerCase();

  return brandKeywords.some(keyword =>
    allText.includes(keyword.toLowerCase())
  );
}

// Check multiple videos
const reels = ['https://instagram.com/reel/ABC/', 'https://instagram.com/reel/XYZ/'];
for (const url of reels) {
  const mentioned = await checkBrandMention(url, ['YourBrand', 'your brand']);
  if (mentioned) {
    console.log(`Brand mention found: ${url}`);
  }
}
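Because each transcript can take 10-30 seconds, checking reels one at a time adds up quickly, while sending every request at once may run into rate limits (an assumption, since this post does not state any). One possible middle ground is a small concurrency-limited mapper. mapWithConcurrency is a hypothetical helper, not part of the SociaVault API.

```javascript
// Hypothetical helper: run an async function over items, at most `limit` at a time.
// Results come back in the same order as the input items.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Example (inside an async context), reusing checkBrandMention:
// const flags = await mapWithConcurrency(reels, 3, url =>
//   checkBrandMention(url, ['YourBrand', 'your brand']));
```

If any call rejects, the whole batch rejects, so wrap fn in a try/catch when partial results are acceptable.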

Create Video Summaries

Use transcripts to generate content summaries:

async function summarizeVideo(url) {
  const result = await getTranscript(url);
  const transcripts = Object.values(result.data.transcripts);

  // Get the full text from the first video
  const fullText = transcripts[0]?.text || '';

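  // Extract key points (simplified example): keep the first three sentences
  const sentences = fullText
    .split(/[.!?]+/)
    .map(s => s.trim())
    .filter(Boolean);

  // Rejoin with periods, since the split discards the original punctuation
  return sentences.length ? sentences.slice(0, 3).join('. ') + '.' : '';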
}

Sentiment Analysis

Analyze tone of spoken content:

async function analyzeVideoSentiment(url) {
  const result = await getTranscript(url);
  const transcripts = Object.values(result.data.transcripts);
  const fullText = transcripts.map(t => t.text).join(' ');

  // Simple sentiment keywords (use proper NLP in production)
  const positiveWords = ['amazing', 'love', 'great', 'awesome', 'perfect'];
  const negativeWords = ['bad', 'hate', 'terrible', 'worst', 'awful'];

  // Match words only, so trailing punctuation ("amazing!") does not hide a hit
  const words = fullText.toLowerCase().match(/[a-z']+/g) ?? [];
  const positive = words.filter(w => positiveWords.includes(w)).length;
  const negative = words.filter(w => negativeWords.includes(w)).length;

  return {
    sentiment: positive > negative ? 'positive' : negative > positive ? 'negative' : 'neutral',
    positiveCount: positive,
    negativeCount: negative
  };
}

Carousel Posts

Handle carousel posts that contain multiple videos:

async function processCarousel(url) {
  const result = await getTranscript(url);
  const transcripts = result.data.transcripts;

  for (const [index, transcript] of Object.entries(transcripts)) {
    console.log(`Video ${Number(index) + 1} (${transcript.shortcode}):`);
    console.log(transcript.text);
    console.log('---');
  }

  return Object.values(transcripts).map(t => t.text);
}

Frequently Asked Questions

How accurate are the transcripts?

Transcripts are AI-powered and typically achieve high accuracy for clear speech. Accuracy may vary with heavy background music, multiple overlapping speakers, or very low audio quality.

How long does transcription take?

Expect 10-30 seconds per video since each request runs through AI processing. Plan your timeouts accordingly.
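A minimal sketch of one way to set such a timeout, using the standard AbortSignal.timeout available in current browsers and Node.js 18+. The function name, the 60-second default (an arbitrary margin above the 10-30 second range), and the fetchImpl option are assumptions for illustration; fetchImpl exists only so the function can be exercised without a network call.

```javascript
// Sketch: request a transcript, aborting if it takes longer than timeoutMs.
async function getTranscriptWithTimeout(
  url,
  apiKey,
  { timeoutMs = 60000, fetchImpl = fetch } = {}
) {
  const endpoint =
    'https://api.sociavault.com/v1/scrape/instagram/transcript?url=' +
    encodeURIComponent(url);

  const res = await fetchImpl(endpoint, {
    headers: { 'x-api-key': apiKey },
    // The signal aborts the request with a TimeoutError after timeoutMs.
    signal: AbortSignal.timeout(timeoutMs)
  });

  if (!res.ok) {
    throw new Error(`Transcript request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Callers can tell a timeout apart from other failures by checking whether the caught error's name is 'TimeoutError'.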

Does it work with carousel posts?

Yes. For carousel posts containing multiple videos, the API returns a transcript for each video item, keyed by index ("0", "1", "2", etc.) in the transcripts object.

What if the video has no speech?

If no speech is detected in the video, the transcript is returned as null.

How many credits does it cost?

Each transcript request costs 1 credit, regardless of video length or the number of items in a carousel.

Get Started

Create your account and start extracting Instagram video transcripts.

Documentation: /docs/api-reference/instagram/transcript
