Instagram Transcript API: Convert Video Audio to Text
Need to extract what's being said in Instagram Reels and videos? The Transcript API provides AI-powered speech-to-text conversion for any Instagram video content.
Why Use Video Transcripts?
- Content Analysis - Analyze spoken content at scale
- SEO Optimization - Make video content searchable
- Accessibility - Create captions for hearing-impaired users
- Content Repurposing - Turn videos into blog posts or quotes
- Sentiment Analysis - Analyze spoken tone and messaging
- Brand Monitoring - Track verbal brand mentions
Using the Transcript API
const response = await fetch(
  'https://api.sociavault.com/v1/scrape/instagram/transcript?url=' +
    encodeURIComponent('https://www.instagram.com/reel/ABC123xyz/'),
  {
    method: 'GET',
    headers: {
      'x-api-key': 'YOUR_API_KEY'
    }
  }
);
const result = await response.json();
Sample Response
The API runs the video through AI for transcription, so expect responses in 10-30 seconds. For carousel posts, a transcript is returned for each item.
{
  "success": true,
  "data": {
    "success": true,
    "transcripts": {
      "0": {
        "id": "3675185678007938307",
        "shortcode": "DMA4eb1RC0D",
        "text": "Hey everyone! Today I'm going to show you how to make the perfect smoothie bowl. First, you need frozen bananas, some berries, and a splash of almond milk."
      }
    }
  },
  "credits_used": 1,
  "endpoint": "instagram/transcript",
  "note": "AI-powered transcript (can take 10-30 seconds)"
}
The transcripts object is keyed by index ("0", "1", etc.) — one entry per video in the post. Each transcript includes the post id, shortcode, and the full text of spoken content.
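As a rough sketch of reading this structure, the helper below flattens data.transcripts into an array and skips entries that have no text. The name extractTranscripts is hypothetical, and treating a false success flag as a failure is an assumption drawn from the sample response, not documented behavior.

```javascript
// Hypothetical helper: flatten the index-keyed transcripts object into an array.
// Assumes the response shape shown in the sample response above.
function extractTranscripts(result) {
  // Assumption: a false `success` flag at either level means the request failed.
  if (!result || result.success !== true || !result.data || result.data.success !== true) {
    throw new Error('Transcript request did not succeed');
  }
  const transcripts = result.data.transcripts || {};
  return Object.entries(transcripts)
    // Skip entries with no usable text (for example, a video without speech).
    .filter(([, t]) => t && typeof t.text === 'string')
    .map(([index, t]) => ({
      index: Number(index),
      id: t.id,
      shortcode: t.shortcode,
      text: t.text,
    }));
}

// Stand-in response modeled on the sample; the null entry is illustrative only.
const sample = {
  success: true,
  data: {
    success: true,
    transcripts: {
      0: { id: '3675185678007938307', shortcode: 'DMA4eb1RC0D', text: 'Hey everyone!' },
      1: null,
    },
  },
};

console.log(extractTranscripts(sample));
```

Working with a plain array keeps later steps (joining text, counting videos) free of repeated null checks.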
Use Cases
Content Analysis at Scale
Analyze what creators are talking about:
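async function getTranscript(url) {
  const res = await fetch(
    `https://api.sociavault.com/v1/scrape/instagram/transcript?url=${encodeURIComponent(url)}`,
    { headers: { 'x-api-key': 'YOUR_API_KEY' } }
  );
  // Fail loudly on HTTP errors instead of parsing an error body as a transcript
  if (!res.ok) throw new Error(`Transcript request failed: HTTP ${res.status}`);
  return res.json();
}
// Placeholders: replace with real reel URLs
const reelUrls = ['url1', 'url2', 'url3'];
// Note: Promise.all sends every request at once
const results = await Promise.all(reelUrls.map(url => getTranscript(url)));
// Combine all transcript text, skipping videos that have no transcript
const allText = results
  .map(r => Object.values(r.data?.transcripts ?? {}).map(t => t?.text ?? '').join(' '))
  .join(' ');
// analyzeWordFrequency is your own helper; it is not part of the API
const wordFrequency = analyzeWordFrequency(allText);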
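The snippet above calls analyzeWordFrequency without defining it. One possible implementation, offered only as a sketch: it lowercases the text, counts words outside a small stop-word list, and returns the most frequent ones. The stop-word list, the topN parameter, and the ASCII-only tokenizer are all assumptions made for illustration; non-English transcripts would need a different tokenizer.

```javascript
// Hypothetical implementation of analyzeWordFrequency; not part of the API.
// Small illustrative stop-word list; extend it for real use.
const STOP_WORDS = new Set(['a', 'an', 'and', 'the', 'to', 'of', 'in', 'is', 'it', 'you', 'i', 'some']);

function analyzeWordFrequency(text, topN = 10) {
  const counts = new Map();
  // Match runs of ASCII letters and apostrophes, so "bowl." counts as "bowl".
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  for (const word of words) {
    if (STOP_WORDS.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  // Sort by count, highest first, and keep the top N entries.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([word, count]) => ({ word, count }));
}

console.log(analyzeWordFrequency('Smoothie bowl time. The smoothie needs berries.', 2));
```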
Brand Mention Detection
Find video mentions of your brand:
async function checkBrandMention(reelUrl, brandKeywords) {
  const result = await getTranscript(reelUrl);
  // A transcript may be null when no speech is detected, so read text defensively
  const transcripts = Object.values(result.data?.transcripts ?? {});
  const allText = transcripts.map(t => t?.text ?? '').join(' ').toLowerCase();
  // Case-insensitive substring match against every keyword
  return brandKeywords.some(keyword =>
    allText.includes(keyword.toLowerCase())
  );
}
// Check multiple videos
const reels = ['https://instagram.com/reel/ABC/', 'https://instagram.com/reel/XYZ/'];
for (const url of reels) {
  const mentioned = await checkBrandMention(url, ['YourBrand', 'your brand']);
  if (mentioned) {
    console.log(`Brand mention found: ${url}`);
  }
}
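Substring matching with includes also matches inside longer words, so a keyword like "apple" would hit "pineapple". If that matters for your brand terms, a word-boundary regular expression is one option. This is a minimal sketch; mentionsKeyword and escapeRegExp are hypothetical helper names, and \b only behaves as expected for keywords that start and end with a letter, digit, or underscore.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return true if any keyword appears as a whole word or phrase in the text.
function mentionsKeyword(text, keywords) {
  return keywords.some((keyword) => {
    // 'i' makes the match case-insensitive; \b anchors it to word boundaries.
    const pattern = new RegExp(`\\b${escapeRegExp(keyword)}\\b`, 'i');
    return pattern.test(text);
  });
}

console.log(mentionsKeyword('I love my new Acme blender', ['acme']));  // true
console.log(mentionsKeyword('Pineapple smoothie recipe', ['apple']));  // false
```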
Create Video Summaries
Use transcripts to generate content summaries:
async function summarizeVideo(url) {
  const result = await getTranscript(url);
  const transcripts = Object.values(result.data?.transcripts ?? {});
  // Get the full text from the first video
  const fullText = transcripts[0]?.text || '';
  // Extract key points (simplified example): keep the first three sentences
  const sentences = fullText.split(/[.!?]+/).map(s => s.trim()).filter(Boolean);
  if (sentences.length === 0) return '';
  // Trimmed sentences avoid doubled spaces; restore the final period
  return sentences.slice(0, 3).join('. ') + '.';
}
Sentiment Analysis
Analyze tone of spoken content:
async function analyzeVideoSentiment(url) {
  const result = await getTranscript(url);
  const transcripts = Object.values(result.data?.transcripts ?? {});
  const fullText = transcripts.map(t => t?.text ?? '').join(' ');
  // Simple sentiment keywords (use proper NLP in production)
  const positiveWords = ['amazing', 'love', 'great', 'awesome', 'perfect'];
  const negativeWords = ['bad', 'hate', 'terrible', 'worst', 'awful'];
  // Tokenize on letters only so "amazing!" still matches "amazing"
  const words = fullText.toLowerCase().match(/[a-z']+/g) || [];
  const positive = words.filter(w => positiveWords.includes(w)).length;
  const negative = words.filter(w => negativeWords.includes(w)).length;
  return {
    sentiment: positive > negative ? 'positive' : negative > positive ? 'negative' : 'neutral',
    positiveCount: positive,
    negativeCount: negative
  };
}
Batch Processing with Carousel Support
Handle carousel posts that contain multiple videos:
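async function processCarousel(url) {
  const result = await getTranscript(url);
  const transcripts = result.data?.transcripts ?? {};
  for (const [index, transcript] of Object.entries(transcripts)) {
    // A transcript may be null when no speech is detected
    if (!transcript) {
      console.log(`Video ${Number(index) + 1}: no transcript`);
      continue;
    }
    console.log(`Video ${Number(index) + 1} (${transcript.shortcode}):`);
    console.log(transcript.text);
    console.log('---');
  }
  // Keep one entry per video so positions still line up with carousel order
  return Object.values(transcripts).map(t => t?.text ?? '');
}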
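Because each request can take 10-30 seconds, sending a large batch all at once with Promise.all may be more than you want in flight. A small worker pool is one way to cap that. The sketch below is an assumption-laden illustration: mapWithConcurrency is a hypothetical helper, and the limit of 2 in the usage example is arbitrary, not a documented rate limit.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at a time.
// Results keep the order of `items`; a failed call is recorded instead of thrown.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    while (next < items.length) {
      const i = next++; // claim the next index before awaiting anything
      try {
        results[i] = { ok: true, value: await worker(items[i]) };
      } catch (error) {
        results[i] = { ok: false, error };
      }
    }
  }

  // Start up to `limit` runners that pull work until the list is exhausted.
  const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
  await Promise.all(runners);
  return results;
}

// Usage sketch with a stand-in worker; swap in getTranscript for real requests.
const fakeTranscript = async (url) => ({ url, text: `transcript for ${url}` });

mapWithConcurrency(['reel-a', 'reel-b', 'reel-c', 'reel-d'], 2, fakeTranscript)
  .then((results) => console.log(results.map((r) => r.ok)));
```

Recording failures per item means one bad URL does not discard the transcripts that did succeed.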
Related Endpoints
- Instagram Reels API - Get profile reels
- Instagram Post Info - Post metadata
- TikTok Transcript - TikTok video transcripts
- YouTube Transcript - YouTube transcripts
Frequently Asked Questions
How accurate are the transcripts?
Transcripts are AI-powered and typically achieve high accuracy for clear speech. Accuracy may vary with heavy background music, multiple overlapping speakers, or very low audio quality.
How long does transcription take?
Expect 10-30 seconds per video, since each request runs through AI processing. Set your HTTP client timeout comfortably above 30 seconds so slow transcriptions are not cut off.
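One way to enforce a client-side timeout is an AbortController passed to fetch. This is a minimal sketch, not the official client: getTranscriptWithTimeout is a hypothetical name, the 60-second default is an assumed safety margin above the stated 10-30 seconds, and the fetchImpl option exists only so the function can be tried without network access.

```javascript
// Fetch a transcript with a client-side timeout.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function getTranscriptWithTimeout(
  url,
  { timeoutMs = 60000, apiKey = 'YOUR_API_KEY', fetchImpl = fetch } = {}
) {
  const controller = new AbortController();
  // Abort the request if it has not finished within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(
      `https://api.sociavault.com/v1/scrape/instagram/transcript?url=${encodeURIComponent(url)}`,
      { headers: { 'x-api-key': apiKey }, signal: controller.signal }
    );
    if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);
    return await res.json();
  } finally {
    clearTimeout(timer); // always clear the timer, on success or failure
  }
}
```

A call might look like getTranscriptWithTimeout(reelUrl, { timeoutMs: 45000 }); an aborted request rejects, so wrap the call in try/catch if you want to retry.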
Does it support carousel posts?
Yes. For carousel posts containing multiple videos, the API returns a transcript for each video item, keyed by index ("0", "1", "2", etc.) in the transcripts object.
What if the video has no speech?
If no speech is detected in a video, its transcript is returned as null. Check for null before reading or joining transcript text.
How many credits does it cost?
Each transcript request costs 1 credit, regardless of video length or the number of items in a carousel.
Get Started
Create your account and start extracting Instagram video transcripts.
Documentation: /docs/api-reference/instagram/transcript