Instagram Transcript API: Convert Video Audio to Text
Need to extract what's being said in Instagram Reels and videos? The Transcript API provides AI-powered speech-to-text conversion for Instagram video content. It returns the full transcript along with timestamped segments.
Why Use Video Transcripts?
- Content Analysis - Analyze spoken content at scale
- SEO Optimization - Make video content searchable
- Accessibility - Create captions for hearing-impaired users
- Content Repurposing - Turn videos into blog posts or quotes
- Sentiment Analysis - Analyze spoken tone and messaging
- Brand Monitoring - Track verbal brand mentions
Using the Transcript API
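```javascript
// Request a transcript for a single Reel (top-level await requires an ES module or async context)
const response = await fetch('https://api.sociavault.com/instagram/transcript', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://www.instagram.com/reel/ABC123xyz/'
  })
});

// Fail loudly on HTTP errors instead of parsing an error body as a transcript
if (!response.ok) {
  throw new Error(`Transcript request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data.transcript.text);
```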
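The use cases below call a `getTranscript(url)` helper that this page does not define. The sketch below assumes it wraps the request above and returns the parsed JSON body unchanged. The `SOCIAVAULT_API_KEY` environment variable, the `timeoutMs` option, and the injectable `fetchImpl` parameter are illustrative choices, not part of the documented API. The 60-second default timeout is also an assumption, set well above the typical 10-30 second processing time quoted in the FAQ. The sketch targets Node 18+, which provides the global `fetch` and `AbortSignal.timeout`.

```javascript
const TRANSCRIPT_ENDPOINT = 'https://api.sociavault.com/instagram/transcript';

/**
 * Hypothetical helper: POSTs a Reel URL to the Transcript API and returns the parsed JSON.
 * Assumes Node 18+ (global fetch and AbortSignal.timeout).
 */
async function getTranscript(url, {
  apiKey = process.env.SOCIAVAULT_API_KEY, // assumed environment variable name
  timeoutMs = 60_000,                      // assumed client-side timeout
  fetchImpl = fetch                        // injectable so the helper can be tested offline
} = {}) {
  if (!apiKey) {
    throw new Error('Missing API key');
  }
  const response = await fetchImpl(TRANSCRIPT_ENDPOINT, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url }),
    // Abort the request if the API has not responded within timeoutMs
    signal: AbortSignal.timeout(timeoutMs)
  });
  if (!response.ok) {
    throw new Error(`Transcript request for ${url} failed with status ${response.status}`);
  }
  return response.json();
}
```

Injecting `fetchImpl` lets you unit-test code built on this helper with a stub instead of live API calls.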
Sample Response
```json
{
  "postId": "3123456789012345678",
  "url": "https://www.instagram.com/reel/ABC123xyz/",
  "duration": 30,
  "transcript": {
    "text": "Hey everyone! Today I'm going to show you how to make the perfect smoothie bowl. First, you need frozen bananas, some berries, and a splash of almond milk...",
    "segments": [
      {
        "start": 0.0,
        "end": 2.5,
        "text": "Hey everyone!"
      },
      {
        "start": 2.5,
        "end": 7.2,
        "text": "Today I'm going to show you how to make the perfect smoothie bowl."
      },
      {
        "start": 7.2,
        "end": 12.8,
        "text": "First, you need frozen bananas, some berries, and a splash of almond milk..."
      }
    ],
    "language": "en",
    "confidence": 0.95
  }
}
```
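A defensive check on each response catches surprises before they reach your analysis code. The sketch below assumes you rely on the fields shown in the sample: `transcript.text`, `transcript.segments[].start/end/text`, and `transcript.confidence`. `checkTranscriptShape` is a hypothetical helper. Its rules (confidence between 0 and 1, segments in chronological order without overlap) are inferred from the sample rather than from a published schema.

```javascript
/**
 * Hypothetical validator for the response shape shown in the sample.
 * Returns a list of problems; an empty list means the response looks usable.
 */
function checkTranscriptShape(data) {
  const problems = [];
  const t = data && data.transcript;
  if (!t || typeof t.text !== 'string') {
    problems.push('missing transcript.text');
    return problems;
  }
  if (!Array.isArray(t.segments)) {
    problems.push('missing transcript.segments');
  } else {
    let previousEnd = 0;
    t.segments.forEach((seg, i) => {
      if (typeof seg.start !== 'number' || typeof seg.end !== 'number') {
        problems.push(`segment ${i} has non-numeric timestamps`);
        return;
      }
      if (seg.end < seg.start) problems.push(`segment ${i} ends before it starts`);
      // Assumes segments arrive in chronological order, as in the sample
      if (seg.start < previousEnd) problems.push(`segment ${i} overlaps the previous segment`);
      previousEnd = seg.end;
    });
  }
  if (typeof t.confidence === 'number' && (t.confidence < 0 || t.confidence > 1)) {
    problems.push('confidence outside 0..1');
  }
  return problems;
}
```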
Use Cases
Content Analysis at Scale
Analyze what creators are talking about:
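```javascript
// Placeholder Reel URLs; getTranscript() is assumed to wrap the request shown earlier and return its JSON
const reelUrls = ['url1', 'url2', 'url3'];

// Fetch all transcripts in parallel (batch large lists to avoid flooding the API)
const transcripts = await Promise.all(
  reelUrls.map(url => getTranscript(url))
);

// Combine the spoken text and count recurring words to surface common topics
const allText = transcripts.map(t => t.transcript.text).join(' ');
const wordFrequency = analyzeWordFrequency(allText); // your own helper
```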
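`analyzeWordFrequency` is not defined on this page. Here is one possible implementation. It assumes you want lowercase word counts with a small, hand-picked stopword list; a real project would use a fuller list or an NLP library. The tokenizer uses Unicode letter and number classes, so it also works on non-English transcripts. The function returns the most frequent words first.

```javascript
// Hypothetical helper. The stopword list is a small illustrative sample, not an exhaustive one.
const STOPWORDS = new Set([
  'a', 'an', 'and', 'the', 'to', 'of', 'in', 'on', 'for', 'is', 'it', 'i', 'im',
  'you', 'your', 'we', 'this', 'that', 'with', 'so', 'just', 'are', 'be', 'my', 'me'
]);

function analyzeWordFrequency(text, { topN = 10, minLength = 3 } = {}) {
  const counts = new Map();
  // Lowercase, keep letters/digits/apostrophes, then drop apostrophes ("I'm" -> "im")
  const words = (text.toLowerCase().match(/[\p{L}\p{N}']+/gu) || [])
    .map(w => w.replace(/'/g, ''))
    .filter(w => w.length >= minLength && !STOPWORDS.has(w));
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  // Sort by count (descending), then alphabetically for a stable order
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN);
}
```

The result is an array of `[word, count]` pairs, which is easy to log or chart.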
Brand Mention Detection
Find video mentions of your brand:
```javascript
// Returns true if any keyword appears in the spoken text (case-insensitive substring match)
async function checkBrandMention(reelUrl, brandKeywords) {
  const { transcript } = await getTranscript(reelUrl);
  const text = transcript.text.toLowerCase();
  return brandKeywords.some(keyword => text.includes(keyword.toLowerCase()));
}

// Check every Reel on a profile; getProfileReels() stands in for a call to the Instagram Reels API
const reels = await getProfileReels('influencer');
for (const reel of reels) {
  const mentioned = await checkBrandMention(reel.url, ['YourBrand', 'your brand']);
  if (mentioned) {
    console.log(`Brand mention found: ${reel.url}`);
  }
}
```
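Substring matching can produce false positives, because a short brand name may appear inside an unrelated word. It also does not tell you where in the video the mention happens. The sketch below instead matches whole words or phrases against each entry in `transcript.segments` and returns their timestamps. `findBrandMentions` is a hypothetical helper. It assumes keywords are plain text, so regex special characters are escaped.

```javascript
// Escape characters that have special meaning inside a RegExp
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

/**
 * Hypothetical helper: finds whole-word, case-insensitive keyword matches
 * in each transcript segment and reports when they occur.
 */
function findBrandMentions(transcript, brandKeywords) {
  // \b treats only ASCII letters, digits, and _ as word characters, so this suits Latin-script brand names
  const pattern = new RegExp(`\\b(?:${brandKeywords.map(escapeRegExp).join('|')})\\b`, 'i');
  return transcript.segments
    .filter(seg => pattern.test(seg.text))
    .map(seg => ({ start: seg.start, end: seg.end, text: seg.text }));
}
```

For example, `findBrandMentions((await getTranscript(reel.url)).transcript, ['YourBrand'])` lists each segment that names the brand, which you can turn into a link to that moment in the Reel.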
Create Video Summaries
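Use transcripts to generate quick summaries. The simplest approach keeps the opening sentences:
```javascript
async function summarizeVideo(url, maxSentences = 3) {
  const { transcript } = await getTranscript(url);
  // Naive approach: keep the first few sentences with their punctuation intact
  // (a trailing fragment without punctuation is kept as its own sentence)
  const sentences = transcript.text.match(/[^.!?]+(?:[.!?]+|$)/g) || [transcript.text];
  return sentences
    .slice(0, maxSentences)
    .map(s => s.trim())
    .join(' ');
}
```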
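Opening sentences are often greetings ("Hey everyone!") rather than the point of the video. A common alternative is frequency-based extractive summarization. It scores each sentence by how often its words appear across the whole transcript, keeps the highest-scoring sentences, and restores their original order. In the sketch below, `summarizeByFrequency` is a hypothetical name and the small stopword list is an illustrative assumption.

```javascript
// Small illustrative stopword list; use a fuller list in practice
const SUMMARY_STOPWORDS = new Set(['a', 'an', 'and', 'the', 'to', 'of', 'in', 'is', 'it', 'you', 'i', 'im', 'this', 'that', 'for', 'on', 'so']);

function tokenize(sentence) {
  return (sentence.toLowerCase().match(/[\p{L}\p{N}']+/gu) || [])
    .map(w => w.replace(/'/g, ''))
    .filter(w => !SUMMARY_STOPWORDS.has(w));
}

/**
 * Hypothetical extractive summarizer: ranks sentences by the average transcript-wide
 * frequency of their words and returns the top ones in their original order.
 */
function summarizeByFrequency(text, maxSentences = 2) {
  const sentences = (text.match(/[^.!?]+(?:[.!?]+|$)/g) || []).map(s => s.trim()).filter(Boolean);

  // Word frequencies across the whole transcript
  const freq = new Map();
  for (const word of sentences.flatMap(tokenize)) {
    freq.set(word, (freq.get(word) || 0) + 1);
  }

  // Average (not total) frequency so long sentences are not automatically favored
  const scored = sentences.map((sentence, index) => {
    const words = tokenize(sentence);
    const total = words.reduce((sum, w) => sum + freq.get(w), 0);
    return { sentence, index, score: words.length ? total / words.length : 0 };
  });

  return scored
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, maxSentences)
    .sort((a, b) => a.index - b.index)
    .map(s => s.sentence)
    .join(' ');
}
```

For longer videos, an LLM or a dedicated summarization library will usually do better. This approach needs no external services, though.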
Generate Captions
Create accessibility captions in SRT format from the timestamped segments:
```javascript
async function generateCaptions(url) {
  const { transcript } = await getTranscript(url);
  // Each SRT cue: sequence number, time range, text, then a blank line
  return transcript.segments
    .map((seg, i) => [
      i + 1,
      `${formatTime(seg.start)} --> ${formatTime(seg.end)}`,
      seg.text,
      ''
    ].join('\n'))
    .join('\n');
}

// Converts seconds to the SRT timestamp format HH:MM:SS,mmm
function formatTime(seconds) {
  // Round to whole milliseconds first to avoid floating-point drift (e.g. 1.001 % 1)
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3_600_000);
  const m = Math.floor((totalMs % 3_600_000) / 60_000);
  const s = Math.floor((totalMs % 60_000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}
```
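If you are attaching captions to a web video with the HTML `<track>` element, browsers expect WebVTT rather than SRT. The two formats are close. A WebVTT file starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds. The sketch below builds a WebVTT string from the `segments` array; `toWebVTT` and `formatVttTime` are hypothetical helper names.

```javascript
// Converts seconds to a WebVTT timestamp (HH:MM:SS.mmm)
function formatVttTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3_600_000);
  const m = Math.floor((totalMs % 3_600_000) / 60_000);
  const s = Math.floor((totalMs % 60_000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Builds a WebVTT document from transcript segments ({ start, end, text })
function toWebVTT(segments) {
  const cues = segments.map(seg =>
    // Cue text must not contain the "-->" timing arrow
    `${formatVttTime(seg.start)} --> ${formatVttTime(seg.end)}\n${seg.text.replace(/-->/g, '->')}`
  );
  // Header first, then cues separated by blank lines
  return ['WEBVTT', ...cues].join('\n\n') + '\n';
}
```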
Sentiment Analysis
Analyze the tone of spoken content:
```javascript
async function analyzeVideoSentiment(url) {
  const { transcript } = await getTranscript(url);

  // Simple keyword lexicon (use a proper NLP library in production)
  const positiveWords = new Set(['amazing', 'love', 'great', 'awesome', 'perfect']);
  const negativeWords = new Set(['bad', 'hate', 'terrible', 'worst', 'awful']);

  // Extract words without surrounding punctuation so "amazing!" still matches "amazing"
  const words = transcript.text.toLowerCase().match(/[\p{L}']+/gu) || [];
  const positive = words.filter(w => positiveWords.has(w)).length;
  const negative = words.filter(w => negativeWords.has(w)).length;

  let sentiment = 'neutral';
  if (positive > negative) sentiment = 'positive';
  else if (negative > positive) sentiment = 'negative';

  return { sentiment, positiveCount: positive, negativeCount: negative };
}
```
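A single label for the whole video hides where the tone changes. Keyword counting also misreads phrases like "not great". The sketch below scores each entry in `transcript.segments` separately and flips a keyword's polarity when a negation word directly precedes it. The lexicons, the negation list, and the `sentimentTimeline` and `scoreText` names are illustrative assumptions, not part of the API.

```javascript
const POSITIVE = new Set(['amazing', 'love', 'great', 'awesome', 'perfect']);
const NEGATIVE = new Set(['bad', 'hate', 'terrible', 'worst', 'awful']);
const NEGATIONS = new Set(['not', 'never', 'no', "isn't", "don't", "wasn't"]);

// Scores one piece of text: +1 per positive word, -1 per negative word,
// with polarity flipped when the immediately preceding word is a negation
function scoreText(text) {
  // Normalize curly apostrophes so "isn’t" matches "isn't"
  const words = text.toLowerCase().replace(/\u2019/g, "'").match(/[\p{L}']+/gu) || [];
  let score = 0;
  words.forEach((word, i) => {
    const polarity = POSITIVE.has(word) ? 1 : NEGATIVE.has(word) ? -1 : 0;
    if (polarity === 0) return;
    const negated = i > 0 && NEGATIONS.has(words[i - 1]);
    score += negated ? -polarity : polarity;
  });
  return score;
}

// Hypothetical helper: returns one { start, end, score } entry per segment
function sentimentTimeline(transcript) {
  return transcript.segments.map(seg => ({
    start: seg.start,
    end: seg.end,
    score: scoreText(seg.text)
  }));
}
```

Plotting `score` against `start` shows where a video turns positive or negative. That helps when a review praises one product and criticizes another.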
Related Endpoints
- Instagram Reels API - Get profile reels
- Instagram Post Info - Post metadata
- TikTok Transcript - TikTok video transcripts
- YouTube Transcript - YouTube transcripts
Frequently Asked Questions
How accurate are the transcripts?
Transcripts typically achieve 90-95% accuracy for clear speech in English. Accuracy may vary with background music, multiple speakers, or non-English content.
What languages are supported?
The API auto-detects and transcribes content in 50+ languages. The language field in the response indicates the detected language.
How long does transcription take?
Most videos are transcribed within 10-30 seconds. Longer videos may take proportionally more time.
Can I get transcripts for any video length?
Yes, the API handles videos of any length. Very long videos (10+ minutes) are processed in segments.
Are timestamps included?
Yes, the segments array includes start and end timestamps for each sentence/phrase, enabling caption generation.
What if the video has no speech?
Videos without detectable speech return an empty transcript with a note indicating no speech was detected.
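This page does not show which field carries the "no speech detected" note, so a defensive check can rely only on the transcript being empty. The sketch below assumes an empty or whitespace-only `transcript.text` means no speech. `hasSpeech` and `withSpeechOnly` are hypothetical helpers for filtering a batch before running text analysis.

```javascript
// Returns false when the response contains no usable spoken text.
// Assumes "no speech" appears as an empty or whitespace-only transcript.text.
function hasSpeech(data) {
  const t = data && data.transcript;
  return Boolean(t) && typeof t.text === 'string' && t.text.trim().length > 0;
}

// Keeps only responses with detectable speech, e.g. before word-frequency analysis
function withSpeechOnly(responses) {
  return responses.filter(hasSpeech);
}
```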
Get Started
Create your account and start extracting Instagram video transcripts.
Documentation: /docs/api-reference/instagram/transcript