What Fans Are Really Saying: World Cup Sentiment Analysis Across Platforms

June 27, 2026
11 min read
By SociaVault Team
Tags: world cup, social media data, sentiment analysis, social listening, fan engagement

TL;DR: Mention volume tells you that people are talking. Sentiment tells you whether that is good news. This guide shows you how to pull World Cup conversation from X, TikTok, YouTube, and Reddit, run a transparent sentiment score on it, and compare how fans feel across platforms, with working code in JavaScript and Python and an honest accounting of where simple sentiment scoring falls down.

A brand manager once watched their mention count triple during a match and popped the champagne, only to read the actual posts an hour later and realize the entire spike was people roasting their ad. The volume chart looked like a triumph. The conversation was a roast. That gap, between how much people are talking and how they actually feel, is the whole reason sentiment analysis exists.

During a World Cup, that gap matters more than usual. Emotions run hot, reactions are instant, and the difference between a beloved moment and a hated one can be a single refereeing decision. If you are tracking the tournament for a brand, a broadcaster, or a federation, raw volume will mislead you constantly. You need to know the tone, and you need to know how it differs by platform, because the crowd on Reddit is not the crowd on TikTok.

This post walks through a practical, honest sentiment pipeline. We will pull the conversation, score it, compare platforms, and, importantly, be clear about what a simple scoring approach can and cannot tell you.

Why Sentiment, and Why Per Platform

Two ideas drive everything here.

First, sentiment beats volume for decision-making. A spike in mentions is ambiguous until you know the tone. Positive spikes are opportunities to amplify. Negative spikes are fires to manage. You cannot tell which is which from a volume chart alone.

Second, platforms have personalities. The same World Cup moment produces different reactions depending on where you look:

  • X is fast, reactive, and often the most negative. It is where the hot takes live.
  • TikTok skews younger and more celebratory, heavy on humor and highlight culture.
  • YouTube comments tend to be longer and more considered, good for deeper reactions.
  • Reddit is analytical and tribal, with strong community norms per subreddit.

If you average all of them into one number you lose the most useful signal. A moment that is beloved on TikTok and hated on X is telling you something specific. Keep them separate.
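
A tiny worked example makes the point. The per-post scores below are made up for illustration (not real data) and describe a moment that splits two platforms:

```python
from statistics import mean

# Illustrative per-post scores on a -1..1 scale (made up for this example)
scores = {
    "tiktok": [0.8, 0.6, 0.9, 0.7],
    "x": [-0.7, -0.9, -0.6, -0.8],
}

# Per-platform averages keep the disagreement visible
per_platform = {name: round(mean(vals), 3) for name, vals in scores.items()}

# Pooling every post into one number cancels the two reactions out
blended = round(mean(v for vals in scores.values() for v in vals), 3)

print(per_platform)  # {'tiktok': 0.75, 'x': -0.75}
print(blended)       # 0.0
```

The blended figure reads as perfectly neutral even though neither audience is neutral.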

The Endpoints We Will Use

To gather the conversation we pull from search and comment endpoints:

  • /v1/scrape/twitter/search for X posts on a topic
  • /v1/scrape/tiktok/search-keyword for TikTok videos on a topic
  • /v1/scrape/youtube/video/comments for comments on a specific match highlight or reaction video
  • /v1/scrape/reddit/search for Reddit discussion
  • /v1/scrape/instagram/comments when you want reactions on a specific Instagram post

Each call is roughly one credit. Full parameters are in the docs.
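
Because each call costs roughly one credit, it is worth budgeting before a live match. Here is a rough sketch, assuming one search call per platform per topic per poll and the roughly-one-credit figure above; `estimate_credits` is a hypothetical helper, so check the docs for the actual cost of each endpoint:

```python
def estimate_credits(platforms, topics, match_minutes, poll_every_minutes, credits_per_call=1):
    """Rough credit estimate for polling every platform for every topic during a match.

    Assumes one search call per platform per topic per poll, plus an initial poll
    at kickoff. credits_per_call=1 mirrors the "roughly one credit" rule of thumb.
    """
    polls = match_minutes // poll_every_minutes + 1
    return platforms * topics * polls * credits_per_call


# X, TikTok, and Reddit for one topic, over ~120 minutes (match plus halftime),
# polled every 15 minutes: 9 polls x 3 calls
print(estimate_credits(platforms=3, topics=1, match_minutes=120, poll_every_minutes=15))  # 27
```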

Step 1: Gather the Conversation

Start by pulling text from each platform for a topic, say a specific match or moment. We will normalize everything down to a list of text strings, because the sentiment step does not care which platform a string came from until we tag it.

const BASE = "https://api.sociavault.com";
const HEADERS = { "X-API-Key": "YOUR_API_KEY" };

async function get(path, params) {
  const res = await fetch(`${BASE}${path}?${new URLSearchParams(params)}`, {
    headers: HEADERS,
  });
  if (!res.ok) return []; // treat HTTP errors as "no results"
  const { data } = await res.json();
  // Different endpoints name their result array differently
  return data?.results ?? data?.posts ?? data?.comments ?? [];
}

async function gatherConversation(topic) {
  const [xPosts, tiktoks, reddit] = await Promise.all([
    get("/v1/scrape/twitter/search", { query: topic, limit: "100" }),
    get("/v1/scrape/tiktok/search-keyword", { query: topic, limit: "100" }),
    get("/v1/scrape/reddit/search", { query: topic, limit: "100" }),
  ]);

  return {
    x: xPosts.map((p) => p.text ?? p.content ?? ""),
    tiktok: tiktoks.map((p) => p.description ?? p.title ?? ""),
    reddit: reddit.map((p) => `${p.title ?? ""} ${p.selftext ?? ""}`),
  };
}

// Top-level await requires an ES module (e.g. a .mjs file on Node 18+, which has fetch built in)
const convo = await gatherConversation("world cup final");
console.log(
  `X: ${convo.x.length}, TikTok: ${convo.tiktok.length}, Reddit: ${convo.reddit.length}`,
);

The same step in Python:

import requests

BASE = "https://api.sociavault.com"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}

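def get(path, params):
    res = requests.get(f"{BASE}{path}", headers=HEADERS, params=params, timeout=30)
    if not res.ok:
        return []  # treat HTTP errors as "no results"
    data = res.json().get("data") or {}
    # Different endpoints name their result array differently
    return data.get("results") or data.get("posts") or data.get("comments") or []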

def gather_conversation(topic):
    x_posts = get("/v1/scrape/twitter/search", {"query": topic, "limit": "100"})
    tiktoks = get("/v1/scrape/tiktok/search-keyword", {"query": topic, "limit": "100"})
    reddit = get("/v1/scrape/reddit/search", {"query": topic, "limit": "100"})

    return {
        "x": [p.get("text") or p.get("content") or "" for p in x_posts],
        "tiktok": [p.get("description") or p.get("title") or "" for p in tiktoks],
        # "or ''" also covers fields that are present but null
        "reddit": [f"{p.get('title') or ''} {p.get('selftext') or ''}" for p in reddit],
    }

convo = gather_conversation("world cup final")
print(f"X: {len(convo['x'])}, TikTok: {len(convo['tiktok'])}, Reddit: {len(convo['reddit'])}")
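
Raw search results often include empty strings (posts with no caption) and near-identical copies (reposts, copypasta). Neither carries useful signal, and empty strings inflate the neutral count in a scorer that treats them as zero. The following is a minimal cleanup pass you could run on each platform's list before scoring. `clean_texts` is a hypothetical helper, and treating posts as duplicates when their lowercased, whitespace-collapsed text matches is an assumption about what "duplicate" should mean:

```python
def clean_texts(texts):
    """Drop empty strings and duplicates (case- and whitespace-insensitive), keeping order."""
    seen = set()
    cleaned = []
    for text in texts:
        normalized = " ".join(text.split()).lower()  # collapse runs of whitespace
        if not normalized or normalized in seen:
            continue
        seen.add(normalized)
        cleaned.append(text.strip())
    return cleaned


sample = ["What a goal!", "  ", "what a   GOAL!", "Robbed by VAR again", ""]
print(clean_texts(sample))  # ['What a goal!', 'Robbed by VAR again']
```

Applied per platform, for example `convo = {platform: clean_texts(texts) for platform, texts in convo.items()}`, it also drops Reddit results whose title and selftext were both empty.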

For reactions to a specific highlight video, pull the comments directly:

comments = get("/v1/scrape/youtube/video/comments",
               {"url": "https://www.youtube.com/watch?v=EXAMPLE", "limit": "200"})
texts = [c.get("text", "") for c in comments]
print(f"Pulled {len(texts)} YouTube comments")

Step 2: A Simple, Transparent Sentiment Score

There are two honest ways to score sentiment without training your own model: a lexicon approach you build yourself, or a ready-made library. We will show both, starting with a lexicon so you can see exactly how the sausage is made.

A lexicon scorer counts positive and negative words and returns a normalized score. It is crude, but it is transparent and fast, and you can tune the word lists for football slang.

const POSITIVE = new Set([
  "amazing",
  "incredible",
  "love",
  "loved",
  "goal",
  "win",
  "winning",
  "brilliant",
  "beautiful",
  "class",
  "legend",
  "hero",
  "magic",
  "stunning",
  "deserved",
  "joy",
  "best",
  "wonderful",
  "unbelievable",
  "clinical",
]);

const NEGATIVE = new Set([
  "terrible",
  "awful",
  "hate",
  "robbed",
  "disgrace",
  "embarrassing",
  "boring",
  "cheat",
  "dive",
  "var",
  "penalty",
  "disallowed",
  "worst",
  "rigged",
  "shameful",
  "choke",
  "bottled",
  "overrated",
  "disappointing",
]);

function scoreText(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  let pos = 0;
  let neg = 0;
  for (const w of words) {
    if (POSITIVE.has(w)) pos++;
    if (NEGATIVE.has(w)) neg++;
  }
  const total = pos + neg;
  if (total === 0) return 0; // neutral
  return (pos - neg) / total; // ranges from -1 to 1
}

function scoreCorpus(texts) {
  const scored = texts.map(scoreText);
  const nonNeutral = scored.filter((s) => s !== 0);
  const avg = nonNeutral.length
    ? nonNeutral.reduce((a, b) => a + b, 0) / nonNeutral.length
    : 0;
  return {
    average: Number(avg.toFixed(3)),
    positive: scored.filter((s) => s > 0).length,
    negative: scored.filter((s) => s < 0).length,
    neutral: scored.filter((s) => s === 0).length,
  };
}

console.log("X sentiment:", scoreCorpus(convo.x));

A quick note on the word lists above: terms like "var" and "penalty" lean negative in football conversation because they usually show up when fans are furious about a decision, but that is a judgment call you should revisit for your own context. That is exactly the kind of tuning a homegrown lexicon makes easy.
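
If you would rather stay in Python but keep the transparent lexicon approach, the scorer ports over directly. This sketch uses shortened word lists for brevity, so swap in the full lists above for real use. As in the JavaScript version, the score is (positive hits - negative hits) / (positive hits + negative hits), and a post with no lexicon words scores 0:

```python
import re

# Shortened versions of the word lists above; extend them for real use
POSITIVE = {"amazing", "brilliant", "class", "clinical", "goal", "legend", "love", "win"}
NEGATIVE = {"awful", "bottled", "disgrace", "penalty", "rigged", "robbed", "var", "worst"}

WORD_RE = re.compile(r"[a-z']+")  # same tokenizer as the JavaScript scoreText


def score_text(text):
    """Return (pos - neg) / (pos + neg) in [-1, 1], or 0.0 when no lexicon word appears."""
    words = WORD_RE.findall(text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


print(score_text("Brilliant goal, absolute legend"))    # 1.0 (3 positive hits)
print(score_text("Brilliant goal, but VAR robbed us"))  # 0.0 (2 positive, 2 negative)
print(score_text("Kickoff in ten minutes"))             # 0.0 (no lexicon hits)
```

The last two lines show a built-in limitation: a genuinely mixed post nets out to exactly the same score as a post with no opinion at all.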

Here is the Python version, this time using a maintained library instead of a hand-rolled lexicon so you can compare approaches. VADER is a good fit because it is built for short social text and handles things like emphasis and negation better than a raw word count:

# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Optional: nudge the lexicon toward football slang
analyzer.lexicon.update({
    "robbed": -2.5,
    "rigged": -3.0,
    "clinical": 2.0,
    "bottled": -2.0,
    "class": 2.0,
})

def score_corpus(texts):
    pos = neg = neu = 0
    compound_total = 0.0
    counted = 0
    for t in texts:
        if not t.strip():
            continue
        score = analyzer.polarity_scores(t)["compound"]
        compound_total += score
        counted += 1
        # VADER's documented cutoffs: >= 0.05 positive, <= -0.05 negative
        if score >= 0.05:
            pos += 1
        elif score <= -0.05:
            neg += 1
        else:
            neu += 1
    avg = round(compound_total / counted, 3) if counted else 0
    return {"average": avg, "positive": pos, "negative": neg, "neutral": neu}

print("X sentiment:", score_corpus(convo["x"]))

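Both approaches return the same shape: an average score and a breakdown of positive, negative, and neutral counts. The numbers are not interchangeable, though. The JavaScript average ignores posts that scored exactly zero (and counts empty strings as neutral), while the VADER average includes every non-empty post, and the two scales are calibrated differently. Compare platforms within one method rather than across methods. The JavaScript lexicon is fully transparent and tunable; the Python library is more robust out of the box. Pick based on whether you value control or out-of-the-box accuracy more.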
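
The averaging difference is easy to underestimate. Here is the same set of illustrative scores averaged both ways. The two function names are hypothetical, chosen to describe what each original `scoreCorpus`/`score_corpus` does:

```python
def average_excluding_neutral(scores):
    """Mean of non-zero scores only, like the JavaScript scoreCorpus."""
    nonzero = [s for s in scores if s != 0]
    return round(sum(nonzero) / len(nonzero), 3) if nonzero else 0


def average_including_neutral(scores):
    """Mean of every score, zeros included, like the VADER score_corpus."""
    return round(sum(scores) / len(scores), 3) if scores else 0


# Illustrative scores: two opinionated posts and three neutral ones
scores = [0.6, -0.2, 0.0, 0.0, 0.0]
print(average_excluding_neutral(scores))  # 0.2
print(average_including_neutral(scores))  # 0.08
```

The posts are identical, but the headline numbers differ by a factor of 2.5.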

Step 3: Compare Across Platforms

Now run the scorer on each platform's corpus separately and put them side by side. This is the comparison that actually drives decisions.

const byPlatform = {
  x: scoreCorpus(convo.x),
  tiktok: scoreCorpus(convo.tiktok),
  reddit: scoreCorpus(convo.reddit),
};

for (const [platform, s] of Object.entries(byPlatform)) {
  const total = s.positive + s.negative + s.neutral;
  const posPct = total ? ((s.positive / total) * 100).toFixed(0) : 0;
  console.log(
    `${platform}: avg ${s.average}, ${posPct}% positive (n=${total})`,
  );
}

The same comparison in Python:

by_platform = {
    "x": score_corpus(convo["x"]),
    "tiktok": score_corpus(convo["tiktok"]),
    "reddit": score_corpus(convo["reddit"]),
}

for platform, s in by_platform.items():
    total = s["positive"] + s["negative"] + s["neutral"]
    pos_pct = round(s["positive"] / total * 100) if total else 0
    print(f"{platform}: avg {s['average']}, {pos_pct}% positive (n={total})")

A typical result during a contentious match might show TikTok running positive (highlights and celebrations), X running negative (refereeing rage), and Reddit somewhere in the middle (long analytical threads that weigh both). That spread is the insight. A single blended number would have hidden all of it.
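
With around 100 posts per platform, part of any gap between platforms is sampling noise. A standard way to check whether a gap exceeds that noise is a bootstrap: resample each platform's per-post scores with replacement many times and look at the spread of the difference in averages. This is a generic statistics technique, not something the API provides. `bootstrap_diff` is a hypothetical helper, and the 2,000 resamples and 95 percent interval are conventional choices rather than requirements:

```python
import random


def bootstrap_diff(scores_a, scores_b, n_resamples=2000, seed=0):
    """Approximate 95% bootstrap interval for mean(scores_a) - mean(scores_b)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    diffs = []
    for _ in range(n_resamples):
        sample_a = rng.choices(scores_a, k=len(scores_a))  # resample with replacement
        sample_b = rng.choices(scores_b, k=len(scores_b))
        diffs.append(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper


# Illustrative per-post scores (made up): TikTok mostly positive, X mostly negative
tiktok = [0.5, 0.8, 0.3, 0.0, 0.6, 0.9, 0.2, 0.7]
x = [-0.6, -0.2, 0.1, -0.8, -0.4, 0.0, -0.5, -0.3]
low, high = bootstrap_diff(tiktok, x)
print(f"TikTok minus X: between {low:.2f} and {high:.2f}")
```

Roughly speaking, if the interval excludes zero, the gap is probably more than noise. If it straddles zero, it is safer to report the platforms as similar.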

Step 4: Track Sentiment Over the Match

Sentiment is most useful as a time series. Pull the conversation in windows (say, every fifteen minutes during a match) and plot the average per platform. The dips and spikes tend to line up closely with match events: a goal lifts TikTok, a disputed call tanks X. Store each window's score with a timestamp and you can replay the emotional arc of the whole match afterward.

import time
from datetime import datetime, timezone

timeline = []

def snapshot(topic):
    convo = gather_conversation(topic)
    row = {"ts": datetime.now(timezone.utc).isoformat()}
    for platform, texts in convo.items():
        row[platform] = score_corpus(texts)["average"]
    timeline.append(row)
    return row

# During a live match, call snapshot on a loop
# while match_is_live:
#     print(snapshot("world cup final"))
#     time.sleep(900)  # every 15 minutes
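
One caveat with this loop: search endpoints return whatever matches the query at call time, so consecutive snapshots can overlap, and the same viral post may be scored in several windows. The hypothetical helper below keeps only texts not seen in earlier windows. It matches on normalized text because this sketch does not assume any particular post ID field in the API response. If the response includes a stable ID, key on that instead:

```python
class WindowDeduper:
    """Remembers texts from earlier snapshots so each window only scores new posts."""

    def __init__(self):
        self.seen = {}  # platform -> set of normalized texts already scored

    def new_only(self, platform, texts):
        seen = self.seen.setdefault(platform, set())
        fresh = []
        for text in texts:
            key = " ".join(text.split()).lower()
            if key and key not in seen:
                seen.add(key)
                fresh.append(text)
        return fresh


dedupe = WindowDeduper()
print(dedupe.new_only("x", ["VAR again?!", "What a save"]))    # ['VAR again?!', 'What a save']
print(dedupe.new_only("x", ["What a save", "Penalty given"]))  # ['Penalty given']
```

Inside `snapshot`, you would score `dedupe.new_only(platform, texts)` instead of `texts`. Note that if a window comes back empty, `score_corpus` returns an average of 0, so it is worth treating empty windows as missing rather than neutral.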

Line that timeline up against the match events and you have a clean story about exactly when fan emotion turned, which is gold for a post-match report or a broadcaster recap.
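
To find where emotion turned, you can look at the change between consecutive windows, flag the big moves, and then check them against the match timeline. This sketch works on rows shaped like the ones `snapshot` appends. `find_swings` is a hypothetical helper, and the 0.3 threshold is an arbitrary starting point you would tune:

```python
def find_swings(timeline, platform, threshold=0.3):
    """Return (timestamp, change) pairs where a platform's average moved by at least threshold."""
    swings = []
    for prev, curr in zip(timeline, timeline[1:]):
        change = curr[platform] - prev[platform]
        if abs(change) >= threshold:
            swings.append((curr["ts"], round(change, 3)))
    return swings


# Illustrative rows (made-up values) in the same shape snapshot() produces
timeline = [
    {"ts": "2026-06-27T19:15:00+00:00", "x": 0.10, "tiktok": 0.40},
    {"ts": "2026-06-27T19:30:00+00:00", "x": -0.35, "tiktok": 0.45},
    {"ts": "2026-06-27T19:45:00+00:00", "x": -0.30, "tiktok": 0.80},
]
print(find_swings(timeline, "x"))       # [('2026-06-27T19:30:00+00:00', -0.45)]
print(find_swings(timeline, "tiktok"))  # [('2026-06-27T19:45:00+00:00', 0.35)]
```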

Being Honest About Accuracy

This is the part most sentiment tutorials skip, so let's be straight about it. Simple sentiment scoring is useful but genuinely limited:

  • Sarcasm wrecks it. "Oh brilliant, another penalty, love that for us" reads as positive to a lexicon and to most lightweight models. Football fans are relentlessly sarcastic, so expect real error here.
  • Context flips word meaning. "Sick goal" is a rave. "I'm sick of this" is a complaint. The word "sick" cannot tell them apart on its own.
  • Slang and emoji carry huge load. A single fire or crying-laughing emoji can outweigh the text. A basic word scorer ignores them entirely unless you add them.
  • Language and code-switching. A global tournament means dozens of languages, often mixed in one post. An English lexicon silently scores those as neutral, which biases your sample.
  • Neutral is not the same as no opinion. A zero score often just means your lexicon did not recognize the words, not that the person felt nothing.
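
Two of these failure modes are cheap to measure with the lexicon approach. Lexicon coverage, the share of posts containing at least one word the scorer recognizes, tells you how much of a corpus is being scored at all. A low figure on one platform often points to other languages or emoji-heavy posts. Running the sarcastic example through the scorer shows the misread directly. The sketch below uses shortened word lists and the same tokenizer as the JavaScript `scoreText`. `lexicon_hits` and `coverage` are hypothetical helper names:

```python
import re

POSITIVE = {"brilliant", "love", "goal", "legend", "class"}
NEGATIVE = {"penalty", "var", "robbed", "awful", "worst"}
WORD_RE = re.compile(r"[a-z']+")


def lexicon_hits(text):
    """Return (positive_hits, negative_hits) for one post."""
    words = WORD_RE.findall(text.lower())
    return sum(w in POSITIVE for w in words), sum(w in NEGATIVE for w in words)


def coverage(texts):
    """Share of posts with at least one lexicon hit; the rest silently score as neutral."""
    if not texts:
        return 0.0
    covered = sum(1 for t in texts if sum(lexicon_hits(t)) > 0)
    return covered / len(texts)


# The sarcastic example: two "positive" words outvote one "negative" one
pos, neg = lexicon_hits("Oh brilliant, another penalty, love that for us")
print(pos, neg, (pos - neg) / (pos + neg))  # 2 1 0.3333333333333333

# Spanish and emoji-only posts get no hits at all
posts = ["What a goal", "Qué golazo", "🔥🔥🔥", "VAR robbed us"]
print(coverage(posts))  # 0.5
```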

So what do you do about it? Be realistic about the claims you make. This approach is reliable for relative comparisons and trend direction (is X more negative than TikTok? did sentiment drop after the 60th minute?) and unreliable for precise absolute claims, such as saying exactly 73.4 percent of fans were happy. Report directions and comparisons, not false precision. Always spot-read a sample of the actual posts to sanity-check what the score is telling you. If you need higher accuracy, the next step up is a purpose-built multilingual sentiment model, which is a bigger investment and out of scope here. We also walk through a fuller dashboard build in our sentiment analysis dashboard guide.
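
For the spot-read, a purely random sample can be dominated by the largest bucket, which is often neutral. The sketch below draws a few posts from each score bucket, so you check the positives and negatives the average is built on. `spot_read_sample` is a hypothetical helper. The ±0.05 cutoff mirrors the VADER thresholds used earlier, and the per-bucket size is arbitrary:

```python
import random


def spot_read_sample(texts, scores, per_bucket=3, cutoff=0.05, seed=42):
    """Return up to per_bucket (score, text) pairs from each sentiment bucket."""
    buckets = {"positive": [], "negative": [], "neutral": []}
    for text, score in zip(texts, scores):
        if score >= cutoff:
            buckets["positive"].append((score, text))
        elif score <= -cutoff:
            buckets["negative"].append((score, text))
        else:
            buckets["neutral"].append((score, text))
    rng = random.Random(seed)  # fixed seed so colleagues review the same sample
    return {
        name: rng.sample(items, min(per_bucket, len(items)))
        for name, items in buckets.items()
    }


# Illustrative posts and scores (made up)
texts = ["Love this team", "Robbed again", "Kickoff soon", "What a goal", "Awful defending"]
scores = [1.0, -1.0, 0.0, 1.0, -1.0]
sample = spot_read_sample(texts, scores, per_bucket=1)
for bucket, items in sample.items():
    for score, text in items:
        print(f"[{bucket} {score:+.2f}] {text}")
```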

A Practical Reporting Routine

For a tournament, here is a routine that holds up:

  1. Define your topics: the tournament overall, each match you care about, and any brand or player you are tracking.
  2. Gather the conversation per platform, keeping them separate.
  3. Score each corpus with a transparent method you can defend.
  4. Report the per-platform comparison, not a blended average.
  5. Track a time series during live matches to capture emotional swings.
  6. Spot-read real posts before you present any number.
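
Steps 4 and 6 can be partly mechanized. Here is a hypothetical report helper that takes per-platform summaries in the shape `score_corpus` returns, sorts platforms from most to least positive, and flags any platform whose sample is too thin to lean on. The minimum of 30 posts is an assumption, not a rule:

```python
def platform_report(by_platform, min_n=30):
    """Format one line per platform, most positive first, flagging thin samples."""
    lines = []
    ranked = sorted(by_platform.items(), key=lambda item: item[1]["average"], reverse=True)
    for platform, s in ranked:
        n = s["positive"] + s["negative"] + s["neutral"]
        pos_pct = round(s["positive"] / n * 100) if n else 0
        neg_pct = round(s["negative"] / n * 100) if n else 0
        flag = "  (small sample, spot-read before reporting)" if n < min_n else ""
        lines.append(
            f"{platform}: avg {s['average']:+.2f}, {pos_pct}% pos / {neg_pct}% neg (n={n}){flag}"
        )
    return lines


# Illustrative summaries (made up) in the shape score_corpus returns
example = {
    "x": {"average": -0.21, "positive": 25, "negative": 50, "neutral": 25},
    "tiktok": {"average": 0.34, "positive": 60, "negative": 15, "neutral": 25},
    "reddit": {"average": 0.02, "positive": 8, "negative": 7, "neutral": 5},
}
for line in platform_report(example):
    print(line)
```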

That gives you something far more honest and more useful than a single sentiment gauge: a real read on how different fan communities feel, and how that changes as the action unfolds.

Where to Go Next

This connects directly to the rest of the series:

Ready to measure how fans actually feel? Start free with SociaVault and use your 50 credits to run sentiment across every platform before the next match. Everything you need is in the docs.

Volume tells you the crowd is loud. Sentiment tells you whether they are cheering or jeering. During a World Cup, that difference is the entire story.
