LLM-Based Search Volume Prediction


An analysis comparing Google Gemini's keyword volume predictions against actual Google Search Console data reveals weak-to-moderate correlation and limited accuracy.


Can your favorite large language model accurately estimate search volumes? The short answer is no, but it does have a general idea.

We put Google’s Gemini to the test, comparing its monthly search volume predictions to actual data from Google Search Console. We ran top-performing queries through Gemini and matched them against real search impression data.

What we found is that the direct correlation is weak to moderate. The model is much better at ranking keywords from high to low than at predicting exact numbers. When we grouped the search volumes into five categories, from very low to very high, Gemini got the exact category right only about thirty-five percent of the time. However, it was in the right ballpark (either spot-on or just one category off) nearly seventy percent of the time.

This discrepancy happens because Google Search Console reflects your site's actual visibility and ranking, while the AI relies on broad, web-scale patterns.

The takeaway is simple. Use artificial intelligence for direction, not precision. It is great for spotting big versus small topics and sorting opportunities into tiers, but it is no replacement for your real analytics.

Can your favourite LLM accurately estimate query search volumes? No.
Does it have a general idea? Yes.

We put Google’s Gemini to the test by comparing its keyword volume predictions to actual search data from Google Search Console (GSC). Here’s what we learned and how we did it.

How We Collected and Compared the Data

Data Sources
- Predicted volumes: For each search query, we asked Google Gemini for a monthly search volume estimate and keyword difficulty, specifying the country for localization.
- Actual volumes: We extracted the real number of impressions for the same queries from our verified GSC property, aggregating over a full month.

Automation Pipeline
- Queries were selected from GSC data using Python, focusing on top-performing keywords and filtering out outliers or brand terms as needed.
- For each query, the Gemini API was called to generate search volume and difficulty estimates.
- Results were automatically stored in a database, along with actual impressions, clicks, and positions from GSC.
- The analysis and all visualizations were produced using custom scripts and dashboards.
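The pipeline steps above can be sketched roughly as follows. This is a minimal illustration, not the scripts used in the study: the `GscRow` fields, the `min_impressions` threshold, the substring-based brand filter, and the prompt wording are all assumptions, and `estimate` is a hypothetical callable standing in for the real Gemini API call.

```python
from dataclasses import dataclass


@dataclass
class GscRow:
    """One query row exported from GSC (field names are assumptions)."""
    query: str
    impressions: int
    clicks: int
    position: float


def select_queries(rows, brand_terms, top_n=100, min_impressions=10):
    """Keep the top-N non-brand queries by impressions."""
    brand = [term.lower() for term in brand_terms]
    kept = [
        row for row in rows
        if row.impressions >= min_impressions
        and not any(term in row.query.lower() for term in brand)
    ]
    kept.sort(key=lambda row: row.impressions, reverse=True)
    return kept[:top_n]


def build_prompt(query, country):
    """Hypothetical prompt; the wording used in the study is not published."""
    return (
        f"Estimate the monthly search volume and keyword difficulty "
        f"for the search query '{query}' in {country}."
    )


def collect(rows, estimate, country):
    """Pair each GSC row with a predicted volume.

    `estimate` is any callable that takes a prompt string and returns a
    number; in a real pipeline it would wrap the Gemini API call.
    """
    records = []
    for row in rows:
        records.append({
            "query": row.query,
            "predicted_volume": estimate(build_prompt(row.query, country)),
            "impressions": row.impressions,
            "clicks": row.clicks,
            "position": row.position,
        })
    return records
```

Passing the estimator in as a callable keeps the selection and storage logic testable without network access; the resulting records can then be written to whatever database the pipeline uses.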

What Did We Find?

1. Direct Correlation Is Weak-to-Moderate

- Pearson correlation (linear): ~0.41
- Spearman correlation (rank order): ~0.57
- Gemini's predictions track the rank order of queries (high vs. low) better than their actual magnitudes; the relationship isn't reliably linear.
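To make the difference between the two coefficients concrete, here is a small standard-library sketch. Pearson measures linear agreement on the raw values, while Spearman is Pearson applied to the ranks of those values, so it only rewards getting the order right. The function names and the sample numbers are illustrative assumptions, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: linear agreement between raw values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up example: the order is perfect, the magnitudes are not.
predicted = [100, 1000, 10000, 100000]
actual = [10, 20, 30, 4000]
print(pearson(predicted, actual), spearman(predicted, actual))
```

In this toy example the predictions are in exactly the right order, so the rank correlation is essentially perfect even though the raw numbers are far apart. That is the same pattern, in exaggerated form, as a Spearman score that sits above the Pearson score.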

2. Bucket Accuracy: More Forgiving, Still Limited

We grouped both Gemini and GSC volumes into 5 buckets: very low, low, medium, high, very high (using quantiles).
- Exact bucket match: Only 35% of predictions landed in the same bucket as reality.
- Exact or adjacent bucket: ~69% were at least “close” (the right bucket or one away).
- Accuracy varied by bucket: Middle buckets (medium/high) tended to be more accurate, while extremes were less so.
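The bucket comparison can be sketched as below. The source only says the buckets were built "using quantiles", so the rank-based split and the tie handling here are assumptions: each series is sorted independently, cut into five equal-sized groups, and ties are broken by input order.

```python
LABELS = ["very low", "low", "medium", "high", "very high"]


def quantile_buckets(values, n_buckets=5):
    """Assign each value a bucket index 0..n_buckets-1 by its sorted position.

    Assumption: equal-sized rank-based quantiles; ties fall into buckets
    according to their original order because Python's sort is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for position, index in enumerate(order):
        buckets[index] = position * n_buckets // len(values)
    return buckets


def bucket_accuracy(predicted, actual, n_buckets=5):
    """Return (exact match rate, exact-or-adjacent match rate)."""
    pred_b = quantile_buckets(predicted, n_buckets)
    act_b = quantile_buckets(actual, n_buckets)
    n = len(pred_b)
    exact = sum(p == a for p, a in zip(pred_b, act_b)) / n
    close = sum(abs(p - a) <= 1 for p, a in zip(pred_b, act_b)) / n
    return exact, close
```

Because each series is bucketed against its own distribution, this compares relative position rather than absolute volume: a query that Gemini places in its top fifth counts as a match if GSC impressions also place it in the top fifth. Bucket indices map to names through `LABELS`, so `LABELS[0]` is "very low".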

3. Visuals Make It Clear

- Scatterplots show a wide spread of points, with only a loose upward trend.
- Bucket heatmaps and per-bucket bar charts show the model is “in the ballpark” but often misses the exact bucket.
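The numbers behind a bucket heatmap and a per-bucket bar chart are a confusion matrix and its per-row hit rate. The sketch below is an illustration under assumptions: it takes bucket indices (0 for very low up to 4 for very high) as input, the function names are hypothetical, and the plotting itself is left to whichever charting library you prefer.

```python
def confusion_matrix(actual_buckets, predicted_buckets, n_buckets=5):
    """Count queries per (actual bucket, predicted bucket) pair.

    Rows are actual GSC buckets, columns are predicted buckets; this grid
    is what a bucket heatmap displays.
    """
    matrix = [[0] * n_buckets for _ in range(n_buckets)]
    for actual, predicted in zip(actual_buckets, predicted_buckets):
        matrix[actual][predicted] += 1
    return matrix


def per_bucket_accuracy(matrix):
    """Share of each actual bucket that was predicted exactly.

    Returns None for buckets with no queries, to avoid dividing by zero.
    """
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append(row[i] / total if total else None)
    return rates
```

Counts on the diagonal are exact matches and counts one step off the diagonal are the "adjacent" cases, so a model that is in the ballpark produces a heatmap with mass concentrated in a band around the diagonal.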

Why the Discrepancy?

- GSC impressions and keyword volumes measure different things: Impressions are only recorded when your site appears in the results, so they depend on your ranking, page coverage, and seasonality.
- AI predictions use web-scale patterns, not your site’s visibility.
- Noise in both sources: GSC can undercount, Gemini can overgeneralize, and search volumes themselves are inherently rough estimates.

Practical Takeaways

- AI keyword volumes are directionally useful: They help spot “big” vs. “small” topics, but don’t expect precision.
- Use buckets, not raw numbers: Focus on opportunity tiers (e.g., “high potential” keywords), not exact volume predictions.
- Always verify with your real data: AI tools are a shortcut for ideation, not a replacement for analytics.

Dan Petrovic · May 19, 18:21