Back to blog
google-search-consoledata-analysis

Export Google Search Console Data Without Losing Half of It

Isaac Gounton·

The 1,000-Row Trap

Open the Performance report in Google Search Console. Click the export button. Choose CSV. You get a file with up to 1,000 rows per dimension — and nothing in the UI tells you that you're missing the other 24,000.

For most sites that gets you the visible iceberg and none of the tail. The queries Google calls "anonymized" — typically 30–50% of impressions, often the most informative ones — never even appear in the export. The long tail that drives most real-world organic traffic gets silently truncated.

If you've ever exported GSC data and felt like the numbers didn't match the dashboard, this is why.

Three paths get you the actual dataset. Each has a different cost-vs-power tradeoff, so I'll cover all three and tell you which to pick based on what you're trying to do.

Path 1: Bulk Data Export to BigQuery

Google's Bulk Data Export feature ships daily query, page, country, device, and search appearance data to a BigQuery dataset you control. No row limits. No anonymization on individual queries (you still get the is_anonymized_query flag, but you also get the impressions tied to that flag).

When to pick this: you want the full long tail, you're comfortable in SQL or a data warehouse, and you're going to do this for more than one site.

What it costs:

  • BigQuery storage: GSC data is small. A medium site (50k clicks/month) generates maybe 100 MB per year. Storage is ~$0.02/GB/month.
  • BigQuery query cost: $5 per TB scanned. You'll spend pennies unless you're running unbounded queries on years of data.

The setup:

  1. In Google Cloud Console, create a project (or use an existing one) and enable the BigQuery API.
  2. In Search Console, go to Settings → Bulk data export.
  3. Enter the GCP project ID and dataset name. Set a region.
  4. Add search-console-data-export@system.gserviceaccount.com as a BigQuery Data Editor on the project.

Within 48 hours, daily tables start landing. The schema gives you searchdata_site_impression (aggregated) and searchdata_url_impression (per URL) — both partitioned by date. From there it's plain SQL:

SELECT
  query,
  SUM(impressions) AS impressions,
  SUM(clicks) AS clicks,
  SAFE_DIVIDE(SUM(clicks), SUM(impressions)) AS ctr,
  AVG(position) AS avg_position
FROM `your-project.your-dataset.searchdata_site_impression`
WHERE data_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
  AND is_anonymized_query = FALSE
GROUP BY query
ORDER BY impressions DESC
LIMIT 1000;

That query gives you the top 1,000 real queries from the last 30 days — same shape as the CSV export, but unbounded and free of the UI's 1,000-row ceiling.

The catch: Bulk export is forward-looking only. It doesn't backfill the 16 months of historical data GSC retains. If you turn it on today, you only get data from today forward. Combine it with one of the next two paths for historical coverage.

Path 2: The Search Console API

The Search Analytics API is what most BI tools and SEO platforms use under the hood. It exposes the same 16-month window the dashboard uses, with one important difference: per request, you can pull up to 50,000 rows instead of 1,000.

When to pick this: you're scripting, you need historical data, or you want to pull only specific slices (one country, one page tree, one date range) on demand.

The basics:

curl -X POST \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": "2025-11-01",
    "endDate": "2026-05-01",
    "dimensions": ["query", "page"],
    "rowLimit": 25000,
    "startRow": 0
  }' \
  "https://www.googleapis.com/webmasters/v3/sites/https%3A%2F%2Fexample.com%2F/searchAnalytics/query"

A few things worth knowing before you write the script:

  • Per-request row limit is 25,000 for query/page/country/device breakdowns (50,000 for less granular reports). Paginate with startRow for larger pulls.
  • Anonymized queries are excluded by default. They still consume impressions in your totals, which is why API numbers won't match the dashboard exactly.
  • OAuth, not API keys. You authenticate as a user (or service account) with verified ownership of the property.
  • Quotas are generous — 1,200 QPM, 30,000 QPD per project — but they exist.

A common pattern is one weekly cron that pulls the previous 7 days at query + page granularity, stores it in your own warehouse, and lets you join GSC against revenue, internal CMS data, or competitor rankings.

The catch: the API is per-property. If you have 12 sites, that's 12 OAuth handshakes and 12 separate sync jobs to maintain. This is the friction that makes most teams either give up on data ownership or pay someone else to handle the pipes.

Path 3: Tools That Handle It For You

If you don't want to maintain a BigQuery dataset or write a sync script, the third option is a tool that wraps the API and presents the data in a more useful shape. There are a lot of these. The honest distinction is:

  • Dashboards (Looker Studio's GSC connector, AgencyAnalytics, etc.) re-render the same data with prettier charts. They don't unlock new analysis — they unlock prettier sharing.
  • Analyzers (Ahrefs, Semrush, DadSEO) join GSC data against their own crawl, competitor, or strategy data. They unlock cross-referencing — which of your ranking pages have declining CTR, what topics you cover that competitors don't, where your ranked-but-not-clicked queries cluster.

The first kind helps you report. The second kind helps you decide.

For DadSEO specifically, GSC sync is the foundation of every audit. Every site connects via OAuth, syncs up to 16 months of data on initial import, and refreshes daily. Audits then run against that real query data — not a crawler's guess about what your site is about. That's what makes the difference between a generic "your CTR is low" finding and the kind of specific diagnosis that points at a fixable problem.

If you want to drive this from code rather than a dashboard, DadSEO also ships an REST API and an MCP server — same data, exposed to your scripts and to AI assistants directly.

Which Path Should You Pick?

Quick decision matrix:

You want to…Pick
Pull more than 1,000 rows once, by handAPI + a Postman/curl request
Build a historical warehouse from scratchAPI (for the 16-month backfill) → BigQuery
Get every query every day, going forwardBulk Data Export to BigQuery
Stop maintaining pipes and start making decisionsA tool that joins GSC with strategy data
Pipe GSC into an AI workflowAPI or MCP server

Most serious teams end up running bulk export for the forward stream + a one-time API backfill for the historical window. That gives you the full 16+ months in one queryable place, scoped to a single warehouse you own.

One Last Trap

When you finally have the full dataset, the analytical mistake to avoid is treating "more rows" as "more insight." A long-tail query with two impressions and zero clicks is a row, not a signal. The reason the 1,000-row CSV feels insufficient isn't usually that you're missing important data — it's that you're missing the coverage needed to cluster queries into topics and see patterns.

The framework I follow: don't analyze queries individually. Group them into topic clusters, then ask which clusters earn impressions but lose clicks. That's where the export pays for itself. Looking at row 1,001 in isolation is just looking at row 1,000 with a different filter.

Connect Google Search Console to DadSEO →

IG
Isaac Gounton

Founder of DadSEO. I build tools that turn SEO data into strategy — not scores. Previously spent years running audits that told me what was broken without telling me what mattered.

Read more about me →

Ready to see what your SEO data actually means?

DadSEO connects to Google Search Console and gives you a strategic diagnosis — threats, gaps, and opportunities ranked by impact.

Get Your Free Audit