Stop checking URLs one by one in Search Console. Build an automated index health dashboard that catches noindex flags, crawl errors, and coverage drops before they kill your rankings.
Google Search Console’s URL Inspection tool is great for debugging one URL. But when you have 50,000 pages, a client audit, or a PBN to monitor, manual inspection is a non-starter. A Python workflow built on the Search Console URL Inspection API lets you check index status, coverage issues, and crawl errors programmatically and at scale.
In practice, when you run a site migration or launch a large content batch, index status can drop silently. A single noindex meta tag slipped into a template can take thousands of pages out of the index overnight. The API is the only way to catch that fast.
This isn’t about vanity metrics. It’s about catching SEO indexation failures before they show up in traffic drops. The Moz guide on indexation is a good primer on why this matters.
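1. Collect URLs from a sitemap, CMS export, or crawl. Expect duplicates and 404s in the list.
2. Authenticate with a service account (OAuth 2.0). Access tokens expire after an hour; let the client library refresh them automatically.
3. Inspect within quota: 2,000 URLs per day per property. Each request inspects a single URL, so pace your requests rather than trying to batch them.
4. Read the key fields from the response: verdict, coverageState, robotsTxtState, indexingState.
5. Flag dropped URLs, noindex directives, soft 404s, and URLs blocked by robots.txt.
6. Send a Slack or email alert if the index rate drops below a threshold, e.g., 85%.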
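The workflow above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the `google-api-python-client` and `google-auth` packages, a service account that has been added to the property, and the response shape of the URL Inspection endpoint (`inspectionResult.indexStatusResult.verdict`). The helper names `build_service`, `inspect_url`, and `run_check` are made up for illustration.

```python
from typing import Any, Dict, Iterable


def build_service(key_path: str) -> Any:
    """Build a Search Console client from a service-account key file.

    The Google imports are local so the rest of the module loads
    even where those packages are not installed.
    """
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        key_path,
        scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
    )
    return build("searchconsole", "v1", credentials=creds)


def inspect_url(service: Any, site_url: str, url: str) -> Dict[str, Any]:
    """Inspect one URL and return its index status block (may be empty)."""
    body = {"inspectionUrl": url, "siteUrl": site_url}
    response = service.urlInspection().index().inspect(body=body).execute()
    return response.get("inspectionResult", {}).get("indexStatusResult", {})


def run_check(service: Any, site_url: str, urls: Iterable[str],
              threshold: float = 0.85) -> Dict[str, Any]:
    """Deduplicate, inspect every URL, and decide whether to alert."""
    unique = list(dict.fromkeys(urls))  # drops duplicates, keeps order
    results = {u: inspect_url(service, site_url, u) for u in unique}
    not_indexed = [u for u in unique if results[u].get("verdict") != "PASS"]
    rate = (len(unique) - len(not_indexed)) / len(unique) if unique else 0.0
    return {"rate": rate, "alert": rate < threshold, "not_indexed": not_indexed}
```

Passing the `service` object in, rather than building it inside `run_check`, keeps the logic testable with a stub and leaves the alert delivery (Slack, email) to the caller.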
| Property | API Field / Value | Practical Use | Hidden Risk / Failure Mode |
|---|---|---|---|
| Index verdict | indexStatusResult.verdict: PASS / NEUTRAL / FAIL (PARTIAL is reserved) | Confirm URL is indexed after publishing | NEUTRAL can persist 24h+ on new URLs; don't alert too early |
| Coverage state | coverageState: e.g., Submitted and indexed / Crawled - currently not indexed / URL is unknown to Google | Check if Google knows about the URL | "Crawled - currently not indexed" often means quality issues or crawl budget |
| Indexing allowed | indexingState: INDEXING_ALLOWED / BLOCKED_BY_META_TAG / BLOCKED_BY_HTTP_HEADER / BLOCKED_BY_ROBOTS_TXT | Quick check for a noindex or robots.txt block | Names the mechanism, not the exact rule; inspect the page and robots.txt separately |
| Robots.txt state | robotsTxtState: ALLOWED / DISALLOWED / ROBOTS_TXT_STATE_UNSPECIFIED | Verify Googlebot can crawl | UNSPECIFIED usually means the page was not fetched, not that crawling is allowed |
| Canonical match | googleCanonical vs. userCanonical: URLs | Detect canonicalization issues across pages | A mismatch suggests duplicate content or a URL parameter problem |
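To make the fields in the table easier to work with, one option is to flatten each response into a single record and attach a coarse label. The sketch below assumes the URL Inspection response nests these fields under `inspectionResult.indexStatusResult`; the function names and the label strings are illustrative choices, not part of the API.

```python
from typing import Any, Dict


def flatten_result(url: str, response: Dict[str, Any]) -> Dict[str, str]:
    """Pull the index-related fields out of one inspection response."""
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    return {
        "url": url,
        "verdict": status.get("verdict", "VERDICT_UNSPECIFIED"),
        "coverage_state": status.get("coverageState", ""),
        "robots_txt_state": status.get("robotsTxtState", "ROBOTS_TXT_STATE_UNSPECIFIED"),
        "indexing_state": status.get("indexingState", "INDEXING_STATE_UNSPECIFIED"),
        "page_fetch_state": status.get("pageFetchState", "PAGE_FETCH_STATE_UNSPECIFIED"),
        "google_canonical": status.get("googleCanonical", ""),
        "user_canonical": status.get("userCanonical", ""),
    }


def classify(record: Dict[str, str]) -> str:
    """Map a flattened record to one dashboard label (first match wins)."""
    if record["robots_txt_state"] == "DISALLOWED":
        return "blocked_by_robots"
    if record["indexing_state"] in ("BLOCKED_BY_META_TAG", "BLOCKED_BY_HTTP_HEADER"):
        return "noindex"
    if record["page_fetch_state"] == "SOFT_404":
        return "soft_404"
    if record["verdict"] == "PASS":
        return "indexed"
    return "not_indexed"
```

The blocking checks run before the verdict check on purpose: a URL can be blocked and still carry an indexed verdict, and the block is usually the thing you want surfaced first.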
Scenario: You publish 200 blog posts. You want to verify indexation within 48 hours.
Setup:
Results after 48 hours:
Action: Fix robots.txt for the 5 blocked URLs. Resubmit the sitemap that lists the 3 missing URLs (the inspection endpoint is read-only and cannot submit URLs itself). Re-check after 24h. Index rate goes from 90% to 97%.
No API workflow survives first contact with production data. Here are the real blockers:
URLs blocked by robots.txt: The API will report robotsTxtState: DISALLOWED, but the URL may still be indexed if it was discovered via external links. Don’t assume a blocked URL is deindexed. Check robotsTxtState and coverageState together.
Wrong property string: Every request must name the property in siteUrl, exactly as it is registered in Search Console. A mismatch makes the call fail instead of returning data. Double-check the string (e.g., sc-domain:example.com vs https://www.example.com/).
Bad data from duplicate lists: If your URL list contains duplicates, the API returns the same result multiple times and burns quota for nothing. Deduplicate before sending.
Empty results: If you request inspection of a URL that Google has never seen, the API returns coverageState: URL is unknown to Google. That’s expected. Don’t treat it as an error unless the URL was submitted via sitemap.
Slow vendors: Google API latency varies. At peak hours, a single URL inspection can take 5-8 seconds. The endpoint has no batch mode, so run a small pool of concurrent requests and build in a timeout (e.g., 10 seconds per request).
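Two of these blockers, property mismatches and duplicate lists, can be caught before any quota is spent. The following is a rough preflight sketch using only the standard library. It assumes domain properties are written as `sc-domain:example.com` and URL-prefix properties as a full URL prefix; `belongs_to_property` and `prepare_urls` are hypothetical helper names.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def belongs_to_property(url: str, site_url: str) -> bool:
    """Check whether a URL falls under the given Search Console property."""
    if site_url.startswith("sc-domain:"):
        # Domain properties cover the bare domain and all of its subdomains.
        domain = site_url[len("sc-domain:"):].lower()
        host = (urlsplit(url).hostname or "").lower()
        return host == domain or host.endswith("." + domain)
    # URL-prefix properties cover only URLs that start with the prefix.
    return url.startswith(site_url)


def prepare_urls(urls: Iterable[str], site_url: str) -> List[str]:
    """Strip whitespace, drop blanks, duplicates, and out-of-property URLs."""
    seen = set()
    cleaned = []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen or not belongs_to_property(url, site_url):
            continue
        seen.add(url)
        cleaned.append(url)
    return cleaned
```

This only removes exact duplicates. Whether `http` vs `https` or trailing-slash variants should be merged depends on the site, so that normalization is deliberately left out.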
Once you have a reliable index inspection pipeline, you can extend it to other Search Console APIs. For example, the Sitemaps API lets you submit or resubmit a sitemap right after an inspection run flags missing URLs. This is especially useful for indexing a sitemap in Google quickly after a content push.
For more advanced scenarios — like monitoring indexation of backlinks or PBN networks — a careful approach is required. The PBN sandbox escape protocol article discusses safe indexing strategies that complement API-based monitoring.
Create a list of site properties (e.g., sc-domain:site1.com, sc-domain:site2.com). Loop through each property and call the inspect method once per URL, passing the property as siteUrl. Use a service account that has been added as a user on every property. Watch the quota: 2,000 URLs/day per property. Five sites with 500 URLs each add up to 2,500 requests, but because the limit applies per property, each site uses only 500 of its 2,000.
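A per-property counter keeps a multi-site loop honest about quota. The sketch below is one way to do it, assuming the 2,000/day figure quoted above and an `inspect_fn(site_url, url)` callable that you supply (for example, a thin wrapper around the API client). `QuotaTracker` and `inspect_all` are illustrative names, and the counter lives in memory, so a real job would persist or reset it daily.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple


class QuotaTracker:
    """Counts inspections per property for the current run."""

    def __init__(self, daily_limit: int = 2000) -> None:
        self.daily_limit = daily_limit
        self.used: Counter = Counter()

    def try_consume(self, site_url: str) -> bool:
        """Reserve one inspection; return False when the property is exhausted."""
        if self.used[site_url] >= self.daily_limit:
            return False
        self.used[site_url] += 1
        return True


def inspect_all(
    url_lists: Dict[str, List[str]],
    inspect_fn: Callable[[str, str], Any],
    tracker: QuotaTracker,
) -> Tuple[Dict[Tuple[str, str], Any], List[Tuple[str, str]]]:
    """Inspect every URL of every property, skipping those over quota."""
    results: Dict[Tuple[str, str], Any] = {}
    skipped: List[Tuple[str, str]] = []
    for site_url, urls in url_lists.items():
        for url in urls:
            if not tracker.try_consume(site_url):
                skipped.append((site_url, url))  # retry these tomorrow
                continue
            results[(site_url, url)] = inspect_fn(site_url, url)
    return results, skipped
```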
You can request a quota increase via Google Cloud Console, but approval is not guaranteed, so plan as if the 2,000 URLs/day limit holds. Alternatively, spread inspections over multiple days. Use a priority queue: inspect high-value pages first (home, money pages), then work through the rest. Deduplicate URLs aggressively, because agencies often send the same URL multiple times from different sitemaps.
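The priority-queue idea can be sketched with `heapq` from the standard library. The scoring scheme here is an assumption for illustration: each URL arrives with a numeric priority where a lower number means more important, and a URL that appears several times keeps its best score.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def pick_todays_urls(scored_urls: Iterable[Tuple[int, str]], budget: int) -> List[str]:
    """Return up to `budget` URLs, most important first, without duplicates."""
    best: Dict[str, int] = {}
    for priority, url in scored_urls:
        # Keep the lowest (most important) priority seen for each URL.
        if url not in best or priority < best[url]:
            best[url] = priority
    ranked = heapq.nsmallest(budget, ((p, u) for u, p in best.items()))
    return [url for _, url in ranked]
```

For example, with the home page at priority 0, a pricing page at 1, and blog posts at 2, a budget of 3 returns the home page, the pricing page, and then one blog post. Ties are broken alphabetically by URL.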
A non-PASS verdict (FAIL or NEUTRAL) shows up when a page carries a noindex directive, and indexingState narrows it to BLOCKED_BY_META_TAG or BLOCKED_BY_HTTP_HEADER. It still doesn’t give you the exact directive. To confirm, fetch the page separately with a headless browser or the requests library and check for <meta name='robots' content='noindex'> or an X-Robots-Tag header. This is a common failure: the API flags it, but you need to verify the cause.
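Once the page has been fetched, the check itself needs only the standard library. This sketch takes the HTML and response headers as plain inputs, so it works with whatever fetcher you use. `find_noindex` is a hypothetical helper, and it only looks at `robots` and `googlebot` meta tags plus the `X-Robots-Tag` header.

```python
from html.parser import HTMLParser
from typing import Dict, List


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"|"googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def find_noindex(html: str, headers: Dict[str, str]) -> Dict[str, bool]:
    """Report whether noindex appears in a meta tag and/or the HTTP header."""
    parser = _RobotsMetaParser()
    parser.feed(html)

    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    # A header value may carry a user-agent prefix, e.g. "googlebot: noindex".
    header_directives = [
        part.split(":")[-1].strip().lower()
        for part in header_value.split(",")
        if part.strip()
    ]
    return {
        "meta_noindex": "noindex" in parser.directives,
        "header_noindex": "noindex" in header_directives,
    }
```

A noindex injected by JavaScript will not appear in raw HTML, which is where the headless browser mentioned above comes in: render first, then pass the rendered HTML to the same function.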
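Yes, but only for sites whose Search Console property you can access (a client or partner site, for example), because the API cannot inspect arbitrary third-party URLs. Where you do have access, run the target page through the API before pitching to confirm it’s indexed. If coverageState shows the page as crawled or discovered but not indexed, the site may have quality or crawl budget issues, so avoid pitching. Also check robotsTxtState: if it is DISALLOWED, the page is blocked by robots.txt and your guest post link won’t be crawled.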
Common errors: 403 (service account not added to the Search Console property), 404 (wrong property format), 429 (quota exceeded), and 500 (Google side). Handle 429 with exponential backoff. A property-format error usually means siteUrl doesn’t match how the property is registered, e.g., https://example.com/ for a URL-prefix property versus sc-domain:example.com for a domain property. Log every error with the URL and response body for debugging.
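Exponential backoff for 429 responses can be sketched as a small wrapper. To keep the example self-contained, it raises and catches a stand-in `ApiError` with a `status` attribute. With `google-api-python-client` you would instead catch `googleapiclient.errors.HttpError` and read `exc.resp.status`. The retry counts and delays are arbitrary starting points.

```python
import random
import time
from typing import Any, Callable, Tuple


class ApiError(Exception):
    """Stand-in for the client library's HTTP error (hypothetical)."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_backoff(
    fn: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    retry_statuses: Tuple[int, ...] = (429, 500, 503),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fn, retrying retryable errors with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as exc:
            last_attempt = attempt == max_attempts - 1
            if exc.status not in retry_statuses or last_attempt:
                raise  # 403/404 are configuration problems; retrying won't help
            # 1s, 2s, 4s, ... plus jitter so parallel workers don't sync up.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter exists so the wrapper can be exercised in tests without actually waiting.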
Use a cron job every 6-12 hours to inspect a rotating subset of URLs (e.g., the 500 most important pages). Store results in a database with timestamps. Calculate index rate = indexed URLs / total inspected. Set a threshold (e.g., 90%) and send an alert when the rate drops below it. Visualize trends over time: a sudden drop usually points to a template change or a server error.
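The storage and index-rate steps fit in a few lines of `sqlite3`. This is a sketch under assumptions: the table layout is invented for illustration, each record is a dict with `url`, `verdict`, and optionally `coverage_state`, and "indexed" is taken to mean a verdict of `PASS`.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the results database and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inspections ("
        "checked_at TEXT NOT NULL, url TEXT NOT NULL, "
        "verdict TEXT NOT NULL, coverage_state TEXT)"
    )
    return conn


def store_run(conn: sqlite3.Connection, records: Iterable[Dict[str, str]],
              checked_at: Optional[str] = None) -> str:
    """Insert one run's records under a shared timestamp and return it."""
    ts = checked_at or datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO inspections VALUES (?, ?, ?, ?)",
        [(ts, r["url"], r["verdict"], r.get("coverage_state", "")) for r in records],
    )
    conn.commit()
    return ts


def index_rate(conn: sqlite3.Connection, checked_at: str) -> float:
    """Index rate = indexed URLs / total inspected, for one run."""
    total, indexed = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(verdict = 'PASS'), 0) "
        "FROM inspections WHERE checked_at = ?",
        (checked_at,),
    ).fetchone()
    return indexed / total if total else 0.0


def should_alert(rate: float, threshold: float = 0.90) -> bool:
    """True when the run's index rate is below the alert threshold."""
    return rate < threshold
```

Keeping one row per URL per run, rather than overwriting, is what makes the trend chart possible later.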
Yes, but be careful: the API only inspects URLs in properties you own or have access to. For backlinks on external sites, you cannot check their index status directly. Instead, export the Links report from Search Console (the API has no endpoint for link data) to get the external links pointing to your site, then inspect the landing pages on your own property. For third-party pages, you need a different tool.
verdict is a coarse signal: PASS means the URL is indexed, NEUTRAL means it is excluded or not yet known to Google, and FAIL means an error. coverageState gives more detail: ‘Submitted and indexed’ means the URL was submitted via sitemap and indexed, while ‘Crawled - currently not indexed’ or ‘Discovered - currently not indexed’ means Google knows about the URL but hasn’t indexed it yet. Use coverageState for diagnostics and verdict for quick checks.
Split the sitemap into 5 batches of 2,000 URLs. Inspect one batch per day for 5 days, starting with the most important URLs (e.g., pages with traffic). The API takes one URL per request, so pace your requests to stay under the per-minute limit (600 per property). Alternatively, request a quota increase for large projects, keeping in mind that approval is not guaranteed.
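The arithmetic behind that schedule is simple enough to encode. This sketch assumes the 2,000/day and 600/minute per-property limits mentioned above; both are passed as parameters so they can be changed if your quota differs.

```python
from typing import Iterable, List


def daily_batches(urls: Iterable[str], daily_quota: int = 2000) -> List[List[str]]:
    """Split a deduplicated URL list into one batch per day."""
    unique = list(dict.fromkeys(urls))  # keeps the caller's priority order
    return [unique[i:i + daily_quota] for i in range(0, len(unique), daily_quota)]


def min_interval_seconds(per_minute_limit: int = 600) -> float:
    """Smallest gap between requests that stays under the per-minute limit."""
    return 60.0 / per_minute_limit
```

For a 10,000-URL sitemap this yields five batches of 2,000, and at 600 requests per minute the minimum spacing is 0.1 seconds. In practice, per-request latency of several seconds means a single worker never gets close to that rate.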
Indexation is not set-and-forget. It shifts with every template update, server migration, or CMS bug. A Python automation using the Search Console API is the only reliable way to catch drops early. Start with a small batch, handle the edge cases, and build up to full coverage.