What belongs in a sitemap
A sitemap is a list of the pages you actually want indexed — nothing more. Every URL in it should be canonical, return a 200 status, and be eligible for indexing. Including junk teaches engines to distrust your sitemap and wastes the crawl budget you want spent on real content.
In practice: include only canonical, indexable, 200-status URLs — no redirects, no noindex, no duplicates; use absolute URLs with your preferred protocol and host (https, www-or-not, consistently); set lastmod to the real last-modified date so engines know what to re-crawl; and keep priority and changefreq honest, or omit them, since engines largely treat them as hints.
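As a rough illustration of those rules, the sketch below filters a list of crawled pages down to the ones that belong in a sitemap and writes them out with a lastmod value. The `Page` record, its field names, and `PREFERRED_ORIGIN` are assumptions made for the example; a real site would pull these facts from its CMS or crawl data.

```python
from dataclasses import dataclass
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Hypothetical preferred protocol and host for this example.
PREFERRED_ORIGIN = "https://www.example.com"


@dataclass
class Page:
    """Hypothetical record of what a crawl or CMS knows about one URL."""
    url: str             # absolute URL of the page
    canonical: str       # target of the page's rel=canonical
    status: int          # HTTP status code returned for the URL
    noindex: bool        # True if the page carries a noindex directive
    last_modified: date  # real date of the last content change


def belongs_in_sitemap(page: Page) -> bool:
    """Canonical, indexable, 200-status, and on the preferred origin."""
    return (
        page.status == 200
        and not page.noindex
        and page.url == page.canonical
        and page.url.startswith(PREFERRED_ORIGIN + "/")
    )


def build_sitemap(pages: list[Page]) -> str:
    """Return sitemap XML listing each qualifying URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for page in pages:
        if not belongs_in_sitemap(page) or page.url in seen:
            continue  # skip redirects, noindex, duplicates, wrong host
        seen.add(page.url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page.url
        ET.SubElement(entry, "lastmod").text = page.last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The sketch leaves priority and changefreq out entirely, which is the simplest way to keep them honest.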
Structure and scale
A single sitemap holds up to 50,000 URLs and 50MB uncompressed. Past that, use a sitemap index file that points to multiple sitemaps — and many sites do this deliberately by section (posts, products, pages) so they can spot crawl issues per area.
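One way to split by section and stay under the URL limit is sketched below. The file naming scheme (`sitemap-<section>-<n>.xml`) and the function names are assumptions for illustration, and the sketch only enforces the 50,000-URL limit; a production generator would also need to check the 50MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split a URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def plan_sitemaps(sections: dict[str, list[str]], base: str) -> dict[str, list[str]]:
    """Map each child sitemap URL to the page URLs it should contain.

    `sections` maps a section name (posts, products, pages) to its URLs.
    The file naming scheme here is a hypothetical convention.
    """
    files = {}
    for section, urls in sections.items():
        for n, chunk in enumerate(chunk_urls(urls), start=1):
            files[f"{base}/sitemap-{section}-{n}.xml"] = chunk
    return files


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Return sitemap index XML pointing at each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        child = ET.SubElement(root, "sitemap")
        ET.SubElement(child, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

With one file group per section, a sudden drop in indexed URLs shows up against a specific child sitemap rather than the site as a whole.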
Reference your sitemap from robots.txt with a `Sitemap:` line and submit it in Google Search Console and Bing Webmaster Tools. Crawlers that feed AI engines can pick up the same robots.txt reference, so a clean, discoverable sitemap helps generative engine optimization (GEO) too.
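To check that a robots.txt file actually exposes a sitemap, a minimal sketch using Python's standard `urllib.robotparser` (its `site_maps()` method needs Python 3.8 or later) might look like this. The robots.txt content and the example.com URL are placeholders, not a real site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap-index.xml
"""


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return every URL declared on a `Sitemap:` line, or an empty list."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


print(sitemaps_from_robots(ROBOTS_TXT))
```

An empty result means crawlers have to find the sitemap some other way, such as a manual submission.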
Common sitemap mistakes
Sitemap problems are quiet — pages just don't get indexed, and you rarely get an error. The fixes are simple once you know what to look for.
The usual culprits: listing redirected or 404 URLs, which waste crawl and signal staleness; listing noindex or canonicalized-away pages, which contradicts your own signals; stale lastmod dates that never change, which engines learn to ignore; forgetting to reference the sitemap from robots.txt; and not regenerating the sitemap when content changes.
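Most of these checks can be automated. The sketch below audits one sitemap against facts gathered separately (by a crawl, for example). The `crawl_facts` structure, its keys, and the sample data are assumptions made for the example; it does no network requests itself.

```python
from datetime import date
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[tuple]:
    """Return (loc, lastmod) pairs; lastmod is None when the tag is absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod))
    return entries


def audit_sitemap(xml_text, crawl_facts, sitemap_url, robots_sitemaps):
    """Return (url, problem) pairs for the common mistakes.

    crawl_facts is a hypothetical dict: URL -> {"status", "noindex",
    "canonical", "changed"}, where "changed" is the real last-change date.
    """
    problems = []
    if sitemap_url not in robots_sitemaps:
        problems.append((sitemap_url, "not referenced from robots.txt"))
    for loc, lastmod in parse_sitemap(xml_text):
        facts = crawl_facts.get(loc)
        if facts is None:
            problems.append((loc, "no crawl data"))
            continue
        if 300 <= facts["status"] < 400:
            problems.append((loc, "redirects"))
        elif facts["status"] != 200:
            problems.append((loc, f"returns {facts['status']}"))
        if facts["noindex"]:
            problems.append((loc, "noindex"))
        if facts["canonical"] != loc:
            problems.append((loc, "canonicalized to another URL"))
        # lastmod may carry a time part; the first 10 characters are the date.
        if lastmod and facts["changed"] > date.fromisoformat(lastmod.strip()[:10]):
            problems.append((loc, "lastmod older than last content change"))
    return problems


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc><lastmod>2026-06-10</lastmod></url>
  <url><loc>https://www.example.com/old-post</loc><lastmod>2026-01-05</lastmod></url>
  <url><loc>https://www.example.com/draft</loc><lastmod>2026-01-05</lastmod></url>
</urlset>"""

FACTS = {
    "https://www.example.com/": {
        "status": 200, "noindex": False,
        "canonical": "https://www.example.com/", "changed": date(2026, 6, 10)},
    "https://www.example.com/old-post": {
        "status": 301, "noindex": False,
        "canonical": "https://www.example.com/old-post", "changed": date(2026, 1, 5)},
    "https://www.example.com/draft": {
        "status": 200, "noindex": True,
        "canonical": "https://www.example.com/draft", "changed": date(2026, 3, 1)},
}
```

Running a check like this each time the sitemap is regenerated turns the quiet failures into an explicit list.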
Frequently asked
- Do priority and changefreq actually matter?
- Google has said it ignores the priority and changefreq values, and other engines treat them as weak hints at best. lastmod is the most useful field, provided it's accurate. Don't over-invest in tuning priority; focus on listing only clean, canonical URLs.
- Should I include images and videos in my sitemap?
- You can use image and video sitemap extensions if media discovery matters for your site, but for most sites a clean URL sitemap is the priority. Add media extensions only when you have substantial media you want indexed.
- How do I know if my sitemap is healthy?
- Run your site through seo·check — it discovers your robots.txt and sitemap, counts the URLs, and flags sitemap health as part of the audit, so you can catch missing or broken sitemaps fast.
Published April 21, 2026 · Last updated June 16, 2026