Gary Illyes from Google said it is a “silly idea” to re-block or re-disallow pages after you allowed Google to crawl them so it could see the noindex tags on those pages. I apologize for the title of this story; it was a hard one to write.
This came up when Gary posted his PSA saying “your periodic reminder that crawlers that obey robotstxt won’t see a noindex directive on a page if said page is disallowed for crawling.” Then Patrick Stox followed up with a recommendation he often sees, saying a “popular recommendation in SEO communities in this situation is to unblock, let the pages be crawled, and then block them again. To me, that doesn’t make sense. You end up where you started.”
Gary agreed with Patrick and said “yeah I wouldn’t re-disallow the pages once they were crawled.” He added “that sounds like a silly idea.”
The whole thing confuses me, to be honest. I don’t quite get the logic; I mean, I kind of do, but what is the purpose? Is the end result all that different?
Here is the scenario: an SEO has blocked a page from crawling via robots.txt. Google won’t crawl it, and if the page is already indexed, Google won’t deindex it either, because it cannot see anything on the page. So the advice goes: add a noindex tag to the page, temporarily allow Googlebot to crawl it so Google can pick up on that noindex tag, and then, after Google picks it up, put the disallow directive back. Why? You end up back in the same state you started from, and if the page gets indexed again, you have to do this all over again and repeat the whole process.
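To make the mechanics concrete, here is a minimal sketch of the two pieces involved (the /example-page/ path is made up for illustration). A compliant crawler checks robots.txt before fetching a URL, so the disallow rule means it never downloads the page and never sees the meta tag inside it.

In robots.txt (blocks crawling, so Googlebot never fetches the page):

    User-agent: Googlebot
    Disallow: /example-page/

In the HTML of the page itself (only visible to Google if crawling is allowed):

    <meta name="robots" content="noindex">

Remove the Disallow line and Googlebot can recrawl the page, process the noindex, and drop the page from the index; re-adding the Disallow line afterward just hides the noindex from Google again.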
Here are the tweets:
your periodic reminder that crawlers that obey robotstxt won’t see a noindex directive on a page if said page is disallowed for crawling.
prompted by https://t.co/i7ouMoqNT6 which was answered by @patrickstox pic.twitter.com/98NLF2twz1
— Gary 鯨理/경리 Illyes (@methode) March 25, 2021
yeah i wouldn’t re-disallow the pages once they were crawled. that sounds like a silly idea
— Gary 鯨理/경리 Illyes (@methode) March 25, 2021
Forum discussion at Twitter.
Update: Some have questioned this, so here are those questions and the responses that help explain it:
I think the same… if finally they have been de-indexed, why we should let Google continue to spend time crawling them?
— Gianluca Fiorelli (@gfiorelli1) March 26, 2021
Here is how Patrick responded:
Then just leave them blocked. These are your states/outcomes.
1 = blocked, noindex = pages indexed
2 = unblock noindex = noindex
3 = blocked (again) noindex = 1. Why switch back? Pages will get indexed again.
— Patrick Stox (@patrickstox) March 26, 2021