Bots, Spiders, and Crawlers: The Results
It's been 7 days since I updated my robots.txt and the results are in. Has anything changed? Has my site tanked? Time to find out. To make sure the logs were comparable I grabbed 6 full days' worth of data for each period. I wrote the previous post on the 20th, so I grabbed data from September 14th to 19th (the "before") and compared it against data from the 21st to 26th (the "after"). As mentioned in the previous post, I'm using Goaccess to parse the logs, and I ran it twice for each set of logs: once with the ignore-crawlers option turned on and once with it turned off. Ok, enough words, here's the data:
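In case you want to reproduce this, the two runs look roughly like this (the log path and format here are placeholders, they'll differ depending on your server setup):

```shell
# Run GoAccess twice on the same logs: once counting crawlers, once ignoring them.
# /var/log/nginx/access.log and COMBINED are assumptions; adjust to your setup.
goaccess /var/log/nginx/access.log --log-format=COMBINED -o report-with-crawlers.html
goaccess /var/log/nginx/access.log --log-format=COMBINED --ignore-crawlers -o report-no-crawlers.html
```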
First things first, a note on total requests: Goaccess counts all requests no matter what the ignore-crawlers option says, which is why that number is identical with the option turned on and off. Now for the interesting parts. Overall hits on the server look about the same: there's a difference of ~4500 requests over 6 days, which is ~4% of the ~117000 total "before" requests. The interesting part for me is the "Not Found" count, which went down significantly: the panel dedicated to 404s shows 4525 total hits before the change and 3590 after, a drop of roughly 20%. Not really all that important but still interesting to see.
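If you want to double-check my math on those percentages, it's just this:

```python
# Sanity-check the percentages quoted above, using the numbers from the panels.
total_before = 117000        # approximate total requests in the "before" period
hits_diff = 4500             # approximate difference in total hits between periods
not_found_before = 4525      # 404 hits before the robots.txt change
not_found_after = 3590       # 404 hits after

print(f"traffic diff: {hits_diff / total_before:.1%}")
print(f"404 drop: {(not_found_before - not_found_after) / not_found_before:.1%}")
```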
For Goaccess, a "Unique Visitor" is a hit coming in on the same date from the same combination of IP and User Agent. According to that metric, I lost ~8% of unique visitors. Don't ask me how accurate that estimate is because I have no idea. That 8% might be bots that were escaping the ignore-crawlers filter list and are now blocked by the robots.txt rule.
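If you want to replicate that metric yourself, the idea is just deduplication on a (date, IP, user agent) tuple. Here's a toy sketch for combined-format logs (the sample lines are made up, and this is not Goaccess's actual code):

```python
# Goaccess-style "unique visitors": one per (date, IP, user-agent) combination.
import re

# Toy parser for combined-format access log lines.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^:]+)[^\]]*\] '
    r'"[^"]*" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def unique_visitors(lines):
    seen = set()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            seen.add((m.group("date"), m.group("ip"), m.group("ua")))
    return len(seen)

# Made-up sample: two hits from the same visitor, one from a bot.
sample = [
    '1.2.3.4 - - [21/Sep/2023:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [21/Sep/2023:10:05:00 +0000] "GET /feed HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [21/Sep/2023:11:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "SomeBot/1.0"',
]
print(unique_visitors(sample))  # 2
```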
Goaccess also has a dedicated "Browsers" panel that shows which browsers are used when accessing my site and also very conveniently bundles all the crawlers together under the "Crawlers" label. Before, Crawlers were 54.6% of the total traffic. After the change? 54.06%. So that tells me that the vast majority of automated tools out there just don't give a fuck about what you put in your robots.txt.
As for the "Referring Sites" panel, the only thing I checked is how much traffic came from Google, and that changed quite a bit: 900 hits before, and 376 after.
So, what's the takeaway here? I guess that the vast majority of crawlers don't give a shit about your robots.txt. As for me and this site, I think I'm gonna revert and allow bots to just do whatever the hell they want. This seems like a worthless battle to fight, with no real benefit to my general mission, which is to connect with other human beings.
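For the record, the revert is trivial: an allow-everything robots.txt is just a wildcard user agent with an empty Disallow rule:

```text
User-agent: *
Disallow:
```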
If you have questions about this experiment let me know. I'm happy to poke around the results some more if there's anything you're interested in.