bad robot
The best part of running your own server is definitely reviewing the logs. There are a lot of silly people out there, and each and every one of them has written a program that would like to visit your server.
The fun comes from watching each bot, then trying to guess the nature of the bug.
133.9.84.100 - - [28/Mar/2015:14:18:50 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes#Tarsnap HTTP/1.1" 404 11 "-" "Mozzila/5.0 (compatible; Sonic/1.0; http://www.yama.info.waseda.ac.jp/~crawler/info.html)"
133.9.84.100 - - [28/Mar/2015:14:20:30 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes#Android HTTP/1.1" 404 11 "-" "Mozzila/5.0 (compatible; Sonic/1.0; http://www.yama.info.waseda.ac.jp/~crawler/info.html)"
133.9.84.100 - - [28/Mar/2015:14:21:18 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes#X HTTP/1.1" 404 11 "-" "Mozzila/5.0 (compatible; Sonic/1.0; http://www.yama.info.waseda.ac.jp/~crawler/info.html)"
Ah, yes, I have fond memories of my first web crawler too. Other crawlers work in more mysterious ways.
54.174.25.109 - - [24/Mar/2015:11:14:23 -0400] "GET /flak/post/comcast-ping-times" HTTP/1.1" 404 11 "-" "ltx71 - (http://ltx71.com/)"
54.174.25.109 - - [24/Mar/2015:11:15:37 -0400] "GET /flak/post/Heroku-subscription-status">Heroku HTTP/1.1" 404 11 "-" "ltx71 - (http://ltx71.com/)"
54.174.25.109 - - [24/Mar/2015:11:16:37 -0400] "GET /flak/tag/magreview">magreview</a> HTTP/1.1" 404 11 "-" "ltx71 - (http://ltx71.com/)"
54.174.25.109 - - [24/Mar/2015:11:18:22 -0400] "GET /flak/tag/business" HTTP/1.1" 404 11 "-" "ltx71 - (http://ltx71.com/)"
54.174.25.109 - - [24/Mar/2015:11:19:24 -0400] "GET /flak/rss" HTTP/1.1" 404 11 "-" "ltx71 - (http://ltx71.com/)"
54.174.25.109 - - [24/Mar/2015:11:21:19 -0400] "GET /flak/tag/rants" HTTP/1.1" 404 11 "-" "ltx71 - (http://ltx71.com/)"
54.174.25.109 - - [24/Mar/2015:11:22:27 -0400] "GET /flak/">flak</a></h1> HTTP/1.1" 404 11 "-" "ltx71 - (http://ltx71.com/)"
I like that we have a good mix here. Sometimes just the trailing quote, sometimes the quote and some of the trailing tag.
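For the record, my guess is that both of the above come from scraping links out of the page with string hacking, then requesting whatever fell out. A minimal sketch of the boring alternative, assuming Python's standard html.parser and urllib.parse: let a real parser find the href values, and drop the #fragment, which is meant for the browser and never for the server. The names LinkCollector and extract_links are made up for illustration; I have no idea what these crawlers actually run.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, with any #fragment removed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # The parser hands back the attribute value only, so no
                # stray quotes or closing tags come along for the ride.
                url, _fragment = urldefrag(value)
                if url:  # skip links that were nothing but a fragment
                    self.links.append(url)


def extract_links(html):
    """Return the href targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Feeding it a snippet like `<h1><a href="/flak/">flak</a></h1>` yields just `/flak/`, with no quote and no tag soup attached.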
Request latency getting you down? Try multiplexing!
189.149.44.119 - - [28/Jun/2015:04:23:17 -0400] "GET /flak/post/out-with-the-old-in-with-the-less/https://xkcd.com/1343/ HTTP/1.1" 404 11 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; Q312461; Maxthon)"
195.91.240.66 - - [19/Jul/2015:01:07:01 -0400] "GET /flak/post/out-with-the-old-in-with-the-lesshttp://www.joelonsoftware.com/articles/fog0000000069.htmlhttp://www.jwz.org/doc/cadt.html HTTP/1.1" 404 11 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:39.0) Gecko/20100101 Firefox/39.0"
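These look like outbound links glued onto the URL of the page they appeared on. A small sketch of the usual fix, assuming Python's urllib.parse.urljoin: resolve each link against the page on its own, so an absolute link replaces the base instead of being appended to it. The hostname www.example.com is a placeholder, and resolve_links is a made-up helper name.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Resolve each href against the page it came from, one URL per href."""
    return [urljoin(page_url, href) for href in hrefs]


# Placeholder page URL; the links are one absolute and one site-relative.
page = "http://www.example.com/flak/post/out-with-the-old-in-with-the-less"
for url in resolve_links(page, ["https://xkcd.com/1343/", "/flak/tag/rants"]):
    print(url)
```

One request per link, each one pointed at the server that actually hosts it.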
If somebody writes a blog post you really like, show your support by rereading it every ten minutes.
54.226.212.33 - - [20/Nov/2014:15:17:46 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:27:46 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:37:46 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:47:46 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:57:46 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
Or three times every ten minutes.
54.226.212.33 - - [20/Nov/2014:15:31:23 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:31:23 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:31:23 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:41:23 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:41:23 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:41:23 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:51:22 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:51:23 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
54.226.212.33 - - [20/Nov/2014:15:51:23 -0500] "GET /flak/post/retiring-crypt HTTP/1.1" 200 9992 "-" "-"
Mongoscale!
Sometimes a flak post is so good you can’t even wait ten minutes to read it over and over and over again.
217.67.201.162 - - [08/Apr/2014:06:56:51 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes HTTP/1.1" 200 4344 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0"
217.67.201.162 - - [08/Apr/2014:06:56:51 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes HTTP/1.1" 200 4344 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0"
217.67.201.162 - - [08/Apr/2014:06:56:51 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes HTTP/1.1" 200 4344 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0"
217.67.201.162 - - [08/Apr/2014:06:56:52 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes HTTP/1.1" 200 4344 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0"
217.67.201.162 - - [08/Apr/2014:06:56:52 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes HTTP/1.1" 200 4344 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0"
217.67.201.162 - - [08/Apr/2014:06:56:52 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes HTTP/1.1" 200 4344 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0"
217.67.201.162 - - [08/Apr/2014:06:56:52 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes HTTP/1.1" 200 4344 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0"
Three hours and 17000 requests later, still going strong...
217.67.201.162 - - [08/Apr/2014:10:01:03 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes HTTP/1.1" 200 4344 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0"
217.67.201.162 - - [08/Apr/2014:10:01:03 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes HTTP/1.1" 200 4344 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0"
217.67.201.162 - - [08/Apr/2014:10:01:03 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes HTTP/1.1" 200 4344 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0"
217.67.201.162 - - [08/Apr/2014:10:01:03 -0400] "GET /flak/post/a-brief-history-of-one-line-fixes HTTP/1.1" 200 4344 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0"
It’s a good thing I’m not running WordPress or we’d have run out of coal by now.
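If you'd rather not spot your biggest fans by eye, here is a rough sketch of counting repeat visits, assuming log lines shaped like the ones above (address, two dash fields, bracketed timestamp, quoted request, status). The regex and the count_repeats name are my own invention for illustration, not anything the server ships with.

```python
import re
from collections import Counter

# ip, ident, user, [timestamp], "METHOD path PROTOCOL", status
# The path group is greedy so that mangled requests containing spaces
# or quotes still end at the final " HTTP/x.y" before the status code.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (.*) (HTTP/[\d.]+)" (\d{3}) '
)


def count_repeats(lines):
    """Count requests per (client address, path) pair; skip unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        ip, _timestamp, _method, path, _protocol, _status = match.groups()
        counts[(ip, path)] += 1
    return counts
```

Something like `count_repeats(open("access.log")).most_common(10)` then lists the ten most devoted readers, where access.log stands in for wherever your server keeps its log.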
Honorable mention goes to the shotgun spider, which attempts to simultaneously download every page on my site from a wide variety of IPs.
108.62.72.15 - - [17/Nov/2014:04:10:23 -0500] "GET /flak/tag/review HTTP/1.1" 200 8171 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
176.53.124.66 - - [17/Nov/2014:04:10:23 -0500] "GET /flak/tag/philly HTTP/1.1" 200 7299 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
46.251.233.181 - - [17/Nov/2014:04:10:23 -0500] "GET /flak/tag/programming HTTP/1.1" 200 8222 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
31.214.230.136 - - [17/Nov/2014:04:10:23 -0500] "GET /flak/tag/web HTTP/1.1" 200 7050 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
8.29.121.135 - - [17/Nov/2014:04:10:23 -0500] "GET /flak/tag/quote HTTP/1.1" 200 5653 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
23.19.74.2 - - [17/Nov/2014:04:10:23 -0500] "GET /flak/post/stdwinjector HTTP/1.1" 200 4147 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
95.156.220.32 - - [17/Nov/2014:04:10:23 -0500] "GET /flak/post/features-are-faults HTTP/1.1" 200 4892 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
167.88.104.199 - - [17/Nov/2014:04:10:23 -0500] "GET /flak/post/goreSSL HTTP/1.1" 200 4530 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
This is not exactly the most courteous way to crawl a site, but I have to give props for the synchronized clocks and military precision.
Most flak posts surge in popularity, peak quickly, then fade from view. But there are a few evergreen posts that get recurring traffic long after first publication. The most popular such post on flak is the one about signify. Especially in Ukraine.
37.115.186.0 - - [04/Feb/2015:21:12:45 -0500] "GET /flak/post/signify/ HTTP/1.1" 404 ...
37.139.52.23 - - [06/Feb/2015:21:46:33 -0500] "GET /flak/post/signify/ HTTP/1.1" 404 ...
178.137.84.200 - - [08/Feb/2015:01:13:24 -0500] "GET /flak/post/signify/ HTTP/1.1" 404 ...
Alas, somebody added a trailing slash. Thousands upon thousands of visitors, from hundreds of different referring web sites, all being turned away at the door.
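A more charitable server could meet them halfway. A sketch of one way to do it, assuming the server can produce a set of paths it knows about (known_paths and redirect_target are hypothetical names, and this is not how flak works today): if a request misses only because of trailing slashes, answer with a redirect to the real path instead of a 404.

```python
def redirect_target(path, known_paths):
    """Return the known path that `path` matches once trailing slashes
    are removed, or None if the path is already fine or truly unknown."""
    if path in known_paths:
        return None  # exact match, serve it as usual
    candidate = path.rstrip("/")
    if candidate in known_paths:
        return candidate  # would be sent as the Location of a redirect
    return None  # no near miss, so this one is an honest 404


# Hypothetical set of valid paths for illustration.
known = {"/flak/", "/flak/post/signify"}
print(redirect_target("/flak/post/signify/", known))
```

Checking against known paths, instead of stripping slashes blindly, keeps pages like /flak/ that legitimately end in a slash working.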
Web servers aren’t the only targets of bots. Occasionally someone will attempt to log in to my mail server as well.
Mar 20 08:31:20 ox smtpd[21426]: smtp-in: New session b7e7d19e684618c1 from host 200-203-63-23.pltce7002.dsl.brasiltelecom.net.br [200.203.63.23]
Mar 20 08:31:20 ox smtpd[21426]: smtp-in: New session b7e7d1a0f728f5cb from host 200-203-63-23.pltce7002.dsl.brasiltelecom.net.br [200.203.63.23]
Mar 20 08:31:20 ox smtpd[21426]: smtp-in: New session b7e7d1a16406eb83 from host 200-203-63-23.pltce7002.dsl.brasiltelecom.net.br [200.203.63.23]
Mar 20 08:31:21 ox smtpd[21426]: smtp-in: New session b7e7d1a2260a4afc from host 200-203-63-23.pltce7002.dsl.brasiltelecom.net.br [200.203.63.23]
Mar 20 08:31:21 ox smtpd[21426]: smtp-in: New session b7e7d1a3fec96c7e from host 200-203-63-23.pltce7002.dsl.brasiltelecom.net.br [200.203.63.23]
Mar 20 08:31:24 ox smtpd[21426]: smtp-in: Failed command on session b7e7d19e684618c1: "AUTH [...]" => 503 5.5.1 Invalid command: Command not supported
Mar 20 08:31:24 ox smtpd[21426]: smtp-in: Failed command on session b7e7d1a0f728f5cb: "AUTH [...]" => 503 5.5.1 Invalid command: Command not supported
Mar 20 08:31:24 ox smtpd[21426]: smtp-in: Failed command on session b7e7d1a16406eb83: "AUTH [...]" => 503 5.5.1 Invalid command: Command not supported
Mar 20 08:31:24 ox smtpd[21426]: smtp-in: Failed command on session b7e7d1a2260a4afc: "AUTH [...]" => 503 5.5.1 Invalid command: Command not supported
Mar 20 08:31:24 ox smtpd[21426]: smtp-in: Failed command on session b7e7d1a3fec96c7e: "AUTH [...]" => 503 5.5.1 Invalid command: Command not supported
Bonus points for parallelizing your code, but come on, dude/dame, check for error codes.
ox:/var/log> zgrep "host 200-203-63-23" maillog* | wc -l
4430
I mean, sure, maybe you’re not too concerned with efficiency when you’ve got a whole botnet at your disposal, but think of the savings to be had if you only needed to rent half as many zombie hosts. If a host clearly doesn’t support the AUTH command, move on to the next host, and instead of spamming my log file, you could focus on spamming real people’s inboxes. It’s a win-win for you and me.
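For any mail client that wants to behave, checking is not much work. A minimal sketch, assuming standard SMTP reply conventions (a three-digit code at the start of each line, 5xx meaning a permanent failure, and extensions listed one per line in the EHLO reply). The function names and the sample EHLO reply are made up for illustration; only the 503 line comes from my log.

```python
def reply_code(line):
    """Return the three-digit SMTP reply code at the start of a reply line."""
    return int(line[:3])


def is_permanent_failure(line):
    """True for 5xx replies, where repeating the same command is pointless."""
    return 500 <= reply_code(line) <= 599


def advertises_auth(ehlo_reply):
    """True if a multi-line EHLO reply lists the AUTH extension."""
    lines = ehlo_reply.splitlines()
    # The first line is the server's greeting; extensions follow, each as
    # "250-KEYWORD params" or "250 KEYWORD params" on the final line.
    for line in lines[1:]:
        keyword = line[4:].split(" ", 1)[0]
        if keyword.upper() == "AUTH":
            return True
    return False


# Hypothetical EHLO reply from a server that does not offer AUTH.
sample = "250-mail.example.org Hello\r\n250-8BITMIME\r\n250 HELP"
if not advertises_auth(sample):
    print("no AUTH here, moving on")
```

Either check alone would have saved a few thousand log lines: skip AUTH when it isn't advertised, or at least stop after the first 503.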