Over the last few months I’ve accumulated half a million HTTP headers. The collection obviously contains a lot of garbage, and of course there are lots of hackers in there, because it’s totally unstructured data. It’s some of the worst organized content I’ve ever seen, but it’s still fun to toy with in aggregate.
If anyone’s interested, I threw it up on my other blog. Half a million HTTP headers might sound like a lot, but really, much of it is duplicated data. For instance, search engine spiders and RSS readers are going to look the same every time. Still, I thought it was interesting just to look at.
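
If you want a rough feel for how much of a dump like this is duplicated, something along these lines might do. It is only a sketch, not how the data was actually collected or processed: it assumes each captured request is available as one raw text block of `Name: value` lines, and the function names `normalize` and `duplicate_stats` are made up for illustration. It lowercases header names (HTTP header names are case-insensitive) and ignores header order, which is a simplifying assumption.

```python
from collections import Counter


def normalize(raw_headers: str) -> tuple:
    """Turn one raw header block into a hashable, order-independent key."""
    pairs = []
    for line in raw_headers.splitlines():
        if ":" not in line:
            continue  # skip blank lines and anything that isn't "Name: value"
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so compare them in lowercase.
        pairs.append((name.strip().lower(), value.strip()))
    # Sorting makes two blocks match even if the headers arrive in a
    # different order (an assumption; order can matter for fingerprinting).
    return tuple(sorted(pairs))


def duplicate_stats(blocks):
    """Return (total blocks, distinct blocks, Counter of normalized blocks)."""
    counts = Counter(normalize(block) for block in blocks)
    total = sum(counts.values())
    return total, len(counts), counts
```

Feeding it the header blocks and comparing the first two numbers shows how many are actually distinct, and `counts.most_common(10)` would list the most repeated ones, which is where the spiders and RSS readers should show up.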