The Whitehouse.gov Website’s Robots.txt File Has 1839 Lines In It

By Daniel Miessler on January 23rd, 2007: Tagged as Internet | Security
  • Looking at most of those entries, they appear to exclude pages designed for text-only browsers and screen readers; nearly every directory ends in /text:

    Disallow: /asia/2005/photoessay/china/text
    Disallow: /asia/2005/photoessay/japan/text
    Disallow: /asia/2005/photoessay/korea/text
    Disallow: /asia/2005/photoessay/mongolia/text
    Disallow: /asia/2005/photoessay/mrsbush1/text
    Disallow: /asia/2005/photoessay/mrsbush2/text


    And if you browse up one directory, you get the same story with pictures.

    I'd say they are doing it to work around a poor file structure, or possibly to keep search engines from indexing duplicate content (the same text, just without the pictures).

    *shrugs* I'm all for pointing out when the administration does something crooked, but I can't find fault with this one. (Granted, I've only checked 20 or so of the links; the only one that didn't go anywhere for me was /video/text.)
  • sergei
    A Google search for 'robots.txt' shows whitehouse.gov at position 5.
  • ghost16825
    Ooooh, /secret/ directories. *nods head*
  • Yup, I noted it some time back as an excellent sitemap, too. ;-)
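For readers curious how a crawler applies rules like the ones quoted in the first comment, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It assumes the six Disallow entries sit under a `User-agent: *` group (the excerpt doesn't show the group header), and the `allowed` helper, the `BASE` URL, and the `Googlebot` agent string are illustrative choices, not anything taken from the actual file.

```python
from urllib.robotparser import RobotFileParser

# Assumption: these entries belong to a "User-agent: *" group; only the
# Disallow lines were quoted in the comment.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /asia/2005/photoessay/china/text",
    "Disallow: /asia/2005/photoessay/japan/text",
    "Disallow: /asia/2005/photoessay/korea/text",
    "Disallow: /asia/2005/photoessay/mongolia/text",
    "Disallow: /asia/2005/photoessay/mrsbush1/text",
    "Disallow: /asia/2005/photoessay/mrsbush2/text",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)  # parse from a list of lines instead of fetching

BASE = "https://www.whitehouse.gov"  # illustrative host for building full URLs


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Hypothetical helper: True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, BASE + path)


# The text-only variant is blocked...
print(allowed("/asia/2005/photoessay/china/text"))  # False
# ...but the photo version one directory up is not.
print(allowed("/asia/2005/photoessay/china/"))  # True
```

Because Disallow values are matched as path prefixes, a rule such as `/asia/2005/photoessay/china/text` also covers anything beneath it, e.g. `/asia/2005/photoessay/china/text/index.html`, which is why one line per photo essay is enough to hide each text-only subtree.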