Path: README.rdoc
Last Update: Fri Jan 22 21:07:50 -0600 2010


= Anemone

Anemone is a web spider framework that can spider a domain and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized spider tasks quickly and easily.

See the project homepage for more information.


== Features

  • Multi-threaded design for high performance
  • Tracks 301 HTTP redirects to understand a page's aliases
  • Built-in BFS algorithm for determining page depth
  • Allows exclusion of URLs based on regular expressions
  • Choose the links to follow on each page with focus_crawl()
  • HTTPS support
  • Records response time for each page
  • CLI program can list all pages in a domain, calculate page depths, and more
  • Obeys robots.txt
  • In-memory or persistent storage of pages during crawl, using TokyoCabinet or PStore


== Examples

See the scripts under the lib/anemone/cli directory for examples of several useful Anemone tasks.


== Requirements

  • nokogiri
  • robots