Time-driven crawling with Tarantula
Posted by Sandro Paganotti in Ruby on Rails
We are using Tarantula as an additional form of testing in our latest project. Tarantula runs as part of a CruiseControl task to ensure that the application also works with realistic data (taken from the production database during the night).
The main problem we found with this approach was how to stop the crawler before it finishes loading every link collected during the session. This was critical for us because the web application currently handles thousands of elements and we could not wait hours.
So we chose to patch Tarantula in order to:
- let the spider follow links in a random order
- stop the spider after a configurable period of time (e.g. 10 min)
Here’s the crawler.rb you’ll need to put under ‘lib/relevance/tarantula/’ to get this behavior. You can then change two instance variables, test_max_time_links and test_max_time_forms, to choose how many minutes the spider will spend on links and on forms respectively.
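To give an idea of the two changes without reading the whole patch, here is a minimal standalone sketch of a crawl loop that picks links at random and stops once a time budget is spent. It is not the patched crawler.rb and does not use Tarantula's API: the class name TimeBoxedCrawl, the max_minutes, clock and rng parameters, and the block that returns newly discovered links are all hypothetical names chosen for illustration.

```ruby
# Hypothetical sketch of the idea, NOT Tarantula's actual crawler.rb.
class TimeBoxedCrawl
  attr_reader :visited

  # links:       initial list of links to crawl
  # max_minutes: time budget for the whole crawl (assumed parameter name)
  # clock:       callable returning the current time in seconds (injectable for tests)
  # rng:         random number generator used to pick the next link
  def initialize(links, max_minutes:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 rng: Random.new)
    @queue = links.dup
    @max_seconds = max_minutes * 60
    @clock = clock
    @rng = rng
    @visited = []
  end

  # Visits links in random order until the queue is empty or the budget is spent.
  # The block receives a link and returns any links discovered on that page.
  def run
    started = @clock.call
    until @queue.empty?
      # Stop as soon as the time budget is used up, even if links remain.
      break if @clock.call - started >= @max_seconds

      # Remove a random element instead of always taking the first one.
      link = @queue.delete_at(@rng.rand(@queue.size))
      @visited << link

      Array(yield(link)).each do |found|
        @queue << found unless @visited.include?(found) || @queue.include?(found)
      end
    end
    @visited
  end
end

# Usage: a real block would request the page and return the links found on it.
crawl = TimeBoxedCrawl.new(["/", "/posts"], max_minutes: 10)
crawl.run { |_link| [] }
```

Picking a random link on every iteration means that, over many nightly runs, different parts of the application get exercised even though each run only covers a fraction of the links.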
In the next few days we’ll fork the Tarantula project to make this new behavior easier to share.

