Monday, 29 December 2008

The muddle that is selenium

I have a bit of a love-hate relationship with Selenium, and always have. It's great for testing Ajax and for integration testing, but there are so many ways of setting it up that I'm never sure if I'm using best practice. There's Selenium Core, Selenium RC, Selenium client - and then there's selenium_fu, selenium_on_rails, and polonium. Not to mention a bunch of competing things like Watir. So over Christmas I sat down and tried to get my head around what was out there, what works with Rails 2.2, and what might be good practice, if not best practice.

Up to now I've been using Selenium in Firefox, with tests written in rselenese. While this works, I've only been able to use it to test on Firefox, and it's meant firing up the browser by hand to run the tests. I'm a bit lazy about this, and would rather be able to run the tests as a rake task. Sure they're slow, but if I run them while I'm making coffee or having lunch that's better than not running them at all.

I'm not sure the solution I'm outlining here is the best - but it has the advantage of being quite simple to implement, and not being dependent on a whole bunch of plugins that may or may not be compatible with rails in future.

So I've been looking at changing over to Selenium client and Selenium RC instead. The first step was to install the selenium-client gem:

sudo gem install selenium-client

This is now the 'official Ruby driver for [Selenium Remote Control](selenium-rc.openqa.org)'.

I also found this and this helpful. Crucially, I downloaded a version of Selenium RC that works with Firefox 3 from here. Then I fired up the Selenium RC server with

java -jar selenium-server.jar -interactive
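
The server takes a few seconds to start listening, so it is worth checking the port before pointing any tests at it. The helper further down uses selenium-client's TCPSocket.wait_for_service for this; the sketch below shows roughly the same idea using only the standard library. The name port_open? and its defaults are made up for illustration, and 4444 is simply the port the helper uses.

```ruby
require 'socket'

# Returns true once something accepts TCP connections on host:port,
# or false if nothing does within `limit` seconds.
# (port_open? is a hypothetical helper, not part of selenium-client.)
def port_open?(host, port, limit = 10)
  deadline = Time.now + limit
  begin
    TCPSocket.new(host, port).close
    true
  rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH
    return false if Time.now > deadline
    sleep 0.2
    retry
  end
end
```

Something like `port_open?("0.0.0.0", 4444)` from irb then tells you whether the server is ready.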

I created a helper file, and stuck it in the test/selenium directory. There's still a lot of stuff hard coded in here that should be pulled out into maybe environment variables - but it's a start.

dir = File.dirname(__FILE__)
require dir + "/../test_helper"
require 'test/unit'
require "rubygems"
gem 'selenium-client'
require 'selenium'
module Chaser
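class SeleniumTestCase < Test::Unit::TestCase #superclass is missing from the listing - Test::Unit::TestCase is assumed
@@counter = 0
@@additional_args = ['-interactive', ...] #the rest of this list is missing from the listing
@@background = true
@@host = "0.0.0.0"
@@port = 4444
@@timeout = 300000
@@wait_until_up_and_running = true

def self.start_selenium
remote_control = Selenium::RemoteControl::RemoteControl.new(@@host, @@port, @@timeout) #arguments after @@host are assumed
remote_control.jar_file = File.dirname(__FILE__) + ... #path to selenium-server.jar is missing from the listing
remote_control.additional_args = @@additional_args
remote_control.start :background => @@background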

if @@background && @@wait_until_up_and_running
puts "Waiting for Remote Control to be up and running..."
TCPSocket.wait_for_service :host => @@host, :port => @@port
puts 'continuing ...'
end
puts "Selenium Remote Control at #{@@host}:#{@@port} ready"

end
def self.terminate_server
#whether the pid turns up in f1 or f2 seems to be indeterminate - this bit of code looks in both
#and sorts out which contains an integer as a way of reliably returning the pid
puts "Terminating server..."
f1= `ps axo pid -o command | egrep 'java.*?selenium|mongrel.*?3001' | grep -v egrep | cut -d' ' -f1 `
f2= `ps axo pid -o command | egrep 'java.*?selenium|mongrel.*?3001' | grep -v egrep | cut -d' ' -f2`
#backticks always return a string, so f1||f2 would never fall through to f2 - test each for an integer instead
pid = [f1, f2].map { |f| f.to_i }.find { |i| i > 0 }
system "kill -9 #{pid}" if pid
end

def self.running_server
f1=`ps axo pid -o command | egrep 'java.*?selenium|mongrel.*?3001' | grep -v egrep | cut -d' ' -f1`
f2=`ps axo pid -o command | egrep 'java.*?selenium|mongrel.*?3001' | grep -v egrep | cut -d' ' -f2`
return [f1, f2].any? { |f| f.to_i > 0 }
end



def setup
SeleniumTestCase.start_selenium unless SeleniumTestCase.running_server
TCPSocket.wait_for_service :host => @@host, :port => @@port
@screenshotdir='bureau_screenshots'
@browser = Selenium::Client::Driver.new(@@host, @@port, "*chrome /home/chris/firefox/firefox/firefox-bin", "http://localhost:3001", 30000);
@browser.start_new_browser_session
@browser.open('/')

#This is app specific - logs the user out if they are already logged in so that we have a
#clean startup
assert_equal "Chaser Bureau", @browser.title
if !! Thread.current[:user]
@browser.click "link=Log out", :wait_for => :page
end
end

def teardown
@browser.close_current_browser_session if @browser
SeleniumTestCase.terminate_server
end

# Shadowed methods, so they aren't passed to method_missing
def open(addr)
@browser.open(addr)
end

def type(inputLocator, value)
@browser.type(inputLocator, value)
end

def select(inputLocator, optionLocator)
@browser.select(inputLocator, optionLocator)
end

def make_dir(name)
Dir.mkdir("#{@screenshotdir}") unless File.exists?("#{@screenshotdir}")
Dir.mkdir("#{@screenshotdir}/#{name}") unless File.exists?("#{@screenshotdir}/#{name}")
end

def click(*args)
make_dir( self.method_name)
@browser.capture_entire_page_screenshot("#{RAILS_ROOT}/#{@screenshotdir}/#{ self.method_name}/screenshot_#{@@counter}.png","background=#CCFFDD")
my_file = File.new("#{RAILS_ROOT}/#{@screenshotdir}/#{ self.method_name}/body_#{@@counter}.html", "w")
my_file.puts(@browser.get_html_source)
my_file.close
#increment after both files are written so the screenshot and the html body share a number
@@counter+=1
@browser.click(*args)
end
# Passes all missing methods to browser
def method_missing(method_name, *args)
if @browser.respond_to?(method_name)
if args.empty?
@browser.send(method_name)
else
@browser.send(method_name, *args)
end
else
super
end
end


end


end
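
The method_missing trick at the end of the helper can be tried without Selenium at all. Here is a minimal sketch of it, assuming a stand-in FakeBrowser class (made up for illustration) in place of Selenium::Client::Driver, and a current Ruby, which also wants respond_to_missing? defined alongside method_missing.

```ruby
# Stand-in for Selenium::Client::Driver; hypothetical, for illustration only.
class FakeBrowser
  attr_reader :calls

  def initialize
    @calls = []
  end

  def open(addr)
    @calls << [:open, addr]
  end

  def title
    "Chaser Bureau"
  end
end

class PageTest
  def initialize(browser)
    @browser = browser
  end

  # Kernel#open is a private method on every object, so a bare `open '/'`
  # inside a test would call it instead of reaching method_missing.
  # Shadowing it explicitly sends the call to the browser.
  def open(addr)
    @browser.open(addr)
  end

  # Anything else the browser understands is passed straight through.
  def method_missing(name, *args, &block)
    if @browser.respond_to?(name)
      @browser.send(name, *args, &block)
    else
      super
    end
  end

  # Keeps respond_to? consistent with what method_missing will accept.
  def respond_to_missing?(name, include_private = false)
    @browser.respond_to?(name, include_private) || super
  end
end
```

With this in place `PageTest.new(FakeBrowser.new).title` is answered by the browser, while an unknown method still raises NoMethodError as usual.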


Then I have some tests that look like this:

require File.expand_path(File.dirname(__FILE__) + "/selenium_helper")
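class CreateContact < Chaser::SeleniumTestCase
#... the body of the test is missing here; it ends with a click that passes :wait_for => :page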

.... and so on

Next, I wanted a rake task to run the tests. Selenium_fu has a long list of rake tasks that start and stop the selenium server, and do all sorts of other stuff - but they didn't work out of the box for me. I also wanted the w3c validation tests to run whenever I ran the selenium tests. Then I got to thinking it would be nice to have a screen dump of each page before leaving it - this might be useful for debugging, and also for screenshots for documentation. And while we're at it, why not run rcov as well .... all things that take a long time, but are quite handy if run regularly.

Anyway, after far too much hacking about, and fixing things like rcov bugs - I ended up with a big rakefile ...

namespace :test do
desc "run selenium tests"

task :selenium do
#system "mongrel_rails stop"
RAILS_ENV = ENV['RAILS_ENV'] = 'test'
system "mongrel_rails start -d -e test -p 3001" unless File.exist?("tmp/mongrel-test.pid")
ENV['screenshot']='true'

Rake::TestTask.new("all_tests") do |t|
t.libs << 'test'
t.test_files = FileList['test/selenium/*_test.rb']
t.verbose = true
end

task("all_tests").execute
end

desc "run functional tests with w3c validation"
task :validator do
p 'running validator tests'
ENV['validator']='true'
task("test:functionals").execute
end

desc "Runs all tests - including selenium and validator tests"
task :all => [ 'test:units', 'test:validator','mongrel:test:start','test:selenium']
end
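
A pid file can be left behind after a crash, so checking that the file exists is only half the story. A minimal sketch of a stricter check, assuming mongrel really does write its pid to the file you pass in; server_running? is a made-up name, not something mongrel or rake provides.

```ruby
# Returns true if pid_file exists and names a process that is still alive.
# (server_running? is a hypothetical helper.)
def server_running?(pid_file)
  return false unless File.exist?(pid_file)
  pid = File.read(pid_file).to_i
  return false unless pid > 0
  Process.kill(0, pid)   # signal 0 only checks that the process exists
  true
rescue Errno::ESRCH      # stale pid file: no such process
  false
rescue Errno::EPERM      # process exists but belongs to another user
  true
end
```

In the rake task this would read `system "mongrel_rails start -d -e test -p 3001" unless server_running?("tmp/mongrel-test.pid")`.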


Setting the validator key in the environment means that I can run the w3c validator tests by having the following code in my test_helper.rb file:


if ENV.has_key?('validator')
#ignore some warnings i don't care about ...
Html::Test::Validator.tidy_ignore_list=[/<table> lacks "summary" attribute/,
/Warning: replacing invalid character code 130/,#€ has a very bad character
/Warning: replacing invalid character code 152/, #star char
/Warning: trimming empty <dd>/,
/end tag for "ul" which is not finished/

]
#set up the validator
Html::Test::Validator.w3c_show_source = "0"
ApplicationController.validate_all = true
ApplicationController.validators = [:w3c]
ApplicationController.check_urls = false
ApplicationController.check_redirects = true
end
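
The ignore list is just an array of regular expressions. This is a sketch of the idea only, not the validator plugin's actual code: any message matching one of the patterns is dropped, and whatever is left counts as a failure. The unignored method and the sample messages are made up for illustration.

```ruby
# Patterns for warnings that should not fail a test (two taken from the list above).
IGNORE_LIST = [
  /<table> lacks "summary" attribute/,
  /Warning: trimming empty <dd>/
]

# Returns only the messages that match none of the ignore patterns.
# (unignored is a hypothetical helper, not part of the validator plugin.)
def unignored(messages, ignore_list = IGNORE_LIST)
  messages.reject { |message| ignore_list.any? { |pattern| message =~ pattern } }
end
```

A quick way to check a new pattern is to paste a real warning into irb and see whether `unignored([warning])` comes back empty.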


It's a lot of work to set all this up and get the bugs out, but now I can run selenium and w3c validation tests and get a screen dump of every page of the app while making lunch. So probably worth it in the end ...

Wednesday, 2 July 2008

Full text search in rails with postgres




Summary
There are basically four options: Ferret, Solr, Sphinx and native Postgres search (which used to be called tsearch2 but is now compiled into the db). Each of course has advantages and disadvantages.

Ferret - advantages
1. Fast indexing
2. Indexing on active record save
3. set boost values independently per field and per record
4. write custom text tokenizers, stemmers and stop lists (and use different ones per field even)
5. highlight matches in results using the same engine that does the searching
6. manage my own indexes, merging them at will, or just merging results from them.
7. Index content generated on the fly, without having to store it in my sql database (pull in all the associated tags for a post as you index it for example).
8. Store original data in the index (though most people use it to index an SQL database anyway).
Ferret - disadvantages
1. Corrupts indexes if used with transactions in your app, because of its after_update filter (it updates the index before the actual save to the database).
2. Unstable on the production server if you use load balancing such as a round-robin scheme with instances of mongrel on different machines (added burden of running a separate DRb server).
3. Slow searching.


Solr - advantages
1. Index update with activerecord save
2. In-built support for highlighting search keywords like you see in Google Search and many more advanced features.

Solr - disadvantages
1. Runs on Jboss or some other java stack
2. Slow to reindex and query compared with Sphinx, and uses 50x more memory

Sphinx - advantages
1. Very fast to search and index, slow to update
2. searching and ranking across multiple models
3. delta index support
4. excerpt highlighting
5. Google-style query parser
6. spellcheck
7. faceting on text, date, and numeric fields
8. field weighting, merging, and aliasing
9. geodistance
10. belongs_to and has_many includes
11. drop-in compatibility with will_paginate
12. drop-in compatibility with Interlock
13. multiple deployment environments
14. comprehensive Rake tasks

Sphinx - disadvantages
1. Closely tied to mysql, php – can run with postgres but needs to be compiled
2. Difficult to integrate as compared to Ferret or Solr
3. You have to write a lot of sql code in the configuration file for indexing and searching data
4. Not hooked into the ActiveRecord save or the life cycle of an object, so you need a cron job to rebuild the index periodically (plugins use delta indexes so that model changes are automatically added to the live indexes, but regular periodic reindexing is still needed)
5. 'Shared hosts do not support sphinx'
6. No automatic updates – must use cron job to update index

postgres - advantages
1. Can use triggers to index on save
2. No overhead of another system
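
As a rough sketch of the trigger approach: Postgres 8.3 ships a tsvector_update_trigger function that keeps a tsvector column up to date on every insert or update. The method below only builds the SQL statements, which could then be run with execute in a migration. The method name, the search_vector column and the index and trigger names are all made up for illustration, and the 'english' configuration is an assumption.

```ruby
# Builds the SQL needed to keep a tsvector column in step with some text columns.
# (search_trigger_sql and the generated object names are hypothetical.)
def search_trigger_sql(table, columns, vector_column = "search_vector")
  [
    "ALTER TABLE #{table} ADD COLUMN #{vector_column} tsvector",
    "CREATE INDEX #{table}_#{vector_column}_idx ON #{table} USING gin(#{vector_column})",
    "CREATE TRIGGER #{table}_#{vector_column}_update " \
    "BEFORE INSERT OR UPDATE ON #{table} FOR EACH ROW EXECUTE PROCEDURE " \
    "tsvector_update_trigger(#{vector_column}, 'pg_catalog.english', #{columns.join(', ')})"
  ]
end
```

In a migration this might be `search_trigger_sql("posts", %w(title body)).each { |sql| execute sql }`. Rows that already exist are not touched by the trigger, so they would need a one-off UPDATE to fill in the column.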

postgres - disadvantages
1. Limited plugin support (we will need to write our own)
2. Will need to hand code pagination and search term highlighting (there are functions for search term highlighting built in to postgres, but must be called via sql)
3. Not hooked in to active record automatically
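
To give an idea of how much hand coding is involved, here is a sketch of a query builder covering ranking, highlighting and pagination with the built-in functions (plainto_tsquery, ts_rank, ts_headline). It assumes a tsvector column already exists on the table; search_query, its arguments and the 'english' configuration are assumptions for illustration. Stemming and stop words come from the text search configuration rather than from this code.

```ruby
# Returns [sql, terms] in the form ActiveRecord's find_by_sql accepts.
# table, vector_column and text_column are trusted identifiers, not user input.
# (search_query is a hypothetical helper.)
def search_query(table, vector_column, text_column, terms, page = 1, per_page = 20)
  offset = ([page.to_i, 1].max - 1) * per_page.to_i
  sql = <<-SQL
    SELECT #{table}.*,
           ts_rank(#{vector_column}, query) AS rank,
           ts_headline('english', #{text_column}, query) AS excerpt
    FROM #{table}, plainto_tsquery('english', ?) AS query
    WHERE #{vector_column} @@ query
    ORDER BY rank DESC
    LIMIT #{per_page.to_i} OFFSET #{offset}
  SQL
  [sql, terms]
end
```

With a Post model this could be called as `Post.find_by_sql(search_query("posts", "search_vector", "body", params[:q], params[:page]))`, with each result carrying extra rank and excerpt attributes.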

Requirements:
Stemming
Stop words
Wildcards
Search across multiple fields in multiple tables and rank by specific fields
paginate results
highlight search terms in text

Notes

Postgres native search (used to be tsearch2)
http://groups.google.com/group/acts_as_tsearch/browse_thread/thread/6437f86a2540f406
and
http://www.pervasivecode.com/blog/2008/01/24/acts_as_tsearch-adjustments-needed-for-postgresql-83rc2/

It looks like only minor changes are needed to get acts_as_tsearch working with Postgres 8.3.

Ferret generally gets a bad press – I thought the problems went away after moving to Drb, but apparently not.

"We’ve used ferret on past projects… and now use sphinx. We’re not likely going back to ferret. ;-)"

and lots more comments like this in
http://www.ruby-forum.com/topic/137629
And most convincing of all ...
http://deadprogrammersociety.blogspot.com/2008/05/in-search-of-search.html



Sphinx plugins:
Ultrasphinx (only works with rails 2.0)
Thinking Sphinx
acts_as_sphinx
Sphincter