Saturday, 7 May 2011

Creating word documents in rails

This week, I needed to create a word document from data in a rails app. Needless to say, there is not a windows machine in sight. After a bit of googling around and thinking about maybe trying to use OpenOffice to do some of the heavy lifting I came across a couple of posts that suggested it might be possible to do what I needed by creating a docx file that could be used as a template, and then editing it. After all, a docx file is just a zip file with a bunch of xml files inside ...

Levente Bagi has a nice solution but it didn't really meet my needs, and seemed overly complicated in places. There was also this blog article which outlined the technique but didn't have a lot of detail. My problem was that I had to extract a bunch of stuff from an active record object (an Event) and then iterate through several associated objects (Event has_many Days, has_many Providers). So I ended up rolling my own - hopefully these notes will help anyone following on behind.

First lesson - don't try and use rubyzip or zipruby to compress the files when creating the docx file. For reasons I didn't really investigate, they don't work. I'm guessing the default compression is wrong for docx files, but don't have the stamina to wade through the documentation. Use system zip instead.

The approach I took was this:
  1. Create a template. Using MS Word, make a document that is the sort of thing you want to create programmatically. I originally wanted to add images but this complicates things unnecessarily.
  2. Save this as a docx file.
  3. Unzip the docx file. You get a folder containing several subfolders. One of these is called word, and inside that is a file called document.xml. Open it up with something that will format xml nicely - I used netbeans.
  4. Find the data that needs to be extracted from the Event object. I replaced each piece with a new xml node containing, as text, the name of the method I wanted to call on the Event object, so in place of

    <w:t>My event</w:t>

    I had

    <w:t><insert>fd_event_name</insert></w:t>


    Continue with the same node name for all the methods to be called on this object
  5. Next find the chunk of xml that represents the associated object. We need to cut this out and put it in a new xml document so that we can iterate over it. So create a new empty document with the same namespace definitions as in document.xml, add a new node called <fragment/> and paste the text you cut from the template document inside it. In place of the cut text in the master template, add a new marker node - in my case, since the cut text displays information about each day of the event, I called the node <days/>. Now work through the fragment and, as before, add a new xml node containing the name of the method to call on the Day object as text, so in place of

    <w:t>Sat May 7th 2011</w:t>

    I had

    <w:t><insert>date</insert></w:t>

    One refinement I needed to make was to pass an index and count for each associated object so that I could have headings like "Day 1 of 5" - just as before, I added nodes to the template where I needed these to appear.
  6. Repeat for other associated objects
  7. Now we need to create a new word document using these pieces. I created a method on the Event object
    def create_docx
      f = File.read("lib/docx_sections/template.xml")
      # substitute fields in main template
      doc = substitute(f, self)

      # insert one copy of the day fragment per day, just before the <days/> marker
      f = File.read("lib/docx_sections/day.xml")
      self.days.each_with_index do |day, i|
        doc.xpath("//days").before(substitute(f, day, i, self.days.size).xpath("//fragment").children)
      end

      # same again for the providers
      f = File.read("lib/docx_sections/provider.xml")
      self.providers.each_with_index do |provider, i|
        doc.xpath("//providers").before(substitute(f, provider, i, self.providers.size).xpath("//fragment").children)
      end

      # remove the marker nodes
      doc.xpath("//days").remove
      doc.xpath("//providers").remove

      # collapse whitespace, including any left between tags
      doc = doc.to_s.gsub(/(\n|\t|\r)/, ' ').gsub(/>\s*</, '><').squeeze(' ')
      build_docx(doc)
    end

    Let's go through this line by line. We read in the template.xml file, and call substitute with the file and self as parameters - we'll look at that method later. Then we do the same with the associations - read the template, iterate over the associated objects, call substitute. Then we remove the marker tags, compress the xml file to remove any whitespace we don't need, and build the docx file. Easy.
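    The whitespace clean-up at the end of create_docx is easy to try on its own. Here is a minimal sketch, assuming the intent is to turn line breaks and tabs into spaces, drop whitespace between adjacent tags, and collapse runs of spaces. The helper name compact_xml and the sample fragment are made up for illustration.

```ruby
# Hypothetical helper, not part of the original code: the same chain of
# gsub/squeeze calls, wrapped in a method so it can be tried in irb.
def compact_xml(xml)
  xml.gsub(/(\n|\t|\r)/, ' ')  # line breaks and tabs become spaces
     .gsub(/>\s*</, '><')      # no whitespace between adjacent tags
     .squeeze(' ')             # runs of spaces collapse to a single space
end

sample = "<w:p>\n\t<w:r>\n\t\t<w:t>Day  1 of 5</w:t>\n\t</w:r>\n</w:p>"
puts compact_xml(sample)
# prints <w:p><w:r><w:t>Day 1 of 5</w:t></w:r></w:p>
```

    Note that squeeze also collapses double spaces inside the text itself, which is usually harmless but worth knowing about.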

    So what about the substitute method? It could hardly be simpler. Nokogiri makes it easy to replace the marker nodes we added with the content we want. Find each "insert" node, get the text it contains, call the method of that name on the object and replace the node with the result. Similarly, replace the index and count nodes with the parameters we passed in.
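    def substitute(xmlstring, obj, i = 0, count = 1)
      doc = Nokogiri::XML(xmlstring.clone)
      # <insert>method_name</insert> becomes the result of calling that method
      doc.xpath("//insert").each do |n|
        n.parent.content = obj.send(n.text.to_sym)
      end
      # <index/> becomes the 1-based position of this object
      doc.xpath("//index").each do |n|
        n.parent.content = i + 1
      end
      # <count/> becomes the total number of associated objects
      doc.xpath("//count").each do |n|
        n.parent.content = count
      end

      doc
    end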
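    To see the marker idea in isolation, here is a dependency-free sketch. It is not the Nokogiri approach above: it swaps in a plain regular expression, which only suits simple markers, and it escapes the values by hand (as far as I know Nokogiri's content= does that escaping for you). FakeEvent, escape_xml and substitute_markers are hypothetical names used only for this example.

```ruby
# Stand-in for the ActiveRecord Event model (hypothetical).
FakeEvent = Struct.new(:fd_event_name, :venue)

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Escape the characters that would otherwise break the document xml.
def escape_xml(value)
  value.to_s.gsub(/[&<>]/, XML_ESCAPES)
end

# Replace every <insert>method_name</insert> marker with the escaped
# result of calling that method on obj.
def substitute_markers(xml, obj)
  xml.gsub(%r{<insert>\s*(\w+)\s*</insert>}) do
    escape_xml(obj.public_send(Regexp.last_match(1)))
  end
end

template = '<w:p><w:t><insert>fd_event_name</insert></w:t>' \
           '<w:t><insert>venue</insert></w:t></w:p>'
event = FakeEvent.new('Spring Fair', 'Hall A & B')
puts substitute_markers(template, event)
# prints <w:p><w:t>Spring Fair</w:t><w:t>Hall A &amp; B</w:t></w:p>
```

    Whatever approach you use, an unescaped & or < in the data is a likely way to end up with a docx that Word refuses to open.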
    Finally, the build_docx method is essentially stolen from Levente Bagi.
    def build_docx(content)
      # strip whitespace so the name is safe to use in the shell commands below
      filename = "#{self.event_organiser.fd_name}_#{self.fd_event_name}".gsub(/\s+/, '')
      in_temp_dir do |temp_dir|
        system("cp -r lib/word_template_files #{temp_dir}/plan_report")
        File.open("#{temp_dir}/plan_report/word/document.xml", "w") do |file|
          file.write(content)
        end
        system("cd #{temp_dir}/plan_report; zip -r ../#{filename}.docx *")
        system("cp #{temp_dir}/#{filename}.docx /home/chaser/downloads")
      end
    end

    def in_temp_dir
      temp_dir = "/tmp/docx_#{Time.now.to_f.to_s}"
      Dir.mkdir(temp_dir)
      yield(temp_dir)
    ensure
      # clean up even if building the document raised an error
      system("rm -Rf #{temp_dir}")
    end
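
The temporary directory handling can also be left to Ruby's standard library. This is a sketch of an alternative, not what the code above does: Dir.mktmpdir creates a uniquely named directory, yields its path, and removes it again when the block finishes, even if the block raises. The prefix and the file written inside the block are only for illustration.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical variant of in_temp_dir built on Dir.mktmpdir.
def in_temp_dir(&block)
  Dir.mktmpdir('docx_', &block)
end

seen = nil
in_temp_dir do |temp_dir|
  seen = temp_dir
  word_dir = File.join(temp_dir, 'plan_report', 'word')
  FileUtils.mkdir_p(word_dir)
  File.write(File.join(word_dir, 'document.xml'), '<w:document/>')
end

puts File.exist?(seen) # prints false, the directory is gone again
```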


As mentioned at the start of this post, I originally hoped to be able to add images to this document. But that would require understanding how docx files handle assets, and frankly the users will probably want to change the images and layout to suit their needs, so it's almost certainly not worth it. It would be nice to try though ...

Thursday, 16 September 2010

licenceable

I've been working quite a bit with Devise over the last week or so. I'm rewriting something that used to be a desktop application and turning it into a web app. The users currently have a seat-based licencing arrangement - 5 users, 10 users etc. Keeping this arrangement in a web application has proved a bit tricky. There are several posts in the devise group that suggest this is a fairly common requirement, but no obvious solutions. So here's my first hack at it - I've simply added one more method call to the resource inside the DatabaseAuthenticatable authenticate! method


require 'devise/strategies/authenticatable'

module Devise
  module Strategies
    # Default strategy for signing in a user, based on his email and password in the database.
    class DatabaseAuthenticatable < Authenticatable
      def authenticate!
        resource = valid_password? && mapping.to.find_for_database_authentication(authentication_hash)

        # the only change: resource.licenced? must also be true
        if resource && resource.licenced? && validate(resource){ resource.valid_password?(password) }
          resource.after_database_authentication
          success!(resource)
        else
          fail(:invalid)
        end
      end
    end
  end
end

Warden::Strategies.add(:database_authenticatable, Devise::Strategies::DatabaseAuthenticatable)



Then in the resource model - in this case the User model - I can have a licenced? method that checks whether the user has a licence to access the system. The method looks like this:



def licenced?
not_logged_in? and has_licence?
end



Next I added a 'last_signed_out_at' field to the resource record and overrode the destroy method in the Devise::SessionsController.


def destroy
current_user.update_attribute(:last_signed_out_at, Time.now) rescue nil
set_flash_message :notice, :signed_out if signed_in?(resource_name)
sign_out_and_redirect(resource_name)
end



On its own this is not enough to check whether the user is logged in, because they may just have closed the browser or their session may have timed out. We also need the fields added by the Trackable and Timeoutable modules. Together these changes allow us to write a not_logged_in? method in the User model


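def not_logged_in?
  # never signed in, or no request recorded yet
  return true if current_sign_in_at.nil? || last_request_at.nil?
  # the session has timed out
  return true if last_request_at.to_i < Time.now.to_i - Devise.timeout_in.to_i
  # signed out explicitly since the last sign in
  !last_signed_out_at.nil? && last_signed_out_at > current_sign_in_at
end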


Finally we need to decide whether the user is licenced. This will vary depending on the application of course. In our case, an event can have any number of users but only licence_count users can be logged in at the same time, so in the User model



def has_licence?
event.logged_in_users < event.licence_count
end

And finally, we need to know how many users are logged in for a given event. In our case an event has_many users, and in the Event model I just iterate through all the users and count the ones that are signed in


def logged_in_users
  # count the users who are currently signed in
  users.reject { |u| u.not_logged_in? }.size
end
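
The pieces fit together roughly like this. What follows is a plain-Ruby sketch with Structs standing in for the ActiveRecord models, so it can be run without Rails or Devise. SeatUser, SeatEvent, seat_available? and the 30 minute TIMEOUT are assumptions made for the example, not names from the application.

```ruby
TIMEOUT = 30 * 60 # seconds; stands in for Devise.timeout_in

# Stand-in for the User model with the three timestamp fields.
SeatUser = Struct.new(:current_sign_in_at, :last_request_at, :last_signed_out_at) do
  def not_logged_in?(now = Time.now)
    return true if current_sign_in_at.nil? || last_request_at.nil?
    return true if last_request_at < now - TIMEOUT # session timed out
    !last_signed_out_at.nil? && last_signed_out_at > current_sign_in_at
  end
end

# Stand-in for the Event model.
SeatEvent = Struct.new(:licence_count, :users) do
  def logged_in_users(now = Time.now)
    users.count { |u| !u.not_logged_in?(now) }
  end

  # A seat is free while fewer users are signed in than there are licences.
  def seat_available?(now = Time.now)
    logged_in_users(now) < licence_count
  end
end

now        = Time.now
active     = SeatUser.new(now - 600, now - 60, nil)
timed_out  = SeatUser.new(now - 7200, now - 3600, nil)
signed_out = SeatUser.new(now - 600, now - 300, now - 120)
event      = SeatEvent.new(2, [active, timed_out, signed_out])

puts event.logged_in_users(now) # prints 1
puts event.seat_available?(now) # prints true
```

Only the first user counts against the licence: the second has timed out and the third signed out after signing in.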


If I can find the time and there is any interest I will try and bundle this up into a stand-alone extension but for now just putting it out there ...

Wednesday, 25 August 2010

Testing with devise

Today I ran across a problem getting tests to pass with Devise. There were in fact two issues. First, in rails 3 the Gemfile does not stipulate the order in which gems are loaded, and this can mess things up a bit, especially with mocha, which I use for mocking and stubbing.

This post describes the issue.

To solve the problem I put

require 'mocha'
Bundler.require(:test)
include Devise::TestHelpers



at the end of test/helper.rb

But the tests still didn't pass ....

The other problem, whose solution was more obvious once I actually thought about it, was that I am using confirmable. So the setup needed to be:

def setup
@user = users(:one)
sign_in @user
@user.confirm!
end

Friday, 20 August 2010

i18n in rails 3

Well it's been a long time, but I'm going to make an effort to do more blogging now that rails 3 is almost here - partly to keep track of what I need to remember - but also because it might be useful to others.

I'm upgrading a big app to rails 3 before we go live with it - and of course there are a few glitches. Mostly these are to do with gems and plugins that don't work - more on them when I get to them. But the first big hiccup was internationalization. The app I'm upgrading is fully internationalized - but when I got the front page up I got things like

translation missing: 'en'::character varying, devise, sessions, user, signed_in

instead of the nice translations I was expecting.

The cause seems to lie in the I18n gem - where it gets the configuration object


class << self
# Gets I18n configuration object.
def config
Thread.current[:i18n_config] ||= I18n::Config.new
end


I patched this by dropping a file in the lib directory and requiring it from application.rb

module I18n
  class << self
    def config
      # don't pick up the config object from the current thread -
      # it returns 'en'::character varying
      I18n::Config.new
    end
  end
end



My guess is that this might slow things down a touch - creating a new config object instead of using an existing one - but it works for me so far
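
The difference between the two versions of config is the usual one between a memoized, per-thread object and a fresh object on every call. The sketch below shows only that general pattern, using a made-up DemoConfig class. It says nothing about where the odd 'en'::character varying value came from, and whether losing state matters in practice depends on how the gem stores its settings.

```ruby
# Hypothetical stand-in for I18n::Config, only to show the two access patterns.
class DemoConfig
  attr_accessor :locale

  def initialize
    @locale = :en
  end
end

# Pattern in the gem: one config object per thread, created on first use.
def memoized_config
  Thread.current[:demo_config] ||= DemoConfig.new
end

# Pattern in the patch: a brand new object on every call.
def fresh_config
  DemoConfig.new
end

memoized_config.locale = :fr
fresh_config.locale = :fr

puts memoized_config.locale                      # prints fr, same object as before
puts fresh_config.locale                         # prints en, the change went to a discarded object
puts Thread.new { memoized_config.locale }.value # prints en, another thread gets its own object
```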

Sunday, 4 January 2009

rcov bugs

There is a really irritating bug in the ruby bindings for Rcov that has been reported by Brian Candler here among other places. Applying Brian's patches solves part of the problem (I think they are now in trunk) but the problem that throws up the error
/usr/local/lib/ruby/1.8/rexml/formatters/pretty.rb:131:in `[]': no
implicit conversion from nil to integer (TypeError)

mentioned later in the post is more annoying - not least because it came back after I already got rid of it once! I have a hunch that this error can be triggered by the content of the html being generated by rcov, and in my case it was triggered by a test requiring a missing file. At first I didn't realise this was the problem of course, and the trace didn't give me a clue about which test caused the problem. The proximate cause lies in a method called wrap in /usr/local/lib/ruby/1.8/rexml/formatters/pretty.rb.

After messing around for a bit I found that changing the wrap method to return just a string worked fine

def wrap(string, width)
# Recursively wrap string at width.
return string

end


This gave me an error that identified the culprit test. I could then fix the path to the missing file, change pretty.rb back to the original version, and carry on ...

Monday, 29 December 2008

The muddle that is selenium

I have a bit of a love hate relationship with selenium, and always have. It's great for testing ajax and for integration testing, but there are so many ways of setting it up that I'm never sure if I'm using best practice. There's Selenium core, SeleniumRC, Selenium client - and then there's selenium_fu, selenium_on_rails, and polonium. Not to mention a bunch of competing things like Watir. So over christmas I sat down and tried to get my head around what was out there, what works with rails 2.2, and what might be good practice, if not best practice.

Up to now I've been using Selenium in firefox, with tests written in rselenese. While this works, I've only been able to use it to test on firefox, and it's meant firing up the browser to run the tests. I'm a bit lazy about this, and would rather be able to run the tests as a rake task. Sure they're slow, but if I run them while I'm making coffee or having lunch that's better than not at all.

I'm not sure the solution I'm outlining here is the best - but it has the advantage of being quite simple to implement, and not being dependent on a whole bunch of plugins that may or may not be compatible with rails in future.

So I've been looking at changing over to Selenium client and Selenium RC instead. First step was to download the selenium-client gem

sudo gem install selenium-client

This is now the 'official Ruby driver for Selenium Remote Control' (selenium-rc.openqa.org)

I also found this and this helpful. Crucially, I downloaded a version of selenium RC that works with firefox 3 from here. Then fired up the selenium RC server with

java -jar selenium-server.jar -interactive

I created a helper file, and stuck it in the test/selenium directory. There's still a lot of stuff hard coded in here that should be pulled out into maybe environment variables - but it's a start.

dir = File.dirname(__FILE__)
require dir + "/../test_helper"
require 'test/unit'
require "rubygems"
gem 'selenium-client'
require 'selenium'
module Chaser
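class SeleniumTestCase < Test::Unit::TestCase # superclass assumed; lost from the original listing
@@counter = 0
@@additional_args = ['-interactive'] # any further arguments were lost from the original listing
@@background = true
@@host = "0.0.0.0"
@@port = 4444
@@timeout = 300000
@@wait_until_up_and_running = true

def self.start_selenium
remote_control = Selenium::RemoteControl::RemoteControl.new(@@host, @@port, @@timeout)
remote_control.jar_file = File.dirname(__FILE__) + "..." # path to the selenium server jar, lost from the original listing
remote_control.additional_args = @@additional_args
remote_control.start :background => @@background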

if @@background && @@wait_until_up_and_running
puts "Waiting for Remote Control to be up and running..."
TCPSocket.wait_for_service :host => @@host, :port => @@port
puts 'continuing ...'
end
puts "Selenium Remote Control at #{@@host}:#{@@port} ready"

end
def self.server_pid
#whether the pid turns up in f1 or f2 seems to be indeterminate - this bit of code looks in both
#and sorts out which contains an integer as a way of reliably returning the pid
f1 = `ps axo pid -o command | egrep 'java.*?selenium|mongrel.*?3001' | grep -v egrep | cut -d' ' -f1`
f2 = `ps axo pid -o command | egrep 'java.*?selenium|mongrel.*?3001' | grep -v egrep | cut -d' ' -f2`
[f1, f2].map { |f| f.to_i }.find { |pid| pid > 0 }
end

def self.terminate_server
puts "Terminating server..."
pid = server_pid
system "kill -9 #{pid}" if pid
end

def self.running_server
!server_pid.nil?
end



def setup
SeleniumTestCase.start_selenium unless SeleniumTestCase.running_server
TCPSocket.wait_for_service :host => @@host, :port => @@port
@screenshotdir='bureau_screenshots'
@browser = Selenium::Client::Driver.new(@@host, @@port, "*chrome /home/chris/firefox/firefox/firefox-bin", "http://localhost:3001", 30000);
@browser.start_new_browser_session
@browser.open('/')

#This is app specific - logs the user out if they are already logged in so that we have a
#clean startup
assert_equal "Chaser Bureau", @browser.title
if Thread.current[:user]
@browser.click "link=Log out", :wait_for => :page
end
end

def teardown
@browser.close_current_browser_session if @browser
SeleniumTestCase.terminate_server
end

# Shadowed methods, so they aren't passed to method_missing
def open(addr)
@browser.open(addr)
end

def type(inputLocator, value)
@browser.type(inputLocator, value)
end

def select(inputLocator, optionLocator)
@browser.select(inputLocator, optionLocator)
end

def make_dir(name)
Dir.mkdir("#{@screenshotdir}") unless File.exists?("#{@screenshotdir}")
Dir.mkdir("#{@screenshotdir}/#{name}") unless File.exists?("#{@screenshotdir}/#{name}")
end

def click(*args)
make_dir( self.method_name)
@browser.capture_entire_page_screenshot("#{RAILS_ROOT}/#{@screenshotdir}/#{ self.method_name}/screenshot_#{@@counter}.png","background=#CCFFDD")
@@counter+=1
my_file = File.new("#{RAILS_ROOT}/#{@screenshotdir}/#{ self.method_name}/body_#{@@counter}.html", "w")
my_file.puts(@browser.get_html_source)
my_file.close
@browser.click(*args)
end
# Passes all missing methods to browser
def method_missing(method_name, *args)
if @browser.respond_to?(method_name)
if args.empty?
@browser.send(method_name)
else
@browser.send(method_name, *args)
end
else
super
end
end


end


end
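
A note on the pid lookup in the helper. The pid probably moves between the first and second field because ps right-aligns the pid column, so short pids have leading spaces and `cut -d' '` sees one or more empty fields first. A sketch of a sturdier way to parse the same output in Ruby, assuming one "pid command" pair per line (matching_pids and the sample listing are made up for illustration):

```ruby
# Hypothetical helper: return the pids of all lines whose command matches
# pattern. Stripping each line and splitting on runs of whitespace means
# the padding in front of the pid no longer matters.
def matching_pids(ps_output, pattern)
  ps_output.lines
           .map { |line| line.strip }
           .select { |line| line =~ pattern }
           .map { |line| line.split(/\s+/, 2).first.to_i }
end

sample = <<~PS
    PID COMMAND
    812 java -jar selenium-server.jar -interactive
  10433 mongrel_rails start -d -e test -p 3001
   2210 vim notes.txt
PS

p matching_pids(sample, /java.*selenium|mongrel.*3001/)
# prints [812, 10433]
```

In the helper this could be fed with the output of `ps axo pid,command`, and each pid passed to Process.kill.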


then I have some tests that look like this:

require File.expand_path(File.dirname(__FILE__) + "/selenium_helper")
class CreateContact < Chaser::SeleniumTestCase # superclass assumed; lost from the original listing
# ... , :wait_for => :page

.... and so on

Next, I wanted a rake task to run the tests. Selenium_fu has a long list of rake tasks that start and stop the selenium server, and do all sorts of other stuff - but they didn't work out of the box for me. I also wanted that whenever I ran the selenium tests I also ran the w3c validation tests. Then I got to thinking it would be nice to have a screen dump of each page before leaving it - this might be useful for debugging, and also for screenshots for documentation. And while we're at it, why not run rcov as well .... all things that take a long time, but are quite handy if run regularly.

Anyway, after far too much hacking about, and fixing things like rcov bugs - I ended up with a big rakefile ...

namespace :test do
desc "run selenium tests"

task :selenium do
#system "mongrel_rails stop"
RAILS_ENV = ENV['RAILS_ENV'] = 'test'
# only start mongrel if there is no pid file from an earlier run
system "mongrel_rails start -d -e test -p 3001" unless File.exist?("tmp/mongrel-test.pid")
ENV['screenshot']='true'

Rake::TestTask.new("all_tests") do |t|
t.libs << 'test'
t.test_files = FileList['test/selenium/*_test.rb']
t.verbose = true
end

task("all_tests").execute
end

# desc has to come before the task it describes
desc "run functional tests with w3c validation"
task :validator do
p 'running validator tests'
ENV['validator']='true'
task("test:functionals").execute
end

desc "Runs all tests - including selenium and validator tests"
task :all => [ 'test:units', 'test:validator','mongrel:test:start','test:selenium']
end


Setting the validator key in the environment means that I can run the w3c validator tests by having the following code in my test_helper.rb file:


if ENV.has_key?('validator')
#ignore some warnings i don't care about ...
Html::Test::Validator.tidy_ignore_list=[/<table> lacks "summary" attribute/,
/Warning: replacing invalid character code 130/,#€ has a very bad character
/Warning: replacing invalid character code 152/, #star char
/Warning: trimming empty <dd>/,
/end tag for "ul" which is not finished/

]
#set up the validator
Html::Test::Validator.w3c_show_source = "0"
ApplicationController.validate_all = true
ApplicationController.validators = [:w3c]
ApplicationController.check_urls = false
ApplicationController.check_redirects = true
end


It's a lot of work to set all this up and get the bugs out, but now I can run selenium and w3c validation tests and get a screen dump of every page of the app while making lunch. So probably worth it in the end ...

Wednesday, 2 July 2008

Full text search in rails with postgres





Summary
There are basically 4 options:- Ferret, Solr, Sphinx and native postgres search (which used to be called tsearch2 but is now compiled into the db.) Each of course has advantages and disadvantages.

Ferret - advantages
1. Fast indexing
2. Indexing on active record save
3. set boost values independently per field and per record
4. write custom text tokenizers, stemmers and stop lists (and use different ones per field even)
5. highlight matches in results using the same engine that does the searching
6. manage my own indexes, merging them at will, or just merging results from them.
7. Index content generated on the fly, without having to store it in my sql database (pull in all the associated tags for a post as you index it for example).
8. Store original data in the index (though most people use it to index an SQL database anyway).
Ferret - disadvantages
1. Corrupts indexes if used with transactions in your app because of its after_update filter (it updates the index before the actual save to the database).
2. Unstable on the production server if you use load balancing such as a round-robin scheme with instances of mongrel on different machines (with the added burden of running a separate DRb server).
3. Slow searching.


Solr - advantages
1. Index update with activerecord save
2. In-built support for highlighting search keywords like you see in Google Search and many more advanced features.

Solr - disadvantages
1. Runs on Jboss or some other java stack
2. Slow to reindex and query wrt sphinx and uses 50x more memory

Sphinx - advantages
1. Very fast to search and index, slow to update
2. searching and ranking across multiple models
3. delta index support
4. excerpt highlighting
5. Google-style query parser
6. spellcheck
7. faceting on text, date, and numeric fields
8. field weighting, merging, and aliasing
9. geodistance
10. belongs_to and has_many includes
11. drop-in compatibility with will_paginate
12. drop-in compatibility with Interlock
13. multiple deployment environments
14. comprehensive Rake tasks

Sphinx - disadvantages
1. Closely tied to mysql, php – can run with postgres but needs to be compiled
2. Difficult to integrate as compared to Ferret or Solr
3. You have to write a lot of sql code in the configuration file for indexing and searching data
4. Not hooked into the ActiveRecord save or the life cycle of an object, so you need a cron job to rebuild the index periodically (plugins use delta indexes so that model changes are added to the live indexes automatically, but regular periodic reindexing is still needed)
5. 'Shared hosts do not support sphinx'
6. No automatic updates – must use cron job to update index

postgres - advantages
1. Can use triggers to index on save
2. No overhead of another system

postgres - disadvantages
1. Limited plugin support (we will need to write our own)
2. Will need to hand code pagination and search term highlighting (there are functions for search term highlighting built in to postgres, but must be called via sql)
3. Not hooked in to active record automatically
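
To give a feel for what "called via sql" means, here is a sketch of a helper that builds a ranked, highlighted and paginated query using the built-in postgres functions to_tsvector, plainto_tsquery, ts_rank and ts_headline. The table and column names (contacts, name, notes) and the helper name search_sql are made up for illustration; $1, $2 and $3 are bind parameters for the search text, page size and offset.

```ruby
# Hypothetical helper: builds the SQL only, it does not run it.
# table and columns are interpolated directly, so they must be trusted
# constants, never user input.
def search_sql(table, columns)
  document = columns.map { |c| "coalesce(#{c}, '')" }.join(" || ' ' || ")
  <<~SQL
    SELECT id,
           ts_rank(to_tsvector('english', #{document}), query) AS rank,
           ts_headline('english', #{document}, query) AS excerpt
    FROM #{table}, plainto_tsquery('english', $1) AS query
    WHERE to_tsvector('english', #{document}) @@ query
    ORDER BY rank DESC
    LIMIT $2 OFFSET $3
  SQL
end

puts search_sql('contacts', %w[name notes])
```

Stemming and stop words come from the 'english' text search configuration. Computing to_tsvector in the query like this is fine for trying things out, but for real use you would want the vector stored and indexed, which is where the triggers come in.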

Requirements:
Stemming
Stop words
Wildcards
Search across multiple fields in multiple tables and rank by specific fields
paginate results
highlight search terms in text

Notes

Postgres native search (used to be tsearch2)
http://groups.google.com/group/acts_as_tsearch/browse_thread/thread/6437f86a2540f406
and
http://www.pervasivecode.com/blog/2008/01/24/acts_as_tsearch-adjustments-needed-for-postgresql-83rc2/

looks like only minor changes to get acts_as_tsearch working with postgres 8.3

Ferret generally gets a bad press – I thought the problems went away after moving to Drb, but apparently not.

"We’ve used ferret on past projects… and now use sphinx. We’re not likely going back to ferret. ;-)"

and lots more comments like this in
http://www.ruby-forum.com/topic/137629
And most convincing of all ...
http://deadprogrammersociety.blogspot.com/2008/05/in-search-of-search.html



Sphinx plugins:
Ultrasphinx (only works with rails 2.0)
Thinking Sphinx
acts_as_sphinx
sphincter