Levente Bagi has a nice solution, but it didn't really meet my needs and seemed overly complicated in places. There was also this blog article, which outlined the technique but didn't go into much detail. My problem was that I had to extract a bunch of values from an ActiveRecord object (an Event) and then iterate through several associated objects (Event has_many Days, has_many Providers). So I ended up rolling my own - hopefully these notes will help anyone following on behind.
First lesson - don't try to use rubyzip or zipruby to compress the files when creating the docx file. For reasons I didn't really investigate, they don't work. I'm guessing the default compression is wrong for docx files, but I don't have the stamina to wade through the documentation. Use the system zip instead.
The approach I took was this:
- Create a template. Using MS Word, make a document that is the sort of thing you want to create programmatically. I originally wanted to add images, but this complicates things unnecessarily.
- Save this as a docx file.
- Unzip the docx file. You get a folder containing several subfolders. One of these is called word, and inside that is a file called document.xml. Open it with something that will format XML nicely - I used NetBeans. First, find the data that needs to come from the Event object. I replaced each such value with a new XML node containing, as text, the name of the method I wanted to call on the Event object, such as fd_event_name for the event's name.
- Next find the chunk of XML that represents the associated object. We need to cut this out and put it in a new XML document so that we can iterate over it. So create a new empty document with the same namespace definitions as in document.xml, add a new node called <fragment/>, and paste the text you cut from the template document inside it. In place of the cut text in the master template, add a new node - in my case, since the cut text displays information about each day of the event, I called the node <days/>. Now work through the fragment and add a new XML node containing the name of the method you want to call on the Day object as text, so in place of
<w:t>Sat May 7th 2011</w:t>
I had
<w:t><insert>date</insert></w:t>
One refinement I needed was to pass an index and count for each associated object so that I could have headings like "Day 1 of 5" - just as before, I added nodes (<index/> and <count/>) to the template where I needed these to appear.
- Repeat for other associated objects.
- Now we need to create a new Word document using these pieces. I created a method on the Event object:
def create_docx
  f = File.read("lib/docx_sections/template.xml")
  # substitute fields in main template
  doc = substitute(f, self)
  # render one fragment per day and insert it before the <days/> marker
  f = File.read("lib/docx_sections/day.xml")
  self.days.each_with_index do |day, i|
    doc.xpath("//days").before(substitute(f, day, i, self.days.size).xpath("//fragment").children)
  end
  # same again for the providers
  f = File.read("lib/docx_sections/provider.xml")
  self.providers.each_with_index do |provider, i|
    doc.xpath("//providers").before(substitute(f, provider, i, self.providers.size).xpath("//fragment").children)
  end
  # remove the marker nodes
  doc.xpath("//days").remove
  doc.xpath("//providers").remove
  # turn line breaks and tabs into spaces, then drop whitespace between tags
  doc = doc.to_s.gsub(/(\n|\t|\r)/, ' ').gsub(/>\s*</, '><').squeeze(' ')
  build_docx(doc)
end
Let's go through this line by line. We read in the template.xml file, and call substitute with the file and self as parameters - we'll look at that method later. Then we do the same with the associations - read the template, iterate over the associated objects, call substitute. Then we remove the marker tags, compress the xml file to remove any whitespace we don't need, and build the docx file. Easy.
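To see what the whitespace clean-up at the end of create_docx does, here is a minimal sketch on a made-up XML string. The compact_xml name and the sample input are mine, not part of the app, and the sketch assumes that whitespace between tags is replaced with '><'.

```ruby
# Hypothetical helper: the same three-step clean-up used at the end of create_docx.
def compact_xml(xml)
  xml.gsub(/(\n|\t|\r)/, ' ')   # line breaks and tabs become spaces
     .gsub(/>\s*</, '><')       # drop whitespace between adjacent tags
     .squeeze(' ')              # collapse repeated spaces, including inside text
end

# Made-up sample input, formatted the way an XML editor might leave it.
sample = "<w:p>\n\t<w:r>\n\t\t<w:t>Day  1</w:t>\n\t</w:r>\n</w:p>"
puts compact_xml(sample)
```

One thing worth knowing: squeeze(' ') also collapses runs of spaces inside the text itself, so "Day  1" comes out as "Day 1". That was fine for my documents, but it matters if a template relies on repeated spaces.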
So what about the substitute method? It could hardly be simpler. Nokogiri makes it easy to replace the marker nodes we added with the content we want. Find the "insert" node, get the text it contains, call the method of that name on the object, and replace the node with the result. Similarly, replace the index and count nodes with the parameters we passed in.
def substitute(xmlstring, obj, i = 0, count = 1)
  doc = Nokogiri::XML(xmlstring.clone)
  # each <insert> holds a method name; its parent gets that method's result
  doc.xpath("//insert").each do |n|
    n.parent.content = obj.send(n.text.to_sym)
  end
  # i is zero-based, so add one for display
  doc.xpath("//index").each do |n|
    n.parent.content = i + 1
  end
  doc.xpath("//count").each do |n|
    n.parent.content = count
  end
  doc
end
Finally, the build_docx method is essentially stolen from Levente Bagi.
def build_docx(content)
  filename = "#{self.event_organiser.fd_name}_#{self.fd_event_name}".gsub(/\s*/, '')
  in_temp_dir do |temp_dir|
    system("cp -r lib/word_template_files #{temp_dir}/plan_report")
    open("#{temp_dir}/plan_report/word/document.xml", "w") do |file|
      file.write(content)
    end
    system("cd #{temp_dir}/plan_report; zip -r ../#{filename}.docx *")
    system("cp #{temp_dir}/#{filename}.docx /home/chaser/downloads")
  end
end
def in_temp_dir
  temp_dir = "/tmp/docx_#{Time.now.to_f.to_s}"
  Dir.mkdir(temp_dir)
  yield(temp_dir)
  system("rm -Rf #{temp_dir}")
end
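As an alternative to building the temp path by hand, Ruby's standard library has Dir.mktmpdir, which creates a uniquely named directory and removes it again when the block finishes. This is a minimal sketch rather than what the app used; the "docx_" prefix and the file written inside the block are only illustrative.

```ruby
require "tmpdir"

# Same interface as the hand-rolled in_temp_dir, but the directory is
# created and cleaned up by Dir.mktmpdir from the standard library.
def in_temp_dir(&block)
  Dir.mktmpdir("docx_", &block)
end

used_dir = nil
in_temp_dir do |temp_dir|
  used_dir = temp_dir
  # Illustrative only: write a placeholder file into the temp directory.
  File.write(File.join(temp_dir, "document.xml"), "<w:document/>")
end

# The directory is gone once the block has returned.
puts Dir.exist?(used_dir)
```

The clean-up also happens if the block raises, which the hand-rolled version doesn't guarantee.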
For the fields in the main template, the substitution looked like this. In place of
<w:t>My event</w:t>
I had
<w:t><insert>fd_event_name</insert></w:t>
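The marker substitution can be tried outside Rails. The sketch below uses REXML, which ships with Ruby, rather than Nokogiri, so treat it as a stand-in for the substitute method and not a copy of it. The fragment is a hypothetical, much-reduced day.xml (a real one cut from Word will carry far more markup), and Day and fill are names I made up for the example.

```ruby
require "rexml/document"

# WordprocessingML namespace, declared so the w: prefix parses.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical stand-in for the Day model.
Day = Struct.new(:date)

# Hypothetical, much-reduced fragment file with one marker per <w:t>.
FRAGMENT = <<~XML
  <fragment xmlns:w="#{W_NS}">
    <w:p><w:r><w:t><index/></w:t></w:r></w:p>
    <w:p><w:r><w:t><count/></w:t></w:r></w:p>
    <w:p><w:r><w:t><insert>date</insert></w:t></w:r></w:p>
  </fragment>
XML

# Replace every <marker> node with the text produced by the block.
def fill(doc, marker)
  REXML::XPath.match(doc, "//#{marker}").each do |node|
    parent = node.parent
    value = yield(node)
    node.remove
    parent.text = value.to_s
  end
end

def substitute(xmlstring, obj, i = 0, count = 1)
  doc = REXML::Document.new(xmlstring)
  fill(doc, "insert") { |n| obj.public_send(n.text) } # marker text is a method name
  fill(doc, "index") { i + 1 }                        # i is zero-based
  fill(doc, "count") { count }
  doc
end

result = substitute(FRAGMENT, Day.new("Sat May 7th 2011"), 0, 5).to_s
puts result
```

Each marker sits alone inside its own <w:t>, because the substitution overwrites the whole content of the marker's parent.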
As mentioned at the start of this post, I originally hoped to be able to add images to this document. That would require understanding enough about the way docx files handle assets, and frankly the users will probably want to change the images and layout to suit their needs, so it's almost certainly not worth it. It would be nice to try though ...