How To Generate Excel Files with Ruby

While Harvest provides powerful reports sometimes nothing beats having all of your data in a spreadsheet.  Unfortunately, the defacto spreadsheet tool (Excel) has some tricky file formats, making programatic export more difficult than it should be. We recently made improvements to how we handle Excel exports in Harvest, and I’d like to share a few options for exporting spreadsheets with Ruby.

Perhaps the simplest way is to export Comma Separated Value (CSV) and let the Excel process these. Generating CSV files is generally fast, the format is well known but not trouble free. If you only export simple ASCII characters, Excel will work fine, but drop one non-English character in and Excel gets confused. Excel cannot select the encoding, so if you want to export UNICODE data (such as when you’re working in non-English languages) using CSV files with Excel won’t do. Sure all other spreadsheet programs can import UTF8 encoded CSV files, some of these programs are free (OpenOffice) so if you have the option of ignoring Excel CSV will do.

Next is using TSV files (Tab Separated Values), these solve the UNICODE problem but the user must remember to explicitly convert to native Excel format and this is not what they usually do with spreadsheets. Clearly, we must produce native Excel files, and this means either XLS (format used up till Office 2003 included) or the new XLSX documents based on the controversial OpenXML. The older format is undocumented but many hours of reverse engineering has been spent on it and there are quite a few libraries out there. For Ruby the most notable is Spreadsheet.

XLS & Spreadsheet has some drawbacks. First it has a column limit of 256 and 64K on the number of rows. Generally you will break down a lot sooner when exporting large number of rows as Spreadsheet is very slow to construct the global string table. See all strings in an XLS file are stored in a global string table then the row data merely refers to string by the index in the table. A nice optimization trick that makes opening XLS files significantly faster, but it also makes writes slow. Constructing the string table takes time and with the particular implementation Spreadsheet consumes a lot of memory as it keeps the entire document in memory all rows included, then upon save it constructs a Hash of all strings in the document. In short Spreadsheet is a great library for parsing and constucting decorated documents but breaks down above 20k rows.

We have implemented an XLSX generator to get Excel exports in Harvest. Unlike XLS the format is documented, albeit the thousands of pages of documentation provides only marginal help. However, it does support another string storage model, where all values are stored inline. This makes file generation faster while’st opening such documents for the first time will be somewhat slower. The resulting export will be about the same size since XLSX is basically a zip file of other xml documents. To use, first you need to install the gem:

  $gem install simple_xlsx_writer

Creating OpenXML files is fairly straight forward:

  require 'rubygems'
  require 'simple_xlsx'

  SimpleXlsx::Serializer.new("test.xlsx") do |doc|
    doc.add_sheet("People") do |sheet|
      sheet.add_row(%w{DoB Name Occupation})
      sheet.add_row([Date.parse("July 31, 1912"),
                     "Milton Friedman",
                     "Economist / Statistician"])
    end
  end

The gem will recognize a few basic ruby data types and generate cells of the right type. There is no other support for formating the document as in different colors, font styles etc just raw output compatible with the following programs:

  • Open Office 3.2 (Linux, Mac, Windows)
  • Neo Office 3.2 (Mac)
  • Microsoft Office 2007 (Windows)
  • Microsoft Office 2010 (Windows)
  • Microsoft Office 2008 for Mac (versions 12.2.5 or above)
  • Microsoft Excel Viewer (Windows)

One notable exception from this list is Numbers from iWork ’09 as it does not yet support documents in the inline string storage model. Apple may fix this eventually.

13 Comments so far
  • Wow :) great work harvest ! Nice you opensourced it :D

  • Awesome work, thanks for releasing this!

    For those of you needing more complete XLSX generation, check out DocRaptor( http://docraptor.com/ ) which supports cell styling, forumlas, etc.

  • Hi Dee & Harvest team,

    thanks a lot for releasing this as open source, much appreciated!

    I was evaluating simple_xlsx along with ruby spreadsheet ( http://spreadsheet.rubyforge.org/ ) and arydjmal’s to_xls ( http://github.com/arydjmal/to_xls ) and found it a good balance between simplicity (spreadsheet requires ole) and compatibility (to_xls generates plain xml that opens ok in Excel but not in some other applications).

    With simple_xlsx and the example code above I was getting the following error message: “Excel cannot open this file. The file might have been damaged or modified from its original format.” After some serious debugging turns out “dcterms:created” time format in docProps/core.xml doesn’t comply with the expected xml time format. To fix this, open “lib/simple_xlsx/serializer.rb” on line 109 and replace:
    2010-07-20T14:30:58.00Z
    with:
    #{Time.now.utc.xmlschema}
    This will insert the current time in the correct format in the xml output generated.

    I also found it useful to run this as a rails plugin rather than as a separate gem. First, install the code as a plugin:
    ./script/plugin install git://github.com/harvesthq/simple_xlsx_writer.git
    Next, add “vendor/plugins/simple_xlsx_writer/init.rb” and paste in the following:
    require ‘simple_xlsx’
    You will need the rubyzip gem (and perhaps fast_xs) to satisfy the dependencies.

    I also adapted the code to use Tempfile and ZipOutputStream rather than ZipFile. This way only a single temporary file (the zip file itself) is created, rather one of each file inside the zip archive. Read Michael Simons’ post for some further pointers ( http://info.michael-simons.eu/2008/01/21/using-rubyzip-to-create-zip-files-on-the-fly/ ) and get in touch if you’re interested in the modified version.

    Hope this helps. Thanks again for a great library!

    Kind regards, Jussi

  • Michael Wood September 7, 2010

    Hi, thanks for this! I’m in the same boat you were in. Spreadsheet gem is too slow for large reports, and CSV files don’t play nice with UTF-8.

    I did however find an issue that is preventing us from using it: http://github.com/harvesthq/simple_xlsx_writer/issues/issue/1

    Cheers.

  • Heads up:
    When using simple_xlsx with Ruby 1.9.2, i had to also

    require ‘date’

    in order to use the Date#parse() method.
    the sample script barfs otherwise.

  • Thanks for open sourcing! That’s absolutely lovely from you guys.

  • Chris Johnson February 2, 2011

    I am currently using the spreadsheet gem with some success. However, I have several ‘workarounds’ that I’d like to get away from.

    Has anyone tried modifying this to open an existing workbook and adding rows to it, instead of creating each workbook from scratch? I have existing workbooks that I use as ‘templates’, and if I could use it in this way, it would great. Just checking before I try to make this work…

  • Hi Chris, I’m afraid simple_xlsx_writer won’t let you read or modify existing Excel documents, which is what you need for a ‘templating’ approach. Have a look at ruby spreadsheet ( http://spreadsheet.rubyforge.org/ ) it might be able to do what you need. It won’t work though if your worksheets are massive or you specifically need to use the newer XLSX file format.

    Hope this helps.

  • This works nicely thanks, however it makes a mess when generating the xlsx, it does not clean up after it’s created the final xlsx file.

  • Rob, we have a patch somewhere that makes simple_xlsx use Tempfile and ZipOutputStream rather than ZipFile. This was tidier in our environment.

    Also have a look at the discussion here: https://github.com/harvesthq/simple_xlsx_writer/issues/closed#issue/2

  • We’ve had the same problem as you are describing with the Excel gem: it broke down when generating files with about 25k rows. We fixed this by tuning the garbage collector with the following values (this is for REE, but you can use similar values for 1.9)

    export RUBY_HEAP_MIN_SLOTS=1000000
    export RUBY_HEAP_SLOTS_INCREMENT=1000000
    export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
    export RUBY_GC_MALLOC_LIMIT=1000000000
    export RUBY_HEAP_FREE_MIN=500000

    Hope this helps.

    Mihai

  • Jeroen van Ingen August 11, 2012

    How about this gem nowadays: https://github.com/randym/axlsx

  • Jeroen van Ingen August 11, 2012

    In addition to my previous post. The gem is meant as in easy generator for spreadsheets in the format ‘Office Open XML’

Comments have been closed.
The HARVEST Blog News & small business tips from your beloved time tracking & invoicing app.