Easy Rails Asset Handling with HTML5 & S3

While working on the new Harvest Help Center, our team had a chance to look at some common web-app issues with fresh eyes. Asset upload is almost certainly a requirement of any modern document-based web site, usually for images or general downloadable assets. There are a few pre-built Rails plugins that address asset upload (most notably paperclip), but they require database tables and are often designed with server-side storage in mind. Having a robust server-side solution for assets provides many benefits, but we found they were unnecessary for the simple workflow we had in mind.

We worked at finding something simpler and came up smiling.

Why the traditional model doesn’t fit

Harvest uses Textile to format most of our documents. In Textile, an image is specified as follows:

!http://www.getharvest.com/images/this_image.jpg(alt text here)!

And linking to an asset is similar:

"Download a PDF":http://www.getharvest.com/assets/document.pdf

In both these situations the asset URL is written into the document itself. Even if we track the assets in a database, little can be done to modify them without breaking existing URLs. For instance, a plan to start with server-side image uploads and later move to a content delivery network (CDN) isn’t an option: with the URLs baked into our documents, we would need to update every document before the CDN could benefit us.

Speaking of CDNs, we knew we wanted to be using one from the start. The added infrastructure needed for server-side asset storage would never be useful to us.

Once we recognized storing assets on our server and tracking them in a database offered little to our particular requirements, it was time to find a simpler solution.

We came up with a short list of requirements for simple asset handling:

  • No database tables.
  • Use Amazon S3 for storage, and support using Amazon CloudFront for a CDN.
  • Use HTML5 for asynchronous uploading.

The solution we arrived at uses the excellent Plupload, the AWS::S3 gem, and some simple Rails logic.

Configuring AWS::S3 and Rails

First, register for an Amazon S3 account if you don’t already have one. Then use an S3 client like S3Hub to access your account and create buckets for your project:

  • project-development
  • project-staging
  • project-production

Add the AWS::S3 Gem to your Gemfile:

gem 'aws-s3', :require => 'aws/s3'

Run bundle install and we should be ready to start using S3. Configure AWS::S3 by adding a YAML file and an initializer script. In config/initializers/s3_credentials.rb:

# Load AWS::S3 configuration values
#
S3_CREDENTIALS = \
  YAML.load_file(File.join(Rails.root, 'config/s3_credentials.yml'))[Rails.env]

# Set the AWS::S3 configuration
#
AWS::S3::Base.establish_connection! S3_CREDENTIALS['connection']

In the actual configuration file config/s3_credentials.yml (note the leading colons on the connection keys: they load as Ruby symbols, which is what establish_connection! expects):

development: &defaults
  connection:
    :access_key_id: AAAAAA_your-key-here
    :secret_access_key: 4rpsi235js_your-secret-here
    :use_ssl: true
    # :persistent: true
  bucket: project-development
  max_file_size: 10485760 # 10 MB, in bytes
  acl: public-read

test:
  <<: *defaults
  bucket: project-development

staging:
  <<: *defaults
  bucket: project-staging

production:
  <<: *defaults
  # prefix is optional. This is where you would put your CloudFront Domain
  # Name or your CloudFront CNAME if you have one configured.
  prefix: "http://project.s3.mydomain.com"
  bucket: project-production

Now you can interact with Amazon S3 from Rails.
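
If you want to sanity-check the connection (or create the buckets without a GUI client), the aws-s3 gem can do it from a Rails console. A quick sketch; the key name and contents here are just examples:

# Create the bucket for the current environment if it doesn't exist yet:
AWS::S3::Bucket.create(S3_CREDENTIALS['bucket'])

# Store a test object and read it back:
AWS::S3::S3Object.store('hello.txt', 'Hello from Rails!', S3_CREDENTIALS['bucket'])
AWS::S3::S3Object.find('hello.txt', S3_CREDENTIALS['bucket']).value
# => "Hello from Rails!"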

Building a controller to handle uploads

For this simple uploader, we have a limited set of requirements for the server-side logic. Uploads should be routed to an action where the asset is pushed to S3, and a URL is returned to the uploading request. Start by creating a controller:

script/rails g controller uploads

And be sure your new controller is wired up in config/routes.rb:

resources :uploads
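
Only the create action is exercised in this example, so if you prefer tighter routes you can limit the resource (optional):

resources :uploads, :only => :create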

The AWS::S3 upload code can go into the create action:

class UploadsController < ApplicationController
  # Maybe you have some filters, like :authenticate_admin!

  def create
    # Push the uploaded file to S3 under its original filename.
    AWS::S3::S3Object.store \
      params[:file].original_filename,
      params[:file].tempfile,
      S3_CREDENTIALS['bucket'],
      :content_type => params[:file].content_type,
      :access => :public_read

    # Hand the file's public URL back to the uploading request as JSON.
    render :json => {
      :url => public_s3_url(params[:file].original_filename)
    }
  end

private

  # Build the public URL for an uploaded file: use the CDN prefix when
  # one is configured, otherwise fall back to a plain S3 URL.
  def public_s3_url(filename)
    if S3_CREDENTIALS['prefix'].present?
      "#{S3_CREDENTIALS['prefix']}/#{filename}"
    else
      request.protocol +
        AWS::S3::Base.connections['AWS::S3::Base'].options[:server] +
        "/#{S3_CREDENTIALS['bucket']}/#{filename}"
    end
  end
end

The method public_s3_url prepends the prefix defined in config/s3_credentials.yml, letting us serve uploaded assets through a CDN instead of the URL AWS::S3 generates automatically.
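
To illustrate with hypothetical values (the filename and domains here are only examples), the helper returns the CDN-style URL when a prefix is configured, and a plain S3 URL otherwise:

# With a prefix configured (production):
#   public_s3_url('report.pdf')
#   # => "http://project.s3.mydomain.com/report.pdf"
#
# Without a prefix (development), the URL points straight at S3 and
# mirrors the protocol of the current request:
#   public_s3_url('report.pdf')
#   # => "http://s3.amazonaws.com/project-development/report.pdf"

One loose end: the YAML file also defines max_file_size, but the controller above leaves the size limit to Plupload on the client. If you want the server to enforce it as well, here is a minimal sketch of our own (assuming the same params[:file] upload object) that could sit at the top of the create action:

# Reject oversized uploads using the limit from config/s3_credentials.yml.
if params[:file].tempfile.size > S3_CREDENTIALS['max_file_size'].to_i
  return render :json => { :error => 'File is too large' }, :status => 413
end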

HTML5 uploads with Plupload

Plupload is a great Swiss Army knife for uploading assets. It intelligently falls back from one upload strategy to another when a browser doesn’t support the first choice. For our own internal tools at Harvest, we only worry about modern browsers that support HTML5 uploads, which simplifies our code.

To use Plupload, you first need a container DOM element for the upload system:

<div style="margin: 2em 0;" id="upload_container">
  <div id="filelist"></div>
  <a id="pickfiles" href="#">[Select files]</a>
  <a id="uploadfiles" href="#">[Upload files]</a>
</div>
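
Plupload’s script itself also needs to be on the page. Assuming you’ve vendored the library under public/javascripts as plupload.full.js (the exact filename depends on how you package your copy), a standard include works:

<%= javascript_include_tag 'plupload.full' %>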

Plupload needs to be told how to accomplish a few tasks:

  • What the URL for processing uploads is.
  • How to show selected files before upload.
  • How to show progress during upload.
  • Most importantly, how to parse the returned JSON from our Rails logic and display that URL.

In your application.js, create a new instance of plupload.Uploader if the container element is present. We’re using jQuery as well as Plupload in this example:

$(function() {
  if( $("#filelist").length ){
    var uploader = new plupload.Uploader({
      runtimes : 'html5',
      browse_button : 'pickfiles',
      max_file_size : '10mb',
      url : '/uploads',
      multiple_queues : true
    });

    // When the user selects files for upload, show them on the page
    //
    uploader.bind('FilesAdded', function(up, files) {
      $.each(files, function(i, file) {
        $('#filelist').append(
          '<div id="' + file.id + '">' +
          file.name + ' (' + plupload.formatSize(file.size) + ') <b></b>' +
          '</div>'
        );
      });
    });

    // When the file is uploaded, parse the response JSON and show that URL.
    //
    uploader.bind('FileUploaded', function(up, file, response){
      var url = JSON.parse( response.response ).url;
      $("#"+file.id).addClass("uploaded").html( url );
    });

    // Show upload progress over time. With the HTML5 runtime this
    // doesn't really report values besides 0 and 100.
    //
    uploader.bind('UploadProgress', function(up, file) {
      $('#' + file.id + " b").html(file.percent + "%");
    });

    // When the upload button is clicked, upload!
    //
    $('#uploadfiles').click(function(e) {
      uploader.start();
      e.preventDefault();
    });

    uploader.init();
  }
});

To upload an asset, the user can click “[Select files]” or simply drag files onto that link. They can choose one file or several. Next they click “[Upload files]” and wait for the assets to be sent to S3. After the assets are uploaded, they copy-paste the resulting URLs into Textile markup. That’s a simple flow with less maintenance and complexity than many other solutions, and it provides all the functionality we need for asset handling.

This solution isn’t ideal for all apps in all situations, but for many of our own internal projects at Harvest it’s a simple and powerful strategy. We hope you find it useful!

If you think solving common problems in new and imaginative ways is something you do well, be sure to check out our Harvest Careers page; we’re hiring smart people and would love to talk to you.

29 Comments so far
  • Neato!

    That’s my full comment. Your blog software wants me to write a longer comment though, so I figured I’d add this smiley:

    :D

  • Doesn’t this approach require the asset to pass through the web server running the Rails app? There are ways to direct upload to S3 which means you don’t have to have instances handling upload traffic. It’s worth considering in my opinion.

  • Matt Beale February 25, 2011 HARVEST

    @Anthony Right, this goes through the webserver. There may be some Flash methods to get assets on S3 directly, but nothing HTML5 that I could find. I was looking here:

    * http://stackoverflow.com/questions/4223509/html-5-amazon-s3-direct-uploader
    * https://forums.aws.amazon.com/message.jspa?messageID=160131

    It looks like Amazon doesn’t yet support a Flash-less asynchronous strategy for uploading to S3, and isn’t very interested in doing it right now. The Flash libraries I looked into didn’t impress me. Bypassing the server is the obvious next step, right you are.

    @Saha – Smiley back at ya :-D

  • Hey, Plupload rocks!

    I’ve implemented it on artenlinea.com with CarrierWave for asset management, but to avoid tying up Mongrel instances I bypassed the file upload through nginx (the nginx upload module), so the process is pretty quick and stable.

    I also delay the S3 file upload, showing the user a local copy of the file and running the S3 migration method in the background :)

    nice post!

  • Sounds very useful but one thing I seem to be missing–very basic, perhaps left as “an exercise for the reader”–is what you do with the uploaded file URL in order to make use of it elsewhere in an app, given that one of the requirements is “No database tables.”

    Let’s say one use case is allowing users to upload their own photo a la Gravatar. In that case how do you associate the image with the user without storing the uploaded file URL in the user’s db record?

  • Matt Beale February 25, 2011 HARVEST

    @miguel – Nice touches! Working with the nginx upload module would be a great, simple step if you wanted to scale something like this. We don’t need that kind of optimization yet, but I took a look and there is a bit of information already out there.

    @BillSaysThis – Correct, this doesn’t work all the time. I mentioned Textile at the start of this post, and that’s because the copy-paste of a URL works great in that flow. If you wanted to have users upload their own photos with this, you could stash the URL in a hidden input until they saved the page or just update the Upload controller logic. The scenario you’re talking about sounds more like a job for paperclip or another gem-based solution though.

  • Fantastic!!!
    I like to use file system storage for files.
    Thank you! :x

  • Just wanted to point out that Textile is fairly flexible, so you’re smart to be using it in this way. If you ever need to change all your URLs, just alias method chain the RedCloth formatter’s method that generates links and throw in a gsub.

  • Matt,

    Here’s an example of one way to handle a direct upload:

    https://github.com/mwilliams/d2s3

  • Actually you can upload directly to S3 using javascript.

    Here is an example of how to do it in rails:
    http://tnux.net/2010/01/17/swfupload-direct-to-amazon-s3-in-ruby-on-rails/

    I ported his code to Sinatra and jQuery SWFUpload here:
    https://github.com/BlissOfBeing/Sinatra-S3-Direct-Upload

  • It is extremely frustrating that in 2011 we cannot direct-upload to S3 using HTML with drag and drop / multiple files.

    It feels so close… but the pieces are not tying together.

  • Matt Beale February 28, 2011 HARVEST

    @anthony – That library looks like a good start for a flash-based solution, though the last changeset being dated summer of 2009 points to some basic issues, like Rails 3 compatibility. There are a few forks adding Rails 3 support and extending the basic features though, so plenty to work with. Thanks for pointing it out!

    @wayne – Great documentation there. It’s worth noting that strategy is a Flash solution, not JavaScript (SWFUpload). Good stuff!

  • Plupload also supports direct file upload to S3, for Flash and SilverLight runtimes.

  • slaingod March 9, 2011

    This seems to be missing a big point in deployment 101: You never want to use domains you don’t control to host your assets directly. Meaning, if a year from now Amazon decides to triple the price for S3 (or a competitor comes in at 1/3rd the price), you don’t want to be tied directly to the Amazon URLs. Simply make sure you always have sub-domains set up for assets (assets.example.com) or whatever and then just use those. Then you can start out locally hosting media, then move to a CDN whenever you are ready, and switch to another CDN when that need arises, just by changing the CNAME. You just give the domain name as the bucket name in S3.

    Beyond that, another option is to just use FUSE to mount the S3 drive locally, which might work better in some workflows (but still host using the asset domain). Beyond that I would recommend always making sure uploads are not publicly accessible (mark those files as private at the very least) and only allowing public access to uploads that have been processed through ImageMagick/FFmpeg/etc. Hosting file uploads directly is a recipe for hosting malware.

  • @slaingod There is a prefix key in the s3_credentials.yml that should quell your pricing/domain change fears.
    “The method public_s3_url will add a prefix defined in config/s3_credentials.yml allowing us to use a CDN for uploaded assets instead of the URL automatically generated for S3 assets by AWS::S3.”

  • The YAML file above is giving ‘config/initializers/s3_credentials.rb:8: You have a nil object when you didn’t expect it! (NoMethodError)
    You might have expected an instance of ActiveRecord::Base.
    The error occurred while evaluating nil.[]’ errors.

    I found that [Rails.env] is nil above, as you’re then trying to load whatever development should be in the YAML (what’s the default in this case?). After removing that, I get the following error:

    connection.rb:272:in `validate': The following connection options are invalid: bucket, acl, max_file_size. The valid connection options are: access_key_id, secret_access_key, server, port, use_ssl, persistent, proxy. (AWS::S3::InvalidConnectionOption)

    so it seems bucket, acl, and max_file_size shouldn’t be here? This is indeed confusing; I would appreciate your comments.

  • Matt Beale October 11, 2011 HARVEST

    @michael It looks like we had some whitespace get messed up: the YAML file wasn’t indented properly below the ‘connection’ key. Give it another shot.

    The << stuff is a merge key, often used in YAML to avoid repeating values. You can read about it here:

    http://viewsourcecode.org/why/redhanded/bits/yamlSMergeKey.html

  • Hi Matt – Thanks so much for replying! Ahh, I was missing the indentation between development and connection, no wonder (i.e. they were all under the root namespace). Yup, I’ve used merge keys before. Default being ‘connection’ now makes sense =)

    https://gist.github.com/abbb48a690dd386bc4b7#comments

    I really tried getting this to work today but found that it would upload the asset (a 1.7 MB mp4 video clip) to my S3 bucket and even the path gets set nicely. Of course, my params hash was very different to yours – there was no :file => { :original_filename => …., :tempfile => …. } at all.

    Instead all I had in my hash was :name and :file, which pointed to a file object (the temp file in this case). You’ll see I monkeypatched the S3 uploader around these. The issue I’m having is the uploaded file seems to be corrupted (or seems that way). I set the content_type (oh, this was missing in my params as well) to 'video/mp4' via S3’s admin and yet the file remains corrupted, i.e. it will not play. If I download the file from S3 again, it’s slightly larger than the one uploaded, by a couple thousand bytes.

    Right now I’m testing in Rails 2.3.9, aws-s3, latest plupload using html5 runtime. This would have served quite beautifully for my use if the uploaded assets weren’t being messed up this way.

    You can email me if you like, at mike@bsodmike.com – I’d really appreciate your advice in getting this to work… Thanks!

  • Correction – *Default being connection, bucket, max_file_size, and acl. I should have picked up on this from the later bit of code where you did S3_CREDENTIALS['bucket'] and in the initialiser S3_CREDENTIALS['connection'] – in my hurry I clearly overlooked it… :)

  • Matt Beale October 11, 2011 HARVEST

    @michael Hm, it’s been a while since we wrote this and I’m not certain we tested explicitly against 2.3.x (3.1 was just released!). I’m also not familiar with the intricacies around video uploads, though if a JPG or PNG is pushed up un-corrupted I’m not sure why a video would be.

  • I’ll certainly give this a try in a fresh 3.1 app and try uploading an image. I got held up the rest of the day trying to figure out the corruption issue.

    I looked at CarrierWave’s S3 upload facility but again this is thread-blocking – I liked your approach as it was async & db agnostic. This approach also caught my eye but I’ve not fully tested it out:

    https://github.com/iwasrobbed/Rails3-S3-Uploader-Plupload

    Will try debug this further and get back to you. Thanks.

    P.S. I’m only tinkering around in a 2.3.9 app because I must, and 3.1.1 was just released on the 7th – I’ve not had a chance to give that a spin, but as per the changelog it’s just fixes…

  • Y’all should check out:

    https://github.com/dwilkie/carrierwave_direct

  • Just tried it with a 3.1 app, and the hash is very different…!

    “name”=>”n532376588_319205_503.jpg”, … , “file”=>#<ActionDispatch::Http::UploadedFile:0x000001032f1418 @original_filename="n532376588_319205_503.jpg", @content_type="image/jpeg", @headers="Content-Disposition: form-data; name=\"file\"; filename=\"n532376588_319205_503.jpg\"\r\nContent-Type: image/jpeg\r\n", @tempfile=#>}

    Images are working fine now …interesting!

  • I strongly recommend that you use the Plupload plugin + CarrierWave with the nginx upload module; with that you don’t have to worry about Rails locking a process or about direct upload to S3, because the upload is handled entirely by nginx (in C) in a secure and much faster way.
    And the post-processing of the asset (resizing, etc.) could be done in a queue process.

    cheers

  • @Matt- I can confirm the video upload works in 3.1. I guess I’ll just chalk this up to some 2.x oddity and look at a different approach. Thanks for the great tip though xD

  • Matt Beale October 12, 2011 HARVEST

    @michael Let us know if you figure out the 2.3.x issue! More than a few Rack things changed in the 3.0 release.

    @miguel and @wayne CarrierWaveDirect looks pretty slick!

  • Hi Matt- I long gave up on the 2.3.x issue, as time was of the essence. Instead, I’ve since created a dedicated 3.1 app that takes care of the all the uploading aspects and serves the details via an API. Really simple and works quite well!

    Thanks!

  • Hi,
    I am getting an error: undefined method keys in connection.rb. What should I do? Please help…

  • Hi Azad, make sure your YAML file is indented properly; even the slightest mistake will cause issues. Gist up your error and post a link for further help =)

Comments have been closed.