Digging Into RubyGems
Every developer has experienced an episode of painful dependency management. Missing libs, “dll hell,” and hours of wasted effort. Been there. It’s painful.
Luckily for those working in the Ruby ecosystem, there is a nice tool that helps with dependency management: RubyGems.
In this post we’ll dive a little deeper into how RubyGems works with your Ruby code to properly load and manage gems. Understanding the load process will better prepare you when things go wrong (and things will, won’t they?). It will also give you insight into how to hook into and extend RubyGems’ normal behaviour if you so choose.
Ruby’s $LOAD_PATH or Where’s my library?!
When you load a dependency via Ruby’s require or load, where does Ruby go to fetch that library? Answer: $LOAD_PATH.
Ruby’s $LOAD_PATH is an array of directories that Ruby will search in to find and load dependencies.
This is what my Mac OS X system Ruby’s $LOAD_PATH looks like (Ruby 1.8.7):
$ irb
irb> $LOAD_PATH
=> ["/opt/local/lib/ruby/site_ruby/1.8", "/opt/local/lib/ruby/site_ruby/1.8/i686-darwin10", "/opt/local/lib/ruby/site_ruby", "/opt/local/lib/ruby/vendor_ruby/1.8", "/opt/local/lib/ruby/vendor_ruby/1.8/i686-darwin10", "/opt/local/lib/ruby/vendor_ruby", "/opt/local/lib/ruby/1.8", "/opt/local/lib/ruby/1.8/i686-darwin10", "."]
When I require 'myfile' in my Ruby code (if I am running my system 1.8.7 version), Ruby will try to find the file myfile.rb in one of the directories above and run it. If it can’t find the file in any of them, it raises a LoadError exception.
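Here is a minimal sketch of that lookup, assuming a reasonably recent Ruby with the standard tmpdir library. The library name load_path_demo_lib and the constant it defines are made up for this illustration. The file lives in a directory Ruby doesn’t know about, so the first require fails; once the directory is added to $LOAD_PATH, the same require succeeds.

```ruby
require 'tmpdir'

# Hypothetical library name, chosen so it should not clash with a real file.
LIB_NAME = 'load_path_demo_lib'

def demo_load_path
  results = {}
  Dir.mktmpdir do |dir|
    # Write a tiny library into a directory that is not on $LOAD_PATH.
    File.write(File.join(dir, "#{LIB_NAME}.rb"), "LOAD_PATH_DEMO_VALUE = 42\n")

    # First attempt: Ruby searches $LOAD_PATH, finds nothing, raises LoadError.
    begin
      require LIB_NAME
      results[:before] = :loaded
    rescue LoadError
      results[:before] = :load_error
    end

    # Second attempt: with the directory on $LOAD_PATH the file is found.
    $LOAD_PATH.unshift(dir)
    results[:after] = require(LIB_NAME)
    $LOAD_PATH.delete(dir)
  end
  results
end

RESULTS = demo_load_path
puts "before: #{RESULTS[:before]}"
puts "after:  #{RESULTS[:after]}"
```

The temporary directory is removed when the block ends; the constant stays defined because the file has already been loaded by then.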
At a crude level, you could manually drop dependencies into one of the $LOAD_PATH directories to load libraries. But programmers are lazy and that’s a lot of work. Wouldn’t it be cool if there was a tool to manage gems for you?
Enter RubyGems.
RubyGems: Painless dependency management
RubyGems is a tool to discover, distribute, manage and build gems. When you install RubyGems (http://docs.rubygems.org/read/chapter/3), it does two things: it writes its source to one of the directories in Ruby’s $LOAD_PATH so you can require 'rubygems' in Ruby, and it installs the command-line tool gem to help manage the gems themselves. They work together: because gems are installed in a standard place, RubyGems can be smarter about making libraries accessible from Ruby.
After I installed RubyGems 1.6.2, I see that rubygems.rb was installed in /opt/local/lib/ruby/site_ruby/1.8, which is part of the $LOAD_PATH. The gem command line tool was installed in /opt/local/bin.
Furthermore, the gem command tool I use to install gems has the following environment setup:
$ gem environment
RubyGems Environment:
  - RUBYGEMS VERSION: 1.6.2
  - RUBY VERSION: 1.8.7 (2010-01-10 patchlevel 249) [i686-darwin10]
  - INSTALLATION DIRECTORY: /opt/local/lib/ruby/gems/1.8
  - RUBY EXECUTABLE: /opt/local/bin/ruby
  - EXECUTABLE DIRECTORY: /opt/local/bin
  - RUBYGEMS PLATFORMS:
    - ruby
    - x86-darwin-10
  - GEM PATHS:
    - /opt/local/lib/ruby/gems/1.8
    - /Users/iha/.gem/ruby/1.8
  - GEM CONFIGURATION:
    - :update_sources => true
    - :verbose => true
    - :benchmark => false
    - :backtrace => false
    - :bulk_threshold => 1000
  - REMOTE SOURCES:
    - http://rubygems.org/
Notice the value of INSTALLATION DIRECTORY; that’s where my gems are installed when I run gem install name_of_a_gem. How does RubyGems decide to put gems there? It’s relative to the directory of the ruby executable, which in my case is /opt/local/bin/.
No wasted effort deciding for yourself where to put gems: the gem tool decides for you. You’ll notice too that it looks for gems on the remote site http://rubygems.org, a popular gem hosting site. Nice!
Let’s go ahead and install Nokogiri, a popular XML parser, from rubygems.org.
$ gem install nokogiri # go off to remote source http://rubygems.org
...
$ gem list
*** LOCAL GEMS ***
nokogiri (1.4.4)
Great! We just installed our first gem. And going to the install directory we notice:
$ cd /opt/local/lib/ruby/gems/1.8/gems
$ ls
nokogiri-1.4.4/
as expected by the gem command-line tool’s environment.
Let’s run some Ruby code via irb and require Nokogiri.
$ irb
irb> require 'nokogiri'
LoadError: no such file to load -- nokogiri
  from (irb):1:in `require'
  from (irb):1
  from :0
As expected, Ruby can’t find Nokogiri: the gem was installed in /opt/local/lib/ruby/gems/1.8/gems/nokogiri-1.4.4/, which is not in $LOAD_PATH. To make installed gems accessible from Ruby we first have to load RubyGems (recall rubygems.rb is in the $LOAD_PATH):
irb> require 'rubygems'
=> true
irb> require 'nokogiri'
=> true
Success! We just loaded our first gem.
At this point, we might reasonably assume that RubyGems placed every installed gem in $LOAD_PATH as soon as RubyGems itself was loaded, but this is not the case. Nokogiri’s lib directory was added only when we asked for Nokogiri:
irb> $LOAD_PATH
=> ["/opt/local/lib/ruby/gems/1.8/gems/nokogiri-1.4.4/lib", "/opt/local/lib/ruby/site_ruby/1.8", "/opt/local/lib/ruby/site_ruby/1.8/i686-darwin10", "/opt/local/lib/ruby/site_ruby", "/opt/local/lib/ruby/vendor_ruby/1.8", "/opt/local/lib/ruby/vendor_ruby/1.8/i686-darwin10", "/opt/local/lib/ruby/vendor_ruby", "/opt/local/lib/ruby/1.8", "/opt/local/lib/ruby/1.8/i686-darwin10", "."]
If Nokogiri was not in $LOAD_PATH when we called require 'nokogiri', how were we able to load it? By overriding require.
Looking at the source, we notice that RubyGems 1.6.2 has done exactly that:
#--
# Copyright 2006 by Chad Fowler, Rich Kilmer, Jim Weirich and others.
# All rights reserved.
# See LICENSE.txt for permissions.
#++
module Kernel
  if defined?(gem_original_require) then
    # Ruby ships with a custom_require, override its require
    remove_method :require
  else
    ##
    # The Kernel#require from before RubyGems was loaded.
    alias gem_original_require require
    private :gem_original_require
  end
  ##
  # When RubyGems is required, Kernel#require is replaced with our own which
  # is capable of loading gems on demand.
  #
  # When you call require 'x', this is what happens:
  # * If the file can be loaded from the existing Ruby loadpath, it
  #   is.
  # * Otherwise, installed gems are searched for a file that matches.
  #   If it's found in gem 'y', that gem is activated (added to the
  #   loadpath).
  #
  # The normal require functionality of returning false if
  # that file has already been loaded is preserved.
  def require path
    if Gem.unresolved_deps.empty? or Gem.loaded_path? path then
      gem_original_require path
    else
      spec = Gem.searcher.find_active path
      unless spec then
        found_specs = Gem.searcher.find_in_unresolved path
        unless found_specs.empty? then
          found_specs = [found_specs.last]
        else
          found_specs = Gem.searcher.find_in_unresolved_tree path
        end
        found_specs.each do |found_spec|
          # FIX: this is dumb, activate a spec instead of name/version
          Gem.activate found_spec.name, found_spec.version
        end
      end
      return gem_original_require path
    end
  rescue LoadError => load_error
    if load_error.message.end_with?(path) and Gem.try_activate(path) then
      return gem_original_require(path)
    end
    raise load_error
  end
  private :require
end
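To make the fallback idea easier to see, here is a stripped-down sketch of the same pattern: try the normal require, and on LoadError go looking elsewhere, "activate" what you find, and retry. ToyGems, SEARCH_DIRS and toy_require are names I made up for the illustration; this is not RubyGems’ real API, and it deliberately leaves Kernel#require alone rather than replacing it.

```ruby
require 'tmpdir'
require 'fileutils'

# ToyGems is a hypothetical stand-in for RubyGems, not its real API.
module ToyGems
  # Directories that hold installed libraries but are NOT on $LOAD_PATH.
  SEARCH_DIRS = []

  # Try a plain require first. On LoadError, look through SEARCH_DIRS,
  # "activate" the first directory containing the file by adding it to
  # $LOAD_PATH, then retry the require.
  def self.toy_require(name)
    require name
  rescue LoadError => e
    dir = SEARCH_DIRS.find { |d| File.exist?(File.join(d, "#{name}.rb")) }
    raise e unless dir
    $LOAD_PATH.unshift(dir) unless $LOAD_PATH.include?(dir)
    require name
  end
end

# Set up a pretend gem directory containing a made-up library.
TOY_DIR = Dir.mktmpdir
at_exit { FileUtils.remove_entry(TOY_DIR) }
File.write(File.join(TOY_DIR, 'toy_gem_demo.rb'), "TOY_GEM_DEMO_LOADED = true\n")
ToyGems::SEARCH_DIRS << TOY_DIR

BEFORE = $LOAD_PATH.include?(TOY_DIR)
LOADED = ToyGems.toy_require('toy_gem_demo')
AFTER  = $LOAD_PATH.include?(TOY_DIR)

puts "on $LOAD_PATH before: #{BEFORE}, after: #{AFTER}"
```

As in the comments of the real source above, the directory only lands on the load path at the moment something in it is actually required.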
Requiring RubyGems loads the Gem module, which has its own internal path array. This array is used by the overridden require method to search for installed gems, in addition to looking in $LOAD_PATH.
That path is:
irb> Gem.all_load_paths
=> ["/opt/local/lib/ruby/gems/1.8/gems/nokogiri-1.4.4/lib"]
So we see the path to Nokogiri in the Gem module’s internal path, which is used in the overridden require to search for gems. The Gem module constructs this array by going into the default gem repository (in this case /opt/local/lib/ruby/gems/1.8) and writing the absolute path to the lib directory of each gem into the array, as well as any gem-specific load paths defined in the .gemspec file of each gem. You can override the default gem repository by defining the environment variable $GEM_PATH.
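If you want to poke at this on your own machine, the sketch below approximates the list by globbing for lib directories under each gem repository reported by Gem.path. It is only an approximation: it ignores the gem-specific load paths from each .gemspec, it is not how RubyGems builds the array internally, and approximate_gem_lib_dirs is a name I invented. It may also come in handy on newer RubyGems releases where Gem.all_load_paths may not be available.

```ruby
require 'rubygems'

# Approximate the per-gem lib directories by globbing each gem repository.
# Illustration only: gem-specific require paths from .gemspec files are ignored.
def approximate_gem_lib_dirs(repositories = Gem.path)
  repositories.flat_map do |repo|
    Dir.glob(File.join(repo, 'gems', '*', 'lib'))
  end.sort
end

GEM_LIB_DIRS = approximate_gem_lib_dirs

puts "gem repositories: #{Gem.path.join(', ')}"
puts GEM_LIB_DIRS.first(5)
```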
This explains how we were able to require Nokogiri even though its directory was not in $LOAD_PATH to begin with.
So now we see the big picture: manage gems via the gem command-line tool, require 'rubygems', and then require any gem in Ruby after that.
Ruby 1.9
As of Ruby 1.9, you no longer need to explicitly require 'rubygems' as it’s now baked right into Ruby. However, if you are running 1.8, be careful of littering your code with require 'rubygems' all over, as some prominent Rubyists have argued (https://gist.github.com/54177), and I believe rightly, that it unnecessarily couples your code to RubyGems, which is really an environment setup concern.
The workaround is to set the RUBYOPT environment variable to ‘rubygems’ so that Ruby will run with ‘-rubygems’ as an option to automatically load RubyGems on startup.
Conclusion or The Path To Enlightenment
It might have occurred to you that despite my plug for RubyGems, there’s a problem. By default require 'nokogiri' pulls the latest gem from your gem repo. But what if I have different versions of Nokogiri? And what if my code needs to load different versions depending on some condition (say, testing versus development)?
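To get a feel for what "latest" means here, this small sketch uses the Gem::Version and Gem::Requirement classes that ship with RubyGems. The version list is invented for the example, and this is not the activation code itself, just the comparison rules it relies on.

```ruby
require 'rubygems'

# Hypothetical set of installed versions of one gem.
INSTALLED = %w[1.4.4 1.10.0 1.5.0].map { |v| Gem::Version.new(v) }

# "Latest" means the highest version, compared segment by segment
# (so 1.10.0 beats 1.5.0, unlike a plain string comparison).
LATEST = INSTALLED.max

# A requirement narrows the candidates; "~> 1.4.0" allows 1.4.x only.
REQUIREMENT = Gem::Requirement.new('~> 1.4.0')
MATCHING = INSTALLED.select { |v| REQUIREMENT.satisfied_by?(v) }

puts "latest:   #{LATEST}"
puts "matching: #{MATCHING.map(&:to_s).join(', ')}"
```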
Luckily there’s an answer: Bundler. We’ll explore Bundler and versioned dependency management in Rails in our next post. In the meantime, you can require RubyGems and make your life that much more painless.