Ruby’s inject/reduce and each_with_object

For an object oriented language, Ruby’s functional features are pretty awesome. The productivity boost of Enumerable methods was one of the most exciting things for me when I first encountered Ruby, and that has continued to be the case.

In the examples below, I’ll use trivial sum methods to illustrate. Assume the parameter passed is always an array of numbers, [1, 2, 3].

each, map, and select were simple to understand and implement, but inject (reduce) took a little more effort. When I see code like the example below, I remember the times when I was inject-phobic:

def verbose_sum(numbers)
  sum = 0
  numbers.each { |n| sum += n }
  sum
end

This is way more verbose than it needs to be. Consider the equivalent inject method:

def concise_sum(numbers)
  numbers.inject(0) { |sum, n| sum + n }
end

…which can be reduced even further. The block does nothing but call a single method (+) on the memo, passing the current element as its argument, so we can simply pass that method’s name as a symbol:

def more_concise_sum(numbers)
  numbers.inject(0, :+)
end
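A quick sketch comparing the forms on the sample array. The symbol form sends :+ to the memo with the current element as the argument, which is what the block does explicitly. reduce is an alias for inject, so either name works. The variable names here are just for illustration.

```ruby
numbers = [1, 2, 3]

# Explicit block: the block's return value becomes the next memo.
block_form  = numbers.inject(0) { |memo, n| memo + n }

# Symbol form: inject sends :+ to the memo, passing each element as the argument.
symbol_form = numbers.inject(0, :+)

# reduce is an alias of inject.
alias_form  = numbers.reduce(0, :+)

p [block_form, symbol_form, alias_form]  # prints [6, 6, 6]
```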

One can even omit the zero, in which case inject uses the collection’s first element as the initial memo:

def even_more_concise_sum(numbers)
  numbers.inject(:+)
end
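Nothing is actually inferred here: without an initial value, inject starts from the first element and folds in the rest. This small sketch shows that, and why it matters for empty collections:

```ruby
# No initial value: the first element (1) is the starting memo.
p [1, 2, 3].inject(:+)   # prints 6

# With strings, a zero start would raise a TypeError (0 + "a"),
# but the first element simply becomes the memo.
p %w[a b c].inject(:+)   # prints "abc"

# The difference shows up with an empty collection.
p [].inject(:+)          # prints nil (there is no first element)
p [].inject(0, :+)       # prints 0
```

So if the array might be empty and you want 0 back, keep the explicit zero.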

Nice, eh? However, let’s revisit the block variant of the method and see what happens if we add a puts statement at the end of such a block, and then call it:

def concise_sum(numbers)
  numbers.inject(0) do |sum, n|
    sum += n
    puts "sum is now #{sum}."
  end
end

# prints "sum is now 1." and then raises:
# NoMethodError: undefined method `+' for nil:NilClass

What happened? When using inject, the value returned by the block is the value inject will use as the memo for the next iteration. Since puts returns nil, and it was the last expression in the block, it was used as the memo in the next iteration, and the error occurred.
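One way to keep the logging and still use inject is to make the new sum the block’s last expression. A minimal sketch (the method name is hypothetical):

```ruby
def concise_sum_with_logging(numbers)
  numbers.inject(0) do |sum, n|
    new_sum = sum + n
    puts "sum is now #{new_sum}."
    new_sum  # last expression: becomes the memo for the next iteration
  end
end

result = concise_sum_with_logging([1, 2, 3])
# prints:
# sum is now 1.
# sum is now 3.
# sum is now 6.
p result  # prints 6
```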

Enter each_with_object. Instead of using the block’s return value as the memo for the next iteration, each_with_object unconditionally passes the object with which it was initialized. It relies on you to modify that object as per your needs in the block. So the each_with_object version of sum would look like this:

def ewo_sum(numbers)
  numbers.each_with_object(0) { |n, sum| sum += n }
end

Note that the order of the block parameters is reversed compared with inject. I remember the order by noting that it matches the method name itself: each (the element for the current iteration) comes first, and with_object (the memo object) comes next.

When we run this code, we get…zero. WTF!?!?!?!?

Let’s see if it works using a hash instead. For the example, this hash will contain each number as a key, with the key’s to_s representation as the value:

def stringified_key_hash(numbers)
  numbers.each_with_object({}) do |n, hsh|
    hsh[n] = n.to_s
  end
end

When we run this, we get:

=> {1=>"1", 2=>"2", 3=>"3"}

This worked! So how are the two different? As previously mentioned, the block must modify the object initially passed to the each_with_object method. In the case of stringified_key_hash, we’re fine because we’ve passed in a Hash instance, and when we modify it using []= in every iteration, we’re always dealing with that same hash instance.

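In contrast, when we used each_with_object in ewo_sum, the initial value was a Fixnum whose value was 0 (Fixnum was merged into Integer in Ruby 2.4, but the behavior is the same). Integers are immutable, so the expression “sum += n” does not modify that object; it produces a different Integer object and rebinds only the block’s local variable sum to it. Note that the object IDs for sum differ before and after this expression is evaluated: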

[21] pry(main)> sum = 0
=> 0
[22] pry(main)> sum.object_id
=> 1
[23] pry(main)> sum += 3
=> 3
[24] pry(main)> sum.object_id
=> 7

Since, as we said, the initial value is unconditionally passed to the block in each iteration, the revised value created in the block was discarded. So, when using each_with_object, be sure that the modifications are being made to the original memo instance.
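The same trap applies to any memo, not just numbers: what matters is whether the block mutates the memo object or merely reassigns the block variable. A small sketch with strings, where += reassigns and << mutates:

```ruby
words = %w[a b c]

# Reassignment: += builds a new String and rebinds only the block variable,
# so each_with_object keeps passing (and finally returns) the original empty string.
reassigned = words.each_with_object(''.dup) { |w, acc| acc += w }

# Mutation: << appends to the very String object each_with_object holds.
mutated = words.each_with_object(''.dup) { |w, acc| acc << w }

p reassigned  # prints ""
p mutated     # prints "abc"
```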

Now let’s go back to the earlier point about having to return the memo as the last expression of the block. Since each_with_object unconditionally passes the initial object, there is no need for the block to return it. If we add a puts to stringified_key_hash, we still get the correct result:

def stringified_key_hash(numbers)
  numbers.each_with_object({}) do |n, hsh|
    hsh[n] = n.to_s
    puts "Hash is now #{hsh}."
  end
end

Hash is now {1=>"1"}.
Hash is now {1=>"1", 2=>"2"}.
Hash is now {1=>"1", 2=>"2", 3=>"3"}.
=> {1=>"1", 2=>"2", 3=>"3"}
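For contrast, here is a sketch of the same method written with inject (the name is hypothetical). Because inject uses the block’s return value as the next memo, the hash must be returned explicitly; without that last line, the value of hsh[n] = n.to_s (a string) would become the memo, and the next iteration would fail.

```ruby
def stringified_key_hash_with_inject(numbers)
  numbers.inject({}) do |hsh, n|
    hsh[n] = n.to_s
    hsh  # required: this is what inject passes to the next iteration
  end
end

p stringified_key_hash_with_inject([1, 2, 3]) == { 1 => '1', 2 => '2', 3 => '3' }  # prints true
```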

A minor point about my choice of hsh as a variable name: it’s a good idea not to use hash as a variable name, because in any object whose class has Kernel among its ancestors (which is nearly every object), hash is already a method name:

[36] pry(main)> hash
=> -1606748642386923196
[37] pry(main)> Object.new.hash
=> 4200367341767882288

While it’s unlikely that this name collision would bite you, it’s not impossible. Better to avoid the possibility altogether.

And why do I use hsh and not a more descriptive name like the method name stringified_key_hash? We already have the more descriptive method name, where it is most valuable, since that name is for the exposed API, whereas the block variable is one that API users need never see. The need for a descriptive name for the block variable is greatly reduced by its narrow scope and its proximity to the more descriptive method name.

Conclusion

One could say that inject and each_with_object are different methods that behave differently intentionally, and one should choose which one to use based on the use case. However, in my (perhaps limited) experience, I have never encountered the need to return instances different from the initial instance in a block, and I find myself always using each_with_object these days. The only reason I even discovered the each_with_object Fixnum issue was that I was involved in a discussion about each_with_object and wanted to produce a minimal example of it.

That said, isn’t it great how many choices we have? More than any other piece of code I know of, the Enumerable module (documented for Ruby 1.9 and 2.0) is a treasure trove that perpetually pleases.

The Apple Logo in Your System Prompt

In my daily work, I often connect to Linux boxes from my Mac. With several terminal windows open, it’s nice to easily see which ones are connected to my local Mac, and which ones are connected to other machines. One can certainly insert the host name into the system prompt. Here’s an example that contains the time, host name, and current directory:

export PS1="\n\t \h:\w\n> "

21:02:45 my_host_name:~
> 

http://blog.twistedcode.org/2008/03/customizing-your-bash-prompt.html has a lot of information about customized bash prompts.

Wait a minute, I thought: is there a Unicode character I could include in the prompt that would jump out at me and tell me where I am? So I searched the web, and at http://hea-www.harvard.edu/~fine/OSX/unicode_apple_logo.html I found the Apple logo!

So I now have the Apple logo as the very first character of my system prompt. A picture grabs the eye more effectively than a letter, so it’s much easier now to tell that this terminal is connected to my Mac:

export PS1="\n \t \h:\w\n> "
 
 21:00:50 my_host_name:~
> 
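The logo is Unicode code point U+F8FF, which is in the Private Use Area. It generally renders as the Apple logo only with Apple’s fonts; other systems may show a box or a different glyph. As a hypothetical convenience (not from the linked page), this Ruby sketch prints an export line with the logo embedded, using the same prompt escapes as above. It also shows the character’s UTF-8 bytes in case you prefer to write them as escapes.

```ruby
# U+F8FF: Private Use Area code point that Apple's fonts draw as the Apple logo.
APPLE_LOGO = "\uF8FF"

# Hypothetical helper: a bash PS1 value with the logo, the time (\t),
# host name (\h), and working directory (\w), like the prompt above.
def logo_prompt(logo = APPLE_LOGO)
  "\\n#{logo} \\t \\h:\\w\\n> "
end

# The logo's UTF-8 encoding, as hex bytes.
p APPLE_LOGO.bytes.map { |b| format('%02X', b) }  # prints ["EF", "A3", "BF"]

puts %(export PS1="#{logo_prompt}")
```

Paste the printed export line into your shell startup file.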

Using Oracle in JRuby with Rails and Sequel

The Ruby culture prefers open source technologies, and when it comes to relational databases, MySQL and Postgres are commonly used. However, there are times when the Rubyist will not be in a position to choose technologies and must inherit legacy decisions. For example, a common issue in the enterprise is the need to integrate with Oracle. In this article, I’ll talk about integrating Oracle and JRuby (1), using both Active Record (Ruby on Rails) and the Sequel gem.

Continue reading Using Oracle in JRuby with Rails and Sequel →

Copying (RVM) Data Between Hosts Using ssh, scp, and netcat

Occasionally I need to copy nontrivial amounts of data from one machine to another. In this article I describe three approaches to doing this on the command line, and explain why the third, using ssh and tar, is the best one.

As test data, I decided to use RVM’s hidden control directory, ~/.rvm. I deleted my non-MRI 1.9 rubies to reduce the transfer size.

I haven’t tested this, but when installing rvm on multiple similar systems (e.g. those with compatible native compilers, libraries, etc.), it may save a lot of time to do a full install of rubies and gems on just one machine, then do a minimal rvm install on the others and copy over the fully populated .rvm directory.

Not So Good — Using scp

Note: This approach requires that the ssh port (22) be open on the destination host and that sshd be running there. On the Mac, this is done by enabling “Remote Login” in the Sharing preferences.

A very simple way to do this is to use scp (secure copy, over ssh) with the -r (recursive) option. For example:

scp -r source_spec destination_spec

…where source_spec and destination_spec can be local or remote file or directory specifications. (I’ll use the term filespec to refer to both.) Remote filespecs should be in the format user@host:filespec. Don’t forget the colon, or the copy will be saved to the local host with a strange name! Here is an example that works correctly:

# To create ~/.rvm on the destination:
>time scp -rq ~/.rvm kbennett@destination_host:~/temp/rvm-copy/using-scp/
Password:
scp -rq ~/.rvm kbennett@destination_host:~/temp/rvm-copy/using-scp/  25.38s user 40.99s system 3% cpu 31:12.66 total

When I tried this, I was astonished to see that the destination directory consumed more than twice as much space as the original! To easily get the amount of space consumed by a directory tree, with the size in human readable format, run du -sh directory_name. For example:

# At the source:
>du -sh .
427M    .

# At the destination:
>du -sh .
1.1G    .
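du reports disk usage in blocks, so its numbers won’t match a plain sum of file sizes exactly. Still, for a rough cross-check from Ruby, here is a minimal sketch; tree_size and human_size are hypothetical helpers. It totals the apparent sizes of regular files without following symlinks, which is relevant here because scp’s man page notes that it follows symbolic links during recursive copies.

```ruby
require 'find'

# Total apparent size, in bytes, of the regular files under root.
# File.lstat inspects a symlink itself rather than its target, and Find does
# not descend into symlinked directories, so linked data is not counted twice.
def tree_size(root)
  Find.find(root).sum do |path|
    stat = File.lstat(path)
    stat.file? ? stat.size : 0
  end
end

# Format a byte count in powers of 1024, in the spirit of du -h.
def human_size(bytes)
  units = %w[B K M G T]
  size = bytes.to_f
  unit = units.shift
  while size >= 1024 && !units.empty?
    size /= 1024
    unit = units.shift
  end
  format('%.1f%s', size, unit)
end
```

For example, puts human_size(tree_size('.')) run at the source and at the destination gives comparable totals for the two trees.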

Continue reading Copying (RVM) Data Between Hosts Using ssh, scp, and netcat →

Building A Great Ruby Development Environment and Desktop with Linux Mint 13 “Maya” Mate

The purpose of this article is to provide a clear and simple guide to setting up a nice Linux environment for Ruby software development and more.

I’ve been using Linux as a development environment on and off for a decade. In recent years I’ve leaned towards Mac OS, partly because I’ve been very disappointed in the Linux desktops’ progress (or lack of it). Nevertheless, I use Linux on all my old PC laptops, and in VMs on my Macs. Enter Linux Mint, version 13…

I really like the new Linux Mint 13 Mate distro and decided to install it on several systems. The desktop is simple, intuitive, and clean, and underneath it’s Ubuntu. Unlike the Ubuntu distro, however, Mint includes the codecs needed for multimedia playback. More information about multimedia software and the Mint installation itself is at http://www.howtoforge.com/the-perfect-desktop-linux-mint-13-maya. Besides functioning as a software development environment, another use for my Mint systems is to drive my HDTV with content from TV web sites, Hulu Plus, YouTube, Vimeo, etc. Unfortunately, Netflix streaming video does not work on Linux.

At some point I’d like to take the time to learn Chef and automate the process, but until then, I figured I’d at least document everything I did to reduce the time and effort of each new installation.

This article describes the development environment I settled on for now, and how to replicate it. It’s intended to enable you to get a high quality system up to speed as quickly as possible. A lot of my choices are subjective (e.g. zsh rather than bash), so feel free to skip or modify anything. I assume you have a minimal understanding of Linux, and I omit some detail that might be needed by Linux beginners. Where version numbers are embedded in file names, those versions may differ at the time of your installation, so modify the names accordingly.

What follows is a step-by-step guide. Although I installed Linux Mint, most or all of these steps should work on standard Ubuntu distributions too.

Continue reading Building A Great Ruby Development Environment and Desktop with Linux Mint 13 “Maya” Mate →

Intro to Functional Programming in Ruby

Ruby is a flexible and versatile language. Although it’s almost always used as an object oriented language, it can be used for functional programming as well.

In Ruby 1.8 and earlier, doing so was more awkward because the lambda keyword would clutter the code. In 1.9, however, we have the -> shorthand, which makes functional style code more concise and more similar to traditional FP languages.
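To illustrate the difference, here is the same lambda written both ways, along with the common ways of calling it (the .() shorthand also arrived in 1.9):

```ruby
# Ruby 1.8 style: the lambda method with a block.
square_old = lambda { |x| x * x }

# Ruby 1.9+ "stabby lambda" literal: parameters come before the body.
square = ->(x) { x * x }

# Three ways to invoke a lambda: call, the .() shorthand, and [].
p square_old.call(4)  # prints 16
p square.(4)          # prints 16
p square[4]           # prints 16
```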

This post is inspired by Jim Weirich’s keynote at RubyConf in Denver last Friday (Nov. 2, 2012), in which he abundantly illustrated Ruby’s FP abilities. His code looked so different from most Ruby code that one attendee entering late whispered to the person next to him, “What language is that?”

Here’s a walk through some basic functional programming in Ruby. A file containing the source code for this article, and some puts statements to illustrate the code, is here.

We’ll start with some simple examples and work up to executing a workflow defined as an array of lambdas.
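As a preview of that idea, here is a minimal sketch of my own (not the code from the article): a hypothetical three-step workflow run by folding an input value through an array of lambdas with reduce.

```ruby
# Hypothetical workflow: each step is a lambda that takes a value and returns a new one.
workflow = [
  ->(n) { n + 1 },
  ->(n) { n * 10 },
  ->(n) { "Result: #{n}" }
]

# Run the steps in order, feeding each step's output into the next step.
run = ->(steps, input) { steps.reduce(input) { |value, step| step.(value) } }

puts run.(workflow, 4)  # prints "Result: 50"
```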

Continue reading Intro to Functional Programming in Ruby →

WordPress Administration with Ruby

(This article is about the wordpress_config_parser gem, whose project page is at https://github.com/keithrbennett/wordpress_config_parser.)

The Problem

I’ve just consolidated blogs, email accounts, and web site data from multiple hosting companies onto a single hosting account. The WordPress blogs are the most important assets, and I want a good backup plan for them.

After some research, I find that WordPress data consists of files in the file system (e.g. photos), plus data in a database, usually MySQL.

For the files, I make the whole shell account one big git repository, and use a cloud Git host as the origin repo.

For the database, though, it’s not so simple. Most of the information online points to the use of the phpMyAdmin web app to perform a backup. However, I want this backup to be automated, repeatable, and self documenting. I need something that can be run from the command line. What to do?
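To make the command-line idea concrete, here is a rough, hypothetical sketch. It is not the wordpress_config_parser gem’s API. It pulls the define('DB_…', '…') constants out of wp-config.php text with a simple regex that won’t handle quotes inside values, then builds, but does not run, a mysqldump command. Note that a password on the command line can be visible to other users of the machine.

```ruby
# Hypothetical helper: extract define('NAME', 'value') pairs from wp-config.php text.
def wp_config_values(text)
  text.scan(/define\(\s*['"](\w+)['"]\s*,\s*['"]([^'"]*)['"]\s*\)/).to_h
end

# Sample values for illustration only.
sample = <<~PHP
  define( 'DB_NAME', 'blog_db' );
  define( 'DB_USER', 'blog_user' );
  define( 'DB_PASSWORD', 's3cret' );
  define( 'DB_HOST', 'localhost' );
PHP

config = wp_config_values(sample)

# Build (but don't execute) a mysqldump command line for the backup.
cmd = ['mysqldump', "--user=#{config['DB_USER']}", "--password=#{config['DB_PASSWORD']}",
       "--host=#{config['DB_HOST']}", config['DB_NAME']]
puts cmd.join(' ')  # prints "mysqldump --user=blog_user --password=s3cret --host=localhost blog_db"
```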

The Solution

Continue reading WordPress Administration with Ruby →

Stealth Conditionals in Ruby

When I first encountered the Ruby language in 2001, after working with Java, C++, and C for several years, I fell in love with it. How expressive, concise, clear, and malleable it is! A few years ago I even created a slide show titled What I Love About Ruby, which I use for presentations introducing Ruby to novices.

But there’s one thing in Ruby I haven’t gotten used to…the widespread use of what I call stealth conditionals, conditionals that are “hidden” in the middle of a one line statement, as in:

do_something(foo, bar, baz) if some_condition

I strongly believe that just as we software developers strive to create user interfaces that communicate structure and content with visual cues to our users, we should do the same for each other in our source code.
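For comparison, here are the two forms side by side. They behave identically; only the visual cue differs. The method and variables are hypothetical stand-ins for the names in the example above.

```ruby
CALLS = []

# Hypothetical stand-in for the method in the example above.
def do_something(*args)
  CALLS << args
end

foo, bar, baz = 1, 2, 3
some_condition = true

# Stealth conditional: the condition comes last, after the action.
do_something(foo, bar, baz) if some_condition

# Explicit form: the condition is the first thing the reader sees.
if some_condition
  do_something(foo, bar, baz)
end

p CALLS  # prints [[1, 2, 3], [1, 2, 3]]
```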

Continue reading Stealth Conditionals in Ruby →

Hello, Nailgun; Goodbye, JVM Startup Delays

One of the frustrations of working with JRuby is that every single time you run it, you start a whole new JVM. Even a trivial one-liner takes more than a second:

>time jruby -e 'puts(123)'
123
jruby -e 'puts(123)'  1.94s user 0.11s system 178% cpu 1.144 total

If you’re using JRuby, and working with gem, rspec, irb, and other JRuby tools, this waiting time adds up and can be frustrating.

Enter Nailgun

Nailgun is a Java utility that starts up a JVM and behaves like a server, accepting client requests to run Java-based software on it. The JRuby team did a great job of integrating it into JRuby, making it trivially simple to use.

Continue reading Hello, Nailgun; Goodbye, JVM Startup Delays →

Ruby’s Forwardable

Last night I had the pleasure of attending the Arlington Ruby User Group meeting in Arlington, Virginia. Marius Pop, a new Rubyist, presented on Ruby’s Forwardable module. Forwardable allows you to very succinctly specify that you want to define a method that simply calls (that is, delegates to) a method on one of the object’s instance variables, and returns its return value, if there is one. Here is an example file that illustrates this:

require 'forwardable'

class FancyList
  extend Forwardable
  
  def_delegator :@records, :size
  
  def initialize
    @records = []
  end
  
end

puts "FancyList.new.size = #{FancyList.new.size}"
puts "FancyList.new.respond_to?(:size) = #{FancyList.new.respond_to?(:size)}"

# Output is:
# FancyList.new.size = 0
# FancyList.new.respond_to?(:size) = true
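Forwardable can also delegate several methods in one call with def_delegators, and def_delegator accepts an optional third argument that exposes the delegated method under a different name. A small sketch extending the example (the extra delegations are my own illustration):

```ruby
require 'forwardable'

class FancyList
  extend Forwardable

  # Delegate several methods to @records at once.
  def_delegators :@records, :size, :<<, :first, :each

  # Delegate @records.empty? but expose it as blank?.
  def_delegator :@records, :empty?, :blank?

  def initialize
    @records = []
  end
end

list = FancyList.new
list << 'a'
list << 'b'
p list.size    # prints 2
p list.first   # prints "a"
p list.blank?  # prints false
```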

After the meeting I thought of a class I had been working on recently that would benefit from this. It’s the LifeTableModel class in my Life Game Viewer application, a Java Swing app written in JRuby. The LifeTableModel is the model that backs the visual table (in Swing, a JTable). Often the table model will contain the logic that provides the data to the table, but in my case, it was more like a thin adapter between the table and other model objects that did the real work.

It turned out that almost half the methods were minimal enough to be replaced with Forwardable calls. The diff is shown here:

Continue reading Ruby’s Forwardable →