Most Rubyists know about monkeypatching: opening up someone else’s class (often, something like String or Object) to modify some of its methods after the fact. It’s both incredibly powerful when used judiciously, and incredibly dangerous the rest of the time. I’ve spent countless hours trying to debug conflicting definitions of #to_json, or trying to untangle ActiveRecord’s astonishing levels of dynamic method aliasing.

I’m here to introduce you to a far more exciting threat: set_trace_func. This invidious callback is invoked on every method call, return, and line the Ruby interpreter executes. Most people, if they’re aware of it at all, correctly assume it’s intended for profiling.

They couldn’t be more wrong.
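
For contrast, here is a minimal sketch of the callback on its best behavior, doing nothing but recording which events fire. The greet method and the events array are hypothetical names chosen for illustration, and the sketch assumes a Ruby recent enough that id arrives as a Symbol.

```ruby
# Hypothetical method to trace; not part of the examples in this post.
def greet(name)
  "hello, #{name}"
end

events = []

tracer = proc { |event, file, line, id, binding, classname|
  # Record only Ruby-level method entry and exit. 'line', 'c-call',
  # 'c-return' and friends are delivered here too, but ignored.
  events << [event, id] if event == 'call' or event == 'return'
}

set_trace_func tracer
greet("world")
set_trace_func nil   # passing nil switches tracing off again

p events   # => [["call", :greet], ["return", :greet]]
```

Everything that follows uses the same six block parameters; the only difference is what gets done with the binding.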

class Fixnum
  def add(other)
    self + other
  end
end

set_trace_func proc { |event, file, line, id, binding, classname|
  if classname == Fixnum and id == :add and event == 'call'
    # We can, of course, find the receiver of the current method
    me = binding.eval("self")

    # And the binding gives us access to all variables declared
    # in that method's scope. At call time only the method arguments will be
    # defined.
    args = binding.eval("local_variables").inject({}) do |vars, name|
      # Names are Strings on 1.8 but Symbols on 1.9+; eval needs a String
      value = binding.eval name.to_s
      vars[name] = value unless value.nil?
      vars
    end

    # We can also *change* those arguments.
    args.each do |name, value|
      if Numeric === value
        binding.eval "#{name} = #{value + 1}"
      end
    end
  end
}

puts 1.add 1 # => 3

Note that this allows you to interfere with methods you’ve never seen before, simply by relaxing the class or id restrictions. Spooky action at a distance!
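
To make "relaxing the restrictions" concrete, here is a read-only sketch that drops the class and id checks entirely and records the arguments of whatever Ruby method happens to be called. The names scale and calls are hypothetical, and the sketch assumes Binding#local_variables and Binding#local_variable_get are available (Ruby 2.2 or later); on older interpreters the binding.eval approach shown earlier does the same job.

```ruby
# Hypothetical victim; the tracer below knows nothing about it.
def scale(value, factor)
  value * factor
end

calls = []

tracer = proc { |event, file, line, id, binding, classname|
  # No class or id restriction: every Ruby-level method call is fair game.
  if event == 'call'
    args = {}
    binding.local_variables.each do |name|
      value = binding.local_variable_get(name)
      # At call time only the arguments are set; skip the nil locals.
      args[name] = value unless value.nil?
    end
    calls << [id, args]
  end
}

set_trace_func tracer
scale(3, 4)
set_trace_func nil

p calls   # one entry: :scale with value 3 and factor 4
```

This version only looks. Swapping the read for a write is what turns it into the example above.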

It Never Happened

Nobody expects the value of integer arguments to change when a function is called. However, a suspicious Rubyist might open up that class and add some debugging statements, uncovering our treachery. Let’s be a little more subtle.

previous = {}
depth = 0
set_trace_func proc { |event, file, line, id, binding, classname|
  if event == 'c-call'
    if depth == 0 and rand < 0.5
      # Get the caller's local variables
      locals = binding.eval("local_variables").inject({}) do |vars, name|
        # to_s because eval needs a String and 1.9+ hands us Symbols
        vars[name] = binding.eval name.to_s
        vars
      end

      # Pick some strings
      strings = locals.delete_if do |name, value|
        not value.kind_of? String
      end

      # Nothing to shuffle if the caller has no string locals
      unless strings.empty?
        i = rand strings.size
        str1 = strings.keys[i]
        str2 = strings.keys[(i + 1) % strings.size]

        # And play musical chairs
        previous[str1] = strings[str1].dup
        previous[str2] = strings[str2].dup

        binding.eval "#{str1}.replace #{previous[str2].inspect}"
        binding.eval "#{str2}.replace #{previous[str1].inspect}"
      end
    end

    depth += 1
  elsif event == 'c-return'
    depth -= 1

    if depth <= 0
      # Whoops, the music stopped! Everyone grab your original seat!
      depth = 0

      previous.each do |name, value|
        binding.eval "#{name}.replace #{value.inspect}"
      end

      previous = {}
    end
  end
}

a = "hello"
b = "world"

puts [a, b]
# => "world\nhello"

# Sometimes.

For best results, deterministically re-order the arguments to any function that takes more than two non-hash arguments.

Next Level Language Maneuver

For the Haskell and Erlang enthusiast, might I suggest:

# Enforce immutable programming. Silently.
lambda {
  default_frame = lambda do 
    {
      :locals => {}
    } 
  end

  # Stack contains the bound local variables for each method call.
  stack = [default_frame[]]

  set_trace_func proc { |event, file, line, id, binding, classname|
    if event == 'call' or event == 'c-call'
      stack << default_frame[]
    elsif event == 'return' or event == 'c-return'
      stack.pop
      stack << default_frame[] if stack.empty?
    end

    binding.eval("local_variables").each do |var|
      # Get the original and current values of this variable
      old = stack.last[:locals][var]
      new = binding.eval var.to_s
   
      if old != nil
        unless old == new
          # The variable has changed!
          binding.eval("lambda { |v| #{var} = v }")[old]
        end
      else
        # We haven't seen this variable before
        begin
          original = new.dup 
          # Immediately replace this variable with a *different* duplicate of 
          # itself to prevent mutator methods from leaking across contexts, or
          # corrupting our stack
          binding.eval("lambda { |v| #{var} = v}")[new.dup]
        rescue
          # Guess you can't dup that 
          original = new
        end
        
        stack.last[:locals][var] = original
      end
    end
  }
}.call

# Any grade schooler could tell you this would have been nonsense.
x = 1
x = 2
puts x    # => 1. Ahhhh, much better.

# Your functions are idempotent, right? Well, they are now!
array = [1, 2, 3]
array.delete 2
p array   # => [1, 2, 3]

# Makes destructive methods more relaxing!
string = 'good'
puts lambda { |str|
  str.replace 'evil' 
}[string]     # => evil
puts string   # => good

# Blocks don't close over their arguments, sadly.
elem = 0
[1,2,3].each do |elem|
  puts elem
end
puts elem     # => 0, 0, 0, and more 0.

Generalization to class and global variables is left to the reader.

Note that this can be done with fewer copies, but keeping track of which bindings hold references to a given mutated object is nontrivial.
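
One reason the bookkeeping gets hairy is that dup is shallow, so a snapshot still shares its nested objects with the live value. A minimal illustration, using hypothetical variable names and no tracing at all:

```ruby
# String.new keeps the elements mutable even if string literals are frozen.
live     = [String.new("good"), String.new("better")]
snapshot = live.dup           # a new Array holding the very same Strings

live[0] << "!"                # in-place change to a shared element

p snapshot                    # => ["good!", "better"]
p snapshot == live            # => true
p snapshot[0].equal?(live[0]) # => true
```

Because the snapshot changed along with the live value, an equality check between the two would presumably not notice this kind of nested mutation; catching it would take deeper copies or exactly the reference tracking described above.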

Suggested Exercises

  1. PHP programmers may want to try implementing $REGISTER_GLOBALS for Rack.
  2. Take it one step further and convert all variables to global scope.
  3. Leak variables named ‘username’ and ‘password’ to unexpected places.
  4. Automatically initialize variables which are not explicitly set to helpful values.
  5. Override the assignment operator.
  6. Swap the values of similarly named variables.
  7. Automatically memoize a function. Try using throw/catch, signal handlers, or redefining methods in the binding to affect control flow.
  8. Unroll .each blocks “for speed.”
