AWS::S3 is not threadsafe. Hell, it's not even reusable; most methods go through a class constant. To use it in threaded code, it's necessary to isolate S3 operations in memory. Fork to the rescue!

require 'timeout'

# KEY, SECRET and SubprocessTimedOut are defined elsewhere in the application.
def s3(key, data, bucket, opts)
  begin
    fork_to do
      AWS::S3::Base.establish_connection!(
        :access_key_id => KEY,
        :secret_access_key => SECRET
      )
      AWS::S3::S3Object.store key, data, bucket, opts
    end
  rescue Timeout::Error
    raise SubprocessTimedOut
  end
end

def fork_to(timeout = 4)
  r, w, pid = nil, nil, nil
  begin
    # Open pipe
    r, w = IO.pipe

    # Start subprocess
    pid = fork do
      # Child
      begin
        r.close

        val = begin
          Timeout.timeout(timeout) do
            # Run block
            yield
          end
        rescue Exception => e
          e
        end

        w.write Marshal.dump val
        w.close
      ensure
        # YOU SHALL NOT PASS
        # Skip at_exit handlers.
        exit!
      end
    end

    # Parent
    w.close

    Timeout.timeout(timeout) do
      # Read value from pipe
      begin
        val = Marshal.load r.read
      rescue ArgumentError => e
        # Marshal data too short
        # Subprocess likely exited without writing.
        raise Timeout::Error
      end

      # Return or raise value from subprocess.
      case val
      when Exception
        raise val
      else
        return val
      end
    end
  ensure
    if pid
      Process.kill "TERM", pid rescue nil
      Process.kill "KILL", pid rescue nil
      Process.waitpid pid rescue nil
    end
    r.close rescue nil
    w.close rescue nil
  end
end

There's a lot of bookkeeping here. In a nutshell, we fork and run the given block in a subprocess, which hands its result (or exception) back to the parent over a pipe. The rest is timeouts and process accounting: subprocesses have a tendency to get tied up, leaving dangling pipes or zombies floating around. I know there are weak points and race conditions here, but with robust retry code this approach is suitable for production.

Using this approach, I can typically keep ~8 S3 uploads running concurrently (on a fairly busy 6-core HT Nehalem) and obtain ~sixfold throughput compared to locking S3 operations with a mutex.
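Stripped of the timeouts and process cleanup, the core of fork_to is just fork, a pipe, and Marshal. The sketch below shows only that core; fork_value is a hypothetical name, and it has none of fork_to's protection against hung or crashed children.

```ruby
# Minimal sketch: run a block in a forked child and hand its value (or its
# exception) back to the parent through a pipe using Marshal.
# fork_value is a hypothetical helper: no timeouts, no kill/retry logic.
def fork_value
  r, w = IO.pipe
  pid = fork do
    r.close
    result = begin
      yield
    rescue Exception => e
      e
    end
    w.write Marshal.dump(result)
    w.close
    exit! # skip at_exit handlers in the child
  end

  w.close
  val = Marshal.load(r.read) # read until the child closes its end
  r.close
  Process.waitpid pid        # reap the child so it doesn't linger as a zombie
  raise val if val.is_a?(Exception)
  val
end

fork_value { 6 * 7 }         # => 42

$counter = 0
fork_value { $counter += 1 } # => 1 (computed in the child)
$counter                     # => 0 (the parent's memory is untouched)
```

The last two lines are the point for AWS::S3: whatever the child does to class-level state stays in the child.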

Sometimes you need to figure out where ruby code came from. For example, ActiveSupport showed up in our API and started breaking date handling and JSON. I could have used git bisect to find the commit that introduced the problem, but there's a more direct way.

set_trace_func proc { |event, file, line, id, binding, classname|
  if id == :require
    args = binding.eval("local_variables").inject({}) do |vars, name|
      value = binding.eval name.to_s
      vars[name] = value unless value.nil?
      vars
    end
    puts "req #{args.inspect}"

    if defined? ActiveSupport
      puts "AHA"
      exit!
    end
  end
}

Introduce this snippet before requiring anything. It'll log every require along with its arguments, and exit as soon as the offending constant appears. In this case, the culprit was require 'mail'.
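On Ruby 2.0 and later, TracePoint's :class event offers another route to the same answer: it fires whenever a class or module body is opened, so you can catch the exact file and line that first opens ActiveSupport. This swaps in a different technique from the set_trace_func snippet, and first_definition_of is a hypothetical helper, so treat it as a sketch.

```ruby
# Sketch, assuming Ruby 2.0+: report where a class or module named `target`
# is first opened while the block runs. first_definition_of is hypothetical;
# it uses TracePoint's :class event rather than set_trace_func.
def first_definition_of(target)
  found = nil
  trace = TracePoint.new(:class) do |tp|
    # At a :class event, tp.self is the class or module being opened.
    if found.nil? && tp.self.name == target
      found = [tp.path, tp.lineno]
    end
  end
  trace.enable { yield }
  found
end

# Hypothetical usage for the ActiveSupport hunt:
#   first_definition_of('ActiveSupport') { require 'mail' }
# returns [file, line] of the first `module ActiveSupport`, or nil.
```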

Most Rubyists know about monkeypatching: opening up someone else's class (often, something like String or Object) to modify some of its methods after the fact. It's both incredibly powerful when used judiciously, and incredibly dangerous the rest of the time. I've spent countless hours trying to debug conflicting definitions of #to_json, or trying to untangle ActiveRecord's astonishing levels of dynamic method aliasing.

I'm here to introduce you to a far more exciting threat: set_trace_func. This invidious callback is invoked on every function call and line of the Ruby interpreter. Most people, if they're aware of it at all, correctly assume it's intended for profiling.

They couldn't be more wrong.

class Fixnum
  def add(other)
    self + other
  end
end

set_trace_func proc { |event, file, line, id, binding, classname|
  if classname == Fixnum and id == :add and event == 'call'
    # We can, of course, find the receiver of the current method
    me = binding.eval("self")

    # And the binding gives us access to all variables declared
    # in that method's scope. At call time only the method arguments will be
    # defined.
    args = binding.eval("local_variables").inject({}) do |vars, name|
      value = binding.eval name.to_s
      vars[name] = value unless value.nil?
      vars
    end

    # We can also *change* those arguments.
    args.each do |name, value|
      if Numeric === value
        binding.eval "#{name} = #{value + 1}"
      end
    end
  end
}

puts 1.add 1 # => 3

Note that this allows you to interfere with methods you've never seen before, simply by relaxing the class or id restrictions. Spooky action at a distance!
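set_trace_func has since been superseded by TracePoint (Ruby 2.0), and Binding gained local_variable_get/set (Ruby 2.1). A rough modern equivalent of the trick might look like this. Calculator is a hypothetical class used instead of reopening Fixnum, and both of its arguments get bumped, so 1 + 1 comes out as 4.

```ruby
# Sketch, assuming Ruby 2.1+: TracePoint (set_trace_func's successor) plus
# Binding#local_variable_get/set. Calculator is a hypothetical class; every
# Numeric argument to Calculator#add is bumped by one at call time.
class Calculator
  def add(a, b)
    a + b
  end
end

meddler = TracePoint.new(:call) do |tp|
  next unless tp.defined_class == Calculator && tp.method_id == :add
  b = tp.binding
  # At :call time, the method's locals are just its arguments.
  b.local_variables.each do |name|
    value = b.local_variable_get(name)
    b.local_variable_set(name, value + 1) if value.is_a?(Numeric)
  end
end

meddler.enable
result = Calculator.new.add(1, 1) # => 4
meddler.disable
Calculator.new.add(1, 1)          # => 2 once the trace is off
```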

It Never Happened

Nobody expects integer arguments to change value when a function is called. However, a suspicious Rubyist might open up that class and add some debugging statements, uncovering our treachery. Let's be a little more subtle.

previous = {}
depth = 0
set_trace_func proc { |event, file, line, id, binding, classname|
  if event == 'c-call'
    if depth == 0 and rand < 0.5
      # Get the caller's local variables
      locals = binding.eval("local_variables").inject({}) do |vars, name|
        vars[name] = binding.eval name.to_s
        vars
      end

      # Pick some strings
      strings = locals.delete_if do |name, value|
        not value.kind_of? String
      end

      unless strings.empty?
        i = rand strings.size
        str1 = strings.keys[i]
        str2 = strings.keys[(i + 1) % strings.size]

        # And play musical chairs
        previous[str1] = strings[str1].dup
        previous[str2] = strings[str2].dup
        binding.eval "#{str1}.replace #{previous[str2].inspect}"
        binding.eval "#{str2}.replace #{previous[str1].inspect}"
      end
    end
    depth += 1
  elsif event == 'c-return'
    depth -= 1
    if depth <= 0
      # Whoops, the music stopped! Everyone grab your original seat!
      depth = 0
      previous.each do |name, value|
        binding.eval "#{name}.replace #{value.inspect}"
      end
      previous = {}
    end
  end
}

a = "hello"
b = "world"
puts [a, b] # => "world\nhello"
# Sometimes.

For best results, re-order the arguments to functions which take more than 2 non-hash arguments in a deterministic way.

Next Level Language Maneuver

For the Haskell and Erlang enthusiast, might I suggest:

# Enforce immutable programming. Silently.
lambda {
  default_frame = lambda do
    { :locals => {} }
  end

  # Stack contains the bound local variables for each method call.
  stack = [default_frame[]]

  set_trace_func proc { |event, file, line, id, binding, classname|
    if event == 'call' or event == 'c-call'
      stack << default_frame[]
    elsif event == 'return' or event == 'c-return'
      stack.pop
      stack << default_frame[] if stack.empty?
    end

    binding.eval("local_variables").each do |var|
      # Get the original and current values of this variable
      old = stack.last[:locals][var]
      new = binding.eval var.to_s

      if old != nil
        unless old == new
          # The variable has changed!
          binding.eval("lambda { |v| #{var} = v }")[old]
        end
      else
        # We haven't seen this variable before
        begin
          original = new.dup

          # Immediately replace this variable with a *different* duplicate of
          # itself to prevent mutator methods from leaking across contexts, or
          # corrupting our stack
          binding.eval("lambda { |v| #{var} = v }")[new.dup]
        rescue
          # Guess you can't dup that
          original = new
        end
        stack.last[:locals][var] = original
      end
    end
  }
}.call

# Any grade schooler could tell you this would have been nonsense.
x = 1
x = 2
puts x # => 1. Ahhhh, much better.

# Your functions are idempotent, right? Well, they are now!
array = [1, 2, 3]
array.delete 2
p array # => [1, 2, 3]

# Makes destructive methods more relaxing!
string = 'good'
puts lambda { |str| str.replace 'evil' }[string] # => evil
puts string # => good

# Blocks don't close over their arguments, sadly.
elem = 0
[1,2,3].each do |elem|
  puts elem
end
puts elem # => 0, 0, 0, and more 0.

Generalization to class and global variables is left to the reader.

Note that you can do this with fewer copies, but keeping track of which bindings include references to a given mutated object is nontrivial.

Suggested Exercises

  1. PHP programmers may want to try implementing $REGISTER_GLOBALS for Rack.
  2. Take it one step further and convert all variables to global scope.
  3. Leak variables named 'username' and 'password' to unexpected places.
  4. Automatically initialize variables which are not explicitly set to helpful values.
  5. Override the assignment operator.
  6. Swap the values of similarly named variables.
  7. Automatically memoize a function. Try using throw/catch, signal handlers, or redefining methods in the binding to affect control flow.
  8. Unroll .each blocks "for speed."

Yamr Yamr

Sometime in the last couple of weeks, the Yammer AIR client stopped fetching new messages. I've grown to really like the service, especially since it delivers a running stream of commits to the Git repos I'm interested in, so I broke down and wrote my own client.

Yamr is a little Ruby/GTK app built on top of jstewart's yammer4r and the awesome danlucraft's Ruby Webkit-GTK+ bindings. No, seriously, Dan, you rock.

Features

  • Reads messages
  • Posts messages
  • OAuth support
  • Notifies you using libnotify, instead of that awful AIR thing.

Anyway, feel free to fork & hack away. You should be able to build ruby-webkit without much trouble on Ubuntu; I've included directions in the readme. It's super-basic right now, but most of the core functionality is ready to start adding features. Enjoy!

All right boys and girls, I'm all for quality releases and everything, but Cortex Reaver 0.2.0 is raring to go. Just run gem update to get some awesome blogging goodness.

A bit of context, in case you haven't been keeping up with the real-time web craze:

RSSCloud is an... idea* for getting updates on RSS feeds to clients faster, while decreasing network load. In traditional RSS models, subscribers make an HTTP request every 10 minutes or so to a publisher to check for updates. In RSSCloud, a cloud server aggregates several feeds from authors. When feeds change, their authors send an HTTP request to the cloud server to notify it of the update. The cloud server contacts one or more subscribers of the feed, sending them a notice that the feed has changed. The subscribers then request the feed from the authors. Everyone gets their updates faster, and with fewer requests across the network.
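A toy, in-memory model of that flow (no HTTP at all) may make the roles clearer: the publisher pings the cloud once, the cloud fans the notice out to subscribers, and subscribers re-fetch the feed instead of polling. Cloud and its methods are hypothetical names for illustration only.

```ruby
# Toy, in-memory model of the RSSCloud flow: no networking, hypothetical
# names. Publishers ping the cloud; the cloud notifies subscribers; the
# subscribers then fetch the feed from the publisher.
class Cloud
  def initialize
    @subscribers = Hash.new { |h, url| h[url] = [] }
  end

  # A subscriber registers a callback for a feed URL.
  def subscribe(feed_url, &callback)
    @subscribers[feed_url] << callback
  end

  # A publisher pings the cloud when its feed changes.
  def ping(feed_url)
    @subscribers[feed_url].each { |cb| cb.call(feed_url) }
  end
end

feeds = { 'http://example.com/feed' => 'v1' } # the publisher's side
cloud = Cloud.new
seen  = []

# On notification, the subscriber re-fetches the feed from the publisher.
cloud.subscribe('http://example.com/feed') { |url| seen << feeds[url] }

feeds['http://example.com/feed'] = 'v2' # the publisher updates...
cloud.ping('http://example.com/feed')   # ...and pings the cloud once
seen # => ["v2"]
```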

The Problem

When you subscribe to an RSSCloud server, you tell it several things about how to notify you of changes:

  1. A SOAP/XML-RPC notify procedure (required but useless for REST)
  2. What port to call back on.
  3. What path to make the request to.
  4. The protocol you accept (XML-RPC, SOAP, or HTTP POST).
  5. The URLs of the feeds to subscribe to.
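To make the list concrete, here's a sketch of the form body such a subscription might carry over plain HTTP POST. The parameter names (notifyProcedure, port, path, protocol, url1 through urlN) and the 'http-post' protocol value reflect our reading of the RSSCloud walkthrough and should be treated as assumptions; please_notify_body is a hypothetical helper. Note that no field names the host to call back.

```ruby
require 'uri'

# Sketch of an RSSCloud REST subscription body. Parameter names are
# assumptions based on the walkthrough; please_notify_body is hypothetical.
def please_notify_body(port:, path:, feeds:, protocol: 'http-post')
  params = {
    'notifyProcedure' => '',       # required, but meaningless for REST
    'port'            => port.to_s,
    'path'            => path,
    'protocol'        => protocol
  }
  # Feeds are numbered url1, url2, ... urlN.
  feeds.each_with_index { |url, i| params["url#{i + 1}"] = url }
  URI.encode_www_form(params)
end

body = please_notify_body(port: 5337, path: '/rsscloud/notify',
                          feeds: ['http://example.com/feed.xml'])
# POST `body` to the registration path named in the feed's <cloud> element.
# The ping comes back to the request's *source IP*, port 5337.
```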

There's something missing! The RSSCloud walkthrough says:

Notifications are sent to the IP address the request came from. You can not request notification on behalf of another server.

That's great unless your originating IP address can't receive HTTP traffic. That rules out users behind a NAT or behind a firewall (without forwarded ports). That's most home users with routers, users on typical corporate networks, etc. It won't work on the iPhone. And, to a lesser degree, it rules out the cloud itself.

One of the common aspects of cloud computing is that compute nodes (and their IP addresses) may come and go as needed. For example, Vodpod.com is served by several different servers which (through a combination of heartbeat-failover, IP routing, and HTTP proxying) may enter and leave the cluster at any time without service interruption. So, if one of those servers subscribes to a feed, it might not be online to receive pings later. You'd have to subscribe to each feed from every host to guarantee that you'd continue to receive responses. The problem only becomes worse when you start looking at cloud services like EC2.

The RSSCloud mailing list has been tossing around the obvious solution for several weeks now: just include a "domain" parameter which says what FQDN or IP address to connect to. On Friday, Dave Winer included it in his walkthrough. Even so, most of the cloud servers (Wordpress, for example) out there don't support it yet.

A Partial Solution

What can you do to get around this?

One solution is to use PubSubHubbub, which uses a full callback URL. Additionally, Superfeedr will even use RSSCloud to offer real-time updates through PuSH, effectively bridging the two schemes.

Alternatively, you can lie (sort of) about your address. This is what we've done at Vodpod to get Wordpress to call us back correctly. When we subscribe, we actually re-bind the TCP socket to a publicly accessible IP. That IP is guaranteed to go somewhere in the cluster which can accept the RSSCloud update ping. Here's a truly evil hack to do just that, by replacing Net::HTTP's TCP socket with our own.

require 'net/http'
require 'socket'

# uri and req (a Net::HTTP request object) are assumed to be defined already.
res = Net::HTTP.new(uri.host, uri.port).start do |http|
  # Replace the socket with one that we bind to the interface we want to use.

  # The local IP address we'd like RSSCloud to call back.
  local_addr = Socket.pack_sockaddr_in 0, '208.101.30.10'

  # The RSSCloud server IP address
  remote_addr = Socket.pack_sockaddr_in uri.port, uri.host

  # Create a new socket
  s = Socket.new Socket::AF_INET, Socket::SOCK_STREAM, 0

  # Bind it to the local address
  s.bind local_addr

  # Wrap for Net::HTTP and connect
  socket = Net::BufferedIO.new(s)
  s.connect remote_addr

  # Close the connection Net::HTTP opened on its own, then replace it.
  http.instance_variable_get('@socket').close
  http.instance_variable_set('@socket', socket)

  # And make the request
  http.request(req)
end
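On Ruby 2.0 and later, Net::HTTP exposes local_host (and local_port), which binds the client socket to a chosen local address without touching its internals. The sketch below swaps in that approach and checks it against a throwaway local server that echoes back the client's source address; 127.0.0.1 stands in for the public IP (208.101.30.10 in the hack) you'd use in practice.

```ruby
require 'net/http'
require 'socket'

# Throwaway one-shot HTTP server that replies with the client's source IP.
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]
thread = Thread.new do
  client = server.accept
  loop { break if client.gets.to_s.chomp.empty? } # skip request line + headers
  body = client.remote_address.ip_address
  client.write "HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n" \
               "Connection: close\r\n\r\n#{body}"
  client.close
end

http = Net::HTTP.new('127.0.0.1', port)
http.local_host = '127.0.0.1' # bind the client side of the socket here
res = http.get('/')           # the server sees this as the source address
thread.join
server.close
```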

*Dave says it's not a standard, or a spec. As far as I can tell, RSSCloud consists of a mailing list, a walkthrough of how implementations can handle the pings/cloud tag in RSS feeds, and a bunch of loosely federated implementations with varying degrees of compatibility. Some speak XML-RPC, some speak SOAP, some speak plain-old REST, etc...

I've been working a lot on Cortex Reaver lately, with several new features in the pipe. I'm using Vim for awesome syntax highlighting, refining the plugins/sidebar infrastructure, creating improved admin tools for long-running tasks (like rebuilding all the photo sizes) and fixing several bugs in the CRUD lifecycle. All that comes in a slick new visual style, including a new stylesheet/js compiler which makes page loads much faster (eliminating something like 20 external HTTP requests in the non-cached case). Finding time to really sit down and hack on CR has been tough lately with all the grad school/work stuff going on, but as new users are coming on board I'm motivated to keep improving.

Rails, what were you thinking? You went and wrote your own ridiculous JSON serializer in pure Ruby, when a perfectly good C-extension gem already does the job 20 times faster. What's worse, you gave your to_json method (which clobbers every innocent object it can get its grubby little hands on) a method signature completely incompatible with the standard gem's. You just can't mix the two, which is ALL KINDS OF FUN for those of us who need to push more than 10 reqs/sec.

Then there's awesome behavior like this:

puts({:rails => /fail/x}.to_json) #=> {"rails" => /fail/x}

That's not even valid ECMAScript, let alone JSON. It's a standard for a reason, foo! It's not like you can opt out, either. You're stuck with this pathologically malingering monkeypatch any time you require ActiveSupport.
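For contrast, here's roughly what Ruby's bundled json library does with the same hash when ActiveSupport isn't loaded. JSON has no regexp type, so its generic fallback serializes the Regexp through #to_s as an ordinary JSON string, which any parser can read back. A sketch; the exact string form of the regexp isn't the point.

```ruby
require 'json'

# Sketch using Ruby's bundled json library rather than ActiveSupport.
# The Regexp has no JSON representation, so it is emitted via #to_s
# as a plain JSON string.
json = { :rails => /fail/x }.to_json
puts json

# Valid JSON round-trips through any parser.
JSON.parse(json)['rails'].class # => String
```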

At least they figured it out eventually.

I released version 0.1.3 of Construct today. It incorporates a few bugfixes for nested schemas, and should be fit for general use.

I got tired of writing configuration classes for everything I do, and packaged it all up in a tiny gem: Construct.

Highlights

OpenStruct-style access to key-value pairs.

config.offices = ['Sydney', 'Tacoma']

Nested structures are easy to handle.

config.fruits = {
  :banana => 'slightly radioactive',
  :apple => 'safe'
}
config.fruits.banana # => 'slightly radioactive'

Overridable, self-documenting schemas for default values.

config.define(:address, :default => '1 North College St')
config.address # => '1 North College St'
config.address = 'Urnud'
config.address # => 'Urnud'

Straightforward YAML saving and loading.

config.to_yaml; Construct.load(yaml)

Define whatever methods you like on your config.

class Config < Construct
  def fooo
    foo + 'o'
  end
end

It's available as a gem:

gem install construct
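Construct's internals aren't shown here. As a rough illustration of how OpenStruct-style access with nested hashes and schema defaults can work, here's a small hypothetical MiniConfig class built on method_missing. It is not Construct's implementation or API, though it mirrors the examples in the highlights.

```ruby
# Hypothetical MiniConfig: NOT Construct's code, just a sketch of
# OpenStruct-style access with nested hashes and overridable defaults.
class MiniConfig
  def initialize(data = {})
    @data = {}
    @defaults = {}
    data.each { |k, v| self[k] = v }
  end

  # Declare a key with a default value (a minimal "schema").
  def define(key, opts = {})
    @defaults[key.to_sym] = opts[:default]
  end

  def [](key)
    key = key.to_sym
    @data.fetch(key) { @defaults[key] }
  end

  def []=(key, value)
    # Wrap nested hashes so they get dotted access too.
    @data[key.to_sym] = value.is_a?(Hash) ? MiniConfig.new(value) : value
  end

  def method_missing(name, *args)
    if name.to_s.end_with?('=')
      self[name.to_s.chomp('=')] = args.first
    elsif @data.key?(name) || @defaults.key?(name)
      self[name]
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?('=') || @data.key?(name) || @defaults.key?(name) || super
  end
end

config = MiniConfig.new
config.fruits = { :banana => 'slightly radioactive', :apple => 'safe' }
config.fruits.banana # => "slightly radioactive"
config.define(:address, :default => '1 North College St')
config.address       # => "1 North College St"
```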

I've migrated Aphyr.com off its old, dying hardware onto a spiffy new Linode. So far it's going pretty well! My new blog engine, Cortex Reaver, is also up and running.

Currently waiting for my flight to depart from PDX. The 15 inches of snow Nature dropped on us this week meant long waits for most people, but I was able to get through ticketing and security in about half an hour, and the MAX got me here just fine (though we passed a couple of jackknifed semis on the way). Now all I have to do is make my connection through SeaTac in an hour. That... could be interesting.

I got distracted from writing my backup system and started an IRC client... argh, why are the interesting problems so hard to stop working on? I essentially wasted my whole weekend on this.

On the other hand, it's pretty cool. :D


It looks like there's a catastrophic memory leak in the Rails app I wrote last summer, and in trying to track it down, I needed a way to look at process memory use over time. So, I put together this little library, LinuxProcess, which uses the proc filesystem to make it easier to monitor processes with Ruby. Enjoy!
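LinuxProcess's API isn't reproduced here, but the underlying idea is simple: on Linux, /proc/&lt;pid&gt;/status contains a VmRSS line giving the process's resident set size in kB. A minimal sketch; rss_kb and sample_rss are hypothetical helpers, not LinuxProcess's methods.

```ruby
# Sketch (Linux only): read a process's resident set size from
# /proc/<pid>/status. rss_kb and sample_rss are hypothetical helpers.
def rss_kb(pid = Process.pid)
  File.foreach("/proc/#{pid}/status") do |line|
    # Looks like "VmRSS:\t   12345 kB"
    return line.split[1].to_i if line.start_with?('VmRSS:')
  end
  nil # no VmRSS line (e.g. a kernel thread)
end

# Poll the RSS a few times to watch memory use over time.
def sample_rss(pid, samples: 3, interval: 1.0)
  Array.new(samples) do
    kb = rss_kb(pid)
    sleep interval
    kb
  end
end
```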

I've updated Sequenceable with new code supporting restriction of sequences to subsets through some sneaky SQL merging, ellipsized pagination ("1 ... 4, 5, 6 ... 10"), and proper handling of multiple sort columns.
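Ellipsized pagination comes down to this: always show the first and last pages plus a window around the current one, and print an ellipsis wherever the page numbers skip. A hypothetical page_links helper (not Sequenceable's code) sketching the idea:

```ruby
# Hypothetical helper: show the first and last pages plus `window` pages on
# either side of `current`; consecutive pages are joined with ", " and
# gaps with " ... ".
def page_links(current, total, window: 1)
  pages = ([1, total] + ((current - window)..(current + window)).to_a)
    .select { |p| p.between?(1, total) }
    .uniq
    .sort

  out = []
  pages.each_with_index do |page, i|
    out << (page - pages[i - 1] > 1 ? ' ... ' : ', ') if i > 0
    out << page.to_s
  end
  out.join
end

page_links(5, 10) # => "1 ... 4, 5, 6 ... 10"
page_links(2, 10) # => "1, 2, 3 ... 10"
```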

Ruby on Rails is much, much slower than I would like. It takes around 0.25 seconds to render the index page: about 10 times longer than Ragnar. I've alleviated the problem somewhat by switching to a Mongrel cluster behind Apache's mod_proxy_balancer, but performance is still slow. I can't add any more foreign key constraints: pretty much every feasible relationship is locked down. I guess it's just down to ActiveRecord tuning, and figuring out how to make ERB run with any semblance of speed. Possibly memcached, too...

Anyway, sorry for the inexplicable downtime. Things are still moving around quite a bit.

Added Atom feeds for journals, photographs, and a combined feed. Also added EXIF support to photographs, so that files with EXIF headers (those from about the last year or so) display some shot information as well.

Also, I caught bash programmable completion completing paths on remote servers over SSH. I was copying a file from the laptop to the server, hit tab to complete the directory on the server side... and it worked. It was quite a surprise to realize that a request I'd normally expect to do nothing had actually been carried out. Hurrah for bash making my life easier.

Had significant confusion last night, when the tested (and, I thought, working) code from the development machine threw strange exceptions on aphyr.com itself. The box claimed NoMethodError for Rational.reduce and Rational.to_f, both of which were quite clearly part of the standard library. I eventually realized this was due to my custom Rational class, which has a very different interface from the standard library's version. Changing RUBYLIB to not load my custom libraries fixed it.

After the last three months, I've come to a conclusion: Ruby is a wonderful language, and I don't want to write code in Perl any more. I like Perl: it's fast, powerful, and has a terrific community around it. If you wanted to run your television through a LEGO USB IR transceiver, yeah, there's probably something in CPAN for that. However, I'm finding that the rocky syntax of Perl gets in the way of my thinking. I don't want to use

$hash_of_hashes->{'key'}->{'key2'}

to get at what should be a simple data structure. Using five special characters on a variable makes my code hard to understand, and makes it easier to cause bugs. It's a good language, but Perl has its limits. After spending months writing clean, joyful code, I think that the Ruby language maps more closely to the domains of the problems I'm trying to solve.
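For comparison, the same nested lookup in Ruby. Hash#dig requires Ruby 2.3 or later (well after this post) and returns nil rather than raising when an intermediate key is missing.

```ruby
hash_of_hashes = { 'key' => { 'key2' => 'value' } }

hash_of_hashes['key']['key2']         # => "value"
hash_of_hashes.dig('key', 'key2')     # => "value" (Ruby 2.3+)
hash_of_hashes.dig('missing', 'key2') # => nil, no exception
```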

There are a lot of things I like very much about Ragnar: it's quite fast, extensively configurable, and compliant with web standards by design. XSLT transforms keep logic and presentation well separated, and the powerful query engine makes node-level logic simple. I plan to preserve the best aspects of this design, but refactor the code into a Ruby platform, separate node data types into a more traditional database schema for efficiency, and define a plugin architecture with callbacks for node lifecycle handling. For now, at least, I'll avoid the temptation to use Rails for this project: I prefer XSLT, and working this way is more fun for me. :-)

It'll be nice to have a new project.

So I'm back at work again, but my job has changed. No longer am I the stealthy IT ninja, whose responsibility it is to replace components the day before they break, anticipate obscure printer errors that could bring ruin to the marketing department, repair desktops while their users are out for a cup of coffee, and arrive silently in an employee's cube before they hang up the phone. I'm still messing about with the network monitoring system (especially the TAP gateway, which fails silently half the time), but my official job is now within the realm of support. Working against time on a laptop with a failing hard drive, I'm writing a support web site with the Ruby on Rails framework which will interface with our customer relations management service.

Let me tell you this: Ruby. Is. Amazing.

I've set aside this week simply to learn the language and the framework, and the sheer amount of magic in Rails is astounding. I'm not entirely sure I like the eRuby template system for views, but the remarkable simplicity of ActiveRecord makes the whole thing worth it. The way it manages relationships between tables takes all the work out of SQL management... and some of the methods available for model objects are startlingly useful. Data validation rules make a lot more sense when implemented as a part of a smart model object, rather than being controller-specific.

Then there's the controller logic, which when coupled with RoR's url_for() logic solves the problem I've been facing with Ragnar since the beginning: how to relate URLs to the scripts which interpret them. I've pushed that logic into the XSLT templates and left it to the template designer, but that approach makes it difficult to dynamically generate URLs: they have to be created by the controller and passed to the template as parts of the XML document.

In any case, learning the Rails framework has been a lot of fun, and I'm looking forward to starting the real work next week.

Copyright © 2012 Kyle Kingsbury.
Non-commercial re-use with attribution encouraged; all other rights reserved.
Comments are the property of respective posters.