So there’s a blog post that advises every method should, when possible, return self. I’d like to suggest you do the opposite: wherever possible, return something other than self.

Mutation is hard

Mutation makes code harder to reason about. Mutable objects make equality comparisons tricky: if you use a mutable object as the key in a hashmap, for instance, then change one of its fields, what happens? Can you access the value by the new string value? By the old one? What about a set? An array? For a fun time, try these in various languages. Try it with mutable primitives, like Strings, if the language makes a distinction. Enjoy the results.
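If you want to try the hashmap case, here is a minimal Ruby sketch that uses an Array as the key. The helper name stale_key_demo is made up for illustration, and the behavior is specific to Ruby's Hash; other languages and collections will answer differently.

```ruby
# Demonstrates a Hash losing track of a key that is mutated after insertion.
# `stale_key_demo` is a hypothetical helper, not part of any library.
def stale_key_demo
  key = [1, 2]
  table = { key => :value }

  key << 3  # mutate the key in place; its hash code changes

  results = {
    by_new_value: table[[1, 2, 3]],  # entry was filed under the old hash code
    by_old_value: table[[1, 2]],     # old hash code, but the stored key no longer equals [1, 2]
  }

  table.rehash  # re-index the table using the keys' current contents
  results[:after_rehash] = table[[1, 2, 3]]
  results
end

p stale_key_demo
```

The lookup by the old value should miss, since the stored key no longer equals it, and the lookup after rehash should succeed. The lookup by the new value usually misses as well, because the entry is still filed under the old hash code.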

If you call a function with a mutable object as an argument, you have very few guarantees about the new object’s value. It’s up to you to enforce invariants like “certain fields must be read together”.

If you have two threads interacting with mutable objects concurrently, things get weird fast.

Now, nobody’s arguing that mutability is always bad. There are really good reasons to mutate: your program ultimately must change state; must perform IO, to be meaningful. Mutation is usually faster, reduces GC pressure, and can be safe! It just comes with costs! The more of your program deals with pure values, the easier it is to reason about. If you compare two objects now, you know they’ll compare the same later. You can pass arguments to functions without ever having to worry that they’ll be changed out from underneath you. It gets easier to reason about thread safety.

Moreover, you don’t need a fancy type system like Haskell’s to experience these benefits: even in the unityped default-mutable wonderland of Ruby, having a culture that makes mutation explicit (for instance, gsub vs gsub!), a culture where not clobbering state is the default, can make our jobs a little easier. Remember, we don’t have to categorically prevent bugs; just make them less likely. Every bit helps.
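To make the gsub vs gsub! convention concrete, here is a small sketch. The helper name gsub_demo is hypothetical; the two String methods are Ruby's own.

```ruby
# Hypothetical helper that contrasts String#gsub with String#gsub!.
def gsub_demo
  original = "meow meow".dup  # dup so the string is mutable even if literals are frozen

  copy = original.gsub("meow", "purr")       # returns a new String
  snapshot = original.dup                    # original is unchanged at this point

  returned = original.gsub!("meow", "purr")  # rewrites the receiver in place

  {
    copy: copy,
    before_bang: snapshot,
    after_bang: original,
    bang_returned_receiver: returned.equal?(original),
  }
end

p gsub_demo
```

The bang is the only hint at the call site that the receiver is about to change, which is exactly the kind of cultural signal I mean.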

Returning nil, void, or self strongly suggests impurity

Any time you see a method like

    public void foo(String X) { ... }

    function(a, b) { ... return undefined; }

    def foo(args)
      ...
      self
    end

you should read: “This function probably mutates state!” In an object oriented language, it might mutate the receiver (self or this). It might mutate any of its arguments. It might mutate variables in lexical scope. It might mutate the computing environment, by setting a global variable, or writing to the filesystem, or sending a network packet.

The hand-wavy argument for this is that there is exactly one meaningful pure function for each of these three return types: the constant void function, the constant nil function, and the identity function(s). If you see this signature used over and over, it’s a hint you’re staring at a big ball of mutable state.

Proof

We aim to show there is only one pure function returning void, one pure function returning nil, etc. In general, we wish to show for any value r you might care to return, there exists exactly one pure function which always returns r.

I’m going to try to write this for folks without a proofs background, but I will use some notation:

  • Capital letters, e.g. X, denote sets
  • f(x) is function application
  • a iff b means “a if, and only if, b”
  • | means “such that”
  • ∀ x means “for all x”
  • ∃ x means “there exists an x”
  • x ∈ X means “x is an element of the set X”
  • (x, y) is an ordered pair, like a tuple
  • X x Y is the Cartesian product: all ordered pairs of (x, y) taken from X and Y respectively.

Definitions

I’m going to depart slightly from the usual set-theoretic definitions to simplify the proof and reduce confusion with common CS terms. We’re interested in functions which might:

  • Take a receiver (e.g. this, self)
  • Take arguments
  • Return values
  • Throw exceptions
  • Depend on an environment
  • Mutate their environment

Let’s simplify.

  • A receiver is simply the first argument to a function.
  • Zero or multiple arguments can be represented as an ordered tuple: (), (arg1), (arg1, arg2, arg3, …).
  • Returning multiple return values (as in go) can be modeled by returning tuples.
  • Exceptions can be modeled as a special set of return values, e.g. (“exception”, “something bad!”)
  • In addition to mapping an argument to a return value, the function will map an initial environment e to a (possibly identical) final environment e'. The environment encapsulates IO, global variables, dynamic scope, mutable state, etc.

Now we adapt the usual set-theoretic graph definition of a function to our model:

Definition 1. A function f in an environment set E, from an input set X (the “domain”), to a set of return values Y (the “codomain”), written f: E, X -> Y, is the set of ordered tuples (e, e', x, y) where e and e' ∈ E, x ∈ X, and y ∈ Y, with two constraints:

  1. Completeness. ∀ x ∈ X, e ∈ E: ∃ (e, e', x, y) ∈ f.
  2. Determinism. ∀ (e1, e1', x1, y1), (e2, e2', x2, y2) ∈ f: e1' = e2' and y1 = y2 if e1 = e2 and x1 = x2

Completeness simply means that the function must return a value for all environments and x’s. Determinism just means that the environment and input x uniquely determine the new environment and return value. Nondeterministic functions are modeled by state in the environment.

We write function application in this model as f(e, x) = (e', y). Read: “Calling f on x in environment e returns y and changes the environment to e'.”

Definition 2. A function is pure iff ∀ (e, e', x, y) ∈ f, e = e'; i.e., its initial and final environments are identical.

There can be only one

We wish to show that for any value r, there is only one pure function which always returns r. Assume there exist two distinct pure functions f and g, over the same domain X, returning r. Remember, these functions are pure, so their initial and final environments are the same:

  • ∀ e ∈ E, x ∈ X: f(e, x) = (e, r)
  • ∀ e ∈ E, x ∈ X: g(e, x) = (e, r)

But by definition 1, f and g are simply:

  • f = {(e, e, x, r) | e ∈ E, x ∈ X}
  • g = {(e, e, x, r) | e ∈ E, x ∈ X}

… which are identical sets. We obtain a contradiction: f and g cannot be distinct; therefore, for any environment set E and any input set X, there exists only a single pure function which always returns r. ∎
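If sets of tuples feel abstract, the argument can be checked on a tiny finite model. This is only a sketch: the two environments, the three inputs, and all the names below are arbitrary choices of mine, not anything from the proof itself.

```ruby
require 'set'

# A tiny finite model: two environments and three inputs (arbitrary values).
MODEL_ENVIRONMENTS = [:e1, :e2]
MODEL_INPUTS = [1, 2, 3]

# Builds the graph of a function per Definition 1: the set of tuples
# (e, e', x, y). `impl` takes (e, x) and returns the pair [e', y].
def graph_of(impl)
  MODEL_ENVIRONMENTS.product(MODEL_INPUTS).map { |e, x|
    e_after, y = impl.call(e, x)
    [e, e_after, x, y]
  }.to_set
end

# Definition 2: no tuple changes the environment.
def pure_graph?(graph)
  graph.all? { |e, e_after, _x, _y| e == e_after }
end

# Two differently written pure implementations that always return nil...
NIL_A = ->(e, _x) { [e, nil] }
NIL_B = ->(e, x) { x * 2; [e, nil] }
# ...and an impure one that also returns nil but moves the environment to :e2.
NIL_IMPURE = ->(_e, _x) { [:e2, nil] }

puts graph_of(NIL_A) == graph_of(NIL_B)       # same set of tuples
puts graph_of(NIL_A) == graph_of(NIL_IMPURE)  # differs for environment :e1
puts pure_graph?(graph_of(NIL_IMPURE))
```

However differently NIL_A and NIL_B are written, their graphs are the same set, so in this model they are the same function. Only the impure variant is distinguishable.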

You can make the exact same argument for functions that return their first (or nth) argument: they’re just variations on the identity function, one version for each arity:

  • (e, e, (x), x)
  • (e, e, (x, a), x)
  • (e, e, (x, a, b), x)
  • (e, e, (x, a, b, …), x)

Redundancy of functions over different domains

Given two pure single-valued functions over different domains f: E, X1 -> {r} and g: E, X2 -> {r}, let h be the set of all tuples in either f or g: h = f ∪ g.

Since f is pure, ∀ (e, e', x, y) ∈ f, e = e'; and the same for g. Therefore, ∀ (e, e', x, y) ∈ h, e = e' as well: h does not mutate its environment.

Since f has a mapping for all combinations of environments in E and inputs in X1, so does h. And the same goes for g: h has mappings for all combinations of environments in E and inputs in X2. h is therefore complete over E and X1 ∪ X2.

Since f and g always return r, ∀ (e, e', x, y) ∈ h, y = r too. Because h can never have multiple values for y (and because it does not mutate its environment), it is deterministic per definition 1.

Therefore, h is a pure function in E over X1 ∪ X2–and is therefore a pure function over either X1 or X2 alone. You can safely replace any instance of f or g with h: there isn’t really a point to having more than one pure function returning void, nil, etc. in your program, unless you’re doing it for static type safety.
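The same kind of finite model can check the union construction. Again, this is a sketch under assumptions: the domains, the environment names, and the helper names are invented for illustration.

```ruby
require 'set'

UNION_ENVS = [:e1, :e2]
DOMAIN_1 = [1, 2]        # stands in for X1
DOMAIN_2 = ["a", "b"]    # stands in for X2
RETURNED = nil           # stands in for r

# Graph of the pure function over `domain` that always returns RETURNED.
def constant_graph(domain)
  UNION_ENVS.product(domain).map { |e, x| [e, e, x, RETURNED] }.to_set
end

# Completeness: every (environment, input) pair has at least one tuple.
def complete_over?(graph, domain)
  UNION_ENVS.product(domain).all? { |e, x|
    graph.any? { |e0, _e_after, x0, _y| e0 == e && x0 == x }
  }
end

# Determinism: every (environment, input) pair has at most one tuple.
def deterministic?(graph)
  graph.group_by { |e, _e_after, x, _y| [e, x] }.values.all? { |tuples| tuples.size == 1 }
end

# Purity: no tuple changes the environment.
def pure?(graph)
  graph.all? { |e, e_after, _x, _y| e == e_after }
end

F_GRAPH = constant_graph(DOMAIN_1)
G_GRAPH = constant_graph(DOMAIN_2)
H_GRAPH = F_GRAPH | G_GRAPH  # h = f ∪ g

puts complete_over?(H_GRAPH, DOMAIN_1 + DOMAIN_2)
puts deterministic?(H_GRAPH)
puts pure?(H_GRAPH)
```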

Don’t believe me? Here’s a single Clojure function that can replace any pure function returning its first argument. Works on integers, strings, other functions… whatever types you like.

    user=> (def selfie (fn [self & args] self))
    #'user/selfie
    user=> (selfie 3)
    3
    user=> (selfie "channing" "tatum")
    "channing"

Returning self suggests impurity

You can write the same function more than one way. Here are two pure functions in Ruby that both return self:

    def meow
      self
    end

    def stretch
      nil
      ENV["USER"] + " in spaaace"
      5.3 / 3
      self
    end

meow is just identity–but so is stretch, and, by our proof above, so is every other pure function returning self. The only difference is that stretch has useless dead code, which any compiler, linter, or human worth their salt will strip out. Writing code like this is probably silly. You can construct weird cases (interfaces, etc) where you want a whole bunch of identity functions, or (constantly nil), etc, but I think those are pretty rare.

What about calling a function then returning self?

    def foo
      enjoy("http://shirtless-channing-tatum.biz")
      self
    end

There are only two cases. If enjoy is pure, so is foo, and we can replace the function by

    def foo
      self
    end

If enjoy is impure (and let’s face it: shirtless Channing Tatum induces side effects in most callers), then foo is also impure, and we’re back to square one: mutation.

Final thoughts

When you see functions that return void, nil, or self, ask “what is this mutating?” If you have a pure function (say, returning the number of explosions in a film) and follow the advice of returning self as much as possible, you are turning a pure function into an impure one. You have to add state and mutability to the system. You should strive to do the opposite: reduce mutation wherever possible.
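Here is roughly what that transformation looks like. The Film class and its method names are hypothetical, invented only to illustrate the point.

```ruby
# Hypothetical Film class, for illustration only.
class Film
  attr_reader :explosion_count

  def initialize(scenes)
    @scenes = scenes
  end

  # Pure: the result depends only on the receiver, and nothing is changed.
  def explosions
    @scenes.count(:explosion)
  end

  # "Return self" version: the answer has to be stashed somewhere,
  # so the method now mutates the receiver.
  def count_explosions!
    @explosion_count = @scenes.count(:explosion)
    self
  end
end

film = Film.new([:explosion, :dialogue, :explosion])
p film.explosions                         # just a value
p film.count_explosions!.explosion_count  # a mutation, then a second call to read it back
```

The second version is chainable, but the caller now has to know that explosion_count is only meaningful after count_explosions! has run.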

I assure you, return values are OK.

Most applications have configuration: how to open a connection to the database, what file to log to, the locations of key data files, etc.

Configuration is hard to express correctly. It’s dynamic because you don’t know the configuration at compile time–instead it comes from a file, the network, command arguments, etc. Config is almost always implicit, because it affects your functions without being passed in as an explicit parameter. Most languages address this in two ways:

Globals

Global variables are accessible in every scope, so they make great implicit parameters for functions.

    module App
      API_SERVER = "api3"
    end

    def save(record)
      http_put(App::API_SERVER, record)
    end

Classes are often global, so you can also attach config to that class’s eigenclass, singleton object, or what have you:

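    require 'ostruct'

    class App
      # Lazily create the config object so the setter below has something to write to.
      def self.config; @config ||= OpenStruct.new; end
    end

    App.config.api_server = "api3"
    App.config.api_server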

Erlang apps often handle config with a globally-named module:

{ok, Server} = app_config:get(api_server),

The global variable model is concise and simple; it’s what you should reach for right away. Every thread sees the same values. In fact, all code everywhere sees the same values. Yet there are shortcomings: what if you’re writing a library? What about tests, where you might call the same function with several different configurations? What if you’re running more than one copy of your application concurrently?

Object graph traversal

An advanced OOP programmer may solve the global problem by putting configuration into instances. The application sets up a graph of instances, each with the configuration it needs to do its job.

    class App
      def initialize(config)
        @api_client = App::APIClient.new config[:api_server]
        @logger = Logger.new config[:logger]
      end
    end

… and so forth. What if the APIClient needs to use the logger? You could keep a pointer to the application around:

    class APIClient
      def initialize(app, config)
        @app = app
        @server = config[:server]
      end

      def get
        @app.logger.log "getting"
      end
    end

And traverse the graph of objects in your application. This basically amounts to passing a configuration parameter into every constructor, but has the added benefit of letting you look up other objects in the Application: maybe other local services you might need. It’s a good way to let different components work together cleanly without making their dependencies explicit: the Application doesn’t need to know exactly what services an APIClient needs. Hoorah, encapsulation! It’s also thread-safe: you can create as many applications concurrently as you like, and they won’t step on each other.

On the other hand, you do a lot of traversing, and since these are instance variables, there’s no way to refer to them within other functions, like class methods. It’s also more difficult to test, since you have to stand up all the dependencies (mocked or otherwise) in order to create an object.

At this point, someone else reading this article is screaming “dependency injection frameworks” and pulling out XML. But before we pull out DI, let’s back up and think.

Backing up for a second

What we really want from configuration is to take functions like this:

    f(config, x) = g(config, x * 2)
    g(config, y) = h(config, y + 1)
    h(config, z) = config + z

… and express them like this:

    f(x) = g(x*2)
    g(y) = h(y+1)
    h(z) = config + z

We want the config variable to become implicit so that f and g are simplified. f and g do depend on config–but config may be irrelevant to their internal definition, and explicitly tracking every parameter dependency in the system can be exhausting. These implicit variables are known as dynamic scope in programming languages: variables which are bound in every function in a call stack, but are not explicit in their signatures. More particularly, we want two properties:

  1. The variable is bound only within and below the binding expression. When control returns from the binding expression, the variable reverts to its previous value.

  2. The variable is bound only for the thread that created it, and threads created from the bound scope; that is to say, two parallel invocations of f() can have different values of config. This lets us run, say, two copies of an application at the same time.
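As a rough illustration of these two properties, here is a sketch in Ruby using thread variables. The DynamicConfig module is hypothetical, and it only covers part of property 2: Ruby threads do not inherit thread variables, so a child thread starts out unbound.

```ruby
# Hypothetical helper: a scoped, per-thread binding for a config value.
module DynamicConfig
  KEY = :dynamic_config

  # The value currently bound in this thread (nil when unbound).
  def self.value
    Thread.current.thread_variable_get(KEY)
  end

  # Binds new_value for the duration of the block, then restores the
  # previous value, even if the block raises.
  def self.with(new_value)
    previous = value
    Thread.current.thread_variable_set(KEY, new_value)
    yield
  ensure
    Thread.current.thread_variable_set(KEY, previous)
  end
end

# f, g and h from above, with config no longer in their signatures.
def h(z)
  DynamicConfig.value + z
end

def g(y)
  h(y + 1)
end

def f(x)
  g(x * 2)
end

p DynamicConfig.with(10) { f(1) }  # config is 10 inside the block

# Two threads can hold different bindings at the same time.
threads = [100, 200].map { |c| Thread.new { DynamicConfig.with(c) { f(1) } } }
p threads.map(&:value)

p DynamicConfig.value  # back to the previous (unbound) value
```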

In Scala, one kind of implicit scope is provided by implicit parameters, which allow enclosing scope to carry down (at least) one level, to functions whose parameters are tagged as “implicit” and have a matching type. (Well, at least, I think that’s what they do; A Tour of Scala: Implicit Parameters is beyond my mortal comprehension). Implicit parameters don’t carry across threads, which makes it a little tough to defer operations using, say, futures.

In Java, one might consider an InheritableThreadLocal for the task. That gives us the thread isolation property, provided that one remembers to clean up the thread local appropriately at the end of the binding context. Many Java libraries use this to provide, say, request context in a web app. Scala neatly wraps this construct with DynamicVariable, a mutable, thread-local, thread-inherited object which is bound only while a given closure is running. Since Scala doesn’t actually have dynamic scope, we still need to access the DynamicVariable object statically. No problem: we can bind it to a singleton object, just like the Ruby examples earlier:

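    import scala.util.DynamicVariable

    class App {
      def start() {
        App.config.withValue(someConfigStructure) {
          httpServer.run();
        }
      }
    }

    object App {
      // DynamicVariable needs an initial value; null stands in for "unbound".
      val config = new DynamicVariable[MyConfig](null);
    }

    class HttpServer {
      def run() {
        listen(App.config.value.httpPort)
      }
    }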

There’s a bit of a wart in that we need to call config.value in order to get the currently bound value, but the semantics are sound, the code is readable, and there’s no extraneous bookkeeping.

Dynamic scope

In languages that support dynamic scope (most Lisps, Perl, Haskell (sort of)), we can express this directly:

    (ns app.config)

    (def ^:dynamic config nil)

    (ns app.core)

    (defn start []
      (binding [app.config/config some-config-structure]
        (http-server/run)))

    (ns app.http-server
      (:use app.config))

    (defn run []
      (listen (:http-port config)))

One of the arguments against dynamic scope is that it can lead to name capture: a dynamic binding for “config” could break a function deep in someone else’s code that used that variable name. Clojure uses namespaces to separate vars, neatly allowing us to write either “app.config/config”, or, having included app.config, use the short name “config”. Other code remains unaffected.

Dynamic var bindings in Clojure have a root value (shared between all threads), and an overrideable thread-local value. However, not all Clojure closures close over dynamic vars! New threads do not inherit the dynamic frames of their parents by default: only future, bound-fn, and friends capture their dynamic scope. (Thread. (fn [] …)) will run with fresh (root) dynamic bindings. Use (bound-fn) where you want to preserve the current dynamic bindings between threads, and (fn) where you wish to reset them.

Thread-inheritable dynamic vars in Clojure

Alternatively, we could adopt Scala’s approach: define a new kind of reference, backed by an InheritableThreadLocal:

    ; IDeref lives in clojure.lang and is not imported by default.
    (import 'clojure.lang.IDeref)

    (defn thread-inheritable
      "Creates a dynamic, thread-local, thread-inheritable object, with initial
      value 'value'. Set with (.set x value), read with (deref x)."
      [value]
      (doto (proxy [InheritableThreadLocal IDeref] []
              (deref [] (.get this)))
        (.set value)))

That proxy expression creates a new InheritableThreadLocal which also implements IDeref, Clojure’s interface for dereferenceable things like vars, refs, atoms, agents, etc. Now we just need a macro to set the local within some scope.

    (defn- set-dynamic-thread-vars!
      "Takes a map of vars to values, and assigns each."
      [bindings-map]
      (doseq [[v value] bindings-map]
        (.set v value)))

    (defmacro inheritable-binding
      "Creates new bindings for the (already-existing) dynamic thread-inherited
      vars, with the supplied initial values. Executes exprs in an implicit do,
      then re-establishes the bindings that existed before. Bindings are made
      sequentially, like let."
      [bindings & body]
      `(let [inner-bindings# (hash-map ~@bindings)
             outer-bindings# (into {} (for [[k# v#] inner-bindings#]
                                        [k# (deref k#)]))]
         (try
           (set-dynamic-thread-vars! inner-bindings#)
           ~@body
           (finally
             (set-dynamic-thread-vars! outer-bindings#)))))

Now we can define a new var–say config, and rebind it dynamically.

    (def config (thread-inheritable :default))
    (prn "Initially" @config)

    (inheritable-binding [config :inside]
      ; In any functions we call, (deref config) will be :inside.
      (prn "Inside" @config)

      ; We can safely evaluate multiple bindings in parallel. It's the
      ; many-worlds hypothesis in action!
      (inheritable-binding [config :future]
        (future (prn "Future" @config)))

      ; Unlike regular ^:dynamic vars, bindings are inherited in child threads.
      ; The thread has to be started for its body to run.
      (inheritable-binding [config :thread]
        (.start (Thread. (fn [] (prn "In unbound thread" @config))))))

More realistically, one might write:

    (defmacro with-config [m & body]
      `(inheritable-binding [config ~m]
         ~@body))

    (defn start-server []
      (listen (:port @config)))

    (with-config {:port 2}
      (start-server))

Voilà! Mutable, thread-safe, thread-inherited, implicit variables.

It’s worth noting that these variables are not a part of the dynamic binding, so they won’t be captured by (bound-fn). If you want to pass closures between existing threads, use ^:dynamic and (bound-fn). If you want your bindings to follow thread inheritance, use this inheritable-binding approach.

Closing thoughts

With all this in mind, remember LOGO? That little language has more in common with Lisp than you might think, though that discussion is, shall we say… out of this article’s scope.

    TO RUNHTTPSERVER
      LISTEN :PORT
    END

    TO STARTAPP
      MAKE "PORT 8080
      RUNHTTPSERVER
    END

AWS::S3 is not threadsafe. Hell, it’s not even reusable; most methods go through a class constant. To use it in threaded code, it’s necessary to isolate S3 operations in memory. Fork to the rescue!

    require 'timeout'

    def s3(key, data, bucket, opts)
      begin
        fork_to do
          AWS::S3::Base.establish_connection!(
            :access_key_id => KEY,
            :secret_access_key => SECRET
          )
          AWS::S3::S3Object.store key, data, bucket, opts
        end
      rescue Timeout::Error
        raise SubprocessTimedOut
      end
    end

    def fork_to(timeout = 4)
      r, w, pid = nil, nil, nil
      begin
        # Open pipe
        r, w = IO.pipe

        # Start subprocess
        pid = fork do
          # Child
          begin
            r.close

            val = begin
              Timeout.timeout(timeout) do
                # Run block
                yield
              end
            rescue Exception => e
              e
            end

            w.write Marshal.dump val
            w.close
          ensure
            # YOU SHALL NOT PASS
            # Skip at_exit handlers.
            exit!
          end
        end

        # Parent
        w.close

        Timeout.timeout(timeout) do
          # Read value from pipe
          begin
            val = Marshal.load r.read
          rescue ArgumentError => e
            # Marshal data too short
            # Subprocess likely exited without writing.
            raise Timeout::Error
          end

          # Return or raise value from subprocess.
          case val
          when Exception
            raise val
          else
            return val
          end
        end
      ensure
        if pid
          Process.kill "TERM", pid rescue nil
          Process.kill "KILL", pid rescue nil
          Process.waitpid pid rescue nil
        end
        r.close rescue nil
        w.close rescue nil
      end
    end

There’s a lot of bookkeeping here. In a nutshell we’re forking and running a given block in a forked subprocess. The result of that operation is returned to the parent by a pipe. The rest is just timeouts and process accounting. Subprocesses have a tendency to get tied up, leaving dangling pipes or zombies floating around. I know there are weak points and race conditions here, but with robust retry code this approach is suitable for production.
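Stripped of the timeouts and process accounting, the core round trip looks something like the sketch below. The name in_subprocess is hypothetical, and this version makes no attempt to deal with stuck children, so treat it as an illustration rather than a replacement for fork_to.

```ruby
# Runs the block in a forked child and returns (or re-raises) its result.
# Hypothetical, simplified sketch: no timeouts, no cleanup of hung children.
def in_subprocess
  reader, writer = IO.pipe

  pid = fork do
    # Child: compute the value and send it back over the pipe.
    reader.close
    begin
      result = begin
        yield
      rescue Exception => e
        e
      end
      writer.write Marshal.dump(result)
      writer.close
    ensure
      exit!  # skip at_exit handlers inherited from the parent
    end
  end

  # Parent: read until the child closes its end of the pipe.
  writer.close
  payload = reader.read
  reader.close
  Process.waitpid pid

  value = Marshal.load(payload)
  raise value if value.is_a?(Exception)
  value
end

$uploads = 0
p in_subprocess { $uploads += 1 }  # the child sees its own copy of $uploads
p $uploads                         # the parent's copy is untouched
```

Because the block runs in a separate address space, whatever class-level state it clobbers stays in the child, which is the isolation we wanted.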

Using this approach, I can typically keep ~8 S3 uploads running concurrently (on a fairly busy 6-core HT Nehalem) and obtain ~sixfold throughput compared to locking S3 operations with a mutex.

Sometimes you need to figure out where Ruby code came from. For example, ActiveSupport showed up in our API and started breaking date handling and JSON. I could have used git bisect to find the commit that introduced the problem, but there's a more direct way.

    set_trace_func proc { |event, file, line, id, binding, classname|
      if id == :require
        args = binding.eval("local_variables").inject({}) do |vars, name|
          value = binding.eval name
          vars[name] = value unless value.nil?
          vars
        end

        puts "req #{args.inspect}"

        if defined? ActiveSupport
          puts "AHA"
          exit!
        end
      end
    }

Introduce this snippet before requiring anything. It'll watch require statements, log them, and show you as soon as the offending constant appears. In this case, the culprit was require 'mail'.
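If tracing every call is too heavy, a different technique can answer the same question: wrap require itself and note which call makes the constant appear. This is not the set_trace_func approach above, just a sketch of an alternative. ConstantWatcher, OffendingConstant and find_culprit_demo are all names I made up, and OffendingConstant stands in for ActiveSupport.

```ruby
require 'tmpdir'

# Hypothetical wrapper around require that records which path made the
# watched constant appear. Calls made as Kernel.require bypass it.
module ConstantWatcher
  WATCHED = :OffendingConstant  # stand-in for ActiveSupport
  CULPRITS = []

  private

  def require(path)
    before = Object.const_defined?(ConstantWatcher::WATCHED)
    result = super
    after = Object.const_defined?(ConstantWatcher::WATCHED)
    # Nested requires finish innermost-first, so the first entry recorded
    # is the file that actually defined the constant.
    ConstantWatcher::CULPRITS << path if after && !before
    result
  end
end

Object.prepend(ConstantWatcher)

# Demo: write a throwaway file that defines the constant, then require it.
def find_culprit_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, "offender.rb")
    File.write(path, "module OffendingConstant; end\n")
    require path
  end
  ConstantWatcher::CULPRITS.first
end

puts find_culprit_demo
```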

Most Rubyists know about monkeypatching: opening up someone else’s class (often, something like String or Object) to modify some of its methods after the fact. It’s both incredibly powerful when used judiciously, and incredibly dangerous the rest of the time. I’ve spent countless hours trying to debug conflicting definitions of #to_json, or trying to untangle ActiveRecord’s astonishing levels of dynamic method aliasing.

I’m here to introduce you to a far more exciting threat: set_trace_func. This invidious callback is invoked on every function call and line of the Ruby interpreter. Most people, if they’re aware of it at all, correctly assume it’s intended for profiling.

They couldn’t be more wrong.

    class Fixnum
      def add(other)
        self + other
      end
    end

    set_trace_func proc { |event, file, line, id, binding, classname|
      if classname == Fixnum and id == :add and event == 'call'
        # We can, of course, find the receiver of the current method
        me = binding.eval("self")

        # And the binding gives us access to all variables declared
        # in that method's scope. At call time only the method arguments will be
        # defined.
        args = binding.eval("local_variables").inject({}) do |vars, name|
          value = binding.eval name
          vars[name] = value unless value.nil?
          vars
        end

        # We can also *change* those arguments.
        args.each do |name, value|
          if Numeric === value
            binding.eval "#{name} = #{value + 1}"
          end
        end
      end
    }

    puts 1.add 1
    # => 3

Note that this allows you to interfere with methods you’ve never seen before, simply by relaxing the class or id restrictions. Spooky action at a distance!

It Never Happened

Nobody expects the value of integer arguments to change when a function is called. However, a suspicious Rubyist might open up that class and add some debugging statements, uncovering our treachery. Let’s be a little more subtle.

    previous = {}
    depth = 0
    set_trace_func proc { |event, file, line, id, binding, classname|
      if event == 'c-call'
        if depth == 0 and rand < 0.5
          # Get the caller's local variables
          locals = binding.eval("local_variables").inject({}) do |vars, name|
            vars[name] = binding.eval name
            vars
          end

          # Pick some strings
          strings = locals.delete_if do |name, value|
            not value.kind_of? String
          end

          i = rand strings.size
          str1 = strings.keys[i]
          str2 = strings.keys[(i + 1) % strings.size]

          # And play musical chairs
          previous[str1] = strings[str1].dup
          previous[str2] = strings[str2].dup
          binding.eval "#{str1}.replace #{previous[str2].inspect}"
          binding.eval "#{str2}.replace #{previous[str1].inspect}"
        end

        depth += 1
      elsif event == 'c-return'
        depth -= 1

        if depth <= 0
          # Whoops, the music stopped! Everyone grab your original seat!
          depth = 0
          previous.each do |name, value|
            binding.eval "#{name}.replace #{value.inspect}"
          end
          previous = {}
        end
      end
    }

    a = "hello"
    b = "world"
    puts [a, b]
    # => "world\nhello"
    # Sometimes.

For best results, re-order the arguments to functions which take more than 2 non-hash arguments in a deterministic way.

Next Level Language Maneuver

For the Haskell and Erlang enthusiast, might I suggest:

    # Enforce immutable programming. Silently.
    lambda {
      default_frame = lambda do
        {
          :locals => {}
        }
      end

      # Stack contains the bound local variables for each method call.
      stack = [default_frame[]]

      set_trace_func proc { |event, file, line, id, binding, classname|
        if event == 'call' or event == 'c-call'
          stack << default_frame[]
        elsif event == 'return' or event == 'c-return'
          stack.pop
          stack << default_frame[] if stack.empty?
        end

        binding.eval("local_variables").each do |var|
          # Get the original and current values of this variable
          old = stack.last[:locals][var]
          new = binding.eval var

          if old != nil
            unless old == new
              # The variable has changed!
              binding.eval("lambda { |v| #{var} = v }")[old]
            end
          else
            # We haven't seen this variable before
            begin
              original = new.dup
              # Immediately replace this variable with a *different* duplicate of
              # itself to prevent mutator methods from leaking across contexts, or
              # corrupting our stack
              binding.eval("lambda { |v| #{var} = v}")[new.dup]
            rescue
              # Guess you can't dup that
              original = new
            end

            stack.last[:locals][var] = original
          end
        end
      }
    }.call

    # Any grade schooler could tell you this would have been nonsense.
    x = 1
    x = 2
    puts x
    # => 1. Ahhhh, much better.

    # Your functions are idempotent, right? Well, they are now!
    array = [1, 2, 3]
    array.delete 2
    p array
    # => [1, 2, 3]

    # Makes destructive methods more relaxing!
    string = 'good'
    puts lambda { |str| str.replace 'evil' }[string]
    # => evil
    puts string
    # => good

    # Blocks don't close over their arguments, sadly.
    elem = 0
    [1,2,3].each do |elem|
      puts elem
    end
    puts elem
    # => 0, 0, 0, and more 0.

Generalization to class and global variables is left to the reader.

Note that you can do this with fewer copies required, but keeping track of which bindings include references to a given mutated object is nontrivial.

Suggested Exercises

  1. PHP programmers may want to try implementing $REGISTER_GLOBALS for Rack.
  2. Take it one step further and convert all variables to global scope.
  3. Leak variables named ‘username’ and ‘password’ to unexpected places.
  4. Automatically initialize variables which are not explicitly set to helpful values.
  5. Override the assignment operator.
  6. Swap the values of similarly named variables.
  7. Automatically memoize a function. Try using throw/catch, signal handlers, or redefining methods in the binding to affect control flow.
  8. Unroll .each blocks “for speed.”

Yamr

Sometime in the last couple of weeks, the Yammer AIR client stopped fetching new messages. I've grown to really like the service, especially since it delivers a running stream of commits to the Git repos I'm interested in, so I broke down and wrote my own client.

Yamr is a little Ruby/GTK app built on top of jstewart's yammer4r and the awesome danlucraft's Ruby Webkit-GTK+ bindings. No seriously, Dan, you rock.

Features

  • Reads messages
  • Posts messages
  • OAuth support
  • Notifies you using libnotify, instead of that awful AIR thing.

Anyway, feel free to fork & hack away. You should be able to build ruby-webkit without much trouble on Ubuntu; I've included directions in the readme. It's super-basic right now, but most of the core functionality is ready to start adding features. Enjoy!

All right boys and girls, I'm all for quality releases and everything, but Cortex Reaver 0.2.0 is raring to go. Just gem update to get some awesome blogging goodness.

A bit of context, in case you haven't been keeping up with the real-time web craze:

RSSCloud is an... idea* for getting updates on RSS feeds to clients faster, while decreasing network load. In traditional RSS models, subscribers make an HTTP request every 10 minutes or so to a publisher to check for updates. In RSSCloud, a cloud server aggregates several feeds from authors. When feeds are changed, their authors send an HTTP request to the cloud server notifying it of the update. The cloud server contacts one or more subscribers of the feed, sending them a notice that the feed has changed. The subscribers then request the feed from the authors. Everyone gets their updates faster, and with fewer requests across the network.
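The flow is easier to follow as a toy model. Everything below is an in-memory sketch with hypothetical class and method names; real RSSCloud traffic travels over HTTP, and none of this reflects an actual implementation.

```ruby
# Toy in-memory model of the flow described above. Every class and method
# name here is hypothetical.
class ToyAuthor
  attr_reader :requests_served

  def initialize
    @items = []
    @requests_served = 0
  end

  # Add an item to the feed, then tell the cloud server the feed changed.
  def publish(item, cloud, feed_url)
    @items << item
    cloud.ping(feed_url)
  end

  # Stand-in for an HTTP GET of the feed.
  def feed
    @requests_served += 1
    @items.dup
  end
end

class ToyCloud
  def initialize
    @subscribers = Hash.new { |hash, url| hash[url] = [] }
  end

  def subscribe(feed_url, subscriber)
    @subscribers[feed_url] << subscriber
  end

  # Called by the author; fans the notice out to subscribers of that feed.
  def ping(feed_url)
    @subscribers[feed_url].each { |subscriber| subscriber.feed_changed(feed_url) }
  end
end

class ToySubscriber
  attr_reader :items

  def initialize(authors_by_url)
    @authors_by_url = authors_by_url
    @items = []
  end

  # On notification, fetch the feed from the author instead of polling.
  def feed_changed(feed_url)
    @items = @authors_by_url.fetch(feed_url).feed
  end
end

def toy_rsscloud_demo
  url = "http://example.com/feed.xml"
  author = ToyAuthor.new
  cloud = ToyCloud.new
  subscriber = ToySubscriber.new(url => author)

  cloud.subscribe(url, subscriber)
  author.publish("first post", cloud, url)

  [subscriber.items, author.requests_served]
end

p toy_rsscloud_demo
```

The author serves exactly one feed request per update per subscriber, rather than one every polling interval.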

The Problem

When you subscribe to an RSSCloud server, you tell it several things about how to notify you of changes:

  1. A SOAP/XML-RPC notify procedure (required but useless for REST)
  2. What port to call back on.
  3. What path to make the request to.
  4. The protocol you accept (XML-RPC, SOAP, or HTTP POST).
  5. The URLs of the feeds to subscribe to.

There's something missing! The RSSCloud walkthrough says:

Notifications are sent to the IP address the request came from. You can not request notification on behalf of another server.

That's great unless your originating IP address can't receive HTTP traffic. That rules out users behind a NAT or behind a firewall (without forwarded ports). That's most home users with routers, users on typical corporate networks, etc. It won't work on the iPhone. And, to a lesser degree, it rules out the cloud itself.

One of the common aspects of cloud computing is that compute nodes (and their IP addresses) may come and go as needed. For example, Vodpod.com is served by several different servers which (through a combination of heartbeat-failover, IP routing, and HTTP proxying) may enter and leave the cluster at any time without service interruption. So, if one of those servers subscribes to a feed, it might not be online to receive pings later. You'd have to subscribe to each feed from every host to guarantee that you'd continue to receive responses. The problem only becomes worse when you start looking at cloud services like EC2.

The RSSCloud mailing list has been tossing around the obvious solution for several weeks now: just include a "domain" parameter which says what FQDN or IP address to connect to. On Friday, Dave Winer included it in his walkthrough. Even so, most of the cloud servers (Wordpress, for example) out there don't support it yet.

A Partial Solution

What can you do to get around this?

One solution is to use PubSubHubbub, which uses a full callback URL. Additionally, Superfeedr will even use RSSCloud to offer real-time updates through PuSH, effectively bridging the two schemes.

Alternatively, you can lie (sort of) about your address. This is what we've done at Vodpod to get Wordpress to call us back correctly. When we subscribe, we actually re-bind the TCP socket to a publicly accessible IP. That IP is guaranteed to go somewhere in the cluster which can accept the RSSCloud update ping. Here's a truly evil hack to do just that, by replacing Net::HTTP's TCP socket with our own.

    res = Net::HTTP.new(uri.host, uri.port).start do |http|
      # Replace the socket with one that we bind to the interface we want to use.

      # The local IP address we'd like RSSCloud to call back.
      local_addr = Socket.pack_sockaddr_in 0, '208.101.30.10'

      # The RSSCloud server IP address
      remote_addr = Socket.pack_sockaddr_in uri.port, uri.host

      # Create a new socket
      s = Socket.new Socket::AF_INET, Socket::SOCK_STREAM, 0

      # Bind it to the local address
      s.bind local_addr

      # Wrap for Net::HTTP and connect
      socket = Net::BufferedIO.new(s)
      s.connect remote_addr

      # Replace the HTTP client's connection
      http.instance_variable_set('@socket', socket)

      # And make the request
      http.request(req)
    end

*Dave says it's not a standard, or a spec. As far as I can tell, RSSCloud consists of a mailing list, a walkthrough of how implementations can handle the pings/cloud tag in RSS feeds, and a bunch of loosely federated implementations with varying degrees of compatibility. Some speak XML-RPC, some speak SOAP, some speak plain-old REST, etc...

I've been working a lot on Cortex Reaver lately, with several new features in the pipe. I'm using Vim for awesome syntax highlighting, refining the plugins/sidebar infrastructure, creating improved admin tools for long-running tasks (like rebuilding all the photo sizes) and fixing several bugs in the CRUD lifecycle. All that comes in a slick new visual style, including a new stylesheet/js compiler which makes page loads much faster (eliminating something like 20 external HTTP requests in the non-cached case). Finding time to really sit down and hack on CR has been tough lately with all the grad school/work stuff going on, but as new users are coming on board I'm motivated to keep improving.

Rails, what were you thinking? You went and wrote your own ridiculous JSON serializer in pure Ruby, when a perfectly good C-extension gem already does the job 20 times faster. What's worse, you gave your to_json method (which clobbers every innocent object it can get its grubby little hands on) a completely incompatible method signature from the standard gem version. You just can't mix the two, which is ALL KINDS OF FUN for those of us who need to push more than 10 reqs/sec.

Then there's awesome behavior like this:

puts({:rails => /fail/x}.to_json) #=> {"rails" => /fail/x}

That's not even valid ECMAScript, let alone JSON. It's a standard for a reason, foo! It's not like you can opt out, either. You're stuck with this pathologically malingering monkeypatch any time you require ActiveSupport.
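For contrast, here is a small sketch using the json library that ships with Ruby. JSON has no regular-expression type, so the pattern has to travel as a string; choosing Regexp#source for that is my assumption about what a sensible encoding would be, not something either library mandates.

```ruby
require 'json'

# Encode the pattern as a plain string, since JSON has no regex type.
payload = { 'rails' => /fail/x.source }
encoded = JSON.generate(payload)

# A strict parser accepts that and hands back an ordinary string.
decoded = JSON.parse(encoded)

# The same parser refuses a bare regex literal in value position.
rejected =
  begin
    JSON.parse('{"rails": /fail/x}')
    false
  rescue JSON::ParserError
    true
  end
```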

At least they figured it out eventually.

I released version 0.1.3 of Construct today. It incorporates a few bugfixes for nested schemas, and should be fit for general use.

I got tired of writing configuration classes for everything I do, and packaged it all up in a tiny gem: Construct.

Highlights

OpenStruct-style access to key-value pairs.

config.offices = ['Sydney', 'Tacoma']

Nested structures are easy to handle.

config.fruits = {
  :banana => 'slightly radioactive',
  :apple => 'safe'
}
config.fruits.banana # => 'slightly radioactive'

Overridable, self-documenting schemas for default values.

config.define(:address, :default => '1 North College St')
config.address # => '1 North College St'
config.address = 'Urnud'
config.address # => 'Urnud'

Straightforward YAML saving and loading.

config.to_yaml; Construct.load(yaml)

Define whatever methods you like on your config.

class Config < Construct
  def fooo
    foo + 'o'
  end
end

It's available as a gem:

gem install construct

I've migrated Aphyr.com off its old, dying hardware onto a spiffy new Linode. So far it's going pretty well! My new blog engine, Cortex Reaver, is also up and running.

Currently waiting for my flight to depart from PDX. The 15 inches of snow Nature dropped on us this week meant long waits for most people, but I was able to get through ticketing and security in about half an hour, and the MAX got me here just fine (though we passed a couple jackknifed semis on the way). Now all I have to do is make my connection through SeaTac in an hour. That... could be interesting.

I got distracted from writing my backup system and started an IRC client... argh, why are the interesting problems so hard to stop working on? I essentially wasted my whole weekend on this.

On the other hand, it's pretty cool. :D

[Screenshot: colors.png]

It looks like there's a catastrophic memory leak in the Rails app I wrote last summer, and in trying to track it down, I needed a way to look at process memory use over time. So, I put together this little library, LinuxProcess, which uses the proc filesystem to make it easier to monitor processes with Ruby. Enjoy!
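The underlying idea is simple enough to sketch. This is not LinuxProcess's API; the method names are made up for illustration, and it assumes a Linux /proc where /proc/<pid>/status has a VmRSS line.

```ruby
# Parse the "Key:\tvalue" lines of /proc/<pid>/status into a hash.
def parse_proc_status(text)
  text.each_line.with_object({}) do |line, fields|
    key, value = line.split(':', 2)
    fields[key] = value.strip if value
  end
end

# Resident set size in kilobytes, or nil if the field is missing.
def rss_kb(pid)
  status = parse_proc_status(File.read("/proc/#{pid}/status"))
  status['VmRSS'] && status['VmRSS'].to_i
end

# To watch a leak, sample on an interval and log the numbers, e.g.:
#   loop { puts "#{Time.now.to_i} #{rss_kb(pid)}"; sleep 60 }
```

Plotting those samples against time makes a steady climb hard to miss.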

I've updated Sequenceable with new code supporting restriction of sequences to subsets through some sneaky SQL merging, ellipsized pagination ("1 ... 4, 5, 6 ... 10"), and proper handling of multiple sort columns.
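The ellipsized pagination is the easiest part to illustrate. This is a standalone sketch of one way to compute it, not Sequenceable's code; the method names and the window parameter are invented here.

```ruby
# Return the page numbers to display, with :gap where pages are elided.
# Always shows the first and last page, plus `window` pages on either
# side of the current one.
def ellipsized_pages(current, total, window = 1)
  shown = [1, total]
  shown.concat(((current - window)..(current + window)).to_a)
  shown = shown.select { |page| page.between?(1, total) }.uniq.sort

  shown.each_with_object([]) do |page, out|
    # Insert a gap marker whenever we skip over one or more pages.
    out << :gap if out.last.is_a?(Integer) && page > out.last + 1
    out << page
  end
end

# Render runs of adjacent pages with commas and gaps as "...".
def render_pages(pages)
  pages.slice_when { |a, b| a == :gap || b == :gap }
       .map { |run| run == [:gap] ? '...' : run.join(', ') }
       .join(' ')
end

render_pages(ellipsized_pages(5, 10)) # => "1 ... 4, 5, 6 ... 10"
```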

Ruby on Rails is much, much slower than I would like. It takes around 0.25 seconds to render the index page: about 10 times longer than Ragnar. I've alleviated the problem somewhat by switching to a Mongrel cluster with Apache's mod-balancer, but performance is still slow. I can't add any more foreign key constraints--pretty much every feasible relationship is locked down. I guess it's just down to ActiveRecord tuning, and figuring out how to make ERB run with any semblance of speed. Possibly memcached, too...

Anyway, sorry for the inexplicable downtime. Things are still moving around quite a bit.

Added ATOM feeds for journals, photographs, and a combined feed. Also added EXIF support to photographs, such that files with EXIF headers (those from about the last year or so) display some shot information as well.

Also, I caught bash programmable completion completing paths on remote servers over SSH. I was copying a file from the laptop to the server, hit tab to complete the directory on the server side... and it worked. That was quite surprising, when I realized that my ordinarily useless request had actually been carried out. Hurrah for bash making my life easier.

Had significant confusion last night, when the tested and (so I thought) working code from the development machine threw strange exceptions on aphyr.com itself. The box claimed NoMethodError for Rational.reduce and Rational.to_f, both of which were quite clearly part of the standard library. Eventually realized that this was due to my custom Rational class, which has a very different interface from the standard library's version. Changed RUBYLIB to not load my custom libraries, and it worked.

After the last three months, I've come to the conclusion: Ruby is a wonderful language, and I don't want to write code in Perl any more. I like Perl: it's fast, powerful, and has a terrific community around it. If you wanted to run your television through a LEGO USB IR transceiver, yeah, there's probably something in CPAN for that. However, I'm finding that the rocky syntax of Perl gets in the way of my thinking. I don't want to use

$hash_of_hashes->{'key'}->{'key2'}

to get at what should be a simple data structure. Using five special characters on a variable makes my code hard to understand, and makes it easier to cause bugs. It's a good language, but Perl has its limits. After spending months writing clean, joyful code, I think that the Ruby language maps more closely to the domains of the problems I'm trying to solve.
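For comparison, a sketch of the same lookup in Ruby. The data is made up, and Hash#dig only exists in Ruby 2.3 and later, so it postdates this post.

```ruby
hash_of_hashes = { 'key' => { 'key2' => 'value' } }

# Plain indexing: no sigils, no arrows.
found = hash_of_hashes['key']['key2']

# Hash#dig walks the same path but returns nil, rather than raising,
# when an intermediate key is missing.
missing = hash_of_hashes.dig('nope', 'key2')
```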

There are a lot of things I like very much about Ragnar: it's quite fast, extensively configurable, and compliant with web standards by design. XSLT transforms keep logic and presentation well separated, and the powerful query engine makes node-level logic simple. I plan to preserve the best aspects of this design, but refactor the code into a Ruby platform, separate node data types into a more traditional database schema for efficiency, and define a plugin architecture with callbacks for node lifecycle handling. For now, at least, I'll avoid the temptation to use Rails for this project: I prefer XSLT, and working this way is more fun for me. :-)

It'll be nice to have a new project.

So I'm back at work again, but my job has changed. No longer am I the stealthy IT ninja, whose responsibility it is to replace components the day before they break, anticipate obscure printer errors that could bring ruin to the marketing department, repair desktops while their users are out for a cup of coffee, and arrive silently in an employee's cube before they hang up the phone. I'm still messing about with the network monitoring system (especially the TAP gateway, which fails silently half the time), but my official job is now within the realm of support. Working against time on a laptop with a failing hard drive, I'm writing a support web site with the Ruby on Rails framework which will interface with our customer relations management service.

Let me tell you this: Ruby. Is. Amazing.

I've set aside this week simply to learn the language and the framework, and the sheer amount of magic in Rails is astounding. I'm not entirely sure I like the eRuby template system for views, but the astounding simplicity of ActiveRecord makes the whole thing worth it. The way it manages relationships between tables takes all the work out of SQL management... and some of the methods available for model objects are startlingly useful. Data validation rules make a lot more sense when implemented as a part of a smart model object, rather than being controller-specific.

Then there's the controller logic, which when coupled with RoR's url_for() logic solves the problem I've been facing with Ragnar since the beginning: how to relate URLs to the scripts which interpret them. I've pushed the logic into the XSLT templates and allowed the designer of those to create their logic thus, but that methodology makes it difficult to dynamically generate URLs--they have to be created by the controller and passed to the template as parts of the XML document.

In any case, learning the Rails framework has been a lot of fun, and I'm looking forward to starting the real work next week.

Copyright © 2015 Kyle Kingsbury.
Non-commercial re-use with attribution encouraged; all other rights reserved.
Comments are the property of respective posters.