Riemann: Breaking the 10k barrier

2012-02-18

When I designed UState, I had a goal of a thousand state transitions per second. I hit about six hundred on my MacBook Pro, and skirted 1000/s on real hardware. EventMachine is good, but I started to bump up against concurrency limits in MRI’s interpreter lock, my ability to generate and exchange SQL with SQLite, and protobuf parse times. So I set out to write a faster server. I chose Clojure for its expressiveness and powerful model of concurrent state, and more importantly for the JVM, which gets me Netty, a mature virtual machine with a decent thread model, and a wealth of fast libraries for parsing, state, and statistics. That project is called Riemann.

Today, I’m pleased to announce that Riemann crossed the 10,000 events/second mark in production. In fact, it’s skirting 11k in my stress tests. (That final drop in throughput is an artifact of the graph system showing partially-complete data.)


Highway One

Motorcycle

2011-12-15


It Boggles the Mind

Software Security

2011-11-08

Microsoft released this little gem today, fixing a bug which allowed remote code execution on all Windows Vista, 7, and Server 2008 versions.

...allow remote code execution if an attacker sends a continuous flow of specially crafted UDP packets to a closed port on a target system.

Meanwhile, in an aging supervillain’s cavernous lair…


Why is the RAM always gone?

Software Funny

2011-10-21


Endian-ness

Software Funny

2011-10-21


Do not expose Riak to the internet

Software Security Riak Databases

2011-10-19

Major thanks to John Muellerleile (@jrecursive) for his help in crafting this.

Actually, don’t expose pretty much any database directly to untrusted connections. You’re begging for denial-of-service issues; even if the operations are semantically valid, they’re running on a physical substrate with real limits.

Riak, for instance, exposes mapreduce over its HTTP API. Mapreduce is code; code which can have side effects; code which is executed on your cluster. This is an attacker’s dream.
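To make "mapreduce is code" concrete, here is a minimal sketch of what a one-phase JavaScript map job looks like as an HTTP request. It assumes Riak's default HTTP port (8098) and the /mapred endpoint; the bucket name "users", the localhost address, and the helper name build_mapreduce_request are made up for illustration. It only builds the request and never sends it.

```python
import json

# Assumption: a node listening on Riak's default HTTP port with the
# /mapred endpoint enabled. The host is a placeholder.
RIAK_MAPRED_URL = "http://127.0.0.1:8098/mapred"


def build_mapreduce_request(bucket, map_source):
    """Describe (but do not send) an HTTP mapreduce job.

    map_source is arbitrary JavaScript supplied by the caller. Whoever can
    reach the endpoint decides what code runs on the cluster.
    """
    body = json.dumps({
        "inputs": bucket,
        "query": [
            {"map": {"language": "javascript", "source": map_source}},
        ],
    })
    return {
        "method": "POST",
        "url": RIAK_MAPRED_URL,
        "headers": {"Content-Type": "application/json"},
        "body": body,
    }


# Hypothetical bucket; the map function just returns each object's key.
request = build_mapreduce_request("users", "function(v) { return [v.key]; }")
print(request["body"])
```

Nothing in that body is special: the source string is whatever the client chooses, which is why the endpoint should only be reachable from hosts you trust.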


Riak-pipe mapreduce

Software Riak

2011-10-07

As a part of the exciting series of events (long story…) around our Riak cluster this week, we switched over to riak-pipe mapreduce. Usually, when a node is down, mapreduce times shoot through the roof, which causes slow behavior and even timeouts on the API. Riak-pipe changes that: our API latency for mapreduce-heavy requests like feeds and comments fell from 3-7 seconds to a stable 600ms. Still high, but at least tolerable.

[Update] I should also mention that riak-pipe MR throws about a thousand apparently random, recoverable errors per day. Things like map_reduce_error with no explanation in the logs, or {"lineno":466,"message":"SyntaxError: syntax error","source":"()"} when the source is definitely not "()". Still haven’t figured out why, but it seems vaguely node-dependent.


Oracle, on NoSQL

Software NoSQL Oracle

2011-10-03

Do you really want to be contributing to an open source effort? ... Don't be risking your data on NoSQL databases.

Says the company which is scheduling talks around Oracle NoSQL at its OpenWorld conference.

[Edit] Their whitepaper on Oracle NoSQL DB is a hilarious inversion of the above.


Systems Security: A Primer

Software Security Riak

2011-10-03

The riak-users list receives regular questions about how to secure a Riak cluster. This is an overview of the security problem, and some general techniques to approach it.

Theory

You can skip this, but it may be a helpful primer.


Progressive House

Funny

2011-10-01


It's always DNS's fault

Software Operations

2011-09-29

One of the hard-won lessons of the last few weeks has been that inexplicable periodic latency jumps in network services should be met with an investigation into named.

API latency has been wonky the last couple of weeks; for a few hours it will rise to roughly 5 to 10x normal, then drop again. Nothing in syslog, no connection table issues, ip stats didn’t reveal any TCP/IP layer difficulties, network was solid, no CPU, memory, or disk contention, no obviously correlated load on other hosts. Turns out it was BIND getting overwhelmed (we have, er, nontrivial DNS demands) and causing local domain resolution to slow down. For now I’m just pushing everything out in /etc/hosts, but will probably drop a local bind9 on every host as a cache.
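One cheap way to check whether name resolution is the culprit is to time lookups from the affected host. The following is a rough sketch, not what we actually ran: the function names and the hostname are hypothetical, and it goes through the system resolver via socket.getaddrinfo, so /etc/hosts entries and any OS-level caches will influence the numbers.

```python
import socket
import statistics
import time


def time_lookups(hostname, attempts=5, resolver=socket.getaddrinfo):
    """Return one wall-clock duration in milliseconds per lookup attempt."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolver(hostname, None)
        except socket.gaierror:
            # A failed lookup still cost time, so it is recorded too.
            pass
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations):
    """Reduce a list of durations to the two numbers worth graphing."""
    return {
        "median_ms": statistics.median(durations),
        "max_ms": max(durations),
    }


if __name__ == "__main__":
    # Hypothetical internal hostname; substitute one your services resolve.
    print(summarize(time_lookups("db1.internal.example")))
```

Run periodically and graphed next to API latency, a jump in the median that lines up with the slow periods points at the resolver rather than the network or the hosts themselves.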


Scaling at Showyou

Software Showyou Operations

2011-09-27

John Muellerleile, Phil Kulak, and I gave a talk tonight, entitled “Scaling at Showyou.”

I gave an overview of the Showyou architecture, including our use of Riak, Solr, and Redis; strategies for robust systems; and our comprehensive monitoring system. You may want to check out:
