The riak-users list receives regular questions about how to secure a Riak
cluster. This is an overview of the security problem, and some general techniques to approach it.
You can skip this, but it may be a helpful primer.
Consider an application composed of agents (Alice, Bob) and a datastore (Store). All events in the system can be parameterized by time, position (whether the event took place in Alice, Bob, or Store), and the change in state. Of course, these events do not occur arbitrarily; they are connected by causal links (wires, protocols, code, etc.).
If Alice downloads a piece of information from the Store, the two events E
(Store sends information to Alice) and F (Alice receives information from the
Store) are causally connected by the edge EF. The combination of state events with causal connections between them comprises a directed acyclic graph.
A secure system can be characterized as one in which only certain events and
edges are allowed. For example, only after a nuclear war can persons on boats
fire ze missiles.
A system is secure if all possible events and edges fall within the
prescribed set. If you're a weirdo math person you might be getting excited about line graphs and dual spaces and possibly lightcones, but... let's bring this back to earth.
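To make the model concrete, here's a toy sketch of treating security policy as a whitelist of permitted causal edges between events. The agents, actions, and edge set are all made up for illustration:

```python
# A toy version of the model above: events are (agent, action) pairs, and
# the security policy is a whitelist of permitted causal edges between them.
# All names here are illustrative, not from any real system.

ALLOWED_EDGES = {
    (("store", "send_file"), ("alice", "receive_file")),
    (("alice", "send_update"), ("store", "write_update")),
}

def edge_allowed(src, dst):
    """A system is secure iff every edge that can occur is in the allowed set."""
    return (src, dst) in ALLOWED_EDGES

# Alice downloading from the Store is a permitted edge...
assert edge_allowed(("store", "send_file"), ("alice", "receive_file"))
# ...but the Store sending that file to Bob is not.
assert not edge_allowed(("store", "send_file"), ("bob", "receive_file"))
```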
Authentication vs Authorization
Authentication is the process of establishing where these events are taking place, in system space. Is the person or agent on the other end of the TCP socket really Alice? Or is it her nefarious twin? Is it the Iranian government?
Authorization is the problem of deciding what edges are allowed. Can Alice download a particular file? Can Bob mark himself as a publisher?
You can usually solve these problems independently of one another.
Asymmetric cryptography combined with PKI allows you to trust big entities,
like banks with SSL certificates. Usernames with expensively hashed, salted
passwords can verify the repeated identity of a user to a low degree of trust.
OAuth providers (like Facebook and Twitter) and OpenID are other approaches to
web authentication. You can combine these methods with stronger systems, like
RSA SecurID tokens, challenge-response over a second channel (like texting a
code to the user's cell phone), or one-time passwords, for higher guarantees.
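As a concrete illustration of "expensively hashed, salted passwords," here's a minimal sketch using only Python's standard library. It uses PBKDF2 (via hashlib) rather than bcrypt purely to stay dependency-free; the iteration count is illustrative, not a recommendation:

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 200_000):
    """Derive an expensive, salted hash; store the salt and digest, never the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes,
                    iterations: int = 200_000) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Constant-time comparison avoids leaking timing information.
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
assert not verify_password("tr0ub4dor&3", salt, digest)
```

Note that even a scheme like this only verifies "the same secret as last time" -- a low degree of trust, as the text says.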
Authorization tends to be expressed (more or less formally) in code. Sometimes
it's called a policy engine. It includes rules saying things like "Anybody
can download public files", "a given user can read their own messages", and
"only sysadmins can access debugging information".
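Here's what a bare-bones policy engine along those lines might look like. The roles, actions, and rule set are invented for illustration; the important part is the default-deny fallthrough:

```python
# A minimal whitelist policy engine, in the spirit of the rules above.
# Roles and actions are hypothetical.

RULES = [
    # (required role, action); "any" means no role requirement.
    ("any",      "download_public_file"),
    ("owner",    "read_own_messages"),
    ("sysadmin", "read_debug_info"),
]

def authorized(user_roles, action, is_owner=False):
    for role, allowed_action in RULES:
        if allowed_action != action:
            continue
        if role == "any":
            return True
        if role == "owner" and is_owner:
            return True
        if role in user_roles:
            return True
    # Default deny: anything not explicitly allowed is forbidden.
    return False

assert authorized(set(), "download_public_file")
assert authorized({"user"}, "read_own_messages", is_owner=True)
assert not authorized({"user"}, "read_debug_info")
```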
There are a couple of common ways that security can fail. Sometimes the system,
as designed, allows insecure operations. Perhaps a check for user identity is
skipped when accessing a certain type of record, letting users view each
other's paychecks. Other times the abstraction fails; the SSL channel you
presumed to be reliable was tapped, allowing information to flow to an
eavesdropper, or the language runtime allows payloads from the network to be
executed as code. Thus, even if your model (for instance, application code) is
provably correct, it may not be fully secure.
As with all abstractions on unreliable substrates, any guarantees you can make
are probabilistic in nature. Your job is to provide reasonable guarantees
without overwhelming cost (in money, time, or complexity). And these problems are hard.
There are some overall strategies you can use to mitigate these risks. One of
them is known as defense in depth. You use overlapping systems which prevent
insecure things from happening at more than one layer. A firewall prevents
network packets from hitting an internal system, but it's reinforced by an SSL
certificate validation that verifies the identity of connections at the
transport layer.
You can also simplify building secure systems by choosing to whitelist approved
actions, as opposed to blacklisting bad ones. Instead of selecting evil
events and causal links (like Alice stealing sensitive data), you enumerate the
(typically much smaller) set of correct events and edges, deny everything by
default, then design your system to explicitly allow the good ones.
Re-use existing primitives. Standard cryptosystems and protocols exist for
preventing messages from being intercepted, validating the identity of another
party, verifying that a message has not been tampered with or corrupted, and
exchanging sensitive information. A lot of hard work went into designing these systems; please use them.
Create layers. Your system will frequently mediate between an internal
high-trust subsystem (like a database) and an untrusted set of events (e.g. the
internet). Between them you can introduce a variety of layers, each of which
can make stricter guarantees about the safety of the edges between events. In the case of a web service:
- TCP/IP can make a reasonable guarantee that a stream is not corrupted.
- The SSL terminator can guarantee (to a good degree) that the stream of bytes you've received has not been intercepted or tampered with.
- The HTTP stack on top of it can validate that the stream represents a valid HTTP request.
- Your validation layer can verify that the parameters involved are of the correct type and size.
- An authentication layer can prove that the originating request came from a certain agent.
- An authorization layer can check that the operation requested by that person is allowed.
- An application layer can validate that the request is semantically
valid--that it doesn't write a check for a negative amount, or overflow an internal buffer.
- The operation begins.
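The layering above can be sketched as a chain of functions, each of which either rejects the request or passes a stricter, more-trusted value inward. Everything here (the Rejected exception, the toy parsers, the role names) is hypothetical:

```python
class Rejected(Exception):
    """Raised by any layer that refuses to pass the request inward."""

def parse_http(raw: bytes) -> dict:
    # Stand-in for the HTTP stack: reject anything that isn't a valid request.
    try:
        method, path = raw.decode("ascii").split(" ")[:2]
    except (UnicodeDecodeError, ValueError):
        raise Rejected("not valid HTTP")
    return {"method": method, "path": path}

def validate_params(req: dict) -> dict:
    if req["method"] not in ("GET", "PUT"):
        raise Rejected("method not allowed")
    return req

def authenticate(req: dict, user: str) -> dict:
    # Stand-in for real authentication (see the earlier sections).
    req["user"] = user
    return req

def authorize(req: dict) -> dict:
    if req["path"].startswith("/debug") and req["user"] != "sysadmin":
        raise Rejected("forbidden")
    return req

def handle(raw: bytes, user: str) -> dict:
    # Each layer makes a stricter guarantee than the one before it.
    return authorize(authenticate(validate_params(parse_http(raw)), user))

assert handle(b"GET /files/1 HTTP/1.1", "alice")["path"] == "/files/1"
```

By the time a request reaches the innermost layer, every outer layer has vouched for one property of it.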
Minimize trust between discrete systems. Don't relay sensitive information over
channels that are insecure. Force other components to perform their own
authentication/authorization to obtain sensitive data.
Minimize the surface area for attack. Write less code, and offer fewer ways to interact with the system. The fewer pathways are available, the easier they are to reinforce.
Finally, it's worth writing evil tests to experimentally verify the correctness
of your system. Start with the obvious cases and proceed to harder ones. As the
complexity grows, probabilistic methods like QuickCheck or fuzz testing can be
invaluable.
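An "evil test" in this spirit can be as simple as throwing random bytes at a validator and asserting it never accepts bad input or crashes. This sketch fuzzes a hypothetical parse_amount validator using only the stdlib random module:

```python
import random

def parse_amount(s: bytes) -> int:
    """Hypothetical validator: accept only positive decimal integers."""
    text = s.decode("ascii", errors="strict")
    if not text.isdigit():
        raise ValueError("not a number")
    value = int(text)
    if value <= 0:
        raise ValueError("must be positive")
    return value

random.seed(0)  # fixed seed so failures are reproducible
for _ in range(10_000):
    payload = bytes(random.randrange(256) for _ in range(random.randrange(20)))
    try:
        value = parse_amount(payload)
        # If the parser accepted the payload, the invariant must hold.
        assert value > 0
    except (ValueError, UnicodeDecodeError):
        pass  # Rejection is fine; crashes or bad values are not.
```

Dedicated property-based tools like QuickCheck (or Hypothesis, in Python) generate far smarter inputs than this, but the shape of the test is the same.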
Remember those layers of security? Your datastore resides at the very center of
that. In any application which has shared state, your most trusted, validated,
safe data is what goes into the persistence layer. The datastore is the most
trusted component. A secure system isolates that trusted zone with layers of
intermediary security connecting it to the outside world.
Those layers perform the critical task of validating edges between database
events (e.g. store Alice's changes to her user record) and the world at large
(e.g. Alice submits a user update). If your security model is completely open,
you can expose the database directly to the internet. Otherwise, you need code
to ensure these actions are OK.
The database can do some computation. It is, after all, software. Therefore
it can validate some actions. However, the datastore can only discriminate
between actions at the level of its abstraction. That can severely limit its
expressiveness.
For instance, all datastores can choose to allow or deny connections. However, only relational stores can allow or deny actions on the basis of the existence of related records, as with foreign key constraints. Only column-oriented stores can validate actions on the basis of columns, and so forth.
Your security model probably has rules like "Only allow HR employees to read
other employee's salaries" and "Only let IT remove servers". These constructs,
"HR employees", "Salaries", "IT", "remove", and "servers" may not map to the
datastore's abstraction. In a key-value store, "remove" can mean "write a copy
of a JSON document without a certain entry present". The key-value store is
blind to the contents of the value, and hence cannot enforce any security
policies which depend on it.
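To see why the store can't help, here's the JSON example sketched in Python: to the key-value store, "remove a server" is just an opaque write, so any "only IT can remove servers" rule has to live in the application. The document shape and key names are invented:

```python
import json

# To the key-value store, values are opaque bytes. Only the application
# understands that those bytes contain a list of "servers".

def remove_server(doc_bytes: bytes, hostname: str) -> bytes:
    """The application-level meaning of 'remove a server' in a KV store."""
    doc = json.loads(doc_bytes)
    doc["servers"] = [s for s in doc["servers"] if s != hostname]
    return json.dumps(doc).encode()

store = {}  # stand-in for the key-value store: keys mapped to opaque bytes
store["datacenter"] = json.dumps({"servers": ["web1", "web2"]}).encode()

# The application must enforce "only IT can remove servers"; the store
# cannot, because it just sees a write to the "datacenter" key.
store["datacenter"] = remove_server(store["datacenter"], "web2")
assert json.loads(store["datacenter"])["servers"] == ["web1"]
```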
In almost every case, your security model will not be embeddable within the
datastore, and the datastore cannot enforce it for you. You will need to apply
the security model at least partially at a higher level.
Doing this is easy.
Allow only trusted hosts to initiate connections to the database, using
firewall rulesets. Usernames and passwords for database connections typically
provide little additional security, as they're stored in dozens of places
across the production environment. Relying on these credentials or any
authorization policy linked to them (e.g. SQL GRANT) is worthless when you
assume your host, or even client software, has been compromised. The attacker
will simply read these credentials from disk or off the wire, or exploit active connections in software.
On trusted hosts, between the datastore and the outside world, write the
application which enforces your security model. Separate layers into separate
processes and separate hosts, where reasonable. Finally, untrusted hosts
connect these layers to the internet. You can have as many or as few layers as
you like, depending on how strongly you need to guarantee isolation, and how
much cost you can tolerate.
Putting it all together
Let's sell storage in Riak to people, over the web. We'll present the same API
as Riak, over HTTP.
Here's a security model: Only traffic from users with accounts is allowed. Users
can only read and write data from their respective buckets, which are
transparently assigned on write. Also, users should only be able to issue x requests/second, to prevent them from interfering with other users on the cluster.
We're going to presuppose the existence of an account service (perhaps Riak,
MySQL, whatever) which stores account information, and a bucket service that
registers buckets to users.
- Internet. Users connect over HTTPS to an application node.
- The HTTPS server's SSL acceptor decrypts the message and ensures transport validity.
- The HTTP server validates that the request is in fact valid HTTP.
- The authentication layer examines the HTTP AUTH headers for a valid username and password, comparing them to bcrypt-hashed values on the account service.
- The rate limiter checks that this user has not made too many requests recently, and updates the request rate in the account service.
- The Riak validator checks to make sure that the request is a well-formed request to Riak; that it has the appropriate URL structure, accept header, vclock, etc. It constructs a new HTTP request to forward on to Riak.
- The bucket validator checks with the bucket service to see if the bucket to be used is taken. If it is, it verifies that the current authenticated user matches the bucket owner. If it isn't, it registers the bucket.
- The application node relays the request over the network to a Riak node.
- Riak nodes are allowed by the firewall to talk only to application nodes. The Riak node executes the request and returns a response.
- The response is immediately returned to the client.
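The rate-limiting step in that pipeline could be sketched as a token bucket per user. This toy version keeps its state in process memory and takes an injectable clock so it can be tested deterministically; in the architecture above, the counters would live in the account service instead:

```python
import time

class TokenBucket:
    """Allow roughly `rate` requests/second, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, now=time.monotonic):
        self.rate, self.burst, self.now = rate, burst, now
        self.tokens, self.last = burst, now()

    def allow(self) -> bool:
        t = self.now()
        # Refill tokens for the time elapsed, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# With a fake clock, behavior is deterministic: two burst tokens, then a
# refusal, then a refill one second later.
clock = iter([0.0, 0.0, 0.0, 0.0, 1.0]).__next__
bucket = TokenBucket(rate=1.0, burst=2.0, now=clock)
assert bucket.allow()      # burst token 1
assert bucket.allow()      # burst token 2
assert not bucket.allow()  # bucket empty
assert bucket.allow()      # one second later, refilled
```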
Naturally, this only works for certain operations. MapReduce, for instance,
executes code in Riak. Exposing it to the internet is asking for trouble.
That's why we need a Riak validation layer to ensure the request is acceptable;
it can allow only puts and gets.
I hope this gives you some idea of how to architect secure applications. Apologies for the shoddy editing--I don't have time for a second pass right now and wanted to get this out the door. Questions and suggestions in the comments, please! :-)