In reply to myself: it seems that upgrading from Java 6 (pre-installed on my Mac) to Java 8 (from Oracle) has improved the speed of the multi-threaded version.
Luke Worth, on
I’m trying to figure out exercise 3. Currently I have:

(time (let [a (future (sum 0 (/ 1e8 2)))
            b (future (sum (/ 1e8 2) 1e8))]
        (+ @a @b)))

but this has roughly the same runtime as just

(time (let [a (sum 0 (/ 1e8 2))
            b (sum (/ 1e8 2) 1e8)]
        (+ a b)))

What have I done wrong?
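For context, here is a self-contained sketch of the experiment in the comment above. The sum here is a hypothetical stand-in for the chapter's definition (which isn't shown in this thread), and the range is scaled down to 1e7 so it runs quickly:

```clojure
;; A stand-in for the chapter's sum: adds every number in [start, end).
(defn sum [start end]
  (reduce + (range start end)))

;; Sequential: both halves run on the calling thread.
(time (+ (sum 0 (/ 1e7 2))
         (sum (/ 1e7 2) 1e7)))

;; Parallel: each half runs in its own future (its own thread), and
;; deref (@) blocks until the result is ready. With the JVM scheduling
;; the two threads on separate cores, this should take roughly half
;; the wall-clock time of the sequential version.
(time (let [a (future (sum 0 (/ 1e7 2)))
            b (future (sum (/ 1e7 2) 1e7))]
        (+ @a @b)))
```

If both versions report about the same time, the futures aren't actually running in parallel; the other comment in this thread found that moving from Java 6 to Java 8 fixed exactly that.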
Dom, on
I didn’t read the article because it was about a subject matter I know nothing about and have zero interest in. I really, really enjoyed the gifs though, so there’s that.
Tina smith, on
Can you please show me all the rooms? If not, thanks anyway. -Tina-
Simon Massey, on
+1 for Datomic next. That would be a Clojure head-to-head!
NG, on
Do you have any updates since 2.6 was released? Have you run the tests against the latest?
Martin Grotzke, on
Great article, thanks! I’d also love to read a test of Solr!

Also getting a 404 on http://aphyr.com/data/posts/294/link (papers on isolation and consistency).
I’d be interested to see some content included on how to build proper applications from the ground up. This tweet of yours pointing to Stuart Sierra’s work comes to mind: https://twitter.com/aphyr/status/450023818778533888
Mike, on
Thanks for making this series! I’m a novice trying to learn distributed systems and clojure, this blog is awesome. Two quick suggestions for the getting started section:
I’m on Ubuntu 14.04 and this works for me:

which javacc
curl -O https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein

And, just forgot to say: thanks for this series of articles on Clojure! Learned a lot from them.
Hi Kyle,

I think your solution given above might be missing a default value for results:

(defn my-filter [f coll]
  (reduce (fn [results x]
            (if (f x)
              (conj results x)
              results))
          []
          coll))

Waiting eagerly. Thanks, and appreciate the flawless explanation.
Bennie Kahler-Venter, on
I’d really be interested to see how a Splunk cluster holds up to network partitions. I suspect it will break really badly…
Rangel Spasov, on
Great stuff! All articles so far have been one of the best resources for learning Clojure from the ground up!
Dmitriy, on
Hi! Thanks for the great effort you’re putting into writing this book! Do you accept donations?
Or, on
Wow, this chapter really steps up the level of awesomeness, I’m really looking forward to the rest. Thank you so much!
Bruno Kim, on
Phew, just finished reading all of the Jepsen series and I feel both stupider and wiser: feels good to know what I don’t know. I’ll forward some episodes to my Distributed Computing teacher; you hit the right spot of theory and practice (and Clojure!) and I expect more students to be interested in your work.
ajay, on
Thank you for this great tutorial.
anonymous, on
Is there a table of contents page that I’ve missed?
Right up top, at the tags list: http://aphyr.com/tags/Clojure-from-the-ground-up. I’ll be reformatting all this work for the book, eventually, where there’ll be actual chapter headings, etc. Had a bunch of other stuff on my plate recently but starting to get back to writing on CFTGU this month. :)
Aphyr, on
Also in parts of the article you refer to “primary”, when I think you mean “master”.
Jepsen uses “primary” to refer consistently to privileged replicas in each distributed system. I try to avoid the use of “master/slave” in this context because some folks find it offensive.
Tom, on
Am greatly appreciating this tutorial. Actually, after the second definition, astronauts has 2 entries.
Aphyr, on
This is looking west towards Three Sisters, in eastern Oregon.
Michael South, on
The database may be consistent, but the system isn’t. A concurrent request to the db will get the answer “yes, the transaction has committed”, but the same request of the remote client gets “no, the transaction has not yet committed.” The system may eventually become consistent, if the partition is healed and the acknowledgement reaches the client. But it isn’t consistent until that point.
And the client can’t just wait indefinitely for acknowledgement: the commit request may not have reached the server, in which case the client would deadlock forever. Not to mention practical concerns (a customer and clerk aren’t going to wait very long for a credit card transaction to complete). Introducing timeouts then causes the temporary inconsistency to become permanent.
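The bind described above can be sketched in a few lines of Clojure: a future stands in for an in-flight commit whose acknowledgement is delayed past our patience. The commit! function and its sleep are purely illustrative, not any real client API:

```clojure
;; Illustrative only: pretend to send a commit across a partitioned
;; network. The server may well apply it, but the acknowledgement is
;; stuck behind the partition for far longer than we're willing to wait.
(defn commit! []
  (future
    (Thread/sleep 2000) ; ack delayed well past our 100 ms timeout
    :committed))

;; Wait up to 100 ms for the ack, then give up.
(def outcome (deref (commit!) 100 :unknown))

;; Note that :unknown is not :failed: the transaction may have
;; committed server-side. Timing out turns a temporary ambiguity into
;; a permanent one, unless we can later read the result back.
(println outcome) ; prints :unknown
```

The three-argument deref is doing the work here: it returns the timeout value when the future hasn't delivered in time, which is exactly the "we don't know" state a timed-out client is left in.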
Aphyr, on
http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/ According to this benchmark with YCSB (Yahoo! Cloud Serving Benchmark), MongoDB, HBase, and Cassandra show no problem with writing 50 million files. How hard are the benchmarks you used in these tests?
These tests are trivial; usually writing only a few hundred or thousand records over the course of a few minutes. This suggests that YCSB tells us more about ideal performance than real-world correctness.
Sergey Maskalik, on
Fascinating, I’ve learned a huge amount about distributed databases and CAP from this post! Thank you
It looks like etcd has merged their fix for this issue, a parameter quorum=true: https://github.com/coreos/etcd/pull/866

This seems to be making a mountain out of a molehill. When a transaction is “in doubt” (after the commit has been sent, but before it’s been ack’d or failed), there isn’t a violation of consistency. The test is whether concurrent requests get the same answer. It takes time to get that answer, and in the case of a network partition that affects us, it could take a very long time. But if the answer eventually comes, we can verify that it was consistent by comparing with other requests at the same time. The fact that we don’t know yet doesn’t mean that it will fail the test once we can perform it.
Roman Shaposhnik, on
Great article and even greater blog! Any chance you can take SolrCloud up for a spin?
minh, on
http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/

According to this benchmark with YCSB (Yahoo! Cloud Serving Benchmark), MongoDB, HBase, and Cassandra show no problem with writing 50 million files. How hard are the benchmarks you used in these tests? -minh
Simon MacMullen, on
This is not going to be as detailed or clear as I’d like, I’m afraid; I’m stupidly busy at the moment. But it’s probably worth saying something! First of all, thank you for a well-written and well-researched article.
Some individual points now:
pause_minority mode does not shut down fast enough in any released version of RabbitMQ. It just initiates an orderly shutdown, and only does that after making an attempt to reconnect to the majority (which can take some time if TCP connections are timing out). This is fixed in the nightly builds: pause_minority mode will now kill all network connections in a minority as soon as the minority is detected, to reduce the extent to which we might make promises which can’t be kept, and then start an orderly shutdown.
Of course, reducing the window for failure is not the same as eliminating it altogether. Queue merging on partition recovery is an approach that could be used here. It’s a relatively involved task though so I would not want to promise anything soon.
It’s a medium-to-long-term aspiration to have some sort of HA-ish queue in federation, since it’s clear that people want some sort of replication that works over unreliable links. Of course, quite what the precise semantics of such an eventually-consistent queue would look like is To Be Determined…
Gorgeous!