Aphyr, what am I doing wrong? I’m at the jepsen directory inside the docker container tjake/jepsen, and running this:
$ lein test jepsen.system.etcd-test
Retrieving org/clojure/tools.nrepl/0.2.6/tools.nrepl-0.2.6.pom from central
Retrieving clojure-complete/clojure-complete/0.2.3/clojure-complete-0.2.3.pom from clojars
Retrieving org/clojure/tools.nrepl/0.2.6/tools.nrepl-0.2.6.jar from central
Retrieving clojure-complete/clojure-complete/0.2.3/clojure-complete-0.2.3.jar from clojars
WARN ignoring checkouts directory knossos as it does not contain a project.clj file.
Exception in thread "main" java.io.FileNotFoundException: Could not locate jepsen/system/etcd_test__init.class or jepsen/system/etcd_test.clj on classpath: , compiling:(/tmp/form-init7535662337139038973.clj:1:72)
at clojure.lang.Compiler.load(Compiler.java:7142)
at clojure.lang.Compiler.loadFile(Compiler.java:7086)
at clojure.main$load_script.invoke(main.clj:274)
at clojure.main$init_opt.invoke(main.clj:279)
at clojure.main$initialize.invoke(main.clj:307)
at clojure.main$null_opt.invoke(main.clj:342)
at clojure.main$main.doInvoke(main.clj:420)
at clojure.lang.RestFn.invoke(RestFn.java:421)
at clojure.lang.Var.invoke(Var.java:383)
at clojure.lang.AFn.applyToHelper(AFn.java:156)
at clojure.lang.Var.applyTo(Var.java:700)
at clojure.main.main(main.java:37)
Caused by: java.io.FileNotFoundException: Could not locate jepsen/system/etcd_test__init.class or jepsen/system/etcd_test.clj on classpath:
at clojure.lang.RT.load(RT.java:443)
at clojure.lang.RT.load(RT.java:411)
at clojure.core$load$fn__5066.invoke(core.clj:5641)
at clojure.core$load.doInvoke(core.clj:5640)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.core$load_one.invoke(core.clj:5446)
at clojure.core$load_lib$fn__5015.invoke(core.clj:5486)
at clojure.core$load_lib.doInvoke(core.clj:5485)
at clojure.lang.RestFn.applyTo(RestFn.java:142)
at clojure.core$apply.invoke(core.clj:626)
at clojure.core$load_libs.doInvoke(core.clj:5524)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at clojure.core$apply.invoke(core.clj:626)
at clojure.core$require.doInvoke(core.clj:5607)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at clojure.core$apply.invoke(core.clj:626)
at user$eval85.invoke(form-init7535662337139038973.clj:1)
at clojure.lang.Compiler.eval(Compiler.java:6703)
at clojure.lang.Compiler.eval(Compiler.java:6693)
at clojure.lang.Compiler.load(Compiler.java:7130)
… 11 more
Tests failed
Any help would be appreciated.
Jose Monreal, on
I see, so there should be something similar to what etcd does, right?
Thanks!
Aphyr, on
Looks like the person who wrote the Consul test didn’t include any automation for installing Consul, so it’s running whatever you install on the nodes yourself.
Shaposhnikov Alexander, on
Indeed, it is time to test postgres once again, now for real
Jose Monreal, on
I’m running the tests using the docker folder you have in the code.
How can I check the version of Consul that it’s using? How can I make it use the latest version?
JimD, on
Nitpick: “Data is sharded and balanced between servers” should be “among servers.” Technically “between” should only be used in reference to sets of two. For any more than that the term “among” should be preferred.
Jean-Francois Contour, on
It won’t, but I was thinking about a health check from the load balancer that would require a green status from the cluster.
Paul, on
Got to love the Jepsen microscope
Aphyr, on
I am wondering: if we have 3 dedicated Master Nodes, and a load balancer (big IP) in front of them configured in active/passive mode (VIP), with all the indexing clients using that VIP, do you think we would still lose documents? And do you see any major drawback with this topology? Thanks for your great job.
How is the load balancer going to prevent Elasticsearch nodes from electing new primaries when one appears to die?
Jean-Francois Contour, on
I am wondering: if we have 3 dedicated Master Nodes, and a load balancer (big IP) in front of them configured in active/passive mode (VIP), with all the indexing clients using that VIP.
Do you think we would still lose documents? And do you see any major drawback with this topology?
Thanks for your great job.
Aphyr, on
Kafka happened a while back, and NSQ isn’t really distributed; clients have to build their own dedup between independent queues.
Jennifer Groff, on
It was a wonderful chance to visit this kind of site and I am happy to know. thank you so much for giving us a chance to have this opportunity! I will be back soon for updates.
Jose Monreal, on
Do you still have this code to test again both etcd and consul?
Tedwed, on
Educational post! The involved analysis is very knowledgeable and a lot to know about this. Keep sharing here!
neurogenesis, on
+1 for other message queue services (ActiveMQ cluster vs. network of brokers, NSQ, Kafka).
Aphyr, on
Can you run ElasticSearch 1.5 through this gauntlet to understand if there was any progress.
https://aphyr.com/tags/Elasticsearch
JNM, on
Ever thought of taking a look at TIBCO’s ActiveSpaces? It’s immediately consistent, ACID, k/v + queries with indexing and pretty fast…
Dinesh, on
(defn pow
"Raises base to the given power. For instance, (pow 3 2) returns three squared, or nine."
[base power]
(apply * (repeat base power)))
This should be (repeat power base), since (repeat 2 3) evaluates to (3 3).
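For reference, a corrected definition along the lines Dinesh suggests (a sketch, not code from the post):
(defn pow
  "Raises base to the given power. For instance, (pow 3 2) returns three squared, or nine."
  [base power]
  ;; repeat the base, power times, then multiply
  (apply * (repeat power base)))
(pow 3 2) ; => 9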
Aphyr, on
While testing, how many replicas were configured?
Aerospike’s default is 2 replicas, but it doesn’t matter how many you use: the conflict resolution algorithm is unsafe for all values of n because it spins up fallback replicas on both sides of a partition.
Also, is my understanding correct that when it is stated that “increasing timeouts would… satisfy”, it means the cluster does not assume the network is partitioned for smaller timeouts, and hence the system functions with a higher latency to respond?
Naw, it’ll still lose data–I’m just talking about availability there.
VJ, on
While testing, how many replicas were configured? For example, in the test for CAS, what was the number of replicas?
Also, is my understanding correct that when it is stated that “increasing timeouts would… satisfy”, it means the cluster does not assume the network is partitioned for smaller timeouts, and hence the system functions with a higher latency to respond?
Aphyr, on
So perhaps the actions of a single user don’t form a single process or perhaps I’m missing something :) I hope you can shed some light on your example.
I’m playing somewhat fast and loose here because caches, new sessions, incognito mode, DNS, etc. can break session guarantees, but assuming sessions hold, a user sees their YouTube video go through three states:
Not existing
Processing
Watchable
And other users see:
Not existing
Watchable
Because both hold the same order, this kind of system is sequentially consistent. You’re right though, that if a user refreshed the page and did not see their video, we’d have violated sequential consistency–this invariant requires that we thread causal state through all kinds of layers, from the human’s brain through their browser, the network, CDN, load balancers, caches, app servers, databases, etc.
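A toy Clojure sketch of that “same order” idea (mine, not from the post): each observer’s view is compatible with sequential consistency here if it is a subsequence of one total order of states.
(def total-order [:not-existing :processing :watchable])

(defn subsequence?
  "True if xs appears, in order, within ys."
  [xs ys]
  (loop [xs xs, ys ys]
    (cond (empty? xs) true
          (empty? ys) false
          (= (first xs) (first ys)) (recur (rest xs) (rest ys))
          :else (recur xs (rest ys)))))

(subsequence? [:not-existing :processing :watchable] total-order) ; uploader's view => true
(subsequence? [:not-existing :watchable] total-order)             ; other users' view => true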
Aphyr, on
How is that possible, given that we are considering execution in a concurrent environment?
Serializable histories are equivalent, under some transformation, to a singlethreaded history; the singlethreaded history is where the state-transforming function we discussed comes into play. If you take, say, the identity function as your model, every possible path through the history is valid, leaving the state unchanged. In order for a history to ever not be serializable, you have to declare some equivalent singlethreaded executions invalid, which is why the conditionals force a partial order.
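A minimal sketch of that idea (assumed shapes, not Knossos’s actual API): with a step function that never rejects an operation, no candidate singlethreaded ordering of a history can ever be ruled out.
(defn step
  "Identity model: ignores every operation and never returns :invalid."
  [state op]
  state)

(defn valid-order?
  "Folds one candidate singlethreaded ordering of a history through the model;
   false only if the model ever returns :invalid."
  [initial-state history]
  (not= :invalid
        (reduce (fn [state op]
                  (let [state' (step state op)]
                    (if (= state' :invalid) (reduced :invalid) state')))
                initial-state
                history)))

;; With the identity step above, (valid-order? {} any-history) is always true,
;; so every ordering is admissible and no history can fail to serialize.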
Peter, on
Hi Aphyr,
thanks for your excellent document. It makes extremely complicated material a lot easier to digest.
A remark about the YouTube example. My problem is the following: for sequential consistency, program order needs to be obeyed. And I assume, for discussion’s sake, that the actions of a single user form a single process.
If a user would place a video on the queue (an action earlier in program order) and then would display the webpage (an action later in program order) and he would not see his video, then this is a violation of program order. To put it in your terms, ‘lines have crossed’.
So perhaps the actions of a single user don’t form a single process or perhaps I’m missing something :) I hope you can shed some light on your example.
Yaniv Shemesh, on
Thanks, Leif!
Jennifer Groff, on
I recently came across your blog and have been reading along. I thought I would leave my first comment. I don’t know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.
Aphyr, on
How easy would this be in languages with named parameters with default values?
Named parameters usually mean wrapper functions have to explicitly pass down every argument to the function they wrap, instead of just passing it a single map and not having to care about the contents. I’ve found that pattern makes refactoring a long, involved process and unduly couples functions which I don’t think should have to care about each other’s arguments… but some people prefer it.
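A rough illustration with hypothetical functions (not anything from the post): a wrapper that forwards a single options map doesn’t need to know, or repeat, the wrapped function’s argument list.
(defn connect
  "Hypothetical: connects using an options map like {:host … :port … :timeout …}."
  [opts]
  (str "connecting to " (:host opts) ":" (:port opts)))

(defn connect-with-logging
  "Uses none of the options itself; it just forwards the whole map."
  [opts]
  (println "connecting with" opts)
  (connect opts))

(connect-with-logging {:host "n1" :port 4001 :timeout 500})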
Aphyr, on
The problem with maps is that if they’re statically typed, then the values must be homogeneously typed
This is not a problem with maps; this is a problem with type systems that assume maps are homogeneous. Some type systems can represent heterogeneous maps. Take, for instance, this core.typed type from Knossos, which represents a map for tracking statistics. The type Stats is a map containing a key :extant-worlds which must be an AtomicLong, a key :skipped-worlds which must be a Metric, and so on.
(defalias Stats
  "Statistics for tracking analyzer performance"
  (HMap :mandatory {:extant-worlds  AtomicLong
                    :skipped-worlds Metric
                    :visited-worlds Metric}))
Christian, on
How easy would this be in languages with named parameters with default values?
Wizard wiz = Wizard("some string", priority=1, mode=SomeEnum, enableFoo=true, enableBar=false)
Aphyr, on
Yaniv,
TokuMX only fixes the data loss problems in the Mongo replication design (by tying acknowledgements to election terms, a la Raft). TokuMX doesn’t fix the stale/dirty reads problem.
Mongo clients just don’t expect to verify that the server they’re talking to is actually the primary; without changing clients or adding a lot of server-side logic to proxy for clients, I don’t think this is fixable in the replication design mongo uses and TokuMX inherited.
Juan, on
Hi Kyle,
Thanks for answering my points. I have read the link to the linearizability paper, pretty interesting.
Having then re-read your article clarifies where you are coming from. At the time I thought that your issue was that you weren’t sure of where the 0 came from and that you thought it was some kind of “garbage”. Because that value was perfectly explainable based on how I understand Mongo to work I got a bit confused.
I do agree with you that the way that minority partitions work at the moment means that Mongo is not linearizable. With SERVER-18022 finally fixed, your idea for a minority primary seeking quorum for reads sounds like a great approach to deal with this problem. Particularly if this could be configured at connection and query level, so clients could select their consistency level based on their use case.
Let’s see what Mongo does with SERVER-17975, I have added my vote to it.
Thanks again,
Juan
Aphyr, on
As it stands and from the contents of the article alone, you cannot prove that your conclusions are correct, they are just one of the possibilities, along with other ones including some that support that mongo is working as expected.
You may want to refresh yourself with the notion of linearizability. Mongo’s histories are not equivalent to any singlethreaded history, regardless of which nodes are contacted.
In a partition situation, the minority partition will go into read only mode and won’t accept writes but would accept reads.
This is an asynchronous network, a non-realtime environment, and we have no synchronized clocks. Mongo has no way to reliably detect a partition, let alone enter a read-only mode. Besides, Mongo, as far as I can tell, never refuses a write which is valid on the local node–even at write concern majority it’s more than happy to commit to an isolated primary, even if it won’t acknowledge the operation to the client.
Mongo is normally CP, but during a partition the minority partition(s) behave more like AP.
I suggest you re-read the CAP theorem. By allowing stale and dirty reads, Mongo does not provide C. By refusing to service requests on some nodes during a partition, it does not provide A.
These are the kind of trade-offs that you have to take when choosing a DB for your use case.
No, they really aren’t. Mongo’s chosen the lower availability of a linearizable system, and the less useful consistency of a totally-available system. It’s, well, strictly speaking not the worst of both worlds, but definitely non-optimal, haha.
reading a stale value is more desirable in a lot of use cases than getting an error.
That’s true, which is why I devoted a significant portion of the article to showing where Mongo’s documentation claims stale reads will not occur.