Jose Monreal
Jose Monreal, on

Aphyr, what am I doing wrong? I’m in the jepsen directory inside the Docker container tjake/jepsen, and running this:

$ lein test jepsen.system.etcd-test
Retrieving org/clojure/tools.nrepl/0.2.6/tools.nrepl-0.2.6.pom from central
Retrieving clojure-complete/clojure-complete/0.2.3/clojure-complete-0.2.3.pom from clojars
Retrieving org/clojure/tools.nrepl/0.2.6/tools.nrepl-0.2.6.jar from central
Retrieving clojure-complete/clojure-complete/0.2.3/clojure-complete-0.2.3.jar from clojars
WARN ignoring checkouts directory knossos as it does not contain a project.clj file.
Exception in thread "main" java.io.FileNotFoundException: Could not locate jepsen/system/etcd_test__init.class or jepsen/system/etcd_test.clj on classpath: , compiling:(/tmp/form-init7535662337139038973.clj:1:72)
    at clojure.lang.Compiler.load(Compiler.java:7142)
    at clojure.lang.Compiler.loadFile(Compiler.java:7086)
    at clojure.main$load_script.invoke(main.clj:274)
    at clojure.main$init_opt.invoke(main.clj:279)
    at clojure.main$initialize.invoke(main.clj:307)
    at clojure.main$null_opt.invoke(main.clj:342)
    at clojure.main$main.doInvoke(main.clj:420)
    at clojure.lang.RestFn.invoke(RestFn.java:421)
    at clojure.lang.Var.invoke(Var.java:383)
    at clojure.lang.AFn.applyToHelper(AFn.java:156)
    at clojure.lang.Var.applyTo(Var.java:700)
    at clojure.main.main(main.java:37)
Caused by: java.io.FileNotFoundException: Could not locate jepsen/system/etcd_test__init.class or jepsen/system/etcd_test.clj on classpath:
    at clojure.lang.RT.load(RT.java:443)
    at clojure.lang.RT.load(RT.java:411)
    at clojure.core$load$fn__5066.invoke(core.clj:5641)
    at clojure.core$load.doInvoke(core.clj:5640)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at clojure.core$load_one.invoke(core.clj:5446)
    at clojure.core$load_lib$fn__5015.invoke(core.clj:5486)
    at clojure.core$load_lib.doInvoke(core.clj:5485)
    at clojure.lang.RestFn.applyTo(RestFn.java:142)
    at clojure.core$apply.invoke(core.clj:626)
    at clojure.core$load_libs.doInvoke(core.clj:5524)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clojure.core$apply.invoke(core.clj:626)
    at clojure.core$require.doInvoke(core.clj:5607)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clojure.core$apply.invoke(core.clj:626)
    at user$eval85.invoke(form-init7535662337139038973.clj:1)
    at clojure.lang.Compiler.eval(Compiler.java:6703)
    at clojure.lang.Compiler.eval(Compiler.java:6693)
    at clojure.lang.Compiler.load(Compiler.java:7130)
    ... 11 more
Tests failed

Any help would be appreciated.

Jose Monreal
Jose Monreal, on

I see, so there should be something similar to what etcd does, right? Thanks!

Aphyr
Aphyr, on

Looks like the person who wrote the Consul test didn’t include any automation for installing Consul, so it’s running whatever you install on the nodes yourself.

Shaposhnikov Alexander
Shaposhnikov Alexander, on

Indeed, it is time to test Postgres once again, now for real.

Jose Monreal
Jose Monreal, on

I’m running the tests using the docker folder you have in the code. How can I check the version of Consul it’s using? How can I make it use the latest version?

JimD
JimD, on

Nitpick: “Data is sharded and balanced between servers” should be “among servers.” Technically “between” should only be used in reference to sets of two. For any more than that the term “among” should be preferred.

Jean-Francois Contour
Jean-Francois Contour, on

It won’t, but I was thinking about a health check from the load balancer that would require a green status from the cluster.

Paul
Paul, on

Got to love the Jepsen microscope

Aphyr
Aphyr, on

I am wondering: if we have 3 dedicated master nodes, and a load balancer (BIG-IP) in front of them configured in active/passive mode (VIP), and all the indexing clients using that VIP, do you think we would still lose documents? And do you see any major drawback with this topology? Thanks for your great job.

How is the load balancer going to prevent Elasticsearch nodes from electing new primaries when one appears to die?

Jean-Francois Contour
Jean-Francois Contour, on

I am wondering: if we have 3 dedicated master nodes, and a load balancer (BIG-IP) in front of them configured in active/passive mode (VIP), and all the indexing clients using that VIP, do you think we would still lose documents? And do you see any major drawback with this topology? Thanks for your great job.

Aphyr
Aphyr, on

Kafka happened a while back, and NSQ isn’t really distributed; clients have to build their own dedup between independent queues.
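A minimal sketch of what that client-side dedup might look like in Clojure (the names and message shape here are mine, not NSQ’s API; it assumes every message carries a unique id):

;; Hypothetical dedup across independent queues: track seen ids in a
;; shared atom. swap-vals! returns [old new] atomically, so two
;; consumers can't both claim the same id.
(defn handle-once
  "Calls handler on msg unless its :id has been seen before."
  [seen handler {:keys [id] :as msg}]
  (let [[before _] (swap-vals! seen conj id)]
    (when-not (contains? before id)
      (handler msg))))

;; Usage: one atom fed from multiple queue connections.
(def seen-ids (atom #{}))
(handle-once seen-ids println {:id 42, :body "hello"}) ; handled
(handle-once seen-ids println {:id 42, :body "hello"}) ; dropped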

Jennifer Groff
Jennifer Groff, on

It was a wonderful chance to visit this kind of site and I am happy to know it. Thank you so much for giving us a chance to have this opportunity! I will be back soon for updates.

Jose Monreal
Jose Monreal, on

Do you still have this code to test both etcd and Consul again?

Tedwed
Tedwed, on

Educational post! The analysis is very detailed and there is a lot to know about this. Keep sharing!

neurogenesis
neurogenesis, on

+1 for other message queue services (ActiveMQ cluster vs. network of brokers, NSQ, Kafka).

Aphyr
Aphyr, on

Can you run ElasticSearch 1.5 through this gauntlet to see whether there has been any progress?

https://aphyr.com/tags/Elasticsearch

Koolaid
Koolaid, on

Can you run ElasticSearch 1.5 through this gauntlet to see whether there has been any progress?

JNM
JNM, on

Ever thought of taking a look at TIBCO’s ActiveSpaces? It’s immediately consistent, ACID, k/v + queries with indexing and pretty fast…

Dinesh
Dinesh, on

(defn pow
  "Raises base to the given power. For instance, (pow 3 2) returns
  three squared, or nine."
  [base power]
  (apply * (repeat base power)))

This should be (repeat power base)

(repeat 2 3) ; => (3 3)
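For the record, the corrected definition would be (formatting mine): repeat yields base, power times, and multiplying those copies gives the right answer.

;; Repeat base, power times, then multiply.
(defn pow
  "Raises base to the given power. For instance, (pow 3 2) returns
  three squared, or nine."
  [base power]
  (apply * (repeat power base)))

(pow 3 2) ; => 9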

Aphyr
Aphyr, on

While testing, how many replicas were configured?

Aerospike’s default is 2 replicas, but it doesn’t matter how many you use: the conflict resolution algorithm is unsafe for all values of n because it spins up fallback replicas on both sides of a partition.
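To sketch why no replica count helps (a toy model of my own, not Aerospike’s code): once each side of the partition promotes its own fallback replicas for a key, both sides accept writes, and the conflict exists before resolution ever runs.

;; Toy model: a 5-node cluster splits; each side ends up with live
;; "replicas" for key k, so each side accepts a conflicting write,
;; regardless of the configured replication factor n.
(def side-a {:nodes #{:n1 :n2 :n3}, :replicas-for-k #{:n1 :n2}})
(def side-b {:nodes #{:n4 :n5},     :replicas-for-k #{:n4 :n5}})

(defn accepts-writes? [side] (boolean (seq (:replicas-for-k side))))

(map accepts-writes? [side-a side-b]) ; => (true true): divergent copies of k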

Also, is my understanding correct: when it is stated that “increasing timeouts would… satisfy”, does it mean that the cluster does not assume the network is partitioned for smaller timeouts, and hence the system functions with a higher latency to respond?

Naw, it’ll still lose data–I’m just talking about availability there.

VJ
VJ, on

While testing, how many replicas were configured? For example, in the test for CAS, what was the number of replicas? Also, is my understanding correct: when it is stated that “increasing timeouts would… satisfy”, does it mean that the cluster does not assume the network is partitioned for smaller timeouts, and hence the system functions with a higher latency to respond?

Aphyr
Aphyr, on

So perhaps the actions of a single user don’t form a single process or perhaps I’m missing something :) I hope you can shed some light on your example.

I’m playing somewhat fast and loose here because caches, new sessions, incognito mode, DNS, etc. can break session guarantees, but assuming sessions hold, a user sees their Youtube video go through three states:

  1. Not existing
  2. Processing
  3. Watchable

And other users see:

  1. Not existing
  2. Watchable

Because both hold the same order, this kind of system is sequentially consistent. You’re right though, that if a user refreshed the page and did not see their video, we’d have violated sequential consistency–this invariant requires that we thread causal state through all kinds of layers, from the human’s brain through their browser, the network, CDN, load balancers, caches, app servers, databases, etc.
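A small Clojure sketch of that argument (the state names and helper are mine, not from any real system): sequential consistency just asks that every process’s observations embed, in order, into one total order.

;; The single total order both observers agree with:
(def total-order [:not-existing :processing :watchable])

(defn in-order?
  "Do the observed states appear, in sequence, within the total order?"
  [observed total]
  (let [position (zipmap total (range))]
    (apply < (map position observed))))

(in-order? [:not-existing :processing :watchable] total-order) ; uploader: true
(in-order? [:not-existing :watchable] total-order)             ; other users: true
(in-order? [:watchable :not-existing] total-order)             ; crossed lines: false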

Aphyr
Aphyr, on

How is that possible, given that we are considering execution in a concurrent environment?

Serializable histories are equivalent, under some transformation, to a singlethreaded history; the singlethreaded history is where the state-transforming function we discussed comes into play. If you take, say, the identity function as your model, every possible path through the history is valid, leaving the state unchanged. In order for a history to ever not be serializable, you have to declare some equivalent-singlethreaded-executions invalid, which is why the conditionals force a partial order.
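A toy illustration of that point (my own names, not Knossos’s API): with the identity function as the model’s transition function, every ordering of a history leaves the state valid, so nothing can ever fail to serialize.

;; A model is a function of [state op] -> state', or nil if op is
;; invalid in that state. A history serializes iff some ordering of
;; its ops is valid under the model.
(defn valid-order?
  [model init-state ops]
  (reduce (fn [state op]
            (or (model state op) (reduced nil)))
          init-state ops))

;; The identity model accepts every op and never changes state:
(defn identity-model [state _op] state)

(valid-order? identity-model {:x 0} [:read :write :cas]) ; => {:x 0}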

Peter
Peter, on

Hi Aphyr,

Thanks for your excellent document. It makes extremely complicated material a lot easier to digest.

A remark about the YouTube example. My problem is the following: for sequential consistency, program order needs to be obeyed. And I assume, for discussion’s sake, that the actions of a single user form a single process.

If a user would place a video on the queue (an action earlier in program order) and then would display the webpage (an action later in program order) and he would not see his video, then this is a violation of program order. To put it in your terms, ‘lines have crossed’.

So perhaps the actions of a single user don’t form a single process or perhaps I’m missing something :) I hope you can shed some light on your example.

Yaniv Shemesh
Yaniv Shemesh, on

Thanks, Leif!

Jennifer Groff
Jennifer Groff, on

I recently came across your blog and have been reading along. I thought I would leave my first comment. I don’t know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.

Aphyr
Aphyr, on

How easy would this be in languages with named parameters with default values?

Named parameters usually mean wrapper functions have to explicitly pass down every argument to the function they wrap, instead of just passing it a single map and not having to care about the contents. I’ve found that pattern makes refactoring a long, involved process and unduly couples functions which I don’t think should have to care about each other’s arguments… but some people prefer it.
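A contrived Clojure sketch of the contrast (all the names here are mine): the wrapper threads an opts map straight through, so adding a new option to connect* needs no change to the wrapper.

;; connect* interprets the options; the wrapper never looks inside them.
(defn connect*
  [host {:keys [port timeout] :or {port 5432, timeout 1000}}]
  {:host host, :port port, :timeout timeout})

(defn connect-with-retries
  "Retries connect*, passing opts through untouched."
  [host opts retries]
  (or (try (connect* host opts)
           (catch Exception _ nil))
      (when (pos? retries)
        (recur host opts (dec retries)))))

(connect-with-retries "db1" {:timeout 500} 3)
;; => {:host "db1", :port 5432, :timeout 500}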

Aphyr
Aphyr, on

The problem with maps is that if they’re statically typed, then the values must be homogeneously typed

This is not a problem with maps; this is a problem with type systems that assume maps are homogeneous. Some type systems can represent heterogeneous maps. Take, for instance, this core.typed type from Knossos, which represents a map for tracking statistics. The type Stats is a map containing a key :extant-worlds which must be an AtomicLong, a key :skipped-worlds which must be a Metric, and so on.

(defalias Stats
  "Statistics for tracking analyzer performance"
  (HMap :mandatory {:extant-worlds  AtomicLong
                    :skipped-worlds Metric
                    :visited-worlds Metric}))

Christian
Christian, on

How easy would this be in languages with named parameters with default values?

Wizard wiz = Wizard("some string", priority=1, mode=SomeEnum, enableFoo=true, enableBar=false)

Leif
Leif, on

Yaniv,

TokuMX only fixes the data-loss problems in the mongo replication design (by tying acknowledgements to election terms, à la Raft). TokuMX doesn’t fix the stale/dirty reads problem.

Mongo clients just don’t expect to verify that the server they’re talking to is actually the primary; without changing clients or adding a lot of server-side logic to proxy for clients, I don’t think this is fixable in the replication design mongo uses and TokuMX inherited.
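Roughly, the Raft-style fix works like this (a hand-wavy sketch, not TokuMX’s actual code): a write only counts as majority-acknowledged when the acks carry the same election term it was proposed under, so a deposed primary can never collect a valid majority.

;; An ack is {:node n, :term t}. Only acks matching the write's term count.
(defn majority-acked?
  [write-term acks cluster-size]
  (let [same-term (filter #(= write-term (:term %)) acks)]
    (> (count same-term) (quot cluster-size 2))))

(majority-acked? 5 [{:node "a", :term 5} {:node "b", :term 5}] 3) ; => true
(majority-acked? 5 [{:node "a", :term 5} {:node "b", :term 6}] 3) ; => false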

Juan
Juan, on

Hi Kyle,

Thanks for answering my points. I have read the link to the linearizability paper, pretty interesting.

Having re-read your article, I now see where you are coming from. At the time, I thought your issue was that you weren’t sure where the 0 came from and that you thought it was some kind of “garbage”. Because that value was perfectly explainable based on how I understand Mongo to work, I got a bit confused.

I do agree with you that the way minority partitions work at the moment means that Mongo is not linearizable. With SERVER-18022 finally fixed, your idea for a minority primary seeking quorum for reads sounds like a great approach to deal with this problem. Particularly if this could be configured at connection and query level, so clients could select their consistency level based on their use case.

Let’s see what Mongo does with SERVER-17975; I have added my vote to it.

Thanks again, Juan

Aphyr
Aphyr, on

As it stands, and from the contents of the article alone, you cannot prove that your conclusions are correct; they are just one of the possibilities, along with others, including some that support that Mongo is working as expected.

You may want to refresh yourself with the notion of linearizability. Mongo’s histories are not equivalent to any singlethreaded history, regardless of which nodes are contacted.

In a partition situation, the minority partition will go into read-only mode and won’t accept writes, but would accept reads.

This is an asynchronous network, a non-realtime environment, and we have no synchronized clocks. Mongo has no way to reliably detect a partition, let alone enter a read-only mode. Besides, Mongo, as far as I can tell, never refuses a write which is valid on the local node–even at write concern majority it’s more than happy to commit to an isolated primary, even if it won’t acknowledge the operation to the client.

Mongo is normally CP, but during a partition the minority partition(s) behave more like AP.

I suggest you re-read the CAP theorem: by allowing stale and dirty reads, Mongo does not provide C. By refusing to service requests on some nodes during a partition, it does not provide A.

These are the kind of trade-offs that you have to take when choosing a DB for your use case.

No, they really aren’t. Mongo’s chosen the lower availability of a linearizable system, and the less useful consistency of a totally-available system. It’s, well, strictly speaking not the worst of both worlds, but definitely non-optimal, haha.

reading a stale value is more desirable in a lot of use cases than getting an error.

That’s true, which is why I devoted a significant portion of the article to showing where Mongo’s documentation claims stale reads will not occur.
