Pinboard (jm)
https://pinboard.in/u:jm/public/
Recent bookmarks from jm.

Apache Helix (2021-07-30)
https://github.com/apache/helix
Tags: zookeeper helix sharding scalability scaling via:kishorebytes partitioning architecture

How to do distributed locking (2016-02-09)
http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
Tags: distributed locking redis algorithms coding distcomp redlock martin-kleppman zookeeper

Holistic Configuration Management at Facebook (2015-10-21)
http://blog.acolyer.org/2015/10/16/holistic-configuration-management-at-facebook/
Tags: facebook configuration zookeeper git ops architecture

librato/disco-java (2015-10-12)
https://github.com/librato/disco-java
Tags: zookeeper service-discovery librato java open-source load-balancing

Please stop calling databases CP or AP (2015-05-14)
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
In his excellent blog post [...] Jeff Hodges recommends that you use the CAP theorem to critique systems. A lot of people have taken that advice to heart, describing their systems as “CP” (consistent but not available under network partitions), “AP” (available but not consistent under network partitions), or sometimes “CA” (meaning “I still haven’t read Coda’s post from almost 5 years ago”).
I agree with all of Jeff’s other points, but with regard to the CAP theorem, I must disagree. The CAP theorem is too simplistic and too widely misunderstood to be of much use for characterizing systems. Therefore I ask that we retire all references to the CAP theorem, stop talking about the CAP theorem, and put the poor thing to rest. Instead, we should use more precise terminology to reason about our trade-offs.
Tags: cap databases storage distcomp ca ap cp zookeeper consistency reliability networking

The Discovery of Apache ZooKeeper's Poison Packet - PagerDuty (2015-05-08)
http://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/
Tags: zookeeper bugs error-handling bounds-checking oom poison-packets pagerduty packets tcpdump xen aes linux kernel

Pinterest's highly-available configuration service (2015-03-09)
http://engineering.pinterest.com/post/112895488589/serving-configuration-data-at-scale-with-high
Tags: s3 zookeeper ha pinterest config storage

How Curator fixed issues with the Hive ZooKeeper Lock Manager Implementation (2015-02-25)
https://www.mapr.com/blog/how-refine-hive-zookeeper-lock-manager-implementation#.VO38VlOsWQx
Apache Curator is open-source software which handles all of the above scenarios transparently. Curator is a Netflix ZooKeeper library that provides a high-level API, CuratorFramework, which simplifies using ZooKeeper. By using a singleton CuratorFramework instance in the new ZooKeeperHiveLockManager implementation, we not only fixed the ZooKeeper connection issues, but also made the code easier to understand and maintain.
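A minimal sketch of that pattern, assuming a local ZooKeeper and an illustrative lock path: one shared CuratorFramework instance per process, with Curator's InterProcessMutex recipe standing in for hand-rolled lock znodes.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class HiveLockSketch {
    // One shared CuratorFramework per process, as the post recommends;
    // connect string and lock path are illustrative.
    private static final CuratorFramework CLIENT = CuratorFrameworkFactory
            .newClient("localhost:2181", new ExponentialBackoffRetry(1000, 3));

    public static void main(String[] args) throws Exception {
        CLIENT.start();
        // The lock recipe handles connection loss and retries internally.
        InterProcessMutex lock = new InterProcessMutex(CLIENT, "/hive/locks/some-table");
        lock.acquire();
        try {
            // ... do the work that requires the table lock ...
        } finally {
            lock.release();
        }
        CLIENT.close();
    }
}
```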
Tags: zookeeper apis curator netflix distributed-locks coding hive

Why You Shouldn’t Use ZooKeeper for Service Discovery (2014-12-18)
https://www.knewton.com/tech/blog/2014/12/eureka-shouldnt-use-zookeeper-service-discovery/
In CAP terms, ZooKeeper is CP, meaning that it’s consistent in the face of partitions, not available. For many things that ZooKeeper does, this is a necessary trade-off. Since ZooKeeper is first and foremost a coordination service, having an eventually consistent design (being AP) would be a horrible design decision. Its core consensus algorithm, Zab, is therefore all about consistency. For coordination, that’s great. But for service discovery it’s better to have information that may contain falsehoods than to have no information at all. It is much better to know what servers were available for a given service five minutes ago than to have no idea what things looked like due to a transient network partition. The guarantees that ZooKeeper makes for coordination are the wrong ones for service discovery, and it hurts you to have them.
Yes! I've been saying this for months -- good to see others concurring.
Tags: architecture zookeeper eureka outages network-partitions service-discovery cap partitions

Zookeeper: not so great as a highly-available service registry (2014-11-04)
http://ispyker.blogspot.ie/2013/12/zookeeper-as-cloud-native-service.html
I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances. This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones. What I saw was that the two other instances noticed the first server “going away”, but they continued to function as they still saw a majority (66%). More interestingly the first instance noticed the other two servers “going away”, dropping the ensemble availability to 33%. This caused the first server to stop serving requests to clients (not only writes, but also reads).
So: within that offline AZ, service discovery *reads* (as well as writes) stopped working due to a lack of ZK quorum. This is quite a feasible outage scenario for EC2, by the way, since (at least when I was working there) the network links between AZs, and the links with the external internet, were not 100% overlapping.
In other words, if you want a highly-available service discovery system in the face of network partitions, you want an AP service discovery system rather than a CP one -- and ZK is a CP system.
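The arithmetic behind that experiment is plain majority quorum; a toy sketch of the rule (not ZooKeeper's actual code):

```java
public class QuorumCheck {
    // Majority-quorum rule as ZooKeeper applies it: a server keeps serving
    // only while it can see strictly more than half of the ensemble.
    static boolean hasQuorum(int visibleServers, int ensembleSize) {
        return visibleServers > ensembleSize / 2;
    }

    public static void main(String[] args) {
        System.out.println(hasQuorum(2, 3)); // true: the majority side keeps serving
        System.out.println(hasQuorum(1, 3)); // false: the isolated node stops serving reads AND writes
    }
}
```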
Another risk, noted on the Netflix Eureka mailing list at https://groups.google.com/d/msg/eureka_netflix/LXKWoD14RFY/tA9UnerrBHUJ :
ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessarily consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.
I guess this means that a long partition can trigger SESSION_EXPIRED state, resulting in ZK client libraries requiring a restart/reconnect to fix. I'm not entirely clear what happens to the ZK cluster itself in this scenario though.
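On the client side, a hedged sketch of how an application might observe those transitions using Curator's connection-state listener (connect string illustrative; roughly, Curator signals SUSPENDED on a dropped connection and LOST once it considers the session gone):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SessionWatchSketch {
    public static void main(String[] args) {
        CuratorFramework client = CuratorFrameworkFactory
                .newClient("localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.getConnectionStateListenable().addListener((c, newState) -> {
            if (newState == ConnectionState.SUSPENDED) {
                // Connection dropped but the session may still be alive: hold off on writes.
            } else if (newState == ConnectionState.LOST) {
                // Session expired (e.g. after a long partition): ephemeral nodes are
                // gone, so any service registrations must be recreated on reconnect.
            }
        });
        client.start();
    }
}
```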
Finally, Pinterest ran into other issues relying on ZK for service discovery and registration, described at http://engineering.pinterest.com/post/77933733851/zookeeper-resilience-at-pinterest ; sounds like this was mainly around load and the "thundering herd" overload problem. Their workaround was to decouple ZK availability from their services' availability, by building a Smartstack-style sidecar daemon on each host which tracked/cached ZK data.
Tags: zookeeper service-discovery ops ha cap ap cp service-registry availability ec2 aws network partitions eureka smartstack pinterest

Building a Global, Highly Available Service Discovery Infrastructure with ZooKeeper (2014-05-13)
http://whilefalse.blogspot.ie/2012/12/building-global-highly-available.html?m=1
This is the written version of a presentation [Camille Fournier] made at the ZooKeeper Users Meetup at Strata/Hadoop World in October, 2012 (slides available here). This writeup expects some knowledge of ZooKeeper.
Good advice from one of the ZK committers.
Tags: zookeeper service-discovery architecture distcomp camille-fournier availability wan network

ZooKeeper Resilience at Pinterest (2014-03-04)
http://engineering.pinterest.com/post/77933733851/zookeeper-resilience-at-pinterest
Tags: ops architecture clustering network partitions cap reliability smartstack airbnb pinterest zookeeper

Answer to How many topics (queues) can be created in Apache Kafka? - Quora (2014-03-02)
http://www.quora.com/How-many-topics-queues-can-be-created-in-Apache-Kafka/answers/4288247?srid=kw&share=1
'As far as I understand (this was true as of 2013, when I last looked into this issue) there's at least one Apache ZooKeeper znode per topic in Kafka. While there is no hard limitation in Kafka itself (Kafka is linearly scalable), it does mean that the maximum number of znodes comfortably supported by ZooKeeper (on the order of about ten thousand) is the upper limit of Kafka's scalability as far as the number of topics goes.'
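Kafka keeps one child znode per topic under /brokers/topics, so a rough topic count is just a child listing; a sketch with the raw ZooKeeper client (connect string illustrative):

```java
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class TopicCountSketch {
    public static void main(String[] args) throws Exception {
        // Connect string is illustrative; /brokers/topics is Kafka's standard layout.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        List<String> topics = zk.getChildren("/brokers/topics", false);
        System.out.println(topics.size() + " topics registered in ZooKeeper");
        zk.close();
    }
}
```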
Tags: kafka queues zookeeper znodes architecture

Apache Curator (2014-01-30)
http://curator.apache.org/
Tags: zookeeper netflix apache curator java libraries open-source

Replicant: Replicated State Machines Made Easy (2013-12-28)
http://hackingdistributed.com/2013/12/26/introducing-replicant/
The next time you reach for ZooKeeper, ask yourself whether it provides the primitive you really need. If ZooKeeper's filesystem and znode abstractions truly meet your needs, great. But the odds are, you'll be better off writing your application as a replicated state machine.
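For illustration only (this is the generic shape of the approach, not Replicant's actual API): consensus fixes a command order, and each replica applies the same commands to a deterministic state machine.

```java
// Generic replicated-state-machine shape, NOT Replicant's API: the consensus
// layer (Paxos/Raft/Zab) agrees on a command sequence, and every replica
// applies that sequence to a deterministic state machine.
interface StateMachine<C, R> {
    R apply(C command); // must be deterministic: same command order -> same state
}

// Example: a replicated counter. Once the log order is agreed,
// every replica's counter converges to the same value.
class Counter implements StateMachine<Integer, Long> {
    private long value = 0;
    public Long apply(Integer delta) {
        value += delta;
        return value;
    }
}
```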
Tags: zookeeper paxos replicant replication consensus state-machines distcomp

etcd (2013-08-03)
https://github.com/coreos/etcd
A highly-available key-value store for shared configuration and service discovery. etcd is inspired by ZooKeeper and Doozer, with a focus on:
Simple: curl'able user facing API (HTTP+JSON);
Secure: optional SSL client cert authentication;
Fast: benchmarked 1000s of writes/s per instance;
Reliable: Properly distributed using Raft;
etcd is written in Go and uses the Raft consensus algorithm to manage a highly available replicated log.
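Since the user-facing API is plain HTTP+JSON, any HTTP client can exercise it; a sketch with Java's built-in HttpClient (the key, port, and v2-style keys path are illustrative and vary across etcd versions):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EtcdSketch {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        // Set a key (v2-style keys API; path and port vary by etcd version).
        HttpRequest put = HttpRequest.newBuilder()
                .uri(URI.create("http://127.0.0.1:2379/v2/keys/services/web"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .PUT(HttpRequest.BodyPublishers.ofString("value=10.0.0.5:8080"))
                .build();
        System.out.println(http.send(put, HttpResponse.BodyHandlers.ofString()).body());
        // Read it back as JSON.
        HttpRequest get = HttpRequest.newBuilder()
                .uri(URI.create("http://127.0.0.1:2379/v2/keys/services/web"))
                .build();
        System.out.println(http.send(get, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```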
One of the core components of CoreOS -- http://coreos.com/
Tags: configuration distributed raft ha doozer zookeeper go replication consensus-algorithm etcd coreos

Netflix Curator (2013-03-06)
https://github.com/Netflix/curator/wiki/Framework
A high-level API that greatly simplifies using ZooKeeper. It adds many features that build on ZooKeeper and handles the complexity of managing connections to the ZooKeeper cluster and retrying operations. Some of the features are:
Automatic connection management: There are potential error cases that require ZooKeeper clients to recreate a connection and/or retry operations. Curator automatically and transparently (mostly) handles these cases.
Cleaner API: simplifies the raw ZooKeeper methods, events, etc.; provides a modern, fluent interface
Recipe implementations (see Recipes): Leader election, Shared lock, Path cache and watcher, Distributed Queue, Distributed Priority Queue
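For example, the leader-election recipe reduces to a few calls; a minimal sketch using the LeaderLatch recipe (shown with the later org.apache.curator package names, since the Netflix project moved to Apache; paths and connect string illustrative):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory
                .newClient("localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();
        // Each participant creates a latch on the same path; one becomes leader.
        LeaderLatch latch = new LeaderLatch(client, "/election/my-service");
        latch.start();
        latch.await(); // blocks until this instance acquires leadership
        try {
            // ... do leader-only work while latch.hasLeadership() ...
        } finally {
            latch.close(); // relinquish leadership
            client.close();
        }
    }
}
```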
Tags: zookeeper java netflix distcomp libraries oss open-source distributed

Curator Framework: Reducing the Complexity of Building Distributed Systems | Marketing Technology (2013-03-06)
http://www.optify.net/marketing-technology/curator-framework-reducing-the-complexity-of-building-distributed-systems
Tags: zookeeper curator netflix oss libraries distributed

Monitoring Apache Hadoop, Cassandra and Zookeeper using Graphite and JMXTrans (2013-03-06)
http://techo-ecco.com/blog/monitoring-apache-hadoop-cassandra-and-zookeeper-using-graphite-and-jmxtrans/
Tags: graphite monitoring ops zookeeper cassandra hadoop jmx jmxtrans graphs

Building an Impenetrable ZooKeeper (PDF) (2012-11-13)
https://raw.github.com/strangeloop/strangeloop2012/master/slides/sessions/Ting-BuildingAnImpenetrableZooKeeper.pdf
Tags: via:bill-dehora zookeeper ops syadmin

Autometrics: Self-service metrics collection (2012-02-16)
http://engineering.linkedin.com/52/autometrics-self-service-metrics-collection
Tags: kafka zookeeper linkedin sysadmin service-metrics