Pinboard (jm)
https://pinboard.in/u:jm/public/
recent bookmarks from jmHyperBitBit2017-03-24T10:32:13+00:00
https://www.cs.princeton.edu/~rs/talks/AC11-Cardinality.pdf
jmalgorithms programming cs hyperloglog estimation cardinality counting hyperbitbithttps://pinboard.in/https://pinboard.in/u:jm/b:2aaffbaaae74/hyperlogsandwich2015-05-06T16:03:42+00:00
https://github.com/chanian/hyperlogsandwich/wiki
jmA probabilistic data structure for frequency/k-occurrence cardinality estimation of multisets. Sample implementation
(via Patrick McFadin)]]>via:patrickmcfadin hyperloglog cardinality data-structures algorithms hyperlogsandwich counting estimation lossy multisetshttps://pinboard.in/https://pinboard.in/u:jm/b:f340d5f0864e/Redis adds support for HyperLogLog2014-04-02T10:38:41+00:00
https://news.ycombinator.com/item?id=7506774
jmhll bloom-filters hyperloglog redis data-structures estimation cardinality probabilistic probability hashing randomhttps://pinboard.in/https://pinboard.in/u:jm/b:1231febb74e0/Improving compaction in Cassandra with cardinality estimation2014-01-28T22:51:37+00:00
http://www.datastax.com/dev/blog/improving-compaction-in-cassandra-with-cardinality-estimation
jmhyperloglog hll algorithms cassandra bloom-filters sstables cardinalityhttps://pinboard.in/https://pinboard.in/u:jm/b:cca5dc02afd5/Recordinality2013-08-20T20:41:05+00:00
https://github.com/cscotta/recordinality
jmRecordinality is unique in that it provides cardinality estimation like HLL, but also offers "distinct value sampling." This means that Recordinality can allow us to fetch a random sample of distinct elements in a stream, invariant to cardinality. Put more succinctly, given a stream of elements containing 1,000,000 occurrences of 'A' and one occurrence each of 'B' - 'Z', the probability of any letter appearing in our sample is equal. Moreover, we can also efficiently store the number of times elements in our distinct sample have been observed. This can help us to understand the distribution of occurrences of elements in our stream. With it, we can answer questions like "do the elements we've sampled present in a power law-like pattern, or is the distribution of occurrences relatively even across the set?"
]]>sketching coding algorithms recordinality cardinality estimation hll hashing murmurhash javahttps://pinboard.in/https://pinboard.in/u:jm/b:56d75229aca1/Sketch of the Day: K-Minimum Values2013-06-25T21:48:52+00:00
http://blog.aggregateknowledge.com/2012/07/09/sketch-of-the-day-k-minimum-values/
jmalgorithms coding space-saving cardinality streams stream-processing estimation sets sketchinghttps://pinboard.in/https://pinboard.in/u:jm/b:c15a7a7f9322/hlld2013-06-21T10:46:27+00:00
https://github.com/armon/hlld#readme
jma high-performance C server which is used to expose HyperLogLog sets and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached.
HyperLogLog's are a relatively new sketching data structure. They are used to estimate cardinality, i.e. the unique number of items in a set. They are based on the observation that any bit in a "good" hash function is indepedenent of any other bit and that the probability of getting a string of N bits all set to the same value is 1/(2^N). There is a lot more in the math, but that is the basic intuition. What is even more incredible is that the storage required to do the counting is log(log(N)). So with a 6 bit register, we can count well into the trillions. For more information, its best to read the papers referenced at the end. TL;DR: HyperLogLogs enable you to have a set with about 1.6% variance, using 3280 bytes, and estimate sizes in the trillions.
(via:cscotta)]]>hyper-log-log hlld hll data-structures memcached daemons sketching estimation big-data cardinality algorithms via:cscottahttps://pinboard.in/https://pinboard.in/u:jm/b:e287f3096462/Approximate Heavy Hitters -The SpaceSaving Algorithm2013-05-14T20:51:22+00:00
http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/
jmalgorithms coding space-saving cardinality streams stream-processing estimationhttps://pinboard.in/https://pinboard.in/u:jm/b:f8929aee50d5/HyperLogLog++: Google’s Take On Engineering HLL2013-02-09T22:05:54+00:00
http://blog.aggregateknowledge.com/2013/01/24/hyperloglog-googles-take-on-engineering-hll/
jmhyperloglog cardinality estimation streaming stream-processing cephttps://pinboard.in/https://pinboard.in/u:jm/b:42d150efb4ee/clearspring / stream-lib2013-02-09T21:46:02+00:00
https://github.com/clearspring/stream-lib#readme
jmalgorithms coding streams cep stream-processing approximation probabilistic space-saving top-k cardinality estimation bloom-filters q-digest loglog hyperloglog murmurhash lookup3https://pinboard.in/https://pinboard.in/u:jm/b:5ec31bbded7e/aaw/hyperloglog-redis - GitHub2013-01-15T22:44:02+00:00
https://github.com/aaw/hyperloglog-redis#readme
jmcardinality sets redis algorithms ruby gems hyperlogloghttps://pinboard.in/u:jm/b:a5e139c07bf4/