Pinboard (jm)
https://pinboard.in/u:jm/public/
recent bookmarks from jm

Managing Log Output in Buildkite
2021-09-23T14:35:26+00:00
https://buildkite.com/docs/pipelines/managing-log-output#collapsing-output
jm
logging cli hacks buildkite streams
https://pinboard.in/u:jm/b:a1a18fc1571c/

Top speed for top-k queries
2017-06-22T09:57:38+00:00
http://lemire.me/blog/2017/06/21/top-speed-for-top-k-queries/
jm
algorithms benchmarks performance top-k streams streaming quickselect binary-heap priority-queue
https://pinboard.in/u:jm/b:ff9c53473f3f/

Kafka Streams - Scaling up or down
2016-10-13T10:58:32+00:00
http://aseigneurin.github.io/2016/10/07/kafka-streams-scaling-up-or-down.html
jm
scaling scalability architecture kafka streams ops
https://pinboard.in/u:jm/b:dc796f3ac598/

Hyperscan
2015-10-21T14:33:51+00:00
https://github.com/01org/hyperscan
jm
a high-performance multiple regex matching library. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.
Via Tony Finch
via:fanf regexps regex dpi hyperscan dfa nfa hybrid-automata text-matching matching text strings streams
https://pinboard.in/u:jm/b:f41962a90f1b/

GZinga
2015-10-11T07:32:43+00:00
http://www.ebaytechblog.com/2015/10/09/gzinga-seekable-and-splittable-gzip/
jm
ebay gzip compression seeking streams splitting logs gzinga
https://pinboard.in/u:jm/b:a5222dbc677d/

Evolution of Babbel’s data pipeline on AWS: from SQS to Kinesis
2015-09-08T12:56:09+00:00
http://bytes.babbel.com/en/articles/2015-09-01-aws-data-pipeline.html
jm
Our new data pipeline with Kinesis in place allows us to plug new consumers without causing any damage to the current system, so it’s possible to rewrite all Queue Workers one by one and replace them with Kinesis Workers. In general, the transition to Kinesis was smooth and there were not so tricky parts.
Another outcome was significantly reduced costs – handling almost the same amount of data as SQS, Kinesis appeared to be many times cheaper than SQS.
aws kinesis kafka streaming data-pipelines streams sqs queues architecture kcl
https://pinboard.in/u:jm/b:7e9df1f932f5/

Mining High-Speed Data Streams: The Hoeffding Tree Algorithm
2015-08-27T13:10:34+00:00
http://blog.acolyer.org/2015/08/26/mining-high-speed-data-streams/
jm
This paper proposes a decision tree learner for data streams, the Hoeffding Tree algorithm, which comes with the guarantee that the learned decision tree is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. This work constitutes a significant step in developing methodology suitable for modern ‘big data’ challenges and has initiated a lot of follow-up research. The Hoeffding Tree algorithm has been covered in various textbooks and is available in several public domain tools, including the WEKA Data Mining platform.
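The guarantee quoted above rests on the Hoeffding bound: after n observations of a variable with range R, the observed mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability 1 - delta. A minimal sketch of how a tree learner might use it to decide when to split a leaf follows; the function names, and the simplification of comparing only the top two candidate attributes, are assumptions for illustration and not code from the paper or from WEKA.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that the observed mean of n samples is within epsilon
    of the true mean with probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Hypothetical helper: split once the gap between the two best
    attributes exceeds the Hoeffding bound for the examples seen so far."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # The bound shrinks as more examples arrive, so a fixed gap in gain
    # eventually becomes large enough to justify a split.
    for n in (100, 1_000, 10_000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4),
              should_split(0.30, 0.20, 1.0, 1e-7, n))
```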
hoeffding-tree algorithms data-structures streaming streams cep decision-trees ml learning papers
https://pinboard.in/u:jm/b:c955f284bd21/

"last seen" sketch
2015-07-15T16:56:10+00:00
https://vividcortex.com/blog/2015/06/22/sampling-a-stream-of-events-with-a-sketch/
jm
sketch algorithms estimation approximation sampling streams big-data
https://pinboard.in/u:jm/b:1d2591ab4ead/

OG-Commons/Guavate.java
2015-04-12T21:59:05+00:00
https://github.com/OpenGamma/OG-Commons/blob/master/modules/collect/src/main/java/com/opengamma/collect/Guavate.java
jm
guava java-8 java fluentiterable streams fluent coding
https://pinboard.in/u:jm/b:c0189036549d/

A collection of links for streaming algorithms and data structures
2015-04-07T11:19:55+00:00
https://gist.github.com/debasishg/8172796
jm
algorithms streaming big-data streams hll probabilistic data-structures frequency counting sketches cuckoo-filters bloom-filters minhash count-min
https://pinboard.in/u:jm/b:e5eb4d3418ce/

Reactive Programming for a demanding world
2015-03-31T09:53:30+00:00
http://www.slideshare.net/mariofusco/reactive-programming-for-a-demanding-world-building-eventdriven-and-responsive-applications-with-rxjava
jm
rxjava rx reactive coding backpressure streams observables
https://pinboard.in/u:jm/b:587b4918634e/

The official REST Proxy for Kafka
2015-03-25T22:08:12+00:00
http://blog.confluent.io/2015/03/25/a-comprehensive-open-source-rest-proxy-for-kafka/
jm
The REST Proxy is an open source HTTP-based proxy for your Kafka cluster. The API supports many interactions with your cluster, including producing and consuming messages and accessing cluster metadata such as the set of topics and mapping of partitions to brokers. Just as with Kafka, it can work with arbitrary binary data, but also includes first-class support for Avro and integrates well with Confluent’s Schema Registry. And it is scalable, designed to be deployed in clusters and work with a variety of load balancing solutions.
We built the REST Proxy first and foremost to meet the growing demands of many organizations that want to use Kafka, but also want more freedom to select languages beyond those for which stable native clients exist today. However, it also includes functionality beyond traditional clients, making it useful for building tools for managing your Kafka cluster. See the documentation for a more detailed description of the included features.
kafka rest proxies http confluent queues messaging streams architecture
https://pinboard.in/u:jm/b:4c70bf63df1b/

[KAFKA-1555] provide strong consistency with reasonable availability
2014-10-17T13:46:10+00:00
https://issues.apache.org/jira/browse/KAFKA-1555
jm
kafka replication cap consistency streams
https://pinboard.in/u:jm/b:4be67d45034f/

"Quantiles on Streams" [paper, 2009]
2014-10-08T09:49:58+00:00
http://www.cs.ucsb.edu/~suri/psdir/ency.pdf
jm
latency percentiles coding quantiles streams papers algorithms
https://pinboard.in/u:jm/b:662952253d16/

Collection Pipeline
2014-07-28T13:02:16+00:00
http://martinfowler.com/articles/collection-pipeline/
jm
martin-fowler patterns coding ruby clojure streams pipelines pipes unix lambda fp java languages
https://pinboard.in/u:jm/b:3294217f4efc/

Spark Streaming
2014-05-16T21:35:38+00:00
http://spark.apache.org/docs/latest/streaming-programming-guide.html#overview
jm
an extension of the core Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ or plain old TCP sockets and be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s in-built machine learning algorithms, and graph processing algorithms on data streams.
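To make the map / reduce / window vocabulary above concrete, here is a small plain-Python sketch of a windowed word count over micro-batches. It does not use the Spark API at all; the function name and the batch representation (a list of text lines per batch) are assumptions chosen only for illustration.

```python
from collections import Counter, deque
from typing import Dict, Iterable, Iterator, List


def windowed_word_counts(batches: Iterable[List[str]],
                         window_length: int) -> Iterator[Dict[str, int]]:
    """Yield word counts over the most recent `window_length` micro-batches."""
    window = deque(maxlen=window_length)  # oldest batch falls out automatically
    for batch in batches:
        # "map": split lines into words; "reduce": count words within the batch
        window.append(Counter(word for line in batch for word in line.split()))
        # "window": combine the per-batch counts still inside the window
        total = Counter()
        for counts in window:
            total.update(counts)
        yield dict(total)


if __name__ == "__main__":
    stream = [["a b"], ["a"], ["c"]]
    for counts in windowed_word_counts(stream, window_length=2):
        print(counts)
```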
spark streams stream-processing cep scalability apache machine-learning graphs
https://pinboard.in/u:jm/b:62c1e3c0e756/

Sketch of the Day – Frugal Streaming
2013-09-16T21:24:35+00:00
http://blog.aggregateknowledge.com/2013/09/16/sketch-of-the-day-frugal-streaming/
jm
memory streaming stream-processing clever algorithms hacks streams
https://pinboard.in/u:jm/b:4a29ca0d196f/

Sketch of the Day: K-Minimum Values
2013-06-25T21:48:52+00:00
http://blog.aggregateknowledge.com/2012/07/09/sketch-of-the-day-k-minimum-values/
jm
algorithms coding space-saving cardinality streams stream-processing estimation sets sketching
https://pinboard.in/u:jm/b:c15a7a7f9322/

Approximate Heavy Hitters -The SpaceSaving Algorithm
2013-05-14T20:51:22+00:00
http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/
jm
algorithms coding space-saving cardinality streams stream-processing estimation
https://pinboard.in/u:jm/b:f8929aee50d5/

good blog post on histogram-estimation stream processing algorithms
2013-02-21T11:24:12+00:00
http://walfield.org/blog/2010/09/04/data-stream-processing.html
jm
After reviewing several dozen papers, a score or so in depth, I identified two data structures that appear to enable us to answer these recency and frequency queries: exponential histograms (from "Maintaining Stream Statistics Over Sliding Windows" by Datar et al.) and waves (from "Distributed Streams Algorithms for Sliding Windows" by Gibbons and Tirthapura). Both of these data structures are used to solve the so-called counting problem, the problem of determining, with a bound on the relative error, the number of 1s in the last N units of time. In other words, the data structures are able to answer the question: how many 1s appeared in the last n units of time within a factor of Error (e.g., 50%). The algorithms are neat, so I'll present them briefly.
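The counting problem described above can be sketched with a simplified exponential histogram. The version below keeps at most two buckets of each power-of-two size, which bounds the relative error at 50%; the Datar et al. structure generalises this to tighter error bounds by allowing more buckets per size. The class and method names are hypothetical, and this is an illustrative sketch rather than the blog post's or the paper's code.

```python
class SlidingOnesCounter:
    """Approximate count of 1s in the last `window` time units."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.time = 0
        # Buckets are (timestamp of newest 1 in bucket, size), newest first.
        self.buckets = []

    def add(self, bit: bool) -> None:
        self.time += 1
        # Drop buckets whose newest 1 has slid out of the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if not bit:
            return
        self.buckets.insert(0, (self.time, 1))
        # Whenever three buckets share a size, merge the two oldest of them
        # into one bucket of double size; the merge may cascade upwards.
        i = 0
        while i + 2 < len(self.buckets):
            a, b, c = self.buckets[i:i + 3]
            if a[1] == b[1] == c[1]:
                self.buckets[i + 1:i + 3] = [(b[0], b[1] * 2)]
            i += 1

    def estimate(self) -> int:
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only part of the oldest bucket may still be inside the window,
        # so count half of it.
        return total - self.buckets[-1][1] // 2


if __name__ == "__main__":
    counter = SlidingOnesCounter(window=16)
    for _ in range(100):
        counter.add(True)
    print(counter.estimate())  # true count is 16; estimate is within 50%
```

Memory use grows with the logarithm of the window length rather than with the window itself, which is the point of the technique.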
streams streaming stream-processing histograms percentiles estimation waves statistics algorithms
https://pinboard.in/u:jm/b:55bc9e866719/

Distributed Streams Algorithms for Sliding Windows [PDF]
2013-02-21T11:16:34+00:00
http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf
jm
waves papers streaming algorithms percentiles histogram distcomp distributed aggregation statistics estimation streams
https://pinboard.in/u:jm/b:dcc1a04930b0/

Sketch of the Day: HyperLogLog — Cornerstone of a Big Data Infrastructure
2013-02-12T11:55:30+00:00
http://blog.aggregateknowledge.com/2012/10/25/sketch-of-the-day-hyperloglog-cornerstone-of-a-big-data-infrastructure/
jm
hyperloglog loglog algorithms stream-processing streams estimation demos javascript
https://pinboard.in/u:jm/b:b92522d23b56/

'Medians and Beyond: New Aggregation Techniques for Sensor Networks' [paper, PDF]
2013-02-09T21:54:42+00:00
http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
jm
q-digest algorithms streams approximation histograms median percentiles quantiles
https://pinboard.in/u:jm/b:893da73adce4/

clearspring / stream-lib
2013-02-09T21:46:02+00:00
https://github.com/clearspring/stream-lib#readme
jm
algorithms coding streams cep stream-processing approximation probabilistic space-saving top-k cardinality estimation bloom-filters q-digest loglog hyperloglog murmurhash lookup3
https://pinboard.in/u:jm/b:5ec31bbded7e/

'Efficient Computation of Frequent and Top-k Elements in Data Streams' [paper, PDF]
2013-02-09T21:30:30+00:00
http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
jm
space-saving approximation streams stream-processing cep papers pdf algorithms
https://pinboard.in/u:jm/b:aa06ce6e347d/

Real-time Analytics in Scala [slides, PDF]
2013-02-09T21:17:12+00:00
http://noelwelsh.com/assets/downloads/scala-exchange-2012.pdf
jm
streams algorithms approximation coding scala slides
https://pinboard.in/u:jm/b:15d275caa928/

Data distribution in the cloud with Node.js
2012-10-24T10:08:00+00:00
http://www.slideshare.net/darach/node-dublin
jm
via:sbtourist events event-processing streaming data ex-iona darach-ennis push-technology cep javascript node.js streams
https://pinboard.in/u:jm/b:8b61791ab9bd/