Pinboard (jm)
https://pinboard.in/u:jm/public/
recent bookmarks from jm

When Boring is Awesome: Building a scalable time-series database on PostgreSQL
2017-04-05T15:00:38+00:00
https://blog.timescale.com/when-boring-is-awesome-building-a-scalable-time-series-database-on-postgresql-2900ea453ee2
Tags: database postgresql postgres timeseries tsd storage state via:nelson
https://pinboard.in/u:jm/b:9956c3efa969/

ASAP: Automatic Smoothing for Attention Prioritization in Streaming Time Series Visualization
2017-03-15T11:12:49+00:00
https://arxiv.org/pdf/1703.00983.pdf
Tags: dataviz graphs metrics peter-bailis asap smoothing aggregation time-series tsd
https://pinboard.in/u:jm/b:e26e62cfb964/

Beringei: A high-performance time series storage engine | Engineering Blog | Facebook Code
2017-02-06T12:08:26+00:00
https://code.facebook.com/posts/952820474848503/beringei-a-high-performance-time-series-storage-engine/
Beringei is different from other in-memory systems, such as memcache, because it has been optimized for storing time series data used specifically for health and performance monitoring. We designed Beringei to have a very high write rate and a low read latency, while being as efficient as possible in using RAM to store the time series data. In the end, we created a system that can store all the performance and monitoring data generated at Facebook for the most recent 24 hours, allowing for extremely fast exploration and debugging of systems and services as we encounter issues in production.
Data compression was necessary to help reduce storage overhead. We considered several existing compression schemes and rejected the techniques that applied only to integer data, used approximation techniques, or needed to operate on the entire dataset. Beringei uses a lossless streaming compression algorithm to compress points within a time series with no additional compression used across time series. Each data point is a pair of 64-bit values representing the timestamp and value of the counter at that time. Timestamps and values are compressed separately using information about previous values. Timestamp compression uses a delta-of-delta encoding, so regular time series use very little memory to store timestamps.
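The delta-of-delta idea mentioned above can be illustrated with a minimal sketch. This is not Beringei's actual bit-level encoding (which is not given in the quote); the function names `delta_of_delta` and `restore` are hypothetical, and the timestamps are made-up sample data. It only shows why regularly spaced timestamps reduce to runs of zeros, which are cheap to store.

```python
def delta_of_delta(timestamps):
    """Return (first timestamp, first delta, list of delta-of-deltas)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    # First-order differences between consecutive timestamps.
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # Second-order differences: zero whenever the interval is unchanged.
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods


def restore(first, first_delta, dods):
    """Rebuild the original timestamps from the encoded form."""
    out = [first, first + first_delta]
    delta = first_delta
    for d in dods:
        delta += d
        out.append(out[-1] + delta)
    return out


# Hypothetical samples taken roughly every 60 seconds, with one late point.
ts = [1000, 1060, 1120, 1180, 1241, 1301]
encoded = delta_of_delta(ts)
```

In this sample most delta-of-delta values are 0 and the rest are small, so a variable-length bit encoding (assumed here, not shown) would need only a few bits per timestamp.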
From analyzing the data stored in our performance monitoring system, we discovered that the value in most time series does not change significantly when compared to its neighboring data points. Further, many data sources only store integers (despite the system supporting floating point values). Knowing this, we were able to tune previous academic work to be easier to compute by comparing the current value with the previous value using XOR, and storing the changed bits. Ultimately, this algorithm resulted in compressing the entire data set by at least 90 percent.
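As a rough sketch of the XOR step described above: each value is compared with its predecessor at the bit level, and only the span of changed bits would need to be stored. The helper names below are hypothetical, the sample values are invented, and the control bits and exact layout that a real encoder would use are deliberately left out.

```python
import struct


def float_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_float(b):
    """Reinterpret a 64-bit pattern as a float."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_encode(values):
    """Keep the first value's bits, then XOR each value with the previous one."""
    prev = float_bits(values[0])
    out = [prev]
    for v in values[1:]:
        cur = float_bits(v)
        out.append(prev ^ cur)
        prev = cur
    return out


def xor_decode(words):
    """Invert xor_encode by XOR-ing each word back onto the running value."""
    prev = words[0]
    out = [bits_float(prev)]
    for w in words[1:]:
        prev ^= w
        out.append(bits_float(prev))
    return out


def meaningful_bits(word):
    """Width of the span from the highest to the lowest set bit (0 if none)."""
    if word == 0:
        return 0
    trailing_zeros = (word & -word).bit_length() - 1
    return word.bit_length() - trailing_zeros


values = [12.0, 12.0, 24.0, 15.0]  # made-up counter readings
encoded = xor_encode(values)
widths = [meaningful_bits(w) for w in encoded[1:]]
```

An unchanged value XORs to zero, and nearby values differ in only a few bits, which is where the space saving would come from in a scheme like the one quoted.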
Tags: beringei compression facebook monitoring tsd time-series storage architecture
https://pinboard.in/u:jm/b:3d15d674f951/

Nobody Loves Graphite Anymore - VividCortex
2015-11-05T22:22:21+00:00
https://www.vividcortex.com/blog/2015/11/05/nobody-loves-graphite-anymore/
Graphite has a place in our current monitoring stack, and together with StatsD will always have a special place in the hearts of DevOps practitioners everywhere, but it’s not representative of state-of-the-art in the last few years. Graphite is where the puck was in 2010. If you’re skating there, you’re missing the benefits of modern monitoring infrastructure.
The future I foresee is one where time series capabilities (the raw power needed, which I described in my time series requirements blog post, for example) are within everyone’s reach. That will be considered table stakes, whereas now it’s pretty revolutionary.
Like I've been saying -- we need Time Series As A Service! This should be undifferentiated heavy lifting.
Tags: graphite tsd time-series vividcortex statsd ops monitoring metrics
https://pinboard.in/u:jm/b:f81dace94111/

The New InfluxDB Storage Engine: A Time Structured Merge Tree
2015-10-07T14:41:30+00:00
https://influxdb.com/blog/2015/10/07/the_new_influxdb_storage_engine_a_time_structured_merge_tree.html
The new engine has similarities with LSM Trees (like LevelDB and Cassandra’s underlying storage). It has a write ahead log, index files that are read only, and it occasionally performs compactions to combine index files. We’re calling it a Time Structured Merge Tree because the index files keep contiguous blocks of time and the compactions merge those blocks into larger blocks of time. Compression of the data improves as the index files are compacted. Once a shard becomes cold for writes it will be compacted into as few files as possible, which yield the best compression.
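The compaction step described above, merging small contiguous blocks of time into a larger one, can be sketched in a few lines. This is only an illustration of the merge idea, not InfluxDB's implementation: the `compact` function, the list-of-tuples block layout, and the sample points are all assumptions.

```python
from heapq import merge


def compact(blocks):
    """Merge several time-sorted blocks into one larger time-sorted block.

    Each block is assumed to be a list of (timestamp, value) tuples that is
    already sorted by timestamp, like a read-only index file.
    """
    return list(merge(*blocks))


# Three small hypothetical blocks covering adjacent time ranges.
small_blocks = [
    [(1, 0.5), (2, 0.7)],
    [(3, 0.9), (4, 1.1)],
    [(5, 1.0)],
]
merged = compact(small_blocks)
```

Because the inputs are already sorted, the merge is a single streaming pass, and the resulting larger block gives a compressor a longer run of neighbouring points to work with.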
Tags: influxdb storage lsm-trees leveldb tsm-trees data-structures algorithms time-series tsd compression
https://pinboard.in/u:jm/b:c404b4e0eba5/

How We Scale VividCortex's Backend Systems - High Scalability
2015-03-30T16:55:14+00:00
http://highscalability.com/blog/2015/3/30/how-we-scale-vividcortexs-backend-systems.html
Tags: time-series tsd storage mysql sql baron-schwartz ops performance scalability scaling go
https://pinboard.in/u:jm/b:fe014fc1ee1b/

One year of InfluxDB and the road to 1.0
2015-02-19T22:21:45+00:00
http://influxdb.com/blog/2014/09/26/one-year-of-influxdb-and-the-road-to-1_0.html
half of the [Monitorama] attendees were employees and entrepreneurs at monitoring, metrics, DevOps, and server analytics companies. Most of them had a story about how their metrics API was their key intellectual property that took them years to develop. The other half of the attendees were developers at larger organizations that were rolling their own DevOps stack from a collection of open source tools. Almost all of them were creating a “time series database” with a bunch of web services code on top of some other database or just using Graphite. When everyone is repeating the same work, it’s not key intellectual property or a differentiator, it’s a barrier to entry. Not only that, it’s something that is hindering innovation in this space since everyone has to spend their first year or two getting to the point where they can start building something real. It’s like building a web company in 1998. You have to spend millions of dollars and a year building infrastructure, racking servers, and getting everything ready before you could run the application. Monitoring and analytics applications should not be like this.
Tags: graphite monitoring metrics tsd time-series analytics influxdb open-source
https://pinboard.in/u:jm/b:cbe8b281afb6/

Observability at Twitter
2013-09-11T21:40:35+00:00
https://blog.twitter.com/2013/observability-at-twitter
There are separate online clusters for different data sets: application and operating system metrics, performance critical write-time aggregates, long term archives, and temporal indexes. A typical production instance of the time series database is based on four distinct Cassandra clusters, each responsible for a different dimension (real-time, historical, aggregate, index) due to different performance constraints. These clusters are amongst the largest Cassandra clusters deployed in production today and account for over 500 million individual metric writes per minute. Archival data is stored at a lower resolution for trending and long term analysis, whereas higher resolution data is periodically expired. Aggregation is generally performed at write-time to avoid extra storage operations for metrics that are expected to be immediately consumed. Indexing occurs along several dimensions (service, source, and metric names) to give users some flexibility in finding relevant data.
Tags: twitter monitoring metrics service-metrics tsd time-series storage architecture cassandra
https://pinboard.in/u:jm/b:3f55dd143a82/

Blueflood by rackerlabs
2013-09-02T15:19:32+00:00
http://blueflood.io/
Tags: cassandra tsd storage time-series data open-source java rackspace
https://pinboard.in/u:jm/b:9991bb557631/

Boundary Product Update: Trends Dashboard Now Available
2013-04-12T14:15:16+00:00
http://boundary.com/blog/2013/04/11/introducing-trends/
Tags: boundary time-series tsd prediction metrics smoothing dataviz dashboards
https://pinboard.in/u:jm/b:c66d0ff14d46/

Boundary Techtalk - Large-scale OLAP with Kobayashi
2013-04-10T17:12:32+00:00
http://boundary.com/blog/2012/08/21/boundary-techtalk-large-scale-olap-with-kobayashi/
Dietrich Featherston, Engineer at Boundary, walks through the process of designing Kobayashi, the time-series analytics database behind our network metrics. He goes through the false starts and lessons learned in effectively using Riak as the storage layer for a large-scale OLAP database. The system is ultimately capable of answering complex, ad-hoc queries at interactive latencies.
Tags: video boundary tsd riak eventual-consistency storage kobayashi olap time-series
https://pinboard.in/u:jm/b:a71181b0a65d/

Metric Collection and Storage with Cassandra | DataStax
2013-03-12T21:48:41+00:00
http://www.datastax.com/dev/blog/metric-collection-and-storage-with-cassandra
Tags: datastax nosql metrics analytics cassandra tsd time-series storage
https://pinboard.in/u:jm/b:307f55e4ee22/

Cubism.js
2012-04-24T20:04:30+00:00
http://square.github.com/cubism/
Tags: javascript library visualization dataviz tsd data apache open-source
https://pinboard.in/u:jm/b:c44f693703d5/

Occursions
2012-03-11T00:18:38+00:00
http://sourceforge.net/projects/occursions/
Tags: logs search tsd big-data log4j via:proggit
https://pinboard.in/u:jm/b:8fb2aade3b72/

dygraphs JavaScript Visualization Library
2009-12-10T23:17:31+00:00
http://www.danvk.org/dygraphs/
Tags: time-series data tsd graphs charts javascript via:reddit dataviz visualization opensource dygraphs
https://pinboard.in/u:jm/b:9980e90a6d9d/