Pinboard (jm)
https://pinboard.in/u:jm/public/
recent bookmarks from jm

Real-time analytics on network flow data with Apache Pinot
2022-09-14T17:05:24+00:00
https://engineering.linkedin.com/blog/2022/real-time-analytics-on-network-flow-data-with-apache-pinot
jm
InFlow requires storage of tens of TBs of data with a retention of 30 days. To support its real-time troubleshooting use case, the data must be queryable in real-time with sub-second latency so that engineers can query the data without any hassles during outages. For the storage layer, InFlow leverages Apache Pinot.
pinot latency metrics linkedin network-flows realtime analytics storage
https://pinboard.in/u:jm/b:d408bc8a0e1e/

Star-Tree Index: Powering Fast Aggregations on Pinot | LinkedIn Engineering
2020-01-22T23:05:56+00:00
https://engineering.linkedin.com/blog/2019/06/star-tree-index--powering-fast-aggregations-on-pinot
jm
With such huge improvements for both latency and throughput, the Star-Tree index only costs about 12% extra storage space compared to data without indexing techniques and 6% extra compared to data with inverted index.
star-tree sql querying search pinot linkedin algorithms databases indexing indexes
https://pinboard.in/u:jm/b:476e7d283e21/

LinkedIn called me a white supremacist
2016-05-30T11:33:51+00:00
http://www.slate.com/articles/technology/technology/2016/05/linkedin_called_me_a_white_supremacist.html
jm
On the morning of May 12, LinkedIn, the networking site devoted to making professionals “more productive and successful,” emailed scores of my contacts and told them I’m a professional racist. It was one of those updates that LinkedIn regularly sends its users, algorithmically assembled missives about their connections’ appearances in the media. This one had the innocent-sounding subject, “News About William Johnson,” but once my connections clicked in, they saw a small photo of my grinning face, right above the headline “Trump put white nationalist on list of delegates.” [.....] It turns out that when LinkedIn sends these update emails, people actually read them. So I was getting upset. Not only am I not a Nazi, I’m a Jewish socialist with family members who were imprisoned in concentration camps during World War II. Why was LinkedIn trolling me?
ethics fail algorithm linkedin big-data racism libel
https://pinboard.in/u:jm/b:b490b40d6799/

Open Sourcing Dr. Elephant: Self-Serve Performance Tuning for Hadoop and Spark
2016-04-13T16:00:35+00:00
https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark
jm
[LinkedIn] are proud to announce today that we are open sourcing Dr. Elephant, a powerful tool that helps users of Hadoop and Spark understand, analyze, and improve the performance of their flows.
neat, although I've been bitten too many times by LinkedIn OSS release quality at this point to jump in....
linkedin oss hadoop spark performance tuning ops
https://pinboard.in/u:jm/b:48eaac9e2c7d/

Open-sourcing PalDB, a lightweight companion for storing side data
2015-10-28T15:35:31+00:00
https://engineering.linkedin.com/blog/2015/10/open-sourcing-paldb--a-lightweight-companion-for-storing-side-da
jm
linkedin open-source storage side-data data config paldb java apache databases
https://pinboard.in/u:jm/b:5f9ff3f038de/

Introducing Nurse: Auto-Remediation at LinkedIn
2015-08-04T12:52:06+00:00
https://engineering.linkedin.com/sre/introducing-nurse-auto-remediation-linkedin
jm
nurse auto-remediation outages linkedin ops monitoring
https://pinboard.in/u:jm/b:f862fe9da2a1/

Optimizing Java CMS garbage collections, its difficulties, and using JTune as a solution | LinkedIn Engineering
2015-04-11T20:21:41+00:00
http://engineering.linkedin.com/java/optimizing-java-cms-garbage-collections-its-difficulties-and-using-jtune-solution
jm
java jvm tuning gc cms linkedin performance ops
https://pinboard.in/u:jm/b:717dfe95f072/

Amazing comment from a random sysadmin who's been targeted by the NSA
2015-01-18T08:07:00+00:00
https://news.ycombinator.com/item?id=8905321
jm
'Here's a story for you.
I'm not a party to any of this. I've done nothing wrong, I've never been suspected of doing anything wrong, and I don't know anyone who has done anything wrong. I don't even mean that in the sense of "I pissed off the wrong people but technically haven't been charged." I mean that I am a vanilla, average, 9-5 working man of no interest to anybody. My geographical location is an accident of my birth. Even still, I wasn't accidentally born in a high-conflict area, and my government is not at war. I'm a sysadmin at a legitimate ISP and my job is to keep the internet up and running smoothly.
This agency has stalked me in my personal life, undermined my ability to trust my friends attempting to connect with me on LinkedIn, and infected my family's computer. They did this because they wanted to bypass legal channels and spy on a customer who pays for services from my employer. Wait, no, they wanted the ability to potentially spy on future customers. Actually, that is still not accurate - they wanted to spy on everybody in case there was a potentially bad person interacting with a customer.
After seeing their complete disregard for anybody else, their immense resources, and their extremely sophisticated exploits and backdoors - knowing they will stop at nothing, and knowing that I was personally targeted - I'll be damned if I can ever trust any electronic device I own ever again.
You all rationalize this by telling me that it "isn't surprising", and that I don't live in the [USA,UK] and therefore I have no rights.
I just have one question.
Are you people even human?'
nsa via:ioerror privacy spying surveillance linkedin sysadmins gchq security
https://pinboard.in/u:jm/b:2e2b4bfd43e0/

FelixGV/tehuti
2014-10-09T10:53:00+00:00
https://github.com/FelixGV/tehuti
jm
asl2 apache open-source tehuti metrics percentiles quantiles statistics measurement latency kafka voldemort linkedin
https://pinboard.in/u:jm/b:a2f55ebce7bb/

Tehuti
2014-10-08T09:45:50+00:00
https://groups.google.com/forum/#!msg/project-voldemort/Y52UyHQ8tBA/9Ei79_RvS3EJ
jm
kafka metrics dropwizard java scala jvm timers ewma statistics measurement latency sampling tehuti voldemort linkedin jay-kreps
https://pinboard.in/u:jm/b:b56664c1a098/

Garbage Collection Optimization for High-Throughput and Low-Latency Java Applications
2014-04-08T21:54:13+00:00
http://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-high-throughput-and-low-latency-java-applications
jm
performance optimization linkedin java jvm gc tuning
https://pinboard.in/u:jm/b:77cabe371f7c/

Home · linkedin/rest.li Wiki
2014-02-03T16:19:02+00:00
https://github.com/linkedin/rest.li/wiki
jm
Rest.li is a REST+JSON framework for building robust, scalable service architectures using dynamic discovery and simple asynchronous APIs. Rest.li fills a niche for building RESTful service architectures at scale, offering a developer workflow for defining data and REST APIs that promotes uniform interfaces, consistent data modeling, type-safety, and compatibility checked API evolution.
The new underlying comms layer for Voldemort, it seems.
voldemort d2 rest.li linkedin json rest http api frameworks java
https://pinboard.in/u:jm/b:43fa6b5210e4/

The Log: What every software engineer should know about real-time data's unifying abstraction | LinkedIn Engineering
2013-12-16T21:36:40+00:00
http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
jm
coding databases log network kafka jay-kreps linkedin architecture storage
https://pinboard.in/u:jm/b:385d5e788e83/

Response to "Optimizing Linux Memory Management..."
2013-10-23T16:28:35+00:00
http://kerneldave.blogspot.ie/2013/10/response-to-optimizing-linux-memory.html
jm
Do not read into this article too much, especially for trying to understand how the Linux VM or the kernel works. The authors misread the "global spinlock on the zone" source code and the interpretation in the article is dead wrong.
linux tuning vm kernel linkedin memory numa
https://pinboard.in/u:jm/b:028705973c7a/

Voldemort on Solid State Drives [paper]
2013-09-04T14:26:32+00:00
http://www.slideshare.net/amywtang/wbdb2012-voldemortssd
jm
With SSD, we find that garbage collection will become a very significant bottleneck, especially for systems which have little control over the storage layer and rely on Java memory management. Big heapsizes make the cost of garbage collection expensive, especially the single threaded CMS Initial mark. We believe that data systems must revisit their caching strategies with SSDs. In this regard, SSD has provided an efficient solution for handling fragmentation and moving towards predictable multitenancy.
voldemort storage ssd disk linkedin big-data jvm tuning ops gc
https://pinboard.in/u:jm/b:086f50102a3b/

Using set cover algorithm to optimize query latency for a large scale distributed graph | LinkedIn Engineering
2013-08-27T09:56:00+00:00
http://engineering.linkedin.com/real-time-distributed-graph/using-set-cover-algorithm-optimize-query-latency-large-scale-distributed
jm
linkedin algorithms coding distributed-systems graph databases querying set-cover set replication
https://pinboard.in/u:jm/b:b1fa0ed30cad/

Building a Modern Website for Scale (QCon NY 2013) [slides]
2013-06-17T10:37:00+00:00
http://www.slideshare.net/r39132/q-con-ny2013modernwebsitescalabilityfinal-22989785
jm
gc-scout gc java scaling scalability linkedin qcon async threadpools rest slas timeouts networking distcomp netty tcp udp failover fault-tolerance packet-loss
https://pinboard.in/u:jm/b:8766348f43f5/

Paper: "Root Cause Detection in a Service-Oriented Architecture" [pdf]
2013-06-17T10:05:53+00:00
http://www.sigmetrics.org/sigmetrics2013/pdfs/p93.pdf
jm
This paper introduces MonitorRank, an algorithm that can reduce the time, domain knowledge, and human effort required to find the root causes of anomalies in such service-oriented architectures. In the event of an anomaly, MonitorRank provides a ranked order list of possible root causes for monitoring teams to investigate. MonitorRank uses the historical and current time-series metrics of each sensor as its input, along with the call graph generated between sensors to build an unsupervised model for ranking. Experiments on real production outage data from LinkedIn, one of the largest online social networks, shows a 26% to 51% improvement in mean average precision in finding root causes compared to baseline and current state-of-the-art methods.
This is a topic close to my heart after working on something similar for 3 years in Amazon!
Looks interesting, although (a) I would have liked to see more case studies and examples of "real world" outages it helped with; and (b) it's very much a machine-learning paper rather than a systems one, and there is no discussion of fault tolerance in the design of the detection system, which would leave me worried that in the case of a large-scale outage event, the system itself will disappear when its help is most vital. (This was a major design influence on our team's work.)
Overall, particularly given those 2 issues, I suspect it's not in production yet. Ours certainly was ;)
linkedin soa root-cause alarming correlation service-metrics machine-learning graphs monitoring
https://pinboard.in/u:jm/b:3867e176e952/

Hadoop Operations at LinkedIn [slides]
2013-03-20T21:54:23+00:00
http://www.slideshare.net/allenwittenauer/2013-hadoopsummitemea
jm
hadoop scaling linkedin ops
https://pinboard.in/u:jm/b:be04033aba6e/

Announcing the Voldemort 1.3 Open Source Release
2013-03-19T22:57:20+00:00
http://engineering.linkedin.com/voldemort/announcing-voldemort-13-open-source-release
jm
voldemort linkedin open-source bdb nosql
https://pinboard.in/u:jm/b:c30ad1b6f8b2/

Autometrics: Self-service metrics collection
2012-02-16T12:03:36+00:00
http://engineering.linkedin.com/52/autometrics-self-service-metrics-collection
jm
kafka zookeeper linkedin sysadmin service-metrics
https://pinboard.in/u:jm/b:61ed412e2670/

Apache Kafka
2012-02-12T00:59:16+00:00
http://incubator.apache.org/kafka/index.html
jm
kafka linkedin apache distributed messaging pubsub queue incubator scaling
https://pinboard.in/u:jm/b:92e2d30f6bea/

Dutch grepping Facebook for welfare fraud
2011-09-10T13:34:07+00:00
http://www.irishtimes.com/newspaper/world/2011/0910/1224303851410.html
jm
grep dutch holland via:tjmcintyre privacy facebook twitter linkedin welfare dole fraud false-positives searching
https://pinboard.in/u:jm/b:6616dc33ebe2/