Pinboard (jm)

Recent bookmarks from jm
https://pinboard.in/u:jm/public/

How-to: Index Scanned PDFs at Scale Using Fewer Than 50 Lines of Code
2015-10-21T09:36:31+00:00
http://blog.cloudera.com/blog/2015/10/how-to-index-scanned-pdfs-at-scale-using-fewer-than-50-lines-of-code/
Tags: spark tesseract hbase solr leptonica pdfs scanning cloudera hadoop architecture
https://pinboard.in/u:jm/b:2b695d958d5f/

Paper review: "Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems"
2015-03-27T09:36:04+00:00
http://muratbuffalo.blogspot.co.uk/2015/03/paper-review-simple-testing-can-prevent.html
Tags: race-conditions startup bugs failure fault-tolerance hbase redis reliability ops papers concurrency exception-handling cassandra hdfs mapreduce
https://pinboard.in/u:jm/b:3dd7b48e5fed/

DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second
2013-04-23T13:03:14+00:00
http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
Tags: datasift architecture scalability data twitter firehose hbase kafka zeromq
https://pinboard.in/u:jm/b:5c07ab4273cd/

Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing
2013-02-28T09:57:37+00:00
http://developer.yahoo.com/blogs/ydn/posts/2013/02/storm-and-hadoop-convergence-of-big-data-and-low-latency-processing/
Tags: yahoo yarn cloud-computing private-clouds big-data latency storm hadoop elastic-computing hbase
https://pinboard.in/u:jm/b:350773902d3f/

Cassandra, Hive, and Hadoop: How We Picked Our Analytics Stack
2013-02-25T15:35:01+00:00
http://blog.markedup.com/2013/02/cassandra-hive-and-hadoop-how-we-picked-our-analytics-stack/
Tags: riak mongodb cassandra hbase performance analytics hadoop hive big-data storage databases nosql
https://pinboard.in/u:jm/b:b335abec7a75/

HBase Real-time Analytics & Rollbacks via Append-based Updates
2012-12-17T13:59:22+00:00
http://blog.sematext.com/2012/04/22/hbase-real-time-analytics-rollbacks-via-append-based-updates/
'Replace update (Get+Put) operations at write time with simple append-only writes and defer processing of updates to periodic jobs or perform aggregations on the fly if user asks for data earlier than individual additions are processed. The idea is simple and not necessarily novel, but given the specific qualities of HBase, namely fast range scans and high write throughput, this approach works very well.'
Tags: counters analytics hbase append sematext aggregation big-data
https://pinboard.in/u:jm/b:158d78e4914f/

Storage Infrastructure Behind Facebook Messages
2011-10-25T22:35:16+00:00
http://perspectives.mvdirona.com/2011/10/25/StorageInfrastructureBehindFacebookMessages.aspx
Tags: testing shadowing haystack hbase facebook scalability lzo messaging sms via:james-hamilton
https://pinboard.in/u:jm/b:ad64e79b1478/

Avoiding Full GCs in HBase with MemStore-Local Allocation Buffers
2011-10-22T21:20:06+00:00
http://www.cloudera.com/blog/2011/03/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-3/
Tags: memory allocation java gc jvm hbase memstore via:dehora slab-allocator
https://pinboard.in/u:jm/b:76e4aad99f52/

Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
2011-03-28T21:11:31+00:00
http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html
Tags: facebook hbase scalability performance hadoop scribe events analytics architecture tail append
https://pinboard.in/u:jm/b:4f62efcb61b3/