Pinboard (jm)
https://pinboard.in/u:jm/public/
recent bookmarks from jm

Henry Robinson on testing and fault discovery in distributed systems
2015-09-24T10:21:45+00:00
https://twitter.com/HenryR/status/646821418206756864
jm
'Let's talk about finding bugs in distributed systems for a bit.
These chaos monkey-style fault testing systems are all well and good, but by being application independent they're a very blunt instrument.
Particularly they make it hard to search the fault space for bugs in a directed manner, because they don't 'know' what the system is doing.
Application-aware scripting of faults in distributed systems seems to be rarely used, but allows you to directly stress problem areas.
For example, if a bug manifests itself only when one RPC returns after some timeout, hard to narrow that down with iptables manipulation.
But allow a script to hook into RPC invocations (and other trace points, like DTrace's probes), and you can script very specific faults.
That way you can simulate cross-system integration failures, *and* write reproducible tests for the bugs they expose!
Anyhow, I've been doing this in Impala, and it's been very helpful. Haven't seen much evidence elsewhere.'
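
To make the idea of scripting faults at RPC trace points concrete, here is a minimal sketch in Java. It is not Impala's mechanism (the source does not describe it); every name here (`FaultInjector`, `RpcClient`, the `rpc.after.fetchRows` trace point, the `fetchRows` call) is hypothetical and chosen only for illustration. The "remote" call is a local stand-in, and the scripted fault makes one specific RPC look like it timed out just as its reply returns.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class FaultInjectionSketch {
    /** A scripted fault: runs at a trace point and may throw to simulate a failure. */
    interface Fault {
        void fire() throws Exception;
    }

    /** Registry of faults keyed by trace-point name (all names are hypothetical). */
    static final class FaultInjector {
        private final Map<String, Fault> faults = new ConcurrentHashMap<>();

        void arm(String tracePoint, Fault fault) { faults.put(tracePoint, fault); }

        void disarm(String tracePoint) { faults.remove(tracePoint); }

        /** Called by instrumented code; does nothing unless a fault is armed here. */
        void hit(String tracePoint) throws Exception {
            Fault fault = faults.get(tracePoint);
            if (fault != null) {
                fault.fire();
            }
        }
    }

    /** Stand-in RPC layer: every invocation passes through a named trace point. */
    static final class RpcClient {
        private final FaultInjector injector;

        RpcClient(FaultInjector injector) { this.injector = injector; }

        String call(String rpcName, Supplier<String> remote) throws Exception {
            String response = remote.get();        // the "remote" work completes
            injector.hit("rpc.after." + rpcName);  // a fault can fire as the reply returns
            return response;
        }
    }

    /** Helper that builds a fault simulating a timed-out RPC. */
    static Fault timeout(String message) {
        return () -> { throw new TimeoutException(message); };
    }

    /** Application code under test: reports how it handled the RPC outcome. */
    static String fetch(RpcClient client) {
        try {
            return client.call("fetchRows", () -> "rows");
        } catch (TimeoutException e) {
            return "timed out";
        } catch (Exception e) {
            return "failed";
        }
    }

    public static void main(String[] args) {
        FaultInjector injector = new FaultInjector();
        RpcClient client = new RpcClient(injector);

        System.out.println(fetch(client));  // no fault armed
        injector.arm("rpc.after.fetchRows", timeout("scripted timeout"));
        System.out.println(fetch(client));  // the scripted fault fires
    }
}
```

Because the fault is armed by name from the test itself, the failure is targeted at a single call site and the test that exposes a bug can be rerun deterministically, which is the property the tweets argue blunt network-level tools lack.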
henry-robinson testing fault-discovery rpc dtrace tracing distributed-systems timeouts chaos-monkey impala
https://pinboard.in/u:jm/b:3f0396cd7411/

inside Impala's architecture
2015-02-05T17:22:43+00:00
http://blog.acolyer.org/2015/02/05/impala-a-modern-open-source-sql-engine-for-hadoop/
jm
impala papers hadoop sql llvm hdfs architecture
https://pinboard.in/u:jm/b:2c57e4ff4eab/

ByteArrayOutputStream is really, really slow sometimes in JDK6
2014-06-30T15:33:07+00:00
http://the-paper-trail.org/blog/535/
jm
This leads us to the bug. The size of the array is determined by Math.max(buf.length << 1, newcount). Ordinarily, buf.length << 1 returns double buf.length, which would always be much larger than newcount for a 2 byte write. Why was it not? The problem is that for all integers larger than Integer.MAX_VALUE / 2, shifting left by one place causes overflow, setting the sign bit. The result is a negative integer, which is always less than newcount. So for all byte arrays larger than 1073741824 bytes (i.e. one GB), any write will cause the array to resize, and only to exactly the size required.
Ouch.
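
The arithmetic in the excerpt can be checked with a small Java sketch. `shiftGrowth` mirrors the quoted expression `Math.max(buf.length << 1, newcount)`; `safeGrowth` is one possible overflow-aware alternative, included as an assumption for illustration and not as the JDK's actual fix. Only capacities are computed, so no gigabyte-sized arrays are allocated.

```java
public class GrowthOverflowDemo {
    /** Growth policy as quoted above: double via left shift, or take the required size. */
    static int shiftGrowth(int currentLength, int required) {
        return Math.max(currentLength << 1, required);
    }

    /**
     * Illustrative alternative (an assumption, not the JDK's code): double in
     * 64-bit arithmetic and clamp to Integer.MAX_VALUE so the result cannot
     * wrap around to a negative number. A real JVM's maximum array length may
     * be slightly below this cap.
     */
    static int safeGrowth(int currentLength, int required) {
        long doubled = (long) currentLength * 2L;
        long capped = Math.min(doubled, (long) Integer.MAX_VALUE);
        return (int) Math.max(capped, (long) required);
    }

    public static void main(String[] args) {
        int small = 1024;
        int big = (1 << 30) + 16;  // a buffer length just over 1 GB

        // Small buffers double as intended.
        System.out.println(shiftGrowth(small, small + 2));

        // Past 1 GB the shift overflows and yields a negative number...
        System.out.println(big << 1);

        // ...so the policy grows by exactly the 2 bytes required.
        System.out.println(shiftGrowth(big, big + 2));

        // The overflow-aware variant still grows geometrically, up to the cap.
        System.out.println(safeGrowth(big, big + 2));
    }
}
```

With the shift-based policy, every 2-byte write past the 1 GB mark triggers a reallocation and a full copy of the buffer, which is why the slowdown described in the excerpt is so severe.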
bugs java jdk6 bytearrayoutputstream impala performance overflow
https://pinboard.in/u:jm/b:c0b4f2f043fa/

Big, Small, Hot or Cold - Your Data Needs a Robust Pipeline
2014-02-07T23:26:18+00:00
http://www.hakkalabs.co/articles/big-small-hot-or-cold-your-data-needs-a-robust-pipeline-examples-from-stripe-tapad-etsy-square
jm
stripe tapad etsy square big-data analytics kafka impala hadoop hdfs parquet thrift
https://pinboard.in/u:jm/b:5eeef028b0ee/

Cloudera Impala 1.0: It’s Here, It’s Real, It’s Already the Standard for SQL on Hadoop
2013-05-14T20:05:59+00:00
http://blog.cloudera.com/blog/2013/05/cloudera-impala-1-0-its-here-its-real-its-already-the-standard-for-sql-on-hadoop/?goback=%2Egde_4800543_member_237186289
jm
we are proud to announce the first production drop of Impala, which reflects feedback from across the user community based on multiple types of real-world workloads. Just as a refresher, the main design principle behind Impala is complete integration with the Hadoop platform (jointly utilizing a single pool of storage, metadata model, security framework, and set of system resources). This integration allows Impala users to take advantage of the time-tested cost, flexibility, and scale advantages of Hadoop for interactive SQL queries, and makes SQL a first-class Hadoop citizen alongside MapReduce and other frameworks. The net result is that all your data becomes available for interactive analysis simultaneously with all other types of processing, with no ETL delays needed.
Along with some great benchmark numbers against Hive. Nifty stuff.
cloudera impala sql querying etl olap hadoop analytics business-intelligence reports
https://pinboard.in/u:jm/b:fc3f8a9f1d17/