Pinboard (jm)
https://pinboard.in/u:jm/public/
recent bookmarks from jm

Apache Pulsar: Seamless Storage Evolution, 2021-06-15T22:27:13+00:00
https://www.verizonmedia.com/technology/blog/apache-pulsar-overview
tags: streaming distcomp distributed apache pulsar dcpmm nvme persistent-memory performance architecture storage
bookmark: https://pinboard.in/u:jm/b:98dee1973e10/

Apache Iceberg (incubating), 2019-01-14T23:22:27+00:00
https://iceberg.apache.org/
Iceberg tracks individual data files in a table instead of directories. This allows writers to create data files in place and add them to the table only in an explicit commit.
Table state is maintained in metadata files. All changes to table state create a new metadata file and replace the old metadata with an atomic operation. The table metadata file tracks the table schema, partitioning config, other properties, and snapshots of the table contents.
The atomic transitions from one table metadata file to the next provide snapshot isolation. Readers use the latest table state (snapshot) that was current when they load the table metadata and are not affected by changes until they refresh and pick up a new metadata location.
excellent -- this will let me obsolete so much of our own code :)
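Here is a minimal sketch of the commit model quoted above, assuming an in-memory catalog. The table is a single pointer to an immutable metadata object. A commit builds new metadata and swaps the pointer only if no other writer committed first, which is an optimistic compare-and-swap. A reader keeps whatever snapshot it loaded. `Catalog`, `TableMetadata`, `commit` and the file names are hypothetical illustrations, not Iceberg's actual API.

```python
import threading
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableMetadata:
    """Immutable table state: a version plus the data files in the current snapshot."""
    version: int
    data_files: Tuple[str, ...]


class Catalog:
    """Hypothetical catalog that stores a single pointer to the current metadata."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current = TableMetadata(version=0, data_files=())

    def load(self) -> TableMetadata:
        # Readers get a snapshot; later commits never mutate it.
        return self._current

    def commit(self, base: TableMetadata, new_files: Tuple[str, ...]) -> bool:
        """Swap in new metadata that adds `new_files`, but only if `base` is still current.

        Returns False when another writer committed first; the caller should
        reload the table and retry.
        """
        updated = TableMetadata(base.version + 1, base.data_files + new_files)
        with self._lock:  # stands in for the catalog's atomic compare-and-swap
            if self._current is not base:
                return False
            self._current = updated
            return True


catalog = Catalog()
snapshot = catalog.load()
# The data file is written first; it only becomes part of the table on commit.
ok = catalog.commit(snapshot, ("data/part-0001.parquet",))
print(ok, snapshot.data_files, catalog.load().data_files)
# A writer still holding the old snapshot is rejected and must reload.
print(catalog.commit(snapshot, ("data/part-0002.parquet",)))
```

The reader holding `snapshot` still sees an empty table after the commit. Only a fresh `load()` picks up the new metadata, which mirrors the snapshot isolation described in the excerpt.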
tags: presto storage s3 hive iceberg apache asf data architecture
bookmark: https://pinboard.in/u:jm/b:07b6a7ecf2f8/

Apache Airflow at Pandora – Algorithm and Blues, 2018-03-16T23:52:35+00:00
https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
tags: airflow python apache pandora open-source scheduling dags
bookmark: https://pinboard.in/u:jm/b:1360818f0557/

Generate Mozilla Security Recommended Web Server Configuration Files, 2018-02-06T16:38:08+00:00
https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=nginx-1.10.3&openssl=1.0.1e&hsts=yes&profile=modern
tags: web openssl nginx lighttpd apache haproxy hsts security ssl tls ops
bookmark: https://pinboard.in/u:jm/b:70aa27c02be7/

[LEGAL-303] ASF, RocksDB, and Facebook's BSD+patent grant licensing, 2017-07-17T10:29:51+00:00
https://issues.apache.org/jira/browse/LEGAL-303
tags: react rocksdb licensing asl2 apache asf facebook open-source patents
bookmark: https://pinboard.in/u:jm/b:a8d0f17f02f0/

Is it Pokemon or Big Data ?, 2015-11-25T17:15:21+00:00
https://pixelastic.github.io/pokemonorbigdata/
tags: pokemon big-data apache hadoop funny quizzes
bookmark: https://pinboard.in/u:jm/b:47b444c9bf43/

Open-sourcing PalDB, a lightweight companion for storing side data, 2015-10-28T15:35:31+00:00
https://engineering.linkedin.com/blog/2015/10/open-sourcing-paldb--a-lightweight-companion-for-storing-side-da
tags: linkedin open-source storage side-data data config paldb java apache databases
bookmark: https://pinboard.in/u:jm/b:5f9ff3f038de/

excellent offline mapping app MAPS.ME goes open source, 2015-09-30T15:08:47+00:00
https://github.com/mapsme/omim
tags: maps.me mapping maps open-source apache ios android mobile
bookmark: https://pinboard.in/u:jm/b:a1c891d21ad8/

Stormpot, 2015-09-08T10:43:41+00:00
http://chrisvest.github.io/stormpot/
An object pooling library for Java. Use it to recycle objects that are expensive to create. The library will take care of creating and destroying your objects in the background. Stormpot is very mature, is used in production, and has done over a trillion claim-release cycles in testing. It is faster and scales better than any competing pool.
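The sketch below makes the claim-release cycle concrete with a minimal bounded object pool in Python. It assumes a caller-supplied factory for the expensive objects. `ObjectPool`, `claim`, `release` and `make_connection` are hypothetical names, not Stormpot's API. This version also allocates lazily in the claiming thread, whereas the description above says Stormpot creates and destroys objects in the background.

```python
import queue
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ObjectPool(Generic[T]):
    """Bounded pool that recycles expensive objects instead of recreating them."""

    def __init__(self, allocate: Callable[[], T], size: int) -> None:
        self._allocate = allocate          # caller-supplied factory (assumed)
        self._size = size                  # maximum number of live objects
        self._created = 0
        self._idle: "queue.Queue[T]" = queue.Queue()
        self._lock = threading.Lock()

    def claim(self, timeout: Optional[float] = None) -> T:
        """Reuse an idle object, allocate one if under the limit, else wait.

        Raises queue.Empty if no object is released within `timeout` seconds.
        """
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        with self._lock:
            may_allocate = self._created < self._size
            if may_allocate:
                self._created += 1
        if may_allocate:
            try:
                return self._allocate()
            except BaseException:
                with self._lock:
                    self._created -= 1     # give the slot back if allocation fails
                raise
        return self._idle.get(timeout=timeout)

    def release(self, obj: T) -> None:
        """Return an object so the next claim can reuse it."""
        self._idle.put(obj)


created = []


def make_connection() -> object:
    """Stand-in for an expensive constructor (hypothetical)."""
    conn = object()
    created.append(conn)
    return conn


pool = ObjectPool(make_connection, size=2)
first = pool.claim()
pool.release(first)
second = pool.claim()
print(second is first, len(created))
```

The second `claim` gets back the object the first caller released, so the factory runs only once. That reuse is the allocation and GC saving the bookmark's tags point at.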
Apache-licensed, and extremely fast: https://medium.com/@chrisvest/released-stormpot-2-4-eeab4aec86d0
tags: java stormpot object-pooling object-pools pools allocation gc open-source apache performance
bookmark: https://pinboard.in/u:jm/b:ef38e7baa4ad/

Festina Lente, 2015-07-29T21:57:09+00:00
http://drbacchus.com/festina-lente/
tags: noirin-plunkett memorials eulogies rip asf apache
bookmark: https://pinboard.in/u:jm/b:9cb6106513a9/

Apache HTrace, 2015-05-12T16:06:27+00:00
http://htrace.incubator.apache.org/
tags: zipkin tracing trace apache incubator java debugging
bookmark: https://pinboard.in/u:jm/b:7241ab3bdb05/

Spark 1.2 released, 2014-12-22T14:14:17+00:00
http://databricks.com/blog/2014/12/19/announcing-spark-1-2.html
Spark 1.2 includes several cross-cutting optimizations focused on performance for large scale workloads. Two new features Databricks developed for our world record petabyte sort with Spark are turned on by default in Spark 1.2. The first is a re-architected network transfer subsystem that exploits Netty 4’s zero-copy IO and off heap buffer management. The second is Spark’s sort based shuffle implementation, which we’ve now made the default after significant testing in Spark 1.1. Together, we’ve seen these features give as much as 5X performance improvement for workloads with very large shuffles.
tags: spark sorting hadoop map-reduce batch databricks apache netty
bookmark: https://pinboard.in/u:jm/b:6d93115441ec/

FelixGV/tehuti, 2014-10-09T10:53:00+00:00
https://github.com/FelixGV/tehuti
tags: asl2 apache open-source tehuti metrics percentiles quantiles statistics measurement latency kafka voldemort linkedin
bookmark: https://pinboard.in/u:jm/b:a2f55ebce7bb/

Spark Streaming, 2014-05-16T21:35:38+00:00
http://spark.apache.org/docs/latest/streaming-programming-guide.html#overview
An extension of the core Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ or plain old TCP sockets and be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s in-built machine learning and graph processing algorithms to data streams.
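The windowed processing mentioned above treats a live stream as a sequence of small batches and reduces over the most recent few. Below is a plain-Python sketch of that idea. It assumes a window of 3 batches that slides by 1 batch. `windowed_counts` is a hypothetical helper, not Spark's API. In Spark Streaming the same pattern is expressed with DStream operations such as `reduceByKeyAndWindow`.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def windowed_counts(batches: Iterable[List[str]], window: int = 3,
                    slide: int = 1) -> Iterator[Counter]:
    """Yield word counts over the last `window` micro-batches, every `slide` batches."""
    recent: List[Counter] = []
    for i, batch in enumerate(batches, start=1):
        # map + reduce within one micro-batch: count the words it contains.
        recent.append(Counter(batch))
        # Drop batches that have slid out of the window.
        recent = recent[-window:]
        if i % slide == 0:
            total: Counter = Counter()
            for counts in recent:
                total.update(counts)
            yield total


stream = [["a", "b"], ["a"], ["c"], ["a", "c"]]
for counts in windowed_counts(stream):
    print(dict(counts))
```

Each output combines only the batches still inside the window. Once the fourth batch arrives, the first batch's `"b"` disappears from the count.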
tags: spark streams stream-processing cep scalability apache machine-learning graphs
bookmark: https://pinboard.in/u:jm/b:62c1e3c0e756/

Building a large scale CDN with Apache Traffic Server, 2014-05-07T10:53:57+00:00
https://www.youtube.com/watch?v=q1mndAYZlio
tags: cdn comcast video presentations apache traffic-server vod
bookmark: https://pinboard.in/u:jm/b:9d0c215518cc/

SpamAssassin 3.4.0 released, 2014-02-12T10:53:51+00:00
http://spamassassin.apache.org/
tags: antispam open-source spamassassin apache
bookmark: https://pinboard.in/u:jm/b:04b9d97e777d/

Apache Curator, 2014-01-30T22:09:06+00:00
http://curator.apache.org/
tags: zookeeper netflix apache curator java libraries open-source
bookmark: https://pinboard.in/u:jm/b:2a32e89d7f29/

Randomly Failed! The State of Randomness in Current Java Implementations, 2013-08-12T09:06:00+00:00
http://www.scribd.com/doc/131955288/Randomly-Failed-The-State-of-Randomness-in-Current-Java-Implementations
The SecureRandom PRNG is the primary source of randomness for Java and is used, e.g., by cryptographic operations. This underlines its importance regarding security. Some of the fallback solutions of the investigated implementations [are] revealed to be weak and predictable or capable of being influenced. Very alarming are the defects found in Apache Harmony, since it is partly used by Android.
More on the Bitcoin drama: https://bitcointalk.org/index.php?topic=271486.40, http://bitcoin.org/en/alert/2013-08-11-android
tags: android java prng random security bugs apache-harmony apache crypto bitcoin papers
bookmark: https://pinboard.in/u:jm/b:016d49a82951/

Ivan Ristić: Defending against the BREACH attack, 2013-08-07T20:33:04+00:00
http://blog.ivanristic.com/2013/08/defending-against-the-breach-attack.html
The award for least-intrusive and entirely painless mitigation proposal goes to Paul Querna who, on the httpd-dev mailing list, proposed to use the HTTP chunked encoding to randomize response length. Chunked encoding is an HTTP feature that is typically used when the size of the response body is not known in advance; only the size of the next chunk is known. Because chunks carry some additional information, they affect the size of the response, but not the content. By forcing more chunks than necessary, for example, you can increase the length of the response. To the attacker, who can see only the size of the response body, but not anything else, the chunks are invisible. (Assuming they're not sent in individual TCP packets or TLS records, of course.) This mitigation technique is very easy to implement at the web server level, which makes it the least expensive option. There is only a question about its effectiveness. No one has done the maths yet, but most seem to agree that response length randomization slows down the attacker, but does not prevent the attack entirely. But, if the attack can be slowed down significantly, perhaps it will be as good as prevented.
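Here is a sketch of the chunking idea described above. It assumes a server that writes the HTTP/1.1 chunked framing itself: the body bytes stay the same, but they are split at random boundaries. Because each chunk adds its own size line and CRLF, the total length on the wire changes from one response to the next. `chunk_randomly`, `dechunk`, the 64-byte maximum and the sample body are all hypothetical. As the quote says, this only adds noise and is not a proven defence.

```python
import secrets
from typing import List


def chunk_randomly(body: bytes, max_chunk: int = 64) -> bytes:
    """Encode `body` with HTTP/1.1 chunked transfer coding, using random chunk sizes.

    The payload is unchanged; only the number of chunks, and therefore the
    framing overhead, varies between calls.
    """
    parts: List[bytes] = []
    pos = 0
    while pos < len(body):
        size = 1 + secrets.randbelow(max_chunk)      # 1..max_chunk bytes
        chunk = body[pos:pos + size]
        parts.append(b"%x\r\n" % len(chunk) + chunk + b"\r\n")
        pos += len(chunk)
    parts.append(b"0\r\n\r\n")                       # last-chunk, empty trailer
    return b"".join(parts)


def dechunk(encoded: bytes) -> bytes:
    """Decode chunked framing (no trailers) to show the payload survives intact."""
    out, pos = bytearray(), 0
    while True:
        line_end = encoded.index(b"\r\n", pos)
        size = int(encoded[pos:line_end], 16)
        if size == 0:
            return bytes(out)
        start = line_end + 2
        out += encoded[start:start + size]
        pos = start + size + 2                        # skip CRLF after the data


body = b"<html><body>token=abc123</body></html>" * 4
lengths = {len(chunk_randomly(body)) for _ in range(5)}
print(dechunk(chunk_randomly(body)) == body, sorted(lengths))
```

The decoded body is always the same, but the set of encoded lengths usually has more than one entry. That variation is the noise the mitigation relies on.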
tags: mitm attacks hacking security compression http https protocols tls ssl tcp chunked-encoding apache
bookmark: https://pinboard.in/u:jm/b:13c0a7ba2031/

js-hll, 2013-06-25T21:37:50+00:00
http://blog.aggregateknowledge.com/2013/06/18/open-source-release-js-hll/
One of the first things that we wanted to do with HyperLogLog when we first started playing with it was to support and expose it natively in the browser. The thought of allowing users to directly interact with these structures -- perform arbitrary unions and intersections on effectively unbounded sets all on the client -- was exhilarating to us. [...] we are pleased to announce the open-source release of AK’s HyperLogLog implementation for JavaScript, js-hll. We are releasing this code under the Apache License, Version 2.0.
We knew that we couldn’t just release a bunch of JavaScript code without allowing you to see it in action — that would be a crime. We passed a few ideas around and the one that kept bubbling to the top was a way to kill two birds with one stone. We wanted something that would showcase what you can do with HLL in the browser and give us a tool for explaining HLLs. It is typical for us to explain how HLL intersections work using a Venn diagram. You draw some overlapping circles with a border that represents the error and you talk about how if that border is close to or larger than the intersection then you can’t say much about the size of that intersection. This works just ok on a whiteboard but what you really want is to just build a visualization that allows you to select from some sets and see the overlap. Maybe even play with the precision a little bit to see how that changes the result. Well, we did just that!
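The unions described above are cheap because a HyperLogLog sketch is just an array of registers, and the union of two sketches is their element-wise maximum. An intersection is commonly estimated by inclusion-exclusion, |A ∩ B| ≈ |A| + |B| - |A ∪ B|. The errors of all three estimates feed into that result, which is why the error "border" in the Venn diagram can swamp a small overlap.

Below is a minimal Python sketch of this. It assumes 2^10 registers, SHA-1 as the hash, and the standard raw estimator with a linear-counting correction for small sets. It is not js-hll's implementation or API.

```python
import hashlib
import math
from typing import Iterable, List

P = 10                 # precision: 2**P registers (assumed value)
M = 1 << P


def _hash64(item: str) -> int:
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")


def sketch(items: Iterable[str]) -> List[int]:
    """Build HLL registers: each keeps the max rank seen in its bucket."""
    regs = [0] * M
    for item in items:
        h = _hash64(item)
        bucket = h >> (64 - P)                    # top P bits pick the register
        rest = h & ((1 << (64 - P)) - 1)          # remaining 64-P bits
        rank = (64 - P) - rest.bit_length() + 1   # leading zeros + 1
        regs[bucket] = max(regs[bucket], rank)
    return regs


def union(a: List[int], b: List[int]) -> List[int]:
    """Union of two sketches is the element-wise max of their registers."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(regs: List[int]) -> float:
    """Raw HLL estimate, falling back to linear counting for small sets."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)
    return raw


a = sketch(f"user{i}" for i in range(0, 6000))
b = sketch(f"user{i}" for i in range(4000, 10000))
est_a, est_b, est_u = estimate(a), estimate(b), estimate(union(a, b))
print(round(est_a), round(est_b), round(est_u), round(est_a + est_b - est_u))
```

With these sets the true sizes are 6000, 6000, 10000 and an overlap of 2000. Each estimate carries roughly 1.04/√M ≈ 3% standard error. The overlap is derived from three noisy estimates, so its relative error is noticeably larger.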
tags: javascript ui hll hyperloglog algorithms sketching js sets intersection union apache open-source
bookmark: https://pinboard.in/u:jm/b:aedd388dbaa3/

Kafka 0.8 Producer Performance, 2013-04-10T22:24:27+00:00
https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/
tags: performance kafka apache benchmarks ops queueing
bookmark: https://pinboard.in/u:jm/b:2f67886faaa2/

Riak CS is now ASL2 open source, 2013-03-20T13:35:04+00:00
http://basho.com/riak-cs-is-now-open-source/
tags: riak riak-cs nosql storage basho open-source github apache asl2
bookmark: https://pinboard.in/u:jm/b:6120c0a8fb9a/

Cubism.js, 2012-04-24T20:04:30+00:00
http://square.github.com/cubism/
tags: javascript library visualization dataviz tsd data apache open-source
bookmark: https://pinboard.in/u:jm/b:c44f693703d5/

Apache Kafka, 2012-02-12T00:59:16+00:00
http://incubator.apache.org/kafka/index.html
tags: kafka linkedin apache distributed messaging pubsub queue incubator scaling
bookmark: https://pinboard.in/u:jm/b:92e2d30f6bea/

Apache considered harmful, 2011-11-23T21:59:04+00:00
http://www.mikealrogers.com/posts/apache-considered-harmful.html
tags: git asf apache via:hn github programming
bookmark: https://pinboard.in/u:jm/b:fd68b4e57a32/

Lucene Utilities and Bloom Filters - Greplin:tech, 2011-04-13T23:20:49+00:00
http://tech.blog.greplin.com/lucene-utilities-and-bloom-filters
tags: search bloom-filters greplin open-source apache false-positives
bookmark: https://pinboard.in/u:jm/b:b2a894a638d9/

Akka, 2011-03-27T22:20:47+00:00
http://akka.io/
tags: scala java concurrency scalability apache akka actors erlang fault-tolerance events
bookmark: https://pinboard.in/u:jm/b:d8d97dabbd34/

avatraxiom: Improving Web Security: Six Ways the Apache.org JIRA Attack Could Have Been Prevented by Better Code, 2010-04-13T17:05:28+00:00
http://avatraxiom.livejournal.com/102080.html
tags: asf apache bugzilla jira xss security hacks
bookmark: https://pinboard.in/u:jm/b:72c5f6462cc3/

ElasticSearch, 2010-02-12T21:24:08+00:00
http://www.elasticsearch.com/products/elasticsearch/
tags: search distributed rest json apache elasticsearch http
bookmark: https://pinboard.in/u:jm/b:0b0daaf0ae91/

The Apache Software Foundation Announces Apache SpamAssassin Version 3.3.0, 2010-01-26T16:34:43+00:00
http://www.prnewswire.com/news-releases/the-apache-software-foundation-announces-apache-spamassassin-version-330-82677727.html
tags: asf apache spamassassin releases 3.3.0 anti-spam
bookmark: https://pinboard.in/u:jm/b:1d69fed5b2e1/

Subversion Submitted to Become a Project at The Apache Software Foundation, 2009-11-04T17:56:42+00:00
http://www.earthtimes.org/articles/show/subversion-submitted-to-become-a-project-at-the-apache-software-foundation,1028705.shtml
tags: svn subversion asf apache open-source incubator
bookmark: https://pinboard.in/u:jm/b:de5ce9add3cd/

DDOS mystery involving Linux and mod_ssl, 2009-10-19T15:09:38+00:00
https://blogs.apache.org/infra/entry/ddos_mystery_involving_linux_and#comments
tags: apache asf ddos https httpd mod_ssl
bookmark: https://pinboard.in/u:jm/b:a02a0daec994/

glTail.rb - realtime logfile visualization, 2009-07-21T09:28:15+00:00
http://www.fudgie.org/
tags: dataviz visualization tail gltail opengl linux apache spamd spamassassin logs statistics sysadmin analytics animation analysis server ruby monitoring logging logfiles
bookmark: https://pinboard.in/u:jm/b:be37d5036892/

Launchpad is now open source, 2009-07-21T08:59:49+00:00
http://blog.canonical.com/?p=192
tags: canonical launchpad open-source apache hosting projects ubuntu agpl
bookmark: https://pinboard.in/u:jm/b:9990e9fe0e8b/