Pinboard (jm)

https://pinboard.in/u:jm/public/
Recent bookmarks from jm

How Do You Find an Illegal Image Without Looking at It?
2026-04-07T10:18:58+00:00
https://mahmoud-salem.net/the-invisible-shield
Tags: csam detection filtering photodna pdq classifiers photos videos classification hashing fuzzy-hashing via:erin-kissane
Bookmark: https://pinboard.in/u:jm/b:b9d795d6a889/

Breaking CityHash64, MurmurHash2/3, wyhash, and more
2025-05-01T11:53:49+00:00
https://orlp.net/blog/breaking-hash-functions/
Tags: hashing security infosec hashdos collisions cityhash murmurhash farmhash wyhash
Bookmark: https://pinboard.in/u:jm/b:b2090029dc7d/

How the New sqlite3_rsync Utility Works
2024-11-06T17:05:58+00:00
https://nochlin.com/blog/how-the-new-sqlite3_rsync-utility-works
Tags: sqlite hashing rsync synchronization replication databases storage algorithms
Bookmark: https://pinboard.in/u:jm/b:7f69d3bfe7d1/

Cyan4973/xxHash: Extremely fast non-cryptographic hash algorithm
2021-02-01T11:48:46+00:00
https://github.com/Cyan4973/xxHash/
Tags: hashing hash xxhash performance coding speed algorithms
Bookmark: https://pinboard.in/u:jm/b:57cfebc0c1ce/

Fast Tar And Rsync Transfer Speed For Linux Backups Using Zstd Compression
2021-02-01T11:45:50+00:00
https://blog.centminmod.com/2021/01/30/2214/fast-tar-and-rsync-transfer-speed-for-linux-backups-using-zstd-compression/
Newer Tar 1.32+ and Rsync 3.2.3 versions have added Facebook’s zstd compression algorithm and Rsync has added lz4 and xxHash checksum algorithms which give Tar and Rsync a tremendous boost in transfer speed.
Tags: tar rsync backups xxhash hashing performance speed zstd compression lz4
Bookmark: https://pinboard.in/u:jm/b:93b4075d656b/

dropbox/setsum
2020-12-08T22:33:57+00:00
https://github.com/dropbox/setsum
Tags: checksums hashing dropbox sums summarising algorithms streaming
Bookmark: https://pinboard.in/u:jm/b:82bacd312ca4/

When Bloom filters don't bloom
2020-03-03T14:46:36+00:00
https://blog.cloudflare.com/when-bloom-filters-dont-bloom/
Modern CPUs are really good at sequential memory access when it's possible to predict memory fetch patterns (see Cache prefetching). Random memory access on the other hand is very costly. Advanced data structures are very interesting, but beware. Modern computers require cache-optimized algorithms. When working with large datasets, not fitting L3, prefer optimizing for a reduced number of loads over optimizing the amount of memory used. I guess it's fair to say that Bloom filters are great, as long as they fit into the L3 cache. The moment this assumption is broken, they are terrible. This is not news, Bloom filters optimize for memory usage, not for memory access. For example, see the Cuckoo Filters paper.
Tags: cloudflare bloom-filters performance data-structures cpu cache l3 hashing perf perftools
Bookmark: https://pinboard.in/u:jm/b:0d0316cd680e/

BLAKE3
2020-02-24T17:28:59+00:00
https://www.infoq.com/news/2020/01/blake3-fast-crypto-hash/
Tags: blake3 blake hashing hashes algorithms speed performance optimization sha
Bookmark: https://pinboard.in/u:jm/b:386b7e9bbdfd/

Historic S3 data corruption due to a faulty load balancer
2020-01-22T14:05:12+00:00
https://forums.aws.amazon.com/thread.jspa?threadID=22709
We've isolated this issue to a single load balancer that was brought into service at 10:55pm PDT on Friday, 6/20 [2008]. It was taken out of service at 11am PDT Sunday, 6/22. While it was in service it handled a small fraction of Amazon S3's total requests in the US.
Intermittently, under load, it was corrupting single bytes in the byte stream. When the requests reached Amazon S3, if the Content-MD5 header was specified, Amazon S3 returned an error indicating the object did not match the MD5 supplied. When no MD5 is specified, we are unable to determine if transmission errors occurred, and Amazon S3 must assume that the object has been correctly transmitted. Based on our investigation with both internal and external customers, the small amount of traffic received by this particular load balancer, and the intermittent nature of the above issue on this one load balancer, this appears to have impacted a very small portion of PUTs during this time frame.
One of the things we'll do is improve our logging of requests with MD5s, so that we can look for anomalies in their 400 error rates. Doing this will allow us to provide more proactive notification on potential transmission issues in the future, for customers who use MD5s and those who do not.
In addition to taking the actions noted above, we encourage all of our customers to take advantage of mechanisms designed to protect their applications from incorrect data transmission. For all PUT requests, Amazon S3 computes its own MD5, stores it with the object, and then returns the computed MD5 as part of the PUT response code in the ETag. By validating the ETag returned in the response, customers can verify that Amazon S3 received the correct bytes even if the Content-MD5 header wasn't specified in the PUT request. Because network transmission errors can occur at any point between the customer and Amazon S3, we recommend that all customers use the Content-MD5 header and/or validate the ETag returned on a PUT request to ensure that the object was correctly transmitted. This is a best practice that we'll emphasize more heavily in our documentation to help customers build applications that can handle this situation.
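The two client-side checks recommended above can be sketched in Python. This is a minimal illustration, not AWS code. It assumes a single-part PUT of an unencrypted or SSE-S3-encrypted object, for which S3's ETag is the quoted hex MD5 of the body. Multipart uploads and SSE-KMS or SSE-C objects use other ETag formats. The helper names are hypothetical and no S3 client is involved. The corrupted body simulates the single-byte corruption described in the post-mortem.

```python
import base64
import hashlib


def content_md5_header(body: bytes) -> str:
    """Content-MD5 header value: base64 of the raw 16-byte MD5 digest (not the hex form)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def etag_matches(body: bytes, etag: str) -> bool:
    """Check a single-part PUT's ETag (a quoted hex MD5) against the bytes we sent."""
    return etag.strip('"').lower() == hashlib.md5(body).hexdigest()


body = b"hello, s3"
print(content_md5_header(body))  # send this as the Content-MD5 request header

# Simulate the single-byte corruption from the post-mortem: the stored object differs by one byte.
corrupted = b"hellp, s3"
server_etag = '"' + hashlib.md5(corrupted).hexdigest() + '"'
print(etag_matches(body, server_etag))  # False: S3 stored different bytes than we sent
```

Sending the Content-MD5 value lets S3 reject a corrupted upload itself, which produces the 400 errors mentioned above. Checking the ETag afterwards catches corruption when the header was omitted.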
Tags: aws s3 outages postmortems load-balancing data-corruption corruption failure md5 hashing hashes
Bookmark: https://pinboard.in/u:jm/b:7067b5a9a1e4/

SHA-1 is a Shambles - First Chosen-Prefix Collision on SHA-1 and Application to the PGP Web of Trust
2020-01-07T15:08:10+00:00
https://eprint.iacr.org/2020/014
Abstract: The SHA-1 hash function was designed in 1995 and has been widely used during two decades. A theoretical collision attack was first proposed in 2004 [WYY05], but due to its high complexity it was only implemented in practice in 2017, using a large GPU cluster [SBK+17]. More recently, an almost practical chosen-prefix collision attack against SHA-1 has been proposed [LP19]. This more powerful attack allows to build colliding messages with two arbitrary prefixes, which is much more threatening for real protocols. In this paper, we report the first practical implementation of this attack, and its impact on real-world security with a PGP/GnuPG impersonation attack. We managed to significantly reduce the complexity of collision attacks against SHA-1: on an Nvidia GTX 970, identical-prefix collisions can now be computed with a complexity of $2^{61.2}$ rather than $2^{64.7}$, and chosen-prefix collisions with a complexity of $2^{63.4}$ rather than $2^{67.1}$. When renting cheap GPUs, this translates to a cost of 11k USD for a collision, and 45k USD for a chosen-prefix collision, within the means of academic researchers. Our actual attack required two months of computations using 900 Nvidia GTX 1060 GPUs (we paid 75k USD because GPU prices were higher, and we wasted some time preparing the attack). Therefore, the same attacks that have been practical on MD5 since 2009 are now practical on SHA-1. In particular, chosen-prefix collisions can break signature schemes and handshake security in secure channel protocols (TLS, SSH). We strongly advise to remove SHA-1 from those types of applications as soon as possible.
We exemplify our cryptanalysis by creating a pair of PGP/GnuPG keys with different identities, but colliding SHA-1 certificates. A SHA-1 certification of the first key can therefore be transferred to the second key, leading to a forgery. This proves that SHA-1 signatures now offer virtually no security in practice. The legacy branch of GnuPG still uses SHA-1 by default for identity certifications, but after notifying the authors, the modern branch now rejects SHA-1 signatures (the issue is tracked as CVE-2019-14855). (Via Tony Finch)
Tags: via:fanf security sha sha-1 crypto hashes hashing pgp gpg collisions
Bookmark: https://pinboard.in/u:jm/b:468127bda2ca/

XXH3
2019-03-19T10:51:56+00:00
http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html
Tags: hashing algorithms xxhash xxh3 checksums performance
Bookmark: https://pinboard.in/u:jm/b:abc914fca9e9/

multiformats/multihash: Self describing hashes - for future proofing
2018-09-17T16:29:24+00:00
https://github.com/multiformats/multihash
Tags: ipfs hashing multihash crypto hashes sha
Bookmark: https://pinboard.in/u:jm/b:c9c746b195ae/

Fibonacci Hashing: The Optimization that the World Forgot (or: a Better Alternative to Integer Modulo)
2018-06-18T10:23:24+00:00
https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/
Turns out I was wrong. This is a big one. And everyone should be using it. Hash tables should not be prime number sized and they should not use an integer modulo to map hashes into slots. Fibonacci hashing is just better. Yet somehow nobody is using it and lots of big hash tables (including all the big implementations of std::unordered_map) are much slower than they should be because they don’t use Fibonacci Hashing.
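The idea in the quote can be sketched in Python. This is an illustrative sketch, not the post's own code. It assumes 64-bit hash values and a power-of-two table of `2**table_bits` slots. Each hash is multiplied by the 64-bit golden-ratio constant 11400714819323198485 (0x9E3779B97F4A7C15, roughly 2**64 divided by the golden ratio) with wrap-around, and the top bits become the slot index. For contrast, the sketch also applies plain `% 1024`, under which keys that are all multiples of 1024 pile into a single slot.

```python
GOLDEN_64 = 11400714819323198485  # 0x9E3779B97F4A7C15, roughly 2**64 / golden ratio
MASK_64 = (1 << 64) - 1           # emulate unsigned 64-bit wrap-around


def fibonacci_slot(hash_value: int, table_bits: int) -> int:
    """Map a 64-bit hash to one of 2**table_bits slots: multiply, then keep the top bits."""
    return ((hash_value * GOLDEN_64) & MASK_64) >> (64 - table_bits)


def modulo_slot(hash_value: int, table_size: int) -> int:
    """Conventional slot selection by integer modulo, for comparison."""
    return hash_value % table_size


# Keys that share their low bits (all multiples of 1024), e.g. aligned addresses.
keys = [i * 1024 for i in range(8)]
print([modulo_slot(k, 1024) for k in keys])   # every key lands in slot 0
print([fibonacci_slot(k, 10) for k in keys])  # spread across the 1024 slots
```

A prime-sized table with `%` would also spread these keys. The post's argument is that multiply-and-shift achieves a comparable spread without the integer division.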
Apparently this is binary multiplicative hashing, and Google's brotli, webp, and Snappy libs all use a constant derived heuristically from a compression test corpus along the same lines (see comments). (Via Michael Fogleman)
Tags: algorithms hashing hash fibonacci golden-ratio coding hacks brotli webp snappy hash-tables hashmaps load-distribution
Bookmark: https://pinboard.in/u:jm/b:9fbbdd34c27e/

_Random Slicing: Efficient and Scalable Data Placement for Large-Scale Storage Systems_, ACM Transactions on Storage, July 2014
2018-05-29T09:53:48+00:00
http://hpc.ac.upc.edu/PDFs/dir05/file004529.pdf
Tags: randomness architecture algorithms storage hashing slicing scaling
Bookmark: https://pinboard.in/u:jm/b:8a9fa65f6c59/

google/highwayhash: Fast strong hash functions: SipHash/HighwayHash
2018-01-12T13:43:51+00:00
https://github.com/google/highwayhash
… 64 bits and therefore infeasible to reverse. Permuting equalizes the distribution of the resulting bytes. The internal state occupies four 256-bit AVX2 registers. Due to limitations of the instruction set, the registers are partitioned into two 512-bit halves that remain independent until the reduce phase. The algorithm outputs 64 bit digests or up to 256 bits at no extra cost. In addition to high throughput, the algorithm is designed for low finalization cost. The result is more than twice as fast as SipTreeHash. We also provide an SSE4.1 version (80% as fast for large inputs and 95% as fast for short inputs), an implementation for VSX on POWER and a portable version (10% as fast). A third-party ARM implementation is referenced below. Statistical analyses and preliminary cryptanalysis are given in https://arxiv.org/abs/1612.06257.
(via Tony Finch)
Tags: siphash highwayhash via:fanf hashing hashes algorithms mac google hash
Bookmark: https://pinboard.in/u:jm/b:c96748eca1a7/

The naked truth about Facebook’s revenge porn tool
2017-11-10T21:27:42+00:00
https://www.engadget.com/2017/11/10/the-naked-truth-about-facebook-s-revenge-porn-tool/
If Facebook wanted to implement a truly trusted system for revenge porn victims, they could put the photo hashing on the user side of things -- so only the hash is transferred to Facebook. To verify the claim that the image is truly a revenge porn issue, the victim could have the images verified through a trusted revenge porn advocacy organization. Theoretically, the victim then would have a verified, privacy-safe version of the photo, and a hash that could be also sent to Google and other sites.
Tags: facebook privacy hashing pictures images revenge-porn abuse via:jwz
Bookmark: https://pinboard.in/u:jm/b:e9bd8d39864c/

Facebook asks users for nude photos in project to combat revenge porn
2017-11-08T09:44:37+00:00
https://www.theguardian.com/technology/2017/nov/07/facebook-revenge-porn-nude-photos
Tags: photodna hashing images facebook revenge-porn messenger nudes photos
Bookmark: https://pinboard.in/u:jm/b:f9f8eec39539/

Image comparison algorithms
2017-04-12T21:22:06+00:00
http://stackoverflow.com/questions/843972/image-comparison-fast-algorithm/844113#844113
Tags: algorithms hashing comparison diff images similarity search ffffound mltshp
Bookmark: https://pinboard.in/u:jm/b:7cb94c5de107/

Accidentally Quadratic — Rust hash iteration+reinsertion
2016-11-24T20:47:47+00:00
http://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion
It was recently discovered that some surprising operations on Rust’s standard hash table types could go quadratic.
Quite a nice unexpected accidental detour into O(n^2).
Tags: big-o hashing robin-hood-hashing siphash algorithms hashtables rust
Bookmark: https://pinboard.in/u:jm/b:927f9d69dc79/

Stop it with short PGP key IDs!
2016-06-08T11:03:44+00:00
http://gwolf.org/node/4070
What happened today? We still don't really know, but it seems we found a first potentially malicious collision — that is, the first "nonacademic" case. Enrico found two keys sharing the 9F6C6333 short ID, apparently belonging to the same person (as would be the case of Asheesh, mentioned above). After contacting Gustavo, though, he does not know about the second — that is, it can be clearly regarded as an impersonation attempt. Besides, what gave away this attempt are the signatures it has: Both keys are signed by what appears to be the same three keys: B29B232A, F2C850CA and 789038F2. Those three keys are not (yet?) uploaded to the keyservers, though... But we can expect them to appear at any point in the future. We don't know who is behind this, or what his purpose is. We just know this looks very evil.
Now, don't panic: Gustavo's key is safe. Same for his certifiers, Marga, Agustín and Maxy. It's just a 32-bit collision.
So, in principle, the only parties that could be cheated to trust the attacker are humans, right? Nope. Enrico tested on the PGP pathfinder & key statistics service, a keyserver that finds trust paths between any two arbitrary keys in the strong set. Surprise: The pathfinder works on the short key IDs, even when supplied full fingerprints. So, it turns out I have three faked trust paths into our impostor.
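A little arithmetic shows why a 32-bit short ID is so easy to collide. For OpenPGP v4 keys, the 64-bit key ID is the low-order 64 bits of the 160-bit fingerprint. The short ID is the low 32 bits, i.e. the last 8 hex digits. The Python sketch below uses a made-up fingerprint (not a real key) and the standard birthday approximation P ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n random b-bit IDs. A targeted impersonation like the one described above is more direct still: hitting one specific short ID takes on the order of 2^32 generated keys.

```python
import math


def key_ids(fingerprint_hex: str) -> tuple[str, str]:
    """Return (short 32-bit ID, long 64-bit ID) from a v4 OpenPGP fingerprint."""
    fp = fingerprint_hex.replace(" ", "").upper()
    if len(fp) != 40:
        raise ValueError("expected a 160-bit (40 hex digit) v4 fingerprint")
    # Key IDs are the low-order bits, i.e. the trailing hex digits of the fingerprint.
    return fp[-8:], fp[-16:]


def collision_probability(n_keys: int, id_bits: int) -> float:
    """Birthday approximation: chance that at least two of n random IDs coincide."""
    exponent = n_keys * (n_keys - 1) / 2 ** (id_bits + 1)
    return -math.expm1(-exponent)  # 1 - exp(-x), accurate even when x is tiny


# Made-up fingerprint for illustration only; it is not a real key.
short_id, long_id = key_ids("0123 4567 89AB CDEF 0123 4567 89AB CDEF 0123 4567")
print(short_id, long_id)                            # 01234567 89ABCDEF01234567
print(f"{collision_probability(100_000, 32):.2f}")  # 0.69
print(f"{collision_probability(100_000, 64):.1e}")
```

With 100,000 random keys, some pair is more likely than not to share a short ID. The same population almost certainly has no colliding 64-bit long IDs. That is why tools should match on full fingerprints.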
Tags: pgp gpg keys collisions hashing security debian
Bookmark: https://pinboard.in/u:jm/b:67ea6e3fe421/

Rendezvous hashing - Wikipedia, the free encyclopedia
2016-04-13T14:01:11+00:00
https://en.m.wikipedia.org/wiki/Rendezvous_hashing
Rendezvous or Highest Random Weight (HRW) hashing[1][2] is an algorithm that allows clients to achieve distributed agreement on a set of k options out of a possible set of n options. A typical application is when clients need to agree on which sites (or proxies) objects are to be assigned to. When k is 1, it subsumes the goals of consistent hashing, using an entirely different method.
Tags: hrw hashing hashes consistent-hashing rendezvous-hashing algorithms discovery distributed-computing
Bookmark: https://pinboard.in/u:jm/b:8be4d585c6d4/

BLAKE2: simpler, smaller, fast as MD5
2016-04-07T22:51:11+00:00
https://blake2.net/blake2.pdf
Tags: crypto hash blake2 hashing blake algorithms sha1 sha3 simd performance mac
Bookmark: https://pinboard.in/u:jm/b:33cb0a51f577/

The general birthday problem
2016-02-01T11:03:25+00:00
http://www.johndcook.com/blog/2016/01/30/general-birthday-problem/
Tags: hashing hashes collisions birthday-problem birthday-paradox coding probability statistics
Bookmark: https://pinboard.in/u:jm/b:5e19813a6fb5/

AV vendors still relying on MD5 to identify malware
2015-06-10T15:07:42+00:00
http://blog.silentsignal.eu/2015/06/10/poisonous-md5-wolves-among-the-sheep/
Tags: md5 hashing antivirus malware security via:fanf bugs
Bookmark: https://pinboard.in/u:jm/b:11ef4e54eeb8/

Trend Micro Locality Sensitive Hash
2015-05-18T12:59:31+00:00
https://github.com/trendmicro/tlsh
A fuzzy matching library. Given a byte stream with a minimum length of 512 bytes, TLSH generates a hash value which can be used for similarity comparisons. Similar objects will have similar hash values which allows for the detection of similar objects by comparing their hash values.
Note that the byte stream should have a sufficient amount of complexity. For example, a byte stream of identical bytes will not generate a hash value. Paper here: https://drive.google.com/file/d/0B6FS3SVQ1i0GTXk5eDl3Y29QWlk/edit (via adulau)
Tags: nilsimsa sdhash ssdeep locality-sensitive hashing algorithm hashes trend-micro tlsh hash fuzzy-matching via:adulau
Bookmark: https://pinboard.in/u:jm/b:35798e024e53/

"Cuckoo Filter: Practically Better Than Bloom"
2015-03-09T14:29:55+00:00
http://www.pdl.cmu.edu/PDL-FTP/FS/cuckoo-conext2014.pdf
Tags: algorithms paper bloom-filters cuckoo-filters cuckoo-hashing data-structures false-positives big-data probabilistic hashing set-membership approximation
Bookmark: https://pinboard.in/u:jm/b:a7df31b55f43/

What's the probability of a hash collision?
2014-11-18T11:50:47+00:00
http://davidjohnstone.net/pages/hash-collision-probability
Tags: probability hashing hashes collision risk md5 sha sha1 calculators
Bookmark: https://pinboard.in/u:jm/b:7941face31b6/

How I created two images with the same MD5 hash
2014-11-04T18:14:08+00:00
http://natmchugh.blogspot.co.uk/2014/10/how-i-created-two-images-with-same-md5.html
I found that I was able to run the algorithm in about 10 hours on an AWS large GPU instance bringing it in at about $0.65 plus tax. Bottom line: MD5 is feasibly attackable by pretty much anyone now.
Tags: crypto images md5 security hashing collisions ec2 via:hn
Bookmark: https://pinboard.in/u:jm/b:3b301b6423b9/

3 Rules of thumb for Bloom Filters
2014-08-25T21:06:48+00:00
http://corte.si/%2Fposts/code/bloom-filter-rules-of-thumb/index.html
I often need to do rough back-of-the-envelope reasoning about things, and I find that doing a bit of work to develop an intuition for how a new technique performs is usually worthwhile. So, here are three broad rules of thumb to remember when discussing Bloom filters down the pub:
1. One byte per item in the input set gives about a 2% false positive rate.
2. The optimal number of hash functions is about 0.7 times the number of bits per item.
3. The number of hashes dominates performance.
But see also http://stackoverflow.com/a/9554448 and http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf (thanks Tony Finch!)
Tags: bloom-filters algorithm probabilistic rules reasoning via:norman-maurer false-positives hashing coding
Bookmark: https://pinboard.in/u:jm/b:b369d6a01322/

MinHash for dummies
2014-08-05T10:28:58+00:00
http://matthewcasperson.blogspot.ie/2013/11/minhash-for-dummies.html
Tags: shingling algorithms minhash hashing duplicates duplicate-detection fuzzy-matching java
Bookmark: https://pinboard.in/u:jm/b:37541529ed34/

NYC generates hash-anonymised data dump, which gets reversed
2014-06-25T15:36:55+00:00
https://medium.com/@vijayp/f6bc289679a1
There are about 1000*26**3 = 21952000 or 22M possible medallion numbers. So, by calculating the md5 hashes of all these numbers (only 24M!), one can completely deanonymise the entire data. Modern computers are fast: so fast that computing the 24M hashes took less than 2 minutes.
(via Bruce Schneier)
The better fix is an HMAC (see http://benlog.com/2008/06/19/dont-hash-secrets/), or simply assigning opaque IDs instead of hashing.
Tags: hashing sha1 md5 bruce-schneier anonymization deanonymization security new-york nyc taxis data big-data hmac keyed-hashing salting
Bookmark: https://pinboard.in/u:jm/b:86f2bc539afe/

Jump Consistent Hash: A Fast, Minimal Memory, Consistent Hash Algorithm
2014-06-17T14:19:13+00:00
http://arxiv.org/pdf/1406.2294v1.pdf
Tags: hashing consistent-hashing google guava memory algorithms sharding
Bookmark: https://pinboard.in/u:jm/b:7990efcb5b77/

Shuffle Sharding
2014-04-15T10:59:26+00:00
http://www.awsarchitectureblog.com/2014/04/shuffle-sharding.html
Tags: hashing load-balancing sharding partitions dist-sys distcomp architecture coding
Bookmark: https://pinboard.in/u:jm/b:22d91731447c/

Redis adds support for HyperLogLog
2014-04-02T10:38:41+00:00
https://news.ycombinator.com/item?id=7506774
Tags: hll bloom-filters hyperloglog redis data-structures estimation cardinality probabilistic probability hashing random
Bookmark: https://pinboard.in/u:jm/b:1231febb74e0/

_An Improved Construction For Counting Bloom Filters_
2013-09-18T21:43:29+00:00
http://www.eecs.harvard.edu/~michaelm/postscripts/esa2006b.pdf
Tags: bloom-filter data-structures algorithms counting cbf storage false-positives d-left-hashing hashing
Bookmark: https://pinboard.in/u:jm/b:77b7dfebb1ae/

Recordinality
2013-08-20T20:41:05+00:00
https://github.com/cscotta/recordinality
Recordinality is unique in that it provides cardinality estimation like HLL, but also offers "distinct value sampling." This means that Recordinality can allow us to fetch a random sample of distinct elements in a stream, invariant to cardinality. Put more succinctly, given a stream of elements containing 1,000,000 occurrences of 'A' and one occurrence each of 'B' - 'Z', the probability of any letter appearing in our sample is equal.
Moreover, we can also efficiently store the number of times elements in our distinct sample have been observed. This can help us to understand the distribution of occurrences of elements in our stream. With it, we can answer questions like "do the elements we've sampled present in a power law-like pattern, or is the distribution of occurrences relatively even across the set?"
Tags: sketching coding algorithms recordinality cardinality estimation hll hashing murmurhash java
Bookmark: https://pinboard.in/u:jm/b:56d75229aca1/

Lectures in Advanced Data Structures (6.851)
2013-04-29T10:32:24+00:00
http://courses.csail.mit.edu/6.851/spring12/lectures/
Data structures play a central role in modern computer science. You interact with data structures even more often than with algorithms (think Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This course covers major results and current directions of research in data structures:
TIME TRAVEL: We can remember the past efficiently (a technique called persistence), but in general it's difficult to change the past and see the outcomes on the present (retroactivity). So alas, Back To The Future isn't really possible.
GEOMETRY: When data has more than one dimension (e.g. maps, database tables).
DYNAMIC OPTIMALITY: Is there one binary search tree that's as good as all others? We still don't know, but we're close.
MEMORY HIERARCHY: Real computers have multiple levels of caches. We can optimize the number of cache misses, often without even knowing the size of the cache.
HASHING: Hashing is the most used data structure in computer science. And it's still an active area of research.
INTEGERS: Logarithmic time is too easy. By careful analysis of the information you're dealing with, you can often reduce the operation times substantially, sometimes even to constant. We will also cover lower bounds that illustrate when this is not possible.
DYNAMIC GRAPHS: A network link went down, or you just added or deleted a friend in a social network. We can still maintain essential information about the connectivity as it changes.
STRINGS: Searching for phrases in giant text (think Google or DNA).
SUCCINCT: Most “linear size” data structures you know are much larger than they need to be, often by an order of magnitude. Some data structures require almost no space beyond the raw data but are still fast (think heaps, but much cooler).
(via Tim Freeman)
Tags: data-structures lectures mit video data algorithms coding csail strings integers hashing sorting bst memory
Bookmark: https://pinboard.in/u:jm/b:5c72d87f4ea4/

Abusing hash kernels for wildly unprincipled machine learning
2013-04-04T23:01:51+00:00
http://jeremydhoon.github.com/2013/03/19/abusing-hash-kernels-for-wildly-unprincipled-machine-learning/
Tags: ai machine-learning python data hashing features feature-selection anti-spam spamassassin
Bookmark: https://pinboard.in/u:jm/b:28d641a0b96e/

java - Given that HashMaps in jdk1.6 and above cause problems with multi-threading, how should I fix my code - Stack Overflow
2013-02-01T11:49:23+00:00
http://stackoverflow.com/questions/14010906/given-that-hashmaps-in-jdk1-6-and-above-cause-problems-with-multi-threading-how
Tags: java hashmap concurrency bugs fail security hashing jdk via:cscotta
Bookmark: https://pinboard.in/u:jm/b:8b7f56ad583d/

fail0verflow ::
2013-01-23T09:37:29+00:00
http://fail0verflow.com/blog/2013/megafail.html
Tags: crypto hashing security cbc mac sha1 aes
Bookmark: https://pinboard.in/u:jm/b:dd79c7b9bdc3/

SipHash: a fast short-input PRF
2012-10-28T21:33:51+00:00
https://www.131002.net/siphash/
Tags: hashing siphash djb security algorithms
Bookmark: https://pinboard.in/u:jm/b:ed75c7d5a6ba/

experimental CPU-cache-aware hash table implementations in Cloudera's Impala
2012-10-24T16:38:11+00:00
https://github.com/cloudera/impala/blob/master/be/src/experiments/hashing/cache-hash-table.h
Tags: hashing hash-tables data-structures performance c++ l1 cache cpu
Bookmark: https://pinboard.in/u:jm/b:1122a37fd9d7/

Avoiding Hash Lookups in a Ruby Implementation
2012-09-05T09:13:05+00:00
http://blog.headius.com/2012/09/avoiding-hash-lookups-in-ruby.html
Tags: via:declanmcgrath hash optimization ruby performance jruby hashing data-structures big-o optimisation
Bookmark: https://pinboard.in/u:jm/b:f9de450427ec/

Analyzing Flame's MD5 Collision Attack [slides, PDF]
2012-06-11T23:36:36+00:00
http://www.trailofbits.com/resources/flame-md5.pdf
Tags: via:fanf flame security malware md5 collisions hashing pki tls ssl microsoft
Bookmark: https://pinboard.in/u:jm/b:1e484697f020/

feedback loop n-gram analyzer
2011-09-29T21:10:15+00:00
http://petermblair.com/fbl-n-gram-analyzer/
Tags: anti-spam spam fbl feedback filtering n-grams similarity hashing redis searching
Bookmark: https://pinboard.in/u:jm/b:00bea3b79665/

Dr. Neal Krawetz explains perceptual hashing
2011-06-07T22:42:12+00:00
http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
Tags: algorithm images analysis programming dct hashing perceptual-hash tineye via:hn image
Bookmark: https://pinboard.in/u:jm/b:f0804de861e3/

deeptoad - Project Hosting on Google Code
2010-11-30T23:29:17+00:00
http://code.google.com/p/deeptoad/
Tags: via:nelson deeptoad software open-source fuzzy hashing
Bookmark: https://pinboard.in/u:jm/b:4b09934a1883/

3 Rules of thumb for Bloom Filters
2010-11-09T00:08:21+00:00
http://corte.si/posts/code/bloom-filter-rules-of-thumb/index.html
Tags: via:jzawodny bloom-filters hashing algorithms coding tips false-positives
Bookmark: https://pinboard.in/u:jm/b:a6801bafe8ec/

Stop using unsafe keyed hashes, use HMAC
2009-10-30T22:23:02+00:00
http://rdist.root.org/2009/10/29/stop-using-unsafe-keyed-hashes-use-hmac/
Tags: hmac security crypto hashing md5 hashes sha256 sha1
Bookmark: https://pinboard.in/u:jm/b:e18fe54cec21/
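The NYC taxi entry and the HMAC link above make the same point, and it can be sketched in Python. To keep the example fast, the sketch assumes a hypothetical, deliberately small ID format: one digit, one letter, two digits (e.g. `5X55`), giving 10 * 26 * 10 * 10 = 26,000 candidates. The real medallion space is larger but, as the quoted post shows, still cheap to enumerate. An unkeyed MD5 is reversed simply by hashing every candidate. An HMAC-SHA256 pseudonym cannot be tested against candidates without the secret key. It uses Python's standard `hmac` module rather than an ad-hoc hash(key + message) construction.

```python
import hashlib
import hmac
import itertools
import secrets
import string
from typing import Iterator, Optional


def candidate_ids() -> Iterator[str]:
    """Every ID in a hypothetical small format: digit, letter, digit, digit (e.g. '5X55')."""
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"


def naive_anonymise(medallion: str) -> str:
    """Unkeyed MD5 'anonymisation', the approach used for the NYC data dump."""
    return hashlib.md5(medallion.encode()).hexdigest()


def reverse_naive(digest: str) -> Optional[str]:
    """Recover the original ID by hashing every candidate and looking the digest up."""
    table = {naive_anonymise(c): c for c in candidate_ids()}
    return table.get(digest)


def keyed_pseudonym(medallion: str, key: bytes) -> str:
    """HMAC-SHA256 pseudonym: candidates cannot be tested without knowing `key`."""
    return hmac.new(key, medallion.encode(), hashlib.sha256).hexdigest()


secret_key = secrets.token_bytes(32)  # never published alongside the data
print(reverse_naive(naive_anonymise("5X55")))  # 5X55
print(keyed_pseudonym("5X55", secret_key))     # varies with the random key
```

This only works while `secret_key` stays out of the published data. If the key leaks, the enumeration attack applies again. Assigning opaque random IDs, the other fix suggested in the NYC entry, avoids even that risk.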