Pinboard (jm)

Recent bookmarks from jm: https://pinboard.in/u:jm/public/

Questioning an Interface: From Parquet to Vortex
2025-11-27T11:22:53+00:00
https://www.polarsignals.com/blog/posts/2025/11/25/interface-parquet-vortex
Like Parquet, Vortex minimizes bytes on disk. However, Vortex is also designed with a core use-case in mind: decoding and querying data directly from object storage on GPUs. This key idea translates very well to our use-case even though we don’t run our queries on GPUs (yet?). Specifically, the file format is designed to maximize throughput and parallelism from the metadata format to the SIMD/SIMT friendly encodings used. Crucially, it also acknowledges that part of making queries fast is not only good filter pushdown, but also general-purpose compute pushdown. If anything cannot be pushed down, Vortex’s encodings can be tuned to offer zero-copy conversion to Arrow for further query execution using any general-purpose query execution engine.
Vortex also learns from Parquet’s limitations around extensibility and aims to be as future-proof as possible. New encodings can ship with WASM decoders so encoding adoption is not limited by reader libraries having to implement support. The main Rust library is also designed to be fully extensible, so you can write your own layouts/encodings and plug them in as first-class citizens.
Given how well Vortex’s design matched our needs, we tried it out and got a 70% average performance improvement on all our queries. With the newer encodings that Vortex offers, we got 10% better uncompressed storage size and only 3% larger compressed storage size compared to snappy-compressed Parquet.
Tags: gpu vortex parquet compression storage file-formats files pushdown simd
Bookmark: https://pinboard.in/u:jm/b:45f10d084d4b/

simdjson/simdjson-java
2023-10-09T08:06:01+00:00
https://github.com/simdjson/simdjson-java
Tags: simd java json parsing formats performance libraries
Bookmark: https://pinboard.in/u:jm/b:587e2c3aab0d/

tolower() in bulk at speed
2022-06-28T15:29:01+00:00
https://dotat.at/@/2022-06-27-tolower-swar.html
Tags: c optimization performance hacks tolower swar simd
Bookmark: https://pinboard.in/u:jm/b:86b30b730c54/

Vectorized and performance-portable Quicksort
2022-06-04T16:36:28+00:00
https://opensource.googleblog.com/2022/06/Vectorized%20and%20performance%20portable%20Quicksort.html
Tags: algorithms sorting quicksort vectorization simd avx512 avx2
Bookmark: https://pinboard.in/u:jm/b:e1fc38aa3ff5/

SWAR indexOf byte search
2022-01-24T10:08:22+00:00
https://github.com/netty/netty/pull/10737/files
Tags: simd swar indexof bytebuffer java optimization performance search netty hacks
Bookmark: https://pinboard.in/u:jm/b:b1bb44f3ef08/

SWAR algorithm to count characters in a UTF-8 string
2021-11-29T10:18:31+00:00
https://github.com/WojciechMula/toys/tree/master/swar-utf8-length
Tags: simd swar hacks performance optimization coding utf-8
Bookmark: https://pinboard.in/u:jm/b:f56d6e491cf3/

SIMD is coming to the JVM
2021-09-13T11:31:58+00:00
https://www.morling.dev/blog/fizzbuzz-simd-style/
Tags: simd performance java vectorization aarch64 x64
Bookmark: https://pinboard.in/u:jm/b:7e62e6169542/

Paper: Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs
2019-03-01T13:43:00+00:00
https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/
a software based, large-scale regex matcher designed to match multiple patterns at once (up to tens of thousands of patterns at once) and to ‘stream’ (that is, match patterns across many different ‘stream writes’ without holding on to all the data you’ve ever seen). To my knowledge this makes it unique.
RE2 is software based but doesn’t scale to large numbers of patterns; nor does it stream (although it could). It occupies a fundamentally different niche to Hyperscan; we compared the performance of RE2::Set (the RE2 multiple pattern interface) to Hyperscan a while back. Most back-tracking matchers (such as libpcre) are one pattern at a time and are inherently incapable of streaming, due to their requirement to backtrack into arbitrary amounts of old input.
Tags: regex regular-expressions algorithms hyperscan sensory-networks regexps simd nfa
Bookmark: https://pinboard.in/u:jm/b:d79bceb1eeef/

simdjson
2019-02-22T21:34:53+00:00
https://github.com/lemire/simdjson/blob/master/README.md
Tags: fast json parsing speed simd avx c++ algorithms hacks daniel-lemire
Bookmark: https://pinboard.in/u:jm/b:18aa6824082c/

BLAKE2: simpler, smaller, fast as MD5
2016-04-07T22:51:11+00:00
https://blake2.net/blake2.pdf
Tags: crypto hash blake2 hashing blake algorithms sha1 sha3 simd performance mac
Bookmark: https://pinboard.in/u:jm/b:33cb0a51f577/

Implementing strcmp, strlen, and strstr using SSE 4.2 instructions - strchr.com
2013-01-27T22:46:47+00:00
http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
Tags: sse optimization simd assembly intel i7 intel-core strstr strings string-matching strchr strlen coding
Bookmark: https://pinboard.in/u:jm/b:dc8ab7793636/

Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs [PDF]
2010-06-29T10:25:07+00:00
http://www.vldb.org/pvldb/2/vldb09-257.pdf
Tags: sort merge hash join databases performance cpu simd multicore
Bookmark: https://pinboard.in/u:jm/b:e3ed61671f24/