<?xml version="1.0" encoding="UTF-8"?>
 <rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://web.resource.org/cc/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://pinboard.in">
    <title>Pinboard (jm)</title>
    <link>https://pinboard.in/u:jm/public/</link>
    <description>recent bookmarks from jm</description>
    <items>
      <rdf:Seq>	<rdf:li rdf:resource="https://www.polarsignals.com/blog/posts/2025/11/25/interface-parquet-vortex"/>
	<rdf:li rdf:resource="https://github.com/simdjson/simdjson-java"/>
	<rdf:li rdf:resource="https://dotat.at/@/2022-06-27-tolower-swar.html"/>
	<rdf:li rdf:resource="https://opensource.googleblog.com/2022/06/Vectorized%20and%20performance%20portable%20Quicksort.html"/>
	<rdf:li rdf:resource="https://github.com/netty/netty/pull/10737/files"/>
	<rdf:li rdf:resource="https://github.com/WojciechMula/toys/tree/master/swar-utf8-length"/>
	<rdf:li rdf:resource="https://www.morling.dev/blog/fizzbuzz-simd-style/"/>
	<rdf:li rdf:resource="https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/"/>
	<rdf:li rdf:resource="https://github.com/lemire/simdjson/blob/master/README.md"/>
	<rdf:li rdf:resource="https://blake2.net/blake2.pdf"/>
	<rdf:li rdf:resource="http://www.strchr.com/strcmp_and_strlen_using_sse_4.2"/>
	<rdf:li rdf:resource="http://www.vldb.org/pvldb/2/vldb09-257.pdf"/>
      </rdf:Seq>
    </items>
  </channel><item rdf:about="https://www.polarsignals.com/blog/posts/2025/11/25/interface-parquet-vortex">
    <title>Questioning an Interface: From Parquet to Vortex</title>
    <dc:date>2025-11-27T11:22:53+00:00</dc:date>
    <link>https://www.polarsignals.com/blog/posts/2025/11/25/interface-parquet-vortex</link>
    <dc:creator>jm</dc:creator><description><![CDATA[Interesting -- a new, GPU-optimised storage format:

<blockquote>Like Parquet, Vortex minimizes bytes on disk. However, Vortex is also designed with a core use-case in mind: decoding and querying data directly from object storage on GPUs. This key idea translates very well to our use-case even though we don’t run our queries on GPUs (yet?). Specifically, the file format is designed to maximize throughput and parallelism from the metadata format to the SIMD/SIMT friendly encodings used.

Crucially, it also acknowledges that part of making queries fast is not only good filter pushdown, but also general-purpose compute pushdown. If anything cannot be pushed down, Vortex’s encodings can be tuned to offer zero-copy conversion to Arrow for further query execution using any general-purpose query execution engine.

Vortex also learns from Parquet’s limitations around extensibility and aims to be as future-proof as possible. New encodings can ship with WASM decoders so encoding adoption is not limited by reader libraries having to implement support. The main Rust library is also designed to be fully extensible, so you can write your own layouts/encodings and plug them in as first-class citizens.

Given how well Vortex’s design matched our needs, we tried it out and got a 70% average performance improvement on all our queries. With the newer encodings that Vortex offers, we got 10% better uncompressed storage size and only 3% larger compressed storage size compared to snappy-compressed Parquet.</blockquote>

]]></description>
<dc:subject>gpu vortex parquet compression storage file-formats files pushdown simd</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:45f10d084d4b/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:gpu"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:vortex"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:parquet"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:compression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:storage"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:file-formats"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:files"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:pushdown"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://github.com/simdjson/simdjson-java">
    <title>simdjson/simdjson-java</title>
    <dc:date>2023-10-09T08:06:01+00:00</dc:date>
    <link>https://github.com/simdjson/simdjson-java</link>
    <dc:creator>jm</dc:creator><description><![CDATA["A Java version of simdjson" -- JSON parsing in Java, using SIMD instructions to parse gigabytes of JSON per second.  Early days: it requires Java 20 and only covers a small number of architectures, but it's getting there.]]></description>
<dc:subject>simd java json parsing formats performance libraries</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:587e2c3aab0d/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:java"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:json"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:parsing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:formats"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:performance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:libraries"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://dotat.at/@/2022-06-27-tolower-swar.html">
    <title>tolower() in bulk at speed</title>
    <dc:date>2022-06-28T15:29:01+00:00</dc:date>
    <link>https://dotat.at/@/2022-06-27-tolower-swar.html</link>
    <dc:creator>jm</dc:creator><description><![CDATA[tolower() using SWAR (SIMD within a register) techniques -- nice hacks from Tony Finch

]]></description>
<dc:subject>c optimization performance hacks tolower swar simd</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:86b30b730c54/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:c"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:performance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hacks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:tolower"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:swar"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://opensource.googleblog.com/2022/06/Vectorized%20and%20performance%20portable%20Quicksort.html">
    <title>Vectorized and performance-portable Quicksort</title>
    <dc:date>2022-06-04T16:36:28+00:00</dc:date>
    <link>https://opensource.googleblog.com/2022/06/Vectorized%20and%20performance%20portable%20Quicksort.html</link>
    <dc:creator>jm</dc:creator><description><![CDATA[This is a super-cool building block from Google Open Source:

"We've created the first vectorized Quicksort:

- Sorts arrays of numbers ~10x as fast as C++ std::sort

- Outperforms state-of-the-art specific algorithms

- Is portable across all modern CPU architectures

We are interested to see what new applications and capabilities will be unlocked by being able to sort at 1 GB/s on a single CPU core."

Part of their Highway library of vectorized code, https://github.com/google/highway , "a C++ library that provides portable SIMD/vector intrinsics."  Low-level, hyperoptimized libs like this will be very important to ameliorate the climate change impact of datacenter usage, so it's a great idea.]]></description>
<dc:subject>algorithms sorting quicksort vectorization simd avx512 avx2</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:e1fc38aa3ff5/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:sorting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:quicksort"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:vectorization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:avx512"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:avx2"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://github.com/netty/netty/pull/10737/files">
    <title>SWAR indexOf byte search</title>
    <dc:date>2022-01-24T10:08:22+00:00</dc:date>
    <link>https://github.com/netty/netty/pull/10737/files</link>
    <dc:creator>jm</dc:creator><description><![CDATA[SIMD-Within-A-Register implementation of ByteBuf.indexOf() in Java, used in Netty. A nice performance optimization technique.]]></description>
<dc:subject>simd swar indexof bytebuffer java optimization performance search netty hacks</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:b1bb44f3ef08/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:swar"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:indexof"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:bytebuffer"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:java"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:performance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:search"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:netty"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hacks"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://github.com/WojciechMula/toys/tree/master/swar-utf8-length">
    <title>SWAR algorithm to count characters in a UTF-8 string</title>
    <dc:date>2021-11-29T10:18:31+00:00</dc:date>
    <link>https://github.com/WojciechMula/toys/tree/master/swar-utf8-length</link>
    <dc:creator>jm</dc:creator><description><![CDATA[I'm enjoying this world of SIMD hyperoptimization -- "SWAR" in this case refers to "SIMD within a register" -- performing SIMD parallel operations on data contained in a single processor register.]]></description>
<dc:subject>simd swar hacks performance optimization coding utf-8</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:f56d6e491cf3/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:swar"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hacks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:performance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:coding"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:utf-8"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://www.morling.dev/blog/fizzbuzz-simd-style/">
    <title>SIMD is coming to the JVM</title>
    <dc:date>2021-09-13T11:31:58+00:00</dc:date>
    <link>https://www.morling.dev/blog/fizzbuzz-simd-style/</link>
    <dc:creator>jm</dc:creator><description><![CDATA[As of Java 16 (incoming), there's a new set of classes (IntVector, LongVector, et al.) that expose CPU-level vectorization instructions on x64 and AArch64 architectures.]]></description>
<dc:subject>simd performance java vectorization aarch64 x64</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:7e62e6169542/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:performance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:java"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:vectorization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:aarch64"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:x64"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/">
    <title>Paper: Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs</title>
    <dc:date>2019-03-01T13:43:00+00:00</dc:date>
    <link>https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/</link>
    <dc:creator>jm</dc:creator><description><![CDATA[<blockquote>a software based, large-scale regex matcher designed to match multiple patterns at once (up to tens of thousands of patterns at once) and to ‘stream‘ (that is, match patterns across many different ‘stream writes’ without holding on to all the data you’ve ever seen). To my knowledge this makes it unique.

RE2 is software based but doesn’t scale to large numbers of patterns; nor does it stream (although it could). It occupies a fundamentally different niche to Hyperscan; we compared the performance of RE2::Set (the RE2 multiple pattern interface) to Hyperscan a while back.

Most back-tracking matchers (such as libpcre) are one pattern at a time and are inherently incapable of streaming, due to their requirement to backtrack into arbitrary amounts of old input.</blockquote>

]]></description>
<dc:subject>regex regular-expressions algorithms hyperscan sensory-networks regexps simd nfa</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:d79bceb1eeef/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:regex"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:regular-expressions"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hyperscan"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:sensory-networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:regexps"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:nfa"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://github.com/lemire/simdjson/blob/master/README.md">
    <title>simdjson</title>
    <dc:date>2019-02-22T21:34:53+00:00</dc:date>
    <link>https://github.com/lemire/simdjson/blob/master/README.md</link>
    <dc:creator>jm</dc:creator><description><![CDATA[Daniel Lemire's latest cool hack -- a SIMD library to parse gigabytes of JSON documents per second]]></description>
<dc:subject>fast json parsing speed simd avx c++ algorithms hacks daniel-lemire</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:18aa6824082c/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:fast"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:json"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:parsing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:speed"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:avx"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:c++"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hacks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:daniel-lemire"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://blake2.net/blake2.pdf">
    <title>BLAKE2: simpler, smaller, fast as MD5</title>
    <dc:date>2016-04-07T22:51:11+00:00</dc:date>
    <link>https://blake2.net/blake2.pdf</link>
    <dc:creator>jm</dc:creator><description><![CDATA['We present the cryptographic hash function BLAKE2, an improved version
of the SHA-3 finalist BLAKE optimized for speed in software. Target applications include
cloud storage, intrusion detection, or version control systems. BLAKE2 comes
in two main flavors: BLAKE2b is optimized for 64-bit platforms, and BLAKE2s for
smaller architectures. On 64-bit platforms, BLAKE2 is often faster than MD5, yet provides
security similar to that of SHA-3. We specify parallel versions BLAKE2bp and
BLAKE2sp that are up to 4 and 8 times faster, by taking advantage of SIMD and/or
multiple cores. BLAKE2 has more benefits than just speed: BLAKE2 uses up to 32%
less RAM than BLAKE, and comes with a comprehensive tree-hashing mode as well
as an efficient MAC mode.']]></description>
<dc:subject>crypto hash blake2 hashing blake algorithms sha1 sha3 simd performance mac</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:33cb0a51f577/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:crypto"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hash"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:blake2"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hashing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:blake"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:sha1"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:sha3"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:performance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:mac"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.strchr.com/strcmp_and_strlen_using_sse_4.2">
    <title>Implementing strcmp, strlen, and strstr using SSE 4.2 instructions - strchr.com</title>
    <dc:date>2013-01-27T22:46:47+00:00</dc:date>
    <link>http://www.strchr.com/strcmp_and_strlen_using_sse_4.2</link>
    <dc:creator>jm</dc:creator><description><![CDATA[Using new Intel Core i7 instructions to speed up string manipulation. Fascinating stuff. SSE ftw]]></description>
<dc:subject>sse optimization simd assembly intel i7 intel-core strstr strings string-matching strchr strlen coding</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:dc8ab7793636/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:sse"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:assembly"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:intel"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:i7"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:intel-core"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strstr"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:string-matching"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strchr"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strlen"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:coding"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.vldb.org/pvldb/2/vldb09-257.pdf">
    <title>Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs [PDF]</title>
    <dc:date>2010-06-29T10:25:07+00:00</dc:date>
    <link>http://www.vldb.org/pvldb/2/vldb09-257.pdf</link>
    <dc:creator>jm</dc:creator><description><![CDATA[sort-and-merge is likely to be faster on future SIMD-capable multicore CPUs RSN]]></description>
<dc:subject>sort merge hash join databases performance cpu simd multicore</dc:subject>
<dc:identifier>https://pinboard.in/u:jm/b:e3ed61671f24/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:sort"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:merge"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hash"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:join"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:databases"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:performance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:cpu"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:multicore"/>
</rdf:Bag></taxo:topics>
</item>
</rdf:RDF>