Pinboard (jm)
https://pinboard.in/u:jm/public/
recent bookmarks from jm

A Branchless UTF-8 Decoder
2017-10-09T10:27:09+00:00
http://nullprogram.com/blog/2017/10/06/
jm
This week I took a crack at writing a branchless UTF-8 decoder: a function that decodes a single UTF-8 code point from a byte stream without any if statements, loops, short-circuit operators, or other sorts of conditional jumps. [...] Why branchless? Because high performance CPUs are pipelined. That is, a single instruction is executed over a series of stages, and many instructions are executed in overlapping time intervals, each at a different stage.
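As a rough illustration of the idea in the quote, one common way to avoid conditionals is to replace them with table lookups indexed by the lead byte. The sketch below is an assumption-laden Python rendering of that approach, not the code from the linked post: the names `LENGTHS`, `MASKS`, `SHIFTS`, and `decode` are made up for this example, Python gives no control over actual CPU branches, and all validation (continuation bytes, overlong forms, surrogates) is left out. It also assumes `pos` points inside `buf`.

```python
# Sequence length indexed by the top five bits of the lead byte.
# 0 marks bytes that cannot start a sequence (continuation bytes, 0xF8 and up).
LENGTHS = [1] * 16 + [0] * 8 + [2] * 4 + [3] * 2 + [4, 0]

# Payload mask for the lead byte, indexed by sequence length.
MASKS = [0x00, 0x7F, 0x1F, 0x0F, 0x07]

# Right shift that discards the unused trailing bytes, indexed by length.
SHIFTS = [0, 18, 12, 6, 0]


def decode(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one code point at buf[pos]; return (code_point, length).

    A length of 0 means the lead byte was invalid, in which case the
    returned code point is meaningless. No other validation is done.
    """
    # Always read four bytes, padding with zeros, so no length test is needed.
    b0, b1, b2, b3 = (buf[pos:pos + 4] + b"\x00\x00\x00")[:4]
    length = LENGTHS[b0 >> 3]
    # Assemble as if this were a four-byte sequence...
    cp = (b0 & MASKS[length]) << 18
    cp |= (b1 & 0x3F) << 12
    cp |= (b2 & 0x3F) << 6
    cp |= b3 & 0x3F
    # ...then shift away the bytes that were not part of the sequence.
    cp >>= SHIFTS[length]
    return cp, length
```

The function body contains no `if`, loop, or short-circuit operator; every decision is an index into a small table, which is the property the quote is after.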
Neat hack (via Tony Finch)
algorithms optimization unicode utf8 branchless coding c via:fanf
https://pinboard.in/u:jm/b:e6a102583433/

A dive into a UTF-8 validation regexp
2014-06-18T09:22:13+00:00
https://blog.jcoglan.com/2014/06/17/utf-8-its-what-strings-are-made-of/
jm
Once again, I find myself checking over the UTF-8 validation code in websocket-driver, and once again I find I cannot ever remember how to make sense of this regex that performs the validation. I just copied it off a webpage once and it took a while (and reimplementing UTF-8 myself) to fully understand what it does. If you write software that processes text, you’ll probably need to understand this too.
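For readers who have not seen such a pattern, here is a sketch of a byte-range regex for UTF-8 validation, written in Python rather than the JavaScript the post discusses. It follows the well-formed byte sequences defined for UTF-8 and is not necessarily the exact regex the linked post dissects; `UTF8_RE` and `is_valid_utf8` are names invented for this example.

```python
import re

# Each alternative matches one well-formed UTF-8 sequence.
UTF8_RE = re.compile(rb"""
    (?:
        [\x00-\x7F]                        # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # two bytes, excluding overlong C0/C1
      | \xE0[\xA0-\xBF][\x80-\xBF]         # three bytes, excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # three bytes, general case
      | \xED[\x80-\x9F][\x80-\xBF]         # three bytes, excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # four bytes, excluding overlongs
      | [\xF1-\xF3][\x80-\xBF]{3}          # four bytes, general case
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # four bytes, up to U+10FFFF
    )*
""", re.VERBOSE)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data consists only of well-formed UTF-8 sequences."""
    return UTF8_RE.fullmatch(data) is not None
```

The special cases for lead bytes `E0`, `ED`, `F0`, and `F4` are what make the pattern hard to remember: they narrow the range of the second byte to rule out overlong encodings, surrogates, and code points above U+10FFFF.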
utf-8 unicode utf8 javascript node encoding text strings validation websockets regular-expressions regexps
https://pinboard.in/u:jm/b:a7465566d88d/

VInt
2009-11-04T23:16:01+00:00
http://lucene.apache.org/java/2_4_0/fileformats.html#VInt
jm
utf8 compression utf lucene avro hadoop java fomats numeric
https://pinboard.in/u:jm/b:886f6e667a47/