<?xml version="1.0" encoding="UTF-8"?>
 <rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://web.resource.org/cc/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://pinboard.in">
    <title>Pinboard (jm)</title>
    <link>https://pinboard.in/u:jm/public/</link>
    <description>recent bookmarks from jm</description>
    <items>
      <rdf:Seq>	<rdf:li rdf:resource="https://shipilev.net/jvm-anatomy-park/10-string-intern/"/>
	<rdf:li rdf:resource="https://news.ycombinator.com/item?id=11535701"/>
	<rdf:li rdf:resource="http://left-pad.io/"/>
	<rdf:li rdf:resource="https://github.com/01org/hyperscan"/>
	<rdf:li rdf:resource="https://github.com/minimaxir/big-list-of-naughty-strings"/>
	<rdf:li rdf:resource="http://lcamtuf.blogspot.ie/2014/10/psa-dont-run-strings-on-untrusted-files.html"/>
	<rdf:li rdf:resource="https://github.com/cloudflare/lua-aho-corasick"/>
	<rdf:li rdf:resource="https://blog.jcoglan.com/2014/06/17/utf-8-its-what-strings-are-made-of/"/>
	<rdf:li rdf:resource="http://blog.phusion.nl/2010/12/06/efficient-substring-searching/"/>
	<rdf:li rdf:resource="http://www.mlsec.org/harry/"/>
	<rdf:li rdf:resource="http://courses.csail.mit.edu/6.851/spring12/lectures/"/>
	<rdf:li rdf:resource="http://isabel-drost.de/hadoop/slides/simon_lucene_2011.pdf"/>
	<rdf:li rdf:resource="http://www.strchr.com/strcmp_and_strlen_using_sse_4.2"/>
	<rdf:li rdf:resource="http://arxiv.org/pdf/1209.6449.pdf"/>
	<rdf:li rdf:resource="http://news.ycombinator.com/item?id=2208760"/>
      </rdf:Seq>
    </items>
  </channel><item rdf:about="https://shipilev.net/jvm-anatomy-park/10-string-intern/">
    <title>don't use String.intern() in Java</title>
    <dc:date>2017-05-15T10:23:27+00:00</dc:date>
    <link>https://shipilev.net/jvm-anatomy-park/10-string-intern/</link>
    <dc:creator>jm</dc:creator><description><![CDATA[<blockquote>String.intern is the gateway to native JVM String table, and it comes with caveats: throughput, memory footprint, pause time problems will await the users. Hand-rolled deduplicators/interners to reduce memory footprint are working much more reliably, because they are working on Java side, and also can be thrown away when done. GC-assisted String deduplication does alleviate things even more. In almost every project we were taking care of, removing String.intern from the hotpaths was the very profitable performance optimization. Do not use it without thinking, okay?</blockquote>

]]></description>
<dc:subject>strings interning java performance tips</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:936351988ed0/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:interning"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:java"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:performance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:tips"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://news.ycombinator.com/item?id=11535701">
    <title>Building a Regex Search Engine for DNA | Hacker News</title>
    <dc:date>2016-04-21T13:03:58+00:00</dc:date>
    <link>https://news.ycombinator.com/item?id=11535701</link>
    <dc:creator>jm</dc:creator><description><![CDATA[The original post is pretty mediocre -- a search engine which handles a corpus of "thousands" of plasmids from "a scientist's personal library", and which doesn't handle fuzzy matches? I think that's called grep -- but the HN comments are good]]></description>
<dc:subject>grep regular-expressions hacker-news strings dna genomics search elasticsearch</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:c1dfd325084f/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:grep"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:regular-expressions"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hacker-news"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:dna"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:genomics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:search"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:elasticsearch"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://left-pad.io/">
    <title>left-pad.io</title>
    <dc:date>2016-03-24T12:07:15+00:00</dc:date>
    <link>http://left-pad.io/</link>
    <dc:creator>jm</dc:creator><description><![CDATA[<blockquote>A microservice saviour appears!
In order to prevent such a terrible tragedy from occurring ever again during
our lifetimes, `left-pad.io` has been created to provide all the functionality
of `left-pad` AND the overhead of a TLS handshake and an HTTP request.
Less code is better code, leave the heavy lifting to `left-pad.io`, The String
Experts™.</blockquote>

]]></description>
<dc:subject>humor javascript jokes npm packages left-pad strings microservices http</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:63260b4afafc/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:humor"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:javascript"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:jokes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:npm"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:packages"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:left-pad"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:microservices"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:http"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://github.com/01org/hyperscan">
    <title>Hyperscan</title>
    <dc:date>2015-10-21T14:33:51+00:00</dc:date>
    <link>https://github.com/01org/hyperscan</link>
    <dc:creator>jm</dc:creator><description><![CDATA[<blockquote>a high-performance multiple regex matching library. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.</blockquote>

Via Tony Finch]]></description>
<dc:subject>via:fanf regexps regex dpi hyperscan dfa nfa hybrid-automata text-matching matching text strings streams</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:f41962a90f1b/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:via:fanf"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:regexps"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:regex"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:dpi"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hyperscan"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:dfa"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:nfa"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hybrid-automata"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:text-matching"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:matching"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:text"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:streams"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://github.com/minimaxir/big-list-of-naughty-strings">
    <title>minimaxir/big-list-of-naughty-strings</title>
    <dc:date>2015-08-16T21:23:59+00:00</dc:date>
    <link>https://github.com/minimaxir/big-list-of-naughty-strings</link>
    <dc:creator>jm</dc:creator><description><![CDATA[Late to this one -- a nice list of bad input (Unicode zero-width spaces, etc.) for testing]]></description>
<dc:subject>testing strings text data unicode utf-8 tests input corrupt</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:c50633506c43/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:testing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:text"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:data"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:unicode"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:utf-8"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:tests"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:input"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:corrupt"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://lcamtuf.blogspot.ie/2014/10/psa-dont-run-strings-on-untrusted-files.html">
    <title>PSA: don't run 'strings' on untrusted files (CVE-2014-8485)</title>
    <dc:date>2014-10-27T22:15:46+00:00</dc:date>
    <link>http://lcamtuf.blogspot.ie/2014/10/psa-dont-run-strings-on-untrusted-files.html</link>
    <dc:creator>jm</dc:creator><description><![CDATA[ffs.<blockquote>Perhaps simply by the virtue of being a part of that bundle, the strings utility tries to leverage the common libbfd infrastructure to detect supported executable formats and "optimize" the process by extracting text only from specific sections of the file. Unfortunately, the underlying library can be hardly described as safe: a quick pass with afl (and probably with any other competent fuzzer) quickly reveals a range of troubling and likely exploitable out-of-bounds crashes due to very limited range checking</blockquote>

]]></description>
<dc:subject>strings libbfd gnu security fuzzing buffer-overflows</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:f30d5b5352c5/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:libbfd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:gnu"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:security"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:fuzzing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:buffer-overflows"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://github.com/cloudflare/lua-aho-corasick">
    <title>cloudflare/lua-aho-corasick</title>
    <dc:date>2014-08-29T21:55:32+00:00</dc:date>
    <link>https://github.com/cloudflare/lua-aho-corasick</link>
    <dc:creator>jm</dc:creator><description><![CDATA[A nice Lua/C++ implementation of Aho-Corasick for fast string matching against multiple patterns (via JGC).  This uses an interesting technique to get better performance by compacting the data structure into a single buffer, to avoid following pointers all over RAM and busting the cache.]]></description>
<dc:subject>optimization speed performance aho-corasick tries string-matching strings algorithms lua c++ via:jgc</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:35ea65a2e9a1/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:speed"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:performance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:aho-corasick"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:tries"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:string-matching"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:lua"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:c++"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:via:jgc"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://blog.jcoglan.com/2014/06/17/utf-8-its-what-strings-are-made-of/">
    <title>A dive into a UTF-8 validation regexp</title>
    <dc:date>2014-06-18T09:22:13+00:00</dc:date>
    <link>https://blog.jcoglan.com/2014/06/17/utf-8-its-what-strings-are-made-of/</link>
    <dc:creator>jm</dc:creator><description><![CDATA[<blockquote>Once again, I find myself checking over the UTF-8 validation code in websocket-driver, and once again I find I cannot ever remember how to make sense of this regex that performs the validation. I just copied it off a webpage once and it took a while (and reimplementing UTF-8 myself) to fully understand what it does. If you write software that processes text, you’ll probably need to understand this too.</blockquote>

]]></description>
<dc:subject>utf-8 unicode utf8 javascript node encoding text strings validation websockets regular-expressions regexps</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:a7465566d88d/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:utf-8"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:unicode"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:utf8"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:javascript"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:node"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:encoding"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:text"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:validation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:websockets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:regular-expressions"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:regexps"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://blog.phusion.nl/2010/12/06/efficient-substring-searching/">
    <title>Efficient substring searching</title>
    <dc:date>2014-03-31T13:44:45+00:00</dc:date>
    <link>http://blog.phusion.nl/2010/12/06/efficient-substring-searching/</link>
    <dc:creator>jm</dc:creator><description><![CDATA[This is a couple of years old, but I like it:

<blockquote>Turbo Boyer-Moore is disappointing, its name doesn’t do it justice. In academia constant overhead doesn’t matter, but here we see that it matters a lot in practice. Turbo Boyer-Moore’s inner loop is so complex that we think we’re better off using the original Boyer-Moore.</blockquote>

A good demo of how an O(n) algorithm with large constant factors can be slower in practice than an O(mn) one with small constants.]]></description>
<dc:subject>algorithms search strings coding big-o string-search searching</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:cad2a9fdecec/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:search"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:coding"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:big-o"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:string-search"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:searching"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.mlsec.org/harry/">
    <title>Harry - A Tool for Measuring String Similarity</title>
    <dc:date>2014-01-20T15:43:05+00:00</dc:date>
    <link>http://www.mlsec.org/harry/</link>
    <dc:creator>jm</dc:creator><description><![CDATA[<blockquote>a small tool for comparing strings and measuring their similarity. The tool supports several common distance and kernel functions for strings as well as some exotic similarity measures. The focus of Harry lies on implicit similarity measures, that is, comparison functions that do not give rise to an explicit vector space. Examples of such similarity measures are the Levenshtein distance and the Jaro-Winkler distance.
For comparison Harry loads a set of strings from input, computes the specified similarity measure and writes a matrix of similarity values to output. The similarity measure can be computed based on the granularity of characters as well as words contained in the strings. The configuration of this process, such as the input format, the similarity measure and the output format, are specified in a configuration file and can be additionally refined using command-line options.
Harry is implemented using OpenMP, such that the computation time for a set of strings scales linear with the number of available CPU cores. Moreover, efficient implementations of several similarity measures, effective caching of similarity values and low-overhead locking further speedup the computation.</blockquote>

via kragen.]]></description>
<dc:subject>via:kragen strings similarity levenshtein-distance algorithms openmp jaro-winkler edit-distance cli commandline hamming-distance compression</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:7f75587e4bd7/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:via:kragen"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:similarity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:levenshtein-distance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:openmp"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:jaro-winkler"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:edit-distance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:cli"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:commandline"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hamming-distance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:compression"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://courses.csail.mit.edu/6.851/spring12/lectures/">
    <title>Lectures in Advanced Data Structures (6.851)</title>
    <dc:date>2013-04-29T10:32:24+00:00</dc:date>
    <link>http://courses.csail.mit.edu/6.851/spring12/lectures/</link>
    <dc:creator>jm</dc:creator><description><![CDATA[Good lecture notes on the current state of the art in data structure research.

<blockquote>Data structures play a central role in modern computer science. You interact with data structures even more often than with algorithms (think Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This course covers major results and current directions of research in data structures:

TIME TRAVEL We can remember the past efficiently (a technique called persistence), but in general it's difficult to change the past and see the outcomes on the present (retroactivity). So alas, Back To The Future isn't really possible.
GEOMETRY When data has more than one dimension (e.g. maps, database tables).
DYNAMIC OPTIMALITY Is there one binary search tree that's as good as all others? We still don't know, but we're close.
MEMORY HIERARCHY Real computers have multiple levels of caches. We can optimize the number of cache misses, often without even knowing the size of the cache.
HASHING Hashing is the most used data structure in computer science. And it's still an active area of research.
INTEGERS Logarithmic time is too easy. By careful analysis of the information you're dealing with, you can often reduce the operation times substantially, sometimes even to constant. We will also cover lower bounds that illustrate when this is not possible.
DYNAMIC GRAPHS A network link went down, or you just added or deleted a friend in a social network. We can still maintain essential information about the connectivity as it changes.
STRINGS Searching for phrases in giant text (think Google or DNA).
SUCCINCT Most “linear size” data structures you know are much larger than they need to be, often by an order of magnitude. Some data structures require almost no space beyond the raw data but are still fast (think heaps, but much cooler).
</blockquote>

(via Tim Freeman)]]></description>
<dc:subject>data-structures lectures mit video data algorithms coding csail strings integers hashing sorting bst memory</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:5c72d87f4ea4/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:data-structures"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:lectures"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:mit"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:video"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:data"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:coding"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:csail"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:integers"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:hashing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:sorting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:bst"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:memory"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://isabel-drost.de/hadoop/slides/simon_lucene_2011.pdf">
    <title>Lucene 4 - Revisiting Problems For Speed [slides]</title>
    <dc:date>2013-04-23T20:33:04+00:00</dc:date>
    <link>http://isabel-drost.de/hadoop/slides/simon_lucene_2011.pdf</link>
    <dc:creator>jm</dc:creator><description><![CDATA[A presentation from Simon Willnauer on optimization work performed on Lucene in 2011.  The most interesting stuff here is the work done to replace an O(n^2) FuzzyQuery fuzzy-match algorithm with an FSM trie, which is extremely cool -- benchmarked at 214 times faster!]]></description>
<dc:subject>benchmarks slides lucene search fuzzy-matching text-matching strings algorithms coding fsm tries</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:90ddcc0d1fee/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:benchmarks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:slides"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:lucene"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:search"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:fuzzy-matching"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:text-matching"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:coding"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:fsm"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:tries"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.strchr.com/strcmp_and_strlen_using_sse_4.2">
    <title>Implementing strcmp, strlen, and strstr using SSE 4.2 instructions - strchr.com</title>
    <dc:date>2013-01-27T22:46:47+00:00</dc:date>
    <link>http://www.strchr.com/strcmp_and_strlen_using_sse_4.2</link>
    <dc:creator>jm</dc:creator><description><![CDATA[Using new Intel Core i7 instructions to speed up string manipulation. Fascinating stuff. SSE ftw]]></description>
<dc:subject>sse optimization simd assembly intel i7 intel-core strstr strings string-matching strchr strlen coding</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:dc8ab7793636/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:sse"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:simd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:assembly"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:intel"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:i7"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:intel-core"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strstr"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:string-matching"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strchr"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strlen"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:coding"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://arxiv.org/pdf/1209.6449.pdf">
    <title>Fast Packed String Matching for Short Patterns [paper, PDF]</title>
    <dc:date>2013-01-18T11:21:08+00:00</dc:date>
    <link>http://arxiv.org/pdf/1209.6449.pdf</link>
    <dc:creator>jm</dc:creator><description><![CDATA['Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like NLP, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. [...] In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design very fast string matching algorithms in the case of short patterns.'  Reminds me of http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm , but taking advantage of SIMD extensions, which should make things nice and speedy, at the cost of tying it to specific hardware platforms.  (via Tony Finch)]]></description>
<dc:subject>rabin-karp algorithms strings string-matching papers via:fanf</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:jm/b:1a932a1f5d7e/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:rabin-karp"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:string-matching"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:papers"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:via:fanf"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://news.ycombinator.com/item?id=2208760">
    <title>Hacker News thread on a new string search algorithm</title>
    <dc:date>2011-02-13T23:50:12+00:00</dc:date>
    <link>http://news.ycombinator.com/item?id=2208760</link>
    <dc:creator>jm</dc:creator><description><![CDATA[Great comments -- the Burrows-Wheeler Transform is crazy stuff]]></description>
<dc:subject>strings search algorithms burrows-wheeler-transform sequencing genome compression dna string-matching</dc:subject>
<dc:identifier>https://pinboard.in/u:jm/b:26291a82aeb8/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:search"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:burrows-wheeler-transform"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:sequencing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:genome"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:compression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:dna"/>
	<rdf:li rdf:resource="https://pinboard.in/u:jm/t:string-matching"/>
</rdf:Bag></taxo:topics>
</item>
</rdf:RDF>