<?xml version="1.0" encoding="UTF-8"?>
 <rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://web.resource.org/cc/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://pinboard.in">
    <title>Pinboard (Vaguery)</title>
    <link>https://pinboard.in/u:Vaguery/public/</link>
    <description>recent bookmarks from Vaguery</description>
    <items>
      <rdf:Seq>	<rdf:li rdf:resource="https://arxiv.org/abs/2309.16100"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2201.01268"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2112.15563"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1109.5635"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2005.07678"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2203.13545"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2112.12678"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2111.13134"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2210.02292"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2304.14395"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1703.00667"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1908.07846"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2103.02361"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1904.04513"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1705.11130"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2101.10087"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2108.08613"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1909.02512"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2107.06188"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2106.01190"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2009.06080"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2011.04072"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2101.00332"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1904.05459"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2002.06786"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1902.04419"/>
	<rdf:li rdf:resource="https://link.springer.com/article/10.1007/s11009-019-09706-8"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1605.03654"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1809.08490"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1009.4061"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1107.2422"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1812.08101"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1606.02868"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1309.3441"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1808.06313"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1710.10964"/>
	<rdf:li rdf:resource="https://github.com/friemen/diffit"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1709.05725"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1712.08573"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1710.03248"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1710.03395"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1705.01887"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1406.5895"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1108.3615v1"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1701.00948"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1512.03421"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1502.04644"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1406.0670"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1407.5841"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1505.00667"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1311.2318"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1612.05320"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1509.05240"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1611.01479"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1709.05314"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1709.02034"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1604.06707"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1301.5080"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1102.2480"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1410.7669"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1108.3620"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1108.5574"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1212.5106"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1505.00707"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1511.02393"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1702.07015"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1608.03533"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1702.00318"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1701.00928"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1611.05537"/>
      </rdf:Seq>
    </items>
  </channel><item rdf:about="https://arxiv.org/abs/2309.16100">
    <title>[2309.16100] Generating functions of substitutions</title>
    <dc:date>2026-04-20T11:43:03+00:00</dc:date>
    <link>https://arxiv.org/abs/2309.16100</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We prove that a substitution is aperiodic if and only if some of its associated generating functions are transcendental. These generating functions have a recursive structure arising from the substitution which we use to study their roots in the case of the Fibonacci substitution.
]]></description>
<dc:subject>rewriting-systems strings nonlinear-dynamics mathematical-recreations mathematics to-understand consider:feature-discovery consider:inverse-problem</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:90c29aada2bb/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nonlinear-dynamics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:mathematical-recreations"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:mathematics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:feature-discovery"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:inverse-problem"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2201.01268">
    <title>[2201.01268] Geometrical representation of subshifts for primitive substitutions</title>
    <dc:date>2026-01-01T01:34:24+00:00</dc:date>
    <link>https://arxiv.org/abs/2201.01268</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[For any primitive substitution whose Perron eigenvalue is Pisot unit, we construct a domain exchange measurably conjugated to the subshift. And we give a condition for the subshift to be a finite extension of a torus translation. For the particular case of weakly irreducible Pisot substitution, we show that the subshift is either a finite extension of a torus translation, either a power of the subshift is weakly mixing. And we provide an algorithm to compute eigenvalues of the subshift associated to any primitive pseudo-unimodular substitution.
]]></description>
<dc:subject>rewriting-systems strings rather-interesting number-theory group-theory computational-dynamics formal-languages</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:d7dbc0fd23fe/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:number-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:group-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-dynamics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:formal-languages"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2112.15563">
    <title>[2112.15563] Entropy-Variance curves of binary sequences generated by random substitutions of constant length</title>
    <dc:date>2026-01-01T01:30:54+00:00</dc:date>
    <link>https://arxiv.org/abs/2112.15563</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We study some properties of binary sequences generated by random substitutions of constant length. Specifically, assuming the alphabet {0,1}, we consider the following asymmetric substitution rule of length k: 0→⟨0,0,…,0⟩ and 1→⟨Y1,Y2,…,Yk⟩, where Yi is a Bernoulli random variable with parameter p∈[0,1]. We obtain by recurrence the discrete probability distribution of the stochastic variable that counts the number of ones in the sequence formed after a number i of substitutions (iterations). We derive its first two statistical moments, mean and variance, and the entropy of the generated sequences as a function of the substitution length k for any successive iteration i, and characterize the values of p where the maxima of these measures occur. Finally, we obtain the parametric curves entropy-variance for each iteration and substitution length. We find two regimes of dependence between these two variables that, to our knowledge, have not been previously described. Besides, it allows to compare sequences with the same entropy but different variance and vice versa.]]></description>
<dc:subject>nonlinear-dynamics rewriting-systems strings looking-to-see stochastic-systems information-theory rather-interesting to-understand</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:6b78b87504f8/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nonlinear-dynamics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:stochastic-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:information-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1109.5635">
    <title>[1109.5635] Approximating Edit Distance in Near-Linear Time</title>
    <dc:date>2025-12-31T18:17:15+00:00</dc:date>
    <link>https://arxiv.org/abs/1109.5635</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We show how to compute the edit distance between two strings of length n up to a factor of 2^{Õ(sqrt(log n))} in n^(1+o(1)) time. This is the first sub-polynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art n^(1/3+o(1)) approximation. Previously, approximation of 2^{Õ(sqrt(log n))} was known only for embedding edit distance into l_1, and it is not known if that embedding can be computed in less than quadratic time.
]]></description>
<dc:subject>strings edit-distance algorithms computational-complexity rather-interesting to-understand follow-cites</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:7479f24f2b7a/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:edit-distance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:follow-cites"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2005.07678">
    <title>[2005.07678] Edit Distance in Near-Linear Time: it's a Constant Factor</title>
    <dc:date>2025-12-31T17:56:32+00:00</dc:date>
    <link>https://arxiv.org/abs/2005.07678</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We present an algorithm for approximating the edit distance between two strings of length n in time n1+ε up to a constant factor, for any ε>0. Our result completes a research direction set forth in the recent breakthrough paper [Chakraborty-Das-Goldenberg-Koucky-Saks, FOCS'18], which showed the first constant-factor approximation algorithm with a (strongly) sub-quadratic running time. The recent results of [Koucky-Saks, STOC'20] and [Brakensiek-Rubinstein, STOC'20] have shown near-linear time algorithms that obtain an additive approximation, near-linear in n (equivalently, constant-factor approximation when the edit distance value is close to n). In contrast, our algorithm obtains a constant-factor approximation in near-linear time for any input strings.
In contrast to prior algorithms, which are mostly recursing over smaller substrings, our algorithm gradually smoothes out the local contribution to the edit distance over progressively larger substrings. To accomplish this, we iteratively construct a distance oracle data structure for the metric of edit distance on all substrings of input strings, of length niε for i=0,1,…,1/ε. The distance oracle approximates the edit distance over these substrings in a certain average sense, just enough to estimate the overall edit distance.
]]></description>
<dc:subject>strings computational-complexity algorithms rather-interesting approximation to-write-about to-implement consider:Push-code</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:71df08420968/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-implement"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:Push-code"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2203.13545">
    <title>[2203.13545] Automorphism groups of random substitution subshifts</title>
    <dc:date>2025-09-11T20:39:50+00:00</dc:date>
    <link>https://arxiv.org/abs/2203.13545</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We prove that for a suitably nice class of random substitutions, their corresponding subshifts have automorphism groups that contain an infinite simple subgroup and a copy of the automorphism group of a full shift. Hence, they are countable, non-amenable and non-residually finite. To show this, we introduce the concept of shuffles and generalised shuffles for random substitutions, as well as a local version of recognisability for random substitutions that will be of independent interest. Without recognisability, we need a more refined notion of recognisable words in order to understand their automorphisms. We show that the existence of a single recognisable word is often enough to embed the automorphism group of a full shift in the automorphism group of the random substitution subshift.
]]></description>
<dc:subject>strings rewriting-systems automata stochastic-systems rather-interesting dynamical-systems group-theory to-understand to-simulate consider:computation quasicrystals aperiodic-tiling symbolic-dynamics</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:9b2124858de2/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:automata"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:stochastic-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:dynamical-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:group-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-simulate"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:computation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:quasicrystals"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:aperiodic-tiling"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:symbolic-dynamics"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2112.12678">
    <title>[2112.12678] Dynamic Suffix Array with Sub-linear update time and Poly-logarithmic Lookup Time</title>
    <dc:date>2025-08-17T12:19:53+00:00</dc:date>
    <link>https://arxiv.org/abs/2112.12678</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[The Suffix Array SAS[1…n] of an n-length string S is a lexicographically sorted array of the suffixes of S. The suffix array is one of the most well known and widely used data structures in string algorithms. We present a data structure for maintaining a representation of the suffix array of a dynamic string which undergoes symbol substitutions, deletions, and insertions.
For every string manipulation, our data structure can be updated in O(n23) time (ignoring multiplicative polylogarithmic factors) with n being the current length of the string. For an input query i∈[1…n], our data structure reports SAS[i] in O(log5(n)) time.
We also present a faster data structure, with O(n‾√) update time (ignoring multiplicative polylogarithmic factors), for maintaining the Inverted Suffix Array of a dynamic string undergoing symbol substitutions updates. For an input query i∈[1…n], our data structure reports the i'th entry in the inverted suffix array in O(log4(n)) time.
Our data structures can be used to obtain sub-linear dynamic algorithms for several classical string problems for which efficient dynamic solutions were not previously known.]]></description>
<dc:subject>data-structures symbolic-dynamics strings computational-complexity rather-interesting to-understand this-might-be-just-the-thing ReQ</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:b2621692edf2/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:data-structures"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:symbolic-dynamics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:this-might-be-just-the-thing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:ReQ"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2111.13134">
    <title>[2111.13134] Automaticity of uniformly recurrent substitutive sequences</title>
    <dc:date>2025-08-17T12:17:13+00:00</dc:date>
    <link>https://arxiv.org/abs/2111.13134</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We provide a complete characterisation of automaticity of uniformly recurrent substitutive sequences in terms of the incidence matrix of the return substitution of an underlying purely substitutive sequence. This gives an answer to a recent question of Allouche, Dekking and Queffélec in the uniformly recurrent case. We also show that the same criterion characterises automaticity of minimal substitutive systems.
]]></description>
<dc:subject>rewriting-systems strings rules representation rather-interesting to-understand symbolic-dynamics consider:approximation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:9dd60aff888a/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rules"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:representation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:symbolic-dynamics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:approximation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2210.02292">
    <title>[2210.02292] Double-Ended Palindromic Trees: A Linear-Time Data Structure and Its Applications</title>
    <dc:date>2023-09-09T13:40:02+00:00</dc:date>
    <link>https://arxiv.org/abs/2210.02292</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[The palindromic tree (a.k.a. eertree) is a linear-size data structure that provides access to all palindromic substrings of a string. In this paper, we propose a generalized version of eertree, called double-ended eertree, which supports linear-time online double-ended queue operations on the stored string. At the heart of our construction, is a class of substrings, called surfaces, of independent interest. Namely, surfaces are neither prefixes nor suffixes of any other palindromic substrings and characterize the link structure of all palindromic substrings in the eertree. 
As an application, we develop a framework for range queries involving palindromes on strings, including counting distinct palindromic substrings, and finding the longest palindromic substring, shortest unique palindromic substring and shortest absent palindrome of any substring. In particular, offline queries only use linear space. Apart from range queries, we enumerate palindromic rich strings with a given word in linear time on the length of the given word.
]]></description>
<dc:subject>strings palindromes algorithms computational-complexity data-structures rather-interesting to-understand</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:fa87703719d4/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:palindromes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:data-structures"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2304.14395">
    <title>[2304.14395] string2string: A Modern Python Library for String-to-String Algorithms</title>
    <dc:date>2023-05-16T11:01:30+00:00</dc:date>
    <link>https://arxiv.org/abs/2304.14395</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We introduce string2string, an open-source library that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. It includes traditional algorithmic solutions as well as recent advanced neural approaches to tackle various problems in string alignment, distance measurement, lexical and semantic search, and similarity analysis -- along with several helpful visualization tools and metrics to facilitate the interpretation and analysis of these methods. Notable algorithms featured in the library include the Smith-Waterman algorithm for pairwise local alignment, the Hirschberg algorithm for global alignment, the Wagner-Fisher algorithm for edit distance, BARTScore and BERTScore for similarity analysis, the Knuth-Morris-Pratt algorithm for lexical search, and Faiss for semantic search. Besides, it wraps existing efficient and widely-used implementations of certain frameworks and metrics, such as sacreBLEU and ROUGE, whenever it is appropriate and suitable. Overall, the library aims to provide extensive coverage and increased flexibility in comparison to existing libraries for strings. It can be used for many downstream applications, tasks, and problems in natural-language processing, bioinformatics, and computational social sciences. It is implemented in Python, easily installable via pip, and accessible through a simple API. Source code, documentation, and tutorials are all available on our GitHub page: this https URL.
]]></description>
<dc:subject>strings algorithms library rather-interesting python to-try consider:trees</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:92f9faee842c/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:library"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:python"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-try"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:trees"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1703.00667">
    <title>[1703.00667] A resource-frugal probabilistic dictionary and applications in bioinformatics</title>
    <dc:date>2022-03-19T12:18:50+00:00</dc:date>
    <link>https://arxiv.org/abs/1703.00667</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Indexing massive data sets is extremely expensive for large scale problems. In many fields, huge amounts of data are currently generated, however extracting meaningful information from voluminous data sets, such as computing similarity between elements, is far from being trivial. It remains nonetheless a fundamental need. This work proposes a probabilistic data structure based on a minimal perfect hash function for indexing large sets of keys. Our structure out-compete the hash table for construction, query times and for memory usage, in the case of the indexation of a static set. To illustrate the impact of algorithms performances, we provide two applications based on similarity computation between collections of sequences, and for which this calculation is an expensive but required operation. In particular, we show a practical case in which other bioinformatics tools fail to scale up the tested data set or provide lower recall quality results.
]]></description>
<dc:subject>data-structures sequences strings to-understand to-write-about consider:genetic-programming consider:runtime-search-algorithms</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:01cbd625ac4e/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:data-structures"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:sequences"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:genetic-programming"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:runtime-search-algorithms"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1908.07846">
    <title>[1908.07846] Representing text as abstract images enables image classifiers to also simultaneously classify text</title>
    <dc:date>2022-03-11T11:37:01+00:00</dc:date>
    <link>https://arxiv.org/abs/1908.07846</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We introduce a novel method for converting text data into abstract image representations, which allows image-based processing techniques (e.g. image classification networks) to be applied to text-based comparison problems. We apply the technique to entity disambiguation of inventor names in US patents. The method involves converting text from each pairwise comparison between two inventor name records into a 2D RGB (stacked) image representation. We then train an image classification neural network to discriminate between such pairwise comparison images, and use the trained network to label each pair of records as either matched (same inventor) or non-matched (different inventors), obtaining highly accurate results. Our new text-to-image representation method could also be used more broadly for other NLP comparison problems, such as disambiguation of academic publications, or for problems that require simultaneous classification of both text and image datasets.
]]></description>
<dc:subject>neural-networks strings image-processing representation rather-interesting to-write-about to-visualize consider:genetic-programming consider:robustness</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:eac50b22a9e5/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:neural-networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:image-processing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:representation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-visualize"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:genetic-programming"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:robustness"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2103.02361">
    <title>[2103.02361] Topological Mixing of Random Substitutions</title>
    <dc:date>2022-03-05T11:39:07+00:00</dc:date>
    <link>https://arxiv.org/abs/2103.02361</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We investigate topological mixing of compatible random substitutions. For primitive random substitutions on two letters whose second eigenvalue is greater than one in modulus, we identify a simple, computable criterion which is equivalent to topological mixing of the associated subshift. This generalises previous results on deterministic substitutions. In the case of recognisable, irreducible Pisot random substitutions, we show that the associated subshift is not topologically mixing. Without recognisability, we rely on more specialised methods for excluding mixing and we apply these methods to show that the random Fibonacci substitution subshift is not topologically mixing.
]]></description>
<dc:subject>rewriting-systems strings nonlinear-dynamics ergodic-systems information-theory rather-interesting to-write-about to-simulate consider:Kauffmaniac-chemistry artificial-chemistry</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:efdde26d0346/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nonlinear-dynamics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:ergodic-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:information-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-simulate"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:Kauffmaniac-chemistry"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:artificial-chemistry"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1904.04513">
    <title>[1904.04513] Towards a complete perspective on labeled tree indexing: new size bounds, efficient constructions, and beyond</title>
    <dc:date>2022-03-04T11:27:07+00:00</dc:date>
    <link>https://arxiv.org/abs/1904.04513</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[A labeled tree (or a trie) is a natural generalization of a string, which can also be seen as a compact representation of a set of strings. This paper considers the labeled tree indexing problem, and provides a number of new results on space bound analysis, and on algorithms for efficient construction and pattern matching queries. Kosaraju [FOCS 1989] was the first to consider the labeled tree indexing problem, and he proposed the suffix tree for a backward trie, where the strings in the trie are read in the leaf-to-root direction. In contrast to a backward trie, we call a usual trie as a forward trie. Despite a few follow-up works after Kosaraju's paper, indexing forward/backward tries is not well understood yet. In this paper, we show a full perspective on the sizes of indexing structures such as suffix trees, DAWGs, CDAWGs, suffix arrays, affix trees, affix arrays for forward and backward tries. Some of them take O(n) space in the size n of the input trie, while the others can occupy O(n2) space in the worst case. In particular, we show that the size of the DAWG for a forward trie with n nodes is Ω(σn), where σ is the number of distinct characters in the trie. This becomes Ω(n2) for an alphabet of size σ=Θ(n). Still, we show that there is a compact O(n)-space implicit representation of the DAWG for a forward trie, whose space requirement is independent of the alphabet size. This compact representation allows for simulating each DAWG edge traversal in O(logσ) time, and can be constructed in O(n) time and space over any integer alphabet of size O(n). In addition, this readily extends to the first indexing structure that permits bidirectional pattern searches over a trie within linear space in the input trie size.
]]></description>
<dc:subject>strings data-structures optimization algorithms rather-interesting performance-measure information-theory engineering-design to-write-about to-simulate consider:error</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:7cd802a335d2/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:data-structures"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:performance-measure"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:information-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:engineering-design"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-simulate"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:error"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1705.11130">
    <title>[1705.11130] Computations for symbolic substitutions</title>
    <dc:date>2022-03-02T11:38:43+00:00</dc:date>
    <link>https://arxiv.org/abs/1705.11130</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We provide a survey of results from symbolic dynamics and algebraic topology relating to Grout, a new user-friendly program developed to calculate combinatorial properties and topological invariants of a large class of symbolic substitutions. We study their subshifts (and related spaces) with an emphasis on examples of computations. We implement a check to verify that no counterexample exists to the so-called "strong coincidence conjecture" for a large number of substitutions on three and four letters.
]]></description>
<dc:subject>symbolic-regression rewriting-systems strings aperiodic-tiling looking-to-see enumeration rather-interesting</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:d13c42eaa4a8/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:symbolic-regression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:aperiodic-tiling"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:enumeration"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2101.10087">
    <title>[2101.10087] Automating Program Structure Classification</title>
    <dc:date>2022-02-08T12:26:52+00:00</dc:date>
    <link>https://arxiv.org/abs/2101.10087</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[When students write programs, their program structure provides insight into their learning process. However, analyzing program structure by hand is time-consuming, and teachers need better tools for computer-assisted exploration of student solutions. As a first step towards an education-oriented program analysis toolkit, we show how supervised machine learning methods can automatically classify student programs into a predetermined set of high-level structures. We evaluate two models on classifying student solutions to the Rainfall problem: a nearest-neighbors classifier using syntax tree edit distance and a recurrent neural network. We demonstrate that these models can achieve 91% classification accuracy when trained on 108 programs. We further explore the generality, trade-offs, and failure cases of each model.
]]></description>
<dc:subject>strings software-engineering classification rather-interesting to-write-about consider:genetic-programming consider:clustering-ineffable-solutions</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:7a910b0aa8b4/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:software-engineering"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:classification"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:genetic-programming"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:clustering-ineffable-solutions"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2108.08613">
    <title>[2108.08613] A Conditional Lower Bound for Episode Matching</title>
    <dc:date>2022-01-29T14:04:51+00:00</dc:date>
    <link>https://arxiv.org/abs/2108.08613</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Given two strings S and P, the Episode Matching problem is to compute the length of the shortest substring of S that contains P as a subsequence. The best known upper bound for this problem is Õ(nm) by Das et al. (1997), where n,m are the lengths of S and P, respectively. Although the problem is well studied and has many applications in data mining, this bound has never been improved. In this paper we show why this is the case by proving that an O((nm)1−ϵ) algorithm (even for binary strings) would refute the popular Strong Exponential Time Hypothesis (SETH). The proof is based on a simple reduction from Orthogonal Vectors.
]]></description>
<dc:subject>edit-distance computational-complexity algorithms strings rather-interesting to-write-about consider:genetic-programming consider:visualization</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:0462c88f827d/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:edit-distance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:genetic-programming"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:visualization"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1909.02512">
    <title>[1909.02512] Descriptional Complexity of Semi-Simple Splicing Systems</title>
    <dc:date>2022-01-29T13:56:16+00:00</dc:date>
    <link>https://arxiv.org/abs/1909.02512</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Splicing systems are generative mechanisms introduced by Tom Head in 1987 to model the biological process of DNA recombination. The computational engine of a splicing system is the "splicing operation", a cut-and-paste binary string operation defined by a set of "splicing rules" r=(α1,α2;α3,α4) where α1,α2,α3,α4 are words over an alphabet Σ. For two strings x=x1α1α2x2 and y=y1α3α4y2, applying the splicing rule r produces the string z=x1α1α4y2. 
In this paper we focus on a particular type of splicing systems, called (i,j) semi-simple splicing systems, i=1,2 and j=3,4, wherein all splicing rules have the property that the two strings in positions i and j are singleton letters, while the other two strings are empty. The language generated by such a system consists of the set of words that are obtained starting from an initial set called "axiom set", by iteratively applying the splicing rules to strings in the axiom set as well as to intermediately produced strings. We consider semi-simple splicing systems where the axiom set is a regular language, and investigate the descriptional complexity of such systems in terms of the size of the minimal deterministic finite automata that recognize the languages they generate.
]]></description>
<dc:subject>strings formal-languages biologically-inspired rewriting-systems rather-interesting to-write-about to-simulate consider:random-examples consider:dynamics consider:visualization grammar</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:f907e61ee423/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:formal-languages"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:biologically-inspired"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-simulate"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:random-examples"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:dynamics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:visualization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:grammar"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2107.06188">
    <title>[2107.06188] The degree of asymmetry of sequences</title>
    <dc:date>2021-12-19T12:36:35+00:00</dc:date>
    <link>https://arxiv.org/abs/2107.06188</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We explore the notion of degree of asymmetry for integer sequences and related combinatorial objects. The degree of asymmetry is a new combinatorial statistic that measures how far an object is from being symmetric. We define this notion for compositions, words, matchings, binary trees and permutations, we find generating functions enumerating these objects with respect to their degree of asymmetry, and we describe the limiting distribution of this statistic in each case.
]]></description>
<dc:subject>symmetry combinatorics rather-interesting feature-construction group-theory strings to-understand to-write-about consider:code</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:497a7b1767d6/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:symmetry"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:feature-construction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:group-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:code"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2106.01190">
    <title>[2106.01190] Counting Lyndon Subsequences</title>
    <dc:date>2021-11-16T10:27:29+00:00</dc:date>
    <link>https://arxiv.org/abs/2106.01190</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Counting substrings/subsequences that preserve some property (e.g., palindromes, squares) is an important mathematical interest in stringology. Recently, Glen et al. studied the number of Lyndon factors in a string. A string w=uv is called a Lyndon word if it is the lexicographically smallest among all of its conjugates vu. In this paper, we consider a more general problem "counting Lyndon subsequences". We show (1) the maximum total number of Lyndon subsequences in a string, (2) the expected total number of Lyndon subsequences in a string, (3) the expected number of distinct Lyndon subsequences in a string.
]]></description>
<dc:subject>strings combinatorics enumeration algorithms rather-interesting to-write-about consider:implementation consider:concentration</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:2e21349d14e5/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:enumeration"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:implementation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:concentration"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2009.06080">
    <title>[2009.06080] The Penney's Game with Group Action</title>
    <dc:date>2021-07-14T11:10:42+00:00</dc:date>
    <link>https://arxiv.org/abs/2009.06080</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Consider equipping an alphabet  with a group action that partitions the set of words into equivalence classes which we call patterns. We answer standard questions for the Penney's game on patterns and show non-transitivity for the game on patterns as the length of the pattern tends to infinity. We also analyze bounds on the pattern-based Conway leading number and expected wait time, and further explore the game under the cyclic and symmetric group actions.
]]></description>
<dc:subject>mathematical-recreations strings LOL-at-grad-school-when-I-tried-to-convince-Ewens-about-this probability-theory combinatorics formal-languages to-write-about to-visualize</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:62d7868af328/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:mathematical-recreations"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:LOL-at-grad-school-when-I-tried-to-convince-Ewens-about-this"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:probability-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:formal-languages"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-visualize"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2011.04072">
    <title>[2011.04072] The Harmonic Edit Distance</title>
    <dc:date>2021-07-08T19:47:40+00:00</dc:date>
    <link>https://arxiv.org/abs/2011.04072</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[This short note introduces a new distance between strings, where the cost of an insertion or deletion is inversely proportional to the string length. It improves upon previous results by admitting a simple, explicit formula involving only the length of the longest common subsequence and satisfying the triangle inequality at the same time, while not requiring any parameter tuning.
]]></description>
<dc:subject>strings distance metrics edit-distance to-understand to-visualize</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:7b5b52765d07/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:distance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:metrics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:edit-distance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-visualize"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2101.00332">
    <title>[2101.00332] Permutations with exactly one copy of a decreasing pattern of length k</title>
    <dc:date>2021-07-08T10:49:03+00:00</dc:date>
    <link>https://arxiv.org/abs/2101.00332</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We construct an injection from the set of permutations of length n that contain exactly one copy of the decreasing pattern of length k to the set of permutations of length n+2 that avoid that pattern. We then prove that the generating function counting the former is not rational, and in the case when k is even and k≥4, it is not even algebraic. We extend our injection and our nonrationality result to a larger class of patterns.
]]></description>
<dc:subject>permutations strings to-understand enumeration feature-construction that-rationality-thing-tho</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:5dc2bb0caf1f/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:permutations"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:enumeration"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:feature-construction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:that-rationality-thing-tho"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1904.05459">
    <title>[1904.05459] Constant factor approximations to edit distance on far input pairs in nearly linear time</title>
    <dc:date>2021-01-21T21:08:56+00:00</dc:date>
    <link>https://arxiv.org/abs/1904.05459</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[For any T≥1, there are constants R=R(T)≥1 and ζ=ζ(T)>0 and a randomized algorithm that takes as input an integer n and two strings x,y of length at most n, and runs in time O(n1+1T) and outputs an upper bound U on the edit distance ED(x,y) that with high probability, satisfies U≤R(ED(x,y)+n1−ζ). In particular, on any input with ED(x,y)≥n1−ζ the algorithm outputs a constant factor approximation with high probability. 
A similar result has been proven independently by Brakensiek and Rubinstein (2019).
]]></description>
<dc:subject>strings edit-distance proof computational-complexity to-understand consider:looking-to-see consider:visualization</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:ba05fdb80253/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:edit-distance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:proof"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:visualization"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2002.06786">
    <title>[2002.06786] DAWGs for parameterized matching: online construction and related indexing structures</title>
    <dc:date>2021-01-21T21:00:34+00:00</dc:date>
    <link>https://arxiv.org/abs/2002.06786</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Two strings x and y over Σ∪Π of equal length are said to parameterized match (p-match) if there is a renaming bijection f:Σ∪Π→Σ∪Π that is identity on Σ and transforms x to y (or vice versa). The p-matching problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have Θ(n2) nodes and edges but PDAWGs have only O(n) nodes and edges, where n is the length of an input string. We also give O(n|Π|log(|Π|+|Σ|))-time O(n)-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the parameterized suffix tree for the reversed string can also be built in the same time and space, in a right-to-left online manner. We also discuss parameterized compact DAWGs.
]]></description>
<dc:subject>strings bioinformatics algorithms rather-interesting hard-problems pattern-matching to-write-about to-simulate consider:representation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:9a95cb8f1ca9/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:bioinformatics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:hard-problems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:pattern-matching"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-simulate"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:representation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1902.04419">
    <title>[1902.04419] On Conflict Free DNA Codes</title>
    <dc:date>2020-10-13T21:06:27+00:00</dc:date>
    <link>https://arxiv.org/abs/1902.04419</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[DNA storage has emerged as an important area of research. The reliability of DNA storage system depends on designing the DNA strings (called DNA codes) that are sufficiently dissimilar. In this work, we introduce DNA codes that satisfy a special constraint. Each codeword of the DNA code has a specific property that any two consecutive sub-strings of the DNA codeword will not be the same (a generalization of homo-polymers constraint). This is in addition to the usual constraints such as Hamming, reverse, reverse-complement and GC-content. We believe that the new constraint will help further in reducing the errors during reading and writing data into the synthetic DNA strings. We also present a construction (based on a variant of stochastic local search algorithm) to calculate the size of the DNA codes with all the above constraints, which improves the lower bounds from the existing literature, for some specific cases. Moreover, a recursive isometric map between binary vectors and DNA strings is proposed. Using the map and the well known binary codes we obtain few classes of DNA codes with all the constraints including the property that the constructed DNA codewords are free from the hairpin-like secondary structures.
]]></description>
<dc:subject>strings DNA-computing combinatorics rather-interesting rather-odd multiobjective-optimization (would-be-better) to-write-about to-simulate consider:MO-approach permutations</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:cfbefc372b0b/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:DNA-computing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-odd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:multiobjective-optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:(would-be-better)"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-simulate"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:MO-approach"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:permutations"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://link.springer.com/article/10.1007/s11009-019-09706-8">
    <title>Stochastic Analysis of Minimal Automata Growth for Generalized Strings | SpringerLink</title>
    <dc:date>2020-05-02T13:13:39+00:00</dc:date>
    <link>https://link.springer.com/article/10.1007/s11009-019-09706-8</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Generalized strings describe various biological motifs that arise in molecular and computational biology. In this manuscript, we introduce an alternative but efficient algorithm to construct the minimal deterministic finite automaton (DFA) associated with any generalized string. We exploit this construction to characterize the typical growth of the minimal DFA (i.e., with the least number of states) associated with a random generalized string of increasing length. Even though the worst-case growth may be exponential, we characterize a point in the construction of the minimal DFA when it starts to grow linearly and conclude it has at most a polynomial number of states with asymptotically certain probability. We conjecture that this number is linear.

]]></description>
<dc:subject>strings rewriting-systems combinatorics information-theory rather-interesting to-write-about to-simulate consider:grammars consider:parsimony consider:approximation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:305506a97f4b/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:information-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-simulate"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:grammars"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:parsimony"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:approximation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1605.03654">
    <title>[1605.03654] $q$-Quasiadditive Functions</title>
    <dc:date>2019-11-03T11:40:05+00:00</dc:date>
    <link>https://arxiv.org/abs/1605.03654</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[In this paper, we introduce the notion of q-quasiadditivity of arithmetic functions, as well as the related concept of q-quasimultiplicativity, which generalises strong q-additivity and -multiplicativity, respectively. We show that there are many natural examples for these concepts, which are characterised by functional equations of the form f(qk+ra+b)=f(a)+f(b) or f(qk+ra+b)=f(a)f(b) for all b<qk and a fixed parameter r. In addition to some elementary properties of q-quasiadditive and q-quasimultiplicative functions, we prove characterisations of q-quasiadditivity and q-quasimultiplicativity for the special class of q-regular functions. The final main result provides a general central limit theorem that includes both classical and new examples as corollaries.
]]></description>
<dc:subject>number-theory group-theory to-understand rather-odd strings rewriting-systems combinatorics</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:abc5a19654b0/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:number-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:group-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-odd"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1809.08490">
    <title>[1809.08490] On 3-Inflatable Permutations</title>
    <dc:date>2019-09-09T11:05:16+00:00</dc:date>
    <link>https://arxiv.org/abs/1809.08490</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Call a permutation k-inflatable if it can be "blown up" into a convergent sequence of permutations by a uniform inflation construction, such that this sequence is symmetric with respect to densities of induced subpermutations of length k. We study properties of 3-inflatable permutations, finding a general formula for limit densities of pattern permutations in the uniform inflation of a given permutation. We also characterize and find examples of 3-inflatable permutations of various lengths, including the shortest examples with length 17.
]]></description>
<dc:subject>permutations strings rather-interesting mathematical-recreations dynamical-systems formal-languages constraint-satisfaction to-write-about to-simulate</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:614d8c6b8614/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:permutations"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:mathematical-recreations"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:dynamical-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:formal-languages"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:constraint-satisfaction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-simulate"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1009.4061">
    <title>[1009.4061] More Kolakoski Sequences</title>
    <dc:date>2019-09-08T11:42:36+00:00</dc:date>
    <link>https://arxiv.org/abs/1009.4061</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Our goal in this article is to review the known properties of the mysterious Kolakoski sequence and at the same time look at generalizations of it over arbitrary two letter alphabets. Our primary focus will here be the case where one of the letters is odd while the other is even, since in the other cases the sequences in question can be rewritten as (well-known) primitive substitution sequences. We will look at word and letter frequencies, squares, palindromes and complexity.
]]></description>
<dc:subject>combinatorics rewriting-systems rather-interesting strings to-write-about to-simulate consider:performance-measures consider:approximation consider:genetic-programming consider:constraint-satisfaction</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:e0e0916da2d9/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-simulate"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:performance-measures"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:genetic-programming"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:constraint-satisfaction"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1107.2422">
    <title>[1107.2422] A Linear Time Algorithm for Seeds Computation</title>
    <dc:date>2019-06-12T13:21:04+00:00</dc:date>
    <link>https://arxiv.org/abs/1107.2422</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[A seed in a word is a relaxed version of a period in which the occurrences of the repeating subword may overlap. We show a linear-time algorithm computing a linear-size representation of all the seeds of a word (the number of seeds might be quadratic). In particular, one can easily derive the shortest seed and the number of seeds from our representation. Thus, we solve an open problem stated in the survey by Smyth (2000) and improve upon a previous O(n log n) algorithm by Iliopoulos, Moore, and Park (1996). Our approach is based on combinatorial relations between seeds and subword complexity (used here for the first time in context of seeds). In the previous papers, the compact representation of seeds consisted of two independent parts operating on the suffix tree of the word and the suffix tree of the reverse of the word, respectively. Our second contribution is a simpler representation of all seeds which avoids dealing with the reversed word. 
A preliminary version of this work, with a much more complex algorithm constructing the earlier representation of seeds, was presented at the 23rd Annual ACM-SIAM Symposium of Discrete Algorithms (SODA 2012).
]]></description>
<dc:subject>strings combinatorics patterns rather-interesting algorithms computational-complexity nudge-targets to-write-about</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:32e9a0a0c82e/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:patterns"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1812.08101">
    <title>[1812.08101] Efficient Representation and Counting of Antipower Factors in Words</title>
    <dc:date>2019-03-02T01:15:07+00:00</dc:date>
    <link>https://arxiv.org/abs/1812.08101</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[A k-antipower (for k≥2) is a concatenation of k pairwise distinct words of the same length. The study of antipower factors of a word was initiated by Fici et al. (ICALP 2016) and first algorithms for computing antipower factors were presented by Badkobeh et al. (Inf. Process. Lett., 2018). We address two open problems posed by Badkobeh et al. Our main results are algorithms for counting and reporting factors of a word which are k-antipowers. They work in (nklogk) time and (nklogk+C) time, respectively, where C is the number of reported factors. For k=o(n/logn‾‾‾‾‾‾‾√), this improves the time complexity of (n2/k) of the solution by Badkobeh et al. Our main algorithmic tools are runs and gapped repeats. We also present an improved data structure that checks, for a given factor of a word and an integer k, if the factor is a k-antipower.]]></description>
<dc:subject>strings combinatorics algorithms patterns feature-construction rather-interesting nudge-targets</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:591df0e5acde/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:patterns"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:feature-construction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1606.02868">
    <title>[1606.02868] Anti-Powers in Infinite Words</title>
    <dc:date>2019-03-02T01:14:33+00:00</dc:date>
    <link>https://arxiv.org/abs/1606.02868</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[In combinatorics of words, a concatenation of k consecutive equal blocks is called a power of order k. In this paper we take a different point of view and define an anti-power of order k as a concatenation of k consecutive pairwise distinct blocks of the same length. As a main result, we show that every infinite word contains powers of any order or anti-powers of any order. That is, the existence of powers or anti-powers is an unavoidable regularity. Indeed, we prove a stronger result, which relates the density of anti-powers to the existence of a factor that occurs with arbitrary exponent. As a consequence, we show that in every aperiodic uniformly recurrent word, anti-powers of every order begin at every position. We further show that every infinite word avoiding anti-powers of order 3 is ultimately periodic, while there exist aperiodic words avoiding anti-powers of order 4. We also show that there exist aperiodic recurrent words avoiding anti-powers of order 6.]]></description>
<dc:subject>strings combinatorics patterns rather-interesting to-write-about consider:algorithms</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:6c5eef4d30df/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:patterns"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:algorithms"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1309.3441">
    <title>[1309.3441] On the shape of subword complexity sequences of finite words</title>
    <dc:date>2018-12-20T12:40:17+00:00</dc:date>
    <link>https://arxiv.org/abs/1309.3441</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[The subword complexity of a word w over a finite alphabet  is a function that assigns for each positive integer n, the number of distinct subwords of length n in w. The subword complexity of a word is a good measure of the randomness of the word and gives insight to what the word itself looks like. In this paper, we discuss the properties of subword complexity sequences, and consider different variables that influence their shape. We also compute the number of distinct subword complexity sequences for certain lengths of words over different alphabets, and state some conjectures about the growth of these numbers.]]></description>
<dc:subject>strings combinatorics enumeration pattern-discovery complexity-measures information-theory rather-interesting to-write-about for-puzzles you-know-for-kids</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:609accd48e8f/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:enumeration"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:pattern-discovery"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:complexity-measures"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:information-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:for-puzzles"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:you-know-for-kids"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1808.06313">
    <title>[1808.06313] Binomial coefficients and multifactorial numbers through generative grammars</title>
    <dc:date>2018-12-17T13:18:25+00:00</dc:date>
    <link>https://arxiv.org/abs/1808.06313</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[In this paper, the formal derivative operator defined with respect to context-free grammars is used to prove some properties about binomial coefficients and multifactorial numbers. In addition, we extend the formal derivative operator to matrix grammars and show that multifactorial numbers can also be generated.
]]></description>
<dc:subject>combinatorics strings rather-interesting counting to-write-about</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:7462352c014a/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:counting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1710.10964">
    <title>[1710.10964] At the Roots of Dictionary Compression: String Attractors</title>
    <dc:date>2018-07-04T11:38:52+00:00</dc:date>
    <link>https://arxiv.org/abs/1710.10964</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions. Motivated by this fact, decades of research have generated myriads of so-called dictionary compressors: algorithms able to reduce the text's size by exploiting its repetitiveness. Lempel-Ziv 77 is probably one of the most successful and known tools of this kind, followed by straight-line programs, run-length Burrows-Wheeler transform, and other less-known schemes. In this paper, we show that these techniques are different solutions to the same, elegant, combinatorial problem: to find a small set of positions capturing all distinct text's substrings. We call such a set a string attractor. We first show reductions between dictionary compressors and string attractors. This gives us the approximation ratios of dictionary compressors with respect to the smallest string attractor and allows us to solve several open problems related to the asymptotic relations between the output sizes of different dictionary compressors. We then show that the k-attractor problem - that is, deciding whether a text has a size-t set of positions capturing all substrings of length at most k - is NP-complete for k >= 3. We provide approximation techniques for the smallest k-attractor, show that the problem is APX-complete for constant k, and give strong inapproximability results. To conclude, we provide matching lower- and upper- bounds for the random access problem on string attractors. Our optimal data structure is universal: by our reductions to string attractors, it supports random access on any dictionary-compression scheme. In particular, our solution matches the lower bound also on LZ77, straight-line programs, collage systems, and macro schemes, and therefore essentially closes (at once) the random access problem for all these compressors.]]></description>
<dc:subject>compression strings feature-extraction representation algorithms computational-complexity to-understand nudge-targets consider:looking-to-see</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:9b85017a8384/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:compression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:feature-extraction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:representation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://github.com/friemen/diffit">
    <title>friemen/diffit: Clojure(Script) diff and patch implementations for vector and map.</title>
    <dc:date>2018-01-28T11:52:07+00:00</dc:date>
    <link>https://github.com/friemen/diffit</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Diff calculates the edit-distance and a minimal edit-script to transform the first into the second collection via patch. The vector based diff implementation follows An O(NP) Sequence Comparison Algorithm by Wu, Manber, Myers and Miller.

]]></description>
<dc:subject>clojure strings library to-use metrics</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:7e1daba044c1/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:clojure"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:library"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-use"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:metrics"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1709.05725">
    <title>[1709.05725] FlashProfile: Interactive Synthesis of Syntactic Profiles</title>
    <dc:date>2018-01-26T12:16:56+00:00</dc:date>
    <link>https://arxiv.org/abs/1709.05725</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We address the problem of learning comprehensive syntactic profiles for a set of strings. Real-world datasets, typically curated from multiple sources, often contain data in various formats. Thus any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify various formats is infeasible in standard big-data scenarios. 
We present a technique for generating comprehensive syntactic profiles in terms of user-defined patterns that also allows for interactive refinement. We define a syntactic profile as a set of succinct patterns that describe the entire dataset. Our approach efficiently learns such profiles, and allows refinement by exposing a desired number of patterns. 
Our implementation, FlashProfile, shows a median profiling time of 0.7s over 142 tasks on 74 real datasets. We also show that access to the generated data profiles allow for more accurate synthesis of programs, using fewer examples in programming-by-example workflows.]]></description>
<dc:subject>pattern-discovery rather-interesting strings data-mining statistics nudge-targets consider:looking-to-see consider:representation consider:performance-measures</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:3b20b6fb4a78/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:pattern-discovery"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:data-mining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:representation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:performance-measures"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1712.08573">
    <title>[1712.08573] Longest common substring with approximately $k$ mismatches</title>
    <dc:date>2018-01-15T15:11:05+00:00</dc:date>
    <link>https://arxiv.org/abs/1712.08573</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[In the longest common substring problem we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well-known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one letter. To circumvent this, Leimeister and Morgenstern introduced the problem of the longest common substring with k mismatches. Lately, this problem has received a lot of attention in the literature. In this paper we first show a conditional lower bound based on the SETH hypothesis implying that there is little hope to improve existing solutions. We then introduce a new but closely related problem of the longest common substring with approximately k mismatches and use computational geometry techniques to show that it admits a solution with strongly subquadratic running time. We also apply these results to obtain a strongly subquadratic approximation algorithm for the longest common substring with k mismatches problem and show conditional hardness of improving its approximation ratio.]]></description>
<dc:subject>strings computational-complexity algorithms rather-interesting nudge-targets consider:looking-to-see</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:fcd81ec25a1e/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1710.03248">
    <title>[1710.03248] Synthesizing Bijective Lenses</title>
    <dc:date>2017-11-27T20:57:46+00:00</dc:date>
    <link>https://arxiv.org/abs/1710.03248</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Bidirectional transformations between different data representations occur frequently in modern software systems. They appear as serializers and deserializers, as database views and view updaters, and more. Manually building bidirectional transformations---by writing two separate functions that are intended to be inverses---is tedious and error prone. A better approach is to use a domain-specific language in which both directions can be written as a single expression. However, these domain-specific languages can be difficult to program in, requiring programmers to manage fiddly details while working in a complex type system. 
To solve this, we present Optician, a tool for type-directed synthesis of bijective string transformers. The inputs to Optician are two ordinary regular expressions representing two data formats and a few concrete examples for disambiguation. The output is a well-typed program in Boomerang (a bidirectional language based on the theory of lenses). The main technical challenge involves navigating the vast program search space efficiently enough. Unlike most prior work on type-directed synthesis, our system operates in the context of a language with a rich equivalence relation on types (the theory of regular expressions). We synthesize terms of a equivalent language and convert those generated terms into our lens language. We prove the correctness of our synthesis algorithm. We also demonstrate empirically that our new language changes the synthesis problem from one that admits intractable solutions to one that admits highly efficient solutions. We evaluate Optician on a benchmark suite of 39 examples including both microbenchmarks and realistic examples derived from other data management systems including Flash Fill, a tool for synthesizing string transformations in spreadsheets, and Augeas, a tool for bidirectional processing of Linux system configuration files.
]]></description>
<dc:subject>programming-language to-understand data-structures translation strings to-write-about nudge-targets?</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:a853f89f4ece/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:programming-language"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:data-structures"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:translation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets?"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1710.03395">
    <title>[1710.03395] Efficient Dynamic Dictionary Matching with DAWGs and AC-automata</title>
    <dc:date>2017-11-17T13:55:31+00:00</dc:date>
    <link>https://arxiv.org/abs/1710.03395</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[The dictionary matching is a task to find all occurrences of pattern strings in a set D (called a dictionary) on a text string T. The Aho-Corasick-automaton (AC-automaton) is a data structure which enables us to solve the dictionary matching problem in O(dlogσ) preprocessing time and O(nlogσ+occ) matching time, where d is the total length of the patterns in D, n is the length of the text, σ is the alphabet size, and occ is the total number of occurrences of all the patterns in the text. The dynamic dictionary matching is a variant where patterns may dynamically be inserted into and deleted from D. This problem is called semi-dynamic dictionary matching if only insertions are allowed. In this paper, we propose two efficient algorithms. For a pattern of length m, our first algorithm supports insertions in O(mlogσ+logd/loglogd) time and pattern matching in O(nlogσ+occ) time for the semi-dynamic setting and supports both insertions and deletions in O(σm+logd/loglogd) time and pattern matching in O(n(logd/loglogd+logσ)+occ(logd/loglogd)) time for the dynamic setting by some modifications. This algorithm is based on the directed acyclic word graph. Our second algorithm, which is based on the AC-automaton, supports insertions in O(mlogσ+uf+uo) time for the semi-dynamic setting and supports both insertions and deletions in O(σm+uf+uo) time for the dynamic setting, where uf and uo respectively denote the numbers of states of which the failure function and the output function need to be updated. This algorithm performs pattern matching in O(nlogσ+occ) time for both settings. Our algorithm achieves optimal update time for AC-automaton based methods, since any algorithm which explicitly maintains the AC-automaton requires Ω(uf+uo) update time.]]></description>
<dc:subject>algorithms strings rather-interesting formal-languages representation rewriting-systems to-write-about consider:rediscovery nudge-targets</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:88225bd758af/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:formal-languages"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:representation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:rediscovery"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1705.01887">
    <title>[1705.01887] Streaming for Aibohphobes: Longest Palindrome with Mismatches</title>
    <dc:date>2017-11-17T13:28:50+00:00</dc:date>
    <link>https://arxiv.org/abs/1705.01887</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[A palindrome is a string that reads the same as its reverse, such as "aibohphobia" (fear of palindromes). Given an integer d>0, a d-near-palindrome is a string of Hamming distance at most d from its reverse. We study the natural problem of identifying a longest d-near-palindrome in data streams. The problem is relevant to the analysis of DNA databases, and to the task of repairing recursive structures in documents such as XML and JSON. We present an algorithm that returns a d-near-palindrome whose length is within a multiplicative (1+ϵ)-factor of the longest d-near-palindrome. Our algorithm also returns the set of mismatched indices of the d-near-palindrome, using (dlog7nϵlog(1+ϵ)) bits of space, and (dlog6nϵlog(1+ϵ)) update time per arriving symbol. We show that Ω(dlogn) space is necessary for estimating the length of longest d-near-palindromes with high probability. We further obtain an additive-error approximation algorithm and a comparable lower bound, as well as an exact two-pass algorithm that solves the longest d-near-palindrome problem using (d2n‾√log6n) bits of space.]]></description>
<dc:subject>strings computational-complexity algorithms probability-theory inference online-learning rather-interesting to-write-about nudge-targets consider:looking-to-see consider:representation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:a55596c193ce/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:probability-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:inference"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:online-learning"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:representation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1406.5895">
    <title>[1406.5895] Universal Lyndon Words</title>
    <dc:date>2017-11-05T14:24:06+00:00</dc:date>
    <link>https://arxiv.org/abs/1406.5895</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[A word w over an alphabet Σ is a Lyndon word if there exists an order defined on Σ for which w is lexicographically smaller than all of its conjugates (other than itself). We introduce and study \emph{universal Lyndon words}, which are words over an n-letter alphabet that have length n! and such that all the conjugates are Lyndon words. We show that universal Lyndon words exist for every n and exhibit combinatorial and structural properties of these words. We then define particular prefix codes, which we call Hamiltonian lex-codes, and show that every Hamiltonian lex-code is in bijection with the set of the shortest unrepeated prefixes of the conjugates of a universal Lyndon word. This allows us to give an algorithm for constructing all the universal Lyndon words.]]></description>
<dc:subject>combinatorics strings to-write-about nudge-targets consider:rediscovery consider:performance-measures</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:285cfae33bf8/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:rediscovery"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:performance-measures"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1108.3615v1">
    <title>[1108.3615v1] Interactions between Digital Geometry and Combinatorics on Words</title>
    <dc:date>2017-11-05T14:01:04+00:00</dc:date>
    <link>https://arxiv.org/abs/1108.3615v1</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We review some recent results in digital geometry obtained by using a combinatorics on words approach to discrete geometry. Motivated on the one hand by the well-known theory of Sturmian words which model conveniently discrete lines in the plane, and on the other hand by the development of digital geometry, this study reveals strong links between the two fields. Discrete figures are identified with polyominoes encoded by words. The combinatorial tools lead to elegant descriptions of geometrical features and efficient algorithms. Among these, radix-trees are useful for efficiently detecting path intersection, Lyndon and Christoffel words appear as the main tools for describing digital convexity; equations on words allow to better understand tilings by translations.]]></description>
<dc:subject>combinatorics representation rather-interesting strings tiling to-write-about domino-tiling mathematical-recreations</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:28edf3445129/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:representation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:tiling"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:domino-tiling"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:mathematical-recreations"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1701.00948">
    <title>[1701.00948] Abelian-Square-Rich Words</title>
    <dc:date>2017-11-05T13:59:04+00:00</dc:date>
    <link>https://arxiv.org/abs/1701.00948</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[An abelian square is the concatenation of two words that are anagrams of one another. A word of length n can contain at most Θ(n2) distinct factors, and there exist words of length n containing Θ(n2) distinct abelian-square factors, that is, distinct factors that are abelian squares. This motivates us to study infinite words such that the number of distinct abelian-square factors of length n grows quadratically with n. More precisely, we say that an infinite word w is {\it abelian-square-rich} if, for every n, every factor of w of length n contains, on average, a number of distinct abelian-square factors that is quadratic in n; and {\it uniformly abelian-square-rich} if every factor of w contains a number of distinct abelian-square factors that is proportional to the square of its length. Of course, if a word is uniformly abelian-square-rich, then it is abelian-square-rich, but we show that the converse is not true in general. We prove that the Thue-Morse word is uniformly abelian-square-rich and that the function counting the number of distinct abelian-square factors of length 2n of the Thue-Morse word is 2-regular. As for Sturmian words, we prove that a Sturmian word sα of angle α is uniformly abelian-square-rich if and only if the irrational α has bounded partial quotients, that is, if and only if sα has bounded exponent.]]></description>
<dc:subject>strings combinatorics patterns rather-interesting to-write-about nudge-targets</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:232df0bcfdf6/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:patterns"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1512.03421">
    <title>[1512.03421] The extended 1-perfect trades in small hypercubes</title>
    <dc:date>2017-10-21T12:45:05+00:00</dc:date>
    <link>https://arxiv.org/abs/1512.03421</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[An extended 1-perfect trade is a pair (T0,T1) of two disjoint binary distance-4 even-weight codes such that the set of words at distance 1 from T0 coincides with the set of words at distance 1 from T1. Such trade is called primary if any pair of proper subsets of T0 and T1 is not a trade. Using a computer-aided approach, we classify nonequivalent primary extended 1-perfect trades of length 10, constant-weight extended 1-perfect trades of length 12, and Steiner trades derived from them. In particular, all Steiner trades with parameters (5,6,12) are classified.]]></description>
<dc:subject>combinatorics to-understand graph-theory strings optimization constraint-satisfaction looking-to-see nudge-targets consider:looking-to-see consider:classification consider:feature-discovery</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:df76bf4f0e9c/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:graph-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:constraint-satisfaction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:classification"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:feature-discovery"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1502.04644">
    <title>[1502.04644] Beyond the Runs Theorem</title>
    <dc:date>2017-10-21T12:37:12+00:00</dc:date>
    <link>https://arxiv.org/abs/1502.04644</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Recently, a short and elegant proof was presented showing that a binary word of length n contains at most n−3 runs. Here we show, using the same technique and a computer search, that the number of runs in a binary word of length n is at most 2223n<0.957n.]]></description>
<dc:subject>strings proof permutations looking-to-see to-write-about nudge-targets</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:fdce8d2bac01/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:proof"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:permutations"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1406.0670">
    <title>[1406.0670] Decision Algorithms for Fibonacci-Automatic Words, with Applications to Pattern Avoidance</title>
    <dc:date>2017-10-19T22:40:02+00:00</dc:date>
    <link>https://arxiv.org/abs/1406.0670</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We implement a decision procedure for answering questions about a class of infinite words that might be called (for lack of a better name) "Fibonacci-automatic". This class includes, for example, the famous Fibonacci word f = 01001010..., the fixed point of the morphism 0 -> 01 and 1 -> 0. We then recover many results about the Fibonacci word from the literature (and improve some of them), such as assertions about the occurrences in f of squares, cubes, palindromes, and so forth. As an application of our method we prove a new result: there exists an aperiodic infinite binary word avoiding the pattern x x x^R. This is the first avoidability result concerning a nonuniform morphism proven purely mechanically.]]></description>
<dc:subject>strings combinatorics classification nudge-targets consider:looking-to-see consider:feature-discovery</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:b6c028a2d38e/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:classification"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:feature-discovery"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1407.5841">
    <title>[1407.5841] Mechanical Proofs of Properties of the Tribonacci Word</title>
    <dc:date>2017-10-18T12:28:51+00:00</dc:date>
    <link>https://arxiv.org/abs/1407.5841</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We implement a decision procedure for answering questions about a class of infinite words that might be called (for lack of a better name) "Tribonacci-automatic". This class includes, for example, the famous Tribonacci word T = 0102010010202 ..., the fixed point of the morphism 0 -> 01, 1 -> 02, 2 -> 0. We use it to reprove some old results about the Tribonacci word from the literature, such as assertions about the occurrences in T of squares, cubes, palindromes, and so forth. We also obtain some new results.]]></description>
<dc:subject>formal-languages strings automata representation feature-construction rather-interesting nudge-targets consider:looking-to-see consider:rediscovery</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:6caa8fac1070/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:formal-languages"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:automata"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:representation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:feature-construction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:rediscovery"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1505.00667">
    <title>[1505.00667] An Unusual Continued Fraction</title>
    <dc:date>2017-10-18T12:24:32+00:00</dc:date>
    <link>https://arxiv.org/abs/1505.00667</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We consider the real number σ with continued fraction expansion [a0,a1,a2,…]=[1,2,1,4,1,2,1,8,1,2,1,4,1,2,1,16,…], where ai is the largest power of 2 dividing i+1. We compute the irrationality measure of σ2 and demonstrate that σ2 (and σ) are both transcendental numbers. We also show that certain partial quotients of σ2 grow doubly exponentially, thus confirming a conjecture of Hanna and Wilson.]]></description>
<dc:subject>continued-fractions strings number-theory rather-interesting to-understand nudge-targets consider:representation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:dc4cbf1a3da7/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:continued-fractions"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:number-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:representation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1311.2318">
    <title>[1311.2318] Counting the Palstars</title>
    <dc:date>2017-10-18T12:22:24+00:00</dc:date>
    <link>https://arxiv.org/abs/1311.2318</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[A palstar (after Knuth, Morris, and Pratt) is a concatenation of even-length palindromes. We show that, asymptotically, there are Θ(αnk) palstars of length 2n over a k-letter alphabet, where αk is a constant such that 2k−1<αk<2k−12. In particular, α2≐3.33513193.]]></description>
<dc:subject>strings feature-construction classification nudge-targets consider:looking-to-see</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:8a68ae4595ad/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:feature-construction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:classification"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1612.05320">
    <title>[1612.05320] Minimum Critical Exponents for Palindromes</title>
    <dc:date>2017-10-18T12:14:14+00:00</dc:date>
    <link>https://arxiv.org/abs/1612.05320</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We determine the minimum possible critical exponent for all palindromes over finite alphabets.
]]></description>
<dc:subject>strings feature-construction rather-interesting nudge-targets to-write-about consider:rediscovery</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:8422f600a045/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:feature-construction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:rediscovery"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1509.05240">
    <title>[1509.05240] Periods and borders of random words</title>
    <dc:date>2017-10-18T11:58:49+00:00</dc:date>
    <link>https://arxiv.org/abs/1509.05240</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We investigate the behavior of the periods and border lengths of random words over a fixed alphabet. We show that the asymptotic probability that a random word has a given maximal border length k is a constant, depending only on k and the alphabet size ℓ. We give a recurrence that allows us to determine these constants with any required precision. This also allows us to evaluate the expected period of a random word. For the binary case, the expected period is asymptotically about n−1.641. We also give explicit formulas for the probability that a random word is unbordered or has maximum border length one.]]></description>
<dc:subject>probability-theory strings rather-interesting consider:Benford's-law? to-write-about to-understand</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:6d6f29bf1d8a/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:probability-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:Benford's-law?"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1611.01479">
    <title>[1611.01479] Space-Efficient Re-Pair Compression</title>
    <dc:date>2017-09-28T00:22:18+00:00</dc:date>
    <link>https://arxiv.org/abs/1611.01479</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Re-Pair is an effective grammar-based compression scheme achieving strong compression rates in practice. Let n, σ, and d be the text length, alphabet size, and dictionary size of the final grammar, respectively. In their original paper, the authors show how to compute the Re-Pair grammar in expected linear time and 5n+4σ2+4d+n‾√ words of working space on top of the text. In this work, we propose two algorithms improving on the space of their original solution. Our model assumes a memory word of ⌈log2n⌉ bits and a re-writable input text composed by n such words. Our first algorithm runs in expected (n/ϵ) time and uses (1+ϵ)n+n‾√ words of space on top of the text for any parameter 0<ϵ≤1 chosen in advance. Our second algorithm runs in expected (nlogn) time and improves the space to n+n‾√ words.]]></description>
<dc:subject>strings compression algorithms rather-interesting to-write-about discrete-mathematics computational-complexity nudge-targets consider:rediscovery</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:8345d8e06667/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:compression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:discrete-mathematics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:rediscovery"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1709.05314">
    <title>[1709.05314] String Attractors</title>
    <dc:date>2017-09-28T00:19:45+00:00</dc:date>
    <link>https://arxiv.org/abs/1709.05314</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Let S be a string of length n. In this paper we introduce the notion of \emph{string attractor}: a subset of the string's positions [1,n] such that every distinct substring of S has an occurrence crossing one of the attractor's elements. We first show that the minimum attractor's size yields upper-bounds to the string's repetitiveness as measured by its linguistic complexity and by the length of its longest repeated substring. We then prove that all known compressors for repetitive strings induce a string attractor whose size is bounded by their associated repetitiveness measure, and can therefore be considered as approximations of the smallest one. Using further reductions, we derive the approximation ratios of these compressors with respect to the smallest attractor and solve several open problems related to the asymptotic relations between repetitiveness measures (in particular, between the the sizes of the Lempel-Ziv factorization, the run-length Burrows-Wheeler transform, the smallest grammar, and the smallest macro scheme). These reductions directly provide approximation algorithms for the smallest string attractor. We then apply string attractors to solve efficiently a fundamental problem in the field of compressed computation: we present a universal compressed data structure for text extraction that improves existing strategies simultaneously for \emph{all} known dictionary compressors and that, by recent lower bounds, almost matches the optimal running time within the resulting space. To conclude, we consider generalizations of string attractors to labeled graphs, show that the attractor problem is NP-complete on trees, and provide a logarithmic approximation computable in polynomial time.]]></description>
<dc:subject>strings compression discrete-mathematics rather-interesting feature-construction to-write-about data-structures nudge-targets consider:looking-to-see combinatorics</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:82fe4afe6e07/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:compression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:discrete-mathematics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:feature-construction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:data-structures"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1709.02034">
    <title>[1709.02034] Detecting Patterns Can Be Hard: Circuit Lower Bounds for the String Matching Problem</title>
    <dc:date>2017-09-26T14:58:46+00:00</dc:date>
    <link>https://arxiv.org/abs/1709.02034</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[Detecting patterns in strings and images is a fundamental and well studied problem. We study the circuit complexity of the string matching problem under two popular choices of gates: De Morgan and threshold gates. For strings of length n and patterns of length logn≪k≤n−o(n), we prove super polynomial lower bounds for De Morgan circuits of depth 2, and nearly linear lower bounds for depth 2 threshold circuits. For unbounded depth and k≥2, we prove a linear lower bound for (unbounded fan-in) De Morgan circuits. For certain values of k, we prove a Ω(n‾√/logn) lower bound for general (no depth restriction) threshold circuits. Our proof for threshold circuits builds on a curious connection between detecting patterns and evaluating Boolean functions when the truth table of the function is given explicitly. Finally, we provide upper bounds on the size of circuits that solve the string matching problem.
]]></description>
<dc:subject>strings pattern-discovery computational-complexity algorithms rather-interesting nudge-targets consider:looking-to-see to-write-about consider:cartooning</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:0a1a0ee8d2ad/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:pattern-discovery"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:cartooning"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1604.06707">
    <title>[1604.06707] Loopless Gray Code Enumeration and the Tower of Bucharest</title>
    <dc:date>2017-04-29T11:52:55+00:00</dc:date>
    <link>https://arxiv.org/abs/1604.06707</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We give new algorithms for generating all n-tuples over an alphabet of m letters, changing only one letter at a time (Gray codes). These algorithms are based on the connection with variations of the Towers of Hanoi game. Our algorithms are loopless, in the sense that the next change can be determined in a constant number of steps, and they can be implemented in hardware. We also give another family of loopless algorithms that is based on the idea of working ahead and saving the work in a buffer.
]]></description>
<dc:subject>combinatorics generative-models rather-interesting to-write-about permutations strings algorithms nudge-targets consider:rediscovery consider:variations</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:248874a133bc/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:generative-models"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:permutations"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:rediscovery"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:variations"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1301.5080">
    <title>[1301.5080] Approaches for enumerating permutations with a prescribed number of occurrences of patterns</title>
    <dc:date>2017-04-21T10:47:53+00:00</dc:date>
    <link>https://arxiv.org/abs/1301.5080</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[In recent work, Zeilberger and the author used a functional equations approach for enumerating permutations with r occurrences of the pattern 12...k. In particular, the approach yielded a polynomial-time enumeration algorithm for any fixed nonnegative r. We extend that approach to patterns of the form 12...(k-2)(k)(k-1) by deriving analogous functional equations and using them to develop similar algorithms that enumerate permutations with r occurrences of the pattern. We also generalize those techniques to handle patterns of the form 23...k1 and derive analogous functional equations and enumeration algorithms. Finally, we show how the functional equations and algorithms can be modified to track inversions as well as handle multiple patterns simultaneously. This paper is accompanied by Maple packages that implement the algorithms described.]]></description>
<dc:subject>permutations strings combinatorics counting nudge-targets consider:looking-to-see to-write-about</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:de22f8b5f8a5/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:permutations"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:counting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1102.2480">
    <title>[1102.2480] Computational Approaches to Consecutive Pattern Avoidance in Permutations</title>
    <dc:date>2017-04-20T23:07:23+00:00</dc:date>
    <link>https://arxiv.org/abs/1102.2480</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[In recent years, there has been increasing interest in consecutive pattern avoidance in permutations. In this paper, we introduce two approaches to counting permutations that avoid a set of prescribed patterns consecutively. These algoritms have been implemented in the accompanying Maple package CAV, which can be downloaded from the author's website. As a byproduct of the first algorithm, we have a theorem giving a sufficient condition for when two pattern sets are strongly (consecutively) Wilf-Equivalent. For the implementation of the second algorithm, we define the cluster tail generating function and show that it always satisfies a certain functional equation. We also explain how the CAV package can be used to approximate asymptotic constants for single pattern avoidance.
]]></description>
<dc:subject>permutations combinatorics strings nudge-targets consider:looking-to-see consider:performance-measures consider:rediscovery</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:3bdee77d3e15/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:permutations"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:performance-measures"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:rediscovery"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1410.7669">
    <title>[1410.7669] Lost in Self-stabilization</title>
    <dc:date>2017-04-19T14:10:21+00:00</dc:date>
    <link>https://arxiv.org/abs/1410.7669</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[One of the questions addressed here is How can a twisted thread correct itself?. We consider a theoretical model where the studied mathematical object represents a 2D twisted discrete thread linking two points. This thread is made of a chain of agents which are lost, i.e. they have no knowledge of the global setting and no sense of direction. Thus, the modifications made by the agents are local and all the decisions use only minimal information about the local neighborhood. We introduce a random process such that the thread reorganizes itself efficiently to become a discrete line between these two points. The second question addressed here is to reorder a word by local flips in order to scatter the letters to avoid long successions of the same letter. These two questions are equivalent. The work presented here is at the crossroad of many different domains such as modeling cooling process in crystallography [2, 3, 8], stochastic cellular automata [6, 7], organizing a line of robots in distributed algorithms (the robot chain problem [5, 11]), and Christoffel words in language theory [1].
]]></description>
<dc:subject>distributed-processing collective-behavior rather-interesting strings rewriting-systems engineering-design emergent-design nudge-targets consider:looking-to-see to-write-about</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:d3623531d273/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:distributed-processing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:collective-behavior"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:engineering-design"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:emergent-design"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1108.3620">
    <title>[1108.3620] Uniformly balanced words with linear complexity and prescribed letter frequencies</title>
    <dc:date>2017-04-19T14:03:58+00:00</dc:date>
    <link>https://arxiv.org/abs/1108.3620</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We consider the following problem. Let us fix a finite alphabet A; for any given d-uple of letter frequencies, how to construct an infinite word u over the alphabet A satisfying the following conditions: u has linear complexity function, u is uniformly balanced, the letter frequencies in u are given by the given d-uple. This paper investigates a construction method for such words based on the use of mixed multidimensional continued fraction algorithms.]]></description>
<dc:subject>strings rewriting-systems dynamical-systems to-understand fractals to-write-about open-questions nudge-targets consider:performance-measures</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:7ecbd167bc80/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:dynamical-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:fractals"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:open-questions"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:performance-measures"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1108.5574">
    <title>[1108.5574] Substitutive Arnoux-Rauzy sequences have pure discrete spectrum</title>
    <dc:date>2017-04-19T14:02:29+00:00</dc:date>
    <link>https://arxiv.org/abs/1108.5574</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We prove that the symbolic dynamical system generated by a purely substitutive Arnoux-Rauzy sequence is measurably conjugate to a toral translation. The proof is based on an explicit construction of a fundamental domain with fractal boundary (a Rauzy fractal) for this toral translation.
]]></description>
<dc:subject>dynamical-systems discrete-mathematics rewriting-systems to-understand strings representation to-write-about</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:33f85a6f5cca/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:dynamical-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:discrete-mathematics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:representation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1212.5106">
    <title>[1212.5106] Balance properties of Arnoux-Rauzy words</title>
    <dc:date>2017-04-19T14:01:01+00:00</dc:date>
    <link>https://arxiv.org/abs/1212.5106</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[The paper deals with balances and imbalances in Arnoux-Rauzy words. We provide sufficient conditions for C-balancedness, but our results indicate that even a characterization of 2-balanced Arnoux-Rauzy words on a 3-letter alphabet is not immediate.]]></description>
<dc:subject>strings formal-languages to-understand dynamical-systems substitution-systems rewriting-systems nudge-targets consider:looking-to-see consider:feature-discovery</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:77fa865f4605/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:formal-languages"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:dynamical-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:substitution-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rewriting-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:feature-discovery"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1505.00707">
    <title>[1505.00707] Specular sets</title>
    <dc:date>2017-04-18T22:36:51+00:00</dc:date>
    <link>https://arxiv.org/abs/1505.00707</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We introduce the notion of specular sets which are subsets of groups called here specular and which form a natural generalization of free groups. These sets are an abstract generalization of the natural codings of linear involutions. We prove several results concerning the subgroups generated by return words and by maximal bifix codes in these sets.
]]></description>
<dc:subject>strings combinatorics set-theory define-your-terms to-write-about substitution-systems consider:recombination consider:genetic-algorithms</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:a34f30644758/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:set-theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:define-your-terms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:substitution-systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:recombination"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:genetic-algorithms"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1511.02393">
    <title>[1511.02393] On Stabbing Queries for Generalized Longest Repeat</title>
    <dc:date>2017-03-22T11:53:42+00:00</dc:date>
    <link>https://arxiv.org/abs/1511.02393</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[A longest repeat query on a string, motivated by its applications in many subfields including computational biology, asks for the longest repetitive substring(s) covering a particular string position (point query). In this paper, we extend the longest repeat query from point query to \emph{interval query}, allowing the search for longest repeat(s) covering any position interval, and thus significantly improve the usability of the solution. Our method for interval query takes a different approach using the insight from a recent work on \emph{shortest unique substrings} [1], as the prior work's approach for point query becomes infeasible in the setting of interval query. Using the critical insight from [1], we propose an indexing structure, which can be constructed in the optimal O(n) time and space for a string of size n, such that any future interval query can be answered in O(1) time. Further, our solution can find \emph{all} longest repeats covering any given interval using optimal O(occ) time, where occ is the number of longest repeats covering that given interval, whereas the prior O(n)-time and space work can find only one candidate for each point query. Experiments with real-world biological data show that our proposal is competitive with prior works, both time and space wise, while providing with the new functionality of interval queries as opposed to point queries provided by prior works.
]]></description>
<dc:subject>strings bioinformatics search-algorithms algorithms computational-complexity nudge-targets consider:looking-to-see</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:e4f8cfc93713/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:bioinformatics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:search-algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1702.07015">
    <title>[1702.07015] Unsupervised Learning of Morphological Forests</title>
    <dc:date>2017-03-05T13:01:43+00:00</dc:date>
    <link>https://arxiv.org/abs/1702.07015</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[This paper focuses on unsupervised modeling of morphological families, collectively comprising a forest over the language vocabulary. This formulation enables us to capture edgewise properties reflecting single-step morphological derivations, along with global distributional properties of the entire forest. These global properties constrain the size of the affix set and encourage formation of tight morphological families. The resulting objective is solved using Integer Linear Programming (ILP) paired with contrastive estimation. We train the model by alternating between optimizing the local log-linear model and the global ILP objective. We evaluate our system on three tasks: root detection, clustering of morphological families and segmentation. Our experiments demonstrate that our model yields consistent gains in all three tasks compared with the best published results.
]]></description>
<dc:subject>natural-language-processing word-nets machine-learning unsupervised-learning algorithms rather-interesting to-write-about strings nudge-targets consider:representation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:227a59cae642/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:natural-language-processing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:word-nets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:machine-learning"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:unsupervised-learning"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:representation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1608.03533">
    <title>[1608.03533] Sequence Graph Transform (SGT): A Feature Extraction Function for Sequence Data Mining (Extended Version)</title>
    <dc:date>2017-02-27T12:51:11+00:00</dc:date>
    <link>https://arxiv.org/abs/1608.03533</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[The ubiquitous presence of sequence data across fields such as the web, healthcare, bioinformatics, and text mining has made sequence mining a vital research area. However, sequence mining is particularly challenging because of difficulty in finding (dis)similarity/distance between sequences. This is because a distance measure between sequences is not obvious due to their unstructuredness---arbitrary strings of arbitrary length. Feature representations, such as n-grams, are often used but they either compromise on extracting both short- and long-term sequence patterns or have a high computation. We propose a new function, Sequence Graph Transform (SGT), that extracts the short- and long-term sequence features and embeds them in a finite-dimensional feature space. Importantly, SGT has low computation and can extract any amount of short- to long-term patterns without any increase in the computation, also proved theoretically in this paper. Due to this, SGT yields superior result with significantly higher accuracy and lower computation compared to the existing methods. We show it via several experimentation and SGT's real world application for clustering, classification, search and visualization as examples.
]]></description>
<dc:subject>edit-distance metrics machine-learning representation rather-interesting to-understand generative-models strings time-series to-write-about</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:9991719378b8/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:edit-distance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:metrics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:machine-learning"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:representation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-understand"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:generative-models"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:time-series"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1702.00318">
    <title>[1702.00318] A Hybrid Evolutionary Algorithm Based on Solution Merging for the Longest Arc-Preserving Common Subsequence Problem</title>
    <dc:date>2017-02-11T15:31:11+00:00</dc:date>
    <link>https://arxiv.org/abs/1702.00318</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[The longest arc-preserving common subsequence problem is an NP-hard combinatorial optimization problem from the field of computational biology. This problem finds applications, in particular, in the comparison of arc-annotated Ribonucleic acid (RNA) sequences. In this work we propose a simple, hybrid evolutionary algorithm to tackle this problem. The most important feature of this algorithm concerns a crossover operator based on solution merging. In solution merging, two or more solutions to the problem are merged, and an exact technique is used to find the best solution within this union. It is experimentally shown that the proposed algorithm outperforms a heuristic from the literature.
]]></description>
<dc:subject>combinatorics bioinformatics metaheuristics strings rather-interesting nudge-targets consider:looking-to-see consider:representation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:62e2723cdce4/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:bioinformatics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:metaheuristics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:representation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1701.00928">
    <title>[1701.00928] Counting Lyndon factors</title>
    <dc:date>2017-01-26T13:56:27+00:00</dc:date>
    <link>https://arxiv.org/abs/1701.00928</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[In this paper, we determine the maximum number of distinct Lyndon factors that a word of length n can contain. We also derive formulas for the expected total number of Lyndon factors in a word of length n on an alphabet of size σ, as well as the expected number of distinct Lyndon factors in such a word. The minimum number of distinct Lyndon factors in a word of length n is 1 and the minimum total number is n, with both bounds being achieved by xn where x is a letter. A more interesting question to ask is what is the minimum number of distinct Lyndon factors in a Lyndon word of length n? In this direction, it is known (Saari, 2014) that an optimal lower bound for the number of distinct Lyndon factors in a Lyndon word of length n is ⌈logϕ(n)+1⌉, where ϕ denotes the golden ratio (1+5‾√)/2. Moreover, this lower bound is attained by the so-called finite "Fibonacci Lyndon words", which are precisely the Lyndon factors of the well-known "infinite Fibonacci word" -- a special example of a "infinite Sturmian word". Saari (2014) conjectured that if w is Lyndon word of length n, n≠6, containing the least number of distinct Lyndon factors over all Lyndon words of the same length, then w is a Christoffel word (i.e., a Lyndon factor of an infinite Sturmian word). We give a counterexample to this conjecture. Furthermore, we generalise Saari's result on the number of distinct Lyndon factors of a Fibonacci Lyndon word by determining the number of distinct Lyndon factors of a given Christoffel word. We end with two open problems.]]></description>
<dc:subject>combinatorics formal-languages strings proof algorithms rather-interesting nudge-targets</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:d2ab56782c3c/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:combinatorics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:formal-languages"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:proof"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:algorithms"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:rather-interesting"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1611.05537">
    <title>[1611.05537] Duplication Distance to the Root for Binary Sequences</title>
    <dc:date>2017-01-10T14:39:44+00:00</dc:date>
    <link>https://arxiv.org/abs/1611.05537</link>
    <dc:creator>Vaguery</dc:creator><description><![CDATA[We study the tandem duplication distance between binary sequences and their roots. In other words, the quantity of interest is the number of tandem duplication operations of the form $\seq x = \seq a \seq b \seq c \to \seq y = \seq a \seq b \seq b \seq c$, where $\seq x$ and $\seq y$ are sequences and $\seq a$, $\seq b$, and $\seq c$ are their substrings, needed to generate a binary sequence of length n starting from a square-free sequence from the set {0,1,01,10,010,101}. This problem is a restricted case of finding the duplication/deduplication distance between two sequences, defined as the minimum number of duplication and deduplication operations required to transform one sequence to the other. We consider both exact and approximate tandem duplications. For exact duplication, denoting the maximum distance to the root of a sequence of length n by f(n), we prove that f(n)=Θ(n). For the case of approximate duplication, where a β-fraction of symbols may be duplicated incorrectly, we show that the maximum distance has a sharp transition from linear in n to logarithmic at β=1/2. We also study the duplication distance to the root for sequences with a given root and for special classes of sequences, namely, the de Bruijn sequences, the Thue-Morse sequence, and the Fibbonaci words. The problem is motivated by genomic tandem duplication mutations and the smallest number of tandem duplication events required to generate a given biological sequence.]]></description>
<dc:subject>strings metrics edit-distance computational-complexity dynamic-programming nudge-targets consider:looking-to-see consider:robustness to-write-about</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:Vaguery/b:5f4784bd94c1/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:strings"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:metrics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:edit-distance"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:computational-complexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:dynamic-programming"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:nudge-targets"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:looking-to-see"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:consider:robustness"/>
	<rdf:li rdf:resource="https://pinboard.in/u:Vaguery/t:to-write-about"/>
</rdf:Bag></taxo:topics>
</item>
</rdf:RDF>