Pinboard (jm)
https://pinboard.in/u:jm/public/
recent bookmarks from jmLEB1282021-09-29T13:33:31+00:00
https://en.wikipedia.org/wiki/LEB128
jmencoding compression integers storage codes leb128https://pinboard.in/https://pinboard.in/u:jm/b:7ed911530c2d/Nearly Divisionless Random Integer Generation On Various Systems2019-06-07T09:45:45+00:00
https://lemire.me/blog/2019/06/06/nearly-divisionless-random-integer-generation-on-various-systems/
jmNaively, you could take the random integer and compute the remainder of the division by the size of the interval. It works because the remainder of the division by D is always smaller than D. Yet it introduces a statistical bias. You can do better without being slower. The conventional techniques supported in Java and Go require at least one or two integer division per integer generated. Unfortunately, division instructions are relatively expensive.
This algorithm is a practically-divisionless approach which performs a good bit faster on all 3 tested architectures.]]>performance hacks algorithms division modulo random integers speedhttps://pinboard.in/https://pinboard.in/u:jm/b:ae18dad07819/lemire/JavaFastPFOR: A simple integer compression library in Java2018-04-09T15:39:37+00:00
https://github.com/lemire/JavaFastPFOR
jm
a library to compress and uncompress arrays of integers very fast. The assumption is that most (but not all) values in your array use much less than 32 bits, or that the gaps between the integers use much less than 32 bits. These sort of arrays often come up when using differential coding in databases and information retrieval (e.g., in inverted indexes or column stores).
Please note that random integers are not compressible, by this library or by any other means. If you ever had the means of systematically compressing random integers, you could compress any data source to nothing, by recursive application of your technique.
This library can decompress integers at a rate of over 1.2 billions per second (4.5 GB/s). It is significantly faster than generic codecs (such as Snappy, LZ4 and so on) when compressing arrays of integers.
The library is used in LinkedIn Pinot, a realtime distributed OLAP datastore. Part of this library has been integrated in Parquet (http://parquet.io/). A modified version of the library is included in the search engine Terrier (http://terrier.org/). This libary is used by ClueWeb Tools (https://github.com/lintool/clueweb). It is also used by Apache NiFi.
]]>compression java pfor encoding integers algorithms storagehttps://pinboard.in/https://pinboard.in/u:jm/b:79294351fa68/Elias gamma coding2016-04-04T10:30:50+00:00
https://en.wikipedia.org/wiki/Elias_gamma_coding
jm'used most commonly when coding integers whose upper-bound cannot be determined beforehand.'
]]>data-structures algorithms elias-gamma-coding encoding coding numbers integershttps://pinboard.in/https://pinboard.in/u:jm/b:1641ab023aad/Frame of Reference and Roaring Bitmaps2015-09-23T13:50:16+00:00
https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
jmlucene elasticsearch performance optimization roaring-bitmaps bitmaps frame-of-reference integers algorithmshttps://pinboard.in/https://pinboard.in/u:jm/b:5dd4238d15a7/Why Gandhi Is Such An Asshole In Civilization2014-11-03T17:13:17+00:00
http://kotaku.com/why-gandhi-is-such-an-asshole-in-civilization-1653818245
jmWhen a player adopted democracy in Civilization, their aggression would be automatically reduced by 2. Code being code, if Gandhi went democratic his aggression wouldn't go to -1, it looped back around to the ludicrously high figure of 255, making him as aggressive as a civilization could possibly be.
]]>civ civilization funny videogames bugs gandhi nuclear-war integers overflowhttps://pinboard.in/https://pinboard.in/u:jm/b:c56254215c8f/Lectures in Advanced Data Structures (6.851)2013-04-29T10:32:24+00:00
http://courses.csail.mit.edu/6.851/spring12/lectures/
jmData structures play a central role in modern computer science. You interact with data structures even more often than with algorithms (think Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This course covers major results and current directions of research in data structures:
TIME TRAVEL We can remember the past efficiently (a technique called persistence), but in general it's difficult to change the past and see the outcomes on the present (retroactivity). So alas, Back To The Future isn't really possible.
GEOMETRY When data has more than one dimension (e.g. maps, database tables).
DYNAMIC OPTIMALITY Is there one binary search tree that's as good as all others? We still don't know, but we're close.
MEMORY HIERARCHY Real computers have multiple levels of caches. We can optimize the number of cache misses, often without even knowing the size of the cache.
HASHING Hashing is the most used data structure in computer science. And it's still an active area of research.
INTEGERS Logarithmic time is too easy. By careful analysis of the information you're dealing with, you can often reduce the operation times substantially, sometimes even to constant. We will also cover lower bounds that illustrate when this is not possible.
DYNAMIC GRAPHS A network link went down, or you just added or deleted a friend in a social network. We can still maintain essential information about the connectivity as it changes.
STRINGS Searching for phrases in giant text (think Google or DNA).
SUCCINCT Most “linear size” data structures you know are much larger than they need to be, often by an order of magnitude. Some data structures require almost no space beyond the raw data but are still fast (think heaps, but much cooler).
(via Tim Freeman)]]>data-structures lectures mit video data algorithms coding csail strings integers hashing sorting bst memoryhttps://pinboard.in/https://pinboard.in/u:jm/b:5c72d87f4ea4/Brooklyn Integers | Integers as a service2012-07-28T12:47:59+00:00
http://brooklynintegers.com/
jmvia:allspaw humour funny integers artisan satire hand-craftedhttps://pinboard.in/https://pinboard.in/u:jm/b:f5a7a3f98bd2/