Pinboard (jm)
https://pinboard.in/u:jm/public/
Recent bookmarks from jm

Protomaps (2023-10-24)
https://protomaps.com/
Tags: cartography javascript mapping maps web http range-requests map-tiles cdn formats

simdjson/simdjson-java (2023-10-09)
https://github.com/simdjson/simdjson-java
Tags: simd java json parsing formats performance libraries

ESB HDF Reader (2023-10-04)
https://github.com/dresdner353/energyutils/blob/main/ESB_HDF_READER.md
Tags: formats json csv hdf esb power feed-in-tarriff ireland open-data data

Trino on Ice IV: Deep Dive Into Iceberg Internals (2021-06-09)
https://blog.starburst.io/trino-on-ice-iv-deep-dive-into-iceberg-internals
Tags: trino iceberg data big-data data-lakes formats s3 avro orc

AVIF has landed (2020-09-08)
https://jakearchibald.com/2020/avif-has-landed/
Tags: images web avif webp jpeg compression formats

Apache Arrow (2020-07-30)
https://www.dremio.com/apache-arrow-explained/
Arrow combines the benefits of columnar data structures with in-memory computing. It provides the performance benefits of these modern techniques while also providing the flexibility of complex data and dynamic schemas. And it does all of this in an open source and standardized way.
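As a rough sketch of the row-versus-columnar distinction the excerpt describes (plain Python lists and invented sample data, not the actual Arrow API or memory layout):

```python
# Row-oriented: one record per dict; scanning a single field
# still has to visit every record object.
rows = [
    {"name": "a", "value": 1.0},
    {"name": "b", "value": 2.5},
    {"name": "c", "value": 4.0},
]

# Columnar, as Arrow lays data out: one contiguous array per field,
# so a scan or aggregation touches only the column it needs.
columns = {
    "name": [r["name"] for r in rows],
    "value": [r["value"] for r in rows],
}

print(sum(columns["value"]))  # 7.5 -- a single pass over one array
```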
(via Tony Finch)
Tags: via:fanf arrow data formats compression columnar-storage storage libraries

FlexBuffers | Hacker News (2020-06-22)
https://news.ycombinator.com/item?id=23588558
Tags: flatbuffers flexbuffers json encoding data formats file-formats avro protobuf zerocopy sbe schemas

ndjson (2019-04-25)
https://github.com/ndjson/ndjson-spec
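The format is simple enough to show in a few lines: one complete JSON value per line, so a consumer can process a stream record by record without parsing the whole input. A minimal Python sketch (the sample records are my own):

```python
import io
import json

# ndjson: each line is a self-contained JSON value, which makes the
# format trivially streamable through Unix pipes.
stream = io.StringIO('{"event": "start"}\n{"event": "stop", "code": 0}\n')

records = [json.loads(line) for line in stream if line.strip()]
print(records[1]["code"])  # 0
```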
Tags: json streaming unix pipes newlines formats interchange data standards

Why JSON isn't a good configuration language (2018-07-17)
https://www.lucidchart.com/techblog/2018/07/16/why-json-isnt-a-good-configuration-language/
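Part of the article's complaint is easy to demonstrate: strict JSON allows neither comments nor trailing commas, two conveniences most configuration formats offer. A quick Python sketch (the sample strings are my own):

```python
import json

# Two common configuration-file conveniences that strict JSON rejects:
samples = {
    "trailing comma": '{"debug": true,}',
    "comment":        '{"debug": true} // enable verbose logging',
}

rejected = []
for label, text in samples.items():
    try:
        json.loads(text)
    except json.JSONDecodeError:
        rejected.append(label)

print(rejected)  # both samples fail to parse
```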
Tags: json configuration languages coding formats

S3 Inventory Adds Apache ORC output format and Amazon Athena Integration (2017-11-20)
https://aws.amazon.com/about-aws/whats-new/2017/11/s3-inventory-adds-apache-orc-output-format-and-amazon-athena-integration/
Tags: orc formats data interchange s3 athena output

seriot.ch - Parsing JSON is a Minefield 💣 (2016-10-27)
http://seriot.ch/parsing_json.html
Crockford chose not to version [the] JSON definition: 'Probably the boldest design decision I made was to not put a version number on JSON so there is no mechanism for revising it. We are stuck with JSON: whatever it is in its current form, that's it.' Yet JSON is defined in at least six different documents.
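Even a single mainstream parser shows the kind of divergence the article catalogs. For instance, Python's json module accepts some documents other parsers reject, and resolves duplicate keys silently:

```python
import json

# Python's parser accepts NaN and Infinity, which RFC 8259 does not
# define -- other parsers reject the same document outright.
specials = json.loads("[NaN, Infinity]")

# Duplicate keys: the spec only says object names "SHOULD" be unique.
# This parser silently keeps the last value; others keep the first, or error.
dup = json.loads('{"a": 1, "a": 2}')
print(dup)  # {'a': 2}
```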
"Boldest". ffs. :facepalm:]]>bold courage json parsing coding data formats interchange fail standards confusionhttps://pinboard.in/https://pinboard.in/u:jm/b:bf8536a109d9/Osso2016-09-20T10:12:36+00:00
http://www.osso-project.org/manifesto/
Tags: osso events schema data interchange formats cep event-processing architecture

UncertML (2015-01-07)
http://www.uncertml.org/
A conceptual model, with accompanying XML schema, that may be used to quantify and exchange complex uncertainties in data. The interoperable model can be used to describe uncertainty in a variety of ways, including:
- Samples
- Statistics, including mean, variance, standard deviation and quantile
- Probability distributions, including marginal and joint distributions and mixture models
Tags: via:conor uncertainty statistics xml formats

Standard Markdown (2014-09-04)
http://standardmarkdown.com/
John Gruber's canonical description of Markdown's syntax does not specify the syntax unambiguously. In the absence of a spec, early implementers consulted the original Markdown.pl code to resolve these ambiguities. But Markdown.pl was quite buggy, and gave manifestly bad results in many cases, so it was not a satisfactory replacement for a spec.
Because there is no unambiguous spec, implementations have diverged considerably. As a result, users are often surprised to find that a document that renders one way on one system (say, a GitHub wiki) renders differently on another (say, converting to docbook using Pandoc). To make matters worse, because nothing in Markdown counts as a "syntax error," the divergence often isn't discovered right away.
There's no standard test suite for Markdown; the unofficial MDTest is the closest thing we have. The only way to resolve Markdown ambiguities and inconsistencies is Babelmark, which compares the output of 20+ implementations of Markdown against each other to see if a consensus emerges.
We propose a standard, unambiguous syntax specification for Markdown, along with a suite of comprehensive tests to validate Markdown implementations against this specification. We believe this is necessary, even essential, for the future of Markdown.
Tags: writing markdown specs standards text formats html

FlatBuffers: Main Page (2014-06-17)
http://google.github.io/flatbuffers/
Access to serialized data without parsing/unpacking - What sets FlatBuffers apart is that it represents hierarchical data in a flat binary buffer in such a way that it can still be accessed directly without parsing/unpacking, while also still supporting data structure evolution (forwards/backwards compatibility).
Memory efficiency and speed - The only memory needed to access your data is that of the buffer. It requires 0 additional allocations. FlatBuffers is also very suitable for use with mmap (or streaming), requiring only part of the buffer to be in memory. Access is close to the speed of raw struct access with only one extra indirection (a kind of vtable) to allow for format evolution and optional fields. It is aimed at projects where spending time and space (many memory allocations) to be able to access or construct serialized data is undesirable, such as in games or any other performance sensitive applications. See the benchmarks for details.
Flexible - Optional fields mean not only that you get great forwards and backwards compatibility (increasingly important for long-lived games: you don't have to update all data with each new version!), but also that you have a lot of choice in what data you write and what data you don't, and in how you design data structures.
Tiny code footprint - Small amounts of generated code, and just a single small header as the minimum dependency, which is very easy to integrate. Again, see the benchmark section for details.
Strongly typed - Errors happen at compile time rather than manually having to write repetitive and error prone run-time checks. Useful code can be generated for you.
Convenient to use - Generated C++ code allows for terse access & construction code. Then there's optional functionality for parsing schemas and JSON-like text representations at runtime efficiently if needed (faster and more memory efficient than other JSON parsers).
Looks nice, but it misses the language coverage of protobuf. Definitely more practical than capnproto.
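The "access without parsing/unpacking" idea can be sketched in miniature: fields live at known offsets in a flat byte buffer, so reading one field is a single offset calculation. This is a toy illustration using Python's struct module, not the real FlatBuffers wire format or API:

```python
import struct

# A flat buffer holding three fields packed back-to-back
# (little-endian, no padding): an int32 id, a float32 score,
# and an int32 count at offsets 0, 4, and 8.
buf = struct.pack("<ifi", 42, 3.5, 7)

# Read only the third field, directly from the buffer --
# no parse step, no intermediate object tree.
(count,) = struct.unpack_from("<i", buf, offset=8)
print(count)  # 7
```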
Tags: c++ google java serialization json formats protobuf capnproto storage flatbuffers

Simple Binary Encoding (2014-05-06)
http://mechanical-sympathy.blogspot.co.uk/2014/05/simple-binary-encoding.html
An OSI layer 6 presentation for encoding/decoding messages in binary format to support low-latency applications. [...] SBE follows a number of design principles to achieve this goal. Adhering to these design principles sometimes means that features available in other codecs will not be offered. For example, many codecs allow strings to be encoded at any field position in a message; SBE only allows variable-length fields, such as strings, as fields grouped at the end of a message.
The SBE reference implementation consists of a compiler that takes a message schema as input and then generates language specific stubs. The stubs are used to directly encode and decode messages from buffers. The SBE tool can also generate a binary representation of the schema that can be used for the on-the-fly decoding of messages in a dynamic environment, such as for a log viewer or network sniffer.
The design principles drive the implementation of a codec that ensures messages are streamed through memory without backtracking, copying, or unnecessary allocation. Memory access patterns should not be underestimated in the design of a high-performance application. Low-latency systems in any language especially need to consider all allocation to avoid the resulting issues in reclamation. This applies for both managed runtime and native languages. SBE is totally allocation free in all three language implementations.
The end result of applying these design principles is a codec that has ~25X greater throughput than Google Protocol Buffers (GPB) with very low and predictable latency. This has been observed in micro-benchmarks and real-world application use. A typical market data message can be encoded, or decoded, in ~25ns compared to ~1000ns for the same message with GPB on the same hardware. XML and FIX tag value messages are orders of magnitude slower again.
The sweet spot for SBE is as a codec for structured data that is mostly fixed-size fields which are numbers, bitsets, enums, and arrays. While it does work for strings and blobs, many may find some of the restrictions a usability issue. These users would be better off with another codec more suited to string encoding.
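The fixed-fields-first, variable-length-fields-last layout described above can be sketched as a toy codec. This is a hypothetical layout for illustration only, not SBE's actual wire format or generated stubs:

```python
import struct

# Fixed-size numeric fields first, at known offsets, so a decoder can
# stream through the buffer without backtracking; the one variable-length
# field is grouped at the end behind a uint16 length prefix.
def encode(price_cents: int, qty: int, symbol: bytes) -> bytes:
    fixed = struct.pack("<qi", price_cents, qty)   # offsets 0 and 8
    return fixed + struct.pack("<H", len(symbol)) + symbol

def decode(buf: bytes):
    price_cents, qty = struct.unpack_from("<qi", buf, 0)
    (n,) = struct.unpack_from("<H", buf, 12)
    return price_cents, qty, buf[14:14 + n]

print(decode(encode(10150, 200, b"VOD.L")))  # (10150, 200, b'VOD.L')
```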
Tags: sbe encoding protobuf protocol-buffers json messages messaging binary formats low-latency martin-thompson xml

Pickles & Spores: Improving Support for Distributed Programming in Scala (2014-04-29)
https://speakerdeck.com/heathermiller/spores-distributable-functions-in-scala
Tags: pickling scala presentations spores closures fp immutability coding distributed distcomp serialization formats network

Cap'n Proto (2013-04-03)
http://kentonv.github.com/capnproto/
Cap'n Proto is an insanely fast data interchange format and capability-based RPC system. Think JSON, except binary. Or think Protocol Buffers, except faster. In fact, in benchmarks, Cap'n Proto is INFINITY TIMES faster than Protocol Buffers.
Basically, marshalling like writing an aligned C struct to the wire, QNX messaging protocol-style. Wasteful on space, but it responds to this by suggesting compression (which is a fair point, tbh). C++-only for now. I'm not seeing the same kind of support for optional data that protobuf has, though. Overall I'm worried there are some useful features being omitted here...
Tags: serialization formats protobufs capn-proto protocols coding c++ rpc qnx messaging compression compatibility interoperability i14y