Pinboard (jm)
https://pinboard.in/u:jm/public/
recent bookmarks from jm

The Death of Microservice Madness in 2018
2018-01-15T11:23:58+00:00
http://www.dwmkerr.com/the-death-of-microservice-madness-in-2018/
jm
architecture devops microservices services soa coding monoliths state systems
https://pinboard.in/u:jm/b:0f831a39dc8a/

Post-apocalyptic life in American health care
2018-01-09T11:38:07+00:00
https://meaningness.com/metablog/post-apocalyptic-health-care
jm
bureaucracy healthcare health systems us-politics insurance medicine dysfunctional fail fiasco via:craig
https://pinboard.in/u:jm/b:ed55c26b957b/

Steven Bellovin on Bitcoin
2018-01-01T23:45:31+00:00
https://www.cs.columbia.edu/~smb/blog/2017-12/2017-12-30.html
jm
When you engineer a system for deployment you build it to meet certain real-world goals. You may find that there are tradeoffs, and that you can't achieve all of your goals, but that's normal; as I've remarked, "engineering is the art of picking the right trade-off in an overconstrained environment". For any computer-based financial system, one crucial parameter is the transaction rate. For a system like Bitcoin, another goal had to be avoiding concentrations of power. And of course, there's transaction privacy.
There are less obvious factors, too. These days, "mining" for Bitcoins requires a lot of computations, which translates directly into electrical power consumption. One estimate is that the Bitcoin network uses up more electricity than many countries. There's also the question of governance: who makes decisions about how the network should operate? It's not a question that naturally occurs to most scientists and engineers, but production systems need some path for change.
In all of these, Bitcoin has failed. The failures weren't inevitable; there are solutions to these problems in the academic literature. But Bitcoin was deployed by enthusiasts who in essence let experimental code escape from a lab to the world, without thinking about the engineering issues, and now they're stuck with it. Perhaps another, better cryptocurrency can displace it, but it's always much harder to displace something that exists than to fill a vacuum.
steven-bellovin bitcoin tech software systems engineering deployment cryptocurrency cypherpunks
https://pinboard.in/u:jm/b:4f78659ed032/

'STELLA Report from the SNAFUcatchers Workshop on Coping With Complexity', March 14-16 2017
2017-11-13T15:06:10+00:00
https://drive.google.com/file/d/0B7kFkt5WxLeDTml5cTFsWXFCb1U/view
jm
complexity postmortems dark-debt technical-debt resilience reliability systems snafu reports toread stella john-allspaw
https://pinboard.in/u:jm/b:85fbb4cb2fd9/

Locking, Little's Law, and the USL
2017-09-20T14:36:55+00:00
https://groups.google.com/forum/#!msg/mechanical-sympathy/gchG_oQ_kQM/59BDMOdUAwAJ
jm
Little's law can be used to describe a system in steady state from a queuing perspective, i.e. arrival and leaving rates are balanced. In this case it is a crude way of modelling a system with a contention percentage of 100% under Amdahl's law, in that throughput is one over latency.
However this is an inaccurate way to model a system with locks. Amdahl's law does not account for coherence costs. For example, if you wrote a microbenchmark with a single thread to measure the lock cost then it is much lower than in a multi-threaded environment where cache coherence, other OS costs such as scheduling, and lock implementations need to be considered.
Universal Scalability Law (USL) accounts for both the contention and the coherence costs.
http://www.perfdynamics.com/Manifesto/USLscalability.html
When modelling locks it is necessary to consider how contention and coherence costs vary given how they can be implemented. Consider in Java how we have biased locking, thin locks, fat locks, inflation, and revoking biases which can cause safe points that bring all threads in the JVM to a stop with a significant coherence component.
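As a rough illustration of the distinction drawn above, the sketch below evaluates the USL formula C(N) = N / (1 + alpha * (N - 1) + beta * N * (N - 1)), where alpha is the contention coefficient and beta is the coherence coefficient. Setting beta to 0 reduces it to Amdahl's law. The function names and any coefficient values you plug in are illustrative assumptions, not figures from the post.

```python
import math


def usl_speedup(n: int, alpha: float, beta: float) -> float:
    """Relative capacity C(N) under the Universal Scalability Law.

    alpha: contention (serialisation) coefficient, 0 <= alpha <= 1
    beta:  coherence (crosstalk) coefficient, beta >= 0
    """
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def amdahl_speedup(n: int, alpha: float) -> float:
    """Amdahl's law is the USL with no coherence cost (beta = 0)."""
    return usl_speedup(n, alpha, 0.0)


def usl_peak_concurrency(alpha: float, beta: float) -> float:
    """Concurrency level at which USL capacity peaks (requires beta > 0)."""
    return math.sqrt((1 - alpha) / beta)
```

With alpha = 1 and beta = 0 the speedup stays at 1 for every N, which is the fully serialised case the post describes. With any nonzero beta, capacity peaks near sqrt((1 - alpha) / beta) and then falls as more threads are added, which Amdahl's law alone cannot express.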
usl scaling scalability performance locking locks java jvm amdahls-law littles-law system-dynamics modelling systems caching threads schedulers contention
https://pinboard.in/u:jm/b:d64fb1279a0b/

Fireside Chat with Vint Cerf & Marc Andreessen (Google Cloud Next '17) - YouTube
2017-05-15T10:09:07+00:00
https://youtu.be/y9bJ8LslSZ4?t=27m00s
jm
vint-cerf gcp regulation oversight politics law reliability systems
https://pinboard.in/u:jm/b:3abb7da356e4/

Best practices with Airflow
2016-10-19T10:22:04+00:00
https://www.youtube.com/watch?v=dgaoqOZlvEA&feature=youtu.be
jm
etl airflow batch architecture systems ops
https://pinboard.in/u:jm/b:89b7e8acd127/

QA Instability Implies Production Instability
2016-07-15T13:36:28+00:00
http://www.michaelnygard.com/blog/2016/07/qa-instability-implies-production-instability/
jm
Invariably, when I see a lot of developer effort in production support I also find an unreliable QA environment. It is both unreliable in that it is frequently not available for testing, and unreliable in the sense that the system’s behavior in QA is not a good predictor of its behavior in production.
qa testing architecture patterns systems production
https://pinboard.in/u:jm/b:0f86aee40e69/

“Racist algorithms” and learned helplessness
2016-04-07T15:39:02+00:00
https://algorithmicfairness.wordpress.com/2016/04/06/racist-algorithms-and-learned-helplessness/
jm
Whenever I’ve had to talk about bias in algorithms, I’ve tried to be careful to emphasize that it’s not that we shouldn’t use algorithms in search, recommendation and decision making. It’s that we often just don’t know how they’re making their decisions to present answers, make recommendations or arrive at conclusions, and it’s this lack of transparency that’s worrisome. Remember, algorithms aren’t just code.
What’s also worrisome is the amplifier effect. Even if “all an algorithm is doing” is reflecting and transmitting biases inherent in society, it’s also amplifying and perpetuating them on a much larger scale than your friendly neighborhood racist. And that’s the bigger issue. [...] even if the algorithm isn’t creating bias, it’s creating a feedback loop that has powerful perception effects.
feedback bias racism algorithms software systems society
https://pinboard.in/u:jm/b:0ea691a533c7/

Taming Complexity with Reversibility
2015-07-28T19:28:48+00:00
https://www.facebook.com/notes/kent-beck/taming-complexity-with-reversibility/1000330413333156
jm
Development servers. Each engineer has their own copy of the entire site. Engineers can make a change, see the consequences, and reverse the change in seconds without affecting anyone else.
Code review. Engineers can propose a change, get feedback, and improve or abandon it in minutes or hours, all before affecting any people using Facebook.
Internal usage. Engineers can make a change, get feedback from thousands of employees using the change, and roll it back in an hour.
Staged rollout. We can begin deploying a change to a billion people and, if the metrics tank, take it back before problems affect most people using Facebook.
Dynamic configuration. If an engineer has planned for it in the code, we can turn off an offending feature in production in seconds. Alternatively, we can dial features up and down in tiny increments (e.g. only 0.1% of people see the feature) to discover and avoid non-linear effects.
Correlation. Our correlation tools let us easily see the unexpected consequences of features so we know to turn them off even when those consequences aren't obvious.
IRC. We can roll out features potentially affecting our ability to communicate internally via Facebook because we have uncorrelated communication channels like IRC and phones.
Right hand side units. We can add a little bit of functionality to the website and turn it on and off in seconds, all without interfering with people's primary interaction with NewsFeed.
Shadow production. We can experiment with new services under real load, from a tiny trickle to the whole flood, without affecting production.
Frequent pushes. Reversing some changes requires a code change. On the website we are never more than eight hours from the next scheduled code push (minutes if a fix is urgent and you are willing to compensate Release Engineering). The time frame for code reversibility on the mobile applications is longer, but the downward trend is clear from six weeks to four to (currently) two.
Data-informed decisions. (Thanks to Dave Cleal) Data-informed decisions are inherently reversible (with the exceptions noted below). "We expect this feature to affect this metric. If it doesn't, it's gone."
Advance countries. We can roll a feature out to a whole country, generate accurate feedback, and roll it back without affecting most of the people using Facebook.
Soft launches. When we roll out a feature or application with a minimum of fanfare it can be pulled back with a minimum of public attention.
Double write/bulk migrate/double read. Even as fundamental a decision as storage format is reversible if we follow this format: start writing all new data to the new data store, migrate all the old data, then start reading from the new data store in parallel with the old.
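The last item in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Facebook's implementation: plain dicts stand in for the old and new data stores, the class and method names are hypothetical, and the old store is assumed to stay authoritative for reads until a separate cut-over decision, which is what keeps the change reversible.

```python
class MigratingStore:
    """Hypothetical wrapper illustrating double write / bulk migrate / double read."""

    def __init__(self, old: dict, new: dict):
        self.old = old          # existing store, still the source of truth
        self.new = new          # candidate store being migrated to
        self.mismatches = []    # keys whose values differed on a double read

    def put(self, key, value):
        # Double write: every new write goes to both stores.
        self.old[key] = value
        self.new[key] = value

    def bulk_migrate(self):
        # Copy historical data, without clobbering values that were
        # already double-written to the new store.
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def get(self, key):
        # Double read: read both stores, record any disagreement,
        # and keep serving the old store's answer.
        old_value = self.old.get(key)
        new_value = self.new.get(key)
        if old_value != new_value:
            self.mismatches.append(key)
        return old_value
```

If `mismatches` stays empty under real traffic, reads can be switched over to the new store; if not, the new store can be dropped with no user-visible effect.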
We do a bunch of these in work, and the rest are on the to-do list. +1 to these!
software deployment complexity systems facebook reversibility dark-releases releases ops cd migration
https://pinboard.in/u:jm/b:6a3089426e1e/

ferd.ca -> Lessons Learned while Working on Large-Scale Server Software
2015-04-22T15:26:07+00:00
http://ferd.ca/lessons-learned-while-working-on-large-scale-server-software.html
jm
distributed scalability systems coding server-side erlang devops networking reliability
https://pinboard.in/u:jm/b:4b4817db08ed/

'Machine Learning: The High-Interest Credit Card of Technical Debt' [PDF]
2014-12-17T15:14:22+00:00
https://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/43146.pdf
jm
machine-learning ml systems ops tech-debt maintainance google papers hidden-costs development
https://pinboard.in/u:jm/b:2fcc66b0e422/

10 Things We Forgot to Monitor
2014-01-29T15:00:43+00:00
http://word.bitly.com/post/74839060954/ten-things-to-monitor
jm
nagios metrics ops monitoring systems ntp bitly
https://pinboard.in/u:jm/b:8d0b22e66fef/

Stability Patterns and Antipatterns [slides]
2013-05-20T16:34:55+00:00
http://cdn.oreillystatic.com/en/assets/1/event/79/Stability%20Patterns%20Presentation.pdf
jm
michael-nygard design-patterns architecture systems networking reliability soa slides pdf
https://pinboard.in/u:jm/b:711ab6d6b63d/

Basho | Alert Logic Relies on Riak to Support Rapid Growth
2013-01-26T13:17:04+00:00
http://basho.com/blog/technical/2013/01/24/Alert-Logic-Riak/
jm
iops riak basho ops systems alert-logic storage nosql databases
https://pinboard.in/u:jm/b:7dbe818efe05/

Big Data Lambda Architecture
2013-01-25T16:25:38+00:00
http://www.databasetube.com/database/big-data-lambda-architecture/
jm
storm systems architecture lambda-architecture design Hadoop
https://pinboard.in/u:jm/b:b609421d73cf/

Notes on Distributed Systems for Young Bloods
2013-01-14T17:21:34+00:00
http://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/
jm
systems distributed distcomp cap metrics coding guidelines architecture backpressure design twitter
https://pinboard.in/u:jm/b:1f6c84e7ef47/

OmniTI's Experiences Adopting Chef
2013-01-14T14:01:53+00:00
http://omniti.com/seeds/seeds-our-experiences-with-chef-adoption-challenges
jm
chef deployment ops omniti systems vagrant automation
https://pinboard.in/u:jm/b:07d0ae11b7e7/

an ex-RBSG engineer on the NatWest/RBS/UlsterBank IT fiasco
2012-06-26T12:26:31+00:00
http://nielsenhayden.com/makinglight/archives/014081.html
jm
systems ops support maintainance legacy ca-7 banking rbs natwest ulster-bank fail outsourcing
https://pinboard.in/u:jm/b:3fef2ae57d4d/

The MongoDB NoSQL Database Blog - MongoDB live at Craigslist
2011-05-18T21:58:56+00:00
http://blog.mongodb.org/post/5545198613/mongodb-live-at-craigslist
jm
'MongoDB is now live at Craigslist, where it is being used to archive [10TB] of [old posts]'. iiiinteresting
mongodb nosql craigslist systems
https://pinboard.in/u:jm/b:7277bb1f19b9/