Rock-solid software. Build modern distributed systems

Some software just needs to be available and appear to work mostly correctly. Nobody cares if the number of hearts under a cute capybara photo is off by one, not even the capybara. On the other hand, there is software that is absolutely required to be flawless. Who wants to find out the software controlling their car brakes is (pun intended) broken? These programs require a very special approach to the whole lifecycle, including formal verification, making it both time- and resource-consuming to make even the tiniest adjustment.

There is, as always, the middle ground. This space is occupied by all kinds of line-of-business software that automates critical activities performed by commercial entities such as banks, insurance companies, and retail stores. This is where even the most experienced teams struggle with overwhelming, conflicting forces that try to tear apart their software delivery processes. How can we build applications that are distributed, reliable, easy to change and always consistent? How can we do it quickly and at low cost? Finally, how can we do it all while the application is serving the users?

The very short answer is using asynchronous durable messaging. The long answer is this workshop. During the course of the two days we are going to look at a number of topics revolving around the central pillar of messaging.

First, we'll explore the frontend of the system where the user interaction happens. We'll discuss the patterns for ensuring that the user's intent is captured in the most reliable way. In this module we'll focus on the transition between HTTP-based APIs and message-based asynchronous flows.

Next, we are going to look at the correctness requirements for an asynchronous message-driven distributed system. What guarantees are necessary for it to be able to execute the business process reliably? We will focus on the following aspects:

- how not to lose messages?
- how to prevent processing a single message multiple times?
- how to prevent propagating invalid state?

We will explore various deduplication techniques, including embracing natural idempotency of data structures, immutable data structures, and identity-based deduplication.
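As an illustration of the last technique, identity-based deduplication can be sketched as follows. This is only a minimal in-memory sketch, not the workshop's material: the names `Account`, `handle_deposit` and `message_id` are hypothetical, and it assumes every message carries a unique ID. In a real system the processed IDs would have to be stored durably and atomically together with the state change.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    # Hypothetical business state plus the IDs of messages already applied.
    balance: int = 0
    processed_ids: set = field(default_factory=set)


def handle_deposit(account: Account, message_id: str, amount: int) -> bool:
    """Apply a deposit at most once per message ID.

    Returns True if the state changed, False if the message was a duplicate.
    """
    if message_id in account.processed_ids:
        return False  # redelivered message: ignore it
    account.balance += amount
    account.processed_ids.add(message_id)
    return True


account = Account()
handle_deposit(account, "msg-1", 100)
handle_deposit(account, "msg-1", 100)  # duplicate delivery, has no effect
handle_deposit(account, "msg-2", 50)
```

The handler itself is not idempotent (adding to a balance twice changes the result), so the recorded message IDs are what keeps a redelivery from being applied a second time.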

In the third part we will explore techniques for ensuring correctness of our code, specifically:

- how to develop intuition for building distributed algorithms
- how to test concurrent code with hundreds or thousands of possible execution paths
- how to use model-checking tools such as TLA+

The fourth part will take us to the backend of the distributed system where our code often has to interact with APIs exposed by other systems or even third parties. We will discuss patterns to ensure reliable communication with these external agents.

Finally, in the fifth part, we will re-examine the most common deduplication strategy, the identity-based one, with its advantages and pitfalls. We will attempt to develop an algorithm that removes some of these flaws.

The workshop consists of ~15 hands-on exercises with short lectures in between. Each exercise takes 10 minutes to complete and is designed to move us one step closer to designing a reliable distributed system. Each exercise is split into two parts. In the first part, each attendee is given time to work on the problem independently. In the second part, all attendees attempt to solve it using the mob programming approach.


Szymon Pobiega

Date & time: 18-19 APR 2024, 9:30-17:30
Places available: 20