Chaos Engineering

Introduction “Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.” Priciples of Chaos Engineering Introduction In modern, rapidly evolving distributed systems, components fail all the time. These failures can be complex as they can cascade across systems. System weaknesses such as latency, race conditions, byzantine failures etc can be exacerbated in the face of large traffic volumes....

August 13, 2019 · (updated January 16, 2022) · 6 min · Pradeep Loganathan

Retry Pattern

The retry pattern is an extremely important pattern to make applications and services more resilient to transient failures. A transient failure is a common type of failure in a cloud-based distributed architecture. This is often due to the nature of the network itself (loss of connectivity, timeout on requests, and so on). Transient faults occur when services are hosted separately and communicate over the wire, most likely over a HTTP protocol....

August 10, 2018 · (updated August 31, 2022) · 3 min · Pradeep Loganathan

Bulkhead Isolation

Bulkhead Isolation pattern is used to make sure that all our services work independently of each other and failure in one will not create a failure in another service. Techniques such as single-responsibility pattern, asynchronous-communication pattern, Circuit Breaker and Retry pattern help achieve the goal of stopping a failure propagating throughout the whole application. To implement this pattern, all our services need to work independently of each other and failure in one should not create a failure in another service....

July 8, 2018 · (updated August 31, 2022) · 1 min · Pradeep Loganathan