Intro
CPU is no longer the bottleneck. The new problems are the amount of data, the complexity of data, and the speed at which it changes. We call such applications data-intensive.
- Storage - Database
- Speed - Cache
- Search - Indexing
- Unknown size, continuous, asynchronous - Stream Processing
- Accumulated data - Batch Processing
Right now, a single tool can hardly meet all requirements. And since new tools are designed to optimize for a variety of use cases, the boundaries between categories are blurred.
Reliability
What is correct? It’s hard to define, but we can roughly say:
- The app performs the function the user expected
- The app can tolerate the user making mistakes or using it in unexpected ways
- The app performs well enough under the expected load and data volume
- The app prevents abuse and unauthorized access
Then we can define reliable as “continuing to work correctly, even when bad things happen”.
These bad things are called faults; a reliable system should be fault-tolerant.
fault vs failure
- fault - one component of the system deviates from its design/spec
- failure - the system as a whole stops working (crash)
We should design fault-tolerance mechanisms to prevent faults from causing failures.
Hardware Faults
Modern hardware systems use redundancy such as RAID to reduce the fault rate.
Hardware faults are random and independent of each other in most cases.
Software Errors
Software errors are often correlated across machines. These bugs may lie dormant for a long time until something triggers them.
Scalability
Scalability describes a system’s ability to cope with increased load, either with existing resources or by adding more.
Load
Remember, the use case is always the key. Load can be measured as:
- requests per second
- ratio of reads to writes
- cache hit rate
- …
When we consider load, the first thing is to make the use case clear. There is no best solution, only a suitable solution.
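The load parameters above can be extracted from a request log. A minimal sketch, assuming a hypothetical log of `(timestamp, kind, cache_hit)` tuples:

```python
from collections import Counter

# Hypothetical request log: (timestamp in seconds, kind, cache_hit)
log = [
    (0.1, "read", True), (0.4, "write", False), (0.9, "read", True),
    (1.2, "read", False), (1.7, "write", False), (1.9, "read", True),
]

duration = log[-1][0] - log[0][0]            # observed window: 1.8s
rps = len(log) / duration                    # requests per second
kinds = Counter(kind for _, kind, _ in log)
read_write_ratio = kinds["read"] / kinds["write"]
reads = [hit for _, kind, hit in log if kind == "read"]
hit_rate = sum(reads) / len(reads)           # fraction of reads served by cache

print(f"requests/sec: {rps:.1f}")            # 6 requests / 1.8s ≈ 3.3
print(f"read:write ratio: {read_write_ratio:.1f}")  # 4 reads : 2 writes = 2.0
print(f"cache hit rate: {hit_rate:.0%}")     # 3 of 4 reads hit → 75%
```

Which of these numbers matters depends entirely on the use case — a read-heavy cache-fronted service cares about hit rate; a write-heavy feed cares about fan-out per write.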
Performance
There are two questions to ask:
- If load increases and we keep resources unchanged, how does performance change?
- If load increases, how many more resources do we need to keep performance unchanged?
To answer these, we need to measure performance.
There are two key terms:
- throughput → the number of tasks we can process per second
- response time → the time between a client sending a request and receiving the response
latency vs response time
- latency → the duration a request spends waiting to be handled
- response time → the user’s view: from sending a request until receiving the response, including network delay, queueing delay, processing time, etc.
Percentile
If we run a request multiple times, the response time is not a fixed number; it has a distribution. The median response time is p50 (50% of requests are faster, 50% slower); note that the median is not the same as the average (mean).
Most response times look good, so we pay special attention to tail latencies (high percentiles) like p99 (99%) or p999 (99.9%). These response times are often very large and hurt the user experience.
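Percentiles can be read directly off the sorted samples. A minimal nearest-rank sketch, using made-up response times:

```python
import math

def percentile(sorted_times, p):
    """Nearest-rank percentile: smallest sample >= fraction p of all samples."""
    rank = math.ceil(p * len(sorted_times))
    return sorted_times[rank - 1]

# Hypothetical response times (ms) for 100 requests, mostly fast with a long tail
times = sorted([20] * 50 + [40] * 45 + [300] * 4 + [1500])

print(percentile(times, 0.50))   # p50 (median): 20
print(percentile(times, 0.99))   # p99: 300
print(percentile(times, 0.999))  # p999: 1500
```

Note how the mean (about 53ms here) says nothing about the 1500ms outlier that one unlucky user actually experienced — which is why tail percentiles, not averages, appear in SLOs.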
SLO SLA
Service Level Objectives (SLOs) & Service Level Agreements (SLAs) are contracts that define the expected performance and availability of a service.
For example, some SLAs may define “p50 < 50ms, p99 < 100ms”.
Queuing Delay & HoL
Queuing delay is one of the most significant causes of tail latency, because limited resources can only handle a limited number of things in parallel.
If we have many requests, they form a queue. Even if the following 99% of requests are fast, a slow request at the front blocks the queue and increases everyone’s response time. This is called head-of-line blocking.
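The effect is easy to see in a toy FIFO simulation (hypothetical service times, single worker):

```python
# One worker processes requests in FIFO order; one slow request at the
# head of the queue delays everything behind it (service times in ms).
service_times = [1000] + [10] * 99  # 1 slow request, then 99 fast ones

finish = 0
completion_times = []
for t in service_times:
    finish += t                 # each request waits for all requests ahead of it
    completion_times.append(finish)

# Even though 99% of requests only need 10ms of work, their observed
# response times include the 1000ms head-of-line wait.
print(completion_times[0])    # 1000 (the slow request itself)
print(completion_times[1])    # 1010 (a 10ms request observed as 1010ms)
print(completion_times[-1])   # 1990 (the last fast request)
```

This is why response time should be measured on the client side: the server may report 10ms of processing time while the user sees over a second.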
Approaches for coping with load
Scale up → vertical scaling: move to a more powerful machine
Scale out → horizontal scaling: distribute the load across several smaller machines
Elastic → autoscaling: the system detects load changes and automatically adds or removes resources to keep performance
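An elastic system needs a rule for turning observed load into an instance count. A minimal proportional-rule sketch (the target utilization and function name are assumptions, not any real autoscaler’s API):

```python
import math

def desired_instances(current, avg_cpu, target=0.6):
    """Proportional autoscaling rule: keep average CPU near target utilization.

    If the fleet of `current` instances is running at `avg_cpu`,
    scale the count so utilization lands back around `target`.
    """
    return max(1, math.ceil(current * avg_cpu / target))

print(desired_instances(4, 0.90))  # load high: 4 * 0.90 / 0.6 → 6 instances
print(desired_instances(4, 0.30))  # load low:  4 * 0.30 / 0.6 → 2 instances
```

Real autoscalers add cooldown periods and hysteresis on top of a rule like this, so the fleet does not flap between sizes on every load spike.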
Maintainability
- Operability - make it easy for the operations team to keep the system running smoothly
- Simplicity - make it easy for new engineers to understand the system
- Evolvability - make it easy to add new features