Intro
CPU is no longer the bottleneck. The new problems are the amount of data, the complexity of data, and the speed at which it changes. We call such applications data-intensive.
- Storage - Database
- Speed - Cache
- Search - Indexing
- Unknown size, continuous, asynchronous - Stream Processing
- Accumulated data - Batch Processing
Right now, a single tool can hardly meet all requirements. And since new tools are designed to optimize for a variety of use cases, the boundaries between categories are blurred.
Reliability
What is correct? It’s hard to define, but we can roughly say:
- The app performs the function the user expected
- The app can tolerate the user making mistakes or using it in unexpected ways
- The app performs well enough under the expected load and data volume
- The app prevents abuse and unauthorized access
Then we can define reliable as “continuing to work correctly, even when bad things happen”.
These bad things are called faults; a reliable system should be fault-tolerant.
fault vs failure
- fault - one component of the system deviates from its design/spec
- failure - the system as a whole stops working (crash)
We should design fault-tolerance mechanisms to prevent faults from causing failures.
Hardware Faults
Modern hardware systems use redundancy such as RAID to reduce the fault rate.
Hardware faults are random and independent of each other in most cases.
Software Errors
Software errors are often correlated across machines. These bugs may lie dormant for a long time until something triggers them.
Scalability
Scalability describes a system’s ability to cope with increased load, either with existing resources or by adding more.
Load
Remember, the use case is always the key. Load can be measured as:
- requests per second
- ratio of reads to writes
- cache hit rate
- …
When we consider load, the first thing is to make the use case clear. There is no best solution, only a suitable solution.
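The load parameters above can be extracted from a request log. A minimal sketch, assuming a hypothetical log of `(timestamp, kind, cache_hit)` tuples:

```python
from collections import Counter

# Hypothetical request log: (timestamp in seconds, kind, cache_hit)
log = [
    (0.1, "read", True), (0.4, "write", False), (0.9, "read", True),
    (1.2, "read", False), (1.7, "write", False), (1.9, "read", True),
]

duration = log[-1][0] - log[0][0]            # observed window: 1.8s
rps = len(log) / duration                    # requests per second
kinds = Counter(kind for _, kind, _ in log)
read_write_ratio = kinds["read"] / kinds["write"]
reads = [hit for _, kind, hit in log if kind == "read"]
hit_rate = sum(reads) / len(reads)           # fraction of reads served by cache

print(f"requests/sec: {rps:.1f}")            # 6 requests / 1.8s ≈ 3.3
print(f"read:write ratio: {read_write_ratio:.1f}")  # 4 reads : 2 writes = 2.0
print(f"cache hit rate: {hit_rate:.0%}")     # 3 of 4 reads hit → 75%
```

Which of these numbers matters depends entirely on the use case — a read-heavy cache-fronted service cares about hit rate; a write-heavy feed cares about fan-out per write.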
Performance
There are two questions to ask:
- If load increases and we keep resources unchanged, how does performance change?
- If load increases, how many more resources do we need to keep performance unchanged?
To answer these, we need to measure performance.
There are two key terms:
- throughput → the number of tasks we can process per second
- response time → the time between a client sending a request and receiving the response
latency vs response time
- latency → the duration a request spends waiting to be handled
- response time → the user’s view: from sending a request until receiving the response, including network delay, queueing delay, processing time, etc.
Percentile
If we run a request multiple times, the response time is not a fixed number; it has a distribution. The median response time is p50 (50% of requests are faster, 50% slower); note that the median is not the same as the average (mean).
Most response times look good, so we pay special attention to tail latencies (high percentiles) like p99 (99%) or p999 (99.9%). These response times are often very large and hurt the user experience.
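Percentiles can be read directly off the sorted samples. A minimal nearest-rank sketch, using made-up response times:

```python
import math

def percentile(sorted_times, p):
    """Nearest-rank percentile: smallest sample >= fraction p of all samples."""
    rank = math.ceil(p * len(sorted_times))
    return sorted_times[rank - 1]

# Hypothetical response times (ms) for 100 requests, mostly fast with a long tail
times = sorted([20] * 50 + [40] * 45 + [300] * 4 + [1500])

print(percentile(times, 0.50))   # p50 (median): 20
print(percentile(times, 0.99))   # p99: 300
print(percentile(times, 0.999))  # p999: 1500
```

Note how the mean (about 53ms here) says nothing about the 1500ms outlier that one unlucky user actually experienced — which is why tail percentiles, not averages, appear in SLOs.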
SLO SLA
Service Level Objectives (SLOs) & Service Level Agreements (SLAs) are contracts that define the expected performance and availability of a service.
For example, some SLAs may define “p50 < 50ms, p99 < 100ms”.
Queuing Delay & HoL
Queuing delay is one of the most significant causes of tail latency, because limited resources can only handle a limited number of things in parallel.
If we have many requests, they form a queue. Even if the following 99% of requests are fast, a slow request at the front blocks the queue and increases everyone’s response time. This is called head-of-line blocking.
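The effect is easy to see in a toy FIFO simulation (hypothetical service times, single worker):

```python
# One worker processes requests in FIFO order; one slow request at the
# head of the queue delays everything behind it (service times in ms).
service_times = [1000] + [10] * 99  # 1 slow request, then 99 fast ones

finish = 0
completion_times = []
for t in service_times:
    finish += t                 # each request waits for all requests ahead of it
    completion_times.append(finish)

# Even though 99% of requests only need 10ms of work, their observed
# response times include the 1000ms head-of-line wait.
print(completion_times[0])    # 1000 (the slow request itself)
print(completion_times[1])    # 1010 (a 10ms request observed as 1010ms)
print(completion_times[-1])   # 1990 (the last fast request)
```

This is why response time should be measured on the client side: the server may report 10ms of processing time while the user sees over a second.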
Approaches for coping with load
Scale up → vertical scaling: move to a more powerful machine
Scale out → horizontal scaling: distribute the load across several smaller machines
Elastic → autoscaling: the system detects load changes and automatically adds or removes resources to keep performance
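An elastic system needs a rule for turning observed load into an instance count. A minimal proportional-rule sketch (the target utilization and function name are assumptions, not any real autoscaler’s API):

```python
import math

def desired_instances(current, avg_cpu, target=0.6):
    """Proportional autoscaling rule: keep average CPU near target utilization.

    If the fleet of `current` instances is running at `avg_cpu`,
    scale the count so utilization lands back around `target`.
    """
    return max(1, math.ceil(current * avg_cpu / target))

print(desired_instances(4, 0.90))  # load high: 4 * 0.90 / 0.6 → 6 instances
print(desired_instances(4, 0.30))  # load low:  4 * 0.30 / 0.6 → 2 instances
```

Real autoscalers add cooldown periods and hysteresis on top of a rule like this, so the fleet does not flap between sizes on every load spike.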
Maintainability
- Operability - make it easy for the operations team to keep the system running smoothly
- Simplicity - make it easy for new engineers to understand the system
- Evolvability - make it easy to add new features