Data Model
A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities.
Morden applications are built by layering one data model on top of another. Each layer hides the complexity of the layers below it by providing a data model.
Relational Vs Document Model
One-to-Many

For relational model, is hard to represent One-to-Many relationship, like one person may have 0 to infinite work experience.
- Relational → create multiple table to store company info, school info, then use foreign key to JOIN several tables
- Document → can use JSON-like structure, easy to read by human, and better locality (store these info in one place)
Many-to-One & Many-to-Many
When we store the region, we use ID rather than pure text. This is because ID has no meaning, it never needs to change. For example, if we want to update “Greater Seattle Area” to “Seattle”, we just need to modify the text in region_table.
Document model is good at One-to-Many because you can imagine it as a tree, but it’s not good at Many-to-X, because it looks like a graph.
If document model doesn’t support JOIN, then we need to use iteration to mock JOIN in application level. Even if it supports JOIN, we still need to use document reference (just like foreign key), which is similar to relational model.
locality
For document database, it always store the whole document as a single object.
For read, it need to load the whole document from disk to memory, so if we need most of parts inside the document, it’s fine; otherwise, its performance is poor.
For write, it also need to rewrite the whole document from memory to disk, and only modifications that don’t change the total encoded size of a document can easily be performed in place; otherwise, system need to assign new space for new document.
schema-on-read vs schema-on-write
Document Database is not schemaless. Actually, it has implicit constrain, like when we write service code to read something from DB, we assume we can get some fields, so schema-on-read is a more accurate term.
| schema-on-read | check when we READ | poor efficiency, cuz we cannot do any optimizations when we write it |
| schema-on-write | check when we WRITE | good efficiency cuz we can check the type then do optimization |
Summary
| document | relation | |
|---|---|---|
| relation map | tree, one-to-many | can use foreign key to achieve many-to-X |
| JOIN | :( | :) |
| flexibility | flexible, can add fields easily | schema, hard to change |
| locality | if operate the whole doc, performance is good; but if only operate partial doc, performance is not good | scatter in tables |
Query Language
Declarative vs Imperative Language
| Declarative | Imperative | |
|---|---|---|
| Concept | declare the logic rather than actual execution | define the execution plan |
| Example | SQL, CSS | C++, Python, … |
| Abstraction | high | low |
| Parallel | good, cuz we let system do the optimization | poor, cuz we already defined the steps |
| What u want? | How to do that? |
Here are some advantages for declarative language:
- More concise and easily use
- Hide implementation details
- Good support for parallelism
MapReduce Query
MapReduce is neither a declarative language nor a imperative language.
- declarative → we don’t need to specify how to iter or shuffle dataset
- imperative → we need to implement
mapandreducefunctions
It requires the map and reduce are pure function, which means they only use input data, they cannot do anything else like query database.
And they cannot have any side effects, which means no matter when we run the function for a given input, the output should be the same.
What’s more, mapreduce is a very low-level model for distributed execution, so engineers can implement higher-level query language base it, like SQL can be implemented as a pipeline of mapreduce.
Graph Data Model
Suitable for Many-to-Many relationships.
- vertice / node
- edge / relation
- attribute
Graph can store both homogeneous and heterogeneous data. For example, node can represents people, city, animal, activity, etc.