Skip to content

Commit e4b21f2

Browse files
committed
Add Basic note for System Design and Godot page inhancement
1 parent ffb746a commit e4b21f2

12 files changed

Lines changed: 1930 additions & 41 deletions

pages/Godot.md

Lines changed: 577 additions & 35 deletions
Large diffs are not rendered by default.

pages/System Design - CRDT.md

Lines changed: 246 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,246 @@
1+
---
2+
seoTitle: Conflict-free Replicated Data Types (CRDT) – System Design Guide
3+
description: "A comprehensive guide on CRDTs (Conflict-free Replicated Data Types). Learn about CvRDT vs CmRDT, state/operation propagation, join-semilattices, and Python/JavaScript implementations."
4+
keywords: "crdt, conflict-free replicated data types, cvrdt, cmrdt, replication, distributed systems, replication conflicts, state-based, operation-based, eventual consistency, yjs, automerge, figma, vr-rathod, code-note"
5+
displayTitle: System Design - CRDT
6+
---
7+
8+
> [!info] What is a CRDT?
9+
> A **Conflict-free Replicated Data Type (CRDT)** is a specialized data structure designed for distributed systems. It allows multiple replicas to be updated independently and concurrently without coordination (like locking or central authorization), with a mathematical guarantee that they will eventually converge to the identical state when all updates are propagated.
10+
11+
- # Explanation
12+
- In distributed databases or collaborative applications (like Google Docs or Figma), multiple users or servers can edit the same data concurrently. Traditional databases use locks or consensus protocols (like Paxos or Raft) to coordinate writes, which introduces high latency and dependency on network connectivity.
13+
- CRDTs enable **coordination-free eventual consistency** (AP in the CAP theorem). Replicas can accept updates offline and sync asynchronously.
14+
-
15+
- ## Real-World Analogy
16+
- **Git Merge (Automatic)**: Imagine editing a file where Git can resolve all merges automatically without conflicts. CRDTs are data structures pre-designed with merge rules so that conflict resolution is mathematically deterministic and handled entirely by the structure itself.
17+
- **Collaborative Whiteboard (Figma)**: Users drag shapes around a canvas. Instead of a server locking each shape, each user moves it locally. The movements are broadcast to others, and a CRDT ensures everyone sees the shapes in the same final positions.
18+
-
19+
- # How It Works
20+
collapsed:: true
21+
- ## The Two CRDT Approaches
22+
- There are two primary ways to design and propagate changes in a CRDT:
23+
-
24+
- ### 1. State-Based CRDTs (CvRDTs)
25+
- **Mechanism**: Replicas send their **entire state** to other replicas.
26+
- **Merge Operator**: Receivers merge their local state with the incoming state using a merge function ($\sqcup$).
27+
- **Requirements**: The merge operator must form a **Join-Semilattice**.
28+
- **Network**: Can tolerate message loss, duplication, and out-of-order delivery, since the merge operator is idempotent.
29+
-
30+
- ### 2. Operation-Based CRDTs (CmRDTs)
31+
- **Mechanism**: Replicas send only the **operations** (mutations) to other replicas.
32+
- **Requirements**: Operations must be commutative to ensure convergence regardless of delivery order.
33+
- **Network**: Requires a reliable broadcast channel that guarantees **exactly-once** or **at-least-once** causal delivery.
34+
-
35+
- ## Mathematical Foundations of CvRDTs
36+
- For a state-based CRDT (CvRDT) to guarantee convergence, its states must form a **partially ordered set (poset)**, and the merge operator ($\sqcup$, also called "join") must be a **Join-Semilattice**, which satisfies three properties:
37+
-
38+
- 1. **Commutativity**: $A \sqcup B = B \sqcup A$
39+
- The order in which replicas merge states does not affect the final result.
40+
- 2. **Associativity**: $(A \sqcup B) \sqcup C = A \sqcup (B \sqcup C)$
41+
- The grouping of merge operations does not affect the final result.
42+
- 3. **Idempotency**: $A \sqcup A = A$
43+
- Merging the same state multiple times yields the same result (no double-counting).
44+
-
45+
- ## Visual Walkthrough: G-Counter (Grow-Only Counter)
46+
- A G-Counter is a state-based CRDT where the value can only increase.
47+
-
48+
- ### Initial State (3 replicas: A, B, C)
49+
- ```
50+
Replica A: [0, 0, 0] (Value: 0)
51+
Replica B: [0, 0, 0] (Value: 0)
52+
Replica C: [0, 0, 0] (Value: 0)
53+
```
54+
-
55+
- ### Concurrent Increments
56+
- A increments twice; B increments once.
57+
- ```
58+
Replica A: [2, 0, 0] (Value: 2)
59+
Replica B: [0, 1, 0] (Value: 1)
60+
Replica C: [0, 0, 0] (Value: 0)
61+
```
62+
-
63+
- ### Synchronization (A sends state to B; B sends state to C)
64+
- Replicas merge states by taking the **element-wise maximum** of their vectors:
65+
- $$Merge(V_1, V_2) = [\max(V_{1,0}, V_{2,0}), \max(V_{1,1}, V_{2,1}), \dots]$$
66+
-
67+
- **B merges A's state**: $[\max(0, 2), \max(1, 0), \max(0, 0)] = [2, 1, 0]$ (Value: 3)
68+
- **C merges B's state**: $[\max(0, 0), \max(0, 1), \max(0, 0)] = [0, 1, 0]$ (Value: 1)
69+
-
70+
- ```
71+
Replica A: [2, 0, 0] (Value: 2)
72+
Replica B: [2, 1, 0] (Value: 3)
73+
Replica C: [0, 1, 0] (Value: 1)
74+
```
75+
-
76+
- ### Full Convergence (All replicas sync)
77+
- ```
78+
Replica A: [2, 1, 0] (Value: 3)
79+
Replica B: [2, 1, 0] (Value: 3)
80+
Replica C: [2, 1, 0] (Value: 3)
81+
```
82+
-
83+
- # Complexity & Trade-offs
84+
collapsed:: true
85+
- ## Complexity Table
86+
- | Operation / Aspect | State-Based (CvRDT) | Operation-Based (CmRDT) |
87+
|--------------------|---------------------|--------------------------|
88+
| **Local Update** | $O(1)$ | $O(1)$ |
89+
| **Merge/Apply** | $O(N)$ (where $N$ is replica count) | $O(1)$ |
90+
| **Message Size** | $O(N)$ (entire state size) | $O(1)$ (operation details only) |
91+
| **Network Cost** | Higher (bandwidth overhead) | Lower (small payloads) |
92+
| **Network Reliability**| Low requirements (works over UDP/Gossip) | High requirements (needs causal order) |
93+
-
94+
- # Implementations
95+
collapsed:: true
96+
- :::code-tabs
97+
98+
```python
99+
class PNCounter:
100+
"""
101+
Positive-Negative Counter CvRDT implementation.
102+
Allows both increments and decrements by maintaining two G-Counters:
103+
one for positive additions (P) and one for negative subtractions (N).
104+
"""
105+
def __init__(self, replica_id: str):
106+
self.replica_id = replica_id
107+
# Dictionary mapping replica_id -> count
108+
self.P = {self.replica_id: 0}
109+
self.N = {self.replica_id: 0}
110+
111+
def increment(self, amount: int = 1):
112+
self.P[self.replica_id] += amount
113+
114+
def decrement(self, amount: int = 1):
115+
self.N[self.replica_id] += amount
116+
117+
def value(self) -> int:
118+
"""Returns the current accumulated count."""
119+
return sum(self.P.values()) - sum(self.N.values())
120+
121+
def merge(self, other: 'PNCounter'):
122+
"""
123+
Merges another PNCounter's state into this one.
124+
Takes the element-wise maximum for both P and N vectors.
125+
"""
126+
# Merge Positive vector
127+
all_p_keys = set(self.P.keys()).union(other.P.keys())
128+
for key in all_p_keys:
129+
self.P[key] = max(self.P.get(key, 0), other.P.get(key, 0))
130+
131+
# Merge Negative vector
132+
all_n_keys = set(self.N.keys()).union(other.N.keys())
133+
for key in all_n_keys:
134+
self.N[key] = max(self.N.get(key, 0), other.N.get(key, 0))
135+
136+
# Example Usage
137+
if __name__ == "__main__":
138+
nodeA = PNCounter("A")
139+
nodeB = PNCounter("B")
140+
141+
nodeA.increment(5)
142+
nodeB.increment(3)
143+
nodeA.decrement(2) # Net value A: 3, B: 3
144+
145+
# Sync Node A -> Node B
146+
nodeB.merge(nodeA)
147+
print("Node B Value after sync:", nodeB.value()) # Expected: 6 (5 - 2 from A + 3 from B)
148+
```
149+
150+
```javascript
151+
class LWWElementSet {
152+
/**
153+
* Last-Write-Wins Element Set (LWW-Element-Set) CvRDT.
154+
* Maintains an add set and a remove set with timestamps.
155+
*/
156+
constructor() {
157+
this.addSet = new Map(); // element -> timestamp
158+
this.removeSet = new Map(); // element -> timestamp
159+
}
160+
161+
add(element, timestamp = Date.now()) {
162+
const existing = this.addSet.get(element) || 0;
163+
this.addSet.set(element, Math.max(existing, timestamp));
164+
}
165+
166+
remove(element, timestamp = Date.now()) {
167+
const existing = this.removeSet.get(element) || 0;
168+
this.removeSet.set(element, Math.max(existing, timestamp));
169+
}
170+
171+
contains(element) {
172+
const addTime = this.addSet.get(element);
173+
if (addTime === undefined) return false;
174+
175+
const removeTime = this.removeSet.get(element);
176+
if (removeTime === undefined) return true;
177+
178+
// If added after removed, or added at the same time (bias towards add)
179+
return addTime >= removeTime;
180+
}
181+
182+
merge(other) {
183+
// Merge addSet
184+
for (const [element, timestamp] of other.addSet.entries()) {
185+
const localTime = this.addSet.get(element) || 0;
186+
this.addSet.set(element, Math.max(localTime, timestamp));
187+
}
188+
189+
// Merge removeSet
190+
for (const [element, timestamp] of other.removeSet.entries()) {
191+
const localTime = this.removeSet.get(element) || 0;
192+
this.removeSet.set(element, Math.max(localTime, timestamp));
193+
}
194+
}
195+
}
196+
197+
// Example Usage
198+
const client1 = new LWWElementSet();
199+
const client2 = new LWWElementSet();
200+
201+
client1.add("item1", 100);
202+
client2.remove("item1", 150); // client2 removes it with later timestamp
203+
204+
client1.merge(client2);
205+
console.log("Contains item1?", client1.contains("item1")); // false
206+
207+
client1.add("item1", 200); // client1 re-adds it later
208+
console.log("Contains item1 after re-add?", client1.contains("item1")); // true
209+
```
210+
211+
:::
212+
-
213+
- # CRDT vs Operational Transformation (OT)
214+
collapsed:: true
215+
- ## Architectural Differences
216+
- **Operational Transformation (OT)** is another conflict resolution paradigm used extensively in collaborative editors (e.g., Google Docs).
217+
- Instead of utilizing math in the data structure itself to resolve conflict, OT intercepts concurrent operations and intercepts/modifies their offsets dynamically depending on peer operations.
218+
-
219+
- | Feature | Operational Transformation (OT) | CRDT |
220+
|---------|---------------------------------|------|
221+
| **Topology** | Typically Server-Client (Centralized) | Peer-to-Peer or Server-Client (Decentralized) |
222+
| **Offline Support** | Poor (Requires synchronization locks) | Excellent (Offline updates merge natively) |
223+
| **Implementation Complexity** | Extremely high (Edge-cases in transform matrices) | Low to Medium (Relies on data structure rules) |
224+
| **Memory Overhead** | Low | High (Needs to store tombstones/metadata) |
225+
| **Use Cases** | Google Docs, Wave | Figma, Apple Notes, Yjs, Automerge, Redis |
226+
-
227+
- # Real-World Systems Utilizing CRDTs
228+
collapsed:: true
229+
- **Figma**: Uses a custom CRDT layout engine to synchronize multiplayer canvas edits.
230+
- **Redis Enterprise**: Uses CRDTs to provide Active-Active Multi-Master replication databases across geographically separated regions.
231+
- **Riak KV / Cassandra**: Uses CRDTs for resolving values in multi-master tables under eventual consistency.
232+
- **Yjs & Automerge**: Popular open-source JavaScript libraries for building offline-first, real-time collaborative text and rich-text editing applications.
233+
-
234+
- # Key Takeaways
235+
collapsed:: true
236+
- **Convergence Guarantee**: CRDTs ensure that any two nodes that have received the same set of updates are guaranteed to be in the same state.
237+
- **Mathematical Rigor**: CvRDT states must be partially ordered, and merging must be associative, commutative, and idempotent to prevent double-counting.
238+
- **Tombstones**: Removing data in sets or sequences usually requires storing a record of deletion (a "tombstone"), which can cause data structure size to grow indefinitely unless garbage collected.
239+
-
240+
- # More Learn
241+
collapsed:: true
242+
- ## Resources & Readings
243+
- [A comprehensive study of Convergent and Divergent Replicated Data Types](https://inria.hal.science/inria-00555588/document) — The original seminal paper by Marc Shapiro et al.
244+
- [Designing Data-Intensive Applications – Martin Kleppmann](https://dataintensive.net/) — Chapter 5 discusses replication and CRDT conflict resolution patterns.
245+
- [Yjs Documentation](https://docs.yjs.dev/) — In-depth look at production-grade text/rich-text CRDTs.
246+
- [Wikipedia -> CRDT](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type)

0 commit comments

Comments
 (0)