Commit fa36765

Merge pull request #2681 from madeline-underwood/cass
Cassandra_JA to sign off
2 parents ff09754 + 8984a43 commit fa36765

7 files changed: +146 -132 lines changed

content/learning-paths/servers-and-cloud-computing/cassandra-on-gcp/_index.md

Lines changed: 2 additions & 6 deletions
@@ -1,9 +1,5 @@
 ---
-title: Deploy Cassandra on a Google Axion C4A virtual machine
-
-draft: true
-cascade:
-draft: true
+title: Deploy Cassandra on a Google Axion C4A virtual machine
 
 minutes_to_complete: 30
 
@@ -18,7 +14,7 @@ learning_objectives:
 
 prerequisites:
 - A [Google Cloud Platform (GCP)](https://cloud.google.com/free) account with billing enabled
-- Familiarity with Cassandra architecture, replication, and [Cassandra partitioning & event-driven I/O](https://cassandra.apache.org/doc/stable/cassandra/architecture/)
+- Familiarity with Cassandra architecture, replication, and [Cassandra partitioning and event-driven I/O](https://cassandra.apache.org/doc/stable/cassandra/architecture/)
 
 author: Pareena Verma
 
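The `_index.md` change drops `draft: true` together with the `cascade` block, which in Hugo passes front matter values such as `draft` down to the pages of a section. A minimal sketch for checking that no page under the learning path still marks itself as a draft follows. It assumes YAML front matter opened and closed by `---` lines, as in the files in this commit; `has_draft_front_matter` and `find_draft_pages` are hypothetical helper names, not part of the repository.

```shell
# Hypothetical helpers: report Markdown pages whose YAML front matter
# (the block between the first two "---" lines) contains "draft: true".

# Exit 0 if the file's front matter sets draft: true, 1 otherwise.
has_draft_front_matter() {
  awk '
    NR == 1 { if ($0 != "---") exit; next }               # no front matter: not a draft
    $0 == "---" { exit }                                   # closing delimiter reached
    /^[ \t]*draft:[ \t]*true[ \t]*$/ { draft = 1; exit }   # also matches an indented cascade entry
    END { if (draft) exit 0; exit 1 }
  ' "$1"
}

# Print the path of every *.md file under DIR that is still marked as a draft.
find_draft_pages() {
  find "$1" -type f -name '*.md' | while IFS= read -r page; do
    if has_draft_front_matter "$page"; then
      printf '%s\n' "$page"
    fi
  done
}

# Usage (prints nothing when no page is a draft):
#   find_draft_pages content/learning-paths/servers-and-cloud-computing/cassandra-on-gcp
```

Because the scan stops at the closing `---`, a `draft: true` line that appears in the page body is ignored.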

content/learning-paths/servers-and-cloud-computing/cassandra-on-gcp/backgraound.md

Lines changed: 0 additions & 23 deletions
This file was deleted.
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+---
+title: Get started with Cassandra on Google Axion C4A
+
+weight: 2
+
+layout: "learningpathall"
+---
+
+## Explore Google Axion C4A Arm instances
+
+Google Axion C4A is a family of Arm-based virtual machines built on Google's custom Axion CPU, based on Arm Neoverse-V2 cores. These virtual machines deliver strong performance for modern cloud workloads such as CI/CD pipelines, microservices, media processing, and general-purpose applications.
+
+The C4A series provides a cost-effective alternative to x86 virtual machines while leveraging the scalability and performance benefits of Arm architecture in Google Cloud.
+
+To learn more about Google Axion, see the Google blog [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu).
+
+## Explore Cassandra
+
+Cassandra is a highly scalable, distributed NoSQL database designed to handle large amounts of data across many commodity servers without a single point of failure.
+
+It provides high availability, fault tolerance, and linear scalability, making it ideal for real-time big data applications and high-throughput workloads.
+
+Cassandra is widely used for time-series data, IoT applications, recommendation engines, and large-scale cloud services. To learn, see the [Cassandra website](https://cassandra.apache.org/) and the [Cassandra documentation](https://cassandra.apache.org/doc/latest/).

Lines changed: 35 additions & 27 deletions
@@ -1,17 +1,14 @@
 ---
-title: Apache Cassandra baseline testing on Google Axion C4A Arm Virtual machine
+title: Test Cassandra baseline functionality
 weight: 5
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
+## Overview
 
-Since Cassandra has been successfully installed on your GCP C4A Arm virtual machine, follow these steps to verify that it is running and functioning properly.
-
-## Baseline Testing for Apache Cassandra
-
-This guide helps verify the installation and perform baseline testing of **Apache Cassandra**.
+Now that Cassandra is installed on your GCP C4A Arm virtual machine, verify that it's running and functioning properly.
 
 ## Start Cassandra
 
@@ -21,20 +18,23 @@ Run Cassandra in the background:
 cassandra -R
 ```
 
-The `-R` flag allows Cassandra to run in the background as a daemon, so you can continue using the terminal. The first startup may take **30–60 seconds** as it initializes the necessary files and processes.
+The `-R` flag allows Cassandra to run in the background as a daemon. The first startup may take 30–60 seconds as it initializes.
 
 Check logs to ensure Cassandra started successfully:
 
 ```console
 tail -f ~/cassandra/logs/system.log
 ```
-Look for the message **"Startup complete"**, which indicates Cassandra is fully initialized.
 
-### Check Cassandra Status
+Look for the message "Startup complete", which indicates Cassandra is fully initialized.
+
+## Check Cassandra status
+
 ```console
 nodetool status
 ```
-You should see an output similar to:
+
+The output is similar to:
 
 ```output
 Datacenter: datacenter1
@@ -44,29 +44,34 @@ Status=Up/Down
 -- Address Load Tokens Owns (effective) Host ID Rack
 UN 127.0.0.1 162.51 KiB 16 100.0% 78774686-39f3-47e7-87c3-3abc4f02a835 rack1
 ```
-The `nodetool status` command displays the health and status of your Cassandra node(s). For a single-node setup, the output should indicate that the node is **Up (U)** and **Normal (N)**. This confirms that your Cassandra instance is running and ready to accept queries.
 
-### Connect with CQLSH (Cassandra Query Shell)
-**cqlsh** is the interactive command-line shell for Cassandra. It allows you to run Cassandra Query Language (CQL) commands to interact with your database, create keyspaces and tables, insert data, and perform queries.
+For a single-node setup, the output should indicate that the node is Up (U) and Normal (N), confirming that your Cassandra instance is running and ready to accept queries.
+
+## Connect with CQLSH
+
+`cqlsh` is the interactive command-line shell for Cassandra that allows you to run Cassandra Query Language (CQL) commands.
 
 ```console
 cqlsh
 ```
-You’ll enter the CQL (Cassandra Query Language) shell.
 
-### Create a Keyspace (like a database)
-A **keyspace** in Cassandra is similar to a database in SQL systems. Here, we create a simple keyspace `testks` with a **replication factor of 1**, meaning data will only be stored on one node (suitable for a single-node setup).
+You'll enter the CQL (Cassandra Query Language) shell.
+
+## Create a keyspace
+
+A keyspace in Cassandra is similar to a database in SQL systems. Create a simple keyspace `testks` with a replication factor of 1 (suitable for a single-node setup):
 
 ```sql
 CREATE KEYSPACE testks WITH replication = {'class':'SimpleStrategy','replication_factor' : 1};
 ```
-Check if created:
+
+Verify the keyspace was created:
 
 ```sql
 DESCRIBE KEYSPACES;
 ```
 
-You should see an output similar to:
+The output is similar to:
 
 ```output
 cqlsh> DESCRIBE KEYSPACES;
@@ -75,8 +80,9 @@ system system_distributed system_traces system_virtual_schema
 system_auth system_schema system_views testks
 ```
 
-### Create a Table
-Tables in Cassandra are used to store structured data. This step creates a `users` table with three columns: `id` (unique identifier), `name` (text), and `age` (integer). The `id` column is the primary key.
+## Create a table
+
+Create a `users` table with three columns:
 
 ```sql
 USE testks;
@@ -88,22 +94,24 @@ CREATE TABLE users (
 );
 ```
 
-### Insert Data
-We insert two sample rows into the `users` table. The `uuid()` function generates a unique identifier for each row, which ensures that every user entry has a unique primary key.
+## Insert data
+
+Insert two sample rows into the `users` table. The `uuid()` function generates a unique identifier for each row:
 
 ```sql
 INSERT INTO users (id, name, age) VALUES (uuid(), 'Alice', 30);
 INSERT INTO users (id, name, age) VALUES (uuid(), 'Bob', 25);
 ```
 
-### Query Data
-This command retrieves all rows from the `users` table. Successful retrieval confirms that data insertion works correctly and that queries return expected results.
+## Query data
+
+Retrieve all rows from the `users` table:
 
 ```sql
 SELECT * FROM users;
 ```
 
-You should see an output similar to:
+The output is similar to:
 
 ```output
 id | age | name
@@ -114,6 +122,6 @@ You should see an output similar to:
 (2 rows)
 ```
 
-This baseline test verifies that Cassandra 5.0.5 is installed and running correctly on the VM. It confirms the node status, allows connection via `cqlsh`, and ensures basic operations like creating a keyspace, table, inserting, and querying data work as expected.
+This baseline test verifies that Cassandra 5.0.5 is installed and running correctly on the VM, confirming node status, CQLSH connectivity, and basic database operations.
 
-Please now press "Ctrl-D" to exit the Cassandra Query Shell.
+Press `Ctrl-D` to exit the Cassandra Query Shell.
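The manual checks in this file (watching `system.log` for "Startup complete", then reading `nodetool status` for a `UN` node) can be scripted. The sketch below is an illustration under assumptions, not part of the learning path: the helper names are hypothetical, the log path is the `~/cassandra/logs/system.log` used above, and node rows are assumed to begin with a two-letter status/state code (`U` or `D` followed by `N`, `L`, `J`, or `M`), as in the `nodetool status` output shown.

```shell
# Hypothetical helpers for the baseline checks; not part of Cassandra.

# wait_for_startup LOG [TIMEOUT_SECONDS]
# Poll LOG once per second until it contains "Startup complete".
wait_for_startup() {
  local log="$1" timeout="${2:-120}" waited=0
  until grep -q "Startup complete" "$log" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no 'Startup complete' in $log after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# all_nodes_up: read `nodetool status` output on stdin and succeed only if
# at least one node row exists and every node row reports UN (Up/Normal).
all_nodes_up() {
  awk '
    /^[UD][NLJM] / { total++; if ($1 == "UN") up++ }   # node rows only
    END { if (total > 0 && up == total) exit 0; exit 1 }
  '
}

# Usage:
#   cassandra -R
#   wait_for_startup ~/cassandra/logs/system.log 120 \
#     && nodetool status | all_nodes_up \
#     && echo "Cassandra node is Up/Normal"
```

Because `grep` scans the whole file, a "Startup complete" line left over from a previous start also satisfies `wait_for_startup`, so treat it as a convenience check on a fresh log.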

content/learning-paths/servers-and-cloud-computing/cassandra-on-gcp/benchmnarking.md renamed to content/learning-paths/servers-and-cloud-computing/cassandra-on-gcp/benchmarking.md

Lines changed: 43 additions & 37 deletions
@@ -1,32 +1,32 @@
 ---
-title: Cassandra Benchmarking
+title: Benchmark Cassandra performance
 weight: 6
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Cassandra Benchmarking by Cassandra-Stress
-Cassandra benchmarking can be performed using the built-in `cassandra-stress` tool, which helps measure database performance under different workloads such as write, read, and mixed operations.
+## Benchmark Cassandra with cassandra-stress
 
-### Steps for Cassandra Benchmarking with Cassandra-Stress
-**Verify cassandra-stress Installation:**
+You can perform Cassandra benchmarking using the built-in `cassandra-stress` tool, which measures database performance under different workloads such as write, read, and mixed operations.
 
-Cassandra comes with a built-in tool called **cassandra-stress** that is used for testing performance. It is usually located in the `tools/bin/` folder of your Cassandra installation.
+## Verify cassandra-stress installation
+
+Cassandra comes with a built-in tool called `cassandra-stress` that is used for testing performance. It's located in the `tools/bin/` folder of your Cassandra installation.
 
 ```console
 ls ~/cassandra/tools/bin | grep cassandra-stress
 ```
-If you see cassandra-stress in the list, it means the tool is installed and ready to use.
 
-**Run the version check:**
+If you see `cassandra-stress` in the list, the tool is installed and ready to use.
 
-To make sure the tool works correctly, check its help options.
+Check the tool's help options to verify it works correctly:
 
 ```console
 ~/cassandra/tools/bin/cassandra-stress help
 ```
-You should see output similar to the following:
+
+The output is similar to:
 
 ```output
 Usage: cassandra-stress <command> [options]
@@ -43,10 +43,10 @@ help : Print help for a command or option
 print : Inspect the output of a distribution definition
 version : Print the version of cassandra stress
 ```
-If the tool is working, you will see a list of commands and options that you can use to run benchmarks.
-This confirms that your setup is correct and you’re ready to start testing Cassandra’s performance.
 
-### Basic Write Test
+The list of commands and options confirms that your setup is correct and you're ready to start testing Cassandra's performance.
+
+## Run a basic write test
 Insert 10,000 rows with 50 concurrent threads using `cassandra-stress`:
 
 ```console
@@ -56,7 +56,7 @@ Insert 10,000 rows with 50 concurrent threads using `cassandra-stress`:
 - **n=10000** → Specifies the number of rows to insert during the benchmark test.
 - **-rate threads=50** → Sets the number of concurrent worker threads simulating multiple clients writing to the cluster.
 
-You should see output similar to the following:
+The output is similar to:
 
 ```output
 ******************** Stress Settings ********************
@@ -186,13 +186,16 @@ Total operation time : 00:00:00
 END
 ```
 
-### Read Test
-The following command runs a **read benchmark** on your Cassandra database using `cassandra-stress`. It simulates multiple clients reading from the cluster at the same time and records performance metrics such as **throughput** and **latency**.
+## Run a read test
+
+Run a read benchmark on your Cassandra database using `cassandra-stress`. This simulates multiple clients reading from the cluster at the same time and records performance metrics such as throughput and latency.
 
 ```console
 ~/cassandra/tools/bin/cassandra-stress read n=10000 -rate threads=50
 ```
-You should see output similar to the following:
+
+The output is similar to:
+
 ```output
 ******************** Stress Settings ********************
 Command:
@@ -322,21 +325,24 @@ Total operation time : 00:00:02
 END
 ```
 
-## Benchmark Results Table Explained:
+## Understand benchmark results
+
+The metrics below explain what each value in the `cassandra-stress` output represents:
+
+- Op rate (operations per second): the number of read operations Cassandra successfully executed per second.
+- Partition rate: the number of partitions read per second. Since this is a read test, the partition rate equals the op rate.
+- Row rate: the number of rows read per second. Again, for this test it equals the op rate.
+- Latency mean: the average time taken for each read request to complete.
+- Latency median: the 50th percentile latency - half of the operations completed faster than this time.
+- Latency max: the slowest single read request during the test.
+- Total partitions: the total number of partitions read during the test.
+- Total errors: the number of failed read operations.
+- GC metrics (Garbage Collection): shows whether JVM garbage collection paused Cassandra during the test.
+- Total operation time: the total wall-clock time taken to run the benchmark.
 
-- **Op rate (operations per second):** The number of read operations Cassandra successfully executed per second.
-- **Partition rate:** Number of partitions read per second. Since this is a read test, the partition rate equals the op rate.
-- **Row rate:** Number of rows read per second. Again, for this test it equals the op rate.
-- **Latency mean:** The average time taken for each read request to complete.
-- **Latency median:** The 50th percentile latency — half of the operations completed faster than this time.
-- **Latency max:** The slowest single read request during the test.
-- **Total partitions:** The total number of partitions read during the test.
-- **Total errors:** Number of failed read operations.
-- **GC metrics (Garbage Collection):** Shows whether JVM garbage collection paused Cassandra during the test.
-- **Total operation time:** The total wall-clock time taken to run the benchmark.
+## Benchmark summary for Arm64
 
-### Benchmark summary on Arm64
-Results from the earlier run on the `c4a-standard-4` (4 vCPU, 16 GB memory) Arm64 VM in GCP (SuSE shown, Ubuntu results were very similar):
+Results from the run on the `c4a-standard-4` (4 vCPU, 16 GB memory) Arm64 VM in GCP (SUSE shown; Ubuntu results were very similar):
 
 | Metric | Write Test | Read Test |
 |----------------------------|----------------------|----------------------|
@@ -356,12 +362,12 @@ Results from the earlier run on the `c4a-standard-4` (4 vCPU, 16 GB memory) Arm6
 | Total GC Time | 0.0 s | 0.0 s |
 | Total Operation Time | 0:00:00 | 0:00:02 |
 
-### Cassandra performance benchmarking notes
-When examining the benchmark results, you will notice that on the Google Axion C4A Arm-based instances:
+## What you've accomplished and what's next
+
+You've successfully deployed Apache Cassandra 5.0.5 on a Google Axion C4A Arm-based virtual machine, validated its functionality, and measured its performance using cassandra-stress. The benchmark results on Google Axion C4A Arm-based instances demonstrate strong performance characteristics.
+
+Write operations achieved high throughput of 10,690 op/s, while read operations reached 4,962 op/s on the `c4a-standard-4` Arm64 VM. Write latency was notably low with a mean of 3.7 ms compared to reads at 6.3 ms, indicating fast write processing on this Arm64 VM. The 95th and 99th percentile latencies show consistent performance, with writes significantly faster than reads. Zero errors or GC overhead confirm stable and reliable benchmarking results.
 
-- The write operations achieved a high throughput of **10,690 op/s**, while read operations reached **4,962 op/s** on the `c4a-standard-4` Arm64 VM.
-- Latency for writes was very low (mean: **3.7 ms**) compared to reads (mean: **6.3 ms**), indicating fast write processing on this Arm64 VM.
-- The 95th and 99th percentile latencies show consistent performance, with writes significantly faster than reads.
-- There were no errors or GC overhead, confirming stable and reliable benchmarking results.
+The Arm64 VM provides efficient and predictable performance, making it suitable for high-throughput Cassandra workloads. The low write latencies and high operation rates demonstrate that Arm-based infrastructure can effectively handle database operations that require both speed and consistency. These results provide a solid baseline for evaluating Cassandra performance on Arm64 architecture and can guide decisions about instance sizing and configuration for production deployments.
 
-Overall, the Arm64 VM provides efficient and predictable performance, making it suitable for high-throughput Cassandra workloads.
+To continue building on this foundation, you can explore advanced Cassandra configurations such as multi-node cluster deployments, replication strategies for high availability, or performance tuning for specific workload patterns. You might also investigate integrating Cassandra with application frameworks or comparing performance across different Arm-based instance types to optimize for your use case.
