A real-time market data aggregator built with Spring Boot, Apache Kafka, Kafka Streams, and MongoDB.
This project simulates live stock tick data, processes it in-flight using Kafka Streams to compute indicators like RSI, stores results in MongoDB, and uses MongoDB aggregation pipelines to analyze historical trends.
This is the completed app from the tutorial:
Building a Real-Time Market Data Aggregator with Kafka and MongoDB
End-to-end pipeline:
- Simulate live market tick data and publish it to Kafka.
- Process the stream using Kafka Streams (for example, computing RSI).
- Persist both raw and derived data into MongoDB.
- Query historical data using MongoDB aggregation pipelines for analytics.
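The RSI step above can be sketched in plain Java. This is an illustrative version using a simple average of gains and losses over the window; the class and method names are assumptions, not necessarily what the tutorial's Kafka Streams processor uses.

```java
import java.util.List;

public class RsiSketch {

    // Computes RSI over a window of closing prices, using a simple
    // average of gains and losses across the whole window.
    static double rsi(List<Double> prices) {
        double gains = 0, losses = 0;
        for (int i = 1; i < prices.size(); i++) {
            double change = prices.get(i) - prices.get(i - 1);
            if (change > 0) gains += change; else losses -= change;
        }
        if (losses == 0) return 100.0;       // all gains -> max RSI
        double rs = gains / losses;          // relative strength
        return 100.0 - 100.0 / (1.0 + rs);   // RSI formula
    }

    public static void main(String[] args) {
        // Equal gains and losses -> RSI of exactly 50
        System.out.println(rsi(List.of(100.0, 101.0, 100.0, 101.0, 100.0))); // prints 50.0
    }
}
```

In the real pipeline this math runs per symbol over a sliding window of ticks inside the stream processor, rather than over a static list.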
Tech stack:
- Java 21
- Spring Boot
- Spring for Apache Kafka
- Kafka Streams
- MongoDB Atlas (or local MongoDB)
Prerequisites:
- Java 21+
- Maven 3.9+
- Kafka 3.5+ (local install or Docker)
- MongoDB Atlas cluster (M0 works) or local MongoDB
Update src/main/resources/application.properties with your MongoDB connection string and database name:
```properties
spring.data.mongodb.uri=<YOUR_MONGODB_CONNECTION_STRING>
spring.data.mongodb.database=<YOUR_DATABASE_NAME>
```

Configure the Kafka broker (default local):

```properties
spring.kafka.bootstrap-servers=localhost:9092
```

Create and start Kafka using whatever setup you prefer (local install or Docker). Once Kafka is running, create the required topic(s) used by the app.
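If you go the Docker route, one minimal option (an assumption, not the only valid setup) is the official `apache/kafka` image, which starts a single-node KRaft broker with default settings:

```shell
docker run -d --name kafka -p 9092:9092 apache/kafka:3.7.0
```

This exposes the broker on `localhost:9092`, matching the default `spring.kafka.bootstrap-servers` above.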
Example:
```shell
bin/kafka-topics.sh --create \
  --topic market-data \
  --bootstrap-server localhost:9092 \
  --partitions 1 \
  --replication-factor 1
```

If your app uses different topic names, update them accordingly.
From the project root:

```shell
mvn clean spring-boot:run
```

You should see logs indicating:
- simulated market data is being produced
- Kafka Streams is processing events
- computed indicators are being written to MongoDB
You can expect documents representing:
- raw tick data (symbol, price, timestamp)
- derived indicator values (for example RSI windows)
Exact fields depend on the implementation in the tutorial.
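As a rough illustration, the stored documents might look like the following. These shapes are hypothetical, chosen to match the fields listed above; the tutorial's actual schema may differ:

```json
{ "symbol": "AAPL", "price": 189.42, "timestamp": "2024-05-01T14:30:00Z" }
{ "symbol": "AAPL", "indicator": "RSI", "value": 62.5, "windowEnd": "2024-05-01T14:30:00Z" }
```

The first document is a raw tick; the second is a derived indicator value emitted by the Kafka Streams processor.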
This demo is intentionally simple and designed to show the full loop:
Kafka -> Kafka Streams -> MongoDB
For production-like workloads, you would typically:
- increase topic partitions for parallelism
- tune Kafka Streams state stores and commit intervals
- use MongoDB indexes appropriate for query patterns
- consider time series collections if you are not using Search indexes
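The last two points can be sketched in mongosh. The collection name `ticks` and the field names `timestamp` and `symbol` are assumptions for illustration:

```javascript
// Time series collection keyed on the tick timestamp, grouped by symbol
db.createCollection("ticks", {
  timeseries: { timeField: "timestamp", metaField: "symbol", granularity: "seconds" }
})

// Compound index matching a common "recent prices for a symbol" query pattern
db.ticks.createIndex({ symbol: 1, timestamp: -1 })
```

Time series collections give MongoDB storage and query optimizations for append-heavy, time-ordered data like market ticks.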