Welcome to the APAC Roadshow 2026 workshop! This guide provides complete instructions for setting up and running the KartShoppe training platform from a blank slate. KartShoppe is a fully-functional, real-time e-commerce application designed to teach modern data engineering principles.
This session is designed for participants to start with just the source code and progressively build, deploy, and enhance the platform by introducing powerful stream processing capabilities with Apache Kafka and Apache Flink.
By the end of this workshop, you will have hands-on experience with:
- Event-Driven Architecture: Understand how to use Kafka as the central nervous system of a modern, decoupled application.
- Apache Flink Fundamentals: Go from zero to building sophisticated, stateful stream processing jobs. You'll learn to implement sources, sinks, transformations, and windowing to solve real business problems.
- Reactive Microservices: See how a Quarkus-based backend can consume, process, and serve data from Flink and Kafka, pushing live updates directly to a web UI.
- Database Change Data Capture (CDC): Learn to capture row-level changes from a PostgreSQL database in real-time and turn them into an event stream for Flink to process.
- Stateful Stream Processing: Implement practical, stateful logic to solve classic e-commerce challenges like real-time inventory management.
This setup follows a progressive, hands-on learning approach designed for maximum impact:
- Start with a Working System: You'll begin by launching the core KartShoppe application. It's a functional e-commerce site with a UI, API, and message broker, but with a key piece missing: real-time data processing.
- Incremental Enhancements: The workshop guides you through developing and deploying specific Flink jobs. You won't just learn theory; you'll solve a real business problem with each job you write.
- Tangible, Visual Feedback: As soon as you deploy a Flink job, you will see a new feature come to life in the KartShoppe UI. Watch as inventory counts update in real time and order statuses change instantly based on user behavior. This immediate feedback loop makes learning concrete and rewarding.
- Frontend App: The browser-based user interface, which interacts with the backend via REST APIs and WebSockets.
- Quarkus API: The backend application that handles core business logic. It writes orders to the Postgres database and produces raw order events into the Kafka cluster. It also consumes processed inventory events from Kafka to update the frontend.
- Postgres: The primary transactional database.
- Kafka Cluster: The central message bus for all real-time events.
- Apache Flink Cluster: A stream processing platform that runs applications for order processing and inventory management in real time.
- Tooling (Kpow and Flex): Provides monitoring and management capabilities for the Kafka and Flink clusters, respectively.
The Quarkus API serves as the central backend, handling synchronous user-facing interactions and asynchronous event processing. While it exposes multiple REST endpoints for actions like viewing products and managing the shopping cart, its most critical functions revolve around the checkout process and real-time data synchronization.
- Checkout and Inventory Event Logic: The `/checkout` endpoint orchestrates the core order processing flow. After persisting the order to the PostgreSQL database, it employs conditional logic based on a `useCdc` flag from the frontend (see the sketch after this list):
  - When `useCdc` is `false`: The API is directly responsible for triggering inventory updates. It iterates through the order's items and uses the `orderEventEmitter` to produce an event for each item onto a Kafka topic. These specific events are consumed downstream by the Flink Inventory Job to calculate inventory changes in real time.
  - When `useCdc` is `true`: This direct event publishing is skipped. The system instead relies on the database transaction being captured by the Flink CDC Job to drive the inventory workflow.
- Real-time Product State Management: The API uses two distinct Kafka consumer technologies that work together to build a complete, real-time view of product data.
  - The Kafka Streams application consumes events from the `inventory-events` topic. It processes these partial updates (e.g., stock level changes) and merges them into the full product objects stored in the cache.
  - A standard Kafka Consumer (`ProductConsumer`) subscribes to the `products` topic, which contains full product details (e.g., name, description, initial price).
- Unified Cache and WebSocket Updates: Both the Kafka Streams app and the Kafka Consumer, upon receiving their respective events, call a central update function. This unified logic ensures data consistency by performing two actions:
  - Updates the internal cache: It modifies the in-memory state store managed by the `ProductCacheService`. This ensures that REST API calls always retrieve the most current product data with low latency.
  - Pushes to the frontend: It sends the complete, updated product object to all connected clients via the `EcommerceWebsocket`, ensuring the user interface reflects every change in real time.
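To make the conditional checkout flow above concrete, here is a minimal sketch assuming a JAX-RS resource and SmallRye Reactive Messaging. Only the `useCdc` flag and the `orderEventEmitter` come from the description above; `CheckoutResource`, `OrderService`, `CheckoutRequest`, `Order`, and `OrderItem` are illustrative names, not the actual classes in the repository.

```java
import jakarta.inject.Inject;
import jakarta.transaction.Transactional;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.core.Response;
import org.eclipse.microprofile.reactive.messaging.Channel;
import org.eclipse.microprofile.reactive.messaging.Emitter;

@Path("/checkout")
public class CheckoutResource {

    @Inject
    OrderService orderService;              // persists the order to PostgreSQL (assumed helper)

    @Inject
    @Channel("order-events")
    Emitter<String> orderEventEmitter;      // produces raw order events to Kafka

    @POST
    @Transactional
    public Response checkout(CheckoutRequest request) {
        // 1. Persist the order; with CDC enabled this INSERT alone drives the workflow.
        Order order = orderService.persist(request);

        // 2. Without CDC, emit one inventory-deduction event per item for the Flink job.
        if (!request.isUseCdc()) {
            for (OrderItem item : order.getItems()) {
                String event = String.format(
                        "{\"productId\":\"%s\",\"quantity\":%d,\"orderId\":\"%s\"}",
                        item.getProductId(), item.getQuantity(), order.getId());
                orderEventEmitter.send(event);
            }
        }
        return Response.ok(order).build();
    }
}
```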
The stream processing layer of the architecture is composed of two distinct and composable Flink jobs that work in tandem to process order data and manage product inventory in real-time.
This is the core stateful processing job that maintains a real-time view of product inventory. It is a sophisticated application that demonstrates several key Flink patterns.
- Purpose: To consume product catalog updates and real-time order events, calculate inventory levels, generate alerts, and publish enriched data streams for use by other services.
- Key Patterns Implemented:
  - Pattern 01: Hybrid Source for State Bootstrapping: The product data stream is created using a `HybridSource`. This powerful pattern first reads a file (`initial-products.json`) to bootstrap the job with the complete product catalog, and then seamlessly switches to reading from a Kafka topic (`product-updates`) for continuous, real-time updates.
  - Pattern 02: Co-Processing Multiple Streams: The job's core logic uses a `CoProcessFunction` to `connect` two distinct streams: the product stream (created by the Hybrid Source) and the order stream (from the `order-events` topic). This allows for unified logic and state management across different event types.
  - Pattern 03: Shared Keyed State: The `CoProcessFunction` maintains `ValueState` keyed by `productId`. This ensures that both product updates and order deductions modify the same, consistent state for any given product.
  - Pattern 04: Timers for Event Generation: The application uses Flink's built-in timer mechanism to actively monitor the state of each product. The `CoProcessFunction` registers a processing-time timer for each `productId` that is set to fire at a future point in time (e.g., one hour later). Crucially, whenever a new event arrives for that product (either a catalog update or an order deduction), the existing timer is cancelled and a new one is registered. This action effectively resets the "staleness" clock. If no events arrive for that product within the configured timeout period, the timer will fire, triggering the `onTimer` callback method. This callback then generates and emits a specific `STALE_INVENTORY` event, proactively signaling that the product's data might be out of date or require investigation.
  - Pattern 05: Side Outputs: To separate concerns, the job routes different types of business alerts (`LOW_STOCK`, `OUT_OF_STOCK`, `PRICE_DROP`) away from the main data flow into dedicated side outputs. These are later unioned and sent to a specific alerts topic.
  - Pattern 06: Data Validation & Canonicalization: The job consumes raw product data, parses it into a clean `Product` object, and sinks it to a canonical `products` topic for other microservices to use.
- Output Streams: The job produces several distinct output streams to different Kafka topics (a sketch of the co-processing core follows this list):
  - `products`: Enriched, clean product data for general consumption.
  - `inventory-events`: The main stream of events representing every change in inventory or price.
  - `inventory-alerts`: A dedicated stream for all generated alerts.
  - `websocket_fanout`: A copy of the inventory events, intended for real-time UI updates.
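The co-processing core (Patterns 02 through 05) is easiest to grasp in code. The sketch below is not the workshop's actual job: it uses Flink's `KeyedCoProcessFunction` (the keyed variant of `CoProcessFunction`), and the `ProductUpdate`, `OrderEvent`, and `InventoryEvent` types as well as the alert thresholds are illustrative assumptions.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class InventoryCoProcessSketch
        extends KeyedCoProcessFunction<String, InventoryCoProcessSketch.ProductUpdate,
                InventoryCoProcessSketch.OrderEvent, InventoryCoProcessSketch.InventoryEvent> {

    /** Illustrative placeholder event types (the real job uses richer models). */
    public static class ProductUpdate { public String productId; public int inventory; }
    public static class OrderEvent   { public String productId; public int quantity; }
    public static class InventoryEvent {
        public String productId; public String type; public int inventory;
        static InventoryEvent of(String id, String type, int inv) {
            InventoryEvent e = new InventoryEvent();
            e.productId = id; e.type = type; e.inventory = inv;
            return e;
        }
    }

    // Pattern 05: side output for business alerts, later unioned into inventory-alerts.
    public static final OutputTag<InventoryEvent> ALERTS =
            new OutputTag<InventoryEvent>("inventory-alerts") {};

    private static final long STALE_TIMEOUT_MS = 60 * 60 * 1000L; // one hour

    private transient ValueState<Integer> stock;    // Pattern 03: state shared by both streams
    private transient ValueState<Long> staleTimer;  // timestamp of the currently registered timer

    @Override
    public void open(Configuration parameters) {
        stock = getRuntimeContext().getState(new ValueStateDescriptor<>("stock", Integer.class));
        staleTimer = getRuntimeContext().getState(new ValueStateDescriptor<>("staleTimer", Long.class));
    }

    // Stream 1: product catalog updates (from the Hybrid Source).
    @Override
    public void processElement1(ProductUpdate update, Context ctx, Collector<InventoryEvent> out)
            throws Exception {
        stock.update(update.inventory);
        resetStaleTimer(ctx);   // Pattern 04: restart the staleness clock
        out.collect(InventoryEvent.of(update.productId, "INVENTORY_UPDATED", update.inventory));
    }

    // Stream 2: order events deducting sold quantities.
    @Override
    public void processElement2(OrderEvent order, Context ctx, Collector<InventoryEvent> out)
            throws Exception {
        int current = stock.value() == null ? 0 : stock.value();
        int remaining = current - order.quantity;
        stock.update(remaining);
        resetStaleTimer(ctx);

        if (remaining <= 0) {   // alert thresholds are assumptions
            ctx.output(ALERTS, InventoryEvent.of(order.productId, "OUT_OF_STOCK", remaining));
        } else if (remaining < 5) {
            ctx.output(ALERTS, InventoryEvent.of(order.productId, "LOW_STOCK", remaining));
        }
        out.collect(InventoryEvent.of(order.productId, "INVENTORY_UPDATED", remaining));
    }

    // Pattern 04: no event arrived within the timeout, so flag the product as stale.
    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<InventoryEvent> out)
            throws Exception {
        int current = stock.value() == null ? 0 : stock.value();
        out.collect(InventoryEvent.of(ctx.getCurrentKey(), "STALE_INVENTORY", current));
    }

    private void resetStaleTimer(Context ctx) throws Exception {
        Long previous = staleTimer.value();
        if (previous != null) {
            ctx.timerService().deleteProcessingTimeTimer(previous);   // cancel the old timer
        }
        long fireAt = ctx.timerService().currentProcessingTime() + STALE_TIMEOUT_MS;
        ctx.timerService().registerProcessingTimeTimer(fireAt);       // register a fresh one
        staleTimer.update(fireAt);
    }
}
```

In a job like this, the two input streams would be keyed by `productId` and connected before the function is applied, for example `productStream.connect(orderStream).keyBy(p -> p.productId, o -> o.productId).process(new InventoryCoProcessSketch())`.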
This job acts as a real-time data pipeline that bridges the transactional database with the event-streaming ecosystem using Change Data Capture (CDC).
- Purpose: Its sole responsibility is to capture `INSERT` events from the `order_items` table in PostgreSQL as they happen.
- Process Flow:
  - Capture: It uses the Flink CDC Source, backed by Debezium, to non-intrusively read the PostgreSQL Write-Ahead Log (WAL).
  - Filter: The raw stream of database changes is immediately filtered to isolate only the creation of new rows (`op: "c"`) in the `order_items` table.
  - Transform: Each captured event is transformed by a `MapFunction` into a simple, clean JSON message containing only the essential fields: `productId`, `quantity`, `orderId`, and `timestamp`.
  - Publish: The clean JSON events are published to the `order-events` Kafka topic (see the sketch below).
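As a rough illustration of the Filter, Transform, and Publish steps, the sketch below assumes the CDC source has already been created and yields a `DataStream<String>` of Debezium change-event JSON. The column names under `after` (`product_id`, `quantity`, `order_id`), the use of Jackson, and the bootstrap address parameter are assumptions; the actual job may differ.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.datastream.DataStream;

public class OrderCdcPipelineSketch {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Keep only INSERTs and reduce the Debezium envelope to the four fields Flink needs. */
    public static DataStream<String> toOrderEvents(DataStream<String> cdcStream) {
        return cdcStream
                // Filter: only row creations ("op": "c").
                .filter(json -> "c".equals(MAPPER.readTree(json).path("op").asText()))
                // Transform: build the clean order event.
                .map(json -> {
                    JsonNode after = MAPPER.readTree(json).path("after");
                    ObjectNode event = MAPPER.createObjectNode();
                    event.put("productId", after.path("product_id").asText());
                    event.put("quantity", after.path("quantity").asInt());
                    event.put("orderId", after.path("order_id").asText());
                    event.put("timestamp", System.currentTimeMillis());
                    return MAPPER.writeValueAsString(event);
                });
    }

    /** Publish: sink the cleaned events to the order-events topic. */
    public static KafkaSink<String> orderEventsSink(String bootstrapServers) {
        return KafkaSink.<String>builder()
                .setBootstrapServers(bootstrapServers)
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("order-events")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();
    }
}
```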
Ensure students have:
- Docker Desktop installed and running
- Node.js: Version `18.x` or higher.
- Python: Version `3.10` or higher.
  - Note: it must include `pip` (package installer) and `venv` (virtual environment support).
- 8GB RAM minimum (16GB recommended)
- Internet connection (for downloading dependencies)
- macOS or Linux (Windows with WSL2 works too)
Run this script ONCE before the training:
```bash
./setup-environment.sh
```
This installs:
- ✅ SDKMAN (Java version manager)
- ✅ Java 11 (for building Flink jobs)
- ✅ Java 17 (for running Quarkus)
- ✅ Verifies Docker Desktop
- ✅ Verifies Node.js 18+
- ✅ Verifies Python 3.10+
Estimated time: 5-10 minutes
Expected output:
```
╔══════════════════════════════════════════════════════════════╗
║              ✨ Setup Completed Successfully! ✨               ║
╚══════════════════════════════════════════════════════════════╝

Verified & Installed Components:
  ✓ Docker: Docker version 24.x
  ✓ Node.js: v20.x.x
  ✓ npm: 10.x.x
  ✓ Python: Python 3.12.3
  ✓ SDKMAN: 5.x.x
  ✓ Java 11: 11.0.25-tem
  ✓ Java 17: 17.0.13-tem
```
For this workshop, we will set up the core data infrastructure for our application using Instaclustr's managed platform. Your tasks are separated into two parts: creating a Kafka cluster and connecting to a pre-provisioned Postgres database.
⏩ Skip this section if you are using local Kafka and PostgreSQL instances.
Your first task is to provision a new Apache Kafka cluster. This will serve as the central message bus for all real-time events in our application.
- Please follow the official Instaclustr guide to create your cluster: Creating an Apache Kafka Cluster.
Once the cluster is provisioned and running, take note of your connection details (especially the Broker Addresses and credentials), as you will need them for the upcoming steps.
A PostgreSQL database, pre-configured with logical replication for Flink CDC, has already been provisioned for the workshop.
- You do not need to create this yourself.
- Your workshop instructor will provide you with the necessary connection details (host, port, database name, user, and password).
In this workshop, we'll use Kpow and Flex to monitor Apache Kafka and Apache Flink clusters. Both tools are covered by a single, free Community license that you need to activate.
Please follow these two simple steps:
- Generate Your License: Go to the Factor House Getting Started page to generate your personal license.
- Create a `license.env` File: Once generated, copy the license environment variables and paste them into a new file named `license.env`. You can use `license.env.example` as a reference for the correct format.
Get the core KartShoppe platform running without any Flink jobs.
```
┌───────────────────────────────────────────────────────────────┐
│              PLATFORM ARCHITECTURE (No Flink Yet)              │
├───────────────────────────────────────────────────────────────┤
│                                                                │
│   Frontend (React) ←→ Quinoa ←→ Quarkus API                    │
│                                     │                          │
│                                WebSockets                      │
│                                     │                          │
│                                   Kafka                        │
│                                                                │
└───────────────────────────────────────────────────────────────┘
```
Here are steps to start and stop the training platform depending on which instances (Instaclustr or local) are used for the workshop.
π Running with Instaclustr Managed Services
First, we need to tell our application how to connect to your newly created Kafka cluster and the provided PostgreSQL database.
- Open the `.env` file in the project root.
- Using the connection details from your Instaclustr Kafka cluster and the provided PostgreSQL credentials, fill in the required values.
- You can use the `.env.remote` file as a template to see which variables are needed.
The Quarkus backend API needs a specific configuration file to connect to remote services. This command applies the configuration required for the application to work with Instaclustr services.
```bash
# This replaces the default properties with the one configured for remote connections
cp quarkus-api/src/main/resources/application.properties.remote \
   quarkus-api/src/main/resources/application.properties
```
Our Flink jobs need specific topics in Kafka to read from and write to. The following script will set up a Python virtual environment and create them for you.
```bash
# Create and activate a Python virtual environment
python3 -m venv venv
source venv/bin/activate

# Install required Python packages
pip install -r scripts/requirements.txt

# Run the script to create topics on your Instaclustr Kafka cluster
python scripts/manage_topics.py --action create
```
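If you are curious what topic creation involves, here is a hedged Java equivalent of the Python script using Kafka's `AdminClient`. The broker address, partition and replication settings are assumptions (an Instaclustr cluster additionally needs the SASL/TLS client properties from your `.env` file), and the workshop script may create more topics than the ones highlighted later in the monitoring section.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateWorkshopTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumption: local broker without auth; add SASL/TLS properties for Instaclustr.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // The topics highlighted in the monitoring section; the Python script may create more.
        List<NewTopic> topics = List.of(
                        "products", "product-updates", "order-events",
                        "inventory-events", "inventory-alerts", "websocket_fanout")
                .stream()
                .map(name -> new NewTopic(name, 3, (short) 1)) // partitions/replication assumed
                .toList();

        try (AdminClient admin = AdminClient.create(props)) {
            admin.createTopics(topics).all().get(); // block until the topics exist
        }
    }
}
```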
Next, we'll run a script to create the `orders` and `order_items` tables in your provided PostgreSQL database.
```bash
# Ensure your virtual environment is still active
source venv/bin/activate

# Run the script to create the necessary tables
python scripts/manage_db.py --action init
```
This step launches all the moving parts of our system. The `start-platform-remote.sh` script performs two key actions:
- Starts Local Services in Docker: It launches the monitoring tools (Kpow, Flex) and the Flink cluster (JobManager, TaskManager) as Docker containers.
- Starts Applications Locally: It runs the Quarkus backend API and the frontend application directly on your machine.
```bash
./start-platform-remote.sh
```
✅ Verification: Once everything is running, check the following URLs:
- KartShoppe App: http://localhost:8081
- Kpow for Kafka: http://localhost:4000
- Flex for Flink: http://localhost:5000
Note: The product page in the KartShoppe UI will be empty. This is expected! Our Flink job, which is responsible for populating the product data, isn't running yet.
Now, let's deploy our main Flink job. This job reads the initial product data, calculates inventory, and publishes the results to Kafka, which the UI is listening to.
```bash
./flink-inventory-with-orders-job.sh
```
✅ Verification: After a few moments, refresh the KartShoppe UI. You should now see the products populated!
Try it out! You can now experiment with adding items to your cart and completing a purchase (leave the "Use Flink CDC" box unchecked for now). Watch the inventory levels change in real time.
Finally, deploy the second Flink job. This job uses Change Data Capture (CDC) to read order data directly from the database transaction log instead of from a direct Kafka event.
```bash
./flink-order-cdc-job.sh
```
✅ Verification: To test this new data path, add items to your cart and proceed to checkout.
Important: On the checkout page, be sure to check the Use Flink CDC box before completing the purchase.
When you are finished with your session, you can shut down all the local components by running:
```bash
./stop-platform-remote.sh
```
This script will stop the local Docker containers and terminate the Quarkus and frontend processes. It will not affect your remote Instaclustr services.
π Running with Local Instances
First, we need to tell our application how to connect to the Kafka cluster and PostgreSQL database running in Docker.
```bash
cp .env.local .env
```
The Quarkus backend API needs a specific configuration file to connect to the local instances. This command applies the configuration required for the local setup.
```bash
# This replaces the default properties with the one configured for local connections
cp quarkus-api/src/main/resources/application.properties.local \
   quarkus-api/src/main/resources/application.properties
```
This step launches all the moving parts of our system. The `start-platform-local.sh` script performs two key actions:
- Starts Local Infrastructure in Docker: It launches the entire local data platform as Docker containers. This includes:
- A Kafka cluster (using Redpanda)
- A PostgreSQL database
- Monitoring tools (Kpow, Flex)
- A Flink cluster (JobManager, TaskManager)
- Starts Applications Locally: It runs the Quarkus backend API and the frontend application directly on your machine.
```bash
./start-platform-local.sh
```
✅ Verification: Once everything is running, check the following URLs:
- KartShoppe App: http://localhost:8081
- Kpow for Kafka: http://localhost:4000
- Flex for Flink: http://localhost:5000
Note: The product page in the KartShoppe UI will be empty. This is expected! Our Flink job, which is responsible for populating the product data, isn't running yet.
Now, let's deploy our main Flink job. This job reads the initial product data, calculates inventory, and publishes the results to Kafka, which the UI is listening to.
```bash
./flink-inventory-with-orders-job.sh
```
✅ Verification: After a few moments, refresh the KartShoppe UI. You should now see the products populated!
Try it out! You can now experiment with adding items to your cart and completing a purchase (leave the "Use Flink CDC" box unchecked for now). Watch the inventory levels change in real time.
Finally, deploy the second Flink job. This job uses Change Data Capture (CDC) to read order data directly from the database transaction log instead of from a direct Kafka event.
```bash
./flink-order-cdc-job.sh
```
✅ Verification: To test this new data path, add items to your cart and proceed to checkout.
Important: On the checkout page, be sure to check the Use Flink CDC box before completing the purchase.
When you are finished with your session, you can shut down all the local components by running:
```bash
./stop-platform-local.sh
```
This script will stop the local Docker containers and terminate the Quarkus and frontend processes.
- Add 10 items of the same product to your cart
- Watch the inventory count decrease in real-time
- Check Flink dashboard to see events processed
- View Kafka topic to see inventory update messages
Kpow provides a powerful window into the Kafka cluster, allowing for the inspection of topics, tracing of messages, and production of new data.
➡️ Kpow is accessible at: http://localhost:4000.
First, it is important to understand the topics that drive the application. In the Kpow UI, navigate to the Topics view. While tens of topics are present, these are the most critical for the workshop:
- `products`: The canonical topic with the clean, validated state of all products.
- `product-updates`: The input topic for raw, real-time changes to the product catalog.
- `order-events`: Carries commands to deduct product quantity from inventory after a sale.
- `inventory-events`: A detailed audit log of every state change (e.g., price, inventory) for a product.
- `inventory-alerts`: A filtered stream for business-critical notifications like "low stock."
To see the initial product catalog that was loaded by the Flink job, inspect the `products` topic directly from the topic list: locate the `products` topic, click the menu icon on the left-hand side of its row, and select Inspect data.
This opens the Data > Inspect page with the `products` topic already pre-selected. Simply click the Search button to view the messages.
This section demonstrates how to trace a single purchase and observe the corresponding events in Kafka.
After a purchase of three units of the FutureTech UltraBook Pro 15 (PROD_0001), two new records are expected: one in order-events (the command) and one in inventory-events (the result).
In Kpow's Data Inspect view:
- Select both the `order-events` and `inventory-events` topics.
- Use the kJQ Filter to find records related to the specific product. This helps to filter out irrelevant data. Enter the following filter: `.value.productId == "PROD_0001"`
- Click Search.
The two new records related to the purchase will be displayed, showing the command to deduct inventory and the resulting event confirming the new stock level.
External events, like a new stock delivery, can also be simulated. This is done by producing a message directly to the product-updates topic. The following steps reset the inventory for the product back to 10.
- In Kpow, navigate to Data > Produce.
- Select the `product-updates` topic.
- Paste the following JSON into the Value field:

  ```json
  {
    "productId": "PROD_0001",
    "name": "FutureTech UltraBook Pro 15",
    "description": "High-performance laptop with Intel i9, 32GB RAM, 1TB SSD. Premium quality from FutureTech.",
    "category": "Electronics",
    "brand": "FutureTech",
    "price": 1899.99,
    "inventory": 10,
    "imageUrl": "https://picsum.photos/400/300?random=1",
    "tags": ["laptop", "computer", "productivity", "futuretech", "electronics"],
    "rating": 4.5,
    "reviewCount": 49
  }
  ```

- Click Produce.
✅ Verification: Once the message is produced, the Flink job will process it. After refreshing the KartShoppe UI, the product's inventory will be updated to 10.
Flex provides deep insights into the Flink cluster, showing job health, metrics, and the dataflow topology.
➡️ Flex is accessible at: http://localhost:5000.
The main dashboard gives a high-level overview of the Flink cluster, including the two jobs deployed for this workshop.
The dataflow graph for any job can be inspected to understand how data moves through the pipeline.
- Navigate to Jobs > Inspect.
- Select the `Inventory Management Job`.
Flex makes it easy to access the logs from Flink's JobManager and TaskManagers, which is essential for debugging. The logs can be checked to see the output from the Flink apps or to look for any potential errors.
- To view the JobManager logs, navigate to Job Manager > Logs.
- For TaskManager logs, navigate to Task Managers, select a specific manager, and then go to Inspect > Logs.
A running Flink job can be stopped at any time from the Flex UI.
To do this, navigate to Jobs > Inspect for the specific job and click the Cancel button in the top action bar.
- Show, then explain: Start each Flink job, show the visible change, THEN explain the concepts
- Use the UI constantly: Keep the browser open, refresh often to show real-time updates
- Monitor the Flink jobs in Flex: Show the job graph, metrics, and backpressure (http://localhost:5000)
- Explore Kafka topics: Use Kpow to see messages flowing (http://localhost:4000)
- Explain incrementally: Each module builds on the previous - don't overwhelm with all patterns at once
- Follow along: Run commands in your own terminal
- Experiment: Try breaking things! Stop a Flink job, see what happens, restart it
- Ask questions: Why does this pattern need keyed state? Why use broadcast state here?
- Monitor everything: Open all dashboards (Flex, Kpow, Quarkus Dev UI)
- Read the logs: `tail -f logs/quarkus.log` and the Flink job logs show what's happening
```bash
# Check Docker
docker info

# Check Java version
java -version   # Should be 17+
# if not
sdk use java 17.0.13-tem

# View logs
tail -f logs/quarkus.log
docker compose logs -f
```

```bash
# Switch to Java 11 for building Flink jobs
sdk use java 11.0.25-tem

# Rebuild
./gradlew :flink-inventory:clean :flink-inventory:shadowJar

# Check logs
tail -f logs/inventory.log
```

```bash
# Kill process on port 8081 (Quarkus)
lsof -ti:8081 | xargs kill -9

# Kill process on port 8082 (Flink, if running)
lsof -ti:8082 | xargs kill -9
```

```bash
## Stop everything
# remote
./stop-platform-remote.sh
# local
./stop-platform-local.sh

## Clean Docker volumes
# remote
docker compose -f compose-remote.yml down -v
# local
docker compose -f compose-local.yml down -v

## Clean Gradle cache
./gradlew clean
# or
./clean.sh

## Restart
# remote
./start-platform-remote.sh
# local
./start-platform-local.sh
```