
Commit b8da637

update readme
1 parent 4750e63 commit b8da637

File tree

3 files changed (+28, −7 lines)

README.md

Lines changed: 28 additions & 7 deletions
@@ -116,17 +116,17 @@ It's ideal for scenarios involving **event-driven architectures, microservices c
 
 <br>
 
-This stack builds a comprehensive analytics platform that erases the line between real-time stream analytics and large-scale batch processing. It achieves this by combining the power of Apache Flink and Apache Spark on a unified data lakehouse, enabling you to work with a single source of truth for all your data workloads.
+This stack builds a **comprehensive analytics platform** that erases the line between real-time stream analytics and large-scale batch processing. It achieves this by combining the power of **Apache Flink**, enhanced by [**Flex**](https://factorhouse.io/flex) for enterprise-grade management and monitoring, with **Apache Spark** on a unified data lakehouse, enabling you to work with a single source of truth for all your data workloads.
 
 ### 📌 Description
 
 This architecture is designed around a modern data lakehouse that serves both streaming and batch jobs from the same data. At its foundation, data is stored in Apache Iceberg tables on MinIO, an S3-compatible object store. This provides powerful features like ACID transactions, schema evolution, and time travel for your data.
 
-A central **Hive Metastore** acts as the unified catalog, or "brain," for the entire ecosystem. By using a robust **PostgreSQL** database as its backend, the metastore reliably tracks all table schemas and metadata. This central catalog allows both **Apache Flink** (for low-latency streaming) and **Apache Spark** (for batch ETL and interactive analytics) to discover, query, and write to the same tables seamlessly, eliminating data silos.
+A central **Hive Metastore** serves as a unified metadata catalog for the entire data ecosystem, providing essential information about the structure and location of datasets. By using a robust **PostgreSQL** database as its backend, the metastore reliably tracks all table schemas and metadata. This central catalog allows both **Apache Flink** (for low-latency streaming) and **Apache Spark** (for batch ETL and interactive analytics) to discover, query, and write to the same tables seamlessly, eliminating data silos.
 
 The role of PostgreSQL is twofold: in addition to providing a durable backend for the metastore, it is configured as a high-performance transactional database ready for **Change Data Capture (CDC)**. This design allows you to stream every `INSERT`, `UPDATE`, and `DELETE` from your operational data directly into the lakehouse, keeping it perfectly synchronized in near real-time.
 
-The platform is rounded out by enterprise-grade tooling: **Flex** simplifies Flink management and monitoring, a **Flink SQL Gateway** enables interactive queries on live data streams, and a full **Spark cluster** supports complex data transformations. This integrated environment is ideal for building sophisticated solutions for fraud detection, operational intelligence, and unified business analytics.
+The platform is rounded out by enterprise-grade tooling: **Flex** simplifies Flink management and monitoring, a **Flink SQL Gateway** enables interactive queries on live data streams, and a single-node **Spark cluster** supports complex data transformations. This integrated environment is ideal for building sophisticated solutions for fraud detection, operational intelligence, and unified business analytics.
 
 ---
 
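To make the shared-catalog workflow above concrete, here is a minimal sketch of an interactive query issued through the Flink SQL Gateway's REST API. Everything specific in it is an assumption, not part of this repo's documented setup: the gateway exposed on its default port 8083, the Hive catalog configured as the gateway's default, a hypothetical `demo.orders` Iceberg table, and `jq` available on the host.

```bash
# Illustrative only: session-based query via the Flink SQL Gateway REST API (v1).
# Host, port, and the demo.orders table are assumptions; adjust to your deployment.

# 1. Open a session
SESSION=$(curl -s -X POST http://localhost:8083/v1/sessions \
  -H 'Content-Type: application/json' -d '{}' | jq -r '.sessionHandle')

# 2. Submit a SQL statement; the gateway returns an operation handle
OPERATION=$(curl -s -X POST "http://localhost:8083/v1/sessions/${SESSION}/statements" \
  -H 'Content-Type: application/json' \
  -d '{"statement": "SELECT count(*) FROM demo.orders"}' | jq -r '.operationHandle')

# 3. Fetch the first page of results
curl -s "http://localhost:8083/v1/sessions/${SESSION}/operations/${OPERATION}/result/0"
```

Because both engines resolve tables through the same metastore, the equivalent Spark query would see exactly the same data — which is the point of the unified catalog.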
@@ -135,7 +135,26 @@ The platform is rounded out by enterprise-grade tooling: **Flex** simplifies Fli
 #### 🚀 Flex (Enterprise Flink Runtime)
 
 - Container: **flex** (`factorhouse/flex:latest`, **enterprise**) or **flex-ce** (`factorhouse/flex-ce:latest`, **community**)
-- Provides an enterprise-ready tooling solution to streamline and simplify Apache Flink management. It gathers Flink resource information, offering custom telemetry, insights, and a rich data-oriented UI.
+- Provides an enterprise-ready tooling solution to streamline and simplify Apache Flink management. It gathers Flink resource information, offering custom telemetry, insights, and a rich data-oriented UI. Key features include:
+  - **Comprehensive Flink Monitoring & Insights:**
+    - Gathers Flink resource information minute-by-minute.
+    - Offers fully integrated metrics and telemetry.
+    - Provides access to long-term metrics and aggregated consumption/production data, from cluster-level down to individual job-level details.
+  - **Simplified Management for All User Groups:**
+    - User-friendly interface and intuitive controls.
+    - Aims to align business needs with Flink capabilities.
+  - **Enterprise-Grade Security & Governance:**
+    - **Versatile Authentication:** Supports DB, File, LDAP, SAML, OpenID, Okta, and Keycloak.
+    - **Robust Authorization:** Offers simple or fine-grained Role-Based Access Control (RBAC).
+    - **Data Policies:** Includes capabilities for masking and redaction of sensitive data (e.g., PII, credit card numbers).
+    - **Audit Logging:** Captures all user actions for comprehensive data governance.
+    - **Secure Deployments:** Supports HTTPS and is designed for air-gapped environments (all data remains local).
+  - **Powerful Flink Enhancements:**
+    - **Multi-tenancy:** Advanced capabilities to manage Flink resources effectively with control over visibility and usage.
+    - **Multi-Cluster Monitoring:** Manage and monitor multiple Flink clusters from a single installation.
+  - **Key Integrations:**
+    - **Prometheus:** Exposes endpoints for integration with preferred metrics and alerting systems.
+    - **Slack:** Allows user actions to be sent to an operations channel in real-time.
 - Exposes UI at `http://localhost:3001`
 
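Once the stack is up, a quick reachability probe can confirm the Flex UI is serving. This is only an HTTP check against the port documented above, not a Flex API call:

```bash
# Expect an HTTP status line (e.g. 200 or a redirect) from the Flex UI port.
curl -sI http://localhost:3001 | head -n 1
```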
 #### 🧠 Flink Cluster (Real-Time Engine)
 
@@ -322,11 +341,11 @@ cd factorhouse-local
 
 Core services like Flink, Spark, and Kafka Connect are designed to be modular and do not come bundled with the specific connectors and libraries needed to communicate with other systems like the Hive Metastore, Apache Iceberg, or S3.
 
-`setup-env.sh` automates the process of downloading all the required dependencies and organizing them into a local deps directory. When the services are started with docker-compose, this directory is mounted as a volume, injecting the libraries directly into each container's classpath.
+`setup-env.sh` automates the process of downloading all the required dependencies and organizing them into a local `deps` directory. When the services are started with Docker Compose, this directory is mounted as a volume, injecting the libraries directly into each container's classpath.
 
 <details>
 
-<summary><b>The following dependencies are downloaded.</b></summary>
+<summary><b>View all downloaded dependencies</b></summary>
 
 #### Kafka Connectors
 
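As a rough sketch of the bootstrap flow described above: `setup-env.sh` and the `deps` directory come from this README, while the compose file name is a placeholder for whichever stack you start.

```bash
./setup-env.sh                               # download connector/library JARs into ./deps
ls deps                                      # inspect the fetched dependencies
docker compose -f <compose-file>.yml up -d   # deps/ is volume-mounted into each container
```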
@@ -428,6 +447,8 @@ export KPOW_LICENSE=/home/<username>/.factorhouse/kpow-license.env
 docker compose -p kpow -f compose-kpow.yml up -d
 ```
 
+> By default, this deploys the Enterprise edition. See below for instructions on configuring the Community edition instead.
+
 <details>
 
 <summary>License file example</summary>
@@ -466,7 +487,7 @@ services:
 
 </details>
 
-## Running the Platform with Docker
+## Running the Platform
 
 To get the platform running, first configure your local environment: set the environment variables that select the edition you want to run (Community or Enterprise) and provide the file paths to your licenses. Once these prerequisites are set, launch the services with `docker compose`. You have two primary options: start all services (Kpow, Flex, and Pinot) together for a fully integrated experience, or run Kpow and Flex independently for more focused use cases. When you are finished, run the corresponding `down` command to stop and remove the containers, and unset the environment variables to clean up your session.
 
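A compact sketch of that lifecycle, using the Kpow commands shown earlier in this README. The license path is the README's own placeholder, and the `down` invocation mirrors the `up` one under standard Docker Compose conventions:

```bash
# Configure, launch, and later tear down (commands from this README):
export KPOW_LICENSE=/home/<username>/.factorhouse/kpow-license.env
docker compose -p kpow -f compose-kpow.yml up -d

# ...work with the platform...

docker compose -p kpow -f compose-kpow.yml down   # stop and remove the containers
unset KPOW_LICENSE                                # clean up the session
```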

images/factorhouse-local.png (binary image, −139 KB)

images/fh-local-labs.png (binary image, −263 KB)
