Watchtower is a project designed to monitor S3 file events for further AI processing through Doc-Search. The service listens for new files created in or uploaded to cloud storage, downloads them, and processes them: it extracts text content, builds and stores a knowledge graph of content entities, and stores the result in the Doc-Search service.
The following domains are defined:
domain
|----> File Storage (core)
|       |----> Bucket
|       |       |----> Context: bucket management in S3 object storage
|       |       |----> Services: IBucketManager
|       |----> Object
|               |----> Context: object management in S3 object storage
|               |----> Services: IObjectManager
|
|----> Task Processing (support)
|       |----> Task
|       |       |----> Context: task management in storage
|       |       |----> Services: ITaskStorage
|       |----> Message
|               |----> Context: task queue management
|               |----> Services: ITaskQueue
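The four service names in the tree can be read as interfaces that the rest of the code depends on. The following is a minimal sketch in Python, not the project's actual code: every method name and signature here, and the `InMemoryFileStorage` and `store_file` helpers, are assumptions made for illustration only.

```python
from typing import Optional, Protocol


class IBucketManager(Protocol):
    """Bucket management in S3 object storage (method names are assumed)."""
    def create_bucket(self, name: str) -> None: ...
    def list_buckets(self) -> list[str]: ...


class IObjectManager(Protocol):
    """Object management in S3 object storage (method names are assumed)."""
    def put_object(self, bucket: str, key: str, data: bytes) -> None: ...
    def get_object(self, bucket: str, key: str) -> bytes: ...


class ITaskStorage(Protocol):
    """Task management in storage (method names are assumed)."""
    def save_status(self, task_id: str, status: str) -> None: ...
    def load_status(self, task_id: str) -> Optional[str]: ...


class ITaskQueue(Protocol):
    """Task queue management (method names are assumed)."""
    def publish(self, task_id: str) -> None: ...
    def consume(self) -> Optional[str]: ...


class InMemoryFileStorage:
    """Hypothetical stand-in that satisfies IBucketManager and IObjectManager."""

    def __init__(self) -> None:
        self._buckets: dict[str, dict[str, bytes]] = {}

    def create_bucket(self, name: str) -> None:
        # Creating an existing bucket is a no-op in this sketch.
        self._buckets.setdefault(name, {})

    def list_buckets(self) -> list[str]:
        return sorted(self._buckets)

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        self._buckets[bucket][key] = data

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]


def store_file(buckets: IBucketManager, objects: IObjectManager,
               bucket: str, key: str, data: bytes) -> None:
    """Depends only on the interfaces, so a real S3 client can be swapped in."""
    buckets.create_bucket(bucket)
    objects.put_object(bucket, key, data)
```

Because callers such as `store_file` only see the interfaces, the same code could run against an in-memory double in tests and against an S3-backed implementation in production.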
The use cases are:
usecase
|----> Storage Use Case
|       |----> CRUD of buckets and objects
|       |----> generate a share URL for a stored object
|       |----> upload a file to storage and create a new task processing event
|
|----> Task Use Case
|       |----> task management in storage and queue
|
|----> Orchestrator (process)
|       |----> combines both use cases into a common file upload and processing pipeline
|       |----> task processing stages such as recognizing and indexing uploaded files
Context map:
           +----------------+
           |  Orchestrator  |
           +--------+-------+
                    |
        ┌───────────┴───────────┐
        ▼                       ▼
+----------------+       +-------------+
| StorageUseCase |       | TaskUseCase |
+----------------+       +-------------+
        |                       |
        ▼                       ▼
+----------------+       +-------------+
| Storage Domain |       | Task Domain |
+----------------+       +-------------+
Context data flow:
HTTP Request
    │
    ▼
HTTP Handler (ServerState)
    │
    ▼
Orchestrator (orchestrator)
    ├── StorageUseCase (application)
    │        │
    │        ▼
    │   Storage (domain)
    │
    └── TaskUseCase (application)
             │
             ▼
        Task (domain)
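The flow above can be sketched as a handler that calls the orchestrator, which in turn calls the two use cases. This is an illustrative Python sketch under stated assumptions, not the service's implementation: the class names follow the diagram, while the method names, the `"pending"` status, and the in-memory dict and deque (standing in for the storage and task domains) are hypothetical.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StorageUseCase:
    # Stand-in for the storage domain: (bucket, key) -> file bytes.
    objects: dict = field(default_factory=dict)

    def upload(self, bucket: str, key: str, data: bytes) -> None:
        self.objects[(bucket, key)] = data


@dataclass
class TaskUseCase:
    # Stand-ins for the task domain: task storage and task queue.
    statuses: dict = field(default_factory=dict)
    queue: deque = field(default_factory=deque)

    def create_task(self, bucket: str, key: str) -> str:
        task_id = str(uuid.uuid4())
        self.statuses[task_id] = "pending"  # assumed initial status
        self.queue.append((task_id, bucket, key))
        return task_id


@dataclass
class Orchestrator:
    storage: StorageUseCase
    tasks: TaskUseCase

    def upload_and_process(self, bucket: str, key: str, data: bytes) -> str:
        # Store the file first, then register the processing task for it.
        self.storage.upload(bucket, key, data)
        return self.tasks.create_task(bucket, key)


def handle_upload(orchestrator: Orchestrator, bucket: str, key: str,
                  body: bytes) -> dict:
    """Hypothetical HTTP handler body: delegates to the orchestrator only."""
    task_id = orchestrator.upload_and_process(bucket, key, body)
    return {"task_id": task_id, "status": orchestrator.tasks.statuses[task_id]}
```

Keeping the handler unaware of the use cases, and the use cases unaware of each other, mirrors the layering in the context map: only the orchestrator knows that an upload is followed by task creation.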
Features:
- Event-based tasks - a new processing event is created for each uploaded file;
- Task management - RabbitMQ and Redis are used to manage processing tasks;
- Text extraction - text is extracted from PDF, DOCX, and TXT files using OCR and an LLM;
- Document storing - document objects are stored in the Doc-Search service;
- Embeddings computing (removed) - computing embeddings of file text content with a pre-trained model for semantic search;
- Stateless, scalable architecture - the service is stateless, which is guaranteed by the RabbitMQ and Redis services.
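One way to picture the stateless design is a worker that holds nothing between calls: it takes a task from a shared queue, walks it through the processing stages, and writes each status change to a shared store. The Python sketch below assumes this reading. A `deque` stands in for RabbitMQ and a `dict` for Redis; the function names, the `"done"` and `"failed"` statuses, and the plain-text extractor are hypothetical, and the OCR and LLM extraction for PDF and DOCX is deliberately left out.

```python
from collections import deque
from typing import Callable, Optional


def extract_plain_text(key: str, data: bytes) -> str:
    """Hypothetical extractor: handles only .txt; PDF/DOCX are out of scope here."""
    if key.lower().endswith(".txt"):
        return data.decode("utf-8")
    raise ValueError(f"no extractor configured for {key}")


def process_next(queue: deque, statuses: dict, files: dict,
                 extract_text: Callable[[str, bytes], str],
                 store_document: Callable[[str, str], None]) -> Optional[str]:
    """Process one queued task; all state lives in the shared queue and store."""
    if not queue:
        return None
    task_id, key = queue.popleft()
    try:
        statuses[task_id] = "recognizing"  # stage: extract text from the file
        text = extract_text(key, files[key])
        statuses[task_id] = "indexing"     # stage: hand the document to the index
        store_document(key, text)
        statuses[task_id] = "done"         # assumed terminal status
    except Exception:
        statuses[task_id] = "failed"       # assumed terminal status
    return task_id


# Usage: any number of workers can call process_next on the same shared state.
queue = deque([("t1", "notes.txt"), ("t2", "scan.pdf")])
statuses = {"t1": "pending", "t2": "pending"}
files = {"notes.txt": b"hello", "scan.pdf": b"%PDF"}
index: dict = {}

while process_next(queue, statuses, files, extract_plain_text,
                   index.__setitem__) is not None:
    pass
```

Since the worker function keeps no state of its own, adding capacity in this sketch amounts to running more callers against the same queue and status store.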
Installation:
- Clone the repository:
    git clone <repository-url>/watchtower.git
    cd watchtower
- Build the Docker image from sources:
    docker build -t watchtower:latest .
- Edit the config file configs/production.toml or the .env file used to launch the Docker Compose services.
- Start the application using Docker Compose:
    docker compose up -d watchtower <other-needed-services>
- The application should now be running. Check the logs with:
    docker compose logs -f
