A FastAPI-based vector database system for document chunking, embedding, and semantic search using Cohere's embedding models.
VectorServer is a document processing and retrieval system that:
- Stores documents in a hierarchical structure (Libraries > Documents > Chunks)
- Automatically chunks long documents into semantically meaningful segments
- Generates vector embeddings using Cohere's embed-v4.0 model
- Provides semantic search capabilities across document collections
- Offers a RESTful API for document management and search
- 🟢 Define the Chunk, Document and Library classes.
- 🟢 Implement two or three indexing algorithms; do not use external libraries. (A minimal sketch of both indexes follows this checklist.)
  - 🟢 Exact kNN
    - Time complexity: O(N × D)
    - Space complexity: O(N)
    - Simplest to implement; exact results, and fast enough for small datasets.
  - 🟢 IVF
    - Time complexity:
      - Build time: O(I × N × K × D)
        - I: number of k-means iterations
        - N × K × D: each iteration computes distances from N vectors to K centroids
      - Search time: O(K × D + |P|)
        - Coarse search: O(K × D) to compute the distance from the query to each of the K centroids
        - Fine search: O(|P|) to return the labels stored under the nearest centroid, where |P| = average partition size ≈ N/K
    - Space complexity: O(N × D + K × D + N), where:
      - N = number of vectors
      - D = vector dimensionality
      - K = number of partitions/centroids
- 🟢 Implement the necessary data structures/algorithms to ensure that there are no data races between reads and writes to the database.
  - I've used `aiosqlite` to leverage FastAPI's async capabilities and prevent data races (see the sketch after this checklist). This isn't a very "custom" solution; previously I had implemented the `DB` class as a context manager that handled transactions manually. For SQLite that is a fine solution, but it doesn't make the most of FastAPI's async capabilities.
- 🟢 Create the logic to do the CRUD operations on libraries and documents/chunks.
  - Most DB operations implemented
- 🟢 Implement an API layer on top of that logic to let users interact with the vector database.
  - All endpoints for Libraries implemented
- 🟢 Create a Docker image for the project
  - Sufficient for development, but not for production
- 🟢 Metadata filtering
- 🟢 Persistence to Disk (indexes are currently not persisted to disk and must be rebuilt on each app start)
- 🔴 Leader-Follower Architecture
- 🔴 Python SDK Client
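The indexing item above is easiest to see in code. The sketch below is not the project's `FlatIndex` or IVF implementation; the class and method names are illustrative assumptions, and the IVF fine search here also ranks candidates inside the chosen partition, which adds an O(|P| × D) step on top of the O(|P|) label lookup described above.

```python
import math
import random

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class FlatIndex:
    """Exact kNN: scan every stored vector, O(N × D) per query."""

    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []  # (chunk_id, vector)

    def add(self, chunk_id: str, vector: list[float]) -> None:
        self.items.append((chunk_id, vector))

    def search(self, query: list[float], k: int = 5) -> list[tuple[float, str]]:
        scored = [(cosine_similarity(query, v), cid) for cid, v in self.items]
        return sorted(scored, reverse=True)[:k]

class IVFIndex:
    """IVF: spherical k-means partitions; search only the closest partition."""

    def __init__(self, n_partitions: int = 4, iterations: int = 10):
        self.k = n_partitions
        self.iterations = iterations
        self.centroids: list[list[float]] = []
        self.partitions: list[list[tuple[str, list[float]]]] = []

    def build(self, items: list[tuple[str, list[float]]]) -> None:
        # Assumes len(items) >= n_partitions. Build time is O(I × N × K × D).
        vectors = [v for _, v in items]
        self.centroids = random.sample(vectors, self.k)
        for _ in range(self.iterations):
            buckets: list[list[list[float]]] = [[] for _ in range(self.k)]
            for v in vectors:
                best = max(range(self.k), key=lambda i: cosine_similarity(v, self.centroids[i]))
                buckets[best].append(v)
            for i, bucket in enumerate(buckets):
                if bucket:  # recompute the centroid as the mean of its bucket
                    dim = len(bucket[0])
                    self.centroids[i] = [sum(v[d] for v in bucket) / len(bucket) for d in range(dim)]
        self.partitions = [[] for _ in range(self.k)]
        for cid, v in items:
            best = max(range(self.k), key=lambda i: cosine_similarity(v, self.centroids[i]))
            self.partitions[best].append((cid, v))

    def search(self, query: list[float], k: int = 5) -> list[tuple[float, str]]:
        # Coarse search over K centroids, then fine search over ≈ N/K vectors.
        best = max(range(self.k), key=lambda i: cosine_similarity(query, self.centroids[i]))
        scored = [(cosine_similarity(query, v), cid) for cid, v in self.partitions[best]]
        return sorted(scored, reverse=True)[:k]
```

`FlatIndex` gives exact results at O(N × D) per query, while `IVFIndex` pays a k-means build step up front so that each query only touches roughly N/K stored vectors.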
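For the data-race item, here is a minimal sketch of the `aiosqlite` approach: one short-lived connection per request, provided through FastAPI's dependency system. The table and route below are illustrative assumptions, not the project's actual code.

```python
import aiosqlite
from fastapi import Depends, FastAPI

DB_PATH = "data/dev.sqlite"
app = FastAPI()

async def get_db():
    # aiosqlite runs SQLite calls in a background thread, so the event loop is
    # never blocked; SQLite itself serializes writes, so concurrent requests
    # cannot interleave partial updates.
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute("PRAGMA foreign_keys = ON")
        yield db

@app.get("/libraries")
async def list_libraries(db: aiosqlite.Connection = Depends(get_db)):
    async with db.execute("SELECT id, name FROM libraries") as cursor:
        rows = await cursor.fetchall()
    return [{"id": row[0], "name": row[1]} for row in rows]
```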
```mermaid
graph TB
C[HTTP Clients] --> MAIN[main.py]
API_DOCS[Swagger UI] --> MAIN
MAIN --> LIB_R[libraries.py]
MAIN --> DOC_R[documents.py]
MAIN --> CHUNK_R[chunks.py]
MAIN --> SEARCH_R[search.py]
MAIN --> INDEX_R[indexes.py]
LIB_R --> LIB_S[LibraryService]
DOC_R --> DOC_S[DocumentService]
CHUNK_R --> CHUNK_S[ChunkService]
SEARCH_R --> SEARCH_S[SearchService]
INDEX_R --> SEARCH_S
LIB_S --> LIB_REPO[LibraryRepository]
DOC_S --> DOC_REPO[DocumentRepository]
DOC_S --> CHUNK_REPO[ChunkRepository]
CHUNK_S --> CHUNK_REPO
SEARCH_S --> CHUNK_REPO
SEARCH_S --> DOC_REPO
SEARCH_S --> VECTOR_REPO[VectorIndexRepository]
LIB_REPO --> DB[Database Manager]
DOC_REPO --> DB
CHUNK_REPO --> DB
DB --> SQLITE[(SQLite)]
DOC_S --> EMBEDDER[Embedder]
DOC_S --> CHUNKER[SmartChunker]
SEARCH_S --> EMBEDDER
EMBEDDER --> COHERE[Cohere API]
VECTOR_REPO --> FLAT[FlatIndex]
VECTOR_REPO --> IVF[IVF Index]
SEARCH_S --> PERSISTENT[PersistentIndex]
    PERSISTENT --> DISK[Disk Storage]
```
The system uses a three-tier hierarchical structure:
```
Library (Collection of related documents)
  > Document (Individual files/texts)
    > Chunk (Text segments with embeddings)
```
Libraries: Top-level collections for organizing documents by topic, project, or source
Documents: Individual text files or content with metadata
Chunks: Text segments (~500 characters) with vector embeddings for semantic search
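As a rough illustration of this hierarchy, the three entities could be modelled with Pydantic along the following lines; the field names are assumptions based on the description above, not the project's exact schema.

```python
from datetime import datetime
from typing import Optional
from uuid import UUID, uuid4
from pydantic import BaseModel, Field

class Chunk(BaseModel):
    id: UUID = Field(default_factory=uuid4)
    document_id: UUID
    text: str
    embedding: Optional[list[float]] = None  # 1024-dim Cohere vector
    metadata: dict = Field(default_factory=dict)

class Document(BaseModel):
    id: UUID = Field(default_factory=uuid4)
    library_id: UUID
    title: str
    content: str
    metadata: dict = Field(default_factory=dict)
    created_at: datetime = Field(default_factory=datetime.utcnow)

class Library(BaseModel):
    id: UUID = Field(default_factory=uuid4)
    name: str
    description: Optional[str] = None
    metadata: dict = Field(default_factory=dict)
```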
- FastAPI Application: Async web framework with automatic OpenAPI documentation
- Route Handlers: RESTful endpoints for CRUD operations and search
- Dependency Injection: Service instances provided via FastAPI's dependency system
- Business Logic: Document processing, search orchestration, and entity management
- Transaction Management: Coordinates database operations across repositories
- Integration Points: Connects external APIs (Cohere) with internal systems
- Data Access: Abstract database operations with consistent interfaces
- Connection Management: Thread-safe SQLite connections with read/write separation
- Vector Operations: Specialized repositories for embedding storage and retrieval
- Smart Chunking: Intelligent text segmentation with boundary detection
- Embedding Generation: Cohere API integration for vector embeddings
- Index Management: Multiple indexing strategies (Flat, IVF) with persistence
- SQLite Database: Lightweight, serverless database with foreign key constraints
- Persistent Storage: Disk-based index caching for improved startup performance
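A sketch of how these layers can be wired together through FastAPI's dependency system; the names mirror the diagram above, but the bodies are placeholders rather than the project's implementation.

```python
from fastapi import APIRouter, Depends, FastAPI, HTTPException

class LibraryRepository:
    async def get(self, library_id: str) -> dict | None:
        # The real repository would query SQLite through the Database Manager;
        # this placeholder just returns a fake row.
        return {"id": library_id, "name": "Research Papers"}

class LibraryService:
    def __init__(self, repo: LibraryRepository):
        self.repo = repo

    async def get_library(self, library_id: str) -> dict:
        library = await self.repo.get(library_id)
        if library is None:
            raise HTTPException(status_code=404, detail="Library not found")
        return library

def get_library_service() -> LibraryService:
    # FastAPI calls this once per request and injects the result below.
    return LibraryService(LibraryRepository())

router = APIRouter(prefix="/libraries", tags=["libraries"])

@router.get("/{library_id}")
async def read_library(
    library_id: str,
    service: LibraryService = Depends(get_library_service),
) -> dict:
    return await service.get_library(library_id)

app = FastAPI()
app.include_router(router)
```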
Database: SQLite with foreign key constraints for data integrity
- Lightweight, serverless, perfect for development and testing
- BLOB storage for binary vector embeddings
- Automatic cascade deletion maintains referential integrity
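One common way to store embeddings as SQLite BLOBs is to pack the floats into raw bytes; the table layout below is an assumption for illustration, not the project's actual schema.

```python
import sqlite3
import struct

def vector_to_blob(vector: list[float]) -> bytes:
    return struct.pack(f"{len(vector)}f", *vector)  # float32, little-endian on most platforms

def blob_to_vector(blob: bytes) -> list[float]:
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, embedding BLOB)")
conn.execute("INSERT INTO chunks VALUES (?, ?)", ("c1", vector_to_blob([0.1, 0.2, 0.3])))
row = conn.execute("SELECT embedding FROM chunks WHERE id = ?", ("c1",)).fetchone()
print(blob_to_vector(row[0]))  # approximately [0.1, 0.2, 0.3] (float32 precision)
```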
Embedding Model: Cohere embed-v4.0 (1024 dimensions)
- State-of-the-art multilingual embeddings
- Optimized for search and retrieval tasks
- Consistent 1024-dimensional vectors for all content
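A hedged sketch of generating embeddings with the Cohere Python SDK; the project's `Embedder` wrapper may call the API differently, and the exact parameters accepted for embed-v4.0 can vary between SDK versions.

```python
import os
import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

def embed_texts(texts: list[str], input_type: str = "search_document") -> list[list[float]]:
    # Use input_type="search_document" when embedding stored chunks and
    # input_type="search_query" when embedding an incoming query.
    response = co.embed(texts=texts, model="embed-v4.0", input_type=input_type)
    return response.embeddings

vectors = embed_texts(["Attention Is All You Need"])
print(len(vectors[0]))  # the README above states 1024-dimensional vectors
```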
Chunking Strategy: Intelligent text segmentation
- 500-character chunks
- 50-character overlap (NOT IMPLEMENTED)
- Smart boundary detection (sentences > words > characters) (NOT IMPLEMENTED)
- Preserves context across chunk boundaries
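The intended strategy could look roughly like the sketch below; note that, as flagged above, the overlap and boundary detection are not implemented in the project itself.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        window = text[start:end]
        if end < len(text):
            # Prefer to break at a sentence end, then at a word boundary,
            # and only fall back to a hard character cut.
            sentence_break = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
            word_break = window.rfind(" ")
            cut = sentence_break + 1 if sentence_break > 0 else (word_break if word_break > 0 else len(window))
            window = window[:cut]
            end = start + cut
        chunks.append(window.strip())
        start = max(end - overlap, start + 1)  # keep a 50-character overlap, but always advance
    return [c for c in chunks if c]

print(chunk_text("First sentence. Second sentence. " * 40)[:2])
```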
Framework: FastAPI + Pydantic
- Type safety with automatic validation
- OpenAPI documentation generation
- High performance async capabilities
- Python 3.11+
- Cohere API key
- Clone the repository

  ```bash
  git clone <repository-url>
  cd vectorserver
  ```

- Environment Configuration

  Create a `.env` file:

  ```
  COHERE_API_KEY=your_cohere_api_key_here
  DB_PATH=data/dev.sqlite
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  # or with uv (recommended)
  uv sync
  ```

- Initialize Database
  - The `dev.sqlite` database is included in this repository.

Run with Docker:

```bash
docker-compose up --build
```

Or run the server directly:

```bash
# Development server with hot reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Production server
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

The API will be available at http://localhost:8000 with interactive docs at /docs.
```bash
# Run all tests
pytest

# Run specific test modules
pytest tests/test_db.py -v
pytest tests/test_main.py -v
```

Create a Library

```bash
curl -X POST "http://localhost:8000/libraries" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Research Papers",
    "description": "Collection of ML research papers",
    "metadata": {"topic": "machine_learning"}
  }'
```

Upload and Process Document

```bash
curl -X POST "http://localhost:8000/libraries/{library_id}/documents" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Attention Is All You Need",
    "content": "The dominant sequence transduction models...",
    "metadata": {"authors": ["Vaswani", "Shazeer"], "year": 2017}
  }'
```

Semantic Search

```bash
curl -X POST "http://localhost:8000/search" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Assiniboine",
    "library_id": "9f9b0b6d-3671-4f9b-a20c-d9e31cc61dba"
  }'
```
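Since the Python SDK client is still on the to-do list, the same calls can be made from plain Python with the `requests` library; this sketch assumes the create-library response includes an `id` field.

```python
import requests

BASE = "http://localhost:8000"

# Create a library, then search within it.
library = requests.post(f"{BASE}/libraries", json={
    "name": "Research Papers",
    "description": "Collection of ML research papers",
    "metadata": {"topic": "machine_learning"},
}).json()

results = requests.post(f"{BASE}/search", json={
    "content": "attention mechanisms",
    "library_id": library["id"],
}).json()
print(results)
```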
```
vectorserver/
├── app/
│   ├── models/          # Pydantic models
│   │   ├── library.py
│   │   ├── document.py
│   │   └── chunk.py
│   ├── routes/          # API endpoints
│   │   ├── libraries.py
│   │   ├── documents.py
│   │   └── search.py
│   ├── repositories/    # Database/indexing operations
│   │   ├── base.py
│   │   ├── library.py
│   │   ├── document.py
│   │   ├── chunk.py
│   │   ├── vector_index.py
│   │   └── db.py
│   ├── embeddings.py    # Cohere embedding integration
│   ├── settings.py      # Configuration
│   └── main.py          # FastAPI app
├── tests/
│   └── *.py
├── data/                # SQLite database files
└── README.md
```
- Cosine similarity-based retrieval
- Configurable result count
- Cross-document search capabilities
- Embedding caching for performance
- Complete CRUD operations for all entities (NOT QUITE)
- Cascade deletion maintains data integrity
- JSON metadata storage for flexible schema
- Timestamp tracking for audit trails
- RESTful design with OpenAPI documentation
- Type-safe request/response models
- Error handling with detailed messages
- Async support for high concurrency
This project is licensed under the MIT License - see the LICENSE file for details.