GraphRAG-Local-Bridge is a deployment and adaptation toolkit for Microsoft GraphRAG. It enables seamless integration with local LLMs (e.g., SenseNova, Qwen) and embedding models (e.g., BGE-M3) by providing protocol translation and intelligent JSON error correction.
- Intelligent JSON Repairing: Automatically corrects illegal control characters and formatting errors in JSON outputs from local LLMs, preventing indexing failures (see the short sketch after this list).
- Protocol Bridging: Translates OpenAI-style requests into custom API formats required by local inference engines.
- BGE-M3 Optimization: Includes a dedicated proxy to make HuggingFace Text-Embeddings-Inference (TEI) compatible with GraphRAG.
- Long-Context Ready: Pre-configured for models with 64K+ context windows to maximize knowledge discovery.
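
To make the JSON-repair idea concrete, here is a minimal, hypothetical sketch; the actual cleaning rules live in bridge_server.py and may differ:

import json
import re

def repair_json_text(text: str) -> str:
    # Collapse raw newlines/tabs to spaces and drop other ASCII control
    # characters, which json.loads rejects as "Invalid control character".
    text = re.sub(r"[\r\n\t]+", " ", text)
    return re.sub(r"[\x00-\x1f]", "", text)

raw = '{"title": "Node A\ndescribes Node B"}'   # bare newline inside a JSON string
print(json.loads(repair_json_text(raw)))        # parses: {'title': 'Node A describes Node B'}
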
- OS: Ubuntu 22.04 or later
- Python: 3.11.5 (Recommended)
- GraphRAG Version: 2.7.0
- Docker: Required for running the embedding service.
# Install core libraries
pip install graphrag==2.7.0 fastapi uvicorn httpx modelscope

mkdir -p ./my_graphrag/input
# Place your .txt or .csv files into the input directory
# cp /your/source/data/*.txt ./my_graphrag/input/

Download the model using ModelScope:
cd ./my_graphrag
modelscope download BAAI/bge-m3 --cache_dir ./bge-m3-model

Start the TEI container (adjust device ID as needed):
docker run -d --gpus "device=0" -p 8001:80 \
--name bge-m3 \
-v $(pwd)/bge-m3-model/BAAI/bge-m3:/data \
--security-opt seccomp=unconfined \
--pull always ghcr.io/huggingface/text-embeddings-inference:1.5 \
--model-id /data \
--dtype float16 \
--port 80
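
Before wiring TEI into GraphRAG, you can optionally confirm the container is serving. The check below uses TEI's native /embed route on the mapped port 8001 (an assumption about the stock TEI API; adjust the URL and port to your deployment):

import httpx

# Quick sanity check against the TEI container started above.
resp = httpx.post("http://localhost:8001/embed",
                  json={"inputs": "GraphRAG smoke test"},
                  timeout=30)
resp.raise_for_status()
print(f"Embedding dimension: {len(resp.json()[0])}")  # bge-m3 should report 1024
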
The bridge services are designed to handle specific API schemas. If your backend differs, you may need to modify the proxy scripts.

The current bridge_server.py is tailored for backends (such as TGI or a custom vLLM deployment) that accept the following format:
# Example of the raw backend call the bridge makes:
curl -X POST http://10.119.70.11:8088/generate \
-H "Content-Type: application/json" \
-d '{
"inputs": "<|im_start|>user\nWhat is a Knowledge Graph?<|im_end|>\n<|im_start|>assistant\n",
"parameters": {
"max_new_tokens": 2048,
"temperature": 0.3,
"stop": ["<|im_end|>"],
"details": true
}
}'
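
For orientation, here is a minimal sketch of the kind of translation bridge_server.py performs, assuming the /generate schema above and a ChatML prompt wrapper; the field names, defaults, and cleaning rules in the real script may differ:

import re
import time

import httpx
from fastapi import FastAPI, Request

BACKEND_URL = "http://10.119.70.11:8088/generate"  # your local inference endpoint
app = FastAPI()

def to_prompt(messages: list[dict]) -> str:
    # Wrap OpenAI-style messages in the ChatML template the backend expects.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

def clean_json_text(text: str) -> str:
    # Illustrative cleanup: strip ASCII control characters (except \n)
    # that commonly trigger "Invalid control character" in GraphRAG.
    return re.sub(r"[\x00-\x09\x0b-\x1f]", "", text)

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()
    payload = {
        "inputs": to_prompt(body["messages"]),
        "parameters": {
            "max_new_tokens": body.get("max_tokens", 2048),
            "temperature": body.get("temperature", 0.3),
            "stop": ["<|im_end|>"],
            "details": True,
        },
    }
    async with httpx.AsyncClient(timeout=600) as client:
        backend_resp = await client.post(BACKEND_URL, json=payload)
        backend_resp.raise_for_status()
    # TGI-style response: {"generated_text": "...", "details": {...}}
    text = clean_json_text(backend_resp.json()["generated_text"])
    # Return an OpenAI-shaped response so GraphRAG can consume it unchanged.
    return {
        "id": f"chatcmpl-{int(time.time())}",
        "object": "chat.completion",
        "model": body.get("model", "local-llm"),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }

# Run with: uvicorn bridge_sketch:app --host 0.0.0.0 --port 8900

If your backend already speaks the OpenAI protocol, most of this collapses to a pass-through plus the clean_json_text step.
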
The fix_tei_proxy.py ensures GraphRAG can communicate with the TEI service by forcing the encoding_format to float.

# Example of the call handled by the embedding proxy (port 8102):
curl -X POST http://localhost:8102/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": "GraphRAG is a powerful RAG technology",
"model": "bge-m3"
}'
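
As a rough illustration of what the embedding proxy does, assuming TEI also exposes an OpenAI-style /v1/embeddings route on port 8001 (the real fix_tei_proxy.py may differ in detail):

import httpx
from fastapi import FastAPI, Request

TEI_URL = "http://localhost:8001/v1/embeddings"  # assumed OpenAI-style TEI route
app = FastAPI()

@app.post("/v1/embeddings")
async def embeddings(request: Request):
    body = await request.json()
    # GraphRAG's OpenAI client may ask for base64-encoded vectors;
    # force plain float arrays so both sides agree.
    body["encoding_format"] = "float"
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(TEI_URL, json=body)
        resp.raise_for_status()
    return resp.json()

# Run with: uvicorn tei_proxy_sketch:app --port 8102
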
Keep these two services running in the background during indexing and querying.

bridge_server.py translates GraphRAG requests and performs JSON sanitization.
# Start the LLM bridge (listens on port 8900)
python bridge_server.py
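
With the bridge up, a quick hand-rolled OpenAI-style request confirms it is reachable before you start indexing (this assumes the bridge serves the standard /v1/chat/completions route, which is what GraphRAG will call):

import httpx

resp = httpx.post(
    "http://localhost:8900/v1/chat/completions",
    json={"model": "local-llm",
          "messages": [{"role": "user", "content": "Reply with the word OK."}]},
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
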
fix_tei_proxy.py fixes compatibility issues between GraphRAG and TEI.

# Start the Embedding proxy (listens on port 8102)
python fix_tei_proxy.py

Initialize the GraphRAG project:

graphrag init --root .

Overwrite the generated settings.yaml with the optimized version provided in this repository. Ensure the api_base values point to the bridge ports:
- LLM: http://localhost:8900/v1
- Embedding: http://localhost:8102/v1
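
For reference, the relevant fields look roughly like this (a sketch assuming the GraphRAG 2.x models layout; keep the repository's settings.yaml and only adjust endpoints and model names):

models:
  default_chat_model:
    type: openai_chat
    api_base: http://localhost:8900/v1    # LLM bridge
    api_key: dummy-key                    # the bridge ignores the key
    model: local-llm                      # placeholder model name
  default_embedding_model:
    type: openai_embedding
    api_base: http://localhost:8102/v1    # embedding proxy
    api_key: dummy-key
    model: bge-m3

With the configuration in place, clear any stale artifacts and run the indexing pipeline:
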
# Clear old cache
rm -rf output/* cache/* logs/*
# Run indexing with verbose logging
export LITELLM_LOG=DEBUG
graphrag index --root . --verbose

When indexing completes, run a global query:

graphrag query \
--root . \
--method global \
--query "In 2003, who was the top executive of DeepBlue Optoelectronics’ parent company, and what was his management style?"graphrag query \
--root . \
--method local \
--query "What was the direct trigger for Gu Changfeng leaving Tianqiong Group?"The bridge_server.py is a template. If your LLM uses a different API (e.g., different field names or prompt wrappers), you must modify the chat_completions function in bridge_server.py to match your model's requirements.
If your LLM is already 100% OpenAI-compatible, you can bypass bridge_server.py. However, if you encounter Invalid control character or Invalid JSON errors during indexing, use the bridge to benefit from its JSON cleaning logic.
GraphRAG currently supports .txt and .csv. For PDF, Word, or Excel files, please convert them to plain text before placing them in the input/ folder.
If community reports fail to generate, check the raw LLM responses:
cat cache/community_reporting/chat_create_community_report_* | less

This project is licensed under the MIT License.