feat: Add native_embedding option to allow Weaviate to generate embeddings during data ingestion and querying.
#20318
…beddings during data ingestion and querying.
Pull request overview
This PR fixes a bug where the embed_on_weaviate parameter was not properly preventing client-side embedding generation in the WeaviateVectorStore. The fix ensures that when embed_on_weaviate=True, vectors are not retrieved from nodes during data ingestion and querying, allowing Weaviate to generate embeddings server-side instead.
Key changes:
- Added a `use_vector` parameter to the `get_data_object` utility function to conditionally retrieve embeddings from nodes
- Updated `add`, `async_add`, and query methods to pass `use_vector=False` when `embed_on_weaviate=True`
- Added comprehensive unit tests covering both sync and async operations with the new parameter
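The first two changes can be pictured with a minimal, self-contained sketch. It is not the code from this PR: `SketchNode`, `build_data_object`, and `payloads_for_add` are hypothetical stand-ins, and the payload layout is an assumption made only for illustration. The idea shown is that the node's embedding is read only when `use_vector` is true, so nodes without a client-side embedding can still be sent when the server is expected to embed them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a LlamaIndex node (not the real class)."""

    node_id: str
    text: str
    embedding: Optional[List[float]] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def build_data_object(node: SketchNode, use_vector: bool = True) -> Dict[str, Any]:
    """Build an insert payload; touch the embedding only when use_vector is True."""
    payload: Dict[str, Any] = {
        "uuid": node.node_id,
        "properties": {"text": node.text, **node.metadata},
    }
    if use_vector:
        # Assumption: a missing embedding is an error in client-side mode.
        if node.embedding is None:
            raise ValueError("node has no embedding; pass use_vector=False")
        payload["vector"] = node.embedding
    return payload


def payloads_for_add(
    nodes: List[SketchNode], native_embedding: bool
) -> List[Dict[str, Any]]:
    """Skip client-side vectors when the server is expected to embed."""
    return [build_data_object(n, use_vector=not native_embedding) for n in nodes]
```

With `native_embedding=True`, a node that has no embedding produces a payload without a `vector` key instead of raising.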
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| llama_index/vector_stores/weaviate/utils.py | Added use_vector parameter to get_data_object function to conditionally retrieve node embeddings |
| llama_index/vector_stores/weaviate/base.py | Added _embed_on_weaviate private attribute and updated add, async_add, and get_query_parameters methods to respect the embedding mode |
| tests/test_vector_stores_weaviate.py | Added TestWeaviateEmbedding class with unit tests for add/async_add/query operations with both embedding modes, plus fixed async test method signature |
| pyproject.toml | Bumped version from 1.4.1 to 1.4.2 |
| uv.lock | Updated version lock file to reflect version bump |
…r and attribute to `native_embedding`.
Title changed from "embed_on_weaviate option to allow Weaviate to generate embeddings during data ingestion and querying." to "native_embedding option to allow Weaviate to generate embeddings during data ingestion and querying."
Pull request overview
Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
llama-index-integrations/vector_stores/llama-index-vector-stores-weaviate/llama_index/vector_stores/weaviate/base.py:491
- When `native_embedding=True`, the vector is set to None on line 486, which breaks the alpha calculation logic for hybrid queries. On line 490, the condition `if vector is not None and query.query_str:` will always be False when using native embeddings, causing alpha to remain 1.0 instead of using the configured value (default 0.5). Consider changing line 490 to: `if (not self._native_embedding or query.query_embedding is not None) and query.query_str:` to correctly handle both native and client-side embedding scenarios.
    vector = query.query_embedding if not self._native_embedding else None
    alpha = 1
    if query.mode == VectorStoreQueryMode.HYBRID:
        _logger.debug(f"Using hybrid search with alpha {query.alpha}")
        if vector is not None and query.query_str:
            alpha = query.alpha or 0.5
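To make the suppressed comment easier to check, here is a small standalone sketch that evaluates both conditions side by side. `choose_alpha` and its parameters are hypothetical names introduced for illustration; only the two boolean conditions are taken from the review comment and the quoted snippet.

```python
from typing import List, Optional


def choose_alpha(
    is_hybrid: bool,
    native_embedding: bool,
    query_embedding: Optional[List[float]],
    query_str: Optional[str],
    query_alpha: Optional[float],
    suggested: bool,
) -> float:
    """Return the alpha a hybrid query would use under either condition."""
    # Mirrors the quoted snippet: the vector is dropped in native mode.
    vector = None if native_embedding else query_embedding
    alpha = 1.0
    if is_hybrid:
        if suggested:
            # Condition proposed in the review comment.
            use_configured = (not native_embedding) or (query_embedding is not None)
        else:
            # Condition in the quoted snippet.
            use_configured = vector is not None
        if use_configured and query_str:
            alpha = query_alpha or 0.5
    return alpha
```

In this sketch the suggested condition changes the outcome for `native_embedding=True` only when a query embedding is also supplied; with no query embedding both conditions leave alpha at 1.0, so that case may need separate handling.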
… hybrid query vector generation and update hybrid query logic.
…s://github.com/selvamanigovindaraj/llama_index into feature/allow-embedding-generation-on-weaviate
Pull request overview
Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.
…g is enabled in Weaviate vector store and include corresponding tests.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Pull request overview
Copilot reviewed 4 out of 5 changed files in this pull request and generated no new comments.
    vector_store.async_client

class TestWeaviateEmbedding:
Can we avoid class-based tests? None of the other tests follow this pattern.
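For reference, a function-style layout might look like the sketch below. The helper `sends_client_vector` is a hypothetical placeholder for whatever behavior the real tests assert on; the point is only that each case is a plain module-level `test_*` function rather than a method on `TestWeaviateEmbedding`.

```python
def sends_client_vector(native_embedding: bool) -> bool:
    """Hypothetical helper: is a client-side vector attached on ingestion?"""
    return not native_embedding


def test_add_skips_vector_with_native_embedding() -> None:
    # Module-level test function, discoverable by pytest without a class.
    assert sends_client_vector(native_embedding=True) is False


def test_add_sends_vector_by_default() -> None:
    assert sends_client_vector(native_embedding=False) is True
```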
It'd be nice to update the example notebook in `docs/examples/vector_stores/WeaviateIndexDemo.ipynb` to mention this.
Description
This feature enables the use of Weaviate's native embedding capabilities, which are otherwise not directly accessible via standard LlamaIndex workflows that assume client-side embedding generation.
Fixes #18666
New Package?
Did I fill in the `tool.llamahub` section in the `pyproject.toml` and provide a detailed README.md for my new integration or package?
Version Bump?
Did I bump the version in the `pyproject.toml` file of the package I am updating? (Except for the `llama-index-core` package)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.
### Test Details:
Unit Tests: Added TestWeaviateEmbedding class to tests/test_vector_stores_weaviate.py covering:
E2E Verification: Ran comprehensive E2E tests against Weaviate Cloud and Embedded Weaviate, verifying:
Suggested Checklist:
`uv run make format; uv run make lint` to appease the lint gods