feat: Add `native_embedding` option to allow Weaviate to generate embeddings during data ingestion and querying. #20318

selvamanigovindaraj · 2025-11-28T04:10:01Z

Description

This feature enables the use of Weaviate's native embedding capabilities, which are otherwise not directly accessible via standard LlamaIndex workflows that assume client-side embedding generation.

Fixes #18666

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

Yes
No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

Yes
No

Type of Change

Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

I added new unit tests to cover this change
I believe this change is already covered by existing unit tests

### Test Details:

Unit Tests: Added TestWeaviateEmbedding class to tests/test_vector_stores_weaviate.py covering:

Add method with native embedding enabled (verifies vector is not sent)
Add method with native embedding disabled (verifies vector is passed)
Query method with native embedding enabled (verifies vector is not sent)
Query method with native embedding disabled (verifies vector is passed)
Async add method with native embedding enabled (verifies vector is not sent)
Async add method with native embedding disabled (verifies vector is passed)
Async query method with native embedding enabled (verifies vector is not sent)

E2E Verification: Ran comprehensive E2E tests against Weaviate Cloud and Embedded Weaviate, verifying:

Server-Side Embedding: Native embedding enabled successfully adds nodes without client-side vectors.
Client-Side Embedding: Native embedding disabled successfully adds nodes with provided vectors.
Retrieval: Verified semantic search works in all scenarios.
Async Support: Verified async add works correctly.

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added Google Colab support for the newly added notebooks.
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran uv run make format; uv run make lint to appease the lint gods

Copilot

Pull request overview

This PR fixes a bug where the embed_on_weaviate parameter was not properly preventing client-side embedding generation in the WeaviateVectorStore. The fix ensures that when embed_on_weaviate=True, vectors are not retrieved from nodes during data ingestion and querying, allowing Weaviate to generate embeddings server-side instead.

Key changes:

Added a use_vector parameter to the get_data_object utility function to conditionally retrieve embeddings from nodes
Updated add, async_add, and query methods to pass use_vector=False when embed_on_weaviate=True
Added comprehensive unit tests covering both sync and async operations with the new parameter
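
The `use_vector` switch described in the key changes can be illustrated with a minimal sketch. The `Node` and `DataObject` dataclasses and the function `get_data_object_sketch` below are simplified, hypothetical stand-ins and not the integration's actual code; only the idea of conditionally reading the node's embedding comes from the summary above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a LlamaIndex node."""
    node_id: str
    text: str
    embedding: Optional[List[float]] = None


@dataclass
class DataObject:
    """Hypothetical stand-in for the object sent to Weaviate."""
    uuid: str
    properties: Dict[str, Any]
    vector: Optional[List[float]] = None


def get_data_object_sketch(node: Node, use_vector: bool = True) -> DataObject:
    # When use_vector is False the embedding is never read, so the
    # server is free to compute its own vector for the object.
    vector = node.embedding if use_vector else None
    return DataObject(uuid=node.node_id, properties={"text": node.text}, vector=vector)


node = Node(node_id="n1", text="hello", embedding=[0.1, 0.2])
client_side = get_data_object_sketch(node)                     # vector attached
server_side = get_data_object_sketch(node, use_vector=False)   # vector omitted
```

Defaulting `use_vector` to `True` in this sketch keeps existing callers unchanged, which matches the "non-breaking change" classification in the description.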

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `llama_index/vector_stores/weaviate/utils.py` | Added `use_vector` parameter to `get_data_object` function to conditionally retrieve node embeddings |
| `llama_index/vector_stores/weaviate/base.py` | Added `_embed_on_weaviate` private attribute and updated `add`, `async_add`, and `get_query_parameters` methods to respect the embedding mode |
| `tests/test_vector_stores_weaviate.py` | Added `TestWeaviateEmbedding` class with unit tests for add/async_add/query operations with both embedding modes, plus fixed async test method signature |
| `pyproject.toml` | Bumped version from 1.4.1 to 1.4.2 |
| `uv.lock` | Updated version lock file to reflect version bump |

.../vector_stores/llama-index-vector-stores-weaviate/llama_index/vector_stores/weaviate/base.py

...ations/vector_stores/llama-index-vector-stores-weaviate/tests/test_vector_stores_weaviate.py

Copilot

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

llama-index-integrations/vector_stores/llama-index-vector-stores-weaviate/llama_index/vector_stores/weaviate/base.py:491

When `native_embedding=True`, the vector is set to `None` on line 486, which breaks the alpha calculation logic for hybrid queries. On line 490, the condition `if vector is not None and query.query_str:` will always be `False` when using native embeddings, causing `alpha` to remain 1.0 instead of using the configured value (default 0.5). Consider changing line 490 to `if (not self._native_embedding or query.query_embedding is not None) and query.query_str:` to correctly handle both native and client-side embedding scenarios.

        vector = query.query_embedding if not self._native_embedding else None
        alpha = 1
        if query.mode == VectorStoreQueryMode.HYBRID:
            _logger.debug(f"Using hybrid search with alpha {query.alpha}")
            if vector is not None and query.query_str:
                alpha = query.alpha or 0.5
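
To make the alpha discussion concrete, here is a small sketch that evaluates the three conditions side by side. The function names and the flattened arguments are hypothetical; `alpha_quoted` mirrors the snippet above, `alpha_suggested` uses the condition proposed in the comment, and `alpha_native_aware` is one possible alternative (an assumption, not necessarily what the PR adopted) that treats native mode as always having a vector source.

```python
from typing import List, Optional


def alpha_quoted(native: bool, embedding: Optional[List[float]],
                 query_str: Optional[str], alpha: Optional[float]) -> float:
    """Mirrors the quoted hybrid branch: native mode discards the vector first."""
    vector = embedding if not native else None
    if vector is not None and query_str:
        return alpha or 0.5
    return 1.0


def alpha_suggested(native: bool, embedding: Optional[List[float]],
                    query_str: Optional[str], alpha: Optional[float]) -> float:
    """Uses the condition exactly as proposed in the review comment."""
    if (not native or embedding is not None) and query_str:
        return alpha or 0.5
    return 1.0


def alpha_native_aware(native: bool, embedding: Optional[List[float]],
                       query_str: Optional[str], alpha: Optional[float]) -> float:
    """Hypothetical variant: native mode counts as having a vector source."""
    if (native or embedding is not None) and query_str:
        return alpha or 0.5
    return 1.0


# Native mode, hybrid query, no client-side embedding supplied.
case = dict(native=True, embedding=None, query_str="what is weaviate?", alpha=0.3)
results = {fn.__name__: fn(**case)
           for fn in (alpha_quoted, alpha_suggested, alpha_native_aware)}
print(results)
```

For this case the quoted logic and the suggested condition both leave alpha at 1.0, because the suggested condition only helps when the query still carries a client-side embedding; the hypothetical native-aware variant returns the configured 0.3.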

...ations/vector_stores/llama-index-vector-stores-weaviate/tests/test_vector_stores_weaviate.py

.../vector_stores/llama-index-vector-stores-weaviate/llama_index/vector_stores/weaviate/base.py

Copilot

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.

.../vector_stores/llama-index-vector-stores-weaviate/llama_index/vector_stores/weaviate/base.py

...ations/vector_stores/llama-index-vector-stores-weaviate/tests/test_vector_stores_weaviate.py

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated no new comments.

logan-markewich · 2025-12-01T16:48:16Z

...ations/vector_stores/llama-index-vector-stores-weaviate/tests/test_vector_stores_weaviate.py

            vector_store.async_client
+
+
+class TestWeaviateEmbedding:


Can we avoid class-based tests? None of the other tests follow this pattern.
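
A function-style version of two of the checks might look like the sketch below. It uses `unittest.mock.MagicMock` in place of a Weaviate collection; `add_nodes` and the dict-shaped nodes are hypothetical stand-ins for the store's add path, not the code under review.

```python
from typing import Any, Dict, List
from unittest.mock import MagicMock


def add_nodes(collection: Any, nodes: List[Dict[str, Any]], native_embedding: bool) -> None:
    """Hypothetical stand-in for the vector store's add path."""
    for node in nodes:
        # In native mode the client-side embedding is deliberately dropped.
        vector = None if native_embedding else node["embedding"]
        collection.insert(properties={"text": node["text"]}, vector=vector)


def test_add_native_embedding_omits_vector() -> None:
    collection = MagicMock()
    add_nodes(collection, [{"text": "hi", "embedding": [0.1, 0.2]}], native_embedding=True)
    assert collection.insert.call_args.kwargs["vector"] is None


def test_add_client_embedding_passes_vector() -> None:
    collection = MagicMock()
    add_nodes(collection, [{"text": "hi", "embedding": [0.1, 0.2]}], native_embedding=False)
    assert collection.insert.call_args.kwargs["vector"] == [0.1, 0.2]
```

Module-level functions like these are collected by pytest without a wrapping class, so shared setup would move into fixtures rather than class attributes.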

logan-markewich · 2025-12-01T16:51:27Z

.../vector_stores/llama-index-vector-stores-weaviate/llama_index/vector_stores/weaviate/base.py

It'd be nice to update the example notebook in `docs/examples/vector_stores/WeaviateIndexDemo.ipynb` to mention this.

feat: Add embed_on_weaviate option to allow Weaviate to generate embeddings during data ingestion and querying.

42a1464

Copilot AI review requested due to automatic review settings November 28, 2025 04:10

dosubot bot added the size:L label (this PR changes 100-499 lines, ignoring generated files) Nov 28, 2025

Copilot started reviewing on behalf of selvamanigovindaraj November 28, 2025 04:10

Copilot finished reviewing on behalf of selvamanigovindaraj November 28, 2025 04:13

Copilot AI reviewed Nov 28, 2025

.../vector_stores/llama-index-vector-stores-weaviate/llama_index/vector_stores/weaviate/base.py Outdated

...ations/vector_stores/llama-index-vector-stores-weaviate/tests/test_vector_stores_weaviate.py

refactor: rename Weaviate vector store's embed_on_weaviate parameter and attribute to `native_embedding`.

ecd9d3e

selvamanigovindaraj requested a review from Copilot November 28, 2025 05:21

selvamanigovindaraj changed the title ~~feat: Add embed_on_weaviate option to allow Weaviate to generate embeddings during data ingestion and querying.~~ feat: Add native_embedding option to allow Weaviate to generate embeddings during data ingestion and querying. Nov 28, 2025

Copilot started reviewing on behalf of selvamanigovindaraj November 28, 2025 05:21

Merge branch 'main' into feature/allow-embedding-generation-on-weaviate

1bc95dc

Copilot finished reviewing on behalf of selvamanigovindaraj November 28, 2025 05:24

Copilot AI reviewed Nov 28, 2025

selvamanigovindaraj added 2 commits November 28, 2025 11:11

feat: Add native_embedding option to WeaviateVectorStore to control hybrid query vector generation and update hybrid query logic.

f79e9f4

Merge branch 'feature/allow-embedding-generation-on-weaviate' of https://github.com/selvamanigovindaraj/llama_index into feature/allow-embedding-generation-on-weaviate

eacbb41

selvamanigovindaraj requested a review from Copilot November 28, 2025 05:42

Copilot started reviewing on behalf of selvamanigovindaraj November 28, 2025 05:43

Copilot finished reviewing on behalf of selvamanigovindaraj November 28, 2025 05:47

Copilot AI reviewed Nov 28, 2025

feat: Add validation for a non-empty query_str when native embedding is enabled in Weaviate vector store and include corresponding tests.

58a498c
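
The validation named in this commit could look roughly like the sketch below. The function name, the `ValueError` type, and the message are assumptions for illustration; the commit message only states that a non-empty `query_str` is required when native embedding is enabled.

```python
from typing import Optional


def validate_native_query(native_embedding: bool, query_str: Optional[str]) -> None:
    """Hypothetical check: server-side embedding needs text to embed."""
    if native_embedding and not (query_str and query_str.strip()):
        # Without a client-side vector, the query text is the only input
        # the server can embed, so an empty string cannot be searched.
        raise ValueError("native_embedding=True requires a non-empty query_str")


validate_native_query(native_embedding=True, query_str="what is weaviate?")  # passes
validate_native_query(native_embedding=False, query_str=None)                # passes
```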

selvamanigovindaraj requested a review from Copilot November 28, 2025 05:57

Copilot started reviewing on behalf of selvamanigovindaraj November 28, 2025 05:58

Copilot finished reviewing on behalf of selvamanigovindaraj November 28, 2025 05:59

Copilot AI reviewed Nov 28, 2025

selvamanigovindaraj requested a review from Copilot November 28, 2025 10:40

Copilot started reviewing on behalf of selvamanigovindaraj November 28, 2025 10:40

Copilot finished reviewing on behalf of selvamanigovindaraj November 28, 2025 10:43

Copilot AI reviewed Nov 28, 2025

selvamanigovindaraj mentioned this pull request Dec 1, 2025

[Feature Request]: Leave embedding creation to vector stores #18666

Open

logan-markewich reviewed Dec 1, 2025

2 participants
