@selvamanigovindaraj (Contributor) commented Nov 28, 2025

Description

This feature enables the use of Weaviate's native embedding capabilities, which are otherwise not directly accessible via standard LlamaIndex workflows that assume client-side embedding generation.

Fixes #18666

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Test Details:

Unit Tests: Added TestWeaviateEmbedding class to tests/test_vector_stores_weaviate.py covering:

  • Add method with native embedding enabled (verifies vector is not sent)
  • Add method with native embedding disabled (verifies vector is passed)
  • Query method with native embedding enabled (verifies vector is not sent)
  • Query method with native embedding disabled (verifies vector is passed)
  • Async add method with native embedding enabled (verifies vector is not sent)
  • Async add method with native embedding disabled (verifies vector is passed)
  • Async query method with native embedding enabled (verifies vector is not sent)
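The checks listed above can be sketched with `unittest.mock`. This is only an illustration: `SketchStore` and its `insert` call are hypothetical stand-ins and not the real `WeaviateVectorStore` or Weaviate client API. The point is the assertion shape, namely that no vector reaches the collection when native embedding is enabled.

```python
from unittest.mock import MagicMock


class SketchStore:
    """Hypothetical miniature of a vector store; not WeaviateVectorStore itself."""

    def __init__(self, collection, native_embedding=False):
        self._collection = collection
        self._native_embedding = native_embedding

    def add(self, texts, embeddings):
        for text, embedding in zip(texts, embeddings):
            # With native embedding on, the client-side vector is withheld
            # so the server is expected to compute it.
            vector = None if self._native_embedding else embedding
            self._collection.insert(properties={"text": text}, vector=vector)


def test_add_native_embedding_enabled():
    collection = MagicMock()
    SketchStore(collection, native_embedding=True).add(["hi"], [[0.1, 0.2]])
    collection.insert.assert_called_once_with(
        properties={"text": "hi"}, vector=None
    )


def test_add_native_embedding_disabled():
    collection = MagicMock()
    SketchStore(collection, native_embedding=False).add(["hi"], [[0.1, 0.2]])
    collection.insert.assert_called_once_with(
        properties={"text": "hi"}, vector=[0.1, 0.2]
    )
```

Both tests are plain functions, so they can be collected by pytest or called directly.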

E2E Verification: Ran comprehensive E2E tests against Weaviate Cloud and Embedded Weaviate, verifying:

  • Server-Side Embedding: Native embedding enabled successfully adds nodes without client-side vectors.
  • Client-Side Embedding: Native embedding disabled successfully adds nodes with provided vectors.
  • Retrieval: Verified semantic search works in all scenarios.
  • Async Support: Verified async add works correctly.

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

…beddings during data ingestion and querying.
Copilot AI review requested due to automatic review settings November 28, 2025 04:10
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 28, 2025
Copilot finished reviewing on behalf of selvamanigovindaraj November 28, 2025 04:13
Copilot AI left a comment

Pull request overview

This PR fixes a bug where the embed_on_weaviate parameter was not properly preventing client-side embedding generation in the WeaviateVectorStore. The fix ensures that when embed_on_weaviate=True, vectors are not retrieved from nodes during data ingestion and querying, allowing Weaviate to generate embeddings server-side instead.

Key changes:

  • Added a use_vector parameter to the get_data_object utility function to conditionally retrieve embeddings from nodes
  • Updated add, async_add, and query methods to pass use_vector=False when embed_on_weaviate=True
  • Added comprehensive unit tests covering both sync and async operations with the new parameter
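A minimal sketch of how a `use_vector` flag could gate the embedding lookup, assuming a simplified node type. `SketchNode`, `get_data_object_sketch`, and `build_batch` are hypothetical names for illustration; the real `get_data_object` in `utils.py` has a different signature and does more work.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a LlamaIndex node; not the real class."""

    node_id: str
    text: str
    embedding: Optional[List[float]] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def get_data_object_sketch(node: SketchNode, use_vector: bool = True) -> Dict[str, Any]:
    """Build the payload that would be sent to the vector store for one node."""
    properties = {"text": node.text, **node.metadata}
    # When use_vector is False the embedding is never read, leaving the
    # vector empty so a server-side vectorizer can fill it in.
    vector = node.embedding if use_vector else None
    return {"uuid": node.node_id, "properties": properties, "vector": vector}


def build_batch(nodes: List[SketchNode], embed_on_weaviate: bool) -> List[Dict[str, Any]]:
    """Mirror the described call pattern: use_vector=False when embedding server-side."""
    return [
        get_data_object_sketch(node, use_vector=not embed_on_weaviate)
        for node in nodes
    ]
```

Keeping the flag on the helper rather than stripping embeddings from the nodes means the caller's node objects are left unmodified.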

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.

Summary per file (File: Description)

  • llama_index/vector_stores/weaviate/utils.py: Added use_vector parameter to get_data_object function to conditionally retrieve node embeddings
  • llama_index/vector_stores/weaviate/base.py: Added _embed_on_weaviate private attribute and updated add, async_add, and get_query_parameters methods to respect the embedding mode
  • tests/test_vector_stores_weaviate.py: Added TestWeaviateEmbedding class with unit tests for add/async_add/query operations with both embedding modes, plus fixed async test method signature
  • pyproject.toml: Bumped version from 1.4.1 to 1.4.2
  • uv.lock: Updated version lock file to reflect version bump

@selvamanigovindaraj changed the title from "feat: Add embed_on_weaviate option to allow Weaviate to generate embeddings during data ingestion and querying." to "feat: Add native_embedding option to allow Weaviate to generate embeddings during data ingestion and querying." Nov 28, 2025
Copilot finished reviewing on behalf of selvamanigovindaraj November 28, 2025 05:24
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

llama-index-integrations/vector_stores/llama-index-vector-stores-weaviate/llama_index/vector_stores/weaviate/base.py:491

  • When native_embedding=True, the vector is set to None on line 486, which breaks the alpha calculation logic for hybrid queries. On line 490, the condition if vector is not None and query.query_str: will always be False when using native embeddings, causing alpha to remain 1.0 instead of using the configured value (default 0.5). Consider changing line 490 to: if (not self._native_embedding or query.query_embedding is not None) and query.query_str: to correctly handle both native and client-side embedding scenarios.
        vector = query.query_embedding if not self._native_embedding else None
        alpha = 1
        if query.mode == VectorStoreQueryMode.HYBRID:
            _logger.debug(f"Using hybrid search with alpha {query.alpha}")
            if vector is not None and query.query_str:
                alpha = query.alpha or 0.5
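One way to read the intent of this comment is sketched below as a standalone function. `resolve_alpha` and its parameters are hypothetical and not part of the library. Note that the condition quoted in the comment evaluates to False when native embedding is enabled and no query embedding is supplied, so this sketch instead assumes native embedding should count as a vector source.

```python
from typing import List, Optional


def resolve_alpha(
    native_embedding: bool,
    query_embedding: Optional[List[float]],
    query_str: Optional[str],
    query_alpha: Optional[float],
    is_hybrid: bool,
) -> float:
    """Pick the hybrid-search alpha; 1.0 means pure vector search."""
    alpha = 1.0
    if not is_hybrid:
        return alpha
    # Assumption: a vector is available either because the client supplied
    # one or because the server will embed the query text itself.
    has_vector_source = native_embedding or query_embedding is not None
    if has_vector_source and query_str:
        # Mirrors the snippet's `or`, so an explicit alpha of 0.0 also
        # falls back to 0.5.
        alpha = query_alpha or 0.5
    return alpha
```

Under this assumption, a hybrid query with native embedding enabled picks up the configured alpha even though no client-side vector exists.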

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.


…g is enabled in Weaviate vector store and include corresponding tests.
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated no new comments.


vector_store.async_client


class TestWeaviateEmbedding:
Collaborator

Can we avoid class-based tests? None of the other tests follow this pattern.

Collaborator

It'd be nice to update the example notebook in docs/examples/vector_stores/WeaviateIndexDemo.ipynb to mention this.

Development

Successfully merging this pull request may close these issues.

[Feature Request]: Leave embedding creation to vector stores

2 participants