@xoaryaa
Contributor

@xoaryaa xoaryaa commented Dec 27, 2025

Related Issues

Proposed Changes:

  • Added a new web search component: BrightDataWebSearch under haystack/components/websearch.
  • Integrates with Bright Data SERP API via https://api.brightdata.com/request using data_format="parsed_light".
  • The Google search engine is intentionally hard-coded (per Bright Data guidance).
  • Added pagination support via page_number:
    • Maps page_number to Google's start parameter (start=(page_number-1)*10), so page 1 maps to start=0 and page 2 to start=10.
  • Converts SERP organic results into Haystack Documents:
    • content: description (fallback to title)
    • meta: title, link (and passes through optional fields when present, e.g. extensions, global_rank)
  • Supports optional domain filtering via allowed_domains.
  • Supports the following environment variables:
    • BRIGHT_DATA_API_TOKEN
    • BRIGHT_DATA_ZONE
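
For readers skimming the description, the pagination mapping, the content/meta conversion, and the domain filter listed above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper names (page_to_start, is_allowed, organic_to_records) are hypothetical, plain dicts stand in for Haystack Documents to keep the sketch dependency-free, and filtering after the response is an assumption, since the description does not say where allowed_domains is applied.

```python
from urllib.parse import urlparse

# Assumption: 10 results per page, matching start=(page_number-1)*10 above.
RESULTS_PER_PAGE = 10


def page_to_start(page_number: int) -> int:
    """Map a 1-based page number to Google's zero-based start offset."""
    if page_number < 1:
        raise ValueError("page_number must be >= 1")
    return (page_number - 1) * RESULTS_PER_PAGE


def is_allowed(link: str, allowed_domains=None) -> bool:
    """Return True when the link's host is an allowed domain or a subdomain of one."""
    if not allowed_domains:
        return True  # no filter configured
    host = (urlparse(link).hostname or "").lower()
    return any(
        host == domain.lower() or host.endswith("." + domain.lower())
        for domain in allowed_domains
    )


def organic_to_records(organic, allowed_domains=None, top_k=None):
    """Convert organic SERP items into {"content", "meta"} records.

    Hypothetical stand-in for building Haystack Documents.
    """
    records = []
    for item in organic:
        link = item.get("link", "")
        if not is_allowed(link, allowed_domains):
            continue
        # content: description, falling back to title
        content = item.get("description") or item.get("title") or ""
        meta = {"title": item.get("title"), "link": link}
        # pass through optional fields only when present
        for key in ("extensions", "global_rank"):
            if key in item:
                meta[key] = item[key]
        records.append({"content": content, "meta": meta})
    # a top_k of None keeps every record
    return records[:top_k]
```

With this sketch, page_to_start(2) gives 10, and an item without a description ends up with its title as content.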

How did you test it?

  • Unit tests:
    • pytest test/components/websearch/test_brightdata.py
  • Manual verification (Bright Data trial account):
    • Set:
      • BRIGHT_DATA_API_TOKEN
      • BRIGHT_DATA_ZONE
    • Ran:
      from haystack.components.websearch import BrightDataWebSearch
      
      search = BrightDataWebSearch(top_k=3)
      result = search.run(query="pizza", page_number=1)
      print([d.meta["link"] for d in result["documents"]])
    • Confirmed the component returns valid Google result links.

Notes for the reviewer

  • The integration is intentionally scoped to Google only; future extensions can add other engines if needed.
  • Pagination is implemented via Google's start parameter only.
  • Error handling:
    • Raises TimeoutError on request timeouts.
    • Raises BrightDataWebSearchError for request failures or invalid responses.
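
The error mapping above could look roughly like the sketch below. It is a hedged illustration, not the component's actual code: parse_serp_response and its send argument (a callable that performs the HTTP request and returns the response body) are hypothetical, the "organic" key is assumed from the "SERP organic results" wording in the description, and the BrightDataWebSearchError class is redefined locally so the example is self-contained.

```python
import json


class BrightDataWebSearchError(Exception):
    """Local stand-in for the error type named in the PR description."""


def parse_serp_response(send):
    """Call send() and return the list of organic results.

    send is a hypothetical callable that performs the request and returns
    the response body as str or bytes.
    """
    try:
        raw = send()
    except TimeoutError as exc:
        # Request timeouts surface as TimeoutError.
        raise TimeoutError("Bright Data request timed out") from exc
    except OSError as exc:
        # Other transport failures (connection errors and similar).
        raise BrightDataWebSearchError(f"Request failed: {exc}") from exc

    try:
        payload = json.loads(raw)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        raise BrightDataWebSearchError("Response is not valid JSON") from exc

    # Assumption: results live under an "organic" key holding a list.
    organic = payload.get("organic") if isinstance(payload, dict) else None
    if not isinstance(organic, list):
        raise BrightDataWebSearchError("Response has no 'organic' result list")
    return organic
```

TimeoutError is caught before OSError because it is a subclass of OSError; reversing the order would turn timeouts into BrightDataWebSearchError.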

Checklist

@xoaryaa xoaryaa requested a review from a team as a code owner December 27, 2025 15:41
@xoaryaa xoaryaa requested review from vblagoje and removed request for a team December 27, 2025 15:41
@github-actions github-actions bot added the topic:tests and type:documentation labels Dec 27, 2025
@meirk-brd

Hi @xoaryaa,

Looks great, thank you very much for contributing!

@vblagoje, would you be able to review this PR? We would love to push it and create some content around it!

@vblagoje
Member

@xoaryaa and @meirk-brd, we can't integrate everyone's favourite search engine into Haystack core; the core package would quickly get bloated. We reserve core for essential components only and direct integrations such as this one to https://github.com/deepset-ai/haystack-core-integrations/ or, even better, to a self-maintained component listed in our https://github.com/deepset-ai/haystack-integrations/ community project.

@vblagoje
Member

I recommend that you self-host this in your own repo; we'll gladly add it to https://github.com/deepset-ai/haystack-integrations/ and co-promote content you publish around it.

@meirk-brd

Got it, thank you very much @xoaryaa and @vblagoje. We can close this one. @xoaryaa, we will create a PyPI package out of your contribution and share the docs in https://github.com/deepset-ai/haystack-integrations/

@vblagoje
Member

Perfect, we have a deal @meirk-brd: pick a repo and run your own dev cycles independently; the component contract will likely stay the same for a long time! Update your integration details on https://github.com/deepset-ai/haystack-integrations/ when needed and we'll approve it quickly! For a blog and other content, coordinate with @bilgeyucel and we are good to go! Thanks for these contributions; looking forward to your upcoming releases.

@xoaryaa
Contributor Author

xoaryaa commented Dec 31, 2025

Thanks @meirk-brd and @vblagoje.
Closing this PR as requested.

@xoaryaa xoaryaa closed this Dec 31, 2025
Development

Successfully merging this pull request may close these issues.

Bright Data web search component

3 participants