test: optimize example test discovery and execution speed #372
Conversation
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit: wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
The PR description has been updated. Please fill out the template for your PR to be reviewed.
There are a few things to clarify from this PR.
Here's an example output from a run after these fixes, just using the defaults. This was run on a 32GB MacBook M1 Max, and it takes around 4 minutes.
Here are the results from a run on my 32GB RAM / 4GB VRAM ThinkPad:
I don't think my issues are blocking, so I'm inclined to approve / merge but will wait a bit in case there are other opinions.
-> A parsing error in the check; I'll fix it. The URL is set in my environment.

We have a number of models used in our tests. I looked at this one for validation as I didn't have it, but actually we'd need to do something in every test to check the model exists. A docs/upfront check can easily get out of date unless we can do it programmatically.
-> Suggest discussion, then an additional issue/PR if action is needed.

Multiple failures like these are down to insufficient resources. 4GB VRAM is low for AI dev (though we'd always want to drive down minimum requirements for usage). The error at least is clear. A possible mitigation would be more fine-grained markers to document memory usage (sketched below), perhaps resulting in skipping all LLM tests. Another alternative might be to allow CPU-only (not GPU) execution.
-> Similar: let's discuss and plan.
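For illustration, the fine-grained marker idea could look something like this in a conftest.py. This is a minimal sketch assuming psutil is available; the 8 GiB threshold is a placeholder, not a measured requirement:

```python
# conftest.py sketch: skip tests marked `requires_heavy_ram` when free memory
# is low. Assumes psutil is installed; the threshold is a placeholder value.
import psutil
import pytest

HEAVY_RAM_BYTES = 8 * 1024**3  # placeholder: 8 GiB of available RAM

def pytest_collection_modifyitems(config, items):
    available = psutil.virtual_memory().available
    if available >= HEAVY_RAM_BYTES:
        return
    skip_heavy = pytest.mark.skip(
        reason=f"requires_heavy_ram: only {available / 1024**3:.1f} GiB available"
    )
    for item in items:
        if "requires_heavy_ram" in item.keywords:
            item.add_marker(skip_heavy)
```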
Force-pushed from c43a53b to 1540a3c.
The issue with watsonx should be fixed now.

I did not have the qwen model downloaded. Pulled it and am trying again just to see what happens.

Ollama probably intelligently split the layers across GPU and CPU. I have marked the test as large-RAM, as when I ran it the system was struggling somewhat; the model was ~16GB of RAM by itself.

qwen2.5vl:7b? For me it's only 6GB (we're starting to approach building cross-platform container images levels of fun here 😄)
…s by default

Addresses PR generative-computing#372 review feedback:

1. Keep examples clean by using comment-based pytest markers
   - Examples now use single-line `# pytest: marker1, marker2` format
   - No pytest imports or pytestmark assignments in example files
   - Added pytest hooks to parse markers and skip heavy examples early
   - Prevents memory exhaustion from HuggingFace imports during collection

2. Run qualitative tests by default
   - Removed `-m "not qualitative"` from pytest addopts
   - Users can opt out with: `pytest -m "not qualitative"` (~2 min)
   - Default `pytest` now runs the full suite including quality checks

Changes:
- docs/examples/conftest.py: +266 lines for comment marker parsing
- docs/examples/*.py: all 61 examples restored clean with a single comment marker
- pyproject.toml: removed qualitative exclusion, added explanatory comments
- AGENTS.md: updated documentation to reflect new defaults

All examples remain pure Python without test infrastructure pollution.
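For context, here is a minimal sketch of how a `# pytest: marker1, marker2` comment can be parsed in a collection hook. This is illustrative only; the actual docs/examples/conftest.py implementation is larger and differs in detail:

```python
# Sketch: attach markers declared as a `# pytest: marker1, marker2` comment so
# example files need no pytest imports. Illustrative, not the PR's exact code.
import re
from pathlib import Path

import pytest

MARKER_RE = re.compile(r"^#\s*pytest:\s*(.+)$")

def parse_comment_markers(path: Path) -> list[str]:
    """Return marker names from a `# pytest: ...` comment near the top of a file."""
    for line in path.read_text(encoding="utf-8").splitlines()[:10]:
        match = MARKER_RE.match(line.strip())
        if match:
            return [name.strip() for name in match.group(1).split(",")]
    return []

def pytest_collection_modifyitems(config, items):
    # Examples stay free of test infrastructure; markers live in comments.
    for item in items:
        for name in parse_comment_markers(Path(item.fspath)):
            item.add_marker(getattr(pytest.mark, name))
```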
@planetf1 I think you accidentally committed a bunch of temp files in your last commit |
- Add 'slow' marker for tests >5 minutes (e.g., dataset loading)
- Skip slow tests by default via pyproject.toml addopts
- Mark generative_gsm8k.py with slow marker
- Exclude conftest.py from test collection
- Update all documentation (AGENTS.md, README.md, test/MARKERS_GUIDE.md, PR_372_SUMMARY.md)

Addresses PR generative-computing#372 review feedback:
- Slow tests (like GSM8K) now skipped by default for a fast experience
- Qualitative tests still run by default (comprehensive)
- conftest.py no longer collected as a test

Default: pytest runs qualitative tests, skips slow (~4-6 min)
Fast: pytest -m 'not qualitative' (~2 min)
Slow: pytest -m slow
All: pytest --co -q (bypass config)
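The commit achieves this via pyproject.toml addopts; for readers unfamiliar with the pattern, an equivalent effect can be sketched as a conftest hook. The `--runslow` flag below is an illustrative alternative from the standard pytest recipe, not part of this PR:

```python
# Sketch: skip `slow`-marked tests unless explicitly requested. The PR uses
# pyproject.toml addopts instead; `--runslow` here is an illustrative option.
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--runslow", action="store_true", default=False,
        help="run tests marked slow (>5 minutes, e.g. dataset loading)",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        return
    skip_slow = pytest.mark.skip(reason="slow test: pass --runslow to include")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```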
Ah, fixed. Still refining the execution to ensure we can run all tests.
Test Suite Timing Summary (timing tables omitted): Quick Reference, Recommended Workflows, Key Insights
Force-pushed from 1cec30c to 4b1b773.
…ples with optional dependency detection

- Add automatic detection for optional dependencies (langchain_core)
- Fix import error in context_example.py (stdlib.base -> stdlib.context)
- Add requires_heavy_ram marker to sofai_graph_coloring.py
- Add 15-minute timeout to pytest configuration
- Remove langchain_messages.py from manual skip list (now auto-detected)

Resolves test failures while keeping examples clean with only marker comments.
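A rough sketch of how such optional-dependency detection might work; the `OPTIONAL_MODULES` set and the helper name are assumptions for illustration, not the actual implementation:

```python
# Sketch: skip an example when one of its optional dependencies is missing.
# OPTIONAL_MODULES and the helper name are illustrative assumptions.
import ast
import importlib.util
from pathlib import Path

import pytest

OPTIONAL_MODULES = {"langchain_core"}

def missing_optional_imports(path: Path) -> set[str]:
    """Top-level optional modules imported by the file but not installed."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    imported: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imported.add(node.module.split(".")[0])
    return {m for m in imported & OPTIONAL_MODULES
            if importlib.util.find_spec(m) is None}

def pytest_collection_modifyitems(config, items):
    for item in items:
        missing = missing_optional_imports(Path(item.fspath))
        if missing:
            item.add_marker(pytest.mark.skip(
                reason=f"optional dependency missing: {', '.join(sorted(missing))}"
            ))
```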
Ran the tests myself and thought I'd share my version of your table:
I updated my table above. I'm unclear on what is causing the skipped, deselected, and xpassed tests, and whether the warnings are just me or happening to everyone. I'll take a final look through the code changes after lunch.
Mine:
ajbozarth left a comment:

Tested everything locally and the tests run well (as documented in my comment above).

Looking at the code, everything looks solid; just a few comments on potential merge conflicts with my open PRs, depending on which gets merged first:
- You can then run all tests by running `pytest`, or only the CI/CD tests by
- running `CICD=1 pytest`. See [test/MARKERS_GUIDE.md](../test/MARKERS_GUIDE.md) for
- details on running specific test categories (e.g., by backend, resource requirements).
+ You can then run tests:
#369 merge conflict note: this section is removed entirely
  ollama serve                         # Start Ollama (required for most tests)
- uv run pytest -m "not qualitative"   # Skips LLM quality tests (~2 min)
- uv run pytest                        # Full suite (includes LLM quality tests)
+ uv run pytest                        # Default: qualitative tests, skip slow tests
#369 merge conflict note: the ruff/mypy steps are edited here as well
- You can then run all tests by running `pytest`, or only the CI/CD tests by
- running `CICD=1 pytest`. See [test/MARKERS_GUIDE.md](test/MARKERS_GUIDE.md) for
- details on running specific test categories (e.g., by backend, resource requirements).
+ You can then run tests:
#369 merge conflict note: This section is moved to CONTRIBUTING.md
Force-pushed from 4b1b773 to 607953d.
- Add comment-based pytest markers for examples (cleaner than decorators)
- Enable qualitative tests by default, add 'slow' marker for >5min tests
- Improve test infrastructure with better skip logic and collection
- Fix test failures: watsonx credentials, pytest imports, heavy RAM markers
- Fix mypy errors in tools.py from upstream changes

Resolves test discovery performance issues and improves CI reliability.
Force-pushed from 607953d to 20fb553.
Opened two new issues following a final test after rebase.

Will merge. @ajbozarth: the overlap in docs is small; thanks for the review. I can review your PR after rebase, but you've probably looked more closely at the text/guidance. This PR improves default behaviour and adds a few more workable options, but is compatible with the original instructions.
Type of PR: Misc

Description
This PR optimizes the test infrastructure to improve test execution speed, reliability, and developer experience. The changes address test collection hangs, improve skip handling, and establish proper test categorization.
Key Changes:
Example Test Discovery Optimization (e1701c1)
- Added markers (`@pytest.mark.ollama`, `@pytest.mark.qualitative`, etc.) to 66 example files
- `docs/examples/conftest.py`: import failure handling
- `pytest -m ""` runs the full suite
- Documentation updates: `AGENTS.md`, `README.md`, `docs/tutorial.md`, `test/MARKERS_GUIDE.md`

Standalone Example Execution (1cd9c7b)
Qualitative Test Marking (c5e36ef)
- Marked `docs/examples/rag/mellea_pdf.py` as qualitative due to external PDF dependency

Test Failure Fixes (f03581e)
- `vision_openai_examples.py`: simplified skip logic, added requirements docstring
- `docs/examples/conftest.py`: detect `pytest.skip()` exceptions in subprocess stderr (see the sketch after this list)
- `test_vision_openai.py::test_image_block_in_chat`: added `@pytest.mark.qualitative` decorator
- `testpaths = ["test", "docs"]` in `pyproject.toml` for fail-fast behavior
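A minimal sketch of that subprocess skip detection; the `Skipped:` stderr sentinel and the `run_example` helper name are assumptions, not the actual conftest code:

```python
# Sketch: run an example in a subprocess and propagate a pytest.skip() raised
# inside it. The "Skipped:" stderr sentinel and helper name are assumptions.
import subprocess
import sys

import pytest

def run_example(path: str) -> None:
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=900
    )
    if "Skipped:" in result.stderr:  # pytest.skip() raised inside the example
        pytest.skip(f"example skipped itself: {path}")
    if result.returncode != 0:
        pytest.fail(f"example failed:\n{result.stderr}")
```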
Impact:
- Full suite available via `pytest -m ""`

Testing