Skip to content

Conversation

@kappa90
Copy link
Contributor

@kappa90 kappa90 commented Dec 5, 2025

Problem

Memory nodes are sync and lead to deadlocks when running subagent workflows. Let's convert memory-related nodes to async to remove blocking operations.

Changes

  • Converted memory-related nodes to use async/await pattern
  • Added asyncio_mode = auto to pytest.ini (this is present in the main pytest.ini but not in ee)

How did you test this code?

Updated all tests

Copy link
Contributor Author

kappa90 commented Dec 5, 2025

@kappa90 kappa90 requested a review from a team December 6, 2025 00:01
@kappa90 kappa90 marked this pull request as ready for review December 6, 2025 00:01
Copilot AI review requested due to automatic review settings December 6, 2025 00:01
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. ee/hogai/chat_agent/slash_commands/commands/remember/command.py, line 51 (link)

    syntax: calling sync method append_core_memory() which doesn't exist anymore - should use async_append_core_memory()

7 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR converts memory-related nodes from synchronous to asynchronous operations to prevent deadlocks when running subagent workflows. The changes include converting database operations to their async equivalents, updating method signatures to use async/await, and converting all tests to be async-compatible.

Key Changes

  • Converted memory-related model methods and node operations to async/await pattern
  • Added asyncio_mode = auto to ee/pytest.ini for async test support
  • Updated all test classes to use NonAtomicBaseTest and converted test methods to async

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
ee/pytest.ini Added asyncio_mode configuration for pytest async support
ee/models/assistant.py Converted CoreMemory methods to async (achange_status_to_pending, async_append_question_to_initial_text, etc.)
ee/hogai/chat_agent/slash_commands/test/test_slash_command_handler.py Converted router tests to async
ee/hogai/chat_agent/slash_commands/nodes.py Converted router method to arouter and made it async
ee/hogai/chat_agent/memory/test/test_nodes.py Converted all test classes to NonAtomicBaseTest and made test methods async with proper async database operations
ee/hogai/chat_agent/memory/nodes.py Converted memory node methods to async (arun, arouter), replaced sync database calls with async equivalents
ee/hogai/chat_agent/graph.py Updated router method calls to arouter

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@kappa90 kappa90 force-pushed the 12-06-fix_ph-ai_async_memory_nodes branch 2 times, most recently from 3edc6c1 to e456c9e Compare December 6, 2025 07:10
@kappa90 kappa90 force-pushed the 12-06-fix_ph-ai_async_memory_nodes branch 5 times, most recently from 0223120 to 009a1ec Compare December 7, 2025 11:31
@kappa90 kappa90 force-pushed the 12-06-fix_ph-ai_async_memory_nodes branch from 009a1ec to f61f0eb Compare December 7, 2025 19:59
Copy link
Contributor

@skoob13 skoob13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Pushed fixes for a few sync issues.

@skoob13 skoob13 added the evals-ready Whether to run AI evals on this PR. label Dec 8, 2025
@posthog-bot
Copy link
Contributor

🧠 AI eval results

Evaluated 35 experiments, comprising 64 metrics. Showing experiments with largest regressions first.

funnel

🔴 plan_correctness: 5.00%, -2.50% (improvements: 2, regressions: 2)
🟢 QueryKindSelection: 60.00%, +10.00% (improvements: 0, regressions: 0)
🔴 query_and_plan_alignment: 37.50%, -25.00% (improvements: 0, regressions: 0)
🔴 time_range_relevancy: 60.00%, -25.00% (improvements: 0, regressions: 0)

Baseline: master-1764965827 • Avg. case performance: ⏱️ 157.35 s, 🔢 1955 tokens, 💵 $0.0011 in tokens

insight_evaluation_accuracy

🔴 InsightEvaluationAccuracy: 25.00%, -25.00% (improvements: 0, regressions: 1)

Baseline: master-1764966032 • Avg. case performance: ⏱️ 15.08 s, 🔢 1547 tokens, 💵 $0.0030 in tokens

memory_onboarding

🔴 has_correct_style: 50.00%, -16.67% (improvements: 0, regressions: 1)
🔵 has_technical_details: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔴 satisfies_business_details: 83.33%, -16.67% (improvements: 0, regressions: 1)
🔵 satisfies_product_details: 100.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966063 • Avg. case performance: ⏱️ 67.30 s, 🔢 406 tokens, 💵 $0.0029 in tokens

tool_search_session_recordings

🔴 date_time_filtering_correctness: 86.84%, -10.71% (improvements: 0, regressions: 2)
🔴 filter_generation_correctness: 90.79%, -8.93% (improvements: 0, regressions: 2)

Baseline: master-1764966802 • Avg. case performance: ⏱️ 10.09 s, 🔢 16787 tokens, 💵 $0.0344 in tokens

retention

🔵 QueryKindSelection: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🟢 plan_correctness: 75.00%, +20.00% (improvements: 2, regressions: 1)
🟢 query_and_plan_alignment: 84.00%, +27.75% (improvements: 3, regressions: 0)
🔴 time_range_relevancy: 90.00%, -10.00% (improvements: 0, regressions: 1)

Baseline: master-1764966182 • Avg. case performance: ⏱️ 49.36 s, 🔢 0 tokens

tool_search_session_recordings

🔴 date_time_filtering_correctness: 91.67%, -5.56% (improvements: 0, regressions: 1)
🔴 filter_generation_correctness: 92.36%, -5.56% (improvements: 0, regressions: 1)

Baseline: master-1764966802 • Avg. case performance: ⏱️ 18.86 s, 🔢 16497 tokens, 💵 $0.0338 in tokens

tool_generate_hogql_query

🔵 no_mustache: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🟢 sql_semantics_correctness: 40.00%, +14.29% (improvements: 2, regressions: 0)
🟢 sql_syntax_correctness: 80.00%, +21.43% (improvements: 3, regressions: 0)

Baseline: master-1764966728 • Avg. case performance: ⏱️ 6.31 s, 🔢 21984 tokens, 💵 $0.0446 in tokens

memory

🟢 ToolRelevance: 88.80%, +18.80% (improvements: 2, regressions: 0)
🟢 memory_content_relevance: 87.00%, +17.00% (improvements: 2, regressions: 0)

Baseline: master-1764966049 • Avg. case performance: ⏱️ 1.89 s, 🔢 1142 tokens, 💵 $0.0032 in tokens

yaml_fixing

🟢 ExactMatch: 100.00%, +16.67% (improvements: 1, regressions: 0)

Baseline: master-1764966888 • Avg. case performance: ⏱️ 0.64 s, 🔢 169 tokens, 💵 $0.0001 in tokens

surveys

🔵 feature_flag_integration: 58.33%, ±0.00% (improvements: 0, regressions: 0)
🔵 feature_flag_understanding: 83.33%, ±0.00% (improvements: 0, regressions: 0)
🔵 first_question_type_correct: 20.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 survey_creation_basics: 80.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 survey_question_quality: 74.00%, ±0.00% (improvements: 0, regressions: 0)
🟢 survey_relevance: 77.00%, +9.00% (improvements: 3, regressions: 0)

Baseline: master-1764966574 • Avg. case performance: ⏱️ 3.75 s, 🔢 5470 tokens, 💵 $0.0117 in tokens

25 experiments with no significant changes

trends

🔵 plan_correctness: 77.50%, ±0.00% (improvements: 1, regressions: 1)
🔵 QueryKindSelection: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 query_and_plan_alignment: 88.13%, -1.25% (improvements: 0, regressions: 1)
🔵 time_range_relevancy: 95.63%, +1.88% (improvements: 1, regressions: 0)

Baseline: master-1764966595 • Avg. case performance: ⏱️ 30.34 s, 🔢 0 tokens

tool_routing_session_replay

🔵 ToolRelevance: 28.20%, -1.05% (improvements: 1, regressions: 2)

Baseline: master-1764966353 • Avg. case performance: ⏱️ 8.63 s, 🔢 0 tokens

root_documentation

🔵 ToolRelevance: 95.18%, +0.24% (improvements: 15, regressions: 10)

Baseline: master-1764966282 • Avg. case performance: ⏱️ 16.51 s, 🔢 0 tokens

tool_routing_dashboard_creation

🔵 ToolRelevance: 58.68%, -0.14% (improvements: 2, regressions: 2)

Baseline: master-1764965807 • Avg. case performance: ⏱️ 9.30 s, 🔢 0 tokens

root

🔵 ToolRelevance: 10.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966266 • Avg. case performance: ⏱️ 8.07 s, 🔢 0 tokens

root_entity_search

🔵 ToolRelevance: 98.22%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966314 • Avg. case performance: ⏱️ 4.92 s, 🔢 0 tokens

root_style

🔵 style_checker: 100.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966325 • Avg. case performance: ⏱️ 8.88 s, 🔢 0 tokens

session_summarization_no_context

🔵 ToolRelevance: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966370 • Avg. case performance: ⏱️ 5.15 s, 🔢 0 tokens

session_summarization_limit_with_context

🔵 ToolRelevance: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966381 • Avg. case performance: ⏱️ 7.33 s, 🔢 0 tokens

session_summarization_limit_without_context

🔵 ToolRelevance: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966396 • Avg. case performance: ⏱️ 6.05 s, 🔢 0 tokens

session_summarization_current_session

🔵 ToolRelevance: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966408 • Avg. case performance: ⏱️ 6.72 s, 🔢 0 tokens

sql

🔵 plan_correctness: 0.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 retry_efficiency: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 QueryKindSelection: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 sql_syntax_correctness: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 time_range_relevancy: 98.21%, ±0.00% (improvements: 1, regressions: 1)

Baseline: master-1764966423 • Avg. case performance: ⏱️ 45.47 s, 🔢 0 tokens

survey_analysis

🔵 recommendation_quality: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 test_data_detection: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 theme_extraction_quality: 100.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966550 • Avg. case performance: ⏱️ 9.94 s, 🔢 2697 tokens, 💵 $0.0088 in tokens

ui_context_actions

🔵 ToolRelevance: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966651 • Avg. case performance: ⏱️ 7.08 s, 🔢 0 tokens

ui_context_events

🔵 ToolRelevance: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966667 • Avg. case performance: ⏱️ 7.59 s, 🔢 0 tokens

create_experiment

🔵 ExperimentOutputScorer: 87.13%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966681 • Avg. case performance: ⏱️ 1.50 s, 🔢 0 tokens

create_feature_flag

🔵 FeatureFlagOutputScorer: 89.62%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966688 • Avg. case performance: ⏱️ 15.49 s, 🔢 13518 tokens, 💵 $0.0277 in tokens

combined_rename_and_add

🔵 SemanticSimilarity: 88.93%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966724 • Avg. case performance: ⏱️ 0.05 s, 🔢 0 tokens

tool_filter_revenue_analytics

🔵 date_time_filtering_correctness: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 filter_generation_correctness: 100.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966751 • Avg. case performance: ⏱️ 1.91 s, 🔢 3826 tokens, 💵 $0.0081 in tokens

tool_filter_revenue_analytics_ask_user_for_help

🔵 ask_user_for_help_scorer: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966757 • Avg. case performance: ⏱️ 0.90 s, 🔢 3801 tokens, 💵 $0.0080 in tokens

tool_search_session_recordings_ask_user_for_help

🔵 ask_user_for_help_scorer: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966824 • Avg. case performance: ⏱️ 12.48 s, 🔢 45872 tokens, 💵 $0.0927 in tokens

tool_search_session_recordings_ask_user_for_help

🔵 ask_user_for_help_scorer: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966824 • Avg. case performance: ⏱️ 11.47 s, 🔢 73793 tokens, 💵 $0.1486 in tokens

tool_call_dashboard_creation

🔵 dashboard_creation_accuracy: 40.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966840 • Avg. case performance: ⏱️ 14.63 s, 🔢 7344 tokens, 💵 $0.0034 in tokens

filter_query_generation

🔵 SemanticSimilarity: 99.21%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966878 • Avg. case performance: ⏱️ 0.47 s, 🔢 579 tokens, 💵 $0.0012 in tokens

insights_addition

🔵 SemanticSimilarity: 55.14%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1764966893 • Avg. case performance: ⏱️ 0.31 s, 🔢 0 tokens

Triggered by this commit.

@skoob13 skoob13 removed the evals-ready Whether to run AI evals on this PR. label Dec 8, 2025
@kappa90 kappa90 merged commit 4b88747 into master Dec 8, 2025
184 checks passed
@kappa90 kappa90 deleted the 12-06-fix_ph-ai_async_memory_nodes branch December 8, 2025 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants