-
Notifications
You must be signed in to change notification settings - Fork 296
Fix Electric SQL collections getting stuck after rapid tab changes #3431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Electric SQL collections getting stuck after rapid tab changes #3431
Conversation
Investigated Discord bug report where collections stop reconnecting after rapid tab switching in Firefox. Root causes identified: 1. PRIMARY: Race condition in pause/resume state machine - resume() only checks for 'paused' state, but pause() sets 'pause-requested' as intermediate state. Rapid visibility changes cause stream to get stuck. 2. SECONDARY: Memory leak - visibility change listener is never removed, causing stale handlers and memory accumulation. 3. CONTRIBUTING: SSE fallback logic can be triggered unintentionally during rapid tab switching. Firefox is more affected due to aggressive request abortion (NS_BINDING_ABORTED) and timing differences in visibility API. Includes recommended fixes for all identified issues.
Additional documentation files created during investigation: - VISIBILITY_BUG_ANALYSIS.md: Deep dive with state diagrams - VISIBILITY_BUG_SUMMARY.md: Concise summary with code snippets - VISIBILITY_BUG_QUICK_REFERENCE.md: Quick lookup guide
Fixes race condition where rapid visibility changes cause streams to get stuck and stop reconnecting, particularly in Firefox. Primary fix - Race condition in pause/resume state machine: - Modified #resume() to handle 'pause-requested' state in addition to 'paused' state - When resuming from 'pause-requested', set state back to 'active' to prevent the pause from completing - This fixes the bug where rapid tab switching causes #resume() to be called before the abort completes, leaving the stream stuck Secondary fix - Memory leak in visibility listener: - Added #unsubscribeFromVisibilityChanges field to store cleanup function - Modified #subscribeToVisibilityChanges() to store removeEventListener callback - Modified unsubscribeAll() to call cleanup function - Prevents memory leaks and stale event handlers Root cause: When tab visibility changes rapidly (especially in Firefox): 1. Tab hidden → #pause() sets state to 'pause-requested' 2. Request is aborted (takes time to complete) 3. Tab visible again → #resume() called before state becomes 'paused' 4. Old #resume() only checked for 'paused', so it did nothing 5. Stream remained stuck in 'pause-requested' state Firefox is more affected due to aggressive request abortion with NS_BINDING_ABORTED and different timing in visibility API implementation. Fixes: Collections getting stuck after rapid tab switching Related: Discord bug report about collections not reconnecting
- Add changeset for collection stuck bug fix - Remove investigation documentation files for cleaner PR
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3431 +/- ##
==========================================
- Coverage 69.54% 69.39% -0.15%
==========================================
Files 182 182
Lines 9778 9787 +9
Branches 353 360 +7
==========================================
- Hits 6800 6792 -8
- Misses 2976 2993 +17
Partials 2 2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Fixes an additional race condition where a stale abort completion could overwrite the state after resume() has been called. Timeline of the bug: 1. Request #1 running with AbortController #1 2. Tab hidden → pause() sets state to 'pause-requested', aborts #1 3. Tab visible → resume() sets state to 'active', starts request #2 4. Old request #1's abort completes, sets state to 'paused' 5. Stream stuck because state is 'paused' but should be 'active' Fix: Only transition to 'paused' if state is still 'pause-requested'. If resume() already changed it to 'active', don't overwrite it. This ensures that old abort completions don't interfere with the new active request started by resume().
- Add type assertion to handle concurrent state changes during async operations that TypeScript's flow analysis cannot detect - Update changeset with detailed state machine explanation and both race condition fixes - Add explanatory comments about why type assertion is necessary TypeScript's control flow analysis sees state as 'active' after line 631, but doesn't account for the fact that the visibility change handler can call #pause() during the await, changing state to 'pause-requested'. The type assertion tells TypeScript we're intentionally checking the runtime value.
|
This fix looks good to me. It will handle the case where we quickly resume the stream before the pause fully finished. The old pause shouldn't interfere with the quick resume because of the extra check that's added in the if test. It would be nice to have preview packages in Electric such that the OP can test the fix and confirm it works before we actually merge it. |
reported at https://discord.com/channels/719702312431386674/1438326657063850035/1438326657063850035