fix(sync-service): Fix subquery shape dependency validation on restore from backup #3628
This PR is still in draft because I have some reservations about it:
However, this PR represents my current best guess as to how a Materializer went missing, causing AutoArc's outage.
Summary
- … `restore_dependency_handles` for clarity

Problem
Shapes with subqueries store handles to their dependency shapes in `shape_dependencies_handles`. When restoring shapes from storage via `load_shapes`, this list is rebuilt and validated by `restore_dependency_handles`, which removes shapes whose dependencies no longer exist.

However, when restoring from backup via `load_backup`, this validation was not performed. The backup contains serialized `Shape` structs with their `shape_dependencies_handles` already populated, but those handles may reference shapes that no longer exist (due to cleanup, schema changes, or prior removal).

Evidence from AutoArc's production logs
AWS CloudWatch logs from Dec 18 show a crash loop with this error:
GenServer.call({:via, Registry, {..., {Electric.Shapes.Consumer.Materializer, "32220858-1765808264363524"}}}, :get_link_values, 5000)
** (EXIT) no process: the process is not alive
Key observations:
- `32220858-1765808264363524` was created Dec 15 (timestamp embedded in handle)
- … `:shutdown` reason)
- `32220858-1765808264363524` was not among them
- … `shape_dependencies_handles`

This is a classic stale reference: the dependency shape was removed/lost, but parent shapes restored from backup still held handles to it.
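The Dec 15 creation date can be read back out of the handle. A minimal sketch, assuming the part after the dash is a Unix timestamp in microseconds (an assumption that is consistent with the date above, not something confirmed from Electric's source); `HandleSketch` and `created_on/1` are hypothetical names:

```elixir
defmodule HandleSketch do
  # Assumption: a handle looks like "<hash>-<unix time in microseconds>".
  def created_on(handle) do
    [_hash, micros] = String.split(handle, "-", parts: 2)

    {:ok, datetime} =
      micros
      |> String.to_integer()
      |> DateTime.from_unix(:microsecond)

    DateTime.to_date(datetime)
  end
end

IO.inspect(HandleSketch.created_on("32220858-1765808264363524"))
# prints ~D[2025-12-15]
```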
Fix
- After `load_backup` removes shapes without valid storage, call `remove_shapes_with_invalid_dependencies` to cascade removals to any shapes whose `shape_dependencies_handles` reference the removed handles
- Update `restore_dependency_handles` to also cascade removals (handles the `load_shapes` path)
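The cascading idea can be sketched roughly as follows. This is an illustration only, not the code in this PR: `CascadeSketch`, `cascade_removals/2`, and the plain map of handle to dependency handles are hypothetical stand-ins for the real shape storage.

```elixir
defmodule CascadeSketch do
  # `shapes` is a hypothetical map of handle => list of dependency handles.
  # `removed_handles` are handles already dropped (e.g. no valid storage).
  # Returns the shapes that survive once removals have cascaded.
  def cascade_removals(shapes, removed_handles) do
    removed = MapSet.new(removed_handles)

    # A shape is invalid if any of its dependencies has been removed.
    {invalid, valid} =
      Enum.split_with(shapes, fn {_handle, deps} ->
        Enum.any?(deps, &MapSet.member?(removed, &1))
      end)

    case invalid do
      [] ->
        # Nothing else references a removed handle: fixed point reached.
        shapes

      _ ->
        # Removing these shapes may invalidate their own dependents,
        # so repeat with the enlarged set of removed handles.
        newly_removed = Enum.map(invalid, fn {handle, _deps} -> handle end)

        cascade_removals(
          Map.new(valid),
          MapSet.union(removed, MapSet.new(newly_removed))
        )
    end
  end
end

shapes = %{
  "parent" => ["dep"],
  "grandparent" => ["parent"],
  "other" => []
}

IO.inspect(CascadeSketch.cascade_removals(shapes, ["dep"]))
# prints %{"other" => []}
```

Each round removes at least one shape, so the recursion terminates. Here "parent" is dropped because it references the removed "dep", and "grandparent" is dropped in the next round because it references "parent".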