Commit 215da6a
Manav Kumar
[yugabyte#28102] YSQL: de-refer the rule before unlocking the route
Summary:
Running the connection burst test generated the following core dump:
(lldb) target create "/home/yugabyte/yb-software/yugabyte-2024.2.3.0-b116-centos-x86_64/bin/odyssey" --core "/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey"
Core file '/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey' (x86_64) was loaded.
(lldb) bt all
error: odyssey GetDIE for DIE 0x3c is outside of its CU 0x66d45
* thread #1, name = 'odyssey', stop reason = signal SIGSEGV
* frame #0: 0x0000564340e2cc6f odyssey`od_backend_connect(server=0x00005138fc5ef6c0, context="", route_params=0x0000000000000000, client=0x00005138ff7a2580) at backend.c:815:19
frame #1: 0x0000564340e2a80e odyssey`od_frontend_attach(client=0x00005138ff7a2580, context="", route_params=0x0000000000000000) at frontend.c:305:8
frame #2: 0x0000564340e26b11 odyssey`od_frontend_remote [inlined] od_frontend_attach_and_deploy(client=0x00005138ff7a2580, context=<unavailable>) at frontend.c:361:11
frame #3: 0x0000564340e26afe odyssey`od_frontend_remote(client=0x00005138ff7a2580) at frontend.c:2120:13
frame #4: 0x0000564340e22d65 odyssey`od_frontend(arg=0x00005138ff7a2580) at frontend.c:2756:12
frame #5: 0x0000564340e4b912 odyssey`mm_scheduler_main(arg=0x00005138fc218dc0) at scheduler.c:17:2
frame #6: 0x0000564340e4bb77 odyssey`mm_context_runner at context.c:28:2
Frame #0 (backend.c:815) points to the statement storage = route->rule->storage; meaning the rule had already been set to NULL, which led to the crash above.
The root cause is a race condition in object cleanup. While cleaning up a route, the rule associated with it was being released (unref) outside the lock that protects the route object. This allowed one thread to clean up the rule while another thread acquired the lock on the same route and tried to use its rule pointer, which by then was no longer valid.
This diff moves the unref of the rule object into a code block where the lock on the route object is already held. This ensures the route and its associated rule are handled atomically, preventing concurrent access to an invalid pointer.
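The ordering described above can be illustrated with a minimal sketch. It is not Odyssey's code: the types rule_t and route_t, the helpers rule_unref, route_release_rule and route_read_storage, and the use of a pthread mutex are all hypothetical stand-ins. The NULL check in the reader is added for the sketch; the commit itself only moves the unref under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not Odyssey's real types. */
typedef struct rule {
    int refs;    /* reference count, only touched under the route lock here */
    int storage; /* placeholder for the data a reader wants from the rule */
} rule_t;

typedef struct route {
    pthread_mutex_t lock;
    rule_t *rule;
} route_t;

/* Drop one reference and free the rule when the last one goes away. */
static void rule_unref(rule_t *rule)
{
    assert(rule->refs > 0);
    if (--rule->refs == 0)
        free(rule);
}

/* Cleanup path: the unref happens while the route lock is held, so no
 * other thread can observe the rule pointer between the unref and the
 * pointer being cleared. */
static void route_release_rule(route_t *route)
{
    pthread_mutex_lock(&route->lock);
    if (route->rule != NULL) {
        rule_unref(route->rule);
        route->rule = NULL;
    }
    pthread_mutex_unlock(&route->lock);
}

/* Reader path: takes the same lock before touching route->rule.
 * Returns 0 and fills *storage if a rule is attached, -1 otherwise. */
static int route_read_storage(route_t *route, int *storage)
{
    int rc = -1;
    pthread_mutex_lock(&route->lock);
    if (route->rule != NULL) {
        *storage = route->rule->storage;
        rc = 0;
    }
    pthread_mutex_unlock(&route->lock);
    return rc;
}
```

If the unref were performed after pthread_mutex_unlock instead, a reader could take the lock in between and follow a pointer to a rule that is concurrently being torn down, which is the race this commit describes.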
Jira: DB-17729
Test Plan: Jenkins: all tests
Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena
Reviewed By: skumar
Subscribers: svc_phabricator, yql
Differential Revision: https://phorge.dev.yugabyte.com/D45583
1 parent 0c5ebd2 commit 215da6a
1 file changed, +3 -2 lines changed
Diff content not captured: new lines 387-388 added, original lines 389-390 removed, new line 391 added.