Commit 0c5ebd2

Author: Manav Kumar
[yugabyte#27894] YSQL: Don't release the lock on route until server is closed in multi route pooling
Summary:
**Issue Summary**
A core dump was triggered during a ConnectionBurst stress test, with the crash occurring in the od_backend_close_connection function with multi route pooling. The stack trace is as follows:

frame #0: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] mm_tls_free(io=0x0000000000000000) at tls.c:91:10
frame #1: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] machine_io_free(obj=0x0000000000000000) at io.c:201:2
frame #2: 0x00005601a627129e odyssey`od_backend_close_connection [inlined] od_io_close(io=0x000031f53e72b8b8) at io.h:77:2
frame #3: 0x00005601a627128c odyssey`od_backend_close_connection(server=0x000031f53e72b880) at backend.c:56:2
frame #4: 0x00005601a6250de5 odyssey`od_router_attach(router=0x00007fff00dbeb30, client_for_router=0x000031f53e5df180, wait_for_idle=<unavailable>, external_client=0x000031f53ee30680) at router.c:1010:6
frame #5: 0x00005601a6258b1b odyssey`od_auth_frontend [inlined] yb_execute_on_control_connection(client=0x000031f53ee30680, function=<unavailable>) at frontend.c:2842:11
frame #6: 0x00005601a6258b0b odyssey`od_auth_frontend(client=0x000031f53ee30680) at auth.c:677:8
frame #7: 0x00005601a626782e odyssey`od_frontend(arg=0x000031f53ee30680) at frontend.c:2539:8
frame #8: 0x00005601a6290912 odyssey`mm_scheduler_main(arg=0x000031f53e390000) at scheduler.c:17:2
frame #9: 0x00005601a6290b77 odyssey`mm_context_runner at context.c:28:2

**Root Cause**
The crash originated from an improper lock release in the yb_get_idle_server_to_close function, introduced in commit 55beeb0 during multi-route pooling implementation. The function released the lock on the route object, despite a comment explicitly warning against it. After returning to its caller, no lock was held on the route or idle_route. This allowed other coroutines to access and use the same route and its idle server, which the original coroutine intended to close. This race condition led to a crash due to an assertion failure during connection closure.

**Note**
If the order of acquiring locks is the same across all threads or processes, differences in the release order alone cannot cause a deadlock. Deadlocks arise from circular dependencies during acquisition, not release.

In the connection manager code base:
- Locks are acquired in the order: router → route. This order must be strictly enforced everywhere to prevent deadlocks.
- Lock release order varies (e.g., router then route in od_router_route and yb_get_idle_server_to_close, versus the reverse elsewhere). This variation does not cause deadlocks, as release order is irrelevant to deadlock prevention.

Jira: DB-17501

Test Plan: Jenkins: all tests

Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena

Reviewed By: skumar

Subscribers: svc_phabricator, yql

Differential Revision: https://phorge.dev.yugabyte.com/D45641
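
The note on lock ordering can be illustrated with a small standalone sketch. It is not Odyssey code: it uses plain POSIX mutexes instead of the connection manager's own lock primitives, and every name in it (`demo_state_t`, `demo_run`, and so on) is hypothetical. Both workers acquire in the same order (router, then route) but release in opposite orders. Under that assumption neither worker can hold the route lock while waiting for the router lock, so no circular wait can form.

```c
#include <assert.h>
#include <pthread.h>

#define DEMO_ITERATIONS 100000

/* Hypothetical stand-ins for the router and route locks. */
typedef struct {
	pthread_mutex_t router_lock;
	pthread_mutex_t route_lock;
	long counter; /* protected by route_lock */
} demo_state_t;

/* Acquire router -> route, release router first. */
static void demo_touch_release_router_first(demo_state_t *s)
{
	pthread_mutex_lock(&s->router_lock);
	pthread_mutex_lock(&s->route_lock);
	s->counter++;
	pthread_mutex_unlock(&s->router_lock);
	pthread_mutex_unlock(&s->route_lock);
}

/* Acquire router -> route (same order), release route first. */
static void demo_touch_release_route_first(demo_state_t *s)
{
	pthread_mutex_lock(&s->router_lock);
	pthread_mutex_lock(&s->route_lock);
	s->counter++;
	pthread_mutex_unlock(&s->route_lock);
	pthread_mutex_unlock(&s->router_lock);
}

static void *demo_worker_a(void *arg)
{
	demo_state_t *s = arg;
	for (int i = 0; i < DEMO_ITERATIONS; i++)
		demo_touch_release_router_first(s);
	return NULL;
}

static void *demo_worker_b(void *arg)
{
	demo_state_t *s = arg;
	for (int i = 0; i < DEMO_ITERATIONS; i++)
		demo_touch_release_route_first(s);
	return NULL;
}

/* Runs both workers to completion and returns the final counter. */
static long demo_run(void)
{
	demo_state_t s;
	pthread_t a, b;
	int rc;

	s.counter = 0;
	rc = pthread_mutex_init(&s.router_lock, NULL);
	assert(rc == 0);
	rc = pthread_mutex_init(&s.route_lock, NULL);
	assert(rc == 0);

	rc = pthread_create(&a, NULL, demo_worker_a, &s);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, demo_worker_b, &s);
	assert(rc == 0);
	(void)rc;

	pthread_join(a, NULL);
	pthread_join(b, NULL);

	pthread_mutex_destroy(&s.route_lock);
	pthread_mutex_destroy(&s.router_lock);
	return s.counter;
}
```

Calling `demo_run()` from a test harness should return once both workers finish. Swapping the acquisition order in one of the two functions (route, then router) is what would make a deadlock possible.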
1 parent 914fc29 commit 0c5ebd2

1 file changed: +0 -1 lines changed

src/odyssey/sources/router.c

Lines changed: 0 additions & 1 deletion
@@ -824,7 +824,6 @@ static od_server_t *yb_get_idle_server_to_close(od_router_t *router,
 	 * shutting down this server.
 	 */
 	if (idle_server) {
-		od_route_unlock(route);
 		od_router_unlock(router);
 		return idle_server;
 	}
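
The effect of the removed line can be modeled with a deterministic toy, shown here only as a sketch. It is not the Odyssey implementation: the lock is a plain flag rather than a real mutex, the "other coroutine" is simulated by a direct call, and all `demo_*` names are hypothetical. The `unlock_early` flag reproduces the behaviour before this change, so the two paths can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
	bool closed;
} demo_server_t;

typedef struct {
	bool locked;         /* toy lock: a flag, not a real mutex */
	demo_server_t *idle; /* at most one idle server in this model */
} demo_route_t;

static bool demo_route_trylock(demo_route_t *r)
{
	if (r->locked)
		return false;
	r->locked = true;
	return true;
}

static void demo_route_unlock(demo_route_t *r)
{
	r->locked = false;
}

/*
 * Picks the idle server to close. On success the route stays locked
 * unless unlock_early is set, which models the behaviour before the fix.
 */
static demo_server_t *demo_get_idle_server_to_close(demo_route_t *route,
						    bool unlock_early)
{
	if (!demo_route_trylock(route))
		return NULL;
	demo_server_t *server = route->idle;
	if (server == NULL) {
		demo_route_unlock(route);
		return NULL;
	}
	if (unlock_early)
		demo_route_unlock(route);
	return server;
}

/* Models another coroutine that wants to reuse the idle server. */
static demo_server_t *demo_try_attach(demo_route_t *route)
{
	if (!demo_route_trylock(route))
		return NULL;
	demo_server_t *server = route->idle;
	route->idle = NULL;
	demo_route_unlock(route);
	return server;
}

/* Caller side: close the server first, release the route lock last. */
static void demo_close_and_release(demo_route_t *route, demo_server_t *server)
{
	assert(server != NULL);
	server->closed = true;
	route->idle = NULL;
	demo_route_unlock(route);
}

/* Returns true if a second coroutine can grab the server being closed. */
static bool demo_race_possible(bool unlock_early)
{
	demo_server_t server = { .closed = false };
	demo_route_t route = { .locked = false, .idle = &server };

	demo_server_t *victim =
		demo_get_idle_server_to_close(&route, unlock_early);
	/* Interleaving point: another coroutine runs before the close. */
	demo_server_t *stolen = demo_try_attach(&route);

	bool raced = (victim != NULL && stolen == victim);
	if (!raced)
		demo_close_and_release(&route, victim);
	return raced;
}
```

In this model `demo_race_possible(true)` reports that the second coroutine obtained the same server that was about to be closed, while `demo_race_possible(false)` reports that it could not, because the route lock is held until the close completes.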