forked from yugabyte/yugabyte-db
Bump setuptools from 72.2.0 to 78.1.1 in /managed/devops #6
Open
dependabot
wants to merge
1
commit into
master
from
dependabot/pip/managed/devops/setuptools-78.1.1
Bumps [setuptools](https://github.com/pypa/setuptools) from 72.2.0 to 78.1.1.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](pypa/setuptools@v72.2.0...v78.1.1)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-version: 78.1.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
jmeehan16
pushed a commit
that referenced
this pull request
Jun 12, 2025
Summary: After commit f85bbca, the vmodule flag is no longer respected by the postgres process. For example:
```
ybd release --cxx-test pgwrapper_pg_analyze-test --gtest_filter PgAnalyzeTest.AnalyzeSamplingColocated --test-args '--vmodule=pg_sample=1' -n 2 -- -p 1 -k
zgrep pg_sample ~/logs/latest_test/1.log
```
shows no vlogs.
The reason is that `VLOG(1)` is used early by
```
#0 0x00007f7e1b48b090 in google::InitVLOG3__(google::SiteFlag*, int*, char const*, int)@plt () from /net/dev-server-timur/share/code/yugabyte-db/build/debug-clang19-dynamic-ninja/lib/libyb_util_shmem.so
#1 0x00007f7e1b47616e in yb::(anonymous namespace)::NegotiatorSharedState::WaitProposal (this=0x7f7e215e8000) at ../../src/yb/util/shmem/reserved_address_segment.cc:108
#2 0x00007f7e1b4781e0 in yb::AddressSegmentNegotiator::Impl::NegotiateChild (fd=45) at ../../src/yb/util/shmem/reserved_address_segment.cc:252
#3 0x00007f7e1b4737ce in yb::AddressSegmentNegotiator::NegotiateChild (fd=45) at ../../src/yb/util/shmem/reserved_address_segment.cc:376
#4 0x00007f7e1b742b7b in yb::tserver::SharedMemoryManager::InitializePostmaster (this=0x7f7e202e9788 <yb::pggate::PgSharedMemoryManager()::shared_mem_manager>, fd=45) at ../../src/yb/tserver/tserver_shared_mem.cc:252
#5 0x00007f7e2023588f in yb::pggate::PgSetupSharedMemoryAddressSegment () at ../../src/yb/yql/pggate/pg_shared_mem.cc:29
#6 0x00007f7e202788e9 in YBCSetupSharedMemoryAddressSegment () at ../../src/yb/yql/pggate/ybc_pg_shared_mem.cc:22
#7 0x000055636b8956f5 in PostmasterMain (argc=21, argv=0x52937fe4e790) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1083
#8 0x000055636b774bfe in PostgresServerProcessMain (argc=21, argv=0x52937fe4e790) at ../../../../../../src/postgres/src/backend/main/main.c:209
#9 0x000055636b7751f2 in main ()
```
and caches the `vmodule` value before `InitGFlags` sets it from the environment.
The fix is to explicitly call `UpdateVmodule` from `InitGFlags` after setting `vmodule`.
Jira: DB-15888
Test Plan:
```
ybd release --cxx-test pgwrapper_pg_analyze-test --gtest_filter PgAnalyzeTest.AnalyzeSamplingColocated --test-args '--vmodule=pg_sample=1' -n 2 -- -p 1 -k
zgrep pg_sample ~/logs/latest_test/1.log
```
Reviewers: hsunder
Reviewed By: hsunder
Subscribers: ybase, yql
Tags: #jenkins-ready, #jenkins-trigger
Differential Revision: https://phorge.dev.yugabyte.com/D42731
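The caching problem described in this commit can be illustrated with a small, self-contained sketch. It is not the glog or YugabyteDB code: `VerbosityRegistry`, `CallSite`, `SetFlagOnly`, and `SetFlagAndUpdate` are hypothetical names, and the epoch counter is just one assumed way to model "explicitly update after setting the flag".

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a per-module verbosity table.
class VerbosityRegistry {
 public:
  // Assigns the level without telling cached call sites (models setting the
  // flag value directly).
  void SetFlagOnly(const std::string& module, int level) {
    levels_[module] = level;
  }

  // Assigns the level and bumps the epoch so call sites re-read it (models an
  // explicit update step after the flag is set).
  void SetFlagAndUpdate(const std::string& module, int level) {
    levels_[module] = level;
    ++epoch_;
  }

  int Lookup(const std::string& module) const {
    auto it = levels_.find(module);
    return it == levels_.end() ? 0 : it->second;
  }

  int epoch() const { return epoch_; }

 private:
  std::map<std::string, int> levels_;
  int epoch_ = 0;
};

// A call site caches its level on first use and refreshes only when the
// registry epoch changes.
class CallSite {
 public:
  CallSite(const VerbosityRegistry* registry, std::string module)
      : registry_(registry), module_(std::move(module)) {}

  bool ShouldLog(int level) {
    if (cached_epoch_ != registry_->epoch()) {
      cached_level_ = registry_->Lookup(module_);
      cached_epoch_ = registry_->epoch();
    }
    return level <= cached_level_;
  }

 private:
  const VerbosityRegistry* registry_;
  std::string module_;
  int cached_level_ = 0;
  int cached_epoch_ = -1;  // Forces a lookup on first use.
};
```

In this model, a call site that logs before the flag is set keeps its stale level after `SetFlagOnly`, and picks up the new level only after `SetFlagAndUpdate`.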
jmeehan16
pushed a commit
that referenced
this pull request
Jun 12, 2025
…rdup for tablegroup_name Summary: As part of D36859 / 0dbe7d6, backup and restore support for colocated tables when multiple tablespaces exist was introduced. Upon fetching the tablegroup_name from `pg_yb_tablegroup`, the value was read and assigned via `PQgetvalue` without copying. This led to a use-after-free bug when the tablegroup_name was later read in dumpTableSchema since the result from the SQL query is immediately cleared in the next line (`PQclear`). ``` [P-yb-controller-1] ==3037==ERROR: AddressSanitizer: heap-use-after-free on address 0x51d0002013e6 at pc 0x55615b0a1f92 bp 0x7fff92475970 sp 0x7fff92475118 [P-yb-controller-1] READ of size 8 at 0x51d0002013e6 thread T0 [P-yb-controller-1] #0 0x55615b0a1f91 in strcmp ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:470:5 [P-yb-controller-1] #1 0x55615b1b90ba in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15789:8 [P-yb-controller-1] #2 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] #3 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] #4 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] #5 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) [P-yb-controller-1] #6 0x55615b0894bd in _start (${BUILD_ROOT}/postgres/bin/ysql_dump+0x10d4bd) [P-yb-controller-1] [P-yb-controller-1] 0x51d0002013e6 is located 358 bytes inside of 2048-byte region [0x51d000201280,0x51d000201a80) [P-yb-controller-1] freed by thread T0 here: [P-yb-controller-1] #0 0x55615b127196 in free ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3 [P-yb-controller-1] #1 0x7f3c02d65e85 in PQclear ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:755:3 
[P-yb-controller-1] #2 0x55615b1c0103 in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19108:4 [P-yb-controller-1] #3 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3 [P-yb-controller-1] #4 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] #5 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] #6 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] #7 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) [P-yb-controller-1] [P-yb-controller-1] previously allocated by thread T0 here: [P-yb-controller-1] #0 0x55615b12742f in malloc ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3 [P-yb-controller-1] #1 0x7f3c02d680a7 in pqResultAlloc ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:633:28 [P-yb-controller-1] #2 0x7f3c02d81294 in getRowDescriptions ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:544:4 [P-yb-controller-1] #3 0x7f3c02d7f793 in pqParseInput3 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11 [P-yb-controller-1] #4 0x7f3c02d6bcc8 in parseInput ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2014:2 [P-yb-controller-1] #5 0x7f3c02d6bcc8 in PQgetResult ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2100:3 [P-yb-controller-1] #6 0x7f3c02d6cd87 in PQexecFinish ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2417:19 [P-yb-controller-1] #7 0x7f3c02d6cd87 in PQexec ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2256:9 [P-yb-controller-1] yugabyte#8 0x55615b1f45df in ExecuteSqlQuery ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:296:8 [P-yb-controller-1] yugabyte#9 0x55615b1f4213 in 
ExecuteSqlQueryForSingleRow ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:311:8 [P-yb-controller-1] yugabyte#10 0x55615b1c008d in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19102:10 [P-yb-controller-1] yugabyte#11 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3 [P-yb-controller-1] yugabyte#12 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] yugabyte#13 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] yugabyte#14 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] yugabyte#15 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) ``` This revision fixes the issue by using pg_strdup to make a copy of the string. Jira: DB-15915 Test Plan: ./yb_build.sh asan --cxx-test integration-tests_xcluster_ddl_replication-test --gtest_filter XClusterDDLReplicationTest.DDLReplicationTablesNotColocated Reviewers: aagrawal, skumar, mlillibridge, sergei Reviewed By: aagrawal, sergei Subscribers: sergei, yql Differential Revision: https://phorge.dev.yugabyte.com/D43386
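The ownership issue behind this use-after-free can be sketched without libpq. In the sketch below, `FakeResult`, `TableInfo`, and `ReadTablegroup` are hypothetical stand-ins: `GetValue` plays the role of `PQgetvalue` (a pointer into result-owned storage), `Clear` plays the role of `PQclear`, and assigning into a `std::string` plays the role of `pg_strdup`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical query result that owns the storage for its values.
class FakeResult {
 public:
  explicit FakeResult(std::vector<std::string> values)
      : values_(std::make_unique<std::vector<std::string>>(std::move(values))) {}

  // Returns a pointer into storage owned by the result.
  const char* GetValue(std::size_t i) const { return (*values_)[i].c_str(); }

  // Frees the storage; any pointer returned earlier is now dangling.
  void Clear() { values_.reset(); }

  bool cleared() const { return values_ == nullptr; }

 private:
  std::unique_ptr<std::vector<std::string>> values_;
};

struct TableInfo {
  std::string tablegroup_name;  // Owns its own copy of the name.
};

// Copies the value before clearing the result, so the name stays valid for
// later readers.
TableInfo ReadTablegroup(FakeResult& result) {
  TableInfo info;
  info.tablegroup_name = result.GetValue(0);  // Deep copy, not a borrowed pointer.
  result.Clear();
  return info;
}
```

Storing the raw `const char*` in `TableInfo` instead would reproduce the bug pattern: the pointer would outlive the storage released by `Clear`.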
jmeehan16
pushed a commit
that referenced
this pull request
Jun 12, 2025
…ck/release functions at TabletService Summary: In functions `TabletServiceImpl::AcquireObjectLocks` and `TabletServiceImpl::ReleaseObjectLocks`, we weren't returning after executing the rpc callback with initial validation steps fail. This led to segv issues like below ``` * thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV * frame #0: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined] std::__1::unique_ptr<yb::tserver::TSLocalLockManager::Impl, std::__1::default_delete<yb::tserver::TSLocalLockManager::Impl>>::operator->[abi:ne190100](this=0x0000000000000000) const at unique_ptr.h:272:108 frame #1: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined] yb::tserver::TSLocalLockManager::AcquireObjectLocksAsync(this=0x0000000000000000, req=0x00005001bfffa290, deadline=yb::CoarseTimePoint @ x23, callback=0x0000ffefb6066560, wait=(value_ = true)) at ts_local_lock_manager.cc:541:3 frame #2: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(this=0x00005001bdaf6020, req=0x00005001bfffa290, resp=0x00005001bfffa300, context=<unavailable>) at tablet_service.cc:3673:26 frame #3: 0x0000aaaac36bd9a0 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::tserver::TabletServerServiceIf::InitMethods(this=<unavailable>, req=0x00005001bfffa290, resp=0x00005001bfffa300, rpc_context=RpcContext @ 
0x0000ffefb6066600)::$_36::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext)::operator()(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) const at tserver_service.service.cc:1470:9 frame #4: 0x0000aaaac36bd978 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) at local_call.h:126:7 frame #5: 0x0000aaaac36bd680 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36::operator()(this=<unavailable>, call=<unavailable>) const at tserver_service.service.cc:1468:7 frame #6: 0x0000aaaac36bd5c8 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] decltype(std::declval<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&>()(std::declval<std::__1::shared_ptr<yb::rpc::InboundCall>>())) 
std::__1::__invoke[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__f=<unavailable>, __args=<unavailable>) at invoke.h:149:25 frame #7: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__args=<unavailable>, __args=<unavailable>) at invoke.h:224:5 frame yugabyte#8: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] std::__1::__function::__alloc_func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __arg=<unavailable>) at function.h:171:12 frame yugabyte#9: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void 
(std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=<unavailable>) at function.h:313:10 frame yugabyte#10: 0x0000aaaac36d1384 yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::__function::__value_func<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __args=nullptr) const at function.h:430:12 frame yugabyte#11: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::function<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=nullptr) const at function.h:989:10 frame yugabyte#12: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(this=<unavailable>, call=<unavailable>) at tserver_service.service.cc:913:3 frame yugabyte#13: 0x0000aaaac30e05b4 yb-tserver`yb::rpc::ServicePoolImpl::Handle(this=0x00005001bff9b8c0, incoming=nullptr) at service_pool.cc:275:19 frame yugabyte#14: 0x0000aaaac3006ed0 yb-tserver`yb::rpc::InboundCall::InboundCallTask::Run(this=<unavailable>) at inbound_call.cc:309:13 frame yugabyte#15: 0x0000aaaac30ec868 yb-tserver`yb::rpc::(anonymous namespace)::Worker::Execute(this=0x00005001bff5c640, task=0x00005001bfdf1958) at thread_pool.cc:138:13 frame yugabyte#16: 0x0000aaaac39afd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x00005001bfe1e750) const at function.h:430:12 frame yugabyte#17: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x00005001bfe1e750) const at function.h:989:10 frame yugabyte#18: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x00005001bfe1e6e0) at thread.cc:937:3 ``` This revision addresses the issue by returning after executing the rpc callback with validation failure status. 
Jira: DB-17124
Test Plan: Jenkins
Reviewers: rthallam, amitanand
Reviewed By: amitanand
Subscribers: ybase
Differential Revision: https://phorge.dev.yugabyte.com/D44663
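The "respond, then return" pattern this commit restores can be sketched as follows. This is a simplified illustration, not the `TabletServiceImpl` code: `LockManagerStub`, `AcquireLocks`, and the string status are hypothetical, and a null manager is assumed as the failing validation step because the trace above shows `this=0x0000000000000000`.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical lock manager; counts successful acquisitions.
struct LockManagerStub {
  int acquired = 0;
  void Acquire() { ++acquired; }
};

using StatusCallback = std::function<void(const std::string&)>;

// Validates its inputs, reports a failure through the callback, and returns
// right away so the code below never touches an invalid manager.
void AcquireLocks(LockManagerStub* manager, const StatusCallback& done) {
  if (manager == nullptr) {
    done("lock manager not available");
    return;  // Without this return, the dereference below would crash.
  }
  manager->Acquire();
  done("ok");
}
```

The early `return` also guarantees the callback runs exactly once per request in this sketch, which is the usual contract for RPC completion callbacks.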
jmeehan16
pushed a commit
that referenced
this pull request
Jun 12, 2025
…own flags are set at ObjectLockManager Summary: In context of object locking, commit 6e80c56 / D44228 got rid of logic that signaled obsolete waiters corresponding to transactions that issued a release all locks request (could have been terminated to failures like timeout, deadlock etc) in order to early terminate failed waiting requests. Hence, now we let the obsolete requests terminate organically from the OLM resumed by the poller thread that runs at an interval of `olm_poll_interval_ms` (defaults to 100ms). This led to one of the itests failing with the below stack ``` * thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV: address not mapped to object * frame #0: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(std::__1::function<void ()>) [inlined] yb::ThreadPoolToken::Submit(this=<unavailable>, r=<unavailable>) at threadpool.cc:146:10 frame #1: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(this=0x0000000000000000, f=<unavailable>) at threadpool.cc:142:10 frame #2: 0x0000aaaac73cdfe8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoSignal(this=0x00003342bfa0d400, entry=<unavailable>) at object_lock_manager.cc:767:3 frame #3: 0x0000aaaac73cc7c0 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>, yb::docdb::LockData&&, yb::StronglyTypedBool<yb::docdb::(anonymous namespace)::IsLockRetry_Tag>, unsigned long, yb::Status) [inlined] yb::docdb::ObjectLockManagerImpl::PrepareAcquire(this=0x00003342bfa0d400, txn_lock=<unavailable>, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, resume_it_offset=<unavailable>, resume_with_status=<unavailable>) at object_lock_manager.cc:523:5 frame #4: 0x0000aaaac73cc6a8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(this=0x00003342bfa0d400, 
transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, is_retry=(value_ = true), resume_it_offset=<unavailable>, resume_with_status=Status @ 0x0000ffefaa036658) at object_lock_manager.cc:552:27 frame #5: 0x0000aaaac73cbcb4 yb-tserver`yb::docdb::WaiterEntry::Resume(this=0x00003342b9a6a820, lock_manager=0x00003342bfa0d400, resume_with_status=<unavailable>) at object_lock_manager.cc:381:17 frame #6: 0x0000aaaac85bdd4c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() at object_lock_manager.cc:752:13 frame #7: 0x0000aaaac85bda74 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::docdb::ObjectLockManager::Shutdown(this=<unavailable>) at object_lock_manager.cc:1092:10 frame yugabyte#8: 0x0000aaaac85bda6c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::tserver::TSLocalLockManager::Impl::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:411:26 frame yugabyte#9: 0x0000aaaac85bd7e8 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:566:10 frame yugabyte#10: 0x0000aaaac8665a34 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ResetAndGetTSLocalLockManager(this=0x000033423fc1ad80) at tablet_server.cc:797:28 frame yugabyte#11: 0x0000aaaac8665a18 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ProcessLeaseUpdate(this=0x000033423fc1ad80, lease_refresh_info=0x000033423a476b80) at tablet_server.cc:828:22 frame yugabyte#12: 0x0000aaaac8665950 yb-tserver`yb::tserver::YsqlLeasePoller::Poll(this=<unavailable>) at ysql_lease_poller.cc:143:18 frame yugabyte#13: 0x0000aaaac8438d58 yb-tserver`yb::tserver::MasterLeaderPollScheduler::Impl::Run(this=0x000033423ff5cc80) at master_leader_poller.cc:125:25 frame yugabyte#14: 0x0000aaaac89ffd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] 
std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x000033423ffc7930) const at function.h:430:12 frame yugabyte#15: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000033423ffc7930) const at function.h:989:10 frame yugabyte#16: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x000033423ffc78c0) at thread.cc:937:3 frame yugabyte#17: 0x0000ffffac0378b8 libpthread.so.0`start_thread + 392 frame yugabyte#18: 0x0000ffffac093afc libc.so.6`thread_start + 12 ``` This is due to accessing unique_ptr `thread_pool_token_` after it has been reset. This revision fixes the issue by not scheduling any tasks on the threadpool once the shutdown flags has been set (hence not accessing `thread_pool_token_`). Since we wait for in-progress requests at the OLM and also in-progress resume tasks scheduled on the messenger using `waiters_amidst_resumption_on_messenger_`, it is safe to say that `thread_pool_token_` would not be accessed once it is reset. Jira: DB-17121 Test Plan: Jenkins ./yb_build.sh --cxx-test='TEST_F(PgObjectLocksTestRF1, TestShutdownWithWaiters) {' Reviewers: rthallam, amitanand, sergei Reviewed By: amitanand Subscribers: ybase, yql Differential Revision: https://phorge.dev.yugabyte.com/D44662
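A minimal sketch of the guard described in this commit, assuming a single mutex protects both the shutdown flag and the token. `FakePoolToken` and `LockManagerSketch` are hypothetical names; the real `ObjectLockManagerImpl` additionally waits for in-progress requests and resume tasks, which is not modeled here.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical thread pool token that only queues tasks, never runs them.
class FakePoolToken {
 public:
  void Submit(std::function<void()> task) { queued_.push_back(std::move(task)); }

 private:
  std::vector<std::function<void()>> queued_;
};

class LockManagerSketch {
 public:
  LockManagerSketch() : token_(std::make_unique<FakePoolToken>()) {}

  // Returns false when the task is dropped because shutdown has started, so
  // the token is never dereferenced after it has been reset.
  bool Signal(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (shutdown_started_) {
      return false;
    }
    token_->Submit(std::move(task));
    return true;
  }

  // Sets the flag before releasing the token, under the same lock as Signal.
  void Shutdown() {
    std::lock_guard<std::mutex> lock(mutex_);
    shutdown_started_ = true;
    token_.reset();
  }

 private:
  std::mutex mutex_;
  bool shutdown_started_ = false;
  std::unique_ptr<FakePoolToken> token_;
};
```

Checking the flag and using the token under one lock is a design choice of this sketch; it keeps the "flag set, then token reset" ordering visible to every caller of `Signal`.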
braddietrich
pushed a commit
that referenced
this pull request
Jul 7, 2025
…ow during index backfill. Summary: In the last few weeks we have seen few instances of the stress test (with various nemesis) run into a master crash caused by a stack trace that looks like: ``` * thread #1, name = 'yb-master', stop reason = signal SIGSEGV: invalid address * frame #0: 0x0000aaaad52f5fc4 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::shared_ptr<yb::master::BackfillTablet>::shared_ptr[abi:ue170006]<yb::master::BackfillTablet, void>(this=<unavailable>, __r=std::__1:: weak_ptr<yb::master::BackfillTablet>::element_type @ 0x000013e4bf787778) at shared_ptr.h:701:20 frame #1: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::enable_shared_from_this<yb::master::BackfillTablet>::shared_from_this[abi:ue170006](this=0x000013e4bf787778) at shared_ptr.h:1954:17 frame #2: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=0x000013e4bf787778) at backfill_index.cc:1300:50 frame #3: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323: 10 frame #4: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4d458) at backfill_index.cc:1620:5 frame #5: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:470:3 frame #6: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:273:5 frame #7: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4d458) at backfill_index.cc:1463:19 frame yugabyte#8: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at 
backfill_index.cc:1303:19 frame yugabyte#9: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323: 10 frame yugabyte#10: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cd98) at backfill_index.cc:1620:5 frame yugabyte#11: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:470:3 frame yugabyte#12: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:273:5 frame yugabyte#13: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cd98) at backfill_index.cc:1463:19 frame yugabyte#14: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19 frame yugabyte#15: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc: 1323:10 frame yugabyte#16: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cfd8) at backfill_index.cc:1620:5 frame yugabyte#17: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:470:3 frame yugabyte#18: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:273:5 frame yugabyte#19: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cfd8) at backfill_index.cc:1463:19 frame yugabyte#20: 0x0000aaaad52f6324 
yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19 frame yugabyte#21: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc: 1323:10 ... frame yugabyte#2452: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bdc7ed98) at backfill_index.cc:1620:5 frame yugabyte#2453: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:470:3 frame yugabyte#2454: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:273:5 frame yugabyte#2455: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bdc7ed98) at backfill_index.cc:1463:19 frame yugabyte#2456: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19 frame yugabyte#2457: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc: 1323:10 frame yugabyte#2458: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4ba1ff458) at backfill_index.cc:1620:5 frame yugabyte#2459: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:470:3 frame yugabyte#2460: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:273:5 frame yugabyte#2461: 0x0000aaaad52c0260 yb-master`yb::master::RetryingRpcTask::RunDelayedTask(this=0x000013e4ba1ff458, 
status=0x0000ffffab2668c0) at async_rpc_tasks.cc:432:14 frame yugabyte#2462: 0x0000aaaad5c3f838 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] boost::function1<void, yb::Status const&>::operator()(this=0x000013e4bff63b18, a0=0x0000ffffab2668c0) const at function_template.hpp:763:14 frame yugabyte#2463: 0x0000aaaad5c3f81c yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] yb::rpc::DelayedTask:: TimerHandler(this=0x000013e4bff63ae8, watcher=<unavailable>, revents=<unavailable>) at delayed_task.cc:155:5 frame yugabyte#2464: 0x0000aaaad5c3f284 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(loop=<unavailable>, w=<unavailable>, revents=<unavailable>) at ev++.h:479:7 frame yugabyte#2465: 0x0000aaaad4cdf170 yb-master`ev_invoke_pending + 112 frame yugabyte#2466: 0x0000aaaad4ce21fc yb-master`ev_run + 2940 frame yugabyte#2467: 0x0000aaaad5c725fc yb-master`yb::rpc::Reactor::RunThread() [inlined] ev::loop_ref::run(this=0x000013e4bfcfadf8, flags=0) at ev++.h:211:7 frame yugabyte#2468: 0x0000aaaad5c725f4 yb-master`yb::rpc::Reactor::RunThread(this=0x000013e4bfcfadc0) at reactor.cc:735:9 frame yugabyte#2469: 0x0000aaaad65c61d8 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ue170006](this=0x000013e4bfeffa80) const at function.h:517:16 frame yugabyte#2470: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000013e4bfeffa80) const at function.h:1168:12 frame yugabyte#2471: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(arg=0x000013e4bfeffa20) at thread.cc:895:3 ``` Essentially, a BackfillChunk is considered done 
(without sending out an RPC) and launches the next BackfillChunk, which does the same. This may happen if `BackfillTable::indexes_to_build()` is empty, or if `backfill_jobs()` is empty. However, based on a reading of the code, we should only get there **after** marking `BackfillTable::done_` as `true`. If for some reason `indexes_to_build()` is empty while `BackfillTable::done_ == false`, we could get into this infinite recursion. Since I am unable to explain or recreate how this happens, I'm adding a test flag `TEST_simulate_empty_indexes` to reproduce it.
Fix: We update `BackfillChunk::SendRequest` to treat an empty `indexes_to_build()` as a failure rather than a success. This prevents the infinite recursion. Also adding a few log lines that may help us better understand the scenario if we run into this again.
Jira: DB-17296
Test Plan: yb_build.sh fastdebug --cxx-test pg_index_backfill-test --gtest_filter *.SimulateEmptyIndexesForStackOverflow*
Reviewers: zdrudi, rthallam, jason
Reviewed By: zdrudi
Subscribers: ybase, yql
Differential Revision: https://phorge.dev.yugabyte.com/D45031
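The control flow of the fix can be sketched in a few lines. This is a deliberately simplified model, not the master's backfill code: `BackfillSketch` and its members are hypothetical, chunks complete synchronously, and a chunk counter stands in for real backfill progress.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a tablet backfill that launches chunks one by one.
class BackfillSketch {
 public:
  BackfillSketch(std::vector<std::string> indexes, int remaining_chunks)
      : indexes_(std::move(indexes)), remaining_chunks_(remaining_chunks) {}

  void LaunchNextChunkOrDone() {
    if (done_ || failed_) return;
    if (remaining_chunks_ == 0) {
      done_ = true;
      return;
    }
    ++chunks_launched_;
    Done(SendRequest());
  }

  bool done() const { return done_; }
  bool failed() const { return failed_; }
  int chunks_launched() const { return chunks_launched_; }

 private:
  // With nothing to build, report failure. Reporting success here would make
  // no progress and let Done() launch another chunk indefinitely.
  bool SendRequest() {
    if (indexes_.empty()) return false;
    --remaining_chunks_;  // Stand-in for one chunk of real progress.
    return true;
  }

  void Done(bool ok) {
    if (!ok) {
      failed_ = true;  // Stops the chain instead of launching the next chunk.
      return;
    }
    LaunchNextChunkOrDone();
  }

  std::vector<std::string> indexes_;
  int remaining_chunks_;
  int chunks_launched_ = 0;
  bool done_ = false;
  bool failed_ = false;
};
```

The sketch is still recursive, but every successful step decrements `remaining_chunks_` and every failed step sets `failed_`, so the recursion depth is bounded.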
braddietrich
pushed a commit
that referenced
this pull request
Jul 7, 2025
…trigger at txn end
Summary:
FK constraint checking uses the following optimization to batch reads:
- an FK constraint registers the ybctids that need to be checked
- when a particular ybctid is checked for a particular FK constraint, all registered ybctids are read at once and the result is cached
- checking a ybctid for another FK constraint consults the cached result instead of making a real read request.
In Postgres, a constraint can be one of 2 types:
- IMMEDIATE (default). Checked at statement end.
- DEFERRED. Checked at transaction end.
Currently the FK optimization works incorrectly when constraints of different types are used in the same transaction. The reason is that YSQL registers ybctids for both types of constraint in a single map.
Example:
```
1. CREATE TABLE pk_t(k INT PRIMARY KEY);
2. CREATE TABLE fk_t(k INT PRIMARY KEY,
pk_1 INT REFERENCES pk_t(k),
pk_2 INT REFERENCES pk_t(k) DEFERRABLE INITIALLY DEFERRED);
3. INSERT INTO pk_t VALUES (1);
4. BEGIN;
5. INSERT INTO fk_t VALUES(1, 1, 2);
6. INSERT INTO pk_t VALUES(2);
7. COMMIT;
```
- On step #5 YSQL inserts the value `(1, 1, 2)` into a table with 2 FK columns, where the constraint on the second column is `DEFERRED`.
- The two constraints register ybctids for rows `k = 1` and `k = 2` in table `pk_t`.
- Because the constraint on the first column is not deferred, it is checked immediately (at the end of the statement).
- Due to the optimization, both registered ybctids are read at once and the result is cached. The result contains only `k = 1`, because `k = 2` is not inserted until step #6.
- On step #7 YSQL checks the constraint on the second column using the cached result, which does not contain the `k = 2` row inserted on step #6.
The solution is to store ybctids for deferred and non-deferred constraints in different structures and to read them independently. All ybctids registered for deferred constraints are read only at the transaction commit step (step #7).
For this purpose, the new `YBCNotifyDeferredTriggersProcessingStarted()` function is introduced. It is called right before the deferred triggers fire, at the beginning of `COMMIT` command processing.
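The stale-cache scenario from the example can be replayed with a small model. This is a sketch under stated assumptions, not YSQL's implementation: `FkCheckModel`, its `pending` buckets, its `cache` and the `separate_deferred` switch are hypothetical, and plain integers stand in for ybctids.

```python
class FkCheckModel:
    """Toy model of batched FK checks. All names are illustrative."""

    def __init__(self, separate_deferred):
        self.separate_deferred = separate_deferred
        self.pk_rows = set()                        # rows present in pk_t
        self.pending = {False: set(), True: set()}  # keyed by "is deferred"
        self.cache = {}                             # key -> row existed when read

    def _bucket(self, deferred):
        # Old behaviour: one shared map for both constraint types.
        return deferred if self.separate_deferred else False

    def register(self, key, deferred):
        self.pending[self._bucket(deferred)].add(key)

    def check(self, key, deferred):
        if key not in self.cache:
            bucket = self.pending[self._bucket(deferred)]
            # Read every registered key in this bucket at once and cache it.
            for k in bucket | {key}:
                self.cache[k] = k in self.pk_rows
            bucket.clear()
        return self.cache[key]


def run_example(separate_deferred):
    model = FkCheckModel(separate_deferred)
    model.pk_rows.add(1)                                 # step 3
    model.register(1, deferred=False)                    # step 5, column pk_1
    model.register(2, deferred=True)                     # step 5, column pk_2
    immediate_ok = model.check(1, deferred=False)        # end of statement
    model.pk_rows.add(2)                                 # step 6
    deferred_ok = model.check(2, deferred=True)          # step 7, COMMIT
    return immediate_ok, deferred_ok
```

With a shared map, `run_example(False)` caches "missing" for `k = 2` at the end of step 5 and the deferred check fails at commit. With separate buckets, `run_example(True)` reads `k = 2` only at commit and both checks pass.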
Jira: DB-14665
Original commit: af3f948 / D40896
Test Plan:
Jenkins
A new unit test is introduced:
```
./yb_build.sh --gtest_filter PgFKeyTest.DeferredConstraintReadAtTxnEnd
```
Reviewers: pjain, myang, kramanathan, patnaik.balivada
Reviewed By: myang
Subscribers: yql
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D42601
braddietrich
pushed a commit
that referenced
this pull request
Jul 7, 2025
…mp by using pg_strdup for tablegroup_name Summary: #### Backport Summary Fixed trivial merge conflicts due to the usage of `YbTableProperties` instead of `YbcTableProperties` on the master branch. #### Original Summary As part of D36859 / 0dbe7d6, backup and restore support for colocated tables when multiple tablespaces exist was introduced. Upon fetching the tablegroup_name from `pg_yb_tablegroup`, the value was read and assigned via `PQgetvalue` without copying. This led to a use-after-free bug when the tablegroup_name was later read in dumpTableSchema since the result from the SQL query is immediately cleared in the next line (`PQclear`). ``` [P-yb-controller-1] ==3037==ERROR: AddressSanitizer: heap-use-after-free on address 0x51d0002013e6 at pc 0x55615b0a1f92 bp 0x7fff92475970 sp 0x7fff92475118 [P-yb-controller-1] READ of size 8 at 0x51d0002013e6 thread T0 [P-yb-controller-1] #0 0x55615b0a1f91 in strcmp ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:470:5 [P-yb-controller-1] #1 0x55615b1b90ba in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15789:8 [P-yb-controller-1] #2 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] #3 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] #4 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] #5 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) [P-yb-controller-1] #6 0x55615b0894bd in _start (${BUILD_ROOT}/postgres/bin/ysql_dump+0x10d4bd) [P-yb-controller-1] [P-yb-controller-1] 0x51d0002013e6 is located 358 bytes inside of 2048-byte region [0x51d000201280,0x51d000201a80) [P-yb-controller-1] freed by thread T0 here: [P-yb-controller-1] #0 0x55615b127196 in free 
${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3 [P-yb-controller-1] #1 0x7f3c02d65e85 in PQclear ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:755:3 [P-yb-controller-1] #2 0x55615b1c0103 in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19108:4 [P-yb-controller-1] #3 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3 [P-yb-controller-1] #4 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] #5 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] #6 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] #7 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) [P-yb-controller-1] [P-yb-controller-1] previously allocated by thread T0 here: [P-yb-controller-1] #0 0x55615b12742f in malloc ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3 [P-yb-controller-1] #1 0x7f3c02d680a7 in pqResultAlloc ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:633:28 [P-yb-controller-1] #2 0x7f3c02d81294 in getRowDescriptions ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:544:4 [P-yb-controller-1] #3 0x7f3c02d7f793 in pqParseInput3 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11 [P-yb-controller-1] #4 0x7f3c02d6bcc8 in parseInput ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2014:2 [P-yb-controller-1] #5 0x7f3c02d6bcc8 in PQgetResult ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2100:3 [P-yb-controller-1] #6 0x7f3c02d6cd87 in PQexecFinish ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2417:19 [P-yb-controller-1] #7 0x7f3c02d6cd87 in PQexec 
${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2256:9 [P-yb-controller-1] yugabyte#8 0x55615b1f45df in ExecuteSqlQuery ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:296:8 [P-yb-controller-1] yugabyte#9 0x55615b1f4213 in ExecuteSqlQueryForSingleRow ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:311:8 [P-yb-controller-1] yugabyte#10 0x55615b1c008d in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19102:10 [P-yb-controller-1] yugabyte#11 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3 [P-yb-controller-1] yugabyte#12 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] yugabyte#13 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] yugabyte#14 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] yugabyte#15 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) ``` This revision fixes the issue by using pg_strdup to make a copy of the string. Jira: DB-15915 Original commit: 7eea1de / D43386 Test Plan: ./yb_build.sh asan --cxx-test integration-tests_xcluster_ddl_replication-test --gtest_filter XClusterDDLReplicationTest.DDLReplicationTablesNotColocated Reviewers: aagrawal, skumar, mlillibridge, sergei Reviewed By: aagrawal Subscribers: yql, sergei Differential Revision: https://phorge.dev.yugabyte.com/D43418
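The "copy before clearing the result" pattern behind this fix can be modelled in a few lines. The Python below is only an analogy: `QueryResult`, `read_tablegroup_name` and the value `tg1` are hypothetical, and scrubbing a buffer stands in for freeing memory (in C the stale read is undefined behaviour, not a predictable run of zero bytes).

```python
class QueryResult:
    """Stand-in for a query result that owns the storage of its values."""

    def __init__(self, value: bytes):
        self._buf = bytearray(value)

    def getvalue(self) -> memoryview:
        # Borrowed view into storage owned by the result.
        return memoryview(self._buf)

    def clear(self) -> None:
        # Model releasing the result by scrubbing its storage in place.
        for i in range(len(self._buf)):
            self._buf[i] = 0


def read_tablegroup_name(copy_before_clear: bool) -> bytes:
    res = QueryResult(b"tg1")          # hypothetical tablegroup name
    borrowed = res.getvalue()
    # The fix: take an owned copy while the result is still alive.
    name = bytes(borrowed) if copy_before_clear else borrowed
    res.clear()
    return bytes(name)
```

Calling `read_tablegroup_name(False)` returns scrubbed bytes because the borrowed view outlived its owner, while `read_tablegroup_name(True)` still returns the original name.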
jmeehan16
pushed a commit
that referenced
this pull request
Oct 21, 2025
…s closed in multi route pooling
Summary:
**Issue Summary**
A core dump was triggered during a ConnectionBurst stress test, with the crash occurring in the od_backend_close_connection function with multi route pooling. The stack trace is as follows:
frame #0: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] mm_tls_free(io=0x0000000000000000) at tls.c:91:10
frame #1: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] machine_io_free(obj=0x0000000000000000) at io.c:201:2
frame #2: 0x00005601a627129e odyssey`od_backend_close_connection [inlined] od_io_close(io=0x000031f53e72b8b8) at io.h:77:2
frame #3: 0x00005601a627128c odyssey`od_backend_close_connection(server=0x000031f53e72b880) at backend.c:56:2
frame #4: 0x00005601a6250de5 odyssey`od_router_attach(router=0x00007fff00dbeb30, client_for_router=0x000031f53e5df180, wait_for_idle=<unavailable>, external_client=0x000031f53ee30680) at router.c:1010:6
frame #5: 0x00005601a6258b1b odyssey`od_auth_frontend [inlined] yb_execute_on_control_connection(client=0x000031f53ee30680, function=<unavailable>) at frontend.c:2842:11
frame #6: 0x00005601a6258b0b odyssey`od_auth_frontend(client=0x000031f53ee30680) at auth.c:677:8
frame #7: 0x00005601a626782e odyssey`od_frontend(arg=0x000031f53ee30680) at frontend.c:2539:8
frame #8: 0x00005601a6290912 odyssey`mm_scheduler_main(arg=0x000031f53e390000) at scheduler.c:17:2
frame #9: 0x00005601a6290b77 odyssey`mm_context_runner at context.c:28:2
**Root Cause**
The crash originated from an improper lock release in the yb_get_idle_server_to_close function, introduced in commit 55beeb0 during multi-route pooling implementation. The function released the lock on the route object, despite a comment explicitly warning against it. After returning to its caller, no lock was held on the route or idle_route. This allowed other coroutines to access and use the same route and its idle server, which the original coroutine intended to close. This race condition led to a crash due to an assertion failure during connection closure.
**Note**
If the order of acquiring locks is the same across all threads or processes, differences in the release order alone cannot cause a deadlock. Deadlocks arise from circular dependencies during acquisition, not release.
In the connection manager code base:
Locks are acquired in the order: router → route. This order must be strictly enforced everywhere to prevent deadlocks.
Lock release order varies (e.g., router then route in od_router_route and yb_get_idle_server_to_close, versus the reverse elsewhere). This variation does not cause deadlocks, as release order is irrelevant to deadlock prevention.
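The note above can be checked with a small experiment. This is a minimal sketch with hypothetical names (`router_lock`, `route_lock`, `worker`, `run`): two threads always acquire router then route, but release in opposite orders, and both finish.

```python
import threading

router_lock = threading.Lock()
route_lock = threading.Lock()


def worker(release_router_first: bool, iterations: int, counter: list) -> None:
    for _ in range(iterations):
        # Acquisition order is always router -> route.
        router_lock.acquire()
        route_lock.acquire()
        counter[0] += 1  # both locks are held here
        # Release order differs between callers.
        if release_router_first:
            router_lock.release()
            route_lock.release()
        else:
            route_lock.release()
            router_lock.release()


def run(iterations: int = 2000) -> int:
    counter = [0]
    threads = [
        threading.Thread(target=worker, args=(flag, iterations, counter), daemon=True)
        for flag in (True, False)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=30)
    if any(t.is_alive() for t in threads):
        raise RuntimeError("threads did not finish; possible deadlock")
    return counter[0]
```

A thread that has released the router lock but still holds the route lock never needs another lock to make progress, so no wait cycle can form. Swapping the acquisition order in one of the workers is what would introduce the risk.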
Jira: DB-17501
Test Plan: Jenkins: all tests
Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena
Reviewed By: skumar
Subscribers: svc_phabricator, yql
Differential Revision: https://phorge.dev.yugabyte.com/D45641
jmeehan16
pushed a commit
that referenced
this pull request
Oct 21, 2025
Summary: On running the connection burst test following core was generated (lldb) target create "/home/yugabyte/yb-software/yugabyte-2024.2.3.0-b116-centos-x86_64/bin/odyssey" --core "/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey" Core file '/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey' (x86_64) was loaded. (lldb) bt all error: odyssey GetDIE for DIE 0x3c is outside of its CU 0x66d45 * thread #1, name = 'odyssey', stop reason = signal SIGSEGV * frame #0: 0x0000564340e2cc6f odyssey`od_backend_connect(server=0x00005138fc5ef6c0, context="", route_params=0x0000000000000000, client=0x00005138ff7a2580) at backend.c:815:19 frame #1: 0x0000564340e2a80e odyssey`od_frontend_attach(client=0x00005138ff7a2580, context="", route_params=0x0000000000000000) at frontend.c:305:8 frame #2: 0x0000564340e26b11 odyssey`od_frontend_remote [inlined] od_frontend_attach_and_deploy(client=0x00005138ff7a2580, context=<unavailable>) at frontend.c:361:11 frame #3: 0x0000564340e26afe odyssey`od_frontend_remote(client=0x00005138ff7a2580) at frontend.c:2120:13 frame #4: 0x0000564340e22d65 odyssey`od_frontend(arg=0x00005138ff7a2580) at frontend.c:2756:12 frame #5: 0x0000564340e4b912 odyssey`mm_scheduler_main(arg=0x00005138fc218dc0) at scheduler.c:17:2 frame #6: 0x0000564340e4bb77 odyssey`mm_context_runner at context.c:28:2 Which points to storage = route->rule->storage; meaning rule has already been set to NULL which lead to above crash. The root cause is a race condition in the object cleanup. The rule associated with a route was being de-referenced (unref) outside of a lock protecting the route object while cleaning up the route. 
This allows for a scenario where one thread proceeds to clean up the rule while another thread simultaneously acquires a lock on the same route and attempts to use its rule pointer, which is now a dangling pointer. This diff moves the unref of the rule object into a code block where a lock is already held on the route object. This change ensures atomic handling of the route and its associated rule, preventing any concurrent access to an invalid pointer.
Jira: DB-17729
Test Plan: Jenkins: all tests
Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena
Reviewed By: skumar
Subscribers: svc_phabricator, yql
Differential Revision: https://phorge.dev.yugabyte.com/D45583
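The idea of releasing the rule only while the route lock is held can be sketched as follows. This is a toy model, not the connection manager's code: `Route`, `cleanup_route` and `read_storage` are hypothetical, a dict with a `refs` counter stands in for the rule object, and the reader-side `None` check is an assumption added for the illustration.

```python
import threading


class Route:
    def __init__(self, rule):
        self.lock = threading.Lock()
        self.rule = rule  # dict standing in for the rule object


def cleanup_route(route: Route) -> None:
    # Detach and unref the rule inside the same critical section that
    # readers use, so no reader can observe a half-torn-down route.
    with route.lock:
        rule = route.rule
        route.rule = None
        if rule is not None:
            rule["refs"] -= 1


def read_storage(route: Route):
    with route.lock:
        if route.rule is None:  # assumed reader-side check
            return None
        return route.rule["storage"]
```

Because both functions take `route.lock`, a reader either sees the rule fully attached or sees that it is gone, never a pointer to a rule that is being released.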
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…mp by using pg_strdup for tablegroup_name Summary: As part of D36859 / 0dbe7d6, backup and restore support for colocated tables when multiple tablespaces exist was introduced. Upon fetching the tablegroup_name from `pg_yb_tablegroup`, the value was read and assigned via `PQgetvalue` without copying. This led to a use-after-free bug when the tablegroup_name was later read in dumpTableSchema since the result from the SQL query is immediately cleared in the next line (`PQclear`). ``` [P-yb-controller-1] ==3037==ERROR: AddressSanitizer: heap-use-after-free on address 0x51d0002013e6 at pc 0x55615b0a1f92 bp 0x7fff92475970 sp 0x7fff92475118 [P-yb-controller-1] READ of size 8 at 0x51d0002013e6 thread T0 [P-yb-controller-1] #0 0x55615b0a1f91 in strcmp ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:470:5 [P-yb-controller-1] #1 0x55615b1b90ba in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15789:8 [P-yb-controller-1] #2 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] #3 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] #4 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] #5 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) [P-yb-controller-1] #6 0x55615b0894bd in _start (${BUILD_ROOT}/postgres/bin/ysql_dump+0x10d4bd) [P-yb-controller-1] [P-yb-controller-1] 0x51d0002013e6 is located 358 bytes inside of 2048-byte region [0x51d000201280,0x51d000201a80) [P-yb-controller-1] freed by thread T0 here: [P-yb-controller-1] #0 0x55615b127196 in free ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3 [P-yb-controller-1] #1 0x7f3c02d65e85 in PQclear 
${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:755:3 [P-yb-controller-1] #2 0x55615b1c0103 in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19108:4 [P-yb-controller-1] #3 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3 [P-yb-controller-1] #4 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] #5 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] #6 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] #7 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) [P-yb-controller-1] [P-yb-controller-1] previously allocated by thread T0 here: [P-yb-controller-1] #0 0x55615b12742f in malloc ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3 [P-yb-controller-1] #1 0x7f3c02d680a7 in pqResultAlloc ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:633:28 [P-yb-controller-1] #2 0x7f3c02d81294 in getRowDescriptions ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:544:4 [P-yb-controller-1] #3 0x7f3c02d7f793 in pqParseInput3 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11 [P-yb-controller-1] #4 0x7f3c02d6bcc8 in parseInput ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2014:2 [P-yb-controller-1] #5 0x7f3c02d6bcc8 in PQgetResult ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2100:3 [P-yb-controller-1] #6 0x7f3c02d6cd87 in PQexecFinish ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2417:19 [P-yb-controller-1] #7 0x7f3c02d6cd87 in PQexec ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2256:9 [P-yb-controller-1] yugabyte#8 0x55615b1f45df in ExecuteSqlQuery ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:296:8 
[P-yb-controller-1] yugabyte#9 0x55615b1f4213 in ExecuteSqlQueryForSingleRow ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:311:8 [P-yb-controller-1] yugabyte#10 0x55615b1c008d in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19102:10 [P-yb-controller-1] yugabyte#11 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3 [P-yb-controller-1] yugabyte#12 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] yugabyte#13 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] yugabyte#14 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] yugabyte#15 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) ``` This revision fixes the issue by using pg_strdup to make a copy of the string. Jira: DB-15915 Original commit: 7eea1de / D43386 Test Plan: ./yb_build.sh asan --cxx-test integration-tests_xcluster_ddl_replication-test --gtest_filter XClusterDDLReplicationTest.DDLReplicationTablesNotColocated Reviewers: aagrawal, skumar, mlillibridge, sergei Reviewed By: aagrawal Subscribers: yql, sergei Differential Revision: https://phorge.dev.yugabyte.com/D43421
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…acks in object lock/release functions at TabletService Summary: Original commit: 790195b / D44663 In functions `TabletServiceImpl::AcquireObjectLocks` and `TabletServiceImpl::ReleaseObjectLocks`, we weren't returning after executing the rpc callback with initial validation steps fail. This led to segv issues like below ``` * thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV * frame #0: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined] std::__1::unique_ptr<yb::tserver::TSLocalLockManager::Impl, std::__1::default_delete<yb::tserver::TSLocalLockManager::Impl>>::operator->[abi:ne190100](this=0x0000000000000000) const at unique_ptr.h:272:108 frame #1: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined] yb::tserver::TSLocalLockManager::AcquireObjectLocksAsync(this=0x0000000000000000, req=0x00005001bfffa290, deadline=yb::CoarseTimePoint @ x23, callback=0x0000ffefb6066560, wait=(value_ = true)) at ts_local_lock_manager.cc:541:3 frame #2: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(this=0x00005001bdaf6020, req=0x00005001bfffa290, resp=0x00005001bfffa300, context=<unavailable>) at tablet_service.cc:3673:26 frame #3: 0x0000aaaac36bd9a0 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::tserver::TabletServerServiceIf::InitMethods(this=<unavailable>, req=0x00005001bfffa290, resp=0x00005001bfffa300, 
rpc_context=RpcContext @ 0x0000ffefb6066600)::$_36::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext)::operator()(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) const at tserver_service.service.cc:1470:9 frame #4: 0x0000aaaac36bd978 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) at local_call.h:126:7 frame #5: 0x0000aaaac36bd680 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36::operator()(this=<unavailable>, call=<unavailable>) const at tserver_service.service.cc:1468:7 frame #6: 0x0000aaaac36bd5c8 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] decltype(std::declval<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&>()(std::declval<std::__1::shared_ptr<yb::rpc::InboundCall>>())) 
std::__1::__invoke[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__f=<unavailable>, __args=<unavailable>) at invoke.h:149:25 frame #7: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__args=<unavailable>, __args=<unavailable>) at invoke.h:224:5 frame yugabyte#8: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] std::__1::__function::__alloc_func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __arg=<unavailable>) at function.h:171:12 frame yugabyte#9: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void 
(std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=<unavailable>) at function.h:313:10 frame yugabyte#10: 0x0000aaaac36d1384 yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::__function::__value_func<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __args=nullptr) const at function.h:430:12 frame yugabyte#11: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::function<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=nullptr) const at function.h:989:10 frame yugabyte#12: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(this=<unavailable>, call=<unavailable>) at tserver_service.service.cc:913:3 frame yugabyte#13: 0x0000aaaac30e05b4 yb-tserver`yb::rpc::ServicePoolImpl::Handle(this=0x00005001bff9b8c0, incoming=nullptr) at service_pool.cc:275:19 frame yugabyte#14: 0x0000aaaac3006ed0 yb-tserver`yb::rpc::InboundCall::InboundCallTask::Run(this=<unavailable>) at inbound_call.cc:309:13 frame yugabyte#15: 0x0000aaaac30ec868 yb-tserver`yb::rpc::(anonymous namespace)::Worker::Execute(this=0x00005001bff5c640, task=0x00005001bfdf1958) at thread_pool.cc:138:13 frame yugabyte#16: 0x0000aaaac39afd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x00005001bfe1e750) const at function.h:430:12 frame yugabyte#17: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x00005001bfe1e750) const at function.h:989:10 frame yugabyte#18: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x00005001bfe1e6e0) at thread.cc:937:3 ``` This revision addresses the issue by returning after executing the rpc callback with validation failure status. 
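The missing early return can be illustrated with a toy handler. The Python below is a sketch with hypothetical names (`LockManager`, `acquire_object_locks`, `run_handler`, the `return_after_failure` switch), and an `AttributeError` on `None` stands in for the null dereference seen in the trace.

```python
class LockManager:
    def __init__(self):
        self.held = []

    def acquire(self, request):
        self.held.append(request)


def acquire_object_locks(lock_manager, request, respond, return_after_failure=True):
    """Toy RPC handler; `respond` plays the role of the RPC callback."""
    if lock_manager is None:
        respond("error: lock manager unavailable")
        if return_after_failure:
            return  # the fix: stop once the failure has been reported
        # Buggy path: fall through and use the missing manager anyway.
    lock_manager.acquire(request)
    respond("ok")


def run_handler(lock_manager, return_after_failure):
    replies = []
    crashed = False
    try:
        acquire_object_locks(lock_manager, "req", replies.append, return_after_failure)
    except AttributeError:
        crashed = True
    return replies, crashed
```

Without the return, the handler has already answered the caller with an error and then crashes on the missing manager. With the return, the error reply is the only effect.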
Jira: DB-17124
Test Plan: Jenkins
Reviewers: rthallam, amitanand, #db-approvers
Reviewed By: rthallam, #db-approvers
Subscribers: svc_phabricator, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D44684
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…adpool once shutdown flags are set at ObjectLockManager Summary: Original commit: f5197a2 / D44662 In context of object locking, commit 6e80c56 / D44228 got rid of logic that signaled obsolete waiters corresponding to transactions that issued a release all locks request (could have been terminated to failures like timeout, deadlock etc) in order to early terminate failed waiting requests. Hence, now we let the obsolete requests terminate organically from the OLM resumed by the poller thread that runs at an interval of `olm_poll_interval_ms` (defaults to 100ms). This led to one of the itests failing with the below stack ``` * thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV: address not mapped to object * frame #0: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(std::__1::function<void ()>) [inlined] yb::ThreadPoolToken::Submit(this=<unavailable>, r=<unavailable>) at threadpool.cc:146:10 frame #1: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(this=0x0000000000000000, f=<unavailable>) at threadpool.cc:142:10 frame #2: 0x0000aaaac73cdfe8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoSignal(this=0x00003342bfa0d400, entry=<unavailable>) at object_lock_manager.cc:767:3 frame #3: 0x0000aaaac73cc7c0 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>, yb::docdb::LockData&&, yb::StronglyTypedBool<yb::docdb::(anonymous namespace)::IsLockRetry_Tag>, unsigned long, yb::Status) [inlined] yb::docdb::ObjectLockManagerImpl::PrepareAcquire(this=0x00003342bfa0d400, txn_lock=<unavailable>, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, resume_it_offset=<unavailable>, resume_with_status=<unavailable>) at object_lock_manager.cc:523:5 frame #4: 0x0000aaaac73cc6a8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(this=0x00003342bfa0d400, 
transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, is_retry=(value_ = true), resume_it_offset=<unavailable>, resume_with_status=Status @ 0x0000ffefaa036658) at object_lock_manager.cc:552:27 frame #5: 0x0000aaaac73cbcb4 yb-tserver`yb::docdb::WaiterEntry::Resume(this=0x00003342b9a6a820, lock_manager=0x00003342bfa0d400, resume_with_status=<unavailable>) at object_lock_manager.cc:381:17 frame #6: 0x0000aaaac85bdd4c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() at object_lock_manager.cc:752:13 frame #7: 0x0000aaaac85bda74 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::docdb::ObjectLockManager::Shutdown(this=<unavailable>) at object_lock_manager.cc:1092:10 frame yugabyte#8: 0x0000aaaac85bda6c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::tserver::TSLocalLockManager::Impl::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:411:26 frame yugabyte#9: 0x0000aaaac85bd7e8 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:566:10 frame yugabyte#10: 0x0000aaaac8665a34 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ResetAndGetTSLocalLockManager(this=0x000033423fc1ad80) at tablet_server.cc:797:28 frame yugabyte#11: 0x0000aaaac8665a18 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ProcessLeaseUpdate(this=0x000033423fc1ad80, lease_refresh_info=0x000033423a476b80) at tablet_server.cc:828:22 frame yugabyte#12: 0x0000aaaac8665950 yb-tserver`yb::tserver::YsqlLeasePoller::Poll(this=<unavailable>) at ysql_lease_poller.cc:143:18 frame yugabyte#13: 0x0000aaaac8438d58 yb-tserver`yb::tserver::MasterLeaderPollScheduler::Impl::Run(this=0x000033423ff5cc80) at master_leader_poller.cc:125:25 frame yugabyte#14: 0x0000aaaac89ffd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] 
std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x000033423ffc7930) const at function.h:430:12 frame yugabyte#15: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000033423ffc7930) const at function.h:989:10 frame yugabyte#16: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x000033423ffc78c0) at thread.cc:937:3 frame yugabyte#17: 0x0000ffffac0378b8 libpthread.so.0`start_thread + 392 frame yugabyte#18: 0x0000ffffac093afc libc.so.6`thread_start + 12 ``` This is due to accessing unique_ptr `thread_pool_token_` after it has been reset. This revision fixes the issue by not scheduling any tasks on the threadpool once the shutdown flags has been set (hence not accessing `thread_pool_token_`). Since we wait for in-progress requests at the OLM and also in-progress resume tasks scheduled on the messenger using `waiters_amidst_resumption_on_messenger_`, it is safe to say that `thread_pool_token_` would not be accessed once it is reset. Jira: DB-17121 Test Plan: Jenkins ./yb_build.sh --cxx-test='TEST_F(PgObjectLocksTestRF1, TestShutdownWithWaiters) {' Reviewers: rthallam, amitanand, sergei Reviewed By: rthallam Subscribers: yql, ybase Differential Revision: https://phorge.dev.yugabyte.com/D44728
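The guard described above, which stops scheduling once shutdown has begun, can be sketched as a toy model. `LockManagerModel`, `signal`, `shutdown` and `pool_token` are hypothetical names, a list stands in for the thread pool token, and the mutex is an assumption of this sketch rather than something stated in the summary.

```python
import threading


class LockManagerModel:
    """Toy model of a manager that schedules tasks on a pool token."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._shutdown = False
        self.pool_token = []  # stands in for thread_pool_token_

    def signal(self, task) -> bool:
        with self._mutex:
            if self._shutdown:
                # Shutdown has started: never touch the token again.
                return False
            self.pool_token.append(task)
            return True

    def shutdown(self) -> None:
        with self._mutex:
            self._shutdown = True
        # Waiters resumed during shutdown may still try to signal;
        # the flag turns that into a no-op.
        self.signal("resume waiter")
        self.pool_token = None  # reset, as in the crash scenario
```

After `shutdown()`, a late `signal()` returns `False` instead of dereferencing the token that has been reset.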
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…e to stack-overflow during index backfill.
Summary:
In the last few weeks we have seen a few instances of the stress test (with various nemesis) run into a master crash with a stack trace that looks like:
```
* thread #1, name = 'yb-master', stop reason = signal SIGSEGV: invalid address
* frame #0: 0x0000aaaad52f5fc4 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::shared_ptr<yb::master::BackfillTablet>::shared_ptr[abi:ue170006]<yb::master::BackfillTablet, void>(this=<unavailable>, __r=std::__1::weak_ptr<yb::master::BackfillTablet>::element_type @ 0x000013e4bf787778) at shared_ptr.h:701:20
frame #1: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::enable_shared_from_this<yb::master::BackfillTablet>::shared_from_this[abi:ue170006](this=0x000013e4bf787778) at shared_ptr.h:1954:17
frame #2: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=0x000013e4bf787778) at backfill_index.cc:1300:50
frame #3: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323:10
frame #4: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4d458) at backfill_index.cc:1620:5
frame #5: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:470:3
frame #6: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:273:5
frame #7: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4d458) at backfill_index.cc:1463:19
frame #8: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
frame #9: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323:10
frame #10: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cd98) at backfill_index.cc:1620:5
frame #11: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:470:3
frame #12: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:273:5
frame #13: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cd98) at backfill_index.cc:1463:19
frame #14: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
frame #15: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323:10
frame #16: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cfd8) at backfill_index.cc:1620:5
frame #17: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:470:3
frame #18: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:273:5
frame #19: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cfd8) at backfill_index.cc:1463:19
frame #20: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
frame #21: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323:10
...
frame #2452: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bdc7ed98) at backfill_index.cc:1620:5
frame #2453: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:470:3
frame #2454: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:273:5
frame #2455: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bdc7ed98) at backfill_index.cc:1463:19
frame #2456: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
frame #2457: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323:10
frame #2458: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4ba1ff458) at backfill_index.cc:1620:5
frame #2459: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:470:3
frame #2460: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:273:5
frame #2461: 0x0000aaaad52c0260 yb-master`yb::master::RetryingRpcTask::RunDelayedTask(this=0x000013e4ba1ff458, status=0x0000ffffab2668c0) at async_rpc_tasks.cc:432:14
frame #2462: 0x0000aaaad5c3f838 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] boost::function1<void, yb::Status const&>::operator()(this=0x000013e4bff63b18, a0=0x0000ffffab2668c0) const at function_template.hpp:763:14
frame #2463: 0x0000aaaad5c3f81c yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] yb::rpc::DelayedTask::TimerHandler(this=0x000013e4bff63ae8, watcher=<unavailable>, revents=<unavailable>) at delayed_task.cc:155:5
frame #2464: 0x0000aaaad5c3f284 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(loop=<unavailable>, w=<unavailable>, revents=<unavailable>) at ev++.h:479:7
frame #2465: 0x0000aaaad4cdf170 yb-master`ev_invoke_pending + 112
frame #2466: 0x0000aaaad4ce21fc yb-master`ev_run + 2940
frame #2467: 0x0000aaaad5c725fc yb-master`yb::rpc::Reactor::RunThread() [inlined] ev::loop_ref::run(this=0x000013e4bfcfadf8, flags=0) at ev++.h:211:7
frame #2468: 0x0000aaaad5c725f4 yb-master`yb::rpc::Reactor::RunThread(this=0x000013e4bfcfadc0) at reactor.cc:735:9
frame #2469: 0x0000aaaad65c61d8 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ue170006](this=0x000013e4bfeffa80) const at function.h:517:16
frame #2470: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000013e4bfeffa80) const at function.h:1168:12
frame #2471: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(arg=0x000013e4bfeffa20) at thread.cc:895:3
```
Essentially, a BackfillChunk is considered done
(without sending out an RPC) and launches the next BackfillChunk, which does the same. This may happen if `BackfillTable::indexes_to_build()` is empty, or if `backfill_jobs()` is empty. However, based on a reading of the code, we should only get there **after** marking `BackfillTable::done_` as `true`. If, for some reason, `indexes_to_build()` is empty while `BackfillTable::done_ == false`, we could get into this infinite recursion.
Since I am unable to explain or recreate how this happens, I'm adding a test flag `TEST_simulate_empty_indexes` to reproduce it.
Fix: We update `BackfillChunk::SendRequest` to handle an empty `indexes_to_build()` as a failure rather than treating it as a success. This prevents the infinite recursion.
Also adding a few log lines that may help us better understand the scenario if we run into it again.
Jira: DB-17296
Original commit: 5d402b5 / D45031
Test Plan: yb_build.sh fastdebug --cxx-test pg_index_backfill-test --gtest_filter *.SimulateEmptyIndexesForStackOverflow*
Reviewers: zdrudi, rthallam, jason, #db-approvers
Reviewed By: rthallam
Subscribers: svc_phabricator, yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D45138
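To make the recursion and its fix more concrete, here is a small C++ sketch under stated assumptions. `BackfillTabletSketch` and `SketchStatus` are hypothetical types written for this illustration; they only mimic the shape of the `LaunchNextChunkOrDone` / `Done` cycle from the stack trace and are not the real master code. In the real system a successful chunk completes asynchronously through an RPC; the sketch runs the success path synchronously purely for brevity.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical status type, standing in for a real Status class.
struct SketchStatus {
  bool ok;
  std::string message;
};

// Hypothetical model of a tablet backfill that launches chunks in turn.
class BackfillTabletSketch {
 public:
  BackfillTabletSketch(std::vector<std::string> indexes_to_build, int total_chunks)
      : indexes_to_build_(std::move(indexes_to_build)), total_chunks_(total_chunks) {}

  void LaunchNextChunkOrDone() {
    if (done_ || launched_ >= total_chunks_) {
      done_ = true;
      return;
    }
    ++launched_;
    Done(SendRequest());
  }

  int launched() const { return launched_; }
  bool failed() const { return failed_; }

 private:
  // An empty index list is reported as a failure instead of a success,
  // so the caller stops rather than immediately launching another chunk.
  SketchStatus SendRequest() const {
    if (indexes_to_build_.empty()) {
      return {false, "no indexes to build"};
    }
    return {true, ""};
  }

  void Done(const SketchStatus& status) {
    if (!status.ok) {
      failed_ = true;
      done_ = true;  // Break the Done -> LaunchNextChunkOrDone cycle.
      return;
    }
    LaunchNextChunkOrDone();
  }

  std::vector<std::string> indexes_to_build_;
  int total_chunks_;
  int launched_ = 0;
  bool done_ = false;
  bool failed_ = false;
};
```

With an empty index list, the sketch launches exactly one chunk and then stops, however many chunks remain. If `SendRequest` returned success in that case, each chunk would immediately launch the next one on the same stack.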
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…cking the route
Summary:
On running the connection burst test, the following core was generated:
(lldb) target create "/home/yugabyte/yb-software/yugabyte-2024.2.3.0-b116-centos-x86_64/bin/odyssey" --core "/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey"
Core file '/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey' (x86_64) was loaded.
(lldb) bt all
error: odyssey GetDIE for DIE 0x3c is outside of its CU 0x66d45
* thread #1, name = 'odyssey', stop reason = signal SIGSEGV
* frame #0: 0x0000564340e2cc6f odyssey`od_backend_connect(server=0x00005138fc5ef6c0, context="", route_params=0x0000000000000000, client=0x00005138ff7a2580) at backend.c:815:19
frame #1: 0x0000564340e2a80e odyssey`od_frontend_attach(client=0x00005138ff7a2580, context="", route_params=0x0000000000000000) at frontend.c:305:8
frame #2: 0x0000564340e26b11 odyssey`od_frontend_remote [inlined] od_frontend_attach_and_deploy(client=0x00005138ff7a2580, context=<unavailable>) at frontend.c:361:11
frame #3: 0x0000564340e26afe odyssey`od_frontend_remote(client=0x00005138ff7a2580) at frontend.c:2120:13
frame #4: 0x0000564340e22d65 odyssey`od_frontend(arg=0x00005138ff7a2580) at frontend.c:2756:12
frame #5: 0x0000564340e4b912 odyssey`mm_scheduler_main(arg=0x00005138fc218dc0) at scheduler.c:17:2
frame #6: 0x0000564340e4bb77 odyssey`mm_context_runner at context.c:28:2
This points to `storage = route->rule->storage;`, meaning that `rule` had already been set to NULL, which led to the crash above.
The root cause is a race condition in the object cleanup. While cleaning up a route, the reference to its associated rule was being dropped (unref) outside of the lock protecting the route object.
This allows for a scenario where one thread proceeds to clean up the rule while another thread simultaneously acquires a lock on the same route and attempts to use its rule pointer, which is by then a dangling pointer.
This diff moves the unref of the rule object into a code block where a lock is already held on the route object. This change ensures atomic handling of the route and its associated rule, preventing any concurrent access to an invalid pointer.
Original commit: None / D45583
Jira: DB-17729
Test Plan: Jenkins: all tests
Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena
Reviewed By: skumar
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D45653
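The race described above can be sketched in C++ as follows. This is only an illustration under assumptions: `RuleSketch`, `RouteSketch`, `CleanupRoute`, and `ReadStorage` are hypothetical names, and the real connection manager is written in C with its own locking and reference-counting helpers. The point shown is that the rule pointer is detached and unreferenced while the route lock is held, and that readers check the pointer under the same lock.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical reference-counted rule.
struct RuleSketch {
  int refs;
  std::string storage;
};

// Hypothetical route that points at a rule and has its own lock.
struct RouteSketch {
  std::mutex lock;
  RuleSketch* rule = nullptr;
};

// Cleanup path: detach and unref the rule while the route lock is held,
// so no reader can observe a pointer to a rule that is being freed.
void CleanupRoute(RouteSketch& route) {
  std::lock_guard<std::mutex> guard(route.lock);
  RuleSketch* rule = route.rule;
  route.rule = nullptr;
  if (rule != nullptr && --rule->refs == 0) {
    delete rule;
  }
}

// Reader path: takes the same lock and checks the pointer before using it.
bool ReadStorage(RouteSketch& route, std::string* out) {
  std::lock_guard<std::mutex> guard(route.lock);
  if (route.rule == nullptr) {
    return false;
  }
  *out = route.rule->storage;
  return true;
}
```

Because both paths serialize on the route lock, a reader sees either a live rule or a null pointer, never a freed one.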
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…te until server is closed in multi route pooling
Summary:
**Issue Summary**
A core dump was triggered during a ConnectionBurst stress test, with the crash occurring in the od_backend_close_connection function with multi route pooling. The stack trace is as follows:
frame #0: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] mm_tls_free(io=0x0000000000000000) at tls.c:91:10
frame #1: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] machine_io_free(obj=0x0000000000000000) at io.c:201:2
frame #2: 0x00005601a627129e odyssey`od_backend_close_connection [inlined] od_io_close(io=0x000031f53e72b8b8) at io.h:77:2
frame #3: 0x00005601a627128c odyssey`od_backend_close_connection(server=0x000031f53e72b880) at backend.c:56:2
frame #4: 0x00005601a6250de5 odyssey`od_router_attach(router=0x00007fff00dbeb30, client_for_router=0x000031f53e5df180, wait_for_idle=<unavailable>, external_client=0x000031f53ee30680) at router.c:1010:6
frame #5: 0x00005601a6258b1b odyssey`od_auth_frontend [inlined] yb_execute_on_control_connection(client=0x000031f53ee30680, function=<unavailable>) at frontend.c:2842:11
frame #6: 0x00005601a6258b0b odyssey`od_auth_frontend(client=0x000031f53ee30680) at auth.c:677:8
frame #7: 0x00005601a626782e odyssey`od_frontend(arg=0x000031f53ee30680) at frontend.c:2539:8
frame #8: 0x00005601a6290912 odyssey`mm_scheduler_main(arg=0x000031f53e390000) at scheduler.c:17:2
frame #9: 0x00005601a6290b77 odyssey`mm_context_runner at context.c:28:2
**Root Cause**
The crash originated from an improper lock release in the yb_get_idle_server_to_close function, introduced in commit 55beeb0 during multi-route pooling implementation. The function released the lock on the route object, despite a comment explicitly warning against it. After returning to its caller, no lock was held on the route or idle_route. This allowed other coroutines to access and use the same route and its idle server, which the original coroutine intended to close. This race condition led to a crash due to an assertion failure during connection closure.
**Note**
If the order of acquiring locks is the same across all threads or processes, differences in the release order alone cannot cause a deadlock. Deadlocks arise from circular dependencies during acquisition, not release.
In the connection manager code base:
- Locks are acquired in the order: router → route. This order must be strictly enforced everywhere to prevent deadlocks.
- Lock release order varies (e.g., router then route in od_router_route and yb_get_idle_server_to_close, versus the reverse elsewhere). This variation does not cause deadlocks, as release order is irrelevant to deadlock prevention.
Original commit: None / D45641
Jira: DB-17501
Test Plan: Jenkins: all tests
Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena
Reviewed By: skumar
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D45657
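The locking note in the commit message above (a fixed router → route acquisition order, with a release order that may vary) can be illustrated with a short C++ sketch. `PoolRouter`, `PoolRoute`, `AddIdleServer`, and `TakeIdleServer` are hypothetical names used only here; the real code is C and uses its own lock primitives.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical router and route, each guarded by its own mutex.
struct PoolRouter {
  std::mutex lock;
};

struct PoolRoute {
  std::mutex lock;
  int idle_servers = 0;
};

// Acquires router, then route. The guards are destroyed in reverse
// order, so the route lock is released first and the router lock last.
void AddIdleServer(PoolRouter& router, PoolRoute& route) {
  std::unique_lock<std::mutex> router_guard(router.lock);
  std::unique_lock<std::mutex> route_guard(route.lock);
  ++route.idle_servers;
}

// Same acquisition order, but releases the router lock early while
// still holding the route lock until the idle server has been taken.
bool TakeIdleServer(PoolRouter& router, PoolRoute& route) {
  std::unique_lock<std::mutex> router_guard(router.lock);
  std::unique_lock<std::mutex> route_guard(route.lock);
  router_guard.unlock();
  if (route.idle_servers == 0) {
    return false;
  }
  --route.idle_servers;
  return true;
}
```

Both functions take the locks in the same order, so they cannot deadlock against each other even though they release in different orders. `TakeIdleServer` also keeps the route lock held for as long as it works on the route's idle server, which is the property the root-cause section says was lost.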
cameron-p-m
pushed a commit
that referenced
this pull request
Nov 26, 2025
Summary:
The stack trace of the core dump:
```
(lldb) bt all
* thread #1, name = 'postgres', stop reason = signal SIGSEGV: address not mapped to object
* frame #0: 0x0000aaaac59fb720 postgres`FreeTupleDesc [inlined] GetMemoryChunkContext(pointer=0x0000000000000000) at memutils.h:141:12
frame #1: 0x0000aaaac59fb710 postgres`FreeTupleDesc [inlined] pfree(pointer=0x0000000000000000) at mcxt.c:1500:26
frame #2: 0x0000aaaac59fb710 postgres`FreeTupleDesc(tupdesc=0x000013d7fd8dccc8) at tupdesc.c:326:5
frame #3: 0x0000aaaac61c7204 postgres`RelationDestroyRelation(relation=0x000013d7fd8dc9a8, remember_tupdesc=false) at relcache.c:4577:4
frame #4: 0x0000aaaac5febab8 postgres`YBRefreshCache at relcache.c:5216:3
frame #5: 0x0000aaaac5feba94 postgres`YBRefreshCache at postgres.c:4442:2
frame #6: 0x0000aaaac5feb50c postgres`YBRefreshCacheWrapperImpl(catalog_master_version=0, is_retry=false, full_refresh_allowed=true) at postgres.c:4570:3
frame #7: 0x0000aaaac5feea34 postgres`PostgresMain [inlined] YBRefreshCacheWrapper(catalog_master_version=0, is_retry=false) at postgres.c:4586:9
frame #8: 0x0000aaaac5feea2c postgres`PostgresMain [inlined] YBCheckSharedCatalogCacheVersion at postgres.c:4951:3
frame #9: 0x0000aaaac5fee984 postgres`PostgresMain(dbname=<unavailable>, username=<unavailable>) at postgres.c:6574:4
frame #10: 0x0000aaaac5efe5b4 postgres`BackendRun(port=0x000013d7ffc06400) at postmaster.c:4995:2
frame #11: 0x0000aaaac5efdd08 postgres`ServerLoop [inlined] BackendStartup(port=0x000013d7ffc06400) at postmaster.c:4701:3
frame #12: 0x0000aaaac5efdc70 postgres`ServerLoop at postmaster.c:1908:7
frame #13: 0x0000aaaac5ef8ef8 postgres`PostmasterMain(argc=<unavailable>, argv=<unavailable>) at postmaster.c:1562:11
frame #14: 0x0000aaaac5ddae1c postgres`PostgresServerProcessMain(argc=25, argv=0x000013d7ffe068f0) at main.c:213:3
frame #15: 0x0000aaaac59dee38 postgres`main + 36
frame #16: 0x0000ffff9f606340 libc.so.6`__libc_start_call_main + 112
frame #17: 0x0000ffff9f606418 libc.so.6`__libc_start_main@@GLIBC_2.34 + 152
frame #18: 0x0000aaaac59ded34 postgres`_start + 52
```
It is related to an invalidation message. The test involves concurrent DDL execution without object locking. I added a few logs to help debug this issue.
Test Plan:
(1) Append to the end of file ./build/latest/postgres/share/postgresql.conf.sample:
```
yb_debug_log_catcache_events=1
log_min_messages=DEBUG1
```
(2) Create an RF-1 cluster
```
./bin/yb-ctl create --rf 1
```
(3) Run the following example via ysqlsh:
```
-- === 1. SETUP ===
DROP TABLE IF EXISTS accounts_timetravel;
CREATE TABLE accounts_timetravel ( id INT PRIMARY KEY, balance INT, last_updated TIMESTAMPTZ );
INSERT INTO accounts_timetravel VALUES (1, 1000, now());
\echo '--- 1. Initial Data (The Past) ---'
SELECT * FROM accounts_timetravel;
-- Wait 2 seconds
SELECT pg_sleep(2);
-- === 2. CAPTURE THE "PAST" HLC TIMESTAMP ===
--
-- *** THIS IS THE FIX ***
-- Get the current time as seconds from the Unix epoch,
-- multiply by 1,000,000 to get microseconds,
-- and cast to a big integer.
--
SELECT (EXTRACT(EPOCH FROM now())*1000000)::bigint AS snapshot_hlc \gset
SELECT :snapshot_hlc;
\echo '--- (Snapshot HLC captured) ---'
SELECT * FROM pg_yb_catalog_version;
-- Wait 2 more seconds
SELECT pg_sleep(2);
-- === 3. UPDATE THE DATA ===
UPDATE accounts_timetravel SET balance = 500, last_updated = now() WHERE id = 1;
\echo '--- 2. New Data (The Present) ---'
SELECT * FROM accounts_timetravel;
CREATE TABLE foo(id int);
-- increment the catalog version
ALTER TABLE foo ADD COLUMN val TEXT;
SELECT * FROM pg_yb_catalog_version;
-- === 4. PERFORM THE TIME-TRAVEL QUERY ===
--
-- Set our 'read_time_guc' variable to the HLC value
--
\set read_time_guc :snapshot_hlc
\echo '--- 3. Time-Travel Read (Querying the Past) ---'
\echo 'Setting yb_read_time to HLC (microseconds):' :read_time_guc
-- This will now be interpolated correctly and will succeed.
SET yb_read_time = :read_time_guc;
-- This query will now correctly read the historical data
SELECT * FROM accounts_timetravel;
SELECT * FROM pg_yb_catalog_version;
-- === 5. CLEANUP ===
RESET yb_read_time;
\echo '--- 4. Back to the Present ---'
SELECT * FROM accounts_timetravel;
DROP TABLE accounts_timetravel;
```
(4) Look at the postgres log for the following samples:
```
2025-11-07 18:31:06.223 UTC [3321231] LOG: Preloading relcache for database 13524, session user id: 10, yb_read_time: 0
```
```
2025-11-07 18:31:06.303 UTC [3321231] LOG: Building relcache entry for pg_index (oid 2610) took 785 us
```
```
2025-11-07 18:31:09.265 UTC [3321221] LOG: Rebuild relcache entry for accounts_timetravel (oid 16384)
```
```
2025-11-07 18:31:09.525 UTC [3321221] LOG: Delete relcache entry for accounts_timetravel (oid 16384)
```
```
2025-11-07 18:31:14.035 UTC [3321221] DEBUG: Setting yb_read_time to 1762540271568993
```
```
2025-11-07 18:31:14.037 UTC [3321221] LOG: Preloading relcache for database 13524, session user id: 13523, yb_read_time: 1762540271568993
```
```
2025-11-07 18:31:14.183 UTC [3321221] DEBUG: Setting yb_read_time to 0
```
Reviewers: kfranz, #db-approvers
Reviewed By: kfranz, #db-approvers
Subscribers: jason, yql
Differential Revision: https://phorge.dev.yugabyte.com/D48114
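The test plan above derives its snapshot value as microseconds since the Unix epoch. As a rough cross-check of that arithmetic outside SQL, the same conversion can be sketched in C++ with `std::chrono`. `ToEpochMicros` and `FromEpochMicros` are hypothetical helper names, and the sketch assumes `std::chrono::system_clock` counts from the Unix epoch (guaranteed since C++20, and the common behavior before that).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Converts a system_clock time point to whole microseconds since the epoch.
std::int64_t ToEpochMicros(std::chrono::system_clock::time_point tp) {
  return std::chrono::duration_cast<std::chrono::microseconds>(
             tp.time_since_epoch())
      .count();
}

// Builds a system_clock time point from microseconds since the epoch.
std::chrono::system_clock::time_point FromEpochMicros(std::int64_t micros) {
  return std::chrono::system_clock::time_point(
      std::chrono::duration_cast<std::chrono::system_clock::duration>(
          std::chrono::microseconds(micros)));
}
```

For example, `ToEpochMicros(std::chrono::system_clock::now())` yields a value of the same magnitude as the `1762540271568993` shown in the sample log lines.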
Labels
dependencies
Pull requests that update a dependency file
python
Pull requests that update Python code
0 participants
Bumps setuptools from 72.2.0 to 78.1.1.
Changelog
Sourced from setuptools's changelog.
... (truncated)
Commits
- 8e4868a Bump version: 78.1.0 → 78.1.1
- 100e9a6 Merge pull request #4951
- 8faf1d7 Add news fragment.
- 2ca4a9f Rely on re.sub to perform the decision in one expression.
- e409e80 Extract _sanitize method for sanitizing the filename.
- 250a6d1 Add a check to ensure the name resolves relative to the tmpdir.
- d8390fe Extract _resolve_download_filename with test.
- 4e1e893 Merge https://github.com/jaraco/skeleton
- 3a3144f Fix typo: pyproject.license -> project.license (#4931)
- d751068 Fix typo: pyproject.license -> project.license

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

You can disable automated security fix PRs for this repo from the Security Alerts page.