Skip to content

Conversation

@dependabot
Copy link

@dependabot dependabot bot commented on behalf of github May 19, 2025

Bumps setuptools from 72.2.0 to 78.1.1.

Changelog

Sourced from setuptools's changelog.

v78.1.1

Bugfixes

  • More fully sanitized the filename in PackageIndex._download. (#4946)

v78.1.0

Features

  • Restore access to _get_vc_env with a warning. (#4874)

v78.0.2

Bugfixes

  • Postponed removals of deprecated dash-separated and uppercase fields in setup.cfg. All packages with deprecated configurations are advised to move before 2026. (#4911)

v78.0.1

Misc

v78.0.0

Bugfixes

  • Reverted distutils changes that broke the monkey patching of command classes. (#4902)

Deprecations and Removals

  • Setuptools no longer accepts options containing uppercase or dash characters in setup.cfg.

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    You can disable automated security fix PRs for this repo from the Security Alerts page.

Bumps [setuptools](https://github.com/pypa/setuptools) from 72.2.0 to 78.1.1.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](pypa/setuptools@v72.2.0...v78.1.1)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-version: 78.1.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot added dependencies Pull requests that update a dependency file python Pull requests that update Python code labels May 19, 2025
jmeehan16 pushed a commit that referenced this pull request Jun 12, 2025
Summary:
After commit f85bbca, vmodule flag is no longer respected by postgres process, for example:
```
ybd release --cxx-test pgwrapper_pg_analyze-test --gtest_filter PgAnalyzeTest.AnalyzeSamplingColocated --test-args '--vmodule=pg_sample=1' -n 2 -- -p 1 -k
zgrep pg_sample ~/logs/latest_test/1.log
```
shows no vlogs.

The reason is that `VLOG(1)` is used early by
```
#0  0x00007f7e1b48b090 in google::InitVLOG3__(google::SiteFlag*, int*, char const*, int)@plt () from /net/dev-server-timur/share/code/yugabyte-db/build/debug-clang19-dynamic-ninja/lib/libyb_util_shmem.so
#1  0x00007f7e1b47616e in yb::(anonymous namespace)::NegotiatorSharedState::WaitProposal (this=0x7f7e215e8000) at ../../src/yb/util/shmem/reserved_address_segment.cc:108
#2  0x00007f7e1b4781e0 in yb::AddressSegmentNegotiator::Impl::NegotiateChild (fd=45) at ../../src/yb/util/shmem/reserved_address_segment.cc:252
#3  0x00007f7e1b4737ce in yb::AddressSegmentNegotiator::NegotiateChild (fd=45) at ../../src/yb/util/shmem/reserved_address_segment.cc:376
#4  0x00007f7e1b742b7b in yb::tserver::SharedMemoryManager::InitializePostmaster (this=0x7f7e202e9788 <yb::pggate::PgSharedMemoryManager()::shared_mem_manager>, fd=45) at ../../src/yb/tserver/tserver_shared_mem.cc:252
#5  0x00007f7e2023588f in yb::pggate::PgSetupSharedMemoryAddressSegment () at ../../src/yb/yql/pggate/pg_shared_mem.cc:29
#6  0x00007f7e202788e9 in YBCSetupSharedMemoryAddressSegment () at ../../src/yb/yql/pggate/ybc_pg_shared_mem.cc:22
#7  0x000055636b8956f5 in PostmasterMain (argc=21, argv=0x52937fe4e790) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1083
yugabyte#8  0x000055636b774bfe in PostgresServerProcessMain (argc=21, argv=0x52937fe4e790) at ../../../../../../src/postgres/src/backend/main/main.c:209
yugabyte#9  0x000055636b7751f2 in main ()
```
and caches `vmodule` value before `InitGFlags` sets it from environment.

The fix is to explicitly call `UpdateVmodule` from `InitGFlags` after setting `vmodule`.
Jira: DB-15888

Test Plan:
```
ybd release --cxx-test pgwrapper_pg_analyze-test --gtest_filter PgAnalyzeTest.AnalyzeSamplingColocated --test-args '--vmodule=pg_sample=1' -n 2 -- -p 1 -k
zgrep pg_sample ~/logs/latest_test/1.log
```

Reviewers: hsunder

Reviewed By: hsunder

Subscribers: ybase, yql

Tags: #jenkins-ready, #jenkins-trigger

Differential Revision: https://phorge.dev.yugabyte.com/D42731
jmeehan16 pushed a commit that referenced this pull request Jun 12, 2025
…rdup for tablegroup_name

Summary:
As part of D36859 / 0dbe7d6, backup and restore support for colocated tables when multiple tablespaces exist was introduced. Upon
fetching the tablegroup_name from `pg_yb_tablegroup`, the value was read and assigned via `PQgetvalue` without copying. This led to a use-after-free bug when the
tablegroup_name was later read in dumpTableSchema since the result from the SQL query is immediately cleared in the next line (`PQclear`).

```
[P-yb-controller-1] ==3037==ERROR: AddressSanitizer: heap-use-after-free on address 0x51d0002013e6 at pc 0x55615b0a1f92 bp 0x7fff92475970 sp 0x7fff92475118
[P-yb-controller-1] READ of size 8 at 0x51d0002013e6 thread T0
[P-yb-controller-1]     #0 0x55615b0a1f91 in strcmp ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:470:5
[P-yb-controller-1]     #1 0x55615b1b90ba in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15789:8
[P-yb-controller-1]     #2 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #3 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #4 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #5 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
[P-yb-controller-1]     #6 0x55615b0894bd in _start (${BUILD_ROOT}/postgres/bin/ysql_dump+0x10d4bd)
[P-yb-controller-1]
[P-yb-controller-1] 0x51d0002013e6 is located 358 bytes inside of 2048-byte region [0x51d000201280,0x51d000201a80)
[P-yb-controller-1] freed by thread T0 here:
[P-yb-controller-1]     #0 0x55615b127196 in free ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
[P-yb-controller-1]     #1 0x7f3c02d65e85 in PQclear ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:755:3
[P-yb-controller-1]     #2 0x55615b1c0103 in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19108:4
[P-yb-controller-1]     #3 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3
[P-yb-controller-1]     #4 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #5 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #6 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #7 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
[P-yb-controller-1]
[P-yb-controller-1] previously allocated by thread T0 here:
[P-yb-controller-1]     #0 0x55615b12742f in malloc ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3
[P-yb-controller-1]     #1 0x7f3c02d680a7 in pqResultAlloc ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:633:28
[P-yb-controller-1]     #2 0x7f3c02d81294 in getRowDescriptions ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:544:4
[P-yb-controller-1]     #3 0x7f3c02d7f793 in pqParseInput3 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11
[P-yb-controller-1]     #4 0x7f3c02d6bcc8 in parseInput ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2014:2
[P-yb-controller-1]     #5 0x7f3c02d6bcc8 in PQgetResult ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2100:3
[P-yb-controller-1]     #6 0x7f3c02d6cd87 in PQexecFinish ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2417:19
[P-yb-controller-1]     #7 0x7f3c02d6cd87 in PQexec ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2256:9
[P-yb-controller-1]     yugabyte#8 0x55615b1f45df in ExecuteSqlQuery ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:296:8
[P-yb-controller-1]     yugabyte#9 0x55615b1f4213 in ExecuteSqlQueryForSingleRow ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:311:8
[P-yb-controller-1]     yugabyte#10 0x55615b1c008d in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19102:10
[P-yb-controller-1]     yugabyte#11 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3
[P-yb-controller-1]     yugabyte#12 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     yugabyte#13 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     yugabyte#14 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     yugabyte#15 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
```

This revision fixes the issue by using pg_strdup to make a copy of the string.
Jira: DB-15915

Test Plan: ./yb_build.sh asan --cxx-test integration-tests_xcluster_ddl_replication-test --gtest_filter XClusterDDLReplicationTest.DDLReplicationTablesNotColocated

Reviewers: aagrawal, skumar, mlillibridge, sergei

Reviewed By: aagrawal, sergei

Subscribers: sergei, yql

Differential Revision: https://phorge.dev.yugabyte.com/D43386
jmeehan16 pushed a commit that referenced this pull request Jun 12, 2025
…ck/release functions at TabletService

Summary:
In functions `TabletServiceImpl::AcquireObjectLocks` and `TabletServiceImpl::ReleaseObjectLocks`, we weren't returning after executing the rpc callback with initial validation steps fail. This led to segv issues like below
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined]
std::__1::unique_ptr<yb::tserver::TSLocalLockManager::Impl, std::__1::default_delete<yb::tserver::TSLocalLockManager::Impl>>::operator->[abi:ne190100](this=0x0000000000000000) const at unique_ptr.h:272:108
    frame #1: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined]
yb::tserver::TSLocalLockManager::AcquireObjectLocksAsync(this=0x0000000000000000, req=0x00005001bfffa290, deadline=yb::CoarseTimePoint @ x23, callback=0x0000ffefb6066560, wait=(value_ = true)) at ts_local_lock_manager.cc:541:3
    frame #2: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(this=0x00005001bdaf6020, req=0x00005001bfffa290, resp=0x00005001bfffa300, context=<unavailable>) at tablet_service.cc:3673:26
    frame #3: 0x0000aaaac36bd9a0 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined]
yb::tserver::TabletServerServiceIf::InitMethods(this=<unavailable>, req=0x00005001bfffa290, resp=0x00005001bfffa300, rpc_context=RpcContext @ 0x0000ffefb6066600)::$_36::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>)
const::'lambda'(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext)::operator()(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*,
yb::rpc::RpcContext) const at tserver_service.service.cc:1470:9
    frame #4: 0x0000aaaac36bd978 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) at
local_call.h:126:7
    frame #5: 0x0000aaaac36bd680 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined]
yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36::operator()(this=<unavailable>, call=<unavailable>) const at tserver_service.service.cc:1468:7
    frame #6: 0x0000aaaac36bd5c8 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined]
decltype(std::declval<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&>()(std::declval<std::__1::shared_ptr<yb::rpc::InboundCall>>()))
std::__1::__invoke[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__f=<unavailable>, __args=<unavailable>) at invoke.h:149:25
    frame #7: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] void
std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__args=<unavailable>,
__args=<unavailable>) at invoke.h:224:5
    frame yugabyte#8: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined]
std::__1::__function::__alloc_func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity>
const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __arg=<unavailable>) at function.h:171:12
    frame yugabyte#9: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=<unavailable>) at
function.h:313:10
    frame yugabyte#10: 0x0000aaaac36d1384 yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::__function::__value_func<void
(std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __args=nullptr) const at function.h:430:12
    frame yugabyte#11: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::function<void
(std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=nullptr) const at function.h:989:10
    frame yugabyte#12: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(this=<unavailable>, call=<unavailable>) at tserver_service.service.cc:913:3
    frame yugabyte#13: 0x0000aaaac30e05b4 yb-tserver`yb::rpc::ServicePoolImpl::Handle(this=0x00005001bff9b8c0, incoming=nullptr) at service_pool.cc:275:19
    frame yugabyte#14: 0x0000aaaac3006ed0 yb-tserver`yb::rpc::InboundCall::InboundCallTask::Run(this=<unavailable>) at inbound_call.cc:309:13
    frame yugabyte#15: 0x0000aaaac30ec868 yb-tserver`yb::rpc::(anonymous namespace)::Worker::Execute(this=0x00005001bff5c640, task=0x00005001bfdf1958) at thread_pool.cc:138:13
    frame yugabyte#16: 0x0000aaaac39afd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x00005001bfe1e750) const at function.h:430:12
    frame yugabyte#17: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x00005001bfe1e750) const at function.h:989:10
    frame yugabyte#18: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x00005001bfe1e6e0) at thread.cc:937:3
```

This revision addresses the issue by returning after executing the rpc callback with validation failure status.
Jira: DB-17124

Test Plan: Jenkins

Reviewers: rthallam, amitanand

Reviewed By: amitanand

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D44663
jmeehan16 pushed a commit that referenced this pull request Jun 12, 2025
…own flags are set at ObjectLockManager

Summary:
In context of object locking, commit 6e80c56 / D44228 got rid of logic that signaled obsolete waiters corresponding to transactions that issued a release all locks request (could have been terminated to failures like timeout, deadlock etc) in order to early terminate failed waiting requests. Hence, now we let the obsolete requests terminate organically from the OLM resumed by the poller thread that runs at an interval of `olm_poll_interval_ms` (defaults to 100ms).

This led to one of the itests failing with the below stack
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV: address not mapped to object
  * frame #0: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(std::__1::function<void ()>) [inlined] yb::ThreadPoolToken::Submit(this=<unavailable>, r=<unavailable>) at threadpool.cc:146:10
    frame #1: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(this=0x0000000000000000, f=<unavailable>) at threadpool.cc:142:10
    frame #2: 0x0000aaaac73cdfe8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoSignal(this=0x00003342bfa0d400, entry=<unavailable>) at object_lock_manager.cc:767:3
    frame #3: 0x0000aaaac73cc7c0 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>, yb::docdb::LockData&&, yb::StronglyTypedBool<yb::docdb::(anonymous
namespace)::IsLockRetry_Tag>, unsigned long, yb::Status) [inlined] yb::docdb::ObjectLockManagerImpl::PrepareAcquire(this=0x00003342bfa0d400, txn_lock=<unavailable>, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous
namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, resume_it_offset=<unavailable>, resume_with_status=<unavailable>) at object_lock_manager.cc:523:5
    frame #4: 0x0000aaaac73cc6a8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(this=0x00003342bfa0d400, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @
0x00003342bfa94a38, data=0x00003342b9a6a830, is_retry=(value_ = true), resume_it_offset=<unavailable>, resume_with_status=Status @ 0x0000ffefaa036658) at object_lock_manager.cc:552:27
    frame #5: 0x0000aaaac73cbcb4 yb-tserver`yb::docdb::WaiterEntry::Resume(this=0x00003342b9a6a820, lock_manager=0x00003342bfa0d400, resume_with_status=<unavailable>) at object_lock_manager.cc:381:17
    frame #6: 0x0000aaaac85bdd4c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() at object_lock_manager.cc:752:13
    frame #7: 0x0000aaaac85bda74 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::docdb::ObjectLockManager::Shutdown(this=<unavailable>) at object_lock_manager.cc:1092:10
    frame yugabyte#8: 0x0000aaaac85bda6c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::tserver::TSLocalLockManager::Impl::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:411:26
    frame yugabyte#9: 0x0000aaaac85bd7e8 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:566:10
    frame yugabyte#10: 0x0000aaaac8665a34 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ResetAndGetTSLocalLockManager(this=0x000033423fc1ad80) at tablet_server.cc:797:28
    frame yugabyte#11: 0x0000aaaac8665a18 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ProcessLeaseUpdate(this=0x000033423fc1ad80, lease_refresh_info=0x000033423a476b80) at tablet_server.cc:828:22
    frame yugabyte#12: 0x0000aaaac8665950 yb-tserver`yb::tserver::YsqlLeasePoller::Poll(this=<unavailable>) at ysql_lease_poller.cc:143:18
    frame yugabyte#13: 0x0000aaaac8438d58 yb-tserver`yb::tserver::MasterLeaderPollScheduler::Impl::Run(this=0x000033423ff5cc80) at master_leader_poller.cc:125:25
    frame yugabyte#14: 0x0000aaaac89ffd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x000033423ffc7930) const at function.h:430:12
    frame yugabyte#15: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000033423ffc7930) const at function.h:989:10
    frame yugabyte#16: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x000033423ffc78c0) at thread.cc:937:3
    frame yugabyte#17: 0x0000ffffac0378b8 libpthread.so.0`start_thread + 392
    frame yugabyte#18: 0x0000ffffac093afc libc.so.6`thread_start + 12
```
This is due to accessing unique_ptr `thread_pool_token_` after it has been reset.

This revision fixes the issue by not scheduling any tasks on the threadpool once the shutdown flags has been set (hence not accessing `thread_pool_token_`). Since we wait for in-progress requests at the OLM and also in-progress resume tasks scheduled on the messenger using `waiters_amidst_resumption_on_messenger_`, it is safe to say that `thread_pool_token_` would not be accessed once it is reset.
Jira: DB-17121

Test Plan:
Jenkins

./yb_build.sh --cxx-test='TEST_F(PgObjectLocksTestRF1, TestShutdownWithWaiters) {'

Reviewers: rthallam, amitanand, sergei

Reviewed By: amitanand

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D44662
braddietrich pushed a commit that referenced this pull request Jul 7, 2025
…ow during index backfill.

Summary:
In the last few weeks we have seen few instances of the stress test (with various nemesis)
run into a master crash caused by a stack trace that looks like:

```
 * thread #1, name = 'yb-master', stop reason = signal SIGSEGV: invalid address
   * frame #0: 0x0000aaaad52f5fc4 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::shared_ptr<yb::master::BackfillTablet>::shared_ptr[abi:ue170006]<yb::master::BackfillTablet, void>(this=<unavailable>, __r=std::__1:: weak_ptr<yb::master::BackfillTablet>::element_type @ 0x000013e4bf787778) at shared_ptr.h:701:20
     frame #1: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::enable_shared_from_this<yb::master::BackfillTablet>::shared_from_this[abi:ue170006](this=0x000013e4bf787778) at shared_ptr.h:1954:17
     frame #2: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=0x000013e4bf787778) at backfill_index.cc:1300:50
     frame #3: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323: 10
     frame #4: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4d458) at backfill_index.cc:1620:5
     frame #5: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:470:3
     frame #6: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:273:5
     frame #7: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4d458) at backfill_index.cc:1463:19
     frame yugabyte#8: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
     frame yugabyte#9: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323: 10
     frame yugabyte#10: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cd98) at backfill_index.cc:1620:5
     frame yugabyte#11: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:470:3
     frame yugabyte#12: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:273:5
     frame yugabyte#13: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cd98) at backfill_index.cc:1463:19
     frame yugabyte#14: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
     frame yugabyte#15: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:     1323:10
     frame yugabyte#16: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cfd8) at backfill_index.cc:1620:5
     frame yugabyte#17: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:470:3
     frame yugabyte#18: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:273:5
     frame yugabyte#19: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cfd8) at backfill_index.cc:1463:19
     frame yugabyte#20: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
     frame yugabyte#21: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:     1323:10

...

   frame yugabyte#2452: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bdc7ed98) at backfill_index.cc:1620:5
     frame yugabyte#2453: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:470:3
     frame yugabyte#2454: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:273:5
     frame yugabyte#2455: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bdc7ed98) at backfill_index.cc:1463:19
     frame yugabyte#2456: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
     frame yugabyte#2457: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:   1323:10
     frame yugabyte#2458: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4ba1ff458) at backfill_index.cc:1620:5
     frame yugabyte#2459: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:470:3
     frame yugabyte#2460: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:273:5
     frame yugabyte#2461: 0x0000aaaad52c0260 yb-master`yb::master::RetryingRpcTask::RunDelayedTask(this=0x000013e4ba1ff458, status=0x0000ffffab2668c0) at async_rpc_tasks.cc:432:14
     frame yugabyte#2462: 0x0000aaaad5c3f838 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] boost::function1<void, yb::Status         const&>::operator()(this=0x000013e4bff63b18, a0=0x0000ffffab2668c0) const at function_template.hpp:763:14
     frame yugabyte#2463: 0x0000aaaad5c3f81c yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] yb::rpc::DelayedTask::                    TimerHandler(this=0x000013e4bff63ae8, watcher=<unavailable>, revents=<unavailable>) at delayed_task.cc:155:5
     frame yugabyte#2464: 0x0000aaaad5c3f284 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(loop=<unavailable>, w=<unavailable>, revents=<unavailable>) at ev++.h:479:7
     frame yugabyte#2465: 0x0000aaaad4cdf170 yb-master`ev_invoke_pending + 112
     frame yugabyte#2466: 0x0000aaaad4ce21fc yb-master`ev_run + 2940
     frame yugabyte#2467: 0x0000aaaad5c725fc yb-master`yb::rpc::Reactor::RunThread() [inlined] ev::loop_ref::run(this=0x000013e4bfcfadf8, flags=0) at ev++.h:211:7
     frame yugabyte#2468: 0x0000aaaad5c725f4 yb-master`yb::rpc::Reactor::RunThread(this=0x000013e4bfcfadc0) at reactor.cc:735:9
     frame yugabyte#2469: 0x0000aaaad65c61d8 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ue170006](this=0x000013e4bfeffa80) const at function.h:517:16
     frame yugabyte#2470: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000013e4bfeffa80) const at function.h:1168:12
     frame yugabyte#2471: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(arg=0x000013e4bfeffa20) at thread.cc:895:3
```

Essentially, a BackfillChunk is considered done (without sending out an RPC) and launches the next BackfillChunk; which does the same.

This may happen if `BackfillTable::indexes_to_build()` is empty, or if the `backfill_jobs()` is empty. However, based on the code reading
we should only get there, ** after ** marking `BackfillTable::done_` as `true`.

If for some reason, we have `indexes_to_build()` as `empty` and `BackfillTable::done_ == false`, we could get into this infinite recursion.

Since I am unable to explain and recreate how this happens, I'm adding a test flag `TEST_simulate_empty_indexes` to repro this.

Fix: We update `BackfillChunk::SendRequest` to handle the empty `indexes_to_build()` as a failure rather than treating this as a success.
This prevents the infinite recursion.

Also, adding a few log lines that may help better understand the scenario if we run into this again.
Jira: DB-17296

Test Plan: yb_build.sh fastdebug  --cxx-test pg_index_backfill-test --gtest_filter *.SimulateEmptyIndexesForStackOverflow*

Reviewers: zdrudi, rthallam, jason

Reviewed By: zdrudi

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D45031
braddietrich pushed a commit that referenced this pull request Jul 7, 2025
…trigger at txn end

Summary:
FK constraint has the following optimization for performing batched read:
- FK constraint registers ybctids which need to be checked
- on checking particular ybctid for particular FK constraint all registered ybctds are read at once, result is cached
- checking ybctid for another FK constraint will check cached result instead of making real read request.

In Postgres constraint could be of 2 types:
- IMMEDIATE (default). Checked at statement end.
- DEFERRED. Checked at transaction end.

Nowadays FK optimization work incorrectly in case multiple constraint of different types is used in same transaction. And the reason is that YSQL registers ybctids for both types of constraint in single map.

Example:

```
1. CREATE TABLE pk_t(k INT PRIMARY KEY);
2. CREATE TABLE fk_t(k INT PRIMARY KEY,
                     pk_1 INT REFERENCES pk_t(k),
                     pk_2 INT REFERENCES pk_t(k) DEFERRABLE INITIALLY DEFERRED);
3. INSERT INTO pk_t VALUES (1);
4. BEGIN;
5. INSERT INTO fk_t VALUES(1, 1, 2);
6. INSERT INTO pk_t VALUES(2);
7. COMMIT;
```

- On step #5 YSQL inserts value `(1, 1, 2)` into table with 2 FK referenced columns. Where constraint for second column is `DEFERRED`.
- Both constraint registers ybctid for rows `k = 1` and `k = 2` in table `pk_t`.
- Because constraint for first column is non deferred is it executed immediately (at the end of the statement).
- Due to optimization both registered ybctids will be read at once. And result will be cached. And the result contains `k = 1` only, because `k = 2` is only inserted on step #6
- On step #7 YSQL will perform the check of constraint for second column and cached result will be used which doesn't have `k = 2` inserted on step #6

Solution is to store ybctids for deferred and non-deferred constraint in different structure. And read them independently. All the ybctids registered for deferred constraints will be read only on transaction commit step (step #7).
For this purpose the new `YBCNotifyDeferredTriggersProcessingStarted()` function is introduced. Which is called straight before deferred triggers firing at the beginning of `COMMIT` command processing.
Jira: DB-14665

Original commit: af3f948 / D40896

Test Plan:
Jenkins

New unit test are introduced

```
./yb_build.sh --gtest_filter PgFKeyTest.DeferredConstraintReadAtTxnEnd
```

Reviewers: pjain, myang, kramanathan, patnaik.balivada

Reviewed By: myang

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D42601
braddietrich pushed a commit that referenced this pull request Jul 7, 2025
…mp by using pg_strdup for tablegroup_name

Summary:
#### Backport Summary
Fixed trivial merge conflicts due to the usage of `YbTableProperties` instead of `YbcTableProperties` on the master branch.

#### Original Summary
As part of D36859 / 0dbe7d6, backup and restore support for colocated tables when multiple tablespaces exist was introduced. Upon
fetching the tablegroup_name from `pg_yb_tablegroup`, the value was read and assigned via `PQgetvalue` without copying. This led to a use-after-free bug when the
tablegroup_name was later read in dumpTableSchema since the result from the SQL query is immediately cleared in the next line (`PQclear`).

```
[P-yb-controller-1] ==3037==ERROR: AddressSanitizer: heap-use-after-free on address 0x51d0002013e6 at pc 0x55615b0a1f92 bp 0x7fff92475970 sp 0x7fff92475118
[P-yb-controller-1] READ of size 8 at 0x51d0002013e6 thread T0
[P-yb-controller-1]     #0 0x55615b0a1f91 in strcmp ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:470:5
[P-yb-controller-1]     #1 0x55615b1b90ba in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15789:8
[P-yb-controller-1]     #2 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #3 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #4 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #5 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
[P-yb-controller-1]     #6 0x55615b0894bd in _start (${BUILD_ROOT}/postgres/bin/ysql_dump+0x10d4bd)
[P-yb-controller-1]
[P-yb-controller-1] 0x51d0002013e6 is located 358 bytes inside of 2048-byte region [0x51d000201280,0x51d000201a80)
[P-yb-controller-1] freed by thread T0 here:
[P-yb-controller-1]     #0 0x55615b127196 in free ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
[P-yb-controller-1]     #1 0x7f3c02d65e85 in PQclear ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:755:3
[P-yb-controller-1]     #2 0x55615b1c0103 in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19108:4
[P-yb-controller-1]     #3 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3
[P-yb-controller-1]     #4 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #5 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #6 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #7 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
[P-yb-controller-1]
[P-yb-controller-1] previously allocated by thread T0 here:
[P-yb-controller-1]     #0 0x55615b12742f in malloc ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3
[P-yb-controller-1]     #1 0x7f3c02d680a7 in pqResultAlloc ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:633:28
[P-yb-controller-1]     #2 0x7f3c02d81294 in getRowDescriptions ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:544:4
[P-yb-controller-1]     #3 0x7f3c02d7f793 in pqParseInput3 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11
[P-yb-controller-1]     #4 0x7f3c02d6bcc8 in parseInput ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2014:2
[P-yb-controller-1]     #5 0x7f3c02d6bcc8 in PQgetResult ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2100:3
[P-yb-controller-1]     #6 0x7f3c02d6cd87 in PQexecFinish ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2417:19
[P-yb-controller-1]     #7 0x7f3c02d6cd87 in PQexec ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2256:9
[P-yb-controller-1]     yugabyte#8 0x55615b1f45df in ExecuteSqlQuery ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:296:8
[P-yb-controller-1]     yugabyte#9 0x55615b1f4213 in ExecuteSqlQueryForSingleRow ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:311:8
[P-yb-controller-1]     yugabyte#10 0x55615b1c008d in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19102:10
[P-yb-controller-1]     yugabyte#11 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3
[P-yb-controller-1]     yugabyte#12 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     yugabyte#13 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     yugabyte#14 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     yugabyte#15 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
```

This revision fixes the issue by using pg_strdup to make a copy of the string.
Jira: DB-15915

Original commit: 7eea1de / D43386

Test Plan: ./yb_build.sh asan --cxx-test integration-tests_xcluster_ddl_replication-test --gtest_filter XClusterDDLReplicationTest.DDLReplicationTablesNotColocated

Reviewers: aagrawal, skumar, mlillibridge, sergei

Reviewed By: aagrawal

Subscribers: yql, sergei

Differential Revision: https://phorge.dev.yugabyte.com/D43418
jmeehan16 pushed a commit that referenced this pull request Oct 21, 2025
…s closed in multi route pooling

Summary:
**Issue Summary**

A core dump was triggered during a ConnectionBurst stress test, with the crash occurring in the od_backend_close_connection function with multi route pooling. The stack trace is as follows:

frame #0: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] mm_tls_free(io=0x0000000000000000) at tls.c:91:10
    frame #1: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] machine_io_free(obj=0x0000000000000000) at io.c:201:2
    frame #2: 0x00005601a627129e odyssey`od_backend_close_connection [inlined] od_io_close(io=0x000031f53e72b8b8) at io.h:77:2
    frame #3: 0x00005601a627128c odyssey`od_backend_close_connection(server=0x000031f53e72b880) at backend.c:56:2
    frame #4: 0x00005601a6250de5 odyssey`od_router_attach(router=0x00007fff00dbeb30, client_for_router=0x000031f53e5df180, wait_for_idle=<unavailable>, external_client=0x000031f53ee30680) at router.c:1010:6
    frame #5: 0x00005601a6258b1b odyssey`od_auth_frontend [inlined] yb_execute_on_control_connection(client=0x000031f53ee30680, function=<unavailable>) at frontend.c:2842:11
    frame #6: 0x00005601a6258b0b odyssey`od_auth_frontend(client=0x000031f53ee30680) at auth.c:677:8
    frame #7: 0x00005601a626782e odyssey`od_frontend(arg=0x000031f53ee30680) at frontend.c:2539:8
    frame yugabyte#8: 0x00005601a6290912 odyssey`mm_scheduler_main(arg=0x000031f53e390000) at scheduler.c:17:2
    frame yugabyte#9: 0x00005601a6290b77 odyssey`mm_context_runner at context.c:28:2

**Root Cause**

The crash originated from an improper lock release in the yb_get_idle_server_to_close function, introduced in commit 55beeb0 during multi-route pooling implementation. The function released the lock on the route object, despite a comment explicitly warning against it. After returning to its caller, no lock was held on the route or idle_route. This allowed other coroutines to access and use the same route and its idle server, which the original coroutine intended to close. This race condition led to a crash due to an assertion failure during connection closure.

**Note**
If the order of acquiring locks is the same across all threads or processes differences in the release order alone cannot cause a deadlock. Deadlocks arise from circular dependencies during acquisition, not release.

In the connection manager code base:

Locks are acquired in the order: router → route. This order must be strictly enforced everywhere to prevent deadlocks.
Lock release order varies (e.g., router then route in od_router_route and yb_get_idle_server_to_close, versus the reverse elsewhere). This variation does not cause deadlocks, as release order is irrelevant to deadlock prevention.
Jira: DB-17501

Test Plan: Jenkins: all tests

Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena

Reviewed By: skumar

Subscribers: svc_phabricator, yql

Differential Revision: https://phorge.dev.yugabyte.com/D45641
jmeehan16 pushed a commit that referenced this pull request Oct 21, 2025
Summary:
On running the connection burst test following core was generated

(lldb) target create "/home/yugabyte/yb-software/yugabyte-2024.2.3.0-b116-centos-x86_64/bin/odyssey" --core "/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey"
Core file '/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey' (x86_64) was loaded.
(lldb) bt all
error: odyssey GetDIE for DIE 0x3c is outside of its CU 0x66d45
* thread #1, name = 'odyssey', stop reason = signal SIGSEGV
  * frame #0: 0x0000564340e2cc6f odyssey`od_backend_connect(server=0x00005138fc5ef6c0, context="", route_params=0x0000000000000000, client=0x00005138ff7a2580) at backend.c:815:19
    frame #1: 0x0000564340e2a80e odyssey`od_frontend_attach(client=0x00005138ff7a2580, context="", route_params=0x0000000000000000) at frontend.c:305:8
    frame #2: 0x0000564340e26b11 odyssey`od_frontend_remote [inlined] od_frontend_attach_and_deploy(client=0x00005138ff7a2580, context=<unavailable>) at frontend.c:361:11
    frame #3: 0x0000564340e26afe odyssey`od_frontend_remote(client=0x00005138ff7a2580) at frontend.c:2120:13
    frame #4: 0x0000564340e22d65 odyssey`od_frontend(arg=0x00005138ff7a2580) at frontend.c:2756:12
    frame #5: 0x0000564340e4b912 odyssey`mm_scheduler_main(arg=0x00005138fc218dc0) at scheduler.c:17:2
    frame #6: 0x0000564340e4bb77 odyssey`mm_context_runner at context.c:28:2
Which points to storage = route->rule->storage; meaning rule has already been set to NULL which lead to above crash.
The root cause is a race condition in the object cleanup. The rule associated with a route was being de-referenced (unref) outside of a lock protecting the route object while cleaning up the route. This allows for a scenario where one thread could proceed to clean up the rule, while another thread simultaneously acquires a lock on the same route and attempts to use its rule pointer, which would now be a dangling pointer.

This diff move the de-referencing of the rule object to a code block where a lock is already acquired on the route object. This change ensures atomic handling of the route and its associated rule, preventing any concurrent access to an invalid pointer.
Jira: DB-17729

Test Plan: Jenkins: all tests

Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena

Reviewed By: skumar

Subscribers: svc_phabricator, yql

Differential Revision: https://phorge.dev.yugabyte.com/D45583
kgalieva pushed a commit that referenced this pull request Nov 6, 2025
…mp by using pg_strdup for tablegroup_name

Summary:
As part of D36859 / 0dbe7d6, backup and restore support for colocated tables when multiple tablespaces exist was introduced. Upon
fetching the tablegroup_name from `pg_yb_tablegroup`, the value was read and assigned via `PQgetvalue` without copying. This led to a use-after-free bug when the
tablegroup_name was later read in dumpTableSchema since the result from the SQL query is immediately cleared in the next line (`PQclear`).

```
[P-yb-controller-1] ==3037==ERROR: AddressSanitizer: heap-use-after-free on address 0x51d0002013e6 at pc 0x55615b0a1f92 bp 0x7fff92475970 sp 0x7fff92475118
[P-yb-controller-1] READ of size 8 at 0x51d0002013e6 thread T0
[P-yb-controller-1]     #0 0x55615b0a1f91 in strcmp ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:470:5
[P-yb-controller-1]     #1 0x55615b1b90ba in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15789:8
[P-yb-controller-1]     #2 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #3 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #4 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #5 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
[P-yb-controller-1]     #6 0x55615b0894bd in _start (${BUILD_ROOT}/postgres/bin/ysql_dump+0x10d4bd)
[P-yb-controller-1]
[P-yb-controller-1] 0x51d0002013e6 is located 358 bytes inside of 2048-byte region [0x51d000201280,0x51d000201a80)
[P-yb-controller-1] freed by thread T0 here:
[P-yb-controller-1]     #0 0x55615b127196 in free ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
[P-yb-controller-1]     #1 0x7f3c02d65e85 in PQclear ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:755:3
[P-yb-controller-1]     #2 0x55615b1c0103 in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19108:4
[P-yb-controller-1]     #3 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3
[P-yb-controller-1]     #4 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #5 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #6 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #7 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
[P-yb-controller-1]
[P-yb-controller-1] previously allocated by thread T0 here:
[P-yb-controller-1]     #0 0x55615b12742f in malloc ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3
[P-yb-controller-1]     #1 0x7f3c02d680a7 in pqResultAlloc ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:633:28
[P-yb-controller-1]     #2 0x7f3c02d81294 in getRowDescriptions ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:544:4
[P-yb-controller-1]     #3 0x7f3c02d7f793 in pqParseInput3 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11
[P-yb-controller-1]     #4 0x7f3c02d6bcc8 in parseInput ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2014:2
[P-yb-controller-1]     #5 0x7f3c02d6bcc8 in PQgetResult ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2100:3
[P-yb-controller-1]     #6 0x7f3c02d6cd87 in PQexecFinish ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2417:19
[P-yb-controller-1]     #7 0x7f3c02d6cd87 in PQexec ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2256:9
[P-yb-controller-1]     yugabyte#8 0x55615b1f45df in ExecuteSqlQuery ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:296:8
[P-yb-controller-1]     yugabyte#9 0x55615b1f4213 in ExecuteSqlQueryForSingleRow ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:311:8
[P-yb-controller-1]     yugabyte#10 0x55615b1c008d in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19102:10
[P-yb-controller-1]     yugabyte#11 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3
[P-yb-controller-1]     yugabyte#12 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     yugabyte#13 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     yugabyte#14 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     yugabyte#15 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
```

This revision fixes the issue by using pg_strdup to make a copy of the string.
Jira: DB-15915

Original commit: 7eea1de / D43386

Test Plan: ./yb_build.sh asan --cxx-test integration-tests_xcluster_ddl_replication-test --gtest_filter XClusterDDLReplicationTest.DDLReplicationTablesNotColocated

Reviewers: aagrawal, skumar, mlillibridge, sergei

Reviewed By: aagrawal

Subscribers: yql, sergei

Differential Revision: https://phorge.dev.yugabyte.com/D43421
kgalieva pushed a commit that referenced this pull request Nov 6, 2025
…acks in object lock/release functions at TabletService

Summary:
Original commit: 790195b / D44663
In functions `TabletServiceImpl::AcquireObjectLocks` and `TabletServiceImpl::ReleaseObjectLocks`, we weren't returning after executing the rpc callback with initial validation steps fail. This led to segv issues like below
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined]
std::__1::unique_ptr<yb::tserver::TSLocalLockManager::Impl, std::__1::default_delete<yb::tserver::TSLocalLockManager::Impl>>::operator->[abi:ne190100](this=0x0000000000000000) const at unique_ptr.h:272:108
    frame #1: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined]
yb::tserver::TSLocalLockManager::AcquireObjectLocksAsync(this=0x0000000000000000, req=0x00005001bfffa290, deadline=yb::CoarseTimePoint @ x23, callback=0x0000ffefb6066560, wait=(value_ = true)) at ts_local_lock_manager.cc:541:3
    frame #2: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(this=0x00005001bdaf6020, req=0x00005001bfffa290, resp=0x00005001bfffa300, context=<unavailable>) at tablet_service.cc:3673:26
    frame #3: 0x0000aaaac36bd9a0 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined]
yb::tserver::TabletServerServiceIf::InitMethods(this=<unavailable>, req=0x00005001bfffa290, resp=0x00005001bfffa300, rpc_context=RpcContext @ 0x0000ffefb6066600)::$_36::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>)
const::'lambda'(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext)::operator()(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*,
yb::rpc::RpcContext) const at tserver_service.service.cc:1470:9
    frame #4: 0x0000aaaac36bd978 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) at
local_call.h:126:7
    frame #5: 0x0000aaaac36bd680 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined]
yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36::operator()(this=<unavailable>, call=<unavailable>) const at tserver_service.service.cc:1468:7
    frame #6: 0x0000aaaac36bd5c8 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined]
decltype(std::declval<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&>()(std::declval<std::__1::shared_ptr<yb::rpc::InboundCall>>()))
std::__1::__invoke[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__f=<unavailable>, __args=<unavailable>) at invoke.h:149:25
    frame #7: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] void
std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__args=<unavailable>,
__args=<unavailable>) at invoke.h:224:5
    frame yugabyte#8: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined]
std::__1::__function::__alloc_func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity>
const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __arg=<unavailable>) at function.h:171:12
    frame yugabyte#9: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=<unavailable>) at
function.h:313:10
    frame yugabyte#10: 0x0000aaaac36d1384 yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::__function::__value_func<void
(std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __args=nullptr) const at function.h:430:12
    frame yugabyte#11: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::function<void
(std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=nullptr) const at function.h:989:10
    frame yugabyte#12: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(this=<unavailable>, call=<unavailable>) at tserver_service.service.cc:913:3
    frame yugabyte#13: 0x0000aaaac30e05b4 yb-tserver`yb::rpc::ServicePoolImpl::Handle(this=0x00005001bff9b8c0, incoming=nullptr) at service_pool.cc:275:19
    frame yugabyte#14: 0x0000aaaac3006ed0 yb-tserver`yb::rpc::InboundCall::InboundCallTask::Run(this=<unavailable>) at inbound_call.cc:309:13
    frame yugabyte#15: 0x0000aaaac30ec868 yb-tserver`yb::rpc::(anonymous namespace)::Worker::Execute(this=0x00005001bff5c640, task=0x00005001bfdf1958) at thread_pool.cc:138:13
    frame yugabyte#16: 0x0000aaaac39afd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x00005001bfe1e750) const at function.h:430:12
    frame yugabyte#17: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x00005001bfe1e750) const at function.h:989:10
    frame yugabyte#18: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x00005001bfe1e6e0) at thread.cc:937:3
```

This revision addresses the issue by returning after executing the rpc callback with validation failure status.
Jira: DB-17124

Test Plan: Jenkins

Reviewers: rthallam, amitanand, #db-approvers

Reviewed By: rthallam, #db-approvers

Subscribers: svc_phabricator, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D44684
kgalieva pushed a commit that referenced this pull request Nov 6, 2025
…adpool once shutdown flags are set at ObjectLockManager

Summary:
Original commit: f5197a2 / D44662
In context of object locking, commit 6e80c56 / D44228 got rid of logic that signaled obsolete waiters corresponding to transactions that issued a release all locks request (could have been terminated to failures like timeout, deadlock etc) in order to early terminate failed waiting requests. Hence, now we let the obsolete requests terminate organically from the OLM resumed by the poller thread that runs at an interval of `olm_poll_interval_ms` (defaults to 100ms).

This led to one of the itests failing with the below stack
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV: address not mapped to object
  * frame #0: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(std::__1::function<void ()>) [inlined] yb::ThreadPoolToken::Submit(this=<unavailable>, r=<unavailable>) at threadpool.cc:146:10
    frame #1: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(this=0x0000000000000000, f=<unavailable>) at threadpool.cc:142:10
    frame #2: 0x0000aaaac73cdfe8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoSignal(this=0x00003342bfa0d400, entry=<unavailable>) at object_lock_manager.cc:767:3
    frame #3: 0x0000aaaac73cc7c0 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>, yb::docdb::LockData&&, yb::StronglyTypedBool<yb::docdb::(anonymous
namespace)::IsLockRetry_Tag>, unsigned long, yb::Status) [inlined] yb::docdb::ObjectLockManagerImpl::PrepareAcquire(this=0x00003342bfa0d400, txn_lock=<unavailable>, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous
namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, resume_it_offset=<unavailable>, resume_with_status=<unavailable>) at object_lock_manager.cc:523:5
    frame #4: 0x0000aaaac73cc6a8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(this=0x00003342bfa0d400, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @
0x00003342bfa94a38, data=0x00003342b9a6a830, is_retry=(value_ = true), resume_it_offset=<unavailable>, resume_with_status=Status @ 0x0000ffefaa036658) at object_lock_manager.cc:552:27
    frame #5: 0x0000aaaac73cbcb4 yb-tserver`yb::docdb::WaiterEntry::Resume(this=0x00003342b9a6a820, lock_manager=0x00003342bfa0d400, resume_with_status=<unavailable>) at object_lock_manager.cc:381:17
    frame #6: 0x0000aaaac85bdd4c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() at object_lock_manager.cc:752:13
    frame #7: 0x0000aaaac85bda74 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::docdb::ObjectLockManager::Shutdown(this=<unavailable>) at object_lock_manager.cc:1092:10
    frame yugabyte#8: 0x0000aaaac85bda6c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::tserver::TSLocalLockManager::Impl::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:411:26
    frame yugabyte#9: 0x0000aaaac85bd7e8 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:566:10
    frame yugabyte#10: 0x0000aaaac8665a34 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ResetAndGetTSLocalLockManager(this=0x000033423fc1ad80) at tablet_server.cc:797:28
    frame yugabyte#11: 0x0000aaaac8665a18 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ProcessLeaseUpdate(this=0x000033423fc1ad80, lease_refresh_info=0x000033423a476b80) at tablet_server.cc:828:22
    frame yugabyte#12: 0x0000aaaac8665950 yb-tserver`yb::tserver::YsqlLeasePoller::Poll(this=<unavailable>) at ysql_lease_poller.cc:143:18
    frame yugabyte#13: 0x0000aaaac8438d58 yb-tserver`yb::tserver::MasterLeaderPollScheduler::Impl::Run(this=0x000033423ff5cc80) at master_leader_poller.cc:125:25
    frame yugabyte#14: 0x0000aaaac89ffd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x000033423ffc7930) const at function.h:430:12
    frame yugabyte#15: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000033423ffc7930) const at function.h:989:10
    frame yugabyte#16: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x000033423ffc78c0) at thread.cc:937:3
    frame yugabyte#17: 0x0000ffffac0378b8 libpthread.so.0`start_thread + 392
    frame yugabyte#18: 0x0000ffffac093afc libc.so.6`thread_start + 12
```
This is due to accessing unique_ptr `thread_pool_token_` after it has been reset.

This revision fixes the issue by not scheduling any tasks on the threadpool once the shutdown flags has been set (hence not accessing `thread_pool_token_`). Since we wait for in-progress requests at the OLM and also in-progress resume tasks scheduled on the messenger using `waiters_amidst_resumption_on_messenger_`, it is safe to say that `thread_pool_token_` would not be accessed once it is reset.
Jira: DB-17121

Test Plan:
Jenkins

./yb_build.sh --cxx-test='TEST_F(PgObjectLocksTestRF1, TestShutdownWithWaiters) {'

Reviewers: rthallam, amitanand, sergei

Reviewed By: rthallam

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D44728
kgalieva pushed a commit that referenced this pull request Nov 6, 2025
…e to stack-overflow during index backfill.

Summary:
In the last few weeks we have seen few instances of the stress test (with various nemesis)
run into a master crash caused by a stack trace that looks like:

```
 * thread #1, name = 'yb-master', stop reason = signal SIGSEGV: invalid address
   * frame #0: 0x0000aaaad52f5fc4 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::shared_ptr<yb::master::BackfillTablet>::shared_ptr[abi:ue170006]<yb::master::BackfillTablet, void>(this=<unavailable>, __r=std::__1:: weak_ptr<yb::master::BackfillTablet>::element_type @ 0x000013e4bf787778) at shared_ptr.h:701:20
     frame #1: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::enable_shared_from_this<yb::master::BackfillTablet>::shared_from_this[abi:ue170006](this=0x000013e4bf787778) at shared_ptr.h:1954:17
     frame #2: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=0x000013e4bf787778) at backfill_index.cc:1300:50
     frame #3: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323: 10
     frame #4: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4d458) at backfill_index.cc:1620:5
     frame #5: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:470:3
     frame #6: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:273:5
     frame #7: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4d458) at backfill_index.cc:1463:19
     frame yugabyte#8: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
     frame yugabyte#9: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323: 10
     frame yugabyte#10: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cd98) at backfill_index.cc:1620:5
     frame yugabyte#11: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:470:3
     frame yugabyte#12: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:273:5
     frame yugabyte#13: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cd98) at backfill_index.cc:1463:19
     frame yugabyte#14: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
     frame yugabyte#15: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:     1323:10
     frame yugabyte#16: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cfd8) at backfill_index.cc:1620:5
     frame yugabyte#17: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:470:3
     frame yugabyte#18: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:273:5
     frame yugabyte#19: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cfd8) at backfill_index.cc:1463:19
     frame yugabyte#20: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
     frame yugabyte#21: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:     1323:10

...

   frame yugabyte#2452: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bdc7ed98) at backfill_index.cc:1620:5
     frame yugabyte#2453: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:470:3
     frame yugabyte#2454: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:273:5
     frame yugabyte#2455: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bdc7ed98) at backfill_index.cc:1463:19
     frame yugabyte#2456: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
     frame yugabyte#2457: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:   1323:10
     frame yugabyte#2458: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4ba1ff458) at backfill_index.cc:1620:5
     frame yugabyte#2459: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:470:3
     frame yugabyte#2460: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:273:5
     frame yugabyte#2461: 0x0000aaaad52c0260 yb-master`yb::master::RetryingRpcTask::RunDelayedTask(this=0x000013e4ba1ff458, status=0x0000ffffab2668c0) at async_rpc_tasks.cc:432:14
     frame yugabyte#2462: 0x0000aaaad5c3f838 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] boost::function1<void, yb::Status         const&>::operator()(this=0x000013e4bff63b18, a0=0x0000ffffab2668c0) const at function_template.hpp:763:14
     frame yugabyte#2463: 0x0000aaaad5c3f81c yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] yb::rpc::DelayedTask::                    TimerHandler(this=0x000013e4bff63ae8, watcher=<unavailable>, revents=<unavailable>) at delayed_task.cc:155:5
     frame yugabyte#2464: 0x0000aaaad5c3f284 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(loop=<unavailable>, w=<unavailable>, revents=<unavailable>) at ev++.h:479:7
     frame yugabyte#2465: 0x0000aaaad4cdf170 yb-master`ev_invoke_pending + 112
     frame yugabyte#2466: 0x0000aaaad4ce21fc yb-master`ev_run + 2940
     frame yugabyte#2467: 0x0000aaaad5c725fc yb-master`yb::rpc::Reactor::RunThread() [inlined] ev::loop_ref::run(this=0x000013e4bfcfadf8, flags=0) at ev++.h:211:7
     frame yugabyte#2468: 0x0000aaaad5c725f4 yb-master`yb::rpc::Reactor::RunThread(this=0x000013e4bfcfadc0) at reactor.cc:735:9
     frame yugabyte#2469: 0x0000aaaad65c61d8 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ue170006](this=0x000013e4bfeffa80) const at function.h:517:16
     frame yugabyte#2470: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000013e4bfeffa80) const at function.h:1168:12
     frame yugabyte#2471: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(arg=0x000013e4bfeffa20) at thread.cc:895:3
```

Essentially, a BackfillChunk is considered done (without sending out an RPC) and launches the next BackfillChunk; which does the same.

This may happen if `BackfillTable::indexes_to_build()` is empty, or if the `backfill_jobs()` is empty. However, based on the code reading
we should only get there, ** after ** marking `BackfillTable::done_` as `true`.

If for some reason, we have `indexes_to_build()` as `empty` and `BackfillTable::done_ == false`, we could get into this infinite recursion.

Since I am unable to explain and recreate how this happens, I'm adding a test flag `TEST_simulate_empty_indexes` to repro this.

Fix: We update `BackfillChunk::SendRequest` to handle the empty `indexes_to_build()` as a failure rather than treating this as a success.
This prevents the infinite recursion.

Also, adding a few log lines that may help better understand the scenario if we run into this again.
Jira: DB-17296

Original commit: 5d402b5 / D45031

Test Plan: yb_build.sh fastdebug  --cxx-test pg_index_backfill-test --gtest_filter *.SimulateEmptyIndexesForStackOverflow*

Reviewers: zdrudi, rthallam, jason, #db-approvers

Reviewed By: rthallam

Subscribers: svc_phabricator, yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D45138
kgalieva pushed a commit that referenced this pull request Nov 6, 2025
…cking the route

Summary:
On running the connection burst test following core was generated

(lldb) target create "/home/yugabyte/yb-software/yugabyte-2024.2.3.0-b116-centos-x86_64/bin/odyssey" --core "/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey"
Core file '/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey' (x86_64) was loaded.
(lldb) bt all
error: odyssey GetDIE for DIE 0x3c is outside of its CU 0x66d45
* thread #1, name = 'odyssey', stop reason = signal SIGSEGV
  * frame #0: 0x0000564340e2cc6f odyssey`od_backend_connect(server=0x00005138fc5ef6c0, context="", route_params=0x0000000000000000, client=0x00005138ff7a2580) at backend.c:815:19
    frame #1: 0x0000564340e2a80e odyssey`od_frontend_attach(client=0x00005138ff7a2580, context="", route_params=0x0000000000000000) at frontend.c:305:8
    frame #2: 0x0000564340e26b11 odyssey`od_frontend_remote [inlined] od_frontend_attach_and_deploy(client=0x00005138ff7a2580, context=<unavailable>) at frontend.c:361:11
    frame #3: 0x0000564340e26afe odyssey`od_frontend_remote(client=0x00005138ff7a2580) at frontend.c:2120:13
    frame #4: 0x0000564340e22d65 odyssey`od_frontend(arg=0x00005138ff7a2580) at frontend.c:2756:12
    frame #5: 0x0000564340e4b912 odyssey`mm_scheduler_main(arg=0x00005138fc218dc0) at scheduler.c:17:2
    frame #6: 0x0000564340e4bb77 odyssey`mm_context_runner at context.c:28:2
Which points to storage = route->rule->storage; meaning rule has already been set to NULL which lead to above crash.
The root cause is a race condition in the object cleanup. The rule associated with a route was being de-referenced (unref) outside of a lock protecting the route object while cleaning up the route. This allows for a scenario where one thread could proceed to clean up the rule, while another thread simultaneously acquires a lock on the same route and attempts to use its rule pointer, which would now be a dangling pointer.

This diff move the de-referencing of the rule object to a code block where a lock is already acquired on the route object. This change ensures atomic handling of the route and its associated rule, preventing any concurrent access to an invalid pointer.

Original commit: None / D45583
Jira: DB-17729

Test Plan: Jenkins: all tests

Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena

Reviewed By: skumar

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D45653
kgalieva pushed a commit that referenced this pull request Nov 6, 2025
…te until server is closed in multi route pooling

Summary:
**Issue Summary**

A core dump was triggered during a ConnectionBurst stress test, with the crash occurring in the od_backend_close_connection function with multi route pooling. The stack trace is as follows:

frame #0: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] mm_tls_free(io=0x0000000000000000) at tls.c:91:10
    frame #1: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] machine_io_free(obj=0x0000000000000000) at io.c:201:2
    frame #2: 0x00005601a627129e odyssey`od_backend_close_connection [inlined] od_io_close(io=0x000031f53e72b8b8) at io.h:77:2
    frame #3: 0x00005601a627128c odyssey`od_backend_close_connection(server=0x000031f53e72b880) at backend.c:56:2
    frame #4: 0x00005601a6250de5 odyssey`od_router_attach(router=0x00007fff00dbeb30, client_for_router=0x000031f53e5df180, wait_for_idle=<unavailable>, external_client=0x000031f53ee30680) at router.c:1010:6
    frame #5: 0x00005601a6258b1b odyssey`od_auth_frontend [inlined] yb_execute_on_control_connection(client=0x000031f53ee30680, function=<unavailable>) at frontend.c:2842:11
    frame #6: 0x00005601a6258b0b odyssey`od_auth_frontend(client=0x000031f53ee30680) at auth.c:677:8
    frame #7: 0x00005601a626782e odyssey`od_frontend(arg=0x000031f53ee30680) at frontend.c:2539:8
    frame yugabyte#8: 0x00005601a6290912 odyssey`mm_scheduler_main(arg=0x000031f53e390000) at scheduler.c:17:2
    frame yugabyte#9: 0x00005601a6290b77 odyssey`mm_context_runner at context.c:28:2

**Root Cause**

The crash originated from an improper lock release in the yb_get_idle_server_to_close function, introduced in commit 55beeb0 during multi-route pooling implementation. The function released the lock on the route object, despite a comment explicitly warning against it. After returning to its caller, no lock was held on the route or idle_route. This allowed other coroutines to access and use the same route and its idle server, which the original coroutine intended to close. This race condition led to a crash due to an assertion failure during connection closure.

**Note**
If the order of acquiring locks is the same across all threads or processes differences in the release order alone cannot cause a deadlock. Deadlocks arise from circular dependencies during acquisition, not release.

In the connection manager code base:

Locks are acquired in the order: router → route. This order must be strictly enforced everywhere to prevent deadlocks.
Lock release order varies (e.g., router then route in od_router_route and yb_get_idle_server_to_close, versus the reverse elsewhere). This variation does not cause deadlocks, as release order is irrelevant to deadlock prevention.

Original commit: None / D45641
Jira: DB-17501

Test Plan: Jenkins: all tests

Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena

Reviewed By: skumar

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D45657
cameron-p-m pushed a commit that referenced this pull request Nov 26, 2025
Summary:
The stacktrace of the core dump:
```
(lldb) bt all
* thread #1, name = 'postgres', stop reason = signal SIGSEGV: address not mapped to object
  * frame #0: 0x0000aaaac59fb720 postgres`FreeTupleDesc [inlined] GetMemoryChunkContext(pointer=0x0000000000000000) at memutils.h:141:12
    frame #1: 0x0000aaaac59fb710 postgres`FreeTupleDesc [inlined] pfree(pointer=0x0000000000000000) at mcxt.c:1500:26
    frame #2: 0x0000aaaac59fb710 postgres`FreeTupleDesc(tupdesc=0x000013d7fd8dccc8) at tupdesc.c:326:5
    frame #3: 0x0000aaaac61c7204 postgres`RelationDestroyRelation(relation=0x000013d7fd8dc9a8, remember_tupdesc=false) at relcache.c:4577:4
    frame #4: 0x0000aaaac5febab8 postgres`YBRefreshCache at relcache.c:5216:3
    frame #5: 0x0000aaaac5feba94 postgres`YBRefreshCache at postgres.c:4442:2
    frame #6: 0x0000aaaac5feb50c postgres`YBRefreshCacheWrapperImpl(catalog_master_version=0, is_retry=false, full_refresh_allowed=true) at postgres.c:4570:3
    frame #7: 0x0000aaaac5feea34 postgres`PostgresMain [inlined] YBRefreshCacheWrapper(catalog_master_version=0, is_retry=false) at postgres.c:4586:9
    frame yugabyte#8: 0x0000aaaac5feea2c postgres`PostgresMain [inlined] YBCheckSharedCatalogCacheVersion at postgres.c:4951:3
    frame yugabyte#9: 0x0000aaaac5fee984 postgres`PostgresMain(dbname=<unavailable>, username=<unavailable>) at postgres.c:6574:4
    frame yugabyte#10: 0x0000aaaac5efe5b4 postgres`BackendRun(port=0x000013d7ffc06400) at postmaster.c:4995:2
    frame yugabyte#11: 0x0000aaaac5efdd08 postgres`ServerLoop [inlined] BackendStartup(port=0x000013d7ffc06400) at postmaster.c:4701:3
    frame yugabyte#12: 0x0000aaaac5efdc70 postgres`ServerLoop at postmaster.c:1908:7
    frame yugabyte#13: 0x0000aaaac5ef8ef8 postgres`PostmasterMain(argc=<unavailable>, argv=<unavailable>) at postmaster.c:1562:11
    frame yugabyte#14: 0x0000aaaac5ddae1c postgres`PostgresServerProcessMain(argc=25, argv=0x000013d7ffe068f0) at main.c:213:3
    frame yugabyte#15: 0x0000aaaac59dee38 postgres`main + 36
    frame yugabyte#16: 0x0000ffff9f606340 libc.so.6`__libc_start_call_main + 112
    frame yugabyte#17: 0x0000ffff9f606418 libc.so.6`__libc_start_main@@GLIBC_2.34 + 152
    frame yugabyte#18: 0x0000aaaac59ded34 postgres`_start + 52
```
It is related to invalidation message. The test involves concurrent DDL execution without object
locking.

I added a few logs to help to debug this issue.

Test Plan:
(1)
Append to the end of file ./build/latest/postgres/share/postgresql.conf.sample:

```
yb_debug_log_catcache_events=1
log_min_messages=DEBUG1
```

(2) Create a RF-1 cluster
```
./bin/yb-ctl create --rf 1
```

(3) Run the following example via ysqlsh:
```
-- === 1. SETUP ===
DROP TABLE IF EXISTS accounts_timetravel;
CREATE TABLE accounts_timetravel (
  id INT PRIMARY KEY,
  balance INT,
  last_updated TIMESTAMPTZ
);

INSERT INTO accounts_timetravel VALUES (1, 1000, now());

\echo '--- 1. Initial Data (The Past) ---'
SELECT * FROM accounts_timetravel;

-- Wait 2 seconds
SELECT pg_sleep(2);

-- === 2. CAPTURE THE "PAST" HLC TIMESTAMP ===
--
--    *** THIS IS THE FIX ***
--    Get the current time as seconds from the Unix epoch,
--    multiply by 1,000,000 to get microseconds,
--    and cast to a big integer.
--
SELECT (EXTRACT(EPOCH FROM now())*1000000)::bigint AS snapshot_hlc \gset
SELECT :snapshot_hlc;
\echo '--- (Snapshot HLC captured) ---'

SELECT * FROM pg_yb_catalog_version;

-- Wait 2 more seconds
SELECT pg_sleep(2);

-- === 3. UPDATE THE DATA ===
UPDATE accounts_timetravel SET balance = 500, last_updated = now() WHERE id = 1;

\echo '--- 2. New Data (The Present) ---'
SELECT * FROM accounts_timetravel;

CREATE TABLE foo(id int);
-- increment the catalog version
ALTER TABLE foo ADD COLUMN val TEXT;

SELECT * FROM pg_yb_catalog_version;
-- === 4. PERFORM THE TIME-TRAVEL QUERY ===
--
-- Set our 'read_time_guc' variable to the HLC value
--
\set read_time_guc :snapshot_hlc

\echo '--- 3. Time-Travel Read (Querying the Past) ---'
\echo 'Setting yb_read_time to HLC (microseconds):' :read_time_guc

-- This will now be interpolated correctly and will succeed.
SET yb_read_time = :read_time_guc;

-- This query will now correctly read the historical data
SELECT * FROM accounts_timetravel;
SELECT * FROM pg_yb_catalog_version;

-- === 5. CLEANUP ===
RESET yb_read_time;
\echo '--- 4. Back to the Present ---'
SELECT * FROM accounts_timetravel;

DROP TABLE accounts_timetravel;
```

(4) Look at the postgres log for the following samples:

```
2025-11-07 18:31:06.223 UTC [3321231] LOG:  Preloading relcache for database 13524, session user id: 10, yb_read_time: 0
```

```
2025-11-07 18:31:06.303 UTC [3321231] LOG:  Building relcache entry for pg_index (oid 2610) took 785 us
```

```
2025-11-07 18:31:09.265 UTC [3321221] LOG:  Rebuild relcache entry for accounts_timetravel (oid 16384)
```

```
2025-11-07 18:31:09.525 UTC [3321221] LOG:  Delete relcache entry for accounts_timetravel (oid 16384)
```

```
2025-11-07 18:31:14.035 UTC [3321221] DEBUG:  Setting yb_read_time to 1762540271568993
```

```
2025-11-07 18:31:14.037 UTC [3321221] LOG:  Preloading relcache for database 13524, session user id: 13523, yb_read_time: 1762540271568993
```

```
2025-11-07 18:31:14.183 UTC [3321221] DEBUG:  Setting yb_read_time to 0
```

Reviewers: kfranz, #db-approvers

Reviewed By: kfranz, #db-approvers

Subscribers: jason, yql

Differential Revision: https://phorge.dev.yugabyte.com/D48114
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants