forked from yugabyte/yugabyte-db
Bump github.com/golang-jwt/jwt/v4 from 4.4.2 to 4.5.2 in /managed/node-agent #3
Open

dependabot wants to merge 1 commit into master from dependabot/go_modules/managed/node-agent/github.com/golang-jwt/jwt/v4-4.5.2
Conversation
Bumps [github.com/golang-jwt/jwt/v4](https://github.com/golang-jwt/jwt) from 4.4.2 to 4.5.2.
- [Release notes](https://github.com/golang-jwt/jwt/releases)
- [Changelog](https://github.com/golang-jwt/jwt/blob/main/VERSION_HISTORY.md)
- [Commits](golang-jwt/jwt@v4.4.2...v4.5.2)

---
updated-dependencies:
- dependency-name: github.com/golang-jwt/jwt/v4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
jmeehan16 pushed a commit that referenced this pull request on Jun 12, 2025
Summary: After commit f85bbca, the vmodule flag is no longer respected by the postgres process. For example:
```
ybd release --cxx-test pgwrapper_pg_analyze-test --gtest_filter PgAnalyzeTest.AnalyzeSamplingColocated --test-args '--vmodule=pg_sample=1' -n 2 -- -p 1 -k
zgrep pg_sample ~/logs/latest_test/1.log
```
shows no vlogs. The reason is that `VLOG(1)` is used early by
```
#0  0x00007f7e1b48b090 in google::InitVLOG3__(google::SiteFlag*, int*, char const*, int)@plt () from /net/dev-server-timur/share/code/yugabyte-db/build/debug-clang19-dynamic-ninja/lib/libyb_util_shmem.so
#1  0x00007f7e1b47616e in yb::(anonymous namespace)::NegotiatorSharedState::WaitProposal (this=0x7f7e215e8000) at ../../src/yb/util/shmem/reserved_address_segment.cc:108
#2  0x00007f7e1b4781e0 in yb::AddressSegmentNegotiator::Impl::NegotiateChild (fd=45) at ../../src/yb/util/shmem/reserved_address_segment.cc:252
#3  0x00007f7e1b4737ce in yb::AddressSegmentNegotiator::NegotiateChild (fd=45) at ../../src/yb/util/shmem/reserved_address_segment.cc:376
#4  0x00007f7e1b742b7b in yb::tserver::SharedMemoryManager::InitializePostmaster (this=0x7f7e202e9788 <yb::pggate::PgSharedMemoryManager()::shared_mem_manager>, fd=45) at ../../src/yb/tserver/tserver_shared_mem.cc:252
#5  0x00007f7e2023588f in yb::pggate::PgSetupSharedMemoryAddressSegment () at ../../src/yb/yql/pggate/pg_shared_mem.cc:29
#6  0x00007f7e202788e9 in YBCSetupSharedMemoryAddressSegment () at ../../src/yb/yql/pggate/ybc_pg_shared_mem.cc:22
#7  0x000055636b8956f5 in PostmasterMain (argc=21, argv=0x52937fe4e790) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1083
#8  0x000055636b774bfe in PostgresServerProcessMain (argc=21, argv=0x52937fe4e790) at ../../../../../../src/postgres/src/backend/main/main.c:209
#9  0x000055636b7751f2 in main ()
```
and caches the `vmodule` value before `InitGFlags` sets it from the environment. The fix is to explicitly call `UpdateVmodule` from `InitGFlags` after setting `vmodule`.

Jira: DB-15888

Test Plan:
```
ybd release --cxx-test pgwrapper_pg_analyze-test --gtest_filter PgAnalyzeTest.AnalyzeSamplingColocated --test-args '--vmodule=pg_sample=1' -n 2 -- -p 1 -k
zgrep pg_sample ~/logs/latest_test/1.log
```

Reviewers: hsunder

Reviewed By: hsunder

Subscribers: ybase, yql

Tags: #jenkins-ready, #jenkins-trigger

Differential Revision: https://phorge.dev.yugabyte.com/D42731
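A minimal sketch of the shape of this fix, with assumed details: `InitGFlags` and `UpdateVmodule` are named in the summary above, but the flag-setting call shown here is illustrative rather than the actual YugabyteDB code.
```
// Illustrative only. glog caches per-site verbosity the first time a VLOG()
// site runs, so assigning vmodule afterwards has no effect on already-cached
// sites unless the cached levels are refreshed.
void InitGFlags() {
  // ... existing flag parsing from the command line / environment ...
  google::SetCommandLineOption("vmodule", "pg_sample=1");  // sets the flag value only
  UpdateVmodule();  // the fix: re-apply vmodule so cached VLOG site levels pick it up
}
```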
jmeehan16 pushed a commit that referenced this pull request on Jun 12, 2025
…rdup for tablegroup_name

Summary: As part of D36859 / 0dbe7d6, backup and restore support for colocated tables when multiple tablespaces exist was introduced. Upon fetching the tablegroup_name from `pg_yb_tablegroup`, the value was read and assigned via `PQgetvalue` without copying. This led to a use-after-free bug when the tablegroup_name was later read in dumpTableSchema, since the result of the SQL query is immediately cleared on the next line (`PQclear`).
```
[P-yb-controller-1] ==3037==ERROR: AddressSanitizer: heap-use-after-free on address 0x51d0002013e6 at pc 0x55615b0a1f92 bp 0x7fff92475970 sp 0x7fff92475118
[P-yb-controller-1] READ of size 8 at 0x51d0002013e6 thread T0
[P-yb-controller-1]     #0 0x55615b0a1f91 in strcmp ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:470:5
[P-yb-controller-1]     #1 0x55615b1b90ba in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15789:8
[P-yb-controller-1]     #2 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #3 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #4 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #5 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
[P-yb-controller-1]     #6 0x55615b0894bd in _start (${BUILD_ROOT}/postgres/bin/ysql_dump+0x10d4bd)
[P-yb-controller-1]
[P-yb-controller-1] 0x51d0002013e6 is located 358 bytes inside of 2048-byte region [0x51d000201280,0x51d000201a80)
[P-yb-controller-1] freed by thread T0 here:
[P-yb-controller-1]     #0 0x55615b127196 in free ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
[P-yb-controller-1]     #1 0x7f3c02d65e85 in PQclear ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:755:3
[P-yb-controller-1]     #2 0x55615b1c0103 in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19108:4
[P-yb-controller-1]     #3 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3
[P-yb-controller-1]     #4 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #5 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #6 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #7 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
[P-yb-controller-1]
[P-yb-controller-1] previously allocated by thread T0 here:
[P-yb-controller-1]     #0 0x55615b12742f in malloc ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3
[P-yb-controller-1]     #1 0x7f3c02d680a7 in pqResultAlloc ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:633:28
[P-yb-controller-1]     #2 0x7f3c02d81294 in getRowDescriptions ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:544:4
[P-yb-controller-1]     #3 0x7f3c02d7f793 in pqParseInput3 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11
[P-yb-controller-1]     #4 0x7f3c02d6bcc8 in parseInput ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2014:2
[P-yb-controller-1]     #5 0x7f3c02d6bcc8 in PQgetResult ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2100:3
[P-yb-controller-1]     #6 0x7f3c02d6cd87 in PQexecFinish ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2417:19
[P-yb-controller-1]     #7 0x7f3c02d6cd87 in PQexec ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2256:9
[P-yb-controller-1]     #8 0x55615b1f45df in ExecuteSqlQuery ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:296:8
[P-yb-controller-1]     #9 0x55615b1f4213 in ExecuteSqlQueryForSingleRow ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:311:8
[P-yb-controller-1]     #10 0x55615b1c008d in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19102:10
[P-yb-controller-1]     #11 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3
[P-yb-controller-1]     #12 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #13 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #14 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #15 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
```
This revision fixes the issue by using pg_strdup to make a copy of the string.

Jira: DB-15915

Test Plan: ./yb_build.sh asan --cxx-test integration-tests_xcluster_ddl_replication-test --gtest_filter XClusterDDLReplicationTest.DDLReplicationTablesNotColocated

Reviewers: aagrawal, skumar, mlillibridge, sergei

Reviewed By: aagrawal, sergei

Subscribers: sergei, yql

Differential Revision: https://phorge.dev.yugabyte.com/D43386
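A minimal sketch of the bug pattern and the fix, using the libpq/pg_dump calls named in the trace (`PQgetvalue`, `PQclear`, `pg_strdup`, `ExecuteSqlQueryForSingleRow`); the variable names and query setup below are assumed, not the exact pg_dump.c code.
```
PGresult *res = ExecuteSqlQueryForSingleRow(fout, query->data);	/* assumed setup */

/* Buggy: PQgetvalue returns a pointer into res's own storage, which PQclear
 * frees below, so any later read of tablegroup_name is a use-after-free. */
/* const char *tablegroup_name = PQgetvalue(res, 0, 0); */

/* Fixed: take an owned copy before clearing the result. */
char *tablegroup_name = pg_strdup(PQgetvalue(res, 0, 0));

PQclear(res);
```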
jmeehan16 pushed a commit that referenced this pull request on Jun 12, 2025
Summary:
Running Java test TestPgRegressPgTypesUDT on ASAN fails with
```
ts1|pid74720|:24411 ${YB_SRC_ROOT}/src/postgres/src/include/lib/sort_template.h:301:15: runtime error: applying non-zero offset 8 to null pointer
ts1|pid74720|:24411     #0 0x55eb30c5c61b in qsort_arg ${YB_SRC_ROOT}/src/postgres/src/include/lib/sort_template.h:301:15
ts1|pid74720|:24411     #1 0x55eb30883c9a in multirange_canonicalize ${YB_SRC_ROOT}/src/postgres/src/backend/utils/adt/multirangetypes.c:481:2
ts1|pid74720|:24411     #2 0x55eb30883c9a in make_multirange ${YB_SRC_ROOT}/src/postgres/src/backend/utils/adt/multirangetypes.c:648:16
ts1|pid74720|:24411     #3 0x55eb2ffd01db in ExecInterpExpr ${YB_SRC_ROOT}/src/postgres/src/backend/executor/execExprInterp.c:731:8
...
ts1|pid74720|:24411 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ${YB_SRC_ROOT}/src/postgres/src/include/lib/sort_template.h:301:15
ts1|pid74720|:24411 2025-05-30 04:15:01.687 UTC [74992] WARNING:  server process (PID 76254) exited with exit code 1
ts1|pid74720|:24411 2025-05-30 04:15:01.687 UTC [74992] DETAIL:  Failed process was running: select textmultirange();
pg_regress|pid75085|stdout test yb.port.multirangetypes ... FAILED (test process exited with exit code 2) 1615 ms
```
The issue is that multirange_constructor0 passes NULL, which flows all
the way down to qsort_arg, and qsort_arg attempts pointer arithmetic on
that NULL. Fix by returning early before that.
This fix may impact other cases besides multirange, since qsort_arg is
used in several places; but given that no issues have been reported until
now, perhaps it isn't possible to pass NULL to qsort_arg through those
places.
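A minimal sketch of the early-return guard, under the assumption that it sits at the top of the sort routine before any pointer arithmetic (simplified signature; the real code is the ST_SORT template in sort_template.h, and the summary leaves the exact placement of the early return open):
```
static void
qsort_arg_sketch(void *data, size_t n, size_t element_size,
                 int (*cmp) (const void *, const void *, void *), void *arg)
{
	if (data == NULL)
		return;				/* n is 0 here; "NULL + offset" is UB, so bail out early */

	char	   *begin = (char *) data;
	char	   *end = begin + n * element_size;	/* the arithmetic UBSan flagged */

	/* ... actual sorting over [begin, end) ... */
}
```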
Jira: DB-16985
Test Plan:
On Almalinux 8:
./yb_build.sh asan daemons initdb \
--java-test 'org.yb.pgsql.TestPgRegressPgTypesUDT#schedule'
Close: yugabyte#27447
Reviewers: telgersma
Reviewed By: telgersma
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D44464
jmeehan16 pushed a commit that referenced this pull request on Jun 12, 2025
…ck/release functions at TabletService

Summary: In the functions `TabletServiceImpl::AcquireObjectLocks` and `TabletServiceImpl::ReleaseObjectLocks`, we weren't returning after executing the rpc callback when the initial validation steps fail. This led to segv issues like the one below:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined] std::__1::unique_ptr<yb::tserver::TSLocalLockManager::Impl, std::__1::default_delete<yb::tserver::TSLocalLockManager::Impl>>::operator->[abi:ne190100](this=0x0000000000000000) const at unique_ptr.h:272:108
    frame #1: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined] yb::tserver::TSLocalLockManager::AcquireObjectLocksAsync(this=0x0000000000000000, req=0x00005001bfffa290, deadline=yb::CoarseTimePoint @ x23, callback=0x0000ffefb6066560, wait=(value_ = true)) at ts_local_lock_manager.cc:541:3
    frame #2: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(this=0x00005001bdaf6020, req=0x00005001bfffa290, resp=0x00005001bfffa300, context=<unavailable>) at tablet_service.cc:3673:26
    frame #3: 0x0000aaaac36bd9a0 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::tserver::TabletServerServiceIf::InitMethods(this=<unavailable>, req=0x00005001bfffa290, resp=0x00005001bfffa300, rpc_context=RpcContext @ 0x0000ffefb6066600)::$_36::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext)::operator()(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) const at tserver_service.service.cc:1470:9
    frame #4: 0x0000aaaac36bd978 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) at local_call.h:126:7
    frame #5: 0x0000aaaac36bd680 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36::operator()(this=<unavailable>, call=<unavailable>) const at tserver_service.service.cc:1468:7
    frame #6: 0x0000aaaac36bd5c8 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] decltype(std::declval<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&>()(std::declval<std::__1::shared_ptr<yb::rpc::InboundCall>>())) std::__1::__invoke[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__f=<unavailable>, __args=<unavailable>) at invoke.h:149:25
    frame #7: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__args=<unavailable>, __args=<unavailable>) at invoke.h:224:5
    frame #8: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] std::__1::__function::__alloc_func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __arg=<unavailable>) at function.h:171:12
    frame #9: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=<unavailable>) at function.h:313:10
    frame #10: 0x0000aaaac36d1384 yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::__function::__value_func<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __args=nullptr) const at function.h:430:12
    frame #11: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::function<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=nullptr) const at function.h:989:10
    frame #12: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(this=<unavailable>, call=<unavailable>) at tserver_service.service.cc:913:3
    frame #13: 0x0000aaaac30e05b4 yb-tserver`yb::rpc::ServicePoolImpl::Handle(this=0x00005001bff9b8c0, incoming=nullptr) at service_pool.cc:275:19
    frame #14: 0x0000aaaac3006ed0 yb-tserver`yb::rpc::InboundCall::InboundCallTask::Run(this=<unavailable>) at inbound_call.cc:309:13
    frame #15: 0x0000aaaac30ec868 yb-tserver`yb::rpc::(anonymous namespace)::Worker::Execute(this=0x00005001bff5c640, task=0x00005001bfdf1958) at thread_pool.cc:138:13
    frame #16: 0x0000aaaac39afd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x00005001bfe1e750) const at function.h:430:12
    frame #17: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x00005001bfe1e750) const at function.h:989:10
    frame #18: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x00005001bfe1e6e0) at thread.cc:937:3
```
This revision addresses the issue by returning after executing the rpc callback with a validation failure status.

Jira: DB-17124

Test Plan: Jenkins

Reviewers: rthallam, amitanand

Reviewed By: amitanand

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D44663
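A minimal sketch of the bug shape described above, with assumed names (`ValidateAcquireRequest`, the `RespondFailure`-style responder, and the lock-manager accessor are hypothetical; `AcquireObjectLocksAsync` appears in the stack trace):
```
void TabletServiceImpl::AcquireObjectLocks(
    const AcquireObjectLockRequestPB* req, AcquireObjectLockResponsePB* resp,
    rpc::RpcContext context) {
  Status s = ValidateAcquireRequest(*req);  // hypothetical validation step
  if (!s.ok()) {
    context.RespondFailure(s);  // the rpc callback was executed here...
    return;                     // ...but this return was missing, so control fell
  }                             // through to the null lock-manager dereference below
  ts_local_lock_manager()->AcquireObjectLocksAsync(req, resp, std::move(context));
}
```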
jmeehan16 pushed a commit that referenced this pull request on Jun 12, 2025
…own flags are set at ObjectLockManager

Summary: In the context of object locking, commit 6e80c56 / D44228 got rid of the logic that signaled obsolete waiters corresponding to transactions that issued a release-all-locks request (which could have been terminated due to failures like timeout, deadlock, etc.) in order to early-terminate failed waiting requests. Hence, we now let the obsolete requests terminate organically from the OLM, resumed by the poller thread that runs at an interval of `olm_poll_interval_ms` (defaults to 100ms). This led to one of the itests failing with the below stack:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV: address not mapped to object
  * frame #0: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(std::__1::function<void ()>) [inlined] yb::ThreadPoolToken::Submit(this=<unavailable>, r=<unavailable>) at threadpool.cc:146:10
    frame #1: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(this=0x0000000000000000, f=<unavailable>) at threadpool.cc:142:10
    frame #2: 0x0000aaaac73cdfe8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoSignal(this=0x00003342bfa0d400, entry=<unavailable>) at object_lock_manager.cc:767:3
    frame #3: 0x0000aaaac73cc7c0 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>, yb::docdb::LockData&&, yb::StronglyTypedBool<yb::docdb::(anonymous namespace)::IsLockRetry_Tag>, unsigned long, yb::Status) [inlined] yb::docdb::ObjectLockManagerImpl::PrepareAcquire(this=0x00003342bfa0d400, txn_lock=<unavailable>, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, resume_it_offset=<unavailable>, resume_with_status=<unavailable>) at object_lock_manager.cc:523:5
    frame #4: 0x0000aaaac73cc6a8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(this=0x00003342bfa0d400, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, is_retry=(value_ = true), resume_it_offset=<unavailable>, resume_with_status=Status @ 0x0000ffefaa036658) at object_lock_manager.cc:552:27
    frame #5: 0x0000aaaac73cbcb4 yb-tserver`yb::docdb::WaiterEntry::Resume(this=0x00003342b9a6a820, lock_manager=0x00003342bfa0d400, resume_with_status=<unavailable>) at object_lock_manager.cc:381:17
    frame #6: 0x0000aaaac85bdd4c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() at object_lock_manager.cc:752:13
    frame #7: 0x0000aaaac85bda74 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::docdb::ObjectLockManager::Shutdown(this=<unavailable>) at object_lock_manager.cc:1092:10
    frame #8: 0x0000aaaac85bda6c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::tserver::TSLocalLockManager::Impl::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:411:26
    frame #9: 0x0000aaaac85bd7e8 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:566:10
    frame #10: 0x0000aaaac8665a34 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ResetAndGetTSLocalLockManager(this=0x000033423fc1ad80) at tablet_server.cc:797:28
    frame #11: 0x0000aaaac8665a18 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ProcessLeaseUpdate(this=0x000033423fc1ad80, lease_refresh_info=0x000033423a476b80) at tablet_server.cc:828:22
    frame #12: 0x0000aaaac8665950 yb-tserver`yb::tserver::YsqlLeasePoller::Poll(this=<unavailable>) at ysql_lease_poller.cc:143:18
    frame #13: 0x0000aaaac8438d58 yb-tserver`yb::tserver::MasterLeaderPollScheduler::Impl::Run(this=0x000033423ff5cc80) at master_leader_poller.cc:125:25
    frame #14: 0x0000aaaac89ffd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x000033423ffc7930) const at function.h:430:12
    frame #15: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000033423ffc7930) const at function.h:989:10
    frame #16: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x000033423ffc78c0) at thread.cc:937:3
    frame #17: 0x0000ffffac0378b8 libpthread.so.0`start_thread + 392
    frame #18: 0x0000ffffac093afc libc.so.6`thread_start + 12
```
This is due to accessing the unique_ptr `thread_pool_token_` after it has been reset. This revision fixes the issue by not scheduling any tasks on the threadpool once the shutdown flags have been set (hence not accessing `thread_pool_token_`). Since we wait for in-progress requests at the OLM and also for in-progress resume tasks scheduled on the messenger using `waiters_amidst_resumption_on_messenger_`, it is safe to say that `thread_pool_token_` will not be accessed once it is reset.

Jira: DB-17121

Test Plan:
Jenkins
./yb_build.sh --cxx-test='TEST_F(PgObjectLocksTestRF1, TestShutdownWithWaiters) {'

Reviewers: rthallam, amitanand, sergei

Reviewed By: amitanand

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D44662
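A minimal sketch of the guard described by the fix; `thread_pool_token_` and `SubmitFunc` come from the stack trace, while the shutdown flag name and the surrounding shape are assumed:
```
void ObjectLockManagerImpl::DoSignal(const WaiterEntryPtr& entry) {  // assumed signature
  // Once shutdown has begun, thread_pool_token_ may already be reset, so no
  // new resume tasks may be scheduled through it.
  if (shutdown_in_progress_.load(std::memory_order_acquire)) {  // hypothetical flag
    return;
  }
  thread_pool_token_->SubmitFunc([this, entry] { /* resume the waiter */ });
}
```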
braddietrich pushed a commit that referenced this pull request on Jul 7, 2025
…e of abort when object locking feature is enabled

Summary: Timed-out requests at the Object Lock Manager are resumed organically by the poller thread once the deadline is exceeded. This means a ysql statement could time out, execute a finish-transaction rpc, and initiate a new ysql transaction before the poller resumes the obsolete timed-out lock waiter. Given that we try re-using docdb transactions wherever possible when object locking is enabled, the above in combination with transaction re-use could lead to undesired issues.

Here's how the OLM works in brief:
1. While serving new incoming lock requests, a shared `TrackedTransactionLockEntry` is created if one doesn't exist, keyed against the txn id, and stored in `txn_locks_`.
2. `PrepareAcquire` is executed, which validates whether the lock request should be tried.
3. The OLM moves on to acquire the lock if available, else the request enters the wait queue.
4. When waiters are resumed, go to step 2.
5. When serving a release-all request, remove the entry from `txn_locks_`, release the acquired locks, and let obsolete waiting locks be resumed by the poller thread.

Here's the brief working of the object lock tracker code:
1. For incoming lock requests, instrument the lock in the corresponding map keyed against `<txn, subtxn>` with state set to `WAITING`, and invoke the OLM.
2. When the OLM executes the lock callback, tap into it, try finding the map with key `<txn, subtxn>`, and if it exists, change the state of the lock entry in the map accordingly.

Consider the following scenario:
1. ysql starts read-only `ysql_txnA` and issues a lock request. This is associated with `docdb_txnA`, and the lock request is forwarded to the OLM.
2. The lock request enters the wait queue.
3. YSQL detects a timeout, cancels the request, and issues a finish-txn call. `docdb_txnA` doesn't get consumed since it neither wrote any intents nor failed itself (failed heartbeats). The OLM erases the `TrackedTransactionLockEntry` keyed against `docdb_txnA`.
4. YSQL starts a new `ysql_txnB` and issues a lock request. docdb re-uses `docdb_txnA` and issues the lock request; the OLM creates a new entry for the same transaction id and moves on to acquire the lock.
5. The poller might now realize that the earlier waiting request timed out, and try resuming it.

This results in issues with observability, since at some point the OLM has state corresponding to different ysql transactions stored under the same docdb transaction id. As a consequence, it results in a segv with the lock tracking code in the above scenario as follows:
- Step 1 creates a map for key `<docdb_txnA, 1>`, and inserts `key(lock)`.
- Step 3 erases the map for key `<docdb_txnA, 1>`.
- Step 4 creates a new map for key `<docdb_txnA, 1>`, and inserts `key(lock_new)`.
- Step 5 tries to access entry `key(lock)` in the new map, which doesn't exist, resulting in a segv:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x0000aaaaead1b198 yb-tserver`std::__1::__function::__func<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status), std::__1::allocator<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)>, void (yb::Status const&)>::operator()(yb::Status const&) [inlined] yb::tserver::ObjectLockTracker::UntrackLock(this=<unavailable>, lock_context=0x000010f1bbe20800) at ts_local_lock_manager.cc:134:25
    frame #1: 0x0000aaaaead1b0c8 yb-tserver`std::__1::__function::__func<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status), std::__1::allocator<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)>, void (yb::Status const&)>::operator()(yb::Status const&) at ts_local_lock_manager.cc:104:7
    frame #2: 0x0000aaaaead1b0b4 yb-tserver`std::__1::__function::__func<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status), std::__1::allocator<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)>, void (yb::Status const&)>::operator()(yb::Status const&) [inlined] yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(this=0x000010f1b5ad4cd0, status=Status @ 0x0000ffef78896640)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)::operator()(yb::Status) const at ts_local_lock_manager.cc:324:46
    frame #3: 0x0000aaaaead1b034 yb-tserver`std::__1::__function::__func<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status), std::__1::allocator<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)>, void (yb::Status const&)>::operator()(yb::Status const&) [inlined] decltype(std::declval<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)&>()(std::declval<yb::Status const&>())) std::__1::__invoke[abi:ne190100]<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)&, yb::Status const&>(__f=<unavailable>, __args=<unavailable>) at invoke.h:149:25
    frame #4: 0x0000aaaaead1b01c yb-tserver`std::__1::__function::__func<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status), std::__1::allocator<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)>, void (yb::Status const&)>::operator()(yb::Status const&) [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne190100]<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)&, yb::Status const&>(__args=<unavailable>, __args=<unavailable>) at invoke.h:224:5
```
This revision addresses the issue by burning/consuming the docdb transaction in case of a YSQL transaction abort. This forces the `ysql_txnB` above to be associated with a new `docdb_txnB` and doesn't lead to any observability issues/mixed state. Note that there isn't any problem with reusing docdb transactions across ysql read-only transactions that move on to commit successfully. This is because if the ysql transaction moves on to commit, it implies that all the object locks were granted => no waiting object locks, and hence all corresponding entries of the respective docdb transaction would be released at the OLM in-line with the commit.

Jira: DB-17123

Test Plan:
Jenkins
./yb_build.sh --cxx-test pg_object_locks-test --gtest_filter PgObjectLocksTestRF1.TestDisableReuseAbortedPlainTxn
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressPgAuth#schedule'

Reviewers: amitanand, yyan, rthallam, #db-approvers

Reviewed By: amitanand

Subscribers: svc_phabricator, ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D44857
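A heavily hedged sketch of the chosen fix: the summary only states that the docdb transaction is burned/consumed on YSQL abort so that it cannot be reused; every name below is hypothetical.
```
// Hypothetical shape: on YSQL transaction abort with object locking enabled,
// mark the underlying docdb transaction as consumed so the next YSQL
// transaction allocates a fresh docdb txn instead of reusing this id.
void FinishTransaction(bool committed) {
  if (!committed && object_locking_enabled_) {
    docdb_txn_->MarkAsConsumed();  // hypothetical call; disables reuse
  }
  // ... existing finish logic ...
}
```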
braddietrich pushed a commit that referenced this pull request on Jul 7, 2025
…ow during index backfill.

Summary: In the last few weeks we have seen a few instances of the stress test (with various nemeses) run into a master crash caused by a stack trace that looks like:
```
* thread #1, name = 'yb-master', stop reason = signal SIGSEGV: invalid address
  * frame #0: 0x0000aaaad52f5fc4 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::shared_ptr<yb::master::BackfillTablet>::shared_ptr[abi:ue170006]<yb::master::BackfillTablet, void>(this=<unavailable>, __r=std::__1::weak_ptr<yb::master::BackfillTablet>::element_type @ 0x000013e4bf787778) at shared_ptr.h:701:20
    frame #1: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::enable_shared_from_this<yb::master::BackfillTablet>::shared_from_this[abi:ue170006](this=0x000013e4bf787778) at shared_ptr.h:1954:17
    frame #2: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=0x000013e4bf787778) at backfill_index.cc:1300:50
    frame #3: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323:10
    frame #4: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4d458) at backfill_index.cc:1620:5
    frame #5: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:470:3
    frame #6: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:273:5
    frame #7: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4d458) at backfill_index.cc:1463:19
    frame #8: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
    frame #9: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323:10
    frame #10: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cd98) at backfill_index.cc:1620:5
    frame #11: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:470:3
    frame #12: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:273:5
    frame #13: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cd98) at backfill_index.cc:1463:19
    frame #14: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
    frame #15: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323:10
    frame #16: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cfd8) at backfill_index.cc:1620:5
    frame #17: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:470:3
    frame #18: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:273:5
    frame #19: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cfd8) at backfill_index.cc:1463:19
    frame #20: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
    frame #21: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323:10
    ...
    frame #2452: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bdc7ed98) at backfill_index.cc:1620:5
    frame #2453: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:470:3
    frame #2454: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:273:5
    frame #2455: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bdc7ed98) at backfill_index.cc:1463:19
    frame #2456: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19
    frame #2457: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323:10
    frame #2458: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4ba1ff458) at backfill_index.cc:1620:5
    frame #2459: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:470:3
    frame #2460: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:273:5
    frame #2461: 0x0000aaaad52c0260 yb-master`yb::master::RetryingRpcTask::RunDelayedTask(this=0x000013e4ba1ff458, status=0x0000ffffab2668c0) at async_rpc_tasks.cc:432:14
    frame #2462: 0x0000aaaad5c3f838 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] boost::function1<void, yb::Status const&>::operator()(this=0x000013e4bff63b18, a0=0x0000ffffab2668c0) const at function_template.hpp:763:14
    frame #2463: 0x0000aaaad5c3f81c yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] yb::rpc::DelayedTask::TimerHandler(this=0x000013e4bff63ae8, watcher=<unavailable>, revents=<unavailable>) at delayed_task.cc:155:5
    frame #2464: 0x0000aaaad5c3f284 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(loop=<unavailable>, w=<unavailable>, revents=<unavailable>) at ev++.h:479:7
    frame #2465: 0x0000aaaad4cdf170 yb-master`ev_invoke_pending + 112
    frame #2466: 0x0000aaaad4ce21fc yb-master`ev_run + 2940
    frame #2467: 0x0000aaaad5c725fc yb-master`yb::rpc::Reactor::RunThread() [inlined] ev::loop_ref::run(this=0x000013e4bfcfadf8, flags=0) at ev++.h:211:7
    frame #2468: 0x0000aaaad5c725f4 yb-master`yb::rpc::Reactor::RunThread(this=0x000013e4bfcfadc0) at reactor.cc:735:9
    frame #2469: 0x0000aaaad65c61d8 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ue170006](this=0x000013e4bfeffa80) const at function.h:517:16
    frame #2470: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000013e4bfeffa80) const at function.h:1168:12
    frame #2471: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(arg=0x000013e4bfeffa20) at thread.cc:895:3
```
Essentially, a BackfillChunk is considered done (without sending out an RPC) and launches the next BackfillChunk, which does the same. This may happen if `BackfillTable::indexes_to_build()` is empty, or if `backfill_jobs()` is empty. However, based on the code reading, we should only get there **after** marking `BackfillTable::done_` as `true`. If for some reason we have `indexes_to_build()` empty and `BackfillTable::done_ == false`, we could get into this infinite recursion.

Since I am unable to explain and recreate how this happens, I'm adding a test flag `TEST_simulate_empty_indexes` to repro this.

Fix: We update `BackfillChunk::SendRequest` to handle an empty `indexes_to_build()` as a failure rather than treating it as a success. This prevents the infinite recursion. Also adding a few log lines that may help better understand the scenario if we run into this again.

Jira: DB-17296

Test Plan: yb_build.sh fastdebug --cxx-test pg_index_backfill-test --gtest_filter *.SimulateEmptyIndexesForStackOverflow*

Reviewers: zdrudi, rthallam, jason

Reviewed By: zdrudi

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D45031
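A simplified sketch of the mutual recursion visible in the stack above; the shapes are inferred from the frame names in backfill_index.cc and are not the real code.
```
void BackfillTablet::LaunchNextChunkOrDone() {
  if (!done()) {
    auto chunk = std::make_shared<BackfillChunk>(shared_from_this());
    chunk->Launch();  // with indexes_to_build() empty, no RPC is sent and the
                      // chunk completes synchronously, calling Done() below
  }
}

void BackfillTablet::Done(const Status& status) {
  // ... bookkeeping ...
  LaunchNextChunkOrDone();  // re-enters the function above; if done_ never
                            // becomes true, the recursion is unbounded and
                            // eventually overflows the stack
}
```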
braddietrich pushed a commit that referenced this pull request on Jul 7, 2025
…tale results for select statement with key columns only

Summary: There are several scenarios where rows are requested by key columns only (including constraints and aggregates like count). The doc reader has an optimization for this kind of read: since the actual values do not matter (because key column values can be extracted from the row's doc key), there is no need to decode row values, and it is enough to check whether the row is tombstoned or not.

Unfortunately, this optimization was not updated to take fast backward scan into account: the iterator stops reading when the first record met is not marked with a tombstone value. This is a problem for fast backward scan, which reads a row in reverse order, from the oldest to the newest change (per sub doc key), and the oldest record generally corresponds to the initially inserted values.

The change updates the algorithm for checking row existence: scanning continues until all row changes have been taken into account, to get the actual state of the row.

**Examples of row existence check before the fix**

Example #1
```
Record #1: KEY, T1 => PACKED ROW
Record #2: KEY, COL1, T2 => COL1 NEW VALUE 1

Forward scan / old backward scan: #1 (alive, T1) => stop
result: row exists.

Fast backward scan: #2 (alive, T2) => stop
result: row exists.
```

Example #2
```
Record #1: KEY, T3 => DEL
Record #2: KEY, T1 => PACKED ROW
Record #3: KEY, COL1, T2 => COL1 NEW VALUE

Forward scan / old backward scan: #1 (deleted, T3), #2 (skipped: T1 < T3), #3 (skipped: T2 < T3) => stop
result: row does not exist.

Fast backward scan: #3 (alive, T2) => stop
result: row exists.
```

Example #3
```
Record #1: KEY, T4 => PACKED ROW
Record #2: KEY, T3 => DEL
Record #3: KEY, T1 => PACKED ROW
Record #4: KEY, COL1, T2 => COL1 NEW VALUE

Forward scan / old backward scan: #1 (alive, T4) => stop
result: row exists.

Fast backward scan: #4 (alive, T2) => stop
result: row exists.
```

**Examples of row existence check with the fix**

Example #1
```
Record #1: KEY, T1 => PACKED ROW
Record #2: KEY, COL1, T2 => COL1 NEW VALUE 1

Fast backward scan: #2 (alive, T2) => #1 (skipped, T1 < T2) => stop
result: row exists.
```

Example #2
```
Record #1: KEY, T3 => DEL
Record #2: KEY, T1 => PACKED ROW
Record #3: KEY, COL1, T2 => COL1 NEW VALUE

Fast backward scan: #3 (alive, T2) => #2 (skipped: T1 < T2) => #1 (deleted: T3 > T2) => stop
result: row does not exist.
```

Example #3
```
Record #1: KEY, T4 => PACKED ROW
Record #2: KEY, T3 => DEL
Record #3: KEY, T1 => PACKED ROW
Record #4: KEY, COL1, T2 => COL1 NEW VALUE

Fast backward scan: #4 (alive, T2) => #3 (skipped: T1 < T2) => #2 (deleted: T3 > T2) => #1 (alive: T4 > T3) => stop
result: row exists.
```

Original commit: d8bd1fd / D42212

Jira: DB-15387

Test Plan:
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithNulls_kV1
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithNulls_kV2
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithNulls_kNone
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithoutNulls_kV1
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithoutNulls_kV2
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithoutNulls_kNone
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithNulls_kV1
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithNulls_kV2
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithNulls_kNone
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithoutNulls_kV1
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithoutNulls_kV2
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithoutNulls_kNone

**manual testing:**
```
SET yb_use_hash_splitting_by_default = FALSE;
CREATE TABLE t1(c0 int UNIQUE DEFAULT 1);
INSERT INTO t1(c0) VALUES (2);
DELETE FROM t1;
SELECT * FROM t1;
SELECT * FROM t1 GROUP BY t1.c0 ORDER BY t1.c0 DESC;
```

Reviewers: sergei, rthallam

Reviewed By: rthallam

Subscribers: yql, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D42486
braddietrich pushed a commit that referenced this pull request on Jul 7, 2025
Summary: This revision adds the procedure `yb_index_check()` that checks whether the given index is consistent with its base relation. This operation doesn't support GIN indexes yet.

An index row has the following components:
- index attributes: key and non-key columns (if any)
- ybbasectid: system attribute storing the ybctid of the base relation row
- ybuniqueidxkeysuffix: system attribute, present only for unique indexes (it is non-null only when the index is in nulls-are-distinct mode and the key columns contain at least one NULL)

The structure of an index row is as follows (in `<DocKey>` -> `<DocValue>` format):
- Non-unique index: (key cols, ybbasectid) -> (non-key cols)
- Unique index: (key cols, ybuniqueidxkeysuffix) -> (non-key cols, ybbasectid)

An index row is consistent if all of its attributes are consistent. An index attribute is consistent if its value and the corresponding base relation value are:
- for key attributes: binary equal (semantic equality without binary equality runs the risk of allowing multiple index rows for a given base table row if the key column can have multiple binary representations).
- for non-key attributes: binary or semantically equal.

Note: if both values are NULL, they are consistent.

Index consistency check is done in two steps:
1. Check for spurious index rows
2. Check for missing index rows

**Part 1: Check for spurious index rows**

Here, we check if the index contains a row that it should not. To do this: for every index row, fetch the row in the base table (filtered by the partial index predicate) such that baserow.ybctid == indexrow.ybbasectid. If such a row doesn't exist, use a baserow with all NULL values. The result will be the same as a LEFT join on indexrow.ybbasectid = baserow.ybctid with the index table on the left (if the index were a regular relation). Fetch the following columns as the join targetlist:
- from the index row: ybbasectid, index attributes, ybuniqueidxkeysuffix (only for unique indexes)
- from the base table row: ybctid, columns/expressions corresponding to the index attributes

On the joined result, make the following checks:
1. ybbasectid should be non-null
2. ybbasectid should be equal to ybctid
3. index attributes and the corresponding base relation column/expression should be consistent as per the above definition
4. for unique indexes, ybuniqueidxkeysuffix should be non-null iff the index uses nulls-are-distinct mode and the key columns contain at least one null. When non-null, it should be equal to ybbasectid.

If the above checks pass for every row in the index, it implies that the index does not contain any spurious rows. This can be proved by contradiction as follows: let's assume that the above checks passed for every row in the index, yet it contains a spurious row, namely indexrow1. This index row must satisfy the following:
- indexrow1.ybbasectid != null (from check 1)
- the base table has a row, namely baserow, such that baserow.ybctid == indexrow1.ybbasectid (otherwise ybctid would be null and check 2 would have failed)
- index attributes of indexrow1 are consistent with baserow (from check 3)
- if the index is unique, indexrow1.ybuniqueidxkeysuffix is either null or equal to ybbasectid, depending on the index mode and key cols (from check 4)

The above shows that indexrow1 has a valid counterpart in baserow. Given this, the only possible reason why indexrow1 should not have been present in the index is that another index row, namely indexrow2, must exist such that the pair (indexrow2, baserow) also satisfies the above checks. We can say that indexrow1 and indexrow2:
- have the same ybbasectid (baserow.ybctid == indexrow2.ybbasectid == indexrow1.ybbasectid).
- have binary equal values for key columns. This is because the key cols of both index rows are binary equal to the corresponding baserow values (from check 3 and the definition of consistency).
- have identical ybuniqueidxkeysuffix (it depends on the index type, mode, and key cols - all of these are already established to be the same for the two index rows).

The DocKey of an index row is created from a subset of (key cols, ybbasectid, ybuniqueidxkeysuffix). Each component is identical for the two index rows, implying identical DocKeys. This is not possible because DocDB does not allow duplicate DocKeys. Hence, such an indexrow1 does not exist.

**Part 2: Check for missing index rows**

This part checks that no entries are missing from the index. Given that it is already established that the index does not contain any spurious rows, it suffices to check that the index row count is what it should be. That is, for every qualifying row in the base table (filtered by the partial index predicate), the index should contain one row.
- To fetch the index row and the corresponding base table row efficiently, batch nested loop join is used (details below).
- Both parts of the check use a single read time. This works out of the box because the entire check is executed as a single YSQL statement.

**Batch Nested Loop Join usage**

Batchable join clauses must be of the form `inner_indexed_var = expression on (outer_vars)` and the expression must not involve functions. To satisfy this requirement:
- join condition: baserow.ybctid == indexrow.ybbasectid.
- outer subplan: index relation scan
- inner subplan: base relation scan.

BNL expects an index on the var referenced in the join clause (ybctid, in this case). So, a dummy primary key index object on the ybctid column is temporarily created (not persisted in the PG catalog). Like any other PK index in YB, this index points to the base relation and doesn't have a separate docdb table. Because such an index object doesn't actually exist, the planner was bypassed and the join plan was hardcoded.

Jira: DB-15118

Backport summary:
- `V73__25820__yb_index_check.sql`:
  - Rename it as per the migration backport standard.
  - Rename `prosupport` field to `protransform`
  - Add pg_depend entry
  - Remove pg_description entry
- pg_yb_migration.dat: Specify the correct migration filename and major/minor version
- catalog.h: Set YB_LAST_USED_OID to 8090 (same as in master branch)
- pggate.cc, pg_dml_read.h, pg_dml_read.cc: YbctidProvider is not available in 2024.2 and earlier branches. Introduce a new field, `requested_ybctids_owned_`, in `PgDml` to hold the container.
- indexam.c:
  - rename `rd_indam` to `rd_amroutine`
  - Additional imports `access/sysattr.h` and `catalog/pg_type.h`
  - Adjacent line conflicts
- yb_scan.c:
  - SystemAttributeDefinition() takes an additional argument, relhasoids
  - Use YBSystemFirstLowInvalidAttributeNumber instead of YBFirstLowInvalidAttributeNumber in ybcSetupTargets() because index check involves attributes with attnum < YBFirstLowInvalidAttributeNumber. This change is in line with D40090 on the master branch.
  - adjacent line conflicts
- heap.c:
  - YbSystemAttributeDefinition(): change return type from `const FormData_pg_attribute *` to `Form_pg_attribute` (in line with SystemAttributeDefinition())
  - definitions TYPALIGN_INT and TYPSTORAGE_EXTENDED are not available, use the underlying char values directly
  - Unlike the master branch, the ybctid column is not exposed in this version. Update SystemAttributeDefinition() such that it returns a dummy pg_attribute entry for it.
- nodeIndexOnlyScan.c: macro definitions of TABLETUPLE_YBCTID and INDEXTUPLE_YBCTID are not available, expand them.
- datum.c: add missing function datum_image_eq(). It is required by yb_index_check().
- yb_index_check.c:
  - indnullsnotdistinct is not applicable as the NULLS NOT DISTINCT feature is not available in 2024.2 and earlier branches
  - SystemAttributeDefinition() function call takes an additional `relhasoids` argument
  - lnext()'s definition is different in this and earlier branches
  - ExecTypeFromTL() function call takes an additional `hasoid` argument
  - additional imports: executor/ybcExpr.h, optimizer/clauses.h, access/htup_details.h, pg_yb_utils.h, catalog/pg_type.h, access/sysattr.h
  - ExecInitRangeTable() is not available, set the range table manually
  - ExecCloseResultRelations() and ExecCloseRangeTableRelations() are neither available nor applicable in this branch
- yb_index_check.sql/yb_index_check.out: remove nulls not distinct tests as the feature is not available in this branch
- pg_operator.dat: add symbol `ByteaEqualOperator` to the equality operator on bytea type
- nodeIndexScan.c:
  - lockmode when calling index_open() on this branch differs from the master branch; the same change applies when calling yb_dummy_baserel_index_open().
  - adjacent line conflicts
- execExprInterp.c:
  - ExecEvalSysVar() is not available on this branch and the related code change is not applicable either.
  - ExecInterpExpr(): Handle scan of sysvars YBIdxBaseTupleIdAttributeNumber and YBUniqueIdxKeySuffixAttributeNumber
- ybc_pggate.cc/ybc_pggate.h:
  - YbcStatus -> YBCStatus
  - YbcPgStatement -> YBCPgStatement
- Adjacent line conflicts in: misc/Makefile, itup.h, pg_proc.dat, tuptable.h, ybc_pg_typedefs.h, ybc_pggate.h, pggate.cc, pg_dml_read.h, pg_expr.cc
- Adjacent line conflicts in guc.h. This change is not even required and is being removed from master (D42398); skip it.

Original commit: 10de037 / D41376

Test Plan: ./yb_build.sh --java-test org.yb.pgsql.TestPgRegressYbIndexCheck

Reviewers: amartsinchyk, tnayak

Reviewed By: amartsinchyk

Subscribers: yql, smishra

Differential Revision: https://phorge.dev.yugabyte.com/D42444
braddietrich
pushed a commit
that referenced
this pull request
Jul 7, 2025
…mp by using pg_strdup for tablegroup_name Summary: #### Backport Summary Fixed trivial merge conflicts due to the usage of `YbTableProperties` instead of `YbcTableProperties` on the master branch. #### Original Summary As part of D36859 / 0dbe7d6, backup and restore support for colocated tables when multiple tablespaces exist was introduced. Upon fetching the tablegroup_name from `pg_yb_tablegroup`, the value was read and assigned via `PQgetvalue` without copying. This led to a use-after-free bug when the tablegroup_name was later read in dumpTableSchema since the result from the SQL query is immediately cleared in the next line (`PQclear`). ``` [P-yb-controller-1] ==3037==ERROR: AddressSanitizer: heap-use-after-free on address 0x51d0002013e6 at pc 0x55615b0a1f92 bp 0x7fff92475970 sp 0x7fff92475118 [P-yb-controller-1] READ of size 8 at 0x51d0002013e6 thread T0 [P-yb-controller-1] #0 0x55615b0a1f91 in strcmp ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:470:5 [P-yb-controller-1] #1 0x55615b1b90ba in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15789:8 [P-yb-controller-1] #2 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] #3 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] #4 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] #5 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) [P-yb-controller-1] #6 0x55615b0894bd in _start (${BUILD_ROOT}/postgres/bin/ysql_dump+0x10d4bd) [P-yb-controller-1] [P-yb-controller-1] 0x51d0002013e6 is located 358 bytes inside of 2048-byte region [0x51d000201280,0x51d000201a80) [P-yb-controller-1] freed by thread T0 here: [P-yb-controller-1] #0 0x55615b127196 in free ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3 [P-yb-controller-1] #1 0x7f3c02d65e85 in PQclear ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:755:3 [P-yb-controller-1] #2 0x55615b1c0103 in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19108:4 [P-yb-controller-1] #3 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3 [P-yb-controller-1] #4 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] #5 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] #6 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] #7 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) [P-yb-controller-1] [P-yb-controller-1] previously allocated by thread T0 here: [P-yb-controller-1] #0 0x55615b12742f in malloc ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3 [P-yb-controller-1] #1 0x7f3c02d680a7 in pqResultAlloc ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:633:28 [P-yb-controller-1] #2 0x7f3c02d81294 in getRowDescriptions ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:544:4 [P-yb-controller-1] #3 0x7f3c02d7f793 in pqParseInput3 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11 [P-yb-controller-1] #4 0x7f3c02d6bcc8 in parseInput 
${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2014:2 [P-yb-controller-1] #5 0x7f3c02d6bcc8 in PQgetResult ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2100:3 [P-yb-controller-1] #6 0x7f3c02d6cd87 in PQexecFinish ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2417:19 [P-yb-controller-1] #7 0x7f3c02d6cd87 in PQexec ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2256:9 [P-yb-controller-1] yugabyte#8 0x55615b1f45df in ExecuteSqlQuery ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:296:8 [P-yb-controller-1] yugabyte#9 0x55615b1f4213 in ExecuteSqlQueryForSingleRow ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:311:8 [P-yb-controller-1] yugabyte#10 0x55615b1c008d in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19102:10 [P-yb-controller-1] yugabyte#11 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3 [P-yb-controller-1] yugabyte#12 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] yugabyte#13 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] yugabyte#14 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] yugabyte#15 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) ``` This revision fixes the issue by using pg_strdup to make a copy of the string. Jira: DB-15915 Original commit: 7eea1de / D43386 Test Plan: ./yb_build.sh asan --cxx-test integration-tests_xcluster_ddl_replication-test --gtest_filter XClusterDDLReplicationTest.DDLReplicationTablesNotColocated Reviewers: aagrawal, skumar, mlillibridge, sergei Reviewed By: aagrawal Subscribers: yql, sergei Differential Revision: https://phorge.dev.yugabyte.com/D43418
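As a rough sketch of the pattern this fix addresses (the query and helper name here are illustrative, not the actual pg_dump code): `PQgetvalue` returns a pointer into memory owned by the `PGresult`, so the value must be copied before `PQclear` frees that storage.

```c
#include <libpq-fe.h>
#include "common/fe_memutils.h"   /* pg_strdup (frontend helper) */

/* Minimal sketch, assuming `conn` is an open libpq connection and the
 * query returns at least one row. */
static char *
fetch_tablegroup_name(PGconn *conn)
{
    PGresult *res = PQexec(conn, "SELECT grpname FROM pg_yb_tablegroup");

    /*
     * Buggy pattern: keeping the raw pointer dangles after PQclear():
     *     const char *tablegroup_name = PQgetvalue(res, 0, 0);
     */

    /* Fixed pattern: take a private copy first (pg_strdup exits on OOM). */
    char *tablegroup_name = pg_strdup(PQgetvalue(res, 0, 0));

    PQclear(res);            /* frees all storage owned by res */
    return tablegroup_name;  /* still valid: it is our own copy */
}
```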
jmeehan16
pushed a commit
that referenced
this pull request
Oct 21, 2025
…ssions.txt for postgres code
Summary:
Sometimes (possibly after the update to clang 19) the address sanitizer fails to symbolize the leak stack trace:
```
==85768==WARNING: Can't read from symbolizer at fd 5
==85768==WARNING: Can't read from symbolizer at fd 5
==85768==WARNING: Can't read from symbolizer at fd 5
==85768==WARNING: Can't read from symbolizer at fd 5
==85768==WARNING: Can't read from symbolizer at fd 5
#0 0x5580d4be8bba (${BUILD_ROOT}/postgres/bin/postgres+0xef0bba)
#1 0x5580d587cc4c (${BUILD_ROOT}/postgres/bin/postgres+0x1b84c4c)
#2 0x5580d55be5cc (${BUILD_ROOT}/postgres/bin/postgres+0x18c65cc)
#3 0x5580d55be171 (${BUILD_ROOT}/postgres/bin/postgres+0x18c6171)
#4 0x7fc9991e57e4 (/lib64/libc.so.6+0x3a7e4) (BuildId: 889235a2805b8308b2d0274921bbe1890e9a1986)
```
Although this stack trace actually contains a suppressed function (postmaster_strdup in this particular scenario), the symbolizer cannot obtain the function name, so the detected leak is reported and the test fails.
Replaced lsan-suppressions.txt entries with code annotations using `__lsan_ignore_object` to address this issue.
Also cleaned up lsan-suppressions.txt and resolved issue yugabyte#27496 by adding a missing call to YBCFreeStatus.
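As a rough illustration of the annotation approach (the helper below is hypothetical, not the actual postmaster code): `__lsan_ignore_object` from `<sanitizer/lsan_interface.h>` marks an allocation as intentionally kept, so LSan skips it even when symbolization fails and no lsan-suppressions.txt entry is needed.

```c
#include <stdlib.h>
#include <string.h>

#ifndef __has_feature
#define __has_feature(x) 0
#endif
#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
#include <sanitizer/lsan_interface.h>
#define LSAN_IGNORE(p) __lsan_ignore_object(p)
#else
#define LSAN_IGNORE(p) ((void) 0)
#endif

/* Hypothetical stand-in for an allocation (like postmaster_strdup's)
 * that is intentionally kept for the lifetime of the process. */
static char *
process_lifetime_strdup(const char *s)
{
    char *copy = strdup(s);
    LSAN_IGNORE(copy);  /* the code annotation replaces the suppression file */
    return copy;
}
```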
Jira: DB-17615
Test Plan: ./yb_build.sh asan --cxx-test pgwrapper_pg_wait_on_conflict-test --gtest_filter PgTabletSplittingWaitQueuesTest.SplitTablet -n 200 -- -p 8
Reviewers: dmitry, esheng, rthallam
Reviewed By: dmitry, esheng, rthallam
Subscribers: rthallam, ybase, yql
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D45426
jmeehan16
pushed a commit
that referenced
this pull request
Oct 21, 2025
…s closed in multi route pooling
Summary:
**Issue Summary**
A core dump was triggered during a ConnectionBurst stress test, with the crash occurring in the od_backend_close_connection function under multi-route pooling. The stack trace is as follows:
frame #0: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] mm_tls_free(io=0x0000000000000000) at tls.c:91:10
frame #1: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] machine_io_free(obj=0x0000000000000000) at io.c:201:2
frame #2: 0x00005601a627129e odyssey`od_backend_close_connection [inlined] od_io_close(io=0x000031f53e72b8b8) at io.h:77:2
frame #3: 0x00005601a627128c odyssey`od_backend_close_connection(server=0x000031f53e72b880) at backend.c:56:2
frame #4: 0x00005601a6250de5 odyssey`od_router_attach(router=0x00007fff00dbeb30, client_for_router=0x000031f53e5df180, wait_for_idle=<unavailable>, external_client=0x000031f53ee30680) at router.c:1010:6
frame #5: 0x00005601a6258b1b odyssey`od_auth_frontend [inlined] yb_execute_on_control_connection(client=0x000031f53ee30680, function=<unavailable>) at frontend.c:2842:11
frame #6: 0x00005601a6258b0b odyssey`od_auth_frontend(client=0x000031f53ee30680) at auth.c:677:8
frame #7: 0x00005601a626782e odyssey`od_frontend(arg=0x000031f53ee30680) at frontend.c:2539:8
frame yugabyte#8: 0x00005601a6290912 odyssey`mm_scheduler_main(arg=0x000031f53e390000) at scheduler.c:17:2
frame yugabyte#9: 0x00005601a6290b77 odyssey`mm_context_runner at context.c:28:2
**Root Cause**
The crash originated from an improper lock release in the yb_get_idle_server_to_close function, introduced in commit 55beeb0 during the multi-route pooling implementation. The function released the lock on the route object, despite a comment explicitly warning against it. After returning to its caller, no lock was held on the route or idle_route. This allowed other coroutines to access and use the same route and its idle server, which the original coroutine intended to close. This race condition led to a crash due to an assertion failure during connection closure.
**Note**
If the order of acquiring locks is the same across all threads or processes, differences in the release order alone cannot cause a deadlock. Deadlocks arise from circular dependencies during acquisition, not release.
In the connection manager code base:
Locks are acquired in the order: router → route. This order must be strictly enforced everywhere to prevent deadlocks.
Lock release order varies (e.g., router then route in od_router_route and yb_get_idle_server_to_close, versus the reverse elsewhere). This variation does not cause deadlocks, as release order is irrelevant to deadlock prevention.
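A schematic of that convention, using plain pthread mutexes purely for illustration (the connection manager itself uses machinarium coroutines, and these types are simplified stand-ins):

```c
#include <pthread.h>

typedef struct { pthread_mutex_t lock; /* ... */ } router_sketch_t;
typedef struct { pthread_mutex_t lock; /* ... */ } route_sketch_t;

/* Acquisition order is always router -> route; the release order below
 * intentionally differs to show that it cannot introduce a deadlock. */
static void
close_idle_server_sketch(router_sketch_t *router, route_sketch_t *route)
{
    pthread_mutex_lock(&router->lock);   /* 1st: router */
    pthread_mutex_lock(&route->lock);    /* 2nd: route  */

    /* ...pick the idle server and close it while the route lock is still
     * held, so no other coroutine can grab the same server... */

    pthread_mutex_unlock(&router->lock); /* release order is free */
    pthread_mutex_unlock(&route->lock);
}
```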
Jira: DB-17501
Test Plan: Jenkins: all tests
Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena
Reviewed By: skumar
Subscribers: svc_phabricator, yql
Differential Revision: https://phorge.dev.yugabyte.com/D45641
jmeehan16
pushed a commit
that referenced
this pull request
Oct 21, 2025
Summary: On running the connection burst test, the following core was generated:
```
(lldb) target create "/home/yugabyte/yb-software/yugabyte-2024.2.3.0-b116-centos-x86_64/bin/odyssey" --core "/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey"
Core file '/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey' (x86_64) was loaded.
(lldb) bt all
error: odyssey GetDIE for DIE 0x3c is outside of its CU 0x66d45
* thread #1, name = 'odyssey', stop reason = signal SIGSEGV
* frame #0: 0x0000564340e2cc6f odyssey`od_backend_connect(server=0x00005138fc5ef6c0, context="", route_params=0x0000000000000000, client=0x00005138ff7a2580) at backend.c:815:19
frame #1: 0x0000564340e2a80e odyssey`od_frontend_attach(client=0x00005138ff7a2580, context="", route_params=0x0000000000000000) at frontend.c:305:8
frame #2: 0x0000564340e26b11 odyssey`od_frontend_remote [inlined] od_frontend_attach_and_deploy(client=0x00005138ff7a2580, context=<unavailable>) at frontend.c:361:11
frame #3: 0x0000564340e26afe odyssey`od_frontend_remote(client=0x00005138ff7a2580) at frontend.c:2120:13
frame #4: 0x0000564340e22d65 odyssey`od_frontend(arg=0x00005138ff7a2580) at frontend.c:2756:12
frame #5: 0x0000564340e4b912 odyssey`mm_scheduler_main(arg=0x00005138fc218dc0) at scheduler.c:17:2
frame #6: 0x0000564340e4bb77 odyssey`mm_context_runner at context.c:28:2
```
This points to `storage = route->rule->storage;`, meaning the rule had already been set to NULL, which led to the above crash.

The root cause is a race condition in the object cleanup. The rule associated with a route was being de-referenced (unref) outside of a lock protecting the route object while cleaning up the route. This allows for a scenario where one thread could proceed to clean up the rule, while another thread simultaneously acquires a lock on the same route and attempts to use its rule pointer, which would now be a dangling pointer.

This diff moves the de-referencing of the rule object to a code block where a lock is already acquired on the route object. This change ensures atomic handling of the route and its associated rule, preventing any concurrent access to an invalid pointer.

Jira: DB-17729

Test Plan: Jenkins: all tests

Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena

Reviewed By: skumar

Subscribers: svc_phabricator, yql

Differential Revision: https://phorge.dev.yugabyte.com/D45583
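The shape of the fix can be sketched as follows (simplified stand-in types, not the actual odyssey code): the rule is unref'd while the route lock is still held, so no other thread can observe a dangling `route->rule`.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct { int refcount; /* ... */ } rule_sketch_t;
typedef struct {
    pthread_mutex_t lock;
    rule_sketch_t  *rule;
} route_sketch_t;

static void
rule_unref(rule_sketch_t *rule)
{
    if (--rule->refcount == 0)
        free(rule);
}

static void
route_cleanup_sketch(route_sketch_t *route)
{
    pthread_mutex_lock(&route->lock);
    rule_unref(route->rule);    /* previously done outside the lock */
    route->rule = NULL;
    pthread_mutex_unlock(&route->lock);
    /* ...remaining route teardown... */
}
```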
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…mp by using pg_strdup for tablegroup_name Summary: As part of D36859 / 0dbe7d6, backup and restore support for colocated tables when multiple tablespaces exist was introduced. Upon fetching the tablegroup_name from `pg_yb_tablegroup`, the value was read and assigned via `PQgetvalue` without copying. This led to a use-after-free bug when the tablegroup_name was later read in dumpTableSchema since the result from the SQL query is immediately cleared in the next line (`PQclear`). ``` [P-yb-controller-1] ==3037==ERROR: AddressSanitizer: heap-use-after-free on address 0x51d0002013e6 at pc 0x55615b0a1f92 bp 0x7fff92475970 sp 0x7fff92475118 [P-yb-controller-1] READ of size 8 at 0x51d0002013e6 thread T0 [P-yb-controller-1] #0 0x55615b0a1f91 in strcmp ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:470:5 [P-yb-controller-1] #1 0x55615b1b90ba in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15789:8 [P-yb-controller-1] #2 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] #3 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] #4 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] #5 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) [P-yb-controller-1] #6 0x55615b0894bd in _start (${BUILD_ROOT}/postgres/bin/ysql_dump+0x10d4bd) [P-yb-controller-1] [P-yb-controller-1] 0x51d0002013e6 is located 358 bytes inside of 2048-byte region [0x51d000201280,0x51d000201a80) [P-yb-controller-1] freed by thread T0 here: [P-yb-controller-1] #0 0x55615b127196 in free ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3 [P-yb-controller-1] #1 0x7f3c02d65e85 in PQclear ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:755:3 [P-yb-controller-1] #2 0x55615b1c0103 in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19108:4 [P-yb-controller-1] #3 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3 [P-yb-controller-1] #4 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] #5 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] #6 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] #7 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) [P-yb-controller-1] [P-yb-controller-1] previously allocated by thread T0 here: [P-yb-controller-1] #0 0x55615b12742f in malloc ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3 [P-yb-controller-1] #1 0x7f3c02d680a7 in pqResultAlloc ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:633:28 [P-yb-controller-1] #2 0x7f3c02d81294 in getRowDescriptions ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:544:4 [P-yb-controller-1] #3 0x7f3c02d7f793 in pqParseInput3 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11 [P-yb-controller-1] #4 0x7f3c02d6bcc8 in parseInput ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2014:2 [P-yb-controller-1] #5 0x7f3c02d6bcc8 in PQgetResult 
${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2100:3 [P-yb-controller-1] #6 0x7f3c02d6cd87 in PQexecFinish ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2417:19 [P-yb-controller-1] #7 0x7f3c02d6cd87 in PQexec ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2256:9 [P-yb-controller-1] yugabyte#8 0x55615b1f45df in ExecuteSqlQuery ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:296:8 [P-yb-controller-1] yugabyte#9 0x55615b1f4213 in ExecuteSqlQueryForSingleRow ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:311:8 [P-yb-controller-1] yugabyte#10 0x55615b1c008d in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19102:10 [P-yb-controller-1] yugabyte#11 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3 [P-yb-controller-1] yugabyte#12 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4 [P-yb-controller-1] yugabyte#13 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4 [P-yb-controller-1] yugabyte#14 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3 [P-yb-controller-1] yugabyte#15 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802) ``` This revision fixes the issue by using pg_strdup to make a copy of the string. Jira: DB-15915 Original commit: 7eea1de / D43386 Test Plan: ./yb_build.sh asan --cxx-test integration-tests_xcluster_ddl_replication-test --gtest_filter XClusterDDLReplicationTest.DDLReplicationTablesNotColocated Reviewers: aagrawal, skumar, mlillibridge, sergei Reviewed By: aagrawal Subscribers: yql, sergei Differential Revision: https://phorge.dev.yugabyte.com/D43421
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…nge code
Summary:
Running Java test TestPgRegressPgTypesUDT on ASAN fails with
ts1|pid74720|:24411 ${YB_SRC_ROOT}/src/postgres/src/include/lib/sort_template.h:301:15: runtime error: applying non-zero offset 8 to null pointer
ts1|pid74720|:24411 #0 0x55eb30c5c61b in qsort_arg ${YB_SRC_ROOT}/src/postgres/src/include/lib/sort_template.h:301:15
ts1|pid74720|:24411 #1 0x55eb30883c9a in multirange_canonicalize ${YB_SRC_ROOT}/src/postgres/src/backend/utils/adt/multirangetypes.c:481:2
ts1|pid74720|:24411 #2 0x55eb30883c9a in make_multirange ${YB_SRC_ROOT}/src/postgres/src/backend/utils/adt/multirangetypes.c:648:16
ts1|pid74720|:24411 #3 0x55eb2ffd01db in ExecInterpExpr ${YB_SRC_ROOT}/src/postgres/src/backend/executor/execExprInterp.c:731:8
...
ts1|pid74720|:24411 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ${YB_SRC_ROOT}/src/postgres/src/include/lib/sort_template.h:301:15
ts1|pid74720|:24411 2025-05-30 04:15:01.687 UTC [74992] WARNING: server process (PID 76254) exited with exit code 1
ts1|pid74720|:24411 2025-05-30 04:15:01.687 UTC [74992] DETAIL: Failed process was running: select textmultirange();
pg_regress|pid75085|stdout test yb.port.multirangetypes ... FAILED (test process exited with exit code 2) 1615 ms
The issue is that multirange_constructor0 passes NULL, which flows all
the way down to qsort_arg, and qsort_arg attempts pointer arithmetic on
that NULL. Fix by returning early before that.
This fix may affect other cases besides multirange, since qsort_arg is
used in several places; but given that no issues have been reported until
now, it is perhaps not possible to pass NULL to qsort_arg through those
call sites.
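A generic illustration of the guard, using the standard qsort() in place of PostgreSQL's qsort_arg():

```c
#include <stdlib.h>

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

/* With base == NULL, the sort's pointer arithmetic is undefined behavior
 * even when count == 0, so return before reaching the sort. */
static void
sort_ints(int *base, size_t count)
{
    if (base == NULL || count == 0)
        return;                 /* early return, mirroring the fix */
    qsort(base, count, sizeof(int), cmp_int);
}
```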
Jira: DB-16985
Test Plan:
On Almalinux 8:
./yb_build.sh asan daemons initdb \
--java-test 'org.yb.pgsql.TestPgRegressPgTypesUDT#schedule'
Close: yugabyte#27447
Original commit: 45c49e6 / D44464
Reviewers: telgersma
Reviewed By: telgersma
Differential Revision: https://phorge.dev.yugabyte.com/D44545
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…acks in object lock/release functions at TabletService Summary: Original commit: 790195b / D44663 In functions `TabletServiceImpl::AcquireObjectLocks` and `TabletServiceImpl::ReleaseObjectLocks`, we weren't returning after executing the rpc callback when initial validation steps failed. This led to segv issues like the one below ``` * thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV * frame #0: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined] std::__1::unique_ptr<yb::tserver::TSLocalLockManager::Impl, std::__1::default_delete<yb::tserver::TSLocalLockManager::Impl>>::operator->[abi:ne190100](this=0x0000000000000000) const at unique_ptr.h:272:108 frame #1: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) [inlined] yb::tserver::TSLocalLockManager::AcquireObjectLocksAsync(this=0x0000000000000000, req=0x00005001bfffa290, deadline=yb::CoarseTimePoint @ x23, callback=0x0000ffefb6066560, wait=(value_ = true)) at ts_local_lock_manager.cc:541:3 frame #2: 0x0000aaaac351e5f0 yb-tserver`yb::tserver::TabletServiceImpl::AcquireObjectLocks(this=0x00005001bdaf6020, req=0x00005001bfffa290, resp=0x00005001bfffa300, context=<unavailable>) at tablet_service.cc:3673:26 frame #3: 0x0000aaaac36bd9a0 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::tserver::TabletServerServiceIf::InitMethods(this=<unavailable>, req=0x00005001bfffa290, resp=0x00005001bfffa300, rpc_context=RpcContext @ 0x0000ffefb6066600)::$_36::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext)::operator()(yb::tserver::AcquireObjectLockRequestPB const*, yb::tserver::AcquireObjectLockResponsePB*, yb::rpc::RpcContext) const at tserver_service.service.cc:1470:9 frame #4: 0x0000aaaac36bd978 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) at local_call.h:126:7 frame #5: 0x0000aaaac36bd680 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36::operator()(this=<unavailable>, call=<unavailable>) const at tserver_service.service.cc:1468:7 frame #6: 0x0000aaaac36bd5c8 yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36,
std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] decltype(std::declval<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&>()(std::declval<std::__1::shared_ptr<yb::rpc::InboundCall>>())) std::__1::__invoke[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__f=<unavailable>, __args=<unavailable>) at invoke.h:149:25 frame #7: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne190100]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36&, std::__1::shared_ptr<yb::rpc::InboundCall>>(__args=<unavailable>, __args=<unavailable>) at invoke.h:224:5 frame yugabyte#8: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] std::__1::__function::__alloc_func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __arg=<unavailable>) at function.h:171:12 frame yugabyte#9: 0x0000aaaac36bd5bc yb-tserver`std::__1::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36, std::__1::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_36>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=<unavailable>) at function.h:313:10 frame yugabyte#10: 0x0000aaaac36d1384 yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::__function::__value_func<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ne190100](this=<unavailable>, __args=nullptr) const at function.h:430:12 frame yugabyte#11: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::function<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(this=<unavailable>, __arg=nullptr) const at function.h:989:10 frame yugabyte#12: 0x0000aaaac36d136c yb-tserver`yb::tserver::TabletServerServiceIf::Handle(this=<unavailable>, call=<unavailable>) at tserver_service.service.cc:913:3 frame yugabyte#13: 0x0000aaaac30e05b4 yb-tserver`yb::rpc::ServicePoolImpl::Handle(this=0x00005001bff9b8c0, incoming=nullptr) at service_pool.cc:275:19 frame yugabyte#14: 0x0000aaaac3006ed0 
yb-tserver`yb::rpc::InboundCall::InboundCallTask::Run(this=<unavailable>) at inbound_call.cc:309:13 frame yugabyte#15: 0x0000aaaac30ec868 yb-tserver`yb::rpc::(anonymous namespace)::Worker::Execute(this=0x00005001bff5c640, task=0x00005001bfdf1958) at thread_pool.cc:138:13 frame yugabyte#16: 0x0000aaaac39afd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x00005001bfe1e750) const at function.h:430:12 frame yugabyte#17: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x00005001bfe1e750) const at function.h:989:10 frame yugabyte#18: 0x0000aaaac39afd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x00005001bfe1e6e0) at thread.cc:937:3 ``` This revision addresses the issue by returning after executing the rpc callback with validation failure status. Jira: DB-17124 Test Plan: Jenkins Reviewers: rthallam, amitanand, #db-approvers Reviewed By: rthallam, #db-approvers Subscribers: svc_phabricator, ybase Differential Revision: https://phorge.dev.yugabyte.com/D44684
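Schematically, the missing return looks like the following (stub types and names for illustration, not the actual tablet service code):

```c
#include <stdbool.h>

typedef struct { bool ok; } status_sketch_t;
typedef struct req_sketch req_sketch_t;
typedef struct resp_sketch resp_sketch_t;
typedef struct rpc_ctx_sketch rpc_ctx_sketch_t;

status_sketch_t validate_request(const req_sketch_t *req);
void respond_with_failure(rpc_ctx_sketch_t *ctx, status_sketch_t s);
void acquire_locks_async(const req_sketch_t *req, resp_sketch_t *resp,
                         rpc_ctx_sketch_t *ctx);

void
handle_acquire_object_locks(const req_sketch_t *req, resp_sketch_t *resp,
                            rpc_ctx_sketch_t *ctx)
{
    status_sketch_t s = validate_request(req);
    if (!s.ok) {
        respond_with_failure(ctx, s);
        return;  /* the fix: without this, execution fell through and
                  * dereferenced a lock manager that may be null */
    }
    acquire_locks_async(req, resp, ctx);
}
```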
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…adpool once shutdown flags are set at ObjectLockManager Summary: Original commit: f5197a2 / D44662 In context of object locking, commit 6e80c56 / D44228 got rid of logic that signaled obsolete waiters corresponding to transactions that issued a release all locks request (could have been terminated to failures like timeout, deadlock etc) in order to early terminate failed waiting requests. Hence, now we let the obsolete requests terminate organically from the OLM resumed by the poller thread that runs at an interval of `olm_poll_interval_ms` (defaults to 100ms). This led to one of the itests failing with the below stack ``` * thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV: address not mapped to object * frame #0: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(std::__1::function<void ()>) [inlined] yb::ThreadPoolToken::Submit(this=<unavailable>, r=<unavailable>) at threadpool.cc:146:10 frame #1: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(this=0x0000000000000000, f=<unavailable>) at threadpool.cc:142:10 frame #2: 0x0000aaaac73cdfe8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoSignal(this=0x00003342bfa0d400, entry=<unavailable>) at object_lock_manager.cc:767:3 frame #3: 0x0000aaaac73cc7c0 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>, yb::docdb::LockData&&, yb::StronglyTypedBool<yb::docdb::(anonymous namespace)::IsLockRetry_Tag>, unsigned long, yb::Status) [inlined] yb::docdb::ObjectLockManagerImpl::PrepareAcquire(this=0x00003342bfa0d400, txn_lock=<unavailable>, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, resume_it_offset=<unavailable>, resume_with_status=<unavailable>) at object_lock_manager.cc:523:5 frame #4: 0x0000aaaac73cc6a8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(this=0x00003342bfa0d400, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, is_retry=(value_ = true), resume_it_offset=<unavailable>, resume_with_status=Status @ 0x0000ffefaa036658) at object_lock_manager.cc:552:27 frame #5: 0x0000aaaac73cbcb4 yb-tserver`yb::docdb::WaiterEntry::Resume(this=0x00003342b9a6a820, lock_manager=0x00003342bfa0d400, resume_with_status=<unavailable>) at object_lock_manager.cc:381:17 frame #6: 0x0000aaaac85bdd4c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() at object_lock_manager.cc:752:13 frame #7: 0x0000aaaac85bda74 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::docdb::ObjectLockManager::Shutdown(this=<unavailable>) at object_lock_manager.cc:1092:10 frame yugabyte#8: 0x0000aaaac85bda6c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::tserver::TSLocalLockManager::Impl::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:411:26 frame yugabyte#9: 0x0000aaaac85bd7e8 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:566:10 frame yugabyte#10: 0x0000aaaac8665a34 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ResetAndGetTSLocalLockManager(this=0x000033423fc1ad80) at tablet_server.cc:797:28 frame yugabyte#11: 0x0000aaaac8665a18 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ProcessLeaseUpdate(this=0x000033423fc1ad80, lease_refresh_info=0x000033423a476b80) 
at tablet_server.cc:828:22 frame yugabyte#12: 0x0000aaaac8665950 yb-tserver`yb::tserver::YsqlLeasePoller::Poll(this=<unavailable>) at ysql_lease_poller.cc:143:18 frame yugabyte#13: 0x0000aaaac8438d58 yb-tserver`yb::tserver::MasterLeaderPollScheduler::Impl::Run(this=0x000033423ff5cc80) at master_leader_poller.cc:125:25 frame yugabyte#14: 0x0000aaaac89ffd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x000033423ffc7930) const at function.h:430:12 frame yugabyte#15: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000033423ffc7930) const at function.h:989:10 frame yugabyte#16: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x000033423ffc78c0) at thread.cc:937:3 frame yugabyte#17: 0x0000ffffac0378b8 libpthread.so.0`start_thread + 392 frame yugabyte#18: 0x0000ffffac093afc libc.so.6`thread_start + 12 ``` This is due to accessing the unique_ptr `thread_pool_token_` after it has been reset. This revision fixes the issue by not scheduling any tasks on the threadpool once the shutdown flag has been set (hence not accessing `thread_pool_token_`). Since we wait for in-progress requests at the OLM and also for in-progress resume tasks scheduled on the messenger using `waiters_amidst_resumption_on_messenger_`, it is safe to say that `thread_pool_token_` would not be accessed once it is reset. Jira: DB-17121 Test Plan: Jenkins ./yb_build.sh --cxx-test='TEST_F(PgObjectLocksTestRF1, TestShutdownWithWaiters) {' Reviewers: rthallam, amitanand, sergei Reviewed By: rthallam Subscribers: yql, ybase Differential Revision: https://phorge.dev.yugabyte.com/D44728
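The fix can be sketched with an atomic shutdown flag (simplified; the real code additionally waits out in-flight requests and resume tasks as described above):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool shutting_down;

void submit_to_thread_pool(void (*task)(void *), void *arg);  /* stub */

/* Refuse to schedule once shutdown has begun, so the (possibly reset)
 * thread pool token is never touched afterwards. */
static bool
try_schedule_resume(void (*task)(void *), void *arg)
{
    if (atomic_load(&shutting_down))
        return false;               /* token may already be gone */
    submit_to_thread_pool(task, arg);
    return true;
}
```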
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…f docdb txn in case of abort when object locking feature is enabled

Summary: Original commit: 7d02f94 / D44857

Timed out requests at the Object Lock Manager are resumed organically by the poller thread once the deadline is exceeded. So a ysql statement could time out, execute a finish transaction rpc, and initiate a new ysql transaction before the poller resumes the obsolete timed-out lock waiter. Given that we try re-using docdb transactions wherever possible when object locking is enabled, the above in combination with transaction re-use could lead to undesired issues.

Here's how the OLM works in brief:
1. While serving new incoming lock requests, a shared `TrackedTransactionLockEntry` is created if one doesn't exist, keyed against the txn id and stored in `txn_locks_`.
2. `PrepareAcquire` is executed, which validates whether the lock request should be tried.
3. The request moves on to acquire the lock if available, else it enters the wait queue.
4. When waiters are resumed, go to step 2.
5. When serving a release-all request, remove the entry from `txn_locks_`, release acquired locks, and let obsolete waiting locks be resumed by the poller thread.

Here's the brief working of the object lock tracker code:
1. For incoming lock requests, instrument the lock in the corresponding map keyed against `<txn, subtxn>` with state set to `WAITING` and invoke the OLM.
2. When the OLM executes the lock callback, tap into it, try finding the map with key `<txn, subtxn>`, and if it exists, change the state of the lock entry in the map accordingly.

Consider the following scenario:
1. ysql starts a read-only `ysql_txnA` and issues a lock request. This is associated with `docdb_txnA`, and the lock request is forwarded to the OLM.
2. The lock request enters the wait-queue.
3. YSQL detects a timeout, cancels the request, and issues a finish txn call. `docdb_txnA` doesn't get consumed since it didn't write any intents, nor did the txn itself fail (failed heartbeats). The OLM erases the `TrackedTransactionLockEntry` keyed against `docdb_txnA`.
4. YSQL starts a new `ysql_txnB` and issues a lock request. docdb re-uses `docdb_txnA` and issues the lock request; the OLM creates a new entry for the same transaction id and moves on to acquire the lock.
5. The poller might now realize that the earlier waiting request timed out, and try resuming it.

This results in issues with observability, since at some point the OLM has state corresponding to different ysql transactions stored under the same docdb transaction id. As a consequence, it results in a segv with the lock tracking code in the above scenario as follows:
- Step 1 creates a map for key `<docdb_txnA, 1>`, and inserts `key(lock)`.
- Step 3 erases the map for key `<docdb_txnA, 1>`.
- Step 4 creates a new map for key `<docdb_txnA, 1>`, and inserts `key(lock_new)`.
- Step 5 tries to access entry `key(lock)` in the new map, which doesn't exist, resulting in a segv ``` * thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV * frame #0: 0x0000aaaaead1b198 yb-tserver`std::__1::__function::__func<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status), std::__1::allocator<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)>, void (yb::Status const&)>::operator()(yb::Status const&) [inlined] yb::tserver::ObjectLockTracker::UntrackLock(this=<unavailable>, lock_context=0x000010f1bbe20800) at ts_local_lock_manager.cc:134:25 frame #1: 0x0000aaaaead1b0c8 yb-tserver`std::__1::__function::__func<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status), std::__1::allocator<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)>, void (yb::Status const&)>::operator()(yb::Status const&) at ts_local_lock_manager.cc:104:7 frame #2: 0x0000aaaaead1b0b4 yb-tserver`std::__1::__function::__func<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status), std::__1::allocator<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)>, void (yb::Status const&)>::operator()(yb::Status const&) [inlined] yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(this=0x000010f1b5ad4cd0, status=Status @ 0x0000ffef78896640)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)::operator()(yb::Status) const at ts_local_lock_manager.cc:324:46 frame #3: 0x0000aaaaead1b034 yb-tserver`std::__1::__function::__func<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l,
1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status), std::__1::allocator<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)>, void (yb::Status const&)>::operator()(yb::Status const&) [inlined] decltype(std::declval<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)&>()(std::declval<yb::Status const&>())) std::__1::__invoke[abi:ne190100]<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)&, yb::Status const&>(__f=<unavailable>, __args=<unavailable>) at invoke.h:149:25 frame #4: 0x0000aaaaead1b01c yb-tserver`std::__1::__function::__func<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status), std::__1::allocator<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)>, void (yb::Status const&)>::operator()(yb::Status const&) [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne190100]<yb::tserver::TSLocalLockManager::Impl::PrepareAndExecuteAcquire(yb::tserver::AcquireObjectLockRequestPB const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, std::__1::function<void (yb::Status const&)>&, yb::StronglyTypedBool<yb::tserver::WaitForBootstrap_Tag>)::'lambda'(yb::Status)&, yb::Status const&>(__args=<unavailable>, __args=<unavailable>) at invoke.h:224:5 ``` This revision address the issue by burning/consuming the docdb transaction in case of YSQL transaction abort. This would force the `ysql_txnB` above to be associated with a new `docdb_txnB` and wouldn't lead to any observability issues/mixed state. Note that there isn't any problem with reusing docdb transactions across ysql read-only transactions that move on to commit successfully. 
This is because if the ysql transaction moves on to commit, it implies that all the object locks were granted => no waiting object locks, and hence all corresponding entries of the respective docdb transaction would be released at the OLM in-line with the commit. Jira: DB-17123 Test Plan: Jenkins ./yb_build.sh --cxx-test pg_object_locks-test --gtest_filter PgObjectLocksTestRF1.TestDisableReuseAbortedPlainTxn ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressPgAuth#schedule' Reviewers: #db-approvers, amitanand, yyan, rthallam Reviewed By: amitanand Subscribers: yql, ybase, svc_phabricator Differential Revision: https://phorge.dev.yugabyte.com/D45004
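The resulting reuse rule can be summarized in a small predicate (an illustrative sketch, not the real transaction API):

```c
#include <stdbool.h>

typedef struct {
    bool aborted;        /* the ysql transaction ended in an abort */
    bool wrote_intents;  /* the docdb transaction produced intents */
} txn_state_sketch_t;

/* After this fix, an aborted ysql transaction always burns its docdb
 * transaction, so only cleanly committed, intent-free (read-only)
 * transactions leave their docdb transaction eligible for reuse. */
static bool
docdb_txn_reusable(const txn_state_sketch_t *t)
{
    return !t->aborted && !t->wrote_intents;
}
```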
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…e to stack-overflow during index backfill. Summary: In the last few weeks we have seen few instances of the stress test (with various nemesis) run into a master crash caused by a stack trace that looks like: ``` * thread #1, name = 'yb-master', stop reason = signal SIGSEGV: invalid address * frame #0: 0x0000aaaad52f5fc4 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::shared_ptr<yb::master::BackfillTablet>::shared_ptr[abi:ue170006]<yb::master::BackfillTablet, void>(this=<unavailable>, __r=std::__1:: weak_ptr<yb::master::BackfillTablet>::element_type @ 0x000013e4bf787778) at shared_ptr.h:701:20 frame #1: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] std::__1::enable_shared_from_this<yb::master::BackfillTablet>::shared_from_this[abi:ue170006](this=0x000013e4bf787778) at shared_ptr.h:1954:17 frame #2: 0x0000aaaad52f5fbc yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=0x000013e4bf787778) at backfill_index.cc:1300:50 frame #3: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323: 10 frame #4: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4d458) at backfill_index.cc:1620:5 frame #5: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:470:3 frame #6: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4d458) at async_rpc_tasks.cc:273:5 frame #7: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4d458) at backfill_index.cc:1463:19 frame yugabyte#8: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19 frame yugabyte#9: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc:1323: 10 frame yugabyte#10: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cd98) at backfill_index.cc:1620:5 frame yugabyte#11: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:470:3 frame yugabyte#12: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cd98) at async_rpc_tasks.cc:273:5 frame yugabyte#13: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cd98) at backfill_index.cc:1463:19 frame yugabyte#14: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19 frame yugabyte#15: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc: 1323:10 frame yugabyte#16: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bbd4cfd8) at backfill_index.cc:1620:5 frame yugabyte#17: 0x0000aaaad52be9e0 
yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:470:3 frame yugabyte#18: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bbd4cfd8) at async_rpc_tasks.cc:273:5 frame yugabyte#19: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bbd4cfd8) at backfill_index.cc:1463:19 frame yugabyte#20: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19 frame yugabyte#21: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc: 1323:10 ... frame yugabyte#2452: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4bdc7ed98) at backfill_index.cc:1620:5 frame yugabyte#2453: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:470:3 frame yugabyte#2454: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4bdc7ed98) at async_rpc_tasks.cc:273:5 frame yugabyte#2455: 0x0000aaaad52f63f0 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone() [inlined] yb::master::BackfillChunk::Launch(this=0x000013e4bdc7ed98) at backfill_index.cc:1463:19 frame yugabyte#2456: 0x0000aaaad52f6324 yb-master`yb::master::BackfillTablet::LaunchNextChunkOrDone(this=<unavailable>) at backfill_index.cc:1303:19 frame yugabyte#2457: 0x0000aaaad52fb0d4 yb-master`yb::master::BackfillTablet::Done(this=0x000013e4bf787778, status=<unavailable>, backfilled_until=<unavailable>, number_rows_processed=<unavailable>, failed_indexes=<unavailable>) at backfill_index.cc: 1323:10 frame yugabyte#2458: 0x0000aaaad52f9dd8 yb-master`yb::master::BackfillChunk::UnregisterAsyncTaskCallback(this=0x000013e4ba1ff458) at backfill_index.cc:1620:5 frame yugabyte#2459: 0x0000aaaad52be9e0 yb-master`yb::master::RetryingRpcTask::UnregisterAsyncTask(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:470:3 frame yugabyte#2460: 0x0000aaaad52bd4d8 yb-master`yb::master::RetryingRpcTask::Run(this=0x000013e4ba1ff458) at async_rpc_tasks.cc:273:5 frame yugabyte#2461: 0x0000aaaad52c0260 yb-master`yb::master::RetryingRpcTask::RunDelayedTask(this=0x000013e4ba1ff458, status=0x0000ffffab2668c0) at async_rpc_tasks.cc:432:14 frame yugabyte#2462: 0x0000aaaad5c3f838 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] boost::function1<void, yb::Status const&>::operator()(this=0x000013e4bff63b18, a0=0x0000ffffab2668c0) const at function_template.hpp:763:14 frame yugabyte#2463: 0x0000aaaad5c3f81c yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(ev_loop*, ev_timer*, int) [inlined] yb::rpc::DelayedTask:: TimerHandler(this=0x000013e4bff63ae8, watcher=<unavailable>, revents=<unavailable>) at delayed_task.cc:155:5 frame yugabyte#2464: 0x0000aaaad5c3f284 yb-master`void ev::base<ev_timer, ev::timer>::method_thunk<yb::rpc::DelayedTask, &yb::rpc::DelayedTask::TimerHandler(ev::timer&, int)>(loop=<unavailable>, w=<unavailable>, revents=<unavailable>) at ev++.h:479:7 frame yugabyte#2465: 0x0000aaaad4cdf170 yb-master`ev_invoke_pending + 112 frame 
yugabyte#2466: 0x0000aaaad4ce21fc yb-master`ev_run + 2940 frame yugabyte#2467: 0x0000aaaad5c725fc yb-master`yb::rpc::Reactor::RunThread() [inlined] ev::loop_ref::run(this=0x000013e4bfcfadf8, flags=0) at ev++.h:211:7 frame yugabyte#2468: 0x0000aaaad5c725f4 yb-master`yb::rpc::Reactor::RunThread(this=0x000013e4bfcfadc0) at reactor.cc:735:9 frame yugabyte#2469: 0x0000aaaad65c61d8 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ue170006](this=0x000013e4bfeffa80) const at function.h:517:16 frame yugabyte#2470: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000013e4bfeffa80) const at function.h:1168:12 frame yugabyte#2471: 0x0000aaaad65c61c4 yb-master`yb::Thread::SuperviseThread(arg=0x000013e4bfeffa20) at thread.cc:895:3 ``` Essentially, a BackfillChunk is considered done (without sending out an RPC) and launches the next BackfillChunk; which does the same. This may happen if `BackfillTable::indexes_to_build()` is empty, or if the `backfill_jobs()` is empty. However, based on the code reading we should only get there, ** after ** marking `BackfillTable::done_` as `true`. If for some reason, we have `indexes_to_build()` as `empty` and `BackfillTable::done_ == false`, we could get into this infinite recursion. Since I am unable to explain and recreate how this happens, I'm adding a test flag `TEST_simulate_empty_indexes` to repro this. Fix: We update `BackfillChunk::SendRequest` to handle the empty `indexes_to_build()` as a failure rather than treating this as a success. This prevents the infinite recursion. Also, adding a few log lines that may help better understand the scenario if we run into this again. Jira: DB-17296 Original commit: 5d402b5 / D45031 Test Plan: yb_build.sh fastdebug --cxx-test pg_index_backfill-test --gtest_filter *.SimulateEmptyIndexesForStackOverflow* Reviewers: zdrudi, rthallam, jason, #db-approvers Reviewed By: rthallam Subscribers: svc_phabricator, yql, ybase Differential Revision: https://phorge.dev.yugabyte.com/D45138
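Schematically (stub types for illustration, not the actual master code), the fix turns the empty-index case into a failure so the mutual Launch/Done recursion terminates:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool ok; } status_sketch_t;
typedef struct { size_t num_indexes_to_build; /* ... */ } chunk_sketch_t;

status_sketch_t send_backfill_rpc(chunk_sketch_t *chunk);  /* stub */

static status_sketch_t
send_request_sketch(chunk_sketch_t *chunk)
{
    if (chunk->num_indexes_to_build == 0) {
        /* Previously treated as success, which made Done() launch the
         * next chunk immediately and recurse without bound. */
        return (status_sketch_t) { .ok = false };
    }
    return send_backfill_rpc(chunk);
}
```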
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…cking the route

Summary: On running the connection burst test, the following core was generated:
```
(lldb) target create "/home/yugabyte/yb-software/yugabyte-2024.2.3.0-b116-centos-x86_64/bin/odyssey" --core "/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey"
Core file '/home/yugabyte/cores/core_41219_1752696376_!home!yugabyte!yb-software!yugabyte-2024.2.3.0-b116-centos-x86_64!bin!odyssey' (x86_64) was loaded.
(lldb) bt all
error: odyssey GetDIE for DIE 0x3c is outside of its CU 0x66d45
* thread #1, name = 'odyssey', stop reason = signal SIGSEGV
* frame #0: 0x0000564340e2cc6f odyssey`od_backend_connect(server=0x00005138fc5ef6c0, context="", route_params=0x0000000000000000, client=0x00005138ff7a2580) at backend.c:815:19
frame #1: 0x0000564340e2a80e odyssey`od_frontend_attach(client=0x00005138ff7a2580, context="", route_params=0x0000000000000000) at frontend.c:305:8
frame #2: 0x0000564340e26b11 odyssey`od_frontend_remote [inlined] od_frontend_attach_and_deploy(client=0x00005138ff7a2580, context=<unavailable>) at frontend.c:361:11
frame #3: 0x0000564340e26afe odyssey`od_frontend_remote(client=0x00005138ff7a2580) at frontend.c:2120:13
frame #4: 0x0000564340e22d65 odyssey`od_frontend(arg=0x00005138ff7a2580) at frontend.c:2756:12
frame #5: 0x0000564340e4b912 odyssey`mm_scheduler_main(arg=0x00005138fc218dc0) at scheduler.c:17:2
frame #6: 0x0000564340e4bb77 odyssey`mm_context_runner at context.c:28:2
```
This points to `storage = route->rule->storage;`, meaning the rule had already been set to NULL, which led to the above crash.

The root cause is a race condition in the object cleanup. The rule associated with a route was being de-referenced (unref) outside of a lock protecting the route object while cleaning up the route. This allows for a scenario where one thread could proceed to clean up the rule, while another thread simultaneously acquires a lock on the same route and attempts to use its rule pointer, which would now be a dangling pointer.

This diff moves the de-referencing of the rule object to a code block where a lock is already acquired on the route object. This change ensures atomic handling of the route and its associated rule, preventing any concurrent access to an invalid pointer.

Original commit: None / D45583

Jira: DB-17729

Test Plan: Jenkins: all tests

Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena

Reviewed By: skumar

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D45653
kgalieva
pushed a commit
that referenced
this pull request
Nov 6, 2025
…te until server is closed in multi route pooling
Summary:
**Issue Summary**
A core dump was triggered during a ConnectionBurst stress test, with the crash occurring in the od_backend_close_connection function under multi-route pooling. The stack trace is as follows:
frame #0: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] mm_tls_free(io=0x0000000000000000) at tls.c:91:10
frame #1: 0x00005601a62712bc odyssey`od_backend_close_connection [inlined] machine_io_free(obj=0x0000000000000000) at io.c:201:2
frame #2: 0x00005601a627129e odyssey`od_backend_close_connection [inlined] od_io_close(io=0x000031f53e72b8b8) at io.h:77:2
frame #3: 0x00005601a627128c odyssey`od_backend_close_connection(server=0x000031f53e72b880) at backend.c:56:2
frame #4: 0x00005601a6250de5 odyssey`od_router_attach(router=0x00007fff00dbeb30, client_for_router=0x000031f53e5df180, wait_for_idle=<unavailable>, external_client=0x000031f53ee30680) at router.c:1010:6
frame #5: 0x00005601a6258b1b odyssey`od_auth_frontend [inlined] yb_execute_on_control_connection(client=0x000031f53ee30680, function=<unavailable>) at frontend.c:2842:11
frame #6: 0x00005601a6258b0b odyssey`od_auth_frontend(client=0x000031f53ee30680) at auth.c:677:8
frame #7: 0x00005601a626782e odyssey`od_frontend(arg=0x000031f53ee30680) at frontend.c:2539:8
frame yugabyte#8: 0x00005601a6290912 odyssey`mm_scheduler_main(arg=0x000031f53e390000) at scheduler.c:17:2
frame yugabyte#9: 0x00005601a6290b77 odyssey`mm_context_runner at context.c:28:2
**Root Cause**
The crash originated from an improper lock release in the yb_get_idle_server_to_close function, introduced in commit 55beeb0 during the multi-route pooling implementation. The function released the lock on the route object despite a comment explicitly warning against it. After returning to its caller, no lock was held on the route or idle_route. This allowed other coroutines to access and use the same route and its idle server, which the original coroutine intended to close. This race condition led to a crash due to an assertion failure during connection closure. A sketch of the corrected discipline follows.
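A minimal C sketch of that corrected discipline, with hypothetical types and names rather than the real Odyssey code: the route lock is held from the moment an idle server is selected until it has been unlinked from the pool, so no other coroutine can pick up the same server before it is closed.
```
/* A minimal sketch, assuming hypothetical types: the idle server is
 * selected and unlinked atomically under the route lock, so once the
 * function returns the caller owns the server exclusively. */
#include <pthread.h>
#include <stddef.h>

typedef struct server server_t;
struct server { server_t *next; };

typedef struct {
    pthread_mutex_t lock;
    server_t *idle;           /* head of the idle-server list */
} route_t;

/* Returns the detached server (now privately owned by the caller),
 * or NULL if the route has no idle servers. */
static server_t *route_take_idle_server(route_t *route)
{
    pthread_mutex_lock(&route->lock);
    server_t *srv = route->idle;
    if (srv != NULL)
        route->idle = srv->next;   /* unlink while still holding the lock */
    pthread_mutex_unlock(&route->lock);
    return srv;                    /* safe to close: no longer reachable */
}
```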
**Note**
If the order of acquiring locks is the same across all threads or processes, differences in the release order alone cannot cause a deadlock. Deadlocks arise from circular dependencies during acquisition, not release.
In the connection manager code base:
- Locks are acquired in the order router → route. This order must be strictly enforced everywhere to prevent deadlocks.
- Lock release order varies (e.g., router then route in od_router_route and yb_get_idle_server_to_close, versus the reverse elsewhere). This variation does not cause deadlocks, as release order is irrelevant to deadlock prevention. A short sketch of this acquisition discipline follows.
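The sketch below illustrates the invariant with stand-in router/route types (not the real Odyssey structures): both locks are always taken in the same order, so no circular wait can form, while the unlock order is free to vary.
```
/* A minimal sketch of the acquisition invariant described above,
 * using hypothetical stand-in types: every thread takes router before
 * route, so a circular wait is impossible. */
#include <pthread.h>

typedef struct { pthread_mutex_t lock; } router_t;
typedef struct { pthread_mutex_t lock; } route_t;

static void attach_client(router_t *router, route_t *route)
{
    pthread_mutex_lock(&router->lock);   /* 1st: router */
    pthread_mutex_lock(&route->lock);    /* 2nd: route  */

    /* ... work on both objects ... */

    /* Either unlock order is safe; only the acquisition order matters. */
    pthread_mutex_unlock(&router->lock);
    pthread_mutex_unlock(&route->lock);
}
```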
Original commit: None / D45641
Jira: DB-17501
Test Plan: Jenkins: all tests
Reviewers: skumar, vikram.damle, asrinivasan, arpit.saxena
Reviewed By: skumar
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D45657
cameron-p-m
pushed a commit
that referenced
this pull request
Nov 26, 2025
Summary: The stacktrace of the core dump:
```
(lldb) bt all
* thread #1, name = 'postgres', stop reason = signal SIGSEGV: address not mapped to object
  * frame #0: 0x0000aaaac59fb720 postgres`FreeTupleDesc [inlined] GetMemoryChunkContext(pointer=0x0000000000000000) at memutils.h:141:12
    frame #1: 0x0000aaaac59fb710 postgres`FreeTupleDesc [inlined] pfree(pointer=0x0000000000000000) at mcxt.c:1500:26
    frame #2: 0x0000aaaac59fb710 postgres`FreeTupleDesc(tupdesc=0x000013d7fd8dccc8) at tupdesc.c:326:5
    frame #3: 0x0000aaaac61c7204 postgres`RelationDestroyRelation(relation=0x000013d7fd8dc9a8, remember_tupdesc=false) at relcache.c:4577:4
    frame #4: 0x0000aaaac5febab8 postgres`YBRefreshCache at relcache.c:5216:3
    frame #5: 0x0000aaaac5feba94 postgres`YBRefreshCache at postgres.c:4442:2
    frame #6: 0x0000aaaac5feb50c postgres`YBRefreshCacheWrapperImpl(catalog_master_version=0, is_retry=false, full_refresh_allowed=true) at postgres.c:4570:3
    frame #7: 0x0000aaaac5feea34 postgres`PostgresMain [inlined] YBRefreshCacheWrapper(catalog_master_version=0, is_retry=false) at postgres.c:4586:9
    frame #8: 0x0000aaaac5feea2c postgres`PostgresMain [inlined] YBCheckSharedCatalogCacheVersion at postgres.c:4951:3
    frame #9: 0x0000aaaac5fee984 postgres`PostgresMain(dbname=<unavailable>, username=<unavailable>) at postgres.c:6574:4
    frame #10: 0x0000aaaac5efe5b4 postgres`BackendRun(port=0x000013d7ffc06400) at postmaster.c:4995:2
    frame #11: 0x0000aaaac5efdd08 postgres`ServerLoop [inlined] BackendStartup(port=0x000013d7ffc06400) at postmaster.c:4701:3
    frame #12: 0x0000aaaac5efdc70 postgres`ServerLoop at postmaster.c:1908:7
    frame #13: 0x0000aaaac5ef8ef8 postgres`PostmasterMain(argc=<unavailable>, argv=<unavailable>) at postmaster.c:1562:11
    frame #14: 0x0000aaaac5ddae1c postgres`PostgresServerProcessMain(argc=25, argv=0x000013d7ffe068f0) at main.c:213:3
    frame #15: 0x0000aaaac59dee38 postgres`main + 36
    frame #16: 0x0000ffff9f606340 libc.so.6`__libc_start_call_main + 112
    frame #17: 0x0000ffff9f606418 libc.so.6`__libc_start_main@@GLIBC_2.34 + 152
    frame #18: 0x0000aaaac59ded34 postgres`_start + 52
```
It is related to invalidation messages. The test involves concurrent DDL execution without object locking. I added a few logs to help debug this issue.
Test Plan:
(1) Append to the end of the file ./build/latest/postgres/share/postgresql.conf.sample:
```
yb_debug_log_catcache_events=1
log_min_messages=DEBUG1
```
(2) Create an RF-1 cluster:
```
./bin/yb-ctl create --rf 1
```
(3) Run the following example via ysqlsh:
```
-- === 1. SETUP ===
DROP TABLE IF EXISTS accounts_timetravel;
CREATE TABLE accounts_timetravel (
    id INT PRIMARY KEY,
    balance INT,
    last_updated TIMESTAMPTZ
);
INSERT INTO accounts_timetravel VALUES (1, 1000, now());
\echo '--- 1. Initial Data (The Past) ---'
SELECT * FROM accounts_timetravel;

-- Wait 2 seconds
SELECT pg_sleep(2);

-- === 2. CAPTURE THE "PAST" HLC TIMESTAMP ===
--
-- *** THIS IS THE FIX ***
-- Get the current time as seconds from the Unix epoch,
-- multiply by 1,000,000 to get microseconds,
-- and cast to a big integer.
--
SELECT (EXTRACT(EPOCH FROM now())*1000000)::bigint AS snapshot_hlc \gset
SELECT :snapshot_hlc;
\echo '--- (Snapshot HLC captured) ---'
SELECT * FROM pg_yb_catalog_version;

-- Wait 2 more seconds
SELECT pg_sleep(2);

-- === 3. UPDATE THE DATA ===
UPDATE accounts_timetravel SET balance = 500, last_updated = now() WHERE id = 1;
\echo '--- 2. New Data (The Present) ---'
SELECT * FROM accounts_timetravel;
CREATE TABLE foo(id int);
-- increment the catalog version
ALTER TABLE foo ADD COLUMN val TEXT;
SELECT * FROM pg_yb_catalog_version;

-- === 4. PERFORM THE TIME-TRAVEL QUERY ===
--
-- Set our 'read_time_guc' variable to the HLC value
--
\set read_time_guc :snapshot_hlc
\echo '--- 3. Time-Travel Read (Querying the Past) ---'
\echo 'Setting yb_read_time to HLC (microseconds):' :read_time_guc
-- This will now be interpolated correctly and will succeed.
SET yb_read_time = :read_time_guc;
-- This query will now correctly read the historical data
SELECT * FROM accounts_timetravel;
SELECT * FROM pg_yb_catalog_version;

-- === 5. CLEANUP ===
RESET yb_read_time;
\echo '--- 4. Back to the Present ---'
SELECT * FROM accounts_timetravel;
DROP TABLE accounts_timetravel;
```
(4) Look at the postgres log for samples like the following:
```
2025-11-07 18:31:06.223 UTC [3321231] LOG: Preloading relcache for database 13524, session user id: 10, yb_read_time: 0
2025-11-07 18:31:06.303 UTC [3321231] LOG: Building relcache entry for pg_index (oid 2610) took 785 us
2025-11-07 18:31:09.265 UTC [3321221] LOG: Rebuild relcache entry for accounts_timetravel (oid 16384)
2025-11-07 18:31:09.525 UTC [3321221] LOG: Delete relcache entry for accounts_timetravel (oid 16384)
2025-11-07 18:31:14.035 UTC [3321221] DEBUG: Setting yb_read_time to 1762540271568993
2025-11-07 18:31:14.037 UTC [3321221] LOG: Preloading relcache for database 13524, session user id: 13523, yb_read_time: 1762540271568993
2025-11-07 18:31:14.183 UTC [3321221] DEBUG: Setting yb_read_time to 0
```
Reviewers: kfranz, #db-approvers
Reviewed By: kfranz, #db-approvers
Subscribers: jason, yql
Differential Revision: https://phorge.dev.yugabyte.com/D48114
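This diff adds diagnostic logging rather than a fix, but the crash mechanics in the trace above are worth spelling out: PostgreSQL's pfree() locates the owning memory context through a chunk header stored immediately before the user pointer, so pfree(NULL) computes an address just below zero and faults. The simplified stand-in allocator below (not the real mcxt.c code) illustrates the mechanism.
```
/* A simplified stand-in for a header-based allocator, illustrating why
 * freeing NULL faults: the free path steps back to a header stored just
 * before the user pointer, so NULL yields a wild read near address zero. */
#include <stdlib.h>

typedef struct { void *context; } chunk_header_t;

static void *ctx_alloc(size_t size, void *context)
{
    chunk_header_t *hdr = malloc(sizeof(*hdr) + size);
    hdr->context = context;      /* remember the owning memory context */
    return hdr + 1;              /* caller sees the bytes after the header */
}

static void ctx_free(void *ptr)
{
    /* Like GetMemoryChunkContext(): step back to the header. If ptr is
     * NULL, this read faults, producing the SIGSEGV in the stack trace. */
    chunk_header_t *hdr = (chunk_header_t *)ptr - 1;
    (void)hdr->context;
    free(hdr);
}

int main(void)
{
    void *p = ctx_alloc(16, NULL);
    ctx_free(p);        /* fine */
    /* ctx_free(NULL);     would crash, as FreeTupleDesc did above */
    return 0;
}
```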
Bumps github.com/golang-jwt/jwt/v4 from 4.4.2 to 4.5.2.
Release notes
Sourced from github.com/golang-jwt/jwt/v4's releases.
Commits
- 2f0e9ad Backporting 0951d18 to v4
- 7b1c1c0 Merge commit from fork
- 9358574 Allow strict base64 decoding (#259)
- 2f0984a Using tparse for nicer CI test display (#251)
- 2101c1f No pointer embedding in the example (#255)
- 35053d4 Removed unneeded if statement (#241)
- 0c4e387 Add doc comment to ParseWithClaims (#232)
- bfea432 Include https://github.com/golang-jwt/jwe in README (#229)
- d81acbf Bump matrix to support latest go version (go1.19) (#231)
- fdaf0eb Implement a BearerExtractor (#226)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the Security Alerts page.