Commit e93b204
[BACKPORT 2025.1][yugabyte#27557] DocDB: Don't schedule tasks on threadpool once shutdown flags are set at ObjectLockManager
Summary:
Original commit: f5197a2 / D44662
In the context of object locking, commit 6e80c56 / D44228 removed the logic that signaled obsolete waiters belonging to transactions that had issued a release-all-locks request (such transactions may have been terminated due to failures like timeout, deadlock, etc.). That signaling existed to terminate failed waiting requests early. Obsolete requests are now left to terminate organically: the OLM resumes them from the poller thread, which runs every `olm_poll_interval_ms` (default 100ms).
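To illustrate what "terminating organically" from a periodic poller means, here is a minimal C++ sketch. It is not YugabyteDB code: `Waiter`, `WaiterPoller`, and `poll_interval` are hypothetical names, and the interval only plays the role of `olm_poll_interval_ms`. Every interval the poller resumes each queued waiter; an obsolete waiter's resume simply fails the request, while a still-blocked waiter could re-register itself.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a blocked lock request (not a YugabyteDB type).
struct Waiter {
  // Re-attempts the lock. For an obsolete waiter (its transaction already
  // released all locks) this just fails the request; a still-blocked waiter
  // may re-register itself via WaiterPoller::AddWaiter().
  std::function<void()> resume;
};

// Hypothetical poller: every poll_interval it resumes all queued waiters,
// playing the role of the OLM poller driven by olm_poll_interval_ms.
class WaiterPoller {
 public:
  explicit WaiterPoller(std::chrono::milliseconds poll_interval)
      : poll_interval_(poll_interval), thread_([this] { Run(); }) {}

  ~WaiterPoller() { Shutdown(); }

  void AddWaiter(Waiter w) {
    std::lock_guard<std::mutex> lock(mutex_);
    waiters_.push_back(std::move(w));
  }

  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutdown_ = true;
    }
    cv_.notify_all();
    if (thread_.joinable()) thread_.join();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!shutdown_) {
      // Sleep one interval, waking early only if shutdown is requested.
      cv_.wait_for(lock, poll_interval_, [this] { return shutdown_; });
      if (shutdown_) break;
      std::vector<Waiter> batch;
      batch.swap(waiters_);
      lock.unlock();  // resume outside the lock so waiters can re-register
      for (auto& w : batch) w.resume();
      lock.lock();
    }
  }

  // Declaration order matters: thread_ must be constructed last.
  const std::chrono::milliseconds poll_interval_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::vector<Waiter> waiters_;
  bool shutdown_ = false;
  std::thread thread_;
};
```

The trade-off this models is latency: an obsolete request is no longer failed immediately when its transaction releases its locks, but on the next poll, i.e. up to one interval later.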
This caused one of the itests to fail with the following stack trace:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV: address not mapped to object
* frame #0: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(std::__1::function<void ()>) [inlined] yb::ThreadPoolToken::Submit(this=<unavailable>, r=<unavailable>) at threadpool.cc:146:10
frame #1: 0x0000aaaac8a093ec yb-tserver`yb::ThreadPoolToken::SubmitFunc(this=0x0000000000000000, f=<unavailable>) at threadpool.cc:142:10
frame #2: 0x0000aaaac73cdfe8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoSignal(this=0x00003342bfa0d400, entry=<unavailable>) at object_lock_manager.cc:767:3
frame #3: 0x0000aaaac73cc7c0 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>, yb::docdb::LockData&&, yb::StronglyTypedBool<yb::docdb::(anonymous namespace)::IsLockRetry_Tag>, unsigned long, yb::Status) [inlined] yb::docdb::ObjectLockManagerImpl::PrepareAcquire(this=0x00003342bfa0d400, txn_lock=<unavailable>, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, resume_it_offset=<unavailable>, resume_with_status=<unavailable>) at object_lock_manager.cc:523:5
frame #4: 0x0000aaaac73cc6a8 yb-tserver`yb::docdb::ObjectLockManagerImpl::DoLock(this=0x00003342bfa0d400, transaction_entry=std::__1::shared_ptr<yb::docdb::(anonymous namespace)::TrackedTransactionLockEntry>::element_type @ 0x00003342bfa94a38, data=0x00003342b9a6a830, is_retry=(value_ = true), resume_it_offset=<unavailable>, resume_with_status=Status @ 0x0000ffefaa036658) at object_lock_manager.cc:552:27
frame #5: 0x0000aaaac73cbcb4 yb-tserver`yb::docdb::WaiterEntry::Resume(this=0x00003342b9a6a820, lock_manager=0x00003342bfa0d400, resume_with_status=<unavailable>) at object_lock_manager.cc:381:17
frame #6: 0x0000aaaac85bdd4c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() at object_lock_manager.cc:752:13
frame #7: 0x0000aaaac85bda74 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::docdb::ObjectLockManager::Shutdown(this=<unavailable>) at object_lock_manager.cc:1092:10
frame #8: 0x0000aaaac85bda6c yb-tserver`yb::tserver::TSLocalLockManager::Shutdown() [inlined] yb::tserver::TSLocalLockManager::Impl::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:411:26
frame #9: 0x0000aaaac85bd7e8 yb-tserver`yb::tserver::TSLocalLockManager::Shutdown(this=<unavailable>) at ts_local_lock_manager.cc:566:10
frame #10: 0x0000aaaac8665a34 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ResetAndGetTSLocalLockManager(this=0x000033423fc1ad80) at tablet_server.cc:797:28
frame #11: 0x0000aaaac8665a18 yb-tserver`yb::tserver::YsqlLeasePoller::Poll() [inlined] yb::tserver::TabletServer::ProcessLeaseUpdate(this=0x000033423fc1ad80, lease_refresh_info=0x000033423a476b80) at tablet_server.cc:828:22
frame #12: 0x0000aaaac8665950 yb-tserver`yb::tserver::YsqlLeasePoller::Poll(this=<unavailable>) at ysql_lease_poller.cc:143:18
frame #13: 0x0000aaaac8438d58 yb-tserver`yb::tserver::MasterLeaderPollScheduler::Impl::Run(this=0x000033423ff5cc80) at master_leader_poller.cc:125:25
frame #14: 0x0000aaaac89ffd18 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator()[abi:ne190100](this=0x000033423ffc7930) const at function.h:430:12
frame #15: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator()(this=0x000033423ffc7930) const at function.h:989:10
frame #16: 0x0000aaaac89ffd04 yb-tserver`yb::Thread::SuperviseThread(arg=0x000033423ffc78c0) at thread.cc:937:3
frame #17: 0x0000ffffac0378b8 libpthread.so.0`start_thread + 392
frame #18: 0x0000ffffac093afc libc.so.6`thread_start + 12
```
The crash is caused by accessing the `unique_ptr` `thread_pool_token_` after it has been reset (note `this=0x0000000000000000` in frame #1).
This revision fixes the issue by not scheduling any tasks on the threadpool once the shutdown flags have been set, so `thread_pool_token_` is no longer accessed at that point. Since the OLM waits both for in-progress requests and for in-progress resume tasks scheduled on the messenger (tracked via `waiters_amidst_resumption_on_messenger_`), `thread_pool_token_` will not be accessed after it is reset.
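The shutdown ordering described above can be sketched in C++ as follows. This is a simplified model under stated assumptions, not the actual `ObjectLockManager` change: `Token` is a hypothetical stand-in for `ThreadPoolToken` (one thread per task, all joined on destruction), and `LockManagerSketch`, `DoSignal`, and `in_progress_` are illustrative names. The idea is that a submitter registers itself as in progress only if the shutdown flag is not yet set, and `Shutdown()` sets the flag, waits for in-progress submitters to drain, and only then resets the token.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread pool token: runs each task on its own
// thread and joins all of them when destroyed.
class Token {
 public:
  void SubmitFunc(std::function<void()> f) {
    std::lock_guard<std::mutex> lock(mu_);
    threads_.emplace_back(std::move(f));
  }
  ~Token() {
    for (auto& t : threads_) t.join();
  }

 private:
  std::mutex mu_;
  std::vector<std::thread> threads_;
};

// Hypothetical lock manager showing the "no scheduling after shutdown" rule.
class LockManagerSketch {
 public:
  LockManagerSketch() : token_(std::make_unique<Token>()) {}

  // Returns false, and never touches token_, once shutdown has started.
  bool DoSignal(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (shutdown_) return false;
      ++in_progress_;  // Shutdown() will wait for us before resetting token_.
    }
    token_->SubmitFunc(std::move(task));  // token_ is guaranteed alive here.
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (--in_progress_ == 0) cv_.notify_all();
    }
    return true;
  }

  void Shutdown() {
    std::unique_lock<std::mutex> lock(mu_);
    shutdown_ = true;  // From now on no new submissions are accepted.
    cv_.wait(lock, [this] { return in_progress_ == 0; });
    lock.unlock();
    token_.reset();  // Safe: nobody can reach token_ any more.
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool shutdown_ = false;
  int in_progress_ = 0;
  std::unique_ptr<Token> token_;
};
```

The check-and-register step and the drain step use the same mutex, so every `DoSignal` either observes the flag and backs off, or finishes its submission before `token_` is reset. That removes the null dereference seen in frame #1 without holding a lock across the submission itself.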
Jira: DB-17121
Test Plan:
Jenkins
./yb_build.sh --cxx-test='TEST_F(PgObjectLocksTestRF1, TestShutdownWithWaiters) {'
Reviewers: rthallam, amitanand, sergei
Reviewed By: rthallam
Subscribers: yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D44728
Parent: 582af54
3 files changed (+51, -3) under src/yb: docdb, tserver, yql/pgwrapper