Skip to content

Commit bed64d0

Browse files
committed
[BACKPORT 2024.2][yugabyte#26060] docdb: Fast backward scan returns stale results for select statement with key columns only
Summary: There are several scenarios where rows are requested by key columns only (including constraints and aggregates like count). Doc reader have an optimization for this kind of reads: since actual values does not matter (because key column values can be extracted from the row's doc key) then there is no need to decode row values and it is enough to check if the row is tombstoned or not. Unfortunately, this optimization was not updated to take fast backward scan into account: an iterator stops reading when the firstly met record is not marked with a tombstone value, which is the case for fast backward scan, which reads a row in the reverse order, from the oldest to the newest change (per sub doc key), and the oldest record is generally corresponds to the initially inserted values. The change updates the algorithm if checking the row existence: the scanning continues until all row changes have been taken into account to get the actual state of the row. **Examples of row existence check before the fix** Example #1 ``` Record #1: KEY, T1 => PACKED ROW Record #2: KEY, COL1, T2 => COL1 NEW VALUE 1 Forward scan / old backward scan: #1 (alive, T1) => stop result: row exists. Fast backward scan: #2 (alive, T2) => stop result: row exists. ``` Example #2 ``` Record #1: KEY, T3 => DEL Record #2: KEY, T1 => PACKED ROW Record #3: KEY, COL1, T2 => COL1 NEW VALUE Forward scan / old backward scan: #1 (deleted, T3), #2 (skipped: T1 < T3), #3 (skipped: T2 < T3) => stop result: row does not exist. Fast backward scan: #3 (alive, T2) => stop result: row exists. ``` Example #3 ``` Record #1: KEY, T4 => PACKED ROW Record #2: KEY, T3 => DEL Record #3: KEY, T1 => PACKED ROW Record #4: KEY, COL1, T2 => COL1 NEW VALUE Forward scan / old backward scan: #1 (alive, T4) => stop result: row exists. Fast backward scan: #4 (alive, T2) => stop result: row exists. ``` **Examples of row existence check with the fix** Example #1 ``` Record #1: KEY, T1 => PACKED ROW Record #2: KEY, COL1, T2 => COL1 NEW VALUE 1 Fast backward scan: #2 (alive, T2) => #1 (skipped, T1 < T2) => stop result: row exists. ``` Example #2 ``` Record #1: KEY, T3 => DEL Record #2: KEY, T1 => PACKED ROW Record #3: KEY, COL1, T2 => COL1 NEW VALUE Fast backward scan: #3 (alive, T2) => #2 (skipped: T1 < T2) => #1 (deleted: T3 > T2) => stop result: row does not exist. ``` Example #3 ``` Record #1: KEY, T4 => PACKED ROW Record #2: KEY, T3 => DEL Record #3: KEY, T1 => PACKED ROW Record #4: KEY, COL1, T2 => COL1 NEW VALUE Fast backward scan: #4 (alive, T2) => #3 (skipped: T1 < T2) => #2 (deleted: T3 > T2) => #1 (alive: T4 > T3) => stop result: row exists. ``` Original commit: d8bd1fd / D42212 Jira: DB-15387 Test Plan: ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithNulls_kV1 ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithNulls_kV2 ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithNulls_kNone ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithoutNulls_kV1 ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithoutNulls_kV2 ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithoutNulls_kNone ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithNulls_kV1 ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithNulls_kV2 ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithNulls_kNone ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithoutNulls_kV1 ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithoutNulls_kV2 ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithoutNulls_kNone **manual testing:** ``` SET yb_use_hash_splitting_by_default = FALSE; CREATE TABLE t1(c0 int UNIQUE DEFAULT 1); INSERT INTO t1(c0) VALUES (2); DELETE FROM t1; SELECT * FROM t1; SELECT * FROM t1 GROUP BY t1.c0 ORDER BY t1.c0 DESC; ``` Reviewers: sergei, rthallam Reviewed By: rthallam Subscribers: yql, ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D42486
1 parent 9b62540 commit bed64d0

File tree

2 files changed

+33
-6
lines changed

2 files changed

+33
-6
lines changed

src/yb/docdb/doc_reader.cc

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -838,7 +838,10 @@ class GetHelperBase : public PackedRowContext {
838838
auto& fetched_key = VERIFY_RESULT_REF(Prepare(prefetched_key, root_write_time));
839839

840840
if constexpr (kCheckExistOnly) {
841-
if (found_) {
841+
// In case of fast backward scan enabled, row reading happens in the reverse order,
842+
// from the overy oldest to very newest record, which mean it is required to scan all
843+
// the records to understand if the row exists.
844+
if (found_ && !kFastBackward) {
842845
return FoundResult(/* iter_valid= */ true);
843846
}
844847
auto iter_valid = VERIFY_RESULT(Scan(&fetched_key, root_write_time));
@@ -901,6 +904,7 @@ class GetHelperBase : public PackedRowContext {
901904

902905
// Skip too old root records.
903906
if (key_result.write_time < root_write_time->encoded()) {
907+
DVLOG_WITH_PREFIX_AND_FUNC(4) << "record is too old, skipping";
904908
Move<kFastBackward>(data_.iter);
905909
return true; // Continue scanning.
906910
}
@@ -911,11 +915,13 @@ class GetHelperBase : public PackedRowContext {
911915
auto has_row_update = data_.packed_row->HasUpdatesAfter(key_result.write_time);
912916
if (has_row_update.value_or(true)) {
913917
// Delete record is too old, just skip it.
918+
DVLOG_WITH_PREFIX_AND_FUNC(4) << "tombstoned record is too old, skipping";
914919
Move<kFastBackward>(data_.iter);
915920
return true; // Continue scanning.
916921
}
917922

918923
// This is a more recent delete, need to be taken it into account.
924+
DVLOG_WITH_PREFIX_AND_FUNC(4) << "read tombstoned record, row is considered deleted";
919925
found_ = false;
920926
}
921927

@@ -970,7 +976,8 @@ class GetHelperBase : public PackedRowContext {
970976
DVLOG_WITH_PREFIX_AND_FUNC(4)
971977
<< "key: " << dockv::SubDocKey::DebugSliceToString(key_result.key)
972978
<< ", write time: " << key_result.write_time.ToString()
973-
<< ", value: " << key_result.value.ToDebugHexString();
979+
<< ", value: " << key_result.value.ToDebugHexString()
980+
<< ", root write time: " << root_write_time->ToString();
974981
DCHECK(key_result.key.starts_with(root_doc_key_));
975982

976983
// With the fast backward scan, the full packed row may be not the first record met during
@@ -1046,9 +1053,6 @@ class GetHelperBase : public PackedRowContext {
10461053
<< "current position: " << dockv::SubDocKey::DebugSliceToString(current_entry->key)
10471054
<< ", value: " << current_entry->value.ToDebugHexString();
10481055
RETURN_NOT_OK(ProcessEntry(subkeys, current_entry->value, current_entry->write_time));
1049-
if (kCheckExistOnly && found_) {
1050-
return false;
1051-
}
10521056

10531057
data_.packed_row->TrackColumnUpdate(current_column_->id, current_entry->write_time);
10541058

@@ -1071,7 +1075,7 @@ class GetHelperBase : public PackedRowContext {
10711075
// Check if the iterator moved out the current column after the Prev().
10721076
current_entry = &VERIFY_RESULT_REF(data_.iter->Fetch());
10731077
if (!IsEntrySubKeysSame(*current_entry, subkeys)) {
1074-
// No more entries for the given root doc key or move out the given column.
1078+
// No more entries for the given root doc key or the iterator moved out the given column.
10751079
break;
10761080
}
10771081

src/yb/yql/pgwrapper/pg_single_tserver-test.cc

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -871,10 +871,26 @@ class PgFastBackwardScanTest
871871
// A helper functor to execute a select statement for the test.
872872
auto fetch_and_validate = [this, &conn, &table_name, with_nulls](
873873
const std::string& expected_without_nulls, const std::string& expected_with_nulls) {
874+
const auto stmt =
875+
Format("SELECT c_1, c_5, r FROM $0 WHERE h = 1 ORDER BY r DESC", table_name);
876+
877+
// Make sure the statement is backward scan.
878+
const auto explain = VERIFY_RESULT(conn.FetchAllAsString("EXPLAIN ANALYZE " + stmt));
879+
SCHECK(Slice(explain).starts_with("Index Scan Backward"), RuntimeError, "");
880+
874881
return FetchAndValidate(conn,
875882
Format("SELECT c_1, c_5, r FROM $0 WHERE h = 1 ORDER BY r DESC", table_name),
876883
with_nulls ? expected_with_nulls : expected_without_nulls);
877884
};
885+
auto fetch_and_validate_pk_only = [this, &conn, &table_name](const std::string& expected) {
886+
const auto stmt = Format("SELECT r FROM $0 WHERE h = 1 ORDER BY r DESC", table_name);
887+
888+
// Make sure the statement is backward scan.
889+
const auto explain = VERIFY_RESULT(conn.FetchAllAsString("EXPLAIN ANALYZE " + stmt));
890+
SCHECK(Slice(explain).starts_with("Index Scan Backward"), RuntimeError, "");
891+
892+
return FetchAndValidate(conn, stmt, expected);
893+
};
878894

879895
// Create table.
880896
{
@@ -906,6 +922,7 @@ class PgFastBackwardScanTest
906922
ASSERT_OK(fetch_and_validate(
907923
"1, 5, 3; 1, 5, 2; 1, 5, 1", "NULL, 5, 3; NULL, 5, 2; NULL, 5, 1"
908924
));
925+
ASSERT_OK(fetch_and_validate_pk_only("3; 2; 1"));
909926

910927
if (intents_usage == IntentsUsage::kMixed) {
911928
ASSERT_OK(conn.StartTransaction(IsolationLevel::SERIALIZABLE_ISOLATION));
@@ -920,6 +937,7 @@ class PgFastBackwardScanTest
920937
"256, 260, 5; 256, 260, 4; 1, 5, 2; 4096, 5, 1",
921938
"256, 260, 5; 256, 260, 4; NULL, 5, 2; 4096, 5, 1"
922939
));
940+
ASSERT_OK(fetch_and_validate_pk_only("5; 4; 2; 1"));
923941

924942
// Re-insert data for the deleted rows.
925943
ASSERT_OK(insert_values(build_values(/* r_keys */ keys(3), 16383)));
@@ -928,25 +946,29 @@ class PgFastBackwardScanTest
928946
"256, 260, 5; 256, 260, 4; 16384, 16388, 3; 1, 5, 2; 4096, 5, 1",
929947
"256, 260, 5; 256, 260, 4; 16384, 16388, 3; NULL, 5, 2; 4096, 5, 1"
930948
));
949+
ASSERT_OK(fetch_and_validate_pk_only("5; 4; 3; 2; 1"));
931950

932951
// Delete the same records again.
933952
ASSERT_OK(conn.ExecuteFormat("DELETE FROM $0 WHERE r = 3", table_name));
934953
ASSERT_OK(cluster_->FlushTablets(tablet::FlushMode::kSync));
935954
ASSERT_OK(fetch_and_validate(
936955
"256, 260, 5; 256, 260, 4; 1, 5, 2; 4096, 5, 1",
937956
"256, 260, 5; 256, 260, 4; NULL, 5, 2; 4096, 5, 1"));
957+
ASSERT_OK(fetch_and_validate_pk_only("5; 4; 2; 1"));
938958

939959
// Delete records from the intents DB in such way the very first row would be in regular DB.
940960
ASSERT_OK(conn.ExecuteFormat("DELETE FROM $0 WHERE r IN (4, 5, 6)", table_name));
941961
ASSERT_OK(cluster_->FlushTablets(tablet::FlushMode::kSync));
942962
ASSERT_OK(fetch_and_validate("1, 5, 2; 4096, 5, 1", "NULL, 5, 2; 4096, 5, 1"));
963+
ASSERT_OK(fetch_and_validate_pk_only("2; 1"));
943964

944965
// Insert some data which would be positioned before all the existing rows.
945966
ASSERT_OK(insert_values(build_values(/* r_keys */ keys(0), 65535)));
946967
ASSERT_OK(cluster_->FlushTablets(tablet::FlushMode::kSync));
947968
ASSERT_OK(fetch_and_validate(
948969
"1, 5, 2; 4096, 5, 1; 65536, 65540, 0", "NULL, 5, 2; 4096, 5, 1; 65536, 65540, 0"
949970
));
971+
ASSERT_OK(fetch_and_validate_pk_only("2; 1; 0"));
950972

951973
if (intents_usage != IntentsUsage::kRegularOnly) {
952974
ASSERT_OK(conn.CommitTransaction());
@@ -956,6 +978,7 @@ class PgFastBackwardScanTest
956978
ASSERT_OK(conn.ExecuteFormat("DELETE FROM $0 WHERE h = 1 AND r >= 0", table_name));
957979
ASSERT_OK(cluster_->FlushTablets(tablet::FlushMode::kSync));
958980
ASSERT_OK(fetch_and_validate("", ""));
981+
ASSERT_OK(fetch_and_validate_pk_only(""));
959982

960983
LOG_WITH_FUNC(INFO) << "Done";
961984
}

0 commit comments

Comments
 (0)