[NEW] Add Observability KPIs for Large Read/Write Payloads #2911

@YiwenZhang12

The problem/use-case that the feature addresses

Valkey provides proto-max-bulk-len to limit excessively large payloads, but it does not offer visibility into:

  • how often large payloads are written or read,

  • how large these payloads are,

  • whether a shard is being stressed by large key operations.

We need lightweight KPIs to monitor large read/write behavior, since large values can impact latency, memory pressure, and network throughput.

Description of the feature

Add low-overhead KPIs that track large read/write payloads based on configurable size thresholds.

  1. Write path KPIs
     Measure the incoming payload size (after RESP parsing).
     If above threshold, update:
       large_write_ops
       large_write_total_bytes
       large_write_max_payload
  2. Read path KPIs
     Measure the serialized reply size before writing to the output buffer.
     If above threshold, update:
       large_read_ops
       large_read_total_bytes
       large_read_max_payload
  3. Config (bytes)
       large-write-threshold 1048576    # 1MB default
       large-read-threshold  1048576    # 1MB default
  4. INFO metrics
       large_write_ops:123
       large_write_total_bytes:987654321
       large_write_max_payload:62914560

       large_read_ops:45
       large_read_total_bytes:456789012
       large_read_max_payload:31457280
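
To make the proposed accounting concrete, here is a minimal sketch in C of how one direction (write or read) could be tracked and rendered as INFO fields. The struct large_payload_stats and the helpers track_large_payload and format_large_payload_info are hypothetical names used only for illustration; they are not existing Valkey code. The sketch assumes "above threshold" means strictly greater than the configured value, and it ignores I/O threading.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-direction counters (one instance for writes, one for reads). */
typedef struct {
    size_t threshold;               /* large-write-threshold / large-read-threshold, in bytes */
    unsigned long long ops;         /* large_*_ops */
    unsigned long long total_bytes; /* large_*_total_bytes */
    size_t max_payload;             /* large_*_max_payload */
} large_payload_stats;

/* Record one payload. Payloads at or below the threshold cost a single
 * comparison and leave the counters untouched. */
static void track_large_payload(large_payload_stats *s, size_t payload_len) {
    if (payload_len <= s->threshold) return; /* assumption: strictly above counts as large */
    s->ops++;
    s->total_bytes += payload_len;
    if (payload_len > s->max_payload) s->max_payload = payload_len;
}

/* Render the three counters as INFO-style "name:value" lines.
 * dir is "write" or "read". Returns the snprintf result. */
static int format_large_payload_info(char *buf, size_t buflen, const char *dir,
                                     const large_payload_stats *s) {
    return snprintf(buf, buflen,
                    "large_%s_ops:%llu\r\n"
                    "large_%s_total_bytes:%llu\r\n"
                    "large_%s_max_payload:%zu\r\n",
                    dir, s->ops,
                    dir, s->total_bytes,
                    dir, s->max_payload);
}
```

In this sketch the write path would call track_large_payload with the parsed argument size and the read path would call it with the serialized reply size, each against its own stats instance. Whether the comparison should be strict, and whether the maximum should reset on CONFIG RESETSTAT, are open design questions.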

Alternatives you've considered

  • proto-max-bulk-len (enforcement only, no visibility)

  • Slowlog (not tied to payload size)

Additional information

This feature is purely for observability, not enforcement.
If approved, I will proceed with implementation.

Labels: enhancement