group_by_value and archive format lines #3822

@learncourse2024

Hi,

Here is a brief overview:

  • Read from kinesis
  • parse the JSON record for the /audit/timestamp field and build a path of the form year=/month=/day= from that timestamp
  • write to S3 using the path formed in the processor
  • e.g. if the timestamp field has the value '1733264297', which converts to the date 12/3/2024, the record should be written to the year=2024/month=12/day=3/ partition.

Below is the config which is used
input:
  aws_kinesis:
    streams:
    dynamodb:
      table:
      create: false
    region:
    credentials:
      id:
      secret:
    checkpoint_limit:
    commit_period:
    start_from_oldest: false
pipeline:
  processors:
    - mapping: |
        meta timestamp = this.audit.modifiedTimestamp.number()
        meta path = "year=" + @timestamp.ts_strftime("%Y", "UTC") + "/month=" + @timestamp.ts_strftime("%m", "UTC") + "/day=" + @timestamp.ts_strftime("%d", "UTC") + "/hour=" + @timestamp.ts_strftime("%H", "UTC")
output:
  label: "valid"
  aws_s3:
    bucket:
    path: folder1/${!meta("path")}/${!uuid_v4()}-${!timestamp_unix_nano()}.json
    tags: { }
    content_type: application/octet-stream
    metadata:
      exclude_prefixes: [ ]
    region: us-east-1
    credentials:
      id:
      secret:
      token:
    batching:
      count:
      byte_size:
      period:
      processors:
        - archive:
            format: lines

The issue with the above config is that if a batch of records contains two different timestamps, the combined records are written to a single file in the first record's partition.
From the Redpanda Connect documentation I see that when archive with format lines is set in the S3 output batching, it does not differentiate between the batches/groups formed in the processor logic.

Is there a way to have files containing multiple records written to the correct partition? If I remove archive with format lines, the records are written to the correct partitions, but as single-record JSON files, which we want to avoid.
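For illustration, one arrangement that might give per-partition files is sketched below. It is untested and rests on assumptions: batching is moved from the S3 output to the aws_kinesis input, and the batch is split with group_by_value on the path metadata before archive runs in the pipeline, so that each group becomes its own archived message carrying the metadata of its first record. The batching values (count: 500, period: 30s) are purely illustrative, and the fields left empty are elided exactly as in the config above, so this fragment is not runnable as written.

```yaml
input:
  aws_kinesis:
    streams:            # elided, as in the config above
    batching:           # assumption: batch at the input instead of the output
      count: 500        # illustrative value
      period: 30s       # illustrative value

pipeline:
  processors:
    # Same mapping as above: derive the partition path from the record timestamp.
    - mapping: |
        meta timestamp = this.audit.modifiedTimestamp.number()
        meta path = "year=" + @timestamp.ts_strftime("%Y", "UTC") + "/month=" + @timestamp.ts_strftime("%m", "UTC") + "/day=" + @timestamp.ts_strftime("%d", "UTC") + "/hour=" + @timestamp.ts_strftime("%H", "UTC")
    # Split the batch into one batch per distinct path value.
    - group_by_value:
        value: ${! meta("path") }
    # Archive each group into a single newline-delimited message.
    - archive:
        format: lines

output:
  label: "valid"
  aws_s3:
    bucket:             # elided, as in the config above
    path: folder1/${! meta("path") }/${! uuid_v4() }-${! timestamp_unix_nano() }.json
    content_type: application/octet-stream
    region: us-east-1
```

If this behaves as assumed, a batch holding records from two different hours would produce two files, one under each hour= prefix, rather than one combined file in the first record's partition.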

Any help or thoughts on this would be really appreciated.
