Description
Hi,
Here is a brief overview:
- Read from Kinesis.
- Parse the JSON record for the `/audit/timestamp` field and build a path of the form `year=/month=/day=` from that timestamp.
- Write to S3 using the path built in the processor.
- For example, if the timestamp field has the value `1733264297`, which converts to 12/3/2024, the record should be written to the `year=2024/month=12/day=3/` partition.
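The path derivation in the second step can be sketched in plain Python (not Redpanda Connect code). It mirrors the UTC `%Y`/`%m`/`%d`/`%H` formatting used in the mapping of the config below; `partition_path` is a hypothetical helper name. Note that `%m`, `%d` and `%H` zero-pad, so the day segment for this example comes out as `day=03`, and the mapping also appends an `hour=` segment.

```python
from datetime import datetime, timezone


def partition_path(unix_seconds: int) -> str:
    """Build a Hive-style partition prefix from a Unix timestamp, in UTC."""
    ts = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    # %m, %d and %H are zero-padded, matching ts_strftime in the mapping
    return ts.strftime("year=%Y/month=%m/day=%d/hour=%H")


print(partition_path(1733264297))  # year=2024/month=12/day=03/hour=22
```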
Below is the config I am using:
```yaml
input:
  aws_kinesis:
    streams:
    dynamodb:
      table:
      create: false
    region:
    credentials:
      id:
      secret:
    checkpoint_limit:
    commit_period:
    start_from_oldest: false

pipeline:
  processors:
    - mapping: |
        meta timestamp = this.audit.modifiedTimestamp.number()
        meta path = "year=" + @timestamp.ts_strftime("%Y", "UTC") + "/month=" + @timestamp.ts_strftime("%m", "UTC") + "/day=" + @timestamp.ts_strftime("%d", "UTC") + "/hour=" + @timestamp.ts_strftime("%H", "UTC")

output:
  label: "valid"
  aws_s3:
    bucket:
    path: folder1/${!meta("path")}/${!uuid_v4()}-${!timestamp_unix_nano()}.json
    tags: {}
    content_type: application/octet-stream
    metadata:
      exclude_prefixes: []
    region: us-east-1
    credentials:
      id:
      secret:
      token:
    batching:
      count:
      byte_size:
      period:
      processors:
        - archive:
            format: lines
```
The problem with this config is that when a batch contains records with two different timestamps, all of the records are combined into a single file and written under the first record's partition.
From the Redpanda Connect documentation, I understand that when `archive` with `format: lines` is set on the S3 output, it does not differentiate between the batches/groups formed by the processor logic.
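To make the desired behaviour concrete, here is a plain-Python sketch (not Redpanda Connect code) of what should happen to one batch: records are grouped by their computed partition path, and each group is joined into one newline-delimited payload, the way `archive` with `format: lines` joins a whole batch. The helper names and sample records are hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def partition_path(unix_seconds: int) -> str:
    """Build a Hive-style partition prefix from a Unix timestamp, in UTC."""
    ts = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return ts.strftime("year=%Y/month=%m/day=%d/hour=%H")


def group_batch(records: list[dict]) -> dict[str, str]:
    """Split one batch into newline-delimited payloads, one per partition."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record in records:
        # int() accepts both numeric and numeric-string timestamps
        path = partition_path(int(record["audit"]["modifiedTimestamp"]))
        groups[path].append(json.dumps(record))
    # One payload per partition, one JSON record per line
    return {path: "\n".join(lines) for path, lines in groups.items()}


batch = [
    {"id": 1, "audit": {"modifiedTimestamp": 1733264297}},
    {"id": 2, "audit": {"modifiedTimestamp": 1733264297}},
    {"id": 3, "audit": {"modifiedTimestamp": 1733350697}},
]
for path, body in group_batch(batch).items():
    print(path, body.count("\n") + 1)
```

With the current config the whole batch lands under the first record's path; the sketch instead yields two files, one with two records under `day=03` and one with a single record under `day=04`.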
Is there a way to write files containing multiple records to the correct partitions? If I remove the `archive` (`format: lines`) processor, the records are written to the correct partitions, but every file then holds a single JSON record, which we want to avoid.
Any help or thoughts on this are really appreciated.