IRSA credentials failing for AWS MSK IAM in Fluent Bit 4.1.0 — STS AssumeRoleWithWebIdentity returns broken connection (HTTP Status: 0) and fallback to IMDS occurs #11255

@ZokerG

Bug Report

When running Fluent Bit 4.1.0 on Amazon EKS using IRSA (IAM Roles for Service Accounts) to authenticate to MSK IAM, Fluent Bit consistently fails during the call to:

STS AssumeRoleWithWebIdentity

The internal AWS credential provider logs always show:

broken connection to sts.us-east-1.amazonaws.com:443 (HTTP Status: 0) STS assume role request failed

After the failure, Fluent Bit incorrectly falls back to IMDSv2 and retrieves credentials from the EC2 node role instead of the pod’s IRSA role.

This results in invalid MSK OAuthBearer tokens and ultimately:

SASL authentication error: Access denied

Therefore, IRSA does not work with Fluent Bit for MSK IAM, even though the environment is correctly configured and STS is reachable externally.


🔄 Steps to Reproduce

1. Create Service Account with IRSA

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit-irsa-serviceaccount
  namespace: poc-ciam
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/kubernetes-pod-test

2. Inject AWS environment variables into Fluent Bit pod

env:
  - name: AWS_ROLE_ARN
    value: arn:aws:iam::<ACCOUNT_ID>:role/kubernetes-pod-test
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
  - name: AWS_REGION
    value: us-east-1
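Before looking at Fluent Bit itself, it can help to confirm that the three variables above are actually visible inside the container and that the token file exists. The following is a minimal sketch, not part of Fluent Bit; `check_irsa_env` is a hypothetical helper name, and the ARN check is only a rough shape test.

```python
import os

# Variables this report injects into the pod (step 2 above).
REQUIRED_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE", "AWS_REGION")


def check_irsa_env(env):
    """Return a list of problems found; an empty list means the inputs look sane."""
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    role_arn = env.get("AWS_ROLE_ARN", "")
    # Rough shape check only (assumption): a role ARN starts with "arn:" and contains ":role/".
    if role_arn and not (role_arn.startswith("arn:") and ":role/" in role_arn):
        problems.append("AWS_ROLE_ARN does not look like an IAM role ARN")

    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE", "")
    if token_file:
        if not os.path.isfile(token_file):
            problems.append(f"token file {token_file} does not exist")
        elif os.path.getsize(token_file) == 0:
            problems.append(f"token file {token_file} is empty")
    return problems
```

Calling `check_irsa_env(os.environ)` from a shell in the Fluent Bit pod (or a debug pod using the same ServiceAccount) should return an empty list if the injection worked.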

3. Configure MSK IAM authentication

[OUTPUT]
    Name     kafka
    Brokers  b-1.example.amazonaws.com:9098,b-2.example.amazonaws.com:9098
    Topics   audit-filtered
    msk_iam  yes

4. Enable debug logs

log_level debug

❌ Actual Behavior

1. Fluent Bit fails during STS AssumeRoleWithWebIdentity

[debug] [aws_credentials] Calling STS..
[debug] [http_client] not using http_proxy for header
[error] [http_client] broken connection to sts.us-east-1.amazonaws.com:443 ?
[debug] [aws_client] sts.us-east-1.amazonaws.com: http_do=-1, HTTP Status: 0
[debug] [aws_credentials] STS assume role request failed

2. Fluent Bit incorrectly falls back to IMDSv2

[debug] [aws_credentials] Init called on the EC2 IMDS provider
[debug] [aws_credentials] Requesting credentials for instance role eksctl-nodegroup-NodeInstanceRole

This behavior is incorrect when IRSA is configured.
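To spot this sequence in a long debug log, a small scanner can flag an STS failure that is later followed by IMDS provider initialization. This is a sketch that matches only the two message fragments quoted in this report; `detect_imds_fallback` is a hypothetical helper, and other Fluent Bit versions may word these messages differently.

```python
def detect_imds_fallback(log_lines):
    """Return True if an STS failure is later followed by IMDS provider init."""
    sts_failed = False
    for line in log_lines:
        if "STS assume role request failed" in line:
            sts_failed = True
        elif sts_failed and "Init called on the EC2 IMDS provider" in line:
            return True
    return False
```

For example, `detect_imds_fallback(open("fluent-bit.log"))` would scan a saved log file line by line (the file name is illustrative).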

3. MSK authentication ultimately fails

SASL authentication error: Access denied (state AUTH_REQ)

The OAuth token is being signed with the wrong AWS principal (EC2 node instead of IRSA role).


✔ Expected Behavior

  • Fluent Bit should successfully perform AssumeRoleWithWebIdentity using the service account token.

  • There should be no fallback to IMDS when IRSA is active.

  • MSK IAM authentication should work using pod-level AWS IAM credentials.
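Because the expected flow depends on the projected service account token, its claims can be inspected to confirm it maps to the intended ServiceAccount. The sketch below decodes the JWT payload without verifying the signature, so it is for inspection only; `decode_jwt_claims` is a hypothetical helper name. For IRSA, the `aud` claim is normally `sts.amazonaws.com` and `sub` has the form `system:serviceaccount:<namespace>:<name>`.

```python
import base64
import json


def decode_jwt_claims(token):
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Reading the file named by `AWS_WEB_IDENTITY_TOKEN_FILE` and passing its contents to this function shows the `iss`, `aud`, and `sub` values that the IAM trust policy has to match.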


🔍 Additional Diagnostics Performed

We performed multiple environment-level tests to rule out network, TLS, DNS and AWS STS connectivity issues.

1️⃣ Successful STS connectivity from test pod using curl

Running inside a pod using the same ServiceAccount:

curl -v https://sts.us-east-1.amazonaws.com/

The output shows:

  • TLS handshake works

  • STS responds with HTTP/1.1 302 Found

  • Certificate validation succeeds

This confirms the cluster CAN reach STS successfully.
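A plain GET to `/` only proves reachability. A closer reproduction is to send the same AssumeRoleWithWebIdentity request from the pod, outside Fluent Bit. The sketch below uses only the Python standard library and the public STS query API (this action is unsigned, so no AWS credentials are needed). The function names and the `irsa-debug` session name are illustrative assumptions, not anything Fluent Bit uses.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_sts_request(region, role_arn, token, session_name="irsa-debug"):
    """Return (endpoint, form_body) for an AssumeRoleWithWebIdentity call."""
    body = urllib.parse.urlencode({
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": token,
    }).encode()
    return f"https://sts.{region}.amazonaws.com/", body


def call_sts(region, role_arn, token_file):
    """POST the request and return (http_status, response_text)."""
    with open(token_file) as f:
        token = f.read().strip()
    endpoint, body = build_sts_request(region, role_arn, token)
    request = urllib.request.Request(endpoint, data=body, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=10) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # STS reports errors such as AccessDenied as 4xx responses with an XML body.
        return err.code, err.read().decode()
```

Calling `call_sts` with the values of `AWS_REGION`, `AWS_ROLE_ARN`, and `AWS_WEB_IDENTITY_TOKEN_FILE` should return status 200 if the trust policy and token are valid. A 4xx response would point at IAM configuration rather than at Fluent Bit's connection handling. Note that a successful response body contains temporary credentials, so it should not be pasted into a public issue.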


2️⃣ Successful Fluent Bit HTTP output to STS

We configured Fluent Bit with a temporary HTTP output:

[OUTPUT]
    Name        http
    Host        sts.us-east-1.amazonaws.com
    Port        443
    URI         /
    tls         on
    tls.verify  off
    Format      json

Result:

HTTP/1.1 302 Found

This demonstrates that:

✔ Fluent Bit’s HTTP client works

✔ TLS negotiation works

✔ External connectivity is fine

❗ Only the internal STS client in flb_aws_credentials_sts.c fails to connect.

This suggests a problem specific to the internal STS client, for example:

  • an internal TLS handling bug

  • a keep-alive connection reuse issue

  • missing SNI or another TLS configuration problem

  • an incorrect interaction with AWS STS endpoints


📂 Environment Details

Component | Version
-- | --
Fluent Bit | 4.1.0
AWS EKS | 1.29
MSK | IAM authentication enabled
IRSA | correctly configured and verified
curl test to STS | success
Fluent Bit HTTP output to STS | success
Internal STS AssumeRoleWithWebIdentity | fails

📎 Recommended Evidence to Attach

We will attach:

  • Full Fluent Bit logs (with aws_credentials + http_client debug)

  • Curl output showing STS connectivity success

  • HTTP output plugin logs showing 302 from STS

  • IAM trust policy JSON

  • Kubernetes ServiceAccount manifest

  • Pod description showing injected AWS env vars

  • Screenshots of OIDC provider in AWS IAM console


🚀 Summary for Fluent Bit maintainers

The environment can reach STS with no issues (tested via curl and via Fluent Bit’s own HTTP output plugin).

Only the internal AWS credential provider inside Fluent Bit fails to connect to STS, returning HTTP Status 0.

This causes a fallback to IMDS, which breaks MSK IAM authentication.

As a result, IRSA cannot be used with MSK IAM in Fluent Bit 4.1.0.

[OUTPUT]
    Name                     kafka
    Match                    #{kafka-match}#
    Brokers                  #{aws-msk-bootstrap-servers}#
    Topics                   #{kafka-topic}#
    Format                   json
    aws_msk_iam              true
    aws_msk_iam_cluster_arn  #{aws-msk-iam-cluster-arn}#
