Description
Bug Report
When running Fluent Bit 4.1.0 on Amazon EKS with IRSA (IAM Roles for Service Accounts) to authenticate to MSK IAM, Fluent Bit consistently fails during the STS AssumeRoleWithWebIdentity call.
The internal AWS credential provider logs always show:
broken connection to sts.us-east-1.amazonaws.com:443 (HTTP Status: 0) STS assume role request failed
After the failure, Fluent Bit incorrectly falls back to IMDSv2 and retrieves credentials from the EC2 node role instead of the pod’s IRSA role.
This results in invalid MSK OAuthBearer tokens and ultimately:
SASL authentication error: Access denied
Therefore, IRSA does not work with Fluent Bit for MSK IAM, even though the environment is correctly configured and STS is reachable externally.
🔄 Steps to Reproduce
1. Create Service Account with IRSA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit-irsa-serviceaccount
  namespace: poc-ciam
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/kubernetes-pod-test
2. Inject AWS environment variables into Fluent Bit pod
env:
  - name: AWS_ROLE_ARN
    value: arn:aws:iam::<ACCOUNT_ID>:role/kuberentes-pod-test
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
  - name: AWS_REGION
    value: us-east-1
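Before blaming the STS client, it can help to confirm from inside the pod that these three variables are set and that the projected token file exists and is non-empty. The following is a minimal sketch of such a check; `check_irsa_env` and `REQUIRED_VARS` are hypothetical names used only for illustration and are not part of Fluent Bit.

```python
import os

# The three variables injected in the step above.
REQUIRED_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE", "AWS_REGION")


def check_irsa_env(env=None):
    """Return a list of problems; an empty list means the IRSA inputs look usable."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    token_path = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if token_path:
        if not os.path.isfile(token_path):
            problems.append(f"token file not found: {token_path}")
        elif os.path.getsize(token_path) == 0:
            problems.append(f"token file is empty: {token_path}")
    return problems
```

Calling `check_irsa_env()` with no arguments inspects the current process environment, so it can be run with `kubectl exec` in any pod that uses the same ServiceAccount and has Python available.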
3. Configure MSK IAM authentication
[OUTPUT]
    Name     kafka
    Brokers  b-1.example.amazonaws.com:9098,b-2.example.amazonaws.com:9098
    Topics   audit-filtered
    msk_iam  yes
4. Enable debug logs
log_level debug
❌ Actual Behavior
1. Fluent Bit fails during STS AssumeRoleWithWebIdentity
[debug] [aws_credentials] Calling STS..
[debug] [http_client] not using http_proxy for header
[error] [http_client] broken connection to sts.us-east-1.amazonaws.com:443 ?
[debug] [aws_client] sts.us-east-1.amazonaws.com: http_do=-1, HTTP Status: 0
[debug] [aws_credentials] STS assume role request failed
2. Fluent Bit incorrectly falls back to IMDSv2
[debug] [aws_credentials] Init called on the EC2 IMDS provider
[debug] [aws_credentials] Requesting credentials for instance role eksctl-nodegroup-NodeInstanceRole
This behavior is incorrect when IRSA is configured.
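To make the expected semantics concrete, here is a small illustrative sketch of a "strict" provider chain: when the IRSA inputs are present, a web identity failure is surfaced as an error instead of silently falling through to IMDS. This is not Fluent Bit's implementation; `resolve_credentials`, `CredentialError`, and the two provider callables are hypothetical names chosen for the example.

```python
class CredentialError(Exception):
    """Raised when the configured provider fails and fallback is not allowed."""


def resolve_credentials(env, web_identity_provider, imds_provider):
    """Pick a credential source without masking an IRSA failure.

    env is a mapping of environment variables; the two providers are
    zero-argument callables that return credentials or raise.
    """
    irsa_configured = bool(
        env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    )
    if irsa_configured:
        try:
            return web_identity_provider()
        except Exception as exc:
            # Surface the failure instead of using the node role.
            raise CredentialError(
                f"web identity provider failed, refusing IMDS fallback: {exc}"
            ) from exc
    # IRSA is not configured, so the instance role is the intended source.
    return imds_provider()
```

With this behavior, the failure in step 1 would show up as a credential error at startup rather than as an "Access denied" from the broker much later.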
3. MSK authentication ultimately fails
SASL authentication error: Access denied (state AUTH_REQ)
The OAuth token is being signed with the wrong AWS principal (EC2 node instead of IRSA role).
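One way to confirm which principal a set of credentials belongs to is to look at the ARN that STS GetCallerIdentity reports for them, which for role credentials has the form `arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>`. The sketch below only parses such an ARN; obtaining it is left out, and `assumed_role_name` is a hypothetical helper name.

```python
def assumed_role_name(arn):
    """Return the role name from an STS assumed-role ARN, or None for other ARNs."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sts":
        return None
    resource = parts[5].split("/")
    if len(resource) != 3 or resource[0] != "assumed-role":
        return None
    return resource[1]
```

If the returned name is the node instance role (here `eksctl-nodegroup-NodeInstanceRole`) rather than the role from the ServiceAccount annotation, the token is being signed with the node's identity.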
✔ Expected Behavior
- Fluent Bit should successfully perform AssumeRoleWithWebIdentity using the service account token.
- There should be no fallback to IMDS when IRSA is active.
- MSK IAM authentication should work using pod-level AWS IAM credentials.
🔍 Additional Diagnostics Performed
We performed multiple environment-level tests to rule out network, TLS, DNS and AWS STS connectivity issues.
1️⃣ Successful STS connectivity from test pod using curl
Running inside a pod using the same ServiceAccount:
curl -v https://sts.us-east-1.amazonaws.com/
The output shows:
- TLS handshake works
- STS responds with HTTP/1.1 302 Found
- Certificate validation succeeds
This confirms the cluster CAN reach STS successfully.
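A request to `/` exercises TLS and routing but not the AssumeRoleWithWebIdentity action itself. A closer reproduction is to send the same action from the test pod outside of Fluent Bit. The sketch below uses only the Python standard library and the STS Query API (`Version=2011-06-15`), which does not require request signing for this action. The function names and the default session name are assumptions made for the example.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def build_sts_url(region, role_arn, token, session_name="irsa-connectivity-check"):
    """Build the regional STS URL for an AssumeRoleWithWebIdentity query request."""
    query = urlencode({
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": token,
    })
    return f"https://sts.{region}.amazonaws.com/?{query}"


def call_sts(url, timeout=10):
    """Send the request and return only the HTTP status, never the credentials."""
    request = urllib.request.Request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # STS answered, but rejected the request (for example a bad token or role).
        return err.code
```

A 200 would mean the role can be assumed from the pod; a 4xx status would mean STS is reachable but rejects the token or role; a connection-level exception would match the failure Fluent Bit reports. The token would be read from the file named by `AWS_WEB_IDENTITY_TOKEN_FILE`.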
2️⃣ Successful Fluent Bit HTTP output to STS
We configured Fluent Bit with a temporary HTTP output:
[OUTPUT]
    Name        http
    Host        sts.us-east-1.amazonaws.com
    Port        443
    URI         /
    tls         on
    tls.verify  off
    Format      json
Result:
HTTP/1.1 302 Found
This demonstrates that:
✔ Fluent Bit’s HTTP client works
✔ TLS negotiation works
✔ External connectivity is fine
❗ Only the internal STS client in flb_aws_credentials_sts.c fails to connect.
This points to a problem specific to the internal STS client. Possible causes include:
- an internal TLS handling bug
- a keep-alive connection reuse issue
- missing SNI or another TLS configuration problem
- an incorrect interaction with AWS STS endpoints
📂 Environment Details
📎 Recommended Evidence to Attach
We will attach:
- Full Fluent Bit logs (with aws_credentials + http_client debug)
- Curl output showing STS connectivity success
- HTTP output plugin logs showing 302 from STS
- IAM trust policy JSON
- Kubernetes ServiceAccount manifest
- Pod description showing injected AWS env vars
- Screenshots of OIDC provider in AWS IAM console
🚀 Summary for Fluent Bit maintainers
The environment can reach STS with no issues (tested via curl and via Fluent Bit’s own HTTP output plugin).
Only the internal AWS credential provider inside Fluent Bit fails to connect to STS, returning HTTP Status 0. This causes a fallback to IMDS, which breaks MSK IAM authentication.
As a result, IRSA cannot be used with MSK IAM in Fluent Bit 4.1.0.
[OUTPUT]
Name kafka
Match #{kafka-match}#
Brokers #{aws-msk-bootstrap-servers}#
Topics #{kafka-topic}#
Format json
aws_msk_iam true
aws_msk_iam_cluster_arn #{aws-msk-iam-cluster-arn}#
