
EKS v3 Promise Leaks #1826

@zee-sh

Description


Describe what happened

EKS v3 Promise Leak Issue Report (prepared with the help of Claude Code)

Issue Summary

The @pulumi/eks v3.9.1 package has a critical promise leak bug in the cluster.provider property getter that makes it unusable in production environments. This issue prevents upgrading from v2 to v3.

Environment Details

  • Package: @pulumi/eks v3.9.1
  • Pulumi Core: v3.181.0
  • Node.js: v20+ (via devenv)
  • Detection: PULUMI_DEBUG_PROMISE_LEAKS=1

Root Cause Analysis

Exact Location

The promise leak originates from:

/node_modules/@pulumi/clusterMixins.ts:68:30 - Cluster.get [as provider]

Stack Trace

Promise leak detected:
CONTEXT(514): rpcKeepAlive
STACK_TRACE:
Error: 
    at Object.debuggablePromise (/node_modules/@pulumi/runtime/debuggable.ts:84:75)
    at Object.rpcKeepAlive (/node_modules/@pulumi/runtime/settings.ts:608:25)
    at Object.registerResource (/node_modules/@pulumi/runtime/resource.ts:507:18)
    at new Resource (/node_modules/@pulumi/resource.ts:555:13)
    at new CustomResource (/node_modules/@pulumi/resource.ts:1078:9)
    at new ProviderResource (/node_modules/@pulumi/resource.ts:1138:9)
    at new Provider (/node_modules/@pulumi/provider.ts:57:9)
    at Cluster.get [as provider] (/node_modules/@pulumi/clusterMixins.ts:68:30)
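
Reading the trace bottom-up: each read of the provider getter constructs a new Provider, which calls registerResource, which opens an rpcKeepAlive promise that is never settled. A hypothetical reconstruction of that pattern, for illustration only (this is NOT the actual clusterMixins.ts source):

import * as k8s from "@pulumi/kubernetes";
import * as pulumi from "@pulumi/pulumi";

// Illustrative only -- NOT the actual clusterMixins.ts source.
// A getter shaped like this registers a fresh Provider resource on every
// property read; each registerResource call opens an rpcKeepAlive promise,
// so repeated reads accumulate unsettled promises.
class ClusterLike {
    constructor(private kubeconfig: pulumi.Output<string>) {}

    get provider(): k8s.Provider {
        return new k8s.Provider("cluster-provider", {
            kubeconfig: this.kubeconfig,
        });
    }
}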

Reproduction Steps

Minimal Reproduction Case

import * as eks from "@pulumi/eks";
import * as aws from "@pulumi/aws";

// Network resources assumed to exist elsewhere in the program:
declare const vpc: aws.ec2.Vpc;
declare const publicSubnets: aws.ec2.Subnet[];
declare const privateSubnets: aws.ec2.Subnet[];

// 1. Create EKS cluster (NO promise leaks)
const cluster = new eks.Cluster("test", {
    version: "1.31",
    vpcId: vpc.id,
    publicSubnetIds: publicSubnets.map(s => s.id),
    privateSubnetIds: privateSubnets.map(s => s.id),
    authenticationMode: eks.AuthenticationMode.API,
});

// 2. Access cluster.provider (TRIGGERS promise leaks)
const provider = cluster.provider; // <-- this line causes 717+ promise leaks

Full Test Case

// File: eks-v3-test.ts
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";

export function reproducePromiseLeak(
    vpc: aws.ec2.Vpc,
    publicSubnets: aws.ec2.Subnet[],
    privateSubnets: aws.ec2.Subnet[],
) {
    // Step 1: Create cluster - this works fine
    const cluster = new eks.Cluster("test-cluster", {
        version: "1.31",
        vpcId: vpc.id,
        publicSubnetIds: publicSubnets.map(s => s.id),
        privateSubnetIds: privateSubnets.map(s => s.id),
        skipDefaultNodeGroup: true,
        authenticationMode: eks.AuthenticationMode.API,
        instanceType: "t3.medium",
        desiredCapacity: 0,
        minSize: 0,
        maxSize: 1,
    });

    // Step 2: Access provider property - this triggers massive promise leaks
    const k8sProvider = cluster.provider; // 717+ promise leaks detected
    
    return { cluster, k8sProvider };
}

Commands to Reproduce

# Install dependencies
npm install @pulumi/eks@3.9.1 @pulumi/pulumi@3.181.0

# Test with promise leak detection
PULUMI_DEBUG_PROMISE_LEAKS=1 pulumi preview

# Result: 717+ promise leaks detected

Impact Analysis

Scale of the Problem

  • 717+ promise leaks detected in a single cluster.provider access
  • Each leak involves rpcKeepAlive, transferProperty, transferIsStable, transferIsSecret operations
  • Memory consumption grows with every cluster.provider access, since the leaked promises are never resolved or released

Production Impact

  1. Memory Leaks: Unresolved promises accumulate in memory
  2. Performance Degradation: Resource cleanup never occurs
  3. Deployment Failures: Pulumi runtime becomes unstable
  4. Migration Blocker: Cannot upgrade from v2 to v3

Affected Use Cases

  • Any code accessing cluster.provider for Kubernetes resource creation
  • Compatibility layers bridging v2 to v3
  • Standard EKS + Kubernetes integration patterns (for example, the sketch below)
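
To make the last item concrete, here is a typical v2-era integration pattern that now leaks on every provider access. Resource names and contents are illustrative only:

import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";

// Illustrative v2-era pattern: every resource that passes cluster.provider
// re-reads the leaking getter, so each resource adds more leaked promises.
const cluster = new eks.Cluster("app-cluster");

const ns = new k8s.core.v1.Namespace("apps", {
    metadata: { name: "apps" },
}, { provider: cluster.provider }); // first access

const cm = new k8s.core.v1.ConfigMap("app-config", {
    metadata: { namespace: ns.metadata.name },
    data: { greeting: "hello" },
}, { provider: cluster.provider }); // second access, more leaks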

Comparison with v2 Behavior

v2 (Working)

const cluster = new eks.Cluster("test", { /* config */ });
const provider = cluster.provider; // Works fine, no promise leaks

v3 (Broken)

const cluster = new eks.Cluster("test", { /* config */ });
const provider = cluster.provider; // 717+ promise leaks

Workaround

Currently, the only workaround is to avoid cluster.provider entirely:

import * as k8s from "@pulumi/kubernetes";

// Instead of cluster.provider, create the provider manually:
const k8sProvider = new k8s.Provider("manual-provider", {
    kubeconfig: cluster.kubeconfig
});

However, this breaks API compatibility with the established EKS v2 surface and requires significant code changes. A cached compatibility helper (sketched below) can reduce that churn.
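
A minimal sketch of such a helper; the function name, cache, and provider resource name are my assumptions, not part of @pulumi/eks:

import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";

// Hypothetical compatibility helper: create one provider per cluster and
// reuse it, instead of reading the leaking cluster.provider getter.
const providerCache = new WeakMap<eks.Cluster, k8s.Provider>();

export function clusterProvider(cluster: eks.Cluster): k8s.Provider {
    let provider = providerCache.get(cluster);
    if (!provider) {
        // In real code, derive a unique resource name per cluster.
        provider = new k8s.Provider("eks-compat-provider", {
            kubeconfig: cluster.kubeconfig,
        });
        providerCache.set(cluster, provider);
    }
    return provider;
}

Call sites then switch from cluster.provider to clusterProvider(cluster), rather than constructing a provider inline at every site.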

Investigation Methodology

Test Environment

  • Created isolated test in sandbox environment
  • Used PULUMI_DEBUG_PROMISE_LEAKS=1 for detection
  • Tested individual patterns to isolate root cause

Key Findings

  1. EKS cluster creation alone: ✅ No promise leaks
  2. Accessing cluster.provider: ❌ Immediate massive promise leaks
  3. Kubernetes resource creation: Only problematic when using leaked provider
  4. Stack trace: Points directly to clusterMixins.ts:68:30

Expected vs Actual Behavior

Expected (v2 Behavior)

const cluster = new eks.Cluster(/*...*/);
const provider = cluster.provider; // Should work without promise leaks
// Use provider for Kubernetes resources

Actual (v3 Behavior)

const cluster = new eks.Cluster(/*...*/);
const provider = cluster.provider; // 717+ promise leaks detected
// Pulumi runtime becomes unstable


Suggested Fix

The issue appears to be in clusterMixins.ts, where the provider property getter does not properly handle the promise lifecycle. The fix (sketched after this list) should ensure that:

  1. Provider creation doesn't leak promises in the registration process
  2. rpcKeepAlive promises are properly resolved/cleaned up
  3. Resource registration completes without leaving hanging promises
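
A minimal sketch of what a memoized getter could look like; this is a hypothetical shape of the fix, not the actual clusterMixins.ts code:

import * as k8s from "@pulumi/kubernetes";
import * as pulumi from "@pulumi/pulumi";

// Hypothetical shape of a fix -- not the actual clusterMixins.ts code.
// Registering the provider once and caching it means repeated property
// reads add no new resource registrations and no new rpcKeepAlive promises.
class ClusterLike {
    private cachedProvider?: k8s.Provider;

    constructor(private kubeconfig: pulumi.Output<string>) {}

    get provider(): k8s.Provider {
        if (!this.cachedProvider) {
            this.cachedProvider = new k8s.Provider("cluster-provider", {
                kubeconfig: this.kubeconfig,
            });
        }
        return this.cachedProvider;
    }
}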

Business Impact

This bug is a migration blocker preventing upgrade from EKS v2 to v3. Organizations using EKS + Kubernetes integration patterns cannot adopt v3 until this is resolved.

Severity: Critical

  • Blocks entire v2 → v3 migration path
  • Affects core EKS functionality
  • Memory leak potential in production
  • No clean workaround available

Test Cases for Validation

Once fixed, these test cases should pass without promise leaks:

import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";

// Test 1: Basic provider access
const cluster = new eks.Cluster(/*...*/);
const provider = cluster.provider; // should work without leaks

// Test 2: Multiple provider accesses
const provider1 = cluster.provider;
const provider2 = cluster.provider; // should reuse the cached provider, not leak

// Test 3: Kubernetes resource creation
const namespace = new k8s.core.v1.Namespace("test", {
    metadata: { name: "test" }
}, { provider: cluster.provider }); // should work

All tests should run with PULUMI_DEBUG_PROMISE_LEAKS=1 without any promise leak warnings.
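
In addition to the leak-detector runs, a simple referential-equality check (my own sketch, not an official test) can guard the reuse expectation in Test 2:

import { strictEqual } from "node:assert";

// Assumes the cluster from the test cases above is in scope.
// Once the getter is memoized, repeated reads must return the same instance.
const first = cluster.provider;
const second = cluster.provider;
strictEqual(first, second, "cluster.provider should be cached, not re-created");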


Sample program

See Commands to Reproduce under Reproduction Steps above.

Log output

No response

Affected Resource(s)

No response

Output of pulumi about

CLI
Version 3.181.0
Go Version go1.24.4
Go Compiler gc

Host
OS darwin
Version 15.5
Arch arm64

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

Metadata

Labels

impact/performance: Something is slower than expected
kind/bug: Some behavior is incorrect or out of spec
