When configuring PD disaggregation with 2P2D, the second prefiller fails to start properly. #14882

@lululu-1997

Description

My scenario is as follows:

  • There are 5 pods in total: 2 prefillers, 2 decoders, and 1 router.
  • Each prefiller and decoder pod has one L20 GPU.
  • The environment has no RDMA support.

The startup commands are as follows:

  • prefiller-1

python -m sglang.launch_server --model-path /root/.cache/huggingface/Qwen3-4B --disaggregation-mode prefill --host prefiller1-ip --port 30000 --trust-remote-code --dist-init-addr prefiller1-ip:5000 --nnodes 2 --node-rank 0 --tp-size 2 --dp-size 1 --enable-dp-attention --mem-fraction-static 0.8 --log-level debug

  • prefiller-2

python -m sglang.launch_server --model-path /root/.cache/huggingface/Qwen3-4B --disaggregation-mode prefill --host prefiller2-ip --port 30000 --trust-remote-code --dist-init-addr prefiller1-ip:5000 --nnodes 2 --node-rank 1 --tp-size 2 --dp-size 1 --enable-dp-attention --mem-fraction-static 0.8 --log-level debug

  • decoder-1

python -m sglang.launch_server --model-path /root/.cache/huggingface/Qwen3-4B --disaggregation-mode decode --host decoder1-ip --port 30001 --trust-remote-code --dist-init-addr decoder1-ip:5000 --nnodes 2 --node-rank 0 --tp-size 2 --dp-size 1 --enable-dp-attention --mem-fraction-static 0.8 --max-running-requests 128

  • decoder-2

python -m sglang.launch_server --model-path /root/.cache/huggingface/Qwen3-4B --disaggregation-mode decode --host decoder2-ip --port 30001 --trust-remote-code --dist-init-addr decoder1-ip:5000 --nnodes 2 --node-rank 1 --tp-size 2 --dp-size 1 --enable-dp-attention --mem-fraction-static 0.8 --max-running-requests 128

  • router

python -m sglang_router.launch_router --pd-disaggregation --prefill http://prefiller1-ip:30000 --prefill http://prefiller2-ip:30000 --decode http://decoder1-ip:30001 --decode http://decoder2-ip:30001 --host 0.0.0.0 --port 8000

The logs are as follows:

  • prefiller1 (normal)
[2025-12-11 07:31:28 TP0] kv manager bind to 10.64.3.56:45509
[2025-12-11 07:31:28 TP0] Starting new HTTP connection (1): 10.64.3.56:8998
[2025-12-11 07:31:28] Register prefill bootstrap: DP0 TP0 PP0 with rank_ip: 10.64.3.56 and rank_port: 45509
[2025-12-11 07:31:28] 10.64.3.56 [11/Dec/2025:07:31:28 +0000] "PUT /route HTTP/1.1" 200 154 "-" "python-requests/2.32.5"
[2025-12-11 07:31:28 TP0] http://10.64.3.56:8998 "PUT /route HTTP/1.1" 200 2
[2025-12-11 07:31:28 TP0] Prefill successfully registered to bootstrap server.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1211 07:31:28.056981  6314 transfer_engine.cpp:486] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I1211 07:31:28.057013  6314 transfer_engine.cpp:91] Transfer Engine parseHostNameWithPort. server_name: 10.64.3.56 port: 12001
I1211 07:31:28.057044  6314 transfer_engine.cpp:146] Transfer Engine RPC using P2P handshake, listening on 10.64.3.56:15863
I1211 07:31:28.057222  6314 transfer_engine.cpp:185] Auto-discovering topology...
W1211 07:31:28.057345  6314 topology.cpp:55] No RDMA devices found, check your device installation
I1211 07:31:28.057384  6314 transfer_engine.cpp:200] Topology discovery complete. Found 0 HCAs.
I1211 07:31:28.057410  6314 tcp_transport.cpp:299] TcpTransport: listen on port 15980
[2025-12-11 07:31:28] INFO:     Started server process [6184]
[2025-12-11 07:31:28] INFO:     Waiting for application startup.
[2025-12-11 07:31:28] Using default chat sampling params from model generation config: {'repetition_penalty': 1.0, 'temperature': 0.6, 'top_k': 20, 'top_p': 0.95}
[2025-12-11 07:31:28] Using default chat sampling params from model generation config: {'repetition_penalty': 1.0, 'temperature': 0.6, 'top_k': 20, 'top_p': 0.95}
[2025-12-11 07:31:28] INFO:     Application startup complete.
[2025-12-11 07:31:28] INFO:     Uvicorn running on http://10.64.3.56:30000 (Press CTRL+C to quit)
[2025-12-11 07:31:29] Starting new HTTP connection (1): 10.64.3.56:30000
[2025-12-11 07:31:29] Endpoint '/get_model_info' is deprecated and will be removed in a future version. Please use '/model_info' instead.
[2025-12-11 07:31:29] INFO:     10.64.3.56:41116 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-12-11 07:31:29] http://10.64.3.56:30000 "GET /get_model_info HTTP/1.1" 200 306
[2025-12-11 07:31:29] Start of pd disaggregation warmup ...
[2025-12-11 07:31:29] Starting new HTTP connection (1): 10.64.3.56:30000
[2025-12-11 07:31:29] Starting batch tokenization for 1 text requests
[2025-12-11 07:31:29 TP0] Processing batch generate request with 1 requests
[2025-12-11 07:31:29 TP0] FakeKVSender init with kv_indices: 4, aux_index: 0
[2025-12-11 07:31:29 TP0] Prefill batch, #new-seq: 1, #new-token: 4, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, #prealloc-req: 0, #inflight-req: 0, input throughput (token/s): 0.00,
[2025-12-11 07:31:29 TP0] Attempting to acquire lock 140245255167712 on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:29 TP0] Lock 140245255167712 acquired on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:29 TP0] Attempting to release lock 140245255167712 on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:29 TP0] Lock 140245255167712 released on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:30 TP0] FakeKVSender send with kv_indices: [1 2 3 4], state_indices: None
[2025-12-11 07:31:30 TP0] FakeKVSender poll success
[2025-12-11 07:31:30] INFO:     10.64.3.56:41132 - "POST /generate HTTP/1.1" 200 OK
[2025-12-11 07:31:30] http://10.64.3.56:30000 "POST /generate HTTP/1.1" 200 318
[2025-12-11 07:31:30] End of prefill disaggregation mode warmup with status 200, resp: [{'text': '%', 'output_ids': [4], 'meta_info': {'id': '1ef6e2124cda44819b54ea4c9723a54a', 'finish_reason': {'type': 'length', 'length': 0}, 'prompt_tokens': 4, 'weight_version': 'default', 'total_retractions': 0, 'completion_tokens': 1, 'cached_tokens': 0, 'e2e_latency': 1.7358613014221191, 'response_sent_to_client_ts': 1765438290.8528178}}]
[2025-12-11 07:31:30] The server is fired up and ready to roll!

  • prefiller2 (abnormal)

The server never finished starting. The process hung at this point and did not crash either.

[2025-12-11 07:31:28 TP1] http://10.64.3.56:8998 "PUT /route HTTP/1.1" 200 2
[2025-12-11 07:31:28 TP1] Prefill successfully registered to bootstrap server.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1211 07:31:28.023576  2175 transfer_engine.cpp:486] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I1211 07:31:28.023595  2175 transfer_engine.cpp:91] Transfer Engine parseHostNameWithPort. server_name: 10.64.6.201 port: 12001
I1211 07:31:28.023624  2175 transfer_engine.cpp:146] Transfer Engine RPC using P2P handshake, listening on 10.64.6.201:16275
I1211 07:31:28.023764  2175 transfer_engine.cpp:185] Auto-discovering topology...
W1211 07:31:28.023859  2175 topology.cpp:55] No RDMA devices found, check your device installation
I1211 07:31:28.023886  2175 transfer_engine.cpp:200] Topology discovery complete. Found 0 HCAs.
I1211 07:31:28.023903  2175 tcp_transport.cpp:299] TcpTransport: listen on port 15901
[2025-12-11 07:31:28] Dummy health check server started in background thread at 10.64.6.201:30000
[2025-12-11 07:31:29 TP1] Processing batch generate request with 1 requests
[2025-12-11 07:31:29 TP1] FakeKVSender init with kv_indices: 4, aux_index: 0
[2025-12-11 07:31:30 TP1] Attempting to acquire lock 139800112822416 on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:30 TP1] Lock 139800112822416 acquired on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:30 TP1] Attempting to release lock 139800112822416 on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:30 TP1] Lock 139800112822416 released on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:30 TP1] FakeKVSender send with kv_indices: [1 2 3 4], state_indices: None
[2025-12-11 07:31:30 TP1] FakeKVSender poll success

  • decoder1 and decoder2

Both decoders report the errors below. 10.64.6.201 is prefiller2's IP.

[2025-12-11 07:54:27 TP0] Error fetching prefill parallel info from bootstrap: HTTPConnectionPool(host='10.64.6.201', port=8998): Max retries exceeded with url: /route?engine_rank=-1&target_dp_group=-1&target_pp_rank=-1 (Caused by NewConnectionError("HTTPConnection(host='10.64.6.201', port=8998): Failed to establish a new connection: [Errno 111] Connection refused"))
[2025-12-11 07:54:27 TP0] Decode transfer failed for request rank=0 decode_req.req.rid='25e54d719d01468bbac125e73db43b02' decode_req.req.bootstrap_room=8865415883299881072 with exception KVTransferError(bootstrap_room=8865415883299881072): Could not fetch prefill parallel info from bootstrap_addr: 10.64.6.201:8998

  • router

The router log looks normal.

2025-12-11 08:07:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:07:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.246:30001, Size: 0
2025-12-11 08:07:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:484: After eviction - Used size per tenant:
2025-12-11 08:07:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:07:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.246:30001, Size: 0
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:432: Before eviction - Used size per tenant:
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.201:30000, Size: 0
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.193:30001, Size: 0
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:484: After eviction - Used size per tenant:
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.201:30000, Size: 0
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.193:30001, Size: 0
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:432: Before eviction - Used size per tenant:
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.246:30001, Size: 0
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:484: After eviction - Used size per tenant:
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:09:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.246:30001, Size: 0
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:432: Before eviction - Used size per tenant:
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.201:30000, Size: 0
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.193:30001, Size: 0
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:484: After eviction - Used size per tenant:
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.201:30000, Size: 0
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.193:30001, Size: 0
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:432: Before eviction - Used size per tenant:
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.246:30001, Size: 0
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:484: After eviction - Used size per tenant:
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:11:35  INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.246:30001, Size: 0
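
To narrow down the "Connection refused" error in the decoder log, it may help to check which ports are actually listening on each prefiller pod. The following is a minimal sketch, not part of SGLang: `port_open` is a hypothetical helper name, and the addresses in the usage line are simply the IPs and ports that appear in the logs above (8998 is the bootstrap port the decoders try to reach, 30000 is the HTTP port).

```python
import socket
import sys


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Each argument is a "host:port" pair, for example 10.64.6.201:8998.
    for target in sys.argv[1:]:
        host, port_text = target.rsplit(":", 1)
        state = "open" if port_open(host, int(port_text)) else "closed/unreachable"
        print(f"{host}:{port_text} -> {state}")
```

Run from a decoder pod, assuming the script is saved as check_ports.py: `python check_ports.py 10.64.3.56:8998 10.64.6.201:8998 10.64.3.56:30000 10.64.6.201:30000`. If 8998 is open on prefiller1 but closed on prefiller2, that would match the decoder error above.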

Reference documents

https://docs.sglang.io/advanced_features/pd_disaggregation.html
