Description
My scenario is as follows:
- There are 5 pods in total: 2 prefillers, 2 decoders, and 1 router.
- Each prefiller and decoder pod has one L20 GPU.
- The environment has no RDMA support.
The startup commands are as follows:
- prefiller-1
python -m sglang.launch_server --model-path /root/.cache/huggingface/Qwen3-4B --disaggregation-mode prefill --host prefiller1-ip --port 30000 --trust-remote-code --dist-init-addr prefiller1-ip:5000 --nnodes 2 --node-rank 0 --tp-size 2 --dp-size 1 --enable-dp-attention --mem-fraction-static 0.8 --log-level debug
- prefiller-2
python -m sglang.launch_server --model-path /root/.cache/huggingface/Qwen3-4B --disaggregation-mode prefill --host prefiller2-ip --port 30000 --trust-remote-code --dist-init-addr prefiller1-ip:5000 --nnodes 2 --node-rank 1 --tp-size 2 --dp-size 1 --enable-dp-attention --mem-fraction-static 0.8 --log-level debug
- decoder-1
python -m sglang.launch_server --model-path /root/.cache/huggingface/Qwen3-4B --disaggregation-mode decode --host decoder1-ip --port 30001 --trust-remote-code --dist-init-addr decoder1-ip:5000 --nnodes 2 --node-rank 0 --tp-size 2 --dp-size 1 --enable-dp-attention --mem-fraction-static 0.8 --max-running-requests 128
- decoder-2
python -m sglang.launch_server --model-path /root/.cache/huggingface/Qwen3-4B --disaggregation-mode decode --host decoder2-ip --port 30001 --trust-remote-code --dist-init-addr decoder1-ip:5000 --nnodes 2 --node-rank 1 --tp-size 2 --dp-size 1 --enable-dp-attention --mem-fraction-static 0.8 --max-running-requests 128
- router
python -m sglang_router.launch_router --pd-disaggregation --prefill http://prefiller1-ip:30000 --prefill http://prefiller2-ip:30000 --decode http://decoder1-ip:30001 --decode http://decoder2-ip:30001 --host 0.0.0.0 --port 8000
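For reproducing the problem, a single request sent through the router is enough to trigger the prefill-to-decode handoff. Below is a minimal sketch using only the Python standard library. It assumes the router forwards the native `/generate` endpoint (the same path that appears in the prefill log) and listens on port 8000 as configured above. The helper names `build_generate_request` and `send`, the prompt, and the `router-ip` placeholder are illustrative and not part of SGLang.

```python
import json
import urllib.request


def build_generate_request(
    router_url: str, prompt: str, max_new_tokens: int = 8
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generate endpoint."""
    # Payload shape is an assumption based on SGLang's native /generate API.
    payload = {
        "text": prompt,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }
    return urllib.request.Request(
        url=router_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `send(build_generate_request("http://router-ip:8000", "Hello"))`, run from any pod that can reach the router.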
The logs are as follows:
- prefiller1 (normal)
[2025-12-11 07:31:28 TP0] kv manager bind to 10.64.3.56:45509
[2025-12-11 07:31:28 TP0] Starting new HTTP connection (1): 10.64.3.56:8998
[2025-12-11 07:31:28] Register prefill bootstrap: DP0 TP0 PP0 with rank_ip: 10.64.3.56 and rank_port: 45509
[2025-12-11 07:31:28] 10.64.3.56 [11/Dec/2025:07:31:28 +0000] "PUT /route HTTP/1.1" 200 154 "-" "python-requests/2.32.5"
[2025-12-11 07:31:28 TP0] http://10.64.3.56:8998 "PUT /route HTTP/1.1" 200 2
[2025-12-11 07:31:28 TP0] Prefill successfully registered to bootstrap server.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1211 07:31:28.056981 6314 transfer_engine.cpp:486] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I1211 07:31:28.057013 6314 transfer_engine.cpp:91] Transfer Engine parseHostNameWithPort. server_name: 10.64.3.56 port: 12001
I1211 07:31:28.057044 6314 transfer_engine.cpp:146] Transfer Engine RPC using P2P handshake, listening on 10.64.3.56:15863
I1211 07:31:28.057222 6314 transfer_engine.cpp:185] Auto-discovering topology...
W1211 07:31:28.057345 6314 topology.cpp:55] No RDMA devices found, check your device installation
I1211 07:31:28.057384 6314 transfer_engine.cpp:200] Topology discovery complete. Found 0 HCAs.
I1211 07:31:28.057410 6314 tcp_transport.cpp:299] TcpTransport: listen on port 15980
[2025-12-11 07:31:28] INFO: Started server process [6184]
[2025-12-11 07:31:28] INFO: Waiting for application startup.
[2025-12-11 07:31:28] Using default chat sampling params from model generation config: {'repetition_penalty': 1.0, 'temperature': 0.6, 'top_k': 20, 'top_p': 0.95}
[2025-12-11 07:31:28] Using default chat sampling params from model generation config: {'repetition_penalty': 1.0, 'temperature': 0.6, 'top_k': 20, 'top_p': 0.95}
[2025-12-11 07:31:28] INFO: Application startup complete.
[2025-12-11 07:31:28] INFO: Uvicorn running on http://10.64.3.56:30000 (Press CTRL+C to quit)
[2025-12-11 07:31:29] Starting new HTTP connection (1): 10.64.3.56:30000
[2025-12-11 07:31:29] Endpoint '/get_model_info' is deprecated and will be removed in a future version. Please use '/model_info' instead.
[2025-12-11 07:31:29] INFO: 10.64.3.56:41116 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-12-11 07:31:29] http://10.64.3.56:30000 "GET /get_model_info HTTP/1.1" 200 306
[2025-12-11 07:31:29] Start of pd disaggregation warmup ...
[2025-12-11 07:31:29] Starting new HTTP connection (1): 10.64.3.56:30000
[2025-12-11 07:31:29] Starting batch tokenization for 1 text requests
[2025-12-11 07:31:29 TP0] Processing batch generate request with 1 requests
[2025-12-11 07:31:29 TP0] FakeKVSender init with kv_indices: 4, aux_index: 0
[2025-12-11 07:31:29 TP0] Prefill batch, #new-seq: 1, #new-token: 4, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, #prealloc-req: 0, #inflight-req: 0, input throughput (token/s): 0.00,
[2025-12-11 07:31:29 TP0] Attempting to acquire lock 140245255167712 on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:29 TP0] Lock 140245255167712 acquired on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:29 TP0] Attempting to release lock 140245255167712 on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:29 TP0] Lock 140245255167712 released on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:30 TP0] FakeKVSender send with kv_indices: [1 2 3 4], state_indices: None
[2025-12-11 07:31:30 TP0] FakeKVSender poll success
[2025-12-11 07:31:30] INFO: 10.64.3.56:41132 - "POST /generate HTTP/1.1" 200 OK
[2025-12-11 07:31:30] http://10.64.3.56:30000 "POST /generate HTTP/1.1" 200 318
[2025-12-11 07:31:30] End of prefill disaggregation mode warmup with status 200, resp: [{'text': '%', 'output_ids': [4], 'meta_info': {'id': '1ef6e2124cda44819b54ea4c9723a54a', 'finish_reason': {'type': 'length', 'length': 0}, 'prompt_tokens': 4, 'weight_version': 'default', 'total_retractions': 0, 'completion_tokens': 1, 'cached_tokens': 0, 'e2e_latency': 1.7358613014221191, 'response_sent_to_client_ts': 1765438290.8528178}}]
[2025-12-11 07:31:30] The server is fired up and ready to roll!
- prefiller2 (abnormal)
The process stalled before the server finished starting, and it did not crash either.
[2025-12-11 07:31:28 TP1] http://10.64.3.56:8998 "PUT /route HTTP/1.1" 200 2
[2025-12-11 07:31:28 TP1] Prefill successfully registered to bootstrap server.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1211 07:31:28.023576 2175 transfer_engine.cpp:486] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I1211 07:31:28.023595 2175 transfer_engine.cpp:91] Transfer Engine parseHostNameWithPort. server_name: 10.64.6.201 port: 12001
I1211 07:31:28.023624 2175 transfer_engine.cpp:146] Transfer Engine RPC using P2P handshake, listening on 10.64.6.201:16275
I1211 07:31:28.023764 2175 transfer_engine.cpp:185] Auto-discovering topology...
W1211 07:31:28.023859 2175 topology.cpp:55] No RDMA devices found, check your device installation
I1211 07:31:28.023886 2175 transfer_engine.cpp:200] Topology discovery complete. Found 0 HCAs.
I1211 07:31:28.023903 2175 tcp_transport.cpp:299] TcpTransport: listen on port 15901
[2025-12-11 07:31:28] Dummy health check server started in background thread at 10.64.6.201:30000
[2025-12-11 07:31:29 TP1] Processing batch generate request with 1 requests
[2025-12-11 07:31:29 TP1] FakeKVSender init with kv_indices: 4, aux_index: 0
[2025-12-11 07:31:30 TP1] Attempting to acquire lock 139800112822416 on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:30 TP1] Lock 139800112822416 acquired on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:30 TP1] Attempting to release lock 139800112822416 on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:30 TP1] Lock 139800112822416 released on /root/.cache/flashinfer/0.5.3/89/cached_ops/tmp/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.lock
[2025-12-11 07:31:30 TP1] FakeKVSender send with kv_indices: [1 2 3 4], state_indices: None
[2025-12-11 07:31:30 TP1] FakeKVSender poll success
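One detail that may be worth noting in the two prefill logs: both TP0 (on prefiller1) and TP1 (on prefiller2) report a `PUT /route` against `http://10.64.3.56:8998`, i.e. the same bootstrap address. The sketch below pulls that information out of a saved log so the ranks can be compared quickly. It is only a log-reading aid written against the line format shown above; the function name `bootstrap_targets` and the regular expression are my own and not part of SGLang.

```python
import re

# Matches lines such as:
# [2025-12-11 07:31:28 TP1] http://10.64.3.56:8998 "PUT /route HTTP/1.1" 200 2
ROUTE_PUT = re.compile(
    r'\[[^\]]*\b(TP\d+)\]\s+http://([\d.]+):(\d+)\s+"PUT /route'
)


def bootstrap_targets(log_text: str) -> dict:
    """Map each TP rank to the (host, port) it registered with."""
    targets = {}
    for match in ROUTE_PUT.finditer(log_text):
        rank, host, port = match.groups()
        targets[rank] = (host, int(port))
    return targets
```

Feeding it the concatenated prefiller1 and prefiller2 logs should yield one entry per rank, which makes it easy to see whether every rank registered with the same bootstrap host.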
- decoder1 and decoder2
10.64.6.201 is prefiller2's IP.
[2025-12-11 07:54:27 TP0] Error fetching prefill parallel info from bootstrap: HTTPConnectionPool(host='10.64.6.201', port=8998): Max retries exceeded with url: /route?engine_rank=-1&target_dp_group=-1&target_pp_rank=-1 (Caused by NewConnectionError("HTTPConnection(host='10.64.6.201', port=8998): Failed to establish a new connection: [Errno 111] Connection refused"))
[2025-12-11 07:54:27 TP0] Decode transfer failed for request rank=0 decode_req.req.rid='25e54d719d01468bbac125e73db43b02' decode_req.req.bootstrap_room=8865415883299881072 with exception KVTransferError(bootstrap_room=8865415883299881072): Could not fetch prefill parallel info from bootstrap_addr: 10.64.6.201:8998
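The decoder error is a plain TCP "connection refused" on 10.64.6.201:8998, while the prefill logs show registration against 10.64.3.56:8998. A quick way to confirm which prefill pods actually listen on the bootstrap port is to probe it from a decoder pod. The following is a minimal sketch using only the standard library; the helper name `probe` and the 2-second timeout are arbitrary choices of mine.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a TCP connect and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer in time (packet dropped or host unreachable).
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Running `probe("10.64.3.56", 8998)` and `probe("10.64.6.201", 8998)` from a decoder pod would show whether the bootstrap port is open on one prefill pod only or on both.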
- router
The router log looks normal.
2025-12-11 08:07:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:07:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.246:30001, Size: 0
2025-12-11 08:07:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:484: After eviction - Used size per tenant:
2025-12-11 08:07:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:07:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.246:30001, Size: 0
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:432: Before eviction - Used size per tenant:
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.201:30000, Size: 0
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.193:30001, Size: 0
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:484: After eviction - Used size per tenant:
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.201:30000, Size: 0
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.193:30001, Size: 0
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:432: Before eviction - Used size per tenant:
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.246:30001, Size: 0
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:484: After eviction - Used size per tenant:
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:09:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.246:30001, Size: 0
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:432: Before eviction - Used size per tenant:
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.201:30000, Size: 0
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.193:30001, Size: 0
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:484: After eviction - Used size per tenant:
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.201:30000, Size: 0
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.193:30001, Size: 0
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:432: Before eviction - Used size per tenant:
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:434: Tenant: http://10.64.6.246:30001, Size: 0
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:484: After eviction - Used size per tenant:
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.3.56:30000, Size: 0
2025-12-11 08:11:35 INFO sgl_model_gateway::policies::tree: /sgl-workspace/sglang/sgl-model-gateway/src/policies/tree.rs:486: Tenant: http://10.64.6.246:30001, Size: 0
Reference documents
https://docs.sglang.io/advanced_features/pd_disaggregation.html