Description of the issue
- Collectd tests are currently failing with the following error:
2023/05/07 06:10:11 CollectD test group failed due to: %!w(*fmt.wrapError=&{Failed to complete setup after agent run due to: write udp 127.0.0.1:44705->127.0.0.1:25826: write: connection refused 0xc00026f400})
- Example: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/4833290668/jobs/8613380241
- The issue appears to have started after https://github.com/aws/amazon-cloudwatch-agent-test/pull/165/files#diff-781bb3810ede78771b4deb63bb6bb797a2d06434f0d844429b5ad16a20304e0fR56-R88 was merged.
Analysis
- Adding a small sleep after the first flush seemingly fixes the issue, but root-cause analysis of why the first flush call does not receive a connection refused while the subsequent one does is still pending. See "Sleep for 30 secs before flush cycles for collectd metrics" #210 for the change; a minimal sketch of the workaround is included at the end of this section.
- I also added logging to print the agent status and the agent logs once before the initial flush and again before the subsequent flush. The output is below:
=== RUN TestMetricValueBenchmarkSuite
>>>> Starting MetricBenchmarkTestSuite
=== RUN TestMetricValueBenchmarkSuite/TestAllInSuite
2023/05/07 06:10:10 Executing subset of plugin tests: [collectd]
2023/05/07 06:10:10 Running CollectD
2023/05/07 06:10:10 Starting agent using agent config file agent_configs/collectd_config.json
2023/05/07 06:10:10 Copy File agent_configs/collectd_config.json to /opt/aws/amazon-cloudwatch-agent/bin/config.json
2023/05/07 06:10:10 File agent_configs/collectd_config.json abs path /home/ec2-user/amazon-cloudwatch-agent-test/test/metric_value_benchmark/agent_configs/collectd_config.json
2023/05/07 06:10:10 File : agent_configs/collectd_config.json copied to : /opt/aws/amazon-cloudwatch-agent/bin/config.json
2023/05/07 06:10:10 Agent has started
/////////////////////////////////////////////////////////////////////////
2023/05/07 06:10:10 First flush
2023/05/07 06:10:10 Agent status:{
"status": "running",
"starttime": "2023-05-07T06:10:09+0000",
"configstatus": "configured",
"cwoc_status": "stopped",
"cwoc_starttime": "",
"cwoc_configstatus": "not configured",
"version": "1.247354.0b251981"
}
May 07 06:10:10 ip-172-31-29-254.ec2.internal systemd[1]: Started Amazon CloudWatch Agent.
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: I! Detecting run_as_user...
/////////////////////////////////////////////////////////////////////////
2023/05/07 06:10:10 Next flush
2023/05/07 06:10:11 Collectd here8. write udp 127.0.0.1:44705->127.0.0.1:25826: write: connection refused
2023/05/07 06:10:11 Agent status:{
"status": "running",
"starttime": "2023-05-07T06:10:09+0000",
"configstatus": "configured",
"cwoc_status": "stopped",
"cwoc_starttime": "",
"cwoc_configstatus": "not configured",
"version": "1.247354.0b251981"
}
May 07 06:10:10 ip-172-31-29-254.ec2.internal systemd[1]: Started Amazon CloudWatch Agent.
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: I! Detecting run_as_user...
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! CWAGENT_LOG_LEVEL is set to "DEBUG"
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Starting AmazonCloudWatchAgent 1.247354.0b251981
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! AWS SDK log level not set
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Loaded inputs: socket_listener
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Loaded aggregators:
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Loaded processors: ec2tagger
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Loaded outputs: cloudwatch
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Tags enabled: host=ip-172-31-29-254.ec2.internal
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [agent] Config: Interval:15s, Quiet:false, Hostname:"ip-172-31-29-254.ec2.internal", Flush Interval:1s
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! [agent] Initializing plugins
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [logagent] starting
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! [agent] Connecting outputs
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! [agent] Attempting connection to [outputs.cloudwatch]
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! Successfully created credential sessions
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! cloudwatch: get unique roll up list []
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! [agent] Successfully connected to outputs.cloudwatch
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! [agent] Starting service inputs
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [inputs.socket_listener] Listening on udp://127.0.0.1:25826
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! cloudwatch: publish with ForceFlushInterval: 5s, Publish Jitter: 3s
2023/05/07 06:10:11 CollectD test group failed due to: %!w(*fmt.wrapError=&{Failed to complete setup after agent run due to: write udp 127.0.0.1:44705->127.0.0.1:25826: write: connection refused 0xc00026f400})
metrics_value_benchmark_test.go:103:
Error Trace: /home/ec2-user/amazon-cloudwatch-agent-test/test/metric_value_benchmark/metrics_value_benchmark_test.go:103
Error: Not equal:
expected: "Successful"
actual : "Failed"
Diff:
--- Expected
+++ Actual
@@ -1,2 +1,2 @@
-(status.TestStatus) (len=10) "Successful"
+(status.TestStatus) (len=6) "Failed"
Test: TestMetricValueBenchmarkSuite/TestAllInSuite
Messages: Metric Benchmark Test Suite Failed
2023/05/07 06:10:11 >>>>>>>>>>>>>><<<<<<<<<<<<<<
2023/05/07 06:10:11 >>>>>>>>>>>>>>Failed<<<<<<<<<<<<<<
2023/05/07 06:10:11 ==============CollectD==============
2023/05/07 06:10:11 ==============Failed==============
Starting Agent Failed
2023/05/07 06:10:11 ==============================
2023/05/07 06:10:11 >>>>>>>>>>>>>>><<<<<<<<<<<<<<<
>>>> Finished MetricBenchmarkTestSuite
--- FAIL: TestMetricValueBenchmarkSuite (1.32s)
--- FAIL: TestMetricValueBenchmarkSuite/TestAllInSuite (1.32s)
FAIL
FAIL github.com/aws/amazon-cloudwatch-agent-test/test/metric_value_benchmark 1.330s
FAIL
- As can be seen in the logs above, the agent reports a running status in both cases, but two things stand out:
- For the initial flush, the agent logs indicate it was still at a very early stage, before any of the plugins had even been initialized - so assuming the agent wasn't listening on the UDP port yet, why did the flush not fail?
- For the subsequent flush, the agent logs indicate it was listening on the UDP port - so why did the flush fail this time with a connection refused?
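A plausible explanation for this asymmetry (not yet verified against this test setup) is how connected UDP sockets report errors on Linux: UDP has no handshake, so the first write succeeds locally even if nothing is listening; the kernel only learns the destination port is closed when the ICMP "port unreachable" reply arrives, and it reports that pending error on the next write as "connection refused". If that is what is happening here, the first flush's datagram may have been sent before the agent's socket_listener bound the port, and the resulting ICMP error then surfaced on the second flush even though the port was open by then - which would also explain why sleeping before the flush cycles works around it. The sketch below demonstrates the behavior in isolation; it assumes nothing is listening on 127.0.0.1:25826:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Connect a UDP socket to a port nothing is listening on
	// (25826 is just the collectd port, used here for illustration).
	conn, err := net.Dial("udp", "127.0.0.1:25826")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// First write: UDP is connectionless, so the datagram is handed to the
	// kernel and the call typically succeeds even with no listener.
	_, err = conn.Write([]byte("first"))
	fmt.Println("first write error:", err)

	// Give the kernel a moment to receive the ICMP "port unreachable"
	// reply for that datagram and record the error on the socket.
	time.Sleep(100 * time.Millisecond)

	// Second write: the pending error now surfaces as "connection refused"
	// (on Linux), even if a listener has started binding in the meantime.
	_, err = conn.Write([]byte("second"))
	fmt.Println("second write error:", err)
}
```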
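For reference, here is a minimal sketch of the workaround from #210. The names are illustrative only: setupCollectD and sendMetrics are hypothetical stand-ins for whatever the test suite actually calls to flush collectd metrics to the agent's UDP port.

```go
package collectd

import (
	"fmt"
	"time"
)

// setupCollectD is a hypothetical stand-in for the test suite's setup step.
// sendMetrics represents whatever sends collectd datagrams to the agent's
// UDP port (127.0.0.1:25826).
func setupCollectD(sendMetrics func() error) error {
	// Workaround from #210: wait before the flush cycles so the agent's
	// socket_listener has time to bind the UDP port before any datagrams
	// (and any resulting ICMP errors) are generated.
	time.Sleep(30 * time.Second)

	if err := sendMetrics(); err != nil {
		return fmt.Errorf("failed to complete setup after agent run due to: %w", err)
	}
	return nil
}
```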