
2.0 branch bug: Tests that cause [HALT] events in big runs leave Test2 in a bad state, unable to exit without ctrl+c #287

@troglodyne

Description

This one may be a doozy to replicate/debug. That said, this is what I've been using to run yath 2.0 to try to smoke cPanel's Perl tests:

PERL5LIB='/usr/local/cpanel/t/lib' /usr/local/cpanel/3rdparty/perl/542/bin/yath test -B --qvf --no-progress --no-unsafe-inc --no-blib --no-lib --no-tlib -P FindBin -P Test::More --retry=1 --retry-isolated --term-size 2000 --extension t --switch=-w --durations build-tools/yath-durations.json --renderer TAPHarness -j`perl -E 'say int( 1.0 * int qx{nproc} )'`  --renderer DB --publish-mode complete --cover-files --project 'smoke-perl:8f7ee6d107a:cloudlinux9:10.2.64.170' --db-config Test2::Harness::UI::Config::Cpanel  --exclude-file    t/00_cplint.t --exclude-pattern t/benchmark/ --exclude-pattern t/broken/ --exclude-pattern t/final/ --exclude-pattern t/integration/ --exclude-pattern t/js/ --exclude-pattern t/qa/ --exclude-pattern t/support/ --exclude-pattern t/unreliable/ --exclude-pattern t/zz t

We are reporting to a remote DB, but that doesn't appear to be causing any issues in particular. Instead, if one of the tests winds up causing a HALT, the whole run seems borked until we ctrl+c it:

[  HALT  ]  job 28    IPC Fatal Error: hub '1166772~0~1758818045~2' is not available, failed to send event!
(TO RETRY)  job 29    t/Cpanel-Wrap.t
< REASON >  job 29    Test script returned error (Signal: 15)
< REASON >  job 29    No plan was declared
(TO RETRY)  job 40    t/etc-rpm.versions_perlmajor_upgrade.t
< REASON >  job 40    Test script returned error (Signal: 15)
< REASON >  job 40    No plan was declared, and no assertions were made.
(TO RETRY)  job 20    t/Cpanel-MariaDB-Install.t
< REASON >  job 20    Test script returned error (Signal: 15)
< REASON >  job 20    Planned for 62 assertions, but saw 33
(TO RETRY)  job 53    t/large/detect-unused-packages.t
< REASON >  job 53    Test script returned error (Signal: 15)
< REASON >  job 53    No plan was declared
(TO RETRY)  job 28    t/Cpanel-Server-Handlers-Httpd.t
< REASON >  job 28    Test script returned error (Signal: 15)
< REASON >  job 28    Errors were encountered (Count: 1)
< REASON >  job 28    Planned for 106 assertions, but saw 6
(  DIAG  )  job 3     Looks like you planned 11255 tests but ran 5.
(TO RETRY)  job 3     t/01_devel_smoke_p.t
< REASON >  job 3     Test script returned error (Err: 255)
< REASON >  job 3     Planned for 11255 assertions, but saw 5
(TO RETRY)  job 42    t/large/06_Template_security.t
< REASON >  job 42    Test script returned error (Signal: 15)
< REASON >  job 42    Planned for 1 assertions, but saw 0
( STDERR )  RUNNER    Testing looks complete, but a filehandle is still open (Did a plugin or renderer fork without an exec?), will timeout in 10 seconds...
( STDERR )  RUNNER      9...
( STDERR )  RUNNER      8...
( STDERR )  RUNNER      7...
( STDERR )  RUNNER      6...
( STDERR )  RUNNER      5...
( STDERR )  RUNNER      4...
( STDERR )  RUNNER      3...
( STDERR )  RUNNER      2...
(TO RETRY)  job 48    t/large/Cpanel-PingTest.t
< REASON >  job 48    Test script returned error (Signal: 15)
< REASON >  job 48    Planned for 15 assertions, but saw 6
( STDERR )  RUNNER    Testing looks complete, but a filehandle is still open (Did a plugin or renderer fork without an exec?), will timeout in 10 seconds...
( STDERR )  RUNNER      9...
( STDERR )  RUNNER      8...
( STDERR )  RUNNER      7...
( STDERR )  RUNNER      6...
( STDERR )  RUNNER      5...
( STDERR )  RUNNER      4...
( STDERR )  RUNNER      3...
( STDERR )  RUNNER      2...
( STDERR )  RUNNER      1...
( STDERR )  RUNNER      0...

( STDERR )  RUNNER      (in cleanup) Disconnected pipe at /usr/local/cpanel/3rdparty/perl/542/cpanel-lib/Test2/Harness/Instance.pm line 382.
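
For what it's worth, if a standalone reproducer helps: the simplest way I know to provoke a halt-style event is a BAIL_OUT, since Test2::Event::Bail sets the halt control facet. The halts above come from IPC fatal errors rather than an explicit bail, and I have not confirmed that this puts 2.0 into the same stuck state, so treat this purely as a sketch:

use strict;
use warnings;
use Test::More;

# Hypothetical t/provoke_halt.t -- not part of our suite; it just bails
# out partway through to generate an event with the halt facet set.
ok(1, 'one assertion before bailing');
BAIL_OUT('intentionally provoking a halt');
done_testing();    # never reached; BAIL_OUT exits the script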

If I wind up finding a fix myself, I'll probably submit it, but any assistance would of course be appreciated. That said, I have already had to make a few patches along the way to get things running, which could complicate this:
troglodyne@b52b60e#diff-7db323a2448800b006f5d6f74c00fc2031878d36a9218ba2d73142206202c5a1

As far as I can tell, the run explodes before running anything when orig_tmp is not set, which appears to always be the case:

The 'orig_tmp' option does not exist at /usr/local/cpanel/3rdparty/perl/542/cpanel-lib/Getopt/Yath/Settings/Group.pm line 74.
        Getopt::Yath::Settings::Group::AUTOLOAD(Getopt::Yath::Settings::Group=HASH(0x2949230)) called at /usr/local/cpanel/3rdparty/perl/542/cpanel-lib/App/Yath/IPC.pm line 50
        App::Yath::IPC::dir(App::Yath::IPC=HASH(0x21c47c8)) called at /usr/local/cpanel/3rdparty/perl/542/cpanel-lib/App/Yath/IPC.pm line 78
        App::Yath::IPC::_find_ipcs(App::Yath::IPC=HASH(0x21c47c8)) called at /usr/local/cpanel/3rdparty/perl/542/cpanel-lib/App/Yath/IPC.pm line 71
        App::Yath::IPC::ipcs(App::Yath::IPC=HASH(0x21c47c8)) called at /usr/local/cpanel/3rdparty/perl/542/cpanel-lib/App/Yath/IPC.pm line 310
        App::Yath::IPC::find(App::Yath::IPC=HASH(0x21c47c8)) called at /usr/local/cpanel/3rdparty/perl/542/cpanel-lib/App/Yath.pm line 431
        App::Yath::process_args(App::Yath=HASH(0x21c4318)) called at /usr/local/cpanel/3rdparty/perl/542/cpanel-lib/App/Yath.pm line 505
        App::Yath::run(App::Yath=HASH(0x21c4318)) called at /usr/local/cpanel/3rdparty/perl/542/cpanel-lib/App/Yath/Script.pm line 62
        App::Yath::Script::run("/usr/local/cpanel/3rdparty/perl/542/bin/yath", ARRAY(0x16155b8)) called at /usr/local/cpanel/3rdparty/perl/542/bin/yath line 24

There is certainly nothing in the POD or the help output that suggests orig_tmp is an actual option.
I patched this to something that should work on the majority of systems (hardcoding /tmp), but that may not be an assumption the rest of the application respects; I'm not really sure. It might explain some oddities where the harness failed to find things it was expecting.
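
For reference, the fallback I applied looks roughly like the sketch below. The settings accessor chain is a placeholder for whatever App::Yath::IPC::dir() actually calls at line 50, so don't read it as the real code:

use strict;
use warnings;
use File::Spec;

# Rough sketch of my workaround, not the upstream code. Where
# App::Yath::IPC::dir() reads the nonexistent 'orig_tmp' option
# (App/Yath/IPC.pm line 50 in my install), fall back to a usable
# temp dir instead of dying in Getopt::Yath::Settings::Group::AUTOLOAD.
sub ipc_base_dir {
    my ($settings) = @_;
    my $tmp = eval { $settings->yath->orig_tmp }    # placeholder accessor; dies via AUTOLOAD when the option is absent
        // $ENV{TMPDIR}
        // File::Spec->tmpdir                       # usually /tmp on Linux
        // '/tmp';
    return $tmp;
}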
