Conversation

@idtodd idtodd commented Dec 8, 2025

Use-after-free errors were encountered during the destruction of LSTM Python adapters. This PR aims to correct all memory-related problems encountered during those runs.

Removals

  • NGen.cpp no longer makes the explicit call that released the Python interpreter (the interp.reset() call that came before MPI_Finalize()). That call appeared to free Python data before the Python BMI finalize functions ran, so removing it makes it less likely that those functions operate on freed Python objects. A sketch of the ordering hazard follows this list.
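
The hazard here is teardown ordering between the embedded Python interpreter and the Python-backed BMI adapters. Below is a minimal sketch of that ordering issue using pybind11's embedded interpreter; PyBackedAdapter and its members are illustrative stand-ins, not NGen's actual classes.

```cpp
#include <memory>
#include <pybind11/embed.h>

namespace py = pybind11;

// Stand-in for a Python-backed BMI adapter that keeps pybind11 handles alive.
struct PyBackedAdapter {
    py::object model;                            // handle into the Python heap
    void finalize() { model.attr("clear")(); }   // touches Python state
};

int main() {
    auto interp = std::make_unique<py::scoped_interpreter>();

    PyBackedAdapter adapter{ py::dict() };

    // Explicitly tearing the interpreter down at this point (as the removed
    // call did before MPI_Finalize) would free the Python heap while
    // adapter.model still points into it, turning the finalize() call below
    // into a use-after-free:
    // interp.reset();

    adapter.finalize();   // safe only while the interpreter is still alive
    return 0;
}   // adapter is destroyed before interp (reverse declaration order), so teardown is clean
```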

Changes

  • Formulation_Manager::get_forcing_params now manages its directory handle with a std::unique_ptr whose deleter calls closedir. This ensures that any open DIR * is closed regardless of how the function exits, and it fixes a use-after-free in the current code, which reads properties of a dirent * after the directory has been closed (see the RAII sketch after this list).
  • Bmi_Py_Adapter checks whether its bmi_model is null before attempting to call Finalize (also sketched after this list).
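
To make those two changes concrete, here are two minimal sketches; neither reproduces ngen's actual get_forcing_params or Bmi_Py_Adapter code, and the function and type names are stand-ins. The first shows the RAII idiom of a std::unique_ptr whose deleter is closedir:

```cpp
#include <dirent.h>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

// unique_ptr whose deleter is closedir: the DIR* is released on every exit
// path (early return, thrown exception, or normal completion).
using dir_handle = std::unique_ptr<DIR, int (*)(DIR*)>;

void list_directory(const std::string& path) {
    dir_handle dir(opendir(path.c_str()), &closedir);
    if (!dir) {
        throw std::runtime_error("Could not open directory: " + path);
    }
    // Read entries while the directory is still open; the original bug read
    // dirent fields after the directory had already been closed.
    while (dirent* entry = readdir(dir.get())) {
        std::cout << entry->d_name << "\n";
    }
}   // dir goes out of scope here and closedir runs exactly once
```

The second is the null guard before Finalize, with a plain shared_ptr standing in for the adapter's Python-backed model handle:

```cpp
#include <memory>

struct Model {
    void Finalize() {}
};

struct AdapterSketch {
    std::shared_ptr<Model> bmi_model;

    void Finalize() {
        if (bmi_model == nullptr) {
            return;              // nothing to finalize: never initialized or already released
        }
        bmi_model->Finalize();
    }
};
```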

Testing

  1. Run to completion using an RTE Docker image built and run on AWS Workspaces.

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist (automated report can be put here)

Target Environment support

  • Linux

@idtodd idtodd changed the title Memory free safety improvements (NGWPC-9097) Freed Memory Safety Improvements (NGWPC-9097) Dec 8, 2025
@idtodd idtodd requested a review from PhilMiller December 17, 2025 16:34
@PhilMiller

So, looking at this again and thinking a bit more deeply about it, I want to make sure that certain testing has been done before we remove the explicit interp.reset() call coming before MPI_Finalize(). Specifically, we need to make sure that runs using either the NetCDF forcings reader or the BMI Forcings Engine exit cleanly when compiled and run in parallel with MPI. The call to interp.reset() released the interpreter handle before MPI_Finalize(); removing it changes that teardown ordering, which is why I want those runs checked.

Right now, I actually think the Forcings Engine case will fail, or only succeed by fortunate accident. Looking at Formulation_Manager::finalize() and ForcingsEngineDataProvider::finalize(), what I think we'll get is a call to the bmi->Finalize() function on the shared instance for each catchment formulation being fed by the forcings engine. Maybe its code is OK with that, or not, but allowing repeated Finalize() calls is definitely not what the BMI spec demands of it.

There's a difference between the two cases, in that the NetCDF code actually shares the instances of NetCDFPerFeatureDataProvider among all of the catchment formulations referencing a common NetCDF file, while the Forcings Engine code has an instance of ForcingsEngineLumpedDataProvider per catchment, all of which share an underlying BMI object. In the NetCDF case, the first catchment to have finalize() called will have the shared instance close and null out its nc_file. In the Forcings Engine case, every instance makes the same bmi_->Finalize() call before nulling its own copy of that shared_ptr.
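
For illustration, here is a small, self-contained sketch of why the Forcings Engine pattern produces repeated Finalize() calls on the one shared engine; the type and member names are stand-ins rather than ngen's actual declarations.

```cpp
#include <iostream>
#include <memory>
#include <vector>

// Stand-in for the shared BMI engine; it just counts how often Finalize() runs.
struct Bmi {
    int calls = 0;
    void Finalize() { std::cout << "Finalize() call #" << ++calls << "\n"; }
};

// Stand-in for a per-catchment provider holding a reference to the shared engine.
struct ProviderSketch {
    std::shared_ptr<Bmi> bmi_;

    void finalize() {
        bmi_->Finalize();   // repeated once per catchment provider
        bmi_.reset();       // only drops this provider's own reference
    }
};

int main() {
    auto engine = std::make_shared<Bmi>();
    std::vector<ProviderSketch> providers(3, ProviderSketch{engine});
    for (auto& p : providers) {
        p.finalize();       // three Finalize() calls land on the one shared engine
    }
}
```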

I think the expedient thing to do to nail this down would be to

  • Make ForcingsEngineDataProvider::finalize() a no-op
  • Rename ForcingsEngineStorage::clear() to ForcingsEngineStorage::finalize(), and have it call Finalize() on each stored BMI object before making the data_.clear() call (a sketch follows this list).
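
A rough sketch of that reshuffle, assuming ForcingsEngineStorage keeps its shared BMI objects in a map member named data_; the container and key types below are assumptions, not ngen's actual declarations.

```cpp
#include <map>
#include <memory>
#include <string>

// Stand-in for the real BMI adapter interface.
struct Bmi {
    virtual void Finalize() = 0;
    virtual ~Bmi() = default;
};

class ForcingsEngineStorage {
  public:
    // Was clear(): finalize each shared BMI instance exactly once, then drop
    // the stored references.
    void finalize() {
        for (auto& entry : data_) {
            if (entry.second) {
                entry.second->Finalize();
            }
        }
        data_.clear();
    }

  private:
    std::map<std::string, std::shared_ptr<Bmi>> data_;
};

// With the storage owning finalization, the per-catchment provider's own
// finalize() becomes a no-op:
// void ForcingsEngineDataProvider::finalize() { /* intentionally empty */ }
```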

@PhilMiller

Ok, the code looks good to me.

The testing we need is runs with MPI enabled, more than one MPI process, and each of the NetCDF forcings provider and the BMI Forcings Engine provider. Each of those runs should complete and exit without error.
