https://github.com/NVIDIA/Fuser/blob/cce887595dc86b099506b70f88d653880fde5116/csrc/multidevice/communication.cpp#L493

This can be fixed by generating host IR that uses the same buffer as both the input and the output of the allreduce.
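Here is a minimal single-process sketch of why aliasing helps. It assumes the linked code has to copy the input into the output before calling the backend, because c10d's `ProcessGroup::allreduce` reduces its tensors in place. `Buffer`, `allreduceInPlace`, and `allreduceOutOfPlace` are hypothetical names, not nvFuser APIs. The "ranks" are just buffers in one process.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one rank's tensor.
using Buffer = std::vector<float>;

// Sum-allreduce in place across simulated ranks: afterwards every buffer
// holds the elementwise sum. This models an in-place backend contract.
void allreduceInPlace(std::vector<Buffer*>& bufs) {
  if (bufs.empty()) return;
  const std::size_t n = bufs[0]->size();
  Buffer sum(n, 0.0f);
  for (const Buffer* b : bufs)
    for (std::size_t i = 0; i < n; ++i) sum[i] += (*b)[i];
  for (Buffer* b : bufs) *b = sum;
}

// Out-of-place allreduce built on the in-place primitive: each rank first
// copies its input into its output, then the outputs are reduced in place.
// When input and output are the same buffer, the copy is skipped.
// Returns the number of elements copied, to make the extra traffic visible.
std::size_t allreduceOutOfPlace(const std::vector<Buffer*>& inputs,
                                std::vector<Buffer*>& outputs) {
  assert(inputs.size() == outputs.size());
  std::size_t copied = 0;
  for (std::size_t r = 0; r < inputs.size(); ++r) {
    if (inputs[r] != outputs[r]) {
      *outputs[r] = *inputs[r];
      copied += inputs[r]->size();
    }
  }
  allreduceInPlace(outputs);
  return copied;
}
```

With distinct input and output buffers, the wrapper copies every element before reducing. When the host IR passes the same buffer as both input and output, the copy disappears and only the reduction remains.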