-
Notifications
You must be signed in to change notification settings - Fork 7k
[data] Add more div/multiply methods to ExecutionResources #59367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[data] Add more div/multiply methods to ExecutionResources #59367
Conversation
Signed-off-by: iamjustinhsu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request makes a solid improvement by separating the internal high-precision resource values from the public, rounded properties. This effectively addresses the precision loss issue described. The addition of __truediv__ and __mul__ is a good extension for the class's functionality.
My main feedback is to add safe handling for division-by-zero in __truediv__ to prevent potential runtime errors. I've also suggested reordering arguments in the new dunder methods for better consistency with the rest of the class. Overall, these are great changes.
| def __truediv__(self, other: "ExecutionResources") -> "ExecutionResources": | ||
| # NOTE: We add access each resource privately because we want to preserve the | ||
| # decimal precision. The public properties will now on runtime safe_round to | ||
| # 5 decimals. | ||
| return ExecutionResources( | ||
| cpu=self._cpu / other._cpu, | ||
| gpu=self._gpu / other._gpu, | ||
| memory=self._memory / other._memory, | ||
| object_store_memory=self._object_store_memory / other._object_store_memory, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation of __truediv__ doesn't handle division by zero, which will raise a ZeroDivisionError if any resource in other is zero. This can lead to unexpected crashes. It's better to define a safe behavior for this case. For resource calculations, a common approach is to treat x / 0 as inf (for x > 0) and 0 / 0 as 0.
I've also reordered the arguments to match the __init__ method for consistency and fixed a minor typo in the comment.
def __truediv__(self, other: "ExecutionResources") -> "ExecutionResources":
# NOTE: We access each resource privately because we want to preserve the
# decimal precision. The public properties will now on runtime safe_round to
# 5 decimals.
def _safe_div(a, b):
if b == 0.0:
return float("inf") if a != 0.0 else 0.0
return a / b
return ExecutionResources(
cpu=_safe_div(self._cpu, other._cpu),
gpu=_safe_div(self._gpu, other._gpu),
object_store_memory=_safe_div(
self._object_store_memory, other._object_store_memory
),
memory=_safe_div(self._memory, other._memory),
)| def __mul__(self, other: "ExecutionResources") -> "ExecutionResources": | ||
| return ExecutionResources( | ||
| cpu=self._cpu * other._cpu, | ||
| gpu=self._gpu * other._gpu, | ||
| memory=self._memory * other._memory, | ||
| object_store_memory=self._object_store_memory * other._object_store_memory, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with the __init__ method and other methods in this class, the keyword arguments for ExecutionResources should be in the order cpu, gpu, object_store_memory, memory. This improves readability and maintainability.
def __mul__(self, other: "ExecutionResources") -> "ExecutionResources":
return ExecutionResources(
cpu=self._cpu * other._cpu,
gpu=self._gpu * other._gpu,
object_store_memory=self._object_store_memory * other._object_store_memory,
memory=self._memory * other._memory,
)Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
| return float("nan") | ||
| if a > 0.0: | ||
| return float("inf") | ||
| return float("-inf") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Division of NaN by zero returns wrong value
The safe_div function mishandles the case where the numerator is NaN and the denominator is 0.0. Since nan == 0.0 is False and nan > 0.0 is also False, the function falls through to return -inf instead of the mathematically correct nan. Adding an explicit math.isnan check at the start of safe_div would fix this edge case.
For my future autoscaler changes, I would like to divide and multiply execution resources. See https://github.com/anyscale/rayturbo/pull/2778 for more info.
In the example above,
a * breturns 0 due to rounding errors. Ideally, this should be preserved. The approach I do is to use_cpu,_gpu,_memory, and_object_store_memoryhold the true precision, butcpu,gpu,memory, andobject_store_memorydo not.