[tune] remove dependency on gpy #59152
Conversation
Signed-off-by: Matthew Deng <[email protected]>
TimothySeah left a comment
I'm not super familiar with Gaussian processes or their implementations in scikit-learn/GPy. Also, the GPy -> scikit-learn migration does not appear to be a one-for-one replacement. It might be worth getting a review from someone more familiar and/or having a good testing plan to ensure there are no regressions.
try:
    m.optimize()
m = GaussianProcessRegressor(
Is m = GPy.models.GPRegression(X, y, kernel) equivalent to
m = GaussianProcessRegressor(
    kernel=kernel, optimizer="fmin_l_bfgs_b", alpha=1e-10
)
m.fit(X, y)
? Might be worth double-checking if this matters.
Yeah, these are the default values of GaussianProcessRegressor
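As a rough sketch of the comparison being discussed (placeholder data, and an RBF kernel standing in for the PR's custom kernel, so not the actual PR code), the point is only that the explicit arguments in the diff match sklearn's defaults:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.random.rand(10, 2)       # placeholder data
y = np.random.rand(10, 1)
kernel = RBF(length_scale=1.0)  # stand-in for the PR's custom kernel

# The diff's explicit arguments...
explicit = GaussianProcessRegressor(
    kernel=kernel, optimizer="fmin_l_bfgs_b", alpha=1e-10
)
# ...are sklearn's defaults, so this is equivalent.
defaults = GaussianProcessRegressor(kernel=kernel)

# Like GPy's m.optimize(), fit() optimizes the kernel hyperparameters
# (with L-BFGS-B) and then conditions the GP on (X, y).
explicit.fit(X, y)
defaults.fit(X, y)
```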
def __init__(
    self, input_dim, variance=1.0, lengthscale=1.0, epsilon=0.0, active_dims=None
    self,
Do input_dim and active_dims no longer matter?
Yeah, sklearn's Kernel doesn't need these passed in during construction; it determines the dimensionality from the data at runtime.
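To illustrate (a minimal toy kernel, not the PR's TV_SquaredExp): a sklearn Kernel has no input_dim/active_dims constructor arguments because the dimensionality is simply read from X when the kernel is evaluated.

```python
import numpy as np
from sklearn.gaussian_process.kernels import Kernel
from sklearn.metrics.pairwise import euclidean_distances

class DimAgnosticRBF(Kernel):
    """Toy RBF-style kernel with no input_dim/active_dims parameters."""

    def __init__(self, lengthscale=1.0):
        self.lengthscale = lengthscale

    def __call__(self, X, Y=None, eval_gradient=False):
        X = np.atleast_2d(X)
        Y = X if Y is None else np.atleast_2d(Y)
        # The input dimension is only needed here, at call time, so nothing
        # about it has to be declared when the kernel is constructed.
        dist2 = np.square(euclidean_distances(X, Y))
        K = np.exp(-0.5 * dist2 / self.lengthscale ** 2)
        if eval_gradient:
            raise NotImplementedError("gradient omitted in this sketch")
        return K

    def diag(self, X):
        return np.ones(np.atleast_2d(X).shape[0])

    def is_stationary(self):
        return True

K = DimAgnosticRBF()(np.random.rand(5, 7))  # works for any number of columns
```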
if Y is None:
    Y = X

epsilon = np.clip(self.epsilon, 1e-5, 0.5)
Where did the 1e-5 lower bound come from?
It's consistent with epsilon_bounds. I think this should already be handled by sklearn, but it's kept here for consistency with the previous logic.
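For context, the same (1e-5, 0.5) interval typically shows up twice in a sklearn kernel; this is illustration only, with names following the discussion above, and the actual TV_SquaredExp code may differ.

```python
# In a sklearn Kernel, bounds are usually declared via a Hyperparameter
# property, e.g.:
#
#     @property
#     def hyperparameter_epsilon(self):
#         return Hyperparameter("epsilon", "numeric", self.epsilon_bounds)
#
# With epsilon_bounds = (1e-5, 0.5), the default L-BFGS-B optimizer already
# keeps epsilon inside that interval, so the explicit clip in __call__ acts
# as a redundant guard mirroring the pre-migration GPy logic.
import numpy as np

epsilon_bounds = (1e-5, 0.5)
print(np.clip(0.7, *epsilon_bounds))   # 0.5
print(np.clip(1e-8, *epsilon_bounds))  # 1e-05
```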
def __call__(self, X, Y=None, eval_gradient=False):
    X = np.atleast_2d(X)
    if Y is None:
        Y = X
Do we need to do Y = np.copy(X) since it was X2 = np.copy(X) before?
No because we replaced in-place modification of
    X = X[:, 1:]
    X2 = X2[:, 1:]
with
    X_spatial = X[:, 1:]
    Y_spatial = Y[:, 1:]
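A toy illustration (not the PR code) of why the alias is safe: the new names only read from X, and nothing writes back into it.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)
Y = X  # alias, as in the new __call__ when Y is None

X_spatial = X[:, 1:]  # new name bound to a slice; X itself is not modified
Y_spatial = Y[:, 1:]

assert np.shares_memory(X, X_spatial)                  # it's a view, not a copy...
assert np.array_equal(X, np.arange(12).reshape(4, 3))  # ...and X is unchanged
```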
return self.variance * np.ones(X.shape[0])

if eval_gradient:
    K_gradient_variance = K
dist2 = np.square(euclidean_distances(X_spatial, Y_spatial))
Not sure why this is no longer divided by lengthscale - see the previous code: dist2 = np.square(euclidean_distances(X, X2)) / self.lengthscale.
IIUC, __call__ in the scikit-learn implementation is similar to K plus update_gradients_full in the GPy implementation, but the update_gradients_full portion seems different.
Yeah, this is because sklearn works in log space (kernel gradients are taken with respect to the log of the hyperparameters).
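Stated generally (not the exact TV_SquaredExp algebra): sklearn's eval_gradient=True returns the gradient of K with respect to the log of each hyperparameter, so by the chain rule

$$\frac{\partial K}{\partial \log \ell} \;=\; \ell\,\frac{\partial K}{\partial \ell},$$

which is why a term that appeared as a division by lengthscale in the GPy gradient code can cancel or move in the sklearn version.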
m.fit(X, y)

scores.append(m.log_likelihood())
scores.append(m.log_marginal_likelihood_value_)
Log likelihood isn't quite the same as log marginal likelihood but maybe that doesn't affect this algorithm in a meaningful way?
GPy's implementation returns _log_marginal_likelihood, so it's the same thing.
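A quick sketch (placeholder data only) of where the corresponding value lives on the sklearn side after fit():

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.random.rand(15, 2)  # placeholder data
y = np.random.rand(15)

m = GaussianProcessRegressor().fit(X, y)
# Log marginal likelihood at the optimized hyperparameters, analogous to
# what GPy's m.log_likelihood() returned.
print(m.log_marginal_likelihood_value_)
print(m.log_marginal_likelihood(m.kernel_.theta))  # same quantity, recomputed
```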
TimothySeah left a comment
LGTM but might be worth adding testing notes to the PR description.
Signed-off-by: Matthew Deng <[email protected]>
Description
Replace GPy with scikit-learn's GaussianProcessRegressor in the PB2 (Population Based Bandits) scheduler.

Additional information
- Updated the TV_SquaredExp kernel to implement sklearn's Kernel interface instead of GPy's Kern
- Replaced GPy.models.GPRegression with sklearn.gaussian_process.GaussianProcessRegressor
- Removed gpy from tune-test-requirements.txt

Testing
Re-ran existing PB2 tests in test_trial_scheduler_pbt.py to validate the change.
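One hypothetical way to re-run those tests locally; the file path and keyword filter are assumptions based on Ray's usual test layout, not taken from the PR.

```python
# Hypothetical invocation; path and -k filter are assumptions, not from the PR.
import pytest

pytest.main(["-v", "python/ray/tune/tests/test_trial_scheduler_pbt.py", "-k", "pb2"])
```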