This PR proposes a fix for #180 by replacing the existing torch.quantile implementation with one based on torch.kthvalue. torch.quantile limits its input to roughly 16 million elements, which causes problems when computing codebooks for 50,000 or more vectors of dimension greater than 335 (50,000 × 336 = 16.8 million elements).
The new implementation does not have that limitation. For linear interpolation, it is 2x slower on GPU and 10x slower on CPU, but it has been verified to produce correct results. Any suggestions to speed up the CPU implementation are welcome.
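For readers who want to see the idea, here is a minimal sketch of a quantile with linear interpolation built on torch.kthvalue. It is not the code from this PR: the helper name `kthvalue_quantile` and its signature are hypothetical, and it assumes a single scalar `q` and the same rank convention as torch.quantile's default (`q * (n - 1)`).

```python
import math

import torch


def kthvalue_quantile(x: torch.Tensor, q: float, dim: int = -1) -> torch.Tensor:
    """Hypothetical helper: q-th quantile along `dim` with linear interpolation."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")

    n = x.size(dim)
    pos = q * (n - 1)      # fractional 0-based rank of the quantile
    lo = math.floor(pos)   # 0-based rank just below (or at) the quantile
    hi = math.ceil(pos)    # 0-based rank just above (or at) the quantile

    # torch.kthvalue expects a 1-based k, hence the +1.
    lo_val = torch.kthvalue(x, lo + 1, dim=dim).values
    if hi == lo:
        # The rank is an integer, so no interpolation is needed.
        return lo_val

    hi_val = torch.kthvalue(x, hi + 1, dim=dim).values
    # Interpolate linearly between the two neighbouring order statistics.
    return lo_val + (hi_val - lo_val) * (pos - lo)


if __name__ == "__main__":
    x = torch.tensor([[1.0, 2.0, 3.0, 4.0], [10.0, 30.0, 20.0, 40.0]])
    print(kthvalue_quantile(x, 0.5, dim=1))
```

The sketch calls torch.kthvalue up to twice per quantile, once for each neighbouring rank, which is one plausible reason a kthvalue-based approach can be slower than torch.quantile for linear interpolation.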
A corresponding implementation in Rust would require more effort since it would mean replacing the underlying ATen kernel for quantile.
Runtime comparison (figures omitted): Tensor dim = (50000, 768) and Tensor dim = (50000, 384).