We currently use sgemm (single-precision GEMM) to run the matmuls for every query.
We can halve the memory footprint and bandwidth by supporting float16, which MKL/OpenBLAS support.
Requirements:
- We need to be able to toggle this behavior. We should expect some accuracy loss from the reduced precision.
- We need to decide when float16 computation is allowed. e.g. Are we casting our stored embeddings to float16 instead of keeping them in float32? Where does that cast happen?
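
A minimal sketch of what the toggle could look like, assuming NumPy-style arrays for the queries and stored embeddings; the function name `matmul_embeddings` and the `use_float16` flag are hypothetical, not an existing API:

```python
import numpy as np

def matmul_embeddings(queries, embeddings, use_float16=False):
    """Score query vectors against stored embeddings.

    use_float16 is a hypothetical toggle: when set, both operands are cast
    to float16 before the matmul, halving memory traffic at the cost of
    some precision. Whether the underlying BLAS runs a native half-precision
    gemm or falls back to a slower path depends on the build.
    """
    if use_float16:
        queries = queries.astype(np.float16)
        embeddings = embeddings.astype(np.float16)
    # (num_queries, dim) @ (dim, num_embeddings) -> (num_queries, num_embeddings)
    return queries @ embeddings.T
```

Casting at the call site like this keeps the stored embeddings in float32; the alternative is to cast once at index-build time, which saves memory but bakes the precision choice into the stored data.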