Describe the bug
ONNX model quantization using algorithms like HQQ and RTN appears to be broken since version 0.9.2, and the issue persists on the main branch.
In version 0.9.1, the quantization process correctly produces a single, valid model.onnx file. However, starting from version 0.9.2, the process splits the model into external data (model.onnx + model.onnx.data) whose total size is far larger than before (5.2 GB vs. 651 MB for the same model), and whose internal tensor locations do not match the files actually written to disk.
Consequently, tools like onnxslim or even onnx.load() fail to load the model, throwing an onnx.onnx_cpp2py_export.checker.ValidationError because they cannot locate the tensor data. This regression makes the quantization feature unusable for large models that are saved with external data.
The issue has been observed with both HQQ and RTN quantization algorithms.
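For reference, the failure can be reproduced outside of onnxslim with a few lines of Python. This is a minimal sketch, assuming the directory layout shown in the logs below (the model path is a placeholder; adjust it to your output directory):

```python
import onnx
from onnx.external_data_helper import uses_external_data

# Placeholder path matching the layout Olive 0.9.2 produces; adjust as needed.
model_path = "./quantized_model.onnx/model/language_model.onnx"

# onnx.load() resolves external data relative to the model's directory and
# raises ValidationError when a referenced data file is missing.
try:
    onnx.load(model_path)
except onnx.checker.ValidationError as e:
    print(f"Load failed: {e}")

# Loading without external data succeeds and reveals the per-tensor
# `location` entries that point at files which were never written.
model = onnx.load(model_path, load_external_data=False)
for tensor in model.graph.initializer:
    if uses_external_data(tensor):
        location = next(kv.value for kv in tensor.external_data if kv.key == "location")
        print(tensor.name, "->", location)
```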
To Reproduce
Steps to reproduce the behavior:
- Install `olive-ai` version 0.9.2 or build from the `main` branch.
- Get a large ONNX model.
- Run the `olive quantize` command:
  ```
  olive quantize -m ./path/to/your_model.onnx --algorithm hqq --precision int4 --output_path ./quantized_model.onnx
  ```
- Attempt to load or inspect the generated model, which is typically located at `./quantized_model.onnx/model/<original_model_name>.onnx`:
  ```
  # For example, using onnxslim
  onnxslim --inspect ./quantized_model.onnx/model/<original_model_name>.onnx
  ```
- Observe the `onnx.onnx_cpp2py_export.checker.ValidationError` as shown in the logs below.
Expected behavior
The quantization process should produce a valid, self-contained ONNX model. If external data is used, the paths within the .onnx file should be relative to the model file itself, allowing it to be correctly loaded by standard ONNX tools.
The behavior should be consistent with version 0.9.1, where a single, valid ONNX file was successfully generated.
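For comparison, this is roughly the layout onnx's own helpers produce when saving with external data: all tensors in one file, referenced by a location relative to the model. A sketch with placeholder paths, not Olive's internal save routine:

```python
import onnx

# Placeholder input; any valid in-memory model works here.
model = onnx.load("./some_valid_model.onnx")

# Writing all tensors to a single file, with a location relative to the
# model, yields a model.onnx + model.onnx.data pair that loads cleanly.
onnx.save_model(
    model,
    "./out/model.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="model.onnx.data",  # resolved relative to ./out/ at load time
    size_threshold=1024,
)

onnx.load("./out/model.onnx")  # round-trips without ValidationError
```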
Olive config
The configuration is provided via command-line arguments. No JSON config file was used.
```
olive quantize -m ./onnx/onnx_models/language_model.onnx --algorithm hqq --precision int4 --output_path ./onnx/onnx_models_quantized/language_model.onnx --log_level 1
```
Olive logs
✅ Working Log on Olive v0.9.1
> olive quantize -m ./onnx/onnx_models/language_model.onnx --algorithm hqq --precision int4 --output_path ./onnx/onnx_models_quantized/language_model.onnx --log_level 1
Loading ONNX model from ./onnx/onnx_models/language_model.onnx
pass list: ['OnnxHqqQuantization']
selected pass configs: {'onnxhqqquantization': {'type': 'OnnxHqqQuantization'}}
[2025-10-19 22:03:08,593] [INFO] [run.py:142:run_engine] Running workflow default_workflow
[2025-10-19 22:03:08,594] [INFO] [cache.py:138:__init__] Using cache directory: PSALM/.olive-cache/default_workflow
[2025-10-19 22:03:08,595] [INFO] [accelerator_creator.py:217:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2025-10-19 22:03:08,596] [INFO] [engine.py:223:run] Running Olive on accelerator: cpu-cpu
[2025-10-19 22:03:08,596] [INFO] [engine.py:864:_create_system] Creating target system ...
[2025-10-19 22:03:08,596] [INFO] [engine.py:867:_create_system] Target system created in 0.000072 seconds
[2025-10-19 22:03:08,596] [INFO] [engine.py:879:_create_system] Creating host system ...
[2025-10-19 22:03:08,596] [INFO] [engine.py:882:_create_system] Host system created in 0.000057 seconds
[2025-10-19 22:03:08,606] [INFO] [engine.py:683:_run_pass] Running pass onnxhqqquantization:onnxhqqquantization
...py311/lib/python3.11/site-packages/olive/passes/onnx/hqq_quantization.py:272: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
b_array_torch = torch.from_numpy(b_array)
[2025-10-19 22:03:28,230] [INFO] [engine.py:757:_run_pass] Pass onnxhqqquantization:onnxhqqquantization finished in 19.624261 seconds
[2025-10-19 22:03:28,231] [INFO] [engine.py:241:run] Run history for cpu-cpu:
[2025-10-19 22:03:28,236] [INFO] [engine.py:497:dump_run_history] run history:
+------------+-------------------+---------------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+============+===================+=====================+================+===========+
| b4406937 | | | | |
+------------+-------------------+---------------------+----------------+-----------+
| bc8feb05 | b4406937 | onnxhqqquantization | 19.6243 | |
+------------+-------------------+---------------------+----------------+-----------+
[2025-10-19 22:03:28,236] [INFO] [cache.py:195:load_model] Loading model bc8feb05 from cache.
[2025-10-19 22:03:28,625] [INFO] [engine.py:266:run] Saved output model to PSALM/onnx/onnx_models_quantized/language_model.onnx
Model is saved at PSALM/onnx/onnx_models_quantized/language_model.onnx
> onnxslim --inspect --model-check --verbose ./onnx/onnx_models_quantized/language_model.onnx/model.onnx
# ... (successful onnxslim inspection output) ...
+------------------------+--------------------------------------+
| Model Name | model.onnx |
# ... (rest of successful inspection) ...
+------------------------+--------------------------------------+
| Model Size | 651.36 MB |
+------------------------+--------------------------------------+
> ll ./onnx/onnx_models_quantized/language_model.onnx/
total 652M
-rw-rw-r-- 1 user user 408 10月 19 22:03 model_config.json
-rw-rw-r-- 1 user user 652M 10月 19 22:03 model.onnx
❌ Failing Log on Olive v0.9.2 / main
> olive quantize -m ./onnx/onnx_models/language_model.onnx --algorithm hqq --precision int4 --output_path ./onnx/onnx_models_quantized/language_model.onnx --log_level 1
/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/transformers/utils/generic.py:441: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
_torch_pytree._register_pytree_node(
Loading ONNX model from ./onnx/onnx_models/language_model.onnx
pass list: ['OnnxHqqQuantization']
selected pass configs: {'onnxhqqquantization': {'type': 'OnnxHqqQuantization'}}
[2025-10-19 22:04:30,242] [INFO] [run.py:143:run_engine] Running workflow default_workflow
[2025-10-19 22:04:30,243] [INFO] [cache.py:138:__init__] Using cache directory: PSALM/.olive-cache/default_workflow
[2025-10-19 22:04:30,245] [INFO] [accelerator_creator.py:222:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2025-10-19 22:04:30,245] [INFO] [engine.py:224:run] Running Olive on accelerator: cpu-cpu
[2025-10-19 22:04:30,245] [INFO] [engine.py:867:_create_system] Creating target system ...
[2025-10-19 22:04:30,245] [INFO] [engine.py:870:_create_system] Target system created in 0.000098 seconds
[2025-10-19 22:04:30,245] [INFO] [engine.py:882:_create_system] Creating host system ...
[2025-10-19 22:04:30,245] [INFO] [engine.py:885:_create_system] Host system created in 0.000082 seconds
[2025-10-19 22:04:30,253] [INFO] [engine.py:686:_run_pass] Running pass onnxhqqquantization:onnxhqqquantization
/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/olive/passes/onnx/hqq_quantization.py:195: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
b_array_torch = torch.from_numpy(b_ndarray)
[2025-10-19 22:04:41,394] [INFO] [engine.py:760:_run_pass] Pass onnxhqqquantization:onnxhqqquantization finished in 11.141228 seconds
[2025-10-19 22:04:41,396] [INFO] [engine.py:242:run] Run history for cpu-cpu:
[2025-10-19 22:04:41,402] [INFO] [engine.py:500:dump_run_history] run history:
+------------+-------------------+---------------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+============+===================+=====================+================+===========+
| b4406937 | | | | |
+------------+-------------------+---------------------+----------------+-----------+
| 8239b6f8 | b4406937 | onnxhqqquantization | 11.1412 | |
+------------+-------------------+---------------------+----------------+-----------+
[2025-10-19 22:04:41,402] [INFO] [cache.py:195:load_model] Loading model 8239b6f8 from cache.
[2025-10-19 22:04:44,468] [INFO] [engine.py:263:run] Saved output model to PSALM/onnx/onnx_models_quantized/language_model.onnx
Model is saved at PSALM/onnx/onnx_models_quantized/language_model.onnx
> onnxslim --inspect --model-check --verbose ./onnx/onnx_models_quantized/language_model.onnx/model/language_model.onnx
Warning: Onnx Runtime version 1.23 has no specified compatible ONNX version. Compatibility issues may occur.
Traceback (most recent call last):
File "/home/user/anaconda3/envs/py311/bin/onnxslim", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 170, in main
slim(
File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 75, in slim
model_info_list = [get_info(m, inspect=True) for m in model]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 75, in <listcomp>
model_info_list = [get_info(m, inspect=True) for m in model]
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 61, in get_info
model = onnx.load(model)
^^^^^^^^^^^^^^^^
File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnx/__init__.py", line 232, in load_model
load_external_data_for_model(model, base_dir)
File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnx/external_data_helper.py", line 75, in load_external_data_for_model
load_external_data_for_tensor(tensor, base_dir)
File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnx/external_data_helper.py", line 53, in load_external_data_for_tensor
external_data_file_path = c_checker._resolve_external_data_location( # type: ignore[attr-defined]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnx.onnx_cpp2py_export.checker.ValidationError: Data of TensorProto ( tensor name: /backbone/layers.0/self_attn/rotary_emb/Constant_attr::value) should be stored in PSALM/onnx/onnx_models_quantized/language_model.onnx/model/_backbone_layers.0_self_attn_rotary_emb_Constant_attr__value, but it doesn't exist or is not accessible.
> ll ./onnx/onnx_models_quantized/language_model.onnx/model/
total 5.2G
-rw-rw-r-- 1 user user 855K 10月 19 22:04 language_model.onnx
-rw-rw-r-- 1 user user 5.2G 10月 19 22:04 language_model.onnx.data
Other information
- OS: Linux
- Olive version: 0.9.2 and `main` (broken), 0.9.1 (working)
- ONNXRuntime package and version: 1.23
- Transformers package version: 4.57.1