[Bug] ONNX Quantization Broken Since v0.9.2 Due to Incorrect Weight Data Saving #2223

Description

@happyme531

Describe the bug
ONNX model quantization using algorithms like HQQ and RTN appears to be broken since version 0.9.2, and the issue persists on the main branch.

In version 0.9.1, the quantization process correctly produces a single, valid model.onnx file. However, starting from version 0.9.2, the process generates a model whose weights are saved as abnormally large external data (model.onnx + model.onnx.data).

Consequently, tools like onnxslim or even onnx.load() fail to load the model, throwing an onnx.onnx_cpp2py_export.checker.ValidationError because they cannot locate the tensor data. This regression makes the quantization feature unusable for large models that are saved with external data.

The issue has been observed with both HQQ and RTN quantization algorithms.

To Reproduce
Steps to reproduce the behavior:

  1. Install olive-ai version 0.9.2 or build from the main branch.
  2. Get a large ONNX model.
  3. Run the olive quantize command:
    olive quantize -m ./path/to/your_model.onnx --algorithm hqq --precision int4 --output_path ./quantized_model.onnx
  4. Attempt to load or inspect the generated model, which is typically located at ./quantized_model.onnx/model/<original_model_name>.onnx.
    # For example, using onnxslim
    onnxslim --inspect ./quantized_model.onnx/model/<original_model_name>.onnx
  5. Observe the onnx.onnx_cpp2py_export.checker.ValidationError as shown in the logs below.
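A minimal Python equivalent of steps 4-5, for convenience (this is a sketch; the path is illustrative and should point at the actual generated model):

    import onnx

    # Loading triggers external-data resolution; on v0.9.2+ this raises
    # onnx.onnx_cpp2py_export.checker.ValidationError because the per-tensor
    # data files referenced by the model do not exist on disk.
    model = onnx.load("./quantized_model.onnx/model/<original_model_name>.onnx")
    print(f"Loaded {len(model.graph.initializer)} initializers")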

Expected behavior
The quantization process should produce a valid, self-contained ONNX model. If external data is used, the paths within the .onnx file should be relative to the model file itself, allowing it to be correctly loaded by standard ONNX tools.

The behavior should be consistent with version 0.9.1, where a single, valid ONNX file was successfully generated.
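For reference, this is how a consolidated external-data model is normally produced with the standard onnx API; a minimal sketch (paths illustrative) of the layout the quantized output should be loadable as:

    import onnx

    model = onnx.load("model.onnx")  # a valid source model
    # Consolidate all large tensors into one data file next to the model;
    # each TensorProto's "location" entry then holds the relative name
    # "model.onnx.data", which onnx.load() resolves against the model's directory.
    onnx.save_model(
        model,
        "out/model.onnx",
        save_as_external_data=True,
        all_tensors_to_one_file=True,
        location="model.onnx.data",
        size_threshold=1024,
    )
    reloaded = onnx.load("out/model.onnx")  # data found relative to out/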

Olive config
The configuration is provided via command-line arguments. No JSON config file was used.

olive quantize -m ./onnx/onnx_models/language_model.onnx --algorithm hqq --precision int4 --output_path ./onnx/onnx_models_quantized/language_model.onnx --log_level 1

Olive logs

✅ Working Log on Olive v0.9.1
> olive quantize -m ./onnx/onnx_models/language_model.onnx --algorithm hqq --precision int4 --output_path ./onnx/onnx_models_quantized/language_model.onnx --log_level 1
Loading ONNX model from ./onnx/onnx_models/language_model.onnx
pass list: ['OnnxHqqQuantization']
selected pass configs: {'onnxhqqquantization': {'type': 'OnnxHqqQuantization'}}
[2025-10-19 22:03:08,593] [INFO] [run.py:142:run_engine] Running workflow default_workflow
[2025-10-19 22:03:08,594] [INFO] [cache.py:138:__init__] Using cache directory: PSALM/.olive-cache/default_workflow
[2025-10-19 22:03:08,595] [INFO] [accelerator_creator.py:217:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2025-10-19 22:03:08,596] [INFO] [engine.py:223:run] Running Olive on accelerator: cpu-cpu
[2025-10-19 22:03:08,596] [INFO] [engine.py:864:_create_system] Creating target system ...
[2025-10-19 22:03:08,596] [INFO] [engine.py:867:_create_system] Target system created in 0.000072 seconds
[2025-10-19 22:03:08,596] [INFO] [engine.py:879:_create_system] Creating host system ...
[2025-10-19 22:03:08,596] [INFO] [engine.py:882:_create_system] Host system created in 0.000057 seconds
[2025-10-19 22:03:08,606] [INFO] [engine.py:683:_run_pass] Running pass onnxhqqquantization:onnxhqqquantization
...py311/lib/python3.11/site-packages/olive/passes/onnx/hqq_quantization.py:272: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
  b_array_torch = torch.from_numpy(b_array)
[2025-10-19 22:03:28,230] [INFO] [engine.py:757:_run_pass] Pass onnxhqqquantization:onnxhqqquantization finished in 19.624261 seconds
[2025-10-19 22:03:28,231] [INFO] [engine.py:241:run] Run history for cpu-cpu:
[2025-10-19 22:03:28,236] [INFO] [engine.py:497:dump_run_history] run history:
+------------+-------------------+---------------------+----------------+-----------+
| model_id   | parent_model_id   | from_pass           |   duration_sec | metrics   |
+============+===================+=====================+================+===========+
| b4406937   |                   |                     |                |           |
+------------+-------------------+---------------------+----------------+-----------+
| bc8feb05   | b4406937          | onnxhqqquantization |        19.6243 |           |
+------------+-------------------+---------------------+----------------+-----------+
[2025-10-19 22:03:28,236] [INFO] [cache.py:195:load_model] Loading model bc8feb05 from cache.
[2025-10-19 22:03:28,625] [INFO] [engine.py:266:run] Saved output model to PSALM/onnx/onnx_models_quantized/language_model.onnx
Model is saved at PSALM/onnx/onnx_models_quantized/language_model.onnx
> onnxslim --inspect --model-check --verbose ./onnx/onnx_models_quantized/language_model.onnx/model.onnx
# ... (successful onnxslim inspection output) ...
+------------------------+--------------------------------------+
|       Model Name       |              model.onnx              |
# ... (rest of successful inspection) ...
+------------------------+--------------------------------------+
|       Model Size       |              651.36 MB               |
+------------------------+--------------------------------------+
> ll ./onnx/onnx_models_quantized/language_model.onnx/
total 652M
-rw-rw-r-- 1 user user  408 Oct 19 22:03 model_config.json
-rw-rw-r-- 1 user user 652M Oct 19 22:03 model.onnx
❌ Failing Log on Olive v0.9.2 / main
> olive quantize -m ./onnx/onnx_models/language_model.onnx --algorithm hqq --precision int4 --output_path ./onnx/onnx_models_quantized/language_model.onnx --log_level 1
/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/transformers/utils/generic.py:441: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
  _torch_pytree._register_pytree_node(
Loading ONNX model from ./onnx/onnx_models/language_model.onnx
pass list: ['OnnxHqqQuantization']
selected pass configs: {'onnxhqqquantization': {'type': 'OnnxHqqQuantization'}}
[2025-10-19 22:04:30,242] [INFO] [run.py:143:run_engine] Running workflow default_workflow
[2025-10-19 22:04:30,243] [INFO] [cache.py:138:__init__] Using cache directory: PSALM/.olive-cache/default_workflow
[2025-10-19 22:04:30,245] [INFO] [accelerator_creator.py:222:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2025-10-19 22:04:30,245] [INFO] [engine.py:224:run] Running Olive on accelerator: cpu-cpu
[2025-10-19 22:04:30,245] [INFO] [engine.py:867:_create_system] Creating target system ...
[2025-10-19 22:04:30,245] [INFO] [engine.py:870:_create_system] Target system created in 0.000098 seconds
[2025-10-19 22:04:30,245] [INFO] [engine.py:882:_create_system] Creating host system ...
[2025-10-19 22:04:30,245] [INFO] [engine.py:885:_create_system] Host system created in 0.000082 seconds
[2025-10-19 22:04:30,253] [INFO] [engine.py:686:_run_pass] Running pass onnxhqqquantization:onnxhqqquantization
/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/olive/passes/onnx/hqq_quantization.py:195: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
  b_array_torch = torch.from_numpy(b_ndarray)
[2025-10-19 22:04:41,394] [INFO] [engine.py:760:_run_pass] Pass onnxhqqquantization:onnxhqqquantization finished in 11.141228 seconds
[2025-10-19 22:04:41,396] [INFO] [engine.py:242:run] Run history for cpu-cpu:
[2025-10-19 22:04:41,402] [INFO] [engine.py:500:dump_run_history] run history:
+------------+-------------------+---------------------+----------------+-----------+
| model_id   | parent_model_id   | from_pass           |   duration_sec | metrics   |
+============+===================+=====================+================+===========+
| b4406937   |                   |                     |                |           |
+------------+-------------------+---------------------+----------------+-----------+
| 8239b6f8   | b4406937          | onnxhqqquantization |        11.1412 |           |
+------------+-------------------+---------------------+----------------+-----------+
[2025-10-19 22:04:41,402] [INFO] [cache.py:195:load_model] Loading model 8239b6f8 from cache.
[2025-10-19 22:04:44,468] [INFO] [engine.py:263:run] Saved output model to PSALM/onnx/onnx_models_quantized/language_model.onnx
Model is saved at PSALM/onnx/onnx_models_quantized/language_model.onnx

> onnxslim --inspect --model-check --verbose ./onnx/onnx_models_quantized/language_model.onnx/model/language_model.onnx 
Warning: Onnx Runtime version 1.23 has no specified compatible ONNX version. Compatibility issues may occur.
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/py311/bin/onnxslim", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 170, in main
    slim(
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 75, in slim
    model_info_list = [get_info(m, inspect=True) for m in model]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 75, in <listcomp>
    model_info_list = [get_info(m, inspect=True) for m in model]
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 61, in get_info
    model = onnx.load(model)
            ^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnx/__init__.py", line 232, in load_model
    load_external_data_for_model(model, base_dir)
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnx/external_data_helper.py", line 75, in load_external_data_for_model
    load_external_data_for_tensor(tensor, base_dir)
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnx/external_data_helper.py", line 53, in load_external_data_for_tensor
    external_data_file_path = c_checker._resolve_external_data_location(  # type: ignore[attr-defined]
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnx.onnx_cpp2py_export.checker.ValidationError: Data of TensorProto ( tensor name: /backbone/layers.0/self_attn/rotary_emb/Constant_attr::value) should be stored in PSALM/onnx/onnx_models_quantized/language_model.onnx/model/_backbone_layers.0_self_attn_rotary_emb_Constant_attr__value, but it doesn't exist or is not accessible.

> ll ./onnx/onnx_models_quantized/language_model.onnx/model/
total 5.2G
-rw-rw-r-- 1 user user 855K Oct 19 22:04 language_model.onnx
-rw-rw-r-- 1 user user 5.2G Oct 19 22:04 language_model.onnx.data
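
The on-disk layout can be compared against the locations recorded in the model without triggering the resolution error; a diagnostic sketch (path illustrative):

    import onnx

    # Load graph structure only; skipping external-data resolution avoids the
    # ValidationError above.
    model = onnx.load(
        "./onnx/onnx_models_quantized/language_model.onnx/model/language_model.onnx",
        load_external_data=False,
    )
    for tensor in model.graph.initializer:
        if tensor.data_location == onnx.TensorProto.EXTERNAL:
            # external_data holds key/value pairs such as "location",
            # "offset", and "length".
            info = {entry.key: entry.value for entry in tensor.external_data}
            print(tensor.name, "->", info.get("location"))

Per the ValidationError above, at least one tensor records a per-tensor location (_backbone_layers.0_self_attn_rotary_emb_Constant_attr__value) even though the only data file on disk is language_model.onnx.data.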

Other information

  • OS: Linux
  • Olive version: 0.9.2 and main (broken), 0.9.1 (working)
  • ONNXRuntime package and version: 1.23
  • Transformers package version: 4.57.1
