[Bug] ONNX Quantization Broken Since v0.9.2 Due to Incorrect Weight Data Saving #2223

Description

@happyme531

Describe the bug
ONNX model quantization using algorithms like HQQ and RTN appears to be broken since version 0.9.2, and the issue persists on the main branch.

In version 0.9.1, the quantization process correctly produces a single, valid model.onnx file. However, starting from version 0.9.2, the process generates a model whose weights are saved as abnormally large external data (model.onnx + model.onnx.data).

Consequently, tools like onnxslim or even onnx.load() fail to load the model, throwing an onnx.onnx_cpp2py_export.checker.ValidationError because they cannot locate the tensor data. This regression makes the quantization feature unusable for large models that are saved with external data.

The issue has been observed with both HQQ and RTN quantization algorithms.

To Reproduce
Steps to reproduce the behavior:

  1. Install olive-ai version 0.9.2 or build from the main branch.
  2. Get a large ONNX model.
  3. Run the olive quantize command:
    olive quantize -m ./path/to/your_model.onnx --algorithm hqq --precision int4 --output_path ./quantized_model.onnx
  4. Attempt to load or inspect the generated model, which is typically located at ./quantized_model.onnx/model/<original_model_name>.onnx.
    # For example, using onnxslim
    onnxslim --inspect ./quantized_model.onnx/model/<original_model_name>.onnx
  5. Observe the onnx.onnx_cpp2py_export.checker.ValidationError as shown in the logs below.
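A minimal Python equivalent of steps 4-5, for convenience (this is a sketch; the path is illustrative and should point at the actual generated model):

    import onnx

    # Loading triggers external-data resolution; on v0.9.2+ this raises
    # onnx.onnx_cpp2py_export.checker.ValidationError because the per-tensor
    # data files referenced by the model do not exist on disk.
    model = onnx.load("./quantized_model.onnx/model/<original_model_name>.onnx")
    print(f"Loaded {len(model.graph.initializer)} initializers")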

Expected behavior
The quantization process should produce a valid, self-contained ONNX model. If external data is used, the paths within the .onnx file should be relative to the model file itself, allowing it to be correctly loaded by standard ONNX tools.

The behavior should be consistent with version 0.9.1, where a single, valid ONNX file was successfully generated.
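For reference, this is how a consolidated external-data model is normally produced with the standard onnx API; a minimal sketch (paths illustrative) of the layout the quantized output should be loadable as:

    import onnx

    model = onnx.load("model.onnx")  # a valid source model
    # Consolidate all large tensors into one data file next to the model;
    # each TensorProto's "location" entry then holds the relative name
    # "model.onnx.data", which onnx.load() resolves against the model's directory.
    onnx.save_model(
        model,
        "out/model.onnx",
        save_as_external_data=True,
        all_tensors_to_one_file=True,
        location="model.onnx.data",
        size_threshold=1024,
    )
    reloaded = onnx.load("out/model.onnx")  # data found relative to out/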

Olive config
The configuration is provided via command-line arguments. No JSON config file was used.

olive quantize -m ./onnx/onnx_models/language_model.onnx --algorithm hqq --precision int4 --output_path ./onnx/onnx_models_quantized/language_model.onnx --log_level 1

Olive logs

✅ Working Log on Olive v0.9.1
> olive quantize -m ./onnx/onnx_models/language_model.onnx --algorithm hqq --precision int4 --output_path ./onnx/onnx_models_quantized/language_model.onnx --log_level 1
Loading ONNX model from ./onnx/onnx_models/language_model.onnx
pass list: ['OnnxHqqQuantization']
selected pass configs: {'onnxhqqquantization': {'type': 'OnnxHqqQuantization'}}
[2025-10-19 22:03:08,593] [INFO] [run.py:142:run_engine] Running workflow default_workflow
[2025-10-19 22:03:08,594] [INFO] [cache.py:138:__init__] Using cache directory: PSALM/.olive-cache/default_workflow
[2025-10-19 22:03:08,595] [INFO] [accelerator_creator.py:217:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2025-10-19 22:03:08,596] [INFO] [engine.py:223:run] Running Olive on accelerator: cpu-cpu
[2025-10-19 22:03:08,596] [INFO] [engine.py:864:_create_system] Creating target system ...
[2025-10-19 22:03:08,596] [INFO] [engine.py:867:_create_system] Target system created in 0.000072 seconds
[2025-10-19 22:03:08,596] [INFO] [engine.py:879:_create_system] Creating host system ...
[2025-10-19 22:03:08,596] [INFO] [engine.py:882:_create_system] Host system created in 0.000057 seconds
[2025-10-19 22:03:08,606] [INFO] [engine.py:683:_run_pass] Running pass onnxhqqquantization:onnxhqqquantization
...py311/lib/python3.11/site-packages/olive/passes/onnx/hqq_quantization.py:272: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
  b_array_torch = torch.from_numpy(b_array)
[2025-10-19 22:03:28,230] [INFO] [engine.py:757:_run_pass] Pass onnxhqqquantization:onnxhqqquantization finished in 19.624261 seconds
[2025-10-19 22:03:28,231] [INFO] [engine.py:241:run] Run history for cpu-cpu:
[2025-10-19 22:03:28,236] [INFO] [engine.py:497:dump_run_history] run history:
+------------+-------------------+---------------------+----------------+-----------+
| model_id   | parent_model_id   | from_pass           |   duration_sec | metrics   |
+============+===================+=====================+================+===========+
| b4406937   |                   |                     |                |           |
+------------+-------------------+---------------------+----------------+-----------+
| bc8feb05   | b4406937          | onnxhqqquantization |        19.6243 |           |
+------------+-------------------+---------------------+----------------+-----------+
[2025-10-19 22:03:28,236] [INFO] [cache.py:195:load_model] Loading model bc8feb05 from cache.
[2025-10-19 22:03:28,625] [INFO] [engine.py:266:run] Saved output model to PSALM/onnx/onnx_models_quantized/language_model.onnx
Model is saved at PSALM/onnx/onnx_models_quantized/language_model.onnx
> onnxslim --inspect --model-check --verbose ./onnx/onnx_models_quantized/language_model.onnx/model.onnx
# ... (successful onnxslim inspection output) ...
+------------------------+--------------------------------------+
|       Model Name       |              model.onnx              |
# ... (rest of successful inspection) ...
+------------------------+--------------------------------------+
|       Model Size       |              651.36 MB               |
+------------------------+--------------------------------------+
> ll ./onnx/onnx_models_quantized/language_model.onnx/
total 652M
-rw-rw-r-- 1 user user  408 Oct 19 22:03 model_config.json
-rw-rw-r-- 1 user user 652M Oct 19 22:03 model.onnx
❌ Failing Log on Olive v0.9.2 / main
> olive quantize -m ./onnx/onnx_models/language_model.onnx --algorithm hqq --precision int4 --output_path ./onnx/onnx_models_quantized/language_model.onnx --log_level 1
/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/transformers/utils/generic.py:441: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
  _torch_pytree._register_pytree_node(
Loading ONNX model from ./onnx/onnx_models/language_model.onnx
pass list: ['OnnxHqqQuantization']
selected pass configs: {'onnxhqqquantization': {'type': 'OnnxHqqQuantization'}}
[2025-10-19 22:04:30,242] [INFO] [run.py:143:run_engine] Running workflow default_workflow
[2025-10-19 22:04:30,243] [INFO] [cache.py:138:__init__] Using cache directory: PSALM/.olive-cache/default_workflow
[2025-10-19 22:04:30,245] [INFO] [accelerator_creator.py:222:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2025-10-19 22:04:30,245] [INFO] [engine.py:224:run] Running Olive on accelerator: cpu-cpu
[2025-10-19 22:04:30,245] [INFO] [engine.py:867:_create_system] Creating target system ...
[2025-10-19 22:04:30,245] [INFO] [engine.py:870:_create_system] Target system created in 0.000098 seconds
[2025-10-19 22:04:30,245] [INFO] [engine.py:882:_create_system] Creating host system ...
[2025-10-19 22:04:30,245] [INFO] [engine.py:885:_create_system] Host system created in 0.000082 seconds
[2025-10-19 22:04:30,253] [INFO] [engine.py:686:_run_pass] Running pass onnxhqqquantization:onnxhqqquantization
/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/olive/passes/onnx/hqq_quantization.py:195: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
  b_array_torch = torch.from_numpy(b_ndarray)
[2025-10-19 22:04:41,394] [INFO] [engine.py:760:_run_pass] Pass onnxhqqquantization:onnxhqqquantization finished in 11.141228 seconds
[2025-10-19 22:04:41,396] [INFO] [engine.py:242:run] Run history for cpu-cpu:
[2025-10-19 22:04:41,402] [INFO] [engine.py:500:dump_run_history] run history:
+------------+-------------------+---------------------+----------------+-----------+
| model_id   | parent_model_id   | from_pass           |   duration_sec | metrics   |
+============+===================+=====================+================+===========+
| b4406937   |                   |                     |                |           |
+------------+-------------------+---------------------+----------------+-----------+
| 8239b6f8   | b4406937          | onnxhqqquantization |        11.1412 |           |
+------------+-------------------+---------------------+----------------+-----------+
[2025-10-19 22:04:41,402] [INFO] [cache.py:195:load_model] Loading model 8239b6f8 from cache.
[2025-10-19 22:04:44,468] [INFO] [engine.py:263:run] Saved output model to PSALM/onnx/onnx_models_quantized/language_model.onnx
Model is saved at PSALM/onnx/onnx_models_quantized/language_model.onnx

> onnxslim --inspect --model-check --verbose ./onnx/onnx_models_quantized/language_model.onnx/model/language_model.onnx 
Warning: Onnx Runtime version 1.23 has no specified compatible ONNX version. Compatibility issues may occur.
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/py311/bin/onnxslim", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 170, in main
    slim(
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 75, in slim
    model_info_list = [get_info(m, inspect=True) for m in model]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 75, in <listcomp>
    model_info_list = [get_info(m, inspect=True) for m in model]
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnxslim/cli/_main.py", line 61, in get_info
    model = onnx.load(model)
            ^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnx/__init__.py", line 232, in load_model
    load_external_data_for_model(model, base_dir)
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnx/external_data_helper.py", line 75, in load_external_data_for_model
    load_external_data_for_tensor(tensor, base_dir)
  File "/home/user/anaconda3/envs/py311/lib/python3.11/site-packages/onnx/external_data_helper.py", line 53, in load_external_data_for_tensor
    external_data_file_path = c_checker._resolve_external_data_location(  # type: ignore[attr-defined]
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnx.onnx_cpp2py_export.checker.ValidationError: Data of TensorProto ( tensor name: /backbone/layers.0/self_attn/rotary_emb/Constant_attr::value) should be stored in PSALM/onnx/onnx_models_quantized/language_model.onnx/model/_backbone_layers.0_self_attn_rotary_emb_Constant_attr__value, but it doesn't exist or is not accessible.

> ll ./onnx/onnx_models_quantized/language_model.onnx/model/
total 5.2G
-rw-rw-r-- 1 user user 855K Oct 19 22:04 language_model.onnx
-rw-rw-r-- 1 user user 5.2G Oct 19 22:04 language_model.onnx.data
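
The on-disk layout can be compared against the locations recorded in the model without triggering the resolution error; a diagnostic sketch (path illustrative):

    import onnx

    # Load graph structure only; skipping external-data resolution avoids the
    # ValidationError above.
    model = onnx.load(
        "./onnx/onnx_models_quantized/language_model.onnx/model/language_model.onnx",
        load_external_data=False,
    )
    for tensor in model.graph.initializer:
        if tensor.data_location == onnx.TensorProto.EXTERNAL:
            # external_data holds key/value pairs such as "location",
            # "offset", and "length".
            info = {entry.key: entry.value for entry in tensor.external_data}
            print(tensor.name, "->", info.get("location"))

Per the ValidationError above, at least one tensor records a per-tensor location (_backbone_layers.0_self_attn_rotary_emb_Constant_attr__value) even though the only data file on disk is language_model.onnx.data.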

Other information

  • OS: Linux
  • Olive version: 0.9.2 and main (broken), 0.9.1 (working)
  • ONNXRuntime package and version: 1.23
  • Transformers package version: 4.57.1
