Error Unrecognized attribute: qk_output for operator GroupQueryAttention #2276

@jarodxiangliu

Describe the bug
I tried to convert and quantize Qwen3-1.7B for the Qualcomm NPU and got this error:
Error Unrecognized attribute: qk_output for operator GroupQueryAttention

To Reproduce

  1. Activate the Python-NvidiaGPU-win32-x64-3.12.9 environment created by the VS Code AI Toolkit.
  2. Run the command below:
    olive optimize -m Qwen/Qwen3-1.7B --provider QNNExecutionProvider --device npu --precision int4 --num_split 4 --enable_aot --qnn_env_path ..\Python-QNN-win32-x64-3.12.9\Scripts --surgeries RemoveRopeMultiCache,AttentionMaskToSequenceLengths,SimplifiedLayerNormToL2Norm --act_precision uint16 --use_qdq_format --log_level 1

Expected behavior
The model should be converted successfully.

Olive config
My Olive command is as follows:
olive optimize -m Qwen/Qwen3-1.7B --provider QNNExecutionProvider --device npu --precision int4 --num_split 4 --enable_aot --qnn_env_path ..\Python-QNN-win32-x64-3.12.9\Scripts --surgeries RemoveRopeMultiCache,AttentionMaskToSequenceLengths,SimplifiedLayerNormToL2Norm --act_precision uint16 --use_qdq_format --log_level 1
And here is my environment:
Using Python 3.12.9 environment at: .
Package Version


accelerate 1.10.1
aiohappyeyeballs 2.6.1
aiohttp 3.11.16
aiosignal 1.4.0
alembic 1.16.5
annotated-types 0.7.0
attrs 25.3.0
auto-gptq 0.8.0.dev0+cu128
certifi 2025.8.3
charset-normalizer 3.4.3
colorama 0.4.6
coloredlogs 15.0.1
colorlog 6.9.0
datasets 3.5.0
dill 0.3.8
filelock 3.18.0
flatbuffers 25.2.10
frozenlist 1.7.0
fsspec 2024.12.0
gekko 1.3.0
greenlet 3.2.4
hf-xet 1.1.9
huggingface-hub 0.34.4
humanfriendly 10.0
idna 3.10
jinja2 3.1.6
lightning-utilities 0.15.2
mako 1.3.10
markupsafe 3.0.2
ml-dtypes 0.5.3
mpmath 1.3.0
multidict 6.6.4
multiprocess 0.70.16
networkx 3.4.2
numpy 2.2.4
olive-ai 0.9.3
onnx 1.17.0
onnx-ir 0.1.5
onnxruntime-genai-cuda 0.9.0
onnxruntime-gpu 1.23.2
onnxscript 0.3.2
optimum 1.26.0
optuna 4.3.0
packaging 24.2
pandas 2.2.3
peft 0.17.1
pillow 11.2.1
propcache 0.3.2
protobuf 3.20.3
psutil 7.1.0
pyarrow 19.0.1
pydantic 2.11.3
pydantic-core 2.33.1
pyreadline3 3.5.4
python-dateutil 2.9.0.post0
pytz 2025.2
pyyaml 6.0.2
regex 2025.9.1
requests 2.32.3
rouge 1.0.1
safetensors 0.6.2
sentencepiece 0.2.1
setuptools 80.9.0
six 1.17.0
sqlalchemy 2.0.43
sympy 1.13.3
tabulate 0.9.0
threadpoolctl 3.6.0
tokenizers 0.21.4
torch 2.7.0+cu128
torchmetrics 1.7.1
torchvision 0.22.0+cu128
tqdm 4.67.1
transformers 4.51.3
typing-extensions 4.15.0
typing-inspection 0.4.1
tzdata 2025.2
urllib3 2.5.0
xxhash 3.5.0
yarl 1.20.1
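
To make the list above easier to scan, here is a minimal sketch that prints only the packages that seem most likely to matter for this error. The selection in `PACKAGES` is an assumption on my part, not a confirmed list of what is involved (`onnxruntime-qnn` is included only to show whether it is present in the active environment), and the helper name `relevant_versions` is hypothetical.

```python
from importlib import metadata

# Assumed subset of packages that may be relevant to the
# GroupQueryAttention error; adjust as needed.
PACKAGES = [
    "olive-ai",
    "onnx",
    "onnx-ir",
    "onnxscript",
    "onnxruntime-gpu",
    "onnxruntime-qnn",
    "onnxruntime-genai-cuda",
]


def relevant_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in relevant_versions(PACKAGES).items():
        print(f"{name:28} {version or 'not installed'}")
```

Running this once in each environment used by the command (the Python-NvidiaGPU one and the one passed via `--qnn_env_path`) would show whether the two environments carry different ONNX Runtime builds.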
