Cast error when casting integer type to string in select statement with datafusion-vortex #6211

@andreashgk

Description

What happened?

When the latest commit of vortex is set up as a file format with datafusion, it appears to be impossible to cast a column's type directly in a select statement.
The statement SELECT cast(id AS STRING) FROM 'data.vortex' fails with the following error:

External error: Failed to read Vortex file: home/andreas/Projects/wql/data.vortex:
  No compute kernel to cast array vortex.primitive with dtype u32 to utf8?

It happens with both i64 and u32 columns, so presumably with all other integer types as well.

This seems to be a regression. The expected behaviour is that, if there is no built-in way to cast the column while scanning the file, datafusion simply casts the data after it has been scanned.
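To make the expected fallback concrete, here is a minimal std-only sketch. It is a toy model and not how vortex-datafusion or datafusion is implemented: `DType`, `Column`, `scan_cast`, `engine_cast` and `cast_with_fallback` are all hypothetical names made up for illustration. The idea is only that a cast the scan cannot perform should be retried by the engine on the scanned data instead of surfacing as an error.

```rust
// Toy model of the expected behaviour. None of these names are real
// Vortex or DataFusion APIs; they are assumptions for illustration only.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    I64,
    U32,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Column {
    I64(Vec<i64>),
    U32(Vec<u32>),
    Utf8(Vec<String>),
}

/// Hypothetical scan-time cast: it only knows identity casts and u32 -> i64,
/// mimicking a file format that has no integer -> utf8 kernel.
fn scan_cast(col: &Column, target: DType) -> Result<Column, String> {
    match (col, target) {
        (Column::U32(v), DType::I64) => {
            Ok(Column::I64(v.iter().map(|&x| i64::from(x)).collect()))
        }
        (Column::I64(_), DType::I64)
        | (Column::U32(_), DType::U32)
        | (Column::Utf8(_), DType::Utf8) => Ok(col.clone()),
        _ => Err(format!("no compute kernel to cast to {:?}", target)),
    }
}

/// Hypothetical engine-side cast, applied to data that was already scanned.
fn engine_cast(col: &Column, target: DType) -> Result<Column, String> {
    match (col, target) {
        (Column::I64(v), DType::Utf8) => {
            Ok(Column::Utf8(v.iter().map(|x| x.to_string()).collect()))
        }
        (Column::U32(v), DType::Utf8) => {
            Ok(Column::Utf8(v.iter().map(|x| x.to_string()).collect()))
        }
        _ => Err(format!("engine cannot cast to {:?}", target)),
    }
}

/// Try the scan-time cast first; if it is unsupported, fall back to the engine.
fn cast_with_fallback(col: &Column, target: DType) -> Result<Column, String> {
    scan_cast(col, target).or_else(|_| engine_cast(col, target))
}

fn main() {
    let id = Column::I64(vec![1]);
    // The scan alone fails, which is the situation reported in this issue.
    assert!(scan_cast(&id, DType::Utf8).is_err());
    // With the fallback the cast succeeds.
    let out = cast_with_fallback(&id, DType::Utf8).unwrap();
    println!("{:?}", out); // prints Utf8(["1"])
}
```

In this model the error from `scan_cast` is swallowed whenever the engine can do the cast itself, which matches the behaviour described for vortex-datafusion v0.58.0 with datafusion 51.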

Steps to reproduce

  1. Register vortex as a file format in datafusion:
    use std::sync::Arc;
    use datafusion::execution::session_state::SessionStateBuilder;
    use datafusion::prelude::{SessionConfig, SessionContext};

    // Build a session that knows the vortex file format and allows
    // querying files directly by path (enable_url_table).
    let ctx = {
        let s = SessionStateBuilder::new()
            .with_file_formats(vec![
                Arc::new(vortex_datafusion::VortexFormatFactory::new()),
            ])
            .with_config(SessionConfig::from_env()?)
            .with_runtime_env(Default::default())
            .with_default_features()
            .build();
        SessionContext::new_with_state(s).enable_url_table()
    };
  2. Find any file with an integer column, or create a minimal one with:
    ctx
        .sql(r#"copy (select 1 as id) to 'example.vortex'"#)
        .await
        .unwrap()
        .collect()
        .await
        .unwrap();
  3. Try to cast this column to string in a select statement:
    ctx
        .sql(r#"select cast(id as string) from 'example.vortex'"#)
        .await
        .unwrap()
        .collect()
        .await
        .unwrap();

At this point the following error is shown:

thread 'main' (52987) panicked at src/main.rs:53:10:
called `Result::unwrap()` on an `Err` value: External(Failed to read Vortex file: home/andreas/Temp/vortex-bug/example.vortex:
  No compute kernel to cast array vortex.primitive with dtype i64 to utf8?
Backtrace:
disabled backtrace)

Full trace can be seen here:
trace.txt

Environment

vortex-datafusion: commit #a4a2f0d7
datafusion: v52.1.0

os: Linux 6.18.4-zen1 (NixOS unstable)

This seems to be a regression specifically with vortex-datafusion from the main branch and datafusion v52. It does not occur with vortex-datafusion v0.58.0 and datafusion 51.

Additional context

No response
