-
Notifications
You must be signed in to change notification settings - Fork 2.2k
[Enhancement] Allow disable type auto-detection on CSV Files #66519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🧪 CI InsightsHere's what we observed from your CI run for 7d6ebcb. 🟢 All jobs passed!But CI Insights is watching 👀 |
|
@cursor review |
dcb05a8 to
89c9fc9
Compare
[FE Incremental Coverage Report]❌ fail : 3 / 4 (75.00%) file detail
|
8585f60 to
6bbc22a
Compare
|
@cursor review |
| ExceptionChecker.expectThrowsWithMsg(SemanticException.class, | ||
| "Illegal value of auto_detect_types: notaboolean, only true/false allowed", | ||
| () -> new TableFunctionTable(new ArrayList<>(), properties, new SessionVariable()) | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Test expects wrong exception type for validation error
The test expects SemanticException but the implementation throws DdlException when auto_detect_types has an invalid value. The similar validation for csv.trim_space in the same class also throws DdlException, confirming the implementation follows the existing pattern. The test will fail at runtime because the wrong exception type is caught.
Additional Locations (1)
| Assertions.assertDoesNotThrow(() -> { | ||
| TableFunctionTable table = new TableFunctionTable(new ArrayList<>(), properties, new SessionVariable()); | ||
| Assertions.assertTrue((Boolean) field.get(table)); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Test uses wrong constructor that doesn't parse the property
The testAutoDetectTypes test uses the unload constructor TableFunctionTable(List, Map, SessionVariable) which calls parsePropertiesForUnload, but the auto_detect_types property is only parsed in parsePropertiesForLoad. This means the property is never actually read, and autoDetectTypes stays at its default value of true. Test Case 2 (expecting false) will fail because the field remains true. Test Case 4 (expecting SemanticException for invalid value) will fail because no validation occurs and no exception is thrown.
Additional Locations (2)
Signed-off-by: Pat Buxton <[email protected]>
6bbc22a to
7d6ebcb
Compare
|
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 7 / 7 (100.00%) file detail
|
|
@cursor review |



Why I'm doing:
When importing CSV, without sampling the whole file, it is possible to get type errors when inserting from FILES. Workarounds include setting the conflicting data to null, which is not ideal.
What I'm doing:
This PR allows the user to disable the type auto-detection and return all columns as string which can then be manipulated with SQL functions to obtain the desired data type.
Fixes #66473
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
Note
Introduces
auto_detect_typesto FILES() CSV schema detection, allowing all columns to default to STRING when disabled, with FE/BE support, tests, and docs.auto_detect_typesproperty (defaulttrue) to control type inference; whenfalse, all sampled columns areSTRING.CSVScanner:get_type_descnow respects_scan_range.params.schema_sample_types; schema sampling returnsVARCHARwhen disabled.TableFunctionTable: parseauto_detect_types, validate boolean, and pass viaTBrokerScanRangeParams.schema_sample_types.TBrokerScanRangeParams.schema_sample_types(defaulttrue).type_sniff.csv.auto_detect_typesparsing and validation.files.mdto documentauto_detect_typesand behavior when disabled.Written by Cursor Bugbot for commit 7d6ebcb. This will update automatically on new commits. Configure here.