Contributor

@adamdickmeiss adamdickmeiss commented Oct 10, 2025

Backslash (\) is preserved in Lexer, leaving it to later stages to decide what masking is to be used.

This matches how yaz and cql-go handle it:

https://github.com/indexdata/cql-go/blob/84f3837d60305e690103051c0b52fc3ff4c32500/cql/lexer.go#L94
https://github.com/indexdata/cql-go/blob/84f3837d60305e690103051c0b52fc3ff4c32500/cql/cql.go#L281

Example of CQL query:

 "a\"b"

which was converted to:

<term>a"b</term>

It is now converted to:

<term>a\"b</term>

The same term without quotes is already preserved, e.g.:

 a\"b

becomes

<term>a\"b</term>

So the existing behavior was inconsistent in how " was treated.
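
To illustrate why it matters that the lexer keeps the backslash, here is a minimal sketch of one possible later stage that turns a term into a Java regular expression. It assumes the usual CQL masking rules (an unescaped * matches any run of characters, an unescaped ? matches one character, and a backslash makes the next character literal). The class TermMasking and its method toRegex are hypothetical names, not part of cql-java, and anchoring with ^ is ignored.

```java
import java.util.regex.Pattern;

// Hypothetical later-stage helper; an illustration only, not cql-java code.
final class TermMasking {
    private TermMasking() {}

    // Translate a CQL term (escape sequences retained) into a Java regex.
    static String toRegex(String term) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\\' && i + 1 < term.length()) {
                // Escaped character: always literal, whatever it is.
                i++;
                out.append(Pattern.quote(String.valueOf(term.charAt(i))));
            } else if (c == '*') {
                out.append(".*"); // unescaped * masks any run of characters
            } else if (c == '?') {
                out.append('.');  // unescaped ? masks exactly one character
            } else {
                out.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return out.toString();
    }
}
```

With this sketch, the term a\*b matches only the literal string a*b, while a*b also matches axxb. If the lexer dropped the backslash, the two terms would be indistinguishable at this stage.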

Contributor

@julianladisch julianladisch left a comment

I support this.

When releasing cql-java the change should be documented in the Changes file with an example.

Contributor

@MikeTaylor MikeTaylor left a comment

I feel very dense looking at this code, but it doesn't really feel right to me.

  switch (rnd.nextInt(10)) {
  case 0: return "cat";
- case 1: return "\"cat\"";
+ case 1: return "\\\"cat\\\"";
Contributor

We discussed this briefly on Slack, but I still don't understand why the change. Yes this needs to round-trip correctly — but all strings in any CQL query (hence all strings that we generate in the query generator) need to round-trip correctly. So why do we care what this one is?

Contributor Author

This PR changes the term so that escape sequences are retained (preserved). If a term includes ", it will always be preceded by a backslash.

Contributor

Are you saying that only some terms are round-tripped correctly?

Contributor Author

@adamdickmeiss adamdickmeiss Oct 16, 2025

Yep. If a term contained a bare ", it would not be round-tripped correctly. But a bare " would never be the result of parsing.

  str.indexOf('(') != -1 ||
  str.indexOf(')') != -1) {
- str = '"' + str.replace("\"", "\\\"") + '"';
+ str = '"' + str + '"';
Contributor

I'm probably missing something, but this looks wrong to me. It looks like it will render a"b as "a"b", when surely it should be "a\"b" as in the previous version?

Contributor Author

I repeat: " will always be preceded by backslash.

Contributor

You're asserting a precondition that where str contains a ", it is always immediately prefixed with a \ — right? If so, then this is OK, I guess, but feels fragile.

Contributor Author

The bare quote case is now considered.
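
For readers following the thread, one way to handle the bare quote case is to escape only those quotes that are not already preceded by an unconsumed backslash. The sketch below is an assumption about how this could be done, not the code merged in this PR; the class TermQuoting and its method quote are hypothetical names.

```java
// Hypothetical helper; an illustration only, not the code merged in this PR.
final class TermQuoting {
    private TermQuoting() {}

    // Wrap a term in double quotes, adding a backslash only before bare quotes.
    // A trailing lone backslash in the term is not handled here.
    static String quote(String term) {
        StringBuilder out = new StringBuilder("\"");
        boolean escaped = false; // true when the previous char was an unconsumed backslash
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '"' && !escaped) {
                out.append('\\'); // bare quote: supply the missing backslash
            }
            out.append(c);
            // A backslash escapes the next char unless it was itself escaped.
            escaped = (c == '\\') && !escaped;
        }
        return out.append('"').toString();
    }
}
```

Under this sketch both a\"b and a"b are rendered as "a\"b", so the output no longer depends on the precondition that every " is already escaped.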

Contributor

Thank you.

if (qi == ql - 1) {
break; //unterminated
}
buf.append(qs.charAt(qi));
Contributor

I don't understand the intent of this section at all. It looks like it can't lex "a\"b" at all, but will return the string a. Am I wrong?

Contributor Author

bin/CQLParser 
"a\"b"
<searchClause>
  <index>cql.serverChoice</index>
  <relation>
    <value>=</value>
  </relation>
  <term>a\"b</term>
</searchClause>
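
The lexing behavior shown in this output can be sketched as a small loop that copies the backslash into the buffer together with the character it escapes. This is a simplified reconstruction built around the fragment quoted above, not the actual cql-java lexer; the variable names qs, qi, ql and buf come from that fragment, while the class QuotedStringLexer and the method lexQuoted are hypothetical.

```java
// Simplified reconstruction; an illustration only, not the cql-java lexer.
final class QuotedStringLexer {
    private QuotedStringLexer() {}

    // Lex a quoted string whose opening quote is at index start.
    // Returns the content with escape sequences retained, or null if unterminated.
    static String lexQuoted(String qs, int start) {
        int ql = qs.length();
        StringBuilder buf = new StringBuilder();
        int qi = start + 1; // skip the opening quote
        while (qi < ql) {
            char c = qs.charAt(qi);
            if (c == '"') {
                return buf.toString(); // closing quote found
            }
            if (c == '\\') {
                if (qi == ql - 1) {
                    break; // unterminated: backslash is the last character
                }
                buf.append(c); // keep the backslash itself
                qi++;
                c = qs.charAt(qi); // the escaped character, copied below
            }
            buf.append(c);
            qi++;
        }
        return null; // no closing quote
    }
}
```

Given the input "a\"b", this sketch returns a\"b, which agrees with the term element in the output above.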

Contributor

OK, good to see. My code-reading fu is depressed.

Contributor

@MikeTaylor MikeTaylor left a comment

All right, looks good. Thanks for your patience!

@adamdickmeiss adamdickmeiss merged commit f9fe79d into master Oct 24, 2025
2 checks passed
@adamdickmeiss adamdickmeiss deleted the 5-retain-escape-sequences branch December 30, 2025 20:39