-
Notifications
You must be signed in to change notification settings - Fork 6
Retain escape sequences in lexer, fixes #5 #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
\\ is preserved in Lexer, leaving it to later stages to decide what masking is to be used.
julianladisch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I support this.
When releasing cql-java the change should be documented in the Changes file with an example.
MikeTaylor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel very dense looking at this code, but it doesn't really feel right to me.
| switch (rnd.nextInt(10)) { | ||
| case 0: return "cat"; | ||
| case 1: return "\"cat\""; | ||
| case 1: return "\\\"cat\\\""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this briefly on Slack, but I still don't understand why the change. Yes this needs to round-trip correctly — but all strings in any CQL query (hence all strings that we generate in the query generator) need to round-trip correctly. So why do we care what this one is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR changes the term so that escape sequences are retained (preserved). If a term includes " it will always be preceded by backslash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that only some terms are round-tripped correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. If a term contained a bare ", that would not be round-tripped correctly. But that would never be the result of parsing.
| str.indexOf('(') != -1 || | ||
| str.indexOf(')') != -1) { | ||
| str = '"' + str.replace("\"", "\\\"") + '"'; | ||
| str = '"' + str + '"'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing something, but this looks wrong to me. It looks like it will render a"b as "a"b", when surely it should be "a\"b" as in the previous version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I repeat: " will always be preceded by backslash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're asserting a precondition that where str contains a ", it is always immediately prefixed with a \ — right? If so, then this is OK, I guess, but feels fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bare quote case is now considered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
| if (qi == ql - 1) { | ||
| break; //unterminated | ||
| } | ||
| buf.append(qs.charAt(qi)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the intent of this section at all. It looks like it can't lex "a\"b" at all, but will return the string a. Am I wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bin/CQLParser
"a\"b"
<searchClause>
<index>cql.serverChoice</index>
<relation>
<value>=</value>
</relation>
<term>a\"b</term>
</searchClause>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, good to see. My code-reading foo is depressed.
MikeTaylor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, looks good. Thanks for your patence!
| str.indexOf('(') != -1 || | ||
| str.indexOf(')') != -1) { | ||
| str = '"' + str.replace("\"", "\\\"") + '"'; | ||
| str = '"' + str + '"'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're asserting a precondition that where str contains a ", it is always immediately prefixed with a \ — right? If so, then this is OK, I guess, but feels fragile.
| str.indexOf('(') != -1 || | ||
| str.indexOf(')') != -1) { | ||
| str = '"' + str.replace("\"", "\\\"") + '"'; | ||
| str = '"' + str + '"'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
Backslash (
\) is preserved in Lexer, leaving it to later stages to decide what masking is to be used.As is done with yaz and cql-go.
https://github.com/indexdata/cql-go/blob/84f3837d60305e690103051c0b52fc3ff4c32500/cql/lexer.go#L94
https://github.com/indexdata/cql-go/blob/84f3837d60305e690103051c0b52fc3ff4c32500/cql/cql.go#L281
Example of CQL query:
which was converted to:
It is now converted to:
The same term without quotes are already preserved, eg:
becomes
So existing behavior was inconsistent WRT how " was treated.