Skip to content

Conversation

@razvanalex
Copy link
Contributor

Adds check against case insensitive keywords. Most languages are not affected by this change.

In PHP, some names such as Bool cannot be used because it is a reserved keyword for the bool data type. Adding the field keywords_casing in the configs enables checking in lowercase of all characters against every keyword. This should be safe as flatbuffers does not allow non-ASCII characters in its grammar.

This feature will be used in future PRs to improve PHP support in flatbuffers.

@google-cla
Copy link

google-cla bot commented Oct 8, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@razvanalex
Copy link
Contributor Author

Sorry for the CLA notice. I mistakenly used the wrong git account. Rebased and now should be fixed.

@github-actions
Copy link
Contributor

github-actions bot commented Apr 8, 2025

This pull request is stale because it has been open 6 months with no activity. Please comment or label not-stale, or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Apr 8, 2025
@razvanalex
Copy link
Contributor Author

It is still relevant.

@github-actions github-actions bot removed the stale label Apr 9, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2025

This pull request is stale because it has been open 6 months with no activity. Please comment or label not-stale, or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Oct 9, 2025
@razvanalex
Copy link
Contributor Author

Still relevant

@github-actions github-actions bot removed the stale label Oct 10, 2025
@jtdavis777 jtdavis777 self-requested a review November 24, 2025 12:47
@jtdavis777
Copy link
Collaborator

@razvanalex - would you please rebase this and take care of the merge conflicts? I've added it to my list to review and try to get merged!

@razvanalex razvanalex force-pushed the keywords-case-insensitive branch 2 times, most recently from bfef3eb to 1d3835d Compare November 27, 2025 20:11
@razvanalex
Copy link
Contributor Author

@jtdavis777, thanks! It should be ready for review now.

I also rebased my other branches that add the remaining PHP functionality (finishSizePrefixed*, object API, and optional scalars), which depend on this PR. I will open PRs for those as well soon.

Most languages are not affected by this change. In PHP, some names such
as Bool cannot be used because it is a reserved keyword for to the bool
data type. The field `keywords_casing` in the configs enables checking
all characters in lowercase against every keyword. This should be safe
as flatbuffers does not allow non-ASCII characters in its grammar.
@razvanalex razvanalex force-pushed the keywords-case-insensitive branch from 1d3835d to 034e10c Compare December 10, 2025 10:49
jtdavis777
jtdavis777 previously approved these changes Dec 14, 2025
@jtdavis777
Copy link
Collaborator

@razvanalex before I merge this, a quick check -- for PHP (and other languages which might use this feature), is this a possible overcorrection? i.e. are there only a few keywords in PHP that have this issue, but we are enabling the feature for all of them?

@razvanalex
Copy link
Contributor Author

At least for PHP, the specification states that keywords are not case-sensitive: https://github.com/php/php-langspec/blob/master/spec/09-lexical-structure.md#keywords

I don't know if there is a language in which some keywords should be used with sensitive casing (e.g., if, for, while) and others with insensitive casing (e.g., bool/Bool, int/Int, float/Float). If someone knows such language, please let me know :) Worst, it will be a bit annoying to have "_" at the end of a keyword-like class name. However, this change will should not affect the rest of (actively maintained) languages.

Another solution would be to keep both lists separate (one for case sensitive keywords, and one for case insensitive keywords), but I feel it would increase complexity a bit, to manage both lists.

Copy link
Collaborator

@jtdavis777 jtdavis777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM - will approve once comment is resolved!

@jtdavis777
Copy link
Collaborator

@razvanalex gotcha! this is fine -- I just wanted to make sure we weren't being unnecessarily restrictive. This looks great.

jtdavis777
jtdavis777 previously approved these changes Dec 20, 2025
@jtdavis777 jtdavis777 added the ready-for-merge This PR has been approved by a maintainer and is ready for merge by a code owner label Dec 20, 2025
if (keywords_.find(name) == keywords_.end()) {
std::string cased_name(name);

if (config_.keywords_casing == Config::KeywordsCasing::CaseInsensitive) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put into a method?

Also, prefer to use an existing function: https://github.com/google/flatbuffers/blob/master/src/util.cpp#L130

Copy link
Contributor Author

@razvanalex razvanalex Dec 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added NormalizeKeywordCase to handle this and used ConvertCase(name, Case::kAllLower) instead.

I have now noticed an edge case in Namer. If config_.keywords_casing is set to CaseInsensitive but the provided keywords set contains uppercase strings (e.g., NULL), the EscapeKeyword check fails.

I could update the Namer constructor (https://github.com/razvanalex/flatbuffers/blob/b4cdffa92796ab49c387e942ddbb5f056d328bf6/include/codegen/namer.h#L117) to convert to lowercase when KeywordsCasing::CaseInsensitive is set, but I feel this is counterintuitive to those who write new generators, since they may use upper/mixed case on purpose (e.g., for historic reasons). It is not an issue now, as I want to use the feature only for PHP. Should I address this case in the constructor? I am asking because the other solution I can think of is to use a private member to store the set processed, or, simpler, leave a comment in the code, and if someone needs it, then implement it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ codegen Involving generating code from schema dart golang java javascript kotlin lua nim ready-for-merge This PR has been approved by a maintainer and is ready for merge by a code owner rust swift typescript

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants