@tnguy19 tnguy19 commented Jan 15, 2026

  1. Array indexing bug: the code used the global grid position to index 8-element bit arrays, causing an out-of-bounds error for bytes_per_word > 1. Added local_bit = bit - first_byte to identify the correct local array position.

  2. Buffer naming bug: buffer instance names were duplicated across byte columns. Fixed by changing the name index to bit + col * 8.

  3. Buffer placement bug: all buffers were placed at the same grid position. Fixed with col * 9 + bit.

  4. Grid sizing bug: insufficient decoder columns. Changed from + 1 to + bytes_per_word.

  5. Decoder placement bug (line 524): decoders overlapped in placement. Fixed by adding a + col offset.

  6. word_count=2 special case bug (lines 500-503): only 1 select net was created, but read_ports nets were needed. Fixed with a loop that creates one net per read port.
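
A minimal sketch of the buffer formulas in items 2 and 3, to show why they avoid collisions across byte columns. The function names and constants here are hypothetical and are not taken from the RAM generator source; the only things assumed from the description are the two expressions bit + col * 8 and col * 9 + bit.

```cpp
#include <cassert>
#include <set>

// Assumption from the description above: a byte column spans 9 grid slots
// (the col * 9 offset), of which 8 hold bit cells.
constexpr int kBitsPerByte = 8;
constexpr int kSlotsPerByte = 9;

// Index used in a buffer's instance name (item 2): bit + col * 8.
int bufferNameIndex(int col, int bit) {
  assert(bit >= 0 && bit < kBitsPerByte);
  return bit + col * kBitsPerByte;
}

// Grid position of a buffer (item 3): col * 9 + bit.
int bufferGridColumn(int col, int bit) {
  assert(bit >= 0 && bit < kBitsPerByte);
  return col * kSlotsPerByte + bit;
}

// Returns true if no two (col, bit) pairs share a name index or a grid column.
bool allUnique(int bytes_per_word) {
  std::set<int> names;
  std::set<int> columns;
  for (int col = 0; col < bytes_per_word; ++col) {
    for (int bit = 0; bit < kBitsPerByte; ++bit) {
      if (!names.insert(bufferNameIndex(col, bit)).second) {
        return false;
      }
      if (!columns.insert(bufferGridColumn(col, bit)).second) {
        return false;
      }
    }
  }
  return true;
}
```

With these formulas, bit 0 of byte column 1 gets name index 8 and grid column 9, so it no longer collides with bit 0 of byte column 0.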

@rovinski

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses several critical bugs in the RAM generator to enable support for multi-byte words. The fixes correct array indexing errors, resolve duplicate naming issues for buffers, fix incorrect placement of various components, and adjust grid sizing for decoders. The changes are well-targeted and appear to correctly resolve the described issues. I've included a couple of minor suggestions to enhance code readability and maintainability.

Comment on lines 133 to 146
  for (int bit = first_byte; bit < first_byte + 8; ++bit) {
+   int local_bit = bit - first_byte;
    auto name = fmt::format("{}.bit{}", prefix, bit);
    vector<dbNet*> outs;
    outs.reserve(read_ports);
    for (int read_port = 0; read_port < read_ports; ++read_port) {
-     outs.push_back(data_output[read_port][bit]->getNet());
+     outs.push_back(data_output[read_port][local_bit]->getNet());
    }

    ram_grid.addCell(
        makeCellBit(
-           name, read_ports, gclock_net, select_b_nets, data_input[bit], outs),
+           name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
        bit);
  }
Contributor

medium

The loop calculates local_bit from bit. For better readability and to make the intent clearer, consider iterating directly on local_bit from 0 to 7 and calculating bit inside the loop. This makes it more obvious that local_bit is used for array indexing within its bounds.

Suggested change
- for (int bit = first_byte; bit < first_byte + 8; ++bit) {
-   int local_bit = bit - first_byte;
-   auto name = fmt::format("{}.bit{}", prefix, bit);
-   vector<dbNet*> outs;
-   outs.reserve(read_ports);
-   for (int read_port = 0; read_port < read_ports; ++read_port) {
-     outs.push_back(data_output[read_port][local_bit]->getNet());
-   }
-   ram_grid.addCell(
-       makeCellBit(
-           name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
-       bit);
- }
+ for (int local_bit = 0; local_bit < 8; ++local_bit) {
+   const int bit = first_byte + local_bit;
+   auto name = fmt::format("{}.bit{}", prefix, bit);
+   vector<dbNet*> outs;
+   outs.reserve(read_ports);
+   for (int read_port = 0; read_port < read_ports; ++read_port) {
+     outs.push_back(data_output[read_port][local_bit]->getNet());
+   }
+   ram_grid.addCell(
+       makeCellBit(
+           name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
+       bit);
+ }

Collaborator

Gemini has the right idea, but this should be combined with the other changes.

The iteration should be over the local bit index and then mapped to the global index for the function call.

Realistically, this probably needs to be refactored because it clashes local context with global context, which is not a good structure. But that can be done later.

Suggested change
- for (int bit = first_byte; bit < first_byte + 8; ++bit) {
-   int local_bit = bit - first_byte;
-   auto name = fmt::format("{}.bit{}", prefix, bit);
-   vector<dbNet*> outs;
-   outs.reserve(read_ports);
-   for (int read_port = 0; read_port < read_ports; ++read_port) {
-     outs.push_back(data_output[read_port][local_bit]->getNet());
-   }
-   ram_grid.addCell(
-       makeCellBit(
-           name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
-       bit);
- }
+ for (int local_bit = 0; local_bit < 8; ++local_bit) {
+   const int local_bit_idx = bit_idx + local_bit;
+   auto name = fmt::format("{}.bit{}", prefix, local_bit_idx);
+   vector<dbNet*> outs;
+   outs.reserve(read_ports);
+   for (int read_port = 0; read_port < read_ports; ++read_port) {
+     outs.push_back(data_output[read_port][local_bit]->getNet());
+   }
+   ram_grid.addCell(
+       makeCellBit(
+           name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
+       local_bit_idx);
+ }

Author

For this part, I think the variable local_bit_idx on line 134 actually stores the global index position (bit_idx + local_bit). Should it be renamed to global_bit_idx to make that clearer?
Then lines 142-145 would become:

ram_grid.addCell(
        makeCellBit(
            name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
        global_bit_idx);

Collaborator

Either way is fine. I thought about it as an index local to the loop but global is fine too.
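
To make the local versus global index distinction in this thread concrete, here is a small sketch of the refactored loop shape. It is an illustration under assumptions, not the generator's code: PlacedBit and placeByte are hypothetical stand-ins, plain integers replace the net and cell objects, and only the 8-entry array, the byte_idx * 9 offset, and the index names come from the discussion.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one placed bit cell.
struct PlacedBit {
  std::string name;   // instance name, built from the global index
  int grid_column;    // global position in the grid
  int input_value;    // value read from the 8-entry per-byte array
};

// Iterates over the local bit index (0..7) and derives the global index from
// it. data_input has exactly 8 entries, so only local_bit may index it.
std::vector<PlacedBit> placeByte(const std::string& prefix,
                                 int byte_idx,
                                 const std::array<int, 8>& data_input) {
  assert(byte_idx >= 0);
  const int bit_idx = byte_idx * 9;  // first grid column of this byte
  std::vector<PlacedBit> cells;
  cells.reserve(data_input.size());
  for (int local_bit = 0; local_bit < 8; ++local_bit) {
    const int global_bit_idx = bit_idx + local_bit;
    cells.push_back({prefix + ".bit" + std::to_string(global_bit_idx),
                     global_bit_idx,
                     data_input.at(local_bit)});  // .at() is bounds-checked
  }
  return cells;
}
```

For byte_idx = 1 the global indices run from 9 to 16, which would overrun an 8-entry array if used as array subscripts; the local index stays within 0 to 7.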

      decoder_input_nets[row]);

- ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9));
+ ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9) + col);
Contributor

medium

The expression bytes_per_word * 9 is used here, but the variable col_cell_count was defined earlier for this purpose. Using the variable would improve readability and maintainability, as the logic is defined in one place.

Suggested change
- ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9) + col);
+ ram_grid.addCell(std::move(decoder_and_cell), col_cell_count + col);

Collaborator

agree

@rovinski rovinski self-requested a review January 15, 2026 22:14
Collaborator

@rovinski rovinski left a comment

Apologies for asking to make changes to code that goes beyond the fix, but it seems necessary to make the fix make sense.

Collaborator

byte_number is a confusing term and doesn't describe what it actually means. "index" is more appropriate

Suggested change
const int byte_idx,

@@ -131,16 +131,17 @@ void RamGen::makeCellByte(Grid& ram_grid,

int first_byte = byte_number * 9;
Collaborator

first_byte also doesn't make sense here. Use bit_idx for the same reason.

Suggested change
- int first_byte = byte_number * 9;
+ int bit_idx = byte_idx * 9;

// extra column is for decoder cells
int col_cell_count = bytes_per_word * 9;
- Grid ram_grid(odb::horizontal, col_cell_count + 1);
+ Grid ram_grid(odb::horizontal, col_cell_count + bytes_per_word);
Collaborator

why bytes_per_word here?

Author

This is because of another change I made earlier in the code, ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9) + col) (line 524), which allows more than 1 byte per word by adding one decoder column per byte. The bytes_per_word term here creates the extra space for those decoders. For example, if bytes_per_word = 3:

byte 0's decoder = 3 * 9 + 0 = column 27
byte 1's decoder = 3 * 9 + 1 = column 28
byte 2's decoder = 3 * 9 + 2 = column 29 -> needs more columns than the 3 * 9 + 1 = 28 originally allocated

So I set the RAM grid size to col_cell_count + bytes_per_word to make sure there are enough columns for the decoders.
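
The arithmetic in this comment can be checked with a short sketch. The helper names are hypothetical; the formula bytes_per_word * 9 + col is the one quoted above, and the only added assumption is that columns are numbered from 0.

```cpp
#include <cassert>

// Highest grid column used when each byte's decoder gets its own column,
// following the formula quoted above: bytes_per_word * 9 + col.
int lastDecoderColumn(int bytes_per_word) {
  assert(bytes_per_word >= 1);
  const int col_cell_count = bytes_per_word * 9;
  return col_cell_count + (bytes_per_word - 1);
}

// Assuming 0-based column numbering, the grid needs (last index + 1) columns.
int requiredColumns(int bytes_per_word) {
  return lastDecoderColumn(bytes_per_word) + 1;
}
```

For bytes_per_word = 3 this gives a last column of 29 and 30 required columns, which equals col_cell_count + bytes_per_word and exceeds the 28 columns that col_cell_count + 1 would provide. For a single byte per word the two sizes coincide at 10.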

Collaborator

@rovinski rovinski Jan 19, 2026

Can you show a diagram or before/after layout screenshot demonstrating why it's necessary?

Author

I didn't use a diagram; I only added + bytes_per_word to ensure the RAM has enough columns when there are multiple bytes per word. I checked and saw that the Grid::addCell() function can resize the grid dynamically, so both options generate the RAM without errors, but I think initializing it as Grid ram_grid(odb::horizontal, col_cell_count + bytes_per_word) ensures that the code allocates enough space right from the start.

Collaborator

I am just trying to understand why it's necessary. I would think that if extra instances are added to a given cell, the cell width would expand so that each "column" would be wider. It wouldn't necessarily require more columns.

Author

Thanks for pointing that out. I want to make sure I understand correctly. Currently the code that creates and adds decoders is in a for loop over columns and rows (lines 467-525):

for (int col = 0; col < bytes_per_word; ++col) {
...
 for (int row = 0; row < word_count; ++row) {
...
auto decoder_and_cell = makeDecoder(...)
ram_grid.addCell(std::move(decoder_and_cell), col_cell_count);
...

My understanding is that for bytes_per_word=2 and word_count=8, the current code adds all 16 decoders into one 'column' (making it wider). That is better, and uses fewer columns, than what I'm suggesting, ram_grid.addCell(std::move(decoder_and_cell), col_cell_count + col), which would put byte 0's decoders into one column and byte 1's decoders into another. Is that right?
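
The two placement options being compared can be modeled with a toy grid. This is a rough sketch only: ToyGrid and placeDecoders are hypothetical and do not reproduce OpenROAD's Grid class; the grow-on-demand behavior is assumed from the earlier remark that Grid::addCell() can resize dynamically, and strings stand in for cells.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy grid, NOT OpenROAD's Grid class. Each column holds a list
// of cell names, and addCell() grows the column list on demand.
class ToyGrid {
 public:
  explicit ToyGrid(std::size_t columns) : columns_(columns) {}

  void addCell(const std::string& name, std::size_t column) {
    if (column >= columns_.size()) {
      columns_.resize(column + 1);  // assumed grow-on-demand behavior
    }
    columns_[column].push_back(name);
  }

  std::size_t columnCount() const { return columns_.size(); }

  std::size_t cellsInColumn(std::size_t column) const {
    assert(column < columns_.size());
    return columns_[column].size();
  }

 private:
  std::vector<std::vector<std::string>> columns_;
};

// Adds one entry per (col, row) pair. With per_byte_column == false every
// entry lands in column col_cell_count; with true, byte `col` uses column
// col_cell_count + col.
ToyGrid placeDecoders(int bytes_per_word, int word_count, bool per_byte_column) {
  assert(bytes_per_word >= 1 && word_count >= 1);
  const std::size_t col_cell_count =
      static_cast<std::size_t>(bytes_per_word) * 9;
  ToyGrid grid(col_cell_count + 1);
  for (int col = 0; col < bytes_per_word; ++col) {
    for (int row = 0; row < word_count; ++row) {
      const std::size_t offset =
          per_byte_column ? static_cast<std::size_t>(col) : 0;
      grid.addCell(
          "decoder_" + std::to_string(row) + "_" + std::to_string(col),
          col_cell_count + offset);
    }
  }
  return grid;
}
```

In this model, bytes_per_word = 2 and word_count = 8 give one column holding 16 entries in the shared case, and two columns of 8 entries each (with the grid growing from 19 to 20 columns) in the per-byte case.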

Collaborator

I have no reference for what this looks like which is why I was asking for a screenshot of the layout. I am actually understanding this issue less the more it is discussed.

I also don't understand the terminology you are using. There is only one "decoder" circuit, but the decoder may be composed of multiple gates. For example, a 3-to-8 decoder will decode a 3-bit address signal into activating 1 out of 8 lines. It will be composed of multiple ANDs and inverters.
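
As a reference for the terminology, a 3-to-8 decoder can be sketched in software using only inversion and AND, mirroring the gate-level description above. This is a generic illustration, not the decoder the RAM generator builds; the function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// A 3-to-8 decoder built only from inverters (!) and AND gates (&&).
// Output line i is high exactly when the address bits a2 a1 a0 encode i.
std::array<bool, 8> decode3to8(bool a2, bool a1, bool a0) {
  const bool n2 = !a2;  // inverted address bits
  const bool n1 = !a1;
  const bool n0 = !a0;
  return {
      n2 && n1 && n0,  // line 0: address 000
      n2 && n1 && a0,  // line 1: address 001
      n2 && a1 && n0,  // line 2: address 010
      n2 && a1 && a0,  // line 3: address 011
      a2 && n1 && n0,  // line 4: address 100
      a2 && n1 && a0,  // line 5: address 101
      a2 && a1 && n0,  // line 6: address 110
      a2 && a1 && a0,  // line 7: address 111
  };
}

// Counts the active output lines; a correct decoder always yields exactly 1.
int activeLines(const std::array<bool, 8>& lines) {
  int count = 0;
  for (const bool line : lines) {
    count += line ? 1 : 0;
  }
  return count;
}
```

Each output line corresponds to one AND of the three (possibly inverted) address bits, which is why a single decoder circuit shows up in a layout as several AND cells and inverters.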

If this change is correct, please just share a screenshot of what the layout looks like with this code change vs. without this code change.

@rovinski
Collaborator

To fix the DCO check, please click on the test and follow the instructions to fix it. In the future, you can add -s to git commit to automatically sign off commits.

@tnguy19
Author

tnguy19 commented Jan 23, 2026

@rovinski
I have made a new commit with changes based on your feedback:

  • Refactored the code so there is one decoder per word instead of one per byte per word; all bytes of a word now share the same decoder net
  • Refactored the makeCellByte loop to iterate over local_bit
  • Renamed byte_number to byte_idx and first_byte to bit_idx
  • Changed the grid sizing to col_cell_count + 1 and used the col_cell_count variable instead of bytes_per_word * 9

Please let me know if there are any issues or if you need any clarification on the code changes.

@rovinski
Collaborator

rovinski commented Jan 25, 2026

  1. Can you post screenshots of the layout highlighting the changes?
  2. Please fix the DCO check, as mentioned before
  3. Please fix build errors. You can click on the test to go to the CI pipeline and view the error. In this case, the ram8x8 test is failing (meaning these changes cause it to deviate from what it was before). You should check on that and make sure that your changes do not affect the 8x8 test. If the 8x8 test is wrong and your changes make it correct, then you need to post an explanation and then update the golden files (.ok files) for ram8x8.

@tnguy19
Author

tnguy19 commented Jan 25, 2026

I ran the code to compare against the golden file 8x8 test and got the following difference error:

[INFO DRT-0179] Init gr pin query.
Differences found at line 320.
     RECT  39.76 -0.24 40.24 0.44 ;
     RECT  19.76 0 20.24 0.44 ;
Differences found at line 95.
    - decoder_0_0.and_layer0 sky130_fd_sc_hd__and2_0 + PLACED ( 11914 0 ) N ;
    - decoder_0.and_layer0 sky130_fd_sc_hd__and2_0 + PLACED ( 11914 0 ) N ;

The difference on line 95 appears to be a decoder naming difference. Previously decoders were created per byte per word, so the name included both the row and the column (decoder_0_0). With my changes there is one decoder per word, so the name includes only the row number (decoder_0).

For line 320, this might be a placement difference caused by the change in how the new code initializes the decoder; it does not cause an error when generating the RAM. Please correct me if you know this is a significant issue.

1. For the layout screenshots, this is the original RAM 8x8 (before the changes to allow multiple bytes):

[Image: ram8x8_original]

2. This is the layout screenshot of the RAM 8x8 (after the changes to allow multiple bytes):

[Image: RAM8x8_after_change]

The old and new code produce the same layout for the 8x8 case, and both have the same number of decoder cells (inverters and AND layers of 8).

3. This is the 8x16 RAM from the intermediate change, where I fixed the bugs to allow multiple bytes but did not change the decoder placement logic:

[Image: RAm8x16_intermediate_2]

In the red circle that I highlighted, there are 4 AND layers for the top row/word of the RAM (decoder_7_0.and_layer0, decoder_7_0.and_layer1, decoder_7_1.and_layer0, decoder_7_1.and_layer1), corresponding to 2 decoders, since there are 2 bytes per word.

4. This is the 8x16 RAM after the change to the decoder placement logic, where there is one decoder per word instead of one per byte:

[Image: RAM8x16_after_2]

With the change there are only 2 AND layers, decoder_7.and_layer0 and decoder_7.and_layer1, highlighted in the red circle. So there is 1 decoder per word, with 2 AND layers instead of 4.
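
The instance counts described for the top row can be reproduced with a small sketch of the two naming schemes. The helper names are hypothetical; the name patterns decoder_<row>_<col>.and_layer<k> and decoder_<row>.and_layer<k> are inferred from the instance names quoted above, and the number of AND layers is passed in rather than derived.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Old scheme (one decoder per byte per word): decoder_<row>_<col>.and_layer<k>
std::vector<std::string> perByteAndLayers(int row, int bytes_per_word,
                                          int layers) {
  assert(bytes_per_word >= 1 && layers >= 1);
  std::vector<std::string> names;
  for (int col = 0; col < bytes_per_word; ++col) {
    for (int k = 0; k < layers; ++k) {
      names.push_back("decoder_" + std::to_string(row) + "_" +
                      std::to_string(col) + ".and_layer" + std::to_string(k));
    }
  }
  return names;
}

// New scheme (one decoder per word): decoder_<row>.and_layer<k>
std::vector<std::string> perWordAndLayers(int row, int layers) {
  assert(layers >= 1);
  std::vector<std::string> names;
  for (int k = 0; k < layers; ++k) {
    names.push_back("decoder_" + std::to_string(row) + ".and_layer" +
                    std::to_string(k));
  }
  return names;
}
```

For row 7 with 2 bytes per word and 2 AND layers, the old scheme yields the four names listed under screenshot 3 and the new scheme yields the two names listed under screenshot 4.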

If these explanations and results look good, I can update the golden files (.ok files) for 8x8.

@rovinski
Collaborator

I ran the code to compare against the golden file 8x8 test and got the following difference error:

[INFO DRT-0179] Init gr pin query.
Differences found at line 320.
     RECT  39.76 -0.24 40.24 0.44 ;
     RECT  19.76 0 20.24 0.44 ;
Differences found at line 95.
    - decoder_0_0.and_layer0 sky130_fd_sc_hd__and2_0 + PLACED ( 11914 0 ) N ;
    - decoder_0.and_layer0 sky130_fd_sc_hd__and2_0 + PLACED ( 11914 0 ) N ;

The name change is fine. The RECT statements here represent a change in wiring of the ram. Is the diff only 2 lines or is it multiple?

Either way this is likely due to some oversensitivity in the global or detailed router with respect to cell naming or ordering. Basically, the name change causes the instances to be stored in a slightly different order which then causes the router to find a slightly different solution. Which is okay as long as it works.

1. For the layout screenshots this is the original ram 8x8 (before changes to allow multi byte):
2. This is the layout screenshot of the ram 8x8 (after changes to allow multi bytes):

Visually these look the same. As long as the placements didn't change, I think it's fine.

If these explanations and result looks good, I can update the golden files (.ok files) for 8x8

Yes, please do.

@tnguy19
Author

tnguy19 commented Jan 26, 2026

The two-line difference message comes from the output of the built-in test script (make_8x8.tcl). I also checked the full files to see all the differences: they are all due to the decoder naming change, plus multiple lines with coordinate differences in the li1 and met1-5 layers (due to the new routing, as you explained). I have pushed the new make_8x8.defok file for 8x8 in this PR.

@rovinski
Collaborator

Ok, that's actually changing a lot more than just 2 lines of the routing. Does the routing finish without violations?

Also the test is still failing. You might need to also update the .lefok

@tnguy19
Author

tnguy19 commented Jan 28, 2026

I was able to generate 8x8, 8x16, and 32x32 RAMs with no routing errors and have committed both the .defok and .lefok files.

However, I keep getting the following error from continuous-integration/jenkins/pr-head and continuous-integration/jenkins/pr-merge when pushing to GitHub:

16:59:04  + docker pull us-docker.pkg.dev/foss-fpga-tools-ext-openroad/openroad/ubuntu24.04:26Q1-494-g38f189f970
16:59:04  Error response from daemon: failed to resolve reference "us-docker.pkg.dev/foss-fpga-tools-ext-openroad/openroad/ubuntu24.04:26Q1-494-g38f189f970": us-docker.pkg.dev/foss-fpga-tools-ext-openroad/openroad/ubuntu24.04:26Q1-494-g38f189f970: not found
script returned exit code 1

@maliberty
Member

I don't see that error on pr-merge, perhaps it was transient. I just see
09:15:58 The following tests FAILED:
09:15:58 695 - ram.make_8x8.tcl (Failed) IntegrationTest tcl ram log_compare

Fix array indexing for bytes_per_word > 1
Fix buffer naming and placement across byte columns
Fix grid sizing and decoder placement
Fix word_count=2 select nets bug

code can generate RAM8x16, RAM16x16, RAM32x16, etc.

Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
- Refactored and changed code so there is one decoder per word instead of one per byte per word, all bytes of a word now share the same decoder net
- Refactored makeCellByte loop to iterate over local_bit
- Renamed byte_number to byte_idx and first_byte to bit_idx

Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
@tnguy19
Author

tnguy19 commented Jan 29, 2026

Thanks for pointing that out. I'm getting errors running the make_8x8 test because of what appears to be a difference in how the routing result is written out. The test passes locally but fails in continuous integration, where the output has a different format.
This is the latest error:

[2026-01-28T22:30:24.207Z] Differences found at line 587.
[2026-01-28T22:30:24.207Z]     - VDD ( PIN VDD ) ( * VPWR ) + USE POWER
[2026-01-28T22:30:24.207Z]     - VDD ( PIN VDD ) ( tapcell.cell9_8 VPWR ) ( tapcell.cell9_7 VPWR ) ( tapcell.cell9_6 VPWR ) ( tapcell.cell9_5 VPWR ) ( tapcell.cell9_4 VPWR ) ( tapcell.cell9_3 VPWR )
[2026-01-28T22:30:24.207Z] Exitcode:  0

My local build produces - VDD ( PIN VDD ) ( * VPWR ) + USE POWER. Is there a way to ensure a consistent routing solution/format? Or should I use a different approach to generate the golden files?

@maliberty
Member

There was a recent change in the DEF writer. You can use the save_ok / save_defok scripts to update the results (they just copy the right files from results).

@maliberty
Member

I would guess you need to add a global_connect to hook up your new cells to the power grid.

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@rovinski
Collaborator

@tnguy19 is your OpenROAD binary up to date? Have you merged in the master branch and then rebuilt before trying to update the ok files?
