Fix RAM generator to allow multi byte word #9266

tnguy19 · 2026-01-15T21:13:51Z

Array indexing bug: the code used the global grid position to index 8-element bit arrays, causing an out-of-bounds access when bytes_per_word > 1. Added local_bit = bit - first_byte to compute the correct local array position.
Buffer naming bug: buffer instance names were duplicated across byte columns. Fixed by naming with bit + col * 8.
Buffer placement bug: all buffers were placed at the same grid position. Fixed with col * 9 + bit.
Grid sizing bug: insufficient decoder columns. Changed from + 1 to + bytes_per_word.
Decoder placement bug (line 524): decoders overlapped in the placement. Fixed by adding a + col offset.
word_count=2 special case bug (lines 500-503): only one select net was created, but read_ports nets are needed. Fixed with a loop that creates one net per read port.
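The index arithmetic in the list above can be sketched in isolation. This is a minimal illustration, not code from ram.cpp: the helper names (gridColumn, localBit, bufferNameIndex, bufferName) and the "buffer" name prefix are hypothetical, and only the formulas bit - first_byte, bit + col * 8, and col * 9 + bit come from the description.

```cpp
#include <string>

// Assumed for illustration: 8 storage bits per byte and 9 grid columns per
// byte, matching the "col * 9 + bit" placement quoted in the description.
constexpr int kBitsPerByte = 8;
constexpr int kColsPerByte = 9;

// Global grid column of a bit cell inside byte column `col`.
constexpr int gridColumn(int col, int local_bit) {
  return col * kColsPerByte + local_bit;
}

// Index into an 8-element per-byte array, recovered from a global position.
constexpr int localBit(int global_bit, int first_byte) {
  return global_bit - first_byte;
}

// Word-wide bit number that keeps buffer names unique across byte columns.
constexpr int bufferNameIndex(int col, int local_bit) {
  return local_bit + col * kBitsPerByte;
}

// Hypothetical name format; the generator's actual naming may differ.
inline std::string bufferName(int col, int local_bit) {
  return "buffer" + std::to_string(bufferNameIndex(col, local_bit));
}
```

Under these assumptions, the first bit of byte column 1 sits at grid column 9 but still maps to array index 0, which is the case the original global indexing got wrong.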

gemini-code-assist

Code Review

This pull request addresses several critical bugs in the RAM generator to enable support for multi-byte words. The fixes correct array indexing errors, resolve duplicate naming issues for buffers, fix incorrect placement of various components, and adjust grid sizing for decoders. The changes are well-targeted and appear to correctly resolve the described issues. I've included a couple of minor suggestions to enhance code readability and maintainability.

gemini-code-assist · 2026-01-15T21:15:48Z

src/ram/src/ram.cpp

  for (int bit = first_byte; bit < first_byte + 8; ++bit) {
+    int local_bit = bit - first_byte;
    auto name = fmt::format("{}.bit{}", prefix, bit);
    vector<dbNet*> outs;
    outs.reserve(read_ports);
    for (int read_port = 0; read_port < read_ports; ++read_port) {
-      outs.push_back(data_output[read_port][bit]->getNet());
+      outs.push_back(data_output[read_port][local_bit]->getNet());
    }

    ram_grid.addCell(
        makeCellBit(
-            name, read_ports, gclock_net, select_b_nets, data_input[bit], outs),
+            name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
        bit);
  }


The loop calculates local_bit from bit. For better readability and to make the intent clearer, consider iterating directly on local_bit from 0 to 7 and calculating bit inside the loop. This makes it more obvious that local_bit is used for array indexing within its bounds.

Suggested change

-  for (int bit = first_byte; bit < first_byte + 8; ++bit) {
-    int local_bit = bit - first_byte;
+  for (int local_bit = 0; local_bit < 8; ++local_bit) {
+    const int bit = first_byte + local_bit;
     auto name = fmt::format("{}.bit{}", prefix, bit);
     vector<dbNet*> outs;
     outs.reserve(read_ports);
     for (int read_port = 0; read_port < read_ports; ++read_port) {
       outs.push_back(data_output[read_port][local_bit]->getNet());
     }

     ram_grid.addCell(
         makeCellBit(
             name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
         bit);
   }

Gemini has the right idea, but this should be combined with the other changes.

The iteration should be over the local bit index and then mapped to the global index for the function call.

Realistically, this probably needs to be refactored because it clashes local context with global context, which is not a good structure. But that can be done later.

Suggested change

-  for (int bit = first_byte; bit < first_byte + 8; ++bit) {
-    int local_bit = bit - first_byte;
-    auto name = fmt::format("{}.bit{}", prefix, bit);
+  for (int local_bit = 0; local_bit < 8; ++local_bit) {
+    const int local_bit_idx = bit_idx + local_bit;
+    auto name = fmt::format("{}.bit{}", prefix, local_bit_idx);
     vector<dbNet*> outs;
     outs.reserve(read_ports);
     for (int read_port = 0; read_port < read_ports; ++read_port) {
       outs.push_back(data_output[read_port][local_bit]->getNet());
     }

     ram_grid.addCell(
         makeCellBit(
             name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
-        bit);
+        local_bit_idx);
   }

For this part, I think the variable local_bit_idx on line 134 actually stores the global index position (bit_idx + local_bit). Should it be renamed to global_bit_idx to make that clearer?
Lines 142-145 would then be:

ram_grid.addCell(
    makeCellBit(
        name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
    global_bit_idx);

Either way is fine. I thought about it as an index local to the loop but global is fine too.

gemini-code-assist · 2026-01-15T21:15:48Z

src/ram/src/ram.cpp

                                          decoder_input_nets[row]);

-      ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9));
+      ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9) + col);


The expression bytes_per_word * 9 is used here, but the variable col_cell_count was defined earlier for this purpose. Using the variable would improve readability and maintainability, as the logic is defined in one place.

Suggested change

-      ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9) + col);
+      ram_grid.addCell(std::move(decoder_and_cell), col_cell_count + col);

rovinski

Apologies for asking to make changes to code that goes beyond the fix, but it seems necessary to make the fix make sense.

rovinski · 2026-01-16T01:46:03Z

src/ram/src/ram.cpp

byte_number is a confusing term and doesn't describe what it actually means. "index" is more appropriate

Suggested change

const int byte_idx,

rovinski · 2026-01-16T01:52:12Z

src/ram/src/ram.cpp

@@ -131,16 +131,17 @@ void RamGen::makeCellByte(Grid& ram_grid,

  int first_byte = byte_number * 9;


first_byte also doesn't make sense here. Use bit_idx for the same reason.

Suggested change

-  int first_byte = byte_number * 9;
+  int bit_idx = byte_idx * 9;

rovinski · 2026-01-16T02:06:54Z

src/ram/src/ram.cpp

  for (int bit = first_byte; bit < first_byte + 8; ++bit) {
+    int local_bit = bit - first_byte;
    auto name = fmt::format("{}.bit{}", prefix, bit);
    vector<dbNet*> outs;
    outs.reserve(read_ports);
    for (int read_port = 0; read_port < read_ports; ++read_port) {
-      outs.push_back(data_output[read_port][bit]->getNet());
+      outs.push_back(data_output[read_port][local_bit]->getNet());
    }

    ram_grid.addCell(
        makeCellBit(
-            name, read_ports, gclock_net, select_b_nets, data_input[bit], outs),
+            name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
        bit);
  }



rovinski · 2026-01-16T02:10:35Z

src/ram/src/ram.cpp

  // extra column is for decoder cells
  int col_cell_count = bytes_per_word * 9;
-  Grid ram_grid(odb::horizontal, col_cell_count + 1);
+  Grid ram_grid(odb::horizontal, col_cell_count + bytes_per_word);


why bytes_per_word here?

This is because of another change I made in the code, ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9) + col) (line 524), which allows more than 1 byte per word by adding one decoder column per byte. The bytes_per_word term here creates the extra space for the decoders. For example, if bytes_per_word = 3:

byte 0's decoder = 3 * 9 + 0 = column 27
byte 1's decoder = 3 * 9 + 1 = column 28
byte 2's decoder = 3 * 9 + 2 = column 29, which does not fit in the 3 * 9 + 1 = 28 columns allocated before

So I set the RAM grid size to col_cell_count + bytes_per_word to make sure there are enough columns for the decoders.
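The column arithmetic in this reply can be checked with a small sketch. The function names below are hypothetical and exist only for illustration; the formulas bytes_per_word * 9 and (bytes_per_word * 9) + col are taken from the comment, and the sketch assumes zero-based column indices.

```cpp
// Columns occupied by bit cells, as in col_cell_count = bytes_per_word * 9.
constexpr int colCellCount(int bytes_per_word) {
  return bytes_per_word * 9;
}

// Column index used for byte `col`'s decoder cells under the placement
// proposed at this point in the thread.
constexpr int decoderColumn(int bytes_per_word, int col) {
  return colCellCount(bytes_per_word) + col;
}

// Smallest grid width that holds every decoder column (indices start at 0).
constexpr int requiredColumns(int bytes_per_word) {
  return decoderColumn(bytes_per_word, bytes_per_word - 1) + 1;
}
```

For bytes_per_word = 3 the last decoder column is index 29, so 30 columns are needed, which equals col_cell_count + bytes_per_word; the earlier sizing of col_cell_count + 1 gives only 28.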

Can you show a diagram or before/after layout screenshot demonstrating why it's necessary?

I didn't use a diagram; I only added + bytes_per_word to ensure the RAM has enough columns when there are multiple bytes per word. I checked and saw that Grid::addCell() can resize the grid dynamically, so both options generate the RAM without errors, but I think initializing it as Grid ram_grid(odb::horizontal, col_cell_count + bytes_per_word) ensures that enough space is allocated from the start.

I am just trying to understand why it's necessary. I would think that if extra instances are added to a given cell, the cell width would expand so that each "column" would be wider. It wouldn't necessarily require more columns.

Thanks for pointing that out. I want to make sure I understand correctly. Currently the code that creates and adds decoders is in a for loop over columns and rows (lines 467-525):

for (int col = 0; col < bytes_per_word; ++col) {
  ...
  for (int row = 0; row < word_count; ++row) {
    ...
    auto decoder_and_cell = makeDecoder(...);
    ram_grid.addCell(std::move(decoder_and_cell), col_cell_count);
    ...

My understanding is that for bytes_per_word=2 and word_count=8, the current code adds all 16 decoders into one "column" (making it wider). That is better, with fewer columns, than what I'm suggesting, ram_grid.addCell(std::move(decoder_and_cell), col_cell_count + col), which would put byte 0's decoders into one column and byte 1's decoders into another. Is that right?

I have no reference for what this looks like which is why I was asking for a screenshot of the layout. I am actually understanding this issue less the more it is discussed.

I also don't understand the terminology you are using. There is only one "decoder" circuit, but the decoder may be composed of multiple gates. For example, a 3-to-8 decoder will decode a 3-bit address signal into activating 1 out of 8 lines. It will be composed of multiple ANDs and inverters.
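To make the terminology concrete, here is a minimal software sketch of a 3-to-8 decoder built only from inverters (NOT) and three-input ANDs. It is a behavioral illustration, not the generator's netlist; the function name decode3to8 is hypothetical.

```cpp
#include <array>
#include <cstdint>

// One-hot decode of a 3-bit address using only NOT and AND operations.
// Exactly one of the eight outputs is true: the one at index `addr & 7`.
inline std::array<bool, 8> decode3to8(std::uint8_t addr) {
  const bool a0 = (addr & 1u) != 0;
  const bool a1 = (addr & 2u) != 0;
  const bool a2 = (addr & 4u) != 0;
  // Inverters produce the complemented address bits.
  const bool n0 = !a0;
  const bool n1 = !a1;
  const bool n2 = !a2;
  // Each output line is a three-input AND of true or inverted address bits.
  return {n2 && n1 && n0, n2 && n1 && a0, n2 && a1 && n0, n2 && a1 && a0,
          a2 && n1 && n0, a2 && n1 && a0, a2 && a1 && n0, a2 && a1 && a0};
}
```

In this sketch the whole function is the single "decoder"; the eight AND terms correspond to the per-row gates that drive each word's select line.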

If this change is correct, please just share a screenshot of what the layout looks like with this code change vs. without this code change.

rovinski · 2026-01-16T02:12:17Z

src/ram/src/ram.cpp

                                          decoder_input_nets[row]);

-      ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9));
+      ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9) + col);


rovinski · 2026-01-16T02:15:46Z

To fix DCO, please click on the test and follow the instructions to fix it. In the future, you can add -s to git commit to automatically sign commits.

tnguy19 · 2026-01-23T17:34:50Z

@rovinski
I have made a new commit with changes based on your feedback:

- Refactored the code so there is one decoder per word instead of one per byte per word; all bytes of a word now share the same decoder net
- Refactored the makeCellByte loop to iterate over local_bit
- Renamed byte_number to byte_idx and first_byte to bit_idx
- Changed grid sizing to col_cell_count + 1 and used the col_cell_count variable instead of bytes_per_word * 9

Please let me know if there are any issues or if you need any clarification on the code changes.

rovinski · 2026-01-25T04:11:30Z

Can you post screenshots of the layout highlighting the changes?
Please fix the DCO check, as mentioned before
Please fix build errors. You can click on the test to go to the CI pipeline and view the error. In this case, the ram8x8 test is failing (meaning these changes cause it to deviate from what it was before). You should check on that and make sure that your changes do not affect the 8x8 test. If the 8x8 test is wrong and your changes make it correct, then you need to post an explanation and then update the golden files (.ok files) for ram8x8.

tnguy19 · 2026-01-25T16:43:42Z

I ran the code to compare against the golden file 8x8 test and got the following difference error:

[INFO DRT-0179] Init gr pin query.
Differences found at line 320.
     RECT  39.76 -0.24 40.24 0.44 ;
     RECT  19.76 0 20.24 0.44 ;
Differences found at line 95.
    - decoder_0_0.and_layer0 sky130_fd_sc_hd__and2_0 + PLACED ( 11914 0 ) N ;
    - decoder_0.and_layer0 sky130_fd_sc_hd__and2_0 + PLACED ( 11914 0 ) N ;

The difference on line 95 appears to be in the decoder naming. Previously decoders were created per byte per word, so the name included both the row and the column (decoder_0_0). With my changes there is one decoder per word, so the name includes only the row number (decoder_0).

For line 320, this might be a placement difference caused by how the new code initializes the decoders; it doesn't cause an error when generating the RAM. Please correct me if you know this is a significant issue.

1. For the layout screenshots this is the original ram 8x8 (before changes to allow multi byte):

2. This is the layout screenshot of the ram 8x8 (after changes to allow multi bytes):

The old and new code produce the same layout for the 8x8 case, and both have the same number of decoder cells (inverters and AND layers of 8).

3. This is the 8x16 RAM with the multi-byte fix but before the changes to the decoder placement logic:

In the red circle that I highlighted, there are 4 AND layers for the top row/word in the RAM (decoder_7_0.and_layer0, decoder_7_0.and_layer1, decoder_7_1.and_layer0, decoder_7_1.and_layer1), corresponding to 2 decoders since there are 2 bytes per word.

4. This is the 8x16 RAM after the changes to the decoder placement logic, where there is one decoder per word instead of one per byte:

With the change there are only 2 AND layers, decoder_7.and_layer0 and decoder_7.and_layer1, highlighted in the red circle. So there is 1 decoder per word and 2 AND layers instead of 4.

If these explanations and result looks good, I can update the golden files (.ok files) for 8x8

rovinski · 2026-01-25T21:52:00Z

I ran the code to compare against the golden file 8x8 test and got the following difference error:

[INFO DRT-0179] Init gr pin query.
Differences found at line 320.
     RECT  39.76 -0.24 40.24 0.44 ;
     RECT  19.76 0 20.24 0.44 ;
Differences found at line 95.
    - decoder_0_0.and_layer0 sky130_fd_sc_hd__and2_0 + PLACED ( 11914 0 ) N ;
    - decoder_0.and_layer0 sky130_fd_sc_hd__and2_0 + PLACED ( 11914 0 ) N ;

The name change is fine. The RECT statements here represent a change in wiring of the ram. Is the diff only 2 lines or is it multiple?

Either way this is likely due to some oversensitivity in the global or detailed router with respect to cell naming or ordering. Basically, the name change causes the instances to be stored in a slightly different order which then causes the router to find a slightly different solution. Which is okay as long as it works.

1. For the layout screenshots this is the original ram 8x8 (before changes to allow multi byte):
2. This is the layout screenshot of the ram 8x8 (after changes to allow multi bytes):

Visually these look the same. As long as the placements didn't change, I think it's fine.

If these explanations and result looks good, I can update the golden files (.ok files) for 8x8

Yes, please do.

tnguy19 · 2026-01-26T18:37:22Z

The two-line difference message comes from the output of the built-in test script (make_8x8.tcl). I also checked the full files to see all the differences: they are all due to the decoder naming change and to multiple lines with coordinate differences on the li1 and met1-5 layers (from the new routing, as you explained). I have pushed the new make_8x8.defok file for 8x8 in this PR.

rovinski · 2026-01-27T20:08:47Z

Ok, that's actually changing a lot more than just 2 lines of the routing. Does the routing finish without violations?

Also the test is still failing. You might need to also update the .lefok

tnguy19 · 2026-01-28T17:46:44Z

I was able to generate 8x8, 8x16, 32x32 ram with no routing errors and have committed both the .defok and .lefok file.

However, I keep getting an error from continuous-integration/jenkins/pr-head and continuous-integration/jenkins/pr-merge when pushing to GitHub:

16:59:04  + docker pull us-docker.pkg.dev/foss-fpga-tools-ext-openroad/openroad/ubuntu24.04:26Q1-494-g38f189f970
16:59:04  Error response from daemon: failed to resolve reference "us-docker.pkg.dev/foss-fpga-tools-ext-openroad/openroad/ubuntu24.04:26Q1-494-g38f189f970": us-docker.pkg.dev/foss-fpga-tools-ext-openroad/openroad/ubuntu24.04:26Q1-494-g38f189f970: not found
script returned exit code 1

maliberty · 2026-01-28T18:02:55Z

I don't see that error on pr-merge, perhaps it was transient. I just see
09:15:58 The following tests FAILED:
09:15:58 695 - ram.make_8x8.tcl (Failed) IntegrationTest tcl ram log_compare

Fix array indexing for bytes_per_word > 1
Fix buffer naming and placement across byte columns
Fix grid sizing and decoder placement
Fix word_count=2 select nets bug
code can generate RAM8x16, RAM16x16, RAM32x16, etc.
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>

- Refactored and changed code so there is one decoder per word instead of one per byte per word, all bytes of a word now share the same decoder net
- Refactored makeCellByte loop to iterate over local_bit
- Renamed byte_number to byte_idx and first_byte to bit_idx
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>

Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>

tnguy19 · 2026-01-29T00:46:51Z

Thanks for pointing that out. I'm having errors running the make8x8 file because of what appear to be differences in how the routing result is output. I ran the test and it passes locally but fails during continuous integration with a different format.
This is the latest error:

[2026-01-28T22:30:24.207Z] Differences found at line 587.
[2026-01-28T22:30:24.207Z]     - VDD ( PIN VDD ) ( * VPWR ) + USE POWER
[2026-01-28T22:30:24.207Z]     - VDD ( PIN VDD ) ( tapcell.cell9_8 VPWR ) ( tapcell.cell9_7 VPWR ) ( tapcell.cell9_6 VPWR ) ( tapcell.cell9_5 VPWR ) ( tapcell.cell9_4 VPWR ) ( tapcell.cell9_3 VPWR )
[2026-01-28T22:30:24.207Z] Exitcode:  0

My local build produces - VDD ( PIN VDD ) ( * VPWR ) + USE POWER. Is there a way to ensure consistent routing solution/format? Or should I use a different approach to generate the golden files?

maliberty · 2026-01-29T02:48:59Z

There was a recent change in the DEF writer. You can use the save_ok / save_defok scripts to update the results (they just copy the right files from results).

maliberty · 2026-01-29T02:49:25Z

I would guess you need to add a global_connect to hook up your new cells to the power grid.

github-actions · 2026-01-29T02:51:52Z

clang-tidy review says "All clean, LGTM! 👍"

rovinski · 2026-01-29T03:55:07Z

@tnguy19 is your OpenROAD binary up to date? Have you merged in the master branch and then rebuilt before trying to update the ok files?

gemini-code-assist bot reviewed Jan 15, 2026

rovinski self-requested a review January 15, 2026 22:14

rovinski suggested changes Jan 16, 2026

tnguy19 force-pushed the ram-update-code branch from 88eeded to 0dad3f6 Compare January 25, 2026 15:33

tnguy19 added 5 commits January 28, 2026 15:48

Update both RAM8x8 golden files (.lefok and .defok)

7847f1d

Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>

Update RAM8x8 golden log file (.ok)

1bcfb88

Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>

Update RAM8x8 golden log - use -no_splash flag

c0dfc59

Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>

tnguy19 force-pushed the ram-update-code branch from 840e9c8 to c0dfc59 Compare January 28, 2026 20:54

Regenerate all RAM8x8 golden files with new decoder naming

efa9572

Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
