-
Notifications
You must be signed in to change notification settings - Fork 773
Fix RAM generator to allow multi byte word #9266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses several critical bugs in the RAM generator to enable support for multi-byte words. The fixes correct array indexing errors, resolve duplicate naming issues for buffers, fix incorrect placement of various components, and adjust grid sizing for decoders. The changes are well-targeted and appear to correctly resolve the described issues. I've included a couple of minor suggestions to enhance code readability and maintainability.
src/ram/src/ram.cpp
Outdated
| for (int bit = first_byte; bit < first_byte + 8; ++bit) { | ||
| int local_bit = bit - first_byte; | ||
| auto name = fmt::format("{}.bit{}", prefix, bit); | ||
| vector<dbNet*> outs; | ||
| outs.reserve(read_ports); | ||
| for (int read_port = 0; read_port < read_ports; ++read_port) { | ||
| outs.push_back(data_output[read_port][bit]->getNet()); | ||
| outs.push_back(data_output[read_port][local_bit]->getNet()); | ||
| } | ||
|
|
||
| ram_grid.addCell( | ||
| makeCellBit( | ||
| name, read_ports, gclock_net, select_b_nets, data_input[bit], outs), | ||
| name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs), | ||
| bit); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop calculates local_bit from bit. For better readability and to make the intent clearer, consider iterating directly on local_bit from 0 to 7 and calculating bit inside the loop. This makes it more obvious that local_bit is used for array indexing within its bounds.
| for (int bit = first_byte; bit < first_byte + 8; ++bit) { | |
| int local_bit = bit - first_byte; | |
| auto name = fmt::format("{}.bit{}", prefix, bit); | |
| vector<dbNet*> outs; | |
| outs.reserve(read_ports); | |
| for (int read_port = 0; read_port < read_ports; ++read_port) { | |
| outs.push_back(data_output[read_port][bit]->getNet()); | |
| outs.push_back(data_output[read_port][local_bit]->getNet()); | |
| } | |
| ram_grid.addCell( | |
| makeCellBit( | |
| name, read_ports, gclock_net, select_b_nets, data_input[bit], outs), | |
| name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs), | |
| bit); | |
| } | |
| for (int local_bit = 0; local_bit < 8; ++local_bit) { | |
| const int bit = first_byte + local_bit; | |
| auto name = fmt::format("{}.bit{}", prefix, bit); | |
| vector<dbNet*> outs; | |
| outs.reserve(read_ports); | |
| for (int read_port = 0; read_port < read_ports; ++read_port) { | |
| outs.push_back(data_output[read_port][local_bit]->getNet()); | |
| } | |
| ram_grid.addCell( | |
| makeCellBit( | |
| name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs), | |
| bit); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gemini has the right idea, but this should be combined with the other changes.
The iteration should be over the local bit index and then mapped to the global index for the function call.
Realistically, this probably needs to be refactored because it clashes local context with global context, which is not a good structure. But that can be done later.
| for (int bit = first_byte; bit < first_byte + 8; ++bit) { | |
| int local_bit = bit - first_byte; | |
| auto name = fmt::format("{}.bit{}", prefix, bit); | |
| vector<dbNet*> outs; | |
| outs.reserve(read_ports); | |
| for (int read_port = 0; read_port < read_ports; ++read_port) { | |
| outs.push_back(data_output[read_port][bit]->getNet()); | |
| outs.push_back(data_output[read_port][local_bit]->getNet()); | |
| } | |
| ram_grid.addCell( | |
| makeCellBit( | |
| name, read_ports, gclock_net, select_b_nets, data_input[bit], outs), | |
| name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs), | |
| bit); | |
| } | |
| for (int local_bit = 0; local_bit < 8; ++local_bit) { | |
| const int local_bit_idx = bit_idx + bit; | |
| auto name = fmt::format("{}.bit{}", prefix, local_bit_idx); | |
| vector<dbNet*> outs; | |
| outs.reserve(read_ports); | |
| for (int read_port = 0; read_port < read_ports; ++read_port) { | |
| outs.push_back(data_output[read_port][local_bit]->getNet()); | |
| } | |
| ram_grid.addCell( | |
| makeCellBit( | |
| name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs), | |
| bit); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this part, i think the variable local_bit_idx in line 134 actually stores the global index position (bit_idx + local_bit), should this be renamed to global_bit_idx to make it clearer?
then at line 142-145 it will be
ram_grid.addCell(
makeCellBit(
name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs),
global_bit_idx);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way is fine. I thought about it as an index local to the loop but global is fine too.
src/ram/src/ram.cpp
Outdated
| decoder_input_nets[row]); | ||
|
|
||
| ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9)); | ||
| ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9) + col); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expression bytes_per_word * 9 is used here, but the variable col_cell_count was defined earlier for this purpose. Using the variable would improve readability and maintainability, as the logic is defined in one place.
| ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9) + col); | |
| ram_grid.addCell(std::move(decoder_and_cell), col_cell_count + col); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
rovinski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for asking to make changes to code that goes beyond the fix, but it seems necessary to make the fix make sense.
src/ram/src/ram.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
byte_number is a confusing term and doesn't describe what it actually means. "index" is more appropriate
| const int byte_idx, |
src/ram/src/ram.cpp
Outdated
| @@ -131,16 +131,17 @@ void RamGen::makeCellByte(Grid& ram_grid, | |||
|
|
|||
| int first_byte = byte_number * 9; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first_byte also doesn't make sense here. Use bit_idx for the same reason.
| int first_byte = byte_number * 9; | |
| int bit_idx = byte_idx * 9; |
src/ram/src/ram.cpp
Outdated
| for (int bit = first_byte; bit < first_byte + 8; ++bit) { | ||
| int local_bit = bit - first_byte; | ||
| auto name = fmt::format("{}.bit{}", prefix, bit); | ||
| vector<dbNet*> outs; | ||
| outs.reserve(read_ports); | ||
| for (int read_port = 0; read_port < read_ports; ++read_port) { | ||
| outs.push_back(data_output[read_port][bit]->getNet()); | ||
| outs.push_back(data_output[read_port][local_bit]->getNet()); | ||
| } | ||
|
|
||
| ram_grid.addCell( | ||
| makeCellBit( | ||
| name, read_ports, gclock_net, select_b_nets, data_input[bit], outs), | ||
| name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs), | ||
| bit); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gemini has the right idea, but this should be combined with the other changes.
The iteration should be over the local bit index and then mapped to the global index for the function call.
Realistically, this probably needs to be refactored because it clashes local context with global context, which is not a good structure. But that can be done later.
| for (int bit = first_byte; bit < first_byte + 8; ++bit) { | |
| int local_bit = bit - first_byte; | |
| auto name = fmt::format("{}.bit{}", prefix, bit); | |
| vector<dbNet*> outs; | |
| outs.reserve(read_ports); | |
| for (int read_port = 0; read_port < read_ports; ++read_port) { | |
| outs.push_back(data_output[read_port][bit]->getNet()); | |
| outs.push_back(data_output[read_port][local_bit]->getNet()); | |
| } | |
| ram_grid.addCell( | |
| makeCellBit( | |
| name, read_ports, gclock_net, select_b_nets, data_input[bit], outs), | |
| name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs), | |
| bit); | |
| } | |
| for (int local_bit = 0; local_bit < 8; ++local_bit) { | |
| const int local_bit_idx = bit_idx + bit; | |
| auto name = fmt::format("{}.bit{}", prefix, local_bit_idx); | |
| vector<dbNet*> outs; | |
| outs.reserve(read_ports); | |
| for (int read_port = 0; read_port < read_ports; ++read_port) { | |
| outs.push_back(data_output[read_port][local_bit]->getNet()); | |
| } | |
| ram_grid.addCell( | |
| makeCellBit( | |
| name, read_ports, gclock_net, select_b_nets, data_input[local_bit], outs), | |
| bit); | |
| } |
src/ram/src/ram.cpp
Outdated
| // extra column is for decoder cells | ||
| int col_cell_count = bytes_per_word * 9; | ||
| Grid ram_grid(odb::horizontal, col_cell_count + 1); | ||
| Grid ram_grid(odb::horizontal, col_cell_count + bytes_per_word); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why bytes_per_word here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is because of another change that i made earlier in the code ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9) + col) (line 524) to allow for more than 1 byte per word by adding one decoder column per byte. So the bytes_per_word here is to create extra space for the decoders. For example if 'bytes_per_word' = 3:
byte 0's decoder = 3 * 9 + 0 = column 27
byte 1's decoder = 3 * 9 + 1 = column 28
byte 2's decoder = 3 * 9 + 2 = column 29 -> need an extra column which is more than 3*9+1= 28 columns
So I set the ram grid size to col_cell_count + bytes_per_word to make sure theres enough columns for the decoder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show a diagram or before/after layout screenshot demonstrating why it's necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didnt use a diagram but only added + bytes_per_word to ensure the ram has enough columns from having multiple bytes per word. I checked and saw that Grid::addCell() function can resize the ram dynamically, so both options can generate ram without errors, but I think having it initialized as Grid ram_grid(odb::horizontal, col_cell_count + bytes_per_word) will ensure that the code initialize enough space right from the start
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just trying to understand why it's necessary. I would think that if extra instances are added to a given cell, the cell width would expand so that each "column" would be wider. It wouldn't necessarily require more columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out, I want to make sure I understand correctly. Currently the code to create and add decoders is in a for loop over column and row (line 467-525):
for (int col = 0; col < bytes_per_word; ++col) {
...
for (int row = 0; row < word_count; ++row) {
...
auto decoder_and_cell = makeDecoder(...)
ram_grid.addCell(std::move(decoder_and_cell), col_cell_count);
...
My understanding is that for bytes_per_word=2 and word_count=8 the current code will add all the 16 decoders into one 'column' (making it wider) , which is better and having less columns than what I'm suggesting ram_grid.addCell(std::move(decoder_and_cell), col_cell_count + col) which would put byte 0's decoders into one column and byte 1's decoders into another column right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no reference for what this looks like which is why I was asking for a screenshot of the layout. I am actually understanding this issue less the more it is discussed.
I also don't understand the terminology you are using. There is only one "decoder" circuit, but the decoder may be composed of multiple gates. For example, a 3-to-8 decoder will decode a 3-bit address signal into activating 1 out of 8 lines. It will be composed of multiple ANDs and inverters.
If this change is correct, please just share a screenshot of what the layout looks like with this code change vs. without this code change.
src/ram/src/ram.cpp
Outdated
| decoder_input_nets[row]); | ||
|
|
||
| ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9)); | ||
| ram_grid.addCell(std::move(decoder_and_cell), (bytes_per_word * 9) + col); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
|
To fix DCO please click on the test and follow the instructions to fit. In the future, you can add |
|
@rovinski
|
|
88eeded to
0dad3f6
Compare
The name change is fine. The Either way this is likely due to some oversensitivity in the global or detailed router with respect to cell naming or ordering. Basically, the name change causes the instances to be stored in a slightly different order which then causes the router to find a slightly different solution. Which is okay as long as it works.
Visually these look the same. As long as the placements didn't change, I think it's fine.
Yes, please do. |
|
The two line differences message are from the output of the built in test script ( |
|
Ok, that's actually changing a lot more than just 2 lines of the routing. Does the routing finish without violations? Also the test is still failing. You might need to also update the .lefok |
|
I was able to generate 8x8, 8x16, 32x32 ram with no routing errors and have committed both the .defok and .lefok file. However I keep getting error from the |
|
I don't see that error on pr-merge, perhaps it was transient. I just see |
Fix array indexing for bytes_per_word > 1 Fix buffer naming and placement across byte columns Fix grid sizing and decoder placement Fix word_count=2 select nets bug code can generate RAM8x16, RAM16x16, RAM32x16, etc. Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
- Refactored and changed code so there is one decoder per word instead of one per byte per word, all bytes of a word now share the same decoder net - Refactored makeCellByte loop to iterate over local_bit - Renamed byte_number to byte_idx and first_byte to bit_idx Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
840e9c8 to
c0dfc59
Compare
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
|
Thanks for pointing that out. I'm having errors running the make8x8 file because of what appear to be differences in how the routing result is output. I ran the test and it passes locally but fails during continuous integration with a different format. My local build produces |
|
There was a recent change in the DEF writer. You can use the save_ok / save_defok scripts to update the results (they just copy the right files from results). |
|
I would guess you need to add a global_connect to hook up your new cells to the power grid. |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
@tnguy19 is your OpenROAD binary up to date? Have you merged in the master branch and then rebuilt before trying to update the ok files? |




Array indexing bug : Code used global grid position to index 8-element bit arrays, causing out of bound error for bytes_per_word > 1. Added
local_bit = bit - first_byteto identify correct local array positionBuffer naming bug : Duplicate buffer instance names across byte columns, fixed by changing into
bit + col * 8Buffer placement bug: all buffers placed at same grid position, fixed with
col * 9 + bitGrid sizing bug: insufficient decoder columns. Changed from
+ 1to `+ bytes_per_wordDecoder placement bug (line 524): overlap in the placement. Fixed by adding
+ coloffsetword_count=2 special case bug (lines 500-503): only created 1 select net but needed
read_portsnets. Fixed with loop creating read_ports number of nets (one per read port)@rovinski