@rogpeppe (Contributor) commented Nov 7, 2025

This change aligns the efficiency of UnsetIterator with that of
Iterator: it now runs in time proportional to the number
of unset bits, not the number of set bits.
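For a dense, bitmap-backed container, one common way to get time proportional to the number of unset bits is to complement each 64-bit word and peel off its lowest set bit until the word is empty. The sketch below only illustrates that idea: `unsetInWords` is a hypothetical helper, and this PR's bitmap-container code is not shown on this page.

```go
package main

import (
	"fmt"
	"math/bits"
)

// unsetInWords returns the positions of the zero bits in words, treating
// words as a little-endian bitset of len(words)*64 bits. Each word is
// complemented and its set bits are peeled off one at a time, so the work
// is proportional to the number of unset bits plus the number of words,
// regardless of how many bits are set.
func unsetInWords(words []uint64) []int {
	var out []int
	for i, w := range words {
		inv := ^w // zero bits of w become one bits
		for inv != 0 {
			t := bits.TrailingZeros64(inv) // lowest unset bit of w
			out = append(out, i*64+t)
			inv &= inv - 1 // clear that bit and continue
		}
	}
	return out
}

func main() {
	// A 128-bit set with every bit set except bits 3 and 70.
	words := []uint64{^uint64(1 << 3), ^uint64(1 << 6)}
	fmt.Println(unsetInWords(words)) // [3 70]
}
```

A fully set word costs a single comparison, which is what makes dense sets cheap to scan for unset bits.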

The benchmark added in this change previously took about 35 s per
iteration and now takes about 1.1 ms.

Also add some property-based tests with a small corpus of values.
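The property-based tests themselves are not shown on this page. As a hedged sketch of the idea, the property can be stated as "the unset values are exactly the complement of the set values, in ascending order" and checked against a brute-force membership table over a small corpus of edge cases plus a few random sets. `unsetOf`, `checkComplement`, and `corpus` are hypothetical names, not the PR's test code.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// unsetOf is a stand-in for the implementation under test: it walks the
// gaps between the sorted, distinct values in set.
func unsetOf(set []uint16) []uint16 {
	var out []uint16
	next := 0 // smallest value not yet emitted or skipped
	for _, v := range set {
		for ; next < int(v); next++ {
			out = append(out, uint16(next))
		}
		next = int(v) + 1
	}
	for ; next <= 0xFFFF; next++ {
		out = append(out, uint16(next))
	}
	return out
}

// checkComplement checks the property against a brute-force oracle: the
// unset values are exactly those in [0, 65535] not in set, ascending.
func checkComplement(set []uint16) error {
	var member [1 << 16]bool
	for _, v := range set {
		member[v] = true
	}
	got := unsetOf(set)
	i := 0
	for v := 0; v <= 0xFFFF; v++ {
		if member[v] {
			continue
		}
		if i >= len(got) || int(got[i]) != v {
			return fmt.Errorf("expected unset value %d at index %d", v, i)
		}
		i++
	}
	if i != len(got) {
		return fmt.Errorf("%d unexpected extra values", len(got)-i)
	}
	return nil
}

// corpus returns hand-picked edge cases plus some seeded random sets.
func corpus() [][]uint16 {
	c := [][]uint16{{}, {0}, {0xFFFF}, {0, 1, 2}, {0, 0xFFFF}}
	full := make([]uint16, 1<<16)
	for i := range full {
		full[i] = uint16(i)
	}
	c = append(c, full)
	r := rand.New(rand.NewSource(1))
	for n := 0; n < 5; n++ {
		seen := map[uint16]bool{}
		var s []uint16
		for k := 0; k < 100; k++ {
			if v := uint16(r.Intn(1 << 16)); !seen[v] {
				seen[v] = true
				s = append(s, v)
			}
		}
		sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
		c = append(c, s)
	}
	return c
}

func main() {
	for _, s := range corpus() {
		if err := checkComplement(s); err != nil {
			fmt.Println("FAIL:", err)
			return
		}
	}
	fmt.Println("ok")
}
```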


Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Performance improvement
  • Code refactoring
  • Documentation update
  • Test improvements
  • Build/CI changes

Changes Made

What was changed?

  • add custom interface methods for unset iteration.

Why was it changed?

  • speed up iteration over unset bits

How was it changed?

  • I got Claude Code to do it. Please carefully review!

Performance Impact

                                │      sec/op      │    sec/op     vs base           │
IterateRoaring/unsetIterator-14   35263.456m ± ∞ ¹   1.147m ± ∞ ¹  ~ (p=1.000 n=1)

Related Issues

Fixes #497


Signed-off-by: Roger Peppe <[email protected]>
@rogpeppe force-pushed the 004-new-unsetbits-pr branch from c165abe to 9ebb1c3 on November 7, 2025 22:22
return &shortIterator{ac.content, 0}
}

type arrayContainerUnsetIterator struct {
Member

So, I think we don't define iterator types in arraycontainer; they are in shortiterator.go. It would be more consistent to move this code there, alongside the other iterators.

(Of course, this changes nothing with respect to the functionality.)

Contributor Author

done


func newArrayContainerUnsetIterator(a *arrayContainer) *arrayContainerUnsetIterator {
acui := &arrayContainerUnsetIterator{content: a.content, pos: 0, nextVal: 0}
for acui.pos < len(acui.content) && uint16(acui.nextVal) == acui.content[acui.pos] {
Member

Suggested change:
- for acui.pos < len(acui.content) && uint16(acui.nextVal) == acui.content[acui.pos] {
+ for acui.pos < len(acui.content) && uint16(acui.nextVal) >= acui.content[acui.pos] {

I think this would be clearer.

Contributor Author

done
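Under the invariant discussed in this review (content[pos] is never below nextVal), the `==` and `>=` conditions select the same iterations; `>=` just states the intent, "skip every set value that is not above the candidate". A minimal sketch, assuming a sorted, duplicate-free slice; `skipSet` is a hypothetical helper, not the PR's code. It compares in `int` rather than converting nextVal to `uint16`, which avoids wraparound once nextVal reaches 65536.

```go
package main

import "fmt"

// skipSet advances past every set value that is not above nextVal.
// content must be sorted and duplicate-free. Whenever the candidate
// nextVal turns out to be set, it moves on to the following value.
func skipSet(content []uint16, pos, nextVal int) (int, int) {
	for pos < len(content) && nextVal >= int(content[pos]) {
		if nextVal == int(content[pos]) {
			nextVal++ // candidate is set; try the next value
		}
		pos++ // content[pos] is now at or behind nextVal either way
	}
	return pos, nextVal
}

func main() {
	content := []uint16{0, 1, 2, 5, 6, 9}
	fmt.Println(skipSet(content, 0, 0)) // 3 3
	fmt.Println(skipSet(content, 3, 5)) // 5 7
	fmt.Println(skipSet(content, 0, 4)) // 3 4 (a stale pos is still handled)
}
```

The last call shows the robustness argument: with `==`, a pos left behind nextVal would stop the loop immediately, whereas `>=` catches up.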

if acui.pos < 0 {
acui.pos = -acui.pos - 1
}
for acui.pos < len(acui.content) && uint16(acui.nextVal) == acui.content[acui.pos] {
Member

Suggested change:
- for acui.pos < len(acui.content) && uint16(acui.nextVal) == acui.content[acui.pos] {
+ for acui.pos < len(acui.content) && uint16(acui.nextVal) >= acui.content[acui.pos] {

Contributor Author

done

func (acui *arrayContainerUnsetIterator) next() uint16 {
val := acui.nextVal
acui.nextVal++
for acui.pos < len(acui.content) && uint16(acui.nextVal) == acui.content[acui.pos] {
Member

Suggested change:
- for acui.pos < len(acui.content) && uint16(acui.nextVal) == acui.content[acui.pos] {
+ for acui.pos < len(acui.content) && uint16(acui.nextVal) >= acui.content[acui.pos] {

Member

Technically, this could be inefficient because it may force you to scan all the values in the array for naught if the user only wanted the unset bits in a range that does not contain a lot of set bits. I think that this might be fine in practice.

Contributor Author

ack
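One reading of the concern above is that stepping over a long stretch of consecutive set values costs one loop iteration per value. A possible mitigation, not part of this PR, uses the fact that for a sorted, duplicate-free slice, content[i]-i never decreases and stays constant across a run of consecutive values, so the end of the run can be found by binary search. A hedged sketch; `skipRun` and `nextUnset` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
)

// skipRun returns the index just past the run of consecutive values that
// starts at content[pos]. content must be sorted and duplicate-free, so
// content[i]-i never decreases and equals content[pos]-pos exactly while
// the values stay consecutive; that makes the run end binary-searchable.
func skipRun(content []uint16, pos int) int {
	base := int(content[pos]) - pos
	return pos + sort.Search(len(content)-pos, func(j int) bool {
		i := pos + j
		return int(content[i])-i > base
	})
}

// nextUnset returns the smallest unset value >= v (65536 means none) and
// the index of the first set value above it. It assumes pos == len(content)
// or content[pos] >= v, the invariant the iterator maintains.
func nextUnset(content []uint16, pos, v int) (int, int) {
	if pos < len(content) && int(content[pos]) == v {
		end := skipRun(content, pos)
		return int(content[end-1]) + 1, end
	}
	return v, pos
}

func main() {
	content := []uint16{3, 4, 5, 6, 10, 11}
	fmt.Println(nextUnset(content, 0, 3)) // 7 4
	fmt.Println(nextUnset(content, 4, 7)) // 7 4
}
```

This trades the linear loop for O(log n) per run, which only pays off when runs are long; for short runs the simple loop is likely just as fast.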

type arrayContainerUnsetIterator struct {
content []uint16
pos int
nextVal int
Member

It could be useful to add a comment: pos is the position of the first set value that is larger than nextVal, and once nextVal reaches content[pos], we must increment pos. Something like that.

Contributor Author

done
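The invariant the reviewer asks to document can be written directly as field comments. A minimal sketch, assuming a sorted, duplicate-free `[]uint16` and using hypothetical names (`arrayUnsetIter`, `skip`); the PR's actual `arrayContainerUnsetIterator` may differ in detail.

```go
package main

import "fmt"

// arrayUnsetIter iterates over the values in [0, 65535] that are absent
// from content, a sorted, duplicate-free slice of set values.
type arrayUnsetIter struct {
	content []uint16
	// Between calls, pos indexes the first set value greater than nextVal
	// (or equals len(content)). Once nextVal reaches content[pos], that
	// value is set, so both pos and nextVal must advance.
	pos int
	// nextVal is the next unset value to return; 65536 means exhausted.
	nextVal int
}

func newArrayUnsetIter(content []uint16) *arrayUnsetIter {
	it := &arrayUnsetIter{content: content}
	it.skip()
	return it
}

// skip moves nextVal past any set values. On entry content[pos] >= nextVal,
// so ">=" here can only match on equality.
func (it *arrayUnsetIter) skip() {
	for it.pos < len(it.content) && it.nextVal >= int(it.content[it.pos]) {
		it.pos++
		it.nextVal++
	}
}

func (it *arrayUnsetIter) hasNext() bool { return it.nextVal <= 0xFFFF }

func (it *arrayUnsetIter) next() uint16 {
	v := uint16(it.nextVal)
	it.nextVal++
	it.skip()
	return v
}

func main() {
	it := newArrayUnsetIter([]uint16{0, 1, 3})
	var got []uint16
	for i := 0; i < 4 && it.hasNext(); i++ {
		got = append(got, it.next())
	}
	fmt.Println(got) // [2 4 5 6]
}
```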

return
}
acui.nextVal = int(minval)
acui.pos = binarySearch(acui.content, minval)
Member

Ok. So if pos is non-negative after this, then we have that acui.content[pos] = minval, but if not, then once pos is converted to the insertion point (-pos - 1), we have that acui.content[pos] > minval (assuming that acui.content[pos] exists).

So in either case we have that acui.content[pos] >= minval,

and, since acui.nextVal = minval, we have that acui.content[pos] >= acui.nextVal.

Contributor Author

Is this a suggestion for a change, to add a comment, or just an aside explaining your understanding of the logic?

Member

It explains my other suggestions.
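The reviewer's reasoning can be checked with a small sketch of the seek step. It assumes a `binarySearch` that returns the index of the key when present and -(insertion point)-1 otherwise, which is the convention the reasoning relies on; `seekUnset` and this `binarySearch` are hypothetical stand-ins, and the library's own functions may differ.

```go
package main

import "fmt"

// binarySearch returns the index of key in the sorted slice content, or
// -(insertionPoint)-1 if key is absent.
func binarySearch(content []uint16, key uint16) int {
	lo, hi := 0, len(content)-1
	for lo <= hi {
		mid := (lo + hi) / 2
		switch v := content[mid]; {
		case v < key:
			lo = mid + 1
		case v > key:
			hi = mid - 1
		default:
			return mid
		}
	}
	return -(lo + 1)
}

// seekUnset positions an iterator at the first unset value >= minval.
// After the search, content[pos] >= minval whenever pos < len(content),
// so the loop's ">=" can only fire on equality.
func seekUnset(content []uint16, minval uint16) (pos, nextVal int) {
	nextVal = int(minval)
	pos = binarySearch(content, minval)
	if pos < 0 {
		pos = -pos - 1 // not found: insertion point, content[pos] > minval
	}
	for pos < len(content) && nextVal >= int(content[pos]) {
		pos++
		nextVal++
	}
	return pos, nextVal
}

func main() {
	content := []uint16{2, 3, 4, 8}
	fmt.Println(seekUnset(content, 3)) // 3 5
	fmt.Println(seekUnset(content, 6)) // 3 6
}
```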

runcontainer.go (outdated)
func (rui *runUnsetIterator16) next() uint16 {
val := rui.nextVal
rui.nextVal++
if rui.curIndex < len(rui.rc.iv) && uint16(rui.nextVal) == rui.rc.iv[rui.curIndex].start {
Member

Suggested change:
- if rui.curIndex < len(rui.rc.iv) && uint16(rui.nextVal) == rui.rc.iv[rui.curIndex].start {
+ if rui.curIndex < len(rui.rc.iv) && uint16(rui.nextVal) >= rui.rc.iv[rui.curIndex].start {

Contributor Author

done
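The run-container case follows the same pattern, except that one comparison skips an entire run rather than a single set value, so the cost is proportional to the number of unset values plus the number of runs. A minimal sketch under stated assumptions: runs are sorted and non-overlapping, and `interval` (with an inclusive `last` field), `runUnsetIter`, and its methods are hypothetical names rather than the library's `runUnsetIterator16`. It uses a `for` loop instead of the single `if` in the snippet above so that it also tolerates runs that abut.

```go
package main

import "fmt"

// interval is a hypothetical run: every value in [start, last] is set.
type interval struct{ start, last uint16 }

type runUnsetIter struct {
	iv       []interval // sorted, non-overlapping runs of set values
	curIndex int        // first run that has not been skipped yet
	nextVal  int        // next unset value; 65536 means exhausted
}

func newRunUnsetIter(iv []interval) *runUnsetIter {
	it := &runUnsetIter{iv: iv}
	it.skip()
	return it
}

// skip jumps nextVal past any run that covers it. Runs before curIndex
// end below nextVal, so nextVal >= start means nextVal lies inside run
// curIndex, and one step skips the whole run.
func (it *runUnsetIter) skip() {
	for it.curIndex < len(it.iv) && it.nextVal >= int(it.iv[it.curIndex].start) {
		it.nextVal = int(it.iv[it.curIndex].last) + 1
		it.curIndex++
	}
}

func (it *runUnsetIter) hasNext() bool { return it.nextVal <= 0xFFFF }

func (it *runUnsetIter) next() uint16 {
	v := uint16(it.nextVal)
	it.nextVal++
	it.skip()
	return v
}

func main() {
	it := newRunUnsetIter([]interval{{0, 2}, {5, 65533}})
	var got []uint16
	for it.hasNext() {
		got = append(got, it.next())
	}
	fmt.Println(got) // [3 4 65534 65535]
}
```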

@lemire (Member) commented Nov 8, 2025

This looks good to me. I have minor comments that you may consider. Otherwise, we shall merge.

@lemire merged commit daab0fd into RoaringBitmap:master on Nov 10, 2025 (8 checks passed).

Linked issue: UnsetIterator is very slow on densely populated bitsets
