[Bug] Beat crashes #22

@redacted-text

Description

@redacted-text

The Beat process crashes intermittently with the following traceback:

[2025-12-06 11:36:55,055: ERROR/Beat] Process Beat
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/dbm/sqlite3.py", line 83, in _execute
    return closing(self._cx.execute(*args, **kwargs))
                   ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: disk I/O error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/billiard/process.py", line 323, in _bootstrap
    self.run()
    ~~~~~~~~^^
  File "/usr/local/lib/python3.13/site-packages/celery/beat.py", line 718, in run
    self.service.start(embedded_process=True)
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/celery/beat.py", line 649, in start
    self.scheduler._do_sync()
    ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/site-packages/celery/beat.py", line 428, in _do_sync
    self.sync()
    ~~~~~~~~~^^
  File "/usr/local/lib/python3.13/site-packages/celery/beat.py", line 599, in sync
    self._store.sync()
    ~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/shelve.py", line 168, in sync
    self[key] = entry
    ~~~~^^^^^
  File "/usr/local/lib/python3.13/shelve.py", line 125, in __setitem__
    self.dict[key.encode(self.keyencoding)] = f.getvalue()
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/dbm/sqlite3.py", line 100, in __setitem__
    self._execute(STORE_KV, (key, value))
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/dbm/sqlite3.py", line 85, in _execute
    raise error(str(exc))
dbm.sqlite3.error: disk I/O error

I found a possible cause.
Celery beat stores its schedule through shelve, which on this Python 3.13 install is backed by dbm.sqlite3 (see the traceback), and that conflicts with how helm upgrade proceeds.
During an upgrade, the new resources are started first and the old ones are terminated only afterwards, so presumably the old and new beat processes overlap and beat cannot write to the database. As a workaround, I removed the schedule database and then ran the upgrade. So far it seems to work fine.
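To check which storage backend a schedule file actually uses, the stdlib layers from the traceback (shelve on top of dbm) can be exercised directly. This is a minimal sketch and does not use Celery itself; the function name, the demo entry and the temporary path are made up for illustration.

```python
import dbm
import os
import shelve
import tempfile


def describe_schedule_store(path):
    """Write one entry through shelve and report which dbm backend holds it."""
    # shelve.open() is the same stdlib layer that appears in the traceback.
    with shelve.open(path) as store:
        store["entries"] = {"demo-task": {"every_seconds": 60}}  # hypothetical entry
        store.sync()  # the call that failed with "disk I/O error" in the report
    # dbm.whichdb() names the backend module that owns the file on disk.
    backend = dbm.whichdb(path)
    with shelve.open(path) as store:
        entries = store["entries"]
    return backend, entries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "celerybeat-schedule" is only used as an example file name here.
        backend, entries = describe_schedule_store(os.path.join(tmp, "celerybeat-schedule"))
        print(backend, entries)
```

The traceback above goes through dbm/sqlite3.py, so on that install the reported backend would be expected to be dbm.sqlite3. Other Python versions or builds may report a different module.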

Proposal

Use Redis for scheduler persistence
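One possible shape for this, assuming the third-party celery-redbeat package (not named in this issue) is an acceptable dependency: point `beat_scheduler` at RedBeat and give it a Redis URL. The helper name, the `REDIS_URL` environment variable and the default URL are hypothetical. The snippet only builds a settings dict, so it runs without Celery installed.

```python
import os


def redis_beat_settings(redis_url=None):
    """Build Celery settings that move beat's schedule from a local file to Redis.

    Assumes the celery-redbeat package is installed. REDIS_URL and the
    default URL below are placeholders, not values from this project.
    """
    url = redis_url or os.environ.get("REDIS_URL", "redis://localhost:6379/1")
    return {
        # Scheduler class provided by celery-redbeat, replacing the default
        # shelve-backed PersistentScheduler.
        "beat_scheduler": "redbeat.RedBeatScheduler",
        # Where RedBeat keeps the schedule state.
        "redbeat_redis_url": url,
    }


if __name__ == "__main__":
    # In the real app this would be applied with: app.conf.update(redis_beat_settings())
    print(redis_beat_settings("redis://redis:6379/1"))
```

With the schedule in Redis there is no local database file for an old and a new beat pod to contend over during an upgrade. Whether two beat instances may safely run at the same time would still need to be checked against the RedBeat documentation.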

Metadata

Assignees

No one assigned

    Labels

    priority/low: Nice to have but not critical to the project's success
    type/bug: Something isn't working correctly or is broken
