Proposal: Add max_live_wal_files to prevent inode exhaustion #14169

@0xdeafbeef

Context

  • I found a huge number of WAL files (over 1 million) in a DB with fairly low traffic and only 22 SSTs. The DB has several column families with active writes and one that holds rarely updated metadata.
  • Workload: write batches; auto-compactions disabled; write_buffer_size=256 MiB, max_write_buffer_number=2, min_write_buffer_number_to_merge=2, level_compaction_dynamic_level_bytes=true.
  • The metadata CF had only a couple of records; it seems that its large memtable was never flushed, which allowed WAL files to pile up. After I added a manual flush for that column family every n records, the old WALs were cleaned up.

EXT     COUNT      SIZE
.sst    22         426398673
.log    1372126    1200701440
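
The pile-up described above can be illustrated with a toy model. This is a simplified sketch, not RocksDB's actual implementation: it assumes that every flush rolls over to a new WAL and that a WAL can only be deleted once no column family still has unflushed data in it or in an older WAL. The names `ToyDB`, `simulate`, `"hot"` and `"meta"` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDB:
    """Toy model of WAL retention (an assumption-laden sketch, not RocksDB code)."""
    current_wal: int = 1
    live_wals: list = field(default_factory=lambda: [1])
    # Per column family: oldest WAL number that still holds its unflushed data.
    oldest_unflushed: dict = field(default_factory=dict)

    def write(self, cf: str) -> None:
        # The first write since the CF's last flush pins the current WAL.
        self.oldest_unflushed.setdefault(cf, self.current_wal)

    def flush(self, cf: str) -> None:
        # Flushing releases the CF's pin and rolls over to a new WAL.
        self.oldest_unflushed.pop(cf, None)
        self.current_wal += 1
        self.live_wals.append(self.current_wal)
        self._purge()

    def _purge(self) -> None:
        # Keep every WAL at or after the oldest one any CF still needs.
        keep_from = min(self.oldest_unflushed.values(), default=self.current_wal)
        self.live_wals = [w for w in self.live_wals if w >= keep_from]


def simulate(rounds: int, flush_meta_every: int = 0) -> int:
    """Return the number of live WALs after `rounds` write+flush cycles on the hot CF."""
    db = ToyDB()
    db.write("meta")  # one rarely updated record, never flushed by default
    for i in range(1, rounds + 1):
        db.write("hot")
        db.flush("hot")
        if flush_meta_every and i % flush_meta_every == 0:
            # The workaround: periodically flush the cold CF as well.
            db.flush("meta")
            db.write("meta")
    return len(db.live_wals)
```

In this model `simulate(1000)` ends with 1001 live WALs because the single unflushed `"meta"` record pins the very first one, while `simulate(1000, flush_meta_every=100)` ends with 1.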

Problem

  • RocksDB currently limits live WAL accumulation by size (max_total_wal_size) and archived WALs by size and age (WAL_size_limit_MB, WAL_ttl_seconds). There is no count-based limit on live WALs, so many small logs can accumulate, exhausting inodes and potentially leaving you unable to recover. For example, once you have hit the ext4 per-directory file limit, you can change the code to call flush, but the new SST required for compaction to progress can't be created.
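
A rough back-of-the-envelope check of why the size-based gate may never fire here. It assumes the SIZE column in the Context listing is in bytes, that max_total_wal_size was left at its default of 0 (documented as the sum of write_buffer_size * max_write_buffer_number over all column families, times 4), and that all column families share the settings listed above. The helper `default_wal_size_limit` is hypothetical.

```python
MIB = 1024 * 1024

# Figures reported in the Context section (SIZE assumed to be bytes).
wal_files = 1_372_126
wal_bytes = 1_200_701_440
write_buffer_size = 256 * MIB
max_write_buffer_number = 2


def default_wal_size_limit(num_cfs: int) -> int:
    # Documented default when max_total_wal_size == 0:
    # (sum over CFs of write_buffer_size * max_write_buffer_number) * 4.
    return num_cfs * write_buffer_size * max_write_buffer_number * 4


avg_wal_bytes = wal_bytes / wal_files
# Even a single CF gives a limit above the observed total; more CFs only raise it.
limit_one_cf = default_wal_size_limit(1)
size_gate_fires = wal_bytes > limit_one_cf

print(f"average WAL size: {avg_wal_bytes:.0f} bytes")
print(f"default limit with a single CF: {limit_one_cf} bytes")
print(f"size-based gate fires: {size_gate_fires}")
```

Under these assumptions the WALs average under 1 KiB each and their total stays below the size limit, so only a count-based cap would catch this case.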

Proposal

  • Add a live-WAL count cap (e.g., max_live_wal_files). When the live WAL count exceeds this cap, trigger FlushReason::kWalFull on CFs holding the oldest WALs until the count drops below the limit. Honor atomic_flush by batching CFs when enabled.
  • (Optional) Add an archive-WAL count cap (e.g., max_archived_wal_files) to delete oldest archived WALs when the count exceeds the cap, alongside existing TTL/size pruning.
  • Telemetry: new stats for count-based flush triggers and count-based archive deletions.
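
The selection step of the proposal could look roughly like the sketch below. It is only an illustration of the policy, not RocksDB code: `pick_cfs_to_flush` and its parameters are hypothetical, the atomic_flush handling is modeled as "flush every CF with unflushed data in one batch" (an assumption), and the extra WAL that a flush itself may create is ignored.

```python
def pick_cfs_to_flush(live_wals, oldest_unflushed, max_live_wal_files, atomic_flush=False):
    """Pick column families to flush so the oldest excess WALs become obsolete.

    live_wals: sorted list of live WAL numbers.
    oldest_unflushed: dict mapping CF name -> oldest WAL number holding its unflushed data.
    """
    if max_live_wal_files < 1:
        raise ValueError("max_live_wal_files must be at least 1")

    excess = len(live_wals) - max_live_wal_files
    if excess <= 0:
        return []  # within the cap, nothing to do

    if atomic_flush:
        # Assumed batching: flush all CFs that have unflushed data together.
        return sorted(oldest_unflushed)

    # Every WAL older than this one has to be released to get back to the cap.
    release_below = live_wals[excess]
    return sorted(cf for cf, wal in oldest_unflushed.items() if wal < release_below)
```

For example, with live WALs 1 through 10, a cap of 4, and `{"meta": 1, "hot_a": 9, "hot_b": 10}`, only `"meta"` is selected, since it alone pins the six oldest WALs.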

All the code needed for this seems to be in place already, so if the proposal looks OK to you, I can send a PR.
