
Use hash func to boost file creation and lookup#79

Open
RoyWFHuang wants to merge 3 commits into sysprog21:master from RoyWFHuang:feature/op_perf

Conversation


@RoyWFHuang RoyWFHuang commented Nov 10, 2025

Previously, SimpleFS used sequential insertion to create files, which was efficient while the filesystem held only a small number of files.
However, real-world filesystems often manage a large number of files, making sequential search and insertion inefficient.
Inspired by Ext4's hash-based directory indexing, this change adopts a hash function to accelerate file indexing and improve scalability.

Change:
Implemented hash-based file index lookup
Improved scalability for large directory structures

hash_code = file_hash(file_name);

extent index = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
block index  = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;

 inode
  +-----------------------+
  | i_mode = IFDIR | 0755 |      block 123 (simplefs_file_ei_block)
  | ei_block = 123    ----|--->  +----------------+
  | i_size = 4 KiB        |      | nr_files  = 7  |
  | i_blocks = 1          |      |----------------|
  +-----------------------+    0 | ee_block  = 0  |
              (extent index = 0) | ee_len    = 8  |
                                 | ee_start  = 84 |--->  +-------------+ block 84(simplefs_dir_block)
                                 | nr_file   = 2  |      |nr_files = 2 | (block index = 0)
                                 |----------------|      |-------------|
                               1 | ee_block  = 8  |    0 | inode  = 24 |
              (extent index = 1) | ee_len    = 8  |      | nr_blk = 1  |
                                 | ee_start  = 16 |      | (foo)       |
                                 | nr_file   = 5  |      |-------------|
                                 |----------------|    1 | inode  = 45 |
                                 | ...            |      | nr_blk = 14 |
                                 |----------------|      | (bar)       |
                             341 | ee_block  = 0  |      |-------------|
            (extent index = 341) | ee_len    = 0  |      | ...         |
                                 | ee_start  = 0  |      |-------------|
                                 | nr_file   = 12 |   14 | 0           |
                                 +----------------+      +-------------+ block 85(simplefs_dir_block)
                                                         |nr_files = 2 | (block index = 1)
                                                         |-------------|
                                                       0 | inode  = 48 |
                                                         | nr_blk = 15 |
                                                         | (foo1)      |
                                                         |-------------|
                                                       1 | inode  = 0  |
                                                         | nr_blk = 0  |
                                                         |             |
                                                         |-------------|
                                                         | ...         |
                                                         |-------------|
                                                      14 | 0           |
                                                         +-------------+

Performance test

  • Randomly create 30600 files in the filesystem

legacy:

         168140.12 msec task-clock                       #    0.647 CPUs utilized
            111367      context-switches                 #  662.346 /sec
             40917      cpu-migrations                   #  243.351 /sec
           3736053      page-faults                      #   22.220 K/sec
      369091680702      cycles                           #    2.195 GHz
      168751830643      instructions                     #    0.46  insn per cycle
       34044524391      branches                         #  202.477 M/sec
         768151711      branch-misses                    #    2.26% of all branches

     259.842753513 seconds time elapsed
      23.000247000 seconds user
     150.380145000 seconds sys

full_name_hash:

         167926.13 msec task-clock                       #    0.755 CPUs utilized
            110631      context-switches                 #  658.808 /sec
             43835      cpu-migrations                   #  261.037 /sec
           3858617      page-faults                      #   22.978 K/sec
      392878398961      cycles                           #    2.340 GHz
      207287412692      instructions                     #    0.53  insn per cycle
       42556269864      branches                         #  253.423 M/sec
         840868990      branch-misses                    #    1.98% of all branches

     222.274028604 seconds time elapsed
      20.794966000 seconds user
     151.941876000 seconds sys

  • Randomly remove 30600 files from the filesystem

legacy:

         104332.44 msec task-clock                       #    0.976 CPUs utilized
             56514      context-switches                 #  541.672 /sec
              1174      cpu-migrations                   #   11.252 /sec
           3796962      page-faults                      #   36.393 K/sec
      258293481279      cycles                           #    2.476 GHz
      153853176926      instructions                     #    0.60  insn per cycle
       30434271757      branches                         #  291.705 M/sec
         532967347      branch-misses                    #    1.75% of all branches

     106.921706288 seconds time elapsed
      16.987883000 seconds user
      91.268661000 seconds sys

full_name_hash:

          83278.61 msec task-clock                       #    0.967 CPUs utilized
             52431      context-switches                 #  629.585 /sec
              1309      cpu-migrations                   #   15.718 /sec
           3796501      page-faults                      #   45.588 K/sec
      199894058328      cycles                           #    2.400 GHz
      110625460371      instructions                     #    0.55  insn per cycle
       20325767251      branches                         #  244.069 M/sec
         490549944      branch-misses                    #    2.41% of all branches

      86.132655220 seconds time elapsed
      19.180209000 seconds user
      68.476075000 seconds sys
  • Randomly check (ls -la filename) 30600 files in the filesystem
    Use perf stat ls -la to measure the query time for each file and sum all elapsed times to calculate the total lookup cost.

Legacy:
min time: 0.00171 s
max time: 0.03799 s
avg time: 0.00423332 s
tot time: 129.539510 s

full_name_hash:
min time: 0.00171 s
max time: 0.03588 s
avg time: 0.00305601 s
tot time: 93.514040 s


Summary by cubic

Switched SimpleFS to hash-based directory indexing using a deterministic FNV-1a simplefs_hash to map filenames to extent/block slots for faster create, lookup, and delete. On 30.6k files: create ~15% faster, delete ~19% faster, lookup ~28% faster.

  • New Features

    • Hash-guided placement and lookup in create/link/symlink/rename/unlink with ring-scan fallback; added fast __file_lookup used by lookup/rename.
    • Faster directory traversal: iterate skips fully-scanned extents/blocks using nr_files; unlink/rename free empty directory extents; same-dir rename updates in place, cross-dir rename inserts-then-removes with rollback safety.
  • Refactors

    • Added hash.c (FNV-1a) and CHECK_AND_SET_RING_INDEX; renamed helpers/variables (e.g., simplefs_get_new_ext, src/dest in rename); enforced filename max 254 chars (reserve NUL) and updated README; Makefile includes hash.o.

Written for commit 54807e2. Summary will update on new commits.

@jserv

jserv commented Nov 10, 2025

How can you determine which hash function is the most suitable?


@visitorckw visitorckw left a comment

I saw that your PR description includes some performance benchmarks, but the commit message lacks any performance numbers to support your improvements. Please improve the commit message.

@RoyWFHuang
Collaborator Author

How can you determine which hash function is the most suitable?

I'm not sure whether FNV is the most suitable, but the index space in SimpleFS is relatively small, so a more complex algorithm might not provide significant benefits. I think FNV is a reasonable balance between simplicity and performance.

@RoyWFHuang RoyWFHuang force-pushed the feature/op_perf branch 2 times, most recently from ca74c03 to 51e0478 Compare November 10, 2025 21:54
Contributor

@visitorckw visitorckw left a comment


You ignored many of my comments without making any changes or providing any replies. You still retained many irrelevant changes, making the review difficult. Additionally, a single-line commit message saying only "optimize the file search process" is way too vague. Please improve the git commit message.

@RoyWFHuang RoyWFHuang force-pushed the feature/op_perf branch 2 times, most recently from 0519d61 to 864a9a1 Compare November 13, 2025 21:21
@RoyWFHuang
Collaborator Author

I saw that your PR description includes some performance benchmarks, but the commit message lacks any performance numbers to support your improvements. Please improve the commit message.

Added all hash results to the commit message.

@RoyWFHuang RoyWFHuang force-pushed the feature/op_perf branch 2 times, most recently from 56c0522 to 0176a4b Compare November 14, 2025 17:38
@visitorckw visitorckw left a comment

Quoted from patch 2:

Align the print function with the Simplefs print format for consistency.
Also adjust variable declarations to fix compiler warnings when building
under the C90 standard.

I'm unsure which Linux kernel versions simplefs currently intends to support, but AFAIK the Linux kernel currently uses GNU C11 as its standard.

Furthermore, the word "Also" is often a sign that the change should be in a separate patch. In my view, you are performing two distinct actions here:

a) Changing printk -> pr_err.
b) Fixing a compiler warning.

I also remain confused as to whether the printk to pr_err change is truly warranted, and what relevance it has to the PR's title, which is "Use hash func to boost file creation and lookup".

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 28, 2025

@RoyWFHuang RoyWFHuang force-pushed the feature/op_perf branch 2 times, most recently from acfa00f to 19c1b8e Compare February 3, 2026 21:46
@jserv

jserv commented Feb 4, 2026

Worst case is still O(n) linear search when hash collisions cluster. No evidence hash quality was tested. What's the distribution?

  hash_code = simplefs_hash(dentry) % (SIMPLEFS_MAX_EXTENTS * SIMPLEFS_MAX_BLOCKS_PER_EXTENT);
  ei = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
  bi = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;

The above is good: a deterministic FNV-1a hash with proper modulo distribution. However, it computes a 64-bit hash but returns a truncated 32-bit value, and there is no analysis of collision rates.

inode.c Outdated
Comment on lines +933 to +934
strncpy(dblock->files[fi].filename, dest_dentry->d_name.name,
SIMPLEFS_FILENAME_LEN);
Collaborator

File becomes unfindable after rename because:

  • Old entry stays at hash(old_name) location
  • Future lookups calculate hash(new_name) and search wrong bucket
  • File effectively vanishes from namespace

@RoyWFHuang RoyWFHuang force-pushed the feature/op_perf branch 2 times, most recently from 066a066 to 3d8fd72 Compare April 3, 2026 12:09
@RoyWFHuang
Collaborator Author

Worst case is still O(n) linear search when hash collisions cluster. No evidence hash quality was tested. What's the distribution?

  hash_code = simplefs_hash(dentry) % (SIMPLEFS_MAX_EXTENTS * SIMPLEFS_MAX_BLOCKS_PER_EXTENT);
  ei = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
  bi = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;

The above is good: a deterministic FNV-1a hash with proper modulo distribution. However, it computes a 64-bit hash but returns a truncated 32-bit value, and there is no analysis of collision rates.

The current structure relies on 2,040 buckets, each holding up to 15 entries. We are aware of the $O(n)$ risk when clusters occur, so I've run some benchmarks on the FNV-1a hash distribution within this range.

Please see the attached graphics for the actual collision rates and distribution stats. We are monitoring the impact of the 32-bit truncation to ensure hash quality.

Case 1: random file names (200000 files)
Each name is a random alphabetic string of 4 to 12 characters.

A one-shot statistical analysis of 200,000 common file names.
count_dist_random_fold

Analysis of the first collision occurrence (averaged over 200 runs).
random_analysis_fold_200_collision

Frequency of collisions occurring when a bucket reaches its maximum capacity of 15 items (averaged over 200 runs).
random_analysis_fold_200_full

Case 2: common human-style file names (200000 files), generated from:
prefixes = ["IMG", "DSC", "Document", "Scan", "Final", "Draft", "Backup", "Project", "Meeting_Notes", "Invoice"]
topics = ["Marketing", "Finance", "Trip", "Family", "Work", "Old_Files", "Report", "Presentation", "Photo", "Data"]
extensions = [".jpg", ".pdf", ".docx", ".xlsx", ".png", ".txt", ".mp4", ".zip"]

A one-shot statistical analysis of 200,000 common file names.
count_dist_common_fold

Analysis of the first collision occurrence (averaged over 200 runs).
common_analysis_fold_200_collision

Frequency of collisions occurring when a bucket reaches its maximum capacity of 15 items (averaged over 200 runs).
common_analysis_fold_200_full

Case 3: random file names at the maximum file capacity allowed by the simplefs settings ($2040 \times 15 = 30600$ files).

Each collision requires a linear probe (incrementing the index) to find an available slot.
probing_single_0_28000

probing_single_28000_30200 probing_single_30200_30600

Based on these statistics, we can observe that the collision rate remains low for the first 20,000 files (approximately 2/3 of the total capacity).

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Apr 3, 2026

@RoyWFHuang RoyWFHuang force-pushed the feature/op_perf branch 2 times, most recently from f3a1971 to 7323d62 Compare April 3, 2026 17:08
@RoyWFHuang RoyWFHuang requested a review from jserv April 5, 2026 10:08
@jserv jserv left a comment

The hash-based directory indexing direction is right and the benchmark numbers are real, but the orphan paths and unbounded ring loops are merge blockers. Findings posted as inline comments below.

_ei = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
_bi = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
int nr_ei_files = eblock->nr_files;
for (idx_ei = 0; nr_ei_files; _ei++, idx_ei++) {
Collaborator

CRITICAL — unbounded ring loop on corrupt metadata

for (idx_ei = 0; nr_ei_files; _ei++, idx_ei++) {
    CHECK_AND_SET_RING_INDEX(_ei, SIMPLEFS_MAX_EXTENTS);
    if (!eblock->extents[_ei].ee_start)
        continue;          /* nr_ei_files unchanged: spin forever */
    nr_ei_files -= eblock->extents[_ei].nr_files;
    ...
}

On a directory whose eblock->nr_files != 0 but all extents[].ee_start == 0 (corruption), this never terminates — kernel hang in process context. Never trust on-disk metadata as a sole loop bound.

Fix: add idx_ei < SIMPLEFS_MAX_EXTENTS to the loop condition. Validate extents[_ei].nr_files <= nr_ei_files before subtracting, return -EFSCORRUPTED on inconsistency.

Collaborator Author

In our design, eblock->nr_files records the total number of files within a simplefs_file_ei_block. This value is expected to be identical to the sum of nr_files across all extent blocks (eblock->extents[_ei].nr_files). If these values do not match, it would indicate a bug in our file creation or management logic.

However, I acknowledge the importance of defensive programming when dealing with on-disk metadata to prevent potential kernel hangs. I will create a dedicated issue to improve these validation checks and ensure the file system's robustness against metadata corruption.

nr_ei_files -= eblock->extents[_ei].nr_files;
/* Iterate blocks in extent */
int nr_bi_files = eblock->extents[_ei].nr_files;
for (idx_bi = 0; nr_bi_files; _bi++, idx_bi++) {
Collaborator

CRITICAL — same unbounded-loop pattern at the inner block level

int nr_bi_files = eblock->extents[_ei].nr_files;
for (idx_bi = 0; nr_bi_files; _bi++, idx_bi++) {
    ...
    nr_bi_files -= dblock->nr_files;   /* signed int, can go negative */
}

A corrupt dblock->nr_files == 0 keeps nr_bi_files constant; an oversized value drives it negative, which is still truthy. Kernel hang.

Fix: idx_bi < eblock->extents[_ei].ee_len; reject dblock->nr_files > SIMPLEFS_FILES_PER_BLOCK and return -EFSCORRUPTED.

@RoyWFHuang RoyWFHuang Apr 7, 2026

Same as above

ei = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
bi = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;

for (; dir_nr_files; ei++) {
Collaborator

CRITICAL — simplefs_remove_from_dir repeats the unbounded-ring pattern

for (; dir_nr_files; ei++) {
    CHECK_AND_SET_RING_INDEX(ei, SIMPLEFS_MAX_EXTENTS);
    if (eblock->extents[ei].ee_start) { ... }
    bi = 0;
}

When extents[ei].ee_start == 0, the body is skipped and dir_nr_files is never decremented. Same hang risk as __file_lookup.

Fix: bound the outer scan with a separate counter < SIMPLEFS_MAX_EXTENTS and validate counters before subtracting.

Collaborator Author

Same as above

brelse(bh2);
return 0;
}
_fi += dblock->files[_fi].nr_blk;
Collaborator

HIGH — unchecked nr_blk in run-length walk

_fi += dblock->files[_fi].nr_blk;

On-disk nr_blk == 0 → infinite loop. Oversized nr_blk → next iteration indexes past the block. Same code pattern exists in simplefs_try_remove_entry line 493.

Fix: validate nr_blk >= 1 && _fi + nr_blk <= SIMPLEFS_FILES_PER_BLOCK before stepping; return -EFSCORRUPTED on violation.

Collaborator Author

Regarding the concerns about nr_blk == 0, potential infinite loops, and metadata validation (including line 493 in simplefs_try_remove_entry):

As I mentioned previously:
"I believe this situation should not occur under normal conditions, as nr_blk (or nr_files) should always be at least 1. If we encounter a 0, it likely indicates a bug in the block merging or splitting logic."

While I understand the concern regarding out-of-bounds indexing or infinite loops in the case of disk corruption, I plan to implement these defensive programming measures and error handling as part of a separate, dedicated metadata optimization task.

This PR is strictly focused on the core hash-based indexing improvements and the resulting performance gains. To maintain a clean and focused commit history, I believe it is better to handle these robustness enhancements in a subsequent PR, keeping the review of each distinct functional change efficient.

}
blk_nr_files--;
}
fi += dblock->files[fi].nr_blk;
Collaborator

HIGH — same unchecked nr_blk walk as __file_lookup

fi += dblock->files[fi].nr_blk;

Mirror of the bug at line 178. Validate nr_blk >= 1 && fi + nr_blk <= SIMPLEFS_FILES_PER_BLOCK.

Collaborator Author

Same as above
#79 (comment)

if (chk < 0)
return ERR_PTR(chk); /* I/O error */

bh = sb_bread(sb, ci_dir->ei_block);
Collaborator

HIGH — redundant double-read defeats the PR's perf goal

chk = __file_lookup(dir, dentry, &ei, &bi, &fi);
...
bh  = sb_bread(sb, ci_dir->ei_block);              /* re-read ei_block */
bh2 = sb_bread(sb, eblock->extents[ei].ee_start + bi); /* re-read dir block */
inode = simplefs_iget(sb, dblock->files[fi].inode);    /* needed only the ino_t */

__file_lookup already had the inode number. Two extra sb_getblk round-trips per successful lookup, on the exact path the PR cover letter is selling as faster. The buffer cache makes the I/O cheap but not free — and this is the hottest path in the filesystem.

Fix: have __file_lookup return the inode number via an out-pointer; drop the second sb_bread pair entirely. simplefs_lookup then collapses to roughly:

err = __file_lookup(dir, dentry, &ino);
switch (err) {
case 0:        inode = simplefs_iget(sb, ino); break;
case -ENOENT:  inode = NULL; break;
default:       return ERR_PTR(err);
}
return d_splice_alias(inode, dentry);

Collaborator Author

The same as #79 (comment)

Refactoring __file_lookup involves a larger scope and will be handled in a dedicated PR.

Introduce a hash-based mechanism to speed up file creation and lookup
operations. The hash function enables faster access to extent and logical
block extent index, improving overall filesystem performance.

hash_code = file_hash(file_name);
extent index = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
block index  = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;

Use perf to measure:
1. File Creation (random)
Legacy:
259.842753513 seconds time elapsed
23.000247000 seconds user
150.380145000 seconds sys

full_name_hash:
222.274028604 seconds time elapsed
20.794966000 seconds user
151.941876000 seconds sys

2. File Listing (random)
Legacy:
min time: 0.00171 s
max time: 0.03799 s
avg time: 0.00423332 s
tot time: 129.539510 s

full_name_hash:
min time: 0.00171 s
max time: 0.03588 s
avg time: 0.00305601 s
tot time: 93.514040 s

3. File Removal (random)
Legacy:
106.921706288 seconds time elapsed
16.987883000 seconds user
91.268661000 seconds sys

full_name_hash:
86.132655220 seconds time elapsed
19.180209000 seconds user
68.476075000 seconds sys