Skip to content

Core: Use ArrayList for manifest list materialization#15640

Open
manuzhang wants to merge 1 commit intoapache:mainfrom
manuzhang:replace-linkedlist
Open

Core: Use ArrayList for manifest list materialization#15640
manuzhang wants to merge 1 commit intoapache:mainfrom
manuzhang:replace-linkedlist

Conversation

@manuzhang
Copy link
Copy Markdown
Member

For lower memory overhead and better locality. It's used for allManifests in BaseSnapshot.

@github-actions github-actions Bot added the core label Mar 16, 2026
Copy link
Copy Markdown
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logically makes sense to me, curious do you see any substancial improvement with this ?

@nastra
Copy link
Copy Markdown
Contributor

nastra commented Mar 16, 2026

please run ManifestReadBenchmark and share before/after results

@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions Bot added the stale label Apr 16, 2026
@Baunsgaard
Copy link
Copy Markdown
Contributor

Just to help out a bit I ran the benchmark. While there are no practical difference end-to-end since other things dominate, it is a minor but valid improvement.

ManifestReadBenchmark Results

OpenJDK 17.0.18, Ubuntu 22.04, warmup=5, measure=10 iterations.

End-to-end: read manifest list + iterate all data files

Size LinkedList ArrayList Ratio
5 manifests × 100 rows 9.70 ms 9.48 ms 1.02x
50 manifests × 1K rows 316.5 ms 317.8 ms 1.00x
200 manifests × 10K rows 11,827 ms 11,820 ms 1.00x

Materialization only (ManifestLists.read)

Size LinkedList ArrayList Ratio
5 manifests 868 us 841 us 1.03x
50 manifests 864 us 835 us 1.04x
200 manifests 1.14 ms 1.15 ms 0.99x

Extra specific measurements:

Iteration after materialization

Size LinkedList ArrayList Ratio
5 manifests 1.9 us 730 ns 2.60x
50 manifests 12.2 us 6.9 us 1.76x
200 manifests 25.9 us 23.2 us 1.12x

addAll into ArrayList (FastAppend.apply pattern)

Size LinkedList ArrayList Ratio
5 manifests 847 ns 546 ns 1.55x
50 manifests 1.3 us 486 ns 2.59x
200 manifests 3.8 us 523 ns 7.33x

Full pipeline: read + filter + ImmutableList.copyOf

Size LinkedList ArrayList Ratio
5 manifests 409 us 422 us 0.97x
50 manifests 387 us 390 us 0.99x
200 manifests 566 us 581 us 0.97x

@github-actions github-actions Bot removed the stale label Apr 17, 2026
@manuzhang
Copy link
Copy Markdown
Member Author

@Baunsgaard Thanks for help on the benchmark. I have similar results locally. It's up to the community whether this is worth a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants