extract: scan way nodes once across all extracts in complete_ways pass 1 #314

Open: yuiseki wants to merge 1 commit into osmcode:master from yuiseki:feat/extract-complete-ways-eway-all

@yuiseki yuiseki commented Apr 29, 2026

See #312.

In Pass 1 of --strategy=complete_ways, eway() is called independently for
each extract on every way. This means way.nodes() is scanned up to twice per
extract, giving up to 2N scans total for N extracts.

This adds an eway_all() override in strategy_complete_ways that scans
way.nodes() at most twice regardless of N. A uint64_t bitmask tracks which
extracts have claimed the way in the first pass; the second pass then records
all node refs into extra_node_ids for matched extracts only.

When there are more than 64 extracts the method falls back to the original
per-extract eway() loop.
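
The two passes and the fallback described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: `ExtractSketch`, `eway_sketch` and `eway_all_sketch` are hypothetical names, and `std::unordered_set` stands in for the id sets used by the real `extract_data`.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the per-extract state (assumption: the real
// extract_data uses different id set types).
struct ExtractSketch {
    std::unordered_set<std::int64_t> node_ids;       // nodes inside the extract
    std::unordered_set<std::int64_t> way_ids;        // ways claimed by the extract
    std::unordered_set<std::int64_t> extra_node_ids; // nodes needed to complete ways
};

// Per-extract logic: scan for a matching node, then record all node refs.
void eway_sketch(ExtractSketch& e, std::int64_t way_id,
                 const std::vector<std::int64_t>& nodes) {
    for (const auto ref : nodes) {
        if (e.node_ids.count(ref) != 0) {
            e.way_ids.insert(way_id);
            e.extra_node_ids.insert(nodes.begin(), nodes.end());
            return;
        }
    }
}

// Combined logic: at most two scans over the way nodes for up to 64 extracts.
void eway_all_sketch(std::vector<ExtractSketch>& extracts, std::int64_t way_id,
                     const std::vector<std::int64_t>& nodes) {
    const std::size_t n = extracts.size();
    if (n > 64) {
        // The bitmask has only 64 bits: fall back to the per-extract loop.
        for (auto& e : extracts) {
            eway_sketch(e, way_id, nodes);
        }
        return;
    }

    const std::uint64_t all =
        (n == 64) ? ~std::uint64_t{0} : ((std::uint64_t{1} << n) - 1);
    std::uint64_t matched = 0;

    // Pass A: find which extracts claim this way.
    for (const auto ref : nodes) {
        for (std::size_t i = 0; i < n; ++i) {
            const std::uint64_t bit = std::uint64_t{1} << i;
            if ((matched & bit) == 0 && extracts[i].node_ids.count(ref) != 0) {
                matched |= bit;
            }
        }
        if (matched == all) {
            break; // every extract already claims the way
        }
    }
    if (matched == 0) {
        return; // no extract claims the way, so pass B is skipped
    }

    // Pass B: record the way and all node refs for matched extracts only.
    for (std::size_t i = 0; i < n; ++i) {
        if ((matched & (std::uint64_t{1} << i)) != 0) {
            extracts[i].way_ids.insert(way_id);
        }
    }
    for (const auto ref : nodes) {
        for (std::size_t i = 0; i < n; ++i) {
            if ((matched & (std::uint64_t{1} << i)) != 0) {
                extracts[i].extra_node_ids.insert(ref);
            }
        }
    }
}
```

In this sketch a way that no extract claims costs a single scan, and a way claimed by any number of extracts costs two, which is where the saving over the per-extract loop would come from.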

Benchmark

japan-260423.osm.pbf (~1 GB), 8-tile extraction:

| Version  | Elapsed | vs baseline |
|----------|---------|-------------|
| upstream | 1m 29s  |             |
| this PR  | 1m 25s  | -4%         |

planet-260413.osm.pbf (~86 GB), 16-tile extraction:

| Version  | Elapsed | vs baseline |
|----------|---------|-------------|
| upstream | 51m 36s |             |
| this PR  | 49m 27s | -4%         |

Output verified to be identical to the upstream result for all tiles.

Member

joto commented May 10, 2026

I understand why this optimization helps for longer ways: going over 2000 way nodes multiple times adds memory accesses that might fall outside the CPU cache. But I wonder whether this is also the case for ways with only a few nodes. Could you try measuring the impact if the old algorithm is used for ways with a small number of nodes?
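
One way such a measurement could be set up is a length-based dispatch between the two code paths. The following is only a sketch under stated assumptions: `choose_path`, `Path` and the cutoff value `short_way_cutoff` are hypothetical and do not appear in the PR, and a useful cutoff would have to be found by benchmarking.

```cpp
#include <cstddef>

// Hypothetical cutoff below which a way counts as "short" (assumption:
// the value 16 is a placeholder, not a measured result).
constexpr std::size_t short_way_cutoff = 16;

enum class Path {
    per_extract, // original eway() loop, one call per extract
    combined     // bitmask-based eway_all() override
};

// Decide which code path to use for a way with the given number of nodes.
inline Path choose_path(std::size_t num_way_nodes, std::size_t num_extracts) {
    if (num_extracts > 64) {
        return Path::per_extract; // bitmask cannot represent more than 64 extracts
    }
    if (num_way_nodes < short_way_cutoff) {
        return Path::per_extract; // short way: keep the old algorithm
    }
    return Path::combined;
}
```

Running the same benchmarks with a few different cutoff values would show whether short ways benefit from the combined path at all.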

@joto joto left a comment

Apart from the variable name and the other comment about short ways this looks good.

Comment thread src/extract/strategy.hpp
// Default implementation: call eway() for each extract separately.
// Subclasses may override this to process all extracts in a single
// pass over way.nodes().
void eway_all(std::vector<extract_data>& exts, const osmium::Way& way) {

exts -> extracts. No reason to be too stingy with characters in variable names.

// eway() loop when there are more than 64 extracts.
// Pass A finds which extracts claim this way.
// Pass B records all node refs into extra_node_ids for matched extracts.
void eway_all(std::vector<extract_data>& exts, const osmium::Way& way) {

see above
