Skip to content

Skip per-document stored field reads on sorted indices when no stored fields are present#15718

Merged
romseygeek merged 4 commits intoapache:mainfrom
fcofdez:skip-stored-fields-reads
Feb 20, 2026
Merged

Skip per-document stored field reads on sorted indices when no stored fields are present#15718
romseygeek merged 4 commits intoapache:mainfrom
fcofdez:skip-stored-fields-reads

Conversation

@fcofdez
Copy link
Copy Markdown
Contributor

@fcofdez fcofdez commented Feb 18, 2026

Avoid unnecessary seeks and reads on the stored fields file during
flush in SortingStoredFieldsConsumer for segments that do not contain
stored fields.

Closes #15713

… fields are present

Avoid unnecessary seeks and reads on the stored fields file during
flush in SortingStoredFieldsConsumer for segments that do not contain
stored fields.

Closes apache#15713
@msfroh
Copy link
Copy Markdown
Contributor

msfroh commented Feb 19, 2026

I'm wondering if you could just skip the flush logic altogether if there are no stored fields in the segment? Then you might be able to avoid the startDocument() and finishDocument() calls for each doc.

Though looking at the behavior in IndexingChain that always calls startStoredFields, and the logic in StoredFieldsConsumer, it looks like we do require that startDocument() and finishDocument() get called for each doc. That's a bummer.

@fcofdez
Copy link
Copy Markdown
Contributor Author

fcofdez commented Feb 19, 2026

Though looking at the behavior in IndexingChain that always calls startStoredFields, and the logic in StoredFieldsConsumer, it looks like we do require that startDocument() and finishDocument() get called for each doc. That's a bummer.

Yes, I think that's a contract that we have to keep. Alternatively we could use a different approach and just write a flag in the stored fields metadata where we track that there are no stored fields, but we can still respond calls to the #document methods. But that seems like a deeper change than this and the remaining overhead should be low compared to all the seeks + reads that we do now.

Copy link
Copy Markdown
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have one question about the test. Also, can you add a CHANGES entry in the 10.5 section?

new FilterDirectory(dir) {
@Override
public IndexInput openInput(String name, IOContext context) throws IOException {
if (name.contains(Lucene90CompressingStoredFieldsWriter.FIELDS_EXTENSION)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the test write to any files other than the StoredFields file? I worry that relying on the filename might be a bit fragile.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also writes into a metadata file and into the fields index files. I understand you concern, but I couldn't find another way of asserting that we don't read n times, do you have other ideas?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not ideal, but we're not doing Codec randomization here so I think we should be OK.

@github-actions github-actions Bot added this to the 10.5.0 milestone Feb 19, 2026
@fcofdez fcofdez requested a review from romseygeek February 19, 2026 15:03
super.writeField(info, value);
hasStoredFields = true;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optimization only works if all docs in the segment have no stored fields, right? Which I think is probably fine, as that's a fairly common case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right 👍

@romseygeek romseygeek merged commit 91af8ee into apache:main Feb 20, 2026
13 checks passed
romseygeek pushed a commit that referenced this pull request Feb 20, 2026
… fields are present (#15718)

Avoid unnecessary seeks and reads on the stored fields file during
flush in SortingStoredFieldsConsumer for segments that do not contain
stored fields.

Closes #15713
@romseygeek
Copy link
Copy Markdown
Contributor

Thanks @fcofdez!

@fcofdez
Copy link
Copy Markdown
Contributor Author

fcofdez commented Feb 20, 2026

Thanks for the review @romseygeek!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

StoredFields flush overhead with index sorting when no fields are stored

3 participants