Skip to content

Adding overwrite option to ParticleFile API#2655

Open
erikvansebille wants to merge 7 commits into
Parcels-code:mainfrom
erikvansebille:particlefile_append_overwrite
Open

Adding overwrite option to ParticleFile API#2655
erikvansebille wants to merge 7 commits into
Parcels-code:mainfrom
erikvansebille:particlefile_append_overwrite

Conversation

@erikvansebille

@erikvansebille erikvansebille commented Jun 4, 2026

Copy link
Copy Markdown
Member

Description

This PR adds an option to ParticleFile.__init__ to control when the file already exists: either raise an error (default), or overwrite the exisiting file.

Checklist

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR.
    • Describe how you used it (e.g., by pasting your prompt):
      I used Claude code to help with the implementation of the append option in particlefile.write()

Comment thread src/parcels/_core/particlefile.py Outdated
Comment on lines +64 to +68
if_exists : {"error", "overwrite", "append"}, optional
Behavior when the output file already exists.
- "error" (default): raise a ValueError.
- "overwrite": remove the existing file before writing.
- "append": preserve existing rows and append new rows.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stick closer to convention here

Suggested change
if_exists : {"error", "overwrite", "append"}, optional
Behavior when the output file already exists.
- "error" (default): raise a ValueError.
- "overwrite": remove the existing file before writing.
- "append": preserve existing rows and append new rows.
mode : {"w", "a", None}, optional
Writing behaviour.
- None (default): Write dataset, and raise an error if it already exists.
- "w": Write dataset, overwriting it.
- "a": Append to dataset.

also rename ._if_exists to ._mode

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, address in aafde6a

Comment thread src/parcels/_core/particlefile.py Outdated
Comment on lines +165 to +180
self._tmp_path = self.path.with_name(f"{self.path.stem}.append_tmp{self.path.suffix}")
if self._tmp_path.exists():
self._tmp_path.unlink()

self._writer = pq.ParquetWriter(self._tmp_path, existing_schema, compression=self._compression)

# Parquet can't directly append, so we need to rewrite the existing data along with the new data.
for batch in existing_file.iter_batches():
self._writer.write_table(pa.Table.from_batches([batch], schema=existing_schema))
else:
assert not self.path.exists(), "If the file exists, the writer should already be set"
self._writer = pq.ParquetWriter(
self.path,
schema,
compression=self._compression,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just taking a step back here - why do we need an append mode for the ParticleFile? Could users easily just create multiple particlefiles and join them into one after the fact?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think especially since the file format of Parquet doesn't support this, and calling "append" would require rewriting the current data that we have, is maybe an indication that we either shouldn't have append or should consider something else.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right; it doesn't make sense to add an "append" option if we are copying the data internally anyways. I now removed it in aafde6a

The only thing users will need to be aware of, is that the last record of the first file is the same as the first second of the second file when concatenating, so they will probably need to filter these duplicate records out. Is that something we should add to the tutorial?

Comment thread src/parcels/_core/particlefile.py Outdated
raise ValueError(f"outputdt must be positive/non-zero. Got {outputdt=!r}")

self._outputdt = outputdt
self._mode = mode

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we don't use mode outside of this function anymore, do we need to make it an object attribute? What is best practice in such cases?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 27cc62f

@erikvansebille erikvansebille changed the title Adding append and overwrite options to ParticleFile API Adding overwrite option to ParticleFile API Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

2 participants