Add overwrite and delete children features with retries #40

Open
leonelramirez wants to merge 8 commits into import-features from overwrite-feature

@leonelramirez
Contributor

This pull request introduces new functionality to the arcflow/utils/bulk_import.py script, allowing users to delete or overwrite existing archival object children during CSV bulk imports. It also adds support for retrying failed imports, and enhances command-line options for greater flexibility.

Key changes

  • Added delete_archival_object and delete_children functions to enable deletion of archival object children using the ASnake API, with parallel execution for efficiency.
  • Modified csv_bulk_import to support two new modes: overwriting existing children before import, and deleting children without performing an import, controlled by overwrite_children and only_delete_children flags.
  • Integrated the new deletion logic into the import workflow, ensuring that import proceeds only if child deletion is successful.
  • Added support for retrying failed imports using a report file, with logic to process only entries that previously failed.
  • Improved the save_report function to return the path to the generated report text file, and updated the main workflow to use this for retries.
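
The delete-then-import flow described above could look roughly like the following sketch. This is not the code from the PR: `delete_one` is a hypothetical callable standing in for whatever removes a single archival object (for example, a DELETE request issued through ASnake), and the signatures, return shapes, and `max_workers` value are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def delete_children(child_uris, delete_one, max_workers=4):
    """Delete every URI in parallel; return (deleted_count, failed_uris).

    delete_one is a hypothetical callable: it takes a URI and returns
    True on success. Any exception it raises is counted as a failure.
    """
    child_uris = list(child_uris)
    if not child_uris:
        return 0, []

    def attempt(uri):
        try:
            return bool(delete_one(uri))
        except Exception:
            return False

    # pool.map preserves input order, so results line up with child_uris.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(attempt, child_uris))

    failed = [uri for uri, ok in zip(child_uris, results) if not ok]
    return len(child_uris) - len(failed), failed


def import_with_overwrite(child_uris, delete_one, run_import):
    """Run the import only if every child was deleted successfully."""
    deleted, failed = delete_children(child_uris, delete_one)
    if failed:
        # Skip the import so new rows are not added next to stale children.
        return {"deleted": deleted, "failed": failed, "imported": False}
    run_import()
    return {"deleted": deleted, "failed": [], "imported": True}
```

Returning the deleted count and the failed URIs, rather than a bare boolean, also gives the caller something concrete to put in a report.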

Command-Line Interface Improvements

  • Introduced new command-line arguments: --overwrite-children, --only-delete-children, and --max-retries to control child deletion behavior and configure retry limits for jobs that fail due to database locks.
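
As a rough illustration of how these three options might be declared with `argparse`: the flag names come from this PR, but the help strings, the integer type, and the default of `0` for `--max-retries` are assumptions, not the actual definitions in `bulk_import.py`.

```python
import argparse


def build_parser():
    """Build a parser with only the three options discussed in this PR."""
    parser = argparse.ArgumentParser(
        description="CSV bulk import (illustrative sketch)"
    )
    parser.add_argument(
        "--overwrite-children",
        action="store_true",
        help="Delete existing children of each resource before processing its CSV.",
    )
    parser.add_argument(
        "--only-delete-children",
        action="store_true",
        help="Delete existing children and stop without importing.",
    )
    parser.add_argument(
        "--max-retries",
        type=int,
        default=0,  # assumed default: no retries unless requested
        help="How many times to retry jobs that failed.",
    )
    return parser
```

For example, `build_parser().parse_args(["--overwrite-children", "--max-retries", "3"])` yields `overwrite_children=True`, `only_delete_children=False`, and `max_retries=3`.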

These changes make the bulk import process more robust and flexible, especially when dealing with resources that have existing children in ArchivesSpace.

@leonelramirez leonelramirez requested a review from graykr March 31, 2026 17:37
Contributor

@graykr graykr left a comment

Thanks for all your work on this, Leo! Functionally, it seems to be working well.

There are some things coming up with the combinations of the various flags that I spotted, though:

  • I put in a suggestion on the help text for the "overwrite children" option, since it will also delete children even if you are validating the CSVs rather than importing them. I think that works as a behavior (being able to see the validation results for the emptied finding aid seems most useful if you want to overwrite the children during import later anyway), but it is not quite intuitive (someone not familiar with how the code works might think that it is just running the validation as if the children would be overwritten, even though that is not possible, practically speaking).
  • When I went to just delete the children and didn't specify a max retries value, it gave me the message "Maximum retries reached. Exiting.", which seemed to suggest an error, but it was just because it wasn't going to retry anything anyway.
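
One way the exit message could distinguish "nothing to retry" from "retries exhausted" is sketched below. The function name, the `process` callable (which takes a list of entries and returns the ones that failed), and the wording of the first two messages are all hypothetical; only the "Maximum retries reached. Exiting." text comes from the current behavior.

```python
def run_with_retries(process, entries, max_retries=0):
    """Process entries, retrying only the failures up to max_retries times.

    Returns (still_failed, retries_used, message).
    """
    failed = process(entries)
    retries_used = 0
    while failed and retries_used < max_retries:
        retries_used += 1
        failed = process(failed)  # retry only what failed last time

    if not failed:
        message = "All entries succeeded."
    elif max_retries == 0:
        message = "Some entries failed; no retries were requested."
    else:
        message = "Maximum retries reached. Exiting."
    return failed, retries_used, message
```

With this shape, a delete-only run that succeeds never prints the "maximum retries" message, because that branch is reached only when failures remain after at least one retry was allowed.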

I also think it would be good to log how many children are deleted when we use the overwrite option, just so we can track that it happened. I think we'd probably need to add another column for this.

Related to that, I did strangely encounter a situation where I ran a validation with the overwrite children flag and then ran the import, also with the overwrite children flag. When I ran the import, the code somehow still found a finding aid with children remaining to delete. (I only noticed because I happened to see it in the output running by.) I'm not sure how this could be possible unless someone else was uploading at the same time as me. But it would be good to log when children are deleted so we don't have to watch the output stream by to notice something like this happening (I'm also now wondering if I mistakenly saw it).
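
To make the idea concrete, here is a minimal sketch of recording the deletion count both in the log and in an extra report column. The column names, the `record_result` and `write_report` helpers, and the CSV layout are assumptions for illustration; the existing report written by `save_report` may be structured differently.

```python
import csv
import io
import logging

logger = logging.getLogger("bulk_import_sketch")

# Hypothetical report layout with an added children_deleted column.
REPORT_FIELDS = ["resource", "status", "children_deleted"]


def record_result(rows, resource, status, children_deleted):
    """Append one report row and log any deletions that took place."""
    if children_deleted:
        logger.info("Deleted %d children of %s", children_deleted, resource)
    rows.append(
        {
            "resource": resource,
            "status": status,
            "children_deleted": children_deleted,
        }
    )


def write_report(rows, stream):
    """Write the collected rows as CSV to any text stream."""
    writer = csv.DictWriter(stream, fieldnames=REPORT_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

A nonzero `children_deleted` value on an import run that follows a validation run with the same flag would then show up in the report, rather than only in the scrolling console output.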
