Add overwrite and delete children features with retries #40
leonelramirez wants to merge 8 commits into import-features
graykr
left a comment
Thanks for all your work on this Leo! Functionally, it seems to be working well.
There are some things coming up with the combinations of the various flags that I spotted, though:
- I put in a suggestion on the help text for the "overwrite children" option, since this will also delete children even if you are validating the CSVs rather than importing them. I think that works as a behavior (having the option to see the validation results for the finding aid that is empty seems most useful if you want to overwrite the children during import later anyway), but it is not quite intuitive (someone not familiar with how the code works might think that it is just running the validation as if the children would be overwritten, even though that is not possible, practically speaking).
- When I went to just delete the children and didn't specify a max retries value, it gave me a message of "Maximum retries reached. Exiting." which seemed to suggest an error, but it was just because it wasn't going to retry anything anyway.
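To illustrate the second point, here is a minimal sketch of a retry loop that only reports "Maximum retries reached" when retries were actually attempted and something is still failing. The names `retry_failed`, `failed_jobs`, and `retry_job` are hypothetical and not taken from the PR; the real retry logic in `bulk_import.py` may be structured differently.

```python
def retry_failed(failed_jobs, retry_job, max_retries=0):
    """Retry failed jobs up to max_retries times.

    retry_job is a caller-supplied callable (hypothetical) that returns
    True when the job succeeds on retry. Returns (remaining, message),
    where message is None if there is nothing to report.
    """
    remaining = list(failed_jobs)
    attempts = 0
    while remaining and attempts < max_retries:
        attempts += 1
        # Keep only the jobs that still fail after this attempt.
        remaining = [job for job in remaining if not retry_job(job)]

    if remaining and max_retries > 0:
        # Retries were enabled and exhausted with failures left over.
        message = "Maximum retries reached. Exiting."
    elif remaining:
        # Retries were never enabled, so say that instead.
        message = f"{len(remaining)} job(s) failed; retries are disabled."
    else:
        # Nothing failed (for example, a delete-only run): stay quiet.
        message = None
    return remaining, message
```

With this shape, a delete-only run with no failures and no `--max-retries` value produces no message at all.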
I also think it would be good to log how many children are deleted when we use the overwrite option, just so we can track that it happened. I think we'd probably need to add another column for this.
In relation to that, I did strangely encounter a situation where I ran a validation with the overwrite children flag and then ran the import, also with the overwrite children flag. When I ran the import, the code somehow still found a finding aid with children remaining to delete. (I only noticed because I happened to see it in the output running by.) I'm not sure how this could be possible unless someone else was uploading at the same time as me. But it would be good to log when children are deleted so we don't have to watch the output stream by to notice something like this happening (I'm also now wondering if I mistakenly saw it).
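As a rough idea of what the extra column could look like, here is a small sketch that records the number of deleted children per resource in a CSV report. The column names and the helpers `deletion_row` and `write_report` are assumptions for illustration only; the existing report written by `save_report` may use a different layout.

```python
import csv
import io

# Hypothetical column names; the real report may differ.
REPORT_COLUMNS = ["resource_uri", "children_deleted", "status"]


def deletion_row(resource_uri, deleted_uris, status):
    """Build one report row, counting the children that were deleted."""
    return {
        "resource_uri": resource_uri,
        "children_deleted": len(deleted_uris),
        "status": status,
    }


def write_report(rows, stream):
    """Write the rows to any text stream (a file or StringIO) as CSV."""
    writer = csv.DictWriter(stream, fieldnames=REPORT_COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Usage: two children deleted during a validation run.
buffer = io.StringIO()
write_report(
    [deletion_row("/repositories/2/resources/1", ["/a", "/b"], "validated")],
    buffer,
)
```

A nonzero `children_deleted` value on an import that follows a validation run would then show up in the report instead of only in the scrolling output.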
Co-authored-by: graykr <graykr@users.noreply.github.com>
This pull request introduces new functionality to the `arcflow/utils/bulk_import.py` script, allowing users to delete or overwrite existing archival object children during CSV bulk imports. It also adds support for retrying failed imports, and enhances command-line options for greater flexibility.

Key changes
- Added `delete_archival_object` and `delete_children` functions to enable deletion of archival object children using the ASnake API, with parallel execution for efficiency.
- Updated `csv_bulk_import` to support two new modes: overwriting existing children before import, and deleting children without performing an import, controlled by `overwrite_children` and `only_delete_children` flags.
- Changed the `save_report` function to return the path to the generated report text file, and updated the main workflow to use this for retries.

Command-Line Interface Improvements
- Added `--overwrite-children`, `--only-delete-children`, and `--max-retries` to control child deletion behavior and configure retry limits for jobs that fail due to database locks.

These changes make the bulk import process more robust and flexible, especially when dealing with resources that have existing children in ArchivesSpace.
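
For readers who want a picture of how the pieces described above fit together, here is a minimal sketch, not the code in this PR. It assumes the two deletion flags are boolean switches and that `--max-retries` is an integer defaulting to 0. The `delete_one` callable is a hypothetical stand-in for whatever issues the delete request through the ASnake client, and the help text is illustrative.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor


def build_parser():
    """Build a parser with the three options named in this PR (types assumed)."""
    parser = argparse.ArgumentParser(description="CSV bulk import (sketch)")
    parser.add_argument(
        "--overwrite-children",
        action="store_true",
        help="Delete existing children first (also when only validating).",
    )
    parser.add_argument(
        "--only-delete-children",
        action="store_true",
        help="Delete existing children and skip the import.",
    )
    parser.add_argument(
        "--max-retries",
        type=int,
        default=0,  # assumed default: no retries
        help="How many times to retry jobs that failed.",
    )
    return parser


def delete_children(child_uris, delete_one, max_workers=4):
    """Delete children in parallel and return the URIs that were deleted.

    delete_one(uri) is a caller-supplied callable (hypothetical) that
    returns True on success.
    """
    child_uris = list(child_uris)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the URIs.
        results = list(pool.map(delete_one, child_uris))
    return [uri for uri, ok in zip(child_uris, results) if ok]
```

Returning the list of deleted URIs rather than nothing makes it straightforward to count and log deletions afterwards.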