aws: support multipart copy for objects larger than 5GB #561
james-rms wants to merge 9 commits into apache:main
Conversation
Force-pushed 4719ef4 to 09cef9b
Force-pushed 09cef9b to acc8cc4
I think this probably warrants a higher-level ticket to discuss how we should support this. As a start it would be good to understand how other stores, e.g. GCS and Azure, handle this, so that we can develop an abstraction that makes sense. In particular, I wonder if this functionality would make more sense as part of the multipart upload functionality? This of course depends on what other stores support. In general, filing an issue first to get consensus on an approach is a good idea before jumping in on an implementation.
Great, created #563.
@tustvold updated to pass CI, with a small tweak to avoid overflow.
Thanks @james-rms and @tustvold -- the high-level idea seems reasonable to me, but I think this code needs tests (maybe unit tests?), otherwise we may break the functionality inadvertently in some future refactor.
Force-pushed a8d535c to 1fdf7d6
@tustvold I've refactored slightly for unit tests and added a couple of integration tests as well. Please take another look when you have some time.
```rust
    mode,
    extensions: _,
} = options;
// Determine source size to decide between single CopyObject and multipart copy
```
Is there some way we can avoid this, e.g. try CopyObject normally and fall back to multipart on error? Otherwise this adds an additional S3 round trip to every copy request.
tustvold left a comment:

Looks good, and well tested; sorry for the delay in reviewing, I've been absolutely swamped.

I think we need to find a way to avoid regressing the common case of files smaller than 5GB, e.g. by first attempting CopyObject and then falling back if it errors (I am presuming S3 gives a sensible error here).
This is what we get back from S3, as explained by AWS's docs. So the real question is: if we make a CopyObject call, can we assume that any InvalidRequest that comes back is because the object was >5GB in size? Given the documentation I think that's OK, but I'm not really clear that it will remain OK going forward. I'll push a commit that does this and let you decide on the approach.
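The fallback idea under discussion can be sketched as a small decision helper. This is a hedged sketch, not the PR's implementation: the `(code, message)` error shape and the helper names are hypothetical stand-ins for a parsed S3 error response.

```rust
/// Outcome of a plain CopyObject attempt (names hypothetical).
#[derive(Debug, PartialEq)]
enum CopyAction {
    Done,
    FallbackToMultipart,
    Fail(String),
}

/// Decide whether a failed CopyObject should be retried as a multipart copy.
/// Assumption: S3 reports an oversized copy source as an `InvalidRequest`
/// error; as noted in the thread, other request problems may share that
/// code, so this check may be too broad going forward.
fn plan_after_copy_object(result: Result<(), (String, String)>) -> CopyAction {
    match result {
        Ok(()) => CopyAction::Done,
        Err((code, _message)) if code == "InvalidRequest" => CopyAction::FallbackToMultipart,
        Err((code, message)) => CopyAction::Fail(format!("{code}: {message}")),
    }
}
```

The weak point, as raised above, is the middle arm: nothing guarantees that `InvalidRequest` only ever means "copy source too large".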
I guess another option would be to make this disabled by default and therefore opt-in... How do other clients handle this?
The Go S3 SDK leaves this up to the caller to figure out: https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/s3#Client.CopyObject. The Go Cloud SDK seems to not support copies >5GB, because copy calls are forwarded directly to the CopyObject API and it doesn't expose a multipart copy method: https://github.com/google/go-cloud/blob/a52bb6614a70209265758ad7a795a4a3931fbe0b/blob/s3blob/s3blob.go#L856
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
@tustvold I've implemented this as disabled by default (>5GB copies will just fail) and opt-in. By default there is no pre-copy HEAD request.
@tustvold any more thoughts on this one?
I will try to find some time to take a look at the weekend; I'm afraid I'm currently absolutely swamped at work.
No worries, and thanks for putting up with my regular pings on this one!
@crepererum and I were talking about this today and maybe he will have some time to review as well.
```rust
    .await?;

let mut parts = Vec::new();
for (idx, payload) in self
```
Technically we could perform these requests concurrently (at least up to a certain amount), but I think the serial implementation is "good enough" for now. It's certainly better than the status quo (i.e. not being able to copy the object at all).
```rust
/// Threshold (bytes) above which copy uses multipart copy. If not set, all copies are performed
/// as single requests.
multipart_copy_threshold: Option<ConfigValue<u64>>,
/// Preferred multipart copy part size (bytes). If not set, defaults to 5 GiB.
```
That's a question I have too, and I think the answer should be included in the doc-string: if the object was created using a multipart upload, are there any requirements on the alignment of the copy parts, or is this irrelevant?
It's not explicitly documented (though there is a note around range requests). There's every chance it's faster to line up the part boundaries exactly, but I haven't tested this directly.

I hadn't thought about it, and this bothers me. It doesn't feel good to be opaquely putting part boundaries into a copied object that the user has no visibility into. This also points to the right API being one where the user creates parts directly and keeps track of the boundaries.
```rust
self.copy_multipart(from, to, size, CompleteMultipartMode::Overwrite)
    .await?;
```

Suggested change:

```rust
tracing::debug!(%from, %to, "multipart copy");
self.copy_multipart(from, to, size, CompleteMultipartMode::Overwrite)
    .await?;
```
(You may need to check whether `Path` implements `Display` or not.)

Reasoning: we are changing the client behavior dynamically based on data size, and if a server misbehaves because of that, it would be nice to at least leave some breadcrumbs.
Seems reasonable, I'll take a look.
Which issue does this PR close?
Rationale for this change
Today, users that attempt to copy a >5GB object in S3 using `object_store` will see an error. Per AWS's docs, the way to get around this problem is to do the copy in several parts using multipart copies. This PR adds that functionality to the AWS client.
It adds two additional configuration parameters: a threshold above which copies switch to multipart, and a preferred part size (see `multipart_copy_threshold` and the part-size field in the diff).
The defaults are chosen to minimise surprise: if people are used to copies not requiring several requests, we don't switch to that method until it's absolutely necessary, and when necessary, we use as few parts as possible.
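The "as few parts as possible" default amounts to splitting the source into maximal fixed-size ranges. A minimal sketch, with a hypothetical helper name (the real part size is configurable and defaults to 5 GiB, the S3 per-part copy ceiling):

```rust
/// Split an object of `size` bytes into the fewest inclusive byte ranges
/// of at most `part_size` bytes each, in the form consumed by the
/// `x-amz-copy-source-range` header ("bytes=start-end", inclusive).
fn part_ranges(size: u64, part_size: u64) -> Vec<(u64, u64)> {
    assert!(part_size > 0);
    let mut ranges = Vec::new();
    let mut start = 0u64;
    while start < size {
        // saturating_add guards against overflow near u64::MAX, echoing
        // the overflow tweak mentioned earlier in this thread
        let end = start.saturating_add(part_size).min(size) - 1;
        ranges.push((start, end));
        start = end + 1;
    }
    ranges
}
```

For example, a 12 GiB object with the 5 GiB default splits into three parts of 5 GiB, 5 GiB, and 2 GiB.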
What changes are included in this PR?
See above.
Are there any user-facing changes?
Yes: the new configuration parameters are covered by the docstring changes.