Skip to content

CLDSRV-863: Checksums for PutObject and UploadPart#6112

Open
leif-scality wants to merge 15 commits intodevelopment/9.4from
improvement/CLDSRV-863-checksums-put-object-part
Open

CLDSRV-863: Checksums for PutObject and UploadPart#6112
leif-scality wants to merge 15 commits intodevelopment/9.4from
improvement/CLDSRV-863-checksums-put-object-part

Conversation

@leif-scality
Copy link
Contributor

No description provided.

@bert-e
Copy link
Contributor

bert-e commented Mar 16, 2026

Hello leif-scality,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Copy link
Contributor

bert-e commented Mar 16, 2026

Incorrect fix version

The Fix Version/s in issue CLDSRV-863 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.4.0

Please check the Fix Version/s of CLDSRV-863, or the target
branch of this pull request.

@codecov
Copy link

codecov bot commented Mar 16, 2026

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
8443 1 8442 0
View the top 1 failed test(s) by shortest run time
"after each" hook for "should replicate/put metadata to a destination that has a null version"::backbeat routes for replication (same account) "after each" hook for "should replicate/put metadata to a destination that has a null version"
Stack Traces | 0.054s run time
The bucket you tried to delete is not empty.

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@claude
Copy link

claude bot commented Mar 16, 2026

LGTM

The checksum pipeline (prepareStream → ChecksumTransform → storeObject validation) is well-structured. The mutable errCb + jsutil.once pattern in storeObject correctly prevents double callbacks across the finish/error race. The defensive err.is && guards in services.js fix a real crash path. Arsenal pinned to tag 8.3.6. Tests cover the key edge cases (chunk boundary splits, trailer parsing, double-callback prevention, batchDelete failures).

Review by Claude Code

_flush(callback) {
Promise.resolve(this.algo.digestFromHash(this.hash))
.then(digest => { this.digest = digest; })
.then(() => callback(), err => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer using catch for error handling

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

claude complains about double calling callback, once in the then and again in the catch if the first callback throws

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could also use only 1 then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return callback();
} else if (!this.streamClosed && this.readingTrailer && this.trailerBuffer.length !== 0) {
this.log.error('stream ended without trailer "\r\n"');
return callback(errors.IncompleteBody.customizeDescription(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could use errorInstances when using customizeDescription to avoid instantiating twice an error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

crc64nvme: 'AAAAAAAAAAA=', // 12 chars
};

it('no headers: returns crc64nvme with isTrailer=false and expected=undefined', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer starting tests with should

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

error: ChecksumError.XAmzMismatch,
details: { algorithm: 'crc32', calculated: 'a', expected: 'b' },
});
assert(result.is.BadDigest);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that is helper is tests because if test fails you compare booleans (false != true) instead of names (InternalError != BadDigest), so you don't have much details.

Can we compre on error name with strings ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@claude
Copy link

claude bot commented Mar 19, 2026

LGTM — well-structured checksum pipeline with good error handling and thorough tests.

- trailingChecksumTransform.js:119: Inconsistent error construction — uses errors.IncompleteBody.customizeDescription(...) inline while the identical error is already a module-level constant (incompleteBodyError on line 5, built from errorInstances). Should reuse the constant for consistency.

Review by Claude Code

return null;
}

if (this.trailerChecksumValue) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validateChecksum() uses a truthy check to detect unexpected trailers on line 56. If setExpectedChecksum is called with an empty string value (e.g. from a malformed trailer like x-amz-checksum-crc32: with nothing after the colon), the check evaluates to false and the unexpected trailer goes undetected. Consider using this.trailerChecksumValue !== undefined for consistency with line 27. — Claude Code

@claude
Copy link

claude bot commented Mar 19, 2026

  • ChecksumTransform.validateChecksum() line 56: truthy check on this.trailerChecksumValue can miss an unexpected trailer when the value is an empty string. Use !== undefined instead.

    Review by Claude Code

const dataStream = stripTrailingChecksumStream(dataStreamTmp, log, cbOnce);
return data.put(
cipherBundle, dataStream, size, objectContext, backendInfo, log,
cipherBundle, checksumedStream.stream, size, objectContext, backendInfo, log,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storeObject.js passes checksumedStream.stream (a ChecksumTransform) to data.put, but ChecksumTransform never has .headers set on it. In the old code, prepareStream always returned a stream with .headers = stream.headers. If any data backend or middleware reads .headers from the stream passed to data.put, this will be undefined.

I checked lib/data/wrapper.js and it does not reference .headers, but worth verifying against all backend implementations (sproxyd, azure, gcp, etc.) that none rely on it.

— Claude Code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.headers: checked — no backend in lib/data/ or Arsenal reads .headers off the stream passed to data.put.

if (checksumedStream.stream.writableFinished) {
return doValidate();
}
checksumedStream.stream.once('finish', doValidate);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential race: if ChecksumTransform errors during _flush (e.g. crc64nvme digest fails), the error goes through errCb which calls cbOnce. But if writableFinished was false, the doValidate listener is also registered on 'finish'. Since _flush called callback(error), the stream will not emit 'finish' so the listener is harmless — but the 'finish' and error paths are independent, and both can potentially reach cbOnce. The jsutil.once(cb) guard prevents double-callback, but the upgraded onStreamError (line 88-96) could trigger batchDelete even though doValidate also triggers batchDelete on checksum failure. Consider whether both paths could fire in an edge case.

— Claude Code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double-delete: not a risk — finish and error are mutually exclusive in the Node.js stream lifecycle, so both handlers can't fire for the same stream.

@claude
Copy link

claude bot commented Mar 19, 2026

Well-structured PR that adds checksum validation for PutObject and UploadPart across all streaming modes. The ChecksumTransform, trailer parsing, and error mapping are clean. Two items to verify:

- ChecksumTransform (returned by prepareStream) does not have .headers set, unlike the old code which always set .headers on the returned stream. Verify no data backend reads .headers from the stream passed to data.put.
- In storeObject.js, the finish-wait path (line 121) and the upgraded onStreamError (line 88) are both registered after data.put succeeds. If both could fire (unlikely but worth reasoning about), batchDelete could be called twice for the same key. The jsutil.once(cb) prevents double-callback but not double-delete.

Review by Claude Code

// urlFn() is called lazily at test runtime so that uploadId is available.
function makeScenarioTests(urlFn) {
itSkipIfAWS(
'testS3PutNoChecksum: signed sha256 in x-amz-content-sha256, no x-amz-checksum header -> 200 OK',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, prefer starting tests with should

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

};

describe('prepareStream return value shape', () => {
it('returns { error: null, stream: ChecksumTransform } for UNSIGNED-PAYLOAD', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here prefer starting test with should

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

});
});

describe('prepareStream STREAMING-AWS4-HMAC-SHA256-PAYLOAD', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer one global prepareStream describe, and then other describes inside

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@claude
Copy link

claude bot commented Mar 19, 2026

LGTM

The stream pipeline design is solid: ChecksumTransform always wraps the data path, the mutable errCb upgrade pattern in storeObject.js correctly handles cleanup after data.put, and jsutil.once guards prevent double callbacks across piped streams. Arsenal is correctly pinned to tag 8.3.6. The defensive err.is check in services.js is a good drive-by fix. Test coverage is comprehensive across all protocol modes and edge cases.

Review by Claude Code

}

describe('normal behaviour', () => {
it('calls data.put with the stream returned by prepareStream', done => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start with should

});

for (const algo of algos) {
it(`[${algo}] passes data through unchanged`, async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start with should

});
});

describe('ChecksumTransform validateChecksum — non-trailer mode (isTrailer=false)', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer using one global describe and then others inside

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants