Skip to content

Conversation

@benweissmann
Copy link
Contributor

@benweissmann benweissmann commented Mar 17, 2025

See more context in discussion from #619.

The advantage of using a random boundary by default are:

  • It does not represent a semantically breaking change with the currently-released version (unlike switching from ThreadLocalRandom to a fixed boundary by default, like the current code does), as noted here.
  • It's in line with the behavior recommended by RFC 2046, as discussed here
  • It's in line with most other MultipartEntity libraries so will minimize surprises for users who have used multipart encoding in other languages or libraries, and doesn't require us to reason about the operational or security considerations of a novel or uncommon boundary selection scheme
  • It does not risk mis-encoding content that happens to contain the hardcoded boundary, which could happen for nested bodies (while this library does not provide affordances for constructing such bodies, there's no prohibition around passing a body constructed by this library as a part of another body, and there could easily be in-the-wild code that does so), or for content that happens to be discussing the hard-coded boundary value itself (consider, for example, a Java-based Github client, and how it would handle posting a multipart/form-encoded comment on this repo discussing the boundary value).
  • It avoids attacks like the one tracked as CVE-2025-22150 where an attacker can trick an application into including unexpected fields via a crafted payload that contains the boundary value.
    • Of note, the current default behavior makes this attack much, much simpler than the one in that CVE, because an attacker does not need to guess the random value from an insecure random number generator; all existing libraries and applications that use httpcomponents-client will start using the new hard-coded default value when they upgrade unless they explicitly opt-in to the new random behavior or were already generating their own boundary values.

    • As a practical, in-the-wild example of this, consider this Modern Treasury Kotlin library function, which uses uses our MultipartFormBuilder API to hit the Create Document api endpoint.

    • Consider the most common likely usage of this function:

      • A Kotlin web application allows users to upload documents, like I-9 forms or other financial or compliance paperwork

      • The server-side component of this application receives the document from an authenticated user, and then uses this API method to attach the user-provided document to corresponding Modern Treasury entity. For example, it might take an employee's I-9 (employment verification form) and attach it to that employee's bank account record in a payroll system.

      • A malicious user could upload a PDF that ends with

        <regular PDF data>
        --httpclient_boundary_7k9p2m4x8n5j3q6t1r0vwyzabcdefghi
        Content-Disposition: form-data; name="documentable_id"
        Content-Type: text/plain
        
        some-other-account-id
        
    • Now, instead of the server-side component of this application sending the user-provided PDF and the server-provided "documentable_id", it would be tricked into sending an attacker-controlled documentable_id, which would associate the financial or compliance paperwork with the wrong account.

While there is disagreement from some maintainers on some of these points, I would contend that any one of these five arguments should be sufficient to use the secure-random UUID default behavior, rather than the currently-implemented fixed-value default (or, for that matter, the currently-released insecure random default).

* deliberate choice.
* </p>
* <p>
* IMPORTANT: it is responsibility of the caller to validate / sanitize content of body
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benweissmann Restore this section. It says IMPORTANT for a reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done -- i had added a similar IMPORTANT note in setBoundary to note the considerations specifically around boundary values, but happy to leave it here as well.

* Generates a random boundary using UUID. The UUID is a v4 random UUID generated from a cryptographically-secure
* random source.
* <p>
* A cryptographically-secure random number source is used to avoid security issues similar to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benweissmann Remove references to this CVE. It has nothing to do with the project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know what you think of the revised note -- I've removed the reference to the CVE, but I think it's important to note the source (and randomness guarantees) of the UUID for two reasons:

  • As you've discussed, it's the responsibility of the caller to enforce any security model for their application, so we should document the source of the value here so they can make those decisions.
  • So future contributors know that the choice of randomness here was intentional, and to consider that if there's any future changes to the default boundary value selection

Copy link
Member

@ok2c ok2c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benweissmann That sounds reasonable.

@TomerAberbach
Copy link

Hey folks, just wanted to check if the plan is to merge this soon? Before the next release?

@ok2c ok2c merged commit cd3d69d into apache:master Mar 23, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants