Skip to content

Conversation

@arturobernalg
Copy link
Member

             content-decoding (optional)
  • New ContentDecoderRegistry discovers extra codecs (br, zstd, xz, lz4, …) via Commons-Compress when that jar is on the class-path; otherwise falls back to the built-ins (gzip, deflate) only.
  • No hard dependency added—projects that need the extra algorithms just add commons-compress (and helper jars like google-brotli, zstd-jni, xz-java) to their pom and HttpClient uses them automatically.

@arturobernalg arturobernalg requested a review from ok2c June 17, 2025 16:59
@arturobernalg
Copy link
Member Author

@ok2c @garydgregory — could you please review this PR when you have a moment? I’ve closed the earlier one because I opted for a cleaner approach here.
Thank you!

Copy link
Member

@ok2c ok2c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arturobernalg It is much better and can basically be merged as it, but it can be made better still.

  1. CommonsCompressDecoderFactory uses reflection. I does not have to. It is perfectly OK to have Commons Compress as a optional compile dependency. what is important that there is no direct dependency on Commons Compress in ContentDecoderRegistry, What #commonsCompressPresent in ContentDecoderRegistry does perfectly fine and sufficient. CommonsCompressDecoderFactory import Commons Compress classes directly.
  2. REGISTRY in ContentDecoderRegistry should be unmodifiable. It is only a matter of tweaking #build method a little. This would make #snapshot unnecessary. Better yet you should probably use org.apache.hc.core5.http.config.Registry instead of a plan Map

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY @arturobernalg !

I agree with @ok2c comments, especially regarding reflection. I've made a few low-level comments throughout.

* Returns a <em>defensive copy</em> of the internal registry
* (key = canonical encoding token, value = {@link InputStreamFactory}).
*/
public static Map<String, InputStreamFactory> snapshot() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapshot() is a bad method name here IMO, rename to getRegistry().
Saying this is a defensive copy is misleading because InputStreamFactory makes no thread-safety guarantees. I would say this is a "shallow copy" which is technically correct and doesn't imply intent of attack or defense for call sites.

false,
ContentDecoderRegistry.class.getClassLoader());
return true;
} catch (final ClassNotFoundException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about LinkageError?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping?


final List<String> list = new ArrayList<>(4);
list.add("gzip");
list.add("x-gzip");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a later PR could gather all these magic strings ("x-gzip" and so on) into a class (existing or new).


/* 2. Optional Commons-Compress decoders (reflection, safe) */
if (commonsCompressPresent()) {
addCommons(m, "br");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment about magic strings, for a later PR.

Class.forName(cn, false,
CommonsCompressDecoderFactory.class.getClassLoader());
return true;
} catch (final ClassNotFoundException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about LinkageError?

@arturobernalg arturobernalg force-pushed the HTTPCLIENT-1843-5 branch 2 times, most recently from 95f58c5 to 7ea87d5 Compare June 18, 2025 17:11
@arturobernalg
Copy link
Member Author

Please @ok2c @garydgregory do another pass.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arturobernalg
Look good. I
have some
comment
scattered

also, why
     not use longer
     lines?
         I'm not a fan
         of the
         newspaper format

;-)

false,
ContentDecoderRegistry.class.getClassLoader());
return true;
} catch (final ClassNotFoundException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping?

}
}

private enum Probe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit redundant with the new ContentCoding enum. What do you think about moving the "requires this type name" aspect (a.k.a. "probe class") of Probe to ContentCoding and turning this "Probe" into a Set<ContentCoding>?


final String enc, probeClass;

Probe(final String enc, final String probeClass) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probeClass is misleading to me because it's a name, not an actual class. I think "requiredTypeName" or "requiredClassName" would be a clearer name because "probe" doesn't really tell you anything (to me).

this.probeClass = probeClass;
}

static String helperFor(final String enc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would stay away from cryptic names like "enc", even in private code, if it's an "encoding", the call it that ;-)


/**
* {@link InputStreamFactory} backed by
* <a href="https://commons.apache.org/proper/commons-compress/">Apache&nbsp;Commons Compress</a>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a non-breaking space between words 1 and 2, why not between words 2 and 3? I would just not bother since it might cause odd rendering on small screens like phones and tablets.

* {@link InputStreamFactory} backed by
* <a href="https://commons.apache.org/proper/commons-compress/">Apache&nbsp;Commons Compress</a>.
* <p>
* The class is compiled against Commons Compress but lives behind an <i>optional</i>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not use <i> in Javadocs, that's up to the rederer to decide what style to apply, which is why Javadoc supports <em> for ephasis which is usually translated to italics and <strong> which is usually translated to bold.

* </p>
*
* <h4>Run-time guards</h4>
* Some encodings (e.g.&nbsp;{@code br}, {@code zstd}, {@code xz/lzma})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I stay away from Latin abbreviations, I find it doesn't make the text clearer to read. No need for non-breaking space IMO.

* <a href="https://commons.apache.org/proper/commons-compress/">Apache&nbsp;Commons Compress</a>.
* <p>
* The class is compiled against Commons Compress but lives behind an <i>optional</i>
* dependency. At run-time it will only be loaded when the library is present,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need two spaces after a .; we're not using typewriters anymore and Javadoc is not rendered in mono-space ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please @garydgregory do another pass

                 content-decoding (optional)

* New ContentDecoderRegistry discovers extra codecs (br, zstd, xz, lz4, …)
  via Commons-Compress when that jar is on the class-path; otherwise falls
  back to the built-ins (gzip, deflate) only.
* No hard dependency added—projects that need the extra algorithms just add
  `commons-compress` (and helper jars like google-brotli, zstd-jni, xz-java)
  to their pom and HttpClient uses them automatically.
@arturobernalg arturobernalg requested a review from ok2c June 19, 2025 08:56
Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT

@arturobernalg arturobernalg merged commit d1eb5b8 into apache:master Jun 19, 2025
22 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants