Layering HTTP Client by mookums · Pull Request #2079 · lightpanda-io/browser

mookums · 2026-04-03T17:30:18Z

This allows us to write HTTP Client layers to add functionality without messing with the core HTTPClient too much. This is just an experiment right now so don't merge but would love to hear thoughts.

This ideally should make cache revalidation easier to implement as it would be entirely local to the Cache Layer and transparent to the other layers.

karlseguin · 2026-04-06T00:32:31Z

Something like this is needed. A bit of rambling thoughts...

layers

We need singleflight protection (both module import and robots implement their own mechanism for this) and request interception handling (and we should probably make sure we handle robots.txt and caches correct with request interception).

ergonomics

I don't know it this would be implemented as a "layer", but I'd like synchronous request handling to better. Right now ScriptManager handles it, but it should be pushed down to the http client library, since we need it in a few other places:

const res = try http.requestSync(.{.arena = arena, ...});
// res complete

And probably improve the ergonomics for async/callback too so that they can just register 1 doneCallback and get the whole response (only XHR (and in theory fetch) care about streaming, and even they just want the whole body collected at the end).

script caching

The concern with this type of layering is that it makes some things a little harder. I was thinking of how Script caching would work with layers. I think you might end up losing a bit of efficiency (and maybe even consistency), because script caching might need to happen separately from http caching (e.g. in the ScriptManager). Maybe not. But, it did make me realize that the new caching code could itself be split into Storage and HTTP caching.

script manager

Tha Robots and Cache layers are nice since they cover recent additions, I think going back to the original users of HttClient (ScriptManager, XHR and fetch) and see how they can be improved is worthwhile. That's where I think of things like singleflight and synchronous requests and there are probably other things that I'm missing from their code that would be nice to streamline if possible.

krichprollsch · 2026-04-20T09:50:53Z

I tried to run go run integrations/main.go with this branch, and I get a leak reports after stopping the server:

error(gpa): memory address 0x742b70605640 leaked:
/home/pierre/wrk/browser/src/network/Network.zig:150:56: 0x372cc12 in _allocBlock (lightpanda)
        const slice = instance.?.allocator.alignedAlloc(u8, .fromByteUnits(alignment), Block.fullsize(size)) catch return null;
                                                       ^
/home/pierre/wrk/browser/src/network/Network.zig:199:35: 0x364c38a in strdup (lightpanda)
        const header = _allocBlock(len + 1) orelse return null;
                                  ^
/home/pierre/wrk/browser/src/sys/libcurl.zig:544:41: 0x358378e in cb (lightpanda)
            return @ptrCast(alloc.strdup(s));
                                        ^
/home/pierre/.cache/zig/p/N-V-__8AALp9QAGn6CCHZ6fK_FfMyGtG824LSHYHHasM3w-y/lib/slist.c:87:19: 0x840fa1f in curl_slist_append (/home/pierre/.cache/zig/p/N-V-__8AALp9QAGn6CCHZ6fK_FfMyGtG824LSHYHHasM3w-y/lib/slist.c)
  char *dupdata = curlx_strdup(data);
                  ^
/home/pierre/wrk/browser/src/sys/libcurl.zig:899:31: 0x2efde6f in curl_slist_append (lightpanda)
    return c.curl_slist_append(list, header);
                              ^
/home/pierre/wrk/browser/src/network/http.zig:97:58: 0x2efe00b in add (lightpanda)
        const updated_headers = libcurl.curl_slist_append(self.headers, header);
                                                         ^
/home/pierre/wrk/browser/src/browser/Page.zig:432:24: 0x2efe29a in headersForRequest (lightpanda)
        try headers.add(referer);
                       ^
/home/pierre/wrk/browser/src/browser/ScriptManager.zig:146:36: 0x2efe5f0 in getHeaders (lightpanda)
    try self.page.headersForRequest(&headers);
                                   ^
/home/pierre/wrk/browser/src/browser/ScriptManager.zig:316:51: 0x2f00b81 in addFromElement__anon_110945 (lightpanda)
                    .headers = try self.getHeaders(),
                                                  ^
/home/pierre/wrk/browser/src/browser/Page.zig:1144:40: 0x2f03111 in scriptAddedCallback__anon_110924 (lightpanda)
    self._script_manager.addFromElement(from_parser, script, "parsing") catch |err| {

karlseguin

I need to review more of it, I'll go through the individual layers more thoroughly tomorrow.

karlseguin · 2026-04-21T10:20:40Z

-    errdefer allocator.destroy(client);
+pub const PerformStatus = enum { cdp_socket, normal };
+
+pub const Transport = struct {


Don't think this is ever used externally? (Same with Context..I think). Though, I'm not opposed to breaking this out in to many files.

karlseguin · 2026-04-21T10:31:39Z

-// BrowserContext deinit.
-pub fn restoreOriginalProxy(self: *Client) !void {
-    try self.ensureNoActiveConnection();
+transport: *Transport,


What's the benefit of having all these previously Clients field moved to Transport (and having a Transport at all)

And, if there is a good reason to split it out, why have it a pointer her? Client owns the Transport? I looked for who else creates a transport and nothing else does. transport: Transport would just document this better. (And if the issue is you need a stable pointer *Client is already on the heap, so &client.transport is a stable pointer)

karlseguin · 2026-04-21T10:36:41Z


-    self.queue.append(&transfer._node);
-}
+pub const Context = struct {


Not sure I see the point of this. We could inject *Client into every layer? Or even store *Client in req.

karlseguin · 2026-04-21T10:51:22Z

-    }
-    self.tls_verify = verify;
-}
+        transfer.req.params.notification.dispatch(.http_request_start, &.{ .transfer = transfer });


Not sent if the request was serviced by the cache layer? I don't think that's correct.

Yeah so then this would require the InterceptionLayer right? I can't see any other way you'd handle all of these (including the fact that some notifications would need to now take a Response instead of a Transfer to ensure we can still intercept on cached responses?)

I have some local changes trying this out with the InterceptionLayer but I'm running into some trouble. What's the best way to handle this because the RI stuff feels so tightly coupled with the *Transfer type (in both the HttpClient but also in all of the CDP paths). I'm able to reasonably hoist the interception part but the auth part is giving me challenges.

Do we know how RI and caching interact on Chrome?

You're able to intercept the cached responses on Chrome I'm pretty sure. The Response type has a couple cache fields like fromDiskCache, fromPrefetchCache and fromEarlyHints which are all just signifying that it comes from one of the caches. (https://chromedevtools.github.io/devtools-protocol/tot/Network/#type-Response)

I believe you are also able to disable the cache from CDP if you want to just get fresh responses with Network.setCacheDisabled.

lightpanda-io/demo#165 passes in chrome. So in this case, it doesn't.

karlseguin · 2026-04-21T11:04:04Z

+            },
+        }
+
+        try l.next.request(ctx, req);


So if this throws an error, transport will end up calling errorCallback which itself calls l.next.request (the failing code) again. I think.

Oh, and if that's right, you defer self.deinit() at the top, so the errorCallback will be a use-after-free.

karlseguin

I think it's worth removing SingleFlightLayer.

I believe there's a lot of footguns around error propagation in the doneCallbacks firing the errorCallback (free after uses, and questionable retries in the error path). I'm not sure what the fix is (or even if there is a fix). The problem might exist in main, not sure.

Interaction with request interaction needs to be flushed out. Does chrome allow RI for cached items? Can it be a layer, and if so, it should probably the first one (with cache being the second one).

There are no new tests. I'm not sure what can and cannot be truly unit tested, but we do launch an HTTP server in the unit tests, and we do have the demo project for integration test (lightpanda-io/demo#164 would enable setting up more complex cases).

karlseguin · 2026-04-22T06:30:51Z

+const log = @import("../../log.zig");
+
+const http = @import("../http.zig");
+const Transfer = @import("../../browser/HttpClient.zig").Transfer;


Should move HttpClient to src/network/ IMO.

mookums · 2026-04-22T14:55:28Z

I think it's worth removing SingleFlightLayer.

Agreed. Been trying to fight it lately and it seems like whenever an issue is solved that another one pops up. It makes some of the lifetimes really nasty and duplicates a lot of behavior that existed in the HttpClient to handle the fanout.

krichprollsch

Here are 2 more comments from Claude:

r.deinit() exists publicly on both Request and RequestParams and only frees headers. Easy to mistake for "free everything" (CDP.zig:459 already did). Consider renaming to deinitHeaders or removing the public surface in favor of always going through Client.deinitRequest.

Client.request's errdefer only releases the arena — it doesn't req.deinit() the headers. Callers compensate (addFromElement line 286), but ScriptManager.preloadImport (419) and getAsyncImport (527), and webapi/Fetch.zig / XMLHttpRequest.zig / Worker.zig / Frame.zig do not have errdefer headers.deinit() — if client.request fails, the libcurl slist leaks. (This is preexisting — not introduced by this refactor — but the contract is fragile and worth pinning down. Easiest fix: have Client.request deinit headers in its errdefer.)

krichprollsch · 2026-04-27T07:27:02Z

+            .shutdown_callback = RobotsContext.shutdownCallback,
+        });
+    } else {
+        client.network.app.arena_pool.release(arena);


If I'm not wrong here, the arena is req.params.arena.
Is it really the responsibility of a layer to release the request's params arena?
What would happen if another layer use it?

It now creates a new arena for the subrequest that is used by the RobotsContext. This is cleaned up later by Transfer.deinit() or by any deinit of the Request.

krichprollsch · 2026-04-27T09:05:54Z

+    self.next.request(client, req) catch |err| {
+        const ctx: *InterceptContext = @ptrCast(@alignCast(req.ctx));
+        ctx.client.deinitRequest(req);
+        return err;


I'm not sure I get who is responsible to call req.err_callback in case of error.
Should we here?

We now pass the errors up the Layer chain. Ideally, you can just return an error and expect the error_callback to be called and the Request to be de-initalized. On cases where we manage our own Request like in RobotsLayer OR the CDP callbacks in InterceptionLayer, you have to capture and call the callbacks + deinit yourself.

mookums force-pushed the http-client-layering branch from 6a34427 to de4f91a Compare April 3, 2026 20:33

mookums force-pushed the http-client-layering branch 3 times, most recently from 39d5a3c to 0472e58 Compare April 14, 2026 14:55

karlseguin mentioned this pull request Apr 20, 2026

Implement Worker and SharedWorker #2017

Open

karlseguin reviewed Apr 21, 2026

View reviewed changes

karlseguin reviewed Apr 22, 2026

View reviewed changes

mookums force-pushed the http-client-layering branch 2 times, most recently from ffe1a86 to 0de5ba4 Compare April 22, 2026 23:10

mookums added 8 commits April 26, 2026 16:35

use layers for Cache, Robots and WebBotAuth

3dcdfe8

add RequestParams and SyncRequest

a1e5c7b

use syncRequest in ScriptManager

a22d87e

remove Context and thread *Client

fb2472d

Transfer now uses Request's request_id

81264da

move RequestStart to InterceptionLayer

883ad6f

move arena up to Request instead of Transfer

dc9bbe3

crude InterceptionLayer

ade95c2

mookums force-pushed the http-client-layering branch from 6237e02 to ade95c2 Compare April 27, 2026 02:22

mookums added 9 commits April 26, 2026 19:56

more fixing of hanging in cdp interception

c17c413

fix request id mismatch on cdp

679b104

partial auth challenge support

30e9945

working authentication with InterceptionLayer

0e00c40

remove blocking from RequestParams

c46b414

remove InterceptState

c35d03e

use arena pool in InterceptionLayer

71cefe6

ensure robots params have arena and request id

b616923

Requests now use arenas from the arena pool

a681ddf

mookums added 4 commits April 26, 2026 23:09

use Request arnea in InterceptionLayer

49e9159

use Request Arena in WebBotAuthLayer

babcd66

use Request Arena in CacheLayer

ff01fa3

use Request Arena in RobotsLayer

721fe76

krichprollsch reviewed Apr 27, 2026

View reviewed changes

mookums added 9 commits April 27, 2026 06:54

use lightpanda module log in layers

363cfce

assert that intercepted isn't 0 before decrementing

2a1e147

properly call error callback in InterceptionLayer

54297da

pass error all the way up to Layer chain to clean

eedd5bb

add a note about cdp callback cb

746031a

add assert to SyncRequest headerCallback

f9b2448

remove interception stuff in Transfer

8e23126

better arena management in Robots Layer and Context

c338639

decrement intercepted and properly deinit on BrowserContext deinit

2496d5b

Conversation

mookums commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

karlseguin commented Apr 6, 2026

layers

ergonomics

script caching

script manager

Uh oh!

krichprollsch commented Apr 20, 2026

Uh oh!

karlseguin left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

karlseguin left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mookums commented Apr 22, 2026

Uh oh!

krichprollsch left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mookums Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

mookums commented Apr 3, 2026 •

edited

Loading

mookums Apr 27, 2026 •

edited

Loading