Conversation
Something like this is needed. A bit of rambling thoughts...

**layers**

We need singleflight protection (both module import and robots implement their own mechanism for this) and request interception handling (and we should probably make sure we handle robots.txt and caches correctly with request interception).

**ergonomics**

I don't know if this would be implemented as a "layer", but I'd like synchronous request handling to be better. Right now ScriptManager handles it, but it should be pushed down to the http client library, since we need it in a few other places:

```zig
const res = try http.requestSync(.{ .arena = arena, ... });
// res complete
```

And probably improve the ergonomics for async/callback too, so that they can just register 1.

**script caching**

The concern with this type of layering is that it makes some things a little harder. I was thinking of how script caching would work with layers. I think you might end up losing a bit of efficiency (and maybe even consistency), because script caching might need to happen separately from HTTP caching (e.g. in the ScriptManager). Maybe not. But it did make me realize that the new caching code could itself be split into Storage and HTTP caching.

**script manager**

The Robots and Cache layers are nice since they cover recent additions, but I think going back to the original users of HttpClient (ScriptManager, XHR and fetch) to see how they can be improved is worthwhile. That's where I think of things like singleflight and synchronous requests, and there are probably other things that I'm missing from their code that would be nice to streamline if possible.
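That synchronous call could be sketched roughly like this. This is a hypothetical sketch only: `SyncOptions`, `requestSync`, `tick`, and the callback wiring are all invented names, not the actual codebase API.

```zig
// Hypothetical sketch of pushing synchronous requests down into the http
// client library. Every name here (SyncOptions, requestSync, tick) is
// illustrative; the real client's API will differ.
pub const SyncOptions = struct {
    arena: std.mem.Allocator,
    url: []const u8,
};

pub fn requestSync(client: *Client, opts: SyncOptions) !*Response {
    var result: ?*Response = null;

    // Issue the usual async request, with a callback that just stashes
    // the completed response.
    try client.request(.{
        .arena = opts.arena,
        .url = opts.url,
        .ctx = &result,
        .done_callback = struct {
            fn done(ctx: *anyopaque, res: *Response) void {
                const out: *?*Response = @ptrCast(@alignCast(ctx));
                out.* = res;
            }
        }.done,
    });

    // Drive the client's event loop until the callback fires, so the
    // caller sees a plain blocking call.
    while (result == null) try client.tick();
    return result.?;
}
```

The point is that any caller (ScriptManager, XHR, fetch) would get blocking semantics without re-implementing the wait loop.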
I tried to run
karlseguin left a comment:
I need to review more of it, I'll go through the individual layers more thoroughly tomorrow.
```zig
errdefer allocator.destroy(client);
pub const PerformStatus = enum { cdp_socket, normal };

pub const Transport = struct {
```
Don't think this is ever used externally? (Same with Context, I think.) Though, I'm not opposed to breaking this out into many files.
```zig
// BrowserContext deinit.
pub fn restoreOriginalProxy(self: *Client) !void {
    try self.ensureNoActiveConnection();

transport: *Transport,
```
What's the benefit of moving all these fields, previously on Client, to Transport (and of having a Transport at all)?
And if there is a good reason to split it out, why have it as a pointer here? Client owns the Transport? I looked for who else creates a Transport and nothing else does. `transport: Transport` would just document this better. (And if the issue is that you need a stable pointer, `*Client` is already on the heap, so `&client.transport` is a stable pointer.)
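A minimal sketch of that suggestion. The `init` body and `Transport.init` are assumptions; only the `transport: Transport` field and the stable-pointer argument come from the comment above.

```zig
pub const Client = struct {
    // Embed the Transport by value: Client owns it, and nothing else
    // creates one. Since *Client is already heap-allocated,
    // &client.transport is a stable pointer.
    transport: Transport,

    pub fn init(allocator: std.mem.Allocator) !*Client {
        const client = try allocator.create(Client);
        errdefer allocator.destroy(client);
        client.* = .{ .transport = Transport.init() }; // assumed initializer
        return client;
    }

    fn transportPtr(self: *Client) *Transport {
        return &self.transport; // stable for the lifetime of the Client
    }
};
```

Embedding by value also documents the ownership: there is exactly one Transport per Client, with the same lifetime.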
```zig
self.queue.append(&transfer._node);
}

pub const Context = struct {
```
Not sure I see the point of this. We could inject `*Client` into every layer? Or even store `*Client` in `req`.
```zig
}
self.tls_verify = verify;
}

transfer.req.params.notification.dispatch(.http_request_start, &.{ .transfer = transfer });
```
Not sent if the request was serviced by the cache layer? I don't think that's correct.
Yeah, so then this would require the InterceptionLayer, right? I can't see any other way you'd handle all of these (including the fact that some notifications would now need to take a Response instead of a Transfer to ensure we can still intercept on cached responses?)
I have some local changes trying this out with the InterceptionLayer, but I'm running into some trouble. What's the best way to handle this? The RI stuff feels so tightly coupled with the `*Transfer` type (both in the HttpClient and in all of the CDP paths). I'm able to reasonably hoist the interception part, but the auth part is giving me challenges.
Do we know how RI and caching interact on Chrome?
You're able to intercept the cached responses on Chrome, I'm pretty sure. The Response type has a couple of cache fields like `fromDiskCache`, `fromPrefetchCache` and `fromEarlyHints`, which just signify that it comes from one of the caches. (https://chromedevtools.github.io/devtools-protocol/tot/Network/#type-Response)
I believe you are also able to disable the cache from CDP with `Network.setCacheDisabled` if you want to just get fresh responses.
lightpanda-io/demo#165 passes in chrome. So in this case, it doesn't.
```zig
},
}

try l.next.request(ctx, req);
```
So if this returns an error, the transport will end up calling `errorCallback`, which itself calls `l.next.request` (the failing code) again, I think.
Oh, and if that's right, you `defer self.deinit()` at the top, so the `errorCallback` will be a use-after-free.
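The footgun being described might look like this in miniature (hypothetical code, not the actual layer implementation):

```zig
// Hypothetical layer handler illustrating the re-entrancy/lifetime trap:
pub fn request(l: *Layer, ctx: *Context, req: *Request) !void {
    defer l.deinit(); // runs as soon as this function returns

    // If this fails, the transport invokes errorCallback, which calls
    // l.next.request (the code that just failed) again; and since the
    // defer above has already freed `l` by the time the callback runs,
    // the callback touches freed memory: a use-after-free.
    try l.next.request(ctx, req);
}
```

The general fix is to decide on one owner for the error path: either the layer returns the error and never fires callbacks itself, or it fires the callback and swallows the error, but not both.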
karlseguin left a comment:
I think it's worth removing SingleFlightLayer.
I believe there are a lot of footguns around error propagation in the doneCallbacks firing the errorCallback (use-after-frees, and questionable retries in the error path). I'm not sure what the fix is (or even if there is a fix). The problem might exist in main, not sure.
Interaction with request interception needs to be fleshed out. Does Chrome allow RI for cached items? Can it be a layer, and if so, it should probably be the first one (with cache being the second one).
There are no new tests. I'm not sure what can and cannot be truly unit tested, but we do launch an HTTP server in the unit tests, and we do have the demo project for integration tests (lightpanda-io/demo#164 would enable setting up more complex cases).
```zig
const log = @import("../../log.zig");

const http = @import("../http.zig");
const Transfer = @import("../../browser/HttpClient.zig").Transfer;
```
Should move HttpClient to src/network/ IMO.
Agreed. Been trying to fight it lately, and it seems like whenever an issue is solved, another one pops up. It makes some of the lifetimes really nasty and duplicates a lot of behavior that existed in the
krichprollsch left a comment:
Here are 2 more comments from Claude:
- `r.deinit()` exists publicly on both `Request` and `RequestParams` and only frees headers. Easy to mistake for "free everything" (`CDP.zig:459` already did). Consider renaming to `deinitHeaders`, or removing the public surface in favor of always going through `Client.deinitRequest`.
- `Client.request`'s errdefer only releases the arena; it doesn't `req.deinit()` the headers. Callers compensate (`addFromElement`, line 286), but `ScriptManager.preloadImport` (419) and `getAsyncImport` (527), and `webapi/Fetch.zig`, `XMLHttpRequest.zig`, `Worker.zig`, and `Frame.zig` do not have `errdefer headers.deinit()`; if `client.request` fails, the libcurl slist leaks. (This is preexisting, not introduced by this refactor, but the contract is fragile and worth pinning down. Easiest fix: have `Client.request` deinit headers in its errdefer.)
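The "easiest fix" suggested there could look roughly like this. This is a sketch: `acquireArena`, `releaseArena`, and `startTransfer` are stand-ins for whatever `Client.request` actually does; only the added `errdefer req.deinit()` reflects the suggestion.

```zig
pub fn request(self: *Client, req: *Request) !void {
    const arena = try self.acquireArena(); // stand-in name
    errdefer self.releaseArena(arena);

    // New: also free the libcurl header slist on any failure below, so
    // individual callers no longer need their own `errdefer headers.deinit()`.
    errdefer req.deinit();

    try self.startTransfer(arena, req); // stand-in name
}
```

Centralizing cleanup here makes the contract simple for callers: on error from `client.request`, everything the request owned has already been released.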
```zig
.shutdown_callback = RobotsContext.shutdownCallback,
});
} else {
client.network.app.arena_pool.release(arena);
```
If I'm not wrong here, the arena is `req.params.arena`.
Is it really the responsibility of a layer to release the request's params arena?
What would happen if another layer used it?
It now creates a new arena for the subrequest that is used by the RobotsContext. This is cleaned up later by `Transfer.deinit()` or by any deinit of the Request.
```zig
self.next.request(client, req) catch |err| {
    const ctx: *InterceptContext = @ptrCast(@alignCast(req.ctx));
    ctx.client.deinitRequest(req);
    return err;
```
I'm not sure I get who is responsible for calling `req.err_callback` in case of error.
Should we here?
We now pass the errors up the layer chain. Ideally, you can just return an error and expect the error_callback to be called and the Request to be deinitialized. In cases where we manage our own Request, like in RobotsLayer or the CDP callbacks in InterceptionLayer, you have to capture the error and call the callbacks + deinit yourself.
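That contract might be sketched like this (hypothetical layer code; the real interfaces may differ):

```zig
// Default contract: a layer that fails simply returns the error, and the
// client is responsible for firing req.err_callback and deinitializing
// the Request.
pub fn request(l: *SomeLayer, client: *Client, req: *Request) !void {
    try l.next.request(client, req);
}

// A layer that manages its own Request (RobotsLayer, or the CDP callback
// paths in InterceptionLayer) must handle failure itself:
pub fn requestOwned(l: *RobotsLayer, client: *Client, req: *Request) !void {
    l.next.request(client, req) catch |err| {
        req.err_callback(req, err); // fire the error callback ourselves
        client.deinitRequest(req);  // and clean up our own Request
        return err;
    };
}
```

Keeping these two cases visibly distinct (plain propagation vs. self-managed Requests) is what prevents the double-callback and use-after-free issues raised elsewhere in this review.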
This allows us to write HTTP client layers that add functionality without touching the core HttpClient too much. This is just an experiment right now, so don't merge, but I would love to hear thoughts.
This should ideally make cache revalidation easier to implement, as it would be entirely local to the Cache layer and transparent to the other layers.