http2: Move all socket operations to the session thread #4368

dridi · 2025-07-18T20:11:25Z

This is a port of a change initially made by @mbgrydeland first in Varnish Enterprise for planning reasons.

It's a major change of the architecture:

all socket operations happen in the h2_sess thread
the session's file descriptor switches to non-blocking
coordination with h2_req threads is done with a file descriptor
- based on vefd: Thin wrapper around eventfd with pipe emulation #4348 originally submitted independently
- this enables polling together with the socket
control frames are sent from a fixed-length buffer living in workspace

I was the main reviewer of the change, and I personally extracted several commits (@mbgrydeland did too based on my reviews) in order to reduce the size of the main commit as much as possible:

http2: Rework HTTP/2 to be using non-blocking sockets

After porting the change to this branch, I wish I had more aggressively extracted small changes into self-contained commits because the amount of conflicts to resolve made the port of this commit alone several days of work.

There were a lot of explicit (line) conflicts, and a fair amount of implicit commits because at this point trunk and Varnish Enterprise (based on 6.0) diverged in many small ways:

Varnish Enterprise notably has
- a waiting list "walk-away" feature (see h2: Make requests walk away from the waiting list #3835)
- listen socket management (trf: Add client traffic management to the CLI #3959)
- task tracking with a varnishscoreboard utility
and Varnish Cache has
- a workspace emulator and hardened workspace semantics (ws_emu: Introduce the workspace emulator #3644)
  - I defaulted to ws_emu+asan in my porting endeavor, it actually helped
- a generalized VDP API (Add req.filters and bereq.filters #4035)
- a new asynchronous iterator for delivery (Prepare Varnish-Cache for the age of AI #4209)

Between these major changes (and probably others I'm forgetting) and lots of small core changes accumulated over the years, there was pretty much always a merge conflict for all the commits I cherry-picked. I snuck two original commits at the beginning, to reduce noise when conflicts happened during the port.

I caught two minor bugs in this port, and besides the workspace conflict, things went rather smoothly. It was just tedious to replay, so tedious that I frequently switched to other (important) tasks, which added delays. But I ran the test suite for each commit, stressing them with load, the workspace emulator paired with sanitizers.

The original change went to production, facing real world h2 traffic, that's why the patch series ends with a bunch of bug fixes. For similar workloads it's putting overall a lower load on the system, and it's more reactive. If you wish to use OpenSSL, you can't perform reads and writes concurrently, so moving all socket operations to the session thread also removes a good deal of contention for HTTPS workloads.

I'm requesting @mbgrydeland's review so he can make sure that the port is faithful to our original work. I'm also requesting @walid-git's review because he rebased his trailer implementation (Varnish Enterprise counterpart of #4125) on this architectural change (and eventually he should do the same for #4125). It was reportedly easier to implement with this new architecture.

I have one demand.

Leave structural changes aside if they can wait until after merging this. If they must really happen as part of this change, it will require new commits. I didn't apply the bug fixes in the commit introducing the bugs because keeping the same list of commits will help us track what has or has not been ported either way (among commits relevant in both projects).

I'm otherwise open to squashing whitespace improvements or general polish in the right commits, but I will be away for a couple weeks.

No diff with the --ignore-all-space option.

The connection may be closed before either stream 1 or 3 is done sending a CONTINUATION frame.

This is a minimal API not attempting to keep track of the eventfd counter, and only caring about threads signalling progress on one end, and another thread being able to notice that progress can be made. This is an alternative to condvars when waiting for progress will content with polling concurrently waiting for progress elsewhere. Since locks/condvars don't compose well with file descriptors, the VEFD API offers an alternative that can integrate an existing poll on other file descriptors.

The caller is responsible for passing a req for stream 0 depending on a prior knowledge setup of the session or an opportunistic upgrade from an HTTP/1 session. The signature reflects that h2_sess takes ownership of req.

The signature of h2_new_req() changed to reflect that.

The h2_ou_rel() function is fail-safe, and passing a phony return value further down is misleading.

This clarifies what is being released without looking at the signature.

It becomes unusable at the call site, so the signature should reflect that.

The signature is changed to reflect that the h2_req is no longer usable after calling the function.

This is to be better prepared for handling errors at the call site.

Waiting for the PRISM request immediately after sending the 101 (switch protocols) is a prerequisite to ensure a proper order of events. A subsequent change will ensure that the first frame sent by Varnish is the initial SETTINGS frame.

The code calling h2_rxframe() assumes that the workspace will have been released upon its return. This was not done in the case of early return due to goaway having been sent.

The H/2 RFC requires that the server sends the settings frame during an oppurtunistic upgrade before any stream frames. But we scheduled the stream==1 (H/1 received upgraded request that will get the response over H/2) thread before sending the settings. This technically (though unlikely) means that it could win the race to get the send mutex, and start sending frames before the settings. To clean this up, this patch delays the scheduling of the stream==1 request thread until after the settings have been sent. Also it changes things so that failures to schedule the stream==1 will result in a ENHANCE_YOUR_CALM error being sent to the client. A test case is updated to expect the error message.

The H2 spec uses the same bit for frame flags of the same meaning, regardless of the frame they are used in. Simplify and reduce number of headers by defining each only once.

At this point, we alrerady have a complete frame.

It can be used anywhere we expect short writes using writev(). It's hosted by cache_session.c until there is enough momentum to give VIOV its own C file.

This makes cache_http2_proto.c smaller.

The t_window field in particular could be confused with the naming convention for timestamps. Using rx_ and tx_ prefixes removes the ambiguity.

This is not just any rx thread, and in the future the h2_sess thread will be in charge of sending frames too. While at it, properly check the h2 argument, properly compare threads, and add an h2_req counterpart since any thread working with h2 that is not the session thread, must be a request stream thread.

And make it operate on the r2_req only, from which the h2_sess can be reached.

This event FD will be used to signal from client threads to the session thread when sending data.

This large commit switches H2 sockets to be using non-blocking sockets instead of blocking operations. It also moves all IO handling into the session thread (where previously write IO was performed both by the session thread and request threads, using a mutex to give exclusive access to the writing). The non-blocking IO is handled as reentrant functions that will write what they can when called. A poll() is used in order to monitor the socket state and notice when further progress can be made. Request threads will queue their payload writes for to the session thread, which will pick them up and execute both the framing and the writes. The request thread is then signalled when the payload has been transmitted. Because the change to non-blocking on the socket throws the rug from beneath most of the core H2 functionality, this is one large commit that changes everything in one go. A send buffer for control frames is allocated at the beginning of the H2 workspace for stream 0. HTC Pipe-lining can no longer afford a complete rollback, and was changed to figure how to reset or roll back the HTC workspace at HTC_RxInit() time, or leave it alone. The HTC owner is in charge of providing a workspace snapshot when needed.

We have been failing to enforce a MUST requirement of the H2 spec that states that a PRISM MUST be followed by a SETTINGS frame (which may be empty). Note that this also goes for H1->H2 upgrades, even though the settings were also transferred as an H1 header. Relevant RFC references: rfc7540,l,579,637 rfc7540,l,482,485 This change comes with a need to adjust several of our test cases that failed to adhere to this requirement. The test case changes mostly consist of not doing the preface manually, allowing varnishtest to do the needful.

This centralizes the logic around h2 stream state changes to one function.

When we are sending a GOAWAY frame during session shutdown, we have to make sure not to cancel the current large frame if any. Fix this by changing the send framework shutdown into a two stage operation. Prior to sending the GOAWAY we cancel any queued large frames, then we attempt to get the GOAWAY out the door, and finally cancel the current large frame if still present. Better diff with the --ignore-all-space option.

An assertion was wrongly being made that when killing any request, the hpack lock would have been released. This of course only is true when we are killing the request holding the hpack lock. Better diff with the --ignore-all-space option.

If we end up killing a req that is holding the hpack lock, we will have lost the compression state. This change makes sure to set a connection error if this happens, which in turn will ensure that the connection is closed.

The per-frame handler functions are not allowed to call h2_kill_req() from within its handlers. This is because the caller will be holding a pointer to it, and h2_kill_req() would free the req if it happens to not currently be scheduled. This patch fixes h2_rx_rst_stream() so that it adheres to this rule.

This unsets the `struct h2_req` magic value as held in the req's workspace allocation when releasing the req. This ensures that a dangling `struct h2_req` pointer would fail its magic check after the req has been released.

The window update handling code would interpret (r2==NULL) as meaning the connection window always. This is faulty logic, as r2 will be NULL also for a stream that has been closed. The RFC clearly states that a window update for a closed stream must be ignored. Fix this by testing on the received stream number being zero rather than (r2==NULL) to determine connection window updates. One way this issue can become a handling problem is that multiple faulty increases of the connection window can make us overflow the connection window maximum size, causing us to send a flow control error to the client.

It is no longer needed to hold the lock when manipulating the open streams list. This list is after the redesign only ever accessed by the session thread.

Avoid the data race on `r2->scheduled` by holding the session mutex when testing it.

dridi · 2025-07-21T13:41:12Z

Hello from a bus in the middle of nowhere. It was brought to my attention that someone was "not terribly happy about" me doing a "and don't do anything else, while I'm away on vacation" salute.

Apologies for giving this impression, I'll try to clarify what I wrote on a late Friday night since I was not sure I would get a reliable-enough internet access today to submit it.

Leave structural changes aside if they can wait until after merging this.

What I mean here is not "don't do anything else" but rather, if you can help it, "do anything else" after merging. In other words, if you register review comments that could be applied afterwards, let's apply them afterwards.

If they must really happen as part of this change, it will require new commits.

If you must insist on making changes before merging, "do anything else" as new commits at the end of the patch series.

I didn't apply the bug fixes in the commit introducing the bugs because keeping the same list of commits will help us track what has or has not been ported either way (among commits relevant in both projects).

I could have submitted a smaller patch series, but that would make coordination harder for us, so in order to facilitate that kind of contribution (already gone to production) from us, it was easier to retain the same list of applicable patches.

Please do "anything else" while I'm on vacation, except introducing structural changes in the middle of the patch series. 🫡

mbgrydeland · 2025-07-24T13:42:53Z

bin/varnishd/cache/cache_ws_common.c


 #include "cache_varnishd.h"

+uintptr_t const ws_pipeline_rollback = (uintptr_t)&ws_pipeline_rollback;


Shouldn't this be const uintptr_t ws_pipeline_rollback?

I tripped on not-quite-constant symbols in this form:

// .h extern const char *sentinel_value; // .c const char *sentinel_value = "literal";

So to avoid this, I took the habit of adding the const modifier to the symbol when it is meant to never change, instead of its type. While I agree that in practice it does not make a difference here for this non-pointer type, it better conveys (to me) the intent of making ws_pipeline_rollback read-only.

See 0cfb214 for a recent example.

mbgrydeland

I have gone through all the patches, and I believe this is a faithful port of the code. Only a couple of minor comments made.

mbgrydeland · 2025-07-28T12:55:13Z

bin/varnishd/cache/cache_session.c

 /*--------------------------------------------------------------------*/

 void
 HTC_RxInit(struct http_conn *htc, struct ws *ws)


I had a hard time groking the logic wrt to the pipeline_snap usage. Wouldn't it be cleaner to add a snapshot argument to HTC_RxInit(), and have each call site give the right argument. This argument looks to me like it could be done unconditionally on each of the call sites, making the code easier to follow and cleaner IMO.

Fair enough, and that would avoid increasing the size of struct http_conn.

mbgrydeland · 2025-07-28T14:01:39Z

bin/varnishd/http2/cache_http2_proto.c

+	r2->magic = 0;
+	req->transport_priv = NULL;


I believe these two lines need to be moved after Req_Cleanup(). This due to Req_Cleanup() calling VDP_Fini(), which I guess may for some VDP call on code expecting to see the complete structure.

Good catch, I forgot about the increased scope of Req_Cleanup() in this branch.

desdic · 2025-09-08T09:33:16Z

I tried to apply this patch and then rebase master (trunk) and this seems to solve the issue we had with H2 where we had to set h2_window_timeout to something high in order to avoid GOAWAY. Now its been running with this patch for about a week and I cannot reproduce the issue we had.

dridi · 2025-09-08T09:36:53Z

Thank you for your feedback. In order to move this forward, I plan to revert the latest VSV patches at the beginning of the patch series, and add the patches we originally wrote for this new architecture at the end.

But I'm not starting without a sign that this will receive attention.

desdic · 2025-09-15T08:56:05Z

Not sure if its related to this patch but it could be .. after running it for an extended period we got a crash

Child (23116) Panic at: Sat, 13 Sep 2025 17:08:11 GMT
Assert error in h2_del_sess(), http2/cache_http2_session.c line 185:
  Condition(!WS_IsReserved(req->ws)) not true.
version = varnish-trunk revision 43fbfd6051f0ea3760d04b893ebebcb5351ad263, vrt api = 21.0
ident = Linux,6.8.0-51-generic,x86_64,-jlinux,-smalloc,-sdefault,-hcritbit,epoll
now = 11612746.444027 (mono), 1757783219.043331 (real)
Backtrace:
  ip=0x5931b56e2ab5 sp=0x734bb316a300 <VBT_format+0x65>
  ip=0x5931b56562cd sp=0x734bb316a420 <pan_ic+0x24d>
  ip=0x5931b56e1d39 sp=0x734bb316a580 <VAS_Fail+0x19>
  ip=0x5931b569e19b sp=0x734bb316a590 <h2_del_sess+0x11b>
  ip=0x5931b569ed37 sp=0x734bb316a5c0 <h2_new_session+0x8a7>
  ip=0x5931b5683122 sp=0x734bb316a870 <WRK_Thread+0x3c2>
  ip=0x5931b568370c sp=0x734bb316b420 <pool_thread+0x3c>
  ip=0x734bb329caa4 sp=0x734bb316b450 <pthread_condattr_setpshared+0x684>
  ip=0x734bb3329c3c sp=0x734bb316b500 <__clone+0x24c>
errno = 110 (Connection timed out)
argv = {
  [0] = \"/usr/sbin/varnishd\",
  [1] = \"-a\",
  [2] = \":80\",
  [3] = \"-a\",
  [4] = \"127.0.0.1:81,PROXY\",
  [5] = \"-a\",
  [6] = \"/var/run/varnish.sock,PROXY,user=vcache,group=varnish,mode=666\",
  [7] = \"-f\",
  [8] = \"/etc/varnish/onecom.vcl\",
  [9] = \"-T\",
  [10] = \":6082\",
  [11] = \"-t\",
  [12] = \"120\",
  [13] = \"-l\",
  [14] = \"512M\",
  [15] = \"-s\",
  [16] = \"malloc,735G\",
  [17] = \"-S\",
  [18] = \"/etc/varnish/secret\",
  [19] = \"-p\",
  [20] = \"vcc_err_unref=false\",
  [21] = \"-p\",
  [22] = \"thread_pool_stack=192k\",
  [23] = \"-p\",
  [24] = \"workspace_client=128k\",
  [25] = \"-p\",
  [26] = \"workspace_backend=128k\",
  [27] = \"-p\",
  [28] = \"send_timeout=3600\",
  [29] = \"-p\",
  [30] = \"transit_buffer=1M\",
  [31] = \"-p\",
  [32] = \"h2_window_timeout=200\",
  [33] = \"-p\",
  [34] = \"timeout_idle=60\",
  [35] = \"-p\",
  [36] = \"vsl_mask=-H2TxHdr,-H2TxBody\",
  [37] = \"-p\",
  [38] = \"feature=+http2\",
  [39] = \"-p\",
  [40] = \"thread_pool_max=8000\",
  [41] = \"-p\",
  [42] = \"http_max_hdr=128\",
}
pthread.self = 0x734bb316c6c0
pthread.name = (cache-worker)
pthread.attr = {
  guard = 4096,
  stack_bottom = 0x734bb313d000,
  stack_top = 0x734bb316d000,
  stack_size = 196608,
}
thr.req = 0x7341998453e0 {
  vxid = 1017251858, transport = HTTP/2
  step = Req Step transport
  req_body = NULL,
  err_code = 1, err_reason = (null),
  restarts = 0, esi_level = 0,
  vary_b = (nil), vary_e = (nil),
  d_ttl = 0.000000, d_grace = 0.000000,
  storage = (nil),
  sess = 0x727f88caea20 {
    fd = 216, vxid = 1017251858,
    t_open = 1757783218.307428,
    t_idle = 1757783219.043316,
    ws = 0x727f88caea60 {
      id = \"ses\",
      {s, f, r, e} = {0x727f88caeab8, +168, (nil), +576},
    },
    transport = HTTP/2 {
      h2_sess = 0x734bb316a660 {
        refcnt = 0, bogosity = 0, error = H2CE_RAPID_RESET
        open_streams = 0, highest_stream = 113, goaway_last_stream = 0,
        local_settings = {0x1000, 0x1, 0x64, 0xffff, 0x4000, 0x8000, 0x0, 0x0, 0x0},
        remote_settings = {0x1000, 0x0, 0x64, 0x200000, 0x4000, 0x7fffffff, 0x0, 0x0, 0x0},
        {rxf_len, rxf_type, rxf_flags, rxf_stream} = {4, 3, 0x0, 57},
    }
    client = 2a02:a18:9159:9c01:2906:de9e:9c8a:519d 55668 /var/run/varnish.sock,
    local.endpoint = /var/run/varnish.sock,
    local.socket = a2,
    local.ip = 0.0.0.0:0,
    remote.ip = 0.0.0.0:0,
    server.ip = 2a02:2350:6::b788:3355:443,
    client.ip = 2a02:a18:9159:9c01:2906:de9e:9c8a:519d:55668,
  },
  ws = 0x734199845538 {
    id = \"req\",
    {s, f, r, e} = {0x73419984b140, +0, +107128, +107128},
  },
  http_conn = 0x73419984b0b8 {
    fd = 216 (@0x727f88caea44),
    doclose = null(Not Closing)
    ws = 0x734199845538 {
      [Already dumped, see above]
    },
    {rxbuf_b, rxbuf_e} = {0x73419984b140, 0x73419984b140},
    {pipeline_b, pipeline_e} = {(nil), (nil)},
    content_length = 0,
    body_status = NULL,
    first_byte_timeout = 0.000000,
    between_bytes_timeout = 0.000000,
  },
  http[req] = 0x7341998455e8 {
    ws = NULL
    hdrs {
    },
  },
  vdc = 0x73419984b058 {
    .magic = 0x00000000 EXPECTED: VDP_CTX_MAGIC=0xee501df7
  }
  vcl[vcl] = NULL
  flags = {
  },
  privs = 0x7341998455d8 {
  },
  top = 0x73419984b120 {
    req = 0x7341998453e0 {
      [Already dumped, see above]
    },
    privs = 0x73419984b138 {
    },
    vcl[vcl0] = NULL
  },
},
thr.busyobj = NULL
thr.worker = 0x734bb316b320 {
  ws = 0x734bb316b3a0 {
    id = \"wrk\",
    {s, f, r, e} = {0x734bb316a880, +0, (nil), +2040},
  },
  VCL::method = BACKEND_RESPONSE,
  VCL::methods = {},
},
vmods = {
  std = {p=0x734bb2a2c500, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
    vcs=\"43fbfd6051f0ea3760d04b893ebebcb5351ad263\", version=\"Varnish trunk\"},
  one = {p=0x734bb2a2c580, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
    vcs=\"5a495ac72e70c4821f77fe99804209a90ea35ae6\", version=\"libvmod-one 0.3\"},
  maxminddb = {p=0x734bb2a2c680, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
    vcs=\"8d3077d81ff50a9686978d0d72e06654967fdee1\", version=\"libvmod-maxminddb 0.3\"},
  directors = {p=0x734bb2a2c700, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
    vcs=\"43fbfd6051f0ea3760d04b893ebebcb5351ad263\", version=\"Varnish trunk\"},
  xkey = {p=0x734bb2a2c780, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
    vcs=\"ee73bb6cb5d7419dc71daa4baaae076bf6bbfb4b\", version=\"varnish-modules 0.26.0\"},
  header = {p=0x734bb2a2c800, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
    vcs=\"ee73bb6cb5d7419dc71daa4baaae076bf6bbfb4b\", version=\"varnish-modules 0.26.0\"},
  cookie = {p=0x734bb2a2c900, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
    vcs=\"43fbfd6051f0ea3760d04b893ebebcb5351ad263\", version=\"Varnish trunk\"},
  vsthrottle = {p=0x734bb2a2c980, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
    vcs=\"ee73bb6cb5d7419dc71daa4baaae076bf6bbfb4b\", version=\"varnish-modules 0.26.0\"},
  bodyaccess = {p=0x734bb2a2ca00, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
    vcs=\"ee73bb6cb5d7419dc71daa4baaae076bf6bbfb4b\", version=\"varnish-modules 0.26.0\"},
  blob = {p=0x734bb2a2ca80, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
    vcs=\"43fbfd6051f0ea3760d04b893ebebcb5351ad263\", version=\"Varnish trunk\"},
  digest = {p=0x734bb2a2cb80, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
    vcs=\"bf26880ece69f2cd321cbf57dd59b687c24920e3\", version=\"libvmod-digest 1.0.3\"},
  xcounter = {p=0x734bb2a2cc00, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
    vcs=\"6d9d8d3ac261a49b11453cdd37da5594619ca335\", version=\"libvmod-xcounter 0.1\"},
  re = {p=0x734bb2a2cc80, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
    vcs=\"79313a2e71b6a2f478dfa57c821697a2f3da00cc\", version=\"libvmod-re 2.6.0\"},
  taskvar = {p=0x734bb2a2cd80, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
    vcs=\"ce0f348f87f63aca9b2908265d4b4028830d9530\", version=\"varnish-objvar 0.1\"},
  topvar = {p=0x734bb2a2ce00, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
    vcs=\"ce0f348f87f63aca9b2908265d4b4028830d9530\", version=\"varnish-objvar 0.1\"},
  querystring = {p=0x734bb2a2cf00, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
    vcs=\"NOGIT\", version=\"libvmod-querystring 2.0.3\"},
},
pools = {
  pool = 0x734b90400140 {
    nidle = 396,
    nthr = 1202,
    lqueue = 0
  },
  pool = 0x734b90400000 {
    nidle = 286,
    nthr = 1130,
    lqueue = 0
  },
},

walid-git · 2025-09-19T11:01:43Z

There is something confusing here:

err_code = 1, err_reason = (null),

req->err_code is used to make the difference between an opportunistic h2 upgrade and a prior knowledge one, and the value 1 corresponds to H2_PU_MARKER (prior knowledge). However, req->err_code is cleared here, at the beginning of the h2 session.
But, the panic indicates that the h2 session has been running for a while and has seen several streams:

open_streams = 0, highest_stream = 113, goaway_last_stream = 0,

These two items seem to be contradicting each other.

walid-git · 2025-09-19T11:17:40Z

@desdic I just noticed something, it seems that the varnish instance from which the panic was reported is not running this branch:

version = varnish-trunk revision 43fbfd6, vrt api = 21.0

Could you open a separate issue ?

walid-git · 2025-09-19T16:19:42Z

Could you open a separate issue ?

Nevermind: #4396

walid-git · 2025-10-01T14:54:31Z

Thank you for your feedback. In order to move this forward, I plan to revert the latest VSV patches at the beginning of the patch series, and add the patches we originally wrote for this new architecture at the end.

But I'm not starting without a sign that this will receive attention.

Bugwash: @nigoroll Thinks that he will be able to look at it this month.
He also requested some clean up on the PR.
I have cherry picked a9e1667 and 6e33056 to master and @dridi will do the rebase.

dridi and others added 30 commits July 18, 2025 21:23

vtc: White space cleanup in t02027

a9e1667

No diff with the --ignore-all-space option.

vtc: Stabilize r2387

6e33056

The connection may be closed before either stream 1 or 3 is done sending a CONTINUATION frame.

build: Detect eventfd() at configure time

4c0b6f5

vefd: Use eventfd() when available

845dfc5

http2_session: Always pass a req to h2_init_sess()

bd12dce

The caller is responsible for passing a req for stream 0 depending on a prior knowledge setup of the session or an opportunistic upgrade from an HTTP/1 session. The signature reflects that h2_sess takes ownership of req.

http2_proto: Take immediate ownership of req in h2_req

04fbc50

The signature of h2_new_req() changed to reflect that.

http2_session: Remove misleading return value

e37c41a

The h2_ou_rel() function is fail-safe, and passing a phony return value further down is misleading.

http2_session: Rename h2_ou_rel() to h2_ou_rel_req()

b0c75fd

This clarifies what is being released without looking at the signature.

http2_session: Take ownership of req in h2_ou_rel_req()

433106d

It becomes unusable at the call site, so the signature should reflect that.

http2_proto: Make h2_del_req() take the struct h2_req

b19d4e7

The signature is changed to reflect that the h2_req is no longer usable after calling the function.

http2_session: Directly return the h2_req set up

950a578

This is to be better prepared for handling errors at the call site.

http2_proto: Expose h2_tx_goaway()

b7a2fe6

http2_proto: Release WS in h2_rxframe() on early return

24277a7

The code calling h2_rxframe() assumes that the workspace will have been released upon its return. This was not done in the case of early return due to goaway having been sent.

http2: Consolidate frame flags

ce499f9

The H2 spec uses the same bit for frame flags of the same meaning, regardless of the frame they are used in. Simplify and reduce number of headers by defining each only once.

http2_proto: Drain stale comment

dd79a62

At this point, we alrerady have a complete frame.

viov: New VIOV_prune() function

eec922f

It can be used anywhere we expect short writes using writev(). It's hosted by cache_session.c until there is enough momentum to give VIOV its own C file.

http2_send: Adopt VIOV_prune()

7b68db3

http2_deliver: Rename VDP callbacks to h2_vdp_*()

a758f8d

http2_deliver: Prepare response flags upfront

cc63579

http2_deliver: Prepare minimal response flags upfront

4b7150d

http2_reqbody: Manage req body in its own file

d015983

This makes cache_http2_proto.c smaller.

http2: Rename h2_req::[rt]_window fields

25eb677

The t_window field in particular could be confused with the naming convention for timestamps. Using rx_ and tx_ prefixes removes the ambiguity.

http2_send: Move h2_errcheck() to cache_http2_proto.c

2610bc0

And make it operate on the r2_req only, from which the h2_sess can be reached.

http2_session: Associate an eventfd to each h2_sess

319c043

This event FD will be used to signal from client threads to the session thread when sending data.

mbgrydeland added 9 commits July 18, 2025 21:23

http2_proto: Centralize stream state changes

1553eca

This centralizes the logic around h2 stream state changes to one function.

H2: Fix a bad assertion being made

78b0d0b

An assertion was wrongly being made that when killing any request, the hpack lock would have been released. This of course only is true when we are killing the request holding the hpack lock. Better diff with the --ignore-all-space option.

H2: Clear the h2_req magic value when freeing the req

359ee11

This unsets the `struct h2_req` magic value as held in the req's workspace allocation when releasing the req. This ensures that a dangling `struct h2_req` pointer would fail its magic check after the req has been released.

H2: Don't lock when adding a new req

d9691fc

It is no longer needed to hold the lock when manipulating the open streams list. This list is after the redesign only ever accessed by the session thread.

H2: Hold session lock while checking the req scheduled status

022025f

Avoid the data race on `r2->scheduled` by holding the session mutex when testing it.

dridi requested review from mbgrydeland and walid-git July 18, 2025 20:11

dridi added r=trunk c=H/2 labels Jul 18, 2025

mbgrydeland reviewed Jul 24, 2025

View reviewed changes

mbgrydeland approved these changes Jul 28, 2025

View reviewed changes

walid-git mentioned this pull request Sep 19, 2025

H2 workspace related panic #4396

Closed

nigoroll self-requested a review October 1, 2025 13:06


		#include "cache_varnishd.h"

		uintptr_t const ws_pipeline_rollback = (uintptr_t)&ws_pipeline_rollback;

http2: Move all socket operations to the session thread #4368

Are you sure you want to change the base?

http2: Move all socket operations to the session thread #4368

Conversation

dridi commented Jul 18, 2025

Uh oh!

dridi commented Jul 21, 2025

Uh oh!

mbgrydeland Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dridi Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

mbgrydeland left a comment

Choose a reason for hiding this comment

Uh oh!

mbgrydeland Jul 28, 2025

Choose a reason for hiding this comment

Uh oh!

dridi Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

mbgrydeland Jul 28, 2025

Choose a reason for hiding this comment

Uh oh!

dridi Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

desdic commented Sep 8, 2025

Uh oh!

dridi commented Sep 8, 2025

Uh oh!

desdic commented Sep 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

walid-git commented Sep 19, 2025

Uh oh!

walid-git commented Sep 19, 2025

Uh oh!

walid-git commented Sep 19, 2025

Uh oh!

walid-git commented Oct 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mbgrydeland Jul 24, 2025 •

edited

Loading

desdic commented Sep 15, 2025 •

edited

Loading