
Conversation

@dridi
Member

@dridi dridi commented Mar 5, 2024

This is the third iteration of this change, following a first iteration in #3992 that I spontaneously replaced with a second iteration in #4032, which received very helpful reviews.

In this new installment I took the design from #4032 and moved one more step in that direction.

The first important change is the strict semantics of the OC_F_BUSY flag. Its main purpose is now to inform the decision to disembark a request into an objhead waiting list. It still applies in other places to identify objects meant to be looked up for which the caching outcome is not yet known.

As a result, this flag is only ever raised in hsh_insert_busyobj().

In a couple of places, this flag was checked as a means of coordination between a client task and a fetch task. Coordinating these tasks is the role of a boc, so the boc grew the responsibility of (indirectly) clearing the OC_F_BUSY flag in the fetch task.

In that regard, there are now stronger guarantees around this flag:

  • it only applies to potentially shareable objects (excluding req.body, synthetic and private objects)
  • there are four general paths that clear the flag
    • when the beresp is ready for streaming (if beresp.do_stream)
    • when the beresp fetch completes (if ! beresp.do_stream)
    • if the fetch fails before the step implied by beresp.do_stream
    • if the fetch task is never created (the busy objcore is withdrawn)
  • clearing the flag, and only that, can trigger a rush
    • with the notable exception of HSH_Insert(), not involving a fetch
  • the rush policy is derived from the objcore flags
    • the new policy for shareable objects is to wake up all requests

The convergence of all paths towards consistently dropping the OC_F_BUSY flag as soon as there is a caching outcome (or lack thereof) allows finer-grained waiting list activity (far fewer spurious wake-ups) with lower latency.

This should result in overall less locking on objheads.
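
For illustration, here is a minimal, self-contained C sketch of the flag-driven rush policy described above. The flag values and the rush_count() helper are made-up stand-ins, not the actual Varnish code, and the wake-all branch reflects the policy listed above (a later review favored keeping an exponential rush even for cacheable objcores, so treat it as illustrative):

#include <assert.h>
#include <limits.h>

/* Hypothetical flag bits, modeled on the objcore flags discussed above. */
#define OC_F_BUSY	(1 << 0)	/* caching outcome not known yet */
#define OC_F_FAILED	(1 << 1)	/* the fetch failed */
#define OC_F_WITHDRAWN	(1 << 2)	/* the fetch task was never created */
#define OC_F_PRIVATE	(1 << 3)	/* never shareable, never rushed */

/*
 * How many waiting requests to wake up once OC_F_BUSY is cleared:
 * failed or withdrawn objcores wake a single request so that it can
 * retry the fetch, a usable shareable object wakes everybody.
 */
static int
rush_count(unsigned oc_flags)
{
	assert((oc_flags & OC_F_BUSY) == 0);	/* rush only after unbusy */
	if (oc_flags & OC_F_PRIVATE)
		return (0);			/* nothing to share */
	if (oc_flags & (OC_F_FAILED | OC_F_WITHDRAWN))
		return (1);			/* one retry candidate */
	return (INT_MAX);			/* wake all waiters */
}

int
main(void)
{
	assert(rush_count(OC_F_FAILED) == 1);
	assert(rush_count(OC_F_WITHDRAWN) == 1);
	assert(rush_count(0) == INT_MAX);
	return (0);
}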

A rushed request re-embarks at the lookup step with the objcore that just dropped its busy flag. This means that we can perform a cache hit before entering the large lookup critical section, if the object is shareable and matches the request. Even if the objcore was already stale when cached, it can now be served to all waiting clients. This enables a response-wide no-cache behavior (as opposed to per-header) that is currently prevented by the built-in VCL, but is now within the realm of the possible.

A little bit of trivia: the exact same test case covering the partial no-cache support exists in all iterations of this change.

Being able to serve stale (but valid) waiting list hits solves the serialization problem for the most part. It shifts the problem towards incompatible variants, which create spurious wake-ups that are compensated by all the ones that were eliminated. The situation gets much worse when there are many variants, for example with a Vary: User-Agent header (which should rather be a reason to make such responses uncacheable).

In that case the vary_notice parameter can help diagnose such scenarios. A new vary_limit parameter could also be added as a mitigation, resulting in pass transactions above that threshold. This is outside the scope of this patch series, since this problem already exists with very similar detrimental effects.
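
To make the hypothetical vary_limit mitigation concrete, here is a sketch; the parameter and these names do not exist in Varnish today, this only illustrates the idea of passing once an objhead accumulates too many variants:

/* Illustrative stand-ins only; vary_limit is an imaginary tunable. */
static unsigned vary_limit = 32;	/* 0 would disable the mitigation */

/* Decide during lookup whether a new variant should be passed instead
 * of cached, to cap waiting-list churn on objheads with many variants. */
static int
variant_should_pass(unsigned n_variants)
{
	return (vary_limit > 0 && n_variants >= vary_limit);
}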

@dridi
Member Author

dridi commented Mar 19, 2024

There was one waiting list quirk I was aware of that I completely forgot: the safety net for vcl_backend_error.

Since the default ttl+grace+keep is zero upon entering vcl_backend_error, we currently inject an artificial lifetime for the synthetic object so that the waiting list gets a chance to see it. With this patch series this is no longer needed, because the waiting list can now process a lack of lifetime for a shareable object (aka cache-control: no-cache).

It is debatable whether we want to keep the artificial lifetime: on one hand it buffers backend errors for a little while, on the other hand, for epiphenomenal backend errors, it prevents new fetch attempts for a little while.

Either way, the waiting list is no longer a problem, see 5bfcd0f.

@dridi
Member Author

dridi commented Apr 8, 2024

#4085 added a minor merge conflict, I will try to deal with it before bugwash.

@dridi dridi force-pushed the partial_nocache branch 2 times, most recently from 23fd38b to 8ed6d3d on April 8, 2024 12:49
@dridi
Member Author

dridi commented Apr 9, 2024

After a first offline review with @bsdphk there are several things to change:

  • keep an exponential rush even for cacheable objcores
  • be less formal on [re]validation through waiting list hits
  • maybe keep the vcl_backend_error safety net

The third point is not a strict requirement and is up for debate. If we keep the safety net, we should prune its keep period (@AlveElde noticed that the safety net goes higher than the default value for shortlived, defeating its purpose in this scenario).

@dridi
Member Author

dridi commented May 6, 2024

Patch series updated as per @bsdphk's review.

Only two commits changed, the aforementioned ones.

@dridi
Member Author

dridi commented May 6, 2024

I'm looking at the waiting list coverage that is failing in CI; I was able to reproduce it locally with some load.

@dridi
Member Author

dridi commented May 6, 2024

This might be a bug in varnishtest that I don't understand yet. When a timeout happens, the server spec ends before the final txresp for no apparent reason (in the logs).

I made the following change:

--- bin/varnishtest/tests/r02422.vtc
+++ bin/varnishtest/tests/r02422.vtc
@@ -7,7 +7,7 @@ server s1 {
 
        rxreq
        # wait until the new version is ready
-       delay 1
+       loop 10 {delay 0.1}
        txresp -hdr "Etag: 6"
 } -start
 

And the logs for s1 look like this:

**** s1    rxhdr|GET / HTTP/1.1\r
**** s1    rxhdr|Host: 127.0.0.1\r
**** s1    rxhdr|User-Agent: c4\r
**** s1    rxhdr|X-Forwarded-For: 127.0.0.1\r
**** s1    rxhdr|Via: 1.1 v1 (Varnish/trunk)\r
**** s1    rxhdr|Accept-Encoding: gzip\r
**** s1    rxhdr|X-Varnish: 1014\r
**** s1    rxhdr|\r
**** s1    rxhdrlen = 148
**** s1    http[ 0] |GET
**** s1    http[ 1] |/
**** s1    http[ 2] |HTTP/1.1
**** s1    http[ 3] |Host: 127.0.0.1
**** s1    http[ 4] |User-Agent: c4
**** s1    http[ 5] |X-Forwarded-For: 127.0.0.1
**** s1    http[ 6] |Via: 1.1 v1 (Varnish/trunk)
**** s1    http[ 7] |Accept-Encoding: gzip
**** s1    http[ 8] |X-Varnish: 1014
**** s1    bodylen = 0
**   s1    === loop 10 {delay 0.1}
**** s1    Loop #0
**   s1    === delay 0.1
***  s1    delaying 0.1 second(s)
**** dT    1.231
**** s1    Loop #1
**   s1    === delay 0.1
***  s1    delaying 0.1 second(s)
**** dT    1.331
**** s1    Loop #2
**** s1    Loop #3
**** s1    Loop #4
**** s1    Loop #5
**** s1    Loop #6
**** s1    Loop #7
**** s1    Loop #8
**** s1    Loop #9
***  s1    shutting fd 5
**   s1    Ending

So c4 is left hanging without a response and eventually times out.

@dridi
Member Author

dridi commented May 6, 2024

I found the culprit: the test became racy and varnishtest was misleading (and I was also not paying enough attention).

@dridi dridi force-pushed the partial_nocache branch from ba76e89 to 66cf4f1 on May 6, 2024 12:36
@dridi
Member Author

dridi commented May 6, 2024

I added 5 patches at the beginning of the series to help future troubleshooting and close the race before making it too easy to trigger.

@dridi
Member Author

dridi commented May 6, 2024

Never mind, I only made the problem harder to reproduce on my end with better synchronization; there is something wrong linked to bringing back the exponential rush for hits.

@dridi dridi force-pushed the partial_nocache branch from 66cf4f1 to 8aaeced on May 6, 2024 15:23
@dridi
Member Author

dridi commented May 6, 2024

The problem was that the exponential rush of hits was introduced without safeguards, allowing requests to rush each other in turn when a waiting list hit was followed by a VCL restart.

In r02422 the clients c5 and c6 would restart upon a regular hit and they are both supposed to enter c4's waiting list. However, if one of them "lost" the race against c4, it would fail a third fetch and rush the other one, which in turn would fail a fetch and rush the first one that had meanwhile re-entered the waiting list, until one of them exceeded max_restarts.

The fix introduces a fence between clients entering a waiting list and objcores rushing the waiting list. The objcores can only rush clients that entered before their initial rush.
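
Here is a minimal sketch of that fence, with made-up types and names (the real implementation lives in the objhead and waiting-list code and differs in detail, for example in when exactly the objcore captures the generation): requests record the objhead generation when they disembark, a busy objcore captures it when inserted, and its rushes only wake requests that entered before that point.

#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins, not actual Varnish structures. */
struct waiter_stub {
	uint64_t		gen;	/* objhead generation at disembark time */
	struct waiter_stub	*next;
};

struct objhead_stub {
	uint64_t		gen;	/* only bumped under the objhead lock */
	struct waiter_stub	*waiting_list;
};

/* Called under the objhead lock when a request disembarks. */
static void
enter_waiting_list(struct objhead_stub *oh, struct waiter_stub *w)
{
	w->gen = oh->gen;
	w->next = oh->waiting_list;
	oh->waiting_list = w;
}

/* Called under the objhead lock when a busy objcore is inserted;
 * the returned generation is the fence for its future rushes. */
static uint64_t
insert_busy_objcore(struct objhead_stub *oh)
{
	return (++oh->gen);
}

/* Count how many requests a rush with generation oc_gen may wake up,
 * capped at max; the actual wake-up and list removal are elided.
 * Requests that entered later already saw the unbusied objcore and
 * are left for a subsequent fetch outcome. */
static int
rush(const struct objhead_stub *oh, uint64_t oc_gen, int max)
{
	const struct waiter_stub *w;
	int woken = 0;

	for (w = oh->waiting_list; w != NULL && woken < max; w = w->next)
		if (w->gen < oc_gen)
			woken++;
	return (woken);
}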

@dridi dridi force-pushed the partial_nocache branch from 8aaeced to a31c916 on June 10, 2024 06:16
@dridi
Member Author

dridi commented Jun 10, 2024

Rebased onto master to resolve minor conflict with #4109 (touching adjacent lines in cache_varnishd.h).

@dridi
Member Author

dridi commented Dec 9, 2024

Force push:

  • rebase against current trunk
  • squashme commits squashed where they belong
  • two more commits from the enterprise side of the fence

@dridi
Member Author

dridi commented Jul 11, 2025

Rebased onto current master (8cbf914) as requested in #4365 (comment).

@nigoroll
Member

To get ahead, I pushed a rebase to https://github.com/nigoroll/varnish-cache/tree/partial_nocache

@dridi
Member Author

dridi commented Jul 11, 2025

How is that different from what I did?

@nigoroll
Member

Sorry, I guess I had seen an outdated head of your branch. I am currently traveling and it might be that bad connectivity made me draw wrong conclusions.

@dridi
Member Author

dridi commented Jul 11, 2025

I thought I had done something wrong and I couldn't see the difference...

@nigoroll
Member

I thought I had done something wrong and I couldn't see the difference...

Yes, it was my fault, sorry again.

@dridi
Member Author

dridi commented Aug 26, 2025

After @hermunn's review I had another one from @mbgrydeland and I pushed 8 commits that came out of our discussions, half of which were directly requested by Martin.

The actual review happened on the varnish enterprise side and we are moving forward with this change. It has seen high traffic in production without causing any problem, and while this exposure to production traffic may not offer comprehensive coverage of this change, we believe that it results in all-around better behavior of waiting lists.

Please review the additional commits and if approved, I can clean up the patch series, and then rebase it onto current master to resolve the minor merge conflict with #4362.

@dridi
Member Author

dridi commented Aug 26, 2025

Two test cases failed in the sanitizer job; both of them ran into EAGAIN in ASAN's pthread_create() wrapper, which is what I suggested addressing in #4352 (comment).

@nigoroll
Member

nigoroll commented Aug 27, 2025

bugwash: OK, @dridi please fight any fires before release

(edit, bugwash verdict changed)

dridi and others added 14 commits August 27, 2025 16:18
This reduces a great deal of spurious activity on the private_oh waiting
list and removes unnecessary conditionals when dealing with cacheable (or
at least searchable) objects.
The OC_F_BUSY flag is used to decide whether to disembark a request into
an objhead's waiting list. Therefore, it makes sense to only give this
flag to objcores for which the outcome may be a shareable object.

This excludes synthetic and private (req.body and passed req) objcores,
and objects inserted with HSH_Insert(). Inserted objects are never busy
since they come already ready to use.

Pass transactions don't need this flag either. The OC_F_BUSY flag was
somewhat conflated with struct busyobj in the fetch state machine. We
don't need OC_F_BUSY to convey that a fetch task is coordinated with a
req task, oc->boc is already doing that. The OC_F_BUSY|OC_F_PRIVATE bits
should be considered an oxymoron since OC_F_BUSY drives waiting list
activity and OC_F_PRIVATE is for objects that are not shareable by
definition.
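
A tiny sketch of that invariant, with hypothetical helpers standing in for hsh_insert_busyobj() and the private paths (the flag names match the text above, everything else is illustrative):

#include <assert.h>

#define OC_F_BUSY	(1 << 0)	/* hypothetical bit values */
#define OC_F_PRIVATE	(1 << 1)

struct objcore_stub {
	unsigned	flags;
};

/* The only place that ever raises OC_F_BUSY: a lookup miss that may
 * eventually produce a shareable object worth waiting for. */
static void
make_busy_for_lookup(struct objcore_stub *oc)
{
	assert((oc->flags & OC_F_PRIVATE) == 0);
	oc->flags |= OC_F_BUSY;
}

/* Synthetic, req.body and passed objcores are private; they never get
 * OC_F_BUSY, so they never drive waiting-list activity. */
static void
make_private(struct objcore_stub *oc)
{
	assert((oc->flags & OC_F_BUSY) == 0);
	oc->flags |= OC_F_PRIVATE;
}
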
This is not a nice caching outcome. In fact, when a fetch fails we don't
know the outcome. In that case we need to rush one request that will
trigger a new fetch and hopefully complete with a proper outcome to rush
other requests accordingly.

As far as lookup is concerned there is no point in keeping the OC_F_BUSY
flag, so in the case where a failure happens before a proper HSH_Unbusy()
it can be dropped before the rush. This would otherwise happen once the
last objcore reference is dropped: one is held by the current fetch task
and the other one by the client task that triggered the fetch. It would
amount to an unnecessary delay.

The OC_F_FAILED flag will prevent it from being looked up, so there is
no reason to delay the rush until the final objcore reference is dropped.

This also means that once the client task that triggered the fetch task
resumes it can only be operating on a non-busy object.
The BOS_PREP_STREAM state is retired. It was used as a stop-gap to
"unbusy" objects before waking up the client task for good. Instead,
the HSH_Unbusy() and HSH_Fail() functions are called once the gap is
closed, depending on the outcome.

This removes one unnecessary signal and consolidates multiple call
sites, ensuring that objects always drop the OC_F_BUSY flag from a
central location, for fetch tasks.
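
The resulting shape of the fetch side can be sketched as follows; the enum and fetch_settle() are made-up names for the single spot that ends up calling HSH_Unbusy() or HSH_Fail(), not the actual functions:

/* Illustrative outcome of a fetch task, as seen from the objcore. */
enum fetch_outcome {
	FETCH_STREAM_READY,	/* beresp.do_stream: headers are ready */
	FETCH_COMPLETE,		/* !beresp.do_stream: body is complete */
	FETCH_FAILED		/* failure before either point */
};

struct objcore_stub {
	unsigned	flags;
};

#define OC_F_BUSY	(1 << 0)	/* hypothetical bit values */
#define OC_F_FAILED	(1 << 1)

/* Central place where a fetch task drops OC_F_BUSY; clearing the flag
 * is what triggers the waiting-list rush, whatever the outcome. */
static void
fetch_settle(struct objcore_stub *oc, enum fetch_outcome outcome)
{
	if (outcome == FETCH_FAILED)
		oc->flags |= OC_F_FAILED;
	oc->flags &= ~OC_F_BUSY;
}
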
This is the counterpart of HSH_Unbusy() for the cases where the req task
will not schedule a fetch task, at which point we know we can already
wake up a request, if there is one on the waiting list.

This encapsulates the "wake up only one request" semantic using a new
OC_F_WITHDRAWN flag. Although this flag is not being used yet, it will
be once the state of the objcore determines the rush policy instead of
the call site.

When a bgfetch is withdrawn, the extra rush becomes redundant with the
policy rush on the hit objcore. There will eventually be no more rush
for the hit objcore, which will remove the redundancy just introduced.

In order to withdraw objcores in a single critical section, a function
HSH_DerefObjCoreUnlock() is added.
Rushing the waiting list because an objcore was dropped was a source
of spurious wakeups, including the case where the dropped objcore is not
even relevant to the objhead waiting list. It is now guaranteed that an
objcore either dropped its OC_F_BUSY flag once ready (or once failed or
withdrawn), or never had this flag to begin with because it can't be
looked up.
Operating on an objcore makes it possible to preserve the different rush
strategies that existed so far:

- rush one request for failed fetch tasks
- rush one request for withdrawn objects
- rush rush_exponent requests otherwise

For the cases where no rush is expected, a null objcore is passed.

For cacheable objects, all waiters are temporarily woken up at once,
until waiting list matches are handled separately.
From now on, the policy will be derived from the objcore.
A rush used to be indiscriminate; it is now driven by the state of an
objcore, by inspecting its flags. That change of behavior
freed call sites from having to know what kind of rush to operate, and
more importantly removed the need to attempt a rush whenever an objcore
reference was released. It ensured that an objcore would only get the
OC_F_BUSY flag if it was created for the purpose of a fetch, with the
guarantee that a rush would be triggered as soon as the fetch got an
outcome:

- withdrawn (fetch not even attempted)
- failed
- ready (either for streaming, or complete)

All these prior changes built up to this change, allowing the effective
implementation of the unqualified form for the cache-control no-cache
directive.

RFC 9111 describes the server-side no-cache directive like this:

> The no-cache response directive, in its unqualified form (without an
> argument), indicates that the response MUST NOT be used to satisfy any
> other request without forwarding it for validation and receiving a
> successful response; see Section 4.3.

This can translate in VCL to beresp.ttl == 0 and beresp.grace == 0,
while still allowing the object to be served from cache.

Doing this naively leads to an infamous "waiting list serialization"
phenomenon that tends to take time to diagnose and depending on the
setup may be easy to remediate.

One way to solve this from a Varnish perspective is to treat the
waiting list as a collection of requests waiting for a response to be
validated or revalidated. In the event of a no-cache directive (or a
VCL override zeroing beresp.ttl and beresp.grace) and in the absence of
any other criterion preventing an object from being cached, all the
requests can consume the just-validated response.

To make this possible, a few things needed to change. A happens-before
relationship between entering the waiting list and taking part in the
rush is needed, and the time reference for this relationship is the
insertion of a new objcore in the cache. Let's call this time t_insert.

When a rush begins there are two scenarios for requests.

Requests that entered the waiting list before t_insert should evaluate
the objcore, and when applicable, propagate the rush exponentially.

Requests that entered the waiting list after t_insert have already
evaluated the objcore, because by the time a rush begins the OC_F_BUSY
flag has been cleared. They should not evaluate the objcore again;
another incoming fetch outcome will eventually wake them up.

What guarantees the happens-before relationship is that t_insert happens
under the objhead lock.

The relationship is effectively implemented with a generation counter
for the objhead. Requests inherit the generation when they enter the
waiting list. Objcores inherit the generation when they begin the rush.

This allows the exponential rush propagation to know when to stop, and
the objcore can be evaluated outside of the lookup critical section.
Only incompatible variants should force requests to re-enter the waiting
list, reducing the scope of the serialization problem to a profusion of
variants (like the classic 'Vary: User-Agent' pitfall).

Since most spurious rushes at every corner of objhead activity are gone,
this change confines the remaining spurious activity to incompatible
variants alone, instead of affecting all objects on more occasions.

If a cacheable object was inserted in the cache, but already expired,
this behavior enables cache hits. This can be common with multi-tier
Varnish setups where one Varnish server may serve a graced object to
another, but true of any origin server that may serve stale-yet-valid
responses.

The waiting list enables a proper response-wide no-cache behavior from
now on, but the built-in VCL prevents it by default. This is also the
first step towards implementing no-cache and private support at the
header field granularity.
If a synthetic beresp is inserted into the cache with no TTL, grace or
keep at all, it is now considered validated (although not in the best
circumstances) and will be passed to all requests in the waiting list.
This centralizes the rules regarding BOC state changes and whether to
inform clients of those changes. For example, BOS_REQ_DONE is only
needed by grace hits, saving one broadcast on the BOC condvar on misses
and straight passes.

It also formalizes when a busyobj will release its client request.
To ease test coverage for waiting list operations, requiring fewer
disembarked requests for specific cases. A test case is added showing
that a rush miss propagates further wake-ups.
Better diff with the --ignore-all-space option.
@dridi
Member Author

dridi commented Aug 27, 2025

I have force-pushed to squash the last batch of review commits, and to rebase onto current master.

I'm doing a pass over the code while CI is running.

@dridi dridi merged commit 66b17b7 into varnishcache:master Aug 27, 2025
9 of 10 checks passed
@dridi dridi deleted the partial_nocache branch August 27, 2025 15:22