-
Notifications
You must be signed in to change notification settings - Fork 420
suggestion: external worker api #1928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
suggestion: external worker api #1928
Conversation
|
Made this PR just to explain what I meant with exposing a struct. The interface feels very brittle since it has so many methods. This way there is just a Worker struct that is initialized with func (Worker) SendRequest(http.ResponseWriter, http.Request)
any = func (Worker) SendMessage(any, http.ResponseWriter)The Send functions being blocking feels more in line with how go usually operates, the in-between goroutine that pipes the request is largely unnecessary unless I'm missing something. |
|
This looks better than the current implementation to me. I would like to have @withinboredom opinion on this, and try to migrate the queue and the gRPC extension to be sure we don't miss something. |
|
I’m not really a fan of this because it accidentally encourages people to close over hidden state. Considering that people might be new to Go, coming from PHP, they may not realise they’re creating an unmaintainable mess. Example of hidden state: hiddenCounter := 0
opts.WithWorkerOnReady(func() {
hiddenCounter += 1
})But you could imagine that Instead, it might be better to make the interface smaller and break it up: type WorkerLifecycle interface {
OnReady(threadID int)
OnShutdown(threadID int)
OnServerShutdown(threadID int)
}And then remove those from the worker interface. From there, we only have to notify if the Interfaces in Go are for the receiver not for the sender (opposite of PHP). So, we can make them as fine-grained or coarse as we want them. Either the sender implements it, or not. (this is why many style guides for Go say something along the lines of: "always receive an interface and return a struct". |
|
Yeah you're right, the functions can be inline, in which case they include the scope. type Livecycle interface {
Handle(threadID int)
}
func WithWorkerOnReady(Livecycle) WorkerOption
func WithWorkerOnShutdown(Livecycle) WorkerOption
func WithWorkerOnServerShutdown(Livecycle) WorkerOptionOn a sidenote: I'd be much more concerned about the direct access to PHP's memory allocation in these hooks since they run on the C threads |
|
Being fluent in Go is a prerequisite to write FrankenPHP extensions. |
|
@withinboredom @AlliBalliBaba could we organize a video call in the next few days to find an agreement about what to do? I would like to tag a new release soon. |
|
👍 available most of the week, just not tomorrow evening. |
|
I'm on Discord (#1634) in the evenings. |
| } | ||
| } | ||
| // EXPERIMENTAL: SendRequest sends an HTTP request to the worker and writes the response to the provided ResponseWriter. | ||
| func (w Worker) SendRequest(rw http.ResponseWriter, r *http.Request) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we mean to make the receiver here a non-pointer? I'm not 100% sure, but this would mean (potentially) a copy needs to be made every time a message is sent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it won't need to make a copy, the compiler should be smart enough in that case. The struct would only be copied if it's modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm 99% sure Go doesn't have COW semantics. In any case, the struct header will def be copied each time these are called, and put on the stack. Letting it live in the heap would likely be kept in an L1/L2 cache. If we can kick out the options slice (make it a part of registration), it can fit safely into a cache line and basically be "free" (and not even matter which type of receiver we use).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, that's a micro-optimisation (though this is on a hot-path). I think I have a good idea here, I'll create a PR to this PR -- but I do need to leave for a date in the next hour. I think this is fine for now, so it isn't a blocker from merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to guess what the CPU will actually do here without benchmarks, but I'm fine with making this a pointer either way, I'll wait for your PR👍
| w.activatedCount.Add(1) | ||
| } | ||
| // EXPERIMENTAL: SendMessage sends a message to the worker and waits for a response. | ||
| func (w Worker) SendMessage(message any, rw http.ResponseWriter) (any, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
| fileName: fileName, | ||
| num: num, | ||
| options: options, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add back some of the original methods from the original interface:
- Name()
- NumThreads()
- Options()?
- FileName()
Now it is basically an opaque token for the extension (they can't access these fields) and they have to keep track of the values they passed themselves which just increases memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, I'd rather keep the api minimal. Methods like these can be added anytime afterwards if necessary without breaking the api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here’s an actual use case:
I’m working on a NATs Jetstream client, which has headers/bodies/etc -- so I need requests, not messages. There are also different types of workers (key/value change workers and stream workers). These details are needed after initialisation.
I can’t assume that the number of threads is the actual number of threads I requested. I need access to that value to ensure I initialise the appropriate number of consumers.
I can’t assume that the worker name is the same worker name I requested either, and I’d like my metrics to match frankenphp’s without hardcoding things.
Being that I know how FrankenPHP works, I can kinda cheat, but I’m trying to also work out "best practices" instead of cheating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are returning by value, we can just export the fields, maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, we actually won't be able to see the worker configuration until after registration, but since it is a value, it won't have the most updated information.
maybe this is a non-issue for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm in that case NumThreads() should return the actual number of running threads, not the num that was passed on startup. name and fileName shouldn't be modified after startup, so they probably shouldn't be public.
That also makes me think: wouldn't the extension workers currently drain threads from the general threadpool? So if someone uses an extension, all their threads might be unknowingly consumed by the extension. Seems kinda messy, the extension threads should probably be counted on-top of the configured num_threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That also makes me think: wouldn't the extension workers currently drain threads from the general threadpool?
Heh, yeah, that's why the comment said: "don't be greedy". I would expect extension developers using this would have a way to define how many workers to consume, and it is up to the operator to do a bit of math to ensure there are either enough threads configured or enough CPUs allocated.
the extension threads should probably be counted on-top of the configured num_threads.
I don't think it is a good idea to try and be smart about this. Setting the number of threads in the caddyfile isn't hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm we can leave it as-is for now, maybe extensions will make that configurable anyways.
It might also make sense in the future to have a per-worker max_threads, since currently any worker can potentially consume all threads on scaling.
| fc.responseWriter = rw | ||
| fc.handlerParameters = message | ||
|
|
||
| internalWorker.handleRequest(fc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, there was a way to signal to the external worker that we were ready to receive requests, but now it only applies if the user knew to pass an option in. Can we safely handle the case where we start receiving messages/requests before the worker is ready? If I understand correctly, this would just end up forwarding to a non-worker thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you mean that ServeRequest() can be called prematurely? It's actually not enough to just wait for workers to be ready, Init() has to be completely finished to guarantee no race conditions.
I think we might need a OnServerReady() hook instead (that fires once), since the extension might not have control over Init() unlike a library user.
|
Can I merge this one into Thanks for the hard word btw! |
withinboredom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dunglas this looks like it is in a stable state? I think we've ironed out most of the weirdness. I haven't specifically tested the branch for the last few commits, but I don't see anything wrong with merging it. We can always bug fix anything major. It is a breaking change for all external worker extensions. So, we def need to call it out in the release notes -- but since it is currently marked experiemental, I think that is fine.
|
Should I pass the worker by reference or do you still plan a separate PR @withinboredom ?
Otherwise you can merge |
|
I'll do a separate PR! |
|
Let's merge then! (into refactor/external-worker-api) |
This is a suggestion for #1915, better integration into existing library options and differentiation between messages and requests.