
Conversation

@herin049
Contributor

Adds support for FaaS telemetry API metrics.

@herin049 herin049 requested a review from a team as a code owner December 10, 2025 07:17
@herin049 herin049 force-pushed the feat/telemetryapi-metrics branch from c03090e to bed59f8 on December 19, 2025 18:33
@wpessers wpessers added enhancement New feature or request go Pull requests that update Go code labels Dec 21, 2025
@herin049 herin049 force-pushed the feat/telemetryapi-metrics branch from 465a284 to ff4db69 on December 23, 2025 07:40
@herin049 herin049 requested a review from wpessers December 23, 2025 07:44
@wpessers
Contributor

wpessers commented Dec 23, 2025

Regarding the fix for the concurrent writes, I was actually thinking more in this direction: #2091. Let me know what you think. If it looks good, I suggest we just merge it and then add your changes on top after reverting 652d63e.

The reason is that I'd like to keep the expensive work of reading and parsing the JSON from the incoming requests concurrent. I'm not sure it will really make a big difference, since I can't really estimate the request rate that the AWS Lambda Telemetry API sends, but it seems like an easy optimization to keep.

@herin049
Contributor Author

> Regarding the fix for the concurrent writes, I was actually thinking more in this direction: #2091. Let me know what you think. If it looks good, I suggest we just merge it and then add your changes on top after reverting 652d63e.
>
> The reason is that I'd like to keep the expensive work of reading and parsing the JSON from the incoming requests concurrent. I'm not sure it will really make a big difference, since I can't really estimate the request rate that the AWS Lambda Telemetry API sends, but it seems like an easy optimization to keep.

The changes in this PR still keep the reading/parsing of the JSON concurrent, since telemetryHandler parses the record before inserting it into the channel.

I did initially consider just using a mutex instead of a separate channel, but I went with this approach to ensure that if we receive a burst of requests, the handler still returns almost immediately (assuming there is still capacity in the channel). With a mutex, a burst could block other requests and potentially lead to dropped telemetry events.
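
To illustrate the pattern being discussed, here is a minimal sketch of the channel-based approach: each request is parsed concurrently in the handler, the parsed records are pushed onto a buffered channel, and a single consumer goroutine does the writes. The type and handler names (telemetryRecord, /telemetry, the buffer size) are illustrative, not this PR's actual code.

```go
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"
)

// telemetryRecord is a hypothetical stand-in for a parsed Telemetry API event.
type telemetryRecord struct {
	Time   string          `json:"time"`
	Type   string          `json:"type"`
	Record json.RawMessage `json:"record"`
}

// A buffered channel absorbs bursts so each HTTP handler can return quickly,
// while a single consumer goroutine owns the non-thread-safe sink.
var records = make(chan telemetryRecord, 1024)

func telemetryHandler(w http.ResponseWriter, r *http.Request) {
	// The expensive read/parse work still happens concurrently, per request.
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	var batch []telemetryRecord
	if err := json.Unmarshal(body, &batch); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	for _, rec := range batch {
		select {
		case records <- rec: // returns almost immediately while capacity remains
		default:
			log.Println("local buffer full, dropping record")
		}
	}
	w.WriteHeader(http.StatusOK)
}

func consume() {
	for rec := range records {
		_ = rec // single goroutine performs the writes, so no concurrent-write races
	}
}

func main() {
	go consume()
	http.HandleFunc("/telemetry", telemetryHandler)
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```

The mutex alternative serializes the handler bodies themselves, which is simpler but means a slow write can hold up the other in-flight requests.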

From the AWS Lambda Telemetry API docs:

> If the subscriber cannot process incoming telemetry fast enough, or if your function code generates very high log volume, Lambda might drop records to keep memory utilization bounded. When this occurs, Lambda sends a platform.logsDropped event.

However, the Telemetry API does perform some buffering internally, so this probably isn't as big of an issue as I make it out to be. I'll leave it up to you; I'm fine with just using the mutex. I agree it's much simpler than setting up a channel, and I'm guessing the performance will be roughly identical.

@wpessers
Contributor

#2091 has since been discussed, approved, and merged.
For future transparency: we opted for the mutex solution because it's simpler, and we can apparently also configure the Telemetry API's buffering ourselves if needed. @herin049 pointed to the AWS doc that shows the available configuration: https://docs.aws.amazon.com/lambda/latest/dg/telemetry-api-reference.html#telemetry-subscribe-api
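
For reference, a rough sketch of how an extension could set that buffering when subscribing. The field names follow the linked doc, but the listener URI, limits, and the subscribe helper here are illustrative rather than this repo's actual code, and the extension ID would normally come from the Extensions API Register response.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// Request body for the Telemetry API Subscribe call.
type buffering struct {
	MaxItems  int `json:"maxItems"`
	MaxBytes  int `json:"maxBytes"`
	TimeoutMs int `json:"timeoutMs"`
}

type destination struct {
	Protocol string `json:"protocol"`
	URI      string `json:"URI"`
}

type subscribeRequest struct {
	SchemaVersion string      `json:"schemaVersion"`
	Types         []string    `json:"types"`
	Buffering     buffering   `json:"buffering"`
	Destination   destination `json:"destination"`
}

func subscribe(extensionID string) error {
	body := subscribeRequest{
		SchemaVersion: "2022-12-13",
		Types:         []string{"platform", "function", "extension"},
		// Smaller buffers flush more often; larger ones cut request rate at the
		// cost of latency and memory held inside the Lambda sandbox.
		Buffering:   buffering{MaxItems: 1000, MaxBytes: 256 * 1024, TimeoutMs: 100},
		Destination: destination{Protocol: "HTTP", URI: "http://sandbox.localdomain:8080"},
	}
	payload, err := json.Marshal(body)
	if err != nil {
		return err
	}
	url := fmt.Sprintf("http://%s/2022-07-01/telemetry", os.Getenv("AWS_LAMBDA_RUNTIME_API"))
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Lambda-Extension-Identifier", extensionID)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("telemetry subscribe failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// EXTENSION_ID is a placeholder; the real value is returned by the
	// Extensions API Register call.
	if err := subscribe(os.Getenv("EXTENSION_ID")); err != nil {
		fmt.Println(err)
	}
}
```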

@herin049 I'm ready to merge this one whenever you get the time to rebase :)

@herin049 herin049 force-pushed the feat/telemetryapi-metrics branch from ff4db69 to fd98d56 on December 26, 2025 18:44
@herin049
Contributor Author

@wpessers Rebased the changes; should be good to go!
