
Conversation

@herin049
Contributor

Adds support for running on AWS Lambda managed instances.

Lambda managed instances differ from standard lambda functions in several areas, but the differences most relevant for the OpenTelemetry collector layer are:

  • For managed instances, the Extensions API does not allow subscribing to the Invoke event type.
  • For managed instances, the Telemetry API does not report platform.runtimeDone events.
  • A managed Lambda instance is never frozen, which removes the need for the decouple processor.
  • Multiple Lambda processes can be created within a single execution environment (particularly relevant for ensuring that auto-instrumentation works with the bootstrap script).
  • Multiple Lambda function invocations may be in progress simultaneously within a given execution environment.

For more information see: https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html

To accommodate the differences above, the following changes are made to the layer when the extension determines, via the AWS_LAMBDA_INITIALIZATION_TYPE environment variable, that the initialization type is lambda-managed-instances (a minimal sketch of this check follows the list):

  • The extension no longer subscribes to the Invoke event type, and it no longer subscribes to the Telemetry API to listen for the platform.runtimeDone event.
  • The decouple processor is no longer added to any pipelines.
  • The FunctionInvoked() and FunctionFinished() lifecycle methods are no longer invoked for lifecycle listeners.
  • The wrapper script has been updated to apply instrumentation inside the wrapper script itself, ensuring that instrumentation is properly applied to newly created Python processes (this applies to all initialization types).
  • A few changes have been made to the Telemetry API receiver to accommodate changes in the events reported by the Telemetry API.
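
As a rough illustration of the gating concept (the collector extension itself is written in Go; this Python sketch is purely illustrative, and the environment variable value is taken from the list above):

```python
import os

def is_managed_instance() -> bool:
    # The extension checks AWS_LAMBDA_INITIALIZATION_TYPE to decide whether it is
    # running on a managed instance; the value below comes from this PR's description.
    return os.environ.get("AWS_LAMBDA_INITIALIZATION_TYPE") == "lambda-managed-instances"

if is_managed_instance():
    # Skip the Invoke subscription, the platform.runtimeDone handling, and the
    # decouple processor, per the bullet points above.
    pass
```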

I have added relevant unit tests and have manually verified the implementation using the following repo, which configures the layer to export signals to Grafana Cloud: https://github.com/herin049/aws-lambda-managed

@herin049 herin049 requested a review from a team as a code owner December 20, 2025 04:12
@serkan-ozal
Contributor

Hi @herin049,

For the Python SDK related changes, as far as I understand from your explanation, AWS_LAMBDA_EXEC_WRAPPER is not applied to the spawned Lambda processes (only to the main Lambda runtime process) on Lambda managed instances. Is that correct? Is there any official AWS documentation mentioning or explaining this behavior?

Otherwise (if AWS_LAMBDA_EXEC_WRAPPER could still be used for the spawned Lambda processes), the opentelemetry-instrument CLI would instrument the spawned Lambda processes.

@herin049
Contributor Author

herin049 commented Dec 23, 2025

Hi @serkan-ozal, thanks for the review.

Yes, your understanding is correct. To reiterate, here is what I assume happens internally for AWS Lambda managed instances:

  1. During initialization of the EC2 VM, Lambda first executes the AWS_LAMBDA_EXEC_WRAPPER as usual.
  2. After this initialization phase completes, Lambda spawns N child Python processes and imports the handler module directly in each child process by calling importlib.import_module. This is not an issue for most wrapper scripts, since they typically just set environment variables or manipulate the file system, and those side effects remain visible to each process. It is an issue for the auto instrumentation library, however, because the child processes are fresh interpreters that no longer have the patching applied by the auto instrumentation libraries (a rough sketch of this assumption follows below).
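
To make the assumption concrete, here is a hypothetical sketch of what each spawned worker might do (only the use of importlib.import_module and the _HANDLER convention come from the description above; everything else, including the use of multiprocessing, is an assumption for illustration):

```python
import importlib
import multiprocessing
import os

def worker_loop() -> None:
    # A freshly spawned interpreter imports the handler module directly, so any
    # monkey-patching applied by opentelemetry-instrument in the parent process
    # is absent here.
    module_name, handler_name = os.environ["_HANDLER"].rsplit(".", 1)
    handler_module = importlib.import_module(module_name)
    handler = getattr(handler_module, handler_name)
    # ... poll for invocations and call handler(event, context) ...

if __name__ == "__main__":
    # The parent process (where AWS_LAMBDA_EXEC_WRAPPER already ran) spawns N
    # workers with the "spawn" start method, i.e. fresh interpreters rather than forks.
    ctx = multiprocessing.get_context("spawn")
    workers = [ctx.Process(target=worker_loop) for _ in range(4)]
    for worker in workers:
        worker.start()
```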

I can't find any documentation on this beyond the docs stating that managed Lambda instances can serve many requests concurrently, and the only way to do that with true parallelism in Python is to use multiple processes.

Regardless, this change is always safe to make, because the behavior is identical even for standard Lambda instances. That is, running opentelemetry-instrument python main.py is nearly identical to running python main.py and calling auto_instrumentation.initialize() at the top of the file (see the sketch below). These changes also make PR #2069 irrelevant. Essentially, even if my assumptions are incorrect, regular Lambda functions will still be instrumented properly, which is why this is the most straightforward and safest approach to take.
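
A minimal sketch of the equivalence being claimed (the handler body is hypothetical; auto_instrumentation.initialize() is the same call mentioned above):

```python
# main.py -- roughly what `opentelemetry-instrument python main.py` achieves,
# but done explicitly at module import time instead of via the CLI wrapper.
from opentelemetry.instrumentation import auto_instrumentation

# Apply the configured distro/instrumentors before any instrumented libraries
# are imported by the application code below.
auto_instrumentation.initialize()

def handler(event, context):  # hypothetical Lambda handler
    return {"statusCode": 200}
```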

@serkan-ozal
Contributor

@herin049

Yes, I know that using a wrapper handler which delegates to the user handler is a very common approach. But when asking to verify this behavioral change with the AWS_LAMBDA_EXEC_WRAPPER env var, I was mostly thinking about the other runtimes. Except for the Ruby runtime, auto instrumentation for the other runtimes works without a wrapper (NODE_OPTIONS for Node.js and JAVA_TOOL_OPTIONS for the Java agent).

However, even though the AWS_LAMBDA_EXEC_WRAPPER env var is not applied to spawned processes, this should not be an issue for the Node.js and Java runtimes, as they only set some env vars to configure/activate OTEL instrumentation, and these env vars should be inherited by the spawned worker/child processes, unless the main process filters env vars before passing them on (that is the point we need to check). One more point on this: loading the user handler in the wrapper handler is a little more complex in Node.js, as we need to take care of some additional cases (paths, CJS vs ESM, etc.). Another point is that we may also need to be sure that the OTEL SDK is not instrumenting the main process itself, because otherwise, depending on the implementation of the main process, there might be spans reported by the OTEL SDK which are not related to the user code.

In addition to the points above, for Python, instead of a wrapper handler, another approach would be to use sitecustomize.py, made available via PYTHONPATH. Basically, you should be able to initialize the OpenTelemetry SDK automatically at Python startup by placing your OTEL initialization code in a sitecustomize.py file and ensuring that its directory is included in PYTHONPATH. Python imports sitecustomize on interpreter startup, allowing OTEL to be configured before any application code runs.
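
A minimal sketch of that sitecustomize.py approach, assuming the layer ships the file and the wrapper script prepends its (hypothetical) directory to PYTHONPATH:

```python
# /opt/otel-python/sitecustomize.py  (hypothetical path inside the layer; the wrapper
# script would export PYTHONPATH=/opt/otel-python:$PYTHONPATH before starting the runtime)
#
# Python imports the `sitecustomize` module automatically at interpreter startup,
# so every spawned worker process would run this before any application code.
try:
    from opentelemetry.instrumentation import auto_instrumentation

    auto_instrumentation.initialize()
except Exception:  # never break the function because instrumentation failed to load
    pass
```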

@herin049
Contributor Author

Thanks @serkan-ozal

I see where you are coming from now. To limit the scope of these changes, I've focused on the collector-level changes and on adding support for Python for now. If you'd like, I can create an issue for verifying/making changes for all of the supported Lambda runtimes on Lambda managed instances. I am not as familiar with how auto instrumentation works for the other runtimes, but I can certainly look into this more and make any required changes based on my findings in order to support managed instances. In either case, the collector changes I have made in this PR will not change even if there are substantial changes to the auto instrumentation logic for some runtimes.

With regard to your concern about instrumenting the original parent process, I don't think this is an issue: I have not observed any irregular spans being reported, even with all of the auto instrumentation libraries enabled.

From what I have found so far, it seems the auto instrumentation wrapper command is not working properly for the worker Python processes because sitecustomize.py is somehow not being loaded for them (the wrapper command already adds a directory containing a custom sitecustomize.py to the PYTHONPATH environment variable). There are two possible reasons I can think of: either the updated PYTHONPATH environment variable is not being propagated to the worker processes correctly, or sitecustomize.py is simply not being loaded in the new Python processes. I think the latter is more likely, because I know that environment variables set in the wrapper script are propagated to the worker processes correctly: otherwise the modifications to the _HANDLER and ORIG_HANDLER environment variables would not take effect and auto instrumentation would not be applied at all, yet my tests show telemetry being reported properly in all cases. We could switch to an approach where we create our own sitecustomize.py file and have the wrapper script add it to PYTHONPATH, but I suspect we would run into the same issues as with the wrapper command. One way to narrow this down is sketched below.
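
One hypothetical way to narrow down which of the two scenarios is happening (a diagnostic handler, not part of this PR) would be to report the interpreter state from inside each worker process:

```python
import os
import sys

def debug_handler(event, context):
    # Hypothetical handler that reports, per worker process, whether sitecustomize
    # was imported and whether the PYTHONPATH modification actually propagated.
    return {
        "pid": os.getpid(),
        "pythonpath": os.environ.get("PYTHONPATH", ""),
        "sitecustomize_loaded": "sitecustomize" in sys.modules,
        "opentelemetry_loaded": any(name.startswith("opentelemetry") for name in sys.modules),
    }
```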

The reason I made the changes to the wrapper script the way I did is that they are relatively minimal and backwards compatible, ensuring that the behavior matches that of the previous wrapper script.

@serkan-ozal
Contributor

@herin049 I think it is better to limit the scope of this PR to only changes related to the collector. For the SDK related changes, first, I would like to understand the behavior of the main process and spawned processes (I will be looking into it too) when AWS Lambda managed instances are used.

So, my take is please create issue(s) for the SDK related changes and remove the Python related changes from this PR.
@tylerbenson @wpessers @pragmaticivan @maxday WDYT?

@herin049
Contributor Author

@herin049 I think it is better to limit the scope of this PR to only changes related to the collector. For the SDK related changes, first, I would like to understand the behavior of the main process and spawned processes (I will be looking into it too) when AWS Lambda managed instances are used.

So, my take is please create issue(s) for the SDK related changes and remove the Python related changes from this PR. @tylerbenson @wpessers @pragmaticivan @maxday WDYT?

Sounds good to me @serkan-ozal. I have reverted the Python SDK related changes in this PR. I can work on a follow-up PR to update all of the SDKs where necessary to support managed instance types, and do some additional research myself.

@wpessers
Contributor

So, my take is please create issue(s) for the SDK related changes and remove the Python related changes from this PR. @tylerbenson @wpessers @pragmaticivan @maxday WDYT?

Yes I agree!

@herin049 herin049 force-pushed the feat/managed-instances branch from 2b22ef9 to 114e673 on December 27, 2025 04:50