[Feature] AEnv SDK: retry env instance creation when an instance can't become healthy within the time limit #37

@JacksonMei

Description

Checklist

  • This feature will maintain backward compatibility with the current SDK. If not, please raise a refactor issue first.

Background

When an env instance fails to become healthy within the configured timeout, the SDK immediately raises an error and the whole job is aborted. Users would like the SDK to retry spawning a new instance a few times before giving up, reducing flakiness caused by transient infra issues.

Potential Solution

Add a configurable max_retries parameter (default 3) to the Environment class.
In AEnv._wait_for_healthy(), wrap the health-check loop in a retry block: on timeout, release the unhealthy instance and create a new one, until the retry count is exhausted.
Raise an AEnvLastRetryExhausted exception on final failure so callers can still handle it explicitly.
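
A minimal sketch of the retry flow described above, not the actual SDK code. Only `max_retries` and `AEnvLastRetryExhausted` come from this issue; the `create`, `release`, and `is_healthy` callables, the function names, and the `timeout_s` / `poll_interval_s` parameters are hypothetical stand-ins for whatever the SDK uses internally. It also assumes `max_retries` counts retries after the first attempt, so `max_retries=3` means up to 4 instances are tried.

```python
import time
from typing import Any, Callable


class AEnvLastRetryExhausted(RuntimeError):
    """Raised when no instance became healthy after all attempts (name from this issue)."""


def wait_for_healthy(instance: Any, is_healthy: Callable[[Any], bool],
                     timeout_s: float, poll_interval_s: float = 0.5) -> bool:
    """Poll is_healthy(instance) until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_healthy(instance):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


def create_healthy_instance(create: Callable[[], Any],
                            release: Callable[[Any], None],
                            is_healthy: Callable[[Any], bool],
                            timeout_s: float,
                            max_retries: int = 3,
                            poll_interval_s: float = 0.5) -> Any:
    """Create an instance, replacing it on health-check timeout.

    Assumption: one initial attempt plus up to max_retries retries.
    """
    attempts = max_retries + 1
    for _ in range(attempts):
        instance = create()  # hypothetical: spawn a new env instance
        if wait_for_healthy(instance, is_healthy, timeout_s, poll_interval_s):
            return instance
        release(instance)  # hypothetical: free the unhealthy instance before retrying
    raise AEnvLastRetryExhausted(
        f"no healthy instance after {attempts} attempts (timeout {timeout_s}s each)"
    )
```

With this shape, `max_retries=0` would reproduce the current behavior of failing after the first timeout, which is one way to keep the change backward compatible. Callers that want to handle the final failure can catch `AEnvLastRetryExhausted` around the call.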
