feat: Reduce streaming DNS failures with stale-fallback DNS cache #323

Open
abelonogov-ld wants to merge 6 commits into main from andrey/dns-sdk34

@abelonogov-ld abelonogov-ld commented Feb 21, 2026

Summary

  • Add CachingDns, a thread-safe OkHttp Dns wrapper that caches successful lookups (10-min TTL) and returns stale cached addresses when a fresh resolution fails — preventing UnknownHostException from killing the stream during network transitions
  • Share a ConnectionPool and the DNS resolver across StreamingDataSource restarts (context switches, foreground/background toggles, network changes) via StreamingDataSourceBuilderImpl, so cached state survives data source recreation
  • Explicitly set retryOnConnectionFailure(true) on the streaming OkHttpClient on all API levels
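The stale-fallback behavior in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual CachingDns: the `Resolver` interface is a stand-in that mirrors the shape of `okhttp3.Dns` so the sketch compiles without OkHttp, and the class name `StaleFallbackDns`, the injected clock, and the choice to treat an entry as expired exactly at its expiry time are all assumptions.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Stand-in for okhttp3.Dns (assumption) so the sketch needs no OkHttp dependency. */
interface Resolver {
    List<InetAddress> lookup(String hostname) throws UnknownHostException;
}

/** Hypothetical stale-fallback cache; names and boundary behavior are illustrative. */
final class StaleFallbackDns implements Resolver {
    private static final class Entry {
        final List<InetAddress> addresses;
        final long expiresAtMillis;

        Entry(List<InetAddress> addresses, long expiresAtMillis) {
            this.addresses = addresses;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Resolver delegate;
    private final long ttlMillis;
    private final LongSupplier clock; // injected so tests can control time
    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();

    StaleFallbackDns(Resolver delegate, long ttlMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    @Override
    public List<InetAddress> lookup(String hostname) throws UnknownHostException {
        long now = clock.getAsLong();
        Entry cached = cache.get(hostname);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.addresses; // fresh hit: skip the upstream resolver
        }
        try {
            List<InetAddress> fresh = delegate.lookup(hostname);
            cache.put(hostname, new Entry(fresh, now + ttlMillis));
            return fresh;
        } catch (UnknownHostException e) {
            if (cached != null) {
                return cached.addresses; // stale fallback: expired but better than failing
            }
            throw e; // cold failure: nothing cached for this host, so propagate
        }
    }
}
```

With a real OkHttp client, a wrapper of this shape would implement `okhttp3.Dns` directly and be passed to `OkHttpClient.Builder.dns(...)`. A 10-minute TTL as described above would correspond to `ttlMillis = 600_000`.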

Background

We observed a high percentage of DNS failures in streaming connections. The root cause is that StreamingDataSource creates a new OkHttpClient on every start() call, and ConnectivityManager restarts the data source on every network change — exactly when DNS is most fragile. OkHttp uses Dns.SYSTEM (InetAddress.getAllByName) with no caching, and Android's system DNS cache has very short TTLs (sometimes ~2 seconds) that get cleared on network transitions.

This pattern of application-level DNS caching with stale fallback is well-established: Alibaba's HTTPDNS SDK, gRPC-Java's DnsNameResolver, and Square's own DnsOverHttps module all implement similar approaches. Google validated the pattern by adding DnsOptions.StaleDnsOptions to the Android framework in API 34.

Test plan

  • Unit tests for CachingDns: fresh resolution, cache hits within TTL, TTL expiry refresh, stale fallback on failure, cold failure propagation, per-hostname isolation, expiration boundary
  • Existing StreamingDataSourceTest passes (builder creates data source with new constructor args transparently)
  • Verify via logs that CachingDns warns on stale fallback and that stream reconnects succeed during network changes

Note

Medium Risk
Touches streaming network connection setup and DNS resolution behavior; incorrect caching/pooling could cause connectivity regressions or use stale IPs longer than intended.

Overview
Adds CachingDns, an OkHttp Dns wrapper that caches successful lookups with a TTL and falls back to stale cached addresses when fresh resolution fails, reducing UnknownHostException disruptions during mobile network transitions.

Updates streaming to reuse a shared DNS resolver and ConnectionPool across StreamingDataSource restarts, and wires these into the EventSource OkHttp client configuration (including enabling retryOnConnectionFailure(true)). Includes unit tests covering cache hit/expiry behavior, stale fallback, per-host caching, and eviction behavior when exceeding MAX_ENTRIES.

Written by Cursor Bugbot for commit 0a6a23e. This will update automatically on new commits.
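The overview mentions eviction once the cache exceeds MAX_ENTRIES but does not say which entry is dropped. One common way to bound such a cache is sketched below, assuming least-recently-used eviction; the class name `BoundedHostCache` and the LRU policy are hypothetical and may differ from what the PR implements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache: drops the least recently used host past the cap. */
final class BoundedHostCache<V> {
    private final Map<String, V> entries;

    BoundedHostCache(final int maxEntries) {
        // accessOrder=true keeps iteration order least-recently-used first.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries; // evict only after the cap is exceeded
            }
        };
    }

    // get() reorders an access-ordered LinkedHashMap, so reads are synchronized too.
    synchronized V get(String host) {
        return entries.get(host);
    }

    synchronized void put(String host, V value) {
        entries.put(host, value);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

An SDK usually resolves only a handful of hostnames, so a small cap with coarse locking like this is unlikely to be a bottleneck.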

@abelonogov-ld abelonogov-ld requested a review from a team as a code owner February 21, 2026 00:14

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@abelonogov-ld abelonogov-ld changed the title feat: Reduce streaming DNS failures with stale-fallback DNS cache (API < 34) Body: feat: Reduce streaming DNS failures with stale-fallback DNS cache Feb 21, 2026