Description
A segmentation fault occurs in the MIT Kerberos native libraries when multiple SQL Server connections concurrently perform SSPI authentication (Integrated Security) on Alpine Linux 3.23. The crash occurs during the P/Invoke call from NegotiateStreamPal.GssInitSecurityContext into gss_init_sec_context() in libgssapi_krb5.so.
This regression was introduced when Alpine updated its krb5 packages in version 3.23. The same workload runs without issue on Alpine 3.20, 3.21, and 3.22.
Our base image is mcr.microsoft.com/dotnet/aspnet:8.0-alpine, so our CI picked up this change automatically a few weeks ago, and the application began failing under load.
Reproduction Steps
- Create a .NET 8 application that connects to SQL Server using Integrated Security
- Deploy to a container based on mcr.microsoft.com/dotnet/aspnet:8.0-alpine (which pulls Alpine 3.23)
- Apply concurrent load that causes connection pool expansion (4+ simultaneous new connections)
- Application terminates with SIGSEGV
Expected behavior
Concurrent SSPI authentication completes successfully, as it does on Alpine 3.22.
Actual behavior
Process terminates with SIGSEGV (signal 11) during GSS-API security context initialization.
Exception Type: 0x20000000 (CLR signal-based exception)
Regression?
Yes. The previous release of mcr.microsoft.com/dotnet/aspnet:8.0-alpine, which used Alpine 3.22, does not have this issue. We confirmed this by changing only the base image to mcr.microsoft.com/dotnet/aspnet:8.0-alpine3.22, which resolved the crash.
Known Workarounds
Pin the base image to mcr.microsoft.com/dotnet/aspnet:8.0-alpine3.22, which uses Alpine 3.22.
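As a minimal sketch of this workaround, the only change needed is the `FROM` line of the runtime stage. The rest of the Dockerfile is not shown because it depends on the application and is unaffected.

```dockerfile
# Before: floating tag, which currently resolves to Alpine 3.23
# FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine

# After: pinned to the Alpine 3.22 variant, where the crash does not reproduce
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine3.22
```

Pinning keeps CI from silently moving the image to a newer Alpine release, so the pin needs to be revisited once a fixed krb5 package is available.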
Configuration
| Component | Version |
|---|---|
| .NET Runtime | 8.0.23 (8.0.2325.60607) |
| Microsoft.Data.SqlClient | 5.x (via EF Core) |
| OS | Alpine Linux 3.23 (x64) |
| C Library | musl libc (ld-musl-x86_64.so.1) |
| libkrb5.so | 3.3 |
| libgssapi_krb5.so | 2.2 |
| libkrb5support.so | 0.1 |
| Container Base | mcr.microsoft.com/dotnet/aspnet:8.0-alpine |
Native Module Build IDs
From the crash dump:
/usr/lib/libkrb5.so.3.3 Build ID: 77009165be6662630a3749ba1e753130a5e2db4c
/usr/lib/libgssapi_krb5.so.2.2 Build ID: 3d4d5273b668d9a2b0eed2a57b4810429079d683
/usr/lib/libkrb5support.so.0.1 Build ID: e7c672f5159054fff41976fa2877d7d647dfb330
Other information
Crash Analysis
Call Stack (from core dump)
IL_STUB_PInvoke(Status ByRef, SafeGssCredHandle, SafeGssContextHandle ByRef, ...)
System.Net.Security.NegotiateStreamPal.GssInitSecurityContext(...)
System.Net.Security.NegotiateStreamPal.EstablishSecurityContext(...)
System.Net.Security.NegotiateStreamPal.InitializeSecurityContext(...)
Microsoft.Data.SqlClient.SNI.SNIProxy.GenSspiClientContext(...)
Microsoft.Data.SqlClient.TdsParser.SNISSPIData(...)
Microsoft.Data.SqlClient.TdsParser.ProcessSSPI(...)
Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(...)
Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(...)
Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(...)
Microsoft.Data.ProviderBase.DbConnectionPool.WaitForPendingOpen()
SafeHandle Corruption Evidence
Multiple SafeGssContextHandle objects show inconsistent state at crash time:
SafeGssContextHandle:
handle = 0x0000000000000000 (NULL - invalid)
_state = 4 (Closed/Disposed)
_fullyInitialized = true (Inconsistent with null handle)
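This pattern is consistent with memory corruption or a race condition in the native GSS-API layer, but it is not conclusive on its own. If _state uses the standard SafeHandle encoding (bit 0 = closed, bit 1 = disposed, remaining bits = reference count), a value of 4 denotes an open handle with one reference. A null handle is also expected for a context that has not yet been established, so this reading should be verified against the dump.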
Parallel Stacks
Analysis of two independent crash dumps shows:
- Dump 1: 16 threads blocked in GssInitSecurityContext, 24 active SQL connections
- Dump 2: 4 threads blocked in GssInitSecurityContext, 5 active SQL connections
The crash occurs at both high and low concurrency levels.
Root Cause Analysis
The issue appears to be a thread-safety bug in the MIT Kerberos libraries (libkrb5, libgssapi_krb5) shipped with Alpine 3.23. When multiple threads concurrently call gss_init_sec_context():
- The native credential cache or GSS context structures are accessed without proper synchronization
- One thread corrupts or frees memory that another thread is actively using
- This results in SIGSEGV when the corrupted pointer is dereferenced
The .NET libSystem.Net.Security.Native.so interop layer calls into these native libraries, and the crash occurs before control returns to managed code.