FIX: Resolve paths with non-ASCII characters in Windows #376
Merged
bewithgaurav merged 16 commits into main on Jan 6, 2026
…characters (#370)
Root cause: On Windows, paths containing non-ASCII characters (e.g., usernames like 'Thalén' with 'é') were being corrupted due to:
1. GetModuleDirectory() using ANSI APIs (char[], PathRemoveFileSpecA)
2. LoadDriverLibrary() using broken UTF-8→UTF-16 conversion via std::wstring(path.begin(), path.end())
3. LoadDriverOrThrowException() using the same broken pattern for mssql-auth.dll
Fix: Use std::filesystem::path, which properly handles encoding on all platforms. On Windows, fs::path::c_str() returns wchar_t* with correct UTF-16 encoding.
This fix enables users with non-ASCII characters in their Windows username or installation path to use Entra ID authentication successfully.
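To illustrate why the `std::wstring(path.begin(), path.end())` pattern corrupts such paths, here is a minimal Python sketch of the same byte-by-byte widening. It is not the driver's code: the helper names and the sample path are hypothetical, and it assumes each byte is treated as an unsigned value (with a signed `char` the garbage differs, but the path is still wrong).

```python
def widen_bytes_naively(utf8_bytes: bytes) -> str:
    """Mimic std::wstring(s.begin(), s.end()): each byte becomes one code unit.

    Assumption: bytes are widened as unsigned values (0..255).
    """
    return "".join(chr(b) for b in utf8_bytes)


def decode_utf8(utf8_bytes: bytes) -> str:
    """Decode the bytes as UTF-8, which is what a correct conversion must do."""
    return utf8_bytes.decode("utf-8")


# Hypothetical install path containing 'é' (UTF-8 bytes 0xC3 0xA9).
utf8_path = "C:\\Users\\Thalén\\libs".encode("utf-8")

broken = widen_bytes_naively(utf8_path)   # 'é' turns into two characters: 'Ã©'
correct = decode_utf8(utf8_path)          # 'é' stays a single character

# The broken string names a directory that does not exist on disk.
paths_match = broken == correct           # False for any non-ASCII path
```

For pure ASCII paths both functions return the same string, which is presumably why the bug only showed up for users with non-ASCII usernames or install locations.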
Add comprehensive tests for the non-ASCII path encoding fix:
1. Default tests (cross-platform):
- Verify module import exercises path handling code
- Test UTF-8 string operations with international characters
- Test pathlib with non-ASCII directory names
2. Windows-specific tests:
- Verify DLL loading succeeds
- Verify libs directory structure
3. Integration tests (Windows only, ~2-4 min total):
- Create venv in paths with Swedish (Thalén), German (Müller),
Japanese (日本語), and Chinese (中文) characters
- Install mssql-python and verify import succeeds
These tests ensure the fs::path fix for LoadLibraryW works correctly
for users with non-ASCII characters in their Windows username.
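The cross-platform pathlib check described above could look roughly like the following sketch. It is an assumption about the shape of such a test, not a copy of `tests/test_015_utf8_path_handling.py`; the function name, the `libs` subdirectory, and the marker file are hypothetical. It only exercises the filesystem and does not install mssql-python or create a venv.

```python
import tempfile
from pathlib import Path

# Directory names taken from the commit message.
NON_ASCII_NAMES = ["Thalén", "Müller", "日本語", "中文"]


def roundtrip_in_non_ascii_dirs(base: Path) -> dict:
    """Create a directory per name under `base` and round-trip a small file."""
    results = {}
    for name in NON_ASCII_NAMES:
        target = base / name / "libs"          # hypothetical layout
        target.mkdir(parents=True)
        marker = target / "marker.txt"         # hypothetical file name
        marker.write_text(name, encoding="utf-8")
        # The directory must exist and the file must read back unchanged.
        results[name] = target.is_dir() and marker.read_text(encoding="utf-8") == name
    return results


def run_check() -> dict:
    """Run the round trip inside a throwaway temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        return roundtrip_in_non_ascii_dirs(Path(tmp))
```

Calling `run_check()` returns a mapping from each directory name to whether the round trip succeeded on the current filesystem.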
Mark 4 tests as @pytest.mark.stress (skipped by default per pytest.ini):
- test_aggressive_dbc_segfault_reproduction: 10 real DB connections
- test_force_gc_finalization_order_issue: 10 connections + 5 GC cycles
- test_rapid_connection_churn_with_shutdown: 10 connections with churn
- test_active_connections_thread_safety: 200 mock connections + 10 threads
These tests are resource-intensive and slow down CI. They will still run when explicitly requested with 'pytest -m stress' or 'pytest -m ""'.
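As a rough sketch of how a marker like this is typically wired up, the example below tags a stand-in test with `@pytest.mark.stress`. The test name and body are hypothetical, and the `pytest.ini` contents shown in the comment are an assumption: the commit message only says that stress tests are skipped by default, not how.

```python
import pytest

# Assumed pytest.ini (hypothetical; the repository's actual file may differ):
#
#   [pytest]
#   markers =
#       stress: resource-intensive tests, deselected by default
#   addopts = -m "not stress"
#
# With such a config, a later -m on the command line replaces the default
# expression, so `pytest -m stress` selects only these tests and
# `pytest -m ""` removes the filter entirely.


@pytest.mark.stress
def test_connection_churn_sketch():
    """Hypothetical stand-in for a heavy test; it opens no real connections."""
    handles = [object() for _ in range(10)]
    assert len(handles) == 10
```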
📊 Code Coverage Report
Diff Coverage
Diff: main...HEAD, staged and unstaged changes
📋 Files Needing Attention
📉 Files with overall lowest coverage
mssql_python.pybind.logger_bridge.hpp: 58.8%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.row.py: 66.2%
mssql_python.helpers.py: 67.5%
mssql_python.pybind.ddbc_bindings.cpp: 69.4%
mssql_python.pybind.ddbc_bindings.h: 71.7%
mssql_python.pybind.connection.connection.cpp: 73.6%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 83.9%
…com/microsoft/mssql-python into bewithgaurav/fix-utf8-path-encoding
Pull request overview
This pull request fixes a critical bug (Issue #370) where the mssql-python driver failed to load when installed in paths containing non-ASCII characters on Windows, such as usernames like "Thalén" or directories with accented characters. The fix refactors path handling to use C++17's std::filesystem for proper cross-platform UTF-8 path support.
Key changes:
- Replaced platform-specific path manipulation code with `std::filesystem::path` for unified, encoding-aware path handling
- Fixed UTF-8 to UTF-16 conversion on Windows by using `fs::path::c_str()` instead of the incorrect `std::wstring(path.begin(), path.end())` conversion
- Added comprehensive test suite covering UTF-8 path handling with real-world non-ASCII characters (Swedish, German, Japanese, Chinese, etc.)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| mssql_python/pybind/ddbc_bindings.cpp | Refactored GetModuleDirectory(), LoadDriverLibrary(), and LoadDriverOrThrowException() to use std::filesystem::path for proper UTF-8 encoding on all platforms, with correct UTF-16 conversion on Windows |
| tests/test_015_utf8_path_handling.py | Added comprehensive test coverage including code path verification tests, non-ASCII string handling tests, Windows-specific tests, and full integration tests with virtual environments in non-ASCII paths |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jahnvi480
approved these changes
Jan 5, 2026
gargsaumya
approved these changes
Jan 6, 2026
Work Item / Issue Reference
Summary
This pull request refactors the way file system paths are handled in the `mssql_python/pybind/ddbc_bindings.cpp` file to use C++17's `std::filesystem` for improved cross-platform compatibility and proper handling of UTF-8 paths. The changes simplify and unify path manipulation logic, especially for dynamic library loading, and ensure correct encoding is used on all platforms.
Cross-platform and encoding improvements:
- Replaced the platform-specific code in `GetModuleDirectory()` with `std::filesystem::path` to extract the module directory in a cross-platform and UTF-8 safe manner. This removes the need for separate Windows and Unix/macOS code paths.
- On Windows, library loading now uses `fs::path::c_str()`, which provides a correctly encoded `wchar_t*` for `LoadLibraryW`, ensuring proper handling of UTF-8 paths. This change is applied both in `LoadDriverLibrary` and when loading `mssql-auth.dll` in `LoadDriverOrThrowException()`.
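The idea of deriving the module directory from a path object rather than by string surgery can be sketched in Python with `pathlib`, which plays a role loosely analogous to `std::filesystem::path` here. This is an illustration only, not the C++ implementation: the function names, the sample module path, and the `libs` layout are hypothetical.

```python
from pathlib import PureWindowsPath


def module_directory(module_file: str) -> PureWindowsPath:
    """Return the directory containing the module, like taking a path's parent."""
    return PureWindowsPath(module_file).parent


def driver_library_path(module_file: str, *relative_parts: str) -> PureWindowsPath:
    """Join path components onto the module directory without manual separators."""
    return module_directory(module_file).joinpath(*relative_parts)


# Hypothetical location of the compiled extension inside a user profile.
module_file = "C:\\Users\\Thalén\\site-packages\\mssql_python\\ddbc_bindings.pyd"

# Hypothetical layout: the auth DLL sits in a 'libs' folder next to the module.
auth_dll = driver_library_path(module_file, "libs", "mssql-auth.dll")
```

Because the path stays a structured object until the final call that needs a native string, there is a single place where encoding is decided, which is the property the refactoring relies on.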