Skip to content

FEAT: Add ActiveDirectoryServicePrincipal support for bulk copy#576

Draft
bewithgaurav wants to merge 4 commits into
mainfrom
bewithgaurav/gh534-bulkcopy-service-principal
Draft

FEAT: Add ActiveDirectoryServicePrincipal support for bulk copy#576
bewithgaurav wants to merge 4 commits into
mainfrom
bewithgaurav/gh534-bulkcopy-service-principal

Conversation

@bewithgaurav
Copy link
Copy Markdown
Collaborator

@bewithgaurav bewithgaurav commented May 13, 2026

Summary

Adds Authentication=ActiveDirectoryServicePrincipal support for bulk copy. Partial fix for #534 — addresses the exact error in the issue report ("Authentication method 'ActiveDirectoryServicePrincipal' is not supported. No token provider was registered for this method.").

Why a callback rather than pre-acquire

The other Entra ID modes already supported in bulk copy (Default, DeviceCode, Interactive) pre-acquire a JWT in Python and pass it as access_token. ServicePrincipal can't work that way:

  • ClientSecretCredential(tenant_id, client_id, client_secret) requires tenant_id as a constructor argument.
  • The connection string doesn't carry tenant_id; azure-identity doesn't discover it from anywhere.
  • The only place tenant is available client-side is the STS URL the server hands back during the FedAuth pre-login (this is what ODBC does internally — that's why ODBC has never needed tenant_id on the connection string).

So bulk copy must register a callback that's invoked mid-handshake. mssql-py-core hands the callback (spn, sts_url, auth_method) and we parse the tenant from the STS URL on every call.

Saurabh approved the callback approach in the EXP--Drivers--Python thread (May 13 2026). The matching mssql-py-core change (Rust-side wiring of the callback into auth_method_map) is in mssql-rs PR and ships in mssql-py-core 0.1.5+.

What changes

File Change
mssql_python/constants.py Add AuthType.SERVICE_PRINCIPAL = "activedirectoryserviceprincipal".
mssql_python/auth.py Add ServicePrincipalAuth.make_token_factory(client_id, client_secret) that returns the callable, plus _parse_tenant_id(). Update process_auth_parameters() to detect SP (and leave ODBC's native query path alone), update extract_auth_type() to propagate "serviceprincipal" for bulkcopy.
mssql_python/cursor.py Bulkcopy: when _auth_type == "serviceprincipal", build the factory from connection-string UID/PWD and register it via the new entra_id_token_factory dict key. Keep authentication/user_name/password in pycore_context (the auth validator + transformer in py-core need them to resolve the method before the factory is dispatched). Existing Model A flow for Default/DeviceCode/Interactive untouched.
tests/test_008_auth.py 17 new tests.
CHANGELOG.md Unreleased entry.

Flow

mssql-python                          mssql-py-core (Rust)             mssql-tds (Rust)
────────────                          ──────────────────────           ────────────────
cursor.bulkcopy()
  ├─ parses conn string (UID/PWD)
  ├─ ServicePrincipalAuth
  │   .make_token_factory(uid, pwd)
  │      → returns _factory closure
  │
  └─ pycore_context = {
       "authentication": "AD-SP",
       "user_name":  client_id,
       "password":   client_secret,
       "entra_id_token_factory": _factory,
       ...
     }
                                      PyCoreConnection(ctx)
                                        ├─ validate_auth/transform_auth
                                        ├─ register factory in
                                        │  auth_method_map[SP]
                                        └─ open TCP/TLS ──────────────►  TDS pre-login
                                                                          ├─ server: FedAuthInfo
                                                                          │  (spn, sts_url)
                                                                          ├─ context.entra_id_
                                                                          │     token_factory()
                                      ◄────────── invoked ──────────────  ├─ factory.create_token
_factory(spn, sts_url, auth_method)                                       │     (spn, sts_url, m)
  ├─ _parse_tenant_id(sts_url)
  ├─ ClientSecretCredential(
  │     tenant_id, client_id,
  │     client_secret)
  ├─ .get_token(scope=spn + "/.default")
  └─ return jwt.encode("utf-16-le")
                                                                          └─ FedAuth token packet
                                                                                       ──────────►
                                                                                                  SQL Server
                                                                                                  validates

Test

$ python -m pytest tests/test_008_auth.py -q
65 passed in 0.77s

17 new tests cover:

  • TestAuthType::test_auth_type_constants extended for SERVICE_PRINCIPAL
  • TestProcessAuthParameters::test_service_principal_auth_leaves_odbc_path_alone, test_service_principal_auth_case_insensitive
  • TestExtractAuthType::test_serviceprincipal
  • TestParseTenantId: GUID, GUID without trailing slash, domain, query string, extra path segments, empty string, no path
  • TestServicePrincipalAuth: factory returns callable, requires non-empty client_id and client_secret, returns UTF-16LE bytes, forwards credentials to ClientSecretCredential, builds scope from SPN (and keeps /.default suffix when already present), errors on unparseable STS URL, propagates ClientAuthenticationError as RuntimeError

Connection string example

import mssql_python

conn = mssql_python.connect(
    "Server=tcp:myserver.database.windows.net;"
    "Database=mydb;"
    "Authentication=ActiveDirectoryServicePrincipal;"
    "UID=<client-id-guid>;"
    "PWD=<client-secret>;"
    "Encrypt=yes;"
)
cur = conn.cursor()
cur.bulkcopy("mytable", [(1, "a"), (2, "b")])  # uses the callback under the hood

Same connection string keeps working for regular queries (ODBC handles ServicePrincipal natively in the non-bulk-copy path).

Out of scope

  • ActiveDirectoryPassword — needs the same callback shape but a different azure-identity credential class. Separate PR; py-core still rejects it today with a clear error.
  • ActiveDirectoryIntegrated — needs SSPI/Kerberos wiring at the Rust layer, not a Python callback. Separate work.

Dependencies

  • Requires mssql-py-core 0.1.5+ (mssql-rs PR dev/gaurav/entra-id-token-factory-py-callback). This Python PR will fail at import or runtime against earlier py-core versions because the entra_id_token_factory dict key is silently ignored there. Will update the dependency pin once the mssql-rs PR merges and a new wheel is published.

Copilot AI review requested due to automatic review settings May 13, 2026 12:14
@bewithgaurav bewithgaurav changed the title Add ActiveDirectoryServicePrincipal support for bulk copy FEAT: Add ActiveDirectoryServicePrincipal support for bulk copy May 13, 2026
@bewithgaurav bewithgaurav marked this pull request as draft May 13, 2026 12:16
Wires a Python token-factory callback into the mssql-py-core connection
context so bulk copy can authenticate with `Authentication=
ActiveDirectoryServicePrincipal`. The callback is invoked by mssql-tds
mid-handshake (FedAuth workflow 0x02), receives the STS URL from the
server, parses tenant_id from it, and uses azure-identity's
ClientSecretCredential to acquire a JWT — matching what ODBC does for
the query path.

Why a callback rather than pre-acquire (Model A):
- ServicePrincipal needs `tenant_id` to build ClientSecretCredential.
- The connection string does not carry it; azure-identity does not
  discover it. The only place we can learn tenant_id client-side is
  from the STS URL the server hands back during pre-login (which is
  exactly what ODBC does internally).
- A pre-acquire flow therefore can't work. Saurabh approved the
  callback model on EXP--Drivers--Python (May 13 2026).

Scope: ServicePrincipal only. ActiveDirectoryPassword and
ActiveDirectoryIntegrated remain on their existing code paths (still
rejected in py-core today with a clear error). Separate follow-ups.

Changes:
- constants.py: Add AuthType.SERVICE_PRINCIPAL.
- auth.py:
  - Add ServicePrincipalAuth.make_token_factory(client_id, secret)
    that returns a (spn, sts_url, auth_method) -> bytes callable
    matching the entra_id_token_factory contract py-core expects.
  - Add _parse_tenant_id() helper.
  - process_auth_parameters: detect SP and return auth_type=None
    (let ODBC handle the query path natively, msodbcsql 17.3+ has
    native SP support).
  - extract_auth_type: propagate "serviceprincipal" so bulkcopy can
    distinguish the SP path.
- cursor.py bulkcopy: when _auth_type=="serviceprincipal", build the
  factory from connection-string UID/PWD and register it via the new
  entra_id_token_factory dict key. Keep authentication/user_name/
  password in pycore_context (py-core's auth validator + transformer
  need them to resolve the method to ActiveDirectoryServicePrincipal
  before the factory is dispatched). Existing Default/DeviceCode/
  Interactive (Model A) flow unchanged.

Requires mssql-py-core 0.1.5+ which wires the entra_id_token_factory
dict key into ClientContext.auth_method_map.

Tests: 17 new in test_008_auth.py covering tenant parsing
(GUID/domain/query-string/trailing-slash/empty/etc), credential kwarg
forwarding, scope construction, UTF-16LE encoding, and error paths
(missing client_id/secret, bad STS URL, auth failure propagation).

Partial fix for #534.
@bewithgaurav bewithgaurav force-pushed the bewithgaurav/gh534-bulkcopy-service-principal branch from 2e81b7f to cd80d9c Compare May 13, 2026 12:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds bulk copy support for Authentication=ActiveDirectoryServicePrincipal by registering an Entra ID token-factory callback (invoked mid-handshake) so the tenant can be derived from the STS URL returned by the server, aligning bulk copy behavior with ODBC’s native Service Principal handling for the normal query path.

Changes:

  • Added AuthType.SERVICE_PRINCIPAL and updated auth-type extraction/processing to recognize Service Principal mode.
  • Implemented ServicePrincipalAuth.make_token_factory() and _parse_tenant_id() to generate UTF-16LE JWTs during FedAuth handshake.
  • Updated cursor.bulkcopy() to register entra_id_token_factory for Service Principal auth; added tests and a changelog entry.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
mssql_python/constants.py Adds SERVICE_PRINCIPAL enum value for connection-string auth detection.
mssql_python/auth.py Adds tenant parsing + token-factory callback for Service Principal; updates auth detection/mapping.
mssql_python/cursor.py Registers the callback for bulkcopy when _auth_type == "serviceprincipal"; preserves credentials for py-core auth resolution.
tests/test_008_auth.py Adds unit tests for tenant parsing, factory behavior, and new auth-type detection.
CHANGELOG.md Documents new bulk copy Service Principal support and notes py-core requirement.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mssql_python/cursor.py
Comment on lines +2951 to +2964
try:
factory = ServicePrincipalAuth.make_token_factory(
client_id, client_secret
)
except (RuntimeError, ValueError) as e:
raise RuntimeError(
f"Bulk copy failed: unable to build ServicePrincipal "
f"token factory: {e}"
) from e
pycore_context["entra_id_token_factory"] = factory
# Keep authentication/user_name/password in pycore_context —
# py-core's auth validator + transformer need them to resolve
# the auth method to ActiveDirectoryServicePrincipal before
# the factory is dispatched at handshake time.
Comment thread mssql_python/cursor.py Outdated
Comment on lines +2966 to +2967
"Bulk copy: registered ServicePrincipal token factory for client_id=%s",
client_id,
Comment thread mssql_python/auth.py
Comment on lines +147 to +155
try:
parsed = urlparse(sts_url)
except (ValueError, AttributeError):
return None
path = (parsed.path or "").strip("/")
if not path:
return None
first_segment = path.split("/", 1)[0]
return first_segment or None
Comment thread mssql_python/cursor.py
Comment on lines +2960 to +2964
pycore_context["entra_id_token_factory"] = factory
# Keep authentication/user_name/password in pycore_context —
# py-core's auth validator + transformer need them to resolve
# the auth method to ActiveDirectoryServicePrincipal before
# the factory is dispatched at handshake time.
- black: reformat auth.py and cursor.py per project line-length=100
- _parse_tenant_id: reject non-URL inputs (no scheme/netloc) so a bare
  string from an unexpected STS URL shape can't silently become tenant_id
- cursor.py debug log: drop client_id from log message (credential identifier)
- cursor.py finally cleanup: also pop entra_id_token_factory so the closure
  (which captures client_secret) doesn't outlive the dict reference
- cursor.py: collapse f"..." f"..." adjacent string with no second-half
  interpolation
- tests: 2 new TestParseTenantId cases for non-URL rejection

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot added the pr-size: medium Moderate update size label May 14, 2026
…https-only

- ServicePrincipalAuth: cache ClientSecretCredential per tenant inside the
  closure so azure-identity's per-instance token cache actually works across
  handshake retries / reconnects (previously the credential was rebuilt on
  every call, defeating the cache and forcing a fresh AAD round-trip)
- ServicePrincipalAuth: reject empty SPN early (was silently constructing
  scope='/.default' and surfacing a confusing azure-identity error)
- ServicePrincipalAuth: surfaced RuntimeError no longer echoes provider
  message; provider detail stays in debug logs only (the user-facing chain
  could otherwise leak provider-controlled text)
- _parse_tenant_id: tighten to https-only (Azure AD STS URLs are always https)
- tests: 4 new cases - http rejection, empty SPN, no provider-message leak,
  per-tenant credential caching

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread tests/test_008_auth.py Dismissed
Comment thread mssql_python/auth.py
# nothing). Lives for the lifetime of the closure (== lifetime of the
# bulkcopy connection on the Rust side).
#
# TODO(#573-followup): once the MSI PR (#573) merges, refactor to
@github-actions
Copy link
Copy Markdown

📊 Code Coverage Report

🔥 Diff Coverage

91%


🎯 Overall Coverage

25%


📈 Total Lines Covered: 7003 out of 27114
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/auth.py (91.1%): Missing lines 149-150,205-206
  • mssql_python/constants.py (100%)

Summary

  • Total: 46 lines
  • Missing: 4 lines
  • Coverage: 91%

mssql_python/auth.py

Lines 145-154

  145     from urllib.parse import urlparse
  146 
  147     try:
  148         parsed = urlparse(sts_url)
! 149     except (ValueError, AttributeError):
! 150         return None
  151     # Reject anything that isn't an https URL with a netloc. ``urlparse`` will
  152     # happily put a bare string like ``"tenant-guid"`` into ``path``, which
  153     # would then look like a valid tenant. Azure AD STS URLs are always https.
  154     if parsed.scheme != "https" or not parsed.netloc:

Lines 201-210

  201             # pylint: disable=import-outside-toplevel,unused-argument
  202             try:
  203                 from azure.identity import ClientSecretCredential
  204                 from azure.core.exceptions import ClientAuthenticationError
! 205             except ImportError as e:
! 206                 raise RuntimeError(
  207                     "Azure authentication libraries are not installed. "
  208                     "Please install with: pip install azure-identity azure-core"
  209                 ) from e


📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.pybind.build._deps.simdutf-src.src.haswell.implementation.cpp: 0.4%
mssql_python.pybind.build._deps.simdutf-src.src.implementation.cpp: 6.7%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.implementation.h: 10.4%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.scalar.utf16_to_utf8.utf16_to_utf8.h: 25.3%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.7%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.internal.isadetection.h: 65.3%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 74.2%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: medium Moderate update size

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants