feat(databricks-skills): add databricks-mlflow-ml skill for classic ML #474
Open
dgokeeffe wants to merge 3 commits into databricks-solutions:main
Conversation
added 3 commits
April 19, 2026 22:01
Fills the gap between databricks-mlflow-evaluation (GenAI agent eval) and databricks-model-serving (real-time endpoints).

Covers:
- Classic ML model training with MLflow tracking (sklearn / XGBoost / PyTorch)
- Experiment creation with UC volume `artifact_location` (required in UC-enforced workspaces)
- Unity Catalog model registration with three-level names
- `@champion` / `@challenger` alias management
- Batch inference via `mlflow.pyfunc.load_model` (notebook, up to ~10k rows)
- Distributed batch via `mlflow.pyfunc.spark_udf` in Lakeflow SDP pipelines

Structure mirrors databricks-mlflow-evaluation:
- SKILL.md: workflows + trigger description + quick start
- references/GOTCHAS.md: 12 common mistakes with symptoms + fixes
- references/CRITICAL-interfaces.md: exact API signatures + `models:/` URI format
- references/patterns-experiment-setup.md: UC volume `artifact_location` setup
- references/patterns-training.md: logging with signature + input_example
- references/patterns-uc-registration.md: register + alias + verify + A/B
- references/patterns-batch-inference.md: pyfunc.load_model + spark_udf + ai_query anti-pattern
- references/user-journeys.md: 7 end-to-end workflows including debugging

Key gotchas covered that other MLflow guides miss:
- Experiment creation now requires a UC volume `artifact_location` in UC-enforced workspaces (DBFS root writes are rejected)
- `mlflow.set_registry_uri('databricks-uc')` is required; silent workspace-registry fallback is the #1 support question
- `ai_query` does NOT work on custom UC-registered models unless they're deployed to a serving endpoint; use `pyfunc.load_model` or `spark_udf` instead
- UC aliases (`@champion`/`@challenger`) replace deprecated stage transitions (`transition_model_version_stage` is a no-op on UC models)
- `mlflow.pyfunc.spark_udf` must be constructed at module scope in Lakeflow SDP pipelines, not inside the function body

Tested against MLflow 2.16+ on Databricks Runtime 15.4 LTS.
Content battle-tested in the Coles Vibe Workshop (classic-ML track running in an airgapped environment where online MLflow docs aren't reachable).
Field-tested the skill end-to-end from a local Python environment against a live Databricks workspace. Surfaced two gotchas not in the original set:

- #12 `mlflow[databricks]` extras missing when running outside Databricks: plain `pip install mlflow` omits the azure-core / boto3 / google.cloud SDKs that UC registration needs to stage artifacts. Training + `log_model` work; `register_model` fails with an opaque "No module named 'azure'". Databricks clusters ship the extras pre-installed, so this only bites laptops / CI.
- #13 `artifact_path=` deprecated in favour of `name=` (MLflow 2.16+): emits a warning on every `log_model` call. Non-blocking, but worth flagging since most online tutorials and training courses still use the old param.

Both verified against the workshop's test run: skill workflow 1 now completes cleanly with these fixes documented.
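The pre-flight for gotcha #12 can be sketched as a quick environment check. This is a hypothetical helper, not part of the skill; the package names are the ones the gotcha lists:

```python
import importlib.util

def missing_uc_extras(packages=("azure.core", "boto3", "google.cloud")):
    """Return the cloud-SDK packages that are not importable in this env.

    A non-empty result on a laptop/CI box means plain `pip install mlflow`
    was likely used; install `mlflow[databricks]` before register_model.
    """
    missing = []
    for pkg in packages:
        try:
            if importlib.util.find_spec(pkg) is None:
                missing.append(pkg)
        except ModuleNotFoundError:
            # Parent package itself is absent (e.g. no 'azure' at all),
            # so find_spec on the dotted name raises instead of returning None.
            missing.append(pkg)
    return missing
```

Running this before a registration attempt turns the opaque "No module named 'azure'" failure into an actionable install hint.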
The original SKILL.md didn't state a runtime target. This adds a "Runtime compatibility" section anchored on what the skill was actually tested against (MLflow 3.11 on Lakeflow SDP serverless compute v5), with a compat note for MLflow 2.16+ (classic DBR 15.4 LTS still ships 2.x). Points at GOTCHAS.md for the 3.x-vs-2.x divergences (`artifact_path` deprecation, etc.).
Why
The existing MLflow-related skills leave a gap for classic ML practitioners:
- `databricks-mlflow-evaluation` covers GenAI agent evaluation (`mlflow.genai.evaluate`, scorers, judges)
- `databricks-model-serving` covers real-time serving endpoints
- `databricks-unity-catalog` covers catalog, schema, and grants operations
- `databricks-mlflow-ml` (this PR) covers classic ML training, UC registration, and batch inference

A data scientist training a forecasting model, registering it to Unity Catalog, and scoring predictions in a notebook or Lakeflow pipeline has no skill to trigger on. This PR fills that gap.
What's in the skill
SKILL.md — workflow index (Train → Register → Score, Retrain + Promote A/B, Debugging), quick-start, runtime compatibility note, and trigger description.
7 reference files:
- `GOTCHAS.md`: 14 common mistakes with symptoms + fixes
- `CRITICAL-interfaces.md`: exact API signatures + the `models:/catalog.schema.model@alias` URI format
- `patterns-experiment-setup.md`: UC volume `artifact_location` (required in UC-enforced workspaces)
- `patterns-training.md`: logging with `signature` + `input_example`, `sklearn.Pipeline` wrapping, autologging
- `patterns-uc-registration.md`: three-level names, `@champion`/`@challenger` aliases, verification via `DESCRIBE MODEL`, A/B promotion
- `patterns-batch-inference.md`: notebook `pyfunc.load_model` (Tier 1), Lakeflow SDP `pyfunc.spark_udf` (Tier 2), champion-vs-challenger validation, explicit warning against `ai_query` on custom UC models
- `user-journeys.md`: 7 end-to-end workflows including debugging scenarios
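The `models:/` alias-URI format documented in CRITICAL-interfaces.md can be illustrated with a small helper. The function below is hypothetical (not part of the skill); the URI scheme itself is MLflow's:

```python
def model_alias_uri(catalog: str, schema: str, model: str, alias: str = "champion") -> str:
    """Build a models:/ URI for a UC-registered model alias.

    UC model names are three-level (catalog.schema.model) and aliases
    are referenced with an @ suffix, e.g. @champion.
    """
    for part in (catalog, schema, model, alias):
        if not part or "." in part:
            raise ValueError(f"component must be non-empty and dot-free: {part!r}")
    return f"models:/{catalog}.{schema}.{model}@{alias}"

# model_alias_uri("main", "forecasting", "demand_model")
# -> "models:/main.forecasting.demand_model@champion"
```

The dot check matters because a dot inside a component would silently shift the catalog/schema/model split.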
Key gotchas this skill teaches that other guides miss

- `artifact_location` on experiment creation: DBFS root is rejected in UC-enforced workspaces. Every `log_model` call fails with opaque errors until `artifact_location` points at a UC volume.
- `mlflow.set_registry_uri('databricks-uc')`: without this, `register_model` silently routes to the legacy workspace registry. The #1 "my model isn't showing up in Catalog Explorer" support question.
- `ai_query` on custom UC models: doesn't work; it requires a serving endpoint. The correct primitive is `mlflow.pyfunc.load_model` (notebook) or `mlflow.pyfunc.spark_udf` (Lakeflow).
- `@champion`/`@challenger` aliases replace deprecated `transition_model_version_stage()` stages. The legacy API still exists but is a no-op on UC-registered models (no error, no effect).
- `mlflow.pyfunc.spark_udf` in Lakeflow SDP must be constructed at module scope, not inside `@dp.materialized_view`. Otherwise deserialization repeats on every pipeline evaluation.
- `pip install 'mlflow[databricks]'` is required for UC registration outside Databricks clusters. Plain `pip install mlflow` omits the cloud-storage SDKs (azure-core / boto3 / google.cloud) MLflow needs to stage UC artifacts. Clusters ship the extras pre-installed.
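The flow these gotchas guard can be condensed into one sketch. Catalog/schema/model names are hypothetical, and imports are deferred into the function so the snippet stays importable outside a Databricks environment:

```python
def train_register_and_promote(X, y, name="main.forecasting.demand_model"):
    """Train, log, register to UC, and set the @champion alias (sketch)."""
    # Deferred imports: running this for real needs mlflow[databricks],
    # scikit-learn, and Databricks workspace auth.
    import mlflow
    from mlflow.models import infer_signature
    from sklearn.ensemble import GradientBoostingRegressor

    # Gotcha: without this, register_model silently routes to the legacy
    # workspace registry and the model never shows in Catalog Explorer.
    mlflow.set_registry_uri("databricks-uc")

    model = GradientBoostingRegressor().fit(X, y)
    with mlflow.start_run():
        info = mlflow.sklearn.log_model(
            model,
            name="model",  # MLflow 3.x keyword; 2.x used artifact_path=
            signature=infer_signature(X, model.predict(X)),
            input_example=X[:5],
        )

    # Three-level UC name; artifacts are staged to the experiment's
    # UC-volume artifact_location.
    mv = mlflow.register_model(info.model_uri, name)

    # Aliases replace deprecated stage transitions on UC models.
    mlflow.MlflowClient().set_registered_model_alias(name, "champion", mv.version)
    return mv
```

Downstream consumers then load `models:/main.forecasting.demand_model@champion` without pinning a version number.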
Testing

Field-tested end-to-end against a live Databricks workspace:
- Trained and registered a `GradientBoostingRegressor`, set the `@champion` alias: verified in the Catalog Explorer UI
- Batch inference via `mlflow.pyfunc.load_model`: predictions within ~2% of actuals
- Surfaced two new gotchas (`mlflow[databricks]` install + `artifact_path` deprecation) and added them to GOTCHAS.md

Runtime verified: MLflow 3.11 on Lakeflow SDP serverless compute v5 (current default). Patterns are compatible with MLflow 2.16+, so pairs on older classic DBRs still get correct behaviour. 2.x/3.x divergences are called out in GOTCHAS.md (e.g., `artifact_path` → `name=`).
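The tier-1 scoring path exercised in testing can be sketched as follows. The model name is hypothetical, and the import is deferred because actually running it needs `mlflow[databricks]` plus workspace auth:

```python
# Hypothetical UC model name; resolved via the @champion alias.
CHAMPION_URI = "models:/main.forecasting.demand_model@champion"

def score_in_notebook(batch_df):
    """Tier 1: score a batch in a notebook (fine up to ~10k rows)."""
    import mlflow  # deferred: needs mlflow[databricks] + workspace auth
    model = mlflow.pyfunc.load_model(CHAMPION_URI)
    return model.predict(batch_df)

# Tier 2 (Lakeflow SDP): build the UDF once at module scope, e.g.
#   predict_udf = mlflow.pyfunc.spark_udf(spark, CHAMPION_URI)
# NOT inside the @dp.materialized_view body, or the model is
# re-deserialized on every pipeline evaluation.
```

Pinning the alias rather than a version number means a later A/B promotion swaps the served model without touching scoring code.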
Structure parity

File layout matches `databricks-mlflow-evaluation` (same `SKILL.md` + `references/` + `GOTCHAS.md` + `CRITICAL-interfaces.md` + `patterns-*.md` convention). Installable via the existing `install_skills.sh`.
Not in scope

- Real-time serving endpoints (`databricks-model-serving` covers that)
- GenAI agent evaluation (`databricks-mlflow-evaluation` covers that)
- Catalog/schema/grants administration (`databricks-unity-catalog` covers those)

Deliberately narrow: classic ML + UC registration + batch inference only.
Origin
Built to fill a gap encountered during the Coles Vibe Workshop (airgapped Databricks field-engineer hackathon). DS pairs needed UC-scoped MLflow guidance that wasn't covered by any existing skill. Content battle-tested in the workshop before being contributed upstream.