
Add MCP Databricks table filtering implementation#475

Open
eliezerze wants to merge 3 commits into databricks-solutions:main from eliezerze:mcp-table-filtering

Conversation

@eliezerze

Includes table filter logic, integration tests, MCP config, and project setup.

Made-with: Cursor

- Copy TableTagFilter source into the repo (no external path deps)
- Fix SQL injection in main.py (validate identifiers)
- Remove hardcoded workspace URL and email from databricks.yml
- Remove unrelated boilerplate (taxis.py, sample_notebook.ipynb)
- Fix duplicate imports and PEP 8 issues in conftest.py
- Move test_mcp_live.py to scripts/ (not a pytest test)
- Add sqlglot and databricks-sdk as proper dependencies
- Make test constants configurable via env vars
- Add proper __init__.py exports

Made-with: Cursor
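
The "validate identifiers" fix above can be sketched as follows. This is an illustrative stand-in, not the PR's actual code; the helper name and regex are assumptions:

```python
import re

# Hypothetical sketch of identifier validation before string interpolation.
# Plain identifiers: start with a letter or underscore, then letters,
# digits, or underscores. Anything else (quotes, semicolons, comment
# markers) is rejected outright.
_IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def qualified_table_name(catalog: str, schema: str, table: str) -> str:
    """Validate each part before it ever reaches SQL text, so an
    attacker-controlled value cannot smuggle in extra SQL."""
    for part in (catalog, schema, table):
        if not _IDENTIFIER_RE.match(part):
            raise ValueError(f"invalid identifier: {part!r}")
    return f"{catalog}.{schema}.{table}"
```

Rejecting invalid identifiers (rather than escaping them) keeps the check simple and fail-closed.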
Critical security & correctness fixes:
- Use parameterized SQL (StatementParameterListItem) to prevent SQL
  injection from env-supplied tag_name/tag_value
- Use the correct Databricks SDK API: execute_statement returning
  StatementResponse (the previous execute_and_wait context-manager
  pattern was wrong and would have failed at runtime)
- Default to fail-closed validation: unparseable SQL is now rejected
  rather than silently allowed (was a security bypass)
- CTE-aware table extraction using sqlglot scope traversal (the
  previous find_all approach wrongly treated CTE aliases as real
  tables), plus DML target handling so INSERT/UPDATE/DELETE/MERGE
  targets are validated
- Cache fetch now happens INSIDE the lock, eliminating the
  thundering-herd problem where many threads triggered duplicate
  Databricks queries on cache expiry
- Public API returns frozenset, preventing accidental cache mutation
  by callers
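
The in-lock cache refresh and frozenset return described above can be sketched like this; class, method, and field names are illustrative, not the PR's actual ones:

```python
import threading
import time


class TagCache:
    """Sketch of the thundering-herd fix: the expiry check AND the
    refresh both happen while holding the lock, so concurrent callers
    hitting an expired cache trigger one backend query, not many."""

    def __init__(self, fetch_fn, ttl_seconds: float = 300.0):
        self._fetch_fn = fetch_fn        # e.g. runs the Databricks query
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._tables: frozenset[str] = frozenset()
        self._fetched_at: float = float("-inf")

    def allowed_tables(self) -> frozenset[str]:
        with self._lock:
            # Fetch INSIDE the lock: a thread that lost the race sees
            # the fresh timestamp and skips the duplicate query.
            if time.monotonic() - self._fetched_at >= self._ttl:
                self._tables = frozenset(self._fetch_fn())
                self._fetched_at = time.monotonic()
            # frozenset means callers cannot mutate the cached set.
            return self._tables
```

Holding the lock across the fetch trades a little latency under contention for exactly-one refresh per expiry.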

Design improvements:
- Split into config / repository / sql_parser / table_filter modules
  following SRP; tests now use dependency injection (FakeRepository)
  instead of hacking private attributes
- Add FilterConfig dataclass with from_env() factory + validation of
  tag_name pattern, cache TTL, and fail-closed flag
- Lazy thread-safe singleton with reset_singleton() for tests
- Configurable warehouse_id for multi-warehouse environments
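
A FilterConfig along these lines would match the description above; the env-var names, defaults, and field names here are assumptions, not the PR's actual ones:

```python
import os
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterConfig:
    """Hypothetical sketch of a validated from_env() factory."""

    tag_name: str
    tag_value: str
    cache_ttl_seconds: int = 300
    fail_closed: bool = True

    @classmethod
    def from_env(cls, env=os.environ) -> "FilterConfig":
        tag_name = env.get("MCP_TAG_NAME", "mcp_visible")
        # Validate the tag-name pattern so env input cannot carry SQL.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", tag_name):
            raise ValueError(f"invalid tag_name: {tag_name!r}")
        ttl = int(env.get("MCP_CACHE_TTL", "300"))
        if ttl <= 0:
            raise ValueError("cache TTL must be positive")
        return cls(
            tag_name=tag_name,
            tag_value=env.get("MCP_TAG_VALUE", "true"),
            cache_ttl_seconds=ttl,
            # Fail-closed unless explicitly disabled.
            fail_closed=env.get("MCP_FAIL_CLOSED", "true").lower() != "false",
        )
```

Taking `env` as a parameter (defaulting to `os.environ`) lets tests inject a plain dict instead of patching the process environment.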

Other fixes:
- Fix databricks.yml: declare workspace_host variable (was undeclared)
- Pin dependency versions (databricks-sdk, sqlglot) for reproducibility
- Rename scripts/test_mcp_live.py -> scripts/smoke_check.py so pytest
  doesn't try to collect it
- Move pytest marker config to pyproject.toml; remove redundant conftest
- Commit uv.lock for reproducible installs
- Add ruff lint rules (E, F, I, B, UP, SIM); 0 lint errors
- 36 offline tests passing; covers SQL injection, fail-closed mode,
  CTE handling, INSERT validation, frozen cache, and config validation
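
Moving the marker config into pyproject.toml (alongside the ruff rules listed above) typically looks like this sketch; the marker name is an assumption:

```toml
[tool.pytest.ini_options]
markers = [
    "live: tests that hit a real Databricks workspace (deselect with -m 'not live')",
]

[tool.ruff.lint]
select = ["E", "F", "I", "B", "UP", "SIM"]
```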

Made-with: Cursor
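
The parameterized-query fix can be illustrated offline with a stand-in for the SDK's StatementParameterListItem; the real code would import it from databricks.sdk.service.sql and pass the list to the statement-execution call. The query text and helper name below are assumptions:

```python
from dataclasses import dataclass


# Offline stand-in mirroring the shape of the SDK's
# databricks.sdk.service.sql.StatementParameterListItem; a real
# implementation would import the SDK class instead.
@dataclass
class StatementParameterListItem:
    name: str
    value: str
    type: str = "STRING"


def build_tag_query(tag_name: str, tag_value: str):
    """Return the statement plus bound parameters. The env-supplied tag
    values never touch the SQL text, so they cannot inject SQL."""
    statement = (
        "SELECT catalog_name, schema_name, table_name "
        "FROM system.information_schema.table_tags "
        "WHERE tag_name = :tag_name AND tag_value = :tag_value"
    )
    params = [
        StatementParameterListItem(name="tag_name", value=tag_name),
        StatementParameterListItem(name="tag_value", value=tag_value),
    ]
    return statement, params
```

With the real SDK, the pair would be passed as `statement=` and `parameters=` to `WorkspaceClient().statement_execution.execute_statement(...)` along with a `warehouse_id`, which returns a StatementResponse.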