feat: switch all job compute to serverless #224

Open: veenaramesh wants to merge 4 commits into databricks:test from veenaramesh:feat/serverless-compute
@veenaramesh (Collaborator)

Summary

  • Replace new_cluster/job_clusters with serverless environments (client version 4) across all resource YAMLs
  • Manage dependencies via environments.spec.dependencies: [-r ../requirements.txt]
  • Remove %pip install, dbutils.library.restartPython(), and %autoreload from all notebooks
  • Clean up requirements.txt for serverless: remove pyspark, Jinja2, pytz, pytest; add lightgbm
  • Remove env_manager="virtualenv" from spark_udf (unsupported on serverless)
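As an illustration of the first two bullets, here is a minimal sketch of the cluster-to-serverless rewrite, applied to a job definition held as a plain Python dict that mirrors the resource YAML. The helper to_serverless, the sample job, and the exact spelling of the spec.client field are assumptions for illustration; they are not taken from this PR's diff.

```python
import copy

# Task-level keys that tie a task to classic compute.
CLUSTER_KEYS = ("new_cluster", "job_cluster_key", "existing_cluster_id")


def to_serverless(job, env_key="default", client="4",
                  requirements="../requirements.txt"):
    """Return a copy of `job` that uses a serverless environment.

    Hypothetical helper: drops cluster definitions, points every task at
    one shared environment, and installs dependencies from a requirements file.
    """
    job = copy.deepcopy(job)  # leave the caller's dict untouched
    job.pop("job_clusters", None)
    for task in job.get("tasks", []):
        for key in CLUSTER_KEYS:
            task.pop(key, None)
        task["environment_key"] = env_key
    job["environments"] = [{
        "environment_key": env_key,
        # "client" as the field name is an assumption based on the
        # "client version 4" wording in the summary.
        "spec": {"client": client, "dependencies": [f"-r {requirements}"]},
    }]
    return job


# Made-up job definition, for demonstration only.
classic_job = {
    "name": "model-training-job",
    "job_clusters": [
        {"job_cluster_key": "train_cluster", "new_cluster": {"num_workers": 3}},
    ],
    "tasks": [
        {
            "task_key": "Train",
            "job_cluster_key": "train_cluster",
            "notebook_task": {"notebook_path": "../training/Train.py"},
        },
    ],
}

serverless_job = to_serverless(classic_job)
```

In the PR itself the same change is made by hand in each resource YAML; the sketch only shows the intended shape of the result.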

Depends on #219 (simplify-project-structure).

Test plan

  • Model training pipeline passes on serverless (~2.5 min vs ~10 min on classic clusters)
  • Batch inference passes on serverless (~2 min)
  • All dependencies install correctly via environment spec

Move all Databricks notebooks out of nested notebooks/ subdirectories
to sit alongside their sibling helper modules. This eliminates the
%cd .. hacks that were needed to import from parent directories and
simplifies the project layout for a data science workflow.

Moves:
- training/notebooks/*.py → training/*.py
- validation/notebooks/ModelValidation.py → validation/ModelValidation.py
- deployment/*/notebooks/*.py → deployment/*/*.py
- feature_engineering/notebooks/*.py → feature_engineering/*.py
- monitoring/notebooks/*.py → monitoring/*.py

Updates all resource YAMLs, update_layout.tmpl, and README references.

Co-authored-by: Isaac
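The move rules in the flatten-notebooks commit amount to dropping the notebooks/ path component. A minimal sketch of that mapping follows, assuming POSIX-style repository paths; the function name flatten_notebook_path and the deployment/batch_inference directory are hypothetical, since the commit message only gives glob patterns.

```python
from pathlib import PurePosixPath


def flatten_notebook_path(path):
    """Drop any 'notebooks' directory component from a repo-relative path.

    The file name itself is never touched, only its parent directories.
    """
    p = PurePosixPath(path)
    parents = [part for part in p.parts[:-1] if part != "notebooks"]
    return str(PurePosixPath(*parents, p.name))


# One path named in the commit message, one with an assumed directory name.
examples = [
    "validation/notebooks/ModelValidation.py",
    "deployment/batch_inference/notebooks/BatchInference.py",
]
moved = {src: flatten_notebook_path(src) for src in examples}
```

A mapping like this could be used to cross-check that every notebook_path in the resource YAMLs was updated to the new location.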
Databricks bundle deployments set the notebook CWD correctly, so
the notebook_path/%cd $notebook_path pattern is unnecessary. Sibling
imports work naturally after the flatten-notebooks change.

Also inlines the trivial get_deployed_model_stage_for_env helper
(a dict lookup) to eliminate cross-directory sys.path.append in
deploy.py and BatchInference.py.

Co-authored-by: Isaac
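For readers unfamiliar with the helper being inlined, a rough sketch of the before and after is below. The environment names, stage names, and the helper's module name are assumptions; the commit message only says the helper was a dict lookup.

```python
# Before (sketch, shown as comments): each script reached across
# directories for a one-line helper.
#   import sys
#   sys.path.append("..")
#   from utils import get_deployed_model_stage_for_env  # module name assumed
#   stage = get_deployed_model_stage_for_env(env)

# After (sketch): the lookup sits next to its only use, so no
# sys.path manipulation is needed.
env = "staging"  # in a real job this would come from a job parameter

# Hypothetical mapping; the actual keys and values are not shown in this PR.
stage = {
    "dev": "Staging",
    "staging": "Staging",
    "prod": "Production",
}[env]
```

An unknown environment name raises KeyError here, which matches the behaviour of a bare dict lookup inside a helper.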
No longer needed with databricks bundle init.

Co-authored-by: Isaac
Replace new_cluster/job_clusters definitions with serverless
environments across all resource YAMLs. Tasks now use
environment_key: default with a serverless environment spec
that installs dependencies from requirements.txt.

Remove %pip install, dbutils.library.restartPython(), and
%autoreload boilerplate from all notebooks — dependencies
are now managed by the serverless environment.

Clean up requirements.txt for serverless compatibility:
- Remove pyspark, Jinja2, pytz, pytest (pre-installed or not needed)
- Add lightgbm (not pre-installed on serverless)

Also remove env_manager="virtualenv" from spark_udf (unsupported on serverless).

Co-authored-by: Isaac
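The requirements.txt cleanup described in this commit can be sketched as a small filter. The package names to drop and add come from the commit message; the function names, the sample file contents, and the version pins are made up for illustration.

```python
import re

DROP = {"pyspark", "jinja2", "pytz", "pytest"}  # pre-installed or not needed
ADD = ["lightgbm"]                              # not pre-installed on serverless


def package_name(line):
    """Return the lowercased package name of a requirement line.

    Underscores are mapped to hyphens. Blank lines, comments, and pip
    options such as '-r other.txt' yield None.
    """
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0).lower().replace("_", "-") if match else None


def clean_requirements(lines):
    """Drop the unwanted packages and append any missing additions."""
    kept = [line for line in lines if package_name(line) not in DROP]
    present = {package_name(line) for line in kept}
    kept += [pkg for pkg in ADD if pkg not in present]
    return kept


# Made-up file contents; the real pins are not shown in this PR.
requirements = [
    "mlflow==2.11.3",
    "pyspark==3.5.0",
    "Jinja2==3.0.3",
    "pytz",
    "pytest>=7.1",
    "scikit-learn>=1.1",
]
cleaned = clean_requirements(requirements)
```

Matching on the parsed name rather than a substring keeps packages such as pytest-cov from being removed by accident.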
@veenaramesh force-pushed the feat/serverless-compute branch from 6e60e67 to 4306d90 on June 12, 2026 17:08
@veenaramesh linked an issue on Jun 12, 2026 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Serverless updates needed
