feat: switch all job compute to serverless #224
Open
veenaramesh wants to merge 4 commits into
Move all Databricks notebooks out of nested notebooks/ subdirectories to sit alongside their sibling helper modules. This eliminates the %cd .. hacks that were needed to import from parent directories and simplifies the project layout for a data science workflow.

Moves:
- training/notebooks/*.py → training/*.py
- validation/notebooks/ModelValidation.py → validation/ModelValidation.py
- deployment/*/notebooks/*.py → deployment/*/*.py
- feature_engineering/notebooks/*.py → feature_engineering/*.py
- monitoring/notebooks/*.py → monitoring/*.py

Updates all resource YAMLs, update_layout.tmpl, and README references.

Co-authored-by: Isaac
Databricks bundle deployments set the notebook CWD correctly, so the notebook_path/%cd $notebook_path pattern is unnecessary. Sibling imports work naturally after the flatten-notebooks change. Also inlines the trivial get_deployed_model_stage_for_env helper (a dict lookup) to eliminate cross-directory sys.path.append in deploy.py and BatchInference.py. Co-authored-by: Isaac
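For reviewers who have not seen the helper, the inlined lookup might look roughly like the sketch below. The environment names, the stage values, and the variable names are assumptions for illustration; the commit message only says the helper was a dict lookup.

```python
# Hypothetical mapping: the actual environments and stages are not listed in this PR text.
MODEL_STAGE_BY_ENV = {
    "dev": "Staging",
    "staging": "Staging",
    "prod": "Production",
}

# In a notebook this value would typically come from a widget or job parameter (assumed).
env = "prod"

# The inlined replacement for get_deployed_model_stage_for_env(env): a plain dict lookup,
# so no cross-directory sys.path.append is needed to import a helper module.
model_stage = MODEL_STAGE_BY_ENV[env]
print(model_stage)
```

A plain subscript raises KeyError on an unknown environment, which is usually preferable to silently falling back to a default stage.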
No longer needed with databricks bundle init. Co-authored-by: Isaac
Replace new_cluster/job_clusters definitions with serverless environments across all resource YAMLs. Tasks now use environment_key: default with a serverless environment spec that installs dependencies from requirements.txt.

Remove %pip install, dbutils.library.restartPython(), and %autoreload boilerplate from all notebooks; dependencies are now managed by the serverless environment.

Clean up requirements.txt for serverless compatibility:
- Remove pyspark, Jinja2, pytz, pytest (pre-installed or not needed)
- Add lightgbm (not pre-installed on serverless)
- Remove env_manager=virtualenv from spark_udf (unsupported on serverless)

Co-authored-by: Isaac
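The shape of the resource change can be sketched as a transformation on a job definition held as a plain Python dict. This is only an illustration under assumptions: the job name, task key, cluster key, and notebook path are hypothetical, and the `client`/`dependencies` field layout is inferred from the PR summary rather than copied from the actual YAML files.

```python
from copy import deepcopy


def to_serverless(job: dict, env_key: str = "default") -> dict:
    """Return a copy of a job definition that uses a serverless environment
    instead of new_cluster/job_clusters."""
    job = deepcopy(job)
    # Drop the shared cluster definitions.
    job.pop("job_clusters", None)
    for task in job.get("tasks", []):
        # Drop per-task cluster references and point the task at the environment.
        task.pop("new_cluster", None)
        task.pop("job_cluster_key", None)
        task["environment_key"] = env_key
    # Assumed spec layout: client version 4, dependencies read from requirements.txt.
    job["environments"] = [
        {
            "environment_key": env_key,
            "spec": {"client": "4", "dependencies": ["-r ../requirements.txt"]},
        }
    ]
    return job


# Hypothetical job definition, for illustration only.
classic_job = {
    "name": "model_training_job",
    "tasks": [
        {
            "task_key": "Train",
            "job_cluster_key": "train_cluster",
            "notebook_task": {"notebook_path": "../training/Train.py"},
        }
    ],
    "job_clusters": [
        {"job_cluster_key": "train_cluster", "new_cluster": {"num_workers": 3}}
    ],
}

serverless_job = to_serverless(classic_job)
print(serverless_job["tasks"][0]["environment_key"])
```

The function copies its input, so the classic definition stays available for a side-by-side comparison.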
6e60e67 to 4306d90
Summary
- Replace new_cluster/job_clusters with serverless environments (client version 4) across all resource YAMLs
- Install dependencies via environments.spec.dependencies: [-r ../requirements.txt]
- Remove %pip install, dbutils.library.restartPython(), and %autoreload from all notebooks
- Clean up requirements.txt for serverless: remove pyspark, Jinja2, pytz, pytest; add lightgbm
- Remove env_manager="virtualenv" from spark_udf (unsupported on serverless)

Depends on #219 (simplify-project-structure).
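As a rough illustration of the requirements.txt cleanup, the sketch below filters a list of requirement lines. The sample input lines and version pins are made up for the example, and the helper names are hypothetical; only the removed and added package names come from this PR.

```python
import re

# Packages this PR removes from requirements.txt (compared case-insensitively).
REMOVED = {"pyspark", "jinja2", "pytz", "pytest"}
# Packages this PR adds because they are not pre-installed on serverless.
ADDED = ["lightgbm"]


def package_name(line: str) -> str:
    """Extract the lowercased distribution name from a requirement line.
    Returns an empty string for comments and blank lines."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
    return match.group(1).lower() if match else ""


def clean_requirements(lines: list[str]) -> list[str]:
    """Drop the removed packages and append any added package not already present."""
    kept = [ln for ln in lines if package_name(ln) not in REMOVED]
    present = {package_name(ln) for ln in kept}
    kept += [pkg for pkg in ADDED if pkg not in present]
    return kept


# Made-up sample content, for illustration only.
before = ["mlflow==2.11.3", "pyspark==3.5.0", "Jinja2==3.0.3", "pytz", "pytest>=7.1.2", "pandas"]
after = clean_requirements(before)
print(after)
```

Matching on the full distribution name means a line such as pytest-cov would be kept, since only exact names in the removal set are dropped.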
Test plan