1 change: 0 additions & 1 deletion README.md
@@ -96,7 +96,6 @@ Or used for project initialization:
* ``input_schema_name``: If using [Models in Unity Catalog](https://docs.databricks.com/en/mlflow/models-in-uc.html#models-in-unity-catalog), specify the name of the schema under which the models should be registered, but we recommend keeping the name the same as the project name. We default to using the same `schema_name` across catalogs, thus this schema must exist in each catalog used. For example, the training pipeline when executed in the staging environment will register the model to `staging.<schema_name>.<model_name>`, whereas the same pipeline executed in the prod environment will register the model to `prod.<schema_name>.<model_name>`. Also, be sure that the service principals in each respective environment have the right permissions to access this schema, which would be `USE_CATALOG`, `USE_SCHEMA`, `MODIFY`, `CREATE_MODEL`, and `CREATE_TABLE`.
* ``input_unity_catalog_read_user_group``: If using [Models in Unity Catalog](https://docs.databricks.com/en/mlflow/models-in-uc.html#models-in-unity-catalog), define the name of the user group to grant `EXECUTE` (read & use model) privileges for the registered model. Defaults to "account users".
* ``input_include_feature_store``: If selected, will provide [Databricks Feature Store](https://docs.databricks.com/machine-learning/feature-store/index.html) stack components including: project structure and sample feature Python modules, feature engineering notebooks, ML resource configs to provision and manage Feature Store jobs, and automated integration tests covering feature engineering and training.
* ``input_include_mlflow_recipes``: If selected, will provide [MLflow Recipes](https://mlflow.org/docs/latest/recipes.html) stack components, dividing the training pipeline into configurable steps and profiles.

See the generated ``README.md`` for next steps!

3 changes: 0 additions & 3 deletions cookiecutter.json

This file was deleted.

32 changes: 0 additions & 32 deletions databricks_template_schema.json
@@ -285,38 +285,6 @@
}
}
},
"input_include_mlflow_recipes": {
"order": 19,
"type": "string",
"description": "\nWhether to include MLflow Recipes",
"default": "no",
"enum": ["no", "yes"],
"skip_prompt_if": {
"anyOf":[
{
"properties": {
"input_include_models_in_unity_catalog": {
"const": "yes"
}
}
},
{
"properties": {
"input_include_feature_store": {
"const": "yes"
}
}
},
{
"properties": {
"input_setup_cicd_and_project": {
"const": "CICD_Only"
}
}
}
]
}
},
"input_docker_image": {
"order": 20,
"type": "string",
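
For context on what the removed `skip_prompt_if` block did: the prompt for `input_include_mlflow_recipes` appears to have been suppressed whenever any one of the three listed conditions matched. The Python sketch below illustrates that reading of the `anyOf`/`const` structure. `should_skip_prompt` and the `answers` dictionary are hypothetical names used only for illustration, not part of the Databricks CLI, and treating an unanswered input as a non-match is an assumption.

```python
from typing import Any, Dict

# Condition copied from the removed "input_include_mlflow_recipes" property.
SKIP_PROMPT_IF: Dict[str, Any] = {
    "anyOf": [
        {"properties": {"input_include_models_in_unity_catalog": {"const": "yes"}}},
        {"properties": {"input_include_feature_store": {"const": "yes"}}},
        {"properties": {"input_setup_cicd_and_project": {"const": "CICD_Only"}}},
    ]
}


def should_skip_prompt(condition: Dict[str, Any], answers: Dict[str, str]) -> bool:
    """Hypothetical helper: True if any `anyOf` branch matches the answers so far."""

    def branch_matches(branch: Dict[str, Any]) -> bool:
        props = branch.get("properties", {})
        # Assumption: an input that has not been answered yet does not match.
        return all(
            name in answers and answers[name] == rule.get("const")
            for name, rule in props.items()
        )

    return any(branch_matches(b) for b in condition.get("anyOf", []))


if __name__ == "__main__":
    print(should_skip_prompt(SKIP_PROMPT_IF, {"input_include_feature_store": "yes"}))
    print(should_skip_prompt(SKIP_PROMPT_IF, {"input_include_feature_store": "no"}))
```

Under this reading, the Recipes prompt was only ever shown for projects that used neither Unity Catalog models nor Feature Store and that set up project code in addition to CI/CD.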
46 changes: 4 additions & 42 deletions template/update_layout.tmpl
@@ -32,51 +32,13 @@
{{ skip (printf `%s/%s` $root_dir `_params_testing_only.txt`) }}
{{ end }}

# Remove Delta and Feature Store code in cases of MLflow Recipes.
{{ if (eq .input_include_mlflow_recipes `yes`) }}
# delta_paths
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/notebooks/Train.py`) }}
# feature_store_paths
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `feature_engineering`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/feature_engineering`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/notebooks/TrainWithFeatureStore.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `resources/feature-engineering-workflow-resource.yml`) }}
# Remove Delta and MLflow Recipes code in cases of Feature Store.
{{ else if (eq .input_include_feature_store `yes`) }}
# delta_paths
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/notebooks/Train.py`) }}
# recipe_paths
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/profiles`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/steps`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/data`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/__init__.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/notebooks/TrainWithMLflowRecipes.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/recipe.yaml`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/README.md`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/ingest_test.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/split_test.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/train_test.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/test_sample.parquet`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/transform_test.py`) }}
# Remove MLflow Recipes and Feature Store code in cases of Delta Table.
# Remove Feature Store code if not selected; remove Delta Train notebook if Feature Store is selected
{{ if (eq .input_include_feature_store `yes`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/Train.py`) }}
{{ else }}
# recipe_paths
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/profiles`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/steps`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/data`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/__init__.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/notebooks/TrainWithMLflowRecipes.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/recipe.yaml`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/README.md`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/ingest_test.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/split_test.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/train_test.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/test_sample.parquet`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/transform_test.py`) }}
# feature_store_paths
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `feature_engineering`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/feature_engineering`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/notebooks/TrainWithFeatureStore.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/TrainWithFeatureStore.py`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `resources/feature-engineering-workflow-resource.yml`) }}
{{ end }}
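
Because this copy of the diff has lost its `+`/`-` markers, the resulting logic is easier to follow when written out. The Python sketch below is one reading of the hunk after the change, where the only remaining branch is on `input_include_feature_store`. `skipped_paths` is a hypothetical helper for illustration, and which lines are additions is inferred from the "4 additions" count in the file header rather than stated explicitly.

```python
from typing import List


def skipped_paths(root_dir: str, project: str, include_feature_store: str) -> List[str]:
    """Hypothetical mirror of the template's remaining `skip` calls after this change."""
    base = f"{root_dir}/{project}"
    if include_feature_store == "yes":
        # Feature Store selected: drop the plain Delta training notebook.
        return [f"{base}/training/Train.py"]
    # Feature Store not selected: drop all Feature Store code and resources.
    return [
        f"{base}/feature_engineering",
        f"{base}/tests/feature_engineering",
        f"{base}/training/TrainWithFeatureStore.py",
        f"{base}/resources/feature-engineering-workflow-resource.yml",
    ]


if __name__ == "__main__":
    for choice in ("yes", "no"):
        print(choice, skipped_paths("my_root", "my_project", choice))
```

With MLflow Recipes gone, the three-way branch collapses to two cases, and none of the `recipe_paths` entries need to be skipped any more.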

41 changes: 1 addition & 40 deletions template/{{.input_root_dir}}/README.md.tmpl
@@ -37,7 +37,7 @@ contained in the following files:
│ │
│ ├── databricks.yml <- databricks.yml is the root bundle file for the ML project that can be loaded by databricks CLI bundles. It defines the bundle name, workspace URL and resource config component to be included.
│ │
{{- if and (eq .input_include_feature_store `no`) (eq .input_include_mlflow_recipes `no`) }}
{{- if (eq .input_include_feature_store `no`) }}
│ ├── training <- Training folder contains Notebook that trains and registers the model.
│ │
│ ├── validation <- Optional model validation step before deploying a model.
@@ -93,45 +93,6 @@ contained in the following files:
│ ├── ml-artifacts-resource.yml <- ML resource config definition for model and experiment
│ │
│ ├── monitoring-resource.yml <- ML resource config definition for quality monitoring workflow
{{- else }}
│ ├── training <- Folder for model development via MLflow recipes.
│ │ │
│ │ ├── steps <- MLflow recipe steps (Python modules) implementing ML pipeline logic, e.g. model training and evaluation. Most
│ │ │ development work happens here. See https://mlflow.org/docs/latest/recipes.html for details
│ │ │
│ │ ├── notebooks <- Databricks notebook that runs the MLflow recipe, i.e. run the logic in `steps`. Used to
│ │ │ drive code execution on Databricks for CI/CD. In most cases, you do not need to modify
│ │ │ the notebook.
│ │ │
│ │ ├── recipe.yaml <- The main recipe configuration file that declaratively defines the attributes and behavior
│ │ │ of each recipe step, such as the input dataset to use for training a model or the
│ │ │ performance criteria for promoting a model to production.
│ │ │
│ │ ├── profiles <- Environment-specific (e.g. dev vs test vs prod) configurations for MLflow recipes execution.
│ │
│ │
│ ├── validation <- Optional model validation step before deploying a model.
│ │
│ ├── monitoring <- Model monitoring, feature monitoring, etc.
│ │
│ ├── deployment <- Model deployment and endpoint deployment.
│ │ │
│ │ ├── batch_inference <- Batch inference code that will run as part of scheduled workflow.
│ │ │
│ │ ├── model_deployment <- As part of CD workflow, promote model to Production stage in model registry.
│ │
│ ├── tests <- Unit tests for the ML project, including modules under `steps`.
│ │
│ ├── resources <- ML resource (ML jobs, MLflow models) config definitions expressed as code, across dev/staging/prod/test.
│ │
│ ├── model-workflow-resource.yml <- ML resource config definition for model training, validation, deployment workflow
│ │
│ ├── batch-inference-workflow-resource.yml <- ML resource config definition for batch inference workflow
│ │
│ ├── ml-artifacts-resource.yml <- ML resource config definition for model and experiment
│ │
│ ├── monitoring-resource.yml <- ML resource config definition for quality monitoring workflow
{{- end }}
{{- end }}
{{- if or (eq .input_cicd_platform `github_actions`) (eq .input_cicd_platform `github_actions_for_github_enterprise_servers`) }}
1 change: 0 additions & 1 deletion template/{{.input_root_dir}}/_params_testing_only.txt.tmpl
@@ -8,7 +8,6 @@ input_default_branch={{.input_default_branch}}
input_release_branch={{.input_release_branch}}
input_read_user_group={{.input_read_user_group}}
input_include_feature_store={{.input_include_feature_store}}
input_include_mlflow_recipes={{.input_include_mlflow_recipes}}
input_include_models_in_unity_catalog={{.input_include_models_in_unity_catalog}}
input_schema_name={{.input_schema_name}}
input_unity_catalog_read_user_group={{.input_unity_catalog_read_user_group}}
13 changes: 0 additions & 13 deletions template/{{.input_root_dir}}/docs/mlops-setup.md.tmpl
@@ -6,8 +6,6 @@
* [Create a hosted Git repo](#create-a-hosted-git-repo)
* [Configure CI/CD]({{ if (eq .input_cicd_platform `github_actions`) }}#configure-cicd---github-actions{{ else if (eq .input_cicd_platform `azure_devops`) }}#configure-cicd---azure-devops{{ else if (eq .input_cicd_platform `gitlab`) }}#configure-cicd---gitlab{{ end }})
{{- if (eq .input_setup_cicd_and_project `CICD_and_Project`)}}
{{- if (eq .input_include_mlflow_recipes `yes`) }}
* [Configure profiles for tests, staging, and prod](#configure-profiles-for-tests-staging-and-prod){{ end }}
* [Merge PR with initial ML code](#merge-a-pr-with-your-initial-ml-code)
{{- end }}
{{ if not (eq .input_release_branch .input_default_branch) -}}
@@ -355,17 +353,6 @@ add the value `.gitlab/pipelines/{{.input_project_name}}-triggers-cicd.yml` whic
{{ end }}

{{- if (eq .input_setup_cicd_and_project `CICD_and_Project`)}}
{{- if (eq .input_include_mlflow_recipes `yes`) }}
## Configure profiles for tests, staging, and prod
Address the TODOs in the following files:
* [databricks-dev.yaml](../{{template `project_name_alphanumeric_underscore` .}}/training/profiles/databricks-dev.yaml): specify recipe configs to use in dev workspace
* [databricks-staging.yaml](../{{template `project_name_alphanumeric_underscore` .}}/training/profiles/databricks-staging.yaml): specify recipe configs to use in recurring model training and batch inference
jobs that run in the staging workspace
* [databricks-prod.yaml](../{{template `project_name_alphanumeric_underscore` .}}/training/profiles/databricks-prod.yaml) specify recipe configs to use in recurring model training and batch inference
jobs that run in the prod workspace
* [databricks-test.yaml](../{{template `project_name_alphanumeric_underscore` .}}/training/profiles/databricks-test.yaml): specify recipe configs to use in integration tests(CI)
{{- end }}

## Merge a PR with your initial ML code
Create and push a PR branch adding the ML code to the repository.

@@ -37,7 +37,7 @@ contained in the following files:
│ │
│ ├── databricks.yml <- databricks.yml is the root bundle file for the ML project that can be loaded by databricks CLI bundles. It defines the bundle name, workspace URL and resource config component to be included.
│ │
{{- if and (eq .input_include_feature_store `no`) (eq .input_include_mlflow_recipes `no`) }}
{{- if (eq .input_include_feature_store `no`) }}
│ ├── training <- Training folder contains Notebook that trains and registers the model.
│ │
│ ├── validation <- Optional model validation step before deploying a model.
@@ -93,44 +93,6 @@ contained in the following files:
│ ├── ml-artifacts-resource.yml <- ML resource config definition for model and experiment
│ │
│ ├── monitoring-resource.yml <- ML resource config definition for quality monitoring workflow
{{- else }}
│ ├── training <- Folder for model development via MLflow recipes.
│ │ │
│ │ ├── steps <- MLflow recipe steps (Python modules) implementing ML pipeline logic, e.g. model training and evaluation. Most
│ │ │ development work happens here. See https://mlflow.org/docs/latest/recipes.html for details
│ │ │
│ │ ├── notebooks <- Databricks notebook that runs the MLflow recipe, i.e. run the logic in `steps`. Used to
│ │ │ drive code execution on Databricks for CI/CD. In most cases, you do not need to modify
│ │ │ the notebook.
│ │ │
│ │ ├── recipe.yaml <- The main recipe configuration file that declaratively defines the attributes and behavior
│ │ │ of each recipe step, such as the input dataset to use for training a model or the
│ │ │ performance criteria for promoting a model to production.
│ │ │
│ │ ├── profiles <- Environment-specific (e.g. dev vs test vs prod) configurations for MLflow recipes execution.
│ │
│ │
│ ├── validation <- Optional model validation step before deploying a model.
│ │
│ ├── monitoring <- Model monitoring, feature monitoring, etc.
│ │
│ ├── deployment <- Model deployment and endpoint deployment.
│ │ │
│ │ ├── batch_inference <- Batch inference code that will run as part of scheduled workflow.
│ │ │
│ │ ├── model_deployment <- As part of CD workflow, promote model to Production stage in model registry.
│ │
│ ├── tests <- Unit tests for the ML project, including modules under `steps`.
│ │
│ ├── resources <- ML resource (ML jobs, MLflow models) config definitions expressed as code, across dev/staging/prod/test.
│ │
│ ├── model-workflow-resource.yml <- ML resource config definition for model training, validation, deployment workflow
│ │
│ ├── batch-inference-workflow-resource.yml <- ML resource config definition for batch inference workflow
│ │
│ ├── ml-artifacts-resource.yml <- ML resource config definition for model and experiment
│ │
│ ├── monitoring-resource.yml <- ML resource config definition for quality monitoring workflow
{{- end }}
```

@@ -146,7 +108,7 @@ In each module, there is `compute_features_fn` method that you need to implement
The output dataframe will be persisted in a [time-series Feature Store table]({{ template `generate_doc_link` (map (pair "cloud" .input_cloud) (pair "path" "machine-learning/feature-store/time-series.html")) }}).
See the example modules' documentation for more information.
* Python unit tests for feature computation modules in `tests/feature_engineering` folder.
* Feature engineering notebook, `feature_engineering/notebooks/GenerateAndWriteFeatures.py`, that reads input dataframes, dynamically loads feature computation modules, executes their `compute_features_fn` method and writes the outputs to a Feature Store table (creating it if missing).
* Feature engineering notebook, `feature_engineering/GenerateAndWriteFeatures.py`, that reads input dataframes, dynamically loads feature computation modules, executes their `compute_features_fn` method and writes the outputs to a Feature Store table (creating it if missing).
* Training notebook that [trains]({{ template `generate_doc_link` (map (pair "cloud" .input_cloud) (pair "path" "machine-learning/feature-store/train-models-with-feature-store.html")) }} ) a regression model by creating a training dataset using the Feature Store client.
* Model deployment and batch inference notebooks that deploy and use the trained model.
* An automated integration test is provided (in `.github/workflows/{{ .input_project_name }}-run-tests.yml`) that executes a multi task run on Databricks involving the feature engineering and model training notebooks.
@@ -200,7 +162,7 @@ Otherwise, e.g. if iterating on ML code for a new project, follow the steps belo
You can iterate on the feature transform modules locally in your favorite IDE before running them on Databricks.

#### Running code on Databricks
You can iterate on ML code by running the provided `feature_engineering/notebooks/GenerateAndWriteFeatures.py` notebook on Databricks using
You can iterate on ML code by running the provided `feature_engineering/GenerateAndWriteFeatures.py` notebook on Databricks using
[Repos]({{ template `generate_doc_link` (map (pair "cloud" .input_cloud) (pair "path" "repos/index.html")) }}). This notebook drives execution of
the feature transforms code defined under ``features``. You can use multiple browser tabs to edit
logic in `features` and run the feature engineering pipeline in the `GenerateAndWriteFeatures.py` notebook.