From 70ce6dddc5067708cfc6904643fe4dbcfcf51d7f Mon Sep 17 00:00:00 2001 From: James Anstey Date: Mon, 23 Feb 2026 20:27:29 +0000 Subject: [PATCH 01/12] example opportunity yaml format --- scripts/unharmonised/opp_test.yaml | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 scripts/unharmonised/opp_test.yaml diff --git a/scripts/unharmonised/opp_test.yaml b/scripts/unharmonised/opp_test.yaml new file mode 100644 index 0000000..acf2f53 --- /dev/null +++ b/scripts/unharmonised/opp_test.yaml @@ -0,0 +1,20 @@ +title: Baseline Climate Variables for Earth System Modelling + +description: The Baseline Climate Variables for Earth System Modelling (ESM-BCVs) for intercomparison and evaluation are defined as a list of 132 variables which have high utility for the evaluation and exploitation of climate simulations. The list reflects the most heavily used elements of the CMIP6 archive. Successive phases of CMIP have achieved strong results in science and substantial influence in international climate policy formulation. This paper responds both to interest in exploiting CMIP data standards in a broader range of climate modelling activities and a need to achieve greater clarity about the significance and intention of variables in the CMIP Data Request. As archives of Earth System Model (ESM) outputs grow in scale and complexity there are emerging problems associated with weak standardisation at the variable collection level. That is, there are good standards covering how specific variables should be archived, but this paper fills a gap in the standardisation of which variables should be archived. The ESM-BCV list is intended as a resource for ESM-MIPs developing requests to enable greater consistency among MIPs, and as a reference for modelling centres to enhance consistency within MIPs. Provisional planning for the CMIP7 Data Request exploits the ESM-BCVs as a core element. 
The baseline variables list includes 103 variables which have modest or minor data volume footprints and could be generated systematically when simulations are produced and archived for exploitation by the WCRP community. A further 34 variables are classed as high volume and are only suitable for production when the resource implications are justified. + +justification_of_resources: The baseline variables will support a wide range of applications and the production of all the variables on the list should not impose excessive burdens on modelling centers and infrastructure providers. + +expected_impacts: Greater consistency in provision of high-impact data variables. + +variable_groups: +- baseline_monthly +- baseline_daily +- baseline_fixed +- baseline_subdaily + +experiment_groups: +- deck +- fast-track +- scenarios +- historical +- all-non-fasttrack From 834b0ac0dfc44468803eb5abdcca2dcc4e141f3a Mon Sep 17 00:00:00 2001 From: James Anstey Date: Wed, 15 Apr 2026 05:45:06 +0000 Subject: [PATCH 02/12] removed first test yaml file --- scripts/unharmonised/opp_test.yaml | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/scripts/unharmonised/opp_test.yaml b/scripts/unharmonised/opp_test.yaml index acf2f53..94a11a2 100644 --- a/scripts/unharmonised/opp_test.yaml +++ b/scripts/unharmonised/opp_test.yaml @@ -1,18 +1,18 @@ -title: Baseline Climate Variables for Earth System Modelling +Title: Baseline Climate Variables for Earth System Modelling -description: The Baseline Climate Variables for Earth System Modelling (ESM-BCVs) for intercomparison and evaluation are defined as a list of 132 variables which have high utility for the evaluation and exploitation of climate simulations. The list reflects the most heavily used elements of the CMIP6 archive. Successive phases of CMIP have achieved strong results in science and substantial influence in international climate policy formulation. 
This paper responds both to interest in exploiting CMIP data standards in a broader range of climate modelling activities and a need to achieve greater clarity about the significance and intention of variables in the CMIP Data Request. As archives of Earth System Model (ESM) outputs grow in scale and complexity there are emerging problems associated with weak standardisation at the variable collection level. That is, there are good standards covering how specific variables should be archived, but this paper fills a gap in the standardisation of which variables should be archived. The ESM-BCV list is intended as a resource for ESM-MIPs developing requests to enable greater consistency among MIPs, and as a reference for modelling centres to enhance consistency within MIPs. Provisional planning for the CMIP7 Data Request exploits the ESM-BCVs as a core element. The baseline variables list includes 103 variables which have modest or minor data volume footprints and could be generated systematically when simulations are produced and archived for exploitation by the WCRP community. A further 34 variables are classed as high volume and are only suitable for production when the resource implications are justified. +Description: The Baseline Climate Variables for Earth System Modelling (ESM-BCVs) for intercomparison and evaluation are defined as a list of 132 variables which have high utility for the evaluation and exploitation of climate simulations. The list reflects the most heavily used elements of the CMIP6 archive. Successive phases of CMIP have achieved strong results in science and substantial influence in international climate policy formulation. This paper responds both to interest in exploiting CMIP data standards in a broader range of climate modelling activities and a need to achieve greater clarity about the significance and intention of variables in the CMIP Data Request. 
As archives of Earth System Model (ESM) outputs grow in scale and complexity there are emerging problems associated with weak standardisation at the variable collection level. That is, there are good standards covering how specific variables should be archived, but this paper fills a gap in the standardisation of which variables should be archived. The ESM-BCV list is intended as a resource for ESM-MIPs developing requests to enable greater consistency among MIPs, and as a reference for modelling centres to enhance consistency within MIPs. Provisional planning for the CMIP7 Data Request exploits the ESM-BCVs as a core element. The baseline variables list includes 103 variables which have modest or minor data volume footprints and could be generated systematically when simulations are produced and archived for exploitation by the WCRP community. A further 34 variables are classed as high volume and are only suitable for production when the resource implications are justified. -justification_of_resources: The baseline variables will support a wide range of applications and the production of all the variables on the list should not impose excessive burdens on modelling centers and infrastructure providers. +Justification of Resources: The baseline variables will support a wide range of applications and the production of all the variables on the list should not impose excessive burdens on modelling centers and infrastructure providers. -expected_impacts: Greater consistency in provision of high-impact data variables. +Expected Impacts: Greater consistency in provision of high-impact data variables. 
-variable_groups: +Variable Groups: - baseline_monthly - baseline_daily - baseline_fixed - baseline_subdaily -experiment_groups: +Experiment Groups: - deck - fast-track - scenarios From d2f8d60ab910ab11ada2c310a9d8f7c516e4b38b Mon Sep 17 00:00:00 2001 From: James Anstey Date: Wed, 15 Apr 2026 05:46:34 +0000 Subject: [PATCH 03/12] new test opp yaml --- scripts/unharmonised/opp_test.yaml | 61 +++++++++++++++++++++++++----- 1 file changed, 51 insertions(+), 10 deletions(-) diff --git a/scripts/unharmonised/opp_test.yaml b/scripts/unharmonised/opp_test.yaml index 94a11a2..5d65db8 100644 --- a/scripts/unharmonised/opp_test.yaml +++ b/scripts/unharmonised/opp_test.yaml @@ -1,20 +1,61 @@ -Title: Baseline Climate Variables for Earth System Modelling +# Sketch of yaml file to define an Opportunity in the unharmonised DR +# +# - Minimal definition of Opportunity (one yaml file = one Opportunity) +# - Variable Groups can be existing (just give name) or new (define it here) +# - Ditto for Experiment Groups +# +# Use of existing experiments (in CVs) and variables (in CMIP7 DR) +# Defining new experiments done in CVs +# Defining new variables needs new process (variable register governance?) +# +# Ingestion script processes the yaml file: +# - validate experiment names against CVs (use esgvoc) +# - validate variables against DR +# - supplement with derived info (e.g. no. of variables, Opportunity volume, ...) +# - saves the Opportunity in equivalent json format to Harmonised DR Opportunities +# Same format ==> can be treated like any Harmonised Opportunities using existing DR python tools -Description: The Baseline Climate Variables for Earth System Modelling (ESM-BCVs) for intercomparison and evaluation are defined as a list of 132 variables which have high utility for the evaluation and exploitation of climate simulations. The list reflects the most heavily used elements of the CMIP6 archive. 
Successive phases of CMIP have achieved strong results in science and substantial influence in international climate policy formulation. This paper responds both to interest in exploiting CMIP data standards in a broader range of climate modelling activities and a need to achieve greater clarity about the significance and intention of variables in the CMIP Data Request. As archives of Earth System Model (ESM) outputs grow in scale and complexity there are emerging problems associated with weak standardisation at the variable collection level. That is, there are good standards covering how specific variables should be archived, but this paper fills a gap in the standardisation of which variables should be archived. The ESM-BCV list is intended as a resource for ESM-MIPs developing requests to enable greater consistency among MIPs, and as a reference for modelling centres to enhance consistency within MIPs. Provisional planning for the CMIP7 Data Request exploits the ESM-BCVs as a core element. The baseline variables list includes 103 variables which have modest or minor data volume footprints and could be generated systematically when simulations are produced and archived for exploitation by the WCRP community. A further 34 variables are classed as high volume and are only suitable for production when the resource implications are justified. -Justification of Resources: The baseline variables will support a wide range of applications and the production of all the variables on the list should not impose excessive burdens on modelling centers and infrastructure providers. +Title: Example unharmonised DR opportunity -Expected Impacts: Greater consistency in provision of high-impact data variables. +MIP: More appropriate for unharmonised? (Instead of MIPs high/low priority entries as in AFT DR) + +Description: Description of the DR Opportunity. + +Justification of Resources: Optional for non-AFT? + +Expected Impacts: Optional for non-AFT? 
+ +Time Subset: Variable Groups: - baseline_monthly - baseline_daily -- baseline_fixed -- baseline_subdaily +- new_var_group1: + Title: First new variable group + Justification: Absolutely essential variables + Priority Level: High + Priority Level: Medium + Region: + Notes: + Variables: + - aerosol.abs550aer.tavg-u-hxy-u.mon.glb + - aerosol.abs550bc.tavg-u-hxy-u.mon.glb + +- new_var_group2: + Title: Second new variable group + Justification: Nice-to-have variables + Priority Level: Medium + Region: + Notes: + Variables: + - seaIce.ts.tavg-u-hxy-si.mon.glb + - not.really.a.variable Experiment Groups: - deck -- fast-track -- scenarios -- historical -- all-non-fasttrack +- new_expt_group2: + Title: NewMIP experiments + Experiments: + - new-mip-expt1 + - new-mip-expt2 From dfe57a3442e7c626aa7a8e04141cc8ebca2b30de Mon Sep 17 00:00:00 2001 From: James Anstey Date: Wed, 15 Apr 2026 05:46:45 +0000 Subject: [PATCH 04/12] add ingest script --- scripts/unharmonised/ingest.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100755 scripts/unharmonised/ingest.py diff --git a/scripts/unharmonised/ingest.py b/scripts/unharmonised/ingest.py new file mode 100755 index 0000000..f26e705 --- /dev/null +++ b/scripts/unharmonised/ingest.py @@ -0,0 +1,14 @@ +#!/usr/bin/env python + +import pprint +import os +import yaml + + +filepath = 'opp_test.yaml' +with open(filepath, 'r') as f: + opp = yaml.safe_load(f) + +pprint.pprint(opp, width=120) + + From ea8fd6005573134d84b129707ad480e643de9627 Mon Sep 17 00:00:00 2001 From: James Anstey Date: Thu, 16 Apr 2026 05:36:52 +0000 Subject: [PATCH 05/12] template opportunity yaml file and corresponding ingestion script for unharmonised DR --- .../unharmonised/DR_Opportunity_template.yaml | 43 +++++++ scripts/unharmonised/ingest.py | 116 +++++++++++++++++- 2 files changed, 153 insertions(+), 6 deletions(-) create mode 100644 scripts/unharmonised/DR_Opportunity_template.yaml diff --git 
a/scripts/unharmonised/DR_Opportunity_template.yaml b/scripts/unharmonised/DR_Opportunity_template.yaml new file mode 100644 index 0000000..cffefe6 --- /dev/null +++ b/scripts/unharmonised/DR_Opportunity_template.yaml @@ -0,0 +1,43 @@ +# Minimal template for a data request Opportunity, for use by community MIPs. + +Title: Short descriptive title of the Opportunity + +MIP: Name of MIP + +Description: Statement of the general purpose of this data request. + +Expected Impacts: (Optional) Explanation of why this combination of variables and experiments is important. + +Justification of Resources: (Optional) Explanation of how the requested variables map onto the impacts, and estimate of the resources required. + +Experiment Groups: +# An Experiment Group specifies a list of experiments for which requested variables should be produced. +# A community MIP may need only one experiment group, but existing groups can also be listed, or existing +# experiments included in a new group (but note that runs of AFT experiments may already have been completed +# by modelling centres.) +- example_experiment_group # new Experiment Group, defined below + +Variable Groups: +# Each Variable Group defines a set of requested variables and its priority. +# The list of Variable Groups can include both new and existing groups. +- baseline_monthly # existing Variable Group +- example_variable_group # new Variable Group, defined below + +New Experiment Groups: + + example_experiment_group: + Title: Short descriptive title of the Experiment Group + Experiments: + - amip # existing experiment + - example_new_experiment # new experiment, must be registered in CVs + +New Variable Groups: + + example_variable_group: + Title: Short descriptive title of the Variable Group + Priority Level: High # High, Medium, or Low (not case sensitive) + Justification: (Optional) Explanation of why these variables are important. + Notes: (Optional) Any additional comments about the variable group. 
+ Variables: # list of requested variable names + - land.gpp.tavg-u-hxy-lnd.mon.glb + - atmos.fco2nat.tavg-u-hxy-u.mon.glb diff --git a/scripts/unharmonised/ingest.py b/scripts/unharmonised/ingest.py index f26e705..9b0bc07 100755 --- a/scripts/unharmonised/ingest.py +++ b/scripts/unharmonised/ingest.py @@ -1,14 +1,118 @@ #!/usr/bin/env python +''' +Ingest a yaml file that specifies a data request Opportunity +''' -import pprint -import os +# import os +import argparse +import json import yaml +from collections import OrderedDict +from pydantic import BaseModel -filepath = 'opp_test.yaml' -with open(filepath, 'r') as f: - opp = yaml.safe_load(f) +import data_request_api.content.dreq_content as dc +import data_request_api.query.dreq_query as dq +from data_request_api.query.dreq_classes import ( + PRIORITY_LEVELS, format_attribute_name) -pprint.pprint(opp, width=120) +class ExperimentGroup(BaseModel): + title: str + experiments: list[str] +class VariableGroup(BaseModel): + title: str + priority_level: str + justification: str = '' + notes: str = '' + variables: list[str] + +class Opportunity(BaseModel): + title: str + mip: str + description: str + expected_impacts: str = '' + justification_of_resources: str = '' + experiment_groups: list[str] + variable_groups: list[str] + + +def parse_args(): + ''' Parse command line arguments''' + parser = argparse.ArgumentParser(description="Validate data request Opportunity specified by input yaml file") + + # Mandatory arguments + parser.add_argument('input', + help="Opportunity specifications (yaml file)") + parser.add_argument('output', + help="Validated Opportunity specifications (json file)") + parser.add_argument('dreq_version', choices=dc.get_versions(), + help="Data Request version used to validate input") + + return parser.parse_args() + + +if __name__ == '__main__': + + args = parse_args() + filepath = args.input + + # Read setup file for new Opportunity + # filepath = 'DR_Opportunity_template.yaml' + with open(filepath, 
'r') as f: + opp = yaml.safe_load(f) + + # Validate any new variable or experiment groups + sections = ['New Experiment Groups', 'New Variable Groups'] + for section in sections: + for name,info in opp[section].items(): + opp[section][name] = {format_attribute_name(k):v for k,v in info.items()} + match section: + case 'New Experiment Groups': + expt_groups = {name: ExperimentGroup(**info) for name,info in opp[section].items()} + case 'New Variable Groups': + variable_groups = {name: VariableGroup(**info) for name,info in opp[section].items()} + case _: + raise ValueError('Invalid section: ' + section) + opp.pop(section) + + # Check priority levels in new Variable Groups are valid + for vg_name, vg in variable_groups.items(): + if vg.priority_level.lower() not in PRIORITY_LEVELS: + raise ValueError(f'Unknown Priority Level for Variable Group {vg_name}: {vg.priority_level}') + + # Check variable names in new Variable Groups are valid + content = dc.load(args.dreq_version) + all_var_info = dq.get_variables_metadata(content, args.dreq_version) + cmip7_compound_names = set([var_info['cmip7_compound_name'] for var_info in all_var_info.values()]) + cmip6_compound_names = set([var_info['cmip6_compound_name'] for var_info in all_var_info.values()]) + # assert len(cmip7_compound_names) == len(cmip6_compound_names) + for vg_name, vg in variable_groups.items(): + invalid_variables = [] + for var_name in vg.variables: + # TODO: should user be forced to say whether using CMIP6 or CMIP7 variable names? 
+ if not (var_name in cmip7_compound_names or var_name in cmip6_compound_names): + invalid_variables.append(var_name) + if len(invalid_variables) > 0: + msg = f'Found {len(invalid_variables)} invalid variables in Variable Group {vg_name}:\n' \ + + '\n'.join(invalid_variables) + raise ValueError(msg) + + # Validate Opportunity + opp = {format_attribute_name(k):v for k,v in opp.items()} + opp = Opportunity(**opp) + + # Write output file + out = OrderedDict({ + 'Header': OrderedDict({ + 'Provenance': f'Validated Opportunity from input file {args.input}', + 'Data Request version used for validation': args.dreq_version, + }), + 'Opportunity' : OrderedDict(opp), + 'New Experiment Groups': OrderedDict({name: OrderedDict(info) for name,info in expt_groups.items()}), + 'New Variable Groups': OrderedDict({name: OrderedDict(info) for name,info in variable_groups.items()}) + }) + with open(args.output, 'w') as f: + json.dump(out, f, indent=4) + print('Wrote ' + args.output) From 0524a82926059d48eaaaac64b31bb76bda361528 Mon Sep 17 00:00:00 2001 From: James Anstey Date: Thu, 16 Apr 2026 05:44:04 +0000 Subject: [PATCH 06/12] tidying --- scripts/unharmonised/ingest.py | 4 ++ scripts/unharmonised/opp_test.yaml | 61 ------------------------------ 2 files changed, 4 insertions(+), 61 deletions(-) delete mode 100644 scripts/unharmonised/opp_test.yaml diff --git a/scripts/unharmonised/ingest.py b/scripts/unharmonised/ingest.py index 9b0bc07..236b66c 100755 --- a/scripts/unharmonised/ingest.py +++ b/scripts/unharmonised/ingest.py @@ -99,6 +99,10 @@ def parse_args(): + '\n'.join(invalid_variables) raise ValueError(msg) + # Validate experiments against CVs + # TODO: get valid CMIP7 experiments using esgvoc + # (cannot rely on AFT DR list since community MIPs will define new experiments) + # Validate Opportunity opp = {format_attribute_name(k):v for k,v in opp.items()} opp = Opportunity(**opp) diff --git a/scripts/unharmonised/opp_test.yaml b/scripts/unharmonised/opp_test.yaml 
deleted file mode 100644 index 5d65db8..0000000 --- a/scripts/unharmonised/opp_test.yaml +++ /dev/null @@ -1,61 +0,0 @@ -# Sketch of yaml file to define an Opportunity in the unharmonised DR -# -# - Minimal definition of Opportunity (one yaml file = one Opportunity) -# - Variable Groups can be existing (just give name) or new (define it here) -# - Ditto for Experiment Groups -# -# Use of existing experiments (in CVs) and variables (in CMIP7 DR) -# Defining new experiments done in CVs -# Defining new variables needs new process (variable register governance?) -# -# Ingestion script processes the yaml file: -# - validate experiment names against CVs (use esgvoc) -# - validate variables against DR -# - supplement with derived info (e.g. no. of variables, Opportunity volume, ...) -# - saves the Opportunity in equivalent json format to Harmonised DR Opportunities -# Same format ==> can be treated like any Harmonised Opportunities using existing DR python tools - - -Title: Example unharmonised DR opportunity - -MIP: More appropriate for unharmonised? (Instead of MIPs high/low priority entries as in AFT DR) - -Description: Description of the DR Opportunity. - -Justification of Resources: Optional for non-AFT? - -Expected Impacts: Optional for non-AFT? 
- -Time Subset: - -Variable Groups: -- baseline_monthly -- baseline_daily -- new_var_group1: - Title: First new variable group - Justification: Absolutely essential variables - Priority Level: High - Priority Level: Medium - Region: - Notes: - Variables: - - aerosol.abs550aer.tavg-u-hxy-u.mon.glb - - aerosol.abs550bc.tavg-u-hxy-u.mon.glb - -- new_var_group2: - Title: Second new variable group - Justification: Nice-to-have variables - Priority Level: Medium - Region: - Notes: - Variables: - - seaIce.ts.tavg-u-hxy-si.mon.glb - - not.really.a.variable - -Experiment Groups: -- deck -- new_expt_group2: - Title: NewMIP experiments - Experiments: - - new-mip-expt1 - - new-mip-expt2 From 5cc8c47bcc040a9f868c1c947b09316c9966d6a3 Mon Sep 17 00:00:00 2001 From: James Anstey Date: Thu, 16 Apr 2026 05:46:24 +0000 Subject: [PATCH 07/12] added example output json file from ingestion script --- .../example_validated_opportunity.json | 41 +++++++++++++++++++ 1 file changed, 41 insertions(+) create mode 100644 scripts/unharmonised/example_validated_opportunity.json diff --git a/scripts/unharmonised/example_validated_opportunity.json b/scripts/unharmonised/example_validated_opportunity.json new file mode 100644 index 0000000..a104b11 --- /dev/null +++ b/scripts/unharmonised/example_validated_opportunity.json @@ -0,0 +1,41 @@ +{ + "Header": { + "Provenance": "Validated Opportunity from input file DR_Opportunity_template.yaml", + "Data Request version used for validation": "v1.2.2.3" + }, + "Opportunity": { + "title": "Short descriptive title of the Opportunity", + "mip": "Name of MIP", + "description": "Statement of the general purpose of this data request.", + "expected_impacts": "(Optional) Explanation of why this combination of variables and experiments is important.", + "justification_of_resources": "(Optional) Explanation of how the requested variables map onto the impacts, and estimate of the resources required.", + "experiment_groups": [ + "example_experiment_group" + ], + 
"variable_groups": [ + "baseline_monthly", + "example_variable_group" + ] + }, + "New Experiment Groups": { + "example_experiment_group": { + "title": "Short descriptive title of the Experiment Group", + "experiments": [ + "amip", + "example_new_experiment" + ] + } + }, + "New Variable Groups": { + "example_variable_group": { + "title": "Short descriptive title of the Variable Group", + "priority_level": "High", + "justification": "(Optional) Explanation of why these variables are important.", + "notes": "(Optional) Any additional comments about the variable group.", + "variables": [ + "land.gpp.tavg-u-hxy-lnd.mon.glb", + "atmos.fco2nat.tavg-u-hxy-u.mon.glb" + ] + } + } +} \ No newline at end of file From bbac18133c2742a762946f265d488604402e0e07 Mon Sep 17 00:00:00 2001 From: James Anstey Date: Thu, 16 Apr 2026 05:47:21 +0000 Subject: [PATCH 08/12] updated requirements.txt --- requirements.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/requirements.txt b/requirements.txt index 1f8a8f9..b54a747 100644 --- a/requirements.txt +++ b/requirements.txt @@ -2,6 +2,7 @@ bs4 coverage openpyxl pooch +pydantic pytest pyyaml requests From 38e53652d953a4e640d06a1aac50d4476dd9caac Mon Sep 17 00:00:00 2001 From: James Anstey Date: Thu, 16 Apr 2026 19:37:09 +0000 Subject: [PATCH 09/12] added further validation against DR content --- scripts/unharmonised/ingest.py | 63 ++++++++++++++++++++++++---------- 1 file changed, 44 insertions(+), 19 deletions(-) diff --git a/scripts/unharmonised/ingest.py b/scripts/unharmonised/ingest.py index 236b66c..9538360 100755 --- a/scripts/unharmonised/ingest.py +++ b/scripts/unharmonised/ingest.py @@ -3,7 +3,6 @@ Ingest a yaml file that specifies a data request Opportunity ''' -# import os import argparse import json import yaml @@ -56,11 +55,12 @@ def parse_args(): if __name__ == '__main__': args = parse_args() - filepath = args.input + input_file = args.input + output_file = args.output + dreq_version = args.dreq_version # Read setup file 
for new Opportunity - # filepath = 'DR_Opportunity_template.yaml' - with open(filepath, 'r') as f: + with open(input_file, 'r') as f: opp = yaml.safe_load(f) # Validate any new variable or experiment groups @@ -70,28 +70,38 @@ def parse_args(): opp[section][name] = {format_attribute_name(k):v for k,v in info.items()} match section: case 'New Experiment Groups': - expt_groups = {name: ExperimentGroup(**info) for name,info in opp[section].items()} + new_expt_groups = {name: ExperimentGroup(**info) for name,info in opp[section].items()} case 'New Variable Groups': - variable_groups = {name: VariableGroup(**info) for name,info in opp[section].items()} + new_var_groups = {name: VariableGroup(**info) for name,info in opp[section].items()} case _: raise ValueError('Invalid section: ' + section) opp.pop(section) # Check priority levels in new Variable Groups are valid - for vg_name, vg in variable_groups.items(): + for vg_name, vg in new_var_groups.items(): if vg.priority_level.lower() not in PRIORITY_LEVELS: raise ValueError(f'Unknown Priority Level for Variable Group {vg_name}: {vg.priority_level}') + # Get DR content to use in further validating the input + dreq_content = dc.load(dreq_version) + base = dq._get_base_dreq_tables(dreq_content, dreq_version, purpose='request') + dreq_var_info = dq.get_variables_metadata(base, dreq_version) + cmip7_compound_names = set([var_info['cmip7_compound_name'] for var_info in dreq_var_info.values()]) + cmip6_compound_names = set([var_info['cmip6_compound_name'] for var_info in dreq_var_info.values()]) + dreq_expt_group_names = set(rec.name for rec in base['Experiment Group'].records.values()) + dreq_var_group_names = set(rec.name for rec in base['Variable Group'].records.values()) + + # Check new Variable Group names don't conflict with any already in the DR + for vg_name in new_var_groups: + if vg_name in dreq_var_group_names: + raise ValueError(f'Variable Group already exists in DR {dreq_version}: {vg_name}') + # Check variable 
names in new Variable Groups are valid - content = dc.load(args.dreq_version) - all_var_info = dq.get_variables_metadata(content, args.dreq_version) - cmip7_compound_names = set([var_info['cmip7_compound_name'] for var_info in all_var_info.values()]) - cmip6_compound_names = set([var_info['cmip6_compound_name'] for var_info in all_var_info.values()]) - # assert len(cmip7_compound_names) == len(cmip6_compound_names) - for vg_name, vg in variable_groups.items(): + for vg_name, vg in new_var_groups.items(): invalid_variables = [] for var_name in vg.variables: # TODO: should user be forced to say whether using CMIP6 or CMIP7 variable names? + # TODO: if new variables are defined (beyond those in AFT DR) then need to add these here as valid names if not (var_name in cmip7_compound_names or var_name in cmip6_compound_names): invalid_variables.append(var_name) if len(invalid_variables) > 0: @@ -99,6 +109,11 @@ def parse_args(): + '\n'.join(invalid_variables) raise ValueError(msg) + # Check new Experiment Group names don't conflict with any already in the DR + for eg_name in new_expt_groups: + if eg_name in dreq_expt_group_names: + raise ValueError(f'Experiment Group already exists in DR {dreq_version}: {eg_name}') + # Validate experiments against CVs # TODO: get valid CMIP7 experiments using esgvoc # (cannot rely on AFT DR list since community MIPs will define new experiments) @@ -107,16 +122,26 @@ def parse_args(): opp = {format_attribute_name(k):v for k,v in opp.items()} opp = Opportunity(**opp) + # Check full Variable Group and Experiment Group lists either defined as new or existing in the DR + all_expt_group_names = dreq_expt_group_names.union(new_expt_groups.keys()) + all_var_group_names = dreq_var_group_names.union(new_var_groups.keys()) + for eg_name in opp.experiment_groups: + if eg_name not in all_expt_group_names: + raise ValueError(f'Experiment Group {eg_name} has not been newly defined and does not already exist in DR {dreq_version}') + for vg_name in 
opp.variable_groups: + if vg_name not in all_var_group_names: + raise ValueError(f'Variable Group {vg_name} has not been newly defined and does not already exist in DR {dreq_version}') + # Write output file out = OrderedDict({ 'Header': OrderedDict({ - 'Provenance': f'Validated Opportunity from input file {args.input}', - 'Data Request version used for validation': args.dreq_version, + 'Provenance': f'Validated Opportunity from input file {input_file}', + 'Data Request version used for validation': dreq_version, }), 'Opportunity' : OrderedDict(opp), - 'New Experiment Groups': OrderedDict({name: OrderedDict(info) for name,info in expt_groups.items()}), - 'New Variable Groups': OrderedDict({name: OrderedDict(info) for name,info in variable_groups.items()}) + 'New Experiment Groups': OrderedDict({name: OrderedDict(info) for name,info in new_expt_groups.items()}), + 'New Variable Groups': OrderedDict({name: OrderedDict(info) for name,info in new_var_groups.items()}) }) - with open(args.output, 'w') as f: + with open(output_file, 'w') as f: json.dump(out, f, indent=4) - print('Wrote ' + args.output) + print('Wrote ' + output_file) From 9cc3f30275af9f3ebfae50975e634974e6c61a14 Mon Sep 17 00:00:00 2001 From: James Anstey Date: Thu, 16 Apr 2026 19:50:13 +0000 Subject: [PATCH 10/12] updated yaml template and example output json --- scripts/unharmonised/DR_Opportunity_template.yaml | 9 +++------ scripts/unharmonised/example_validated_opportunity.json | 9 +++++---- 2 files changed, 8 insertions(+), 10 deletions(-) diff --git a/scripts/unharmonised/DR_Opportunity_template.yaml b/scripts/unharmonised/DR_Opportunity_template.yaml index cffefe6..db5364a 100644 --- a/scripts/unharmonised/DR_Opportunity_template.yaml +++ b/scripts/unharmonised/DR_Opportunity_template.yaml @@ -4,7 +4,7 @@ Title: Short descriptive title of the Opportunity MIP: Name of MIP -Description: Statement of the general purpose of this data request. 
+Description: Statement of the general purpose of this Opportunity's data request. Expected Impacts: (Optional) Explanation of why this combination of variables and experiments is important. @@ -12,16 +12,13 @@ Justification of Resources: (Optional) Explanation of how the requested variable Experiment Groups: # An Experiment Group specifies a list of experiments for which requested variables should be produced. -# A community MIP may need only one experiment group, but existing groups can also be listed, or existing -# experiments included in a new group (but note that runs of AFT experiments may already have been completed -# by modelling centres.) - example_experiment_group # new Experiment Group, defined below +- deck # existing Experiment Group Variable Groups: # Each Variable Group defines a set of requested variables and its priority. -# The list of Variable Groups can include both new and existing groups. -- baseline_monthly # existing Variable Group - example_variable_group # new Variable Group, defined below +- baseline_monthly # existing Variable Group New Experiment Groups: diff --git a/scripts/unharmonised/example_validated_opportunity.json b/scripts/unharmonised/example_validated_opportunity.json index a104b11..d612881 100644 --- a/scripts/unharmonised/example_validated_opportunity.json +++ b/scripts/unharmonised/example_validated_opportunity.json @@ -6,15 +6,16 @@ "Opportunity": { "title": "Short descriptive title of the Opportunity", "mip": "Name of MIP", - "description": "Statement of the general purpose of this data request.", + "description": "Statement of the general purpose of this Opportunity's data request.", "expected_impacts": "(Optional) Explanation of why this combination of variables and experiments is important.", "justification_of_resources": "(Optional) Explanation of how the requested variables map onto the impacts, and estimate of the resources required.", "experiment_groups": [ - "example_experiment_group" + 
"example_experiment_group", + "deck" ], "variable_groups": [ - "baseline_monthly", - "example_variable_group" + "example_variable_group", + "baseline_monthly" ] }, "New Experiment Groups": { From a87dc5ae79f7f70a67fa19e7ec8eea9bc6044b94 Mon Sep 17 00:00:00 2001 From: James Anstey Date: Wed, 13 May 2026 19:29:31 +0000 Subject: [PATCH 11/12] added initial README for unharmonised workflow --- .../README_unharmonised_workflow.md | 29 +++++++++++++++++++ scripts/unharmonised/ingest.py | 11 ++++--- 2 files changed, 36 insertions(+), 4 deletions(-) create mode 100644 scripts/unharmonised/README_unharmonised_workflow.md diff --git a/scripts/unharmonised/README_unharmonised_workflow.md b/scripts/unharmonised/README_unharmonised_workflow.md new file mode 100644 index 0000000..b361288 --- /dev/null +++ b/scripts/unharmonised/README_unharmonised_workflow.md @@ -0,0 +1,29 @@ + +## MIP workflow for Unharmonised Data Request + +⚠️ *Everything in this document is a proposal, under development, and likely to change* + +### Opportunity template + +This allows MIPs to create a `json` file representation of a DR "Opportunity" with minimal effort. + +A DR Opportunity lists variables that are requested from a specified set of experiments. +It includes a description of the scientific purpose of the request. +This can be very brief, but including detailed information is also possible. +A template Opportunity is provided in `yaml` format, which a MIP can edit. + +First, copy the template: +```bash +cp DR_Opportunity_template.yaml new_MIP_data_request.yaml +``` +Edit the new file, which in this example is named `new_MIP_data_request.yaml`, to specify the requested variables and experiments from which they're requested. +Variables are grouped into Variable Groups, which have a priority level (High, Medium, Low) attached. +Experiments are grouped into Experiment Groups. 
+If a MIP simply has one list of variables that are all requested from the same list of experiments, then one Variable Group and one Experiment Group are sufficient. + +Then validate the new request against existing DR content: +```bash +./ingest.py new_MIP_data_request.yaml new_MIP_data_request.json v1.2.2.3 +``` +This performs some sanity checks, including checking that variable and experiment names are valid (i.e., they are defined in existing DR content and CMIP7 CVs). +If the checks pass, the output file, which here is `new_MIP_data_request.json`, represents the new request's information in a format that can be used in the DR python API. diff --git a/scripts/unharmonised/ingest.py b/scripts/unharmonised/ingest.py index 9538360..5c4cb4f 100755 --- a/scripts/unharmonised/ingest.py +++ b/scripts/unharmonised/ingest.py @@ -63,7 +63,9 @@ def parse_args(): with open(input_file, 'r') as f: opp = yaml.safe_load(f) - # Validate any new variable or experiment groups + # Retrieve specs for any new variable or experiment groups so that they can be validated + # against existing DR content, below. + # The ExperimentGroup & VariableGroup pydantic models perform validation of the input. 
sections = ['New Experiment Groups', 'New Variable Groups'] for section in sections: for name,info in opp[section].items(): @@ -96,7 +98,7 @@ def parse_args(): if vg_name in dreq_var_group_names: raise ValueError(f'Variable Group already exists in DR {dreq_version}: {vg_name}') - # Check variable names in new Variable Groups are valid + # Check that the variable names in new Variable Groups are valid for vg_name, vg in new_var_groups.items(): invalid_variables = [] for var_name in vg.variables: @@ -118,11 +120,12 @@ def parse_args(): # TODO: get valid CMIP7 experiments using esgvoc # (cannot rely on AFT DR list since community MIPs will define new experiments) - # Validate Opportunity + # Use Opportunity pydantic model to validate the input opp = {format_attribute_name(k):v for k,v in opp.items()} opp = Opportunity(**opp) - # Check full Variable Group and Experiment Group lists either defined as new or existing in the DR + # Check full Variable Group and Experiment Group lists are either (1) defined as new, + # or (2) exist already in the DR. 
all_expt_group_names = dreq_expt_group_names.union(new_expt_groups.keys()) all_var_group_names = dreq_var_group_names.union(new_var_groups.keys()) for eg_name in opp.experiment_groups: From f2757c4a6d9b5c8a4aa5ac5c4e75bf84660c59e0 Mon Sep 17 00:00:00 2001 From: James Anstey Date: Wed, 13 May 2026 19:32:37 +0000 Subject: [PATCH 12/12] edited README --- scripts/unharmonised/README_unharmonised_workflow.md | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/unharmonised/README_unharmonised_workflow.md b/scripts/unharmonised/README_unharmonised_workflow.md index b361288..503de59 100644 --- a/scripts/unharmonised/README_unharmonised_workflow.md +++ b/scripts/unharmonised/README_unharmonised_workflow.md @@ -25,5 +25,6 @@ Then validate the new request against existing DR content: ```bash ./ingest.py new_MIP_data_request.yaml new_MIP_data_request.json v1.2.2.3 ``` +This should be run in an environment where the DR python API is installed ([see here](https://github.com/CMIP-Data-Request/CMIP7_DReq_Software#installation) for installation guidance). This performs some sanity checks, including checking that variable and experiment names are valid (i.e., they are defined in existing DR content and CMIP7 CVs). If the checks pass, the output file, which here is `new_MIP_data_request.json`, represents the new request's information in a format that can be used in the DR python API.
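Reviewer note: the "new or existing" group check that the README describes as a sanity check can be illustrated with a small standalone sketch. This is not the code in `ingest.py`. The helper name `check_group_names` and the sample group names passed to it are hypothetical, chosen only to mirror the error message used in the patch.

```python
def check_group_names(requested, existing, new, kind, dreq_version):
    """Raise ValueError if a requested group is neither newly defined nor in the DR.

    requested: group names listed in the Opportunity
    existing:  group names already present in the DR (assumed to be known)
    new:       mapping of newly defined group names to their specs
    """
    # A dict iterates over its keys, so union() picks up the new group names.
    known = set(existing).union(new)
    for name in requested:
        if name not in known:
            raise ValueError(
                f'{kind} {name} has not been newly defined '
                f'and does not already exist in DR {dreq_version}'
            )


# Hypothetical inputs, for illustration only.
existing_var_groups = {'baseline_monthly', 'baseline_daily'}
new_var_groups = {'example_variable_group': {}}

# Passes silently: one name is new, the other already exists.
check_group_names(
    ['example_variable_group', 'baseline_monthly'],
    existing_var_groups,
    new_var_groups,
    'Variable Group',
    'v1.2.2.3',
)
```

Collecting all unknown names before raising, rather than stopping at the first one, might be friendlier for MIPs editing a long template, but that is a design choice for the author.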