Datasets and Metadata#
The OneStopCoffea Analyzer separates dataset definitions and era-specific configurations from the core code. This allows for flexible management of different data-taking periods and MC campaigns.
The configuration is loaded from two locations:
Eras: analyzer_resources/eras/
Datasets: analyzer_resources/datasets/
Eras#
Era files (e.g., 2018.yaml) define global metadata and corrections for a specific data-taking period.
Key fields include:
name: The era name (e.g., ‘2018’).
lumi: Integrated luminosity in fb^-1.
trigger_names: Mapping of abstract trigger names (e.g., HT) to concrete HLT paths.
jet_corrections: Paths to JEC/JER files and definitions of systematics.
json_pog: Paths to JSON POG correction files.
See analyzer_resources/eras/ for examples.
Datasets#
Dataset files are organized by era (e.g., analyzer_resources/datasets/2018/). They group related samples (like different HT bins of QCD) into a single logical dataset.
Each file is a YAML list containing one or more dataset definitions.
Common fields:
name: Unique identifier for the dataset.
title: Human-readable title.
sample_type: Data or MC.
era: The era this dataset belongs to.
samples: A list of specific file sets (DAS paths or file lists).
Data Example#
- name: data_JetHT_2018
title: Data
sample_type: Data
era: '2018'
samples:
- name: Run2018A-UL2018_MiniAODv2_NanoAODv9-v2
n_events: 171484635
das_path: /JetHT/Run2018A-UL2018_MiniAODv2_NanoAODv9-v2/NANOAOD
- name: Run2018B-UL2018_MiniAODv2_NanoAODv9-v1
n_events: 78255208
das_path: /JetHT/Run2018B-UL2018_MiniAODv2_NanoAODv9-v1/NANOAOD
- name: Run2018C-UL2018_MiniAODv2_NanoAODv9-v1
n_events: 70027804
das_path: /JetHT/Run2018C-UL2018_MiniAODv2_NanoAODv9-v1/NANOAOD
- name: Run2018D-UL2018_MiniAODv2_NanoAODv9-v2
n_events: 356976276
das_path: /JetHT/Run2018D-UL2018_MiniAODv2_NanoAODv9-v2/NANOAOD
Background MC Example#
- name: wjets_to_qq_2018
title: WJetsQQ
sample_type: MC
era: '2018'
samples:
- name: WJetsToQQ_HT-200to400_TuneCP5_13TeV-madgraphMLM-pythia8
n_events: 14494966
das_path: /WJetsToQQ_HT-200to400_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/NANOAODSIM
x_sec: 2546000.0
- name: WJetsToQQ_HT-400to600_TuneCP5_13TeV-madgraphMLM-pythia8
n_events: 9335298
das_path: /WJetsToQQ_HT-400to600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/NANOAODSIM
x_sec: 276500.0
- name: WJetsToQQ_HT-600to800_TuneCP5_13TeV-madgraphMLM-pythia8
n_events: 13633226
das_path: /WJetsToQQ_HT-600to800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/NANOAODSIM
x_sec: 59250.0
- name: WJetsToQQ_HT-800toInf_TuneCP5_13TeV-madgraphMLM-pythia8
n_events: 13581343
das_path: /WJetsToQQ_HT-800toInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/NANOAODSIM
x_sec: 28750.0
Metadata Access#
In an AnalyzerModule, the columns.metadata dictionary provides access to the merged configuration:
Era Metadata: Everything from the era YAML file is available under the era key.
# Access trigger mapping defined in 2018.yaml trigger_name = columns.metadata["era"]["trigger_names"]["HT"]
Dataset Metadata: Top-level keys like sample_type and x_sec come from the dataset definition.
if columns.metadata["sample_type"] == "MC": weight = columns.metadata["x_sec"]