Writing New Modules
The OneStopCoffea Analyzer framework is built around modular units of analysis called “Analyzer Modules”. Users can extend the framework by creating their own custom modules.
Basic Structure
All analyzer modules must inherit from analyzer.core.analysis_modules.AnalyzerModule and use the attrs library for defining parameters.
Here is a minimal example of a custom module:
from analyzer.core.analysis_modules import AnalyzerModule
from analyzer.core.columns import Column
from attrs import define
import awkward as ak


@define
class MyCustomModule(AnalyzerModule):
    """
    A custom module that computes the square of a column.

    Parameters
    ----------
    input_col : Column
        The input column to process.
    output_col : Column
        The output column where results will be stored.
    """

    input_col: Column
    output_col: Column

    def inputs(self, metadata):
        """
        Declare required input columns.
        """
        return [self.input_col]

    def outputs(self, metadata):
        """
        Declare output columns produced by this module.
        """
        return [self.output_col]

    def run(self, columns, params):
        """
        Execute the analysis logic.

        Parameters
        ----------
        columns : TrackedColumns
            The columnar data container.
        params : dict
            Runtime parameters (e.g. systematics).

        Returns
        -------
        columns : TrackedColumns
            The updated columns object.
        results : list
            A list of any additional results (like histograms).
        """
        # Access the data
        data = columns[self.input_col]
        # Perform computation
        result = data ** 2
        # Store the result
        columns[self.output_col] = result
        # Return modified columns and an empty list of side-results
        return columns, []
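The two return values of run can be tried out without the framework installed. The following is a minimal sketch under stated assumptions, not the framework's API: FakeColumns is a hypothetical dict-based stand-in for TrackedColumns, plain strings stand in for Column objects, the standard-library dataclass decorator stands in for attrs, and a dict of counts stands in for a histogram. The "threshold" parameter is invented for illustration.

```python
from dataclasses import dataclass


class FakeColumns(dict):
    """Hypothetical stand-in for TrackedColumns: maps a name to a list of values."""


@dataclass
class SquareAndCount:
    """Mirrors the shape of MyCustomModule; does not inherit from AnalyzerModule."""

    input_col: str   # a real module would use Column here
    output_col: str

    def inputs(self, metadata):
        return [self.input_col]

    def outputs(self, metadata):
        return [self.output_col]

    def run(self, columns, params):
        data = columns[self.input_col]
        # Plain lists do not support ``data ** 2``, so square element by element.
        squared = [x ** 2 for x in data]
        columns[self.output_col] = squared
        # Side result: count squared values below / at or above a threshold.
        threshold = params.get("threshold", 10)
        counts = {
            "below": sum(1 for x in squared if x < threshold),
            "at_or_above": sum(1 for x in squared if x >= threshold),
        }
        return columns, [counts]


columns = FakeColumns(pt=[1, 2, 3, 4])
module = SquareAndCount(input_col="pt", output_col="pt_squared")
columns, results = module.run(columns, params={"threshold": 10})
print(columns["pt_squared"])  # [1, 4, 9, 16]
print(results)                # [{'below': 3, 'at_or_above': 1}]
```

A module that produces no side results returns an empty list in the second position, as MyCustomModule does.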
Core Components
Subclasses of AnalyzerModule must implement three key methods:
inputs(self, metadata): Returns a list of Column objects that this module requires. This allows the framework to manage dependencies and load only necessary data.
outputs(self, metadata): Returns a list of Column objects that this module produces. The framework uses this to check that the requirements of downstream modules are met.
run(self, columns, params): The core logic. It receives the data in columns and any runtime params, and returns the updated columns together with a list of additional results (an empty list if there are none).
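To illustrate why inputs and outputs are declared separately from run, here is a rough sketch of the kind of dependency check such declarations make possible. It is an assumption about the idea, not the framework's actual implementation: FakeModule and find_missing_inputs are hypothetical names, and strings stand in for Column objects.

```python
class FakeModule:
    """Hypothetical stand-in exposing only inputs() and outputs()."""

    def __init__(self, name, needs, produces):
        self.name = name
        self._needs = needs
        self._produces = produces

    def inputs(self, metadata):
        return list(self._needs)

    def outputs(self, metadata):
        return list(self._produces)


def find_missing_inputs(modules, available, metadata=None):
    """Return (module name, missing column) pairs for an ordered list of modules."""
    known = set(available)
    missing = []
    for module in modules:
        for column in module.inputs(metadata):
            if column not in known:
                missing.append((module.name, column))
        # Outputs become available to every module that runs later.
        known.update(module.outputs(metadata))
    return missing


pipeline = [
    FakeModule("square", needs=["pt"], produces=["pt_squared"]),
    FakeModule("select", needs=["pt_squared", "eta"], produces=["selected"]),
]
print(find_missing_inputs(pipeline, available=["pt"]))
# [('select', 'eta')]
print(find_missing_inputs(pipeline, available=["pt", "eta"]))
# []
```

Because the check only calls inputs and outputs, it can run before any data is loaded.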
Metadata and Configuration
Modules can inspect columns.metadata to change their behavior based on the dataset (e.g. Data vs MC, different eras).
def run(self, columns, params):
    if columns.metadata["sample_type"] == "MC":
        # Apply MC-specific corrections
        pass
    return columns, []
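A slightly fuller, self-contained sketch of metadata-dependent behavior follows. Only the "sample_type" key comes from the snippet above; the "era" key, the era label "2018", the scale factor, and the FakeColumns and ScaleMC classes are all hypothetical and exist only so that the example runs without the framework.

```python
class FakeColumns(dict):
    """Hypothetical stand-in for TrackedColumns with a metadata attribute."""

    def __init__(self, data, metadata):
        super().__init__(data)
        self.metadata = metadata


class ScaleMC:
    """Scales a column for MC samples only; data passes through unchanged."""

    def __init__(self, input_col, output_col, scale_by_era):
        self.input_col = input_col
        self.output_col = output_col
        self.scale_by_era = scale_by_era  # hypothetical per-era scale factors

    def run(self, columns, params):
        values = list(columns[self.input_col])
        if columns.metadata["sample_type"] == "MC":
            # Fall back to a factor of 1.0 when the era has no entry.
            scale = self.scale_by_era.get(columns.metadata.get("era"), 1.0)
            values = [v * scale for v in values]
        columns[self.output_col] = values
        return columns, []


module = ScaleMC("pt", "pt_corrected", scale_by_era={"2018": 2.0})

mc = FakeColumns({"pt": [1.0, 2.5]}, {"sample_type": "MC", "era": "2018"})
data = FakeColumns({"pt": [1.0, 2.5]}, {"sample_type": "Data", "era": "2018"})

mc, _ = module.run(mc, params={})
data, _ = module.run(data, params={})
print(mc["pt_corrected"])    # [2.0, 5.0]
print(data["pt_corrected"])  # [1.0, 2.5]
```

Keeping the branch inside run lets the same module instance be reused across datasets, with the metadata deciding which path is taken.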