Writing New Modules#

The OneStopCoffea Analyzer framework is built around modular units of analysis called “Analyzer Modules”. Users can extend the framework by creating their own custom modules.

Basic Structure#

All analyzer modules must inherit from analyzer.core.analysis_modules.AnalyzerModule and use the attrs library for defining parameters.

Here is a minimal example of a custom module:

from analyzer.core.analysis_modules import AnalyzerModule
from analyzer.core.columns import Column
from attrs import define
import awkward as ak

@define
class MyCustomModule(AnalyzerModule):
    """
    A custom module that computes the square of a column.

    Parameters
    ----------
    input_col : Column
        The input column to process.
    output_col : Column
        The output column where results will be stored.
    """
    input_col: Column
    output_col: Column

    def inputs(self, metadata):
        """
        Declare required input columns.
        """
        return [self.input_col]

    def outputs(self, metadata):
        """
        Declare output columns produced by this module.
        """
        return [self.output_col]

    def run(self, columns, params):
        """
        Execute the analysis logic.

        Parameters
        ----------
        columns : TrackedColumns
            The columnar data container.
        params : dict
            Runtime parameters (e.g. systematics).

        Returns
        -------
        columns : TrackedColumns
            The updated columns object.
        results : list
            A list of any additional results (like histograms).
        """
        # Access the data
        data = columns[self.input_col]

        # Perform computation
        result = data ** 2

        # Store the result
        columns[self.output_col] = result

        # Return modified columns and an empty list of side-results
        return columns, []

Core Components#

Subclasses of AnalyzerModule must implement three key methods:

  1. inputs(self, metadata): Returns a list of Column objects that this module requires. This allows the framework to manage dependencies and load only necessary data.

  2. outputs(self, metadata): Returns a list of Column objects that this module produces. This is used for checking if requirements of downstream modules are met.

  3. run(self, columns, params): The core logic. It receives the data in columns and any runtime params.

Metadata and Configuration#

Modules can inspect columns.metadata to change their behavior based on the dataset (e.g. Data vs MC, different eras).

def run(self, columns, params):
    if columns.metadata["sample_type"] == "MC":
         # Apply MC-specific corrections
         pass
    return columns, []