Introduction | ATLAS Common Analysis Framework

A bit of history

The software project documented on this website is called CAF, the "common analysis framework". The framework was originally designed by members of the ATLAS HWW analysis group, which was called Higgs Search Group 3 back in 2011 and 2012, when the Higgs boson was first discovered. It was developed for and used by the H→WW→lνlν analysis, which contributed significantly to the discovery of the Higgs boson in July 2012. Even back then, it was written in a very general way and with a few broad ideas in mind:

Most of the work done by HEP analyses is very similar. Once the objects are selected and calibration and overlap removal are applied, performing the event selection, filling histograms and producing cutflows is a very straightforward task. Having several people reimplement it in their own private code bases is a waste of time and effort, and dangerously error-prone. It should thus be possible to provide a framework to this end that is completely independent of the details of the analysis and performs these (simple) tasks of bookkeeping and visualizing your data efficiently.
Writing code is cumbersome and error-prone. It should not be required to edit and recompile your code when you just want to add a cut or change a cut value, book a new histogram or change the ordering of your cuts, change a colour or axis label in your plot, the stacking order of the legend or similarly trivial details.
In order to work efficiently and effectively, the output should be a single, human-readable file that can be exchanged between analysts, visualized and edited and has all information needed to debug your analysis.

With these ideas in mind, the concept of the SampleFolder was born. SampleFolders are basically like folders on your file system, but they contain samples (that is, links to your input files) or other sample folders, histograms or event counters. They live inside a ROOT file, and since they are based on the TFolder class, you can look at them with a TBrowser. They can hold meta-information, which we call tags. These can be arbitrary strings or numbers. SampleFolders are the central storage unit of your analysis data. On top of them, a visitor operates that can visit samples and perform the event loop, write out histograms or perform other useful tasks. You can retrieve that information from one (or several) parts of your SampleFolder at any time with a SampleDataReader. At this point, before you get any further, you might want to head over to the Central Concepts section.
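To make the folder, tag and visitor ideas a bit more concrete, here is a minimal sketch in plain Python. It is not the CAF API: the `Folder` class, the `visit_samples` and `sum_counter` helpers, the file names and the fake event loop are all hypothetical and only mimic the concepts described above (nested folders, samples as links to input files, tags, counters filled by a visitor and read back from any part of the tree).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, Optional, Union

Tag = Union[str, int, float]  # tags are arbitrary strings or numbers


@dataclass
class Folder:
    """Hypothetical stand-in for a sample folder (not a CAF class)."""
    name: str
    tags: Dict[str, Tag] = field(default_factory=dict)
    input_file: Optional[str] = None  # only set for samples (leaves)
    children: Dict[str, "Folder"] = field(default_factory=dict)
    counters: Dict[str, float] = field(default_factory=dict)

    def add(self, child: "Folder") -> "Folder":
        """Attach a child folder and return it for chaining."""
        self.children[child.name] = child
        return child

    def get(self, path: str) -> "Folder":
        """Resolve a slash-separated path such as 'bkg/top'."""
        node = self
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node

    def samples(self) -> Iterator["Folder"]:
        """Yield every folder below (and including) this one that links to an input file."""
        if self.input_file is not None:
            yield self
        for child in self.children.values():
            yield from child.samples()


def visit_samples(root: Folder, action: Callable[[Folder], None]) -> int:
    """Apply `action` to every sample below `root`; return the number visited."""
    visited = 0
    for sample in root.samples():
        action(sample)
        visited += 1
    return visited


def sum_counter(folder: Folder, name: str) -> float:
    """Sum a named counter over a folder and everything below it."""
    total = folder.counters.get(name, 0.0)
    for child in folder.children.values():
        total += sum_counter(child, name)
    return total


# Build a small tree; the file names are made up for illustration.
root = Folder("samples")
bkg = root.add(Folder("bkg", tags={"style.color": "blue"}))
bkg.add(Folder("top", input_file="top.root"))
bkg.add(Folder("diboson", input_file="diboson.root"))
sig = root.add(Folder("sig"))
sig.add(Folder("ggf", input_file="ggf.root", tags={"mass": 125}))


def fake_event_loop(sample: Folder) -> None:
    # Stand-in for a real event loop: no file is read, the "event count"
    # is simply the length of the sample name.
    sample.counters["nEvents"] = float(len(sample.name))


visited = visit_samples(root, fake_event_loop)
print(visited, sum_counter(root.get("bkg"), "nEvents"))
```

The point of the design is that the results live next to the samples they came from, so summing over `bkg` or over the whole tree is just a matter of choosing where in the hierarchy to start reading.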

Since then, the framework has evolved significantly and is now being used by a much broader audience: the HLeptons group and part of the Standard Model group use CAF, and many individual analysts employ our framework or parts of it for their own purposes. The components of this framework have gone by several names over the years: HSG3AnalysisCode, HWWAnalysisCode, QFramework and finally CAF.

The structure

The framework is separated into two main packages, both of which live in our gitlab group:

CAFCore contains the central C++ code base of the framework with lots of classes and helper functions.
CAFExample is a Python-based example package that will show you how to set up an analysis using CAF - you can fork it to get started!

Most of the functionality of CAFCore is provided by the subpackage QFramework. However, several other subpackages exist.

QFramework: Core functionality, helper functions and classes.
SFramework: Statistics code, workspace building and fitting, front end to HistFactory.
CAFxAODUtils: General utilities for reading xAOD input files.
CxAODUtils: Specific utilities for reading CxAOD-type input files.

The use case

One of the central assumptions of this framework is that whatever your inputs are, you have already performed your object selection, calibration and overlap removal. And while you can also use CAF for these tasks, they are not really what the framework was designed for, and it might require some coding from your side to get these things going. So for now, we will assume that you have some form of nTuples, DxAODs, miniTrees or whatever you might call them to start with. During Run 1, the framework was designed for and used with preprocessed, skimmed and slimmed flat nTuples, but from 2014 onwards it has been extensively used to run over Physics xAODs (PxAODs) as they are created by the PxAOD maker.

A central feature of PxAODs compared to other flavours of DxAODs is that they already prepare an event candidate, an instance of xAOD::CompositeParticle that will contain links to all relevant objects in the event with fixed indices. This makes accessing these objects much easier from CAF, and also allows systematic variations to be stored very efficiently using ElementLinks, but we're again getting ahead of ourselves here.

If you're wondering whether CAF is the analysis package you are looking for, you should mainly think about what your primary need is. Do you need something that will help you perform your object selection and apply the recommendations of the combined performance groups? Then you might want to get started with the PxAOD maker, or just use your local analysis group's flavour of nTuple production code instead. Afterwards, however, you will probably need to perform an event selection, make stack plots and cutflows, produce workspaces, run fits and plot the fit results. And this is where CAF comes into play!

Getting involved

If you want to get started using CAF for your own analysis, how about you head over to the getting started section?

If you want to browse our code base, visit our gitlab group!

If you are not sure yet and want to ask some questions or talk to the developers, join the qframework-users mailing list and ask!

We, the CAF development team, look forward to having you in our user base, and, if you are interested, also on our team!