CMS DAS Pre-Exercises

CMS Data Analysis School Pre-Exercises - First Set

Overview

Teaching: 0 min
Exercises: 60 min
Questions
  • How do you set up a CMSSW release?

  • How do you find a dataset using the Data Aggregation Service (DAS)?

  • What are some EDM standalone utilities and what do they do?

  • What is MiniAOD and how do you use it?

Objectives
  • Understand how to set up a CMSSW release.

  • Know how to find a CMS dataset.

  • Know how to use the EDM utilities to find information about a dataset.

  • Become familiar with the MiniAOD format.

Introduction

Welcome to the first set of CMS Data Analysis School (CMSDAS) pre-exercises. The purpose of these exercises is to become familiar with the basic software tools required to perform physics analysis at the school. Please run and complete these exercises. Throughout the exercises there will be questions for you to answer. Submit your answers in the online response form available from the course web area; for CMSDAS@CERN 2023, the complete set of links can be found at the CMSDAS pre-exercises indico page. A large amount of additional information about these topics is available in the twikis that we reference. Please remember that twikis evolve; they aim to provide the best information available at any given time.

Note

The CMSDAS exercises (pre-exercises as well as exercises during the school) are intended to be as generic as possible. However, CMSDAS is held at different CMS collaborating institutes (e.g. CERN, the LPC at Fermilab, DESY, etc.). Participants are expected to request and obtain local (at the intended school location) computer accounts well in advance of the school start date, to ensure they will be able to work right away. In the case of CMSDAS@CERN 2023, the computer account you should use for all exercises is the standard CERN computing account. It is very important for participants to use the pre-exercises as a setup tool, so we recommend using the same laptop they intend to bring to the school (no computer/laptop will be provided at the school), and connecting to the CERN computing resources that will be used for the school.

There are several sets of pre-exercises. As outlined above, if you are going through the pre-exercises in preparation for attending a CMSDAS, we strongly recommend using the laptop you intend to bring to the school and logging into the computing cluster local to the school, as specified below.

Note

Before proceeding with this and the following pre-exercises, make sure that you have gone through all setup steps.

Exercise 1 - Simple cut and paste exercise

This exercise is designed to run only on lxplus as copies of the scripts are present there.

Log in to the lxplus cluster. If you are preparing for CMSDAS@CERN 2023, this is the cluster you are supposed to use for the pre-exercises. If you have not used the Linux command line before, you can learn more at WorkBookBasicLinux.

To connect to the lxplus service, try the following command (using Terminal with a Mac/Linux operating system, or PuTTY or Cygwin with a Windows operating system):

ssh -Y <YourUsername>@lxplus.cern.ch

replacing <YourUsername> with your actual username. Enter the password. After a successful login, you should see the following message:

    * ********************************************************************
    * Welcome to lxplus753.cern.ch, CentOS Linux release 7.9.2009 (Core)
    * Archive of news is available in /etc/motd-archive
    * Reminder: you have agreed to the CERN
    *   computing rules, in particular OC5. CERN implements
    *   the measures necessary to ensure compliance.
    *   https://cern.ch/ComputingRules
    * Puppet environment: production, Roger state: production
    * Foreman hostgroup: lxplus/nodes/login
    * Availability zone: cern-geneva-b
    * LXPLUS Public Login Service - http://lxplusdoc.web.cern.ch/
    * An AlmaLinux8 based lxplus8.cern.ch is now available
    * An AlmaLinux9 based lxplus9.cern.ch is now available
    * Please read LXPLUS Privacy Notice in http://cern.ch/go/TpV7
    * ********************************************************************

As the exercises often require copying and pasting from the instructions, we will first make sure that this works for you. To verify that cut and paste to/from a terminal window works, copy the runThisCommand.py script and make it executable. Once connected, use the following commands (Mac/Linux/Windows):

cp /afs/cern.ch/cms/Tutorials/CMSDASatCERN23/runThisCommand.py .
chmod +x runThisCommand.py

Next, cut and paste the following and then hit return:

./runThisCommand.py "asdf;klasdjf;kakjsdf;akjf;aksdljf;a" "sldjfqewradsfafaw4efaefawefzdxffasdfw4ffawefawe4fawasdffadsfef"

The response should be your username followed by an alphanumeric string of characters unique to your username, for example for a user named slaurila:

success: slaurila fynhevyn

If you executed the command without copy-pasting (i.e. ran only ./runThisCommand.py without the additional arguments), the command will return:

Error: You must provide the secret key

Alternatively, copying incorrectly (i.e. with different arguments) will return:

Error: You didn't paste the correct input string

If you are not running on lxplus (for example, running locally on your laptop), trying to run the command will result in:

bash: ./runThisCommand.py: No such file or directory

or (for example):

Unknown user: slaurila.

Question 1

Post the alphanumeric string of characters unique to your username. For CMSDAS@CERN 2023 please submit your answers for the CMSDAS@CERN 2023 Google Form first set. NOTE, answer only Question 1 at this point. Question 2 in the form is related to the next exercise. There is a one-to-one correspondence between the question numbers here and in the Google Form.

Exercise 2 - Simple edit exercise

This exercise is designed to run only on lxplus.

The purpose of this exercise is to ensure that you can edit files. We will first copy and then edit the editThisCommand.py script. This means that you need to be able to use one of the standard text editors (emacs, pico, nano, vi, vim, etc.) available on the cluster you are running on (lxplus), open a file, edit it, and save it!

On the lxplus cluster, run:

cp /afs/cern.ch/cms/Tutorials/CMSDASatCERN23/editThisCommand.py .

Then open editThisCommand.py with your favorite editor (e.g. emacs -nw editThisCommand.py) and make sure that the 11th line has # (hash character) as the first character of the line. If not, explicitly change the following three lines:

# Please comment the line below out by adding a '#' to the front of
# the line.
raise RuntimeError, "You need to comment out this line with a #"

to:

# Please comment the line below out by adding a '#' to the front of
# the line.
#raise RuntimeError, "You need to comment out this line with a #"

Save the file (e.g. in emacs CTRL+x CTRL+s to save, CTRL+x CTRL+c to quit the editor) and execute the command:

./editThisCommand.py

If this is successful, the result will again contain your username and another string, i.e. something like:

success:  slaurila 0x-7343CEEA

If the file has not been successfully edited, an error message will result such as:

Traceback (most recent call last):
  File "./editThisCommand.py", line 11, in ?
    raise RuntimeError, "You need to comment out this line with a #"
RuntimeError: You need to comment out this line with a #
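
If you prefer to make this edit from the command line instead of an editor, a one-line sed command can do it (a sketch, assuming the line to be commented out is exactly line 11, as described above):

sed -i '11s/^/#/' editThisCommand.py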

Question 2

Paste the line beginning with “success”, resulting from the execution of ./editThisCommand.py, into the form provided.

Exercise 3 - Setup a CMSSW release area

CMSSW is the CMS SoftWare framework used in our collaboration to process and analyze data. In order to use it, you need to set up your environment and set up a local CMSSW release.

### If you are using Bash shell
source /cvmfs/cms.cern.ch/cmsset_default.sh
export CMSSW_GIT_REFERENCE=/cvmfs/cms.cern.ch/cmssw.git.daily
### Alternatively, if you are using the tcsh shell (or csh shell)
source /cvmfs/cms.cern.ch/cmsset_default.csh
setenv CMSSW_GIT_REFERENCE /cvmfs/cms.cern.ch/cmssw.git.daily

You should add the above commands to your ~/.tcshrc file (or ~/.bash_profile if bash is your default shell), creating the file if you do not have one, so that they are automatically executed after login and you do not have to execute them manually each time you log into the cluster.
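
For example, if bash is your default shell, your ~/.bash_profile would contain the two lines:

source /cvmfs/cms.cern.ch/cmsset_default.sh
export CMSSW_GIT_REFERENCE=/cvmfs/cms.cern.ch/cmssw.git.daily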

For the following exercises, or generally when you start working with larger scripts, code repositories, configuration files, and possibly larger input and output files, it is a good idea NOT to do this inside your lxplus home directory, but in an area with more disk space. We won’t stop you if you wish to use your AFS user space, but keep in mind that you might face a “disk quota full” problem at some point. On CERN lxplus, for example, every user has an EOS home directory of the form /eos/user/z/zorro (for a user named Zorro) that can be used for “heavier” projects.

Now let us proceed with the creation of a working area (called YOURWORKINGAREA in the following):

cd /eos/user/<first-letter-of-username>/<username>
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
### If you are using Bash shell
export SCRAM_ARCH=slc7_amd64_gcc700
### Alternatively, if you are using the default tcsh shell (or csh shell)
setenv SCRAM_ARCH slc7_amd64_gcc700
### Then, in both cases:
cmsrel CMSSW_10_6_18
cd CMSSW_10_6_18/src
cmsenv

To be able to check out specific CMSSW packages from GitHub, you will need to configure your local account. You only have to do this once for any given cluster you are working on, such as lxplus:

git config --global user.name "[Name]"
git config --global user.email [Email]
git config --global user.github [Account]

Note

To see your current git configuration you can use the following command:

git config --global -l

More information will be given in the fifth set of pre-exercises.

Now you can initialize the CMSSW area as a local git repository:

git cms-init

This last command will take some time to execute and will produce some long output; be patient.

When you get the prompt again, run the following command:

echo $CMSSW_BASE

Question 3

Paste the result of executing the above command (echo $CMSSW_BASE) in the form provided.

Note

The directory (on lxplus) /eos/user/<initial>/<username>/YOURWORKINGAREA/CMSSW_10_6_18/src is referred to as your WORKING DIRECTORY.

Every time you log out or exit a session, you will need to set up your environment in your working directory again. To do so, once you have executed the steps above for the first time (assuming you have added source /cvmfs/cms.cern.ch/cmsset_default.(c)sh to your ~/.tcshrc or ~/.bash_profile file), you can simply do:

cd /eos/user/<initial>/<username>/YOURWORKINGAREA/CMSSW_10_6_18/src
cmsenv

And you are ready to go!

Exercise 4 - Find data in the Data Aggregation Service (DAS)

In this exercise we will locate the MC dataset RelValZMM and the collision dataset /DoubleMuon/Run2018A-12Nov2019_UL2018-v2/MINIAOD using the Data Aggregation Service (not to be confused with the Data Analysis School in which you are partaking!).

Go to the DAS webpage. You will be asked for your Grid certificate, which you should have loaded into your browser by now. Also note that there may be a security warning message, which you will need to ignore in order to load the page. From there, enter the following into the space provided:

dataset release=CMSSW_10_6_14 dataset=/RelValZMM*/*CMSSW_10_6_14*/MINIAOD*

This will search for datasets processed with the release CMSSW_10_6_14 whose names match /RelValZMM*/*CMSSW_10_6_14*/MINIAOD*. The syntax for searches is found here, with many useful common search patterns under “CMS Queries”.

For this query, several results should be displayed (you may be queried for security exceptions in the process). Select (click) on the dataset name /RelValZMM_13/CMSSW_10_6_14-106X_mc2017_realistic_v7-v1/MINIAODSIM and after a few seconds another page will appear.

Question 4.1a

What is the size of this dataset (/RelValZMM_13/CMSSW_10_6_14-106X_mc2017_realistic_v7-v1/MINIAODSIM) in MB? Make sure your answer is only numerical (no units).

Question 4.1b

Click on “Sites” to get a list of sites hosting this data. Is this data available at FNAL or DESY?

Back in the main dataset page, click on the “Files” link to get a list of the ROOT files in our selected dataset. One of the files contained in the dataset should look like this:

/store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root

If you want to know the name of the dataset a given file belongs to, you can go to DAS and type:

dataset file=/store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root

and hit “Enter”.

Now we will locate a fresh 2023 collision dataset using a keyword search, which is often convenient if you know the dataset you are looking for. In this example, the dataset that we are looking for is the “MuonEG” dataset (which contains events with a muon plus an electron or photon).

In DAS type:

dataset=/MuonEG/*Run2023A*/MINIAOD*

and hit “Enter”.

Question 4.2

What release was the dataset /MuonEG/Run2023A-PromptReco-v2/MINIAOD collected in?

Note: If you see more than one release, just answer with a single release.

Having set up your CMSSW environment, you can also search for the dataset /MuonEG/Run2023A-PromptReco-v2/MINIAOD by invoking the DAS command in your WORKING DIRECTORY. The DAS command dasgoclient is in the path for CMSSW_9_X_Y versions and above, so you do not need to download anything additional. More about dasgoclient can be found here.

First, we need to initialize the Grid proxy:

voms-proxy-init --valid 192:00 --voms cms

You will be asked for your grid certificate passphrase. Then you can execute the query with:

dasgoclient --query="dataset=/MuonEG/Run2023A-PromptReco-v2/MINIAOD" --format=plain

You will see something like:

/MuonEG/Run2023A-PromptReco-v2/MINIAOD
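
dasgoclient accepts the same query syntax as the DAS web interface, so you can also list, for example, the files or the hosting sites of this dataset (two illustrative queries):

dasgoclient --query="file dataset=/MuonEG/Run2023A-PromptReco-v2/MINIAOD" | head -5
dasgoclient --query="site dataset=/MuonEG/Run2023A-PromptReco-v2/MINIAOD"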

More information about accessing data in the Data Aggregation Service can be found in WorkBookDataSamples.

Exercise 5 - Event Data Model (EDM) standalone utilities

The overall collection of CMS software, referred to as CMSSW, is built around a framework, an Event Data Model (EDM), and services needed by the simulation, calibration and alignment, and reconstruction modules that process event data so that physicists can perform analysis. The primary goal of the Framework and EDM is to facilitate the development and deployment of reconstruction and analysis software. The EDM is centered around the concept of an Event. An Event is a C++ object container for all RAW and reconstructed data related to a particular recorded collision. To understand what is in a data file and more, several EDM utilities are available. In this exercise, one will use three of these EDM utilities. They will be very useful at CMSDAS and after. More about these EDM utilities can be found at WorkBookEdmUtilities. These together with the GitHub web interface for CMSSW and the CMS LXR Cross Referencer are very useful to understand and write CMS code.

AAA and xrootd

Since the various datasets listed in CMSDAS and needed for data analysis may be stored on different grid sites around the world, CMS has implemented a service known as Any Data, Anytime, Anywhere (AAA), which is an implementation of a more generic xrootd service. It allows analysis of CMS data located at any grid site with bare ROOT or the CMSSW/FWLite environment, without downloading it to your local storage space.

The AAA service works via so-called redirectors, which are intermediate servers that automatically find the physical location of the given file and transmit it to you. Which redirector you use depends on your region, to minimize the distance over which the data must travel and thus minimize the reading latency. These “regional” redirectors will try file locations in your region first before trying to go overseas.

If you are working in the US, it is best to use the redirector cmsxrootd.fnal.gov, while in Europe and Asia, it is best to use xrootd-cms.infn.it. There is also a “global redirector” at cms-xrd-global.cern.ch which will query all locations.

In the examples below, cms-xrd-global.cern.ch is always used, but feel free to replace that with a choice more appropriate for your region.
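
For example, the file used below could equally well be opened through the European regional redirector by swapping the prefix (illustrative only; pick the redirector appropriate for your region):

root://xrootd-cms.infn.it//store/data/Run2023A/MuonEG/MINIAOD/PromptReco-v2/000/366/323/00000/f2b1462f-6d41-4b11-b8e3-7624af2e29bf.root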

To open a file from the MuonEG 2023A dataset (stored at CERN) with ROOT:

root -l
TFile *f =TFile::Open("root://cms-xrd-global.cern.ch///store/data/Run2023A/MuonEG/MINIAOD/PromptReco-v2/000/366/323/00000/f2b1462f-6d41-4b11-b8e3-7624af2e29bf.root");

If this works correctly, you should see a long list of warnings about missing dictionaries, such as:

Warning in <TClass::Init>: no dictionary for class pat::TauJetCorrFactors is available

Soon we will learn how to properly deal with the MiniAOD file format. Similarly, you can open the RelValZMM_13 file that we previously located at FNAL:

TFile *f =TFile::Open("root://cms-xrd-global.cern.ch///store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root");

You can quit the ROOT command line with:

.q

edmDumpEventContent

Next we will use edmDumpEventContent to dump a summary of the products that are contained within the file we’re interested in. We will be able to see what class names etc. to use in order to access the objects in the MiniAOD file.

If you want to look at a specific object (say, slimmedMuons), then execute:

edmDumpEventContent --all --regex slimmedMuons root://cms-xrd-global.cern.ch//store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root

This will return:

Type                                  Module           Label     Process        Full Name
-----------------------------------------------------------------------------------------
edm::RangeMap<CSCDetId,edm::OwnVector<CSCSegment,edm::ClonePolicy<CSCSegment> >,edm::ClonePolicy<CSCSegment> >    "slimmedMuons"   ""        "RECO"         CSCDetIdCSCSegmentsOwnedRangeMap_slimmedMuons__RECO
edm::RangeMap<DTChamberId,edm::OwnVector<DTRecSegment4D,edm::ClonePolicy<DTRecSegment4D> >,edm::ClonePolicy<DTRecSegment4D> >    "slimmedMuons"   ""        "RECO"         DTChamberIdDTRecSegment4DsOwnedRangeMap_slimmedMuons__RECO
vector<pat::Muon>                     "slimmedMuons"   ""        "RECO"         patMuons_slimmedMuons__RECO

The output of edmDumpEventContent has information divided into four variable width columns. The first column is the C++ class type of the data, the second is the module label, the third is the product instance label, and the fourth is the process name. More information is available at Identifying Data in the Event.

Instead of the above, let us try without the option --regex slimmedMuons. This will dump the entire event content - a file with many lines. For this reason we’ll send the output to a file called EdmDumpEventContent.txt with a UNIX output redirection command (then you can inspect the file with your favorite editor or with less EdmDumpEventContent.txt):

edmDumpEventContent root://cms-xrd-global.cern.ch//store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root > EdmDumpEventContent.txt

Question 5.1a

How many modules produce products of type vector in this particular MiniAOD file?

Note: We mean a plain std::vector, not a BXVector or any other type.

Question 5.1b

What are the names of (any) three of the modules that produce products of type vector?
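
One way to approach these two questions is with standard UNIX tools on the dump file (a sketch; the pattern anchors on lines whose type column starts with a plain vector, which excludes types such as BXVector, but may need adjusting):

grep "^vector<" EdmDumpEventContent.txt | awk -F'"' '{print $2}' | sort -u

This prints the unique module labels that produce plain vector products; counting them answers the first question.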

edmProvDump

To aid in understanding the full history of an analysis, the framework accumulates provenance for all data stored in the standard ROOT output files. Using the command edmProvDump one can print out all the tracked parameters used to create the data file. For example, one can see which modules were run and the CMSSW version used to make the MiniAOD file. In executing the command below it is important to follow the instructions carefully, otherwise a large number of warning messages may appear. The ROOT warning messages can be ignored.

To do this on lxplus execute:

edmProvDump root://cms-xrd-global.cern.ch//store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root > EdmProvDump.txt

Note

EdmProvDump.txt is a very large file of the order of 40000-60000 lines. Open and look at this file and locate Processing History (about 20-40 lines from the top).
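
Rather than scrolling, you can also jump straight to that section from the command line (a sketch using GNU grep options):

grep -m 1 -A 20 "Processing History" EdmProvDump.txt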

Question 5.2

Which version of CMSSW was used to produce the MiniAOD file? The answer will take the form CMSSW_X_Y_Z, where you will need to fill in the X, Y, and Z with the correct numerical values.

edmEventSize

Finally we will execute edmEventSize to determine the size of different branches in the data file. Further details about this utility may be found at SWGuideEdmEventSize. edmEventSize isn’t actually a ‘Core’ helper function (anyone can slap ‘edm’ on the front of a program in CMSSW).

At lxplus execute the following command:

edmEventSize -v `edmFileUtil -d root://cmsxrootd-site.fnal.gov//store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root` > EdmEventSize.txt

Question 5.3

What is the number of events processed (contained in this file) if you execute the edmEventSize command at lxplus?

Open and look at file EdmEventSize.txt and locate the line containing the text patJets_slimmedJetsPuppi__RECO. There are two numbers following this text that measure the plain and the compressed size of this branch.
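
You can pull that line out directly with grep:

grep "patJets_slimmedJetsPuppi__RECO" EdmEventSize.txt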

Question 5.4

What are the plain and compressed size numbers for the branch patJets_slimmedJetsPuppi__RECO in this file?

Exercise 6 - Get familiar with the MiniAOD format

Analyzing physics data at CMS is a very complicated task involving multiple steps, sharing of expertise, cross-checks, and comparison of different analyses. To maximize physics productivity, CMS developed a high-level data tier called MiniAOD in 2014 to serve the needs of mainstream physics analyses while keeping a small event size (30-50 kB/event), with easy access to the algorithms developed by the Physics Objects Groups (POGs) in the framework of the CMSSW offline software. The production of MiniAODs is done centrally for common samples. MiniAOD samples are commonly used for Run-2 physics analyses. More information about MiniAOD can be found in WorkBookMiniAOD.

Note

A new, even more compact data tier called NanoAOD has been developed more recently. The goal of this tier is to centralize the ntuple production of ~50% of analyses and to keep the event size below 2 kB/event. This pre-exercise will not cover the use of NanoAOD, but you will get familiar with it during the school week. More information can be found at WorkBookNanoAOD.

The main contents of the MiniAOD are:

  • High-level physics objects (leptons, photons, jets, and missing transverse energy), slimmed to keep only the information typically needed in analyses.

  • The full list of particles reconstructed by the particle-flow algorithm, stored in a compact, packed format.

  • Slimmed MC truth information, as well as trigger information.

Please note that the files used in the following are from older releases, but they still illustrate the intended points. Because RelVal files (produced to validate new releases in the rapid CMSSW development cycle) become unavailable on a short (months) timescale, a small set of files has been copied to the CERN EOS storage. They are available at root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/.

The Z to dimuon MC file root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root was made with the CMSSW_7_3_0_pre1 release, and the data file root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_Data_706_MiniAOD.root was made from the collision data skim /DoubleMu/CMSSW_7_0_6-GR_70_V2_AN1_RelVal_zMu2011A-v1/MINIAOD.

In your working directory, open the root file root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root. Begin by opening ROOT:

root -l

Note

If you already have a custom .rootrc or .rootlogon.C, you can start ROOT without them by using the command root -l -n.

On the ROOT prompt, type (or copy-paste) the following:

gSystem->Load("libFWCoreFWLite.so");
FWLiteEnabler::enable();
gSystem->Load("libDataFormatsFWLite.so");
gROOT->SetStyle ("Plain");
gStyle->SetOptStat(111111);

TFile *theFile = TFile::Open("root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root");

TBrowser b;

Note

The TBrowser is a graphical ROOT file browser. It runs on the computer where you started ROOT, and its graphical interface needs to be forwarded to your computer. This can be very slow: you need either a lot of patience or a good connection, or you can try to run ROOT locally, copying the ROOT files that are to be inspected. Since everyone is running a different operating system on their local computer, we do not support the setup of ROOT on your local computer. However, instructions exist on the official ROOT website.

Note

You can start the ROOT interpreter and open the file in a single step by doing:

root -l <filename>

This may have some issues when using the xrootd redirector; in the edmEventSize example above we avoided that by directly addressing the file at FNAL.

To be able to use the member functions of a CMSSW data class from within ROOT, a ‘dictionary’ for that class needs to be available to ROOT. To obtain that dictionary, it is necessary to load the proper library into ROOT. The first three lines of the code above do exactly that. More information is at WorkBookFWLiteExamples. Note that gROOT->SetStyle ("Plain"); sets a plain white background for all the plots in ROOT.

Note

If rootlogon.C is created in the home area, and the above five lines of code (the fifth line being the gStyle line) are put in that file, the dictionaries will be loaded and all plots will have a white background automatically every time you start ROOT.

Now a ROOT browser window opens and looks like this (“Root Files” may or may not be selected):

TBrowser starting view

In this window click on ROOT Files on the left menu and now the window looks like this:

TBrowser 'ROOT Files' view

Double-click on the ROOT file you opened: root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root, then Events, then scroll down and click patMuons_slimmedMuons__PAT (or the little + that appears next to it), and then patMuons_slimmedMuons__PAT.obj. A window appears that looks like this:

TBrowser slimmedMuons view

Scroll a long way down the file (not too fast) and click on pt(). A PAT Muon Pt distribution will appear. These muons have been produced in the Z to mumu interactions as the name of the data sample implies.

TBrowser slimmedMuons pt() view

Question 6.1

What is the mean value of the muon pt for this file (root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root)?

Note

To exit ROOT simply type .q in the command line.

Now open the data file root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_Data_706_MiniAOD.root. Similarly, run the following commands and answer the question below:

root -l

On the ROOT prompt type the following:

gSystem->Load("libFWCoreFWLite.so");
FWLiteEnabler::enable();
gSystem->Load("libDataFormatsFWLite.so");
gROOT->SetStyle ("Plain");
gStyle->SetOptStat(111111);

TFile *theFile = TFile::Open("root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_Data_706_MiniAOD.root");

TBrowser b;

Question 6.2

What is the mean value of the muon pt for the collision data (current file)?

Remember

Be sure to submit your answers to the Google Form first set, then proceed to the second set.

Helpful Hint

Rather than using the TBrowser, you can perform the drawing action using the ROOT interpreter. An example is shown below:

root -l root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root
Events->Draw("patMuons_slimmedMuons__PAT.obj.pt()")
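
You can then read the mean directly at the ROOT prompt (a sketch; htemp is the default name ROOT gives the temporary histogram created by TTree::Draw):

htemp->GetMean()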

Key Points

  • Setting up CMSSW requires some environment setup and the cmsrel command.

  • You can use the web portal for DAS or the dasgoclient to find information about a given dataset.

  • There are several utilities for gaining insight into EDM ROOT files.


CMS Data Analysis School Pre-Exercises - Second Set

Overview

Teaching: 0 min
Exercises: 30 min
Questions
  • How to slim a MiniAOD file?

  • How to know the size of a MiniAOD file?

  • How to use FWLite to analyze data and MC?

Objectives
  • Learn how to reduce the size of a MiniAOD by only keeping physics objects of interest.

  • Learn how to determine the size of a MiniAOD file using EDM standalone utilities

  • Learn to use FWLite to perform simple analysis.

Introduction

Welcome to the second set of CMSDAS pre-exercises. As you know by now, the purpose of the pre-workshop exercises is for prospective workshop attendees to become familiar with the basic software tools required to perform physics analysis at CMS before the workshop begins. Post the answers in the online response form available from the course web area:

Indico page

CMSDAS pre-exercises indico page

The second set of exercises begins with Exercise 7. We will use collision data events and simulated events (Monte Carlo (MC)). To work comfortably with these files, we will first make them smaller by selecting only the objects that we are interested in (electrons and muons in our case).

The collision data events are stored in DoubleMuon.root. DoubleMuon refers to the fact that, when recording these events, we believed there were two muons in the event. This is true most of the time, but other objects can fake muons, so on closer inspection we might find events that actually do not have two muons.

The MC file is called DYJetsToLL. You will need to get used to cryptic names like this if you want to survive in the high-energy physics environment! The MC file contains Drell-Yan events that decay to two leptons and may be accompanied by one or several jets.

Exercises 8 and 9 use FWLite (Framework Lite). This is an interactive analysis tool integrated with the CMSSW EDM (Event Data Model) framework. It allows you to automatically load the shared libraries defining CMSSW data formats and the tools provided, so that you can easily access parts of the event in the EDM format within interactive ROOT sessions. It reads produced ROOT files, has full access to the class methods, and there is no need to write full-blown framework modules. Thus, with a FWLite distribution on your local desktop, you can do CMS analysis outside the full CMSSW framework. In these two exercises, we will analyze the data stored in a MiniAOD sample using FWLite. We will loop over muons and make a Z mass peak.

We assume that, having done the first set of pre-exercises by now, you are comfortable with logging onto lxplus and setting up the CMS environment.

Exercise 7 - Slim MiniAOD sample to reduce its size by keeping only Muon and Electron branches

In order to reduce the size of the MiniAOD, we would like to keep only the slimmedMuons and slimmedElectrons objects and drop all others. The config files that do this are slimMiniAOD_MC_MuEle_cfg.py and slimMiniAOD_data_MuEle_cfg.py. To work with these config files and make the slim MiniAODs, execute the following steps in the directory YOURWORKINGAREA/CMSSW_10_6_18/src.

Cut and paste the scripts slimMiniAOD_MC_MuEle_cfg.py and slimMiniAOD_data_MuEle_cfg.py in their entirety and save them with the same names. Open them with your favorite editor and take a look at these python files. The number of events has been set to 1000:

process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(1000) )

To run over all events in the sample, one can change it to -1.
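
For orientation, the slimming itself is steered by the output module's outputCommands in such a config. Below is a minimal sketch of what the relevant lines look like, following standard CMSSW conventions; it is not necessarily a verbatim excerpt of the exercise files:

# in the real config, `process` is defined near the top via
# import FWCore.ParameterSet.Config as cms; process = cms.Process(...)
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("slimMiniAOD_MC_MuEle.root"),
    outputCommands = cms.untracked.vstring(
        "drop *",                       # drop everything by default...
        "keep *_slimmedMuons_*_*",      # ...then keep only the muons
        "keep *_slimmedElectrons_*_*",  # ...and the electrons
    )
)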

Now run the following command:

cmsRun slimMiniAOD_MC_MuEle_cfg.py

This produces an output file called slimMiniAOD_MC_MuEle.root in your $CMSSW_BASE/src area.

Now run the following command:

cmsRun slimMiniAOD_data_MuEle_cfg.py

This produces an output file called slimMiniAOD_data_MuEle.root in your $CMSSW_BASE/src area.

On opening these two MiniAODs one observes that only the slimmedMuons and the slimmedElectrons objects are retained as intended.

To find the size of your MiniAOD, execute the following Linux command:

ls -lh slimMiniAOD_MC_MuEle.root

and

ls -lh slimMiniAOD_data_MuEle.root

You may also try the following:

To know the size of each branch, use the edmEventSize utility as follows (also explained in First Set of Exercises):

 edmEventSize -v slimMiniAOD_MC_MuEle.root

and

 edmEventSize -v slimMiniAOD_data_MuEle.root

To see what objects there are, open the ROOT file as follows and browse to the MiniAOD samples as you did in Exercise 6:

Here is how you do it for the output file slimMiniAOD_MC_MuEle.root

root -l slimMiniAOD_MC_MuEle.root
TBrowser b;

OR

root -l
TFile *theFile = TFile::Open("slimMiniAOD_MC_MuEle.root");
TBrowser b;

To quit the ROOT application, execute:

.q

Remember

For CMSDAS@CERN2023 please submit your answers at the Google Form second set.

Question 7.1a

What is the size of the MiniAOD slimMiniAOD_MC_MuEle.root in MB? Make sure your answer is only numerical (no units).

Question 7.1b

What is the size of the MiniAOD slimMiniAOD_data_MuEle.root in MB? Make sure your answer is only numerical (no units).

Question 7.2a

What is the mean eta of the muons for MC?

Question 7.2b

What is the mean eta of the muons for data?

Question 7.3a

What is the size of the slimmed output file compared to the original sample?

Compare one of your slimmed output files to the original MiniAOD file it came from. To find the sizes of files in EOS, you can use, e.g., edmFileUtil -l root://cms-xrd-global.cern.ch///store/user/filepath/filename.root with the appropriate path and filename.

Question 7.3b

Is the mean eta of muons for MC and data the same as in the MC and data samples in Exercise 6?

Exercise 8 - Use FWLite on the MiniAOD created in Exercise 7 and make a Z Peak (applying pt and eta cuts)

FWLite (pronounced “framework-light”) is basically a ROOT session with CMS data format libraries loaded. CMS uses ROOT to persistify data objects. CMS data formats are thus “ROOT-aware”; that is, once the shared libraries containing the ROOT-friendly description of CMS data formats are loaded into a ROOT session, these objects can be accessed and used directly from within ROOT like any other ROOT class!

In addition, CMS provides a couple of classes that greatly simplify the access to the collections of CMS data objects. Moreover, these classes (Event and Handle) have the same name as analogous ones in the Full Framework; this mnemonic trick helps in making the code to access CMS collections very similar between the FWLite and the Full Framework.

In this exercise we will make a ZPeak using our data and MC sample. We will use the corresponding slim MiniAOD created in Exercise 7. To read more about FWLite, have a look at Section 3.5 of Chapter 3 of the WorkBook.

We will first make a ZPeak. We will loop over the slimmedMuons in the MiniAOD and compute the invariant mass of oppositely charged muon pairs. These masses are filled into a histogram that is written to an output ROOT file.
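
To make this concrete, here is a minimal FWLite sketch of that kind of loop (illustrative only, not the actual FWLiteHistograms.cc code; it assumes the slimmed MC file from Exercise 7 in the current directory and that the FWLite libraries have been loaded as in Exercise 6):

#include "TFile.h"
#include "TH1F.h"
#include "DataFormats/FWLite/interface/Event.h"
#include "DataFormats/FWLite/interface/Handle.h"
#include "DataFormats/PatCandidates/interface/Muon.h"

void zpeakSketch() {  // hypothetical macro name
  // open the slimmed MiniAOD and book the mass histogram
  TFile* file = TFile::Open("slimMiniAOD_MC_MuEle.root");
  TH1F* mumuMass = new TH1F("mumuMass", "mumu mass", 90, 30., 120.);

  // loop over events and fill the invariant mass of all
  // oppositely charged muon pairs
  fwlite::Event ev(file);
  for (ev.toBegin(); !ev.atEnd(); ++ev) {
    fwlite::Handle<std::vector<pat::Muon> > muons;
    muons.getByLabel(ev, "slimmedMuons");
    for (unsigned i = 0; i < muons->size(); ++i)
      for (unsigned j = i + 1; j < muons->size(); ++j)
        if (muons->at(i).charge() * muons->at(j).charge() < 0)
          mumuMass->Fill((muons->at(i).p4() + muons->at(j).p4()).mass());
  }
  mumuMass->Draw();
}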

First make sure that you have the MiniAODs created in Exercise 7. They should be called slimMiniAOD_MC_MuEle.root and slimMiniAOD_data_MuEle.root.

Go to the src area of the current CMSSW release:

cd $CMSSW_BASE/src

The environment variable CMSSW_BASE points to the base area of the current CMSSW release.

Check out a package from GitHub.

Make sure that you have GitHub set up properly, as described in obtain a GitHub account. It's particularly important to set up ssh keys so that you can check out code without problems: https://help.github.com/articles/generating-ssh-keys

To check out the package, run:

git cms-addpkg PhysicsTools/FWLite

Then to compile the packages, do

scram b
cmsenv

Note

You can try scram b -j 4 to speed up the compilation. Here -j 4 will compile with 4 cores. When occupying several cores to compile, you will also make the interactive machine slower for others, since you are using more resources. Use with care!

Note 2

It is necessary to call cmsenv again after compiling this package because it adds executables in the $CMSSW_BASE/bin area.

To make a Z peak, we will use the FWLite executable called FWLiteHistograms. The corresponding code should be in $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteHistograms.cc

With this executable we will use command-line options. More about these can be learned from SWGuideCommandLineParsing.

To make a ZPeak from this executable, using the MC MiniAOD, run the following command (which will not work out of the box, see below):

FWLiteHistograms inputFiles=slimMiniAOD_MC_MuEle.root outputFile=ZPeak_MC.root maxEvents=-1 outputEvery=100

You will see that you get the following error:

terminate called after throwing an instance of 'cms::Exception'
  what():  An exception of category 'ProductNotFound' occurred.
Exception Message:
getByLabel: Found zero products matching all criteria
Looking for type: edm::Wrapper<std::vector<reco::Muon> >
Looking for module label: muons
Looking for productInstanceName:

The data is registered in the file but is not available for this event

This error occurs because your input file slimMiniAOD_MC_MuEle.root is a MiniAOD and does not contain reco::Muon objects with the label muons. It does contain slimmedMuons (check for yourself by opening the ROOT file with the ROOT browser). In the code FWLiteHistograms.cc, however, there are lines that say:

using reco::Muon;

and

event.getByLabel(std::string("muons"), muons);

This means you need to change reco::Muon to pat::Muon, and muons to slimmedMuons.

To implement these changes, open the code $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteHistograms.cc. In this code, look at the line that says:

using reco::Muon;

and change it to

using pat::Muon;

and the line:

event.getByLabel(std::string("muons"), muons);

and change it to:

event.getByLabel(std::string("slimmedMuons"), muons);

Now you need to re-compile:

scram b

Now again run the executable as follows:

FWLiteHistograms inputFiles=slimMiniAOD_MC_MuEle.root outputFile=ZPeak_MC.root maxEvents=-1 outputEvery=100

You can see that now it runs successfully and you get a ROOT file called ZPeak_MC.root containing a histogram. Open this ROOT file and look at the Z mass peak histogram called mumuMass. Answer the following question.

Question 8.1a

What is mean mass of the ZPeak for your MC MiniAOD?

Question 8.1b

How can you increase statistics in your ZPeak histogram?

Now a little bit about the command that you executed.

In the command above, slimMiniAOD_MC_MuEle.root is the input file and ZPeak_MC.root is the output file. maxEvents is the number of events you want to run over; you can change it to any other number, and the value -1 means running over all the events (1000 in this case). outputEvery specifies after how many events the code should report the number of the event being processed. As you specified, when your executable runs it prints processing event: after every 100 events.
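
For example, to process only the first 500 events and get a progress report every 50 events, you would run:

FWLiteHistograms inputFiles=slimMiniAOD_MC_MuEle.root outputFile=ZPeak_MC.root maxEvents=500 outputEvery=50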

If you look at the code FWLiteHistograms.cc, you will see that it also contains the defaults corresponding to the above command-line options. Answer the following question:

Question 8.2

What is the default name of the output file?

Exercise 9 - Re-run the above executable with the data MiniAOD

Re-run the above executable with the data MiniAOD file called slimMiniAOD_data_MuEle.root as follows:

FWLiteHistograms inputFiles=slimMiniAOD_data_MuEle.root outputFile=ZPeak_data.root maxEvents=-1 outputEvery=100

This will create an output histogram ROOT file called ZPeak_data.root.

Then answer the following question.

Question 9a

What is mean mass of the ZPeak for your data MiniAOD?

Question 9b

How can you increase statistics in your ZPeak histogram?

Key Points

  • A MiniAOD file can be slimmed by just retaining physics objects of interest.

  • EDM standalone utilities can be used to determine the size of MiniAOD files.

  • FWLite is a useful tool to perform simple analysis on a MiniAOD file.


CMS Data Analysis School Pre-Exercises - Third Set

Overview

Teaching: 0 min
Exercises: 240 min
Questions
  • How do I do an analysis with so much data that I cannot run it interactively on my computer?

  • What is CRAB? How do I use it to run an analysis on the grid?

  • What do configuration files look like?

  • How do I extract the luminosity of the dataset I analyzed?

Objectives
  • Become familiar with the basic Grid tools used in CMS for user analysis

  • Learn about grid certificate usage

  • Know what CRAB is and how to use it for your analysis

  • Know how to use BRILcalc to extract luminosities

Introduction

This is the third set of CMSDAS exercises. The purpose of these exercises is for the workshop attendees to become familiar with the basic Grid tools used in CMS for user analysis. Please run and complete each of these exercises. However, unlike the previous sets of exercises, this set will take considerably longer: having your storage space set up may take several days, Grid jobs run with some latency, and there can be problems. You should set aside about a week to complete these five exercises; the actual effort required is not the whole week but a few hours (more than the previous two sets). If, at any time, you encounter problems with the exercises, please e-mail cmsdas-cern-organizers@cern.ch with a detailed description of your problem. For CRAB questions unrelated to passing these exercises, to send feedback, or to ask for support in case of CRAB-related problems, please consult the CRAB troubleshooting twiki. All CRAB users should subscribe to the very useful hn-cms-computing-tools@cern.ch hypernews forum.

Note

This section assumes that you have access to lxplus at CERN. Learn more about lxplus here and the lxplus knowledge guide.

Later on, you can check with your university contact for a Tier 2 or Tier 3 storage area. Once you are granted write permission at the specified site, you can use CRAB for later analyses as in the exercise below, but store the output in your Tier 2 or Tier 3 storage area.

AGAIN: To perform this set of exercises, lxplus access, Grid Certificate, and CMS VO membership are required. You should already have these things, but if not, follow these instructions from the first set of exercises.

Question

Questions for each exercise are in boxes such as this.
For CMSDAS@CERN 2023 please submit your answers for the CMSDAS@CERN Google Form third set.

Support

There is a dedicated Mattermost team, called CMSDAS@CERN 2023, set up to facilitate communication and discussions via live chat (which is also archived). You will need your CERN login credentials (SSO) and you will need to join the private CMSDAS@CERN 2023 team in order to be able to see (or find, using the search channels functionality) the channels set up for communications related to the school. The sign-up link is here and the Pre-exercises channel can be found here.

Exercise 10 - Verify your grid certificate is OK

This exercise depends on obtaining a grid certificate and VOMS membership, but does not depend on any previous exercises. After you’ve installed your grid certificate, you need to verify it has all the information needed.

Log in to lxplus.cern.ch and initialize your proxy:

voms-proxy-init -voms cms

Then run the following command:

voms-proxy-info -all | grep -Ei "role|subject"

The response should look like this:

subject   : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=vmilosev/CN=757854/CN=Vukasin Milosevic/CN=40175424
subject   : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=vmilosev/CN=757854/CN=Vukasin Milosevic
attribute : /cms/Role=NULL/Capability=NULL
attribute : /cms/country/Role=NULL/Capability=NULL
attribute : /cms/country/ch/Role=NULL/Capability=NULL

If you do not have the first attribute line listed above, you have not completed the VO registration above and you must complete it before continuing.

Question 10

Copy the output corresponding to the text in the output box above.
For CMSDAS@CERN 2023 please submit your answers for the CMSDAS@CERN 2023 Google Form third set.

Exercise 11 - Obtain a /store/user area and setup CRAB

Obtain a /store/user area

This exercise depends on successfully completing Exercise 10. Completion of this exercise requires a user to have a /store/user/YourCERNUserName area at a Tier 2 or Tier 3 site (e.g. the EOS area at lxplus); a user should get this automatically once they have an lxplus account.

CRAB Introduction

In this exercise, you will learn about an important tool, CRAB, which is used in all data analysis at CMS. CRAB (CMS Remote Analysis Builder) is a utility for submitting CMSSW jobs to distributed computing resources. By using CRAB you will be able to access CMS data and Monte Carlo, which are distributed to CMS-aligned centres worldwide, and to exploit the CPU and storage resources at these centres. You will also test your grid certificate and your CMS EOS storage element, which will be useful during CMSDAS@CERN2023.

Help or questions about CRAB: Follow the FAQ to get help with CRAB.

The most recent CRAB3 tutorial is always in the WorkBook under WorkBookCRABTutorial. This tutorial provides complete instructions for beginner and expert users to use CRAB in their studies. We strongly recommend going through the CRAB tutorial after you finish these exercises. In this exercise, you will use CRAB to generate a MC sample yourself and publish it to DAS.

Setup CRAB

In this exercise, we will use CMSSW_10_6_18.

You can follow the same instructions from Exercise 3. The instructions are reproduced here:

cd ~/YOURWORKINGAREA

### If you are using Bash shell
export SCRAM_ARCH=slc7_amd64_gcc700
### If you are using the default tcsh shell (or csh shell)
setenv SCRAM_ARCH slc7_amd64_gcc700
###

cmsrel CMSSW_10_6_18
cd CMSSW_10_6_18/src
cmsenv
git cms-init

After setting up the CMSSW environment via cmsenv, you'll have access to the latest version of CRAB. It is possible to use CRAB from any directory after setup. You can check that the crab command is indeed available, and which version is being used, by executing:

which crab
/cvmfs/cms.cern.ch/common/crab

or

crab --version
CRAB client v3.230404

The /store/user area is commonly used for output storage from CRAB. When you complete Exercise 11, you can follow these instructions to make sure you can read from and write to your space using CRAB commands.

Initialize your proxy:

voms-proxy-init -voms cms

Check if you can write to the /store/user/ area. The crab checkwrite command can be used to check whether you have write permission in a given directory path (by default /store/user/<HN-username>/) at a given site. The syntax to be used is:

crab checkwrite --site=<site-name>

For example:

crab checkwrite --site=T3_CH_CERNBOX

The output should look like this:


Will check write permission in the default location /store/user/<username>
Validating LFN /store/user/vmilosev...
LFN /store/user/vmilosev is valid.
Will use `gfal-copy`, `gfal-rm` commands for checking write permissions
Will check write permission in /store/user/vmilosev on site T3_CH_CERNBOX
Will use PFN: davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/crab3checkwrite_20230421_105013.tmp

Attempting to create (dummy) directory crab3checkwrite_20230421_105013 and copy (dummy) file crab3checkwrite_20230421_105013.tmp to /store/user/vmilosev

Executing command: which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; gfal-copy -p -v -t 180 file:///afs/cern.ch/user/v/vmilosev/Test_CMSDAS_Crab/CMSSW_10_6_18/src/crab3checkwrite_20230421_105013.tmp 'davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/crab3checkwrite_20230421_105013.tmp'
Please wait...

Successfully created directory crab3checkwrite_20230421_105013 and copied file crab3checkwrite_20230421_105013.tmp to /store/user/vmilosev

Attempting to delete file davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/crab3checkwrite_20230421_105013.tmp

Executing command: which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; gfal-rm -v -t 180 'davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/crab3checkwrite_20230421_105013.tmp'
Please wait...

Successfully deleted file davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/crab3checkwrite_20230421_105013.tmp

Attempting to delete directory davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/

Executing command: which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; gfal-rm -r -v -t 180 'davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/'
Please wait...

Successfully deleted directory davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/

Checkwrite Result:
Success: Able to write in /store/user/vmilosev on site T3_CH_CERNBOX

Choosing the T3_CH_CERNBOX “site” gives you the option of sending CRAB job output to your EOS area, providing an easy way to access the produced files. However, this does not allow publication of the produced samples, as CERNBOX is NOT a CMS storage element, and files there cannot be listed in DBS. For more details about CRAB output options, visit the following link.

Question 11

What is the name of your directory in EOS?
For CMSDAS@CERN2023 please submit your answers for the CMSDAS@CERN2023 Google Form third set.

Exercise 12 - Generate (and publish) a minimum bias dataset with CRAB

CMSSW configuration file to generate MC events

In this section we provide an example of a CMSSW parameter-set configuration file to generate minimum bias events with the Pythia MC generator. We call it CMSDAS_MC_generation.py. Using CRAB to generate MC events requires some special settings in the CRAB configuration file, as we will show later.

We use the cmsDriver tool to generate our configuration file:

cmsDriver.py MinBias_13TeV_pythia8_TuneCUETP8M1_cfi  --conditions auto:run2_mc -n 10 --era Run2_2018 --eventcontent FEVTDEBUG --relval 100000,300 -s GEN,SIM --datatier GEN-SIM --beamspot Realistic25ns13TeVEarly2018Collision --fileout file:step1.root --no_exec --python_filename CMSDAS_MC_generation.py

If successful, cmsDriver will return the following:

We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
Step: GEN Spec:
Loading generator fragment from Configuration.Generator.MinBias_13TeV_pythia8_TuneCUETP8M1_cfi
Step: SIM Spec:
Step: ENDJOB Spec:
Config file CMSDAS_MC_generation.py created

Feel free to investigate (look at) the newly created CMSDAS_MC_generation.py.

Generating MC events locally

We want to test this configuration file locally on a small number of events before submitting to CRAB for massive generation. To test this file, we can run:

cmsRun CMSDAS_MC_generation.py

This MC generation code will then produce an EDM output file called step1.root with the content of a GEN-SIM data tier for 10 generated events.
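
You can quickly inspect this output file with the EDM utilities from the first set of exercises, for example:

edmFileUtil file:step1.root
edmEventSize -v step1.root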

The output of cmsRun will look something like this:

*------------------------------------------------------------------------------------* 
|                                                                                    | 
|  *------------------------------------------------------------------------------*  | 
|  |                                                                              |  | 
|  |                                                                              |  | 
|  |   PPP   Y   Y  TTTTT  H   H  III    A      Welcome to the Lund Monte Carlo!  |  | 
|  |   P  P   Y Y     T    H   H   I    A A     This is PYTHIA version 8.240      |  | 
|  |   PPP     Y      T    HHHHH   I   AAAAA    Last date of change: 20 Dec 2018  |  | 
|  |   P       Y      T    H   H   I   A   A                                      |  | 
|  |   P       Y      T    H   H  III  A   A    Now is 21 Apr 2023 at 11:32:03    |  | 
|  |                                                                              |  | 
|  |   Christian Bierlich;  Department of Astronomy and Theoretical Physics,      |  | 
|  |      Lund University, Solvegatan 14A, SE-223 62 Lund, Sweden;                |  | 
|  |      e-mail: christian.bierlich@thep.lu.se                                   |  | 
|  |   Nishita Desai;  Department of Theoretical Physics, Tata Institute,         |  | 
|  |      Homi Bhabha Road, Mumbai 400005, India;                                 |  | 
|  |      e-mail: desai@theory.tifr.res.in                                        |  | 
|  |   Ilkka Helenius;  Department of Physics, University of Jyvaskyla,           |  | 
|  |      P.O. Box 35, FI-40014 University of Jyvaskyla, Finland;                 |  | 
|  |      e-mail: ilkka.m.helenius@jyu.fi                                         |  | 
|  |   Philip Ilten;  School of Physics and Astronomy,                            |  | 
|  |      University of Birmingham, Birmingham, B152 2TT, UK;                     |  | 
|  |      e-mail: philten@cern.ch                                                 |  | 
|  |   Leif Lonnblad;  Department of Astronomy and Theoretical Physics,           |  | 
|  |      Lund University, Solvegatan 14A, SE-223 62 Lund, Sweden;                |  | 
|  |      e-mail: leif.lonnblad@thep.lu.se                                        |  | 
|  |   Stephen Mrenna;  Computing Division, Simulations Group,                    |  | 
|  |      Fermi National Accelerator Laboratory, MS 234, Batavia, IL 60510, USA;  |  | 
|  |      e-mail: mrenna@fnal.gov                                                 |  | 
|  |   Stefan Prestel;  Department of Astronomy and Theoretical Physics,          |  | 
|  |      Lund University, Solvegatan 14A, SE-223 62 Lund, Sweden;                |  | 
|  |      e-mail: stefan.prestel@thep.lu.se                                       |  | 
|  |   Christine O. Rasmussen;  Department of Astronomy and Theoretical Physics,  |  | 
|  |      Lund University, Solvegatan 14A, SE-223 62 Lund, Sweden;                |  | 
|  |      e-mail: christine.rasmussen@thep.lu.se                                  |  | 
|  |   Torbjorn Sjostrand;  Department of Astronomy and Theoretical Physics,      |  | 
|  |      Lund University, Solvegatan 14A, SE-223 62 Lund, Sweden;                |  | 
|  |      e-mail: torbjorn@thep.lu.se                                             |  | 
|  |   Peter Skands;  School of Physics,                                          |  | 
|  |      Monash University, PO Box 27, 3800 Melbourne, Australia;                |  | 
|  |      e-mail: peter.skands@monash.edu                                         |  | 
|  |                                                                              |  | 
|  |   The main program reference is 'An Introduction to PYTHIA 8.2',             |  | 
|  |   T. Sjostrand et al, Comput. Phys. Commun. 191 (2015) 159                   |  | 
|  |   [arXiv:1410.3012 [hep-ph]]                                                 |  | 
|  |                                                                              |  | 
|  |   The main physics reference is the 'PYTHIA 6.4 Physics and Manual',         |  | 
|  |   T. Sjostrand, S. Mrenna and P. Skands, JHEP05 (2006) 026 [hep-ph/0603175]  |  | 
|  |                                                                              |  | 
|  |   An archive of program versions and documentation is found on the web:      |  | 
|  |   http://www.thep.lu.se/Pythia                                               |  | 
|  |                                                                              |  | 
|  |   This program is released under the GNU General Public Licence version 2.   |  | 
|  |   Please respect the MCnet Guidelines for Event Generator Authors and Users. |  | 
|  |                                                                              |  | 
|  |   Disclaimer: this program comes without any guarantees.                     |  | 
|  |   Beware of errors and use common sense when interpreting results.           |  | 
|  |                                                                              |  | 
|  |   Copyright (C) 2018 Torbjorn Sjostrand                                      |  | 
|  |                                                                              |  | 
|  |                                                                              |  | 
|  *------------------------------------------------------------------------------*  | 
|                                                                                    | 
*------------------------------------------------------------------------------------* 


*------------------------------------------------------------------------------------* 
|                                                                                    | 
|  *------------------------------------------------------------------------------*  | 
|  |                                                                              |  | 
|  |                                                                              |  | 
|  |   PPP   Y   Y  TTTTT  H   H  III    A      Welcome to the Lund Monte Carlo!  |  | 
|  |   P  P   Y Y     T    H   H   I    A A     This is PYTHIA version 8.240      |  | 
|  |   PPP     Y      T    HHHHH   I   AAAAA    Last date of change: 20 Dec 2018  |  | 
|  |   P       Y      T    H   H   I   A   A                                      |  | 
|  |   P       Y      T    H   H  III  A   A    Now is 21 Apr 2023 at 11:32:03    |  | 
|  |                                                                              |  | 
|  |   Christian Bierlich;  Department of Astronomy and Theoretical Physics,      |  | 
|  |      Lund University, Solvegatan 14A, SE-223 62 Lund, Sweden;                |  | 
|  |      e-mail: christian.bierlich@thep.lu.se                                   |  | 
|  |   Nishita Desai;  Department of Theoretical Physics, Tata Institute,         |  | 
|  |      Homi Bhabha Road, Mumbai 400005, India;                                 |  | 
|  |      e-mail: desai@theory.tifr.res.in                                        |  | 
|  |   Ilkka Helenius;  Department of Physics, University of Jyvaskyla,           |  | 
|  |      P.O. Box 35, FI-40014 University of Jyvaskyla, Finland;                 |  | 
|  |      e-mail: ilkka.m.helenius@jyu.fi                                         |  | 
|  |   Philip Ilten;  School of Physics and Astronomy,                            |  | 
|  |      University of Birmingham, Birmingham, B152 2TT, UK;                     |  | 
|  |      e-mail: philten@cern.ch                                                 |  | 
|  |   Leif Lonnblad;  Department of Astronomy and Theoretical Physics,           |  | 
|  |      Lund University, Solvegatan 14A, SE-223 62 Lund, Sweden;                |  | 
|  |      e-mail: leif.lonnblad@thep.lu.se                                        |  | 
|  |   Stephen Mrenna;  Computing Division, Simulations Group,                    |  | 
|  |      Fermi National Accelerator Laboratory, MS 234, Batavia, IL 60510, USA;  |  | 
|  |      e-mail: mrenna@fnal.gov                                                 |  | 
|  |   Stefan Prestel;  Department of Astronomy and Theoretical Physics,          |  | 
|  |      Lund University, Solvegatan 14A, SE-223 62 Lund, Sweden;                |  | 
|  |      e-mail: stefan.prestel@thep.lu.se                                       |  | 
|  |   Christine O. Rasmussen;  Department of Astronomy and Theoretical Physics,  |  | 
|  |      Lund University, Solvegatan 14A, SE-223 62 Lund, Sweden;                |  | 
|  |      e-mail: christine.rasmussen@thep.lu.se                                  |  | 
|  |   Torbjorn Sjostrand;  Department of Astronomy and Theoretical Physics,      |  | 
|  |      Lund University, Solvegatan 14A, SE-223 62 Lund, Sweden;                |  | 
|  |      e-mail: torbjorn@thep.lu.se                                             |  | 
|  |   Peter Skands;  School of Physics,                                          |  | 
|  |      Monash University, PO Box 27, 3800 Melbourne, Australia;                |  | 
|  |      e-mail: peter.skands@monash.edu                                         |  | 
|  |                                                                              |  | 
|  |   The main program reference is 'An Introduction to PYTHIA 8.2',             |  | 
|  |   T. Sjostrand et al, Comput. Phys. Commun. 191 (2015) 159                   |  | 
|  |   [arXiv:1410.3012 [hep-ph]]                                                 |  | 
|  |                                                                              |  | 
|  |   The main physics reference is the 'PYTHIA 6.4 Physics and Manual',         |  | 
|  |   T. Sjostrand, S. Mrenna and P. Skands, JHEP05 (2006) 026 [hep-ph/0603175]  |  | 
|  |                                                                              |  | 
|  |   An archive of program versions and documentation is found on the web:      |  | 
|  |   http://www.thep.lu.se/Pythia                                               |  | 
|  |                                                                              |  | 
|  |   This program is released under the GNU General Public Licence version 2.   |  | 
|  |   Please respect the MCnet Guidelines for Event Generator Authors and Users. |  | 
|  |                                                                              |  | 
|  |   Disclaimer: this program comes without any guarantees.                     |  | 
|  |   Beware of errors and use common sense when interpreting results.           |  | 
|  |                                                                              |  | 
|  |   Copyright (C) 2018 Torbjorn Sjostrand                                      |  | 
|  |                                                                              |  | 
|  |                                                                              |  | 
|  *------------------------------------------------------------------------------*  | 
|                                                                                    | 
*------------------------------------------------------------------------------------* 


*-------  PYTHIA Process Initialization  --------------------------*
|                                                                  |
| We collide p+ with p+ at a CM energy of 1.300e+04 GeV            |
|                                                                  |
|------------------------------------------------------------------|
|                                                    |             |
| Subprocess                                    Code |   Estimated |
|                                                    |    max (mb) |
|                                                    |             |
|------------------------------------------------------------------|
|                                                    |             |
| non-diffractive                                101 |   5.642e+01 |
| A B -> X B single diffractive                  103 |   6.416e+00 |
| A B -> A X single diffractive                  104 |   6.416e+00 |
| A B -> X X double diffractive                  105 |   8.798e+00 |
|                                                                  |
*-------  End PYTHIA Process Initialization -----------------------*

*-------  PYTHIA Multiparton Interactions Initialization  ---------* 
|                                                                  | 
|                   sigmaNonDiffractive =    56.42 mb              | 
|                                                                  | 
|    pT0 =  2.81 gives sigmaInteraction =   267.96 mb: accepted    | 
|                                                                  | 
*-------  End PYTHIA Multiparton Interactions Initialization  -----* 
PYTHIA Warning in MultipartonInteractions::init: maximum increased by factor 1.055

*-------  PYTHIA Multiparton Interactions Initialization  ---------* 
|                                                                  | 
|                          diffraction XB                          | 
|                                                                  | 
|   diffractive mass = 1.00e+01 GeV and sigmaNorm =    10.00 mb    | 
|    pT0 =  0.46 gives sigmaInteraction =    54.25 mb: accepted    | 
|   diffractive mass = 6.00e+01 GeV and sigmaNorm =    10.00 mb    | 
|    pT0 =  0.72 gives sigmaInteraction =    28.53 mb: accepted    | 
|   diffractive mass = 3.61e+02 GeV and sigmaNorm =    10.00 mb    | 
|    pT0 =  1.14 gives sigmaInteraction =    20.25 mb: accepted    | 
|   diffractive mass = 2.16e+03 GeV and sigmaNorm =    10.00 mb    | 
|    pT0 =  1.79 gives sigmaInteraction =    30.44 mb: accepted    | 
|   diffractive mass = 1.30e+04 GeV and sigmaNorm =    10.00 mb    | 
|    pT0 =  2.81 gives sigmaInteraction =    52.87 mb: accepted    | 
|                                                                  | 
*-------  End PYTHIA Multiparton Interactions Initialization  -----* 

*-------  PYTHIA Multiparton Interactions Initialization  ---------* 
|                                                                  | 
|                          diffraction AX                          | 
|                                                                  | 
|   diffractive mass = 1.00e+01 GeV and sigmaNorm =    10.00 mb    | 
|    pT0 =  0.46 gives sigmaInteraction =    54.35 mb: accepted    | 
|   diffractive mass = 6.00e+01 GeV and sigmaNorm =    10.00 mb    | 
|    pT0 =  0.72 gives sigmaInteraction =    28.27 mb: accepted    | 
|   diffractive mass = 3.61e+02 GeV and sigmaNorm =    10.00 mb    | 
|    pT0 =  1.14 gives sigmaInteraction =    20.31 mb: accepted    | 
|   diffractive mass = 2.16e+03 GeV and sigmaNorm =    10.00 mb    | 
|    pT0 =  1.79 gives sigmaInteraction =    30.66 mb: accepted    | 
|   diffractive mass = 1.30e+04 GeV and sigmaNorm =    10.00 mb    | 
|    pT0 =  2.81 gives sigmaInteraction =    52.96 mb: accepted    | 
|                                                                  | 
*-------  End PYTHIA Multiparton Interactions Initialization  -----* 

*-------  PYTHIA Flag + Mode + Parm + Word + FVec + MVec + PVec + WVec Settings (changes only)  ------------------* 
|                                                                                                                 | 
| Name                                          |                      Now |      Default         Min         Max | 
|                                               |                          |                                      | 
| Beams:eCM                                     |                13000.000 |    14000.000    10.00000             | 
| Check:epTolErr                                |                0.0100000 |   1.0000e-04                         | 
| Main:timesAllowErrors                         |                    10000 |           10           0             | 
| MultipartonInteractions:ecmPow                |                  0.25208 |      0.21500         0.0     0.50000 | 
| MultipartonInteractions:expPow                |                  1.60000 |      1.85000     0.40000    10.00000 | 
| MultipartonInteractions:pT0Ref                |                  2.40240 |      2.28000     0.50000    10.00000 | 
| Next:numberShowEvent                          |                        0 |            1           0             | 
| ParticleDecays:allowPhotonRadiation           |                       on |          off                         | 
| ParticleDecays:limitTau0                      |                       on |          off                         | 
| SLHA:minMassSM                                |                 1000.000 |    100.00000                         | 
| SoftQCD:doubleDiffractive                     |                       on |          off                         | 
| SoftQCD:nonDiffractive                        |                       on |          off                         | 
| SoftQCD:singleDiffractive                     |                       on |          off                         | 
| Tune:preferLHAPDF                             |                        2 |            1           0           2 | 
|                                                                                                                 | 
*-------  End PYTHIA Flag + Mode + Parm + Word + FVec + MVec + PVec + WVec Settings  -----------------------------* 

--------  PYTHIA Particle Data Table (changed only)  ------------------------------------------------------------------------------

     id   name            antiName         spn chg col      m0        mWidth      mMin       mMax       tau0    res dec ext vis wid
            no onMode   bRatio   meMode     products 

 no particle data has been changed from its default value 

--------  End PYTHIA Particle Data Table  -----------------------------------------------------------------------------------------


*-------  PYTHIA Flag + Mode + Parm + Word + FVec + MVec + PVec + WVec Settings (changes only)  ------------------* 
|                                                                                                                 | 
| Name                                          |                      Now |      Default         Min         Max | 
|                                               |                          |                                      | 
| Next:numberShowEvent                          |                        0 |            1           0             | 
| ParticleDecays:allowPhotonRadiation           |                       on |          off                         | 
| ParticleDecays:limitTau0                      |                       on |          off                         | 
| ProcessLevel:all                              |                      off |           on                         | 
|                                                                                                                 | 
*-------  End PYTHIA Flag + Mode + Parm + Word + FVec + MVec + PVec + WVec Settings  -----------------------------* 

--------  PYTHIA Particle Data Table (changed only)  ------------------------------------------------------------------------------

     id   name            antiName         spn chg col      m0        mWidth      mMin       mMax       tau0    res dec ext vis wid
            no onMode   bRatio   meMode     products 

no particle data has been changed from its default value 

--------  End PYTHIA Particle Data Table  -----------------------------------------------------------------------------------------

Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:06.079 CEST

--------  PYTHIA Info Listing  ---------------------------------------- 

Beam A: id =   2212, pz =  6.500e+03, e =  6.500e+03, m =  9.383e-01.
Beam B: id =   2212, pz = -6.500e+03, e =  6.500e+03, m =  9.383e-01.

In 1: id =    3, x =  5.935e-05, pdf =  4.937e-01 at Q2 =  3.474e+00.
In 2: id =    1, x =  1.439e-03, pdf =  4.936e-01 at same Q2.

Process non-diffractive with code 101 is 2 -> 2.
Subprocess q q(bar)' -> q q(bar)' with code 114 is 2 -> 2.
It has sHat =  1.443e+01,    tHat = -5.823e+00,    uHat = -8.610e+00,
      pTHat =  1.864e+00,   m3Hat =  0.000e+00,   m4Hat =  0.000e+00,
   thetaHat =  1.376e+00,  phiHat =  2.086e+00.
    alphaEM =  7.539e-03,  alphaS =  2.754e-01    at Q2 =  1.136e+01.

Impact parameter b =  1.874e+00 gives enhancement factor =  1.343e-02.
Max pT scale for MPI =  1.864e+00, ISR =  1.864e+00, FSR =  1.864e+00.
Number of MPI =     1, ISR =     2, FSRproc =     0, FSRreson =     0.

--------  End PYTHIA Info Listing  ------------------------------------

--------  PYTHIA Event Listing  (hard process)  -----------------------------------------------------------------------------------
 
    no         id  name            status     mothers   daughters     colours      p_x        p_y        p_z         e          m 
     0         90  (system)           -11     0     0     0     0     0     0      0.000      0.000      0.000  13000.000  13000.000
     1       2212  (p+)               -12     0     0     3     0     0     0      0.000      0.000   6500.000   6500.000      0.938
     2       2212  (p+)               -12     0     0     4     0     0     0      0.000      0.000  -6500.000   6500.000      0.938
     3          3  (s)                -21     1     0     5     6   101     0      0.000      0.000      0.386      0.386      0.000
     4          1  (d)                -21     2     0     5     6   102     0      0.000      0.000     -9.353      9.353      0.000
     5          3  s                   23     3     4     0     0   102     0      1.581     -0.895     -3.611      4.073      0.500
     6          1  d                   23     3     4     0     0   101     0     -1.581      0.895     -5.356      5.666      0.330
                                   Charge sum: -0.667           Momentum sum:      0.000      0.000     -8.967      9.739      3.799

 --------  End PYTHIA Event Listing  -----------------------------------------------------------------------------------------------
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:09.990 CEST
Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:11.147 CEST
Begin processing the 4th record. Run 1, Event 4, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:15.916 CEST
Begin processing the 5th record. Run 1, Event 5, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:15.918 CEST
Begin processing the 6th record. Run 1, Event 6, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:22.698 CEST
Begin processing the 7th record. Run 1, Event 7, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:22.858 CEST
Begin processing the 8th record. Run 1, Event 8, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:25.345 CEST
Begin processing the 9th record. Run 1, Event 9, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:26.413 CEST
Begin processing the 10th record. Run 1, Event 10, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:39.373 CEST

*-------  PYTHIA Event and Cross Section Statistics  -------------------------------------------------------------*
|                                                                                                                 |
| Subprocess                                    Code |            Number of events       |      sigma +- delta    |
|                                                    |       Tried   Selected   Accepted |     (estimated) (mb)   |
|                                                    |                                   |                        |
|-----------------------------------------------------------------------------------------------------------------|
|                                                    |                                   |                        |
| non-diffractive                                101 |           7          7          7 |   5.642e+01  0.000e+00 |
| A B -> X B single diffractive                  103 |           1          1          1 |   6.416e+00  6.416e+00 |
| A B -> A X single diffractive                  104 |           1          1          1 |   6.416e+00  6.416e+00 |
| A B -> X X double diffractive                  105 |           1          1          1 |   8.798e+00  8.798e+00 |
|                                                    |                                   |                        |
| sum                                                |          10         10         10 |   7.805e+01  1.264e+01 |
|                                                                                                                 |
*-------  End PYTHIA Event and Cross Section Statistics ----------------------------------------------------------*

*-------  PYTHIA Error and Warning Messages Statistics  ----------------------------------------------------------* 
|                                                                                                                 | 
|  times   message                                                                                                | 
|                                                                                                                 | 
|      3   Warning in MultipartonInteractions::init: maximum increased                                            | 
|                                                                                                                 | 
*-------  End PYTHIA Error and Warning Messages Statistics  ------------------------------------------------------* 

*-------  PYTHIA Event and Cross Section Statistics  -------------------------------------------------------------*
|                                                                                                                 |
| Subprocess                                    Code |            Number of events       |      sigma +- delta    |
|                                                    |       Tried   Selected   Accepted |     (estimated) (mb)   |
|                                                    |                                   |                        |
|-----------------------------------------------------------------------------------------------------------------|
|                                                    |                                   |                        |
| non-diffractive                                101 |           7          7          7 |   5.642e+01  0.000e+00 |
| A B -> X B single diffractive                  103 |           1          1          1 |   6.416e+00  6.416e+00 |
| A B -> A X single diffractive                  104 |           1          1          1 |   6.416e+00  6.416e+00 |
| A B -> X X double diffractive                  105 |           1          1          1 |   8.798e+00  8.798e+00 |
|                                                    |                                   |                        |
| sum                                                |          10         10         10 |   7.805e+01  1.264e+01 |
|                                                                                                                 |
*-------  End PYTHIA Event and Cross Section Statistics ----------------------------------------------------------*

*-------  PYTHIA Error and Warning Messages Statistics  ----------------------------------------------------------* 
|                                                                                                                 | 
|  times   message                                                                                                | 
|                                                                                                                 | 
|      3   Warning in MultipartonInteractions::init: maximum increased                                            | 
|                                                                                                                 | 
*-------  End PYTHIA Error and Warning Messages Statistics  ------------------------------------------------------* 

------------------------------------
GenXsecAnalyzer:
------------------------------------
Before Filter: total cross section = 7.805e+10 +- 1.264e+10 pb
Filter efficiency (taking into account weights)= (10) / (10) = 1.000e+00 +- 0.000e+00
Filter efficiency (event-level)= (10) / (10) = 1.000e+00 +- 0.000e+00    [TO BE USED IN MCM]

After filter: final cross section = 7.805e+10 +- 1.264e+10 pb
After filter: final fraction of events with negative weights = 0.000e+00 +- 0.000e+00
After filter: final equivalent lumi for 1M events (1/fb) = 1.281e-08 +- 2.075e-09
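As a cross-check of the units: the PYTHIA summary above quotes a total cross section of 7.805e+01 mb, and 1 mb = 10^9 pb, so this is the 7.805e+10 pb reported here. Likewise, the equivalent luminosity for 1M events is N/sigma = 10^6 / 7.805e+10 pb = 1.281e-8 fb-1, as printed.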

============================================= 

Question 12.1

What is the file size of step1.root?
For CMSDAS@CERN2023 please submit your answers for the CMSDAS@CERN2023 Google Form third set.
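
Hint

A minimal Python one-liner for checking the size (run it from the directory where step1.root was produced; ls -lh step1.root works just as well):

import os
print("step1.root: %.1f MB" % (os.path.getsize("step1.root") / 1e6))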

Generate MC dataset using CRAB

CRAB is driven by a configuration file; in CRAB3, the configuration file is written in Python. Here we give an example CRAB configuration file that runs the CMSDAS_MC_generation.py MC event generation code. You can download a copy of crabConfig_MC_generation.py.

You can also find the file below:


from WMCore.Configuration import Configuration
config = Configuration()

config.section_("General")
config.General.requestName = 'CMSDAS_MC_generation_test0'
config.General.workArea = 'crab_projects'

config.section_("JobType")
config.JobType.pluginName = 'PrivateMC'
config.JobType.psetName = 'CMSDAS_MC_generation.py'
config.JobType.allowUndistributedCMSSW = True

config.section_("Data")
config.Data.outputPrimaryDataset = 'MinBias'
config.Data.splitting = 'EventBased'
config.Data.unitsPerJob = 10
NJOBS = 10  # This is not a configuration parameter, but an auxiliary variable that we use in the next line.
config.Data.totalUnits = config.Data.unitsPerJob * NJOBS
config.Data.publication = True
config.Data.outputDatasetTag = 'CMSDAS2023_CRAB3_MC_generation_test0'

config.section_("Site")
config.Site.storageSite = 'T3_CH_CERNBOX'
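
Since the splitting is EventBased, unitsPerJob counts events: this task will generate NJOBS × unitsPerJob = 10 × 10 = 100 events in total, spread over 10 jobs.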

Put the copy of crabConfig_MC_generation.py under YOURWORKINGAREA/CMSSW_10_6_18/src.

All available CRAB configuration parameters are defined at CRAB3ConfigurationFile.

Now let us try to submit this job via CRAB:

crab submit -c crabConfig_MC_generation.py

Details of the crab commands can be found at CRABCommands. You will be asked to enter your grid certificate password.
Then you should get an output similar to this:

Will use CRAB configuration file crabConfig_MC_generation.py
Enter GRID pass phrase for this identity:
Importing CMSSW configuration CMSDAS_MC_generation.py
Finished importing CMSSW configuration CMSDAS_MC_generation.py
Sending the request to the server at cmsweb.cern.ch
Success: Your task has been delivered to the prod CRAB3 server.
Task name: 230421_132846:vmilosev_crab_CMSDAS_MC_generation_test0
Project dir: crab_projects/crab_CMSDAS_MC_generation_test0
Please use ' crab status -d crab_projects/crab_CMSDAS_MC_generation_test0 ' to check how the submission process proceeds.
Log file is /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_MC_generation_test0/crab.log

You might notice that a directory called crab_projects has been created under CMSSW_10_6_18/src/. See what is under that directory. After you have submitted the job successfully (give it a few moments), you can check the status of the task by executing the following CRAB command:

 crab status [-t] <CRAB-project-directory>

In our case, we run:

crab status crab_projects/crab_CMSDAS_MC_generation_test0

The crab status command will produce an output containing the task name, the status of the task as a whole, the details of how many jobs are in which state (submitted, running, transferring, finished, cooloff, etc.), and the location of the CRAB log (crab.log) file. It will also print the URLs of two web pages that one can use to monitor the jobs. In summary, it should look something like this:

CRAB project directory:		/afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_MC_generation_test0
Task name:			230421_132846:vmilosev_crab_CMSDAS_MC_generation_test0
Grid scheduler - Task Worker:	crab3@vocms0196.cern.ch - crab-prod-tw01
Status on the CRAB server:	SUBMITTED
Task URL to use for HELP:	https://cmsweb.cern.ch/crabserver/ui/task/230421_132846%3Avmilosev_crab_CMSDAS_MC_generation_test0
Dashboard monitoring URL:	https://monit-grafana.cern.ch/d/cmsTMDetail/cms-task-monitoring-task-view?orgId=11&var-user=vmilosev&var-task=230421_132846%3Avmilosev_crab_CMSDAS_MC_generation_test0&from=1682080126000&to=now
Task bootstrapped at 2023-04-21 13:29:37 UTC. 19 seconds ago
Status information will be available within a few minutes
Log file is /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_MC_generation_test0/crab.log

Now you can take a break and have some fun. Come back after a couple of hours or so and check the status again.

[vmilosev@lxplus700 src]$ crab status crab_projects/crab_CMSDAS_MC_generation_test0/
CRAB project directory:		/afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_MC_generation_test0
Task name:			230421_132846:vmilosev_crab_CMSDAS_MC_generation_test0
Grid scheduler - Task Worker:	crab3@vocms0196.cern.ch - crab-prod-tw01
Status on the CRAB server:	SUBMITTED
Task URL to use for HELP:	https://cmsweb.cern.ch/crabserver/ui/task/230421_132846%3Avmilosev_crab_CMSDAS_MC_generation_test0
Dashboard monitoring URL:	https://monit-grafana.cern.ch/d/cmsTMDetail/cms-task-monitoring-task-view?orgId=11&var-user=vmilosev&var-task=230421_132846%3Avmilosev_crab_CMSDAS_MC_generation_test0&from=1682080126000&to=now
Status on the scheduler:	COMPLETED

Jobs status:                    finished     		100.0% (10/10)

Publication status of 1 dataset(s):	done         		100.0% (10/10)
(from CRAB internal bookkeeping in transferdb)

Output dataset:			/MinBias/vmilosev-CMSDAS2023_CRAB3_MC_generation_test0-67359df6f8a0ef3c567d7c8fea38a809/USER
Output dataset DAS URL:		https://cmsweb.cern.ch/das/request?input=%2FMinBias%2Fvmilosev-CMSDAS2023_CRAB3_MC_generation_test0-67359df6f8a0ef3c567d7c8fea38a809%2FUSER&instance=prod%2Fphys03

Warning: the max jobs runtime is less than 30% of the task requested value (1250 min), please consider to request a lower value for failed jobs (allowed through crab resubmit) and/or improve the jobs splitting (e.g. config.Data.splitting = 'Automatic') in a new task.

Warning: the average jobs CPU efficiency is less than 50%, please consider to improve the jobs splitting (e.g. config.Data.splitting = 'Automatic') in a new task

Summary of run jobs:
 * Memory: 26MB min, 66MB max, 40MB ave
 * Runtime: 0:04:34 min, 0:05:05 max, 0:04:41 ave
 * CPU eff: 14% min, 58% max, 33% ave
 * Waste: 1:17:58 (62% of total)

Log file is /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_MC_generation_test0/crab.log

Note: If on lxplus, CRAB will write the output to your EOS area. You can access the files under /eos/user/$U/$USER/SUBDIR, where $U is the first letter of your username and SUBDIR is the subdirectory name you provided. Take a look at that directory. (In our example we looked at MinBias and used the output dataset tag CMSDAS2023_CRAB3_MC_generation_test0. The subsequent date string depends on when you started your task.)

From the bottom of the output, you can see the name of the dataset and the DAS link to it. Congratulations! This is your first CMS dataset.

Question 12.2

What is the name of the dataset you produced?
For CMSDAS@CERN2023 please submit your answers for the CMSDAS@CERN2023 Google Form third set.

Exercise 13 - Running on a dataset with CRAB

Now we’re going to apply what you’ve learned about CRAB to the MiniAOD exercises you’ve been working on in the first two sets of exercises. Make sure that you have finished Exercise 7 and still have its scripts under YOURWORKINGAREA/CMSSW_10_6_18/src.

Set up CRAB to run your MiniAOD jobs

If you have forgotten, go back to YOURWORKINGAREA/CMSSW_10_6_18/src and set up the environment:

cmsenv

We will make another CRAB config file, crabConfig_data_slimMiniAOD.py. You can copy it from crabConfig_data_generation.py, or find it below:


from WMCore.Configuration import Configuration
config = Configuration()

config.section_("General")
config.General.requestName = 'CMSDAS_Data_analysis_test0'
config.General.workArea = 'crab_projects'

config.section_("JobType")
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'slimMiniAOD_data_MuEle_cfg.py'
config.JobType.allowUndistributedCMSSW = True

config.section_("Data")
config.Data.inputDataset = '/DoubleMuon/Run2016C-03Feb2017-v1/MINIAOD'
config.Data.inputDBS = 'global'
config.Data.splitting = 'LumiBased'
config.Data.unitsPerJob = 50
config.Data.lumiMask = 'https://cms-service-dqmdc.web.cern.ch/CAF/certification/Collisions16/13TeV/Cert_271036-275783_13TeV_PromptReco_Collisions16_JSON.txt'
config.Data.runRange = '275776-275782'

config.section_("Site")
config.Site.storageSite = 'T3_CH_CERNBOX'

Most of this file should be familiar by now, but a few things may be new. The runRange parameter further restricts your jobs to a range of runs within what the lumiMask file allows. This is needed if your two input datasets overlap, so that you can control which events come from which dataset. Instructions on how to do this are at https://twiki.cern.ch/twiki/bin/viewauth/CMS/PdmVAnalysisSummaryTable. You can find the year-specific instructions by clicking any of the links at the bottom.
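
A lumi-mask file is simply a JSON dictionary mapping run numbers (as strings) to lists of [first, last] luminosity-section ranges. The sketch below (plain Python, using the lumiMask URL and runRange from the configuration above) only illustrates how the runRange selection narrows the mask; it is not something CRAB requires you to run:

import json, urllib.request

url = ("https://cms-service-dqmdc.web.cern.ch/CAF/certification/"
       "Collisions16/13TeV/Cert_271036-275783_13TeV_PromptReco_Collisions16_JSON.txt")
mask = json.load(urllib.request.urlopen(url))  # {run: [[ls_first, ls_last], ...]}

# Keep only the runs selected by config.Data.runRange = '275776-275782'
selected = {run: ranges for run, ranges in mask.items()
            if 275776 <= int(run) <= 275782}
print(sorted(selected.keys()))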

Run CRAB

Now go through the same process for this config file. You submit it with

 crab submit -c crabConfig_data_slimMiniAOD.py

and check the status with

 crab status

After a while, you should see something like below:

CRAB project directory:		/afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_Data_analysis_test0
Task name:			230421_160319:vmilosev_crab_CMSDAS_Data_analysis_test0
Grid scheduler - Task Worker:	crab3@vocms0199.cern.ch - crab-prod-tw01
Status on the CRAB server:	SUBMITTED
Task URL to use for HELP:	https://cmsweb.cern.ch/crabserver/ui/task/230421_160319%3Avmilosev_crab_CMSDAS_Data_analysis_test0
Dashboard monitoring URL:	https://monit-grafana.cern.ch/d/cmsTMDetail/cms-task-monitoring-task-view?orgId=11&var-user=vmilosev&var-task=230421_160319%3Avmilosev_crab_CMSDAS_Data_analysis_test0&from=1682089399000&to=now
Status on the scheduler:	COMPLETED

Jobs status:                    finished     		100.0% (31/31)

Publication status of 1 dataset(s):	done         		100.0% (31/31)
(from CRAB internal bookkeeping in transferdb)

Output dataset:			/DoubleMuon/vmilosev-crab_CMSDAS_Data_analysis_test0-dfbd2918d11fceef1aa67bdee18b8002/USER
Output dataset DAS URL:		https://cmsweb.cern.ch/das/request?input=%2FDoubleMuon%2Fvmilosev-crab_CMSDAS_Data_analysis_test0-dfbd2918d11fceef1aa67bdee18b8002%2FUSER&instance=prod%2Fphys03

Warning: the max jobs runtime is less than 30% of the task requested value (1250 min), please consider to request a lower value for failed jobs (allowed through crab resubmit) and/or improve the jobs splitting (e.g. config.Data.splitting = 'Automatic') in a new task.

Summary of run jobs:
 * Memory: 153MB min, 914MB max, 578MB ave
 * Runtime: 0:03:25 min, 0:17:22 max, 0:07:30 ave
 * CPU eff: 22% min, 77% max, 56% ave
 * Waste: 0:04:15 (2% of total)

Log file is /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_Data_analysis_test0/crab.log

Create reports of data analyzed

Once all jobs are finished (check with crab status as above), you can create a report:

crab report

You’ll get something like this:

Running crab status first to fetch necessary information.
Will save lumi files into output directory /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_Data_analysis_test0/results
Summary from jobs in status 'finished':
  Number of files processed: 64
  Number of events read: X
  Number of events written in EDM files: X
  Number of events written in TFileService files: 0
  Number of events written in other type of files: 0
  Processed lumis written to processedLumis.json
Summary from output datasets in DBS:
  Number of events:
    /DoubleMuon/vmilosev-crab_CMSDAS_Data_analysis_test0-dfbd2918d11fceef1aa67bdee18b8002/USER: 2167324
  Output datasets lumis written to outputDatasetsLumis.json
Additional report lumi files:
  Input dataset lumis (from DBS, at task submission time) written to inputDatasetLumis.json
  Lumis to process written to lumisToProcess.json
Log file is /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_Data_analysis_test0/crab.log

crab report prints to the screen how many events were analyzed.

Question 13

How many events were analyzed? (n.b. the numbers in the above example were replaced with X)
For CMSDAS@CERN2023 please submit your answers for the CMSDAS@CERN2023 Google Form third set.

Optional: View the reconstructed Z peak in the combined data

Note

You will do a short analysis later, in the fourth set of exercises.

Use the FWLiteHistograms executable you were using in the previous exercises to aggregate the data from all the CRAB output files. The ROOT files created in the above step are kept in the directory below:

/eos/user/$U/$USER/DoubleMuon/crab_CMSDAS_Data_analysis_test0/

One can use the command:

FWLiteHistograms inputFiles=File1,File2,File3,... outputFile=ZPeak_data.root maxEvents=-1 outputEvery=100

In my case, File1=/eos/user/v/vmilosev/DoubleMuon/crab_CMSDAS_Data_analysis_test0/230421_160319/0000/slimMiniAOD_data_MuEle_1.root etc. Make sure there are no spaces in File1,File2,File3,...
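
If you have many output files, you can let Python build the comma-separated list for you. This is a minimal sketch; replace the glob pattern with your own output directory and date string:

import glob

files = sorted(glob.glob(
    "/eos/user/v/vmilosev/DoubleMuon/crab_CMSDAS_Data_analysis_test0/"
    "*/0000/slimMiniAOD_data_MuEle_*.root"))
print("inputFiles=" + ",".join(files))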

You may look at ZPeak_data.root using TBrowser.
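
Alternatively, a short interactive PyROOT session works too (a sketch; f.ls() simply prints what the file contains):

import ROOT

f = ROOT.TFile.Open("ZPeak_data.root")
f.ls()               # list the histograms stored in the file
b = ROOT.TBrowser()  # or open the same browser you would get in ROOT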

Exercise 14 - Combining the data and calculating luminosity

Note

This last exercise in this set is done on lxplus.

Install the BRIL Work Suite

We will use the BRIL work suite, a command-line toolkit for CMS Beam Radiation Instrumentation and Luminosity, to calculate the total luminosity of the data we ran over.

Refer to the documentation for further information on BRIL.

Enter the following command:

/cvmfs/cms-bril.cern.ch/brilconda3/bin/python3 -m pip install --user --upgrade brilws

When running crab report, the report will give you the location of a JSON-formatted file containing the luminosity information:

Will save lumi files into output directory /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_Data_analysis_test0/results

This directory contains various luminosity files. Let’s figure out how much luminosity our jobs ran over.

The first step is to copy the processedLumis.json file to your ~/.local/bin/ folder:

cp [lumi directory]/processedLumis.json ~/.local/bin/

Here, [lumi directory] is the directory reported by crab report.

Find the luminosity for the dataset

We now let brilcalc calculate the luminosity processed by our jobs, using the JSON file, by typing the following commands:

cd ~/.local/bin/
./brilcalc lumi -b "STABLE BEAMS" --normtag /afs/cern.ch/user/l/lumipro/public/Normtags/normtag_DATACERT.json -i processedLumis.json -u /fb

If the above does not work, try instead:

./brilcalc lumi -b "STABLE BEAMS" --normtag /afs/cern.ch/user/l/lumipro/public/Normtags/normtag_DATACERT.json -i processedLumis.json -c /cvmfs/cms.cern.ch/SITECONF/T0_CH_CERN/JobConfig/site-local-config.xml -u /fb

The end of the output should look similar to this (note this example summary is for a different json file):

 #Summary:
 +-------+------+-------+-------+-------------------+------------------+
 | nfill | nrun | nls   | ncms  | totdelivered(/fb) | totrecorded(/fb) |
 +-------+------+-------+-------+-------------------+------------------+
 | 9     | 37   | 17377 | 17377 | 2.761             | 2.646            |
 +-------+------+-------+-------+-------------------+------------------+
 #Check JSON:
 #(run,ls) in json but not in results: [(275890, 721)]

In the example of that other JSON file, the total recorded luminosity for those CRAB jobs is 2.6 fb-1.

Question 14

What is the reported number of inverse femtobarns analyzed? (n.b. it is not the same sample as listed above with luminosity 2.6 fb-1.)
For CMSDAS@CERN2023 please submit your answers for the CMSDAS@CERN2023 Google Form third set.

Where to find more on CRAB

Note also that all CMS members using the Grid subscribe to the Grid Announcements CMS HyperNews forum. Important CRAB announcements will be posted on the CERN Computing Announcement HyperNews forum.



Last reviewed: 2023/04/20 by Vukasin Milosevic

Key Points

  • Use and validate your grid certificate.

  • Set up your CRAB configuration and run jobs over the CMS grid.

  • Publish your CRAB datasets.

  • Calculate the luminosities of the datasets processed via CRAB.


CMS Data Analysis School Pre-Exercises - Fourth Set

Overview

Teaching: 0 min
Exercises: 60 min
Questions
  • How do we analyze an EDM ROOT file using an EDAnalyzer?

  • How do we analyze an EDM ROOT file using an FWLite executable?

  • How do we use ROOT/RooFit to fit a function to a histogram?

Objectives
  • Learn how to use an EDAnalyzer

  • Learn how to use FWLite

  • Understand a variety of methods for performing a fit to a histogram

Introduction

In this set of exercises, we will analyze the MiniAOD file that was made in the third set of exercises. You must have this skimmed MiniAOD stored locally (in your EOS user space) in order to access it. We will use several different workflows for analyzing the MiniAOD, namely an EDAnalyzer, a FWLite executable, a FWLite macro, and a FWLite PyROOT script. We will essentially re-make the Z peak and a few other histograms and store them in an output ROOT file. In the final exercise, we will fit the peak with a Gaussian, a Breit-Wigner function, etc.

Warning

To perform this set of exercises, a CERN computing account, Grid certificate, and CMS VO membership are required. You should already have all of these, but if not, follow the setup instructions.

Objective

Please post your answers to the questions in the Google form fourth set.

Exercise 15 - Analyzing MiniAOD with an EDAnalyzer

In this exercise we will analyze the skimmed MiniAODs created in the third set of exercises using an EDAnalyzer. In these skimmed MiniAODs, if you recall, we saved only the muons and electrons, so do not look for jets, photons, or other objects, as they were simply not saved. We will use a Python config file and an EDAnalyzer (a .cc file) to make a Z mass peak. You can find an example list of files below, but please first try using the files you created.

Example file list

root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_1.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_10.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_11.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_12.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_13.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_14.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_15.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_16.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_17.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_18.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_19.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_2.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_20.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_21.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_22.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_23.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_24.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_25.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_26.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_27.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_28.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_29.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_3.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_30.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_31.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_4.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_5.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_6.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_7.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_8.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_9.root

First we will add the PhysicsTools/PatExamples package to <YOURWORKINGAREA>/CMSSW_10_6_18/src. The PatExamples package has a lot of examples for a user to try. However, we will add our own code and config file to it and then compile. To add this package, do this:

cd $CMSSW_BASE/src/
git cms-addpkg PhysicsTools/PatExamples

Note

We are assuming that you’ve already checked out a CMSSW_10_6_18 release and have performed the cmsenv setup command.

In this package, you will find the python configuration file $CMSSW_BASE/src/PhysicsTools/PatExamples/test/analyzePatBasics_cfg.py. You will also see the EDAnalyzer in $CMSSW_BASE/src/PhysicsTools/PatExamples/plugins/PatBasicAnalyzer.cc.

Next, create the following two files (download/save): $CMSSW_BASE/src/PhysicsTools/PatExamples/src/MyZPeakAnalyzer.cc and $CMSSW_BASE/src/MyZPeak_cfg.py.

Hint

A quick way to do this on Linux, or any machine with wget, is by using the following commands:

wget https://cern-cms-das-2023.github.io/cms-das-pre-exercises/code/MyZPeakAnalyzer-CMSSW_10_6_18.cc -O $CMSSW_BASE/src/PhysicsTools/PatExamples/src/MyZPeakAnalyzer.cc
wget https://cern-cms-das-2023.github.io/cms-das-pre-exercises/code/MyZPeak_cfg.py -O $CMSSW_BASE/src/MyZPeak_cfg.py

Then we will compile the code that you just saved by doing:

cd $CMSSW_BASE/src/
scram b

The compilation should print many lines of text to your terminal. Among those lines you should see a line like the one below. If you can’t find a similar line, then the code you just added was not compiled.

>> Compiling  <$CMSSW_BASE>/src/PhysicsTools/PatExamples/src/MyZPeakAnalyzer.cc

After successful compilation, you must run the config file as follows:

cmsRun MyZPeak_cfg.py

Successfully running the above config file will produce an output file myZPeakCRAB.root. Besides the Z-peak histogram, called mumuMass, the output file contains several other histograms, such as muonMult, muonEta, muonPhi, and muonPt, and their electron counterparts.

Note

In the case above, the file MyZPeak_cfg.py will read from the area root://cmseos.fnal.gov//store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/. You should have a similar location from which you can read your CRAB output ROOT files. You can edit MyZPeak_cfg.py to use the MiniAOD files you made in Exercise 13 by replacing the location of the input files with the paths of the files you generated. In our example, the files are stored in:

'/eos/user/v/vmilosev/DoubleMuon/crab_CMSDAS_Data_analysis_test0/230421_160319/0000/slimMiniAOD_data_MuEle_1.root'

Question 15

What is the number of entries in the mumuMass plot if you use just the first input file, probably named slimMiniAOD_data_MuEle_1.root?
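
Hint

A quick way to read off the entry count (a minimal PyROOT sketch; the directory name analyzeBasicPat is the same one used when browsing this file in Exercise 17):

import ROOT

f = ROOT.TFile.Open("myZPeakCRAB.root")
h = f.Get("analyzeBasicPat/mumuMass")
print(int(h.GetEntries()))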

Exercise 16 - Analyzing MiniAOD with an FWLite executable

In this exercise we will make the same ROOT file as in Exercise 15, but we will call it myZPeakCRAB_fwlite.root so that you do not end up overwriting the file previously made in Exercise 15.

First, check out the following two packages by doing:

cd $CMSSW_BASE/src/
git cms-addpkg PhysicsTools/FWLite
git cms-addpkg PhysicsTools/UtilAlgos

Next, replace the existing $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteWithPythonConfig.cc with this FWLiteWithPythonConfig.cc. You are simply updating an existing analyzer. Then, create the file $CMSSW_BASE/src/parameters.py.

Hint

You can easily download the needed files by running the following commands:

wget https://cern-cms-das-2023.github.io/cms-das-pre-exercises/code/FWLiteWithPythonConfig.cc -O $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteWithPythonConfig.cc
wget https://cern-cms-das-2023.github.io/cms-das-pre-exercises/code/parameters.py -O $CMSSW_BASE/src/parameters.py

Note

In case you have completed Exercise Set 3 successfully, put the names and paths of the ROOT files that you made yourself by submitting CRAB jobs into parameters.py, instead of those currently there.

parameters.py will read from the area root://cmseos.fnal.gov//store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/. You should have a similar location from which you can read your CRAB output ROOT files. You can edit parameters.py to use the MiniAOD files you made in Exercise 13 by replacing the location of the input files. In our example, the files are stored in:

 '/eos/user/v/vmilosev/DoubleMuon/crab_CMSDAS_Data_analysis_test0/230421_160319/0000/slimMiniAOD_data_MuEle_1.root'

Then we will compile the code that you just saved by doing:

cd $CMSSW_BASE/src/
scram b -j 4

You should see a line like the one below among the output. If not, it is probable that the code we are working on has not been compiled.

>> Compiling  /your_path/YOURWORKINGAREA/CMSSW_10_6_18/src/PhysicsTools/FWLite/bin/FWLiteWithPythonConfig.cc

After successful compilation, you must run the config file as follows:

cd $CMSSW_BASE/src/
cmsenv
FWLiteWithPythonConfig parameters.py

Note

Note that the extra cmsenv ensures that the changes to files in the bin subdirectory are picked up in your path.

Warning

You might get a segfault when running this exercise. Just ignore it; the output ROOT file will still be created and be readable.

Note

Take a look at how the parameters defined in parameters.py get input to the executable code FWLiteWithPythonConfig.cc.

Successfully running the FWLite executable FWLiteWithPythonConfig results in an output file called myZPeakCRAB_fwlite.root.

The output ROOT file myZPeakCRAB_fwlite.root is a bit different from the myZPeakCRAB.root made in Exercise 15, since we did not make any of the electron histograms. It does contain mumuMass, as well as muonEta, muonPhi, and muonPt.

Question 16

What is the number of entries in the mumuMass histogram obtained in Exercise 16, again using only the first input file?

Exercise 17 - Fitting the Z mass peak

The main intention of fitting the Z mass peak is to show how to fit a distribution. To do this exercise, you will need the ROOT files that you made in Exercise 15 and Exercise 16. Make sure you have the ROOT file $CMSSW_BASE/src/myZPeakCRAB.root (Exercise 15) or myZPeakCRAB_fwlite.root (Exercise 16). If you have not managed to create at least one of these ROOT files, you can get them from the following locations:

File list

/afs/cern.ch/cms/Tutorials/TWIKI_DATA/CMSDataAnaSch/myZPeakCRAB.root # lxplus or Bari
root://cmseos.fnal.gov//store/user/cmsdas/2023/pre_exercises/Set4/Output/myZPeakCRAB.root # cmslpc
root://cmseos.fnal.gov//store/user/cmsdas/2023/pre_exercises/Set4/Output/myZPeakCRAB_fwlite.root # cmslpc

This will allow you to continue with Exercise 17. For this exercise, we will use the ROOT file myZPeakCRAB.root. Alternatively, you can use the file myZPeakCRAB_fwlite.root; just make sure to use the right name of the ROOT file. The most important point is that both of these files contain the histogram mumuMass.

We also ask that you create a rootlogon.C file in the $CMSSW_BASE/src/ directory. We will reference this version as opposed to anyone’s personalized rootlogon file. This sets up the libraries needed to complete this exercise.

The different distributions that we will fit to the Z mass peak are a Gaussian, a Breit-Wigner, and a convolution of the two (a Voigtian).

Some general remarks about fitting a Z peak

To fit a generator-level Z peak, a Breit-Wigner fit makes sense. However, reconstructed-level Z peaks are smeared by detector resolution effects. If the detector resolution is relatively poor, it is usually good enough to fit a Gaussian (since the Gaussian detector resolution will overwhelm the inherent Breit-Wigner shape of the peak). If the detector resolution is fairly good, another option is to fit a Breit-Wigner (for the inherent shape) convolved with a Gaussian (to describe the detector effects). This is the “no-background” case. If there are backgrounds in your sample (Drell-Yan, cosmics, etc.) and you want to do the fit over a large mass range, another function needs to be included to take care of them; an exponential is commonly used.
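
For reference, the (non-relativistic) Breit-Wigner line shape mentioned here has the standard form

$$ f(m) = \frac{1}{\pi}\,\frac{\Gamma/2}{(m - M)^2 + (\Gamma/2)^2}, $$

where $M$ is the peak position (here the Z mass) and $\Gamma$ the width; the Voigtian used later in this exercise is this shape convolved with a Gaussian of width $\sigma$ describing the detector resolution.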

Fitting a Gaussian

There are several ways to fit a Gaussian.

Using the built-in Gaussian in ROOT

Open ROOT as follows:

root -l

Then execute the following commands:

TFile f("myZPeakCRAB.root");
f.cd("analyzeBasicPat");
gStyle->SetOptFit(111111);   
mumuMass->Fit("gaus");

This will pop up the following histogram. Save this histogram as a pdf, ps, or eps file using the menu of the histogram window. As you can see, this is not a good fit, so we should fit a sub-range. In the next part of this exercise, we will fit a sub-range of the mumuMass distribution, but for this we will use a ROOT macro: the built-in ROOT functions are of limited use, and for more complex or more useful fitting functions one has to use a macro.

For now, we can improve the fit description of the Z resonance by limiting our fit range:

TFile f("myZPeakCRAB.root");
f.cd("analyzeBasicPat");
gStyle->SetOptFit(111111);   
g1 = new TF1("m1","gaus",85,95);
mumuMass->Fit(g1,"R");

One should obtain a histogram similar to this:

GaussFitZmm

Reminder

You can quit ROOT using the .q command.

The line gStyle->SetOptFit(111111); enables all the fit statistics to be displayed on the histogram. For more options and other information, please refer to the ROOT documentation.

Question 17.1a

What is the value of the mean Z Mass that you get?

Question 17.1b

What is the value of the chisquare/ndf that you get?

Using a macro of your own in ROOT

As you have seen above, we should fit a sub-range of the Z mass distribution because the fit over the full range is not all that good. In this exercise, we will fit a sub-range of the mumuMass distribution using a ROOT macro. The macro to run is FitZPeak.C, which in turn loads another macro, BW.C. Please download/save them with the corresponding names in $CMSSW_BASE/src. Note that now the myZPeakCRAB.root file is opened by the macro itself, in addition to fitting the Z mass peak.

To run this macro execute the following command from the $CMSSW_BASE/src directory:

root -l FitZPeak.C

This should pop up a histogram (shown below) and you will find yourself in a ROOT session.

FitZPeak

Reminder

You can save this plot from the menu on top of the histogram and then quit ROOT using the .q command.

Hint

You can also save the plot to an encapsulated postscript file by running the macro as:

root -l FitZPeak.C\(true\)

Here is some explanation of the macro. We have defined the Gaussian distribution that we want to fit in the macro BW.C (shown below). Note that in the same macro we have also is defined a Breit-Wigner function that you can try yourself. However, in the later part of the exercise, we will use RooFit to fit the distribution using a Breit-Wigner function.

Double_t mygauss(Double_t *x, Double_t *par)
{
  Double_t arg = 0;
  if (par[2] < 0) par[2] = -par[2];                 // par[2]: sigma (forced positive)
  if (par[2] != 0) arg = (x[0] - par[1]) / par[2];  // par[1]: mean

  // par[0] is an overall constant
  return par[0] * TMath::Exp(-0.5 * arg * arg) /
         (TMath::Sqrt(2 * TMath::Pi()) * par[2]);
}
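
Written out, mygauss is just a unit-area Gaussian scaled by an overall constant:

$$ g(x) = \mathrm{par}[0]\,\frac{1}{\sqrt{2\pi}\,\mathrm{par}[2]}\,\exp\!\left[-\frac{1}{2}\left(\frac{x - \mathrm{par}[1]}{\mathrm{par}[2]}\right)^{2}\right] $$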

par[0], par[1], and par[2] are the constant, mean, and sigma parameters, respectively, and x[0] is the x-axis variable. BW.C is loaded by FitZPeak.C in the line gROOT->LoadMacro("BW.C");. The initial values of the three fit parameters are set in FitZPeak.C as follows:

func->SetParameter(0,1.0);   func->SetParName(0,"const");  
func->SetParameter(2,5.0);   func->SetParName(2,"sigma");  
func->SetParameter(1,95.0);     func->SetParName(1,"mean");

Also note that in the macro FitZPeak.C we have commented out the following lines and used the two lines below them instead, because we want to fit a sub-range. If you want to fit the entire range of the histogram, obtain the minimum and maximum of the range by using the commented lines instead.

//float massMIN = Z_mass->GetBinLowEdge(1);
//float massMAX = Z_mass->GetBinLowEdge(Z_mass->GetNbinsX()+1);

float massMIN = 85.0;
float massMAX = 96.0;
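If you would like to try the Breit-Wigner function mentioned above before the dedicated section later on, a definition with the same TF1 calling convention could look like the sketch below. This is only an illustration of a relativistic Breit-Wigner; the actual implementation in BW.C may differ in its details.

// Sketch of a relativistic Breit-Wigner with the TF1 calling convention.
// par[0]: normalization constant, par[1]: mean (peak mass), par[2]: width.
Double_t mybw(Double_t * x, Double_t * par)
{
  Double_t arg1 = 2.0/TMath::Pi();                    // overall normalization
  Double_t arg2 = par[2]*par[2]*par[1]*par[1];        // width^2 * mean^2
  Double_t arg3 = (x[0]*x[0] - par[1]*par[1])*(x[0]*x[0] - par[1]*par[1]);
  Double_t arg4 = x[0]*x[0]*x[0]*x[0]*(par[2]*par[2])/(par[1]*par[1]);
  return par[0]*arg1*arg2/(arg3 + arg4);
}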

Question 17.2

What mean value of the Z mass do you get in the fitted sub-range?

Using a macro in RooFit

Before we start, have a look at the RooFit twiki to get a feeling for it. Then save the macro RooFitMacro.C in the $CMSSW_BASE/src/ directory. This macro will fit the Z mass peak using RooFit.

Take a look at the code and then execute the following:

root -l RooFitMacro.C

You may need to add the following line to your rootlogon.C file to get this interpreted code to work:

gROOT->ProcessLine(".include $ROOFITSYS/include/");

This should pop up a histogram (shown below) and you will find yourself in a ROOT session.

Reminder

You can save this plot from the menu on top of the histogram and then quit ROOT using the .q command.

We fit the distribution with a Gaussian by default. However, we can fit a Breit-Wigner or Voigtian (convolution of Breit-Wigner and Gaussian) by uncommenting the appropriate lines.
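For orientation, the core of a binned Gaussian fit in RooFit typically looks like the following minimal sketch. The histogram path and the ranges here are illustrative assumptions and are not necessarily identical to what RooFitMacro.C uses.

TFile f("myZPeakCRAB.root");
TH1* h = (TH1*)f.Get("analyzeBasicPat/mumuMass");    // histogram path assumed from the earlier exercise

RooRealVar x("x", "m_{#mu#mu} [GeV]", 70, 110);      // observable and fit window (assumed)
RooDataHist data("data", "dimuon mass", RooArgList(x), h);

RooRealVar mean("mean", "mean", 91.0, 85.0, 95.0);
RooRealVar sigma("sigma", "sigma", 5.0, 0.1, 20.0);
RooGaussian gauss("gauss", "gauss", x, mean, sigma);

gauss.fitTo(data);               // binned maximum-likelihood fit

RooPlot* frame = x.frame();      // draw the data with the fitted curve overlaid
data.plotOn(frame);
gauss.plotOn(frame);
frame->Draw();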

ZmmGaussROOTFit

Question 17.3a

What is the mean for the Gaussian fit in RooFit?

Question 17.3b

What is the sigma for the Gaussian fit in RooFit?

Fitting a Breit-Wigner

Using a macro in ROOT

To fit the Z mass peak using a Breit-Wigner distribution, we first uncomment the Breit-Wigner part of FitZPeak.C and comment out the Gaussian part as follows (using /* and */):

////////////////
//For Gaussian//
///////////////
/*
TF1 *func = new TF1("mygauss",mygauss,massMIN, massMAX,3); 
func->SetParameter(0,1.0);   func->SetParName(0,"const");  
func->SetParameter(2,5.0);   func->SetParName(2,"sigma");  
func->SetParameter(1,95.0);     func->SetParName(1,"mean");

Z_mass->Fit("mygauss","QR");
TF1 *fit = Z_mass->GetFunction("mygauss");
*/
/////////////////////
// For Breit-Wigner//
////////////////////
TF1 *func = new TF1("mybw",mybw,massMIN, massMAX,3);
func->SetParameter(0,1.0);   func->SetParName(0,"const");
func->SetParameter(2,5.0);     func->SetParName(2,"sigma");
func->SetParameter(1,95.0);    func->SetParName(1,"mean");

Z_mass->Fit("mybw","QR");
TF1 *fit = Z_mass->GetFunction("mybw");

Then execute the following:

root -l FitZPeak.C

This should pop up a histogram (shown below) and you will find yourself in a ROOT session.

BWFitZmm

Reminder

You can save this plot from the menu on top of the histogram and then quit ROOT using the .q command.

Question 17.4a

What is the mean for the Breit-Wigner fit using the macro?

Question 17.4b

What is the sigma for the Breit-Wigner fit using the macro?

Using a macro in RooFit

Before we proceed, we need to uncomment and comment out a few lines in RooFitMacro.C so that they look as follows:

//RooGaussian gauss("gauss","gauss",x,mean,sigma);
RooBreitWigner gauss("gauss","gauss",x,mean,sigma);
// RooVoigtian gauss("gauss","gauss",x,mean,width,sigma);

Then execute:

root -l RooFitMacro.C

This should pop up a histogram (shown below) and you will find yourself in a ROOT session.

myZmmBWROOTFit

Reminder

You can save this plot from the menu on top of the histogram and then quit ROOT using the .q command.

Question 17.5a

What is the mean for the Breit-Wigner fit using the RooFit tool?

Question 17.5b

What is the sigma for the Breit-Wigner fit using the RooFit tool?

Fitting a Convolution of Gaussian and Breit-Wigner

Using a macro in RooFit

Before we proceed, we need to uncomment and comment out a few lines in RooFitMacro.C so that they look as follows:

//RooGaussian gauss("gauss","gauss",x,mean,sigma);
// RooBreitWigner gauss("gauss","gauss",x,mean,sigma);
RooVoigtian gauss("gauss","gauss",x,mean,width,sigma);

Then execute:

root -l RooFitMacro.C

This should pop up a histogram (shown below) and you will find yourself in a ROOT session.

myZmmVoigtianROOTFit

Reminder

You can save this plot from the menu on top of the histogram and then quit ROOT using the .q command.

Question 17.6a

What is the mean for the convolved fit using the RooFit tool?

Question 17.6b

What is the sigma for the convolved fit using the RooFit tool?

Key Points

  • You can use either an EDAnalyzer or FWLite to analyze MiniAOD files

  • Various methods exist for performing fits. You can use inbuilt functions or user defined functions. You can use plain ROOT or the RooFit package.


CMS Data Analysis School Pre-Exercises - Fifth Set

Overview

Teaching: 0 min
Exercises: 30 min
Questions
  • How do I setup git on my computer/cluster?

  • How do I collaborate using GitHub?

Objectives
  • Setup your git configuration for a given computer.

  • Learn how to make and commit changes to a git repository.

  • Learn how to create a pull request on GitHub.

Introduction

This exercise is intended to provide you with basic familiarity with Git and GitHub for personal and collaborative use, including terminology, commands, and user interfaces. The exercise proceeds step-by-step through a standard collaboration “Fork and Pull” workflow. This is a highly condensed version of the tutorial exercises at CMSGitTutorial. Students are encouraged to explore those more in-depth exercises if they want to learn more about using Git. There are also accompanying slides on that twiki page. Students with no experience using Git or other version control software are recommended to read at least the first set of slides.

Warning

As a prerequisite for this exercise, please make sure that you have correctly followed the instructions for obtaining a GitHub account in the setup instructions.

Google Form

Please post your answers to the questions in the Google form fifth set.

Exercise 18 - Learning Git and GitHub

Git Configuration

Begin by setting up your .gitconfig on your local machine or lxplus:

git config --global user.name "[Name]"
git config --global user.email [Email]
git config --global user.github [Account]

Make sure you replace [Name], [Email], and [Account] with the values corresponding to your GitHub account. After this, you can check the contents of .gitconfig by doing:

cat ~/.gitconfig

Output

[user]
    name = [Name]
    email = [Email]
    github = [Account]

Optional settings:

git config --global core.editor [your preferred text editor]
git config --global push.default current
git config --global alias.lol 'log --graph --decorate --pretty=oneline --abbrev-commit'
git config --global url."git@github.com:".insteadOf github:
git config --global url."ssh://git@gitlab.cern.ch:7999/".insteadOf gitlab:
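You can read these settings back at any time. For example, the following standard git commands print a single configured value and the full effective configuration along with the file each entry comes from:

git config --global --get user.github
git config --list --show-origin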

GitHub User Interface

Look carefully at the GitHub user interface on the main page for the GitHATSLPC/GitHATS repository. Click on various tabs.

Across the top of the page you will find the tabs: Code, Issues, Pull requests, Actions, Projects, Wiki, Security, Insights, and Settings.

Collaboration on GitHub

Fork the repository GitHATSLPC/GitHATS repository by clicking “Fork” at the top right corner of the page. This makes a copy of the repository under your GitHub account.

Clone your fork of the repository to a scratch directory on your local machine or lxplus:

mkdir scratch
cd scratch
git clone git@github.com:[user]/GitHATS.git

Output

Cloning into 'GitHATS'...
Enter passphrase for key '/home/------/.ssh/id_rsa': 
remote: Counting objects: 21, done.
remote: Total 21 (delta 0), reused 0 (delta 0), pack-reused 21
Receiving objects: 100% (21/21), done.
Resolving deltas: 100% (5/5), done.
Checking connectivity... done.

What does the ls command show?

cd GitHATS
ls -a

Output

.  ..  .git  README.md  standard_model.md

The .git folder contains a full local copy of the repository.

Inspect the .git directory:

ls .git

Output

config  description  HEAD  hooks  index  info  logs  objects  packed-refs  refs

When you use git clone as we did above, it starts your working area on the default branch for the repository. In this case, that branch is master. (The default branch for a repo can be changed in the “Branches” section of the GitHub settings page, which you explored in the previous step.)
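If you are unsure which branch a remote considers its default, one standard way to check from the command line is shown below; the output includes a line like "HEAD branch: master".

git remote show origin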

Inspect the branches of the repository.

git branch -a

Output

* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/atlas_discovery
  remotes/origin/cms_discovery
  remotes/origin/dune_discovery
  remotes/origin/master

Adding remotes and synchronizing

Look at your remote(s):

git remote

Output

origin

Hint

For additional information you can add the -v option to the command

git remote -v

Output

origin  git@github.com:[user]/GitHATS.git (fetch)
origin  git@github.com:[user]/GitHATS.git (push)

The “origin” remote is set by default when you use git clone. Because your repository is a fork, you also want to have a remote that points to the original repo, traditionally called “upstream”.

Add the upstream remote and inspect the result:

git remote add upstream git@github.com:GitHATSLPC/GitHATS.git
git remote -v

Output

origin  git@github.com:[user]/GitHATS.git (fetch)
origin  git@github.com:[user]/GitHATS.git (push)
upstream        git@github.com:GitHATSLPC/GitHATS.git (fetch)
upstream        git@github.com:GitHATSLPC/GitHATS.git (push)

Before you make edits to your local repo, you should make sure that your fork is up to date with the main repo. (Someone else might have made some updates in the meantime.)

Check for changes in upstream:

git pull upstream master

Output

From github.com:GitHATSLPC/GitHATS
 * branch            master     -> FETCH_HEAD
 * [new branch]      master     -> upstream/master
Already up-to-date.

Note

git pull upstream master is equivalent to the following two commands:

git fetch upstream master
git merge upstream/master

If you pulled any changes from the upstream repository, you should push them back to origin. (Even if you didn’t, you can still practice pushing; nothing will happen.)

Push your local master branch back to your remote fork:

git push origin master

Output

Everything up-to-date

Making edits and committing

When collaborating with other developers on GitHub, it is best to make a separate topic branch to store any changes you want to submit to the main repo. This way, you can keep the default branch in your fork synchronized with upstream, and then make another topic branch when you want to make more changes.

Make a topic branch:

git checkout -b MyBranch

Edit the table standard_model.md to add a new particle. The new particle is called a Giton, with symbol G, spin 2, charge 0, and mass 750 GeV.

Note

Any resemblance to any other real or imaginary particles is entirely coincidental.

Once you have made changes in your working area, you have to stage the changes and then commit them. First, you can inspect the status of your working area.

Try the following commands to show the status:

git status

Output

On branch MyBranch
Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

        modified:   standard_model.md

no changes added to commit (use "git add" and/or "git commit -a")

git status -s

Output

 M standard_model.md

git diff

Output

diff --git a/standard_model.md b/standard_model.md
index 607b7b6..68f37ad 100644
--- a/standard_model.md
+++ b/standard_model.md
@@ -18,4 +18,5 @@ The Standard Model of Particle Physics
 | Z boson       | Z      | 1    | 0       | 91.2                    |
 | W boson       | W      | 1    | ±1      | 80.4                    |
 | gluon         | g      | 1    | 0       | 0                       |
-| Higgs boson   | H      | 0    | 0       | 125                     |
\ No newline at end of file
+| Higgs boson   | H      | 0    | 0       | 125                     |
+| Giton         | G      | 2    | 0       | 750                     |

Now stage your change, and check the status:

git add standard_model.md
git status -s

Output

M  standard_model.md

Commit your change:

git commit -m "add Giton to standard model"

Output

[MyBranch b9bc2ce] add Giton to standard model
 1 file changed, 2 insertions(+), 1 deletion(-)

Push your topic branch, which now includes the new commit you just made, to origin:

git push origin MyBranch

Output

Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 356 bytes | 356.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
remote: 
remote: Create a pull request for 'MyBranch' on GitHub by visiting:
remote:      https://github.com/mtonjes/GitHATS/pull/new/MyBranch
remote: 
To github.com:mtonjes/GitHATS.git
 * [new branch]      MyBranch -> MyBranch

Making a pull request

Now that you have made your change, you can submit it for inclusion in the central repository.

When you open the page to send a pull request on GitHub, you will notice that you can send a pull request to any fork of the repo (and any branch). Make pull request

Send a pull request to the master branch of the upstream repo (GitHATSLPC). View pull request

Question 18.1

Post the link to your pull request.

For CMSDAS@CERN 2023 please submit your answer at the Google Form fifth set.

Optional

If you want to practice merging a pull request, you can send a pull request from your branch MyBranch to your own master branch.

Advanced topics

Advanced topics not explored in this exercise include: merging, rebasing, cherry-picking, undoing, removing binary files, and CMSSW-specific commands and usage.

Students are encouraged to explore these topics on their own at CMSGitTutorial.

Key Points

  • Interact with your git configuration using git config --global.

  • Use the git clone command to obtain a local copy of a git repository.

  • Add and interact with new remotes using the git remote command.

  • Use the add and commit commands to add changes to the local repository.

  • The pull and push commands will transfer changes between the remote and local copies of the repository.


CMS Data Analysis School Pre-Exercises - Sixth Set

Overview

Teaching: 0 min
Exercises: 30 min
Questions
  • What is Jupyter?

  • What is pyROOT?

Objectives
  • Learn how to use Jupyter and the Jupyter service (SWAN) at CERN.

  • Learn how to interact with the ROOT libraries using pyROOT.

Introduction

This exercise is intended to provide you with basic familiarity with pyROOT, which provides bindings for the classes within the ROOT libraries and allows you to replace the usual C++ with the often less cumbersome Python. The goal is to obtain a general understanding of the syntax required to import and make use of the ROOT libraries within a basic Python script. Various examples are provided to demonstrate TH1 histogram manipulation, including: reading from a .root file, creating, binning, re-binning, scaling, plotting, and fitting to a Gaussian.

Many courses have begun to use Jupyter notebooks as a teaching tool, so this exercise has been formatted as a notebook to give a preliminary introduction to how they work. This knowledge will be used later in various DAS exercises.

Whether you use Python or C++ to complete your analysis is a personal preference. However, given the current lack of documentation on pyROOT, many students stick with C++ in order to ensure their access to coding examples and experts. It is our hope that, by providing you with this basic introduction and a GitHub repository of example scripts (which you are encouraged to add to), we can bring together the existing pyROOT community within CMS and foster its growth.
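To give a taste of the syntax before you open the notebooks, here is a minimal pyROOT sketch that repeats part of the earlier ROOT fitting exercise in Python. The file and histogram names are borrowed from that exercise and are only illustrative here.

# Minimal pyROOT sketch: open a file, manipulate a histogram, and fit a Gaussian.
# The file/histogram names are assumptions borrowed from the earlier ROOT exercise.
import ROOT

f = ROOT.TFile.Open("myZPeakCRAB.root")
h = f.Get("analyzeBasicPat/mumuMass")

h.Rebin(2)                      # merge pairs of bins
h.Fit("gaus", "", "", 85, 95)   # built-in Gaussian fit in a sub-range

c = ROOT.TCanvas("c", "c")
h.Draw()
c.SaveAs("mumuMass_fit.pdf")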

Warning

As a prerequisite for this exercise, please make sure that you have correctly followed the instructions for obtaining a GitHub account in the setup instructions.

It is also helpful to have already completed the “Collaboration on GitHub” section of the fifth set of exercises.

Objective

Please post your answers to the questions in the Google form sixth set.

Exercise 19 - Introduction to pyROOT and Jupyter

Load and execute the exercise on JupyterHub

This exercise is stored completely within Jupyter notebooks and uses a premade Jupyter service hosted at CERN, SWAN. To begin, visit pyROOTforCMSDAS and follow the directions on the first page.

Question 19.1

What is the mean value of the Gaussian fit of the jet mass spectrum for jets of pt 300-400 GeV?

Hopefully this extremely brief introduction has piqued your interest in pyROOT and encouraged you to learn more about this versatile tool.

Advanced topics

Advanced topics not explored in this exercise will be added to the pyROOTforCMSDAS GitHub page in the near future.

Students are encouraged to explore these and other topics on their own and to assist with the CMS effort to document pyROOT by creating their own fork of pyROOTforCMSDAS and adding to the example scripts available there.

Key Points

  • pyROOT is an easy-to-use alternative to using the ROOT libraries in a C++ program.

  • Jupyter notebooks are a great way to perform real-time analysis tasks.


CMS Data Analysis School Pre-Exercises - Seventh Set

Overview

Teaching: 0 min
Exercises: 60 min
Questions
  • What is an image? How about a container?

  • What is Docker/Singularity?

  • Why is containerization useful?

  • Ummmm…how is this different from a virtual machine?

Objectives
  • Gain a basic understanding of how to run and manage a container.

  • Understand the absolute basic commands for Docker.

  • Know how to start a Singularity container.

Introduction

Warning

As a prerequisite for this exercise, please make sure that you have correctly followed the setup instructions for installing Docker and obtaining a DockerHub account.

Objective

Please post your answers to the questions in the Google form seventh set.

Limitation

This exercise seeks to introduce the student to the benefits of containerization and a handful of container services. We cannot cover all topics related to containerization in this short exercise. In particular, we do not seek to explain what is happening under the hood or how to develop your own images. There are other great tutorials covering a variety of containerization topics as they relate to LHC experiments.

There are undoubtedly also other, non-LHC oriented tutorials online.

Containers and Images

Containers are like lightweight virtual machines. They behave as if they were their own complete OS, but actually contain only the components necessary to operate; instead of shipping a full OS, they share the host machine's system kernel, which significantly reduces their size. In essence, they run a second OS natively on the host machine with just a thin additional layer, which means they can be faster than traditional virtual machines. Containers only take up as much memory as necessary, which allows many of them to run simultaneously, and they can be spun up quite rapidly.

DockerVM

Images are read-only templates that contain a set of instructions for creating a container. Different container orchestration programs have different formats for these images. Often a single image is made of several files (layers) which contain all of the dependencies and application code necessary to create and configure the container environment. In other words, Docker containers are the runtime instances of images — they are images with a state.

DockerImage
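Once you have pulled an image (as we will do in the next exercise), you can list the layers it is built from using the standard docker history command, for example:

# Show the layers (and their sizes) that make up an image
docker history sl:7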

This lets us package up an application with just the dependencies we need (OS and libraries) and then deploy that image as a single package. This allows us to:

  1. replicate our environment/workflow on other host machines
  2. run a program on a host OS other than the one for which it was designed (not 100% foolproof)
  3. sandbox our applications in a secure environment (still important to take proper safety measures)

Container Runtimes

For the purposes of this tutorial we will only be considering Docker and Singularity for container runtimes. That said, these are really powerful tools which are so much more than just container runtimes. We encourage you to take the time to explore the Docker and Singularity documentation.

Docker logo Singularity logo
Side Note

As a side note, Docker has very similar syntax to Git and Linux, so if you are familiar with the command line tools for them then most of Docker should seem somewhat natural (though you should still read the docs!).

Exercise 20 - Pulling Docker Images

Much like GitHub allows for web hosting and searching for code, image registries allow the same for Docker/Singularity images. Without going into too much detail, there are several public and private registries available. For Docker, the de facto default registry is Docker Hub. Singularity, on the other hand, does not have a de facto default registry.

To begin with we’re going to pull down the Docker image we’re going to be working in for this part of the tutorial (Note: If you already did the docker pull, this image will already be on your machine. In this case, Docker should notice it’s there and not attempt to re-pull it, unless the image has changed in the meantime.):

docker pull sl

# if you run into a permission error, use "sudo docker run ..." as a quick fix
# to fix this for the future, see https://docs.docker.com/install/linux/linux-postinstall/
# if you have a M1 chip Mac, you may want to do "docker pull sl --platform amd64"
Using default tag: latest
latest: Pulling from library/sl
175b929ba158: Pull complete 
Digest: sha256:d38e6664757e138c43f1c144df20fb93538b75111f922fce57930797114b7728
Status: Downloaded newer image for sl:latest
docker.io/library/sl:latest

The image names are composed of NAME[:TAG|@DIGEST], where NAME is composed of REGISTRY-URL/NAMESPACE/IMAGE and is often referred to as a repository.
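For illustration, here are several increasingly explicit ways of referring to the same image; the digest shown is the one from the pull output above:

docker pull sl                       # NAME only: expands to docker.io/library/sl:latest
docker pull sl:7                     # NAME:TAG
docker pull docker.io/library/sl:7   # REGISTRY-URL/NAMESPACE/IMAGE:TAG
docker pull sl@sha256:d38e6664757e138c43f1c144df20fb93538b75111f922fce57930797114b7728   # NAME@DIGEST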

Now, let’s list the images that we have available to us locally

docker images

If you have many images and want to get information on a particular one you can apply a filter, such as the repository name

docker images sl
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
sl                  latest              5237b847a4d0        2 weeks ago         186MB

or more explicitly

docker images --filter=reference="sl"
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
sl                  latest              5237b847a4d0        2 weeks ago         186MB

You can see here that there is a TAG field associated with the sl image. Tags are a way of further specifying different versions of the same image. As an example, let's pull the 7 release tag of the sl image (again, if it was already pulled during setup, Docker won't attempt to re-pull it unless it has changed since it was last pulled).

# if you have a M1 chip Mac, this may not work. In that case continue the following examples using sl instead of sl:7
docker pull sl:7 
docker images sl
7: Pulling from library/sl
Digest: sha256:d38e6664757e138c43f1c144df20fb93538b75111f922fce57930797114b7728
Status: Downloaded newer image for sl:7
docker.io/library/sl:7

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
sl                  7                   5237b847a4d0        2 weeks ago         186MB
sl                  latest              5237b847a4d0        2 weeks ago         186MB

Question 20.1

Pull down the python:3.7-slim image and then list all of the python images along with the sl:7 image. What is the ‘Image ID’ of the python:3.7-slim image? Try to do this without looking at the solution.

Solution

docker pull python:3.7-slim
docker images --filter=reference="sl" --filter=reference="python"
3.7-slim: Pulling from library/python
7d63c13d9b9b: Pull complete 
7c9d54bd144b: Pull complete 
a7f085de2052: Pull complete 
9027970cef28: Pull complete 
97a32a5a9483: Pull complete 
Digest: sha256:1189006488425ef977c9257935a38766ac6090159aa55b08b62287c44f848330
Status: Downloaded newer image for python:3.7-slim
docker.io/library/python:3.7-slim

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
python              3.7-slim            375e181c2688        13 days ago         120MB
sl                  7                   5237b847a4d0        2 weeks ago         186MB
sl                  latest              5237b847a4d0        2 weeks ago         186MB

Exercise 21 - Running Docker Images

To use a Docker image as a particular instance on a host machine, you run it as a container. You can run it in either detached or foreground (interactive) mode.
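For example, a detached container can be started with the -d flag (a hypothetical example for illustration; the rest of this exercise uses the interactive mode):

docker run -d --name my-sleeper sl:7 sleep 300   # runs in the background until "sleep" finishes
docker stop my-sleeper                           # stop it when you are done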

Run the image we pulled as a container with an interactive bash terminal:

docker run -it sl:7 /bin/bash

The -i option here enables the interactive session, the -t option gives access to a terminal, and the /bin/bash command makes the container start up in a bash session.

You are now inside the container in an interactive bash session. Check the file directory

pwd
ls -alh

Output

/
total 56K
drwxr-xr-x   1 root root 4.0K Oct 25 04:43 .
drwxr-xr-x   1 root root 4.0K Oct 25 04:43 ..
-rwxr-xr-x   1 root root    0 Oct 25 04:43 .dockerenv
lrwxrwxrwx   1 root root    7 Oct  4 13:19 bin -> usr/bin
dr-xr-xr-x   2 root root 4.0K Apr 12  2018 boot
drwxr-xr-x   5 root root  360 Oct 25 04:43 dev
drwxr-xr-x   1 root root 4.0K Oct 25 04:43 etc
drwxr-xr-x   2 root root 4.0K Oct  4 13:19 home
lrwxrwxrwx   1 root root    7 Oct  4 13:19 lib -> usr/lib
lrwxrwxrwx   1 root root    9 Oct  4 13:19 lib64 -> usr/lib64
drwxr-xr-x   2 root root 4.0K Apr 12  2018 media
drwxr-xr-x   2 root root 4.0K Apr 12  2018 mnt
drwxr-xr-x   2 root root 4.0K Apr 12  2018 opt
dr-xr-xr-x 170 root root    0 Oct 25 04:43 proc
dr-xr-x---   2 root root 4.0K Oct  4 13:19 root
drwxr-xr-x  11 root root 4.0K Oct  4 13:19 run
lrwxrwxrwx   1 root root    8 Oct  4 13:19 sbin -> usr/sbin
drwxr-xr-x   2 root root 4.0K Apr 12  2018 srv
dr-xr-xr-x  13 root root    0 Oct 25 04:43 sys
drwxrwxrwt   2 root root 4.0K Oct  4 13:19 tmp
drwxr-xr-x  13 root root 4.0K Oct  4 13:19 usr
drwxr-xr-x  18 root root 4.0K Oct  4 13:19 var

and check the host to see that you are not in your local host system

hostname
<generated hostname>

Question 21.1

Check the /etc/os-release file to see that you are actually inside a release of Scientific Linux. What is the Version ID of this SL image? Try to do this without looking at the solution.

Solution

cat /etc/os-release
NAME="Scientific Linux"
VERSION="7.9 (Nitrogen)"
ID="scientific"
ID_LIKE="rhel centos fedora"
VERSION_ID="7.9"
PRETTY_NAME="Scientific Linux 7.9 (Nitrogen)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:scientificlinux:scientificlinux:7.9:GA"
HOME_URL="http://www.scientificlinux.org//"
BUG_REPORT_URL="mailto:scientific-linux-devel@listserv.fnal.gov"

REDHAT_BUGZILLA_PRODUCT="Scientific Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Scientific Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"

Exercise 22 - Monitoring, Exiting, Restarting, and Stopping Containers

Monitoring Your Containers

Open up a new terminal tab on the host machine and list the containers that are currently running

docker ps
CONTAINER ID        IMAGE         COMMAND             CREATED             STATUS              PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n minutes ago       Up n minutes                            <generated name>

Notice that the name of your container is some randomly generated name. To make the name more helpful, rename the running container

docker rename <CONTAINER ID> my-example

and then verify it has been renamed

docker ps
CONTAINER ID        IMAGE         COMMAND             CREATED             STATUS              PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n minutes ago       Up n minutes                            my-example

Specifying a name

You can also startup a container with a specific name

docker run -it --name my-example sl:7 /bin/bash

Exiting a Container

As a test, go back into the terminal used for your container, and create a file in the container

touch test.txt

In the container exit at the command line

exit

You are returned to your shell. If you list the containers you will notice that none are running

docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

but you can see all containers that have been run and not removed with

docker ps -a
CONTAINER ID        IMAGE         COMMAND             CREATED            STATUS                     PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n minutes ago      Exited (0) t seconds ago                       my-example

Restarting a Container

To restart your exited Docker container, start it again and then attach to it interactively:

docker start <CONTAINER ID>
docker attach <CONTAINER ID>

exec command

The attach command used here is a handy shortcut to interactively access a running container with the same start command (in this case /bin/bash) that it was originally run with.

In case you’d like some more flexibility, the exec command lets you run any command in the container, with options similar to the run command to enable an interactive (-i) session, etc.

For example, the exec equivalent to attaching in our case would look like:

docker start <CONTAINER ID>
docker exec -it <CONTAINER ID> /bin/bash

You can start multiple shells inside the same container using exec.

Notice that your entry point is still / and then check that your test.txt still exists

ls -alh test.txt
-rw-r--r--   1 root root    0 Oct 25 04:46 test.txt

Clean up a container

If you want a container to be cleaned up (that is, deleted) after you exit it, then run with the --rm option flag:

docker run --rm -it <IMAGE> /bin/bash

Stopping a Container

Sometimes you will exit a container and it won't stop. Other times your container may crash or enter a bad state, but still be running. In order to stop a container, exit it (exit) and then enter:

docker stop <CONTAINER ID> # or <NAME>

Exercise 23 - Removing Containers and Images

You can clean up (remove) a container with docker rm:

docker rm <CONTAINER NAME>

Note: A container must be stopped in order for it to be removed.

Start an instance of the sl:latest container, exit it, and then remove it:

docker run sl:latest
docker ps -a
docker rm <CONTAINER NAME>
docker ps -a

Output

CONTAINER ID        IMAGE         COMMAND             CREATED            STATUS                     PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n seconds ago      Exited (0) t seconds ago                       <name>

<generated id>

CONTAINER ID        IMAGE         COMMAND             CREATED            STATUS                     PORTS               NAMES

You can remove an image from your computer entirely with docker rmi

docker rmi <IMAGE ID>

Question 23.1

Pull down the Python 2.7 image (2.7-slim tag) from Docker Hub and then delete it. What was the image ID for the python:2.7-slim image? Try not to look at the solution.

Solution

docker pull python:2.7-slim
docker images python
docker rmi <IMAGE ID>
docker images python
2.7-slim: Pulling from library/python
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
Digest: sha256:<the relevant SHA hash>
Status: Downloaded newer image for python:2.7-slim
docker.io/library/python:2.7-slim

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
python              2.7-slim            eeb27ee6b893        14 hours ago        148MB
python              3.7-slim            375e181c2688        13 days ago         120MB

Untagged: python@sha256:<the relevant SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
python              3.7-slim            375e181c2688        13 days ago        120MB

Exercise 24 - File I/O with Containers

Copying Files To and From a Container

Copying files between the local host and Docker containers is possible. On your local host find a file that you want to transfer to the container and then

touch io_example.txt
# If on Mac need to do: chmod a+w io_example.txt
echo "This was written on local host" > io_example.txt
docker cp io_example.txt <NAME>:<remote path>

Note: Remember to do docker ps if you don’t know the name of your container.

From the container check and modify the file in some way

pwd
ls
cat io_example.txt
echo "This was written inside Docker" >> io_example.txt

Output

<remote path>
io_example.txt
This was written on local host

and then on the local host copy the file out of the container

docker cp <NAME>:<remote path>/io_example.txt .

and verify if you want that the file has been modified as you wanted

cat io_example.txt
This was written on local host
This was written inside Docker

Volume Mounting

What is more common and arguably more useful is to mount volumes to containers with the -v flag. This allows for direct access to the host file system inside of the container and for container processes to write directly to the host file system.

docker run -v <path on host>:<path in container> <image>

For example, to mount your current working directory on your local machine to the data directory in the example container

docker run --rm -it -v $PWD:/home/`whoami`/data sl:7

From inside the container you can ls to see the contents of your directory on your local machine

ls

and yet you are still inside the container

pwd
/home/<username>/data

You can also see that any files created in this path in the container persist upon exit

touch created_inside.txt
exit
ls *.txt
created_inside.txt

This I/O allows Docker images to be used for specific tasks that may be difficult to do with the tools or software installed on the local host machine: for example, debugging problems that arise with cross-platform software, or just using a specific version of software to perform a task (e.g., using Python 2 when you don't want it on your machine, or using a specific release of TeX Live when you aren't ready to update your system release).
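As a concrete, illustrative example of this pattern, the following one-liner mounts the current directory and runs Python from inside the python:3.7-slim image pulled earlier, without installing anything on the host:

# Mount $PWD at /work, make it the working directory, and run the container's Python
docker run --rm -v $PWD:/work -w /work python:3.7-slim python -c "print('hello from the container')"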

Mounts in Cygwin

Special care needs to be taken when using Cygwin and trying to mount directories. Assuming you have Cygwin installed at C:\cygwin and you want to mount your current working directory:

echo $PWD
/home/<username>/<path_to_cwd>

You will then need to mount that folder using -v /c/cygwin/home/<username>/<path_to_cwd>:/home/docker/data

Exercise 25 - Using Singularity on lxplus

So far we have only discussed Docker images and the Docker runtime. For a variety of reasons, Docker is not ideal for use on machines like lxplus, but luckily Singularity is. This next section will therefore cover how to run Docker and Singularity images in a Singularity runtime environment.

Before we go into any detail, you should be aware of the central CMS documentation.

Running custom images with Singularity

As an example, we are going to run a container using the ubuntu:latest image. Begin by logging into lxplus:

ssh -Y <username>@lxplus.cern.ch

Before running Singularity, you should set the cache directory (i.e. the directory to which the images are being pulled) to a place outside your $HOME/AFS space (here we use the /tmp/user directory):

export APPTAINER_CACHEDIR="/tmp/$(whoami)/Singularity"
singularity shell -B $HOME -B /tmp/$(whoami)/ -B /cvmfs docker://ubuntu:latest
# try accessing cvmfs inside of the container
source /cvmfs/cms.cern.ch/cmsset_default.sh
INFO:    Environment variable SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 2ab09b027e7f done  
Copying config 08d22c0ceb done  
Writing manifest to image destination
Storing signatures
2023/04/22 14:05:16  info unpack layer: sha256:2ab09b027e7f3a0c2e8bb1944ac46de38cebab7145f0bd6effebfe5492c818b6
INFO:    Creating SIF file...
INFO:    underlay of /etc/localtime required more than 50 (69) bind mounts

If you are asked for a docker username and password, just hit enter twice.

One particular difference from Docker is that the image name needs to be prepended by docker:// to tell Singularity that this is a Docker image. Singularity has its own registry system, which doesn’t have a de facto default registry like Docker Hub.

As you can see from the output, Singularity first downloads the layers from the registry, and is then unpacking the layers into a format that can be read by Singularity, the Singularity Image Format (SIF). This is a somewhat technical detail, but is different from Docker. It then unpacks the SIF file into what it calls a sandbox, the uncompressed image files needed to make the container.
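If you would rather keep the converted image around as a file, you can also pull it into a local SIF file explicitly (the output file name here is illustrative):

export APPTAINER_CACHEDIR="/tmp/$(whoami)/Singularity"
singularity pull ubuntu.sif docker://ubuntu:latest   # writes the converted image to ubuntu.sif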

-B (bind strings)

The -B option allows the user to specify paths to bind into the Singularity container. This option is similar to -v in Docker. By default, paths are mounted rw (read/write), but they can also be specified as ro (read-only).

You must bind any mounted file systems to which you would like access (i.e. nobackup).

If you would like Singularity to run your .bashrc file on startup, you must bind mount your home directory.
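For example, an explicit bind specification with access modes might look like the following (the paths are illustrative):

# Bind /cvmfs read-only and your home area read/write
singularity shell -B /cvmfs:/cvmfs:ro -B $HOME:$HOME:rw docker://ubuntu:latest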

In the next example, we are executing a script with singularity using the same image.

export APPTAINER_CACHEDIR="/tmp/$(whoami)/Singularity"
echo -e '#!/bin/bash\n\necho "Hello World!"\n' > hello_world.sh
singularity exec -B $HOME -B /tmp/$(whoami)/ docker://ubuntu:latest bash hello_world.sh

exec vs. shell

Singularity differentiates between providing you with an interactive shell (singularity shell) and executing scripts non-interactively (singularity exec).

Saving the Singularity Sandbox

You may have noticed that Singularity caches both the Docker and SIF images so that they don't need to be pulled/created on subsequent Singularity calls. That said, the sandbox needs to be created each time we start a container. If you will be using the same container multiple times, it may be useful to store the sandbox and use it to start the container.

Begin by building and storing the sandbox:

export APPTAINER_CACHEDIR="/tmp/$(whoami)/Singularity"
singularity build --sandbox ubuntu/ docker://ubuntu:latest
INFO:    Starting build...
Getting image source signatures
Copying blob d72e567cc804 skipped: already exists
Copying blob 0f3630e5ff08 skipped: already exists
Copying blob b6a83d81d1f4 [--------------------------------------] 0.0b / 0.0b
Copying config bbea2a0436 done
Writing manifest to image destination
Storing signatures
2020/09/28 00:14:16  info unpack layer: sha256:d72e567cc804d0b637182ba23f8b9ffe101e753a39bf52cd4db6b89eb089f13b
2020/09/28 00:14:17  warn xattr{etc/gshadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2020/09/28 00:14:17  warn xattr{/uscms_data/d2/aperloff/rootfs-7379bde5-0149-11eb-9685-001a4af11eb0/etc/gshadow} destination filesystem does not support xattrs, further warnings will be suppressed
2020/09/28 00:14:38  info unpack layer: sha256:0f3630e5ff08d73b6ec0e22736a5c8d2d666e7b568c16f6a4ffadf8c21b9b1ad
2020/09/28 00:14:38  info unpack layer: sha256:b6a83d81d1f4f942d37e1f17195d9c519969ed3040fc3e444740b884e44dec33
INFO:    Creating sandbox directory...
INFO:    Build complete: ubuntu/

Once we have the sandbox we can use that when starting the container. Run the same command as before, but use the sandbox rather than the Docker image:

export APPTAINER_CACHEDIR="/tmp/$(whoami)/Singularity"
singularity exec -B $HOME -B /tmp/$(whoami)/ ubuntu/ bash hello_world.sh
WARNING: underlay of /etc/localtime required more than 50 (66) bind mounts
Hello World!

You will notice that the startup time for the container is significantly reduced.

Question 25.1

What is the size of the singularity sandbox? Hint: Use the command du -hs <sandbox>.

Key Points

  • Docker images are super useful for encapsulating a desired environment.

  • Docker images can be run using the Docker or Singularity runtimes.