Update README and small fixes/updates

Benedikt Volkel · chiarazampolli · commit d493bbad7f51 · 2022-06-30T16:06:04.000+02:00
* make README more readable and add further information

* get a global summary format also when run only on 2 single files

* update inspection interface

* minor bug fixes
diff --git a/RelVal/README.md b/RelVal/README.md
@@ -1,40 +1,89 @@
-This macro ReleaseValidation.C permits to compare the QC.root output from different passes
+# O2DPG ReleaseValidation (RelVal)
 
+## The macro [ReleaseValidation.C](ReleaseValidation.C)
 
-## Usage
-The input variables which we need to give to the macro are:
+This macro `ReleaseValidation.C` compares 2 ROOT files that contain objects of the following types:
+* ROOT histograms (deriving from `TH1`)
+* ROOT `TProfile`
+* ROOT `TEfficiency`
+* O2 `o2::quality_control::core::MonitorObjectCollection`
+* O2 `o2::quality_control::core::MonitorObject`
 
-- the two QC.root files, with corresponind path
+At the moment, 3 different comparisons are implemented:
+1. relative difference of bin contents,
+1. Chi2 test,
+1. simple comparison of number of entries
 
-- the Monitor object collection we want to focus on:
-QcTaskMIDDigits;
-DigitQcTaskFV0;
-TaskDigits;
-DigitQcTaskFT0;
-QcMFTAsync;
-ITSTrackTask;
-MatchedTracksITSTPC;
-MatchingTOF;
-ITSClusterTask;
-Clusters;
-PID;
-Tracks;
-Vertexing 
+The first 2 tests are considered critical, hence if the threshold is exceeded, the comparison result is named `BAD`.
 
-- which compatibility test we want to perform (bit mask):
-1->Chi-square;
-2--> ContBinDiff;
-3 (combination of 1 and 2)--> Chi-square+MeanDiff;
-4-> N entries;
-5 (combination of 1 and 4) --> Nentries + Chi2;
-6 (combination of 1 and 2)--> N entries + MeanDiff;
-7 (combination of 1, 2 and 3)--> Nentries + Chi2 + MeanDiff
+There are 5 different test severities per test:
+1. `GOOD` if the threshold was not exceeded,
+1. `WARNING` if a non-critical test exceeds the threshold (currently only the comparison of the number of entries),
+1. `NONCRIT_NC` if the histograms could not be compared e.g. due to different binning or axis ranges **and** if the test is considered as **non-critical**,
+1. `CRIT_NC` if the histograms could not be compared e.g. due to different binning or axis ranges **and** if the test is considered as **critical**,
+1. `BAD` if a critical test exceeds the threshold.
 
-- threshold values for checks on Chi-square and on content of bins
+## Python wrapper and usage
 
-- choose if we want to work on the grid or on local laptop (to be fixed)
+Although the above macro can be used on its own, it is also wrapped by a [Python script](o2dpg_release_validation.py) for convenience. The wrapper offers significantly more functionality.
 
-- tell the script it there are "critical "histograms (the list of names of critical plots has to be written in a txt file), which we need to keep separated from the other histograms. The corresponding plots will be saved in a separated pdf file
+The full help message of this script can be seen by typing
+```bash
+python o2dpg_release_validation.py [<sub-command>] --help
+```
+The wrapper includes 3 different sub-commands for now
+1. `rel-val` to steer the RelVal,
+1. `inspect` to print histograms of specified severity (if any),
+1. `influx` to convert the summary into a format that can be understood by and sent to an InfluxDB instance.
 
+### Basic usage
 
-The macro is currently working only on real data (will be fixed soon)
+If you would like to compare 2 files, simply run
+```bash
+python o2dpg_release_validation.py rel-val -i <file1> <file2> [-o <output/dir>]
+```
+This performs all of the above mentioned tests. If only certain tests should be performed, this can be achieved with the flags `--with-<which-test>` where `<which-test>` is one of
+1. `chi2`,
+1. `bincont`,
+1. `numentries`.
+By default, all of them are switched on.
+
+### Apply to entire simulation outcome
+
+In addition to simply comparing 2 ROOT files, the script offers the possibility of comparing 2 corresponding directories that contain simulation artifacts (and potentially QC and analysis results). This then automatically runs the RelVal on
+1. QC output,
+1. analysis results output,
+1. TPC tracks output,
+1. MC kinematics,
+1. MC hits.
+**NOTE** Each of these comparison types is only run if matching files are found in both directories. As an example, one could do
+```bash
+cd ${DIR1}
+python o2dpg_workflow_runner.py -f <workflow-json1>
+cd ${DIR2}
+# potentially something has changed in the software or the simulation/reconstruction parameters
+python o2dpg_workflow_runner.py -f <workflow-json2>
+python ${O2DPG_ROOT}/ReleaseValidation/o2dpg_release_validation.py rel-val -i ${DIR1} ${DIR2} [-o <output/dir>] [<test-flags>]
+```
+Here too, the outputs to run the tests on can be selected explicitly by passing one or more `<test-flags>` such as
+1. `--with-qc`,
+1. `--with-analysis`,
+1. `--with-tpctracks`,
+1. `--with-kine`,
+1. `--with-hits`.
+
+### Quick inspection
+
+To print the histograms of a given severity after a RelVal run, use
+```bash
+python ${O2DPG_ROOT}/ReleaseValidation/o2dpg_release_validation.py inspect <path-to-outputdir-or-file> [--severity <severity>]
+```
+The optional `--severity` argument takes a list of any of the above-mentioned severities; it defaults to `BAD` and `CRIT_NC`. If a directory is passed as input, it must contain either a file named `SummaryGlobal.json` or, if that cannot be found, a file named `Summary.json`.
+
+### Make ready for InfluxDB
+
+To convert the final output to something that can be digested by InfluxDB, use
+```bash
+python ${O2DPG_ROOT}/ReleaseValidation/o2dpg_release_validation.py influx --dir <rel-val-out-dir> [--tags k1=v1 k2=v2 ...] [--table-suffix <chosen-suffix>]
+```
+When the `--tags` argument is specified, the given key-value pairs are added as TAGS for InfluxDB. The table name defaults to `O2DPG_MC_ReleaseValidation`; if `--table-suffix` is given, the suffix is appended as `O2DPG_MC_ReleaseValidation_<chosen-suffix>`.
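
The `--tags k1=v1 k2=v2 ...` format described in the README can be illustrated with a small standalone sketch. It follows the rule that this commit adds to `influx()` (exactly one `=`, non-empty key and value). The helper name `parse_influx_tags` and the whitespace stripping are assumptions made for illustration, not the script's actual code.

```python
def parse_influx_tags(tags):
    """Hypothetical helper: validate key=value tags and join them with commas.

    Returns the joined string, or None if any tag is malformed.
    """
    pairs = []
    for t in tags:
        # assumption: whitespace around key and value is stripped first
        parts = [p.strip() for p in t.split("=")]
        # rule from the diff: exactly one "=", key and value both non-empty
        if len(parts) != 2 or not parts[0] or not parts[1]:
            print(f"ERROR: Invalid format of tags {t} for InfluxDB")
            return None
        pairs.append(f"{parts[0]}={parts[1]}")
    return ",".join(pairs)
```

With this sketch, a tag such as `k1=` or `a=b=c` is rejected instead of producing a malformed InfluxDB row.
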
diff --git a/RelVal/o2dpg_release_validation.py b/RelVal/o2dpg_release_validation.py
@@ -57,7 +57,7 @@
 import sys
 import argparse
 from os import environ, makedirs
-from os.path import join, abspath, exists, isfile, isdir, dirname
+from os.path import join, abspath, exists, isfile, isdir, dirname, relpath
 from glob import glob
 from subprocess import Popen
 from pathlib import Path
@@ -295,26 +295,33 @@ def has_severity(filename, severity=("BAD", "CRIT_NC")):
     """
     Check if any 2 histograms have a given severity level after RelVal
     """
+    counter = {s: 0 for s in [*severity, "ALL"]}
+
     def rel_val_summary(d):
         ret = False
-        for s in severity:
+        for s in REL_VAL_SEVERITY_MAP:
             names = d.get(s)
+            counter["ALL"] += len(names or [])
             if not names:
                 continue
+            if s not in severity:
+                continue
             print(f"Histograms for severity {s}:")
             for n in names:
                 print(f"    {n}")
+            counter[s] = len(names)
             ret = True
         return ret
 
     def rel_val_summary_global(d):
         ret = False
-        to_print = {k: [] for k in severity}
-        for s in severity:
-            for h in d:
-                if h["test_summary"] in severity:
-                    to_print[s].append(h["name"])
-                    ret = True
+        to_print = {s: [] for s in severity}
+        counter["ALL"] = len(d)
+        for h in d:
+            if h["test_summary"] in severity:
+                to_print[h["test_summary"]].append(h["name"])
+                counter[h["test_summary"]] += 1
+                ret = True
         for s, names in to_print.items():
             if not names:
                 continue
@@ -325,13 +332,16 @@ def rel_val_summary_global(d):
 
     res = None
     with open(filename, "r") as f:
-        # NOTE For now care about the summary. However, we have each test individually, so we could do a more detailed check in the future
         res = json.load(f)
 
     # decide whether that is an overall summary or from 2 files only
-    if "histograms" in res:
-        return rel_val_summary_global(res["histograms"])
-    return rel_val_summary(res["test_summary"])
+    ret = rel_val_summary_global(res["histograms"]) if "histograms" in res else rel_val_summary(res["test_summary"])
+    if ret:
+        print(f"\nNumber of compared histograms: {counter['ALL']} out of which")
+        for s in severity:
+            print(f"    {counter[s]} histograms have severity {s}")
+        print("as printed above.\n")
+    return ret
 
 
 def rel_val_ttree(dir1, dir2, files, output_dir, args, treename="o2sim", *, combine_patterns=None):
@@ -389,8 +399,8 @@ def make_summary(in_dir):
             current_summary = json.load(f)
         # remove the file name, used as the top key for this collection
         rel_val_path = "/".join(path.split("/")[:-1])
-        type_global = path.split("/")[1]
-        type_specific = "/".join(path.split("/")[1:-1])
+        type_specific = relpath(rel_val_path, in_dir)
+        type_global = type_specific.split("/")[0]
         make_summary = {}
         for which_test, flagged_histos in current_summary.items():
             # loop over tests done
@@ -483,9 +493,6 @@ def rel_val_sim_dirs(args):
             makedirs(output_dir_qc)
         rel_val_histograms(dir_qc1, dir_qc2, qc_files, output_dir_qc, args)
 
-    with open(join(output_dir, "SummaryGlobal.json"), "w") as f:
-        json.dump(make_summary(output_dir), f, indent=2)
-
 
 def rel_val(args):
     """
@@ -502,14 +509,33 @@ def rel_val(args):
         return 1
     if not exists(args.output):
         makedirs(args.output)
-    return func(args)
+    func(args)
+    with open(join(args.output, "SummaryGlobal.json"), "w") as f:
+        json.dump(make_summary(args.output), f, indent=2)
 
 
 def inspect(args):
     """
     Inspect a Summary.json in view of RelVal severity
     """
-    return has_severity(args.file, args.severity)
+    path = args.path
+
+    def get_filepath(d):
+        summary_global = join(d, "SummaryGlobal.json")
+        if exists(summary_global):
+            return summary_global
+        summary = join(d, "Summary.json")
+        if exists(summary):
+            return summary
+        print(f"Can neither find {summary_global} nor {summary}. Nothing to work with.")
+        return None
+
+    if isdir(path):
+        path = get_filepath(path)
+        if not path:
+            return 1
+
+    return not has_severity(path, args.severity)
 
 
 def influx(args):
@@ -521,13 +547,14 @@ def influx(args):
     if not exists(json_in):
         print(f"Cannot find expected JSON summary {json_in}.")
         return 1
-
-    table_name = f"{args.table_prefix}_ReleaseValidation"
+    table_name = "O2DPG_MC_ReleaseValidation"
+    if args.table_suffix:
+        table_name = f"{table_name}_{args.table_suffix}"
     tags_out = ""
     if args.tags:
         for t in args.tags:
             t_split = t.split("=")
-            if len(t_split) != 2:
+            if len(t_split) != 2 or not t_split[0] or not t_split[1]:
                 print(f"ERROR: Invalid format of tags {t} for InfluxDB")
                 return 1
             # we take it apart and put it back together again to make sure there are no whitespaces etc
@@ -542,8 +569,8 @@ def influx(args):
     with open(json_in, "r") as f:
         in_list = json.load(f)["histograms"]
     with open(out_file, "w") as f:
-        for h in in_list:
-            s = f"{row_tags},type_global={h['type_global']},type_specific={h['type_specific']} histogram_name={h['name']}"
+        for i, h in enumerate(in_list):
+            s = f"{row_tags},type_global={h['type_global']},type_specific={h['type_specific']},id={i} histogram_name=\"{h['name']}\""
             for k, v in h.items():
                 # add all tests - do it dynamically because more might be added in the future
                 if "test_" not in k:
@@ -578,14 +605,14 @@ def main():
     rel_val_parser.set_defaults(func=rel_val)
 
     inspect_parser = sub_parsers.add_parser("inspect")
-    inspect_parser.add_argument("file", help="pass a JSON produced from ReleaseValidation (rel-val)")
+    inspect_parser.add_argument("path", help="either complete file path to a Summary.json or SummaryGlobal.json or directory where one of the former is expected to be")
     inspect_parser.add_argument("--severity", nargs="*", default=["BAD", "CRIT_NC"], choices=REL_VAL_SEVERITY_MAP.keys(), help="Choose severity levels to search for")
     inspect_parser.set_defaults(func=inspect)
 
     influx_parser = sub_parsers.add_parser("influx")
     influx_parser.add_argument("--dir", help="directory where ReleaseValidation was run", required=True)
     influx_parser.add_argument("--tags", nargs="*", help="tags to be added for influx, list of key=value")
-    influx_parser.add_argument("--table-prefix", dest="table_prefix", help="prefix for table name", default="O2DPG_MC")
+    influx_parser.add_argument("--table-suffix", dest="table_suffix", help="suffix for table name")
     influx_parser.set_defaults(func=influx)
 
     args = parser.parse_args()
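
The per-severity counting that this commit adds to `has_severity` can be sketched in isolation. The snippet assumes a global summary whose `histograms` entries carry the `name` and `test_summary` keys used in the diff. The function name `count_severities` and the sample data are made up for illustration and are not part of the script.

```python
def count_severities(histograms, severity=("BAD", "CRIT_NC")):
    """Hypothetical helper: group histogram names by requested severity."""
    flagged = {s: [] for s in severity}
    for h in histograms:
        # each histogram is visited once and filed under its own severity
        if h["test_summary"] in flagged:
            flagged[h["test_summary"]].append(h["name"])
    counter = {s: len(names) for s, names in flagged.items()}
    counter["ALL"] = len(histograms)
    return flagged, counter


# made-up sample data, not real RelVal output
sample = [
    {"name": "h_pt", "test_summary": "GOOD"},
    {"name": "h_eta", "test_summary": "BAD"},
    {"name": "h_phi", "test_summary": "CRIT_NC"},
    {"name": "h_z", "test_summary": "BAD"},
]
flagged, counter = count_severities(sample)
print(f"Number of compared histograms: {counter['ALL']}")
for s, names in flagged.items():
    print(f"    {counter[s]} histograms have severity {s}: {names}")
```

Iterating over the histograms once, rather than once per severity, is what prevents a name from being appended under every requested severity, which was the behaviour of the removed loop.
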