Distribution Metrics

These metrics compare two distributions. The LOC curve, which uses the length of the curve to quantify separation, generalizes the traditional ROC curve, which uses the area under the curve (computed with numpy.trapz).

Note

Both functions require 1-dimensional arrays as input, which keeps them fully general. Any N-dimensional array can be made 1-dimensional by calling numpy.ndarray.ravel(). The functions sort the bins and handle the rest.
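As an illustration of the flattening step, the following minimal sketch builds two 2-dimensional histograms on a shared binning and flattens them with ravel(). The sample data, bin edges, and variable names are assumptions made for this example only; nothing here is part of MiLoMerge itself.

```python
import numpy as np

# Hypothetical 2-D histograms of a "signal" and a "background" sample,
# filled on the same 3x4 binning (all values here are illustrative).
rng = np.random.default_rng(0)
edges_x = np.linspace(-3.0, 3.0, 4)  # 3 bins along x
edges_y = np.linspace(-3.0, 3.0, 5)  # 4 bins along y

signal_2d, _, _ = np.histogram2d(
    rng.normal(0.5, 1.0, 1000), rng.normal(0.5, 1.0, 1000),
    bins=(edges_x, edges_y),
)
background_2d, _, _ = np.histogram2d(
    rng.normal(-0.5, 1.0, 1000), rng.normal(-0.5, 1.0, 1000),
    bins=(edges_x, edges_y),
)

# ravel() flattens both arrays in the same row-major order, so bin i of one
# flattened array still corresponds to bin i of the other.
signal_1d = signal_2d.ravel()
background_1d = background_2d.ravel()
```

The flattened arrays have one entry per bin (12 here) and could then be passed as sample1 and sample2.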

MiLoMerge.ROC_curve(sample1: Iterable[float], sample2: Iterable[float])[source]

A function to calculate the classical ROC curve given two distributions.

Parameters:
  • sample1 (Iterable[float]) – The “signal” sample. Must be a 1-d array.

  • sample2 (Iterable[float]) – The “background” sample. Must be the same size as sample1.

Returns:

Two arrays with the same size as sample1 giving the per-bin True Positive Rate (TPR) and False Positive Rate (FPR), followed by the Area Under the Curve (AUC).

Return type:

tuple[Iterable[float], Iterable[float], float]
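To make the sorting and integration concrete, here is a minimal sketch of one common way to build a ROC curve from two binned distributions. It is not MiLoMerge's implementation: the function name roc_sketch, the ordering by signal-to-background ratio, and the prepended (0, 0) point are all assumptions of this example, so the returned arrays are one element longer than the inputs. The area is computed with an explicit trapezoid sum, which is the same rule numpy.trapz applies.

```python
import numpy as np

def roc_sketch(signal, background):
    """Illustrative ROC construction from binned data (hypothetical helper)."""
    s = np.asarray(signal, dtype=float)
    b = np.asarray(background, dtype=float)
    if s.ndim != 1 or s.shape != b.shape:
        raise ValueError("inputs must be 1-d arrays of the same size")

    # Normalize each histogram so that both rates end at 1.
    s = s / s.sum()
    b = b / b.sum()

    # Order bins from most signal-like to most background-like.
    # Bins with no background get an infinite ratio and therefore come first.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(b > 0, s / b, np.inf)
    order = np.argsort(-ratio, kind="stable")

    # Cumulative sums over the ordered bins, starting from the point (0, 0).
    tpr = np.concatenate(([0.0], np.cumsum(s[order])))
    fpr = np.concatenate(([0.0], np.cumsum(b[order])))

    # Trapezoid rule for the area under TPR as a function of FPR.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return tpr, fpr, auc

# Identical distributions lie on the diagonal; disjoint ones separate fully.
tpr_same, fpr_same, auc_same = roc_sketch([1, 1, 1, 1], [1, 1, 1, 1])
tpr_sep, fpr_sep, auc_sep = roc_sketch([1, 0], [0, 1])
```

In this sketch, identical inputs give an area of 0.5 and fully disjoint inputs give an area of 1.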

MiLoMerge.LOC_curve(sample1: Iterable[float], sample2: Iterable[float])[source]

A function to calculate the LOC curve described in (ARXIV LINK) given two distributions.

Parameters:
  • sample1 (Iterable[float]) – The “signal” sample. Must be a 1-d array.

  • sample2 (Iterable[float]) – The “background” sample. Must be the same size as sample1.

Returns:

Two arrays with the same size as sample1 giving the per-bin True Positive Rate (TPR) and False Positive Rate (FPR), followed by the Length of the Curve (LoC).

Return type:

tuple[Iterable[float], Iterable[float], float]

Raises:

ValueError – Raised if neither sample is wholly positive. At least one sample must be completely positive.
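The idea of measuring a curve by its length rather than the area beneath it can be sketched as follows. This example assumes that "length of the curve" means the Euclidean arc length of the piecewise-linear curve through the (FPR, TPR) points; the exact definition is the one given in the referenced paper. The helper curve_length is hypothetical and does not reproduce the bin ordering or the positivity check of MiLoMerge.LOC_curve.

```python
import numpy as np

def curve_length(tpr, fpr):
    """Arc length of the piecewise-linear curve through (fpr[i], tpr[i])."""
    tpr = np.asarray(tpr, dtype=float)
    fpr = np.asarray(fpr, dtype=float)
    # Each segment contributes sqrt(dFPR^2 + dTPR^2) to the total length.
    return float(np.sum(np.hypot(np.diff(fpr), np.diff(tpr))))

# Identical distributions trace the diagonal from (0, 0) to (1, 1).
diagonal = np.linspace(0.0, 1.0, 5)
length_diagonal = curve_length(diagonal, diagonal)

# Fully separated distributions go straight up, then straight across.
length_separated = curve_length([0.0, 1.0, 1.0], [0.0, 0.0, 1.0])
```

Under this assumption the diagonal has length sqrt(2) and the fully separated curve has length 2, so a longer curve indicates stronger separation.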