PyUnfold API

Iterative unfolding
pyunfold.iterative_unfold(data=None, data_err=None, response=None, response_err=None, efficiencies=None, efficiencies_err=None, prior=None, ts='ks', ts_stopping=0.01, max_iter=100, cov_type='multinomial', return_iterations=False, callbacks=None)

Performs iterative unfolding.
Parameters:
- data : array_like
  Input observed data distribution.
- data_err : array_like
  Uncertainties of the input observed data distribution. Must be the same shape as data.
- response : array_like
  Response matrix.
- response_err : array_like
  Uncertainties of the response matrix. Must be the same shape as response.
- efficiencies : array_like
  Detection efficiencies for the cause distribution.
- efficiencies_err : array_like
  Uncertainties of the detection efficiencies. Must be the same shape as efficiencies.
- prior : array_like, optional
  Prior distribution to use in unfolding. If None, a uniform (flat) prior is used. If array_like, must have the same shape as efficiencies (default is None).
- ts : {'ks', 'chi2', 'bf', 'rmd'}
  Test statistic to use for the stopping condition (default is 'ks'). For more information about the available test statistics, see the Test Statistics API documentation.
- ts_stopping : float, optional
  Test statistic stopping condition. At each unfolding iteration, the test statistic is computed between the current and previous iterations. Once the test statistic drops below ts_stopping, the unfolding procedure stops (default is 0.01).
- max_iter : int, optional
  Maximum number of iterations to allow (default is 100).
- cov_type : {'multinomial', 'poisson'}
  Whether to use the multinomial or Poisson form of the covariance matrix (default is 'multinomial').
- return_iterations : bool, optional
  Whether to return the unfolded distribution for each iteration (default is False).
- callbacks : list, optional
  List of pyunfold.callbacks.Callback instances to apply during unfolding (default is None, meaning no callbacks are applied).
Returns:
- unfolded_result : dict
  Returned if return_iterations is False (default). Dictionary containing the final unfolded distribution, associated uncertainties, and test statistic information. The returned dict has the following keys:
  - unfolded : Final unfolded cause distribution
  - stat_err : Statistical uncertainties on the unfolded cause distribution
  - sys_err : Systematic uncertainties on the unfolded cause distribution associated with limited statistics in the response matrix
  - ts_iter : Final test statistic value
  - ts_stopping : Test statistic stopping criterion
  - num_iterations : Number of unfolding iterations
  - unfolding_matrix : Unfolding matrix
- unfolding_iters : pandas.DataFrame
  Returned if return_iterations is True. DataFrame containing the unfolded distribution, associated uncertainties, test statistic information, etc. at each iteration.
Examples
>>> from pyunfold import iterative_unfold
>>> data = [100, 150]
>>> data_err = [10, 12.2]
>>> response = [[0.9, 0.1],
...             [0.1, 0.9]]
>>> response_err = [[0.01, 0.01],
...                 [0.01, 0.01]]
>>> efficiencies = [1, 1]
>>> efficiencies_err = [0.01, 0.01]
>>> unfolded = iterative_unfold(data=data,
...                             data_err=data_err,
...                             response=response,
...                             response_err=response_err,
...                             efficiencies=efficiencies,
...                             efficiencies_err=efficiencies_err)
>>> unfolded
{'num_iterations': 4,
 'stat_err': array([11.16853268, 13.65488168]),
 'sys_err': array([0.65570621, 0.65570621]),
 'ts_iter': 0.0038300087456445975,
 'ts_stopping': 0.01,
 'unfolded': array([ 94.32086967, 155.67913033]),
 'unfolding_matrix': array([[0.8471473 , 0.1528527 ],
        [0.06404093, 0.93595907]])}
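Setting return_iterations=True returns the per-iteration results instead. A minimal sketch reusing the inputs above (output omitted; per the Returns section it is a pandas.DataFrame with one row per iteration):

>>> unfolding_iters = iterative_unfold(data=data,
...                                    data_err=data_err,
...                                    response=response,
...                                    response_err=response_err,
...                                    efficiencies=efficiencies,
...                                    efficiencies_err=efficiencies_err,
...                                    return_iterations=True)
>>> num_iterations = len(unfolding_iters)  # one row per unfolding iteration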
Callbacks
class pyunfold.callbacks.Logger

Logger callback

Writes test statistic information for each iteration to sys.stdout.

Methods

on_iteration_end(iteration, status)
  Writes to sys.stdout
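To log progress during unfolding, a Logger instance can be passed to iterative_unfold through its callbacks parameter. A minimal sketch, assuming the input arrays from the iterative_unfold example above:

>>> from pyunfold import iterative_unfold
>>> from pyunfold.callbacks import Logger
>>> unfolded = iterative_unfold(data=data,
...                             data_err=data_err,
...                             response=response,
...                             response_err=response_err,
...                             efficiencies=efficiencies,
...                             efficiencies_err=efficiencies_err,
...                             callbacks=[Logger()])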
class pyunfold.callbacks.SplineRegularizer(degree=3, smooth=None, groups=None)

Spline regularization callback

Smooths the unfolded distribution at each iteration using UnivariateSpline from scipy.interpolate. For more information about UnivariateSpline, see the UnivariateSpline API documentation.

Parameters:
- degree : int, optional
  Degree of the smoothing spline. Must be <= 5 (default is 3, a cubic spline).
- smooth : float or None, optional
  Positive smoothing factor used to choose the number of knots. If 0, the spline will interpolate through all data points (default is None).
- groups : array_like, optional
  Group labels for each cause bin. If groups are specified, each cause group is regularized independently (default is None).
Notes

The number of causes must be larger than the spline degree.

Examples
Specify the spline degree and smoothing factor:
>>> from pyunfold.callbacks import SplineRegularizer
>>> reg = SplineRegularizer(degree=3, smooth=1.25)
Different cause groups are also supported. For instance, in a problem with seven cause bins where the first three bins form one group, the next two form a second group, and the last two form a third group, an array of group labels identifies the group each cause bin belongs to:
>>> groups = [0, 0, 0, 1, 1, 2, 2]
>>> reg = SplineRegularizer(degree=3, smooth=1.25, groups=groups)
If provided with a groups parameter, SplineRegularizer will regularize the unfolded distribution for each group independently.

Methods

on_iteration_end
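As with any callback, the regularizer takes effect when passed to iterative_unfold. A minimal sketch, assuming the input arrays from the iterative_unfold example above:

>>> unfolded = iterative_unfold(data=data,
...                             data_err=data_err,
...                             response=response,
...                             response_err=response_err,
...                             efficiencies=efficiencies,
...                             efficiencies_err=efficiencies_err,
...                             callbacks=[reg])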
Priors
pyunfold.priors.uniform_prior(num_causes)

Convenience function to calculate a uniform prior distribution

Parameters:
- num_causes : int
  Number of cause bins.

Returns:
- prior : numpy.ndarray
  Normalized uniform prior distribution.
Examples
>>> from pyunfold.priors import uniform_prior
>>> uniform_prior(num_causes=4)
array([0.25, 0.25, 0.25, 0.25])
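The resulting array can be passed as the prior argument of iterative_unfold (a uniform prior is also what is used when prior is left as None). A minimal sketch, assuming the input arrays from the iterative_unfold example above:

>>> from pyunfold import iterative_unfold
>>> prior = uniform_prior(num_causes=2)
>>> unfolded = iterative_unfold(data=data,
...                             data_err=data_err,
...                             response=response,
...                             response_err=response_err,
...                             efficiencies=efficiencies,
...                             efficiencies_err=efficiencies_err,
...                             prior=prior)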
pyunfold.priors.jeffreys_prior(causes)

Convenience function to calculate the Jeffreys prior distribution

Parameters:
- causes : array_like
  Midpoint value of cause bins. For instance, if cause bin edges are given by [0, 2, 4], then causes is [1, 3].

Returns:
- prior : numpy.ndarray
  Normalized Jeffreys prior distribution.
Notes
The Jeffreys prior is defined as
\[P(C_{\mu})^{\text{Jeffreys}} = \frac{1}{\log(C_{\text{max}}/C_{\text{min}})\,C_{\mu}}\]

for cause bin values \(C_{\mu}\) and maximum/minimum cause values \(C_{\text{max}}\)/\(C_{\text{min}}\). For more details regarding the Jeffreys prior see [1].
References
[1] Jeffreys, H. “An Invariant Form for the Prior Probability in Estimation Problems”. Proc. of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 186 (1007). London, England: 453-61. https://doi.org/10.1098/rspa.1946.0056.

Examples
>>> from pyunfold.priors import jeffreys_prior
>>> causes = [1, 2, 3, 4]
>>> jeffreys_prior(causes=causes)
array([0.48, 0.24, 0.16, 0.12])
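Since causes are bin midpoints, they can be computed from bin edges with numpy. A minimal sketch (the bin edges here are illustrative):

>>> import numpy as np
>>> bin_edges = np.array([0, 2, 4, 6, 8])
>>> causes = 0.5 * (bin_edges[1:] + bin_edges[:-1])  # midpoints: [1., 3., 5., 7.]
>>> prior = jeffreys_prior(causes=causes)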
Test Statistics
pyunfold.teststat.get_ts(name='ks')

Convenience function for retrieving test statistic calculators

Parameters:
- name : {'ks', 'chi2', 'bf', 'rmd'}
  Name of test statistic.

Returns:
- ts : TestStat
  Test statistic calculator
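For example, each name maps to one of the calculator classes documented below. A minimal sketch:

>>> from pyunfold.teststat import get_ts
>>> ks = get_ts(name='ks')      # Kolmogorov-Smirnov calculator
>>> chi2 = get_ts(name='chi2')  # reduced chi-squared calculator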
class pyunfold.teststat.KS(tol=None, num_causes=None, test_range=None, **kwargs)

Kolmogorov-Smirnov (KS) two-sided test statistic

Methods

calc(dist1, dist2)
  Calculate the test statistic between two input distributions
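The calc method compares two cause distributions directly. A minimal sketch, assuming calc returns the computed statistic and that num_causes matches the length of the input distributions:

>>> from pyunfold.teststat import KS
>>> ks = KS(tol=0.01, num_causes=4)
>>> dist1 = [100, 150, 200, 250]
>>> dist2 = [105, 145, 190, 260]
>>> ts_value = ks.calc(dist1, dist2)  # assumed to return the KS statistic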
class pyunfold.teststat.Chi2(tol=None, num_causes=None, test_range=None, **kwargs)

Reduced chi-squared test statistic

Methods

calc(dist1, dist2)
  Calculate the test statistic between two input distributions
class pyunfold.teststat.RMD(tol=None, num_causes=None, test_range=None, **kwargs)

Maximum relative difference test statistic

Methods

calc(dist1, dist2)
  Calculate the test statistic between two input distributions
class pyunfold.teststat.BF(tol=None, num_causes=None, test_range=None, **kwargs)

Bayes factor test statistic

Notes

For details related to the Bayes factor see [1].

References

[1] S. Y. BenZvi, B. M. Connolly, C. G. Pfendner, and S. Westerhoff. “A Bayesian Approach to Comparing Cosmic Ray Energy Spectra”. The Astrophysical Journal 738 (1): 82. https://doi.org/10.1088/0004-637X/738/1/82.

Methods

calc(dist1, dist2)
  Calculate the test statistic between two input distributions