Mathematical Derivations

As introduced in the Overview, the conceptually simplest way to connect true causes \(C_{\mu}\) and observable effects \(E_{j}\) is via a matrix \(R\) and conversely by its estimated inverse \(M \approx R^{-1}\):

\[\begin{split}n(E) = R \ \phi(C) \\ \phi(C) = M \ n(E)\end{split}\]

Here

  • \(\phi(C)\) is the distribution of possible causes \(C_{\mu}, \, \mu \in [1, n_C]\)
  • \(n(E)\) is the distribution of possible effects \(E_{j}, \, j \in [1, n_E]\)
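As a concrete illustration of the forward relation \(n(E) = R \, \phi(C)\), the following sketch folds a toy cause distribution through a small, entirely hypothetical response matrix using NumPy; all numbers are invented for illustration:

    import numpy as np

    # Hypothetical response matrix R, with R[j, mu] = P(E_j | C_mu).
    # Each column sums to at most 1 (the efficiency of that cause bin).
    R = np.array([[0.7, 0.2, 0.0],
                  [0.2, 0.6, 0.2],
                  [0.0, 0.1, 0.7]])

    # Toy "true" cause distribution phi(C), e.g. counts per cause bin.
    phi_true = np.array([100.0, 150.0, 80.0])

    # Forward-fold the causes into the expected effect distribution n(E) = R phi(C).
    n_effects = R @ phi_true
    print(n_effects)   # [100. 126.  71.]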

Pseudo-Inverse of R

To estimate the inverse matrix \(M\), we effectively combine what we know about the cause-effect system, encoded in \(R\), with a best estimate of what we think the cause distribution looks like.

We begin by stating Bayes’ theorem for the problem:

\[P(C_{\mu} | E_{j}) = \frac{ P(E_{j} | C_{\mu}) \ P(C_{\mu})}{ \sum_{\nu}^{n_{C}} P(E_{j} | C_{\nu}) \ P(C_{\nu}) }\]

This equation states that, having observed the effect \(E_{j}\), the probability that it is due to the cause \(C_{\mu}\) is proportional to the product of the prior probability of that cause and the probability of that cause producing that effect. Hence, the elements \(P(E_{j}|C_{\mu})\) represent the probability that a given \(C_{\mu}\) results in the effect \(E_j\) and thus comprise the known or simulated response matrix \(R\). The prior \(P(C_{\mu})\) represents the current or best knowledge of the true (normalized) cause distribution.

We can then connect the observed effects to their causes via

\[\phi(C_{\mu}) = \sum_{i=1}^{n_{E}} P(C_{\mu} | E_{i}) \ n(E_{i})\]

However, a cause does not necessarily result in an observed effect, i.e. there is an efficiency \(\epsilon_{\mu}\) associated with each cause:

\[0 \ \leq \ \epsilon_{\mu} \ = \sum_{j=1}^{n_E} \ P(E_j | C_{\mu}) \leq 1\]
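In terms of the response matrix, each \(\epsilon_{\mu}\) is simply the sum of the corresponding column of \(R\); a brief check with the same hypothetical matrix as above:

    import numpy as np

    # Hypothetical response matrix: R[j, mu] = P(E_j | C_mu).
    R = np.array([[0.7, 0.2, 0.0],
                  [0.2, 0.6, 0.2],
                  [0.0, 0.1, 0.7]])

    # Efficiency of each cause bin: epsilon_mu = sum over j of P(E_j | C_mu).
    efficiencies = R.sum(axis=0)
    print(efficiencies)   # [0.9 0.9 0.9], each between 0 and 1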

Taking this into account in the estimate of \(\phi(C)\), we have

\[\begin{split}\begin{align} \phi(C_{\mu}) &= \frac{1}{\epsilon_{\mu}} \sum_{i=1}^{n_{E}} P(C_{\mu} | E_{i}) \ n(E_{i}) \\ &= \sum_{i=1}^{n_{E}} M_{\mu i} \ n(E_{i}) \end{align}\end{split}\]

where the explicit form of \(M\) is identified as

\[M_{\mu j} = \frac{ P(E_{j} | C_{\mu}) \ P(C_{\mu})}{ \left[ \sum_{k}^{n_E} P(E_k | C_{\mu}) \right] \, \left[ \sum_{\nu}^{n_{C}} P(E_{j} | C_{\nu}) \ P(C_{\nu}) \right]}\]

The ingredients entering this estimate of \(\phi(C_{\mu})\) are:

  • \(P(E_{j} | C_{\mu})\) the known response matrix
  • \(n(E_j)\) the measured frequency distribution of effects
  • \(P(C_{\mu})\) the prior distribution of the causes
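Putting these pieces together, the sketch below constructs \(M\) from the hypothetical response matrix and a prior, and applies it to the folded effect distribution. As a sanity check, when the prior is taken proportional to the true cause distribution, this single application reproduces the truth exactly; all values are invented for illustration:

    import numpy as np

    # Hypothetical response matrix and true cause distribution (as above).
    R = np.array([[0.7, 0.2, 0.0],
                  [0.2, 0.6, 0.2],
                  [0.0, 0.1, 0.7]])          # R[j, mu] = P(E_j | C_mu)
    phi_true = np.array([100.0, 150.0, 80.0])
    n_effects = R @ phi_true                 # observed effect distribution n(E)

    efficiencies = R.sum(axis=0)             # epsilon_mu
    prior = phi_true / phi_true.sum()        # prior P(C_mu); here set to the truth

    # Bayes posterior P(C_mu | E_j) and unfolding matrix M[mu, j].
    denom = R @ prior                        # sum over nu of P(E_j | C_nu) P(C_nu)
    posterior = (R * prior).T / denom        # posterior[mu, j] = P(C_mu | E_j)
    M = posterior / efficiencies[:, None]    # divide row mu by epsilon_mu

    phi_est = M @ n_effects
    print(np.allclose(phi_est, phi_true))    # True: exact recovery with the true prior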

Iterative Unfolding

Having obtained the form of \(M\) and thus an estimate of \(\phi(C)\), we have a new best guess as to the form of the cause distribution. By feeding in \(\phi(C)\) as an updated prior, we can again get an improved estimate of the cause distribution. Shortening the notation by keeping Latin subscripts for effect indices and Greek subscripts for the causes, we can then obtain \(\phi(C)^{t+1}\) at iteration \(t+1\) via

\[\begin{split}\begin{align} \phi_{\mu}^{t+1} &= \sum_j M_{\mu j}^t n_j \\ M_{\mu j}^t &= \frac{P_{\mu j} \, \phi^t_{\mu}}{ \epsilon_{\mu} \sum_{\rho} P_{\rho j} \, \phi^{t}_\rho} \end{align}\end{split}\]

where \(\phi^t\) is the cause distribution at iteration \(t\), and \(P_{\mu j}\) are the elements of the response matrix. Thus we have built an iterative unfolding procedure. Using an appropriate test statistic, such as a K-S test, to compare \(\phi^t_{\mu}\) and \(\phi^{t+1}_{\mu}\), we can decide when subsequent iterations provide negligible improvement, at which point the unfolding is said to have converged.
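A minimal sketch of such a loop, again with invented numbers: starting from a uniform guess, it repeats the update above and stops once a simple KS-style statistic (the maximum distance between the normalized cumulative distributions of successive iterates) drops below a tolerance. PyUnfold's actual convergence tests and uncertainty propagation are more involved than this:

    import numpy as np

    R = np.array([[0.7, 0.2, 0.0],
                  [0.2, 0.6, 0.2],
                  [0.0, 0.1, 0.7]])              # R[j, mu] = P(E_j | C_mu)
    n_effects = np.array([100.0, 126.0, 71.0])   # observed effect counts (toy example above)

    efficiencies = R.sum(axis=0)
    n_causes = R.shape[1]
    phi = np.full(n_causes, n_effects.sum() / n_causes)   # uniform starting guess

    for iteration in range(100):
        prior = phi / phi.sum()
        M = (R * prior).T / (R @ prior) / efficiencies[:, None]
        phi_new = M @ n_effects

        # KS-style statistic between successive cause distributions.
        ts = np.max(np.abs(np.cumsum(phi_new / phi_new.sum())
                           - np.cumsum(phi / phi.sum())))
        phi = phi_new
        if ts < 1e-4:
            break

    print(iteration, phi)   # iterations used and unfolded cause distribution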

The algorithm which PyUnfold employs is illustrated by the following:

[Figure: flowchart of the iterative unfolding algorithm used by PyUnfold (unfolding_algorithm.png)]

An exhaustive description and derivation of the propagation of uncertainties can be found in the LaTeX documentation provided with PyUnfold.

The Initial Prior

Since one has freedom in choosing the form of the initial prior \(P(C_{\mu})\), it is typical to avoid introducing bias by setting it either to the uniform prior, where all cause bins have equal probability, or to the non-informative Jeffreys' prior [1], which is especially appropriate when the range of causes spans several orders of magnitude. These take the form

\[\begin{split}\begin{align} P(C_{\mu})^{\text{uniform}} &= \frac{1}{n_{C}} \\ P(C_{\mu})^{\text{Jeffreys}} &= \frac{1}{\log \left( C_{\text{max}} / C_{\text{min}}\right) \, C_{\mu}} \end{align}\end{split}\]

While users can provide a custom initial prior, the uniform prior is the default in PyUnfold. Given the minimum and maximum cause values, the Jeffreys' prior is available as a utility function.
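For reference, both priors can also be computed directly from the formulas above with NumPy; the cause values below are hypothetical, and the Jeffreys' prior is normalized to sum to one so it can serve as a probability distribution:

    import numpy as np

    # Hypothetical cause bin values spanning several orders of magnitude.
    causes = np.logspace(0, 3, 10)          # values from 1 to 1000
    n_causes = len(causes)

    # Uniform prior: equal probability in every cause bin.
    uniform = np.full(n_causes, 1.0 / n_causes)

    # Jeffreys' prior: proportional to 1 / C_mu, normalized to sum to one.
    jeffreys = 1.0 / (np.log(causes.max() / causes.min()) * causes)
    jeffreys /= jeffreys.sum()

    print(uniform)
    print(jeffreys)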

[1] H. Jeffreys, “An Invariant Form for the Prior Probability in Estimation Problems”, Proc. of the Royal Society of London A 186 (1946) 453.