.. _tutorial-error:

myXCLASSFit and error estimation
================================

The purpose of this tutorial is to describe how to estimate the errors
of the model parameters, described in molfit (and iso ratio) file using
the myXCLASSFit function, see Sect. ":ref:`myxclassfit-short`".

All files used within this tutorial can be downloaded
`here <https://xclass.astro.uni-koeln.de/sites/xclass/files/software/Tutorial/XCLASS-Tutorial-Error.zip>`_.



.. _tutorial-error-overview:

Error estimation
----------------

.. figure:: ../figures/tutorial-error__corner-over.png
   :align: center
   :width: 60%
   :name: fig-error-corner

   Example of a so-called *corner plot* created with the *corner*
   package. On top of each column the probability distribution for each
   free parameter is shown. The left and right dashed lines indicates the
   lower and upper limit of the corresponding HPD interval, respectively.
   The dashed line in the middle indicates the mode of the distribution.
   The blue lines indicate the parameter values of the best fit,
   calculated in previously applied optimization algorithm(s). The plot
   in the lower left corner describes the projected 2D histograms of two
   parameters. The contours indicate the HPD region.

XCLASS offers the possibility to estimate the errors of the model parameters
by using the emcee [1]_ (MCMC) or the ultranest [2]_ package, or the interval
nested sampling algorithm. In this tutorial, we use the **MCMC** method
to determine the errors of a previous fit.

The emcee (or MCMC) method starts at the estimated maximum :math:`\vec{\theta}_0`
of the likelihood function, i.e. at the parameter values of the best fit,
calculated in previously applied optimization algorithm(s), and draws
:math:`M` samples of model parameters
:math:`\left\{\vec{\theta}_1, \ldots, \vec{\theta}_M \right\}` from the
likelihood function in a small ball around the a priori preferred
position. After finishing the algorithm the probability distribution and
the corresponding highest posterior density (HPD) interval of each free
parameter is calculated. Additionally, the *corner* package [3]_ is used
to plot each one- and two-dimensional projection of the sample to reveal
covariances, see :numref:`fig-error-corner`.

Before we describe the calculation of the HPD intervals in detail, we
briefly summarize some important statistical expressions.

We start with the cumulative distribution function (CDF) of the standard
normal distribution which is given as

.. math::
   :label: ErrorEstim:cdf

   \Phi (x) = \frac{1}{\sqrt{2 \pi }} \int_{-\infty}^{x}e^{-t^{2}/2} \, dt.

In statistics one often uses the related error function, or
:math:`\erf(x)`, defined as the probability of a random variable with
normal distribution of mean 0 and variance 1/2 falling in the range
:math:`[-x, x ]` that is

.. math::
   :label: ErrorEstim:erf

   \erf(x) = \frac{1}{\sqrt{\pi }} \int_{-x}^{x}e^{-t^{2}} \, dt.

The two functions are closely related, namely

.. math::
   :label: ErrorEstim:cdf-erf

   \Phi (x) = \frac{1}{2} \left[1 + \erf(\frac{x}{\sqrt{2}})\right].

For a generic normal distribution :math:`f` with mean :math:`\mu` and
deviation :math:`\sigma`, the cumulative distribution function (CDF) for
standard score :math:`z = \left(\frac{x - \mu}{\sigma}\right)` is

.. math::
   :label: ErrorEstim:cdf2

   F(x) = \Phi \left(\frac{x - \mu}{\sigma}\right) = \frac{1}{2} \left[1 + \erf\left(\frac{x - \mu}{\sigma \sqrt{2}}\right)\right].

Following the so-called 68-95-99.7 (empirical) rule, or 3-sigma rule,
about 68 % of values drawn from a normal distribution are within one
standard deviation :math:`\sigma` away from the mean; about 95.4 % of
the values lie within two standard deviations; and about 99.7 % are
within three standard deviations.

.. figure:: ../figures/tutorial-error__HPD.png
   :align: center
   :width: 100%
   :name: fig-ErrorEstim-CredibleRegion

   Three commonly used ways of plotting a value plus error bar for
   a 95 \% credible region for four different probability
   distributions [6]_ :
   1.) *Mean ± standard deviation (blue line)*: The most commonly used
   confidence interval is :math:`\mu \pm k \sigma`, where *k* is chosen to
   give the appropriate confidence interval, assuming the posterior is
   Gaussian. Here, *k = 1.96*.
   2.) *Median with quantile (green line)*: The posterior need not be
   Gaussian. If it is not, we would like a more robust way to summarize
   it. A simple method is to report the median, and then give lower and
   upper bounds to the error bar based on quantile. For a 95 % credible
   region we would report the 2.5th percentile and the 97.5th percentile.
   3.) *Mode with HPD (red line)*: This method uses the highest
   posterior density (HPD) interval. If we're considering a 95 %
   confidence interval, the HPD interval is the shortest interval
   that contains 95 % of the probability of the posterior.

More precisely, the probability that a normal deviate lies in the range
:math:`\mu - n \sigma` and :math:`\mu + n \sigma` is given by

.. math::
   :label: ErrorEstim:pn

   P(n) = F(\mu + n \, \sigma) - F(\mu - n \, \sigma) = \Phi(n) - \Phi(-n) = \erf\left(\frac{n}{\sqrt{2}}\right) \approx \left\{\begin{array}{cl} 0.682, & n = 1\\ 0.954, & n = 2 \\ 0.997, & n = 3 \end{array}\right. .

As mentioned above, the error estimation algorithm calculates the
highest posterior density (HPD) interval of each parameter,
respectively. A HPD interval is basically the shortest interval on a
posterior density for some given confidence level, i.e. 68 % for
1\ :math:`\sigma`, 95.4 % for 2\ :math:`\sigma` etc. For example, if
we're considering a 95.4 % (or 2\ :math:`\sigma`) confidence interval,
the HPD interval is the shortest interval that contains 95.4 % of the
probability of the posterior. Mathematically a
:math:`100 \cdot (1 - P(n))` % HPD interval (or region) for a subset
:math:`\mathcal{C} \in \Theta` defined by

.. math:: \mathcal{C} = \left\{\theta:\pi(\theta|x \geq k\right\},

where :math:`k` is the largest number such that

.. math:: \int_{\left\{\theta:\pi(\theta|x \geq k\right\}} \pi(\theta|x) d\theta = 1 - P(n).

The value :math:`k` can be thought of as a horizontal line placed over
the posterior density whose intersection(s) with the posterior define
regions with probability :math:`1 - P(n)`, see
:numref:`fig-ErrorEstim-CredibleRegion`. In order to compute a HPD
interval we rank-order the MCMC trace. We know that the number of
samples that are included in the HPD is 0.954 (or another confidence
level) times the total number of MCMC sample. We then consider all
intervals that contain that many samples and find the shortest one. In
the case of a normal distribution an HPD interval coincides with the
usual probability region symmetric about the mean, spanning the
:math:`\frac{n \, \sigma}{2}` and :math:`1 - \frac{n \, \sigma}{2}`
quantiles [4]_. The same is true for any unimodal, symmetric
distribution, see :numref:`fig-ErrorEstim-CredibleRegion`.

So, the error estimation algorithm reports the (first) mode [5]_ and
then the bounds on the HPD intervals, see
:numref:`fig-error-corner`. Note, the mode must not
coincidence with the best fit result of the previously applied
algorithms!

Please note, the calculated HPD intervals are credible intervals not
confidence intervals. In Bayesian statistics, a credible interval is an
interval in the domain of a posterior probability distribution or
predictive distribution used for interval estimation. The generalization
to multivariate problems is the credible *region*. Credible intervals
are analogous to confidence intervals in frequentist statistics,
although they differ on a philosophical basis; Bayesian intervals treat
their bounds as fixed and the estimated parameter as a random variable,
whereas frequentist confidence intervals treat their bounds as random
variables and the parameter as a fixed value. (Taken from
https://en.wikipedia.org/wiki/Credible_interval).



.. _tutorial-error-obs-xml:

Observational xml file
----------------------

For the myXCLASSFit function we have to define the frequency range,
(here from 219.764240 to 221.471740 GHz), the size of the interferometric
beam (here 0.38 arcsec), the background and dust parameters.

**Observational xml file** ``tutorial-error__obs.xml`` used for the
myXCLASSMapFit function:

::

   <?xml version="1.0" encoding="UTF-8"?>
   <ExpFiles>


      <!-- define number of experimental data file(s) -->
      <NumberExpFiles>1</NumberExpFiles>


      <!-- ****************************************************************** -->
      <!-- define parameters for first file -->
      <file>


         <!-- define path and name of first data file -->
         <FileNamesExpFiles>files/data/observed_spectrum.dat</FileNamesExpFiles>


         <!-- define import filter -->
         <ImportFilter>xclassASCII</ImportFilter>


         <!-- define number of frequency ranges for first data file -->
         <NumberExpRanges>1</NumberExpRanges>


         <!-- define parameters for each frequency range -->
         <FrequencyRange>
               <MinExpRange>219764.240</MinExpRange>
               <MaxExpRange>221471.740</MaxExpRange>
               <StepFrequency>2.5</StepFrequency>


               <!-- define background temperature and temperature slope -->
               <t_back_flag>True</t_back_flag>
               <BackgroundTemperature>0.0</BackgroundTemperature>
               <TemperatureSlope>0.0</TemperatureSlope>


               <!-- define hydrogen column density, beta for dust, and kappa -->
               <HydrogenColumnDensity>1.00E+24</HydrogenColumnDensity>
               <DustBeta>0.0</DustBeta>
               <Kappa>0.0</Kappa>
         </FrequencyRange>


         <!-- define size of telescope -->
         <TelescopeSize>0.38</TelescopeSize>


         <!-- define local standard of rest (vLSR) -->
         <GlobalvLSR>0.0</GlobalvLSR>


         <!-- define if interferrometric observation is modeled -->
         <Inter_Flag>True</Inter_Flag>


         <!-- define parameters for ASCII file import -->
         <ErrorY>no</ErrorY>
         <NumberHeaderLines>0</NumberHeaderLines>
         <SeparatorColumns> </SeparatorColumns>
      </file>


      <!-- *************************************************************** -->
      <!-- parameters for isotopologues -->
      <Isotopologues>True</Isotopologues>
      <IsoTableFileName>files/tutorial-error__iso-ratio.dat</IsoTableFileName>


      <!-- *********************************** -->
      <!-- take local-overlap into account -->
      <LocalOverlap_Flag>False</LocalOverlap_Flag>


      <!-- **************************** -->
      <!-- prevent sub-beam description -->
      <NoSubBeam_Flag>True</NoSubBeam_Flag>
   </ExpFiles>



.. _tutorial-error-molfit:

Molfit file
-----------

The **molfit file** ``tutorial-error__Band_6.molfit`` used for the
myXCLASSFit function:

::

   %=======================================================================================================================================================================================================================================================================
   %
   % define database parameters:
   %
   %=======================================================================================================================================================================================================================================================================
   %%MinNumTransitionsSQL = 1                % (min. number of transitions)
   %%MaxNumTransitionsSQL = 90000            % (max. number of transitions)
   %%MaxElowSQL = 8.000e+03                  % (max. lower energy, i.e. upper limit of the lower energy of a transition)
   %%MingASQL = 1.000e-09                    % (minimal intensity, i.e. lower limit of gA of a transition)
   %%TransOrderSQL = 1                       % (order of transitions: (=1): by lower energy, (=2) by gA, (=3) gA/E_low^2, else trans. freq.)



   %=======================================================================================================================================================================================================================================================================
   %
   % define molecules and their components:
   %
   %=======================================================================================================================================================================================================================================================================
   % f:  low:   up:  s_size:   f:  low:    up:   T_rot:   f:      low:        up:     N_tot:   f:  low:   up:  V_width:   f:  low:   up:   V_off:  cfFlag:
   %
   CH3CN;v=0;   1
      y   0.1  20.0      3.0    y  50.0  550.0    300.0    y  1.00e+14   1.00e+20   2.00e+15    y   4.0  12.0       8.0   y   -2.0  +2.0     -0.3  c


.. _tutorial-error-iso-ratio:

Iso ratio file
--------------

In addition to CH\ :math:`_3`\ CN\ :math:`_{v=0}`, which is used in the molfit
file, we take the following isotopologues and vibrational excited
molecules into account:
CH\ :math:`_3`\ CN\ :math:`_{v8=1}`,
\ :math:`^{13}`\ CH\ :math:`_3`\ CN\ :math:`_{v=0}`,
CH\ :math:`_3^{13}`\ CN\ :math:`_{v=0}`,
CH\ :math:`_3`\ C\ :math:`^{15}`\ N\ :math:`_{v=0}`,

The **iso-ratio file** ``tutorial-error__iso-ratio.dat`` used for the
myXCLASSFit function:

::

   % Isotopologue:      Iso-master:        Iso-ratio:      Lower-limit:      Upper-limit:
   CH3CN;v8=1;          CH3CN;v=0;                  1
   C-13-H3CN;v=0;       CH3CN;v=0;                 16
   CH3C-13-N;v=0;       CH3CN;v=0;                 16
   CH3CN-15;v=0;        CH3CN;v=0;                300



.. _tutorial-error-alg-xml:

Algorithm xml file
------------------

Here, we use a combination of two algorithms, a so-called algorithm chain. We
start with the **trf** algorithm with max. 80 iterations and 6 cores to fit
the observational data. In the following, we estimate the errors of the
parameters optimized by the trf algorithm using the MCMC algorithm
(tab ``<ErrorMethod>``) with 20 samplers (tag ``<NumberMCMCSampler>``), 5
burn in iterations (tag ``<NumberBurnInIter>``), and 10 final iterations
(tag ``<number_iterations>``). Additionally, we will compute HPD intervals
(tag ``<ErrorType>``) with a :math:`2 \, \sigma` range
(tag ``<MultiplicitySigma>``). We do not use a backend file
(tag ``<BackendFileName>``).

Note that we actually use far too few samples and iterations here to get a
really good error estimate. It is possible that sometimes a ``"RuntimeError"``
is displayed, which is caused by insufficient sampling. It may therefore be
necessary to increase the number of samples (described by tag
``NumberMCMCSampler``) to e.g. **40** and the number of iterations (described
by tag ``number_iterations`` in the second algorithm) to e.g. **1000**.
However, depending on the used computer, this can take quite a long time.

**Algorithm xml file** ``tutorial-error__algorithm.xml`` used for the
myXCLASSFit function:

::

   <?xml version="1.0" encoding="UTF-8"?>
   <FitControl>
      <!-- settings for fit process -->


      <!-- set number of used algorithms -->
      <NumberOfFitAlgorithms>2</NumberOfFitAlgorithms>


      <!-- define settings for 2nd algorithm -->
      <algorithm>
         <FitAlgorithm>trf</FitAlgorithm>


         <!-- set max. number of iterations -->
         <number_iterations>80</number_iterations>


         <!-- set max. number of processors -->
         <NumberProcessors>6</NumberProcessors>


         <!-- settings for chi^2 -->
         <limit_of_chi2>1e-6</limit_of_chi2>
         <RenormalizedChi2>yes</RenormalizedChi2>
         <DeterminationChi2>default</DeterminationChi2>
         <SaveChi2>yes</SaveChi2>


         <!-- set plot options -->
         <PlotAxisX>Rest Frequency [MHz]</PlotAxisX>
         <PlotAxisY>T_mb [K]</PlotAxisY>
         <PlotIteration>no</PlotIteration>
      </algorithm>


      <!-- define settings for 2nd algorithm -->
      <algorithm>
         <FitAlgorithm>errorestim_ins</FitAlgorithm>


         <!-- define method for error estimation -->
         <ErrorMethod>MCMC</ErrorMethod>


         <!-- define number of samplers used for error estimation -->
         <NumberMCMCSampler>20</NumberMCMCSampler>


         <!-- define number of iterations within burn-in phase -->
         <NumberBurnInIter>5</NumberBurnInIter>


         <!-- define sampling method (local or global) -->
         <SampleMethod>global</SampleMethod>


         <!-- define multiplicity of standard deviation (1sigma, 2sigma, 3sigma, etc.) -->
         <MultiplicitySigma>2</MultiplicitySigma>


         <!-- set type of error range (Gaussian, Percentile, HPD, UltraNest) -->
         <ErrorType>hpd</ErrorType>


         <!-- define path and name of backend file -->
         <BackendFileName></BackendFileName>


         <!-- set max. number of iterations -->
         <number_iterations>10</number_iterations>


         <!-- set max. number of processors -->
         <NumberProcessors>6</NumberProcessors>


         <!-- settings for chi^2 -->
         <limit_of_chi2>1e-5</limit_of_chi2>
         <RenormalizedChi2>yes</RenormalizedChi2>
         <DeterminationChi2>default</DeterminationChi2>
         <SaveChi2>yes</SaveChi2>


         <!-- set plot options -->
         <PlotAxisX>Rest Frequency [MHz]</PlotAxisX>
         <PlotAxisY>T_mb [K]</PlotAxisY>
         <PlotIteration>no</PlotIteration>
      </algorithm>
   </FitControl>



.. _tutorial-error-call:

Call of myXCLASSFit function
----------------------------

Now everything is prepared to start the myXCLASSFit function. Note, we use
the *scipy* optimizer, because the error estimation is much faster compared
to the MAGIX implementation.

Start **myXCLASSFit** function with

::

    >>> from xclass import task_myXCLASSFit
    >>> import os

    ## get path of current directory
    >>> LocalPath = os.getcwd() + "/"

    # define path and name of molfit file
    >>> MolfitsFileName = LocalPath + "files/tutorial-error__molecules.molfit"

    # define path and name of obs. xml file
    >>> ObsXMLFileName = LocalPath + "files/tutorial-error__obs.xml"

    # define path and name of algorithm xml file
    >>> AlgorithmXMLFileName = LocalPath + "files/tutorial-error__algorithm.xml"

    # set optimization package
    >>> Optimizer = "scipy"

    # define if results of myXCLASSFit function are return as dictionary
    >>> DictOutFlag = True

    # call myXCLASSFit function
    >>> OutputDictionary = task_myXCLASSFit.myXCLASSFitCore( \
                                MolfitsFileName = MolfitsFileName, \
                                AlgorithmXMLFileName = AlgorithmXMLFileName, \
                                ObsXMLFileName = ObsXMLFileName, \
                                Optimizer = Optimizer, \
                                DictOutFlag = DictOutFlag)



.. _tutorial-error-results:

Results
-------


For the model parameters we get the following errors:

.. figure:: ../figures/tutorial-error__corner-final.png
   :align: center
   :width: 100%
   :name: fig-tutorial-error-results_trot

   The corner plot shows all the one and two dimensional projections of
   the posterior probability distributions of the model parameters.
   Additionally, it shows the marginalized distribution for each parameter
   independently in the histograms along the diagonal and then the
   marginalized two dimensional distributions in the other panels.

Note, among other things, the emcee package uses random numbers, which
is why you don't always get the exact same numbers for each function call.

From the corner plot we can directly see the errors of each model parameter.
For example, for the excitation temperature we get
:math:`T_{\rm rot} = 537.74` K. The lower limit for the error range is
:math:`T_{\rm rot}^{\rm error-low} = 537.74 - 2.1839 = 535.5561` K and
the upper limit is
:math:`T_{\rm rot}^{\rm error-up} = 537.74 + 2.1997 = 539.9397` K,
respectively. Additionally, we see that the probability distribution of the
excitation temperature is unimodal, i.e. there is only one best fit for the
temperature. Please note, that some parameters, such as the column density,
are fitted on log10 scale.



.. ----------------------------------------------------------------------------------------



Footnotes
---------

.. Footnotes
.. [1]
   https://emcee.readthedocs.io/en/stable/

.. [2]
   https://johannesbuchner.github.io/UltraNest/index.html

.. [3]
   https://pypi.python.org/pypi/corner

.. [4]
   In statistics and the theory of probability, quantiles are cut points
   dividing the range of a probability distribution into contiguous
   intervals with equal probabilities, or dividing the observations in a
   sample in the same way. There is one less quantile than the number of
   groups created. Thus quartiles are the three cut points that will
   divide a dataset into four equal-size groups. q-Quantiles are values
   that partition a finite set of values into q subsets of (nearly)
   equal sizes. There are q - 1 of the q-quantiles, one for each integer
   k satisfying 0 < k < q. In some cases the value of a quantile may not
   be uniquely determined, as can be the case for the median
   (2-quantile) of a uniform probability distribution on a set of even
   size. (Taken from https://en.wikipedia.org/wiki/Quantile)

.. [5]
   Generally, the *mode* is the value that appears most often in a set
   of data. For a normal distribution the numerical values of mode,
   mean, and median are identical, and may be very different in highly
   asymmetric distributions.

.. [6] taken from http://bebi103.caltech.edu/2015/tutorials/l06_credible_regions.html
