Central concepts
================
Note that `emgfit` is tailored to the needs of analyzing high-precision
time-of-flight mass spectra. However, both the available selection of
hyper-exponentially-modified Gaussian (Hyper-EMG) line shapes as well as the
implemented statistical techniques could be used as powerful tools for
analyzing spectroscopic data sets from a variety of fields.


The peak and spectrum classes
-----------------------------
The data analysis approach of `emgfit` is highly object oriented. The central
objects that users interact with are instances of the
:class:`~emgfit.spectrum.spectrum` and :class:`~emgfit.spectrum.peak` classes.
A spectrum object is instantiated by importing a data set. All relevant
information about this data set is stored in attributes of the spectrum object.
Initially, the user adds a number of peak objects to the spectrum which are then
stored as a list in the spectrum's :attr:`~emgfit.spectrum.spectrum.peaks`
attribute. Each peak object holds specific information about a given peak (e.g.
peak position, peak area, ...). A table with all relevant information about the
peaks in the spectrum can be viewed with the
:meth:`~emgfit.spectrum.spectrum.show_peak_properties` spectrum method.

The different peaks are fitted and more and more information is obtained by
progressively calling different methods on the spectrum object. An outline of a
typical analysis with `emgfit` is given in the tutorial. Once a peak has been
fitted the obtained :class:`~lmfit.model.ModelResult` (or "fit result") is
stored at the corresponding position in the spectrum's
:attr:`~emgfit.spectrum.spectrum.fit_results` list. Comprehensive lists of the
available methods and attributes of the spectrum and peak classes are compiled
along with detailed usage information in the :class:`~emgfit.spectrum.spectrum`
and :class:`~emgfit.spectrum.peak` sections of the API docs. The
spectrum and peak classes are tailored to the analysis needs of multi-reflection
time-of-flight mass spectrometry. However, the statistical techniques
incorporated in the spectrum class could be applied to analyses of spectroscopic
data from various other fields. If specific needs emerge other specialized
classes could be derived from the above.


Adding peaks to a spectrum
--------------------------
The easiest way to add peaks to a spectrum is to use `emgfit's` automatic
peak detection method :meth:`~emgfit.spectrum.spectrum.detect_peaks`. This
method applies some smoothing to the spectrum and then detects peaks via minima
in the second derivative of the smoothed data. This approach yields a high
sensitivity in the identification of overlapping or low-intensity peaks.
Increased sensitivity can be achieved by adapting the method's tuning parameters
such as the minimal threshold for peak detection to the specifics of the
given data set - see the method docs for the available options.

Alternatively or additionally, peaks can be added manually with the
:meth:`~emgfit.spectrum.spectrum.add_peak` method. By default, markers of the
associated peaks are added to plots of spectrum data. Peaks can be removed from
a spectrum object using the :meth:`~emgfit.spectrum.spectrum.remove_peaks`
method. **To avoid ambiguities peaks should only be added or removed in the
initial analysis stage, i.e. before the shape calibration or any other fits have
been performed.**


Assigning species to peaks and fetching AME values
--------------------------------------------------
The following attributes can be used to select a peak:

* Peak index (i.e. index in the :attr:`~emgfit.spectrum.spectrum.peaks` list)
* Peak marker position `x_pos`
* Ionic `species` label (if assigned)

The peak index and `x_pos` are always defined as soon as a peak is added to a
spectrum. The optional `species` attribute can either be set in
:meth:`~emgfit.spectrum.spectrum.add_peak` or with
:meth:`~emgfit.spectrum.spectrum.assign_species`. The `species` labels must
follow the :ref:`:-notation`. As soon as a `species` is assigned to a peak the
corresponding literature mass and its uncertainty are automatically fetched
from the AME2016_ mass database. When a AME mass value is not purely based on
experimental data the peak's `extrapolated` attribute is set to `True`.

.. _AME2016: http://amdc.in2p3.fr/web/masseval.html


Hyper-EMG distributions
-----------------------
A core feature of `emgfit` is its numerically robust implementation of
hyper-exponentially-modified Gaussian (hyper-EMG) distribution functions.
Exponentially-modified Gaussian distributions have been demonstrated to be a
powerful tool for fitting spectroscopic data from various fields including mass
spectrometry [1]_, alpha-particle spectrometry [2]_ and chromatography [3]_.
Hyper-EMG distributions :math:`h_\mathrm{emg}(x)` as introduced in [1]_ are
mixture models that allow the convolution of a Gaussian with an arbitrary number
of left-hand and right-hand exponential tails, respectively:

.. math::

   h_\mathrm{emg}(x; \mu, \sigma, \Theta, \eta_-, \tau_-, \eta_+, \tau_+) =
   \Theta h_\mathrm{-emg}(x; \mu, \sigma, \eta_-, \tau_-) +
   (1-\Theta) h_\mathrm{+emg}(x; \mu, \sigma, \eta_+, \tau_+).

where :math:`0 \leq \Theta \leq 1` is the mixing weight that determines the
relative contribution of the negative and positive skewed EMG distributions,
:math:`h_\mathrm{-emg}`` and :math:`h_\mathrm{+emg}`, respectively. The latter
are defined as:

.. math::

  h_\mathrm{-emg}(x; \mu, \sigma, \eta_-, \tau_-)
  = \sum_{i=1}^{N_-}{\frac{\eta_{-i}}{2\tau_{-i}}
      \exp{\left(\frac{\sigma}{\sqrt{2}\tau_{-i}} +
           \frac{x-\mu}{\sqrt{2}\tau_{-i}}\right)}
      \mathrm{erfc}\left(\frac{\sigma}{\sqrt{2}\tau_{-i}} +
                         \frac{x-\mu}{\sqrt{2}\sigma}\right)},

  h_\mathrm{+emg}(x; \mu, \sigma, \eta_+, \tau_+)
  = \sum_{i=1}^{N_+}{\frac{\eta_{+i}}{2\tau_{+i}}
      \exp{\left(\frac{\sigma}{\sqrt{2}\tau_{+i}} -
                 \frac{x-\mu}{\sqrt{2}\tau_{+i}}\right)}
      \mathrm{erfc}\left(\frac{\sigma}{\sqrt{2}\tau_{+i}} -
                         \frac{x-\mu}{\sqrt{2}\sigma}\right)}.

:math:`N_{-}` and :math:`N_{+}` are referred to as the negative and positive tail
order. :math:`\mu=\mu_G` denotes the mean and :math:`\sigma=\sigma_G` the
standard deviation of the underlying Gaussian distribution. The decay constants
of the left- and right-handed exponential tails are given by
:math:`\tau_-=(\tau_{-1},\tau_{-2},...,\tau_{-N_-})`
& :math:`\tau_+=(\tau_{+1},\tau_{+2},...,\tau_{+N_+})`, respectively. The negative
and positive tail weights are denoted by
:math:`\eta_-=(\eta_{-1},\eta_{-2},...,\eta_{-N_-})`
& :math:`\eta_+=(\eta_{+1},\eta_{+2},...,\eta_{+N_+})`, respectively, and obey
the following normalizations:

.. math::

   \sum_{i=1}^{N_-}{\eta_\mathrm{-i}} = 1,

   \sum_{i=1}^{N_+}{\eta_\mathrm{+i}} = 1.

For information on the numerical implementation of hyper-EMG distributions see
:mod:`emgfit.emg_funcs`.


.. _fit_model_list:

Available fit models
--------------------
All supported (single peak) fit models or peak shapes are defined in the
:mod:`emgfit.fit_models` module. Currently, the following models are available:

* Gaussian:     :func:`~emgfit.fit_models.Gaussian`
* Hyper-EMG(0,1): :func:`~emgfit.fit_models.emg01`
* Hyper-EMG(1,0): :func:`~emgfit.fit_models.emg10`
* Hyper-EMG(1,1): :func:`~emgfit.fit_models.emg11`
* Hyper-EMG(1,2): :func:`~emgfit.fit_models.emg12`
* Hyper-EMG(2,1): :func:`~emgfit.fit_models.emg21`
* Hyper-EMG(2,2): :func:`~emgfit.fit_models.emg22`
* Hyper-EMG(2,3): :func:`~emgfit.fit_models.emg23`
* Hyper-EMG(3,3): :func:`~emgfit.fit_models.emg33`

where the numbers in brackets indicate the negative and positive tail orders,
i.e. the number of exponential tails added to the left and right side of the
peak, respectively. All fit models in `emgfit` are expressed using `lmfit's`
:class:`~lmfit.model.Model` class. This interface is used to define appropriate
parameter bounds and ensure the normalization of the negative and positive tail
weights (`eta_p` and `eta_m` parameters) of Hyper-EMG models. For more details
on the above fit models see the API docs of the :mod:`emgfit.fit_models` module.

Multi-peak fits
---------------
If multiple peaks are to be fitted at once a suitable multi-peak model is
automatically created within the :class:`~emgfit.spectrum.spectrum` class by
adding a suitable number of single-peak models. In multi-peak fits, the values
of the shape (or scale) parameters of all peaks are enforced to be identical,
only the amplitude and position parameters are allowed to differ. In
multi-reflection time-of-flight mass spectrometry the width of peaks acquired
with a given instrumental resolution scales linearly with the peak's centroid
mass. Simultaneously fitting peaks with significantly different mass centroids
therefore requires a mass-dependent rescaling of the shape parameters to the
respective peak's mass. So far analysis practice has shown that the required
scaling corrections for isobaric peaks are significantly smaller than the
typical relative errors of the corresponding shape parameters. Since `emgfit`
(currently) only supports fits of isobaric species such a mass-rescaling has not
been implemented in the package. Support for fitting non-isobaric mass peaks in
the same spectrum might be added in the future.

Peak fitting approach
---------------------
Peak fits with `emgfit` are executed by the internal
:meth:`~emgfit.spectrum.spectrum.peakfit` method which builds on `lmfit's`
:class:`~lmfit.model.Model` interface. However, usually the user only interacts
with higher level methods (e.g. :meth:`~emgfit.spectrum.spectrum.determine_peak_shape`
or :meth:`~emgfit.spectrum.spectrum.fit_peaks`) that internally call
:meth:`~emgfit.spectrum.spectrum.peakfit`. Initial parameter values are defined
as follows:

* The initial peak amplitude (`amp` parameter) is estimated using the number of
  counts in the bin closest to the peak's marker position :attr:`x_pos`. The
  number of counts is converted using a empirically determined conversion factor.
  The conversion factor is somewhat peak-shape dependent but has been found to
  work well for a variety of peak shapes.
* The peak position (`mu` parameter) is initialized at the marker position
  :attr:`x_pos`.
* If the shape parameters have not already been determined in a preceding
  peak-shape calibration there is two possibilities for their initialization.
  By default, a set of suitable initial values is then derived by re-scaling the
  shape parameters for a representative peak at mass unit 100 to the mass of the
  given spectrum. The default parameters at mass 100 u are defined in the
  :func:`emgfit.fit_models.create_default_init_pars` function. Alternatively,
  the shape parameter values can be user-defined by parsing a dictionary with
  the parameter names as keys to the `init_pars` option.

Fits are performed by minimizing either of the following cost functions:

* `chi-square`: This variance weighted cost function is commonly known as
  `Pearson's chi squared statistic` and defined as:

  .. math::

     \chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)+\epsilon},

  where :math:`x_i` and :math:`y_i` denote the center and contained counts of
  the i-th bin, respectively. On each iteration the variances of the residuals
  are estimated using the latest model predictions:
  :math:`\sigma_i^2 \approx f(x_i)`. The inclusion of the small constant
  :math:`\epsilon = 1e-10` ensures numerical robustness as :math:`f(x_i)`
  approaches zero and only causes a negligibly small bias in the parameter
  estimates. The iteratively re-calculated weights result in improved behavior
  in low-count situations.

* `MLE`: With this cost function a binned maximum likelihood estimation is
  performed by minimizing the (doubled) negative log-likelihood ratio, also
  known as `Cash-statistic` [4]_:

  .. math::

     L = 2\sum_i \left[ f(x_i) - y_i + y_i \ln{\left(\frac{y_i}{f(x_i)}\right)} \right].

  The assumption that the counts in each bin follow a Poisson (instead of a
  normal) distribution makes this method applicable to count data with very
  low statistics. When a non-scalar minimization algorithm is used (e.g.
  `least_squares`) the above optimization problem is rephrased into a
  least-squares problem by minimizing the square roots of the (positive
  semidefinite) summands in the above equation. See the notes section of the
  docs of :meth:`~emgfit.spectrum.spectrum.peakfit` for details.

A number of different optimization algorithms are available to perform the
minimization.In principle, any of the algorithms listed under `lmfit's`
`fitting methods`_ can be used by passing the respective method name to the
`method` option if `emgfit's` fitting routines. By default, the `least_squares`
minimizer is used.

.. _`fitting methods`: https://lmfit.github.io/lmfit-py/fitting.html#choosing-different-fitting-methods


.. _`peak-shape calibration`:

Peak-shape calibration
----------------------
The peak-shape calibration is performed with the
:meth:`~emgfit.spectrum.spectrum.determine_peak_shape` method and offers a way
to reduce the number of parameters varied in the peak-of-interest fit(s). This
not only increases the robustness and computational speed of multi-peak fits but
can also enhance the sensitivity for detecting unidentified overlapping peaks.

In the peak-shape calibration an ideally well separated, high-statistics peak is
fitted to obtain a suitable peak shape to describe the data. We refer to all
parameters that determine the shape of a single peak in the absence of background
as *shape parameters*. In the case of a Gaussian peak model the only shape
parameter is given by the standard deviation :math:`\sigma`. The **shape
parameters of a hyper-EMG model function**
are given by:

* the standard deviation :math:`\sigma` of the underlying Gaussian,
* the left-right mixture weight :math:`\Theta`,
* the weights for the positive and negative exponential tails, :math:`\eta_{-i}`
  & :math:`\eta_{+i}`, respectively,
* and their corresponding decay constants :math:`\tau_{-i}` & :math:`\tau_{+i}`,
  respectively,

where i = 1, 2, 3, ... indicates the tail order. `emgfit` assumes
that all peaks in a spectrum have been acquired with a fixed instrumental
resolution and exhibit the same theoretical peak shape. In multi-reflection
time-of-flight mass spectrometry this assumption is not strictly satisfied since
at a given resolving power the peak widths exhibit a linear scaling with mass.
However, since `emgfit` is currently only intended for isobaric peaks the
required scale corrections of shape parameters are usually only on the
sub-percent level and hence negligible compared to the typical uncertainties in
determining these parameters in the shape calibration fit. Therefore, an
**identical peak shape is enforced for all simultaneously fitted peaks**. A
mass-dependent re-scaling of the scale parameters might be added in the future.

Before the peak-shape calibration the user must decide which of the
:ref:`fit_model_list` best describes the data. To aid in this process the
:meth:`~emgfit.spectrum.spectrum.determine_peak_shape` method comes with an
**automatic model selection** feature. Therein, `chi-square` fits with increasingly
complicated model functions are performed on the shape calibration peak,
starting from a regular Gaussian up to Hyper-EMG functions of successively
increasing tail order. To avoid overfitting, models with any best-fit shape
parameters agreeing with zero within 1:math:`\sigma` confidence are excluded
from selection. Amongst the remaining models, the one yielding the lowest
chi-square per degree of freedom is selected. Alternatively, this feature can be
skipped by setting the `vary_tail_order` option to `False` and a peak shape can
be defined manually with the `fit_model` option of
:meth:`~emgfit.spectrum.spectrum.determine_peak_shape`.

Once a peak-shape calibration has been established, all subsequent fits will,
by default, be performed with this fixed peak-shape, only varying the peak
amplitudes, peak positions and (if applicable) the amplitude of the uniform
background. If fits with a varying peak shape are desired the `vary_shape`
option of the :meth:`~emgfit.spectrum.spectrum.peakfit` method must be set to
`True`. The imperfect knowledge of the exact peak shape can be associated with
an additional uncertainty in the determination of the peak's mass centroid and
peak area. To include these contributions in the uncertainty budget, `emgfit`
provides specialized methods to quantify the `Peak-shape uncertainties`_.


.. _recalibration:

Mass recalibration and calculation of final mass values
--------------------------------------------------------
Before being imported into `emgfit` mass spectra must have undergone a
preliminary mass calibration. This initial mass scale will persist
throughout the entire analysis process and will be used as the x-axis for
all plots of spectrum data. In multi-reflection time-of-flight mass spectrometry
the initial mass scale is usually established using the following calibration
equation [5]_:

.. math::

   \frac{m}{z} = c \frac{(t-t_0)^2}{(1+Nb)^2},

where :math:`\frac{m}{z}` denotes the mass-to-charge ratio of the ion, t is
the measured time of flight of the ion :math:`t_0` marks a small time offset due
to electronic delays and N is the number of revolutions the ion has undergone.
Since N is easy to infer, the factors c and b and the time offset :math:`t_0`
remain as the calibration constants to be determined.

There is a number of ways to determine the above calibration constants. To
ensure high precision in the final mass values a second mass calibration - the
so-called `mass re-calibration` - must be performed in `emgfit`. This removes
any systematics that could arise when different procedures are used to determine
the calibrant peak position in the initial calibration and the positions of
peaks of interest in the final fitting [5]_. Further, it renders the specific
choice of the peak position parameter irrelevant as long as the same convention
is followed for all peaks. In fact, instead of using the mean of the full
hyper-EMG distribution (:math:`\mu_\mathrm{emg}`) `emgfit` uses the mean of the
underlying Gaussian (:math:`\mu`) to establish peak positions.

In the mass recalibration a calibrant peak with a well-known (ionic) literature
mass :math:`m_{cal, lit}` is fitted and the obtained peak position
:math:`(m/z)_{cal, fit}` is used to calculate the spectrum's mass recalibration
factor defined as:

.. math::

   \gamma_\mathrm{recal} = \frac{(m/z)_\mathrm{cal,lit}}{(m/z)_\mathrm{cal,fit}}
                         = \frac{m_\mathrm{cal,lit}}{m_\mathrm{cal,fit}},

The calibrant peak can either be fitted individually upfront via the
:meth:`~emgfit.spectrum.spectrum.fit_calibrant`  method or the calibrant fit can
be performed simultaneous with the ion-of-interest fits using the
`index_mass_calib` or `species_mass_calib` options of the
:meth:`~emgfit.spectrum.spectrum.fit_peaks` method. The relative uncertainty of
the recalibration factor ("recalibration uncertainty") is given by the
literature mass uncertainty :math:`\Delta m_\mathrm{cal, lit}` and the
statistical uncertainty of the calibrant fit result
:math:`\Delta m_\mathrm{cal, fit}`:

.. math::

   \frac{\Delta \gamma_\mathrm{recal}}{\gamma_\mathrm{recal}} =
     \sqrt{ \left(\frac{\Delta m_\mathrm{cal, lit}}{m_\mathrm{cal, fit}} \right)^2
          + \left(\frac{\Delta m_\mathrm{cal, fit}}{m_\mathrm{cal, fit}} \right)^2}.

The final ionic masses :attr:`m_ion` are calculated as:

.. math::

   m_\mathrm{ion} = \frac{(m/z)_\mathrm{cal, lit}}{(m/z)_\mathrm{cal, fit}}
                    \cdot (m/z)_\mathrm{fit} \cdot z
                  = \gamma_\mathrm{recal} \cdot (m/z)_\mathrm{fit} \cdot z.

The relative uncertainty of the final mass values is given by adding the
statistical mass uncertainty, the recalibration uncertainty and the peak-shape
mass uncertainty in quadrature:

.. math::

   \frac{\Delta m_\mathrm{ion}}{m_\mathrm{ion}} =
          \sqrt{ \left(\left(\frac{\Delta m}{m}\right)_\mathrm{stat} \right)^2
          + \left(\frac{\Delta \gamma_\mathrm{recal}}{\gamma_\mathrm{recal}} \right)^2
          + \left( \left(\frac{\Delta m}{m}\right)_\mathrm{PS} \right)^2 }.

Note that in the above, :math:`m` refers to ionic rather than atomic masses.
The atomic mass excess (:attr:`atomic_ME_keV` peak attribute) and its
uncertainty are calculated from :math:`m_\mathrm{ion}` from the following
relations:

.. math::

  \mathrm{ME}= m_\mathrm{ion} + z\cdot m_e  - A \cdot u

  \Delta\mathrm{ME} = \mathrm{ME} \cdot
                      \frac{\Delta m_\mathrm{ion}}{m_\mathrm{ion}},

where A denotes the atomic mass number. Note that the above neglects the atomic
binding energy of the stripped electrons, as well as the uncertainties of the
electron mass and the atomic mass unit :math:`u`. Assuming singly or doubly
charged ions, these contributions lie well below 1 keV.


Fitting peaks of interest
-------------------------
Peaks of interest are fitted with the :meth:`~emgfit.spectrum.spectrum.fit_peaks`
method of the spectrum class. By default, this method fits all defined peaks in
the spectrum. Alternatively, a specific mass range or specific neighboring peaks
to fit can be selected. It is the user's choice whether all peaks are treated at
once or whether :meth:`~emgfit.spectrum.spectrum.fit_peaks` is run multiple
times on single peaks or subgroups of peaks.


Estimation of statistical uncertainties
---------------------------------------
With `emgfit` the statistical uncertainties of peak centroids can be estimated
in two different ways:

1. The default approach exploits the scaling of the statistical uncertainty of
   the mean of a Gaussian or hyper-EMG distribution with the number of counts in
   the peak :math:`N_\mathrm{counts}`:

   .. math::

      \sigma_\mathrm{stat} = A_\mathrm{stat} \frac{\mathrm{FWHM}}{\sqrt{N_\mathrm{counts}}}.

   In the case of a Gaussian :math:`A_\mathrm{stat}` is simply given by
   :math:`A_\mathrm{stat,G} = 1/(2\sqrt{2\ln{2}}) = 0.425`. For hyper-EMG
   distributions the respective constant of proportionality :math:`A_\mathrm{stat,emg}`
   is typically larger and depends on the specific peak shape [5]_. `emgfit's`
   :meth:`~emgfit.spectrum.spectrum.determine_A_stat_emg` method can be used to
   estimate :math:`A_\mathrm{stat,emg}` for a specific peak shape via
   non-parametric bootstrapping of a reference peak with decent statistics (see
   method docs for details). The updated :math:`A_\mathrm{stat,emg}` factor will
   be used in subsequent fits to calculate the stat. mass errors with the above
   equation. If :meth:`~emgfit.spectrum.spectrum.determine_A_stat_emg` is not
   run a default value of :math:`A_\mathrm{stat,emg} = 0.52` [5]_ is used.

2. Alternatively, the statistical uncertainty can be estimated after the peak
   fitting with the :meth:`~emgfit.spectrum.spectrum.get_errors_from_resampling`
   method. This routine follows the approach outlined in [5]_ and does not use a
   reference peak but determines the statistical mass uncertainty for each peak
   of interest individually. This is done by re-performing the fit on many
   synthetic spectra obtained by resampling from the best-fit model curve
   (`parametric bootstrap`). Assuming that the data is well-described by the
   chosen fit model  this method yields refined estimates of the statistical
   uncertainties that account for departures from the simple scaling law above
   (as possible e.g. for strongly overlapping peaks).

Since the computational overhead of the second approach is usually rather
small this method is oftentimes preferable. Note that the second method also
yields estimates of the statistical uncertainty of the peak areas, whereas the
first approach only yields stat. mass errors and requires the area errors to be
independently estimated from the covariance matrix provided by lmfit (which can
be problematic for `MLE` fits).


.. _`Peak-shape uncertainties`:

Peak-shape uncertainties
------------------------
Peak-shape uncertainties quantify the effect of shape parameter uncertainties
obtained in a preceding peak-shape calibration on the final mass values and peak
areas obtained in ion-of-interest fits.

When ion-of-interest fits are performed with a fixed peak-shape, the
uncertainties of shape parameters obtained in the peak-shape calibration can
cause additional uncertainties in the final mass and peak area values.
Consequently, these so-called `peak-shape uncertainties` must be carefully
estimated and propagated into the final mass and area uncertainties. `emgfit`
provides two ways to estimate the peak-shape uncertainties of the
peak areas and the mass values `m_ion`:

  1. A quick peak-shape (PS) estimation is automatically performed in fits with
  :meth:`~emgfit.spectrum.spectrum.fit_peaks` and
  :meth:`~emgfit.spectrum.spectrum.fit_calibrant` by calling the internal
  :meth:`~emgfit.spectrum.spectrum._eval_peakshape_errors` method. This routine
  adapts the approach of [5]_ and re-performs a given fit a number of times,
  each time changing a different shape parameter by plus and minus its
  1:math:`\sigma` confidence interval, respectively, while keeping all other
  shape parameters fixed. For each shape parameter, the larger of the two shifts
  in a peak's mass and area is recorded and the peak-shape uncertainty is
  estimated for each peak by summing those values in quadrature. Mind that the
  considered mass shifts are corrected for the respective shifts of the
  calibrant peak position.

  2. The above approach implicitly assumes that the shape parameters follow
  normal posterior distributions and neglects any correlations between shape
  parameters. Since these assumptions are oftentimes violated, refined PS error
  estimates can be obtained with `emgfit's`
  :meth:`~emgfit.spectrum.spectrum.get_MC_peakshape_errors` method. This
  re-performs a given fit many times with a variety of different peak-shapes.
  For the used peak shapes to be representative of all line shapes supported by
  the data the full shape parameter posterior distributions are upfront
  estimated by Markov-Chain Monte Carlo (MCMC) sampling. Assuming a sufficiently
  large subset of these MCMC parameter sets is used to refit the data, the
  resulting PS errors account for complex parameter distributions (typically
  found when a parameter is near its bounds) and parameter correlations. Since
  this approach is computationally expensive it makes heavy use of parallel
  processing. If appropriate MCMC sampling has already been performed in the
  peak-shape calibration (with the `map_par_covar` option) those samples will be
  re-used in the Monte Carlo PS uncertainty estimation. If
  :meth:`~emgfit.spectrum.spectrum.get_MC_peakshape_errors` is run the peak
  properties table is updated with the refined uncertainties and the new
  values are marked in color to clearly indicate the way they were estimated.


Saving fit results
------------------
All critical results obtained in the analysis of a spectrum can be saved with
the :meth:`~emgfit.spectrum.spectrum.save_results` spectrum method. This routine
saves the analysis results to an XLSX-file with three worksheets containing:

1. General properties of the spectrum, such as the input filename, the fit model
   used in the peak-shape calibration and the obtained mass recalibration
   factor. For details on what the respective parameters refer to see the
   attribute list of the :class:`~emgfit.spectrum.spectrum` class.
2. The peak properties table with the attributes of all peaks as well as
   images of all best-fit curves. Check the attribute list of the
   :class:`~emgfit.spectrum.peak` class for short descriptions of what the
   different columns contain.
3. The :attr:`eff_mass_shifts` dictionary holding for each peak the larger of
   the two effective mass shifts obtained when varying each shape parameter by
   +-1:math:`\sigma` in the default peak-shape error estimation. These shifts
   are irrelevant for peaks whose peak-shape uncertainties have been estimated
   with the :meth:`~emgfit.spectrum.spectrum.get_MC_peakshape_errors` routine.

Additionally, the spectrum's peak-shape calibration parameters and their
uncertainties are saved to a separate TXT-file and plots with the obtained fit
curves are saved to PNG-files (optional).

.. _:-notation:

:-notation of chemical substances
---------------------------------

`emgfit` follows the :-notation of chemical compounds. The chemical composition
of an ion is denoted as a single string in which the constituting isotopes are
separated by a colon (``:``). Each isotope is denoted as ``'n(El)A'`` where `El`
is the corresponding element symbol, `n` denotes the number of atoms of the
given isotope and `A` is the respective atomic mass number. In the case
`n = 1`, the number indication `n` can be omitted. The charge state of the ion
is indicated by subtracting the desired number of electrons from the atomic
species (i.e. ``':-1e'`` for singly charged cations, ``':-2e'`` for doubly
charged cations etc.). Once the ionic species of a peak is assigned `emgfit`
automatically fetches the respective literature value from the AME2016_ [6]_
mass database. The subtraction of the electron is important since otherwise the
atomic instead of the ionic mass is used for subsequent calculations. The
calculated literature mass values do not account for electron binding energies
which can in most applications safely be neglected for singly and doubly charged
ions. `emgfit` does currently not interface with an isomer database. However,
isomers can be marked by appending an ``'m'`` or ``'m0'`` up to ``'m9'`` to the
end of an isotope substring (see last example below). The literature mass (and
mass error) of an isomer are automatically calculated from the respective
ground-state AME mass when the excitation energy is passed to the `Ex`
(and `Ex_error`) option of the relevant spectrum methods.

Examples:

- The most abundant isotope of the hydronium cation :math:`H_{3}O^{+}` can be
  denoted as ``'3H1:1O16:-1e'`` or ``'3H1:O16:-e'`` or ``'1O16:3H1:-1e'`` or ...
- The most abundant isotope of the ammonium cation :math:`N H_{4}^{+}` can be
  denoted as ``'4H1:1N14:-1e'`` or ``'4H1:N14:-e'`` or ``'N14:4H1:-1e'`` or ...
- The proton is denoted as ``'1H1:-1e'`` or ``'H1:-1e'`` or ``'H1:-e'``.
- A Indium-127 ion in the second isomeric state is denoted as ``'1In127m1:-1e'``


Creating simulated spectra
--------------------------
The functions in the :mod:`emgfit.sample` module allow the fast creation of
synthetic spectrum data by extending inverse transform sampling with `Scipy's`
:class:`~scipy.stats._continuous_distns.exponnorm` class to hyper-EMG
distributions. This can serve as a valuable tool for Monte Carlo studies with
count data.


Blind analysis
--------------
Premature comparison of fit results to literature values can lead to biased
results. To avoid that user bias (consciously or unconsciously) enters the final
mass values `emgfit` incorporates the option to blind the obtained mass values
and peak positions during the analysis process. Blindfolding is activated with
the :meth:`~emgfit.spectrum.spectrum.set_blinded_peaks` method. The option to
only blind specific peaks of interest leaves the possibility to use less
interesting peaks with well-known literature masses as accuracy checks. The
blinding is automatically lifted once the processing of the spectrum is
finalized and the results are exported.


.. _AME2016: http://amdc.in2p3.fr/web/masseval.html

References
----------
.. [1] Purushothaman, S., et al. "Hyper-EMG: A new probability distribution
   function composed of Exponentially Modified Gaussian distributions to analyze
   asymmetric peak shapes in high-resolution time-of-flight mass spectrometry."
   International Journal of Mass Spectrometry 421 (2017): 245-254.

.. [2] Pommé, S., and B. Caro Marroyo. "Improved peak shape fitting in alpha
   spectra." Applied Radiation and Isotopes 96 (2015): 148-153.

.. [3] Naish, Pamela J., and S. Hartwell. "Exponentially modified Gaussian
   functions — a good model for chromatographic peaks in isocratic HPLC?."
   Chromatographia 26.1 (1988): 285-296.

.. [4] Cash, Webster. "Parameter estimation in astronomy through application of
   the likelihood ratio." The Astrophysical Journal 228 (1979): 939-947.

.. [5] San Andrés, Samuel Ayet, et al. "High-resolution, accurate
  multiple-reflection time-of-flight mass spectrometry for short-lived,
  exotic nuclei of a few events in their ground and low-lying isomeric
  states." Physical Review C 99.6 (2019): 064313.

.. [6] Wang, M., et al. "The AME2016 atomic mass evaluation (II). Tables, graphs
   and references." Chinese Physics C 41.3 (2017): 030003.