Calculates the AME mass, AME mass error, the extrapolation flag and the
mass number A of the given atomic or molecular species.
Parameters:
species (str) – String with name of species to grab AME values for.
Ex (float [keV], optional, default: 0.0) – Isomer excitation energy in keV to add to ground-state literature
mass.
Ex_error (float [keV], optional, default: 0.0) – Uncertainty of isomer excitation energy in keV to add in quadrature
to ground-state literature mass uncertainty.
src (str, optional, default: AME2020) – Source of literature data (‘AME2016’ or ‘AME2020’).
Atoms with the nucleus in an isomeric state are flagged by appending an ‘m’
or any of ‘m0’, ‘m1’, …, ‘m9’ to the atom’s substring. For species with
isomers the literature mass values are only returned if the excitation
energy is user-specified with the Ex argument. In this case, Ex is
added to the AME value for the ground state mass and Ex_error is added in
quadrature to the respective AME uncertainty.
The atomic binding energy of the stripped-off electrons and the
uncertainty of the electron mass are both neglected in the calculation of the
ionic AME mass.
Returns:
List containing relevant AME data for species:
(AME mass [u], AME mass uncertainty [u],
boolean flag for extrapolated species, atomic mass number)
The ionic charge state is defined by subtracting the desired number of
electrons from the atomic species (i.e. ':-1e' for singly charged
cations, ':-2e' for doubly charged cations etc.).
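The arithmetic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not emgfit's implementation: the helper name `ionic_lit_mass` and the ground-state input values are hypothetical, and the conversion constants are the CODATA 2018 values.

```python
import math

U_TO_KEV = 931494.10242   # 1 u expressed in keV/c^2 (CODATA 2018)
M_E = 548.579909065e-6    # electron mass in u (CODATA 2018)

def ionic_lit_mass(m_gs, m_gs_error, n_electrons_removed=1, Ex=0.0, Ex_error=0.0):
    """Return (mass [u], uncertainty [u]) of an ion, optionally in an isomeric state.

    Hypothetical helper: Ex [keV] is added to the ground-state mass, Ex_error
    [keV] is added in quadrature to its uncertainty, and the electron binding
    energy and the electron-mass uncertainty are neglected.
    """
    m = m_gs + Ex / U_TO_KEV - n_electrons_removed * M_E
    m_error = math.sqrt(m_gs_error**2 + (Ex_error / U_TO_KEV)**2)
    return m, m_error

# Illustrative (made-up) atomic ground-state mass and uncertainty in u
m, dm = ionic_lit_mass(99.0, 3.0e-07, n_electrons_removed=1, Ex=100.0, Ex_error=4.0)
```

Because the electron-mass uncertainty is neglected, the returned uncertainty does not depend on the charge state.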
emgfit.ame_funcs.mdata_AME(El, A, src='AME2020')
Grabs atomic mass data from the atomic mass evaluation (AME).
Hyper-exponentially-modified Gaussian distribution (hyper-EMG).
The lengths of li_eta_m & li_tau_m must match and define the order of
negative tails. Likewise, the lengths of li_eta_p & li_tau_p must match
and define the order of positive tails.
Parameters:
x (numpy.ndarray of floats) – Abscissa data (mass data).
mu (float) – Mean value of underlying Gaussian distribution.
sigma (float >= 0) – Standard deviation of underlying Gaussian distribution.
The Hyper-EMG probability distribution function was first introduced in
this publication by Purushothaman et al. [3]. The basic definitions and
notations used here are adapted from this work.
The total hyper-EMG distribution h_emg is composed of the negative- and
positive-skewed EMG distributions h_m_emg and h_p_emg, respectively, and
is calculated as:
h_emg(x,mu,sigma,theta,li_eta_m,li_tau_m,li_eta_p,li_tau_p)=theta*h_m_emg(x,mu,sigma,li_eta_m,li_tau_m)+(1-theta)*h_p_emg(x,mu,sigma,li_eta_p,li_tau_p).
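The mixture above can be sketched in plain Python as follows. The function names mirror the documented signatures, but the bodies are an illustrative assumption: they use the naive exp(u)*erfc(v) form, which is only safe for moderate arguments, and they assume that the eta weights on each side sum to one.

```python
import math

def h_m_emg(x, mu, sigma, li_eta_m, li_tau_m):
    """Negative-skewed EMG: weighted sum of left-tailed EMG components."""
    if len(li_eta_m) != len(li_tau_m):
        raise ValueError("li_eta_m and li_tau_m must have equal lengths")
    total = 0.0
    for eta, tau in zip(li_eta_m, li_tau_m):
        u = (sigma / (math.sqrt(2) * tau)) ** 2 + (x - mu) / tau
        v = sigma / (math.sqrt(2) * tau) + (x - mu) / (math.sqrt(2) * sigma)
        total += eta / (2 * tau) * math.exp(u) * math.erfc(v)
    return total

def h_p_emg(x, mu, sigma, li_eta_p, li_tau_p):
    """Positive-skewed EMG: weighted sum of right-tailed EMG components."""
    if len(li_eta_p) != len(li_tau_p):
        raise ValueError("li_eta_p and li_tau_p must have equal lengths")
    total = 0.0
    for eta, tau in zip(li_eta_p, li_tau_p):
        u = (sigma / (math.sqrt(2) * tau)) ** 2 - (x - mu) / tau
        v = sigma / (math.sqrt(2) * tau) - (x - mu) / (math.sqrt(2) * sigma)
        total += eta / (2 * tau) * math.exp(u) * math.erfc(v)
    return total

def h_emg(x, mu, sigma, theta, li_eta_m, li_tau_m, li_eta_p, li_tau_p):
    """Hyper-EMG: theta-weighted mixture of the negative and positive parts."""
    return (theta * h_m_emg(x, mu, sigma, li_eta_m, li_tau_m)
            + (1 - theta) * h_p_emg(x, mu, sigma, li_eta_p, li_tau_p))

# Hypothetical shape parameters: two negative tails and one positive tail
pars = dict(mu=0.0, sigma=1.0, theta=0.6,
            li_eta_m=[0.7, 0.3], li_tau_m=[1.0, 4.0],
            li_eta_p=[1.0], li_tau_p=[2.0])
density_at_centroid = h_emg(0.0, **pars)
```

With normalized eta weights the mixture integrates to one, which gives a quick sanity check for any chosen parameter set.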
The Hyper-EMG probability distribution function was first introduced in
this publication by Purushothaman et al. [4]. The basic definitions and
notations used here are adapted from this work.
Each negative tail of a Hyper-EMG function can be expressed in two
equivalent ways:
where \(u = \left(\frac{\sigma}{\sqrt{2}\tau_{-i}}\right)^2 + \frac{x-\mu}{\tau_{-i}}\)
and \(v = \frac{\sigma}{\sqrt{2}\tau_{-i}} + \frac{x-\mu}{\sqrt{2}\sigma}\).
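The defining equation is not reproduced above, so the identity linking the two forms is written out here. This is a reconstruction from the definitions of u and v, assuming a single tail with unit weight and the usual EMG prefactor \(1/(2\tau_{-i})\); the symbol \(h_{-i}\) is used for illustration only.

```latex
h_{-i}(x) = \frac{1}{2\tau_{-i}} \exp(u)\,\mathrm{erfc}(v)
          = \frac{1}{2\tau_{-i}}
            \exp\!\left(-\left(\frac{x-\mu}{\sqrt{2}\sigma}\right)^{2}\right)
            \mathrm{erfcx}(v),
\qquad\text{since}\quad
\mathrm{erfcx}(v) = e^{v^{2}}\,\mathrm{erfc}(v)
\quad\text{and}\quad
u - v^{2} = -\left(\frac{x-\mu}{\sqrt{2}\sigma}\right)^{2}.
```

The last relation follows by expanding \(v^{2}\): the cross term \((x-\mu)/\tau_{-i}\) cancels against the same term in \(u\), as does \(\left(\sigma/(\sqrt{2}\tau_{-i})\right)^{2}\).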
In double precision, the exp(u) routine overflows if u > 709.78. The
complementary error function erfc(v) underflows to 0.0 if v > 26.54. The
scaled complementary error function erfcx(v) overflows if v < -26.62. To
circumvent those scenarios and always ensure an exact result, the underlying
helper function for the calculation of a negative EMG tail h_m_i()
uses the formulation in terms of erfcx whenever v >= 0 and switches to the
erfc-formulation when v < 0.
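The failure modes and the branching rule quoted above can be demonstrated with the standard library alone. This is a sketch, not emgfit's h_m_i(): the helper names and the peak parameters are hypothetical, and the scaled function erfcx itself is not part of Python's math module (SciPy provides it as scipy.special.erfcx).

```python
import math

def exp_overflows(u):
    """Return True if math.exp(u) overflows in double precision."""
    try:
        math.exp(u)
    except OverflowError:
        return True
    return False

def pick_formulation(v):
    """Select the tail formulation following the rule quoted above."""
    return "erfcx" if v >= 0 else "erfc"

# Hypothetical narrow peak with a short negative tail, evaluated on its right flank
sigma, tau, mu, x = 1.0e-4, 1.0e-5, 100.0, 100.01
u = (sigma / (math.sqrt(2) * tau)) ** 2 + (x - mu) / tau
v = sigma / (math.sqrt(2) * tau) + (x - mu) / (math.sqrt(2) * sigma)

naive_form_fails = exp_overflows(u)   # exp(u) cannot be represented here
erfc_is_zero = math.erfc(v) == 0.0    # erfc(v) has underflowed to zero
branch = pick_formulation(v)          # the erfcx branch avoids both problems
```

In this regime the naive product pairs an overflowing factor with a vanishing one, whereas the erfcx form multiplies two well-behaved numbers.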
The Hyper-EMG probability distribution function was first introduced in
this publication by Purushothaman et al. [5]. The basic definitions and
notations used here are adapted from this work.
Each positive tail of a Hyper-EMG function can be expressed in two
equivalent ways:
where \(u = \left(\frac{\sigma}{\sqrt{2}\tau_{+i}}\right)^2 - \frac{x-\mu}{\tau_{+i}}\)
and \(v = \frac{\sigma}{\sqrt{2}\tau_{+i}} - \frac{x-\mu}{\sqrt{2}\sigma}\).
In double precision, the exp(u) routine overflows if u > 709.78. The
complementary error function erfc(v) underflows to 0.0 if v > 26.54. The
scaled complementary error function erfcx(v) overflows if v < -26.62. To
circumvent those scenarios and always ensure an exact result, the underlying
helper function for the calculation of a positive EMG tail h_p_i()
uses the formulation in terms of erfcx whenever v >= 0 and switches to the
erfc-formulation when v < 0.
The lengths of li_eta_m & li_tau_m must match and define the order of
negative tails. Likewise, the lengths of li_eta_p & li_tau_p must match
and define the order of positive tails.
Parameters:
mu (float) – Mean value of underlying Gaussian distribution.
The Hyper-EMG probability distribution function was first introduced in
this publication by Purushothaman et al. [6]. The basic definitions and
notations used here are adapted from this work.
Calculate standard deviation of hyper-EMG distribution.
The lengths of li_eta_m & li_tau_m must match and define the order of
negative tails. Likewise, the lengths of li_eta_p & li_tau_p must match
and define the order of positive tails.
Parameters:
sigma (float >= 0) – Standard deviation of underlying Gaussian distribution.
The Hyper-EMG probability distribution function was first introduced in
this publication by Purushothaman et al. [7]. The basic definitions and
notations used here are adapted from this work.
Calculate the mean of a Hyper-EMG peak from the best-fit parameters.
Parameters:
fit_model – Name of fit model used to obtain pars.
pars (emgfit.parameter.Parameters, optional) – Parameters object to obtain shape parameters for calculation from.
This argument can be used when no fit result is available.
pref (str, optional) – Prefix of peak parameters of interest.
peak_index (int) – Index of the peak corresponding to model.
index_ref_peak (int) – Index of the peak to be used as
shape-reference peak.
scale_shape_pars (bool) – Whether to scale the scale-dependent shape parameters adopted from the
shape-reference peak.
scl_coeff (float, optional) – Constant coefficient used to scale the scale-dependent shape parameters.
mu_ref (float or str, optional) – Centroid of the underlying Gaussian of the shape-reference peak. This
argument is only relevant when scale_shape_pars=True. If mu_ref is
set to a float, this number is used as the fixed (Gaussian) reference
centroid for calculating the scale factor. If mu_ref=”varying”,
mu_ref is set to the varying (Gaussian) centroid of the shape-reference peak.
Notes
If scale_shape_pars is True, the model’s scale-dependent shape parameters
are multiplied by the scale factor scl_fac=scl_coeff. If, in addition,
mu_ref is not None, the scale-dependent shape parameters are instead multiplied
by the scale factor scl_fac=mu/mu_ref*scl_coeff, where mu is the
centroid of the Gaussian underlying the specified model. When the
reference peak is among the peaks to be fitted, scl_fac will then be
dynamically re-calculated from the mu and mu_ref values of a given
iteration in a fit.
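The scale-factor rule in these notes can be written as a small function. This is a minimal sketch; the helper name `scale_factor` and the example tail parameter are hypothetical and not part of emgfit.

```python
def scale_factor(scl_coeff=1.0, mu=None, mu_ref=None, scale_shape_pars=True):
    """Return the factor applied to scale-dependent shape parameters."""
    if not scale_shape_pars:
        return 1.0                      # no scaling requested
    if mu_ref is None:
        return scl_coeff                # constant coefficient only
    return mu / mu_ref * scl_coeff      # additionally scale with the centroid ratio

# Hypothetical tail parameter calibrated on a reference peak at mu_ref = 100 u
tau_ref = 2.0e-4
tau_scaled = tau_ref * scale_factor(scl_coeff=1.0, mu=200.0, mu_ref=100.0)
```

When mu and mu_ref both vary in a fit, the factor would simply be re-evaluated with the current values in each iteration.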
The default parameters were defined for mass 100. To obtain suitable
parameters at other masses, all mass-dependent parameters (i.e. shape
parameters & amp) are multiplied by the scaling factor mass_number/100.
The standard deviation of the underlying Gaussian \(\sigma\) is
calculated as \(\sigma = A / (2 \sqrt{2 \ln 2} \, R)\), where
\(A\) denotes the specified mass_number and \(R\) is the given
resolving_power.
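As a worked example of the two relations above, the following sketch converts a resolving power into a Gaussian sigma and rescales a default defined at mass 100. The helper names are hypothetical.

```python
import math

def sigma_from_resolving_power(mass_number, resolving_power):
    """Gaussian sigma from the FWHM = A / R using FWHM = 2*sqrt(2*ln 2)*sigma."""
    fwhm = mass_number / resolving_power
    return fwhm / (2 * math.sqrt(2 * math.log(2)))

def scale_default(value_at_mass_100, mass_number):
    """Rescale a mass-dependent default parameter defined at mass 100."""
    return value_at_mass_100 * mass_number / 100

sigma = sigma_from_resolving_power(100, 1.0e5)   # FWHM of 1e-3 u
tau_at_200 = scale_default(0.5, 200)             # hypothetical default value
```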
init_pars (dict) – Initial shape parameters of the reference peak (‘amp’ and ‘mu’
parameters in init_pars dictionary are updated with the given values
for amp0 and mu0).
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta,
eta’s and tau’s).
kws – Keyword arguments to pass to EMGModel interface.
Estimate initial value of Gaussian centroid mu from the peak’s mode
Parameters:
x_m (float) – Mode (i.e. x-position of the maximum) of the distribution.
init_pars (dict) – Dictionary with initial values of the shape parameters.
fit_model (str) – Name of used fit model (e.g. ‘emg01’, ‘emg10’, ‘emg12’…).
Notes
For a Gaussian, the mean mu is simply equal to the x_m argument.
For highly asymmetric hyper-EMG distributions the centroid of the underlying
Gaussian mu can strongly deviate from the mode \(x_{m}\) (i.e. the
x-position of the peak maximum). Hence, the initial Gaussian centroid mu
(\(\mu\)) is calculated by rearranging the equation for the mode of the
hyper-EMG distribution:
where the mode \(x_{m}\) can be estimated by the peak marker position
x_pos and emgfit.fit_models.erfcxinv() is the inverse of the scaled
complementary error function.
This class inherits from the lmfit.model.Model class and creates a
single-peak model. This class enables overriding of lmfit’s default
residuals with emgfit’s custom cost functions, thereby enabling fits beyond
standard least squares minimization.
The model function will normally take an independent variable
(generally, the first argument) and a series of arguments that are
meant to be parameters for the model. It will return an array of
data to model some data as for a curve-fitting problem.
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to
modify or add model parameters. The keys of the par_hint_args
dictionary specify parameter names; the values must likewise be
dictionaries that hold the respective keyword arguments to pass to
set_param_hint().
vary_baseline (bool, optional, default: True) – If True, the constant background will be fitted with a varying
uniform baseline parameter bkg_c.
If False, the baseline parameter bkg_c will be fixed to 0.
vary_shape (bool, optional, default: False) – If False, peak-shape parameters of hyper-EMG models (sigma,
theta, etas and taus) are kept fixed at their initial values.
If True, the shared shape parameters are varied (ensuring
identical shape parameters for all peaks).
independent_vars (list of str, optional) – Arguments to func that are independent variables (default is
None).
param_names (list of str, optional) – Names of arguments to func that are to be made into
parameters (default is None).
nan_policy ({'raise', 'propagate', 'omit'}, optional) – How to handle NaN and missing values in data. See Notes below.
prefix (str, optional) – Prefix used for the model.
name (str, optional) – Name for the model. When None (default) the name is the same
as the model function (func).
**kws (dict, optional) – Additional keyword arguments to pass to model function.
Notes
1. Parameter names are inferred from the function arguments, and a
residual function is automatically constructed.
2. The model function must return an array that will be the same
size as the data being modeled.
3. nan_policy sets what to do when a NaN or missing value is
seen in the data. Should be one of:
‘raise’ : raise a ValueError (default)
‘propagate’ : do nothing
‘omit’ : drop missing data
Examples
The model function will normally take an independent variable
(generally, the first argument) and a series of arguments that are
meant to be parameters for the model. Thus, a simple peak using a
Gaussian defined as:
params (Parameters, optional) – Parameters to use in fit (default is None).
weights (array_like, optional) – Weights to use for the calculation of the fit residual [i.e.,
weights*(data-fit)]. Default is None; must have the same size as
data.
method (str, optional) – Name of fitting method to use (default is ‘least_squares’).
iter_cb (callable, optional) – Callback function to call at each iteration (default is None).
scale_covar (bool, optional) – Whether to automatically scale the covariance matrix when
calculating uncertainties (default is True).
verbose (bool, optional) – Whether to print a message when a new parameter is added
because of a hint (default is True).
fit_kws (dict, optional) – Options to pass to the minimizer being used.
nan_policy ({'raise', 'propagate', 'omit'}, optional) – What to do when encountering NaNs when fitting Model.
calc_covar (bool, optional) – Whether to calculate the covariance matrix (default is True)
for solvers other than ‘leastsq’ and ‘least_squares’.
Requires the numdifftools package to be installed.
max_nfev (int or None, optional) – Maximum number of function evaluations (default is None). The
default value depends on the fitting method.
coerce_farray (bool, optional) – Whether to coerce data and independent data to be ndarrays
with dtype of float64 (or complex128). If set to False, data
and independent data are not coerced at all, but the output of
the model function will be. (default is True)
**kwargs (optional) – Arguments to pass to the model function, possibly overriding
parameters.
1. if params is None, the values for all parameters are expected
to be provided as keyword arguments. Mixing params and
keyword arguments is deprecated (see Model.eval).
2. all non-parameter arguments for the model function, including
all the independent variables will need to be passed in using
keyword arguments.
3. Parameters are copied on input, so that the original Parameter objects
are unchanged, and the updated values are in the returned EMGModelResult.
Examples
Take t to be the independent variable and data to be the curve
we will fit. Use keyword arguments to set initial guesses:
params (Parameters) – Parameters with initial values for model.
data (array_like) – Ordinate values of data to be modeled.
weights (array_like, optional) – Weights to multiply (data-model) for default fit residual.
fitted_peaks (list of emgfit.spectrum.spectrum.peak) – List of fitted peak objects.
method (str, optional) – Name of minimization method to use (default is ‘least_squares’).
fcn_args (sequence, optional) – Positional arguments to send to model function.
fcn_kws (dict, optional) – Keyword arguments to send to model function.
iter_cb (callable, optional) – Function to call on each iteration of fit.
scale_covar (bool, optional) – Whether to scale covariance matrix for uncertainty evaluation.
nan_policy ({'raise', 'propagate', 'omit'}, optional) – What to do when encountering NaNs when fitting Model.
calc_covar (bool, optional) – Whether to calculate the covariance matrix (default is True)
for solvers other than ‘leastsq’ and ‘least_squares’.
Requires the numdifftools package to be installed.
max_nfev (int or None, optional) – Maximum number of function evaluations (default is None). The
default value depends on the fitting method.
**fit_kws (optional) – Keyword arguments to send to minimization routine.
Fit spectra simulated via sampling from a reference distribution
This function performs fits of many simulated spectra. The simulated spectra
are created by sampling events from the best-fit PDF associated with
fit_result (as e.g. needed for a parametric bootstrap). The alt_result argument
can be used to perform the fits with a different distribution than the
reference PDF used for the event sampling.
fit_result (lmfit.model.ModelResult) – Fit result object holding the best-fit distribution to sample from.
alt_result (lmfit.model.ModelResult, optional) – Fit result object holding a prepared fit model to be used for the
fitting. Defaults to the fit model stored in fit_result.
N_spectra (int, optional) – Number of simulated spectra to fit. Defaults to 1000, which
typically yields statistical uncertainty estimates with a Monte Carlo
uncertainty of a few percent.
randomize_ref_mus_and_amps (bool, default: False) – If True, the peak and background amplitudes and the peak centroids of
the reference spectrum to sample from will be varied assuming normal
distributions around the best-fit values with standard deviations given
by the respective standard errors stored in fit_result.
MC_shape_par_samples (pandas.DataFrame) – Monte Carlo shape parameter samples to use in the fitting.
seed (int, optional) – Random seed to use for reproducible sampling.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated
spectra. When set to -1 (default) all available cores are used.
Returns:
MinimizerResults obtained in the fits of the simulated spectra.
The randomize_ref_mus_and_amps option allows one to propagate systematic
uncertainties in the determination of the reference parameters into the
Monte Carlo results. If varying the centroid and amplitude parameters of the
reference spectrum, the standard deviations of the parameter distributions
will be taken as the respective standard errors determined by lmfit (see
lmfit fit report table) and might not be consistent with the area and mass
uncertainty estimates shown in the peak properties table.
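The randomization step described above amounts to drawing each reference parameter from a normal distribution centred on its best-fit value. The sketch below illustrates this with the standard library; the function name, parameter names and numbers are hypothetical and do not reflect emgfit's internals.

```python
import random

def randomize_parameters(best_fit, std_errors, seed=None):
    """Draw each parameter from N(best-fit value, standard error)."""
    rng = random.Random(seed)   # a fixed seed makes the draw reproducible
    return {name: rng.gauss(value, std_errors[name])
            for name, value in best_fit.items()}

# Hypothetical best-fit centroid, amplitude and background with standard errors
best_fit = {"p0_mu": 99.9124, "p0_amp": 0.85, "bkg_c": 0.4}
std_errors = {"p0_mu": 2.0e-6, "p0_amp": 0.03, "bkg_c": 0.1}
reference_pars = randomize_parameters(best_fit, std_errors, seed=1)
```

Each simulated spectrum would then be sampled from a reference PDF built from one such randomized parameter set.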
t_args (list of lists of float) – List containing lists of the EMG tail parameters with the signature:
[[eta_m1, eta_m2, …], [tau_m1, tau_m2, …], [eta_p1, eta_p2, …],
[tau_p1, tau_p2, …]]
N_samples (int, optional, default: 1) – Number of random events to sample.
Create simulated spectrum via non-parametric resampling from df.
The simulated data is obtained through resampling from the specified dataset
with replacement.
Parameters:
df (pandas.DataFrame) – Original histogrammed spectrum data to re-sample from.
N_events (int, optional) – Number of events to create via non-parametric re-sampling, defaults to
number of events in original DataFrame df.
x_cen (float [u/z], optional) – Center of mass range to re-sample from. If None, re-sample from
full mass range of input data df.
x_range (float [u/z], optional) – Width of mass range to re-sample from. Defaults to 0.02 u. x_range
is only relevant if a x_cen argument is specified.
out (str, optional) – Output format of sampled data. Options:
'hist' for binned mass spectrum (default). The centres of the mass
bins must be specified with the bin_cens argument.
'array' for unbinned array of single ion and background events.
Returns:
If out=’hist’ a dataframe with a histogram of the format
[bin centre, counts in bin] is returned. If out=’array’ an unbinned
array with the x-values of single ion or background events is returned.
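Resampling a histogram with replacement can be sketched as below. This is a standard-library illustration that works on plain lists instead of a pandas.DataFrame; the function name is hypothetical.

```python
import random
from collections import Counter

def resample_histogram(bin_centres, counts, N_events=None, seed=None):
    """Draw events with replacement from a histogram and re-bin them."""
    rng = random.Random(seed)
    if N_events is None:
        N_events = sum(counts)          # default: same size as the original data
    # Each draw picks a bin with probability proportional to its counts
    draws = rng.choices(bin_centres, weights=counts, k=N_events)
    tally = Counter(draws)
    return [tally.get(centre, 0) for centre in bin_centres]

bin_centres = [99.990, 99.995, 100.000, 100.005]
counts = [3, 12, 40, 0]
resampled = resample_histogram(bin_centres, counts, seed=42)
```

Bins that are empty in the original data stay empty, because no original event can be drawn from them.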
Create simulated detector events drawn from a user-defined probability
density function (PDF)
Events can either be output as a list of single events (mass stamps) or as a
histogram. In histogram output mode, uniform binning is easily realized by
specifying the N_bins argument. More control over the binning can be
achieved by passing the desired bin centers to the bin_cens argument (e.g.
for non-uniform binning).
Parameters:
shape_pars (dict) – Peak-shape parameters to use for sampling. The dictionary must follow
the structure of the shape_cal_pars attribute of the
spectrum class.
mus (float or list of float) – Nominal peak positions of peaks in simulated spectrum.
amps (float or list of float [(counts in peak)*(bin width in u)]) – Nominal amplitudes of peaks in simulated spectrum.
bkg_c (float [counts per bin], optional, default: 0.0) – Nominal amplitude of uniform background in simulated spectrum.
scl_facs (float or list of float, optional) – Scale factors to use for scaling the scale-dependent shape parameters in
shape_pars to a given peak before sampling events. If None, no
shape-parameter scaling is applied.
N_events (int, optional, default: 1000) – Total number of events to simulate (signal and background events).
out (str, optional) – Output format of sampled data. Options:
'hist' for binned mass spectrum (default). The centres of the mass
bins must be specified with the bin_cens argument.
'array' for unbinned array of single ion and background events.
N_bins (int, optional) – Number of uniform bins to use in 'hist' output mode. The outer
edges of the first and last bin are fixed to the start and end of the
sampling range respectively (i.e. x_min and x_max). In between, bins
are distributed with a fixed spacing of (x_max-x_min)/N_bins.
bin_cens (numpy.ndarray) – Centres of bins to use in 'hist' output mode. This argument
allows the realization of non-uniform binning. Bin edges are centred
between neighboring bins. Note: Bins outside the sampling range defined
with x_min and x_max will be empty.
Returns:
If out=’hist’ a dataframe with a histogram of the format
[bin centre, counts in bin] is returned. If out=’array’ an unbinned
array with the x-values of single ion or background events is returned.
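The two binning modes described for N_bins and bin_cens can be sketched as follows. The helper names are hypothetical, and the treatment of the two outermost edges in `edges_from_centres` (mirroring the nearest inner edge) is an assumption, since the text only specifies that edges lie midway between neighbouring bin centres.

```python
def uniform_bin_edges(x_min, x_max, N_bins):
    """Edges of N_bins uniform bins spanning the sampling range."""
    width = (x_max - x_min) / N_bins
    return [x_min + i * width for i in range(N_bins + 1)]

def edges_from_centres(bin_cens):
    """Edges midway between neighbouring centres (outer edges mirrored)."""
    inner = [(a + b) / 2 for a, b in zip(bin_cens[:-1], bin_cens[1:])]
    first = bin_cens[0] - (inner[0] - bin_cens[0])      # assumption
    last = bin_cens[-1] + (bin_cens[-1] - inner[-1])    # assumption
    return [first] + inner + [last]

uniform_edges = uniform_bin_edges(0.0, 1.0, 4)
nonuniform_edges = edges_from_centres([1.0, 2.0, 4.0])
```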
Random events are created via custom hyper-EMG extensions of Scipy’s
scipy.stats.exponnorm.rvs() method.
Currently, all simulated peaks have identical width and shape (no re-scaling
of mass-dependent shape parameters to a peak’s mass centroid).
Mind the different units for peak amplitudes amps
(<counts in peak> * <bin width in x-axis units>) and the background level
bkg_c (counts per bin). When spectrum data is simulated, counts are
distributed between the different peaks and the background with probability
weights amps / <bin width in x-axis units> and bkg_c * <number of bins>,
respectively. As a consequence, simply changing N_events (while keeping
all other arguments constant) will cause amps and bkg_c to deviate from
their nominal values.
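The weighting described above can be made concrete with a short calculation. The helper name and all numbers are hypothetical; the sketch only shows how the two differently defined quantities are put on a common footing (expected counts) before they are normalized.

```python
def event_probabilities(amps, bkg_c, bin_width, N_bins):
    """Probabilities for an event to belong to each peak or to the background."""
    weights = [amp / bin_width for amp in amps]   # expected counts per peak
    weights.append(bkg_c * N_bins)                # expected background counts
    total = sum(weights)
    return [w / total for w in weights]

# Two peaks with nominally 800 and 200 counts, plus 0.5 counts/bin over 200 bins
probs = event_probabilities(amps=[0.8, 0.2], bkg_c=0.5, bin_width=0.001, N_bins=200)
nominal_total = 800 + 200 + 100
expected_counts = [p * nominal_total for p in probs]
```

Only when N_events equals the nominal total (1100 here) do the expected counts reproduce the nominal amplitudes and background level; any other N_events rescales all of them by the same factor.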
Create a simulated spectrum using the attributes of a reference spectrum
The peak shape of the sampling probability density function (PDF)
follows the shape calibration of the reference spectrum (spec). By
default, all other parameters of the sampling PDF are identical to the
best-fit parameters of the reference spectrum. If desired, the positions,
amplitudes and number of peaks in the sampling PDF as well as the background
level can be changed with the mus, amps and bkg_c arguments.
Parameters:
spec (spectrum) – Reference spectrum object whose best-fit parameters will be used to
sample from.
mus (float or list of float, optional) – Nominal peak centres of peaks in simulated spectrum. Defaults to the
mus of the reference spectrum fit.
amps (float or list of float [(counts in peak)*(bin width in u)], optional) – Nominal amplitudes of peaks in simulated spectrum. Defaults to the
amplitudes of the reference spectrum fit.
scl_facs (float or list of float, optional) – Scale factors to use for scaling the scale-dependent shape parameters in
shape_pars to a given peak before sampling events. Defaults to the
scale factors associated with the fit results stored in the reference
spectrum’s spectrum.fit_results attribute.
bkg_c (float [counts per bin], optional) – Nominal amplitude of uniform background in simulated spectrum. Defaults
to the bkg_c obtained in the fit of the first peak in the reference
spectrum.
x_cen (float, optional) – Center of simulated x-range. Defaults to x_cen of spec.
x_range (float, optional) – Covered x-range of simulated spectrum. Defaults to x_range of
spec.
N_events (int, optional) – Number of ion events to simulate (including background events). Defaults
to total number of events in spec.
copy_spec (bool, optional, default: False) – If False (default), this function returns a fresh
spectrum object created from the simulated
mass data. If True, this function returns an exact copy of spec with
only the data attribute replaced by the new simulated mass data.
Returns:
If copy_spec = False (default) a fresh spectrum object holding the
simulated mass data is returned. If copy_spec = True, a copy of the
reference spectrum spec is returned with only the data
attribute replaced by the new simulated mass data.
Random events are created via custom Hyper-EMG extensions of Scipy’s
scipy.stats.exponnorm.rvs() method.
The returned spectrum follows the binning of the reference spectrum.
Mind the different units for peak amplitudes amps
(<counts in peak> * <bin width in x-axis units>) and the background level
bkg_c (counts per bin). When spectrum data is simulated, counts are
distributed between the different peaks and the background with probability
weights amps / <bin width in x-axis units> and bkg_c * <number of bins>,
respectively. As a consequence, simply changing N_events (while keeping
all other arguments constant) will cause amps and bkg_c to deviate from
their nominal values.
x_pos (float [u/z]) – Coarse position of peak centroid. In fits the Hyper-EMG parameter for
the underlying Gaussian peak centroid mu will be initialized at this
value. Peak markers in plots are located at x_pos.
species (str) – String with chemical formula of ion species associated with peak.
Species strings follow the :-notation of chemical substances.
Examples: '1K39:-1e', 'K39:-e', '3H1:1O16:-1e'.
Do not forget to subtract the electron; otherwise the atomic rather than
the ionic mass would be used as literature value!
Alternatively, tentative assignments can be made by adding a '?' at
the end of the species string (e.g.: 'Sn100:-1e?', '?', …).
cost_func (str) – Type of cost function used to fit peak ('chi-square' or 'MLE').
method (str) – Name of optimization algorithm used to minimize cost function. This
attribute is only shown in the peak properties table when any minimizers
other than least_squares() were used.
red_chi (float) – Reduced chi-squared of peak fit. If the peak was fitted using 'MLE',
red_chi should be taken with caution.
area (float [counts]) – Number of total counts in the peak (calculated from amplitude parameter
amp of peak fit).
area_error (float [counts]) – Uncertainty of the total number of counts in the peak.
m_ion (float [u]) – Ionic mass value obtained in peak fit (after mass recalibration and
corrected for respective charge state).
rel_stat_error (float) – Relative statistical uncertainty of m_ion.
rel_recal_error (float) – Relative uncertainty of m_ion due to mass recalibration.
rel_peakshape_error (float) – Relative peak-shape uncertainty of m_ion.
rel_mass_error (float) – Total relative mass uncertainty of m_ion (excluding systematics!).
Includes statistical, peak-shape and recalibration uncertainty.
atomic_ME_keV (float [keV]) – (Atomic) mass excess corresponding to m_ion.
mass_error_keV (float [keV]) – Total mass uncertainty of m_ion (excluding systematics!).
m_dev_keV (float [keV]) – Deviation from literature value (m_ion - m_AME).
If a valid ion species is assigned with the species argument the
corresponding literature values will automatically be fetched from the
AME database.
If different literature values are to be used, the literature mass or
mass uncertainty can be user-defined with m_AME and
m_AME_error. This is useful for isomers and in cases where more
recent measurements haven’t been included in the AME yet.
Parameters:
x_pos (float [u/z]) – Coarse position of peak centroid. In fits the Hyper-EMG parameter
for the (Gaussian) peak centroid mu will be initialized at this
value. Peak markers in plots are located at x_pos.
species (str) – String with chemical formula of ion species associated with peak.
Species strings follow the :-notation of chemical substances.
Examples: '1K39:-1e', 'K39:-e', '3H1:1O16:-1e'.
Do not forget to subtract the electron from singly-charged
species; otherwise the atomic rather than the ionic mass will be used as
literature value! Alternatively, tentative assignments can be made by
adding a '?' at the end of the species string
(e.g.: 'Sn100:-1e?', '?', …).
m_AME (float [u], optional) – User-defined literature mass value. Overwrites value fetched from
AME. Useful for isomers or to use more up-to-date values.
m_AME_error (float [u], optional) – User-defined literature mass uncertainty. Overwrites value fetched
from AME.
Ex (float [keV], optional, default : 0.0) – Isomer excitation energy (in keV) to add to ground-state literature
mass. Irrelevant if the m_AME argument is used or if the peak is
not labelled as isomer.
Ex_error (float [keV], optional, default : 0.0) – Uncertainty of isomer excitation energy (in keV) to add in
quadrature to ground-state literature mass uncertainty. Irrelevant
if the m_AME_error argument is used or if the peak is not labelled
as isomer.
lit_src (str, optional, default: 'AME2020') – Source of literature mass data (either ‘AME2016’ or ‘AME2020’). If
AME2016 is used, 'lit_src:AME2016' is added to the peak
comment.
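To illustrate the :-notation used for the species argument, the sketch below splits a species string into its atomic substrings, the electron term and the tentative flag. It is not emgfit's parser: the function name and the regular expressions are assumptions derived from the documented examples ('1K39:-1e', 'K39:-e', '3H1:1O16:-1e', 'Sn100:-1e?') and from the 'm', 'm0', …, 'm9' isomer suffixes.

```python
import re

ATOM = re.compile(r"^(\d*)([A-Z][a-z]?)(\d+)(m\d?)?$")   # e.g. '3H1', 'In127m1'
ELECTRONS = re.compile(r"^([+-]?)(\d*)e$")               # e.g. '-1e', '-e'

def parse_species(species):
    """Split a species string into atoms, electron count and tentative flag."""
    tentative = species.endswith("?")
    atoms, n_electrons = [], 0
    for part in species.rstrip("?").split(":"):
        if not part:
            continue                                      # bare '?' label
        electron_term = ELECTRONS.match(part)
        if electron_term:
            sign = -1 if electron_term.group(1) == "-" else 1
            n_electrons += sign * int(electron_term.group(2) or 1)
            continue
        atom = ATOM.match(part)
        if atom is None:
            raise ValueError(f"Cannot parse substring {part!r}")
        count, element, mass_number, isomer = atom.groups()
        atoms.append((int(count or 1), element, int(mass_number), isomer is not None))
    return {"atoms": atoms, "n_electrons": n_electrons, "tentative": tentative}

parsed = parse_species("3H1:1O16:-1e")
total_A = sum(count * A for count, _, A, _ in parsed["atoms"])
```

A negative electron count corresponds to a cation, so 'n_electrons': -1 marks a singly charged positive ion.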
Updates peak attributes with AME values for specified species.
Updates the m_AME, m_AME_error, extrapolated,
A, z and m_dev_keV peak attributes with AME
values.
Parameters:
Ex (float [keV], optional, default : 0.0) – Isomer excitation energy (in keV) to add to ground-state literature
mass.
Ex_error (float [keV], optional, default : 0.0) – Uncertainty of isomer excitation energy (in keV) to add in
quadrature to ground-state literature mass uncertainty.
lit_src (str, optional, default : 'AME2020') – Source of literature mass data (either ‘AME2016’ or ‘AME2020’). If
AME2016 is used, 'lit_src:AME2016' is added to the peak
comment.
shape_cal_pars (dict) – Model parameter values obtained in peak-shape calibration.
shape_cal_errors (dict) – Model parameter uncertainties obtained in peak-shape calibration.
share_shape_pars (bool, default: True) – Whether to enforce a shared peak shape for all fitted peaks. The shared
shape parameters are pre-determined with determine_peak_shape().
scale_shape_pars (bool, default: False) – Whether to scale the scale-dependent parameters obtained in the
peak-shape calibration with the given peak’s peak.scl_coeff.
scale_shape_to_peak_cen (bool, default: False) – Whether to scale the scale-dependent parameters obtained in the
peak-shape calibration to the centroid of the given peak.
index_mass_calib (int) – Peak index of mass calibrant peak.
determined_A_stat_emg (bool) – Boolean flag for whether A_stat_emg was determined for this
spectrum specifically using the determine_A_stat_emg() method.
If True, A_stat_emg was set using
determine_A_stat_emg(), otherwise the default value
emgfit.config.A_stat_emg_default from the config module
was used. For more details see docs of determine_A_stat_emg()
method.
A_stat_emg (float) – Constant of proportionality for calculation of the statistical mass
uncertainties. Defaults to emgfit.config.A_stat_emg_default
as defined in the config module, unless the
determine_A_stat_emg() method is run.
A_stat_emg_error (float) – Uncertainty of A_stat_emg.
recal_fac (float, default: 1.0) – Scaling factor applied to m_ion in mass recalibration.
rel_recal_error (float) – Relative uncertainty of recalibration factor recal_fac.
recal_facs_pm (dict) – Modified recalibration factors obtained in peak-shape uncertainty
evaluation by varying each shape parameter by plus and minus 1 standard
deviation, respectively.
eff_mass_shifts (numpy.ndarray of dict) – Maximal effective mass shifts for each peak obtained in peak-shape
uncertainty evaluation by varying each shape parameter by plus and minus
1 standard deviation and only keeping the shift with the larger absolute
magnitude. The eff_mass_shifts array contains a dictionary for each
peak; the dictionaries have the following structure:
{‘<shape param. name> eff. mass shift’ : [<maximal eff. mass shift>],…}
For more details see docs of _eval_peakshape_errors().
area_shifts (numpy.ndarray of dict) – Maximal area change for each peak obtained in peak-shape uncertainty
evaluation by varying each shape parameter by plus and minus 1 standard
deviation and only keeping the shift with the larger absolute magnitude.
The area_shifts array contains a dictionary for each peak; the
dictionaries have the following structure:
{‘<shape param. name> eff. mass shift’ : [<maximal eff. mass shift>],…}
For the mass calibrant the dictionary holds the absolute shifts of the
calibrant peak centroid (calibrant centroid shift). For more
details see docs of _eval_peakshape_errors().
peaks_with_errors_from_resampling (list of int) – List with indices of peaks whose statistical mass and area uncertainties
have been determined by fitting synthetic spectra resampled from the
best-fit model (see get_errors_from_resampling()).
MC_recal_facs (list of float) – Recalibration factors obtained in fits with Markov Chain Monte Carlo
(MCMC) shape parameter samples in get_MC_peakshape_errors().
peaks_with_MC_PS_errors (list of int) – List with indices of peaks for which peak-shape errors have been
determined by re-fitting with shape parameter sets from Markov Chain
Monte Carlo sampling (see get_MC_peakshape_errors()).
peaks (list of peak) – List containing all peaks associated with the spectrum sorted by
ascending mass. The index of a peak within the peaks list is referred
to as the peak_index.
blinded_peaks (list of int) – List with indices of peaks whose mass values and peak positions are to
be hidden to enable blind analysis. The mass values will be unblinded
upon export of the analysis results.
mass_number (int) – Atomic mass number associated with central bin of spectrum.
resolving_power (float, optional) – Typical resolving power of the spectrometer at FWHM level.
default_fit_range (float [u/z]) – Default x-range for fits, scaled to mass_number of spectrum.
default_lit_src (str, optional) – Source of literature mass data - either ‘AME2020’ (default) or
‘AME2016’. If AME2016 is used, 'lit_src:AME2016' is added
as flag to the respective peak comments.
Notes
The mass_number is used for re-scaling of the default model
parameters to the mass of interest. It is calculated upon data import by
taking the median of all mass bin centers (after initial cutting of the
spectrum) and rounding to the closest integer. This accounts for spectra
potentially containing several mass units.
The default_fit_range is scaled to the spectrum’s mass_number
using the relation:
\(\text{default_fit_range} = 0.01\,u \cdot (\text{mass_number}/100)\)
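The two relations in the notes above can be sketched in a few lines. This is only an illustration of the stated formulas; the helper names mass_number_from_bins and default_fit_range_for are hypothetical and not part of emgfit.

```python
import numpy as np

def mass_number_from_bins(bin_centers):
    """Median of the mass bin centers, rounded to the closest integer."""
    return int(np.round(np.median(bin_centers)))

def default_fit_range_for(mass_number):
    """default_fit_range = 0.01 u * (mass_number / 100)."""
    return 0.01 * mass_number / 100

# Hypothetical spectrum spanning a narrow window around A = 100
bin_centers = np.linspace(99.8, 100.2, 401)
A = mass_number_from_bins(bin_centers)
print(A, default_fit_range_for(A))
```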
Create a spectrum object from histogrammed mass data
By default the data is loaded from an input file of the format:
two-column .csv- or .txt-file with tab-separated values
(column 1: mass bin, column 2: counts in bin).
Alternatively, the data can be passed as a DataFrame to the df
argument.
Optionally the spectrum can be cut to a specified fit range using the
m_start and m_stop parameters. Mass data outside this range will be
discarded and excluded from further analysis.
If show_plot is True, a plot of the spectrum is shown including
vertical markers for the m_start and m_stop mass cut-offs
(if applicable).
Parameters:
filename (str, optional) – Filename of mass spectrum to analyze. If the input file is not
located in the working directory the directory path has to be
included in filename, too. If no filename is given, data must
be provided via df argument.
m_start (float [u/z], optional) – Start of fit range, data at lower m/z will be discarded.
m_stop (float [u/z], optional) – Stop of fit range, data at higher m/z will be discarded.
show_plot (bool, optional, default: True) – If True, shows a plot of full spectrum with vertical markers for
m_start and m_stop cut-offs.
df (pandas.DataFrame, optional) – DataFrame with spectrum data to use, this enables the creation of a
spectrum object from a DataFrame instead of from an external file.
resolving_power (float, optional) – Typical resolving power of the spectrometer at FWHM level. Defaults
to 3e05.
default_fit_range (float [u/z], optional) – Default x-range for fits, scaled to mass_number of spectrum.
Defaults to 0.01*(mass_number/100).
default_lit_src (str, optional) – Source of literature mass data - either ‘AME2020’ (default) or
‘AME2016’. If AME2016 is used, 'lit_src:AME2016' is
added as flag to the respective peak comments.
Notes
The option to import data via the df argument was added to enable the
processing of bootstrapped spectra as regular spectrum objects
in the determine_A_stat_emg() method. This feature is primarily
intended for internal use. The passed DataFrame must have an index
column named ‘m/z [u]’ and a value column named ‘Counts’.
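As a rough sketch of the expected layout, a DataFrame with the index and column names described above could be built from histogrammed data as follows. The simulated masses are purely illustrative, and the commented-out emgfit call is shown for orientation only.

```python
import numpy as np
import pandas as pd

# Illustrative data: 5000 simulated events around m/z = 100 u
rng = np.random.default_rng(0)
masses = rng.normal(loc=100.0, scale=1e-3, size=5000)

# Histogram the events and use the bin centers as the mass axis
counts, edges = np.histogram(masses, bins=200)
centers = 0.5 * (edges[:-1] + edges[1:])

# Index column named 'm/z [u]', value column named 'Counts'
df = pd.DataFrame({"Counts": counts},
                  index=pd.Index(centers, name="m/z [u]"))

# spec = emg.spectrum(df=df)  # would hand the DataFrame to emgfit
```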
Get peak-shape uncertainties for a fit result by re-fitting with many
different MC-shape-parameter sets
This method is primarily intended for internal usage.
A representative subset of the shape parameter sets which are supported
by the data is obtained by performing MCMC sampling on the peak-shape
calibrant. If this has not already been done using the map_par_covar
option in determine_peak_shape(), the _get_MCMC_par_samples()
method will be automatically called here.
The peaks specified by peak_indeces will be fitted with N_samples
different shape parameter sets. The peak-shape uncertainties are then
estimated as the RMS deviation of the obtained values from the best-fit
values.
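The RMS estimate mentioned above can be written down compactly. The following is a minimal sketch, not emgfit's implementation; rms_deviation is a hypothetical helper and the sample values are made up.

```python
import numpy as np

def rms_deviation(values, best_fit_value):
    """Root-mean-square deviation of MC results from the best-fit value."""
    values = np.asarray(values, dtype=float)
    return float(np.sqrt(np.mean((values - best_fit_value) ** 2)))

# Made-up effective masses [u] obtained with different shape parameter sets
mc_masses = [99.999998, 100.000001, 100.000003, 99.999999]
print(rms_deviation(mc_masses, best_fit_value=100.0))
```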
The mass calibrant must either be included in peak_indeces or must
have been processed with this method upfront (using the same N_samples
and seed arguments to ensure identical sets of peak-shapes).
Parameters:
peak_indeces (int or list of int) – Indices of peaks to evaluate MC peak-shape uncertainties for. The
peaks of interest must belong to the same fit_result.
fit_result (EMGModelResult, optional) – Fit result for which MC peak-shape uncertainties are to be evaluated
for. Defaults to the fit result stored for the peaks of interest in
the spectrum.fit_results spectrum attribute.
verbose (bool, optional) – Whether to print status updates.
show_hists (bool, optional) – If True histograms of the effective mass shifts and peak areas
obtained with the MC shape parameter sets are shown. Black vertical
lines indicate the best-fit values stored in fit_result.
N_samples (int, optional) – Number of different shape parameter sets to use. Defaults to 1000.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated
spectra. When set to -1 (default) all available cores are used.
seed (int, optional) – Random seed to use for reproducibility. Defaults to 872.
rerun_MCMC_sampling (bool, optional) – When False (default) pre-existing MCMC parameter samples (e.g.
obtained with determine_peak_shape()) are used. If True or
when there are no pre-existing MCMC samples, the MCMC sampling will be
performed by this method.
**MCMC_kwargs – Keyword arguments to send to _get_MCMC_par_samples() for
control over the MCMC sampling.
Returns:
Peak-shape mass errors [u], peak-shape area errors [counts]
Both arrays have the same length as peak_indeces.
This method only supports peaks that belong to the same fit result. If
peaks in multiple fit_results are to be treated or the peak properties
are to be updated with the refined peak-shape errors use
get_MC_peakshape_errors() which wraps around this method.
Calculate the relative peak-shape uncertainty of the specified peaks.
This internal method is automatically called by the fit_peaks()
and fit_calibrant() methods and does not need to be run directly
by the user.
The peak-shape uncertainties are obtained by re-fitting the specified
peaks with each shape parameter individually varied by plus and minus 1
sigma and recording the respective shift of the peak centroids w.r.t the
original fit. From the shifted IOI centroids and the corresponding
shifts of the calibrant centroid, effective mass shifts are determined.
For each varied parameter, the larger of the two eff. mass shifts is kept; these
maximal shifts are then added in quadrature to obtain the total peak-shape uncertainty.
See Notes section below for a detailed explanation of the peak-shape
error evaluation scheme.
Note: All peaks in the specified peak_indeces list must
have been fitted in the same multi-peak fit (and hence have the same
lmfit ModelResult fit_result)!
This routine does not yield a peak-shape error for the mass calibrant,
since this is zero by definition. Instead, for the mass calibrant the
absolute shifts of the peak centroid are calculated and stored in the
eff_mass_shifts_pm and eff_mass_shifts dictionaries.
Parameters:
peak_indeces (list) – List containing indices of peaks to evaluate peak-shape uncertainty
for, e.g. to evaluate peak-shape error of peaks 0 and 3 use
peak_indeces=[0,3].
sigma, theta, all eta and all tau model parameters are considered
“shape parameters” and varied by plus and minus one standard deviation
in the peak-shape uncertainty evaluation. The peak amplitude, centroids
and the baseline are always freely varying.
The “peak-shape uncertainty” refers to the mass uncertainty due to
uncertainties in the determination of the peak-shape parameters and due
to deviations between the shape-calibrant and IOI peak shapes.
Simply put, the peak-shape uncertainties are estimated by evaluating how
much a given peak’s ionic mass is shifted when the shape parameters are
varied by plus or minus their 1-sigma uncertainty. A peculiarity of
emgfit’s peak-shape error estimation routine is that only the centroid
shifts relative to the calibrant are taken into account (hence
‘effective mass shifts’).
Inspired by the approach outlined in [8], the peak-shape uncertainties
are obtained via the following procedure:
Since only effective mass shifts corrected for the corresponding
shifts of the calibrant peak enter the peak-shape uncertainty,
at first, the absolute centroid shifts of the mass calibrant must be
evaluated. There are two options for this:
If the calibrant index is included in the peak_indeces argument,
the original calibrant fit is re-performed with each shape parameter
varied by plus and minus its 1-sigma confidence respectively while
all other shape parameters are kept fixed at the original best-fit
values. The resulting absolute “calibrant centroid shifts” are
recorded and stored in the spectrum’s eff_mass_shifts_pm
dictionary. The shifted calibrant centroids are further used to
calculate updated mass re-calibration factors. These are stored in
the recal_facs_pm dictionary. Only the larger of the two
centroid shifts due to the +/-1-sigma variation of each shape
parameter is stored in the spectrum’s eff_mass_shifts
dictionary.
If the calibrant is not included in the peak_indeces list, the
calibrant centroid shifts and the corresponding shifted
recalibration factors must already have been obtained in a foregoing
mass recalibration (see Mass recalibration and calculation of final mass values).
All non-calibrant peaks referenced in peak_indeces are treated in a
similar way. The original fit that yielded the specified fit_result
is re-performed with each shape parameter varied by plus and minus its
1-sigma confidence respectively while all other shape parameters are
kept fixed at the original best-fit values. However now, the effective
mass shifts after correction with the corresponding updated
recalibration factor are recorded and stored in the spectrum’s
eff_mass_shifts_pm dictionary. Only the larger of the two
eff. mass shifts caused by the +/-1-sigma variation of each shape
parameter is stored in the spectrum’s eff_mass_shifts
dictionary.
The estimates for the total peak-shape uncertainty of each peak are
finally obtained by adding the eff. mass shifts stored in the
eff_mass_shifts dictionary in quadrature.
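The final two steps of the procedure (keeping the larger of the +/-1-sigma shifts per shape parameter and summing in quadrature) can be sketched as below. The function name, the input layout and the numbers are assumptions made for illustration; they do not reproduce emgfit's internal dictionaries.

```python
import math

def total_peakshape_error(shifts_pm):
    """shifts_pm maps a shape parameter name to the pair of effective mass
    shifts obtained for the +1-sigma and -1-sigma variation."""
    # Keep the shift with the larger absolute magnitude for each parameter
    max_shifts = {par: max(pm, key=abs) for par, pm in shifts_pm.items()}
    # Add the maximal shifts in quadrature
    total = math.sqrt(sum(s ** 2 for s in max_shifts.values()))
    return max_shifts, total

# Made-up effective mass shifts [u]
shifts_pm = {"sigma": (3e-7, -1e-7), "theta": (-4e-7, 2e-7)}
max_shifts, total = total_peakshape_error(shifts_pm)
print(max_shifts, total)
```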
Note that peak-shape area uncertainties are only calculated for
ions-of-interest, not for the mass calibrant.
Map out parameter covariances and posterior distributions using
Markov-chain Monte Carlo (MCMC) sampling
The MCMC results are saved in the result_emcee attribute of fit_result.
This method is intended for internal usage and for single peaks
only.
Parameters:
fit_result (emgfit.model.EMGModelResult) – Fit result to explore with MCMC sampling. Since emcee only
efficiently samples unimodal distributions, fit_result should
ideally hold the result of a single-peak fit (typically the shape
calibrant fit).
steps (int, optional) – Number of MCMC sampling steps.
burn (int, optional) – Number of initial sampling steps to discard (“burn-in” phase).
thin (int, optional) – After sampling, only every thin-th sample is used for further
treatment. It is recommended to set thin to at least half the
autocorrelation time.
show_MCMC_fit_result (bool, optional, default: False) – If True, a maximum likelihood estimate is derived from the MCMC
samples with best-fit values estimated by the median of the samples.
The MCMC MLE result can be compared to the conventional fit_result
as an additional crosscheck.
covar_map_fname (str or None (default), optional) – If not None, the parameter covariance map will be saved as
“<covar_map_fname>_covar_map.png”.
n_cores (int, optional, default: -1) – Number of CPU cores to use for parallelized sampling. If -1
(default) all available cores will be used.
MCMC_seed (int, optional) – Random state for reproducible sampling.
Notes
Markov-Chain Monte Carlo (MCMC) algorithms are a powerful tool to
efficiently sample the posterior probability density functions (PDFs) of
model parameters. In simpler words: MCMC methods can be used to estimate
the distributions of parameter values which are supported by the data.
An MCMC algorithm sends out a number of so-called walkers on stochastic
walks through the parameter space (in this method the number of MCMC
walkers is fixed to 20 times the number of varied parameters). The MCMC
walkers are initialized with randomized parameter values drawn from
truncated normal distributions defined by the respective parameter
bounds and best-fit parameter values and uncertainties stored in
fit_result. MCMC methods are particularly important in situations
where conventional sampling techniques become intractable or
inefficient. For MCMC sampling emgfit deploys lmfit’s implementation of
the emcee.EnsembleSampler from the emcee package [1]. Since
emcee’s EnsembleSampler is only optimized for uni-modal
probability density functions this method should ideally only be used to
explore the parameter space of a single-peak fit.
A complication with MCMC methods is that there is usually no rigorous
way to prove that the sampling chain has converged to the true PDF.
Instead it is at the user’s discretion to decide after how many
sampling steps a sufficient amount of convergence is achieved. Fortunately,
there are a number of heuristic tools that can help in judging
convergence. The most common measure of the degree of convergence is the
integrated autocorrelation time (tau). If the integrated
autocorrelation time shows only small changes over time the MCMC chain
can be assumed to be converged. To ensure a sufficient degree of
convergence this method will issue a warning whenever the number of
performed sampling steps is smaller than 50 times the integrated
autocorrelation time of at least one parameter. If this rule of thumb is
violated it is strongly advisable to run a longer chain. An additional
aid in judging the performance of the MCMC chain is provided by the plots
of the MCMC traces. These plots show the paths of all MCMC walkers
through parameter space. Dramatic changes of the initial trace envelopes
indicate that the chain has not reached a stationary state yet and is
still in the so-called “burn-in” phase. Samples in this region are
discarded by setting the burn argument to an appropriate number of
steps (default burn-in: 500 steps).
Another complication of MCMC algorithms is the fact that nearby samples
in an MCMC chain are not independent. To reduce correlations between
samples MCMC chains are usually “thinned out” by only storing the result
of every m-th MCMC iteration. The number of steps after which two
samples can be assumed to be uncorrelated/independent (so to say the
memory of the chain) is given by the integrated autocorrelation time
(tau). To be conservative, emgfit uses a thinning interval of m=250
by default and issues a warning when m<tau for at least one of the
parameters. Since more data is discarded, a larger thinning interval
comes with a loss of precision of the posterior PDF estimates. However,
a sufficient amount of thinning is still advisable since emgfit’s MC
peak-shape error determination (get_MC_peakshape_errors()) relies
on independent parameter samples.
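The two rules of thumb described above (at least 50 autocorrelation times of sampling, and a thinning interval no smaller than tau) can be expressed as a small check. This is a hedged sketch with hypothetical helper names; it takes already-estimated autocorrelation times as input and does not call emcee or lmfit.

```python
import numpy as np

def check_chain(n_steps, taus, thin):
    """Return warning messages for the convergence and thinning rules."""
    tau_max = float(np.max(taus))
    messages = []
    if n_steps < 50 * tau_max:
        messages.append("chain shorter than 50 integrated autocorrelation times")
    if thin < tau_max:
        messages.append("thinning interval smaller than autocorrelation time")
    return messages

def burn_and_thin(chain, burn=500, thin=250):
    """Discard the burn-in steps and keep every thin-th remaining step.
    chain has shape (n_steps, n_walkers, n_params)."""
    return chain[burn::thin]

# Illustrative chain: 10000 steps, 3 varied parameters, 20 walkers per parameter
chain = np.zeros((10000, 60, 3))
print(check_chain(n_steps=10000, taus=[100, 150, 120], thin=250))
print(burn_and_thin(chain).shape)
```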
As a helpful measure for tuning MCMC chains, emgfit provides a plot of
the “acceptance fraction” for each walker, i.e. the fraction of
suggested walker steps which were accepted. The developers of emcee’s
EnsembleSampler suggest acceptance fractions between 0.2 and 0.5 as a
rule of thumb for a well-behaved chain. Acceptance fractions falling
short of this for many walkers can indicate poor initialization or a too
small number of walkers.
Get statistical and area uncertainties via resampling from best-fit
PDF.
This method provides bootstrap estimates of the statistical errors and
peak area errors by evaluating the scatter of peak centroids and areas
in fits of many simulated spectra. The simulated spectra are created by
sampling events from the best-fit PDF associated with fit_result
(parametric bootstrap). Refined errors are calculated for each peak
individually by taking the sample standard deviations of the obtained
peak centroids and areas.
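The idea of the parametric bootstrap can be illustrated with a deliberately simplified model. In the sketch below the best-fit PDF is assumed to be a plain Gaussian and the re-fit of each simulated spectrum is replaced by the sample mean of the events; both are stand-ins chosen for brevity, not what emgfit does internally.

```python
import numpy as np

def bootstrap_centroid_error(mu, sigma, n_events, n_spectra=1000, seed=0):
    """Scatter of centroids over many spectra resampled from a Gaussian PDF."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated spectrum with n_events events
    events = rng.normal(mu, sigma, size=(n_spectra, n_events))
    # Stand-in for re-fitting: estimate each centroid by the sample mean
    centroids = events.mean(axis=1)
    # Sample standard deviation of the centroids is the refined stat. error
    return float(np.std(centroids, ddof=1))

err = bootstrap_centroid_error(mu=100.0, sigma=1e-3, n_events=400)
print(err)  # expected to be close to sigma / sqrt(n_events) = 5e-5
```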
peak_indeces (list, optional) – List containing indices of peaks to determine refined stat. errors
for, e.g. to evaluate the refined stat. errors of peaks 1 and 2 use
peak_indeces=[1,2]. Listed peaks must be included in
fit_result. Defaults to all peaks contained in fit_result.
N_spectra (int, optional) – Number of simulated spectra to fit. Defaults to 1000, which
typically yields statistical uncertainty estimates with a relative
precision of a few percent.
seed (int, optional) – Random seed to use for reproducible sampling.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated
spectra. When set to -1 (default) all available cores are used.
show_hists (bool, optional, default: False) – If True, histograms of the obtained peak centroids and areas are
shown. Black vertical lines indicate the best-fit values obtained
from the measured data.
Returns:
Array with statistical errors [u], array with area errors [counts]
Array elements correspond to the results for the peaks selected in
peak_indeces (in ascending order). If peak_indeces has not been
specified it defaults to the indices of all peaks contained in
fit_result.
All peaks for which refined errors are to be evaluated must belong to
the same lmfit ModelResult fit_result. Even if refined stat. errors
are only to be extracted for a subset of the peaks contained in
fit_result (as specified with peak_indeces), fits will be
re-performed over the same x-range as fit_result.
Reset all fit-related spectrum and peak attributes to their defaults
Note
This method also resets all mass-calibration-related peak properties and
spectrum attributes to their default values but does not affect the results
obtained in the peak-shape calibration.
This method is based on the convolution of a normalized window with the
signal. The signal is prepared by introducing reflected copies of the
signal (with the window size) in both ends so that transient parts are
minimized in the beginning and end part of the output signal.
Parameters:
x (numpy.array) – The input data
window_len (odd int, optional) – Length of the smoothing window; must be an odd integer!
window (str, optional) – Type of window: one of ‘flat’, ‘hanning’, ‘hamming’, ‘bartlett’ or
‘blackman’. A flat window produces moving-average smoothing.
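A minimal sketch of this kind of window smoothing, following the widely used NumPy recipe of padding with reflected copies before convolving, is given below. The function name smooth_signal and the default arguments are assumptions for illustration; the sketch requires an odd window_len of at least 3.

```python
import numpy as np

WINDOWS = {"hanning": np.hanning, "hamming": np.hamming,
           "bartlett": np.bartlett, "blackman": np.blackman}

def smooth_signal(x, window_len=11, window="hanning"):
    """Smooth x by convolution with a normalized window of odd length."""
    x = np.asarray(x, dtype=float)
    if window_len < 3 or window_len % 2 == 0:
        raise ValueError("window_len must be an odd integer >= 3")
    if x.size < window_len:
        raise ValueError("input must be at least as long as the window")
    # Pad both ends with reflected copies of the signal
    s = np.r_[x[window_len - 1:0:-1], x, x[-2:-window_len - 1:-1]]
    # A flat window corresponds to a moving average
    w = np.ones(window_len) if window == "flat" else WINDOWS[window](window_len)
    y = np.convolve(w / w.sum(), s, mode="valid")
    # Trim the padding so the output has the same length as the input
    half = (window_len - 1) // 2
    return y[half:-half]

noisy = np.sin(np.linspace(0, 6, 200)) + 0.1 * np.random.default_rng(0).normal(size=200)
smoothed = smooth_signal(noisy, window_len=11, window="blackman")
```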
Update the peak properties using the given ‘fit_result’.
Intended for internal use only.
The values of the mass calibrant will not be changed by
this routine.
Parameters:
peaks (list) – List of indices of peaks to update. (To get peak indices, see plot
markers or consult the peak properties table by calling the
spectrum.show_peak_properties() method)
All peaks referenced by the ‘peaks’ argument must belong to the same
fit_result. Not necessarily all peaks contained in fit_result will
be updated, only the properties of peaks referenced with the peaks
argument will be updated.
The position of the peak must be specified with the x_pos argument.
If the peak’s ionic species is provided with the species argument the
corresponding AME literature values will be added to the peak.
Alternatively, user-defined literature values can be provided with the
m_AME and m_AME_error arguments. This option is helpful for isomers
or in case of very recent measurements that haven’t entered the AME yet.
Parameters:
x_pos (float [u/z]) – Position of peak to be added.
species (str, optional) – species label for peak to be added following the
:-notation of chemical substances. If assigned, peak.m_AME,
peak.m_AME_error & peak.extrapolated are
automatically updated with the corresponding AME literature values.
m_AME (float [u], optional) – User-defined literature mass of peak to be added. Overwrites pre-
existing peak.m_AME value.
m_AME_error (float [u], optional) – User-defined literature mass uncertainty of peak to be added.
Overwrites pre-existing peak.m_AME_error.
Ex (float [keV], optional, default: 0.0) – Excitation energy of isomeric state in keV. When the peak is
labelled as an isomer its literature mass peak.m_AME
is calculated by adding Ex to the AME ground-state mass.
Ex_error (float [keV], optional, default: 0.0) – Uncertainty of the excitation energy of the isomeric state in keV.
When the peak is labelled as isomer its literature mass uncertainty
peak.m_AME_error is calculated by adding Ex_error and the
AME uncertainty of the ground-state mass in quadrature.
lit_src (str, optional) – Source of literature mass data (either of ‘default’, ‘AME2016’ or
‘AME2020’). If ‘default’, the literature source defined globally
with the default_lit_src spectrum attribute is used. If
AME2016 is used, 'lit_src:AME2016' is added to the
respective peak comment.
A (int, optional) – Atomic mass number of species (only relevant when species is
undefined).
z (int, optional) – Charge state of species (only relevant when species is undefined).
verbose (bool, optional, default: True) – If True, a message is printed after successful peak addition.
Intended for internal use only.
By default the comment argument will be appended to the end of the
current peak.comment attribute (if the current comment is ‘-’ it
is overwritten by the comment argument). If overwrite is set True,
the current peak.comment is overwritten with the ‘comment’
argument.
x_pos of peak to add comment to (must be specified up to 6th
decimal).
species (str, optional) – species of peak to add comment to.
overwrite (bool) – If True the current peak comment will be overwritten
by comment, else comment is appended to the end of the current
peak comment.
Note
The shape and mass calibrant peaks are automatically marked during the
shape and mass calibration by inserting the protected flags
'shapecalibrant', 'masscalibrant' or
'shapeandmasscalibrant' into their peak comments. When
user-defined comments are added to these peaks, it is ensured that the
protected flags cannot be overwritten. The above shape and mass
calibrant flags should never be added to comments manually by the
user!
By default the comment argument will be appended to the end of the
current spectrum_comment attribute. If overwrite is set to
True the current spectrum_comment is overwritten with
comment.
overwrite (bool) – If True, the current spectrum_comment attribute will be
overwritten with comment, else comment is appended to the end of
spectrum_comment.
Assign species label(s) to a single peak (or all peaks at once).
If no single peak is selected with peak_index or x_pos, a list with
species names for all peaks in the peak list must be passed to
species. For already specified or unknown species insert None as a
placeholder into the list to skip the species assignment for this peak.
See Notes and Examples sections below for details on usage.
Parameters:
species (str or list of str) – The species name (or list of name strings) to be assigned to the
selected peak (or to all peaks). For unknown or already assigned
species, None should be inserted as placeholder at the
corresponding position in the species list. species names
must follow the :-notation of chemical substances.
peak_index (int, optional) – Index of single peak to assign species name to.
x_pos (float [u/z], optional) – x_pos of single peak to assign species name to. Must be
specified up to 6th decimal.
Ex (float [keV], optional, default: 0.0) – Excitation energy of isomeric state in keV. When the peak is
labelled as isomer its literature mass peak.m_AME is
calculated by adding Ex to the AME ground-state mass.
Ex_error (float [keV], optional, default: 0.0) – Uncertainty of the excitation energy of the isomeric state in keV.
When the peak is labelled as isomer its literature mass uncertainty
peak.m_AME_error is calculated by adding Ex_error and the
AME uncertainty of the ground-state mass in quadrature.
lit_src (str, optional) – Source of literature mass data - either of ‘default’, ‘AME2016’ or
‘AME2020’. If ‘default’, the literature source defined globally
with the default_lit_src spectrum attribute is used. If
‘AME2016’ is used, 'lit_src:AME2016' is added to the
respective peak comment(s).
Assignment of a single peak species:
select peak by specifying peak position x_pos (up to 6th decimal) or
peak_index argument (0-based! Check for peak index by calling
show_peak_properties() method on spectrum object).
Assignment of multiple peak species:
Nothing should be passed to the ‘peak_index’ and ‘x_pos’ arguments.
Instead the user specifies a list of the new species strings to the
species argument (if there are N detected peaks, the list must have
length N). Former species assignments can be kept by inserting None
at the respective position in the species list; otherwise former
species assignments are overwritten. See the examples below for
usage.
Tentative assignments and isomers:
Use '?' at the end of the species string or constituent element
strings to indicate tentative assignments. Literature values are
also fetched for peaks with tentative assignments.
Isomeric species:
Isomers can be marked by appending a 'm' or 'm0' up to
'm9' to the end of the respective element substring in species.
For isomers no literature values are calculated unless the respective
excitation energy is manually specified with the Ex argument.
Examples
Assign the peak with peak_index 2 (third-lowest-mass peak) as
‘1Cs133:-1e’, leave all other peaks unchanged:
>>> import emgfit as emg
>>> spec = emg.spectrum(<input_file>)  # mock code for foregoing data import
>>> spec.detect_peaks()  # mock code for foregoing peak detection
>>> spec.assign_species('1Cs133:-1e', peak_index=2)
Assign multiple peaks:
>>> import emgfit as emg
>>> spec = emg.spectrum(<input_file>)  # mock code for foregoing data import
>>> spec.detect_peaks()  # mock code for foregoing peak detection
>>> spec.assign_species(['1Ru102:-1e', '1Pd102:-1e', 'Rh102:-1e?', None,
...                      '1Sr83:1F19:-1e', '?'])
This assigns the species of the first, second, third and fifth peak
with the respective labels in the specified list and fetches their AME
values. The ‘?’ ending of the 'Rh102:-1e?' argument indicates a
tentative species assignment, mind that literature values will still be
calculated for this peak. Equivalently, 'Rh102?:-1e' could have been
used. The None argument leaves the species assignment of the 4th
peak unchanged. The '?' argument overwrites any former species
assignments to the last peak and marks the peak as unidentified.
Mark peaks as isomers:
>>> import emgfit as emg
>>> spec = emg.spectrum(<input_file>)  # mock code for foregoing data import
>>> spec.detect_peaks()  # mock code for foregoing peak detection
>>> spec.assign_species('1In127:-1e', peak_index=0)  # ground state
>>> spec.assign_species('1In127m:-1e', peak_index=1)  # first isomer
>>> spec.assign_species('1In127m1:-1e', peak_index=2, Ex=1863,
...                     Ex_error=58)  # second isomer
The above assigns peak 0 as ground state and fetches the corresponding
literature values. Peak 1 is marked as the first isomeric state of
In-127 but no literature values are calculated (since Ex is not
specified). Peak 2 is marked as the second isomeric state of In-127 and
the literature mass and its uncertainty are calculated from the
respective ground-state AME values and the provided Ex and Ex_error.
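The arithmetic behind the isomer literature mass can be sketched as follows. The helper isomer_lit_mass is hypothetical, the ground-state mass and uncertainty are placeholder numbers rather than AME values, and the keV-to-u conversion uses 1 u = 931494.10242 keV (CODATA 2018). The electron-mass correction for the ionic charge state is left out of this sketch.

```python
import math

U_TO_KEV = 931494.10242  # energy equivalent of 1 u in keV (CODATA 2018)

def isomer_lit_mass(m_gs, m_gs_error, Ex, Ex_error):
    """Add Ex to the ground-state mass and Ex_error in quadrature.
    Masses in u, excitation energy and its uncertainty in keV."""
    m = m_gs + Ex / U_TO_KEV
    m_error = math.sqrt(m_gs_error ** 2 + (Ex_error / U_TO_KEV) ** 2)
    return m, m_error

# Placeholder ground-state values (NOT taken from the AME), Ex as in the example
m, m_error = isomer_lit_mass(m_gs=126.917, m_gs_error=1e-5, Ex=1863, Ex_error=58)
print(m, m_error)
```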
fit_result (emgfit.model.EMGModelResult, optional) – Fit result containing peak of interest. If None (default) the
corresponding fit result from the spectrum’s fit_results
list will be fetched.
Returns:
Full width at half maximum of Hyper-EMG fit of peak of interest.
fit_result (emgfit.model.EMGModelResult, optional) – Fit result containing peak of interest. If None (default) the
corresponding fit result from the spectrum’s fit_results
list will be used.
Calculate the peak area (counts in peak) and its stat. uncertainty.
Area and area error are calculated using the peak’s amplitude parameter
amp and the width of the uniform binning of the spectrum. Therefore,
the peak must have been fitted beforehand. In the case of overlapping
peaks only the counts within the fit component of the specified peak are
returned.
Note
This routine assumes the bin width to be uniform across the spectrum.
The mass binning of most mass spectra is not perfectly uniform
(usually time bins are uniform such that the width of mass bins has a
quadratic scaling with mass). However, for isobaric species the
quadratic term is usually so small that it can safely be neglected.
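The relation between the amplitude parameter and the number of counts can be checked numerically. The sketch below assumes that the fit model for the counts per bin is amp times a unit-area peak shape, so that the area equals amp divided by the bin width; this normalization and the simple linear propagation of the amplitude error are assumptions made for illustration, and a Gaussian stands in for the hyper-EMG shape.

```python
import numpy as np

bin_width = 1e-4          # uniform bin width [u]
sigma = 1e-3              # width of the illustrative Gaussian peak [u]
amp, amp_error = 0.05, 0.002

# Uniform mass grid centered on the peak
x = 100.0 + bin_width * np.arange(-200, 201)
pdf = np.exp(-0.5 * ((x - 100.0) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

model_counts = amp * pdf              # modelled counts in each bin
area = amp / bin_width                # counts in peak from the amplitude
area_error = amp_error / bin_width    # linear error propagation

print(area, model_counts.sum())       # both numbers should agree closely
```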
fit_result (emgfit.model.EMGModelResult, optional) – Fit result object to use for area calculation. If None (default)
use corresponding fit result stored in
fit_results list.
decimals (int) – Number of decimals of returned output values.
Returns:
List with peak area and area error in format [area, area_error].
fit_result (emgfit.model.EMGModelResult, optional) – Fit result containing peak of interest. If None (default) the
corresponding fit result from the spectrum’s fit_results
list will be used.
Returns:
Standard deviation of Hyper-EMG fit of peak of interest.
Create a multi-peak composite model with the specified peak shape.
Primarily intended for internal usage.
Parameters:
peaks_to_fit (list of peak) – peaks to be fitted with composite model.
model (str, optional) – Name of fit model to use for all peaks (e.g. 'Gaussian',
'emg12', 'emg33', … - for full list see
Available fit models).
init_pars (dict, optional, default: None) – Default initial shape parameters for fit model. If None the
default parameters defined in the fit_models module
will be used after scaling to the spectrum’s mass_number.
For more details and a list of all shape parameters see the
Peak-shape calibration article.
vary_shape (bool, optional) – If False only the amplitude (amp) and Gaussian centroid (mu)
model parameters will be varied in the fit. If True, the shape
parameters (sigma, theta, etas and taus) will also be
varied.
vary_baseline (bool, optional) – If True a varying uniform baseline will be added to the fit
model as varying model parameter bkg_c. If False, the baseline
parameter bkg_c will be kept fixed at 0.
share_shape_pars (bool, optional, default: True) – Whether to enforce a shared peak shape for all peaks.
scale_shape_pars (bool, optional, default: False) – Whether to scale the scale-dependent shape parameters of the
shape-reference peak with the peaks.scl_coeff. See Notes for
details.
scale_shape_to_peak_cen (bool, optional, default: False) – Whether to scale the scale-dependent shape parameters to the
centroid mu of the underlying Gaussian of a given peak. Requires
scale_shape_pars to be True. See Notes for details.
The initial amplitude for each peak is estimated from the product of the
number of counts in the bin at the peak’s x_pos and the initial
value for the sigma of the underlying Gaussian. The result is multiplied
by an empirically determined proportionality factor of 3. Although this
number is somewhat shape dependent, this approach yields decent initial
amplitudes for peaks that are reasonably close to a Gaussian. If user
intervention becomes necessary, the init_par_hints option of the
peakfit() method can be used to overwrite the initial value of the
peak amplitude.
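The amplitude estimate described above amounts to a one-line formula. The sketch below uses a hypothetical helper name and compares the empirical factor of 3 with the value sqrt(2*pi) that would hold exactly for a pure Gaussian whose height is amp / (sigma * sqrt(2*pi)).

```python
import math

def initial_amplitude(counts_at_x_pos, sigma_init, factor=3.0):
    """Initial amplitude: factor * counts in the peak bin * initial sigma."""
    return factor * counts_at_x_pos * sigma_init

amp0 = initial_amplitude(counts_at_x_pos=100, sigma_init=1e-3)

# For a pure Gaussian the exact proportionality factor would be sqrt(2*pi)
gaussian_factor = math.sqrt(2 * math.pi)
print(amp0, 3.0 / gaussian_factor)
```

A factor somewhat above sqrt(2*pi) is plausible because exponential tails lower the peak height relative to the total area.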
The initial value for the centroid of the underlying Gaussian (mu) is
estimated using the equation for the mode of the hyper-EMG distribution;
see emgfit.fit_models.get_mu0() for details.
If share_shape_pars=True, a shape-reference peak is used to impose a
common peak shape on all fitted peaks (except for optional scaling of
the scale-dependent shape parameters). For fits performed before the
peak-shape calibration, the first peak in peaks_to_fit is used as the
shape-reference peak. For the peak-shape calibration and all subsequent
fits, the shape-calibrant peak is used as the shape-reference peak. If
the shape calibrant does not fall into the fit range, the shape
parameters obtained in the peak-shape calibration (stored in
spectrum.shape_cal_pars) are added as fixed parameters to the
returned model.
There are two options to scale the shape parameters of the
shape-reference peak to a given peak:
If scale_shape_pars=True and scale_shape_to_peak_cen=False:
The scale-dependent shape-reference parameters (‘sigma’ and ‘tau’s)
are multiplied with a fixed scale factor scl_fac=scl_coeff, where
scl_coeff is the peak.scl_coeff attribute of the given
peak.
If scale_shape_pars=True and scale_shape_to_peak_cen=True:
The scale-dependent shape-reference parameters (‘sigma’ and ‘tau’s)
are multiplied with a varying scale factor
scl_fac=scl_coeff*(mu/mu_ref), where mu and mu_ref are
the Gaussian centroids of the given peak and the shape-reference
peak, respectively. Once a peak-shape calibration has been performed,
mu_ref is taken as the shape-calibrant centroid obtained in the
peak-shape calibration. Otherwise, mu_ref is defined by the
varying (Gaussian) centroid of the shape-reference peak.
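The two scaling options can be summarised in a short sketch. The helper names are hypothetical; returning a factor of 1 when scale_shape_pars is False and raising an error for an inconsistent flag combination are assumptions made for illustration, not documented emgfit behaviour.

```python
def shape_scale_factor(scl_coeff, mu=None, mu_ref=None,
                       scale_shape_pars=False, scale_shape_to_peak_cen=False):
    """Return the factor applied to the scale-dependent shape parameters."""
    if not scale_shape_pars:
        if scale_shape_to_peak_cen:
            raise ValueError("scale_shape_to_peak_cen requires scale_shape_pars=True")
        return 1.0  # assumption: no scaling at all
    if not scale_shape_to_peak_cen:
        return scl_coeff  # fixed scale factor
    if mu is None or mu_ref is None:
        raise ValueError("mu and mu_ref are needed for centroid scaling")
    return scl_coeff * (mu / mu_ref)  # varies with the fitted centroids


def scale_shape(shape_pars, scl_fac):
    """Scale only sigma and the taus; theta and the etas stay unchanged."""
    return {name: val * scl_fac
            if name == "sigma" or name.startswith("tau") else val
            for name, val in shape_pars.items()}
```

For example, with scl_coeff=1 and a peak at twice the reference centroid, the second option doubles sigma and all taus while leaving the tail weights untouched.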
This routine finds peaks from minima in a scaled second derivative of
the spectrum data after first applying some smoothing. This
approach enables very sensitive yet robust detection, even for partially
overlapping peaks. The thres, window_len & width parameters can be
used to tune the algorithm to the specific data for maximum sensitivity.
Parameters:
thres (float, optional) – Threshold for peak detection in the inverted and scaled second
derivative of the smoothed spectrum.
window_len (odd int, optional) – Length of window used for smoothing the spectrum (in no. of bins).
Must be an ODD integer.
window (str, optional) – The window function used for smoothing the spectrum. Defaults to
'blackman'. Other options: 'flat', 'hanning',
'hamming', 'bartlett'. See also NumPy window functions.
width (float [u/z], optional) – Minimal FWHM of peaks to be detected. Caution: To achieve maximal
sensitivity for overlapping peaks this number might have to be set
to less than the peak’s FWHM! In challenging cases use the plot of
the scaled inverted second derivative (by setting plot_2nd_deriv
to True) to ensure that the detection threshold is set properly.
plot_smoothed_spec (bool, optional) – If True a plot with the original and the smoothed spectrum is
shown.
plot_2nd_deriv (bool, optional) – If True a plot with the scaled, inverted second derivative of
the smoothed spectrum is shown.
plot_detection_result (bool, optional) – If True a plot of the spectrum with markers for the detected
peaks is shown.
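The detection idea (smooth, take the second derivative, look for pronounced minima) can be illustrated with a minimal pure-Python sketch. This is not emgfit's implementation: the function names are hypothetical, the width parameter is omitted, and scaling the derivative by its largest magnitude is an assumption since the exact scaling is not given here.

```python
import math


def blackman(n):
    """Blackman window of length n (same formula as numpy.blackman)."""
    return [0.42 - 0.5 * math.cos(2 * math.pi * k / (n - 1))
            + 0.08 * math.cos(4 * math.pi * k / (n - 1)) for k in range(n)]


def smooth(y, window_len=11):
    """Convolve y with a normalised Blackman window (zero-padded edges)."""
    if window_len < 3 or window_len % 2 == 0:
        raise ValueError("window_len must be an odd integer >= 3")
    w = blackman(window_len)
    norm = sum(w)
    half = window_len // 2
    out = []
    for i in range(len(y)):
        acc = 0.0
        for k, wk in enumerate(w):
            j = i + k - half
            if 0 <= j < len(y):
                acc += wk * y[j]
        out.append(acc / norm)
    return out


def detect_peaks(y, thres=0.1, window_len=11):
    """Return bin indices where the inverted, scaled 2nd derivative peaks."""
    s = smooth(y, window_len)
    # central finite differences; the two edge bins are set to zero
    d2 = [0.0] + [s[i - 1] - 2 * s[i] + s[i + 1]
                  for i in range(1, len(s) - 1)] + [0.0]
    scale = max(abs(v) for v in d2) or 1.0
    inv = [-v / scale for v in d2]  # inverted so that peaks point upwards
    return [i for i in range(1, len(inv) - 1)
            if inv[i] > thres and inv[i] > inv[i - 1] and inv[i] >= inv[i + 1]]


# Two noise-free Gaussian peaks (sigma = 3 bins) centred on bins 60 and 75
y = [1000 * math.exp(-0.5 * ((i - 60) / 3) ** 2)
     + 400 * math.exp(-0.5 * ((i - 75) / 3) ** 2) for i in range(150)]
peaks = detect_peaks(y, thres=0.1, window_len=11)
```

Lowering thres or shortening the window makes the sketch more sensitive to weak or overlapping peaks, at the cost of more false positives on noisy data.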
Determine the constant of proportionality A_stat_emg for
calculation of the statistical uncertainties of Hyper-EMG fits.
This method updates the A_stat_emg & A_stat_emg_error
spectrum attributes. The former will be used for all subsequent stat.
error estimations.
This routine must be called AFTER a successful peak-shape calibration
and should be called BEFORE the mass re-calibration.
A_stat_emg is determined by evaluating the statistical fluctuations of
a representative peak’s centroid as a function of the number of ions in
the reference peak. The fluctuations are estimated by fitting
a large number of synthetic spectra derived from the experimental
data via bootstrap re-sampling. For details see Notes section below.
Specify the peak to use for the bootstrap re-sampling by providing
either of the peak_index, species and x_pos arguments. The
peak should be well-separated and have decent statistics (typically the
peak-shape calibrant is used).
Parameters:
peak_index (int, optional) – Index of representative peak to use for bootstrap re-sampling
(typically, the peak-shape calibrant). The peak should have high
statistics and must be well-separated from other peaks.
species (str, optional) – String with species name of representative peak to use for bootstrap
re-sampling (typically, the peak-shape calibrant). The peak should
have high statistics and be well-separated from other peaks.
x_pos (float [u/z], optional) – Marker position (x_pos spectrum attribute) of representative
peak to use for bootstrap re-sampling (typically, the peak-shape
calibrant). The peak should have high statistics and be well-
separated from other peaks. x_pos must be specified up to the 6th
decimal.
x_range (float [u/z], optional) – Mass range around peak centroid over which events will be sampled
and fitted. Choose such that no secondary peaks are contained in
the mass range! If None, defaults to the default_fit_range
spectrum attribute.
N_spectra (int, optional, default: 1000) – Number of bootstrapped spectra to create at each number of ions.
For details see Notes section of peakfit() method documentation.
method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options
check arguments of lmfit.minimizer.minimize().
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to
modify or add model parameters. The keys of the par_hint_args
dictionary specify parameter names; the values must likewise be
dictionaries that hold the respective keyword arguments to pass to
set_param_hint().
vary_baseline (bool, optional, default: True) – If True, the constant background will be fitted with a varying
uniform baseline parameter bkg_c.
If False, the baseline parameter bkg_c will be fixed to 0.
plot_filename (str, optional, default: None) – If not None, the plots will be saved to two separate files named
‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’.
Caution: Existing files with identical name are overwritten.
Notes
As noted in [9], statistical errors of Hyper-EMG peak centroids obey
the following scaling with the number of counts in the peak N_counts:
\(\sigma_{stat} = A_{stat,emg} \frac{FWHM}{\sqrt{N_{counts}}}\),
where the constant of proportionality A_stat_emg depends on the
specific peak shape. This routine uses the following method to determine
A_stat_emg:
N_spectra bootstrapped spectra are created for each of the following
total numbers of events: [10,30,100,300,1000,3000,10000,30000].
Each bootstrapped spectrum is fitted and the best fit peak centroids
are recorded.
The statistical uncertainties are estimated by taking the sample
standard deviations of the recorded peak centroids at each value of
N_counts. Since the best-fit peak area can deviate from the true
number of re-sampled events in the spectrum, the mean best-fit area at
each number of re-sampled events is used to determine N_counts.
A_stat_emg is finally determined by plotting the rel. statistical
uncertainty as a function of N_counts and fitting it with the above
equation.
The resulting value for A_stat_emg will be stored as spectrum
attribute and will be used in all subsequent fits to calculate the stat.
errors from the number of counts in the peak.
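The bootstrap procedure can be sketched in a few lines under simplifying assumptions: the scaling is taken as sigma_stat = A * FWHM / sqrt(N_counts), the centroid of each re-sampled spectrum is estimated by its sample mean instead of a hyper-EMG fit, N_counts is set to the number of re-sampled events rather than the mean best-fit area, and fewer event numbers are used to keep the run short. All names are hypothetical.

```python
import math
import random
import statistics


def fit_A_stat(n_counts, sigmas, fwhm):
    """One-parameter least-squares fit of sigma = A * FWHM / sqrt(N)."""
    x = [fwhm / math.sqrt(n) for n in n_counts]
    return sum(xi * si for xi, si in zip(x, sigmas)) / sum(xi * xi for xi in x)


def bootstrap_A_stat(events, fwhm, n_list=(10, 30, 100, 300),
                     n_spectra=500, seed=1):
    """Estimate A from the scatter of centroids of re-sampled spectra."""
    rng = random.Random(seed)
    sigmas = []
    for n in n_list:
        # re-sample n events with replacement and record each centroid
        centroids = [statistics.fmean(rng.choices(events, k=n))
                     for _ in range(n_spectra)]
        sigmas.append(statistics.stdev(centroids))
    return fit_A_stat(n_list, sigmas, fwhm)


# Synthetic "measured" Gaussian peak with sigma = 1 (arbitrary units)
data_rng = random.Random(0)
events = [data_rng.gauss(0.0, 1.0) for _ in range(5000)]
fwhm = 2 * math.sqrt(2 * math.log(2)) * 1.0
A = bootstrap_A_stat(events, fwhm)
```

For a Gaussian parent distribution the result should scatter around 1/(2*sqrt(2*ln 2)), i.e. roughly 0.425; asymmetric peak shapes generally give different values, which is why the constant is calibrated on the measured peak.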
Determine optimal peak-shape parameters by fitting the specified
peak-shape calibrant.
If vary_tail_order is True (default) an automatic model selection
is performed before the calibration of the peak-shape parameters.
It is recommended to visually check whether the fit residuals
are purely stochastic (as should be the case for a decent model). If
this is not the case either the selected model does not describe the
data well, the initial parameters lead to poor convergence or there are
additional undetected peaks.
Parameters:
index_shape_calib (int, optional) – Index of shape-calibration peak. Preferable alternative: Specify
the shape-calibrant with the species_shape_calib argument.
species_shape_calib (str, optional) – Species name of the shape-calibrant peak in :-notation of chemical substances (e.g.
'K39:-1e'). Alternatively, the peak to use can be specified with
the index_shape_calib argument.
fit_model (str, optional, default: 'emg22') – Name of fit model to use for shape calibration (e.g. 'Gaussian',
'emg12', 'emg33', … - for full list see
Available fit models). If the automatic model selection
(vary_tail_order=True) fails or is turned off, fit_model will be
used for the shape calibration and set as the spectrum’s
fit_model attribute.
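cost_func (str, optional) – Name of cost function to use for minimization. It is strongly
recommended to use ‘chi-square’-fitting for the peak-shape
determination since this yields more robust results for fits with
many model parameters as well as more trustworthy parameter
uncertainties (important for peak-shape error determinations).
If 'chi-square', the fit is performed by minimizing Pearson’s
chi-squared statistic:
\(\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i) + \epsilon}\),
where \(y_i\) denotes the counts in the i-th bin, \(f(x_i)\) the
corresponding model prediction and \(\epsilon\) a small number added
for numerical robustness.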
init_pars (dict, optional) – Dictionary with initial shape parameter values for fit.
If None or 'default' (default), the default parameters
defined for mass 100 in the emgfit.fit_models module will
be used after re-scaling to the spectrum’s mass_number.
To define custom initial values, a parameter dictionary containing
all model parameters and their values in the format
{'<paramname>':<param_value>,...} should be passed to
init_pars.
Mind that only the initial values of the shape parameters (sigma,
theta, etas and taus) can be user-defined. The initial values for
mu, amp and the optional baseline parameter bkg_c are
automatically derived as described in the Peak fitting approach
article.
x_fit_cen (float [u/z], optional) – Center of fit range. If None (default), the x_pos
attribute of the shape-calibrant peak is used as x_fit_cen.
x_fit_range (float [u/z], optional) – Mass range to fit. If None, defaults to the
default_fit_range spectrum attribute.
vary_baseline (bool, optional, default: True) – If True, the background will be fitted with a varying uniform
baseline parameter bkg_c. If False, the baseline parameter
bkg_c will be fixed to 0.
vary_tail_order (bool, optional) – If True (default), before the calibration of the peak-shape
parameters an automatized fit model selection is performed. For
details on the automatic model selection, see Notes section below.
If False, the specified fit_model argument is used as model
for the peak-shape determination.
method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options
check arguments of lmfit.minimizer.minimize().
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to
modify or add model parameters. The keys of the par_hint_args
dictionary specify parameter names; the values must likewise be
dictionaries that hold the respective keyword arguments to pass to
set_param_hint().
show_fit_reports (bool, optional, default: True) – Whether to print fit reports for the fits in the automatic model
selection.
show_plots (bool, optional) – If True (default), linear and logarithmic plots of the spectrum
and the best fit curve are displayed. For details see
spectrum.plot_fit().
show_peak_markers (bool, optional) – If True (default), peak markers are added to the plots.
sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best fit curve in sigma.
error_every (int, optional, default: 1) – Show error bars only for every error_every-th data point.
plot_filename (str, optional, default: None) – If not None, the plots of the shape-calibration will be saved to
two separate files named ‘<plot_filename>_log_plot.png’ and
‘<plot_filename>_lin_plot.png’. Caution: Existing files with
identical name are overwritten.
map_par_covar (bool, optional) – If True the parameter covariances will be mapped using
Markov-Chain Monte Carlo (MCMC) sampling and shown in a corner plot.
This feature is only recommended for single-peak fits.
**MCMC_kwargs (optional) – Options to send to _get_MCMC_par_samples(). Only relevant when
map_par_covar is True.
Notes
Ideally the peak-shape calibration is performed on a well-separated peak
with high statistics. If this is not possible, the peak-shape
calibration can also be attempted using overlapping peaks since emgfit
ensures shared and identical shape parameters for all peaks in a multi-
peak fit.
Automatic model selection:
When the model selection is activated the routine tries to find the peak
shape that minimizes the fit’s reduced chi-squared by successively
adding more tails on the right and left. Finally, that fit model is
selected which yields the lowest reduced chi-squared without having any
of the tail weight parameters eta compatible with zero within 1-sigma
uncertainty. The latter models are excluded as this is an indication of
overfitting. Models for which the calculation of any eta parameter
uncertainty fails are likewise excluded from selection.
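The selection rule can be written down compactly. The sketch below only covers the final choice among already fitted candidates; the data layout (a list of dicts with 'name', 'redchi' and 'etas') and the numbers are hypothetical, and the successive fitting itself is not shown.

```python
def select_model(candidates):
    """Pick the candidate with the lowest reduced chi-squared among those
    whose eta parameters are all incompatible with zero within 1 sigma.

    Each candidate is a dict with keys 'name', 'redchi' and
    'etas' = {parameter name: (value, stderr or None)}.
    """
    def acceptable(cand):
        for value, stderr in cand["etas"].values():
            if stderr is None:              # uncertainty calculation failed
                return False
            if abs(value) - stderr <= 0:    # compatible with zero within 1 sigma
                return False
        return True

    valid = [c for c in candidates if acceptable(c)]
    if not valid:
        return None
    return min(valid, key=lambda c: c["redchi"])["name"]


# Made-up fit results for illustration only
fits = [
    {"name": "Gaussian", "redchi": 3.20, "etas": {}},
    {"name": "emg11", "redchi": 1.40, "etas": {}},
    {"name": "emg21", "redchi": 1.10, "etas": {"eta_m1": (0.85, 0.05)}},
    {"name": "emg22", "redchi": 1.02,
     "etas": {"eta_m1": (0.90, 0.04), "eta_p1": (0.05, 0.08)}},
    {"name": "emg23", "redchi": 1.01, "etas": {"eta_p1": (0.30, None)}},
]
best = select_model(fits)
```

Here 'emg22' and 'emg23' have lower reduced chi-squared values than 'emg21' but are rejected, the first because eta_p1 is compatible with zero and the second because its uncertainty is missing.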
Determine mass re-calibration factor by fitting the selected
calibrant peak.
After the mass calibrant has been fitted the recalibration factor and
its uncertainty are calculated and saved as the spectrum’s
recal_fac and recal_fac_error attributes.
The calibrant peak can either be specified with the index_mass_calib
or the species_mass_calib argument.
Parameters:
index_mass_calib (int, optional) – Index of mass calibrant peak.
species_mass_calib (str, optional) – Species of peak to use as mass calibrant.
fit_model (str, optional, default: 'emg22') – Name of fit model to use (e.g. 'Gaussian', 'emg12',
'emg33', … - for full list see Available fit models).
x_fit_cen (float or None, [u/z], optional) – Center of mass range to fit.
If None, defaults to the marker position (x_pos) of the mass calibrant peak.
x_fit_range (float [u/z], optional) – Width of mass range to fit. If None, defaults to the default_fit_range
spectrum attribute.
vary_shape (bool, optional, default: False) – If False peak-shape parameters (sigma, theta, etas and
taus) are kept fixed at their initial values. If True the
shared shape parameters are varied (ensuring identical shape
parameters for all peaks).
vary_baseline (bool, optional, default: True) – If True, the constant background will be fitted with a varying
uniform baseline parameter bkg_c.
If False, the baseline parameter bkg_c will be fixed to 0.
method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options
check arguments of lmfit.minimizer.minimize().
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to
modify or add model parameters. The keys of the par_hint_args
dictionary specify parameter names; the values must likewise be
dictionaries that hold the respective keyword arguments to pass to
set_param_hint().
show_plots (bool, optional) – If True (default) linear and logarithmic plots of the spectrum
with the best fit curve are displayed. For details see
spectrum.plot_fit().
show_peak_markers (bool, optional) – If True (default) peak markers are added to the plots.
sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best fit curve in sigma.
Note that the confidence band is only derived from the uncertainties
of the parameters that are varied during the fit.
error_every (int, optional, default: 1) – Show error bars only for every error_every-th data point.
show_fit_report (bool, optional) – If True (default) the fit results are reported.
plot_filename (str, optional, default: None) – If not None, the plots will be saved to two separate files named
‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’.
Caution: Existing files with identical name are overwritten.
The spectrum.fit_peaks() method enables the simultaneous fitting
of mass calibrant and ions of interest in a single multi-peak fit and
can be used as an alternative to this method.
After the calibrant fit the spectrum._eval_peakshape_errors()
method is automatically called to save the absolute calibrant centroid
shifts as preparation for subsequent peak-shape error determinations.
Assuming the spectrum has already been coarsely calibrated via the time-
resolved calibration in the MR-TOF-MS’s data acquisition software MAc,
the recalibration (or precision calibration) factor is usually very
close to unity. An error will be raised by the
spectrum._update_calibrant_props() method if
spectrum.recal_fac deviates from unity by more than a permille
since this causes some implicit approximations for the calculation of
the final mass values and their uncertainties to break down.
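The unity check on the recalibration factor can be sketched as follows. Both helpers are hypothetical; in particular, defining the factor as the ratio of the literature to the fitted calibrant m/z is an assumption made for illustration, and the numbers are invented.

```python
def recal_factor(m_lit, m_fit):
    """Assumed definition: literature m/z divided by fitted m/z."""
    return m_lit / m_fit


def check_recal_fac(recal_fac, tol=1e-3):
    """Raise if the factor deviates from unity by more than one permille."""
    if abs(recal_fac - 1.0) > tol:
        raise ValueError(
            f"recal_fac = {recal_fac} deviates from unity by more than {tol}")
    return recal_fac


# Invented example values in u/z
fac = check_recal_fac(recal_factor(38.9631579, 38.9631000))
```

A factor outside the tolerance usually points to a wrong species assignment or a poor coarse calibration rather than to a genuine precision correction.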
The statistical uncertainty of the peak is calculated via the following
relation [10]:
\(\sigma_{stat} = A_{stat} \frac{FWHM}{\sqrt{N_{counts}}}\).
For Gaussians the constant of proportionality \(A_{stat}\) is always
given by \(A_{stat,G}\) = 0.425. For Hyper-EMG models
\(A_{stat}=A_{stat,emg}\) is either set to the default value
A_stat_emg_default defined in the config module or
determined by running the spectrum.determine_A_stat_emg() method.
The latter is usually preferable since this accounts for the specifics
of the given peak shape.
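As a quick numerical illustration, the sketch below assumes the relation has the form sigma_stat = A_stat * FWHM / sqrt(N_counts), which reproduces the Gaussian constant quoted above since 1/(2*sqrt(2*ln 2)) is approximately 0.425. The function name and the example numbers are hypothetical.

```python
import math

A_STAT_GAUSS = 0.425  # value quoted for Gaussian peaks


def stat_error(fwhm, n_counts, a_stat=A_STAT_GAUSS):
    """Statistical centroid uncertainty, in the same units as fwhm."""
    if n_counts <= 0:
        raise ValueError("n_counts must be positive")
    return a_stat * fwhm / math.sqrt(n_counts)


# Example: FWHM of 2e-04 u and 400 counts in the peak
err = stat_error(2e-04, 400)

# Consistency check of the Gaussian constant: sigma = FWHM / (2*sqrt(2*ln 2))
gauss_const = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))
```

For a hyper-EMG model, a_stat would be replaced by the calibrated A_stat_emg value instead of the Gaussian default.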
Fit peaks, update peaks properties and show results.
By default, the full mass range and all peaks in the spectrum are
fitted. Optionally, only peaks specified with peak_indeces or peaks in
the mass range specified with x_fit_cen and x_fit_range are fitted.
Optionally, the mass recalibration can be performed simultaneously with
the IOI fit if the mass calibrant is in the fit range and specified with
either the index_mass_calib or species_mass_calib arguments.
Otherwise a mass recalibration must have been performed upfront.
Before running this method a successful peak-shape calibration must have
been performed with determine_peak_shape().
Parameters:
peak_indeces (int, list of int, optional) – Indices of neighbouring peaks to fit. The fit range will be chosen
such that at least a mass range of x_fit_range/2 is included
around each peak.
x_fit_cen (float [u/z], optional) – Center of mass range to fit (only specify if a subset of the
spectrum is to be fitted)
x_fit_range (float [u/z], optional) – Width of mass range to fit (only specify if a subset of the
spectrum is to be fitted). If None, defaults to the
spectrum.default_fit_range attribute. This argument is only relevant if
x_fit_cen is also specified.
fit_model (str, optional) – Name of fit model to use (e.g. 'Gaussian', 'emg12',
'emg33', … - for full list see Available fit models). If
None, defaults to fit_model spectrum
attribute.
method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options
check arguments of lmfit.minimizer.minimize().
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to
modify or add model parameters. The keys of the par_hint_args
dictionary specify parameter names; the values must likewise be
dictionaries that hold the respective keyword arguments to pass to
set_param_hint().
init_pars (dict, optional) – Dictionary with initial shape parameter values for fit.
If None (default) the parameters from the peak-shape
calibration (peak_shape_pars spectrum attribute) are used.
If 'default', the default parameters defined for mass 100 in
the emgfit.fit_models module will be used after re-scaling
to the spectrum’s mass_number.
To define custom initial values a parameter dictionary containing
all model parameters and their values in the format
{'<paramname>':<param_value>,...} should be passed to
init_pars.
Mind that only the initial values of the shape parameters (sigma,
theta, etas and taus) can be user-defined. The initial values for
mu, amp and the optional baseline parameter bkg_c are
automatically derived as described in the Peak fitting approach
article.
vary_shape (bool, optional, default: False) – If False peak-shape parameters (sigma, theta, etas and
taus) are kept fixed at their initial values. If True the
shared shape parameters are varied (ensuring identical shape
parameters for all peaks).
vary_baseline (bool, optional, default: True) – If True, the constant background will be fitted with a varying
uniform baseline parameter bkg_c.
If False, the baseline parameter bkg_c will be fixed to 0.
show_plots (bool, optional) – If True (default) linear and logarithmic plots of the spectrum
with the best fit curve are displayed. For details see
spectrum.plot_fit().
show_peak_markers (bool, optional) – If True (default) peak markers are added to the plots.
sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best-fit curve in sigma.
Note that the confidence band is only derived from the uncertainties
of the parameters that are varied during the fit.
error_every (int, optional, default: 1) – Show error bars only for every error_every-th data point.
plot_filename (str, optional, default: None) – If not None, the plots will be saved to two separate files named
‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’.
Caution: Existing files with identical name are overwritten.
show_fit_report (bool, optional) – If True (default) the detailed lmfit fit report is printed.
show_shape_err_fits (bool, optional, default: True) – If True, plots of all fits performed for the peak-shape
uncertainty evaluation are shown.
Notes
Updates peak properties dataframe with peak properties obtained in fit.
Get peak-shape uncertainties by re-fitting peaks with many different
MC-shape-parameter sets.
This method provides refined peak-shape uncertainties that account for
non-normal distributions and correlations of shape parameters. To that
end, the peaks of interest are re-fitted with N_samples different
peak-shape parameter sets. For these parameter sets to be representative
of all peak shapes supported by the data they are randomly drawn from a
larger ensemble of parameter sets obtained from Markov-Chain Monte Carlo
(MCMC) sampling on the peak-shape calibrant. The peak-shape uncertainties
of the mass values and peak areas are estimated from the RMS
deviations from the respective best-fit values. Finally, the peak properties
table is updated with the refined uncertainties.
This method only takes effective mass shifts relative to the calibrant
peak into account. For each peak shape the calibrant peak is re-fitted
and the new recalibration factor is used to calculate the shifted
ion-of-interest masses. Therefore, when the peak_indeces argument is
used, it must include the mass calibrant index.
Parameters:
peak_indeces (int or list of int, optional) – Indices of peaks to evaluate MC peak-shape uncertainties for.
verbose (bool, optional) – Whether to print status updates and intermediate results.
show_hists (bool, optional) – If True histograms of the effective mass shifts and peak areas
obtained with the MC shape parameter sets are shown. Black vertical
lines indicate the best-fit values obtained with fit_peaks().
show_peak_properties (bool, optional) – If True the peak properties table including the updated peak-shape
uncertainties is shown.
rerun_MCMC_sampling (bool, optional) – When False (default) pre-existing MCMC parameter samples (e.g.
obtained with determine_peak_shape()) are used. If True or
when there are no pre-existing MCMC samples, the MCMC sampling will be
performed by this method.
N_samples (int, optional) – Number of different shape parameter sets to use. Defaults to 1000.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated
spectra. When set to -1 (default) all available cores are used.
seed (int, optional) – Random seed to use for reproducibility. Defaults to 872.
**MCMC_kwargs – Keyword arguments to send to _get_MCMC_par_samples() for
control over the MCMC sampling.
This method relies on a representative sample of all the shape parameter
sets which are supported by the data. These shape parameter sets are
randomly drawn from a large sample of parameter sets obtained from
Markov-Chain Monte Carlo (MCMC) sampling on the peak-shape calibrant.
In MCMC sampling so-called walkers are sent on random walks to explore
the parameter space. The latter is done with the
_get_MCMC_par_samples() method. If MCMC sampling has already
been performed with the map_par_covar option in
determine_peak_shape(), these MCMC samples will be
used for the MC peak-shape error evaluation. If there are no pre-existing
MCMC parameter sets the _get_MCMC_par_samples() method will be
automatically invoked before the MC peak-shape error evaluation.
Assuming that the samples obtained with the MCMC algorithm form a
representative set of parameter samples and are sufficiently independent
from each other, this method provides refined peak-shape uncertainties
that account for correlations and non-normal posterior distributions
of peak-shape parameters. In particular, this prevents overestimation of
the uncertainties due to non-consideration of parameter correlations.
For this method to be accurate a sufficiently large number of MCMC
sampling steps should be performed and fits should be performed with a
large number of parameter sets (N_samples>=1000). For the MCMC
parameter samples to be independent a sufficient amount of thinning has
to be applied to remove autocorrelation between MCMC samples. Thinning
refers to the common practice of only storing the results of every k-th
MCMC iteration. The length and thinning of the MCMC chain is controlled
with the steps and thin MCMC keyword arguments. For more details and
references on MCMC sampling with emgfit see the docs of the underlying
_get_MCMC_par_samples() method.
For the peak-shape mass uncertainties only effective mass shifts
relative to the calibrant centroid are relevant. Therefore, the mass
calibrant and the ions of interest (IOI) are fitted with the same
peak-shape-parameter sets and the final mass values are calculated from
the obtained IOI peak positions and the corresponding mass recalibration
factors.
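The bookkeeping behind this evaluation (thinning the chain, recalibrating each ion-of-interest centroid with the calibrant centroid obtained for the same shape set, and taking the RMS deviation from the best-fit mass) can be sketched as below. All function names are hypothetical, the re-fitting itself is not shown, and the per-set recalibration factor is assumed to be the literature calibrant mass divided by the fitted calibrant centroid.

```python
import math


def thin_chain(chain, thin):
    """Keep every thin-th MCMC sample to reduce autocorrelation."""
    return chain[::thin]


def shifted_ioi_masses(ioi_centroids, cal_centroids, m_cal_lit):
    """Recalibrate each IOI centroid with the calibrant centroid that was
    obtained with the same shape-parameter set (assumed recal. definition)."""
    return [mu_ioi * (m_cal_lit / mu_cal)
            for mu_ioi, mu_cal in zip(ioi_centroids, cal_centroids)]


def rms_deviation(values, best_fit):
    """Root-mean-square deviation of the values from the best-fit value."""
    return math.sqrt(sum((v - best_fit) ** 2 for v in values) / len(values))


# A shape change that shifts calibrant and IOI by the same relative amount
# cancels in the recalibrated mass (invented numbers)
masses = shifted_ioi_masses([10.0, 10.2], [5.0, 5.1], m_cal_lit=5.0)
shape_err = rms_deviation(masses, best_fit=10.0)
```

The example shows why only shifts relative to the calibrant matter: both shape sets yield the same recalibrated mass, so the resulting peak-shape uncertainty is essentially zero.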
The delta_m or delta_p parameters occurring in the case of hyper-EMG
models with 3 neg. or 3 pos. tails are defined as
delta_m=eta_m1+eta_m2 and delta_p=eta_p1+eta_p2,
respectively.
Get statistical and area uncertainties via resampling from best-fit
PDF and update peak properties therewith.
This method provides bootstrap estimates of the statistical errors and
peak area errors by evaluating the scatter of peak centroids and areas
in fits of many simulated spectra. The simulated spectra are created by
sampling events from the best-fit PDF associated with fit_result
(parametric bootstrap). Refined errors are calculated for each peak
individually by taking the sample standard deviations of the obtained
peak centroids and areas.
If the peaks in peak_indeces have been fitted separately a parametric
bootstrap will be performed for each of the different fits.
Parameters:
peak_indeces (list, optional) – List containing indices of peaks to determine refined stat. and area
errors for, e.g. to evaluate the stat. and area errors of peaks 1 and 2 use
peak_indeces=[1,2]. Defaults to all peaks in the spectrum’s
peaks list.
N_spectra (int, optional) – Number of simulated spectra to fit. Defaults to 1000, which
typically yields statistical uncertainty estimates with a relative
precision of a few percent.
seed (int, optional) – Random seed to use for reproducible sampling.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated
spectra. When set to -1 (default) all available cores are used.
show_hists (bool, optional, default: False) – If True, histograms of the obtained peak centroids and areas are
shown. Black vertical lines indicate the best-fit values obtained
from the measured data.
show_peak_properties (bool, optional, default: True) – If True, the peak properties table is shown after updating the
statistical and area errors.
Only the statistical mass and area uncertainties of ion-of-interest
peaks are updated. The uncertainties of the mass calibrant and the
recalibration uncertainty remain unaffected by this method.
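A parametric bootstrap differs from re-sampling the measured events in that the synthetic spectra are drawn from the fitted model. The sketch below illustrates this with a Gaussian convolved with a single positive exponential tail as a stand-in for the best-fit hyper-EMG PDF. It is a simplification with hypothetical names: the centroid of each simulated spectrum is obtained from the sample mean (the mean of this distribution is mu + tau) instead of a full re-fit, and area errors are not treated.

```python
import random
import statistics


def sample_emg(rng, mu, sigma, tau, n):
    """Draw n events from a Gaussian convolved with one exponential tail."""
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)
            for _ in range(n)]


def parametric_bootstrap(mu, sigma, tau, n_events, n_spectra=500, seed=872):
    """Return mean and sample standard deviation of the centroid estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_spectra):
        events = sample_emg(rng, mu, sigma, tau, n_events)
        # stand-in for re-fitting: correct the sample mean by the known tau
        estimates.append(statistics.fmean(events) - tau)
    return statistics.fmean(estimates), statistics.stdev(estimates)


# Best-fit parameters of a hypothetical peak, in arbitrary units
mu_mean, mu_err = parametric_bootstrap(mu=0.0, sigma=1.0, tau=0.5,
                                       n_events=100)
```

With these inputs the scatter should be close to sqrt(sigma**2 + tau**2) / sqrt(n_events), i.e. about 0.11, and its own relative precision improves with the number of simulated spectra.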
Load peak shape from the TXT-file named ‘filename.txt’.
Successfully loaded shape calibration parameters and their uncertainties
are used as the new shape_cal_pars and
shape_cal_errors spectrum attributes respectively.
Parameters:
filename (str) – Name of input file (‘.txt’ extension is automatically appended).
Fits full spectrum or subrange (if x_fit_cen and x_fit_range are
specified) and optionally shows results.
This method is for internal usage. Use the spectrum.fit_peaks()
method to fit peaks and automatically update peak properties dataframe
with obtained fit results!
Parameters:
fit_model (str, optional, default: 'emg22') – Name of fit model to use (e.g. 'Gaussian', 'emg12',
'emg33', … - for full list see Available fit models).
x_fit_cen (float [u/z], optional) – Center of mass range to fit (only specify if subset of spectrum is
to be fitted).
x_fit_range (float [u/z], optional) – Width of mass range to fit (only specify if subset of spectrum is to
be fitted, only relevant if x_fit_cen is likewise specified). If
None, defaults to default_fit_range spectrum attribute.
init_pars (dict, optional) – Dictionary with initial shape parameter values for fit.
If None (default) the parameters from the peak-shape
calibration are used.
If 'default', the default parameters defined for mass 100 in
the emgfit.fit_models module will be used after re-scaling
to the spectrum’s mass_number.
To define custom initial values a parameter dictionary containing
all model parameters and their values in the format
{'<paramname>':<param_value>,...} should be passed to
init_pars.
Mind that only the initial values of the shape parameters (sigma,
theta, etas and taus) can be user-defined. The initial values
for mu, amp and the optional baseline parameter bkg_c are
automatically derived as described in the Peak fitting approach
article.
vary_shape (bool, optional, default: False) – If False peak-shape parameters (sigma, theta, etas and
taus) are kept fixed at their initial values. If True the
shared shape parameters are varied (ensuring identical shape
parameters for all peaks).
vary_baseline (bool, optional, default: True) – If True, the constant background will be fitted with a varying
uniform baseline parameter bkg_c.
If False, the baseline parameter bkg_c will be fixed to 0.
share_shape_pars (bool, optional, default: True) – Whether to enforce a shared peak shape for all peaks.
scale_shape_pars (bool, optional, default: False) – Whether to scale the scale-dependent shape parameters of the
shape-reference peak with the peaks.scl_coeff. See Notes of
comp_model() for details.
scale_shape_to_peak_cen (bool, optional, default: False) – Whether to scale the scale-dependent shape parameters to the
centroid mu of the underlying Gaussian of a given peak. Requires
scale_shape_pars to be True. See Notes of
comp_model() for details.
method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options
check arguments of lmfit.minimizer.minimize().
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to
modify or add model parameters. The keys of the par_hint_args
dictionary specify parameter names; the values must likewise be
dictionaries that hold the respective keyword arguments to pass to
set_param_hint(). For example:
par_hint_args={'p0_amp' : {'value':10}, 'p1_sigma' : {'max':0.4}}
show_plots (bool, optional) – If True (default) linear and logarithmic plots of the spectrum
with the best fit curve are displayed. For details see
spectrum.plot_fit().
show_peak_markers (bool, optional) – If True (default) peak markers are added to the plots.
sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best fit curve in sigma.
Note that the confidence band is only derived from the uncertainties
of the parameters that are varied during the fit.
error_every (int, optional, default: 1) – Show error bars only for every error_every-th data point.
plot_filename (str, optional, default: None) – If not None, the plots will be saved to two separate files named
‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’.
Caution: Existing files with identical name are overwritten.
map_par_covar (bool, optional) – If True the parameter covariances will be mapped using
Markov-Chain Monte Carlo (MCMC) sampling and shown in a corner plot.
This feature is only recommended for single-peak fits.
**MCMC_kwargs (optional) – Options to send to _get_MCMC_par_samples(). Only relevant when
map_par_covar is True.
In fits with the chi-square cost function the variance weights
\(w_i\) for the residuals are estimated using the latest model
predictions: \(w_i = 1/(\sigma_i^2 + \epsilon) = 1/(f(x_i)+ \epsilon)\),
where \(\epsilon = 1E-10\) is a small number added to increase
numerical robustness when \(f(x_i)\) approaches zero. On each
iteration the weights are updated with the new values of the model
function.
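The weighting scheme described above can be sketched in a few lines. This is a minimal illustration and not emgfit's actual implementation; the function names and the example numbers are hypothetical. Because the weights are computed from the current model predictions, they change every time the optimizer evaluates the residuals with new parameter values.

```python
import math

EPS = 1e-10  # small offset from the text; keeps weights finite when f(x_i) -> 0


def chi_square_residuals(y, f):
    """Weighted residuals whose squares sum to the chi-square cost.

    y : observed counts per bin
    f : current model predictions f(x_i) for the same bins
    """
    # Variance is estimated from the model: sigma_i^2 = f(x_i),
    # hence w_i = 1 / (f(x_i) + EPS).
    weights = [1.0 / (fi + EPS) for fi in f]
    return [math.sqrt(w) * (yi - fi) for w, yi, fi in zip(weights, y, f)]


def chi_square(y, f):
    """Chi-square cost as the sum of squared weighted residuals."""
    return sum(r * r for r in chi_square_residuals(y, f))


# Hypothetical data: the last bin has a model prediction of exactly zero.
y = [4.0, 9.0, 0.0]
f = [5.0, 8.0, 0.0]
cost = chi_square(y, f)
```

Without the offset, the last bin would cause a division by zero; with it, the bin simply contributes nothing to the cost.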
When performing MLE fits including bins with low statistics, the
value for chi-squared as well as the parameter standard errors and
correlations in the lmfit fit report should be taken with caution.
This is because, strictly speaking, emgfit’s MLE cost function only
approximates a chi-squared distribution in the limit of a large number
of counts in every bin (“Wilks’ theorem”). For a detailed derivation of
this statement see pp. 94-95 of these lecture slides by Mark Thompson.
However, for spectra with many (>1000) bins, it has been demonstrated
that as few as 10-30 counts in an entire spectrum can be sufficient for
the doubled, negative Poisson log-likelihood ratio (emgfit’s MLE
cost function) to yield reliable confidence intervals and act as a
decent goodness-of-fit measure [11]. In practice and if in doubt,
one can simply test the validity of the reported fit statistic as well
as parameter standard errors & correlations by re-performing the same
fit with cost_func=’chi-square’ and comparing the results. In all
tested cases decent agreement was found even if the fit range contained
low-statistics bins. Even if a deviation occurs, this is irrelevant in
most practical cases since the mass errors reported in emgfit’s peak
properties table are independent of the lmfit parameter standard errors
given as additional information below. Only the peak area errors are by
default calculated using the standard errors of the amp parameters
reported by lmfit.
Besides the asymptotic convergence to a chi-squared distribution,
emgfit’s MLE cost function has a second handy property: all
summands in the log-likelihood ratio are non-negative,
\(L_i = f(x_i) - y_i + y_i \ln\left(\frac{y_i}{f(x_i)}\right) \geq 0\).
Exploiting this property, the minimization of the log-likelihood ratio
can be re-formulated into a least-squares problem (see also [12]):
Instead of minimizing the scalar log-likelihood ratio, emgfit by default
minimizes the sum-of-squares of the square-roots of the summands
\(L_i\) in the log-likelihood ratio. This is mathematically
equivalent to minimizing \(L\) and facilitates the
usage of Scipy’s highly efficient least-squares optimizers
(‘least_squares’ & ‘leastsq’). The latter yield significant speed-ups
compared to scalar optimizers such as Scipy’s ‘Nelder-Mead’ or ‘Powell’
methods. By default, emgfit’s ‘MLE’ fits are performed with Scipy’s
‘least_squares’ optimizer, a variant of a Levenberg-Marquardt algorithm
for bound-constrained problems. If a scalar optimization method is
chosen emgfit uses the conventional approach of minimizing the scalar
\(L\). For more details on these optimizers see the docs of
lmfit.minimizer.minimize() and scipy.optimize.
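The least-squares reformulation can be checked numerically with a short sketch. The code below is an illustration only and not emgfit's implementation; the function names and the example data are hypothetical. It evaluates the summands \(L_i\) as defined above, takes their square roots as residuals, and shows that the sum of squared residuals reproduces the scalar \(L\).

```python
import math


def mle_summands(y, f):
    """Summands L_i = f_i - y_i + y_i*ln(y_i/f_i) of the log-likelihood ratio."""
    terms = []
    for yi, fi in zip(y, f):
        # The y*ln(y/f) term vanishes in the limit y -> 0.
        log_term = yi * math.log(yi / fi) if yi > 0 else 0.0
        terms.append(fi - yi + log_term)
    return terms


def mle_residuals(y, f):
    """Square roots of the summands, usable as least-squares residuals."""
    # max(..., 0.0) only guards against tiny negative rounding errors.
    return [math.sqrt(max(t, 0.0)) for t in mle_summands(y, f)]


# Hypothetical counts y and model predictions f, including an empty bin.
y = [3.0, 0.0, 12.0]
f = [2.5, 0.4, 11.0]

scalar_cost = sum(mle_summands(y, f))                     # scalar L
sum_of_squares = sum(r * r for r in mle_residuals(y, f))  # least-squares form
```

Both quantities agree up to rounding, so a least-squares optimizer fed with the residual vector minimizes the same objective as a scalar optimizer fed with \(L\).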
Plot data and fit result in logarithmic and linear y-scale.
Only a single fit result can be plotted with this method. If neither
fit_result nor x_min and x_max are specified, the full mass range
is plotted.
Plots can be saved to a file using the plot_filename argument.
Parameters:
fit_result (emgfit.model.EMGModelResult, optional) – Fit result to plot. If None, defaults to fit result of first
peak in specified mass range (taken from fit_results list).
plot_title (str or None, optional) – Title of plots. If None, defaults to a string with the fit model
name and cost function of the fit_result to ensure clear
indication of how the fit was obtained.
show_peak_markers (bool, optional, default: True) – If True, peak markers are added to the plots.
show_comps (bool, optional) – If True, the single-peak components of the best-fit curve will be
indicated with colored dashed lines.
sigmas_of_conf_band (int, optional, default: 0) – Coverage probability of confidence band in sigma (only shown in
log-plot). If 0, no confidence band is shown (default).
error_every (int, optional, default: 1) – Show error bars only for every error_every-th data point.
x_min (float [u/z], optional) – Start of x-range to plot. If None, defaults to the
minimum of the fitted x-range in fit_result.
x_max (float [u/z], optional) – End of x-range to plot. If None, defaults to the
maximum of the fitted x-range in fit_result.
plot_filename (str, optional, default: None) – If not None, the plots will be saved to two separate files named
‘<plot_filename>_log_plot.png’ & ‘<plot_filename>_lin_plot.png’.
Caution: Existing files with identical name are overwritten.
Show logarithmic and linear plots of data and fit curve zoomed to
peaks or mass range of interest.
There are two alternative ways to define the plots’ mass ranges:
Specifying peaks-of-interest with the peak_indeces
argument. The mass range is then automatically chosen to include all
peaks of interest. The minimal mass range to include around each peak
of interest can be adjusted using x_range.
Specifying a mass range of interest with the x_center and x_range
arguments.
Parameters:
peak_indeces (int or list of ints, optional) – Index of single peak or indices of multiple neighboring peaks to
show (peaks must belong to the same fit_result).
x_center (float [u/z], optional) – Center of manually specified x-range to plot.
x_range (float [u/z], optional, default: 0.01) – Width of x-range to plot around ‘x_center’ or minimal mass range
to include around each specified peak of interest.
plot_title (str or None, optional) – Title of plots. If None, defaults to a string with the fit model
name and cost function of the fit_result to ensure clear
indication of how the fit was obtained.
show_peak_markers (bool, optional, default: True) – If True, peak markers are added to the plots.
error_every (int, optional, default: 1) – Show error bars only for every error_every-th data point.
sigmas_of_conf_band (int, optional, default: 0) – Coverage probability of confidence band in sigma (only shown in
log-plot). If 0, no confidence band is shown (default).
plot_filename (str or None, optional, default: None) – If not None, the plots will be saved to two separate files named
‘<plot_filename>_log_plot.png’ & ‘<plot_filename>_lin_plot.png’.
Caution: Existing files with identical name are overwritten.
Remove specified peak from the spectrum’s peaks list.
Select the peak to be removed by specifying either the respective
peak_index, species label or peak marker position x_pos.
Parameters:
peak_index (int or list of int, optional) – Indices of peak(s) to remove from the spectrum’s peaks list
(0-based!).
x_pos (float or list of float [u/z]) – x_pos of peak(s) to remove from the spectrum’s peaks
list. Peak marker positions must be specified up to the 6th decimal.
species (str or list of str) – species label(s) of peak(s) to remove from the spectrum’s
peaks list.
Note
This method is deprecated in v0.1.1 and will likely be removed in
future versions, use remove_peaks() instead!
Remove specified peak(s) from the spectrum’s peaks list.
Select the peak(s) to be removed by specifying either the respective
peak_indeces, species label(s) or peak marker position(s) x_pos.
To remove multiple peaks at once, pass a list to one of the above
arguments.
Parameters:
peak_indeces (int or list of int, optional) – Indices of peak(s) to remove from the spectrum’s peaks list
(0-based!).
x_pos (float or list of float [u/z]) – x_pos of peak(s) to remove from the spectrum’s peaks
list. Peak marker positions must be specified up to the 6th decimal.
species (str or list of str) – species label(s) of peak(s) to remove from the spectrum’s
peaks list.
Note
Removing a peak will shift the peak indices of all peaks at higher
masses by -1.
Notes
The current peaks list can be viewed by calling the
show_peak_properties() spectrum method.
New in version 0.2.0: (as successor of remove_peak()).
By default, the fit curve data is taken from the first fit result stored
in the spectrum’s fit_results list; data from other
fit results can be accessed through the peak_index or fit_result
arguments.
Parameters:
filename (str, optional) – Prefix of the output filename; the suffix “_fit_trace.txt” is
automatically appended.
peak_index (int, optional, default: 0) – Peak index specifying from which result in
spectrum.fit_results the fit data will be written.
fit_result (emgfit.model.EMGModelResult, optional) – Fit result to obtain fit curve from. Only needed to write data
from a fit result not stored in the spectrum’s fit_results
attribute.
fmt (sequence of str) – Formats of written column values. For details see
numpy.savetxt().
x_res (float, optional [u]) – Custom spacing of abscissa values at which the best-fit model will
be evaluated. If x_res is specified, the measured count data and
its error will not be written to the output file.
comment (str, optional) – Comments to add to output file.
Write the fit results to an XLSX file and the peak-shape calibration
to a TXT file.
Write results to an XLSX Excel file named <filename>_results.xlsx
and save peak-shape calibration parameters to TXT file named
<filename>_peakshape_calib.txt.
The Excel file contains the following three worksheets:
general spectrum properties
peak properties and images of all obtained fit curves
results of the default peakshape-error evaluation in which shape
parameters are varied by +-1 sigma
By default, plots of all peak fits are additionally saved as PNG images in
both linear and logarithmic scale.
Parameters:
filename (string) – Prefix of the files to be saved to (any provided file extensions are
automatically removed and the necessary .xlsx & .txt extensions are
appended).
save_plots (bool, optional, default: True) – Whether to save separate PNG files with plots of all obtained fit
curves. If False, plots will still be included in the XLSX file.
plot_kws (dict, optional) – Keyword arguments to pass to plot_fit() method in order to
customize plot appearance.
Specify for which peaks mass values will be hidden for blind analysis.
This method adds peaks to the spectrum’s list of
blinded_peaks. For these peaks, the
obtained mass values in the peak properties table and the peak position
parameters mu in fit reports will be hidden. Literature values and
mass uncertainties remain visible. All results are unblinded upon
export with the save_results() method.
Parameters:
indeces (int or list of int) – Indices of peaks of interest whose obtained mass values are to be
blinded.
overwrite (bool, optional, default: False) – If False (default), the specified indeces are added to the
blinded_peaks list. If True, the current
blinded_peaks list is replaced by the specified indeces.
Examples
Activate blinding for peaks 0 & 3 of spectrum object spec:
>>> spec.set_blinded_peaks([0,3])
Add peak 3 to list of blinded peaks:
>>> spec.set_blinded_peaks([3])
Turn off blinding by resetting the blinded peaks attribute to an empty
list:
>>> spec.set_blinded_peaks([], overwrite=True)
m_AME_error (float [u]) – New literature mass uncertainty.
extrapolated (bool, optional, default: False) – Flag indicating whether this literature value has been extrapolated.
A (int, optional, default: None) – Atomic mass number of species - overwrites existing. If None and
the peak’s attribute A is undefined, the peak’s mass number
defaults to the closest integer of the provided m_AME.
z (int, optional) – Charge state of species - overwrites existing (if not None).
verbose (bool, optional, default: True) – Whether to print a status update after completion.
Notes
Manually defined literature values are indicated by adding
'lit_src:user' to the peak’s comment.
alt_x_pos (float [u]) – Position of the hypothesized alternative peak.
x_fit_cen (float [u], optional) – Center of the x-range to fit. Defaults to the center of the fit result
associated with null_result_index.
x_fit_range (float [u], optional) – Width of the x-range to fit. Defaults to the range of the fit result
associated with null_result_index.
vary_alt_mu (bool, optional) – Whether to vary the alternative-peak centroid in the fit.
alt_mu_min (float [u], optional) – Lower boundary to use when varying the alternative-peak centroid.
Defaults to the range defined by the MU_VAR_NSIGMA constant in the
emgfit.fit_models module.
alt_mu_max (float [u], optional) – Upper boundary to use when varying the alternative-peak centroid.
Defaults to the range defined by the MU_VAR_NSIGMA constant in the
emgfit.fit_models module.
vary_baseline (bool, optional) – If True, the constant background will be fitted with a varying
uniform baseline parameter bkg_c. If False, the baseline parameter
bkg_c will be fixed to 0.
Perform a likelihood ratio test following the method of Gross & Vitells.
Decide on an appropriate significance level before executing this method
and set the `min_significance` argument accordingly!
Parameters:
spec (spectrum) – Spectrum object to perform test on.
null_result_index (int) – Index (of one) of the peak(s) present in the null-model fit.
alt_x_min (float) – Minimal x-position to use in the alternative-peak position scan.
alt_x_max (float) – Maximal x-position to use in the alternative-peak position scan.
alt_x_steps (int, optional, default: 100) – Number of steps to take in the alternative-peak position scan.
min_significance (float [sigma], default: 3) – Minimal significance level (in sigma) required to reject the null model
in favour of the alternative model.
N_spectra (int, optional, default: 100) – Number of simulated spectra to fit at each x-position.
c0 (float, optional, default: 0.5) – Threshold to use in determining the expected number of upcrossings.
seed (int, optional) – Random seed to use for reproducible event sampling.
show_upcrossings (bool, optional, default: True) – Whether to show plots of the range of upcrossings.
show_fits (bool, optional, default: True) – Whether to show plots of the null- and alternative-model fits to the
observed data.
Returns:
Dictionary with results of the likelihood ratio test.
When the exact location of a hypothesized alternative peak is unknown, one
may test for its presence by performing multiple hypothesis tests with
different fixed alternative-peak positions. However, performing multiple
tests on the same dataset artificially increases the rate of false discovery
due to the increased chance for random background fluctuations to mimic a
signal. In the high-energy particle physics literature, this complication
is referred to as the look-elsewhere effect. To obtain a global p-value
that correctly quantifies the likelihood to observe the alternative peak
anywhere in the tested region, a procedure is needed that accounts for
correlations between the local p-values obtained for the various tested
peak positions. To this end, this function adapts the method outlined by
Gross and Vitells in [13]. Namely, an upper limit on the global
p-value \(p\) is deduced from the relation:
\[P(LLR > c) \leq P(\chi^2_1 > c) + \langle N(c_0)\rangle e^{-(c - c_0)/2},\]
where \(P(LLR > c)\) is the probability for the log-likelihood ratio
statistic (LLR) to exceed the maximum of the observed local LLR statistic
\(c\), \(P(\chi^2_1 > c)\) is the probability that the \(\chi^2\)
statistic with one degree of freedom exceeds the level \(c\) and
\(\langle N(c_0)\rangle\) is the expected number of times the local
LLR test statistics surpass the threshold level \(c_0 \ll c\) under the
null hypothesis. This number is estimated by simulating \(N_{spectra}\)
spectra from the null model and taken as the mean number of times the local
LLR statistics cross up through the specified threshold level \(c_0\).
In principle, \(c_0\) should be chosen as small as possible but care
should be taken that the mean spacing between detected upcrossings does
not fall below the typical width of the observed peaks.
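The upcrossing-based estimate can be illustrated with a small sketch. It is not emgfit's implementation: the function names and the example scans are hypothetical, and the form of the bound, \(P(\chi^2_1 > c) + \langle N(c_0)\rangle e^{-(c - c_0)/2}\), is an assumption based on the Gross and Vitells approach for one degree of freedom. The survival function of the \(\chi^2_1\) distribution is evaluated via the complementary error function.

```python
import math


def count_upcrossings(llr_values, c0):
    """Count how often a scanned LLR curve crosses up through the level c0."""
    return sum(
        1 for prev, cur in zip(llr_values, llr_values[1:])
        if prev <= c0 < cur
    )


def chi2_1_sf(c):
    """P(chi^2_1 > c) for a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(c / 2.0))


def global_p_value_bound(c, c0, mean_upcrossings):
    """Assumed bound P(chi^2_1 > c) + <N(c0)> * exp(-(c - c0)/2), capped at 1."""
    bound = chi2_1_sf(c) + mean_upcrossings * math.exp(-(c - c0) / 2.0)
    return min(bound, 1.0)


# Hypothetical local LLR scans of two spectra simulated from the null model.
scans = [
    [0.1, 0.7, 0.3, 0.2, 0.9, 0.4],
    [0.0, 0.2, 0.6, 0.8, 0.1, 0.3],
]
c0 = 0.5
mean_N = sum(count_upcrossings(s, c0) for s in scans) / len(scans)

# c is the maximum of the observed local LLR statistic (hypothetical value).
p_bound = global_p_value_bound(c=12.0, c0=c0, mean_upcrossings=mean_N)
```

The second term is what accounts for the look-elsewhere effect: the more often the null-model scans fluctuate above \(c_0\), the larger the global p-value becomes relative to the local one.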
Perform Monte Carlo likelihood ratio test by fitting simulated spectra.
The simulated spectra are sampled from the null model stored in the
spectrum’s fit results list and associated with the peak index
null_result_index.
Parameters:
spec (spectrum) – Spectrum object to perform likelihood ratio test on.
null_result_index (int) – Index (of one) of the peak(s) present in the null-model fit.
alt_x_pos (float, optional) – Initial position to use for alternative peak.
alt_mu_min (float [u], optional) – Lower boundary to use when varying the alternative-peak centroid.
Defaults to the range defined by the MU_VAR_NSIGMA constant in the
emgfit.fit_models module.
alt_mu_max (float [u], optional) – Upper boundary to use when varying the alternative-peak centroid.
Defaults to the range defined by the MU_VAR_NSIGMA constant in the
emgfit.fit_models module.
vary_ref_mus_and_amps (bool, optional) – Whether to randomly vary the peak positions and the peak and background
amplitudes of the reference spectrum within their parameter
uncertainties.
vary_ref_peak_shape (bool, optional) – Whether to vary the reference peak shape used for the event sampling in
the creation of simulated spectra. If True, N_spectra parameter
samples are drawn randomly with replacement from the
MCMC_par_samples obtained in the MCMC
shape parameter sampling.
min_significance (float, optional, default: 3) – Critical significance level for rejecting the null hypothesis (measured
in sigma).
N_spectra (int, optional, default: 10000) – Number of simulated spectra to sample from the null model.
seed (int, optional) – Random seed to use for reproducible sampling.
n_cores (int, optional, default: -1) – Number of CPU cores to use for parallelized sampling and fitting of
simulated spectra. If -1, all available cores are used.
show_plots (bool, optional) – Whether to show plots of the fit results.
show_results (bool, optional) – Whether to display reports with the fit results.
show_LLR_hist (bool, optional) – Whether to display histogram of log-likelihood ratio values collected
for p-value determination.
Returns:
Dictionary with results of the likelihood ratio test.
Simulated spectra are created by randomly sampling events from the null
model best fitting the observed data. These simulated spectra are then
fitted with both the null and the alternative model and the respective
values for the likelihood ratio test statistic \(\Lambda\) are
calculated using the relation
\[\Lambda = L(H_0) - L(H_1) = -2\ln\left(\frac{\mathcal{L}(H_0)}{\mathcal{L}(H_1)}\right),\]
where \(L(H_0)\) and \(L(H_1)\) denote the MLE cost function values
(i.e. the negative doubled log-likelihood values) obtained from the
null-model and alternative-model fits, respectively, and
\(\mathcal{L}(H_0)\) and \(\mathcal{L}(H_1)\) mark the
corresponding likelihood functions. Finally, the p-value is calculated as
\[p = \frac{N_>}{N_< + N_>},\]
where \(N_<\) and \(N_>\) denote the number of likelihood ratio
values \(\Lambda\) that fall below and above the observed value for the
likelihood ratio test statistic \(\Lambda_\mathrm{obs}\), respectively.
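The p-value calculation can be sketched as follows. This is an illustration under stated assumptions and not emgfit's implementation: the function names and the numbers are hypothetical, and the test statistic is assumed to be the difference of the null-model and alternative-model MLE cost function values.

```python
def llr_statistic(cost_null, cost_alt):
    """Test statistic Lambda, assumed here to be L(H0) - L(H1)."""
    return cost_null - cost_alt


def monte_carlo_p_value(simulated_llr, observed_llr):
    """p = N_> / (N_< + N_>) from LLR values of simulated null-model spectra."""
    n_above = sum(1 for v in simulated_llr if v > observed_llr)
    n_below = sum(1 for v in simulated_llr if v < observed_llr)
    if n_above + n_below == 0:
        raise ValueError("No simulated values differ from the observed value.")
    return n_above / (n_below + n_above)


# Hypothetical cost function values from fits to the observed data.
lambda_obs = llr_statistic(cost_null=108.5, cost_alt=105.5)

# Hypothetical Lambda values from fits to eight simulated spectra.
simulated = [0.2, 1.5, 3.1, 0.0, 7.4, 0.9, 2.2, 0.4]
p_value = monte_carlo_p_value(simulated, lambda_obs)
```

Since the smallest non-zero p-value that can be resolved is roughly 1/N_spectra, testing at a high significance level requires a correspondingly large number of simulated spectra.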