emgfit API docs¶

emgfit.ame_funcs module¶

Module for importing and handling of literature mass values from the Atomic Mass Evaluation (AME2020 or AME2016).

emgfit.ame_funcs.get_AME_values(species, Ex=0.0, Ex_error=0.0, src='AME2020')[source]¶

Calculates the AME mass, AME mass error, the extrapolation flag and the mass number A of the given atomic or molecular species.

Parameters:

species (str) – String with name of species to grab AME values for.
Ex (float [keV], optional, default: 0.0) – Isomer excitation energy in keV to add to ground-state literature mass.
Ex_error (float [keV], optional, default: 0.0) – Uncertainty of isomer excitation energy in keV to add in quadrature to ground-state literature mass uncertainty.
src (str, optional, default: AME2020) – Source of literature data (‘AME2016’ or ‘AME2020’).

Notes

species strings follow the :-notation of chemical substances.

Atoms with the nucleus in an isomeric state are flagged by appending an ‘m’ or any of ‘m0’, ‘m1’, …, ‘m9’ to the atom’s substring. For species with isomers the literature mass values are only returned if the excitation energy is user-specified with the Ex argument. In this case, Ex is added to the AME value for the ground state mass and Ex_error is added in quadrature to the respective AME uncertainty.

Both the atomic binding energy of stripped-off electrons as well as the uncertainty of the electron mass are neglected in the calculation of the ionic AME mass.

Returns:: List containing relevant AME data for species: (AME mass [u], AME mass uncertainty [u], boolean flag for extrapolated species, atomic mass number)
Return type:: tuple of (float,float,bool,int)

emgfit.ame_funcs.get_El_from_Z(Z)[source]¶

Convenience function to grab element symbol from given proton number.

Parameters:: Z (int or array-like of int) – Proton number.
Returns:: Symbol of element with specified proton number.
Return type:: str

emgfit.ame_funcs.get_charge_state(species)[source]¶

Return charge state of given species

Parameters:: species (str) – String with name of species.

Notes

species strings follow the :-notation of chemical substances.

The ionic charge state is defined by subtracting the desired number of electrons from the atomic species (i.e. ':-1e' for singly charged cations, ':-2e' for doubly charged cations etc.).

emgfit.ame_funcs.mdata_AME(El, A, src='AME2020')[source]¶

Grabs atomic mass data from the atomic mass evaluation (AME).

Parameters:

El (str) – String with element symbol.
A (int) – Mass number of isotope of interest.
src (str, optional, default: AME2020) – Source of literature mass data (either ‘AME2016’ or ‘AME2020’).

Returns:

[Element, Z, A, atomic AME mass [u], atomic AME mass error [u], boolean flag for extrapolated AME mass]

Return type:

list (str,int,float,float,bool)

emgfit.ame_funcs.splitparticle(s)[source]¶

Extracts number, particle/element type and mass number of particle string (e.g. 1Cs133, Cs133, 1e).

Parameters:: s (str) – Input particle string.
Returns:: Tuple of signature (number of particles, element symbol, atomic mass number)
Return type:: tuple of (int, str, int)

Examples

>>> splitspecies('1Cs133') # returns (1,'Cs',133)
>>> splitspecies('-e')     # returns (-1,'e',0)

emgfit.ame_funcs.splitspecies(s)[source]¶

Splits ion species string into list of constituent atom strings.

Parameters:: s (str) – Input species string.
Returns:: List of constituent atom strings contained in s.
Return type:: list of str

Examples

>>> splitspecies('4H1:1C12')  # returns ``['4H1','1C12']``
>>> splitspecies('H1:O16')    # returns ``['H1','O16']``

emgfit.config module¶

Configuration file with general settings for emgfit package.

emgfit.emg_funcs module¶

Module with numerically robust implementation of the hyper-exponentially- modified Gaussian probability density function.

emgfit.emg_funcs._check_par_values(sigma, theta, li_eta_m, li_tau_m, li_eta_p, li_tau_p)[source]¶: Check if parameters are in bounds, throw error otherwise

emgfit.emg_funcs._h_m_i(x, mu, sigma, eta_m, tau_m)[source]¶: Internal helper function to calculate single negative EMG tail for h_m_emg.

emgfit.emg_funcs._h_m_i_prec(x, mu, sigma, eta_m, tau_m)[source]¶: Arbitrary precision version of internal helper function for h_m_emg.

emgfit.emg_funcs._h_p_i(x, mu, sigma, eta_p, tau_p)[source]¶: Internal helper function to calculate single positive EMG tail for h_p_emg.

emgfit.emg_funcs._h_p_i_prec(x, mu, sigma, eta_p, tau_p)[source]¶: Arbitrary precision version of internal helper function for h_p_emg.

emgfit.emg_funcs.h_emg(x, mu, sigma, theta, li_eta_m, li_tau_m, li_eta_p, li_tau_p)[source]¶

Hyper-exponentially-modified Gaussian distribution (hyper-EMG).

The lengths of li_eta_m & li_tau_m must match and define the order of negative tails. Likewise, the lengths of li_eta_p & li_tau_p must match and define the order of positive tails.

Parameters:

x (numpy.ndarray of floats) – Abscissa data (mass data).
mu (float) – Mean value of underlying Gaussian distribution.
sigma (float >= 0) – Standard deviation of underlying Gaussian distribution.
theta (float, 0 <= theta <= 1) – Left-right-weight factor (negative-skewed EMG weight: theta; positive-skewed EMG weight: 1 - theta).
li_eta_m (tuple) – Tuple containing the neg. tail weights with the signature: (eta_m1, eta_m2, ...).
li_tau_m (tuple of floats > 0) – Tuple containing the neg. tail decay constants with the signature: (tau_m1, tau_m2, ...).
li_eta_p (tuple) – Tuple containing the pos. tail weights with the signature: (eta_p1, eta_p2, ...).
li_tau_p (tuple of floats > 0) – Tuple containing the pos. tail decay constants with the signature: (tau_p1, tau_p2, ...).

Returns:

Ordinate values of hyper-EMG distribution

Return type:

float

See also

h_m_emg(), h_p_emg()

Notes

The Hyper-EMG probability distribution function was first introduced in this publication by Purushothaman et al. [3]. The basic definitions and notations used here are adapted from this work.

The total hyper-EMG distribution h_m_emg is comprised of the negative- and positive-skewed EMG distributions h_m_emg and h_p_emg respectively and is calculated as: h_emg(x, mu, sigma, theta, li_eta_m, li_tau_m, li_eta_p, li_tau_p) = theta*h_m_emg(x, mu, sigma, li_eta_m, li_tau_m) + (1-theta)*h_p_emg(x, mu, sigma, li_eta_p, li_tau_p).

For algorithmic details, see Notes of h_m_emg() and h_p_emg().

References

emgfit.emg_funcs.h_m_emg(x, mu, sigma, li_eta_m, li_tau_m)[source]¶

Negative-skewed exponentially-modified Gaussian (EMG) distribution.

The lengths of li_eta_m & li_tau_m must match and define the order of negative tails.

Parameters:

x (numpy.ndarray of floats) – Abscissa data (mass data).
mu (float) – Mean value of underlying Gaussian distribution.
sigma (float >= 0) – Standard deviation of underlying Gaussian distribution.
li_eta_m (tuple) – Tuple containing the neg. tail weights with the signature: (eta_m1, eta_m2, ...).
li_tau_m (tuple of floats > 0) – Tuple containing the neg. tail decay constants with the signature: (tau_m1, tau_m2, ...).

Returns:

Ordinate values of the negative-skewed EMG distribution.

Return type:

float

Notes

The Hyper-EMG probability distribution function was first introduced in this publication by Purushothaman et al. [4]. The basic definitions and notations used here are adapted from this work.

Each negative tail of a Hyper-EMG function can be expressed in two equivalent ways:

\[h_\mathrm{emg,-i} = \frac{\eta_{-i}}{2\tau_{-i}} \exp{(-\left(\frac{x-\mu}{\sqrt{2}\sigma}\right)^2)} \mathrm{erfcx}(v) = \frac{\eta_{-i}}{2\tau_{-i}} \exp{(u)} \mathrm{erfc}(v),\]

where \(u = \left(\frac{\sigma}{\sqrt{2}\tau_{-i}}\right)^2 + \frac{x-\mu}{\tau_{-i}}\) and \(v = \frac{\sigma}{\sqrt{2}\tau_{-i}} + \frac{x-\mu}{\sqrt{2}\sigma}\). In double float precision, the exp(u) routine overflows if u > 709.78. The complementary error function erfc(v) underflows to 0.0 if v > 26.54. The scaled complementary error function erfcx(v) overflows if v < -26.62. To circumvent those scenarios and always ensure an exact result, the underlying helper function for the calculation of a negative EMG tail h_m_i() uses the formulation in terms of erfcx whenever v >= 0 and switches to the erfc-formulation when v < 0.

References

emgfit.emg_funcs.h_p_emg(x, mu, sigma, li_eta_p, li_tau_p)[source]¶

Positive-skewed exponentially-modified Gaussian (EMG) distribution.

The lengths of li_eta_p & li_tau_p must match and define the order of positive tails.

Parameters:

x (numpy.ndarray of floats) – Abscissa data (mass data).
mu (float) – Mean value of underlying Gaussian distribution.
sigma (float >= 0) – Standard deviation of underlying Gaussian distribution.
li_eta_p (tuple) – Tuple containing the pos. tail weights with the signature: (eta_p1, eta_p2, ...).
li_tau_p (tuple of floats > 0) – Tuple containing the pos. tail decay constants with the signature: (tau_p1, tau_p2, ...).

Returns:

Ordinate values of the positive-skewed EMG distribution.

Return type:

float

Notes

The Hyper-EMG probability distribution function was first introduced in this publication by Purushothaman et al. [5]. The basic definitions and notations used here are adapted from this work.

Each positive tail of a Hyper-EMG function can be expressed in two equivalent ways:

\[h_\mathrm{emg,+i} = \frac{\eta_{+i}}{2\tau_{+i}} \exp{(-\left(\frac{x-\mu}{\sqrt{2}\sigma}\right)^2)} \mathrm{erfcx}(v) = \frac{\eta_{+i}}{2\tau_{+i}} \exp{(u)} \mathrm{erfc}(v),\]

where \(u = \left(\frac{\sigma}{\sqrt{2}\tau_{+i}}\right)^2 - \frac{x-\mu}{\tau_{+i}}\) and \(v = \frac{\sigma}{\sqrt{2}\tau_{+i}} - \frac{x-\mu}{\sqrt{2}\sigma}\). In double precision, the exp(u) routine overflows if u > 709.78. The complementary error function erfc(v) underflows to 0.0 if v > 26.54. The scaled complementary error function erfcx(v) overflows if v < -26.62. To circumvent those scenarios and always ensure an exact result, the underlying helper function for the calculation of a negative EMG tail h_m_i() uses the formulation in terms of erfcx whenever v >= 0 and switches to the erfc-formulation when v < 0.

References

emgfit.emg_funcs.mu_emg(mu, theta, li_eta_m, li_tau_m, li_eta_p, li_tau_p)[source]¶

Calculate mean of hyper-EMG distribution.

The lengths of li_eta_m & li_tau_m must match and define the order of negative tails. Likewise, the lengths of li_eta_p & li_tau_p must match and define the order of positive tails.

Parameters:

mu (float) – Mean value of underlying Gaussian distribution.
theta (float, 0 <= theta <= 1) – Left-right-weight factor (negative-skewed EMG weight: theta; positive-skewed EMG weight: 1 - theta).
li_eta_m (tuple) – Tuple containing the neg. tail weights with the signature: (eta_m1, eta_m2, ...).
li_tau_m (tuple of floats > 0) – Tuple containing the neg. tail decay constants with the signature: (tau_m1, tau_m2, ...).
li_eta_p (tuple) – Tuple containing the pos. tail weights with the signature: (eta_p1, eta_p2, ...).
li_tau_p (tuple of floats > 0) – Tuple containing the pos. tail decay constants with the signature: (tau_p1, tau_p2, ...).

Returns:

Mean of hyper-EMG distribution.

Return type:

float

Notes

The Hyper-EMG probability distribution function was first introduced in this publication by Purushothaman et al. [6]. The basic definitions and notations used here are adapted from this work.

References

emgfit.emg_funcs.sigma_emg(sigma, theta, li_eta_m, li_tau_m, li_eta_p, li_tau_p)[source]¶

Calculate standard deviation of hyper-EMG distribution.

The lengths of li_eta_m & li_tau_m must match and define the order of negative tails. Likewise, the lengths of li_eta_p & li_tau_p must match and define the order of positive tails.

Parameters:

sigma (float >= 0) – Standard deviation of underlying Gaussian distribution.
theta (float, 0 <= theta <= 1) – Left-right-weight factor (negative-skewed EMG weight: theta; positive-skewed EMG weight: 1 - theta).
li_eta_m (tuple) – Tuple containing the neg. tail weights with the signature: (eta_m1, eta_m2, ...).
li_tau_m (tuple of floats > 0) – Tuple containing the neg. tail decay constants with the signature: (tau_m1, tau_m2, ...).
li_eta_p (tuple) – Tuple containing the pos. tail weights with the signature: (eta_p1, eta_p2, ...).
li_tau_p (tuple of floats > 0) – Tuple containing the pos. tail decay constants with the signature: (tau_p1, tau_p2, ...).

Returns:

Standard deviation of hyper-EMG distribution.

Return type:

float

Notes

The Hyper-EMG probability distribution function was first introduced in this publication by Purushothaman et al. [7]. The basic definitions and notations used here are adapted from this work.

References

emgfit.fit_models module¶

Module with fit models for single peaks with Gaussian or various hyper-exponentially-modified Gaussian line shapes.

class emgfit.fit_models.ConstantModel(cost_func='default', independent_vars=['x'], prefix='bkg', nan_policy='propagate', **kwargs)[source]¶

Bases: EMGModel

Constant background model

guess(data, x=None, **kwargs)[source]¶: Estimate initial model parameter values from data.

emgfit.fit_models.Gaussian(peak_index, mu0, amp0, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.0001415536333813365, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0005, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, **kws)[source]¶

Gaussian emgfit model (single-peak Gaussian fit model)

Parameters:

peak_index (int) – Index of peak to fit.
mu0 (float) – Initial guess for centroid of underlying Gaussian.
amp0 (float) – Initial guess for peak amplitude.
init_pars (dict) – Initial shape parameters (‘amp’ and ‘mu’ parameters in init_pars dictionary are updated with the given values for amp0 and mu0).
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
kws (Keyword arguments to pass to EMGModel interface.) –

Returns:

emgfit model object

Return type:

emgfit.model.EMGModel

emgfit.fit_models._calc_mu_emg(fit_model, pars, pref='')[source]¶

Calculate the mean of a Hyper-EMG peak from the best-fit parameters.

Parameters:

fit_model – Name of fit model used to obtain pars.
pars (emgfit.parameter.Parameters, optional) – Parameters object to obtain shape parameters for calculation from. This argument can be used when no fit result is available.
pref (str, optional) – Prefix of peak parameters of interest.

Returns:

Mean of Hyper-EMG fit of peak of interest.

Return type:

float [u/z]

emgfit.fit_models._enforce_shared_shape_pars(model, peak_index, index_ref_peak, scale_shape_pars, scl_coeff, mu_ref)[source]¶

Enfore shared peak-shapes and optionally scale the shape parameters

Parameters:

model (emgfit.model.EMGModel) – Fit model to impose parameter constraints on.
peak_index (int) – Index of the peak corresponding to model.
index_ref_peak (int) – Index of the peak to be used as shape-reference peak.
scale_shape_pars (bool) – Whether to scale the scale-dependent shape parameters adopted from the shape-reference peak.
scl_coeff (float, optional) – Constant coefficient used to scale the scale-dependent shape parameters.
mu_ref (float or str, optional) – Centroid of the underlying Gaussian of the shape-reference peak. This argument is only relevant when scale_shape_pars=True. If mu_ref is set to a float, this number is used as the fixed (Gaussian) reference centroid for calculating the scale factor. If mu_ref=”varying”, mu_ref is set to the varying (Gaussian) centroid shape-reference peak.

Notes

If scale_shape_pars is True, the model’s scale-dependent shape parameters are multiplied with the scale factor scl_fac = scl_coeff. If further mu_ref is not None, the scale-dependent shape parameters are multiplied with the scale factor scl_fac = mu/mu_ref * scl_coeff, where mu is the centroid of the Gaussian underlying the specified model. When the reference peak is among the peaks to be fitted, scl_fac will then be dynamically re-calculated from the mu and mu_ref values of a given iteration in a fit.

emgfit.fit_models.create_default_init_pars(mass_number=100, resolving_power=300000.0)[source]¶

Scale default parameters to mass of interest and return parameter dictionary.

Parameters:

mass_number (int, optional) – Atomic mass number of peaks of interest, defaults to 100.
resolving_power (float, optional) – Typical resolving power of the spectrometer at FWHM level. Defaults to 3e05.

Returns:

Dictionary with default initial parameters (scaled to mass_number).

Return type:

dict

Notes

The default parameters were defined for mass 100, to obtain suitable parameters at other masses all mass-dependent parameters (i.e. shape parameters & amp) are multiplied by the scaling factor mass_number/100.

The standard deviation of the underlying Gaussian \(\sigma\) is calculated as :math::sigma = A / (R 2 sqrt(2 ln(2)), where \(A\) denotes the specified mass_number and \(R\) is the given resolving_power.

emgfit.fit_models.emg01(peak_index, mu0, amp0, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.0001415536333813365, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0005, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, **kws)[source]¶

Hyper-EMG(0,1) emgfit model (single-peak fit model with one exponential tail on the right)

Parameters:

peak_index (int) – Index of peak to fit.
mu0 (float) – Initial guess for centroid of underlying Gaussian.
amp0 (float) – Initial guess for peak amplitude.
init_pars (dict) – Initial shape parameters (‘amp’ and ‘mu’ parameters in init_pars dictionary are updated with the given values for amp0 and mu0).
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
kws (Keyword arguments to pass to EMGModel interface.) –

Returns:

emgfit model object

Return type:

emgfit.model.EMGModel

emgfit.fit_models.emg10(peak_index, mu0, amp0, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.0001415536333813365, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0005, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, **kws)[source]¶

Hyper-EMG(1,0) emgfit model (single-peak fit model with one exponential tail on the left)

Parameters:

peak_index (int) – Index of peak to fit.
mu0 (float) – Initial guess for centroid of underlying Gaussian.
amp0 (float) – Initial guess for peak amplitude.
init_pars (dict) – Initial shape parameters (‘amp’ and ‘mu’ parameters in init_pars dictionary are updated with the given values for amp0 and mu0).
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
kws (Keyword arguments to pass to EMGModel interface.) –

Returns:

emgfit model object

Return type:

emgfit.model.EMGModel

emgfit.fit_models.emg11(peak_index, mu0, amp0, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.0001415536333813365, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0005, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, **kws)[source]¶

Hyper-EMG(1,1) emgfit model (single-peak fit model with one exponential tail on the left and one exponential tail on the right)

Parameters:

peak_index (int) – Index of peak to fit.
mu0 (float) – Initial guess for centroid of underlying Gaussian.
amp0 (float) – Initial guess for peak amplitude.
init_pars (dict) – Initial shape parameters (‘amp’ and ‘mu0’ parameters in init_pars dictionary are updated with the given values for amp0 and mu0).
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
kws (Keyword arguments to pass to EMGModel interface.) –

Returns:

emgfit model object

Return type:

emgfit.model.EMGModel

emgfit.fit_models.emg12(peak_index, mu0, amp0, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.0001415536333813365, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0005, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, **kws)[source]¶

Hyper-EMG(1,2) emgfit model (single-peak fit model with one exponential tail on the left and two exponential tails on the right)

Parameters:

peak_index (int) – Index of peak to fit.
mu0 (float) – Initial guess for centroid of underlying Gaussian.
amp0 (float) – Initial guess for peak amplitude.
init_pars (dict) – Initial shape parameters (‘amp’ and ‘mu0’ parameters in init_pars dictionary are updated with the given values for amp0 and mu0).
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
kws (Keyword arguments to pass to EMGModel interface.) –

Returns:

emgfit model object

Return type:

emgfit.model.EMGModel

emgfit.fit_models.emg21(peak_index, mu0, amp0, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.0001415536333813365, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0005, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, **kws)[source]¶

Hyper-EMG(2,1) emgfit model (single-peak fit model with two exponential tails on the left and one exponential tail on the right)

Parameters:

peak_index (int) – Index of peak to fit.
mu0 (float) – Initial guess for centroid of underlying Gaussian.
amp0 (float) – Initial guess for peak amplitude.
init_pars (dict) – Initial shape parameters (‘amp’ and ‘mu’ parameters in init_pars dictionary are updated with the given values for amp0 and mu0).
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
kws (Keyword arguments to pass to EMGModel interface.) –

Returns:

emgfit model object

Return type:

emgfit.model.EMGModel

emgfit.fit_models.emg22(peak_index, mu0, amp0, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.0001415536333813365, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0005, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, **kws)[source]¶

Hyper-EMG(2,2) emgfit model (single-peak fit model with two exponential tails on the left and two exponential tails on the right)

Parameters:

peak_index (int) – Index of peak to fit.
mu0 (float) – Initial guess for centroid of underlying Gaussian.
amp0 (float) – Initial guess for peak amplitude.
init_pars (dict) – Initial shape parameters of the reference peak (‘amp’ and ‘mu’ parameters in init_pars dictionary are updated with the given values for amp0 and mu0).
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
kws (Keyword arguments to pass to EMGModel interface.) –

Returns:

emgfit model object

Return type:

emgfit.model.EMGModel

emgfit.fit_models.emg23(peak_index, mu0, amp0, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.0001415536333813365, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0005, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, **kws)[source]¶

Hyper-EMG(2,3) emgfit model (single-peak fit model with two exponential tails on the left and three exponential tails on the right)

Parameters:

peak_index (int) – Index of peak to fit.
mu0 (float) – Initial guess for centroid of underlying Gaussian.
amp0 (float) – Initial guess for peak amplitude.
init_pars (dict) – Initial shape parameters (‘amp’ and ‘mu’ parameters in init_pars dictionary are updated with the given values for amp0 and mu0).
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
kws (Keyword arguments to pass to EMGModel interface.) –

Returns:

emgfit model object

Return type:

emgfit.model.EMGModel

emgfit.fit_models.emg32(peak_index, mu0, amp0, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.0001415536333813365, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0005, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, **kws)[source]¶

Hyper-EMG(3,2) emgfit model (single-peak fit model with three exponential tails on the left and two exponential tails on the right)

Parameters:

peak_index (int) – Index of peak to fit.
mu0 (float) – Initial guess for centroid of underlying Gaussian.
amp0 (float) – Initial guess for peak amplitude.
init_pars (dict) – Initial shape parameters (‘amp’ and ‘mu’ parameters in init_pars dictionary are updated with the given values for amp0 and mu0).
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
kws (Keyword arguments to pass to EMGModel interface.) –

Returns:

emgfit model object

Return type:

emgfit.model.EMGModel

emgfit.fit_models.emg33(peak_index, mu0, amp0, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.0001415536333813365, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0005, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, **kws)[source]¶

Hyper-EMG(3,3) emgfit model (single-peak fit model with three exponential tails on the left and three exponential tails on the right)

Parameters:

peak_index (int) – Index of peak to fit.
mu0 (float) – Initial guess for centroid of underlying Gaussian.
amp0 (float) – Initial guess for peak amplitude.
init_pars (dict) – Initial shape parameters (‘amp’ and ‘mu’ parameters in init_pars dictionary are updated with the given values for amp0 and mu0).
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
kws (Keyword arguments to pass to EMGModel interface.) –

Returns:

emgfit model object

Return type:

emgfit.model.EMGModel

emgfit.fit_models.erfcxinv(y)[source]¶

Approximate inverse of the scaled complementary error function

This approximation is adapted from code by John D’Errico posted at https://www.mathworks.com/matlabcentral/answers/302915-inverse-of-erfcx-scaled-complementary-error-function

emgfit.fit_models.get_mu0(x_m, init_pars, fit_model)[source]¶

Estimate initial value of Gaussian centroid mu from the peak’s mode

Parameters:

x_m (float) – Mode (i.e. x-position of the maximum) of the distribution.
init_pars (dict) – Dictionary with initial values of the shape parameters.
fit_model (str) – Name of used fit model (e.g. ‘emg01’, ‘emg10’, ‘emg12’…).

Notes

For a Gaussian, the mean mu is simply equal to the x_m argument.

For highly asymmetric hyper-EMG distributions the centroid of the underlying Gaussian mu can strongly deviate from the mode \(x_{m}\) (i.e. the x-position of the peak maximum). Hence, the initial Gaussian centroid mu (\(\mu\)) is calculated by rearranging the equation for the mode of the hyper-EMG distribution:

\[\begin{split}\mu = x_{m} - \theta\sum_{i=1}^{N_-}\eta_{-i}\left(\sqrt{2}\sigma \cdot\mathrm{erfcxinv}\left( \frac{\tau_{-i}}{\sigma} \sqrt{\frac{2}{\pi}}\right) - \frac{\sigma^2}{\tau_{-i}} \right) \\ + (1-\theta)\sum_{i=1}^{N_-}\eta_{+i}\left(\sqrt{2}\sigma \cdot\mathrm{erfcxinv}\left( \frac{\tau_{+i}}{\sigma} \sqrt{\frac{2}{\pi}}\right) - \frac{\sigma^2}{\tau_{-i}} \right),\end{split}\]

where the mode \(x_{m}\) can be estimated by the peak marker position x_pos and emgfit.fit_models.erfcxinv() is the inverse of the scaled complementary error function.

emgfit.fit_models.scl_init_pars(init_pars, mu0=None, mu_ref=None, scl_coeff=1.0, decimals=9)[source]¶

Scale initial shape parameters of reference peak to peak at mu0.

If scl_coeff is None the original init_pars dictionary is returned.

emgfit.model module¶

Module with custom model interface based on lmfit.model module.

class emgfit.model.CompositeEMGModel(left, right, op, **kws)[source]¶

Bases: EMGModel

Combine two EMGModels (left and right) with binary operator (op).

Normally, one does not have to explicitly create a CompositeModel, but can use normal Python operators +, -, *, and / to combine components as in:

>>> mod = EMGModel(fcn1) + EMGModel(fcn2) * EMGModel(fcn3)

Class modified from lmfit.model.CompositeModel class.

__init__(left, right, op, **kws)[source]¶

Create composite EMG model

Method modified from lmfit.model module.

Parameters:

left (Model) – Left-hand model.
right (Model) – Right-hand model.
op (callable binary operator) – Operator to combine left and right models.
**kws (optional) – Additional keywords are passed to Model when creating this new model.

Notes

The two models can use different independent variables.

_get_state()[source]¶: Method adopted from lmfit.model module.

_make_all_args(params=None, **kwargs)[source]¶

Generate all function arguments for all functions.

Method adopted from lmfit.model module.

_parse_params()[source]¶: Method adopted from lmfit.model module.

_reprstring(long=False)[source]¶: Method adopted from lmfit.model module.

_set_state(state, funcdefs=None)[source]¶: Method adopted from lmfit.model module.

property components¶

Return components for composite model.

Method adopted from lmfit.model module.

eval(params=None, **kwargs)[source]¶

Evaluate model function for composite model.

Method adopted from lmfit.model module.

eval_components(**kwargs)[source]¶

Return dictionary of name, results for each component.

Method adopted from lmfit.model module.

property param_names¶

Return parameter names for composite model.

Method adopted from lmfit.model module.

post_fit(fitresult)[source]¶

function that is called just after fit, can be overloaded by subclasses to add non-fitting ‘calculated parameters’

Method adopted from lmfit.model module.

class emgfit.model.EMGModel(func, cost_func='default', par_hint_args={}, vary_baseline=True, vary_shape=True, independent_vars=['x'], param_names=None, nan_policy='propagate', prefix='', name=None, **kws)[source]¶

Bases: Model

Create hyper-EMG fit model

This class inherits from the :class`lmfit.model.Model` class and creates a single-peak model. This class enables overriding of lmfit’s default residuals with emgfit’s custom cost functions, thereby enabling fits beyond standard least squares minimization.

__init__(func, cost_func='default', par_hint_args={}, vary_baseline=True, vary_shape=True, independent_vars=['x'], param_names=None, nan_policy='propagate', prefix='', name=None, **kws)[source]¶

The model function will normally take an independent variable (generally, the first argument) and a series of arguments that are meant to be parameters for the model. It will return an array of data to model some data as for a curve-fitting problem.

Method adopted from lmfit.model module.

Parameters:

func (callable) – Function to be wrapped.
cost_func (str, optional) –
Name of cost function to use for minimization - overrides the model’s _residual attribute.
- If 'chi-square', the fit is performed by minimizing Pearson’s chi-squared statistic:
  
  \[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]
- If 'MLE', a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log likelihood ratio:
  
  \[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
- If 'default' (default), use the standard residual from lmfit.
See Notes of peakfit() method for more details.
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to modify or add model parameters. The keys of the par_hint_args dictionary specify parameter names; the values must likewise be dictionaries that hold the respective keyword arguments to pass to set_param_hint().
vary_baseline (bool, optional, default: True) – If True, the constant background will be fitted with a varying uniform baseline parameter bkg_c. If False, the baseline parameter bkg_c will be fixed to 0.
vary_shape (bool, optional, default: False) – If False peak-shape parameters of hyper-EMG models (sigma, theta,`etas` and taus) are kept fixed at their initial values. If True the shared shape parameters are varied (ensuring identical shape parameters for all peaks).
independent_vars (list of str, optional) – Arguments to func that are independent variables (default is None).
param_names (list of str, optional) – Names of arguments to func that are to be made into parameters (default is None).
nan_policy ({'raise', 'propagate', 'omit'}, optional) – How to handle NaN and missing values in data. See Notes below.
prefix (str, optional) – Prefix used for the model.
name (str, optional) – Name for the model. When None (default) the name is the same as the model function (func).
**kws (dict, optional) – Additional keyword arguments to pass to model function.

Notes

1. Parameter names are inferred from the function arguments, and a residual function is automatically constructed.

2. The model function must return an array that will be the same size as the data being modeled.

3. nan_policy sets what to do when a NaN or missing value is seen in the data. Should be one of:

‘raise’ : raise a ValueError (default)

‘propagate’ : do nothing

‘omit’ : drop missing data

Examples

The model function will normally take an independent variable (generally, the first argument) and a series of arguments that are meant to be parameters for the model. Thus, a simple peak using a Gaussian defined as:

>>> import numpy as np
>>> def gaussian(x, amp, cen, wid):
...     return amp * np.exp(-(x-cen)**2 / wid)

can be turned into a Model with:

>>> gmodel = Model(gaussian)

this will automatically discover the names of the independent variables and parameters:

>>> print(gmodel.param_names, gmodel.independent_vars)
['amp', 'cen', 'wid'], ['x']

_get_state()[source]¶

Save a Model for serialization.

Note: like the standard-ish ‘__getstate__’ method but not really useful with Pickle.

Method modified from lmfit.model module.

_override_residual(cost_func)[source]¶

Override residual method

Parameters:

cost_func (str, optional) –

Name of cost function to use for minimization - overrides the model’s _residual attribute.

If 'chi-square', the fit is performed by minimizing Pearson’s chi-squared statistic:

\[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]
If 'MLE', a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log likelihood ratio:

\[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
If 'default' (default), use the standrad residual from lmfit.

See Notes of peakfit() method for more details.

_reprstring(long=False)[source]¶

Print representation string

Method adopted from lmfit.model module.

_set_state(state, funcdefs=None)[source]¶

Restore Model from serialization.

Note: like the standard-ish ‘__setstate__’ method but not really useful with Pickle.

Method adopted from lmfit.model module.

Parameters:

state – Serialized state from _get_state.
funcdefs (dict, optional) – Dictionary of function definitions to use to construct Model.

fit(data, params=None, weights=None, fitted_peaks=None, method='least_squares', iter_cb=None, scale_covar=True, verbose=False, fit_kws=None, nan_policy=None, calc_covar=True, max_nfev=None, coerce_farray=True, **kwargs)[source]¶

Fit the model to the data using the supplied Parameters.

Method modified from lmfit.model module.

Parameters:

data (array_like) – Array of data to be fit.
params (Parameters, optional) – Parameters to use in fit (default is None).
weights (array_like, optional) – Weights to use for the calculation of the fit residual [i.e., weights*(data-fit)]. Default is None; must have the same size as data.
fitted_peaks (list of emgfit.spectrum.peak) – List of peaks to be fitted.
method (str, optional) – Name of fitting method to use (default is ‘least_squares’).
iter_cb (callable, optional) – Callback function to call at each iteration (default is None).
scale_covar (bool, optional) – Whether to automatically scale the covariance matrix when calculating uncertainties (default is True).
verbose (bool, optional) – Whether to print a message when a new parameter is added because of a hint (default is True).
fit_kws (dict, optional) – Options to pass to the minimizer being used.
nan_policy ({'raise', 'propagate', 'omit'}, optional) – What to do when encountering NaNs when fitting Model.
calc_covar (bool, optional) – Whether to calculate the covariance matrix (default is True) for solvers other than ‘leastsq’ and ‘least_squares’. Requires the numdifftools package to be installed.
max_nfev (int or None, optional) – Maximum number of function evaluations (default is None). The default value depends on the fitting method.
coerce_farray (bool, optional) – Whether to coerce data and independent data to be ndarrays with dtype of float64 (or complex128). If set to False, data and independent data are not coerced at all, but the output of the model function will be. (default is True)
**kwargs (optional) – Arguments to pass to the model function, possibly overriding parameters.

Returns:

EMGModelResult

Return type:

emgfit.model.EMGModelResult

Notes

1. if params is None, the values for all parameters are expected to be provided as keyword arguments. Mixing params and keyword arguments is deprecated (see Model.eval).

2. all non-parameter arguments for the model function, including all the independent variables will need to be passed in using keyword arguments.

3. Parameters are copied on input, so that the original Parameter objects are unchanged, and the updated values are in the returned EMGModelResult.

Examples

Take t to be the independent variable and data to be the curve we will fit. Use keyword arguments to set initial guesses:

>>> result = my_model.fit(data, tau=5, N=3, t=t)

Or, for more control, pass a Parameters object.

>>> result = my_model.fit(data, params, t=t)

class emgfit.model.EMGModelResult(model, params, data=None, weights=None, fitted_peaks=None, method='least_squares', fcn_args=None, fcn_kws=None, iter_cb=None, scale_covar=False, nan_policy='propagate', calc_covar=False, max_nfev=None, **fit_kws)[source]¶

Bases: ModelResult

Result from an EMGModel fit.

__init__(model, params, data=None, weights=None, fitted_peaks=None, method='least_squares', fcn_args=None, fcn_kws=None, iter_cb=None, scale_covar=False, nan_policy='propagate', calc_covar=False, max_nfev=None, **fit_kws)[source]¶

Parameters:

model (Model) – Model to use.
params (Parameters) – Parameters with initial values for model.
data (array_like) – Ordinate values of data to be modeled.
weights (array_like, optional) – Weights to multiply (data-model) for default fit residual.
fitted_peaks (list of emgfit.spectrum.spectrum.peak) – List of fitted peak objects.
method (str, optional) – Name of minimization method to use (default is ‘least_squares’).
fcn_args (sequence, optional) – Positional arguments to send to model function.
fcn_kws (dict, optional) – Keyword arguments to send to model function.
iter_cb (callable, optional) – Function to call on each iteration of fit.
scale_covar (bool, optional) – Whether to scale covariance matrix for uncertainty evaluation.
nan_policy ({'raise', 'propagate', 'omit'}, optional) – What to do when encountering NaNs when fitting Model.
calc_covar (bool, optional) – Whether to calculate the covariance matrix (default is True) for solvers other than ‘leastsq’ and ‘least_squares’. Requires the numdifftools package to be installed.
max_nfev (int or None, optional) – Maximum number of function evaluations (default is None). The default value depends on the fitting method.
**fit_kws (optional) – Keyword arguments to send to minimization routine.

property cost_func¶: Name of the cost function used in the fit

fit(data=None, fitted_peaks=None, params=None, weights=None, method=None, nan_policy=None, **kwargs)[source]¶

Re-perform fit for a Model, given data and params.

Method modified from lmfit.model module.

Parameters:

data (array_like, optional) – Data to be modeled.
fitted_peaks (list of emgfit.spectrum.spectrum.peak) – List of peak objects to be fitted.
params (Parameters, optional) – Parameters with initial values for model.
weights (array_like, optional) – Weights to multiply (data-model) for fit residual - only relevant when using the default lmfit cost function.
method (str, optional) – Name of minimization method to use (default is ‘least_squares’).
nan_policy ({'raise', 'propagate', 'omit'}, optional) – What to do when encountering NaNs when fitting Model.
**kwargs (optional) – Keyword arguments to send to minimization routine.

property fit_model¶: Name of hyper-EMG model used in fit

load(fp, funcdefs=None, **kws)[source]¶

Load JSON representation of ModelResult from a file-like object.

Parameters:

fp (file-like object) – An open and .read()-supporting file-like object.
funcdefs (dict, optional) – Dictionary of function definitions to use to construct Model.
**kws (optional) – Keyword arguments that are passed to loads.

Returns:

ModelResult created from fp.

Return type:

ModelResult

See also

dump, loads, json.load

loads(s, funcdefs=None, **kws)[source]¶

Load ModelResult from a JSON string.

Parameters:

s (str) – String representation of ModelResult, as from dumps.
funcdefs (dict, optional) – Dictionary of custom function names and definitions.
**kws (optional) – Keyword arguments that are passed to json.loads.

Returns:

ModelResult instance from JSON string representation.

Return type:

ModelResult

See also

load, dumps, json.dumps

property par_hint_args¶: User-specified parameter hints used in the fit

property vary_baseline¶: Whether the amplitude of the uniform background was varied in the fit

property vary_shape¶: Whether the peak shape (and scale) parameters were varied in the fit

property x¶: Ordinate values of fitted data

property x_fit_cen¶: Centre of fitted ordinate range

property x_fit_range¶: Range of fitted ordinate values

property y¶: Abscissa values of fitted data

property y_err¶: Uncertainties of abscissa values of fitted data

emgfit.model._buildmodel(state, funcdefs=None)[source]¶

Build EMGModel from saved state.

Intended for internal use only.

Function modified from lmfit.model module.

emgfit.model.coerce_arraylike(x)[source]¶

coerce lists, tuples, and pandas Series, hdf5 Groups, etc to an ndarray float64 or complex128, but leave other data structures and objects unchanged

Function adopted from lmfit.model module.

emgfit.model.load_model(fname, funcdefs=None)[source]¶

Load a saved EMGModel from a file.

Function modified from lmfit.model module.

Parameters:

fname (str) – Name of file containing saved EMGModel.
funcdefs (dict, optional) – Dictionary of custom function names and definitions.

Returns:

Model object loaded from file.

Return type:

Model

emgfit.model.load_modelresult(fname, funcdefs=None)[source]¶

Load a saved EMGModelResult from a file.

Function modified from lmfit.model module.

Parameters:

fname (str) – Name of file containing saved EMGModelResult.
funcdefs (dict, optional) – Dictionary of custom function names and definitions.

Returns:

EMGModelResult object loaded from file.

Return type:

EMGModelResult

emgfit.model.save_model(model, fname)[source]¶

Save an EMGModel to a file.

Function adopted from lmfit.model module.

Parameters:

model (EMGModel) – Model to be saved.
fname (str) – Name of file for saved Model.

emgfit.model.save_modelresult(modelresult, fname)[source]¶

Save an EMGModelResult to a file.

Function adopted from lmfit.model module.

Parameters:

modelresult (EMGModelResult) – EMGModelResult to be saved.
fname (str) – Name of file for saved EMGModelResult.

emgfit.sample module¶

Module for creating simulated time-of-flight mass spectra with Gaussian and hyper-exponentially-modified line shapes.

emgfit.sample.Gaussian_rvs(mu, sigma, N_samples=None)[source]¶

Draw random samples from a Gaussian probability density function

Parameters:

mu (float) – Nominal position of simulated peak (mean of Gaussian).
sigma (float) – Nominal standard deviation of the simulated Gaussian peak.
N_samples (int, optional, default: 1) – Number of random events to sample.

Returns:

Array with simulated events.

Return type:

numpy.ndarray of floats

emgfit.sample._h_m_i_rvs(mu, sigma, tau_m, N_i)[source]¶: Helper function for definition of h_m_emg_rvs

emgfit.sample._h_p_i_rvs(mu, sigma, tau_p, N_i)[source]¶: Helper function for definition of h_p_emg_rvs

emgfit.sample.fit_simulated_spectra(spec, fit_result, alt_result=None, N_spectra=1000, seed=None, randomize_ref_mus_and_amps=False, MC_shape_par_samples=None, n_cores=-1)[source]¶

Fit spectra simulated via sampling from a reference distribution

This function performs fits of many simulated spectra. The simulated spectra are created by sampling events from the best-fit PDF asociated with fit_result (as e.g. needed for a parametric bootstrap). The alt_model can be used to perform the fits with a different distribution than the reference PDF used for the event sampling.

Parameters:

spec (emgfit.spectrum.spectrum) – Spectrum object to perform bootstrap on.
fit_result (lmfit.model.ModelResult) – Fit result object holding the best-fit distribution to sample from.
alt_result (lmfit.model.ModelResult, optional) – Fit result object holding a prepared fit model to be used for the fitting. Defaults to the fit model stored in fit_result.
N_spectra (int, optional) – Number of simulated spectra to fit. Defaults to 1000, which typically yields statistical uncertainty estimates with a Monte Carlo uncertainty of a few percent.
randomize_ref_mus_and_amps (bool, default: False) – If True, the peak and background amplitudes and the peak centroids of the reference spectrum to sample from will be varied assuming normal distributions around the best-fit values with standard deviations given by the respective standard errors stored in fit_result.
MC_shape_par_samples (pandas.DataFrame) – Monte Carlo shape parameter samples to use in the fitting.
seed (int, optional) – Random seed to use for reproducible sampling.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated spectra. When set to -1 (default) all available cores are used.

Returns:

MinimizerResults obtained in the fits of the simulated spectra.

Return type:

numpy.ndarray of lmfit.minimizer.MinimizerResult

Note

The randomize_ref_mus_and_amps option allows one to propagate systematic uncertainties in the determination of the reference parameters into the Monte Carlo results. If varying the centroid and amplitude parameters of the reference spectrum, the standard deviations of the parameter distributions will be taken as the respective standard errors determined by lmfit (see lmfit fit report table) and might not be consistent with the area and mass uncertainty estimates shown in the peak properties table.

emgfit.sample.h_emg_rvs(mu, sigma, theta, *t_args, N_samples=None)[source]¶

Draw random samples from a hyper-EMG probability density function

Parameters:

mu (float) – Nominal position of simulated peak (mean of underlying Gaussian).
sigma (float) – Nominal standard deviation (of the underlying Gaussian) of the simulated hyper-EMG peak.
theta (float) – Mixing weight of pos. & neg. skewed EMG distributions.
t_args (list of lists of float) – List containing lists of the EMG tail parameters with the signature: [[eta_m1, eta_m2, …], [tau_m1, tau_m2, …], [eta_p1, eta_p2, …], [tau_p1, tau_p2, …]]
N_samples (int, optional, default: 1) – Number of random events to sample.

Returns:

Array with simulated events.

Return type:

numpy.ndarray of floats

emgfit.sample.h_m_emg_rvs(mu, sigma, *t_args, N_samples=None)[source]¶

Draw random samples from negative skewed hyper-EMG probability density

Parameters:

mu (float) – Nominal position of simulated peak (mean of underlying Gaussian).
sigma (float) – Nominal standard deviation (of the underlying Gaussian) of the simulated hyper-EMG peak.
theta (float) – Mixing weight of pos. & neg. skewed EMG distributions.
t_args (list of lists of float) – List containing lists of the EMG tail parameters with the signature: [[eta_m1, eta_m2, …], [tau_m1, tau_m2, …]]
N_samples (int, optional, default: 1) – Number of random events to sample.

Returns:

Array with simulated events.

Return type:

numpy.ndarray of floats

emgfit.sample.h_p_emg_rvs(mu, sigma, *t_args, N_samples=None)[source]¶

Draw random samples from pos. skewed hyper-EMG probability density

Parameters:

mu (float) – Nominal position of simulated peak (mean of underlying Gaussian).
sigma (float) – Nominal standard deviation (of the underlying Gaussian) of the simulated hyper-EMG peak.
t_args (list of lists of float) – List containing lists of the EMG tail parameters with the signature: [[eta_p1, eta_p2, …], [tau_p1, tau_p2, …]]
N_samples (int, optional, default: 1) – Number of random events to sample.

Returns:

Array with simulated events.

Return type:

numpy.ndarray of floats

emgfit.sample.resample_events(df, N_events=None, x_cen=None, x_range=0.02, out='hist')[source]¶

Create simulated spectrum via non-parametric resampling from df.

The simulated data is obtained through resampling from the specified dataset with replacement.

Parameters:

df (pandas.DataFrame) – Original histogrammed spectrum data to re-sample from.
N_events (int, optional) – Number of events to create via non-parametric re-sampling, defaults to number of events in original DataFrame df.
x_cen (float [u/z], optional) – Center of mass range to re-sample from. If None, re-sample from full mass range of input data df.
x_range (float [u/z], optional) – Width of mass range to re-sample from. Defaults to 0.02 u. x_range is only relevant if a x_cen argument is specified.
out (str, optional) –
Output format of sampled data. Options:
- 'hist' for binned mass spectrum (default). The centres of the mass bins must be specified with the bin_cens argument.
- 'array' for unbinned array of single ion and background events.

Returns:

If out=’hist’ a dataframe with a histogram of the format [bin centre, counts in bin] is returned. If out=’array’ an unbinned array with the x-values of single ion or background events is returned.

Return type:

pandas.Dataframe or numpy.ndarray

emgfit.sample.simulate_events(shape_pars, mus, amps, bkg_c, N_events, x_min, x_max, out='hist', scl_facs=None, N_bins=None, bin_cens=None)[source]¶

Create simulated detector events drawn from a user-defined probability density function (PDF)

Events can either be output as a list of single events (mass stamps) or as a histogram. In histogram output mode, uniform binning is easily realized by specifying the N_bins argument. More control over the binning can be achieved by parsing the desired bin centers to the bin_cens argument (e.g. for non-uniform binning).

Parameters:

shape_pars (dict) – Peak-shape parameters to use for sampling. The dictionary must follow the structure of the shape_cal_pars attribute of the spectrum class.
mus (float or list of float) – Nominal peak positions of peaks in simulated spectrum.
amps (float or list of float [(counts in peak)*(bin width in u)]) – Nominal amplitudes of peaks in simulated spectrum.
bkg_c (float [counts per bin], optional, default: 0.0) – Nominal amplitude of uniform background in simulated spectrum.
x_min (float) – Beginning of sampling x-range.
x_max (float) – End of sampling x-range.
scl_facs (float or list of float, optional) – Scale factors to use for scaling the scale-dependent shape parameters in shape_pars to a given peak before sampling events. If None, no shape-parameter scaling is applied.
N_events (int, optional, default: 1000) – Total number of events to simulate (signal and background events).
out (str, optional) –
Output format of sampled data. Options:
- 'hist' for binned mass spectrum (default). The centres of the mass bins must be specified with the bin_cens argument.
- 'array' for unbinned array of single ion and background events.
N_bins (int, optional) – Number of uniform bins to use in 'hist' output mode. The outer edges of the first and last bin are fixed to the start and end of the sampling range respectively (i.e. x_min and x_max). In between, bins are distributed with a fixed spacing of (x_max-x_min)/N_bins.
bin_cens (numpy.ndarray) – Centres of bins to use in 'hist' output mode. This argument allows the realization of non-uniform binning. Bin edges are centred between neighboring bins. Note: Bins outside the sampling range defined with x_min and x_max will be empty.

Returns:

If out=’hist’ a dataframe with a histogram of the format [bin centre, counts in bin] is returned. If out=’array’ an unbinned array with the x-values of single ion or background events is returned.

Return type:

pandas.Dataframe or numpy.ndarray

Notes

Random events are created via custom hyper-EMG extensions of Scipy’s scipy.stats.exponnorm.rvs() method.

Currently, all simulated peaks have identical width and shape (no re-scaling of mass-dependent shape parameters to a peak’s mass centroid).

Mind the different units for peak amplitudes `amps` (<counts in peak> * <bin width in x-axis units>) and the background level `bkg_c` (counts per bin). When spectrum data is simulated counts are distributed between the different peaks and the background with probability weights amps / <bin width in u> and bkg_c * <number of bins>, respectively. As a consequence, simply changing N_events (while keeping all other arguments constant), will cause amps and bkg_c to deviate from their nominal units.

emgfit.sample.simulate_spectrum(spec, x_cen=None, x_range=None, mus=None, amps=None, scl_facs=None, bkg_c=None, N_events=None, copy_spec=False)[source]¶

Create a simulated spectrum using the attributes of a reference spectrum

The peak shape of the sampling probability density function (PDF) follows the shape calibration of the reference spectrum (spec). By default, all other parameters of the sampling PDF are identical to the best-fit parameters of the reference spectrum. If desired, the positions, amplitudes and number of peaks in the sampling PDF as well as the background level can be changed with the mus, amps and bkg_c arguments.

Parameters:

spec (spectrum) – Reference spectrum object whose best-fit parameters will be used to sample from.
mus (float or list of float, optional) – Nominal peak centres of peaks in simulated spectrum. Defaults to the mus of the reference spectrum fit.
amps (float or list of float [(counts in peak)*(bin width in u)], optional) – Nominal amplitudes of peaks in simulated spectrum. Defaults to the amplitudes of the reference spectrum fit.
scl_facs (float or list of float, optional) – Scale factors to use for scaling the scale-dependent shape parameters in shape_pars to a given peak before sampling events. Defaults to the scale factors asociated with the fit results stored in the reference spectrum’s spectrum.fit_results attribute.
bkg_c (float [counts per bin], optional) – Nominal amplitude of uniform background in simulated spectrum. Defaults to the c_bkg obtained in the fit of the first peak in the reference spectrum.
x_cen (float, optional) – Center of simulated x-range. Defaults to x_cen of spec.
x_range (float, optional) – Covered x-range of simulated spectrum. Defaults to x_range of spectrum.
N_events (int, optional) – Number of ion events to simulate (including background events). Defaults to total number of events in spec.
copy_spec (bool, optional, default: False) – If False (default), this function returns a fresh spectrum object created from the simulated mass data. If True, this function returns an exact copy of spec with only the data attribute replaced by the new simulated mass data.

Returns:

If copy_spec = False (default) a fresh spectrum object holding the simulated mass data is returned. If copy_spec = True, a copy of the reference spectrum spec is returned with only the data attribute replaced by the new simulated mass data.

Return type:

spectrum

Notes

Random events are created via custom Hyper-EMG extensions of Scipy’s scipy.stats.exponnorm.rvs() method.

The returned spectrum follows the binning of the reference spectrum.

Mind the different units for peak amplitudes amps (<counts in peak> * <bin width in x-axis units>) and the background level bkg_c (counts per bin). When spectrum data is simulated counts are distributed between the different peaks and the background with probability weights amps / <bin width in x-axis units> and bkg_c * <number of bins>, respectively. As a consequence, simply changing N_events (while keeping all other arguments constant), will cause amps and bkg_c to deviate from their nominal units.

emgfit.spectrum module¶

Module for fitting time-of-flight mass spectra.

emgfit.spectrum._strip_pref(s)[source]¶: Strip peak prefix from parameter string

emgfit.spectrum._strip_prefs(d)[source]¶: Strip peak prefixes from parameter names

class emgfit.spectrum.peak(x_pos, species, m_AME=None, m_AME_error=None, Ex=0.0, Ex_error=0.0, lit_src='AME2020')[source]¶

Bases: object

Object storing all relevant information about a mass peak.

Most peak attributes are intialized as None and are later automatically updated by methods of the spectrum class, e.g. spectrum.determine_peak_shape() or spectrum.fit_peaks().

Variables:

x_pos (float [u/z]) – Coarse position of peak centroid. In fits the Hyper-EMG parameter for the underlying Gaussian peak centroid mu will be initialized at this value. Peak markers in plots are located at x_pos.
species (str) – String with chemical formula of ion species asociated with peak. Species strings follow the :-notation of chemical substances. Examples: '1K39:-1e', 'K39:-e', '3H1:1O16:-1e'. Do not forget to substract the electron, otherwise the atomic not the ionic mass would be used as literature value! Alternatively, tentative assigments can be made by adding a '?' at the end of the species string (e.g.: 'Sn100:-1e?', '?', …).
comment (str) – User comment for peak.
A (int) – Atomic mass number of the species.
z (int, optional) – Charge state of the species.
scl_coeff (float, optional) – Scale coefficient used to scale the scale-dependent reference shape parameters when fitting this peak. Defaults to 1.
m_AME (float [u], optional) – Ionic literature mass value, from AME or user-defined.
m_AME_error (float [u], optional) – Literature mass uncertainty, from AME or user-defined.
extrapolated (bool) – Boolean flag for extrapolated AME mass values (equivalent to ‘#’ in AME).
fit_model (str) – Name of model used to fit peak.
cost_func (str) – Type of cost function used to fit peak ('chi-square' or 'MLE').
method (str) – Name of optimization algorithm used to minimize cost function. This attribute is only shown in the peak properties table when any minimizers other than least_squares() were used.
red_chi (float) – Reduced chi-squared of peak fit. If the peak was fitted using 'MLE', red_chi should be taken with caution.
area (float [counts]) – Number of total counts in the peak (calculated from amplitude parameter amp of peak fit).
area_error (float [counts]) – Uncertainty of the total number of counts in the peak.
m_ion (float [u]) – Ionic mass value obtained in peak fit (after mass recalibration and corrected for respective charge state).
rel_stat_error (float) – Relative statistical uncertainty of m_ion.
rel_recal_error (float) – Relative uncertainty of m_ion due to mass recalibration.
rel_peakshape_error (float) – Relative peak-shape uncertainty of m_ion.
rel_mass_error (float) – Total relative mass uncertainty of m_ion (excluding systematics!). Includes statistical, peak-shape and recalibration uncertainty.
atomic_ME_keV (float [keV]) – (Atomic) mass excess corresponding to m_ion.
mass_error_keV (float [keV]) – Total mass uncertainty of m_ion (excluding systematics!).
m_dev_keV (float [keV]) – Deviation from literature value (m_ion - m_AME).

__init__(x_pos, species, m_AME=None, m_AME_error=None, Ex=0.0, Ex_error=0.0, lit_src='AME2020')[source]¶

Instantiate a new peak object.

If a valid ion species is assigned with the species argument the corresponding literature values will automatically be fetched from the AME database.

If different literature values are to be used, the literature mass or mass uncertainty can be user-defined with m_AME and m_AME_error. This is useful for isomers and in cases where more recent measurements haven’t been included in the AME yet.

Parameters:

x_pos (float [u/z]) – Coarse position of peak centroid. In fits the Hyper-EMG parameter for the (Gaussian) peak centroid mu will be initialized at this value. Peak markers in plots are located at x_pos.
species (str) – String with chemical formula of ion species asociated with peak. Species strings follow the :-notation of chemical substances. Examples: '1K39:-1e', 'K39:-e', '3H1:1O16:-1e'. Do not forget to substract the electron from singly-charged species, otherwise the atomic not the ionic mass will be used as literature value! Alternatively, tentative assigments can be made by adding a '?' at the end of the species string (e.g.: 'Sn100:-1e?', '?', …).
m_AME (float [u], optional) – User-defined literature mass value. Overwrites value fetched from AME. Useful for isomers or to use more up-to-date values.
m_AME_error (float [u], optional) – User-defined literature mass uncertainty. Overwrites value fetched from AME.
Ex (float [keV], optional, default : 0.0) – Isomer excitation energy (in keV) to add to ground-state literature mass. Irrelevant if the m_AME argument is used or if the peak is not labelled as isomer.
Ex_error (float [keV], optional, default : 0.0) – Uncertainty of isomer excitation energy (in keV) to add in quadrature to ground-state literature mass uncertainty. Irrelevant if the m_AME_error argument is used or if the peak is not labelled as isomer.
lit_src (str, optional, default: 'AME2020') – Source of literature mass data (either ‘AME2016’ or ‘AME2020’). If AME2016 is used, 'lit_src: AME2016' is added to the peak comment.

property abs_z¶

Non-zero absolute of the species’ charge state.

Defaults to 1 if species is undefined or uncharged.

print_properties()[source]¶: Print the most relevant peak properties.

update_lit_values(Ex=0.0, Ex_error=0.0, lit_src='AME2020')[source]¶

Updates peak attributes with AME values for specified species.

Updates the m_AME, m_AME_error, extrapolated, A, z and m_dev_keV peak attributes with AME values.

Parameters:

Ex (float [keV], optional, default : 0.0) – Isomer excitation energy (in keV) to add to ground-state literature mass.
Ex_error (float [keV], optional, default : 0.0) – Uncertainty of isomer excitation energy (in keV) to add in quadrature to ground-state literature mass uncertainty.
lit_src (str, optional, default : 'AME2020') – Source of literature mass data (either ‘AME2016’ or ‘AME2020’). If AME2016 is used, 'lit_src: AME2016' is added to the peak comment.

class emgfit.spectrum.spectrum(filename=None, m_start=None, m_stop=None, skiprows=18, show_plot=True, df=None, resolving_power=300000.0, default_fit_range=None, default_lit_src='AME2020')[source]¶

Bases: object

Object storing spectrum data, associated peaks and all relevant fit results - the workhorse of emgfit.

Variables:

input_filename (str) – Name of input file containing the spectrum data.
spectrum_comment (str, default: '-') – User comment for entire spectrum (helpful for further processing after).
fit_model (str) – Name of model used for peak-shape determination and further fitting.
index_shape_calib (int) – Index of peak used for peak-shape calibration.
red_chi_shape_cal (float) – Reduced chi-squared of peak-shape determination fit.
fit_range_shape_cal (float [u/z]) – Fit range used for peak-shape calibration.
shape_cal_result (emgfit.model.EMGModelResult) – Fit result obtained in peak-shape calibration.
shape_cal_pars (dict) – Model parameter values obtained in peak-shape calibration.
shape_cal_errors (dict) – Model parameter uncertainties obtained in peak-shape calibration.
share_shape_pars (bool, default: True) – Whether to enforce a shared peak shape for all fitted peaks. The shared shape parameters are pre-determined with determine_peak_shape().
scale_shape_pars (bool, default: False) – Whether to scale the scale-dependent parameters obtained in the peak-shape calibration with the given peak’s peak.scl_coeff.
scale_shape_to_peak_cen (bool, default: False) – Whether to scale the scale-dependent parameters obtained in the peak-shape calibration to the centroid of the given peak.
index_mass_calib (int) – Peak index of mass calibrant peak.
determined_A_stat_emg (bool) – Boolean flag for whether A_stat_emg was determined for this spectrum specifically using the determine_A_stat_emg() method. If True, A_stat_emg was set using determine_A_stat_emg(), otherwise the default value emgfit.config.A_stat_emg_default from the config module was used. For more details see docs of determine_A_stat_emg() method.
A_stat_emg (float) – Constant of proportionality for calculation of the statistical mass uncertainties. Defaults to emgfit.config.A_stat_emg_default as defined in the config module, unless the determine_A_stat_emg() method is run.
A_stat_emg_error (float) – Uncertainty of A_stat_emg.
recal_fac (float, default: 1.0) – Scaling factor applied to m_ion in mass recalibration.
rel_recal_error (float) – Relative uncertainty of recalibration factor recal_fac.
recal_facs_pm (dict) – Modified recalibration factors obtained in peak-shape uncertainty evaluation by varying each shape parameter by plus and minus 1 standard deviation, respectively.
eff_mass_shifts (numpy.ndarray of dict) – Maximal effective mass shifts for each peak obtained in peak-shape uncertainty evaluation by varying each shape parameter by plus and minus 1 standard deviation and only keeping the shift with the larger absolute magnitude. The eff_mass_shifts array contains a dictionary for each peak; the dictionaries have the following structure: {‘<shape param. name> eff. mass shift’ : [<maximal eff. mass shift>],…} For more details see docs of _eval_peakshape_errors().
area_shifts (numpy.ndarray of dict) – Maximal area change for each peak obtained in peak-shape uncertainty evaluation by varying each shape parameter by plus and minus 1 standard deviation and only keeping the shift with the larger absolute magnitude. The eff_mass_shifts array contains a dictionary for each peak; the dictionaries have the following structure: {‘<shape param. name> eff. mass shift’ : [<maximal eff. mass shift>],…} For the mass calibrant the dictionary holds the absolute shifts of the calibrant peak centroid (calibrant centroid shift). For more details see docs of _eval_peakshape_errors().
peaks_with_errors_from_resampling (list of int) – List with indeces of peaks whose statistical mass and area uncertainties have been determined by fitting synthetic spectra resampled from the best-fit model (see get_errors_from_resampling()).
MCMC_par_samples (pandas.DataFrame) – Shape parameter samples obtained via Markov chain Monte Carlo (MCMC) sampling with _get_MCMC_par_samples().
MC_recal_facs (list of float) – Recalibration factors obtained in fits with Markov Chain Monte Carlo (MCMC) shape parameter samples in get_MC_peakshape_errors().
peaks_with_MC_PS_errors (list of int) – List with indeces of peaks for which peak-shape errors have been determined by re-fitting with shape parameter sets from Markov Chain Monte Carlo sampling (see get_MC_peakshape_errors()).
peaks (list of peak) – List containing all peaks associated with the spectrum sorted by ascending mass. The index of a peak within the peaks list is referred to as the peak_index.
fit_results (list of emgfit.model.EMGModelResult) – List containing fit results (emgfit.model.EMGModelResult objects) for peaks associated with spectrum.
blinded_peaks (list of int) – List with indeces of peaks whose mass values and peak positions are to be hidden to enable blind analysis. The mass values will be unblinded upon export of the analysis results.
data (pandas.DataFrame) – Histogrammed spectrum data.
mass_number (int) – Atomic mass number associated with central bin of spectrum.
resolving_power (float, optional) – Typical resolving power of the spectrometer at FWHM level.
default_fit_range (float [u/z]) – Default x-range for fits, scaled to mass_number of spectrum.
default_lit_src (str, optional) – Source of literature mass data - either ‘AME2020’ (default) or ‘AME2016’. If AME2016 is used, 'lit_src: AME2016' is added as flag to the respective peak comments.

Notes

The mass_number is used for re-scaling of the default model parameters to the mass of interest. It is calculated upon data import by taking the median of all mass bin centers (after initial cutting of the spectrum) and rounding to the closest integer. This accounts for spectra potentially containing several mass units.

The default_fit_range is scaled to the spectrum’s mass_number using the relation: \(\text{default_fit_range} = 0.01\,u \cdot (\text{mass_number}/100)\)

__init__(filename=None, m_start=None, m_stop=None, skiprows=18, show_plot=True, df=None, resolving_power=300000.0, default_fit_range=None, default_lit_src='AME2020')[source]¶

Create a spectrum object from histogrammed mass data

By default the data is loaded from an input file of the format: two-column .csv- or .txt-file with tab-separated values (column 1: mass bin, column 2: counts in bin).

Alternatively, the data can be passed as a DataFrame to the df argument.

Optionally the spectrum can be cut to a specified fit range using the m_start and m_stop parameters. Mass data outside this range will be discarded and excluded from further analysis.

If show_plot is True, a plot of the spectrum is shown including vertical markers for the m_start and m_stop mass cut-offs (if applicable).

Parameters:

filename (str, optional) – Filename of mass spectrum to analyze. If the input file is not located in the working directory the directory path has to be included in filename, too. If no filename is given, data must be provided via df argument. m_start : float [u/z], optional Start of fit range, data at lower m/z will be discarded. m_stop : float [u/z], optional Stop of fit range, data at higher m/z will be discarded.
show_plot (bool, optional, default: True) – If True, shows a plot of full spectrum with vertical markers for m_start and m_stop cut-offs.
df (pandas.DataFrame, optional) – DataFrame with spectrum data to use, this enables the creation of a spectrum object from a DataFrame instead of from an external file.
resolving_power (float, optional) – Typical resolving power of the spectrometer at FWHM level. Defaults to 3e05.
default_fit_range (float [u/z], optional) – Default x-range for fits, scaled to mass_number of spectrum. Defaults to 0.01*(mass_number/100).
default_lit_src (str, optional) – Source of literature mass data - either ‘AME2020’ (default) or ‘AME2016’. If AME2016 is used, 'lit_src: AME2016' is added as flag to the respective peak comments.

Notes

The option to import data via the df argument was added to enable the processing of bootstrapped spectra as regular spectrum objects in the determine_A_stat_emg() method. This feature is primarily intended for internal use. The parsed DataFrame must have an index column named ‘m/z [u]’ and a value column named ‘Counts’.

_add_peak_markers(peaks=None, vline_max=0.93)[source]¶

Internal function for adding peak markers to current figure object.

Place this function inside spectrum methods between plt.figure() and plt.show().

Parameters:

peaks (list of peak) – List of peaks to add peak markers for.
vline_max (float) – Maximal y-value of marker line - in coordinate axes units.

_calc_m_ion(peak_index, fit_model, pars, recal_fac=None)[source]¶

Calculate the ionic mass of a given peak.

Parameters:

peak_index (int) – Index of peak to calculate m_ion for.
fit_model (str) – Name of fit model used to obtain pars.
pars (lmfit.parameter.Parameters) – Parameters to use for calculation.
recal_fac (float, optional) – Mass recalibration factor to use. Defaults to recal_fac attribute of the spectrum.

_eval_MC_peakshape_errors(peak_indeces=[], fit_result=None, verbose=True, show_hists=False, N_samples=1000, n_cores=-1, seed=872, rerun_MCMC_sampling=False, **MCMC_kwargs)[source]¶

Get peak-shape uncertainties for a fit result by re-fitting with many different MC-shape-parameter sets

This method is primarily intended for internal usage.

A representative subset of the shape parameter sets which are supported by the data is obtained by performing MCMC sampling on the peak-shape calibrant. If this has not already been done using the map_par_covar option in determine_peak_shape(), the _get_MCMC_par_samples() method will be automatically called here.

The peaks specified by peak_indeces will be fitted with N_samples different shape parameter sets. The peak-shape uncertainties are then estimated as the RMS deviation of the obtained values from the best-fit values.

The mass calibrant must either be included in peak_indeces or must have been processed with this method upfront (using the same N_samples and seed arguments to ensure identical sets of peak-shapes).

Parameters:

peak_indeces (int or list of int) – Indeces of peaks to evaluate MC peak-shape uncertainties for. The peaks of interest must belong to the same fit_result.
fit_result (EMGModelResult, optional) – Fit result for which MC peak-shape uncertainties are to be evaluated for. Defaults to the fit result stored for the peaks of interest in the spectrum.fit_results spectrum attribute.
verbose (bool, optional) – Whether to print status updates.
show_hists (bool, optional) – If True histograms of the effective mass shifts and peak areas obtained with the MC shape parameter sets are shown. Black vertical lines indicate the best-fit values stored in fit_result.
N_samples (int, optional) – Number of different shape parameter sets to use. Defaults to 1000.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated spectra. When set to -1 (default) all available cores are used.
seed (int, optional) – Random seed to use for reproducibility. Defaults to 872.
rerun_MCMC_sampling (bool, optional) – When False (default) pre-existing MCMC parameter samples (e.g. obtained with determine_peak_shape()) are used. If True or when there’s no pre-existing MCMC samples, the MCMC sampling will be performed by this method.
**MCMC_kwargs – Keyword arguments to send to _get_MCMC_par_samples() for control over the MCMC sampling.

Returns:

Peak-shape mass errors [u], peak-shape area errors [counts] Both arrays have the same length as peak_indeces.

Return type:

array of floats, array of floats

Notes

For details on MCMC sampling see docs of _get_MCMC_par_samples().

This method only supports peaks that belong to the same fit result. If peaks in multiple fit_results are to be treated or the peak properties are to be updated with the refined peak-shape errors use get_MC_peakshape_errors() which wraps around this method.

_eval_peakshape_errors(peak_indeces=[], fit_result=None, verbose=False, show_shape_err_fits=False)[source]¶

Calculate the relative peak-shape uncertainty of the specified peaks.

This internal method is automatically called by the :meth:`fit_peaks` and :meth:`fit_calibrant` methods and does not need to be run directly by the user.

The peak-shape uncertainties are obtained by re-fitting the specified peaks with each shape parameter individually varied by plus and minus 1 sigma and recording the respective shift of the peak centroids w.r.t the original fit. From the shifted IOI centroids and the corresponding shifts of the calibrant centroid, effective mass shifts are determined. For each varied parameter, the larger of the two eff. mass shifts are then added in quadrature to obtain the total peak-shape uncertainty. See Notes section below for a detailed explanation of the peak-shape error evaluation scheme.

Note: All peaks in the specified peak_indeces list must have been fitted in the same multi-peak fit (and hence have the same lmfit ModelResult fit_result)!

This routine does not yield a peak-shape error for the mass calibrant, since this is zero by definition. Instead, for the mass calibrant the absolute shifts of the peak centroid are calculated and stored in the eff_mass_shifts_pm and eff_mass_shifts dictionaries.

Parameters:

peak_indeces (list) – List containing indeces of peaks to evaluate peak-shape uncertainty for, e.g. to evaluate peak-shape error of peaks 0 and 3 use peak_indeces=[0,3].
fit_result (emgfit.model.EMGModelResult, optional) – Fit result object to evaluate peak-shape error for.
verbose (bool, optional, default: False) – If True, print all individual eff. mass shifts obtained by varying the shape parameters.
show_shape_err_fits (bool, optional, default: False) – If True, show individual plots of re-fits for peak-shape error determination.

See also

get_MC_peakshape_errors()

Notes

sigma,`theta`, all eta and all tau model parameters are considered “shape parameters” and varied by plus and minus one standard deviation in the peak-shape uncertainty evaluation. The peak amplitude, centroids and the baseline are always freely varying.

The “peak-shape uncertainty” refers to the mass uncertainty due to uncertainties in the determination of the peak-shape parameters and due to deviations between the shape-calibrant and IOI peak shapes. Simply put, the peak-shape uncertainties are estimated by evaluating how much a given peak’s ionic mass is shifted when the shape parameters are varied by plus or minus their 1-sigma uncertainty. A peculiarity of emgfit’s peak-shape error estimation routine is that only the centroid shifts relative to the calibrant are taken into account (hence ‘effective mass shifts’).

Inspired by the approach outlined in [8], the peak-shape uncertainties are obtained via the following procedure:

Since only effective mass shifts corrected for the corresponding shifts of the calibrant peak enter the peak-shape uncertainty, at first, the absolute centroid shifts of the mass calibrant must be evaluated. There are two options for this:
- If the calibrant index is included in the peak_indeces argument, the original calibrant fit is re-performed with each shape parameter varied by plus and minus its 1-sigma confidence respectively while all other shape parameters are kept fixed at the original best-fit values. The resulting absolute “calibrant centroid shifts” are recorded and stored in the spectrum’s eff_mass_shifts_pm dictionary. The shifted calibrant centroids are further used to calculate updated mass re-calibration factors. These are stored in the recal_facs_pm dictionary. Only the larger of the two centroid shifts due to the +/-1-sigma variation of each shape parameter are stored in the spectrum’s eff_mass_shifts dictionary.
- If the calibrant is not included in the peak_indeces list, the calibrant centroid shifts and the corresponding shifted recalibration factors must already have been obtained in a foregoing mass Mass recalibration and calculation of final mass values.
All non-calibrant peaks referenced in peak_indeces are treated in a similar way. The original fit that yielded the specified fit_result is re-performed with each shape parameter varied by plus and minus its 1-sigma confidence respectively while all other shape parameters are kept fixed at the original best-fit values. However now, the effective mass shifts after correction with the corresponding updated recalibration factor are recorded and stored in the spectrum’s eff_mass_shifts_pm dictionary. Only the larger of the two eff. mass shifts caused by the +/-1-sigma variation of each shape parameter are stored in the spectrum’s eff_mass_shifts dictionary.
The estimates for the total peak-shape uncertainty of each peak are finally obtained by adding the eff. mass shifts stored in the eff_mass_shifts dictionary in quadrature.

Mind that peak-shape area uncertainties are only calculated for ions-of- interest, not for the mass calibrant.

References

_get_MCMC_par_samples(fit_result, steps=12000, burn=500, thin=250, show_MCMC_fit_result=False, covar_map_fname=None, n_cores=-1, MCMC_seed=1364)[source]¶

Map out parameter covariances and posterior distributions using Markov-chain Monte Carlo (MCMC) sampling

MCMC results saved in result_emcee attribute of fit_result.

This method is intended for internal usage and for single peaks only.

Parameters:

fit_result (emgfit.model.EMGModelResult) – Fit result to explore with MCMC sampling. Since emcee only efficiently samples unimodal distributions, fit_result should ideally hold the result of a single-peak fit (typically the shape calibrant fit).
steps (int, optional) – Number of MCMC sampling steps.
burn (int, optional) – Number of initial sampling steps to discard (“burn-in” phase).
thin (int, optional) – After sampling, only every thin-th sample is used for further treatment. It is recommended to set thin to at least half the autocorrelation time.
show_MCMC_fit_result (bool, optional, default: False) – If True, a maximum likelihood estimate is derived from the MCMC samples with best-fit values estimated by the median of the samples. The MCMC MLE result can be compared to the conventional fit_result as an additional crosscheck.
covar_map_fname (str or None (default), optional) – If not None, the parameter covariance map will be saved as “<covar_map_fname>_covar_map.png”.
n_cores (int, optional, default: -1) – Number of CPU cores to use for parallelized sampling. If -1 (default) all available cores will be used.
MCMC_seed (int, optional) – Random state for reproducible sampling.

Notes

Markov-Chain Monte Carlo (MCMC) algorithms are a powerful tool to efficiently sample the posterior probability density functions (PDFs) of model parameters. In simpler words: MCMC methods can be used to estimate the distributions of parameter values which are supported by the data. An MCMC algorithm sends out a number of so-called walkers on stochastic walks through the parameter space (in this method the number of MCMC walkers is fixed to 20 times the number of varied parameters). The MCMC walkers are initialized with randomized parameter values drawn from truncated normal distributions defined by the respective parameter bounds and best-fit parameter values and uncertainties stored in fit_result. MCMC methods are particularly important in situations where conventional sampling techniques become intractable or inefficient. For MCMC sampling emgfit deploys lmfit’s implementation of the emcee.EnsembleSampler from the emcee package [1]. Since emcee’s EnsembleSampler is only optimized for uni-modal probability density functions this method should ideally only be used to explore the parameter space of a single-peak fit.

A complication with MCMC methods is that there is usually no rigorous way to prove that the sampling chain has converged to the true PDF. Instead it is at the user’s disgression to decide after how many sampling steps a sufficient amount of convergence is achieved. Gladly, there is a number of heuristic tools that can help in judging convergence. The most common measure of the degree of convergence is the integrated autocorrelation time (tau). If the integrated autocorrelation time shows only small changes over time the MCMC chain can be assumed to be converged. To ensure a sufficient degree of convergence this method will issue a warning whenever the number of performed sampling steps is smaller than 50 times the integrated autocorrelation time of at least one parameter. If this rule of thumb is violated it is strongly advisable to run a longer chain. An additonal aid in judging the performance of the MCMC chain are the provided plots of the MCMC traces. These plots show the paths of all MCMC walkers through parameter space. Dramatic changes of the initial trace envelopes indicate that the chain has not reached a stationary state yet and is still in the so-called “burn-in” phase. Samples in this region are discarded by setting the burn argument to an appropriate number of steps (default burn-in: 500 steps).

Another complication of MCMC algorithms is the fact that nearby samples in a MCMC chain are not indepedent. To reduce correlations between samples MCMC chains are usually “thinned out” by only storing the result of every m-th MCMC iteration. The number of steps after which two samples can be assumed to be uncorrelated/independent (so to say the memory of the chain) is given by the integrated autocorrelation time (tau). To be conservative, emgfit uses a thinning interval of m = 250 by default and issues a warning when m < tau for at least one of the parameters. Since more data is discarded, a larger thinning interval comes with a loss of precision of the posterior PDF estimates. However, a sufficient amount of thinning is still advisable since emgfit’s MC peak-shape error determination (get_MC_peakshape_errors()) relies on independent parameter samples.

As a helpful measure for tuning MCMC chains, emgfit provides a plot of the “acceptance fraction” for each walker, i.e. the fraction of suggested walker steps which were accepted. The developers of emcee’s EnsembleSampler suggest acceptance fractions between 0.2 and 0.5 as a rule of thumb for a well-behaved chain. Acceptance fractions falling short of this for many walkers can indicate poor initialization or a too small number of walkers.

For more details on MCMC see the excellent introductory papers “emcee: The MCMC hammer” [1] and “Data Analysis Recipes: Using Markov Chain Monte Carlo” [2].

References

_parametric_bootstrap(fit_result, peak_indeces=[], N_spectra=1000, seed=None, n_cores=-1, show_hists=False)[source]¶

Get statistical and area uncertainties via resampling from best-fit PDF.

This method provides bootstrap estimates of the statistical errors and peak area errors by evaluating the scatter of peak centroids and areas in fits of many simulated spectra. The simulated spectra are created by sampling events from the best-fit PDF asociated with fit_result (parametric bootstrap). Refined errors are calculated for each peak individually by taking the sample standard deviations of the obtained peak centroids and areas.

Parameters:

fit_result (emgfit.model.EMGModelResult) – Fit result object to evaluate statistical errors for.
peak_indeces (list, optional) – List containing indeces of peaks to determine refined stat. errors for, e.g. to evaluate peak-shape error of peaks 1 and 2 use peak_indeces=[1,2]. Listed peaks must be included in fit_result. Defaults to all peaks contained in fit_result.
N_spectra (int, optional) – Number of simulated spectra to fit. Defaults to 1000, which typically yields statistical uncertainty estimates with a relative precision of a few percent.
seed (int, optional) – Random seed to use for reproducible sampling.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated spectra. When set to -1 (default) all available cores are used.
show_hists (bool, optional, default: False) – If True, histograms of the obtained peak centroids and areas are shown. Black vertical lines indicate the best-fit values obtained from the measured data.

Returns:

Array with statistical errors [u], array with area errors [counts] Array elements correspond to the results for the peaks selected in peak_indeces (in ascending order). If peak_indeces has not been specified it defaults to the indeces of all peaks contained in fit_result.

Return type:

numpy.ndarray, numpy.ndarray

Note

All peaks for which refined errors are to be evaluated must belong to the same lmfit ModelResult fit_result. Even if refined stat. errors are only to be extracted for a subset of the peaks contained in fit_result (as specified with peak_indeces), fits will be re-performed over the same x-range as fit_result.

The resampling is performed with the emgfit.sample.simulate_events() function.

static _plot_df(df, fig=None, title='', yscale='log', vmarkers=None, thres=None, xmin=None, xmax=None, ymin=None, ymax=None, xlabel='m/z [u]', ylabel='Counts per bin', legend=[], dpi=600)[source]¶

Plots spectrum data stored in pandas.DataFrame df.

Intended for internal use.

Parameters:

df (pandas.DataFrame) – Spectrum data to plot.
fig (matplotlib.figure.Figure, optional) – Figure object to plot onto.
title (str, optional) – Optional plot title.
yscale (str, optional) – Scale of y-axis (‘linear’ or ‘log’), defaults to ‘log’.
vmarkers (list of float [u/z], optional) – List with x positions [u/z] to add vertical markers at.
thres (float, optional) – y-level to add horizontal marker at (e.g. for indicating set threshold in peak detection).
xmin (float [u/z], optional) – Lower/upper bound of x-range to plot.
xmax (float [u/z], optional) – Lower/upper bound of x-range to plot.
ymin (float, optional) – Lower/upper bound of y-range to plot.
ymax (float, optional) – Lower/upper bound of y-range to plot.
xlabel (str, optional) – Label for x-axis.
ylabel (str, optional) – Label for y-axis.
legend (list of string, optional) – List of plot legend entries.
dpi (int, optional) – Dots per inch for plot. Defaults to dpi value set in config.py.

_reset_all_fit_props()[source]¶: Reset all fit-related spectrum and peak attributes to their defaults

Note

This method also resets all mass-calibration-related peak properties and spectrum attributes to their default values but does not affect the results obtained in the peak-shape calibration.

See also

_reset_shape_calib_props()

_reset_shape_calib_props()[source]¶: Reset all shape-calibration-related spectrum and peak properties

See also

_reset_all_fit_props()

_show_blinded_report(result)[source]¶: Display fit result with positions of blinded peaks replaced by NaN

static _smooth(x, window_len=11, window='hanning')[source]¶

Smooth the data for the peak detection.

** Intended for internal use only.**

This method is based on the convolution of a normalized window with the signal. The signal is prepared by introducing reflected copies of the signal (with the window size) in both ends so that transient parts are minimized in the begining and end part of the output signal.

Parameters:

x (numpy.array) – The input data
window_len (odd int, optional) – Length of the smoothing window; must be an odd integer!
window (str, optional) – Type of window from ‘flat’, ‘hanning’, ‘hamming’, ‘bartlett’, ‘blackman’, flat window will produce a moving average smoothing.

Returns:

The smoothed spectrum data.

Return type:

numpy.ndarray

Notes

length(output) != length(input), to correct this: return y[(window_len/2-1):-(window_len/2)] instead of just y.

Method adapted from: https://scipy-cookbook.readthedocs.io/items/SignalSmooth.html

Example

>>> from numpy import linspace, sin, randn
>>> t = linspace(-2,2,0.1)
>>> y = sin(t) + randn(len(t))*0.1
>>> y_smoothed = smooth(y)

_update_calibrant_props(index_mass_calib, fit_result)[source]¶

Determine recalibration factor and update mass calibrant peak properties.

Intended for internal use only.

Parameters:

index_mass_calib (int) – Index of mass calibrant peak.
fit_result (emgfit.model.EMGModelResult) – Fit result to use for calculating the re-calibration factor and for updating the calibrant properties.

_update_peak_indeces()[source]¶: Update all peak indeces

_update_peak_props(peaks, fit_result)[source]¶

Update the peak properties using the given ‘fit_result’.

Intended for internal use only.

The values of the mass calibrant will not be changed by this routine.

Parameters:

peaks (list) – List of indeces of peaks to update. (To get peak indeces, see plot markers or consult the peak properties table by calling the spectrum.show_peak_properties() method)
fit_result (emgfit.model.EMGModelResult) – EMGModelResult holding the fit results of all peaks to be updated.

Note

All peaks referenced by the ‘peaks’ argument must belong to the same fit_result. Not necessarily all peaks contained in fit_result will be updated, only the properties of peaks referenced with the peaks argument will be updated.

add_peak(x_pos, species='?', m_AME=None, m_AME_error=None, Ex=0.0, Ex_error=0.0, lit_src='default', A=None, z=None, verbose=True)[source]¶

Manually add a peak to the spectrum’s peaks list.

The position of the peak must be specified with the x_pos argument. If the peak’s ionic species is provided with the species argument the corresponding AME literature values will be added to the peak. Alternatively, user-defined literature values can be provided with the m_AME and m_AME_error arguments. This option is helpful for isomers or in case of very recent measurements that haven’t entered the AME yet.

Parameters:

x_pos (float [u/z]) – Position of peak to be added.
species (str, optional) – species label for peak to be added following the :-notation of chemical substances. If assigned, peak.m_AME, peak.m_AME_error & peak.extrapolated are automatically updated with the corresponding AME literature values.
m_AME (float [u], optional) – User-defined literature mass of peak to be added. Overwrites pre- existing peak.m_AME value.
m_AME_error (float [u], optional) – User-defined literature mass uncertainty of peak to be added. Overwrites pre-existing peak.m_AME_error.
Ex (float [keV], optional, default: 0.0) – Excitation energy of isomeric state in keV. When the peak is labelled as an isomer its literature mass peak.m_AME is calculated by adding Ex to the AME ground-state mass.
Ex_error (float [keV], optional, default: 0.0) – Uncertainty of the excitation energy of the isomeric state in keV. When the peak is labelled as isomer its literature mass uncertainty peak.m_AME_error is calculated by adding Ex_error and the AME uncertainty of the ground-state mass in quadrature.
lit_src (str, optional) – Source of literature mass data (either of ‘default’, ‘AME2016’ or ‘AME2020’). If ‘default’, the literature source defined globally with the default_lit_src spectrum attribute is used. If AME2016 is used, 'lit_src: AME2016' is added to the respective peak comment.
A (int, optional) – Atomic mass number of species (only relevant when species is undefined).
z (int, optional) – Charge state of species (only relevant when species is undefined).
verbose (bool, optional, default: True) – If True, a message is printed after successful peak addition. Intended for internal use only.

See also

detect_peaks(), remove_peak()

Note

Adding a peak will shift the peak indeces of all peaks at higher masses by +1.

add_peak_comment(comment, peak_index=None, x_pos=None, species='?', overwrite=False)[source]¶

Add a comment to a peak

By default the comment argument will be appended to the end of the current peak.comment attribute (if the current comment is ‘-’ it is overwritten by the comment argument). If overwrite is set True, the current peak.comment is overwritten with the ‘comment’ argument.

Parameters:

comment (str) – Comment to add to peak.
peak_index (int, optional) – Index of peak to add comment to.
x_pos (float [u/z], optional) –

x_pos of peak to add comment to (must be specified up to 6th
decimal).
species (str, optional) – species of peak to add comment to.
overwrite (bool) – If True the current peak comment will be overwritten by comment, else comment is appended to the end of the current peak comment.

Note

The shape and mass calibrant peaks are automatically marked during the shape and mass calibration by inserting the protected flags 'shape calibrant', 'mass calibrant' or 'shape and mass calibrant' into their peak comments. When user-defined comments are added to these peaks, it is ensured that the protected flags cannot be overwritten. The above shape and mass calibrant flags should never be added to comments manually by the user!

add_spectrum_comment(comment, overwrite=False)[source]¶

Add a general comment to the spectrum.

By default the comment argument will be appended to the end of the current spectrum_comment attribute. If overwrite is set to True the current spectrum_comment is overwritten with comment.

Parameters:

comment (str) – Comment to add to spectrum.
overwrite (bool) – If True, the current spectrum_comment attribute will be overwritten with comment, else comment is appended to the end of spectrum_comment.

See also

add_peak_comment()

Notes

The spectrum_comment will be included in the output file storing all fit results and can hence be useful to pass on information for post-processing.

If spectrum_comment is ‘-’ (default value) it is always overwritten with comment.

assign_species(species, peak_index=None, x_pos=None, Ex=0.0, Ex_error=0.0, lit_src='default')[source]¶

Assign species label(s) to a single peak (or all peaks at once).

If no single peak is selected with peak_index or x_pos, a list with species names for all peaks in the peak list must be passed to species. For already specified or unkown species insert None as a placeholder into the list to skip the species assignment for this peak. See Notes and Examples sections below for details on usage.

Parameters:

species (str or list of str) – The species name (or list of name strings) to be assigned to the selected peak (or to all peaks). For unkown or already assigned species, None should be inserted as placeholder at the corresponding position in the species list. species names must follow the :-notation of chemical substances.
peak_index (int, optional) – Index of single peak to assign species name to.
x_pos (float [u/z], optional) – x_pos of single peak to assign species name to. Must be specified up to 6th decimal.
Ex (float [keV], optional, default: 0.0) – Excitation energy of isomeric state in keV. When the peak is labelled as isomer its literature mass peak.m_AME is calculated by adding Ex to the AME ground-state mass.
Ex_error (float [keV], optional, default: 0.0) – Uncertainty of the excitation energy of the isomeric state in keV. When the peak is labelled as isomer its literature mass uncertainty peak.m_AME_error is calculated by adding Ex_error and the AME uncertainty of the ground-state mass in quadrature.
lit_src (str, optional) – Source of literature mass data - either of ‘default’, ‘AME2016’ or ‘AME2020’. If ‘default’, the literature source defined globally with the default_lit_src spectrum attribute is used. If ‘AME2016’ is used, 'lit_src: AME2016' is added to the respective peak comment(s).

See also

set_lit_values(): For setting user-defined literature values.

Notes

Assignment of a single peak species: select peak by specifying peak position x_pos (up to 6th decimal) or peak_index argument (0-based! Check for peak index by calling show_peak_properties() method on spectrum object).
Assignment of multiple peak species: Nothing should be passed to the ‘peak_index’ and ‘x_pos’ arguments. Instead the user specifies a list of the new species strings to the species argument (if there’s N detected peaks, the list must have length N). Former species assignments can be kept by inserting blanks at the respective position in the species list, otherwise former species assignments are overwritten, also see examples below for usage.

Tentative assignments and isomers: Use '?' at the end of the species string or constituent element strings to indicate tentative assignments. Literature values are also fetched for peaks with tentative assignments.

Isomeric species: Isomers can be marked by appending a 'm' or 'm0' up to 'm9' to the end of the respective element substring in species. For isomers no literature values are calculated unless the respective excitation energy is manually specified with the Ex argument.

Examples

Assign the peak with peak_index 2 (third-lowest-mass peak) as ‘1Cs133:-1e’, leave all other peaks unchanged:

>>> import emgfit as emg
>>> spec = emg.spectrum(<input_file>) # mock code for foregoing data import
>>> spec.detect_peaks() # mock code for foregoing peak detection
>>> spec.assign_species('1Cs133:-1e', peak_index = 2)

Assign multiple peaks:

>>> import emgfit as emg
>>> spec = emg.spectrum(<input_file>) # mock code for foregoing data import
>>> spec.detect_peaks() # mock code for foregoing peak detection
>>> spec.assign_species(['1Ru102:-1e', '1Pd102:-1e', 'Rh102:-1e?', None,'1Sr83:1F19:-1e', '?'])

This assigns the species of the first, second, third and fourth peak with the respective labels in the specified list and fetches their AME values. The ‘?’ ending of the 'Rh102:-1e?' argument indicates a tentative species assignment, mind that literature values will still be calculated for this peak. Equivalently, 'Rh102?:-1e' could have been used. The None argument leaves the species assignment of the 4th peak unchanged. The '?' argument overwrites any former species assignments to the last peak and marks the peak as unidentified.

Mark peaks as isomers:

>>> import emgfit as emg
>>> spec = emg.spectrum(<input_file>) # mock code for foregoing data import
>>> spec.detect_peaks() # mock code for foregoing peak detection
>>> spec.assign_species('1In127:-1e', peak_index=0)    # ground state
>>> spec.assign_species('1In127m:-1e', peak_index=1)   # first isomer
>>> spec.assign_species('1In127m1:-1e', peak_index=2, Ex=1863,
>>>                     Ex_error=58) # second isomer

The above assigns peak 0 as ground state and fetches the corresponding literature values. Peak 1 is marked as the first isomeric state of In-127 but no literature values are calculated (since Ex is not specified). Peak 2 is marked as the second isomeric state of In-127 and the literature mass and its uncertainty are calculated from the respective ground-state AME values and the provided Ex and Ex_error.

calc_FWHM_emg(peak_index, fit_result=None)[source]¶

Calculate the full width at half maximum (FWHM) of a Hyper-EMG fit.

The peak of interest must have previously been fitted with a Hyper-EMG model.

Parameters:

peak_index (int) – Index of peak of interest.
fit_result (emgfit.model.EMGModelResult, optional) – Fit result containing peak of interest. If None (default) the corresponding fit result from the spectrum’s fit_results list will be fetched.

Returns:

Full width at half maximum of Hyper-EMG fit of peak of interest.

Return type:

float [u/z]

calc_mu_emg(peak_index, fit_result=None)[source]¶

Calculate the mean of a Hyper-EMG peak.

The peak of interest must have previously been fitted with a Hyper-EMG model.

Parameters:

peak_index (int) – Index of peak of interest.
fit_result (emgfit.model.EMGModelResult, optional) – Fit result containing peak of interest. If None (default) the corresponding fit result from the spectrum’s fit_results list will be used.

Returns:

Mean of Hyper-EMG fit of peak of interest.

Return type:

float [u/z]

calc_peak_area(peak_index, fit_result=None, decimals=2)[source]¶

Calculate the peak area (counts in peak) and its stat. uncertainty.

Area and area error are calculated using the peak’s amplitude parameter amp and the width of the uniform binning of the spectrum. Therefore, the peak must have been fitted beforehand. In the case of overlapping peaks only the counts within the fit component of the specified peak are returned.

Note

This routine assumes the bin width to be uniform across the spectrum. The mass binning of most mass spectra is not perfectly uniform (usually time bins are uniform such that the width of mass bins has a quadratic scaling with mass). However, for isobaric species the quadratic term is usually so small that it can safely be neglected.

Parameters:

peak_index (str) – Index of peak of interest.
fit_result (emgfit.model.EMGModelResult, optional) – Fit result object to use for area calculation. If None (default) use corresponding fit result stored in fit_results list.
decimals (int) – Number of decimals of returned output values.

Returns:

List with peak area and area error in format [area, area_error].

Return type:

list of float

calc_sigma_emg(peak_index, fit_result=None)[source]¶

Calculate the standard deviation of a Hyper-EMG peak fit.

The peak of interest must have previously been fitted with a Hyper-EMG model.

Parameters:

peak_index (int) – Index of peak of interest.
fit_result (emgfit.model.EMGModelResult, optional) – Fit result containing peak of interest. If None (default) the corresponding fit result from the spectrum’s fit_results list will be used.

Returns:

Standard deviation of Hyper-EMG fit of peak of interest.

Return type:

float [u/z]

comp_model(peaks_to_fit, fit_model='emg22', init_pars=None, cost_func='chi-square', vary_shape=False, vary_baseline=True, share_shape_pars=True, scale_shape_pars=False, scale_shape_to_peak_cen=False)[source]¶

Create a multi-peak composite model with the specified peak shape.

Primarily intended for internal usage.

Parameters:

peaks_to_fit (list of peak) – peaks to be fitted with composite model.
model (str, optional) – Name of fit model to use for all peaks (e.g. 'Gaussian', 'emg12', 'emg33', … - for full list see Available fit models).
init_pars (dict, optional, default: None) – Default initial shape parameters for fit model. If None the default parameters defined in the fit_models module will be used after scaling to the spectrum’s mass_number. For more details and a list of all shape parameters see the Peak-shape calibration article.
cost_func (str, optional) –
Name of cost function to use for minimization.
- If 'chi-square', the fit is performed by minimizing Pearson’s chi-squared statistic:
  
  \[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]
- If 'MLE', a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log likelihood ratio:
  
  \[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
- If 'default' (default), use the standrad residual from lmfit.
See Notes of peakfit() method for details.
vary_shape (bool, optional) – If False only the amplitude (amp) and Gaussian centroid (mu) model parameters will be varied in the fit. If True, the shape parameters (sigma, theta, etas and taus) will also be varied.
vary_baseline (bool, optional) – If True a varying uniform baseline will be added to the fit model as varying model parameter bkg_c. If False, the baseline parameter bkg_c will be kept fixed at 0.
share_shape_pars (bool, optional, default: True) – Whether to enforce a shared peak shape for all peaks.
scale_shape_pars (bool, optional, default: False) – Whether to scale the scale-dependent shape parameters of the shape-reference peak with the peaks.scl_coeff. See Notes for details.
scale_shape_to_peak_cen (bool, optional, default: False) – Whether to scale the scale-dependent shape parameters to the centroid mu of the underlying Gaussian of a given peak. Requires scale_shape_pars to be True. See Notes for details.

Returns:

emgfit multi-peak composite model

Return type:

emgfit.model.EMGModel

Notes

The initial amplitude for each peak is estimated from the product of the number of counts in the bin at the peak’s x_pos and the initial value for the sigma of the underlying Gaussian. The result is multiplied by an empirically determined proportionality factor of 3. Although this number is somewhat shape dependent, this approach yields decent initial amplitudes for peaks that are reasonably close to a Gaussian. If user intervention becomes necessary, the init_par_hints option of the peakfit() method can be used to overwrite the initial value of the peak amplitude.

The initial value for the centroid of the underlying Gaussian (mu) is estimated using the equation for the mode of the hyper-EMG distribution see emgfit.fit_models.get_mu0() for details.

If share_shape_pars=True, a shape-reference peak is used to impose a common peak shape on all fitted peaks (except for optional scaling of the scale-dependent shape parameters). For fits performed before the peak-shape calibration , the first peak in peaks_to_fit is used as the shape-reference peak. For the peak-shape calibration and all subsequent fits, the shape-calibrant peak is used as the shape-reference peak. If the shape calibrant does not fall into the fit range, the shape parameters obtained in the peak-shape calibration (stored in spectrum.shape_cal_pars) are added as fixed parameters to the returned model.

There is two options to scale the shape parameters of the shape-reference peak to a given peak:

If scale_shape_pars=True and scale_shape_to_peak_cen=False: The scale-dependent shape-reference parameters (‘sigma’ and ‘tau’s) are multiplied with a fixed scale factor scl_fac=scl_coeff, where scl_coeff is the peak.scl_coeff attribute of the given peak.
If scale_shape_pars=True and scale_shape_to_peak_cen=True: The scale-dependent shape-reference parameters (‘sigma’ and ‘tau’s) are multiplied with a varying scale factor scl_fac=scl_coeff * (mu/mu_ref), where mu and mu_ref are the Gaussian centroids of the given peak and the shape-reference peak, respectively. Once a peak-shape calibration has been performed, mu_ref is taken as the shape-calibrant centroid obtained in the peak-shape calibration. Otherwise, mu_ref is defined by the varying (Gaussian) centroid of the shape-reference peak.

detect_peaks(thres=0.003, window_len=23, window='blackman', width=2e-05, plot_smoothed_spec=True, plot_2nd_deriv=True, plot_detection_result=True)[source]¶

Perform automatic peak detection.

This routine finds peaks from minima in a scaled second derivative of the spectrum data after first applying some smoothing. This approach enables very sensitive yet robust detection, even for partially overlapping peaks. The thres, window_len & width parameters can be used to tune the algorithm to the specific data for maximum sensitivity.

Parameters:

thres (float, optional) – Threshold for peak detection in the inverted and scaled second derivative of the smoothed spectrum.
window_len (odd int, optional) – Length of window used for smoothing the spectrum (in no. of bins). Must be an ODD integer.
window (str, optional) – The window function used for smooting the spectrum. Defaults to 'blackman'. Other options: 'flat', 'hanning', 'hamming', 'bartlett'. See also NumPy window functions.
width (float [u/z], optional) – Minimal FWHM of peaks to be detected. Caution: To achieve maximal sensitivity for overlapping peaks this number might have to be set to less than the peak’s FWHM! In challenging cases use the plot of the scaled inverted second derivative (by setting plot_2nd_deriv to True) to ensure that the detection threshold is set properly.
plot_smoothed_spec (bool, optional) – If True a plot with the original and the smoothed spectrum is shown.
plot_2nd_deriv (bool, optional) – If True a plot with the scaled, inverted second derivative of the smoothed spectrum is shown.
plot_detection_result (bool, optional) – If True a plot of the spectrum with markers for the detected peaks is shown.

See also

add_peak(), remove_peak()

Note

Running this method removes any pre-existing peaks.

Notes

For details on the smoothing, see docs of _smooth() by calling:

>>> help(emgfit.spectrum._smooth)

determine_A_stat_emg(peak_index=None, species='?', x_pos=None, x_range=None, N_spectra=1000, fit_model=None, cost_func='MLE', method='least_squares', fit_kws=None, par_hint_args={}, vary_baseline=True, plot_filename=None)[source]¶

Determine the constant of proprotionality A_stat_emg for calculation of the statistical uncertainties of Hyper-EMG fits.

This method updates the A_stat_emg & A_stat_emg_error spectrum attributes. The former will be used for all subsequent stat. error estimations.

This routine must be called AFTER a successful peak-shape calibration and should be called BEFORE the mass re-calibration.

A_stat_emg is determined by evaluating the statistical fluctuations of a representative peak’s centroid as a function of the number of ions in the reference peak. The fluctuations are estimated by fitting a large number of synthetic spectra derived from the experimental data via bootstrap re-sampling. For details see Notes section below.

Specify the peak to use for the bootstrap re-sampling by providing either of the peak_index, species and x_pos arguments. The peak should be well-separated and have decent statistics (typically the peak-shape calibrant is used).

Parameters:

peak_index (int, optional) – Index of representative peak to use for bootstrap re-sampling (typically, the peak-shape calibrant). The peak should have high statistics and must be well-separated from other peaks.
species (str, optional) – String with species name of representative peak to use for bootstrap re-sampling (typically, the peak-shape calibrant). The peak should have high statistics and be well-separated from other peaks.
x_pos (float [u/z], optional) – Marker position (x_pos spectrum attribute) of representative peak to use for bootstrap re-sampling (typically, the peak-shape calibrant). The peak should have high statistics and be well- separated from other peaks. x_pos must be specified up to the 6th decimal.
x_range (float [u/z], optional) – Mass range around peak centroid over which events will be sampled and fitted. Choose such that no secondary peaks are contained in the mass range! If None defaults to default_fit_range spectrum attribute.
N_spectra (int, optional, default: 1000) – Number of bootstrapped spectra to create at each number of ions.
cost_func (str, optional, default: 'MLE') –
Name of cost function to use for minimization.
- If 'chi-square', the fit is performed by minimizing Pearson’s chi-squared statistic:
  
  \[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]
- If 'MLE', a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log-likelihood ratio:
  
  \[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
For details see Notes section of peakfit() method documentation.
method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options check arguments of lmfit.minimizer.minimize().
fit_kws (dict, optional, default: None) – Options to pass to lmfit minimizer used in emgfit.model.EMGModel.fit() method.
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to modify or add model parameters. The keys of the par_hint_args dictionary specify parameter names; the values must likewise be dictionaries that hold the respective keyword arguments to pass to set_param_hint().
vary_baseline (bool, optional, default: True) – If True, the constant background will be fitted with a varying uniform baseline parameter bkg_c. If False, the baseline parameter bkg_c will be fixed to 0.
plot_filename (str, optional, default: None) – If not None, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.

Notes

As noted in [9], statistical errors of Hyper-EMG peak centroids obey the following scaling with the number of counts in the peak N_counts:

\[\Delta \left(m/z\right)_\mathrm{stat} = A_{stat,emg}\frac{FWHM}{\sqrt{N_{counts}}},\]

where the constant of proportionality A_stat_emg depends on the specific peak shape. This routine uses the following method to determine A_stat_emg:

N_spectra bootstrapped spectra are created for each of the following total numbers of events: [10,30,100,300,1000,3000,10000,30000].
Each bootstrapped spectrum is fitted and the best fit peak centroids are recorded.
The statistical uncertainties are estimated by taking the sample standard deviations of the recorded peak centroids at each value of N_counts. Since the best-fit peak area can deviate from the true number of re-sampled events in the spectrum, the mean best_fit area at each number of re-sampled events is used to determine N_counts.
A_stat_emg is finally determined by plotting the rel. statistical uncertainty as function of N_counts and fitting it with the above equation.

The resulting value for A_stat_emg will be stored as spectrum attribute and will be used in all subsequent fits to calculate the stat. errors from the number of counts in the peak.

References

determine_peak_shape(index_shape_calib=None, species_shape_calib=None, fit_model='emg22', cost_func='chi-square', init_pars='default', x_fit_cen=None, x_fit_range=None, vary_baseline=True, vary_tail_order=True, method='least_squares', fit_kws=None, par_hint_args={}, show_fit_reports=False, show_plots=True, show_peak_markers=True, sigmas_of_conf_band=0, error_every=1, plot_filename=None, map_par_covar=False, **MCMC_kwargs)[source]¶

Determine optimal peak-shape parameters by fitting the specified peak-shape calibrant.

If vary_tail_order is True (default) an automatic model selection is performed before the calibration of the peak-shape parameters.

It is recommended to visually check whether the fit residuals are purely stochastic (as should be the case for a decent model). If this is not the case either the selected model does not describe the data well, the initial parameters lead to poor convergence or there are additional undetected peaks.

Parameters:

index_shape_calib (int, optional) – Index of shape-calibration peak. Preferrable alternative: Specify the shape-calibrant with the species_shape_calib argument.
species_shape_calib (str, optional) – Species name of the shape-calibrant peak in :-notation of chemical substances (e.g. 'K39:-1e'). Alternatively, the peak to use can be specified with the index_shape_calib argument.
fit_model (str, optional, default: 'emg22') – Name of fit model to use for shape calibration (e.g. 'Gaussian', 'emg12', 'emg33', … - for full list see Available fit models). If the automatic model selection (vary_tail_order=True) fails or is turned off, fit_model will be used for the shape calibration and set as the spectrum’s fit_model attribute.
cost_func (str, optional, default: 'chi-square') –
Name of cost function to use for minimization. It is strongly recommended to use ‘chi-square’-fitting for the peak-shape determination since this yields more robust results for fits with many model parameters as well as more trustworthy parameter uncertainties (important for peak-shape error determinations).
- If 'chi-square', the fit is performed by minimizing Pearson’s chi-squared statistic:
  
  \[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]
- If 'MLE', a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log likelihood ratio:
  
  \[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
For details see Notes section of peakfit() method documentation.
init_pars (dict, optional) –
Dictionary with initial shape parameter values for fit (optional).
- If None or 'default' (default), the default parameters defined for mass 100 in the emgfit.fit_models module will be used after re-scaling to the spectrum’s mass_number.
- To define custom initial values, a parameter dictionary containing all model parameters and their values in the format {'<param name>':<param_value>,...} should be passed to init_pars.
Mind that only the initial values to shape parameters (sigma, theta,`etas` and taus) can be user-defined. The initial values for mu, amp and the optional baseline parameter bkg_c are automatically derived as described in the Peak fitting approach article.
x_fit_cen (float [u/z], optional) – Center of fit range. If None (default), the x_pos attribute of the shape-calibrant peak is used as x_fit_cen.
x_fit_range (float [u/z], optional) – Mass range to fit. If None, defaults to the default_fit_range spectrum attribute.
vary_baseline (bool, optional, default: True) – If True, the background will be fitted with a varying uniform baseline parameter bkg_c. If False, the baseline parameter bkg_c will be fixed to 0.
vary_tail_order (bool, optional) – If True (default), before the calibration of the peak-shape parameters an automatized fit model selection is performed. For details on the automatic model selection, see Notes section below. If False, the specified fit_model argument is used as model for the peak-shape determination.
method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options check arguments of lmfit.minimizer.minimize().
fit_kws (dict, optional, default: None) – Options to pass to minimizer used in emgfit.model.EMGModel.fit() method.
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to modify or add model parameters. The keys of the par_hint_args dictionary specify parameter names; the values must likewise be dictionaries that hold the respective keyword arguments to pass to set_param_hint().
show_fit_reports (bool, optional, default: True) – Whether to print fit reports for the fits in the automatic model selection.
show_plots (bool, optional) – If True (default), linear and logarithmic plots of the spectrum and the best fit curve are displayed. For details see spectrum.plot_fit().
show_peak_markers (bool, optional) – If True (default), peak markers are added to the plots.
sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best fit curve in sigma.
error_every (int, optional, default: 1) – Show error bars only for every error_every-th data point.
plot_filename (str, optional, default: None) – If not None, the plots of the shape-calibration will be saved to two separate files named ‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.
map_par_covar (bool, optional) – If True the parameter covariances will be mapped using Markov-Chain Monte Carlo (MCMC) sampling and shown in a corner plot. This feature is only recommended for single-peak fits.
**MCMC_kwargs (optional) – Options to send to _get_MCMC_par_samples(). Only relevant when map_par_covar is True.

Notes

Ideally the peak-shape calibration is performed on a well-separated peak with high statistics. If this is not possible, the peak-shape calibration can also be attempted using overlapping peaks since emgfit ensures shared and identical shape parameters for all peaks in a multi- peak fit.

Automatic model selection: When the model selection is activated the routine tries to find the peak shape that minimizes the fit’s chi-squared reduced by successively adding more tails on the right and left. Finally, that fit model is selected which yields the lowest chi-squared reduced without having any of the tail weight parameters eta compatible with zero within 1-sigma uncertainty. The latter models are excluded as is this an indication of overfitting. Models for which the calculation of any eta parameter uncertainty fails are likewise excluded from selection.

fit_calibrant(index_mass_calib=None, species_mass_calib=None, fit_model=None, cost_func='MLE', x_fit_cen=None, x_fit_range=None, vary_shape=False, vary_baseline=True, method='least_squares', fit_kws=None, par_hint_args={}, show_plots=True, show_peak_markers=True, sigmas_of_conf_band=0, error_every=1, show_fit_report=True, plot_filename=None)[source]¶

Determine mass re-calibration factor by fitting the selected calibrant peak.

After the mass calibrant has been fitted the recalibration factor and its uncertainty are calculated and saved as the spectrum’s recal_fac and recal_fac_error attributes.

The calibrant peak can either be specified with the index_mass_calib or the species_mass_calib argument.

Parameters:

index_mass_calib (int, optional) – Index of mass calibrant peak.
species_mass_calib (str, optional) – Species of peak to use as mass calibrant.
fit_model (str, optional, default: 'emg22') – Name of fit model to use (e.g. 'Gaussian', 'emg12', 'emg33', … - for full list see Available fit models).
cost_func (str, optional, default: 'chi-square') –
Name of cost function to use for minimization.
- If 'chi-square', the fit is performed by minimizing Pearson’s chi-squared statistic:
  
  \[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]
- If 'MLE', a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log likelihood ratio:
  
  \[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
See Notes of peakfit() method for details.
x_fit_cen (float or None, [u/z], optional) – center of mass range to fit; if None, defaults to marker position (x_pos) of mass calibrant peak
x_fit_range (float [u/z], optional) – width of mass range to fit; if None, defaults to ‘default_fit_range’ spectrum attribute
vary_shape (bool, optional, default: False) – If False peak-shape parameters (sigma, theta,`etas` and taus) are kept fixed at their initial values. If True the shared shape parameters are varied (ensuring identical shape parameters for all peaks).
vary_baseline (bool, optional, default: True) – If True, the constant background will be fitted with a varying uniform baseline parameter bkg_c. If False, the baseline parameter bkg_c will be fixed to 0.
method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options check arguments of lmfit.minimizer.minimize().
fit_kws (dict, optional, default: None) – Options to pass to minimizer used in emgfit.model.EMGModel.fit() method.
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to modify or add model parameters. The keys of the par_hint_args dictionary specify parameter names; the values must likewise be dictionaries that hold the respective keyword arguments to pass to set_param_hint().
show_plots (bool, optional) – If True (default) linear and logarithmic plots of the spectrum with the best fit curve are displayed. For details see spectrum.plot_fit().
show_peak_markers (bool, optional) – If True (default) peak markers are added to the plots.
sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best fit curve in sigma. Note that the confidence band is only derived from the uncertainties of the parameters that are varied during the fit.
error_every (int, optional, default: 1) – Show error bars only for every error_every-th data point.
show_fit_report (bool, optional) – If True (default) the fit results are reported.
plot_filename (str, optional, default: None) – If not None, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.

See also

spectrum.fit_peaks()

Notes

The spectrum.fit_peaks() method enables the simultaneous fitting of mass calibrant and ions of interest in a single multi-peak fit and can be used as an alternative to this method.

After the calibrant fit the spectrum._eval_peakshape_errors() method is automatically called to save the absolute calibrant centroid shifts as preparation for subsequent peak-shape error determinations.

Assuming the spectrum has already been coarsely calibrated via the time- resolved calibration in the MR-TOF-MS’s data acquisition software MAc, the recalibration (or precision calibration) factor is usually very close to unity. An error will be raised by the spectrum._update_calibrant_props() method if spectrum.recal_fac deviates from unity by more than a permille since this causes some implicit approximations for the calculation of the final mass values and their uncertainties to break down.

The statistical uncertainty of the peak is calculated via the following relation [10]:

For Gaussians the constant of proportionality \(A_{stat}\) is always given by \(A_{stat,G}\) = 0.425. For Hyper-EMG models \(A_{stat}=A_{stat,emg}\) is either set to the default value A_stat_emg_default defined in the config module or determined by running the spectrum.determine_A_stat_emg() method. The latter is usually preferable since this accounts for the specifics of the given peak shape.

References

fit_peaks(peak_indeces=[], index_mass_calib=None, species_mass_calib=None, x_fit_cen=None, x_fit_range=None, fit_model=None, cost_func='MLE', method='least_squares', fit_kws=None, par_hint_args={}, init_pars=None, vary_shape=False, vary_baseline=True, show_plots=True, show_peak_markers=True, sigmas_of_conf_band=0, error_every=1, plot_filename=None, show_fit_report=True, show_shape_err_fits=False)[source]¶

Fit peaks, update peaks properties and show results.

By default, the full mass range and all peaks in the spectrum are fitted. Optionally, only peaks specified with peak_indeces or peaks in the mass range specified with x_fit_cen and x_fit_range are fitted.

Optionally, the mass recalibration can be performed simultaneously with the IOI fit if the mass calibrant is in the fit range and specified with either the index_mass_calib or species_mass_calib arguments. Otherwise a mass recalibration must have been performed upfront.

Before running this method a successful peak-shape calibration must have been performed with determine_peak_shape().

Parameters:

peak_indeces (int, list of int, optional) – Indeces of neighbouring peaks to fit. The fit range will be chosen such that at least a mass range of x_fit_range/2 is included around each peak.
x_fit_cen (float [u/z], optional) – Center of mass range to fit (only specify if a subset of the spectrum is to be fitted)
x_fit_range (float [u/z], optional) – Width of mass range to fit. If None defaults to: spectrum.default_fit_range attribute, only specify if subset of spectrum is to be fitted. This argument is only relevant if x_fit_cen is also specified.
fit_model (str, optional) – Name of fit model to use (e.g. 'Gaussian', 'emg12', 'emg33', … - for full list see Available fit models). If None, defaults to fit_model spectrum attribute.
cost_func (str, optional, default: 'chi-square') –
Name of cost function to use for minimization.
- If 'chi-square', the fit is performed by minimizing Pearson’s chi-squared statistic:
  
  \[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]
- If 'MLE', a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log likelihood ratio:
  
  \[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
See Notes of peakfit() method for details.
method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options check arguments of lmfit.minimizer.minimize().
fit_kws (dict, optional, default: None) – Options to pass to minimizer used in emgfit.model.EMGModel.fit() method.
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to modify or add model parameters. The keys of the par_hint_args dictionary specify parameter names; the values must likewise be dictionaries that hold the respective keyword arguments to pass to set_param_hint().
init_pars (dict, optional) –
Dictionary with initial shape parameter values for fit (optional).
- If None (default) the parameters from the peak-shape calibration (peak_shape_pars spectrum attribute) are used.
- If 'default', the default parameters defined for mass 100 in the emgfit.fit_models module will be used after re-scaling to the spectrum’s mass_number.
- To define custom initial values a parameter dictionary containing all model parameters and their values in the format {'<param name>':<param_value>,...} should be passed to init_pars.
Mind that only the initial values to shape parameters (sigma, theta,`etas` and taus) can be user-defined. The initial values for mu, amp and the optional baseline parameter bkg_c are automatically derived as described in the Peak fitting approach article.
vary_shape (bool, optional, default: False) – If False peak-shape parameters (sigma, theta,`etas` and taus) are kept fixed at their initial values. If True the shared shape parameters are varied (ensuring identical shape parameters for all peaks).
vary_baseline (bool, optional, default: True) – If True, the constant background will be fitted with a varying uniform baseline parameter bkg_c. If False, the baseline parameter bkg_c will be fixed to 0.
show_plots (bool, optional) – If True (default) linear and logarithmic plots of the spectrum with the best fit curve are displayed. For details see spectrum.plot_fit().
show_peak_markers (bool, optional) – If True (default) peak markers are added to the plots.
sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best-fit curve in sigma. Note that the confidence band is only derived from the uncertainties of the parameters that are varied during the fit.
error_every (int, optional, default: 1) – Show errorbar only for every error_every-th data point.
plot_filename (str, optional, default: None) – If not None, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.
show_fit_report (bool, optional) – If True (default) the detailed lmfit fit report is printed.
show_shape_err_fits (bool, optional, default: True) – If True, plots of all fits performed for the peak-shape uncertainty evaluation are shown.

Notes

Updates peak properties dataframe with peak properties obtained in fit.

get_MC_peakshape_errors(peak_indeces=[], verbose=True, show_hists=False, show_peak_properties=True, rerun_MCMC_sampling=False, N_samples=1000, n_cores=-1, seed=872, **MCMC_kwargs)[source]¶

Get peak-shape uncertainties by re-fitting peaks with many different MC-shape-parameter sets

This method provides refined peak-shape uncertainties that account for non-normal distributions and correlations of shape parameters. To that end, the peaks of interest are re-fitted with N_samples different peak-shape parameter sets. For these parameter sets to be representative of all peak shapes supported by the data they are randomly drawn from a larger ensemble of parameter sets obtained from Markov-Chain Monte Carlo (MCMC) sampling on the peak-shape calibrant. The peak-shape uncertainty of the mass values and peak areas are estimated by the obtained RMS deviations from the best-fit mass values. Finally, the peak properties table is updated with the refined uncertainties.

This method only takes effective mass shifts relative to the calibrant peak into account. For each peak shape the calibrant peak is re-fitted and the new recalibration factor is used to calculate the shifted ion-of-interest masses. Therefore, when the peak_indeces argument is used, it must include the mass calibrant index.

Parameters:

peak_indeces (int or list of int, optional) – Indeces of peaks to evaluate MC peak-shape uncertainties for.
verbose (bool, optional) – Whether to print status updates and intermediate results.
show_hists (bool, optional) – If True histograms of the effective mass shifts and peak areas obtained with the MC shape parameter sets are shown. Black vertical lines indicate the best-fit values obtained with fit_peaks().
show_peak_properties (bool, optional) – If True the peak properties table including the updated peak-shape uncertainties is shown.
rerun_MCMC_sampling (bool, optional) – When False (default) pre-existing MCMC parameter samples (e.g. obtained with determine_peak_shape()) are used. If True or when there’s no pre-existing MCMC samples, the MCMC sampling will be performed by this method.
N_samples (int, optional) – Number of different shape parameter sets to use. Defaults to 1000.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated spectra. When set to -1 (default) all available cores are used.
seed (int, optional) – Random seed to use for reproducibility. Defaults to 872.
**MCMC_kwargs – Keyword arguments to send to _get_MCMC_par_samples() for control over the MCMC sampling.

Notes

This method relies on a representative sample of all the shape parameter sets which are supported by the data. These shape parameter sets are randomly drawn from a large sample of parameter sets obtained from Markov-Chain Monte Carlo (MCMC) sampling on the peak-shape calibrant. In MCMC sampling so-called walkers are sent on random walks to explore the parameter space. The latter is done with the _get_MCMC_par_samples() method. If MCMC sampling has already been performed with the map_par_covar option in determine_peak_shape(), these MCMC samples will be used for the MC peak-shape error evaluation. If there is no pre-existing MCMC parameter sets the _get_MCMC_par_samples() method will be automatically evoked before the MC peak-shape error evaluation.

Assuming that the samples obtained with the MCMC algorithm form a representative set of parameter samples and are sufficiently independent from each other, this method provides refined peak-shape uncertainties that account for correlations and non-normal posterior distributions of peak-shape parameters. In particular, this prevents overestimation of the uncertainties due to non-consideration of parameter correlations.

For this method to be accurate a sufficiently large number of MCMC sampling steps should be performed and fits should be performed with a large number of parameter sets (N_samples >= 1000). For the MCMC parameter samples to be independent a sufficient amount of thinning has to be applied to remove autocorrelation between MCMC samples. Thinning refers to the common practice of only storing the results of every k-th MCMC iteration. The length and thinning of the MCMC chain is controlled with the steps and thin MCMC keyword arguments. For more details and references on MCMC sampling with emgfit see the docs of the underlying _get_MCMC_par_samples() method.

For the peak-shape mass uncertainties only effective mass shifts relative to the calibrant centroid are relevant. Therefore, the mass calibrant and the ions of interest (IOI) are fitted with the same peak-shape-parameter sets and the final mass values are calculated from the obtained IOI peak positions and the corresponding mass recalibration factors.

The delta_m or delta_p parameters occuring in the case of hyper-EMG models with 3 pos. or 3 neg. tails are defined as delta_m = eta_p1 + eta_p2 and delta_p = eta_p1 + eta_p2, respectively.

get_errors_from_resampling(peak_indeces=[], N_spectra=1000, seed=5378, n_cores=-1, show_hists=False, show_peak_properties=True)[source]¶

Get statistical and area uncertainties via resampling from best-fit PDF and update peak properties therewith.

This method provides bootstrap estimates of the statistical errors and peak area errors by evaluating the scatter of peak centroids and areas in fits of many simulated spectra. The simulated spectra are created by sampling events from the best-fit PDF asociated with fit_result (parametric bootstrap). Refined errors are calculated for each peak individually by taking the sample standard deviations of the obtained peak centroids and areas.

If the peaks in peak_indeces have been fitted separately a parametric bootstrap will be performed for each of the different fits.

Parameters:

peak_indeces (list, optional) – List containing indeces of peaks to determine refined stat. and area errors for, e.g. to evaluate peak-shape error of peaks 1 and 2 use peak_indeces=[1,2]. Defaults to all peaks in the spectrum’s peaks list.
N_spectra (int, optional) – Number of simulated spectra to fit. Defaults to 1000, which typically yields statistical uncertainty estimates with a relative precision of a few percent.
seed (int, optional) – Random seed to use for reproducible sampling.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated spectra. When set to -1 (default) all available cores are used.
show_hists (bool, optional, default: False) – If True, histograms of the obtained peak centroids and areas are shown. Black vertical lines indicate the best-fit values obtained from the measured data.
show_peak_properties (bool, optional, default: True) – If True, the peak properties table is shown after updating the statistical and area errors.

See also

determine_A_stat_emg(), parametric_bootstrap()

Notes

Only the statistical mass and area uncertainties of ion-of-interest peaks are updated. The uncertainties of the mass calibrant and the recalibration uncertainty remain unaffected by this method.

load_peak_shape_cal(filename)[source]¶

Load peak shape from the TXT-file named ‘filename.txt’.

Successfully loaded shape calibration parameters and their uncertainties are used as the new shape_cal_pars and shape_cal_errors spectrum attributes respectively.

Parameters:: filename (str) – Name of input file (‘.txt’ extension is automatically appended).

peakfit(fit_model='emg22', cost_func='chi-square', x_fit_cen=None, x_fit_range=None, init_pars=None, vary_shape=False, vary_baseline=True, share_shape_pars=True, scale_shape_pars=False, scale_shape_to_peak_cen=False, method='least_squares', fit_kws=None, par_hint_args={}, show_plots=True, show_peak_markers=True, sigmas_of_conf_band=0, error_every=1, plot_filename=None, map_par_covar=False, **MCMC_kwargs)[source]¶

Internal routine for fitting peaks.

Fits full spectrum or subrange (if x_fit_cen and x_fit_range are specified) and optionally shows results.

This method is for internal usage. Use :meth:`spectrum.fit_peaks` method to fit peaks and automatically update peak properties dataframe with obtained fit results!

Parameters:

fit_model (str, optional, default: 'emg22') – Name of fit model to use (e.g. 'Gaussian', 'emg12', 'emg33', … - for full list see Available fit models).
cost_func (str, optional, default: 'chi-square') –
Name of cost function to use for minimization.
- If 'chi-square', the fit is performed by minimizing Pearson’s chi-squared statistic:
  
  \[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]
- If 'MLE', a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log-likelihood ratio:
  
  \[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
See Notes below for details.
x_fit_cen (float [u/z], optional) – Center of mass range to fit (only specify if subset of spectrum is to be fitted).
x_fit_range (float [u/z], optional) – Width of mass range to fit (only specify if subset of spectrum is to be fitted, only relevant if x_fit_cen is likewise specified). If None, defaults to default_fit_range spectrum attribute.
init_pars (dict, optional) –
Dictionary with initial shape parameter values for fit (optional).
- If None (default) the parameters from the peak-shape calibration are used.
- If 'default', the default parameters defined for mass 100 in the emgfit.fit_models module will be used after re-scaling to the spectrum’s mass_number.
- To define custom initial values a parameter dictionary containing all model parameters and their values in the format {'<param name>':<param_value>,...} should be passed to init_pars.
Mind that only the initial values to shape parameters (sigma, theta,`etas` and taus) can be user-defined. The initial values for mu, amp and the optional baseline parameter bkg_c are automatically derived as described in the Peak fitting approach article.
vary_shape (bool, optional, default: False) – If False peak-shape parameters (sigma, theta,`etas` and taus) are kept fixed at their initial values. If True the shared shape parameters are varied (ensuring identical shape parameters for all peaks).
vary_baseline (bool, optional, default: True) – If True, the constant background will be fitted with a varying uniform baseline parameter bkg_c. If False, the baseline parameter bkg_c will be fixed to 0.
share_shape_pars (bool, optional, default: True) – Whether to enforce a shared peak shape for all peaks.
scale_shape_pars (bool, optional, default: False) – Whether to scale the scale-dependent shape parameters of the shape-reference peak with the peaks.scl_coeff. See Notes of comp_model() for details.
scale_shape_to_peak_cen (bool, optional, default: False) – Whether to scale the scale-dependent shape parameters to the centroid mu of the underlying Gaussian of a given peak. Requires scale_shape_pars to be True. See Notes of comp_model() for details.
method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options check arguments of lmfit.minimizer.minimize().
fit_kws (dict, optional, default: None) – Options to pass to minimizer used in emgfit.model.EMGModel.fit() method.
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.set_param_hint() to modify or add model parameters. The keys of the par_hint_args dictionary specify parameter names; the values must likewise be dictionaries that hold the respective keyword arguments to pass to set_param_hint(). For example: par_hint_args={“p0_amp” : {“value”:10}, “p1_sigma” : {“max”:0.4}}
show_plots (bool, optional) – If True (default) linear and logarithmic plots of the spectrum with the best fit curve are displayed. For details see spectrum.plot_fit().
show_peak_markers (bool, optional) – If True (default) peak markers are added to the plots.
sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best fit curve in sigma. Note that the confidence band is only derived from the uncertainties of the parameters that are varied during the fit.
error_every (int, optional, default: 1) – Show error bars only for every error_every-th data point.
plot_filename (str, optional, default: None) – If not None, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.
map_par_covar (bool, optional) – If True the parameter covariances will be mapped using Markov-Chain Monte Carlo (MCMC) sampling and shown in a corner plot. This feature is only recommended for single-peak fits.
**MCMC_kwargs (optional) – Options to send to _get_MCMC_par_samples(). Only relevant when map_par_covar is True.

Returns:

Fit model result.

Return type:

emgfit.model.EMGModelResult

See also

fit_peaks(), fit_calibrant()

Notes

In fits with the chi-square cost function the variance weights \(w_i\) for the residuals are estimated using the latest model predictions: \(w_i = 1/(\sigma_i^2 + \epsilon) = 1/(f(x_i)+ \epsilon)\), where \(\epsilon = 1E-10\) is a small number added to increase numerical robustness when \(f(x_i)\) approaches zero. On each iteration the weights are updated with the new values of the model function.

When performing MLE fits including bins with low statistics, the value for chi-squared as well as the parameter standard errors and correlations in the lmfit fit report should be taken with caution. This is because, strictly speaking, emgfit’s MLE cost function only approximates a chi-squared distribution in the limit of a large number of counts in every bin (“Wick’s theorem”). For a detailed derivation of this statement see pp. 94-95 of these lecture slides by Mark Thompson. However, for spectra with many (>1000) bins, it has been demonstrated that as few as 10-30 counts in an entire spectrum can be sufficient for the doubled, negative Poisson log-likelihood ratio (emgfit’s MLE cost function) to yield reliable confidence intervals and act as a decent goodness-of-fit measure [11]. In practice and if in doubt, one can simply test the validity of the reported fit statistic as well as parameter standard errors & correlations by re-performing the same fit with cost_func=’chi-square’ and comparing the results. In all tested cases decent agreement was found even if the fit range contained low-statistics bins. Even if a deviation occurs, this is irrelevant in most pratical cases since the mass errors reported in emgfit’s peak properties table are independent of the lmfit parameter standard errors given as additional information below. Only the peak area errors are by default calculated using the standard errors of the amp parameters reported by lmfit.

Besides the asymptotic concergence to a chi-squared distribution emgfit’s MLE cost function has a second handy property - all summands in the log-likelihood ratio are positive semi-definite: \(L_i = f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right) \geq 0\). Exploiting this property, the minimization of the log-likelihood ratio can be re-formulated into a least-squares problem (see also [12]):

\[L = 2\sum_i L_i = 2\sum_i \left( \sqrt{ L_i } \right)^2.\]

Instead of minimizing the scalar log-likelihood ratio, emgfit by default minimizes the sum-of-squares of the square-roots of the summands \(L_i\) in the log-likelihood ratio. This is mathematically equivalent to minimizing \(L\) and facilitates the usage of Scipy’s highly efficient least-squares optimizers (‘least_squares’ & ‘leastsq’). The latter yield significant speed-ups compared to scalar optimizers such as Scipy’s ‘Nelder-Mead’ or ‘Powell’ methods. By default, emgfit’s ‘MLE’ fits are performed with Scipy’s ‘least_squares’ optimizer, a variant of a Levenberg-Marquardt algorithm for bound-constrained problems. If a scalar optimizaton method is chosen emgfit uses the conventional approach of minimizing the scalar \(L\). For more details on these optimizers see the docs of lmfit.minimizer.minimize() and scipy.optimize.

References

plot(peaks=None, fig=None, dpi=600, **plot_kwargs)[source]¶

Plot mass spectrum (without fit curve).

Vertical markers are added for all peaks specified with the peaks argument.

Parameters:

peaks (list of peaks, optional) – List of peaks to show peak markers for. Defaults to peaks. Use peaks=[] to turn markers off.
fig (matplotlib.figure.Figure, optional) – Figure object to plot onto.
dpi (int, optional) – Dots per inch for plot. Defaults to dpi value defined in config.py module.
**plot_kwargs – Plot options to pass to _plot_df().

See also

plot_fit(), plot_fit_zoom()

plot_fit(fit_result=None, plot_title=None, show_peak_markers=True, show_comps=True, sigmas_of_conf_band=0, error_every=1, x_min=None, x_max=None, plot_filename=None)[source]¶

Plot data and fit result in logarithmic and linear y-scale.

Only a single fit result can be plotted with this method. If neither fit_result nor x_min and x_max are specified, the full mass range is plotted.

Plots can be saved to a file using the plot_filename argument.

Parameters:

fit_result (emgfit.model.EMGModelResult, optional) – Fit result to plot. If None, defaults to fit result of first peak in specified mass range (taken from fit_results list).
plot_title (str or None, optional) – Title of plots. If None, defaults to a string with the fit model name and cost function of the fit_result to ensure clear indication of how the fit was obtained.
show_peak_markers (bool, optional, default: True) – If True, peak markers are added to the plots.
show_comps (bool, optional) – If True, the single-peak components of the best-fit curve will be indicated with colored dashed lines.
sigmas_of_conf_band (int, optional, default: 0) – Coverage probability of confidence band in sigma (only shown in log-plot). If 0, no confidence band is shown (default).
error_every (int, optional, default: 1) – Show error bars only for every error_every-th data point.
x_min (float [u/z], optional) – Start and end of x-range to plot. If None, defaults to the minimum and maximum of the fitted x-range in fit_result.
x_max (float [u/z], optional) – Start and end of x-range to plot. If None, defaults to the minimum and maximum of the fitted x-range in fit_result.
plot_filename (str, optional, default: None) – If not None, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ & ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.

See also

plot_fit_zoom(), save_fit_trace()

plot_fit_zoom(peak_indeces=None, x_center=None, x_range=0.01, plot_title=None, show_peak_markers=True, error_every=1, sigmas_of_conf_band=0, plot_filename=None)[source]¶

Show logarithmic and linear plots of data and fit curve zoomed to peaks or mass range of interest.

There is two alternatives to define the plots’ mass ranges:

Specifying peaks-of-interest with the peak_indeces argument. The mass range is then automatically chosen to include all peaks of interest. The minimal mass range to include around each peak of interest can be adjusted using x_range.
Specifying a mass range of interest with the x_center and x_range arguments.

Parameters:

peak_indeces (int or list of ints, optional) – Index of single peak or indeces of multiple neighboring peaks to show (peaks must belong to the same fit_result).
x_center (float [u/z], optional) – Center of manually specified x-range to plot.
x_range (float [u/z], optional, default: 0.01) – Width of x-range to plot around ‘x_center’ or minimal mass range to include around each specified peak of interest.
plot_title (str or None, optional) – Title of plots. If None, defaults to a string with the fit model name and cost function of the fit_result to ensure clear indication of how the fit was obtained.
show_peak_markers (bool, optional, default: True) – If True, peak markers are added to the plots.
error_every (int, optional, default: 1) – Show error bars only for every error_every-th data point.
sigmas_of_conf_band (int, optional, default: 0) – Coverage probability of confidence band in sigma (only shown in log-plot). If 0, no confidence band is shown (default).
plot_filename (str or None, optional, default: None) – If not None, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ & ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.

See also

plot_fit(), save_fit_trace()

remove_peak(peak_index=None, x_pos=None, species='?')[source]¶

Remove specified peak from the spectrum’s peaks list.

Select the peak to be removed by specifying either the respective peak_index, species label or peak marker position x_pos.

Parameters:

peak_index (int or list of int, optional) – Indeces of peak(s) to remove from the spectrum’s peaks list (0-based!).
x_pos (float or list of float [u/z]) – x_pos of peak(s) to remove from the spectrum’s peaks list. Peak marker positions must be specified up to the 6th decimal.
species (str or list of str) – species label(s) of peak(s) to remove from the spectrum’s peaks list.

Note

This method is deprecated in v0.1.1 and will likely be removed in future versions, use remove_peaks() instead!

remove_peaks(peak_indeces=None, x_pos=None, species='?', verbose=True)[source]¶

Remove specified peak(s) from the spectrum’s peaks list.

Select the peak(s) to be removed by specifying either the respective peak_indeces, species label(s) or peak marker position(s) x_pos. To remove multiple peaks at once, pass a list to one of the above arguments.

Parameters:

peak_indeces (int or list of int, optional) – Indeces of peak(s) to remove from the spectrum’s peaks list (0-based!).
x_pos (float or list of float [u/z]) – x_pos of peak(s) to remove from the spectrum’s peaks list. Peak marker positions must be specified up to the 6th decimal.
species (str or list of str) – species label(s) of peak(s) to remove from the spectrum’s peaks list.

Note

Removing a peak will shift the peak indeces of all peaks at higher masses by -1.

Notes

The current peaks list can be viewed by calling the show_peak_properties() spectrum method.

New in version 0.2.0: (as successor of remove_peak()).

save_fit_trace(filename, peak_index=0, fit_result=None, x_res=None, fmt=('%.9e', '%.1i', '%.3e', '%.6e'), comment='')[source]¶

Write count data and best-fit curve to TXT file

By default, the fit curve data is taken from the first fit result stored in the spectrum’s fit_results list; data from other fit results can accessed through the peak_index or fit_result arguments.

Parameters:

filename (str, optional) – Prefix of the output filename; the suffix “_fit_trace.txt” is automatically appended.
peak_index (int, optional, default: 0) – Peak index specifying from which result in spectrum.fit_results the fit data will be written.
fit_result (emgfit.model.EMGModelResult, optional) – Fit result to obtain fit curve from. Only needed to write grab data from a fit result not stored in the spectrum’s fit_results attribute.
fmt (sequence of str) – Formats of written column values. For details see numpy.savetxt().
x_res (float, optional [u]) – Custom spacing of abscissa values at which the best-fit model will be evaluated. If x_res is specified, the measured count data and its error will not be written to the output file.
comment (str, optional) – Comments to add to output file.

Notes

New in version 0.4.1.

save_peak_shape_cal(filename)[source]¶

Save peak shape parameters to a TXT-file.

Parameters:: filename (str) – Name of output file (‘.txt’ extension is automatically appended).

save_results(filename, save_plots=True, plot_kws={})[source]¶

Write the fit results to a XLSX file and the peak-shape calibration to a TXT file.

Write results to an XLSX Excel file named <filename>_results.xlsx and save peak-shape calibration parameters to TXT file named <filename>_peakshape_calib.txt.

The EXCEL file contains the following three worksheets:

general spectrum properties
peak properties and images of all obtained fit curves
results of the default peakshape-error evaluation in which shape parameters are varied by +-1 sigma

By default, PNG images of all peak fits are saved to PNG-images in both linear and logarithmic scale.

Parameters:

filename (string) – Prefix of the files to be saved to (any provided file extensions are automatically removed and the necessary .xlsx & .txt extensions are appended).
save_plots (bool, optional, default: True) – Whether to save separate PNG files with plots of all obtained fit curves. If False, plots will still be included in the XLSX file.
plot_kws (dict, optional) – Keyword arguments to pass to plot_fit() method in order to customize plot appearance.

set_blinded_peaks(indeces, overwrite=False)[source]¶

Specify for which peaks mass values will be hidden for blind analysis

This method adds peaks to the spectrum’s list of blinded_peaks. For these peaks, the obtained mass values in the peak properties table and the peak position parameters mu in fit reports will be hidden. Literature values and mass uncertainties remain visible. All results are unblinded upon export with the save_results() method.

Parameters:

indeces (int or list of int) – Indeces of peaks of interest whose obtained mass values are to be blinded.
overwrite (bool, optional, default: False) – If False (default), the specified indeces are added to the blinded_peaks list. If True, the current blinded_peaks list is replaced by the specified indeces.

Examples

Activate blinding for peaks 0 & 3 of spectrum object spec:

>>> spec.set_blinded_peaks([0,3])

Add peak 3 to list of blinded peaks:

>>> spec.set_blinded_peaks([3])

Turn off blinding by resetting the blinded peaks attribute to an empty list:

>>> spec.set_blinded_peaks([], overwrite=True)

set_lit_values(peak_idx, m_AME, m_AME_error, extrapolated=False, A=None, z=None, verbose=True)[source]¶

Manually define the (ionic) literature mass and its error for a peak

Existing values are overwritten.

Parameters:

peak_idx (int) – Index of peak to assign literature values for.
m_AME (float [u]) – New literature mass.
m_AME_error (float [u]) – New liteature mass uncertainty.
extrapolated (bool, optional, default: False) – Flag indicating whether this literature value has been extrapolated.
A (int, optional, default: None) – Atomic mass number of species - overwrites existing. If None and the peak’s attribute A is undefined, the peak’s mass number defaults to the closest integer of the provided m_AME.
z (int, optional) – Charge state of species - overwrites existing (if not None).
verbose (bool, optional, default: True) – Whether to print a status update after completion.

Notes

Manually defined literature values are indicated by adding 'lit_src: user' to the peak’s comment.

New in version 0.3.6.

show_peak_properties()[source]¶

Print properties of all peaks in peaks list.

For a description of the different columns, see class:~emgfit.spectrum.peak.

Note

Only public attributes are shown.

emgfit.hypothesis_tests module¶

Module with routines for hypothesis tests allowing to assess the significance of potential peaks in the data.

emgfit.hypothesis_tests._likelihood_ratio_test(spec, ref_result, alt_x_pos, x_fit_cen=None, x_fit_range=None, vary_alt_mu=True, alt_mu_min=None, alt_mu_max=None, vary_baseline=True, par_hint_args=None, verbose=False, show_plots=False, show_results=False)[source]¶

Perform a local likelihood ratio test on the specified spectrum

Parameters:

spec (emgfit.spectrum.spectrum) – Spectrum object to perform likelihood ratio test on.
ref_result (emgfit.model.EMGModelResult) – Fit result storing the null model.
alt_x_pos (float [u]) – Position of the hypothesized alternative peak.
x_fit_cen (float [u], optional) – Center of the x-range to fit. Defaults to the center of the fit result asociated with null_result_index.
x_fit_range (float [u], optional) – Width of the x-range to fit. Defaults to the range of the fit result asociated with null_result_index.
vary_alt_mu (bool, optional) – Whether to vary the alternative-peak centroid in the fit.
alt_mu_min (float [u], optional) – Lower boundary to use when varying the alternative-peak centroid. Defaults to the range defined by the MU_VAR_NSIGMA constant in the emgfit.fit_models module.
alt_mu_max (float [u], optional) – Upper boundary to use when varying the alternative-peak centroid. Defaults to the range defined by the MU_VAR_NSIGMA constant in the emgfit.fit_models module.
vary_baseline (bool, optional) – If True, the constant background will be fitted with a varying uniform baseline parameter bkg_c. If False, the baseline parameter bkg_c will be fixed to 0.
par_hint_args (dict of dicts, optional) – Arguments to pass to lmfit.model.Model.set_param_hint() to modify or add model parameters. See docs of peakfit() method for details.
verbose (bool, optional) – Whether to print status updates and results.
show_plots (bool, optional) – Whether to show plots of the fit results.
show_results (bool, optional) – Whether to display reports with the fit results.

Returns:

Tuple storing the Log-likelihood ratio of null and alternative model, the null fit result and the alternative model result.

Return type:

tuple of format (float, EMGModelResult, EMGModelResult)

emgfit.hypothesis_tests.run_GV_likelihood_ratio_test(spec, null_result_index, alt_x_min, alt_x_max, alt_x_steps=100, min_significance=3, N_spectra=100, c0=0.5, seed=None, show_upcrossings=True, show_fits=True)[source]¶

Perform a likelihood ratio test following the method of Gross & Vitells

Decide on an appropriate significance level before executing this method and set the `min_significance` argument accordingly!

Parameters:

spec (spectrum) – Spectrum object to perform test on.
null_result_index (int) – Index (of one) of the peak(s) present in the null-model fit.
alt_x_min (float) – Minimal x-position to use in the alternative-peak position scan.
alt_x_max (float) – Maximal x-position to use in the alternative-peak position scan.
alt_x_steps (int, optional, default: 100) – Number of steps to take in the alternative-peak position scan.
min_significance (float [sigma], default: 3) – Minimal significance level (in sigma) required to reject the null model in favour of the alternative model.
N_spectra (int, optional, default: 100) – Number of simulated spectra to fit at each x-position.
c0 (float, optional, default: 0.5) – Threshold to use in determining the expected number of upcrossings.
seed (int, optional) – Random seed to use for reproducible event sampling.
show_upcrossings (bool, optional, default: True) – Whether to show plots of the range of upcrossings.
show_fits (bool, optional, default: True) – Whether to show plots of the null- and alternative-model fits to the observed data.

Returns:

Dictionary with results of the likelihood ratio test.

Return type:

dict

Notes

When the exact location of a hypothesized alternative peak is unknown, one may test for its presence by performing multiple hypothesis tests with different fixed alternative-peak positions. However, performing multiple tests on the same dataset artificially increases the rate of false discovery due to the increased chance for random background fluctuations to mimick a signal. In the high-energy particle physics literature, this complication is referred to as the look-elsewhere effect. To obtain a global p-value that correctly quantifies the likelihood to observe the alternative peak anywhere in the tested region, a procedure is needed that accounts for correlations between the local p-values obtained for the various tested peak positions. To this end, this function adapts the method outlined by Gross and Vitells in [13]. Namely, an upper limit on the global p-value \(p\) is deduced from the relation:

\[p = P(LLR > c) \leq P(\chi^2_1 > c)/2 + \langle N(c_0)\rangle e^{-\left(c-c_0\right)/2},\]

where \(P(LLR > c)\) is the probability for the log-likelihood ratio statistic (LLR) to exceed the maximum of the observed local LLR statistic \(c\), \(P(\chi^2_1 > c)\) is the probability that the \(\chi^2\) statistic with one degree of freedom exceeds the level \(c\) and \(\langle N(c_0)\rangle\) is the expected number of times the local LLR test statistics surpass the threshold level \(c_0 \ll c\) under the null hypothesis. This number is estimated by simulating \(N_{spectra}\) spectra from the null model and taken as the mean number of times the local LRT statistics cross up through the specified threshold level \(c_0\). In principle, \(c_0\) should be chosen as small as possible but care should be taken that the mean spacing between detected upcrossings does not fall below the typical width of the observed peaks.

References

emgfit.hypothesis_tests.run_MC_likelihood_ratio_test(spec, null_result_index, alt_x_pos, alt_mu_min=None, alt_mu_max=None, min_significance=3, N_spectra=10000, vary_ref_mus_and_amps=False, vary_ref_peak_shape=False, seed=None, n_cores=-1, show_plots=True, show_results=True, show_LLR_hist=True)[source]¶

Perform Monte Carlo likelihood ratio test by fitting simulated spectra

The simulated spectra are sampled from the null model stored in the spectrum’s fit results list and asociated with the peak index null_result_index.

Parameters:

spec (spectrum) – Spectrum object to perform likelihood ratio test on.
null_result_index (int) – Index (of one) of the peak(s) present in the null-model fit.
alt_x_pos (float, optional) – Initial position to use for alternative peak
alt_mu_min (float [u], optional) – Lower boundary to use when varying the alternative-peak centroid. Defaults to the range defined by the MU_VAR_NSIGMA constant in the emgfit.fit_models module.
alt_mu_max (float [u], optional) – Upper boundary to use when varying the alternative-peak centroid. Defaults to the range defined by the MU_VAR_NSIGMA constant in the emgfit.fit_models module.
vary_ref_mus_and_amps (bool, optional) – Whether to randomly vary the peak positions and the peak and background amplitudes of the reference spectrum within their parameter uncertainties.
vary_ref_peak_shape (bool, optional) – Whether to vary the reference peak shape used for the event sampling in the creation of simulated spectra. If True, N_spectra parameter samples are drawn randomly with replacement from the MCMC_par_samples obtained in the MCMC shape parameter sampling.
min_significance (float, optional, default: 3) – Critical significance level for rejecting the null hypothesis (measured in sigma).
N_spectra (int, optional, default: 10000) – Number of simulated spectra to sample from the null model.
seed (int, optional) – Random seed to use for reproducible sampling.
n_cores (int, optional, default: -1) – Number of CPU cores to use for parallelized sampling and fitting of simulated spectra. If -1, all available cores are used.
show_plots (bool, optional) – Whether to show plots of the fit results.
show_results (bool, optional) – Whether to display reports with the fit results.
show_LLR_hist (bool, optional) – Whether to display histogram of log-likelihood ratio values collected for p-value determination.

Returns:

Dictionary with results of the likelihood ratio test.

Return type:

dict

Notes

Simulated spectra are created by randomly sampling events from the null model best fitting the observed data. These simulated spectra are then fitted with both the null and the alternative model and the respective values for the likelihood ratio test statistic \(\Lambda\) are calculated using the relation

\[\Lambda = \log\left(\frac{\mathcal{L}(H_1)}{\mathcal{L}(H_0)}\right) = L(H_1) - L(H_0),\]

where \(L(H_0)\) and \(L(H_1)\) denote the MLE cost function values (i.e. the negative doubled log-likelihood values) obtained from the null-model and alternative-model fits, respectively, and \(\mathcal{L}(H_0)\) and \(\mathcal{L}(H_1)\) mark the corresponding likelihood functions. Finally, the p-value is calculated as

\[p = \frac{N_>}{N_< + N_>},\]

where \(N_<\) and \(N_>\) denote the number of likelihood ratio values \(\Lambda\) that fall below and above the observed value for the likelihood ratio test statistic \(\Lambda_\mathrm{obs}\), respectively.