emgfit API docs¶
emgfit.ame_funcs module¶
Module for importing and handling of literature mass values from the Atomic Mass Evaluation 2016 (AME2016).
-
emgfit.ame_funcs.
get_AME_values
(species)[source]¶ Calculates the AME mass, AME mass error, the extrapolation flag and the mass number A of the given species.
-
emgfit.ame_funcs.
get_El_from_Z
(Z)[source]¶ Convenience function to grab element symbol from given proton number.
-
emgfit.ame_funcs.
splitparticle
(s)[source]¶ Extracts number, particle/element type and mass number of particle string (e.g. 1Cs133, Cs133, 1e).
-
emgfit.ame_funcs.
splitspecies
(s)[source]¶ Splits ion species string into list of constituent atom strings.
- Parameters
s (str) – Input species string.
- Returns
List of constituent atom strings contained in s.
- Return type
list of str
Examples
splitspecies('4H1:1C12')
returns['4H1','1C12']
splitspecies('H1:O16')
returns['H1','O16']
emgfit.config module¶
Configuration file with general settings for emgfit package.
emgfit.emg_funcs module¶
Module with numerically robust implementation of the hyper-exponentially- modified Gaussian probability density function.
-
emgfit.emg_funcs.
h_emg
(x, mu, sigma, theta, li_eta_m, li_tau_m, li_eta_p, li_tau_p)[source]¶ Hyper-exponentially-modified Gaussian distribution (hyper-EMG).
The lengths of li_eta_m & li_tau_m must match and define the order of negative tails. Likewise, the lengths of li_eta_p & li_tau_p must match and define the order of positive tails.
- Parameters
x (float >= 0) – Abscissa data (mass data).
mu (float >= 0) – Mean value of underlying Gaussian distribution.
sigma (float >= 0) – Standard deviation of underlying Gaussian distribution.
theta (float, 0 <= theta <= 1) – Left-right-weight factor (negative-skewed EMG weight: theta; positive-skewed EMG weight: 1 - theta).
li_eta_m (tuple) – Tuple containing the neg. tail weights with the signature:
(eta_m1, eta_m2, ...)
.li_tau_m (tuple) – Tuple containing the neg. tail decay constants with the signature:
(tau_m1, tau_m2, ...)
.li_eta_p (tuple) – Tuple containing the pos. tail weights with the signature:
(eta_p1, eta_p2, ...)
.li_tau_p (tuple) – Tuple containing the pos. tail decay constants with the signature:
(tau_p1, tau_p2, ...)
.
- Returns
Ordinate of hyper-EMG distribution
- Return type
Notes
The Hyper-EMG probability distribution function was first introduced in this publication by Purushothaman et al. 3. The basic definitions and notations used here are adapted from this work.
The total hyper-EMG distribution h_m_emg is comprised of the negative- and positive-skewed EMG distributions h_m_emg and h_p_emg respectively and is calculated as:
h_emg(x, mu, sigma, theta, li_eta_m, li_tau_m, li_eta_p, li_tau_p) =
theta*h_m_emg(x, mu, sigma, li_eta_m, li_tau_m) + (1-theta)*h_p_emg(x, mu, sigma, li_eta_p, li_tau_p)
.For algorithmic details, see Notes of
h_m_emg()
andh_p_emg()
.References
- 3
Purushothaman, S., et al. “Hyper-EMG: A new probability distribution function composed of Exponentially Modified Gaussian distributions to analyze asymmetric peak shapes in high-resolution time-of-flight mass spectrometry.” International Journal of Mass Spectrometry 421 (2017): 245-254.
-
emgfit.emg_funcs.
h_m_emg
(x, mu, sigma, li_eta_m, li_tau_m)[source]¶ Negative skewed exponentially-modified Gaussian (EMG) distribution.
The lengths of li_eta_m & li_tau_m must match and define the order of negative tails.
- Parameters
x (float >= 0) – Abscissa data (mass data).
mu (float >= 0) – Mean value of underlying Gaussian distribution.
sigma (float >= 0) – Standard deviation of underlying Gaussian distribution.
li_eta_m (tuple) – Tuple containing the neg. tail weights with the signature:
(eta_m1, eta_m2, ...)
.li_tau_m (tuple) – Tuple containing the neg. tail decay constants with the signature:
(tau_m1, tau_m2, ...)
.
- Returns
Ordinate values of the negative skewed EMG distribution.
- Return type
Notes
The Hyper-EMG probability distribution function was first introduced in this publication by Purushothaman et al. 4. The basic definitions and notations used here are adapted from this work.
Each negative tail of a Hyper-EMG function can be expressed in two equivalent ways:
\[h_\mathrm{emg,-i} = \frac{\eta_{-i}}{2\tau_{-i}} \exp{(-\left(\frac{x-\mu}{\sqrt{2}\sigma}\right)^2)} \mathrm{erfcx}(v) = \frac{\eta_{-i}}{2\tau_{-i}} \exp{(u)} \mathrm{erfc}(v),\]where \(u = \frac{\sigma}{\sqrt{2}\tau_{-i}} + \frac{x-\mu}{\sqrt{2}\tau_{-i}}\) and \(v = \frac{\sigma}{\sqrt{2}\tau_{-i}} + \frac{x-\mu}{\sqrt{2}\sigma}\). In double float precision, the exp(u) routine overflows if u > 709.78. The complementary error function erfc(v) underflows to 0.0 if v > 26.54. The scaled complementary error function erfcx(v) overflows if v < -26.62. To circumvent those scenarios and always ensure an exact result, the underlying helper function for the calculation of a negative EMG tail
h_m_i()
uses the formulation in terms of erfcx whenever v >= 0 and switches to the erfc-formulation when v < 0.References
- 4
Purushothaman, S., et al. “Hyper-EMG: A new probability distribution function composed of Exponentially Modified Gaussian distributions to analyze asymmetric peak shapes in high-resolution time-of-flight mass spectrometry.” International Journal of Mass Spectrometry 421 (2017): 245-254.
-
emgfit.emg_funcs.
h_m_i
(x, mu, sigma, eta_m, tau_m)[source]¶ Internal helper function to calculate single negative EMG tail for h_m_emg.
-
emgfit.emg_funcs.
h_m_i_prec
(x, mu, sigma, eta_m, tau_m)[source]¶ Arbitrary precision version of internal helper function for h_m_emg.
-
emgfit.emg_funcs.
h_p_emg
(x, mu, sigma, li_eta_p, li_tau_p)[source]¶ Positive skewed exponentially-modified Gaussian (EMG) distribution.
The lengths of li_eta_p & li_tau_p must match and define the order of positive tails.
- Parameters
x (float >= 0) – Abscissa data (mass data).
mu (float >= 0) – Mean value of underlying Gaussian distribution.
sigma (float >= 0) – Standard deviation of underlying Gaussian distribution.
li_eta_p (tuple) – Tuple containing the pos. tail weights with the signature:
(eta_p1, eta_p2, ...)
.li_tau_p (tuple) – Tuple containing the pos. tail decay constants with the signature:
(tau_p1, tau_p2, ...)
.
- Returns
Ordinate values of the positive skewed EMG distribution.
- Return type
Notes
The Hyper-EMG probability distribution function was first introduced in this publication by Purushothaman et al. 5. The basic definitions and notations used here are adapted from this work.
Each positive tail of a Hyper-EMG function can be expressed in two equivalent ways:
\[h_\mathrm{emg,+i} = \frac{\eta_{+i}}{2\tau_{+i}} \exp{(-\left(\frac{x-\mu}{\sqrt{2}\sigma}\right)^2)} \mathrm{erfcx}(v) = \frac{\eta_{+i}}{2\tau_{+i}} \exp{(u)} \mathrm{erfc}(v),\]where \(u = \frac{\sigma}{\sqrt{2}\tau_{+i}} - \frac{x-\mu}{\sqrt{2}\tau_{+i}}\) and \(v = \frac{\sigma}{\sqrt{2}\tau_{+i}} - \frac{x-\mu}{\sqrt{2}\sigma}\). In double precision, the exp(u) routine overflows if u > 709.78. The complementary error function erfc(v) underflows to 0.0 if v > 26.54. The scaled complementary error function erfcx(v) overflows if v < -26.62. To circumvent those scenarios and always ensure an exact result, the underlying helper function for the calculation of a negative EMG tail
h_m_i()
uses the formulation in terms of erfcx whenever v >= 0 and switches to the erfc-formulation when v < 0.References
- 5
Purushothaman, S., et al. “Hyper-EMG: A new probability distribution function composed of Exponentially Modified Gaussian distributions to analyze asymmetric peak shapes in high-resolution time-of-flight mass spectrometry.” International Journal of Mass Spectrometry 421 (2017): 245-254.
-
emgfit.emg_funcs.
h_p_i
(x, mu, sigma, eta_p, tau_p)[source]¶ Internal helper function to calculate single positive EMG tail for h_p_emg.
-
emgfit.emg_funcs.
h_p_i_prec
(x, mu, sigma, eta_p, tau_p)[source]¶ Arbitrary precision version of internal helper function for h_p_emg.
-
emgfit.emg_funcs.
mu_emg
(mu, theta, li_eta_m, li_tau_m, li_eta_p, li_tau_p)[source]¶ Calculate mean of hyper-EMG distribution.
The lengths of li_eta_m & li_tau_m must match and define the order of negative tails. Likewise, the lengths of li_eta_p & li_tau_p must match and define the order of positive tails.
- Parameters
mu (float >= 0) – Mean value of underlying Gaussian distribution.
theta (float, 0 <= theta <= 1) – Left-right-weight factor (negative-skewed EMG weight: theta; positive-skewed EMG weight: 1 - theta).
li_eta_m (tuple) – Tuple containing the neg. tail weights with the signature:
(eta_m1, eta_m2, ...)
.li_tau_m (tuple) – Tuple containing the neg. tail decay constants with the signature:
(tau_m1, tau_m2, ...)
.li_eta_p (tuple) – Tuple containing the pos. tail weights with the signature:
(eta_p1, eta_p2, ...)
.li_tau_p (tuple) – Tuple containing the pos. tail decay constants with the signature:
(tau_p1, tau_p2, ...)
.
- Returns
Mean of hyper-EMG distribution.
- Return type
Notes
The Hyper-EMG probability distribution function was first introduced in this publication by Purushothaman et al. 6. The basic definitions and notations used here are adapted from this work.
References
- 6
Purushothaman, S., et al. “Hyper-EMG: A new probability distribution function composed of Exponentially Modified Gaussian distributions to analyze asymmetric peak shapes in high-resolution time-of-flight mass spectrometry.” International Journal of Mass Spectrometry 421 (2017): 245-254.
-
emgfit.emg_funcs.
sigma_emg
(sigma, theta, li_eta_m, li_tau_m, li_eta_p, li_tau_p)[source]¶ Calculate standard deviation of hyper-EMG distribution.
The lengths of li_eta_m & li_tau_m must match and define the order of negative tails. Likewise, the lengths of li_eta_p & li_tau_p must match and define the order of positive tails.
- Parameters
sigma (float >= 0) – Standard deviation of underlying Gaussian distribution.
theta (float, 0 <= theta <= 1) – Left-right-weight factor (negative-skewed EMG weight: theta; positive-skewed EMG weight: 1 - theta).
li_eta_m (tuple) – Tuple containing the neg. tail weights with the signature:
(eta_m1, eta_m2, ...)
.li_tau_m (tuple) – Tuple containing the neg. tail decay constants with the signature:
(tau_m1, tau_m2, ...)
.li_eta_p (tuple) – Tuple containing the pos. tail weights with the signature:
(eta_p1, eta_p2, ...)
.li_tau_p (tuple) – Tuple containing the pos. tail decay constants with the signature:
(tau_p1, tau_p2, ...)
.
- Returns
Standard deviation of hyper-EMG distribution.
- Return type
Notes
The Hyper-EMG probability distribution function was first introduced in this publication by Purushothaman et al. 7. The basic definitions and notations used here are adapted from this work.
References
- 7
Purushothaman, S., et al. “Hyper-EMG: A new probability distribution function composed of Exponentially Modified Gaussian distributions to analyze asymmetric peak shapes in high-resolution time-of-flight mass spectrometry.” International Journal of Mass Spectrometry 421 (2017): 245-254.
emgfit.fit_models module¶
Module with lmfit models for single peaks with Gaussian or various hyper-exponentially-modified Gaussian line shapes.
-
emgfit.fit_models.
Gaussian
(peak_index, x_pos, amp, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.00014, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0006, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, index_first_peak=None)[source]¶ Gaussian lmfit model (single-peak Gaussian fit model)
- Parameters
peak_index (int) – Index of peak to fit.
x_pos (float) – Initial guess of peak centroid.
amp (float) – Initial guess of peak amplitude.
init_pars (dict) – Initial parameters for fit (‘amp’ and ‘mu’ parameters in init_pars dictionary are overwritten by the given amp and x_pos arguments)
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
index_first_peak (int) – Index of the first peak to be fit in a multi-peak-fit. Only use this during peak shape determination to enforce common shape parameters for all peaks to be fitted. (For a regular fit with
vary_shape_pars = False
this is irrelevant.)
- Returns
lmfit model object
- Return type
lmfit.model
-
emgfit.fit_models.
create_default_init_pars
(mass_number=100)[source]¶ Scale default parameters to mass of interest and return parameter dictionary.
- Parameters
mass_number (int, optional) – Atomic mass number of peaks of interest, defaults to 100.
- Returns
Dictionary with default initial parameters (scaled to mass_number).
- Return type
Notes
The default parameters were defined for mass 100, to obtain suitable parameters at other masses all mass-dependent parameters (i.e. shape parameters & amp) are multiplied by the scaling factor mass_number/100.
-
emgfit.fit_models.
emg01
(peak_index, x_pos, amp, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.00014, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0006, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, index_first_peak=None)[source]¶ Hyper-EMG(0,1) lmfit model (single-peak fit model with one exponential tail on the right)
- Parameters
peak_index (int) – Index of peak to fit.
x_pos (float) – Initial guess of peak centroid.
amp (float) – Initial guess of peak amplitude.
init_pars (dict) – Initial parameters for fit (‘amp’ and ‘mu’ parameters in init_pars dictionary are overwritten by the given amp and x_pos arguments)
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
index_first_peak (int) – Index of the first peak to be fit in a multi-peak-fit. Only use this during peak shape determination to enforce common shape parameters for all peaks to be fitted. (For a regular fit with
vary_shape_pars = False
this is irrelevant.)
- Returns
lmfit model object
- Return type
lmfit.model
-
emgfit.fit_models.
emg10
(peak_index, x_pos, amp, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.00014, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0006, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, index_first_peak=None)[source]¶ Hyper-EMG(1,0) lmfit model (single-peak fit model with one exponential tail on the left)
- Parameters
peak_index (int) – Index of peak to fit.
x_pos (float) – Initial guess of peak centroid.
amp (float) – Initial guess of peak amplitude.
init_pars (dict) – Initial parameters for fit (‘amp’ and ‘mu’ parameters in init_pars dictionary are overwritten by the given amp and x_pos arguments)
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
index_first_peak (int) – Index of the first peak to be fit in a multi-peak-fit. Only use this during peak shape determination to enforce common shape parameters for all peaks to be fitted. (For a regular fit with
vary_shape_pars = False
this is irrelevant.)
- Returns
lmfit model object
- Return type
lmfit.model
-
emgfit.fit_models.
emg11
(peak_index, x_pos, amp, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.00014, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0006, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, index_first_peak=None)[source]¶ Hyper-EMG(1,1) lmfit model (single-peak fit model with one exponential tail on the left and one exponential tail on the right)
- Parameters
peak_index (int) – Index of peak to fit.
x_pos (float) – Initial guess of peak centroid.
amp (float) – Initial guess of peak amplitude.
init_pars (dict) – Initial parameters for fit (‘amp’ and ‘mu’ parameters in init_pars dictionary are overwritten by the given amp and x_pos arguments)
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
index_first_peak (int) – Index of the first peak to be fit in a multi-peak-fit. Only use this during peak shape determination to enforce common shape parameters for all peaks to be fitted. (For a regular fit with
vary_shape_pars = False
this is irrelevant.)
- Returns
lmfit model object
- Return type
lmfit.model
-
emgfit.fit_models.
emg12
(peak_index, x_pos, amp, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.00014, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0006, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, index_first_peak=None)[source]¶ Hyper-EMG(1,2) lmfit model (single-peak fit model with one exponential tail on the left and two exponential tails on the right)
- Parameters
peak_index (int) – Index of peak to fit.
x_pos (float) – Initial guess of peak centroid.
amp (float) – Initial guess of peak amplitude.
init_pars (dict) – Initial parameters for fit (‘amp’ and ‘mu’ parameters in init_pars dictionary are overwritten by the given amp and x_pos arguments)
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
index_first_peak (int) – Index of the first peak to be fit in a multi-peak-fit. Only use this during peak shape determination to enforce common shape parameters for all peaks to be fitted. (For a regular fit with
vary_shape_pars = False
this is irrelevant.)
- Returns
lmfit model object
- Return type
lmfit.model
-
emgfit.fit_models.
emg21
(peak_index, x_pos, amp, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.00014, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0006, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, index_first_peak=None)[source]¶ Hyper-EMG(2,1) lmfit model (single-peak fit model with two exponential tails on the left and one exponential tail on the right)
- Parameters
peak_index (int) – Index of peak to fit.
x_pos (float) – Initial guess of peak centroid.
amp (float) – Initial guess of peak amplitude.
init_pars (dict) – Initial parameters for fit (‘amp’ and ‘mu’ parameters in init_pars dictionary are overwritten by the given amp and x_pos arguments)
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
index_first_peak (int) – Index of the first peak to be fit in a multi-peak-fit. Only use this during peak shape determination to enforce common shape parameters for all peaks to be fitted. (For a regular fit with
vary_shape_pars = False
this is irrelevant.)
- Returns
lmfit model object
- Return type
lmfit.model
-
emgfit.fit_models.
emg22
(peak_index, x_pos, amp, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.00014, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0006, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, index_first_peak=None)[source]¶ Hyper-EMG(2,2) lmfit model (single-peak fit model with two exponential tails on the left and two exponential tails on the right)
- Parameters
peak_index (int) – Index of peak to fit.
x_pos (float) – Initial guess of peak centroid.
amp (float) – Initial guess of peak amplitude.
init_pars (dict) – Initial parameters for fit (‘amp’ and ‘mu’ parameters in init_pars dictionary are overwritten by the given amp and x_pos arguments)
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
index_first_peak (int) – Index of the first peak to be fit in a multi-peak-fit. Only use this during peak shape determination to enforce common shape parameters for all peaks to be fitted. (For a regular fit with
vary_shape_pars = False
this is irrelevant.)
- Returns
lmfit model object
- Return type
lmfit.model
-
emgfit.fit_models.
emg23
(peak_index, x_pos, amp, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.00014, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0006, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, index_first_peak=None)[source]¶ Hyper-EMG(2,3) lmfit model (single-peak fit model with two exponential tails on the left and three exponential tails on the right)
- Parameters
peak_index (int) – Index of peak to fit.
x_pos (float) – Initial guess of peak centroid.
amp (float) – Initial guess of peak amplitude.
init_pars (dict) – Initial parameters for fit (‘amp’ and ‘mu’ parameters in init_pars dictionary are overwritten by the given amp and x_pos arguments)
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
index_first_peak (int) – Index of the first peak to be fit in a multi-peak-fit. Only use this during peak shape determination to enforce common shape parameters for all peaks to be fitted. (For a regular fit with
vary_shape_pars = False
this is irrelevant.)
- Returns
lmfit model object
- Return type
lmfit.model
-
emgfit.fit_models.
emg32
(peak_index, x_pos, amp, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.00014, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0006, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, index_first_peak=None)[source]¶ Hyper-EMG(3,2) lmfit model (single-peak fit model with three exponential tails on the left and two exponential tails on the right)
- Parameters
peak_index (int) – Index of peak to fit.
x_pos (float) – Initial guess of peak centroid.
amp (float) – Initial guess of peak amplitude.
init_pars (dict) – Initial parameters for fit (‘amp’ and ‘mu’ parameters in init_pars dictionary are overwritten by the given amp and x_pos arguments)
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
index_first_peak (int) – Index of the first peak to be fit in a multi-peak-fit. Only use this during peak shape determination to enforce common shape parameters for all peaks to be fitted. (For a regular fit with
vary_shape_pars = False
this is irrelevant.)
- Returns
lmfit model object
- Return type
lmfit.model
-
emgfit.fit_models.
emg33
(peak_index, x_pos, amp, init_pars={'amp': 0.45, 'eta_m1': 0.85, 'eta_m2': 0.1, 'eta_m3': 0.05, 'eta_p1': 0.85, 'eta_p2': 0.1, 'eta_p3': 0.05, 'mu': None, 'sigma': 0.00014, 'tau_m1': 5e-05, 'tau_m2': 0.0005, 'tau_m3': 0.001, 'tau_p1': 5e-05, 'tau_p2': 0.0006, 'tau_p3': 0.001, 'theta': 0.5}, vary_shape_pars=True, index_first_peak=None)[source]¶ Hyper-EMG(3,3) lmfit model (single-peak fit model with three exponential tails on the left and three exponential tails on the right)
- Parameters
peak_index (int) – Index of peak to fit.
x_pos (float) – Initial guess of peak centroid.
amp (float) – Initial guess of peak amplitude.
init_pars (dict) – Initial parameters for fit (‘amp’ and ‘mu’ parameters in init_pars dictionary are overwritten by the given amp and x_pos arguments)
vary_shape_pars (bool) – Whether to vary or fix peak shape parameters (i.e. sigma, theta, eta’s and tau’s).
index_first_peak (int) – Index of the first peak to be fit in a multi-peak-fit. Only use this during peak shape determination to enforce common shape parameters for all peaks to be fitted. (For a regular fit with
vary_shape_pars = False
this is irrelevant.)
- Returns
lmfit model object
- Return type
lmfit.model
emgfit.sample module¶
Module for creating simulated time-of-flight mass spectra with Gaussian and hyper-exponentially-modified line shapes.
-
emgfit.sample.
_h_m_i_rvs
(mu, sigma, tau_m, N_i)[source]¶ Helper function for definition of h_m_emg_rvs
-
emgfit.sample.
_h_p_i_rvs
(mu, sigma, tau_p, N_i)[source]¶ Helper function for definition of h_p_emg_rvs
-
emgfit.sample.
h_emg_rvs
(mu, sigma, theta, *t_args, N_samples=None)[source]¶ Draw random samples from Hyper-EMG PDF
- Parameters
mu (float [u]) – Nominal position of simulated peak (mean of underlying Gaussian).
sigma (float [u]) – Nominal standard deviation (of the underlying Gaussian) of the simulated hyper-EMG peak.
theta (float) – Mixing weight of pos. & neg. skewed EMG distributions.
t_args (list of lists of float) – List containing lists of the EMG tail parameters with the signature: [[eta_m1, eta_m2, …], [tau_m1, tau_m2, …], [eta_p1, eta_p2, …], [tau_p1, tau_p2, …]]
N_samples (int) – Number of random events to sample.
- Returns
Array with simulated events.
- Return type
numpy.ndarray
of floats
-
emgfit.sample.
h_m_emg_rvs
(mu, sigma, *t_args, N_samples=None)[source]¶ Draw random samples from neg. skewed Hyper-EMG PDF
- Parameters
mu (float [u]) – Nominal position of simulated peak (mean of underlying Gaussian).
sigma (float [u]) – Nominal standard deviation (of the underlying Gaussian) of the simulated hyper-EMG peak.
theta (float) – Mixing weight of pos. & neg. skewed EMG distributions.
t_args (list of lists of float) – List containing lists of the EMG tail parameters with the signature: [[eta_m1, eta_m2, …], [tau_m1, tau_m2, …]]
N_samples (int) – Number of random events to sample.
- Returns
Array with simulated events.
- Return type
numpy.ndarray
of floats
-
emgfit.sample.
h_p_emg_rvs
(mu, sigma, *t_args, N_samples=None)[source]¶ Draw random samples from pos. skewed Hyper-EMG PDF
- Parameters
mu (float [u]) – Nominal position of simulated peak (mean of underlying Gaussian).
sigma (float [u]) – Nominal standard deviation (of the underlying Gaussian) of the simulated hyper-EMG peak.
t_args (list of lists of float) – List containing lists of the EMG tail parameters with the signature: [[eta_p1, eta_p2, …], [tau_p1, tau_p2, …]]
N_samples (int) – Number of random events to sample.
- Returns
Array with simulated events.
- Return type
numpy.ndarray
of floats
-
emgfit.sample.
simulate_events
(shape_pars, mus, amps, bkg_c, N_events, x_min, x_max, out='hist', N_bins=None, bin_cens=None)[source]¶ Create simulated ion events drawn from a user-defined probability distribution function (PDF)
Events can either be output as a list of single events (mass stamps) or as a histogram. In histogram output mode, uniform binning is easily realized by specifying the N_bins argument. More control over the binning can be achieved by parsing the desired bin centers to the bin_cens argument (e.g. for non-uniform binning).
- Parameters
shape_pars (dict) – Peak-shape parameters to use for sampling. The dictionary must follow the structure of the
shape_cal_pars
attribute of thespectrum
class.mus (float or list of float [u]) – Nominal peak positions of peaks in simulated spectrum.
amps (float or list of float [counts per u]) – Nominal amplitudes of peaks in simulated spectrum.
bkg_c (float [counts per bin], optional, default: 0.0) – Nominal amplitude of uniform background in simulated spectrum.
x_min (float [u]) – Beginning of sampling mass range.
x_max (float [u]) – End of sampling mass range.
N_events (int, optional, default: 1000) – Total number of events to simulate (signal and background events).
out (str, optional) –
Output format of sampled data. Options:
'hist'
for binned mass spectrum (default). The centres of the mass bins must be specified with the bin_cens argument.'list'
for unbinned list of single ion and background events.
N_bins (int, optional) – Number of uniform bins to use in
'hist'
output mode. The outer edges of the first and last bin are fixed to the start and end of the sampling range respectively (i.e. x_min and x_max). In between, bins are distributed with a fixed spacing of (x_max-x_min)/N_bins.bin_cens (
numpy.ndarray
[u]) – Centres of mass bins to use in'hist'
output mode. This argument allows the realization of non-uniform binning. Bin edges are centred between neighboring bins. Note: Bins outside the sampling range defined with x_min and x_max will be empty.
- Returns
If out=’hist’ a dataframe with a histogram of the format [bin centre [u], counts in bin] is returned. If out=’list’ an unbinned array with mass values [u] of single ion or background events is returned.
- Return type
numpy.array
orpandas.Dataframe
Notes
Random events are created via custom Hyper-EMG extensions of Scipy’s
scipy.stats.exponnorm.rvs()
method.Currently, all simulated peaks have identical width and shape (no re-scaling of mass-dependent shape parameters to a peak’s mass centroid).
Routine requires tail arguments in shape_cal_pars dict to be ordered (eta_m1,eta_m2,…) etc..
Mind the different units for peak amplitudes `amps` (counts per u) and the background level `bkg_c` (counts per bin). When spectrum data is simulated counts are randomly distributed between the different peaks and the background with probability weights amps * <bin width in> and bkg_c * <number of bins>, respectively. As a consequence, simply changing N_events (while keeping all other arguments constant), will cause amps and bkg_c to deviate from their nominal units.
-
emgfit.sample.
simulate_spectrum
(spec, x_cen=None, x_range=None, mus=None, amps=None, bkg_c=None, N_events=None, copy_spec=False)[source]¶ Create a simulated spectrum using the attributes of a reference spectrum
The peak shape of the sampling probability distribution function (PDF) follows the shape calibration of the reference spectrum (spec). By default, all other parameters of the sampling PDF are identical to the best-fit parameters of the reference spectrum. If desired, the positions, amplitudes and number of peaks in the sampling PDF as well as the background level can be changed with the mus, amps and bkg_c arguments.
- Parameters
spec (
spectrum
) – Reference spectrum object whose best-fit parameters will be used to sample from.mus (float or list of float [u], optional) – Nominal peak centres of peaks in simulated spectrum. Defaults to the mus of the reference spectrum fit.
amps (float or list of float [counts per u], optional) – Nominal amplitudes of peaks in simulated spectrum. Defaults to the amplitudes of the reference spectrum fit.
bkg_c (float [counts per bin], optional) – Nominal amplitude of uniform background in simulated spectrum. Defaults to the c_bkg obtained in the fit of the first peak in the reference spectrum.
x_cen (float [u], optional) – Mass center of simulated spectrum. Defaults to x_cen of spec.
x_range (float [u], optional) – Covered mass range of simulated spectrum. Defaults to x_range of spectrum.
N_events (int, optional) – Number of ion events to simulate (including background events). Defaults to total number of events in spec.
copy_spec (bool, optional, default: False) – If
False
(default), this function returns a freshspectrum
object created from the simulated mass data. IfTrue
, this function returns an exact copy of spec with only the :attr`data` attribute replaced by the new simulated mass data.
- Returns
If copy_spec = False (default) a fresh spectrum object holding the simulated mass data is returned. If copy_spec = True, a copy of the reference spectrum spec is returned with only the
data
attribute replaced by the new simulated mass data.- Return type
spectrum.spectrum
Notes
Random events are created via custom Hyper-EMG extensions of Scipy’s
scipy.stats.exponnorm.rvs()
method.Currently, all simulated peaks have identical width and shape (no re-scaling of mass-dependent shape parameters to a peak’s mass centroid).
The returned spectrum follows the binning of the reference spectrum.
Mind the different units for peak amplitudes amps (counts per u) and the background level bkg_c (counts per bin). When spectrum data is simulated counts are distributed between the different peaks and the background with probability weights amps * <bin width in> and bkg_c * <number of bins>, respectively. As a consequence, simply changing N_events (while keeping all other arguments constant), will render amps and bkg_c away from their nominal units.
emgfit.spectrum module¶
Module for fitting time-of-flight mass spectra.
-
class
emgfit.spectrum.
peak
(x_pos, species, m_AME=None, m_AME_error=None)[source]¶ Bases:
object
Object storing all relevant information about a mass peak.
Most peak attributes are intialized as
None
and are later automatically updated by methods of the spectrum class, e.g.spectrum.determine_peak_shape()
orspectrum.fit_peaks()
.- Variables
x_pos (float [u]) – Coarse position of peak centroid. In fits the Hyper-EMG parameter for the (Gaussian) peak centroid mu will be initialized at this value. Peak markers in plots are located at x_pos.
species (str) – String with chemical formula of ion species asociated with peak. Species strings follow the :-notation of chemical substances. Examples:
'1K39:-1e'
,'K39:-e'
,'3H1:1O16:-1e'
. Do not forget to substract the electron, otherwise the atomic not the ionic mass would be used as literature value! Alternatively, tentative assigments can be made by adding a'?'
at the end of the species string (e.g.:'Sn100:-1e?'
,'?'
, …).comment (str) – User comment for peak.
m_AME (float [u], optional) – Ionic literature mass value, from AME2016 or user-defined.
m_AME_error (float [u], optional) – Literature mass uncertainty, from AME2016 or user-defined.
extrapolated (bool) – Boolean flag for extrapolated AME mass values (equivalent to ‘#’ in AME).
fit_model (str) – Name of model used to fit peak.
cost_func (str) – Type of cost function used to fit peak (
'chi-square'
or'MLE'
).red_chi (float) – Reduced chi-squared of peak fit. If the peak was fitted using
'MLE'
,red_chi
should be taken with caution.area_error (area,) – Number of total counts in peak and corresponding uncertainty (calculated from amplitude parameter amp of peak fit).
m_ion (float [u]) – Ionic mass value obtained in peak fit (after mass recalibration).
rel_stat_error (float) – Relative statistical uncertainty of
m_ion
.rel_recal_error (float) – Relative uncertainty of
m_ion
due to mass recalibration.rel_peakshape_error (float) – Relative peak-shape uncertainty of
m_ion
.rel_mass_error (float) – Total relative mass uncertainty of
m_ion
(excluding systematics!). Includes statistical, peak-shape and recalibration uncertainty.A (int) – Atomic mass number of peak species.
atomic_ME_keV (float [keV]) – (Atomic) mass excess corresponding to
m_ion
.mass_error_keV (float [keV]) – Total mass uncertainty of
m_ion
(excluding systematics!).m_dev_keV (float [keV]) – Deviation from literature value (
m_ion
-m_AME
).
-
__init__
(x_pos, species, m_AME=None, m_AME_error=None)[source]¶ Instantiate a new
peak
object.If a valid ion species is assigned with the species parameter the corresponding literature values will automatically be fetched from the AME2016 database. The syntax for species assignment follows that of MAc (for more details see documentation for species parameter below).
If different literature values are to be used, the literature mass or mass uncertainty can be user-defined with
m_AME
andm_AME_error
. This is useful for isomers and in cases where more recent measurements haven’t been included in the AME yet.- Parameters
x_pos (float [u]) – Coarse position of peak centroid. In fits the Hyper-EMG parameter for the (Gaussian) peak centroid mu will be initialized at this value. Peak markers in plots are located at x_pos.
species (str) – String with chemical formula of ion species asociated with peak. Species strings follow the :-notation of chemical substances. Examples:
'1K39:-1e'
,'K39:-e'
,'3H1:1O16:-1e'
. Do not forget to substract the electron from singly-charged species, otherwise the atomic not the ionic mass will be used as literature value! Alternatively, tentative assigments can be made by adding a'?'
at the end of the species string (e.g.:'Sn100:-1e?'
,'?'
, …).m_AME (float [u], optional) – User-defined literature mass value. Overwrites value fetched from AME2016. Useful for isomers or to use more up-to-date values.
m_AME_error (float [u], optional) – User-defined literature mass uncertainty. Overwrites value fetched from AME2016.
-
class
emgfit.spectrum.
spectrum
(filename=None, m_start=None, m_stop=None, skiprows=18, show_plot=True, df=None)[source]¶ Bases:
object
Object storing spectrum data, associated peaks and all relevant fit results - the workhorse of emgfit.
- Variables
input_filename (str) – Name of input file containing the spectrum data.
spectrum_comment (str, default: '-') – User comment for entire spectrum (helpful for further processing after).
fit_model (str) – Name of model used for peak-shape determination and further fitting.
index_shape_calib (int) – Index of peak used for peak-shape calibration.
red_chi_shape_cal (float) – Reduced chi-squared of peak-shape determination fit.
fit_range_shape_cal (float [u]) – Fit range used for peak-shape calibration.
shape_cal_result (
lmfit.model.ModelResult
) – Fit result obtained in peak-shape calibration.shape_cal_pars (dict) – Model parameter values obtained in peak-shape calibration.
shape_cal_errors (dict) – Model parameter uncertainties obtained in peak-shape calibration.
index_mass_calib (int) – Peak index of mass calibrant peak.
determined_A_stat_emg (bool) – Boolean flag for whether
A_stat_emg
was determined for this spectrum specifically using thedetermine_A_stat_emg()
method. IfTrue
,A_stat_emg
was set usingdetermine_A_stat_emg()
, otherwise the default value emgfit.config.A_stat_emg_default from theconfig
module was used. For more details see docs ofdetermine_A_stat_emg()
method.A_stat_emg (float) – Constant of proportionality for calculation of the statistical mass uncertainties. Defaults to emgfit.config.A_stat_emg_default as defined in the
config
module, unless thedetermine_A_stat_emg()
method is run.A_stat_emg_error (float) – Uncertainty of
A_stat_emg
.recal_fac (float, default: 1.0) – Scaling factor applied to
m_ion
in mass recalibration.rel_recal_error (float) – Relative uncertainty of recalibration factor
recal_fac
.recal_facs_pm (dict) – Modified recalibration factors obtained in peak-shape uncertainty evaluation by varying each shape parameter by plus and minus 1 standard deviation, respectively.
eff_mass_shifts (
numpy.ndarray
of dict) – Maximal effective mass shifts for each peak obtained in peak-shape uncertainty evaluation by varying each shape parameter by plus and minus 1 standard deviation and only keeping the shift with the larger absolute magnitude. The eff_mass_shifts array contains a dictionary for each peak; the dictionaries have the following structure: {‘<shape param. name> eff. mass shift’ : [<maximal eff. mass shift>],…} For more details see docs of_eval_peakshape_errors()
.area_shifts (
numpy.ndarray
of dict) – Maximal area change for each peak obtained in peak-shape uncertainty evaluation by varying each shape parameter by plus and minus 1 standard deviation and only keeping the shift with the larger absolute magnitude. The eff_mass_shifts array contains a dictionary for each peak; the dictionaries have the following structure: {‘<shape param. name> eff. mass shift’ : [<maximal eff. mass shift>],…} For the mass calibrant the dictionary holds the absolute shifts of the calibrant peak centroid (calibrant centroid shift). For more details see docs of_eval_peakshape_errors()
.peaks_with_errors_from_resampling (list of int) – List with indeces of peaks whose statistical mass and area uncertainties have been determined by fitting synthetic spectra resampled from the best-fit model (see
get_errors_from_resampling()
).MCMC_par_samples (list of dict) – Shape parameter samples obtained via Markov chain Monte Carlo (MCMC) sampling with
_get_MCMC_par_samples()
.MC_recal_facs (list of float) – Recalibration factors obtained in fits with Markov Chain Monte Carlo (MCMC) shape parameter samples in
get_MC_peakshape_errors()
.peaks_with_MC_PS_errors (list of int) – List with indeces of peaks for which peak-shape errors have been determined by re-fitting with shape parameter sets from Markov Chain Monte Carlo sampling (see
get_MC_peakshape_errors()
).peaks (list of
peak
) – List containing all peaks associated with the spectrum sorted by ascending mass. The index of a peak within the peaks list is referred to as thepeak_index
.fit_results (list of
lmfit.model.ModelResult
) – List containing fit results (lmfit.model.ModelResult
objects) for peaks associated with spectrum.blinded_peaks (list of int) – List with indeces of peaks whose mass values and peak positions are to be hidden to enable blind analysis. The mass values will be unblinded upon export of the analysis results.
data (
pandas.DataFrame
) – Histogrammed spectrum data.mass_number (int) – Atomic mass number associated with central bin of spectrum.
default_fit_range (float [u]) – Default mass range for fits, scaled to
mass_number
of spectrum.
Notes
The
fit_model
spectrum attribute seems somewhat redundant with thepeak.fit_model
peak attribute but ensures that no relevant information is lost.The
mass_number
is used for re-scaling of the default model parameters to the mass of interest. It is calculated upon data import by taking the median of all mass bin centers (after initial cutting of the spectrum) and rounding to the closest integer. This accounts for spectra potentially containing several mass units.The
default_fit_range
is scaled to the spectrum’smass_number
using the relation: \(\text{default_fit_range} = 0.01\,u \cdot (\text{mass_number}/100)\)-
__init__
(filename=None, m_start=None, m_stop=None, skiprows=18, show_plot=True, df=None)[source]¶ Create a
spectrum
object by importing histogrammed mass data from .txt or .csv file.- Input file format: two-column .csv- or .txt-file with tab-separated
values (column 1: mass bin, column 2: counts in bin). This is the default format of MAc export files when histogram mode is used.
Optionally the spectrum can be cut to a specified fit range using the m_start and m_stop parameters. Mass data outside this range will be discarded and excluded from further analysis.
If show_plot is True, a plot of the spectrum is shown including vertical markers for the m_start and m_stop mass cut-offs (if applicable).
- Parameters
filename (str, optional) – Filename of mass spectrum to analyze (as exported with MAc’s histogram mode). If the input file is not located in the working directory the directory path has to be included in filename, too. If no filename is given, data must be provided via df argument. m_start : float [u], optional Start of fit range, data at lower masses will be discarded. m_stop : float [u], optional Stop of fit range, data at higher masses will be discarded.
show_plot (bool, optional, default: True) – If
True
, shows a plot of full spectrum with vertical markers for m_start and m_stop cut-offs.df (
pandas.DataFrame
, optional) – DataFrame with spectrum data to use, this enables the creation of a spectrum object from a DataFrame instead of from an external file. Primarily intended for internal use.
Notes
The option to import data via the df argument was added to enable the processing of bootstrapped spectra as regular
spectrum
objects in thedetermine_A_stat_emg()
method. This feature is primarily intended for internal use.
-
_add_peak_markers
(yscale='log', ymax=None, peaks=None)[source]¶ Internal function for adding peak markers to current figure object.
Place this function inside spectrum methods as
self._add_peak_markers(...)
betweenplt.figure()
andplt.show()
. Only for use on already fitted spectra!
-
_eval_MC_peakshape_errors
(peak_indeces=[], fit_result=None, verbose=True, show_hists=False, N_samples=1000, n_cores=- 1, seed=872, **MCMC_kwargs)[source]¶ Get peak-shape uncertainties for a fit result by re-fitting with many different MC-shape-parameter sets
This method is primarily intended for internal usage.
A representative subset of the shape parameter sets which are supported by the data is obtained by performing MCMC sampling on the peak-shape calibrant. If this has not already been done using the map_par_covar option in
determine_peak_shape()
, the_get_MCMC_par_samples()
method will be automatically called here.The peaks specified by peak_indeces will be fitted with N_samples different shape parameter sets. The peak-shape uncertainties are then estimated as the RMS deviation of the obtained values from the best-fit values.
The mass calibrant must either be included in peak_indeces or must have been processed with this method upfront (using the same N_samples and seed arguments to ensure identical sets of peak-shapes).
- Parameters
peak_indeces (int or list of int) – Indeces of peaks to evaluate MC peak-shape uncertainties for. The peaks of interest must belong to the same fit_result.
fit_result (
ModelResult
, optional) – Fit result for which MC peak-shape uncertainties are to be evaluated for. Defaults to the fit result stored for the peaks of interest in thespectrum.fit_results
spectrum attribute.verbose (bool, optional) – Whether to print status updates.
show_hists (bool, optional) – If True histograms of the effective mass shifts and peak areas obtained with the MC shape parameter sets are shown. Black vertical lines indicate the best-fit values stored in fit_result.
N_samples (int, optional) – Number of different shape parameter sets to use. Defaults to 1000.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated spectra. When set to -1 (default) all available cores are used.
seed (int, optional) – Random seed to use for reproducibility. Defaults to 872.
**MCMC_kwargs – Keyword arguments to send to
_get_MCMC_par_samples()
for control over the MCMC sampling.
- Returns
Peak-shape mass errors [u], peak-shape area errors [counts] Both arrays have the same length as peak_indeces.
- Return type
array of floats, array of floats
Notes
For details on MCMC sampling see docs of
_get_MCMC_par_samples()
.This method only supports peaks that belong to the same fit result. If peaks in multiple fit_results are to be treated or the peak properties are to be updated with the refined peak-shape errors use
get_MC_peakshape_errors()
which wraps around this method.
-
_eval_peakshape_errors
(peak_indeces=[], fit_result=None, verbose=False, show_shape_err_fits=False)[source]¶ Calculate the relative peak-shape uncertainty of the specified peaks.
This internal method is automatically called by the :meth:`fit_peaks` and :meth:`fit_calibrant` methods and does not need to be run directly by the user.
The peak-shape uncertainties are obtained by re-fitting the specified peaks with each shape parameter individually varied by plus and minus 1 sigma and recording the respective shift of the peak centroids w.r.t the original fit. From the shifted IOI centroids and the corresponding shifts of the calibrant centroid effective mass shifts are determined. For each varied parameter, the larger of the two eff. mass shifts are then added in quadrature to obtain the total peak-shape uncertainty. See Notes section below for a detailed explanation of the peak-shape error evaluation scheme.
Note: All peaks in the specified peak_indeces list must have been fitted in the same multi-peak fit (and hence have the same lmfit modelresult fit_result)!
This routine does not yield a peak-shape error for the mass calibrant, since this is zero by definition. Instead, for the mass calibrant the absolute shifts of the peak centroid are calculated and stored in the
eff_mass_shifts_pm
andeff_mass_shifts
dictionaries.- Parameters
peak_indeces (list) – List containing indeces of peaks to evaluate peak-shape uncertainty for, e.g. to evaluate peak-shape error of peaks 0 and 3 use
peak_indeces=[0,3]
.fit_result (
lmfit.model.ModelResult
, optional) – Fit result object to evaluate peak-shape error for.verbose (bool, optional, default:
False
) – IfTrue
, print all individual eff. mass shifts obtained by varying the shape parameters.show_shape_err_fits (bool, optional, default:
False
) – IfTrue
, show individual plots of re-fits for peak-shape error determination.
See also
Notes
sigma,`theta`, all eta and all tau model parameters are considered “shape parameters” and varied by plus and minus one standard deviation in the peak-shape uncertainty evaluation. The peak amplitude, centroids and the baseline are always freely varying.
The “peak-shape uncertainty” refers to the mass uncertainty due to uncertainties in the determination of the peak-shape parameters and due to deviations between the shape-calibrant and IOI peak shapes. Simply put, the peak-shape uncertainties are estimated by evaluating how much a given peak centroid is shifted when the shape parameters are varied by plus or minus their 1-sigma uncertainty. A peculiarity of emgfit’s peak-shape error estimation routine is that only the centroid shifts relative to the calibrant are taken into account (hence ‘effective mass shifts’).
Inspired by the approach outlined in 8, the peak-shape uncertainties are obtained via the following procedure:
Since only effective mass shifts corrected for the corresponding shifts of the calibrant peak enter the peak-shape uncertainty, at first, the absolute centroid shifts of the mass calibrant must be evaluated. There are two options for this:
If the calibrant index is included in the peak_indeces argument, the original calibrant fit is re-performed with each shape parameter varied by plus and minus its 1-sigma confidence respectively while all other shape parameters are kept fixed at the original best-fit values. The resulting absolute “calibrant centroid shifts” are recorded and stored in the spectrum’s
eff_mass_shifts_pm
dictionary. The shifted calibrant centroids are further used to calculate updated mass re-calibration factors. These are stored in therecal_facs_pm
dictionary. Only the larger of the two centroid shifts due to the +/-1-sigma variation of each shape parameter are stored in the spectrum’seff_mass_shifts
dictionary.If the calibrant is not included in the peak_indeces list, the calibrant centroid shifts and the corresponding shifted recalibration factors must already have been obtained in a foregoing mass Mass recalibration and calculation of final mass values.
All non-calibrant peaks referenced in peak_indeces are treated in a similar way. The original fit that yielded the specified fit_result is re-performed with each shape parameter varied by plus and minus its 1-sigma confidence respectively while all other shape parameters are kept fixed at the original best-fit values. However now, the effective mass shifts after correction with the corresponding updated recalibration factor are recorded and stored in the spectrum’s
eff_mass_shifts_pm
dictionary. Only the larger of the two eff. mass shifts caused by the +/-1-sigma variation of each shape parameter are stored in the spectrum’seff_mass_shifts
dictionary.The estimates for the total peak-shape uncertainty of each peak are finally obtained by adding the eff. mass shifts stored in the
eff_mass_shifts
dictionary in quadrature.
Mind that peak-shape area uncertainties are only calculated for ions-of- interest, not for the mass calibrant.
References
- 8
San Andrés, Samuel Ayet, et al. “High-resolution, accurate multiple-reflection time-of-flight mass spectrometry for short-lived, exotic nuclei of a few events in their ground and low-lying isomeric states.” Physical Review C 99.6 (2019): 064313.
-
_get_MCMC_par_samples
(fit_result, steps=12000, burn=500, thin=250, show_MCMC_fit_result=False, covar_map_fname=None, n_cores=- 1, MCMC_seed=1364)[source]¶ Map out parameter covariances and posterior distributions using Markov-chain Monte Carlo (MCMC) sampling
This method is intended for internal usage and for single peaks only.
- Parameters
fit_result (
lmfit.model.ModelResult
) – Fit result of region explore with MCMC sampling. since emcee only efficiently samples unimodal distributions, fit_result should hold the result of a single-peak fit (typically of the peak-shape calibrant). The MCMC walkers are initialized with randomized parameter values drawn from normal distributions whose mean and standard deviation are given by the respective best-fit values and uncertainties stored in fit_result.steps (int, optional) – Number of MCMC sampling steps.
burn (int, optional) – Number of initial sampling steps to discard (“burn-in” phase).
thin (int, optional) – After sampling, only every thin-th sample is used for further treatment. It is recommended to set thin to at least half the autocorrelation time.
show_MCMC_fit_result (bool, optional, default: False) – If True, a maximum likelihood estimate is derived from the MCMC samples with best-fit values estimated by the median of the samples. The MCMC MLE result can be compared to the conventional fit_result as an additional crosscheck.
covar_map_fname (str or None (default), optional) – If not None, the parameter covariance map will be saved as “<covar_map_fname>_covar_map.png”.
n_cores (int, optional, default: -1) – Number of CPU cores to use for parallelized sampling. If -1 (default) all available cores will be used.
MCMC_seed (int, optional) – Random state for reproducible sampling.
- Yields
MCMC results saved in result_emcee attribute of fit_result.
Notes
Markov-Chain Monte Carlo (MCMC) algorithms are a powerful tool to efficiently sample the posterior probability density functions (PDFs) of model parameters. In simpler words: MCMC methods can be used to estimate the distributions of parameter values which are supported by the data. An MCMC algorithm sends out a number of so-called walkers on stochastic walks through the parameter space (in this method the number of MCMC walkers is fixed to 20 times the number of varied parameters). MCMC methods are particularly important in situations where conventional sampling techniques become intractable or inefficient. For MCMC sampling emgfit deploys lmfit’s implementation of the
emcee.EnsembleSampler
from the emcee package 1. Since emcee’sEnsembleSampler
is only optimized for uni-modal probability density functions this method should only be used to explore the parameter space of a single-peak fit.A complication with MCMC methods is that there is usually no rigorous way to prove that the sampling chain has converged to the true PDF. Instead it is at the user’s disgression to decide after how many sampling steps a sufficient amount of convergence is achieved. Gladly, there is a number of heuristic tools that can help in judging convergence. The most common measure of the degree of convergence is the integrated autocorrelation time (tau). If the integrated autocorrelation time shows only small changes over time the MCMC chain can be assumed to be converged. To ensure a sufficient degree of convergence this method will issue a warning whenever the number of performed sampling steps is smaller than 50 times the integrated autocorrelation time of at least one parameter. If this rule of thumb is violated it is strongly advisable to run a longer chain. An additonal aid in judging the performance of the MCMC chain are the provided plots of the MCMC traces. These plots show the paths of all MCMC walkers through parameter space. Dramatic changes of the initial trace envelopes indicate that the chain has not reached a stationary state yet and is still in the so-called “burn-in” phase. Samples in this region are discarded by setting the burn argument to an appropriate number of steps (default burn-in: 500 steps).
Another complication of MCMC algorithms is the fact that nearby samples in a MCMC chain are not indepedent. To reduce correlations between samples MCMC chains are usually “thinned out” by only storing the result of every m-th MCMC iteration. The number of steps after which two samples can be assumed to be uncorrelated/independent (so to say the memory of the chain) is given by the integrated autocorrelation time (tau). To be conservative, emgfit uses a thinning interval of
m = 250
by default and issues a warning whenm < tau
for at least one of the parameters. Since more data is discarded, a larger thinning interval comes with a loss of precision of the posterior PDF estimates. However, a sufficient amount of thinning is still advisable since emgfit’s MC peak-shape error determination (get_MC_peakshape_errors()
) relies on independent parameter samples.As a helpful measure for tuning MCMC chains, emgfit provides a plot of the “acceptance fraction” for each walker, i.e. the fraction of suggested walker steps which were accepted. The developers of emcee’s EnsembleSampler suggest acceptance fractions between 0.2 and 0.5 as a rule of thumb for a well-behaved chain. Acceptance fractions falling short of this for many walkers can indicate poor initialization or a too small number of walkers.
For more details on MCMC see the excellent introductory papers “emcee: The MCMC hammer” 1 and “Data Analysis Recipes: Using Markov Chain Monte Carlo” 2.
References
- 1(1,2)
Foreman-Mackey, Daniel, et al. “emcee: the MCMC hammer.” Publications of the Astronomical Society of the Pacific 125.925 (2013): 306.
- 2
Hogg, David W., and Daniel Foreman-Mackey. “Data analysis recipes: Using markov chain monte carlo.” The Astrophysical Journal Supplement Series 236.1 (2018): 11.
-
static
_plot_df
(df, title='', fig=None, yscale='log', peaks=None, vmarkers=None, thres=None, ymin=None, ylabel='Counts per bin', xmin=None, xmax=None)[source]¶ Plots spectrum data stored in
pandas.DataFrame
df.Intended for internal use.
Optionally with peak markers if: 1. single or multiple x_pos are passed to vmarkers, OR 2. list of peak objects is passed to peaks.
- Parameters
df (
pandas.DataFrame
) – Spectrum data to plot.fig (
matplotlib.pyplot.figure
, optional) – Figure object to plot onto.yscale (str, optional) – Scale of y-axis (
'lin'
for logarithmic,'log'
for logarithmic), defaults to'log'
.peaks (list of
peaks
, optional) – List ofpeaks
to show peak markers for.vmarkers (list of float [u], optional) – List with mass positions [u] to add vertical markers at.
thres (float, optional) – y-level to add horizontal marker at (e.g. for indicating set threshold in peak detection).
ymin (float, optional) – Lower bound of y-range to plot.
ylabel (str, optional) – Custom label for y-axis.
xmin (float [u], optional) – Lower/upper bound of mass range to plot.
xmax (float [u], optional) – Lower/upper bound of mass range to plot.
See also
-
_show_blinded_report
(result)[source]¶ Display fit result with positions of blinded peaks replaced by NaN
-
static
_smooth
(x, window_len=11, window='hanning')[source]¶ Smooth the data for the peak detection.
** Intended for internal use only.**
This method is based on the convolution of a normalized window with the signal. The signal is prepared by introducing reflected copies of the signal (with the window size) in both ends so that transient parts are minimized in the begining and end part of the output signal.
- Parameters
x (numpy.array) – The input data
window_len (odd int, optional) – Length of the smoothing window; must be an odd integer!
window (str, optional) – Type of window from ‘flat’, ‘hanning’, ‘hamming’, ‘bartlett’, ‘blackman’, flat window will produce a moving average smoothing.
- Returns
The smoothed spectrum data.
- Return type
numpy.array
Notes
length(output) != length(input), to correct this: return y[(window_len/2-1):-(window_len/2)] instead of just y.
Method adapted from: https://scipy-cookbook.readthedocs.io/items/SignalSmooth.html
Example
>>> t=linspace(-2,2,0.1) >>> x=sin(t)+randn(len(t))*0.1 >>> y=smooth(x)
-
_update_calibrant_props
(index_mass_calib, fit_result)[source]¶ Determine recalibration factor and update mass calibrant peak properties.
Intended for internal use only.
- Parameters
index_mass_calib (int) – Index of mass calibrant peak.
fit_result (
lmfit.model.ModelResult
) – Fit result to use for calculating the re-calibration factor and for updating the calibrant properties.
-
_update_peak_props
(peaks, fit_result)[source]¶ Update the peak properties using the given ‘fit_result’.
Intended for internal use only.
The values of the mass calibrant will not be changed by this routine.
- Parameters
peaks (list) – List of indeces of peaks to update. (To get peak indeces, see plot markers or consult the peak properties table by calling the
spectrum.show_peak_properties()
method)fit_result (
lmfit.model.ModelResult
) –ModelResult
holding the fit results of all peaks to be updated.
Note
All peaks referenced by the ‘peaks’ argument must belong to the same fit_result. Not necessarily all peaks contained in fit_result will be updated, only the properties of peaks referenced with the peaks argument will be updated.
-
add_peak
(x_pos, species='?', m_AME=None, m_AME_error=None, verbose=True)[source]¶ Manually add a peak to the spectrum’s
peaks
list.The position of the peak must be specified with the x_pos argument. If the peak’s ionic species is provided with the species argument the corresponding AME literature values will be added to the
peak
. Alternatively, user-defined literature values can be provided with the m_AME and m_AME_error arguments. This option is helpful for isomers or in case of very recent measurements that haven’t entered the AME yet.- Parameters
x_pos (float [u]) – Position of peak to be added.
species (str, optional) –
species
label for peak to be added following the :-notation of chemical substances. If assigned,peak.m_AME
,peak.m_AME_error
&peak.extrapolated
are automatically updated with the corresponding AME literature values.m_AME (float [u], optional) – User-defined literature mass for peak to be added. Overwrites pre- existing
peak.m_AME
value.m_AME_error (float [u], optional) – User-defined literature mass uncertainty for peak to be added. Overwrites pre-existing
peak.m_AME_error
.verbose (bool, optional, default:
True
) – IfTrue
, a message is printed after successful peak addition. Intended for internal use only.
See also
Note
Adding a peak will shift the peak_indeces of all peaks at higher masses by
+1
.
-
add_peak_comment
(comment, peak_index=None, x_pos=None, species='?', overwrite=False)[source]¶ Add a comment to a peak.
By default the comment argument will be appended to the end of the current
peak.comment
attribute (if the current comment is ‘-‘ it is overwritten by the comment argument). If overwrite is setTrue
, the currentpeak.comment
is overwritten with the ‘comment’ argument.- Parameters
comment (str) – Comment to add to peak.
peak_index (int, optional) – Index of
peak
to add comment to.x_pos (float [u], optional) –
x_pos
of peak to add comment to (must be specified up to 6thdecimal).
species (str, optional) –
species
of peak to add comment to.overwrite (bool) – If
True
the current peakcomment
will be overwritten by comment, else comment is appended to the end of the current peakcomment
.
Note
The shape and mass calibrant peaks are automatically marked during the shape and mass calibration by inserting the protected flags
'shape calibrant'
,'mass calibrant'
or'shape and mass calibrant'
into their peak comments. When user-defined comments are added to these peaks, it is ensured that the protected flags cannot be overwritten. The above shape and mass calibrant flags should never be added to comments manually by the user!
-
add_spectrum_comment
(comment, overwrite=False)[source]¶ Add a general comment to the spectrum.
By default the comment argument will be appended to the end of the current
spectrum_comment
attribute. If overwrite is set toTrue
the currentspectrum_comment
is overwritten with comment.- Parameters
See also
Notes
The
spectrum_comment
will be included in the output file storing all fit results and can hence be useful to pass on information for the post-processing.If
spectrum_comment
is ‘-‘ (default value) it is always overwritten with comment.
-
assign_species
(species, peak_index=None, x_pos=None)[source]¶ Assign species label(s) to a single peak (or all peaks at once).
If no single peak is selected with peak_index or x_pos, a list with species names for all peaks in the peak list must be passed to species. For already specified or unkown species insert
None
as a placeholder to skip the species assignment for this peak. See Notes and Examples sections below for details on usage.- Parameters
species (str or list of str) – The species name (or list of name strings) to be assigned to the selected peak (or to all peaks). For unkown or already assigned species,
None
should be inserted as placeholder at the corresponding position in the species list.species
names must follow the :-notation of chemical substances.peak_index (int, optional) – Index of single peak to assign species name to.
x_pos (float [u], optional) –
x_pos
of single peak to assign species name to. Must be specified up to 6th decimal.
Notes
Assignment of a single peak species: select peak by specifying peak position x_pos (up to 6th decimal) or peak_index argument (0-based! Check for peak index by calling
show_peak_properties()
method on spectrum object).Assignment of multiple peak species: Nothing should be passed to the ‘peak_index’ and ‘x_pos’ arguments. Instead the user specifies a list of the new species strings to the species argument (if there’s N detected peaks, the list must have length N). Former species assignments can be kept by inserting blanks at the respective position in the species list, otherwise former species assignments are overwritten, also see examples below for usage.
Examples
Assign the peak with peak_index 2 (third-lowest-mass peak) as ‘1Cs133:-1e’, leave all other peaks unchanged:
>>> import emgfit as emg >>> spec = emg.spectrum(<input_file>) # mock code for foregoing data import >>> spec.detect_peaks() # mock code for foregoing peak detection >>> spec.assign_species('1Cs133:-1e',peak_index = 2)
Assign multiple peaks:
>>> import emgfit as emg >>> spec = emg.spectrum(<input_file>) # mock code for foregoing data import >>> spec.detect_peaks() # mock code for foregoing peak detection >>> spec.assign_species(['1Ru102:-1e', '1Pd102:-1e', 'Rh102:-1e?', None,'1Sr83:1F19:-1e', '?'])
This assigns the species of the first, second, third and fourth peak with the respective labels in the specified list and fetches their AME values. The ‘?’ in the
'Rh102:-1e?'
argument indicates a tentative species assignment, literature values will not be calculated for this peak. TheNone
argument leaves the species assignment of the 4th peak unchanged. The'?'
argument overwrites any former species assignments to the highest-mass peak and marks the peak as unidentified.
-
static
bootstrap_spectrum
(df, N_events=None, x_cen=None, x_range=0.02)[source]¶ Create simulated spectrum via bootstrap re-sampling from spectrum df.
- Parameters
df (class:pandas.DataFrame) – Original histogrammed spectrum data to re-sample from.
N_events (int, optional) – Number of events to create via bootstrap re-sampling, defaults to number of events in original DataFrame df.
x_cen (float [u], optional) – Center of mass range to re-sample from. If
None
, re-sample from full mass range of input data df.x_range (float [u], optional) – Width of mass range to re-sample from. Defaults to 0.02 u. x_range is only relevant if a x_cen argument is specified.
- Returns
Histogram with simulated spectrum data from non-parametric bootstrap.
- Return type
-
calc_FWHM_emg
(peak_index, fit_result=None)[source]¶ Calculate the full width at half maximum (FWHM) of a Hyper-EMG fit.
The peak of interest must have previously been fitted with a Hyper-EMG model.
- Parameters
peak_index (int) – Index of peak of interest.
fit_result (
lmfit.model.ModelResult
, optional) – Fit result containing peak of interest. IfNone
(default) the corresponding fit result from the spectrum’sfit_results
list will be fetched.
- Returns
Full width at half maximum of Hyper-EMG fit of peak of interest.
- Return type
-
calc_peak_area
(peak_index, fit_result=None, decimals=2)[source]¶ Calculate the peak area (counts in peak) and its stat. uncertainty.
Area and area error are calculated using the peak’s amplitude parameter amp and the width of the uniform binning of the spectrum. Therefore, the peak must have been fitted beforehand. In the case of overlapping peaks only the counts within the fit component of the specified peak are returned.
Note
This routine assumes the bin width to be uniform across the spectrum. The mass binning of most mass spectra is not perfectly uniform (usually time bins are uniform such that the width of mass bins has a quadratic scaling with mass). However, for isobaric species the quadratic term is usually so small that it can safely be neglected.
- Parameters
peak_index (str) – Index of peak of interest.
fit_result (
lmfit.model.ModelResult
, optional) – Fit result object to use for area calculation. IfNone
(default) use corresponding fit result stored infit_results
list.decimals (int) – Number of decimals of returned output values.
- Returns
List with peak area and area error in format [area, area_error].
- Return type
list of float
-
calc_sigma_emg
(peak_index, fit_result=None)[source]¶ Calculate the standard deviation of a Hyper-EMG peak fit.
The peak of interest must have previously been fitted with a Hyper-EMG model.
- Parameters
peak_index (int) – Index of peak of interest.
fit_result (
lmfit.model.ModelResult
, optional) – Fit result containing peak of interest. IfNone
(default) the corresponding fit result from the spectrum’sfit_results
list will be fetched.
- Returns
Standard deviation of Hyper-EMG fit of peak of interest.
- Return type
-
comp_model
(peaks_to_fit, model='emg22', init_pars=None, vary_shape=False, vary_baseline=True, index_first_peak=None)[source]¶ Create a multi-peak composite model with the specified peak shape.
Primarily intended for internal usage.
- Parameters
peaks_to_fit (list of
peak
) –peaks
to be fitted with composite model.model (str, optional) – Name of fit model to use for all peaks (e.g.
'Gaussian'
,'emg12'
,'emg33'
, … - for full list see Available fit models).init_pars (dict, optional, default:
None
) – Default initial shape parameters for fit model. IfNone
the default parameters defined in thefit_models
module will be used after scaling to the spectrum’smass_number
. For more details and a list of all shape parameters see the Peak-shape calibration article.vary_shape (bool, optional) – If
False
only the amplitude (amp) and Gaussian centroid (mu) model parameters will be varied in the fit. IfTrue
, the shape parameters (sigma, theta,`etas` and taus) will also be varied.vary_baseline (bool, optional) – If
True
a varying uniform baseline will be added to the fit model as varying model parameter c_bkg. IfFalse
, the baseline parameter c_bkg will be kept fixed at 0.index_first_peak (int, optional) – Index of first (lowest mass) peak in fit range, used for enforcing common shape for all peaks.
Notes
The initial amplitude for each peak is estimated by taking the counts in the bin closest to the peak’s
x_pos
and scaling this number with an empirically determined constant and the spectrum’smass_number
.
-
detect_peaks
(thres=0.003, window_len=23, window='blackman', width=2e-05, plot_smoothed_spec=True, plot_2nd_deriv=True, plot_detection_result=True)[source]¶ Perform automatic peak detection.
The peak detection routine uses a scaled second derivative of the spectrum
data
after first applying some smoothing. This enables very sensitive yet robust peak detection. The parameters thres, window_len & width can be used to tune the smoothing and peak detection for maximal sensitivity.- Parameters
thres (float, optional) – Threshold for peak detection in the inverted and scaled second derivative of the smoothed spectrum.
window_len (odd int, optional) – Length of window used for smoothing the spectrum (in no. of bins). Must be an ODD integer.
window (str, optional) – The window function used for smooting the spectrum. Defaults to
'blackman'
. Other options:'flat'
,'hanning'
,'hamming'
,'bartlett'
. See also NumPy window functions.width (float [u], optional) – Minimal FWHM of peaks to be detected. Caution: To achieve maximal sensitivity for overlapping peaks this number might have to be set to less than the peak’s FWHM! In challenging cases use the plot of the scaled inverted second derivative (by setting plot_2nd_deriv to
True
) to ensure that the detection threshold is set properly.plot_smoothed_spec (bool, optional) – If
True
a plot with the original and the smoothed spectrum is shown.plot_2nd_deriv (bool, optional) – If
True
a plot with the scaled, inverted second derivative of the smoothed spectrum is shown.plot_detection_result (bool, optional) – If
True
a plot of the spectrum with markers for the detected peaks is shown.
See also
Notes
For details on the smoothing, see docs of
_smooth()
by calling:>>> help(emgfit.spectrum._smooth)
-
determine_A_stat_emg
(peak_index=None, species='?', x_pos=None, x_range=None, N_spectra=1000, fit_model=None, cost_func='MLE', method='least_squares', fit_kws=None, vary_baseline=True, plot_filename=None)[source]¶ Determine the constant of proprotionality A_stat_emg for calculation of the statistical uncertainties of Hyper-EMG fits.
This method updates the
A_stat_emg
&A_stat_emg_error
spectrum attributes. The former will be used for all subsequent stat. error estimations.This routine must be called AFTER a successful peak-shape calibration and should be called BEFORE the mass re-calibration.
A_stat_emg is determined by evaluating the statistical fluctuations of a representative peak’s centroid as a function of the number of ions in the reference peak. The fluctuations are estimated by fitting a large number of synthetic spectra derived from the experimental data via bootstrap re-sampling. For details see Notes section below.
Specify the peak to use for the bootstrap re-sampling by providing either of the peak_index, species and x_pos arguments. The peak should be well-separated and have decent statistics (typically the peak-shape calibrant is used).
- Parameters
peak_index (int, optional) – Index of representative peak to use for bootstrap re-sampling (typically, the peak-shape calibrant). The peak should have high statistics and must be well-separated from other peaks.
species (str, optional) – String with species name of representative peak to use for bootstrap re-sampling (typically, the peak-shape calibrant). The peak should have high statistics and be well-separated from other peaks.
x_pos (float [u], optional) – Marker position (
x_pos
spectrum attribute) of representative peak to use for bootstrap re-sampling (typically, the peak-shape calibrant). The peak should have high statistics and be well- separated from other peaks. x_pos must be specified up to the 6th decimal.x_range (float [u], optional) – Mass range around peak centroid over which events will be sampled and fitted. Choose such that no secondary peaks are contained in the mass range! If
None
defaults todefault_fit_range
spectrum attribute.N_spectra (int, optional, default: 1000) – Number of bootstrapped spectra to create at each number of ions.
cost_func (str, optional, default: 'MLE') –
Name of cost function to use for minimization.
If
'chi-square'
, the fit is performed by minimizing Pearson’s chi-squared statistic:\[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]If
'MLE'
, a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log-likelihood ratio:\[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
For details see Notes section of
peakfit()
method documentation.method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options check arguments of
lmfit.minimizer.minimize()
.fit_kws (dict, optional, default: None) – Options to pass to lmfit minimizer used in
lmfit.model.Model.fit()
method.vary_baseline (bool, optional, default:
True
) – IfTrue
, the constant background will be fitted with a varying uniform baseline parameter bkg_c (initial value: 0.1). IfFalse
, the baseline parameter bkg_c will be fixed to 0.plot_filename (str, optional, default: None) – If not
None
, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.
Notes
As noted in 9, statistical errors of Hyper-EMG peak centroids obey the following scaling with the number of counts in the peak N_counts:
\[\sigma_{stat} = A_{stat,emg} \frac{FWHM}{\sqrt{N_{counts}}},\]where the constant of proportionality A_stat_emg depends on the specific peak shape. This routine uses the following method to determine A_stat_emg:
N_spectra bootstrapped spectra are created for each of the following total numbers of events: [10,30,100,300,1000,3000,10000,30000].
Each bootstrapped spectrum is fitted and the best fit peak centroids are recorded.
The statistical uncertainties are estimated by taking the sample standard deviations of the recorded peak centroids at each value of N_counts. Since the best-fit peak area can deviate from the true number of re-sampled events in the spectrum, the mean best_fit area at each number of re-sampled events is used to determine N_counts.
A_stat_emg is finally determined by plotting the rel. statistical uncertainty as function of N_counts and fitting it with the above equation.
The resulting value for A_stat_emg will be stored as spectrum attribute and will be used in all subsequent fits to calculate the stat. errors from the number of counts in the peak.
References
- 9
San Andrés, Samuel Ayet, et al. “High-resolution, accurate multiple-reflection time-of-flight mass spectrometry for short-lived, exotic nuclei of a few events in their ground and low-lying isomeric states.” Physical Review C 99.6 (2019): 064313.
-
determine_peak_shape
(index_shape_calib=None, species_shape_calib=None, fit_model='emg22', cost_func='chi-square', init_pars='default', x_fit_cen=None, x_fit_range=None, vary_baseline=True, method='least_squares', fit_kws=None, vary_tail_order=True, show_fit_reports=False, show_plots=True, show_peak_markers=True, sigmas_of_conf_band=0, plot_filename=None, map_par_covar=False, **MCMC_kwargs)[source]¶ Determine optimal peak-shape parameters by fitting the specified peak-shape calibrant.
If vary_tail_order is
True
(default) an automatic model selection is performed before the calibration of the peak-shape parameters.It is recommended to visually check whether the fit residuals are purely stochastic (as should be the case for a decent model). If this is not the case either the selected model does not describe the data well, the initial parameters lead to poor convergence or there are additional undetected peaks.
- Parameters
index_shape_calib (int, optional) – Index of shape-calibration peak. Preferrable alternative: Specify the shape-calibrant with the species_shape_calib argument.
species_shape_calib (str, optional) – Species name of the shape-calibrant peak in :-notation of chemical substances (e.g.
'K39:-1e'
). Alternatively, the peak to use can be specified with the index_shape_calib argument.fit_model (str, optional, default: 'emg22') – Name of fit model to use for shape calibration (e.g.
'Gaussian'
,'emg12'
,'emg33'
, … - for full list see Available fit models). If the automatic model selection (vary_tail_order=True) fails or is turned off, fit_model will be used for the shape calibration and set as the spectrum’sfit_model
attribute.cost_func (str, optional, default: 'chi-square') –
Name of cost function to use for minimization. It is strongly recommended to use ‘chi-square’-fitting for the peak-shape determination since this yields more robust results for fits with many model parameters as well as more trustworthy parameter uncertainties (important for peak-shape error determinations).
If
'chi-square'
, the fit is performed by minimizing Pearson’s chi-squared statistic:\[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]If
'MLE'
, a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log likelihood ratio:\[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
For details see Notes section of
peakfit()
method documentation.init_pars (dict, optional) –
Dictionary with initial shape parameter values for fit (optional).
If
None
or'default'
(default), the default parameters defined for mass 100 in theemgfit.fit_models
module will be used after re-scaling to the spectrum’smass_number
.To define custom initial values, a parameter dictionary containing all model parameters and their values in the format
{'<param name>':<param_value>,...}
should be passed to init_pars.
Mind that only the initial values to shape parameters (sigma, theta,`etas` and taus) can be user-defined. The mu parameter will be initialized at the peak’s
x_cen
attribute and the initial peak amplitude amp is automatically estimated from the counts at the bin closest to x_cen. If a varying baseline is used in the fit, the baseline parameter bgd_c is always initialized at a value of 0.1.x_fit_cen (float [u], optional) – Center of fit range. If
None
(default), thex_pos
attribute of the shape-calibrant peak is used as x_fit_cen.x_fit_range (float [u], optional) – Mass range to fit. If
None
, defaults to thedefault_fit_range
spectrum attribute.vary_baseline (bool, optional, default:
True
) – IfTrue
, the background will be fitted with a varying uniform baseline parameter bkg_c (initial value: 0.1). IfFalse
, the baseline parameter bkg_c will be fixed to 0.method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options check arguments of
lmfit.minimizer.minimize()
.fit_kws (dict, optional, default: None) – Options to pass to lmfit minimizer used in
lmfit.model.Model.fit()
method.vary_tail_order (bool, optional) – If
True
(default), before the calibration of the peak-shape parameters an automatized fit model selection is performed. For details on the automatic model selection, see Notes section below. IfFalse
, the specified fit_model argument is used as model for the peak-shape determination.show_fit_reports (bool, optional, default: True) – Whether to print fit reports for the fits in the automatic model selection.
show_plots (bool, optional) – If
True
(default), linear and logarithmic plots of the spectrum and the best fit curve are displayed. For details seespectrum.plot_fit()
.show_peak_markers (bool, optional) – If
True
(default), peak markers are added to the plots.sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best fit curve in sigma.
plot_filename (str, optional, default: None) – If not
None
, the plots of the shape-calibration will be saved to two separate files named ‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.map_par_covar (bool, optional) – If
True
the parameter covariances will be mapped using Markov-Chain Monte Carlo (MCMC) sampling and shown in a corner plot. This feature is only recommended for single-peak fits.**MCMC_kwargs (optional) – Options to send to
_get_MCMC_par_samples()
. Only relevant when map_par_covar is True.
Notes
Ideally the peak-shape calibration is performed on a well-separated peak with high statistics. If this is not possible, the peak-shape calibration can also be attempted using overlapping peaks since emgfit ensures shared and identical shape parameters for all peaks in a multi- peak fit.
Automatic model selection: When the model selection is activated the routine tries to find the peak shape that minimizes the fit’s chi-squared reduced by successively adding more tails on the right and left. Finally, that fit model is selected which yields the lowest chi-squared reduced without having any of the tail weight parameters eta compatible with zero within 1-sigma uncertainty. The latter models are excluded as is this an indication of overfitting. Models for which the calculation of any eta parameter uncertainty fails are likewise excluded from selection.
-
fit_calibrant
(index_mass_calib=None, species_mass_calib=None, fit_model=None, cost_func='MLE', x_fit_cen=None, x_fit_range=None, vary_baseline=True, method='least_squares', fit_kws=None, show_plots=True, show_peak_markers=True, sigmas_of_conf_band=0, show_fit_report=True, plot_filename=None)[source]¶ Determine mass re-calibration factor by fitting the selected calibrant peak.
After the mass calibrant has been fitted the recalibration factor and its uncertainty are calculated and saved as the spectrum’s
recal_fac
andrecal_fac_error
attributes.The calibrant peak can either be specified with the index_mass_calib or the species_mass_calib argument.
- Parameters
index_mass_calib (int, optional) – Index of mass calibrant peak.
species_mass_calib (str, optional) – Species of peak to use as mass calibrant.
fit_model (str, optional, default:
'emg22'
) – Name of fit model to use (e.g.'Gaussian'
,'emg12'
,'emg33'
, … - for full list see Available fit models).cost_func (str, optional, default: 'chi-square') –
Name of cost function to use for minimization.
If
'chi-square'
, the fit is performed by minimizing Pearson’s chi-squared statistic:\[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]If
'MLE'
, a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log likelihood ratio:\[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
See Notes of
spectrum.peakfit()
for more details.x_fit_cen (float or None, [u], optional) – center of mass range to fit; if None, defaults to marker position (x_pos) of mass calibrant peak
x_fit_range (float [u], optional) – width of mass range to fit; if None, defaults to ‘default_fit_range’ spectrum attribute
vary_baseline (bool, optional, default:
True
) – IfTrue
, the constant background will be fitted with a varying uniform baseline parameter bkg_c (initial value: 0.1). IfFalse
, the baseline parameter bkg_c will be fixed to 0.method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options check arguments of
lmfit.minimizer.minimize()
.fit_kws (dict, optional, default: None) – Options to pass to lmfit minimizer used in
lmfit.model.Model.fit()
method.show_plots (bool, optional) – If
True
(default) linear and logarithmic plots of the spectrum with the best fit curve are displayed. For details seespectrum.plot_fit()
.show_peak_markers (bool, optional) – If
True
(default) peak markers are added to the plots.sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best fit curve in sigma. Note that the confidence band is only derived from the uncertainties of the parameters that are varied during the fit.
show_fit_report (bool, optional) – If
True
(default) the fit results are reported.plot_filename (str, optional, default: None) – If not
None
, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.
See also
Notes
The
spectrum.fit_peaks()
method enables the simultaneous fitting of mass calibrant and ions of interest in a single multi-peak fit and can be used as an alternative to this method.After the calibrant fit the
spectrum._eval_peakshape_errors()
method is automatically called to save the absolute calibrant centroid shifts as preparation for subsequent peak-shape error determinations.Since the spectrum has already been coarsely calibrated via the time- resolved calibration in the MR-TOF-MS’s data acquisition software MAc, the recalibration (or precision calibration) factor is usually very close to unity. An error will be raised by the
spectrum._update_calibrant_props()
method ifspectrum.recal_fac
deviates from unity by more than a permille since this causes some implicit approximations for the calculation of the final mass values and their uncertainties to break down.The statistical uncertainty of the peak is calculated via the following relation 10:
For Gaussians the constant of proportionality \(A_{stat}\) is always given by \(A_{stat,G}\) = 0.425. For Hyper-EMG models \(A_{stat}=A_{stat,emg}\) is either set to the default value A_stat_emg_default defined in the
config
module or determined by running thespectrum.determine_A_stat_emg()
method. The latter is usually preferable since this accounts for the specifics of the given peak shape.References
- 10
San Andrés, Samuel Ayet, et al. “High-resolution, accurate multiple-reflection time-of-flight mass spectrometry for short-lived, exotic nuclei of a few events in their ground and low-lying isomeric states.” Physical Review C 99.6 (2019): 064313.
-
fit_peaks
(peak_indeces=[], index_mass_calib=None, species_mass_calib=None, x_fit_cen=None, x_fit_range=None, fit_model=None, cost_func='MLE', method='least_squares', fit_kws=None, init_pars=None, vary_shape=False, vary_baseline=True, show_plots=True, show_peak_markers=True, sigmas_of_conf_band=0, plot_filename=None, show_fit_report=True, show_shape_err_fits=False)[source]¶ Fit peaks, update peaks properties and show results.
By default, the full mass range and all peaks in the spectrum are fitted. Optionally, only peaks specified with peak_indeces or peaks in the mass range specified with x_fit_cen and x_fit_range are fitted.
Optionally, the mass recalibration can be performed simultaneously with the IOI fit if the mass calibrant is in the fit range and specified with either the index_mass_calib or species_mass_calib arguments. Otherwise a mass recalibration must have been performed upfront.
Before running this method a successful peak-shape calibration must have been performed with
determine_peak_shape()
.- Parameters
peak_indeces (list of int, optional) – List of neighbouring peaks to fit. The fit range will be chosen such that at least a mass range of x_fit_range/2 is included around each peak.
x_fit_cen (float [u], optional) – Center of mass range to fit (only specify if a subset of the spectrum is to be fitted)
x_fit_range (float [u], optional) – Width of mass range to fit. If
None
defaults to:spectrum.default_fit_range
attribute, only specify if subset of spectrum is to be fitted. This argument is only relevant if x_fit_cen is also specified.fit_model (str, optional) – Name of fit model to use (e.g.
'Gaussian'
,'emg12'
,'emg33'
, … - for full list see Available fit models). IfNone
, defaults tofit_model
spectrum attribute.cost_func (str, optional, default: 'chi-square') –
Name of cost function to use for minimization.
If
'chi-square'
, the fit is performed by minimizing Pearson’s chi-squared statistic:\[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)^2}.\]If
'MLE'
, a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log likelihood ratio:\[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
See Notes of
peakfit()
method for details.method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options check arguments of
lmfit.minimizer.minimize()
.fit_kws (dict, optional, default: None) – Options to pass to lmfit minimizer used in
lmfit.model.Model.fit()
method.init_pars (dict, optional) –
Dictionary with initial shape parameter values for fit (optional).
If
None
(default) the parameters from the shape calibration (peak_shape_pars
spectrum attribute) are used.If
'default'
, the default parameters defined for mass 100 in theemgfit.fit_models
module will be used after re-scaling to the spectrum’smass_number
.To define custom initial values a parameter dictionary containing all model parameters and their values in the format
{'<param name>':<param_value>,...}
should be passed to init_pars.
Mind that only the initial values to shape parameters (sigma, theta,`etas` and taus) can be user-defined. The mu parameter will be initialized at the peak’s
x_cen
attribute and the initial peak amplitude amp is automatically estimated from the counts at the bin closest to x_cen. If a varying baseline is used in the fit, the baseline parameter bgd_c is always initialized at a value of 0.1.vary_shape (bool, optional, default:
False
) – IfFalse
peak-shape parameters (sigma, theta,`etas` and taus) are kept fixed at their initial values. IfTrue
the shared shape parameters are varied (ensuring identical shape parameters for all peaks).vary_baseline (bool, optional, default:
True
) – IfTrue
, the constant background will be fitted with a varying uniform baseline parameter bkg_c (initial value: 0.1). IfFalse
, the baseline parameter bkg_c will be fixed to 0.show_plots (bool, optional) – If
True
(default) linear and logarithmic plots of the spectrum with the best fit curve are displayed. For details seespectrum.plot_fit()
.show_peak_markers (bool, optional) – If
True
(default) peak markers are added to the plots.sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best-fit curve in sigma. Note that the confidence band is only derived from the uncertainties of the parameters that are varied during the fit.
plot_filename (str, optional, default: None) – If not
None
, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.show_fit_report (bool, optional) – If
True
(default) the detailed lmfit fit report is printed.show_shape_err_fits (bool, optional, default: True) – If
True
, plots of all fits performed for the peak-shape uncertainty evaluation are shown.
Notes
Updates peak properties dataframe with peak properties obtained in fit.
-
get_MC_peakshape_errors
(peak_indeces=[], verbose=True, show_hists=False, show_peak_properties=True, rerun_MCMC_sampling=False, N_samples=1000, n_cores=- 1, seed=872, **MCMC_kwargs)[source]¶ Get peak-shape uncertainties by re-fitting peaks with many different MC-shape-parameter sets
This method provides refined peak-shape uncertainties that account for non-normal distributions and correlations of shape parameters. To that end, the peaks of interest are re-fitted with N_samples different peak-shape parameter sets. For these parameter sets to be representative of all peak shapes supported by the data they are randomly drawn from a larger ensemble of parameter sets obtained from Markov-Chain Monte Carlo (MCMC) sampling on the peak-shape calibrant. The peak-shape uncertainty of the mass values and peak areas are estimated by the obtained RMS deviations from the best-fit values. Finally, the peak properties table is updated with the refined uncertainties.
This method only takes effective mass shifts relative to the calibrant peak into account. For each peak shape the calibrant peak is re-fitted and the respective recalibration factors are used to calculate shifted ion-of-interest masses. Therefore, the mass calibrant must be included in peak_indeces.
When the `peak_indeces` argument is used, it must include the mass calibrant.
- Parameters
peak_indeces (int or list of int, optional) – Indeces of peaks to evaluate MC peak-shape uncertainties for.
verbose (bool, optional) – Whether to print status updates and intermediate results.
show_hists (bool, optional) – If True histograms of the effective mass shifts and peak areas obtained with the MC shape parameter sets are shown. Black vertical lines indicate the best-fit values obtained with
fit_peaks()
.show_peak_properties (bool, optional) – If True the peak properties table including the updated peak-shape uncertainties is shown.
rerun_MCMC_sampling (bool, optional) – When False (default) pre-existing MCMC parameter samples (e.g. obtained with
determine_peak_shape()
) are used. If True or when there’s no pre-existing MCMC samples, the MCMC sampling will be performed by this method.N_samples (int, optional) – Number of different shape parameter sets to use. Defaults to 1000.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated spectra. When set to -1 (default) all available cores are used.
seed (int, optional) – Random seed to use for reproducibility. Defaults to 872.
**MCMC_kwargs – Keyword arguments to send to
_get_MCMC_par_samples()
for control over the MCMC sampling.
Notes
This method relies on a representative sample of all the shape parameter sets which are supported by the data. These shape parameter sets are randomly drawn from a large sample of parameter sets obtained from Markov-Chain Monte Carlo (MCMC) sampling on the peak-shape calibrant. In MCMC sampling so-called walkers are sent on random walks to explore the parameter space. The latter is done with the
_get_MCMC_par_samples()
method. If MCMC sampling has already been performed with the map_par_covar option indetermine_peak_shape()
, these MCMC samples will be used for the MC peak-shape error evaluation. If there is no pre-existing MCMC parameter sets the_get_MCMC_par_samples()
method will be automatically evoked before the MC peak-shape error evaluation.Assuming that the samples obtained with the MCMC algorithm form a representative set of parameter samples and are sufficiently independent from each other, this method provides refined peak-shape uncertainties that account for correlations and non-normal posterior distributions of peak-shape parameters. In particular, this prevents overestimation of the uncertainties due to non-consideration of parameter correlations.
For this method to be accurate a sufficiently large number of MCMC sampling steps should be performed and fits should be performed with a large number of parameter sets (
N_samples >= 1000
). For the MCMC parameter samples to be independent a sufficient amount of thinning has to be applied to remove autocorrelation between MCMC samples. Thinning refers to the common practice of only storing the results of every k-th MCMC iteration. The length and thinning of the MCMC chain is controlled with the steps and thin MCMC keyword arguments. For more details and references on MCMC sampling with emgfit see the docs of the underlying_get_MCMC_par_samples()
method.For the peak-shape mass errors only effective mass shifts relative to the calibrant centroid are relevant. Therefore the mass calibrant and the ions of interest (IOI) are fitted with the same peak-shape-parameter sets and the final mass values are calculated with the obtained IOI centroids and the respective mass recalibration factors.
This method only takes effective mass shifts relative to the calibrant peak into account. For each peak shape the calibrant peak is re-fitted and the respective recalibration factors are used to calculate the shifted ion-of-interest masses.
-
get_errors_from_resampling
(peak_indeces=[], N_spectra=1000, n_cores=- 1, show_hists=False, show_peak_properties=True)[source]¶ Get statistical and area uncertainties via resampling from best-fit PDF and update peak properties therewith.
This method provides bootstrap estimates of the statistical errors and peak area errors by evaluating the scatter of peak centroids and areas in fits of many simulated spectra. The simulated spectra are created by sampling events from the best-fit PDF asociated with fit_result (parametric bootstrap). Refined errors are calculated for each peak individually by taking the sample standard deviations of the obtained peak centroids and areas.
If the peaks in peak_indeces have been fitted separately a parametric bootstrap will be performed for each of the different fits.
- Parameters
peak_indeces (list, optional) – List containing indeces of peaks to determine refined stat. and area errors for, e.g. to evaluate peak-shape error of peaks 1 and 2 use
peak_indeces=[1,2]
. Defaults to all peaks in the spectrum’speaks
list.N_spectra (int, optional) – Number of simulated spectra to fit. Defaults to 1000, which typically yields statistical uncertainty estimates with a relative precision of a few percent.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated spectra. When set to -1 (default) all available cores are used.
show_hists (bool, optional, default: False) – If True, histograms of the obtained peak centroids and areas are shown. Black vertical lines indicate the best-fit values obtained from the measured data.
show_peak_properties (bool, optional, default: True) – If True, the peak properties table is shown after updating the statistical and area errors.
See also
Notes
Only the statistical mass and area uncertainties of ion-of-interest peaks are updated. The uncertainties of the mass calibrant and the recalibration uncertainty remain unaffected by this method.
-
load_peak_shape_cal
(filename)[source]¶ Load peak shape from the TXT-file named ‘filename.txt’.
Successfully loaded shape calibration parameters and their uncertainties are used as the new
shape_cal_pars
andshape_cal_errors
spectrum attributes respectively.- Parameters
filename (str) – Name of input file (‘.txt’ extension is automatically appended).
-
parametric_bootstrap
(fit_result, peak_indeces=[], N_spectra=1000, n_cores=- 1, show_hists=False)[source]¶ Get statistical and area uncertainties via resampling from best-fit PDF.
This method is primarily for internal usage.
This method provides bootstrap estimates of the statistical errors and peak area errors by evaluating the scatter of peak centroids and areas in fits of many simulated spectra. The simulated spectra are created by sampling events from the best-fit PDF asociated with fit_result (parametric bootstrap). Refined errors are calculated for each peak individually by taking the sample standard deviations of the obtained peak centroids and areas.
All peaks for which refined errors are to be evaluated must belong to the same lmfit ModelResult `fit_result`. Even if refined stat. errors are only to be extracted for a subset of the peaks contained in `fit_result` (as specified with `peak_indeces`), fits will be re-performed over the same x-range as `fit_result`.
- Parameters
fit_result (
lmfit.model.ModelResult
) – Fit result object to evaluate statistical errors for.peak_indeces (list, optional) – List containing indeces of peaks to determine refined stat. errors for, e.g. to evaluate peak-shape error of peaks 1 and 2 use
peak_indeces=[1,2]
. Listed peaks must be included in fit_result. Defaults to all peaks contained in fit_result.N_spectra (int, optional) – Number of simulated spectra to fit. Defaults to 1000, which typically yields statistical uncertainty estimates with a relative precision of a few percent.
n_cores (int, optional) – Number of CPU cores to use for parallelized fitting of simulated spectra. When set to -1 (default) all available cores are used.
show_hists (bool, optional, default: False) – If
True
, histograms of the obtained peak centroids and areas are shown. Black vertical lines indicate the best-fit values obtained from the measured data.
- Returns
Array with statistical errors [u], array with area errors [u] Array elements correspond to the results for the peaks selected in peak_indeces (in ascending order). If peak_indeces has not been specified it defaults to the indeces of all peaks contained in fit_result.
- Return type
See also
-
peakfit
(fit_model='emg22', cost_func='chi-square', x_fit_cen=None, x_fit_range=None, init_pars=None, vary_shape=False, vary_baseline=True, method='least_squares', fit_kws=None, show_plots=True, show_peak_markers=True, sigmas_of_conf_band=0, plot_filename=None, map_par_covar=False, **MCMC_kwargs)[source]¶ Internal routine for fitting peaks.
Fits full spectrum or subrange (if x_fit_cen and x_fit_range are specified) and optionally shows results.
This method is for internal usage. Use :meth:`spectrum.fit_peaks` method to fit peaks and automatically update peak properties dataframe with obtained fit results!
- Parameters
fit_model (str, optional, default:
'emg22'
) – Name of fit model to use (e.g.'Gaussian'
,'emg12'
,'emg33'
, … - for full list see Available fit models).cost_func (str, optional, default: 'chi-square') –
Name of cost function to use for minimization.
If
'chi-square'
, the fit is performed by minimizing Pearson’s chi-squared statistic:\[\chi^2_P = \sum_i \frac{(f(x_i) - y_i)^2}{f(x_i)}.\]If
'MLE'
, a binned maximum likelihood estimation is performed by minimizing the (doubled) negative log-likelihood ratio:\[L = 2\sum_i \left[ f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right)\right]\]
See Notes below for details.
x_fit_cen (float [u], optional) – Center of mass range to fit (only specify if subset of spectrum is to be fitted).
x_fit_range (float [u], optional) – Width of mass range to fit (only specify if subset of spectrum is to be fitted, only relevant if x_fit_cen is likewise specified). If
None
, defaults todefault_fit_range
spectrum attribute.init_pars (dict, optional) –
Dictionary with initial shape parameter values for fit (optional).
If
None
(default) the parameters from the shape calibration are used (if no shape calibration has been performed yet the default parameters defined for mass 100 in theemgfit.fit_models
module will be used after re-scaling to the spectrum’smass_number
).If
'default'
, the default parameters defined for mass 100 in theemgfit.fit_models
module will be used after re-scaling to the spectrum’smass_number
.To define custom initial values a parameter dictionary containing all model parameters and their values in the format
{'<param name>':<param_value>,...}
should be passed to init_pars. Mind that only the initial values to shape parameters (sigma, theta,`etas` and taus) can be user-defined. The mu parameter will be initialized at the peak’sx_cen
attribute and the initial peak amplitude amp is automatically estimated from the counts at the bin closest to x_cen. If a varying baseline is used in the fit, the baseline parameter bgd_c is always initialized at a value of 0.1.
vary_shape (bool, optional, default:
False
) – IfFalse
peak-shape parameters (sigma, theta,`etas` and taus) are kept fixed at their initial values. IfTrue
the shared shape parameters are varied (ensuring identical shape parameters for all peaks).vary_baseline (bool, optional, default:
True
) – IfTrue
, the constant background will be fitted with a varying uniform baseline parameter bkg_c (initial value: 0.1). IfFalse
, the baseline parameter bkg_c will be fixed to 0.method (str, optional, default: ‘least_squares’) – Name of minimization algorithm to use. For full list of options check arguments of
lmfit.minimizer.minimize()
.fit_kws (dict, optional, default: None) – Options to pass to lmfit minimizer used in
lmfit.model.Model.fit()
method.show_plots (bool, optional) – If
True
(default) linear and logarithmic plots of the spectrum with the best fit curve are displayed. For details seespectrum.plot_fit()
.show_peak_markers (bool, optional) – If
True
(default) peak markers are added to the plots.sigmas_of_conf_band (int, optional, default: 0) – Confidence level of confidence band around best fit curve in sigma. Note that the confidence band is only derived from the uncertainties of the parameters that are varied during the fit.
plot_filename (str, optional, default: None) – If not
None
, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ and ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.map_par_covar (bool, optional) – If
True
the parameter covariances will be mapped using Markov-Chain Monte Carlo (MCMC) sampling and shown in a corner plot. This feature is only recommended for single-peak fits.**MCMC_kwargs (optional) – Options to send to
_get_MCMC_par_samples()
. Only relevant when map_par_covar is True.
- Returns
Fit model result.
- Return type
See also
Notes
In fits with the
chi-square
cost function the variance weights \(w_i\) for the residuals are estimated using the latest model predictions: \(w_i = 1/(\sigma_i^2 + \epsilon) = 1/(f(x_i)+ \epsilon)\), where \(\epsilon = 1e-10\) is a small number added to increase numerical robustness when \(f(x_i)\) approaches zero. On each iteration the weights are updated with the new values of the model function.When performing
MLE
fits including bins with low statistics the value for chi-squared as well as the parameter standard errors and correlations in the lmfit fit report should be taken with caution. This is because strictly speaking emgfit’sMLE
cost function only approximates a chi-squared distribution in the limit of a large number of counts in every bin (“Wick’s theorem”). For a detailed derivation of this statement see pp. 94-95 of these lecture slides by Mark Thompson. In practice and if needed, one can simply test the validity of the reported fit statistic as well as parameter standard errors & correlations by re-performing the same fit with cost_func=’chi-square’ and comparing the results. In all tested cases decent agreement was found even if the fit range contained low-statistics bins. Even if a deviation occurs this is irrelevant in most pratical cases since the mass errors reported in emgfit’s peak properties table are independent of the lmfit parameter standard errors given as additional information below. Only the peak area errors are by default calculated using the standard errors of the amp parameters reported by lmfit.Besides the asymptotic concergence to a chi-squared distribution emgfit’s
MLE
cost function has a second handy property - all summands in the log-likelihood ratio are positive semi-definite: \(L_i = f(x_i) - y_i + y_i ln\left(\frac{y_i}{f(x_i)}\right) \geq 0\). Exploiting this property, the minimization of the log-likelihood ratio can be re-formulated into a least-squares problem (see also 11):\[L = 2\sum_i L_i = 2\sum_i \left( \sqrt{ L_i } \right)^2.\]Instead of minimizing the scalar log-likelihood ratio, emgfit by default minimizes the sum-of-squares of the square-roots of the summands \(L_i\) in the log-likelihood ratio. This is mathematically equivalent to minimizing \(L\) and facilitates the usage of Scipy’s highly efficient least-squares optimizers (‘least_squares’ & ‘leastsq’). The latter yield significant speed-ups compared to scalar optimizers such as Scipy’s ‘Nelder-Mead’ or ‘Powell’ methods. By default, emgfit’s ‘MLE’ fits are performed with Scipy’s ‘least_squares’ optimizer, a variant of a Levenberg-Marquardt algorithm for bound-constrained problems. If a scalar optimizaton method is chosen emgfit uses the conventional approach of minimizing the scalar \(L\). For more details on these optimizers see the docs of
lmfit.minimizer.minimize()
andscipy.optimize
.References
- 11
Ross, G. J. S. “Least squares optimisation of general log-likelihood functions and estimation of separable linear parameters.” COMPSTAT 1982 5th Symposium held at Toulouse 1982. Physica, Heidelberg, 1982.
-
plot
(peaks=None, title='', fig=None, yscale='log', vmarkers=None, thres=None, ymin=None, xmin=None, xmax=None)[source]¶ Plot mass spectrum (without fit curve).
Vertical markers are added for all peaks specified with peaks.
- Parameters
peaks (list of
peaks
, optional) – List ofpeaks
to show peak markers for. Defaults topeaks
.title (str, optional) – Optional plot title.
fig (
matplotlib.pyplot.figure
, optional) – Figure object to plot onto.yscale (str, optional) – Scale of y-axis (
'lin'
for logarithmic,'log'
for logarithmic), defaults to'log'
.vmarkers (list of float [u], optional) – List with mass positions [u] to add vertical markers at.
thres (float, optional) – y-level to add horizontal marker at (e.g. for indicating set threshold in peak detection).
ymin (float, optional) – Lower bound of y-range to plot.
xmin (float [u], optional) – Lower/upper bound of mass range to plot.
xmax (float [u], optional) – Lower/upper bound of mass range to plot.
See also
-
plot_fit
(fit_result=None, plot_title=None, show_peak_markers=True, sigmas_of_conf_band=0, x_min=None, x_max=None, plot_filename=None)[source]¶ Plot data and fit result in logarithmic and linear y-scale.
Only a single fit result can be plotted with this method. If neither fit_result nor x_min and x_max are specified, the full mass range is plotted.
Plots can be saved to a file using the plot_filename argument.
- Parameters
fit_result (
lmfit.model.ModelResult
, optional, default:None
) – Fit result to plot. IfNone
, defaults to fit result of first peak in specified mass range (taken fromfit_results
list).plot_title (str or None, optional) – Title of plots. If
None
, defaults to a string with the fit model name and cost function of the fit_result to ensure clear indication of how the fit was obtained.show_peak_markers (bool, optional, default:
True
) – IfTrue
, peak markers are added to the plots.sigmas_of_conf_band (int, optional, default: 0) – Coverage probability of confidence band in sigma (only shown in log-plot). If
0
, no confidence band is shown (default).x_min (float [u], optional) – Start and end of mass range to plot. If
None
, defaults to the minimum and maximum of the spectrum’s massdata
.x_max (float [u], optional) – Start and end of mass range to plot. If
None
, defaults to the minimum and maximum of the spectrum’s massdata
.plot_filename (str, optional, default: None) – If not
None
, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ & ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.
-
plot_fit_zoom
(peak_indeces=None, x_center=None, x_range=0.01, plot_title=None, show_peak_markers=True, sigmas_of_conf_band=0, plot_filename=None)[source]¶ Show logarithmic and linear plots of data and fit curve zoomed to peaks or mass range of interest.
There is two alternatives to define the plots’ mass ranges:
Specifying peaks-of-interest with the peak_indeces argument. The mass range is then automatically chosen to include all peaks of interest. The minimal mass range to include around each peak of interest can be adjusted using x_range.
Specifying a mass range of interest with the x_center and x_range arguments.
- Parameters
peak_indeces (int or list of ints, optional) – Index of single peak or indeces of multiple neighboring peaks to show (peaks must belong to the same
fit_result
).x_center (float [u], optional) – Center of manually specified mass range to plot.
x_range (float [u], optional, default: 0.01) – Width of mass range to plot around ‘x_center’ or minimal mass range to include around each specified peak of interest.
plot_title (str or None, optional) – Title of plots. If
None
, defaults to a string with the fit model name and cost function of the fit_result to ensure clear indication of how the fit was obtained.show_peak_markers (bool, optional, default:
True
) – IfTrue
, peak markers are added to the plots.sigmas_of_conf_band (int, optional, default: 0) – Coverage probability of confidence band in sigma (only shown in log-plot). If
0
, no confidence band is shown (default).x_min (float [u], optional) – Start and end of mass range to plot. If
None
, defaults to the minimum and maximum of the spectrum’s massdata
is used.x_max (float [u], optional) – Start and end of mass range to plot. If
None
, defaults to the minimum and maximum of the spectrum’s massdata
is used.plot_filename (str or None, optional, default: None) – If not
None
, the plots will be saved to two separate files named ‘<plot_filename>_log_plot.png’ & ‘<plot_filename>_lin_plot.png’. Caution: Existing files with identical name are overwritten.
-
remove_peak
(peak_index=None, x_pos=None, species='?')[source]¶ Remove specified peak from the spectrum’s
peaks
list.Select the peak to be removed by specifying either the respective peak_index, species label or peak marker position x_pos.
- Parameters
peak_index (int or list of int, optional) – Indeces of peak(s) to remove from the spectrum’s
peaks
list (0-based!).x_pos (float or list of float [u]) –
x_pos
of peak(s) to remove from the spectrum’speaks
list. Peak marker positions must be specified up to the 6th decimal.species (str or list of str) –
species
label(s) of peak(s) to remove from the spectrum’speaks
list.
Note
This method is deprecated in v0.1.1 and will likely be removed in future versions, use :meth:`~spectrum.remove_peaks` instead!
-
remove_peaks
(peak_indeces=None, x_pos=None, species='?', verbose=True)[source]¶ Remove specified peak(s) from the spectrum’s
peaks
list.Select the peak(s) to be removed by specifying either the respective peak_indeces, species label(s) or peak marker position(s) x_pos. To remove multiple peaks at once, pass a list to one of the above arguments.
- Parameters
peak_indeces (int or list of int, optional) – Indeces of peak(s) to remove from the spectrum’s
peaks
list (0-based!).x_pos (float or list of float [u]) –
x_pos
of peak(s) to remove from the spectrum’speaks
list. Peak marker positions must be specified up to the 6th decimal.species (str or list of str) –
species
label(s) of peak(s) to remove from the spectrum’speaks
list.
Notes
The current
peaks
list can be viewed by calling theshow_peak_properties()
spectrum method.Added in version 0.2.0 (as successor method to remove_peak)
-
save_peak_shape_cal
(filename)[source]¶ Save peak shape parameters to a TXT-file.
- Parameters
filename (str) – Name of output file (‘.txt’ extension is automatically appended).
-
save_results
(filename, save_plots=True)[source]¶ Write the fit results to a XLSX file and the peak-shape calibration to a TXT file.
Write results to an XLSX Excel file named <filename>_results.xlsx and save peak-shape calibration parameters to TXT file named <filename>_peakshape_calib.txt.
The EXCEL file contains the following three worksheets:
general spectrum properties
peak properties and images of all obtained fit curves
results of the regular peakshape-error evaluation in which shape parameters are varied by +-1 sigma
By default, PNG images of all peak fits are saved to PNG-images in both linear and logarithmic scale.
- Parameters
filename (string) – Prefix of the files to be saved to (any provided file extensions are automatically removed and the necessary .xlsx & .txt extensions are appended).
save_plots (bool, optional) – Whether to save images of all obtained fit curves to separate PNG files (default: True).
-
set_blinded_peaks
(indeces, overwrite=False)[source]¶ Specify for which peaks mass values will be hidden for blind analysis
This method adds peaks to the spectrum’s list of
blinded_peaks
. For these peaks, the obtained mass values in the peak properties table and the peak position parameters mu in fit reports will be hidden. Literature values and mass uncertainties remain visible. All results are unblinded upon export with thesave_results()
method.- Parameters
indeces (int or list of int) – Indeces of peaks of interest whose obtained mass values are to be blinded.
overwrite (bool, optional, default: False) – If False (default), the specified indeces are added to the
blinded_peaks
list. If True, the currentblinded_peaks
list is replaced by the specified indeces.
Examples
Activate blinding for peaks 0 & 3 of spectrum object spec:
>>> spec.set_blinded_peaks([0,3])
Add peak 3 to list of blinded peaks:
>>> spec.set_blinded_peaks([3])
Turn off blinding by resetting the blinded peaks attribute to an empty list:
>>> spec.set_blinded_peaks([], overwrite=True)