finchnmr package

Submodules

finchnmr.analysis module

Tools to analyze models.

Authors: Nathan A. Mahynski

class finchnmr.analysis.Analysis(model: _Model)

Bases: object

Set of analysis methods for analyzing fitted models.

build_residual() → Substance

Create an artificial substance whose spectrum is the residual (target spectrum - model reconstruction).

Returns:

residual – Artificial substance whose spectrum is the residual.

Return type:

substance.Substance
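The residual is the target spectrum minus the model's reconstruction. The sketch below shows one way to compute it. It assumes a linear model whose reconstruction is the library matrix (one flattened spectrum per column, as in Library.X) multiplied by the importances, plus an optional intercept. The helper `residual_spectrum` is hypothetical and is not part of finchnmr.

```python
import numpy as np


def residual_spectrum(target_2d, library_X, coefs, intercept=0.0):
    """Hypothetical helper: target minus linear reconstruction, reshaped to 2D.

    Assumes library_X has shape (n_points, n_substances), each column being a
    flattened spectrum, and that target_2d flattens in the same (row-major) order.
    """
    reconstruction = library_X @ coefs + intercept  # shape (n_points,)
    residual_flat = target_2d.ravel() - reconstruction
    return residual_flat.reshape(target_2d.shape)


# Toy example: a 2x2 "spectrum" and a library of two substances.
library_X = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [1.0, 1.0],
                      [0.0, 0.0]])
target = np.array([[2.0, 1.0],
                   [3.0, 0.5]])
res = residual_spectrum(target, library_X, np.array([2.0, 1.0]))
```

Here the reconstruction accounts for every point except the last one, so only that point is left in the residual.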

get_top_substances(k: int = 5, index: bool = False) → tuple[Union[list['substance.Substance'], list[int]], list[float]]

Retrieve the most important substances to the model.

Parameters:
  • k (int, optional(default=5)) – Number of most important substances to retrieve. If -1, retrieve all of them.

  • index (bool, optional(default=False)) – If True, return the indices of the substances in the model’s library instead of the substances themselves.

Returns:

  • top_substances (list(Substance) or list(int)) – The most important substances, sorted from highest to lowest by the absolute value of their importance. If index=True, this is a list of integers giving the index of each substance in the model’s library.

  • top_importances (list(float)) – Importance of each substance, sorted from highest to lowest by the absolute value of their importance.
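The ranking described above can be sketched in plain Python. The sketch assumes that the importances are a sequence aligned with the library order (for LASSO, the coefficients). It also assumes that the returned importances keep their sign, and only the ordering uses their absolute value. `top_k_by_importance` is a hypothetical name.

```python
def top_k_by_importance(importances, k=5):
    """Hypothetical helper: indices and values of the k largest |importance|.

    k = -1 returns every entry, mirroring the documented behaviour.
    """
    order = sorted(range(len(importances)),
                   key=lambda i: abs(importances[i]),
                   reverse=True)
    if k != -1:
        order = order[:k]
    return order, [importances[i] for i in order]


indices, values = top_k_by_importance([0.1, -0.8, 0.0, 0.5], k=2)
# indices == [1, 3], values == [-0.8, 0.5]
```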

plot_residual(**kwargs)

Plot the residual (target - reconstructed) spectrum.

An artificial substance representing the residual is created (see build_residual) and plotted. You can therefore control the plot's appearance with the keyword arguments accepted by substance.Substance.plot.

Parameters:

kwargs (dict, optional(default=None)) – Keyword arguments for substance.Substance.plot.

Returns:

  • By default, or if kwargs[‘backend’] == ‘mpl’

    image : matplotlib.image.AxesImage

    HSQC NMR residual spectrum as an image.

    colorbar : matplotlib.colorbar.Colorbar

    Colorbar to go with the image.

  • if kwargs[‘backend’] == ‘plotly’

    image : plotly.graph_objs._figure.Figure

    HSQC NMR residual spectrum as an image.

Example

>>> a = Analysis(...)
>>> a.plot_residual(absolute_values=True, backend='mpl')
>>> a.plot_residual(absolute_values=True, backend='plotly', cmap='viridis')

plot_top_importances(k: int = 5, by_name: bool = False, figsize: tuple[int, int] | None = None, backend: str = 'mpl')

Plot the importances of the top substances in the model.

Parameters:
  • k (int, optional(default=5)) – Number of top importances to plot. If -1 then plot them all.

  • by_name (bool, optional(default=False)) – By default, HSQC NMR spectra are labeled by their integer index in the library. If True, the associated substance name is used instead.

  • figsize (tuple(int, int), optional(default=None)) – Size of the final figure.

  • backend (str, optional(default='mpl')) – Plotting library to use; the default ‘mpl’ uses matplotlib and is not interactive, while ‘plotly’ will yield interactive plots.

Returns:

  • if backend == ‘mpl’

    axes : matplotlib.pyplot.Axes

    Axes holding a horizontal bar chart of the importances in descending order.

  • if backend == ‘plotly’

    figure : plotly.graph_objs._figure.Figure

    Horizontal bar chart of the importances in descending order.

plot_top_spectra(k: int = 5, plot_width: int = 3, figsize: tuple[int, int] | None = (10, 5)) → ndarray[Any, dtype[_ScalarType_co]]

Plot the HSQC NMR spectra that are most important to the model using matplotlib.

To visualize these results using another plotting backend, such as plotly, use .get_top_substances and create subplots as desired.

Parameters:
  • k (int, optional(default=5)) – Number of most important spectra to plot. If -1 then plot them all.

  • plot_width (int, optional(default=3)) – Number of subplots the grid will have along its width.

  • figsize (tuple(int, int), optional(default=(10,5))) – Size of final figure.

Returns:

axes – Flattened array of axes on which the top HSQC NMR spectra are plotted.

Return type:

ndarray(matplotlib.pyplot.Axes, ndim=1)

finchnmr.analysis.plot_stacked_importances(optimized_models: list[Any], figsize: tuple[int, int] | None = None, backend: str = 'mpl', **imshow_kwargs: Any)

Plot the importance values of a list of fitted models.

Parameters:
  • optimized_models (list) – List of fitted models (see optimize_models).

  • figsize (tuple(int, int), optional(default=None)) – Figure size; this is currently only supported for the matplotlib backend.

  • backend (str, optional(default='mpl')) – Plotting library to use; the default ‘mpl’ uses matplotlib and is not interactive, while ‘plotly’ will yield interactive plots.

  • imshow_kwargs (dict, optional(default=None)) – Additional keyword arguments for the imshow() function of the chosen backend, e.g., “cmap” for matplotlib or “color_continuous_scale” for plotly.

Returns:

  • if backend == ‘mpl’

    image : matplotlib.image.AxesImage

    Feature importances as an image of a grid where each column corresponds to a different model and each row to a different feature (in the unrolled HSQC NMR spectrum).

    colorbar : matplotlib.pyplot.colorbar

    Colorbar to go with the image.

  • if backend == ‘plotly’

    image : plotly.graph_objs._figure.Figure

    Feature importances as an image of a grid where each column corresponds to a different model and each row to a different feature (in the unrolled HSQC NMR spectrum).

Example

>>> optimized_models, analyses = finchnmr.model.optimize_models(...)
>>> plot_stacked_importances(optimized_models, backend='mpl', cmap='RdBu')
>>> plot_stacked_importances(optimized_models, backend='plotly', color_continuous_scale='RdBu')
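The grid drawn by plot_stacked_importances can be pictured as a 2D array with one column per model. The minimal NumPy sketch below builds such an array. It assumes that every model's importances are a 1D vector of the same length, and the helper name `stack_importances` is hypothetical.

```python
import numpy as np


def stack_importances(importance_vectors):
    """Hypothetical helper: column j holds the importances of model j."""
    return np.column_stack([np.asarray(v, dtype=float) for v in importance_vectors])


grid = stack_importances([[0.0, 1.0, 0.2], [0.5, 0.0, 0.0]])
# grid has shape (3, 2): three importance entries (rows) for two models (columns)
```

An array like this could then be passed to an imshow call together with imshow_kwargs such as cmap.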

finchnmr.library module

Functions for defining a library of substances measured with HSQC NMR.

Authors: Nathan A. Mahynski, David A. Sheen

class finchnmr.library.Library(substances: list['substance.Substance'])

Bases: object

Library of substances for fitting new unknowns.

property X: ndarray[Any, dtype[floating]]

Return a copy of the data in the library.

Returns:

X – This data is arranged in a 2D array in which each column is the flattened HSQC NMR spectrum of a different substance, so each row corresponds to one point of the flattened spectrum. Columns follow the order in which the substances were provided when the library was instantiated.

Return type:

ndarray(float, ndim=2)

Example

>>> L = finchnmr.library.Library(substances=substances)
>>> L.fit(reference=new_compound)
>>> L.X
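The column layout of X can be reproduced with NumPy as a minimal sketch. It assumes that every substance's spectrum is a 2D array of the same shape, which is what fit is meant to ensure, and that flattening is row-major. `library_matrix` is a hypothetical helper, not the library's own code.

```python
import numpy as np


def library_matrix(spectra_2d):
    """Hypothetical helper: one flattened spectrum per column."""
    return np.column_stack([np.asarray(s, dtype=float).ravel() for s in spectra_2d])


a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 0.5], [0.0, 0.0]])
X = library_matrix([a, b])
# X.shape == (4, 2); X[:, 0] is a.ravel() and X[:, 1] is b.ravel()
```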

fit(reference: Substance) → Library

Align all substances to another one which serves as a reference.

Parameters:

reference (Substance) – Reference substance to align every substance in the library with (matching extent, etc.).

Returns:

self – The library, with all substances aligned to reference.

Return type:

Library

is_fitted_: ClassVar[bool]
save(filename: str) → None

Pickle the library to a file.

Parameters:

filename (str) – Filename to write to.

substance_by_index(idx: int) → Substance

Retrieve a substance from the library by index.

Parameters:

idx (int) – Index of the substance in the library.

Returns:

substance – Desired substance.

Return type:

Substance

substance_by_name(name: str) → Substance

Retrieve a substance from the library by name.

Parameters:

name (str) – Name of the substance in the library.

Returns:

substance – Desired substance.

Return type:

Substance

finchnmr.model module

Tools to build models.

Authors: Nathan A. Mahynski

class finchnmr.model.LASSO(alpha: float = 1.0, precompute: bool = False, copy_X: bool = True, max_iter: int = 10000, tol: float = 0.0001, warm_start: bool = False, random_state: int | None = None, selection: str = 'cyclic')

Bases: _Model

LASSO model built on sklearn.linear_model.Lasso.

alpha: ClassVar[float]
copy_X: ClassVar[bool]
fit_intercept: ClassVar[bool]
get_model_params() → dict[str, Any]

Return the parameters for an sklearn.linear_model.Lasso model.

importances() → ndarray[Any, dtype[floating]]

Return the Lasso model coefficients as importances.
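LASSO importances come from fitting the flattened target as a sparse linear combination of the library columns. The sketch below illustrates how alpha shrinks coefficients to zero. It swaps in a plain cyclic coordinate-descent solver written in NumPy instead of calling sklearn.linear_model.Lasso. The solver minimizes the same objective as sklearn's Lasso without an intercept, (1 / (2 * n_points)) * ||y - X w||^2 + alpha * ||w||_1, but it uses a simpler stopping rule. `lasso_coordinate_descent` and `soft_threshold` are hypothetical names.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, alpha=1.0, max_iter=10000, tol=1e-4, positive=False):
    """Cyclic coordinate descent for the Lasso objective without an intercept.

    X has shape (n_points, n_substances); y has shape (n_points,).
    Stops when no coefficient changes by more than tol in a full sweep
    (sklearn uses a duality-gap criterion instead).
    """
    n_points, n_substances = X.shape
    w = np.zeros(n_substances)
    col_norm = (X ** 2).sum(axis=0) / n_points  # (1/n) * ||x_j||^2
    residual = np.asarray(y, dtype=float).copy()  # y - X @ w with w = 0
    for _ in range(max_iter):
        largest_update = 0.0
        for j in range(n_substances):
            if col_norm[j] == 0.0:
                continue  # an all-zero column can never be used
            w_old = w[j]
            # Correlation of column j with the residual that excludes column j.
            rho = X[:, j] @ residual / n_points + col_norm[j] * w_old
            w_new = soft_threshold(rho, alpha) / col_norm[j]
            if positive:
                w_new = max(w_new, 0.0)  # restrict to non-negative coefficients
            if w_new != w_old:
                residual -= X[:, j] * (w_new - w_old)
                w[j] = w_new
                largest_update = max(largest_update, abs(w_new - w_old))
        if largest_update < tol:
            break
    return w


# Toy library of three orthogonal "spectra"; the target uses substances 0 and 2.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
y = 2.0 * X[:, 0] + 0.5 * X[:, 2]
w = lasso_coordinate_descent(X, y, alpha=0.01)  # approximately [1.96, 0.0, 0.46]
```

With a larger alpha (for example 1.0 here), every coefficient is driven to exactly zero. This is why param_grid in optimize_models is typically searched over a logarithmic range of alpha values.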

max_iter: ClassVar[int]
positive: ClassVar[bool]
precompute: ClassVar[bool]
random_state: ClassVar[int | None]
selection: ClassVar[str]
set_fit_request(*, nmr_library: bool | None | str = '$UNCHANGED$', target: bool | None | str = '$UNCHANGED$') → LASSO

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • nmr_library (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for nmr_library parameter in fit.

  • target (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for target parameter in fit.

Returns:

self – The updated object.

Return type:

object

tol: ClassVar[float]
warm_start: ClassVar[bool]
finchnmr.model.optimize_models(targets: list['substance.Substance'], nmr_library: Library, nmr_model: _Model, param_grid: dict[str, list], model_kw: dict[str, Any] | None = None) → tuple[list['_Model'], list['analysis.Analysis']]

Optimize a model to fit each wild spectrum in a list.

All combinations of parameters in param_grid are tested and the best performer is retained.

Parameters:
  • targets (list[Substance]) – Unknown (wild) HSQC NMR spectra to fit with the nmr_library.

  • nmr_library (Library) – Library of HSQC NMR spectra to use for fitting targets.

  • nmr_model (_Model) – Uninstantiated model class to fit the spectra with.

  • param_grid (dict(str, list)) – Parameter grid to search over; this follows the same convention as sklearn.model_selection.GridSearchCV.

  • model_kw (dict(str, Any), optional(default=None)) – Default keyword arguments passed to the model. If None, the nmr_model defaults are used.

Returns:

  • optimized_models (list(_Model)) – List of optimized models fit to each target HSQC NMR spectrum.

  • analyses (list(Analysis)) – List of analysis objects to help visualize and understand each fitted model.

Example

>>> import numpy as np
>>> import finchnmr.library, finchnmr.model, finchnmr.substance
>>> target = finchnmr.substance.Substance(...) # Load target(s)
>>> nmr_library = finchnmr.library.Library(...) # Create library
>>> optimized_models, analyses = finchnmr.model.optimize_models(
...     targets=[target],
...     nmr_library=nmr_library,
...     nmr_model=finchnmr.model.LASSO,
...     param_grid={'alpha': np.logspace(-5, 1, 100)},
... )
>>> analyses[0].plot_top_spectra(k=5)
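Testing all combinations in param_grid follows the GridSearchCV convention: the candidates are the Cartesian product of the value lists. The sketch below expands the grid and keeps the best performer. It assumes a caller-supplied score function where higher is better, because how finchnmr scores each candidate is not stated here. `expand_grid` and `best_params` are hypothetical helpers.

```python
import itertools


def expand_grid(param_grid):
    """Yield one dict per combination of values, like sklearn's ParameterGrid."""
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


def best_params(param_grid, score, model_kw=None):
    """Return the highest-scoring parameter combination (hypothetical helper).

    model_kw supplies defaults that each grid point overrides.
    """
    best, best_score = None, float("-inf")
    for params in expand_grid(param_grid):
        candidate = {**(model_kw or {}), **params}
        s = score(candidate)
        if s > best_score:  # ties keep the first combination seen
            best, best_score = candidate, s
    return best


grid = {"alpha": [0.01, 0.1, 1.0], "positive": [True, False]}
combos = list(expand_grid(grid))  # 3 * 2 = 6 combinations
chosen = best_params(grid, score=lambda p: -abs(p["alpha"] - 0.1), model_kw={"max_iter": 500})
```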

finchnmr.substance module

Functions for defining a substance measured with HSQC NMR.

Authors: Nathan A. Mahynski, David A. Sheen

class finchnmr.substance.Substance(pathname: str | None = None, name: str = '', style: str = 'bruker', warning: Literal['default', 'error', 'ignore', 'always', 'all', 'module', 'once'] = 'error')

Bases: object

Substance that was measured with HSQC NMR.

static bin_spectrum(spec_to_bin: ndarray[Any, dtype[floating]], window_size: int = 4, window_size_y: int | None = None) → ndarray[Any, dtype[floating]]

Coarsen an HSQC NMR spectrum by summing blocks of neighboring bins.

Parameters:
  • spec_to_bin (ndarray(float, ndim=2)) – Raw 2D HSQC NMR spectrum to bin.

  • window_size (int, optional(default=4)) – How many neighboring bins to sum together during binning. A window_size > 1 coarsens the spectrum.

  • window_size_y (int, optional(default=None)) – Window size to use in the y direction (axis 0) if different from window_size. If None, window_size is used.

Returns:

spectrum – Coarsened HSQC NMR spectrum.

Return type:

ndarray(float, ndim=2)
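The binning described above can be sketched with NumPy by summing non-overlapping blocks. This is a minimal sketch, not finchnmr's implementation. It assumes a 2D input, and it drops trailing rows or columns that do not fill a whole window. `bin_spectrum_sketch` is a hypothetical name.

```python
import numpy as np


def bin_spectrum_sketch(spec, window_size=4, window_size_y=None):
    """Sum non-overlapping (window_size_y x window_size) blocks of a 2D spectrum.

    Assumption: rows/columns left over after the last full window are dropped.
    """
    wy = window_size if window_size_y is None else window_size_y  # axis 0
    wx = window_size  # axis 1
    ny, nx = spec.shape
    ny_full, nx_full = ny - ny % wy, nx - nx % wx
    trimmed = spec[:ny_full, :nx_full]
    # Split each axis into (number_of_blocks, block_size) and sum within blocks.
    return trimmed.reshape(ny_full // wy, wy, nx_full // wx, wx).sum(axis=(1, 3))


spec = np.arange(16, dtype=float).reshape(4, 4)
coarse = bin_spectrum_sketch(spec, window_size=2)  # [[10, 18], [42, 50]]
```

Summing rather than averaging preserves the total intensity of the retained region, which matches the "sum together" wording above.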

property data: ndarray[Any, dtype[floating]]

Return the 2D HSQC NMR spectrum.

property extent: tuple[float, float, float, float]

Return the bounds of the spectrum.

fit(reference: Substance) → Substance

Align this substance to another one which serves as a reference.

This also transforms the intensities to absolute values.

Parameters:

reference (Substance) – Reference substance to align this one with (matching extent, etc.).

Returns:

aligned – New Substance which is a version of this one, but is now aligned/matched with reference.

Return type:

Substance

flatten() → ndarray[Any, dtype[floating]]

Return a flattened (1D) version of the data.

from_xml(filename: str) → None

Read the substance from an XML peak list.

property name: str

Return the name of the substance.

plot(norm: Normalize | None = None, ax: Axes | None = None, cmap='RdBu', absolute_values=False, backend: str = 'mpl', title: str | None = None)

Plot a single HSQC NMR spectrum.

Parameters:
  • norm (str or matplotlib.colors.Normalize, optional(default=None)) – The normalization method used to scale data to the [0, 1] range before mapping to colors using cmap. If None, a matplotlib.colors.SymLogNorm is used. This is currently only supported for the matplotlib backend.

  • ax (matplotlib.pyplot.Axes, optional(default=None)) – Axes to plot the image on. This is currently only supported for the matplotlib backend.

  • cmap (str, optional(default='RdBu')) – The matplotlib.colors.Colormap instance or registered colormap name used to map scalar data to colors. String names are largely similar between the plotting backends and can usually be used interchangeably.

  • absolute_values (bool, optional(default=False)) – Whether to plot the absolute values of the data (intensities).

  • backend (str, optional(default='mpl')) – Plotting library to use; the default ‘mpl’ uses matplotlib and is not interactive, while ‘plotly’ will yield interactive plots.

  • title (str, optional(default=None)) – Optional title to put on plot; otherwise this defaults to the substance’s name.

Returns:

  • if backend == ‘mpl’

    image : matplotlib.image.AxesImage

    HSQC NMR spectrum as an image.

    colorbar : matplotlib.pyplot.colorbar

    Colorbar to go with the image.

  • if backend == ‘plotly’

    image : plotly.graph_objs._figure.Figure

    HSQC NMR spectrum as an image.

read(pathname: str, name: str = '', style: str = 'bruker', warning: Literal['default', 'error', 'ignore', 'always', 'all', 'module', 'once'] = 'error') → None

Read an HSQC NMR spectrum from a directory created by the instrument.

Parameters:
  • pathname (str) – Folder to read data from.

  • name (str, optional(default='')) – Name of the substance, e.g., “octanol”.

  • style (str, optional(default='bruker')) – Manufacturer of the NMR instrument, which dictates how the data are extracted. At the moment only ‘bruker’ is supported.

  • warning (str, optional(default="error")) – How to handle warnings raised while reading from disk. With ‘error’, warnings are raised as exceptions and execution stops. If you are confident that the warnings are not relevant, set this to ‘default’ to simply report them instead.

Example

>>> s = Substance()
>>> s.read('test_data/my_substance/pdata/1', name='my_substance', style='bruker')

property scale: tuple

Return the grid points the spectrum is reported on.

unflatten(data: ndarray[Any, dtype[floating]]) → ndarray[Any, dtype[floating]]

Unflatten or reshape data back to the original 2D shape.
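For a given substance, flatten and unflatten are meant to be inverses. The NumPy sketch below shows that round trip, assuming row-major (C-order) flattening. Row-major is NumPy's default, but whether finchnmr uses this order is an assumption.

```python
import numpy as np

spectrum = np.arange(6, dtype=float).reshape(2, 3)  # toy 2D spectrum
flat = spectrum.ravel()                  # 1D, row-major (C-order)
restored = flat.reshape(spectrum.shape)  # back to the original 2D shape
```

Whatever order is used, the same order has to be applied when building Library.X, so that each row of X matches the same point of every spectrum.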

finchnmr.xml_parser module

Parse XML files.

Authors: David A. Sheen, Nathan A. Mahynski

finchnmr.xml_parser.parse_peak_file(xml_file: str) → DataFrame

Parse an XML peak file into a pandas DataFrame.

Parameters:

xml_file (str) – Name of .xml file to parse.

Returns:

dataframe – DataFrame of NMR features.

Return type:

pd.DataFrame
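The sketch below parses a peak list with the standard library. It assumes an XML layout in which each peak is a Peak2D element carrying F1, F2 and intensity attributes. This layout is an assumption, and the actual schema handled by parse_peak_file may differ. `parse_peaks_sketch` is hypothetical and returns a list of dicts; passing that list to pandas.DataFrame yields one column per attribute.

```python
import xml.etree.ElementTree as ET


def parse_peaks_sketch(xml_text):
    """Hypothetical parser for an assumed <Peak2D F1=.. F2=.. intensity=../> layout."""
    root = ET.fromstring(xml_text)
    records = []
    for peak in root.iter("Peak2D"):  # every Peak2D element, at any depth
        records.append({
            "F1": float(peak.get("F1")),
            "F2": float(peak.get("F2")),
            "intensity": float(peak.get("intensity")),
        })
    return records


example = """<PeakList><PeakList2D>
  <Peak2D F1="35.1" F2="1.25" intensity="1.2E7"/>
  <Peak2D F1="62.8" F2="3.60" intensity="4.0E6"/>
</PeakList2D></PeakList>"""
peaks = parse_peaks_sketch(example)
```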

Module contents

Init for FINCHnmr.

Author: Nathan A. Mahynski