Example API Usage: Analyzing a Batch of Spectra

Author: Nathan A. Mahynski

Date: 2024/11/26

Description: Example of how to use FINCHnmr’s API to analyze a batch of spectra, e.g., repeated measurements of the same sample.

  1. Install FINCHnmr using pip.

[1]:
# pip install finchnmr
[2]:
import finchnmr
from finchnmr import analysis, library, model, substance

import numpy as np

%load_ext autoreload
%autoreload 2
  2. Load an HSQC NMR (1H-13C) dataset to use as a background library. Here we use a dataset hosted on HuggingFace. If the datasets package is not installed, install it now. This example also uses dotenv to load the access token for this dataset from a .env file.

[3]:
# pip install datasets python-dotenv
[4]:
import os
from dotenv import load_dotenv
_ = load_dotenv(".env")
HF_TOKEN = os.getenv("HF_TOKEN")
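Note that `os.getenv` returns `None` when the variable is missing, so a missing token only surfaces later, when the dataset download fails. A minimal sketch of failing early instead is shown below; the helper name `get_required_env` and the demo token value are hypothetical and are not part of FINCHnmr or dotenv.

```python
import os


def get_required_env(name, environ=os.environ):
    """Return the value of environment variable `name`, or raise if unset or empty.

    Hypothetical helper for illustration; not part of FINCHnmr or dotenv.
    """
    value = environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or environment."
        )
    return value


# Use an explicit mapping so the sketch runs without a real token.
demo_env = {"HF_TOKEN": "hf_example_token"}  # made-up value
token = get_required_env("HF_TOKEN", demo_env)
print("Token loaded:", bool(token))
```

In the notebook itself, one could call `get_required_env("HF_TOKEN")` after `load_dotenv(".env")` in place of `os.getenv("HF_TOKEN")`.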
[5]:
from datasets import load_dataset

nmr_dataset = load_dataset(
  "mahynski/bmrb-hsqc-nmr-1H13C",
  split="train",
  token=HF_TOKEN,
  trust_remote_code=True,
)
[6]:
lib = finchnmr.library.Library([
    finchnmr.substance.Substance(
        pathname=d['pathname'],
        name=d['name'],
        warning='ignore'
    ) for d in nmr_dataset
])
  3. Load “unknown” mixtures we would like to identify.

[7]:
# Load a variety of unknowns - this is an example of fish liver
unknown_dataset = load_dataset(
  "mahynski/bmrb-hsqc-nmr-1H13C",
  split="test",
  token=HF_TOKEN,
  trust_remote_code=True,
)

# Collect all the measurements into a list
substances = [
    finchnmr.substance.Substance(
        pathname=d['pathname'],
        name=d['name'],
        warning='ignore'
    ) for d in unknown_dataset
]
  4. Fit a model to each substance.

[8]:
optimized_models, analyses = finchnmr.model.optimize_models(
    targets=substances,
    nmr_library=lib,
    nmr_model=finchnmr.model.LASSO, # Use a Lasso model to obtain a sparse solution
    param_grid={'alpha': np.logspace(-16, 0, 5)}, # Select a range of alpha values to examine sparsity
    model_kw={'max_iter':1000, 'selection':'cyclic', 'random_state':42, 'tol':0.0001} # These are the default values, but you can adjust them
)
Iterating through targets: 16it [02:06,  7.88s/it]
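To make the `alpha` grid above more concrete, the sketch below reproduces the five values that `np.logspace(-16, 0, 5)` generates and illustrates why a larger `alpha` gives a sparser solution. It uses only the standard library. The soft-thresholding rule is the textbook Lasso solution for an orthonormal design, used here as an assumption for illustration; it is not FINCHnmr's fitting routine. The `unpenalized` coefficients are made-up numbers.

```python
def logspace(start, stop, num):
    """Return `num` values evenly spaced on a log10 scale, like numpy.logspace."""
    step = (stop - start) / (num - 1)
    return [10.0 ** (start + i * step) for i in range(num)]


def soft_threshold(value, alpha):
    """Shrink `value` toward zero by `alpha`; anything within `alpha` of zero becomes 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


# Hypothetical unpenalized coefficients for five library entries (illustrative only).
unpenalized = [0.8, 0.05, 1e-6, -0.3, 1e-10]

alphas = logspace(-16, 0, 5)  # same grid as np.logspace(-16, 0, 5)
counts = []
for alpha in alphas:
    shrunk = [soft_threshold(c, alpha) for c in unpenalized]
    n_nonzero = sum(1 for c in shrunk if c != 0.0)
    counts.append(n_nonzero)
    print(f"alpha={alpha:.0e}: {n_nonzero} nonzero coefficients")
```

The grid spans sixteen orders of magnitude in steps of four decades, so the smallest values leave the fit nearly unpenalized while the largest can remove every coefficient in this toy example.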
[9]:
# Interactively visualize which substances were considered the most important to each model
analysis.plot_stacked_importances(optimized_models, backend='plotly', color_continuous_scale='viridis')
[10]:
# Alternatively, we can take the first sample as representative and look at the most important substances in the library
analyses[0].plot_top_importances(k=5, by_name=False, backend='plotly')
[11]:
# Identify the names of those substances by their library index
lib.substance_by_index(98).name, lib.substance_by_index(9).name, lib.substance_by_index(146).name
[11]:
('D-(+)-Maltose', "2'-Deoxyuridine", 'L-Glutamine')
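When more than a few indices need to be looked up, a comprehension avoids repeating the call. The sketch below shows the pattern with a stand-in library so it runs without FINCHnmr installed; `StandInLibrary` and `StandInSubstance` are hypothetical classes that only mimic the `substance_by_index(...).name` access used above, and the index-to-name pairs are taken from the output shown.

```python
from dataclasses import dataclass


@dataclass
class StandInSubstance:
    """Hypothetical stand-in exposing only a `name` attribute."""
    name: str


class StandInLibrary:
    """Hypothetical stand-in mimicking `lib.substance_by_index(i)`."""

    def __init__(self, names_by_index):
        self._substances = {
            i: StandInSubstance(n) for i, n in names_by_index.items()
        }

    def substance_by_index(self, index):
        return self._substances[index]


# Index-to-name pairs copied from the notebook output above.
lib_demo = StandInLibrary(
    {98: "D-(+)-Maltose", 9: "2'-Deoxyuridine", 146: "L-Glutamine"}
)

top_indices = [98, 9, 146]
names = {i: lib_demo.substance_by_index(i).name for i in top_indices}
for index, name in names.items():
    print(index, name)
```

With the real library, the same comprehension should work by replacing `lib_demo` with `lib`.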
[12]:
# Alternatively, label the substances by name and plot with the matplotlib backend
analyses[0].plot_top_importances(k=5, by_name=True, backend='mpl')
[12]:
<Axes: xlabel='Importance'>
[Figure: top 5 library substances by importance for the first sample; x-axis: Importance]