Example API Usage: Analyzing a Batch of Spectra
Author: Nathan A. Mahynski
Date: 2024/11/26
Description: Example of how to use FINCHnmr’s API to analyze a batch of spectra, e.g., repeated measurements of the same sample.
Install FINCHnmr using pip.
[1]:
# pip install finchnmr
[2]:
import finchnmr
from finchnmr import analysis, library, model, substance
import numpy as np
%load_ext autoreload
%autoreload 2
Load an HSQC NMR (1H-13C) dataset to use as a background library. Here we use a dataset hosted on Hugging Face. If the datasets package is not installed, install it now. This example also uses python-dotenv to read the access token for the dataset from a .env file.
[3]:
# pip install datasets python-dotenv
[4]:
import os
from dotenv import load_dotenv
_ = load_dotenv(".env")
HF_TOKEN = os.getenv("HF_TOKEN")
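If the .env file is missing or does not define HF_TOKEN, os.getenv returns None without complaint, and the problem may only show up later as an authentication error if the dataset requires a token. A minimal sketch of an early check follows; the helper name require_env and the variable EXAMPLE_TOKEN are hypothetical and are not part of FINCHnmr or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty.

    Hypothetical helper for illustration; not part of FINCHnmr or python-dotenv.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell."
        )
    return value


# Demonstration with a dummy variable so the sketch runs without a real token.
os.environ["EXAMPLE_TOKEN"] = "dummy-value"
token = require_env("EXAMPLE_TOKEN")
```

In the notebook, the same check could be applied as require_env("HF_TOKEN") right after load_dotenv.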
[5]:
from datasets import load_dataset
nmr_dataset = load_dataset(
    "mahynski/bmrb-hsqc-nmr-1H13C",
    split="train",
    token=HF_TOKEN,
    trust_remote_code=True,
)
[6]:
lib = finchnmr.library.Library([
    finchnmr.substance.Substance(
        pathname=d['pathname'],
        name=d['name'],
        warning='ignore'
    ) for d in nmr_dataset
])
Load “unknown” mixtures we would like to identify.
[7]:
# Load a variety of unknowns - this is an example of fish liver
unknown_dataset = load_dataset(
    "mahynski/bmrb-hsqc-nmr-1H13C",
    split="test",
    token=HF_TOKEN,
    trust_remote_code=True,
)
# Collect all the measurements into a list
substances = [
    finchnmr.substance.Substance(
        pathname=d['pathname'],
        name=d['name'],
        warning='ignore'
    ) for d in unknown_dataset
]
Fit a model to each substance.
[8]:
optimized_models, analyses = finchnmr.model.optimize_models(
    targets=substances,
    nmr_library=lib,
    nmr_model=finchnmr.model.LASSO,  # Use a Lasso model to obtain a sparse solution
    param_grid={'alpha': np.logspace(-16, 0, 5)},  # Select a range of alpha values to examine sparsity
    model_kw={'max_iter': 1000, 'selection': 'cyclic', 'random_state': 42, 'tol': 0.0001}  # These are default, but you can adjust
)
Iterating through targets: 16it [02:06, 7.88s/it]
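The param_grid above spans sixteen orders of magnitude in alpha, the strength of the L1 penalty. To illustrate why larger values give sparser solutions, the sketch below lists the same grid in plain Python and applies the soft-thresholding operator to a set of coefficients. Soft-thresholding is the closed-form Lasso solution only in the idealized case of an orthonormal design; the real fit is iterative (note max_iter and tol above), so treat this as intuition rather than a description of FINCHnmr's internals. The function names and coefficient values are invented for illustration.

```python
def log_grid(start_exp: float, stop_exp: float, num: int) -> list[float]:
    """Return `num` values evenly spaced on a log10 scale, endpoints included."""
    step = (stop_exp - start_exp) / (num - 1)
    return [10.0 ** (start_exp + i * step) for i in range(num)]


def soft_threshold(value: float, alpha: float) -> float:
    """Shrink `value` toward zero by `alpha`; values within alpha of zero become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


# Same spacing as np.logspace(-16, 0, 5): exponents -16, -12, -8, -4, 0.
alphas = log_grid(-16, 0, 5)

# Made-up unpenalized coefficients, for illustration only.
coefficients = [0.8, 0.05, -0.3, 0.00002, 1.5]

nonzero_counts = []
for alpha in alphas:
    shrunk = [soft_threshold(c, alpha) for c in coefficients]
    nonzero_counts.append(sum(1 for c in shrunk if c != 0.0))
    print(f"alpha={alpha:.0e}: {nonzero_counts[-1]} nonzero coefficients")
```

With these toy numbers, the three smallest alpha values leave every coefficient nonzero, while the largest keeps only one, which is the trade-off the grid search explores.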
[9]:
# Interactively visualize which substances were considered the most important to each model
analysis.plot_stacked_importances(optimized_models, backend='plotly', color_continuous_scale='viridis')
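When the targets are repeated measurements of the same sample, a numerical summary can complement the stacked plot: substances that are consistently important should have a high mean and a low spread across runs. The sketch below assumes the per-measurement importances have already been collected into a list of rows; how to extract them from FINCHnmr's objects is not shown here, and all names and numbers are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical importances: one row per repeated measurement, one column per
# library substance. These numbers are invented for illustration.
importances = [
    [0.50, 0.00, 0.30, 0.10],
    [0.55, 0.00, 0.25, 0.00],
    [0.45, 0.05, 0.35, 0.00],
]
names = ["substance A", "substance B", "substance C", "substance D"]

summary = []
for name, column in zip(names, zip(*importances)):
    # Count the runs in which this substance received a nonzero importance.
    detected = sum(1 for v in column if v > 0)
    summary.append((name, mean(column), stdev(column), detected))

# Sort so the substances with the largest average importance come first.
summary.sort(key=lambda row: row[1], reverse=True)

for name, avg, spread, detected in summary:
    print(
        f"{name}: mean={avg:.3f} std={spread:.3f} "
        f"nonzero in {detected}/{len(importances)} runs"
    )
```

A substance that appears in only one of several repeats, like the toy "substance B" here, is a candidate for closer inspection before it is reported as present.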
[10]:
# Alternatively, we can take the first sample as representative and look at the most important substances in the library
analyses[0].plot_top_importances(k=5, by_name=False, backend='plotly')
[11]:
# Identify the names of those substances by their library index
lib.substance_by_index(98).name, lib.substance_by_index(9).name, lib.substance_by_index(146).name
[11]:
('D-(+)-Maltose', "2'-Deoxyuridine", 'L-Glutamine')
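The indices passed to substance_by_index above are hard-coded, presumably read from the preceding plot. If the importance values are available as a plain sequence, the top entries can be selected programmatically instead. The following is a minimal sketch under that assumption: top_k_indices is a hypothetical helper, and the importances and names are invented stand-ins rather than values from this analysis.

```python
def top_k_indices(values, k):
    """Return the indices of the k largest values, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


# Invented stand-ins for a model's importances and the library's substance names.
importances = [0.02, 0.40, 0.00, 0.25, 0.10]
library_names = ["compound 0", "compound 1", "compound 2", "compound 3", "compound 4"]

top = top_k_indices(importances, 3)
top_names = [library_names[i] for i in top]
print(top, top_names)
```

In the notebook, the list lookup would be replaced by lib.substance_by_index(i).name for each selected index.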
[12]:
# Alternatively, label the substances by name and plot with the matplotlib backend
analyses[0].plot_top_importances(k=5, by_name=True, backend='mpl')
[12]:
<Axes: xlabel='Importance'>