Creating Chemicals

Chemicals are the essential components run through the simulated LC-MS/MS process in ViMMS. There are two main types: Known Chemicals and Unknown Chemicals.

Known Chemicals

Known chemicals are substances with identified properties, represented by their chemical formulae. Formulae can be sampled from a database such as HMDB, or drawn from a specified distribution using classes that extend FormulaSampler, for example the UniformMZFormulaSampler used in the example below.

Once you have a formula sampler, pass it to the ChemicalMixtureCreator class to produce a chemical dataset for simulation:

from vimms.Chemicals import ChemicalMixtureCreator
from vimms.ChemicalSamplers import UniformMZFormulaSampler

# Sample m/z values uniformly between 100 and 500
formula_sampler = UniformMZFormulaSampler(min_mz=100, max_mz=500)
cm = ChemicalMixtureCreator(formula_sampler)
chemicals = cm.sample(100, 2)  # sample 100 chemicals up to MS2

ViMMS offers many options to specify formula samplers and customize the generated RT, intensity, chromatograms, and MS2 peaks for a Chemical object.
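To make the sampled quantities concrete, here is a minimal standalone sketch of what "sampling a chemical" amounts to. It does not use ViMMS: ToyChemical, sample_toy_chemicals, and the RT and intensity ranges are hypothetical names and values chosen for illustration, and the distributions (uniform m/z, uniform RT, log-uniform intensity) are assumptions rather than the ViMMS defaults.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ToyChemical:
    """Hypothetical stand-in for a simulated chemical (not a ViMMS class)."""
    mz: float             # mass-to-charge ratio
    rt: float             # retention time in seconds
    max_intensity: float  # apex intensity of the chromatogram


def sample_toy_chemicals(n, min_mz=100.0, max_mz=500.0,
                         min_rt=0.0, max_rt=1440.0,
                         min_log_intensity=math.log(1e4),
                         max_log_intensity=math.log(1e7),
                         seed=None):
    """Draw n toy chemicals: uniform m/z, uniform RT, log-uniform intensity."""
    rng = random.Random(seed)  # local generator so results are reproducible
    chemicals = []
    for _ in range(n):
        mz = rng.uniform(min_mz, max_mz)
        rt = rng.uniform(min_rt, max_rt)
        intensity = math.exp(rng.uniform(min_log_intensity, max_log_intensity))
        chemicals.append(ToyChemical(mz, rt, intensity))
    return chemicals


toy_chemicals = sample_toy_chemicals(100, seed=42)
```

Swapping one of the uniform draws for a different distribution is the toy equivalent of plugging a different sampler into ChemicalMixtureCreator.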

You can explore these functionalities further with our notebooks demonstrating the creation of purely simulated chemicals and HMDB-sampled chemicals.

Unknown Chemicals

Unknown chemicals are those without identified properties, typically extracted from existing mzML files such as prior runs on an actual mass spectrometer. Each picked peak is presumed to correspond to one chemical whose identity remains unknown. Because fragmentation strategies operate without knowing chemical identities, this presumption is sufficient for the simulation process.

For an example of how to extract unknown chemicals from existing mzML files, see this notebook.

Multi-sample Mixtures

For experiments involving multiple samples, MultipleMixtureCreator can introduce group-specific intensity changes and missing values. Provide a master list of chemicals and a description of each sample group:

from vimms.Chemicals import MultipleMixtureCreator

# master_list: the full list of chemicals shared by all samples,
# e.g. the output of ChemicalMixtureCreator.sample()
mm = MultipleMixtureCreator(master_list,
                             group_list=["control", "case"],
                             group_dict={"control": {},
                                         "case": {"missing_probability": 0.1,
                                                  "changing_probability": 0.2}})
mixtures = mm.generate_chemical_lists()

Each entry in mixtures represents the chemicals for a single sample.
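The effect of the two per-group settings can be illustrated with a small standalone sketch. This is not the ViMMS implementation: make_toy_mixtures is a hypothetical helper, chemicals are reduced to name-to-intensity pairs, group_list is assumed to hold one group label per sample, and the log-normal fold change is an assumed choice for how a "changed" intensity is drawn.

```python
import random


def make_toy_mixtures(master, group_list, group_dict, seed=None):
    """Return one {name: intensity} dict per entry in group_list.

    master: {chemical_name: baseline_intensity}
    group_dict: per-group settings; absent keys default to 0.0.
    """
    rng = random.Random(seed)

    # Decide once per group which chemicals are missing or changed,
    # so every sample in the same group shares the same pattern.
    group_effects = {}
    for group in sorted(set(group_list)):
        settings = group_dict.get(group, {})
        p_missing = settings.get("missing_probability", 0.0)
        p_change = settings.get("changing_probability", 0.0)
        effects = {}
        for name in master:
            if rng.random() < p_missing:
                effects[name] = None  # chemical absent from this group
            elif rng.random() < p_change:
                effects[name] = 2.0 ** rng.gauss(0.0, 1.0)  # assumed fold change
            else:
                effects[name] = 1.0  # intensity unchanged
        group_effects[group] = effects

    mixtures = []
    for group in group_list:
        effects = group_effects[group]
        sample = {name: intensity * effects[name]
                  for name, intensity in master.items()
                  if effects[name] is not None}
        mixtures.append(sample)
    return mixtures


master = {f"chem_{i}": 1e5 for i in range(50)}
toy_mixtures = make_toy_mixtures(
    master,
    group_list=["control", "case"],
    group_dict={"control": {},
                "case": {"missing_probability": 0.1,
                         "changing_probability": 0.2}},
    seed=0,
)
```

With empty settings the control sample reproduces the master list exactly, while the case sample may drop some chemicals and rescale others.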

Building Datasets from mzML

Existing mzML files can be converted into chemical lists using ChemicalMixtureFromMZML. Regions of interest (ROIs) are extracted from the file and converted to UnknownChemical objects, so that real data can be re-simulated or combined with synthetic chemicals.

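from vimms.Chemicals import ChemicalMixtureFromMZML

# Extract ROIs from the file and wrap them as UnknownChemical objects
cm = ChemicalMixtureFromMZML("example.mzML")
# n_chemicals=None keeps every extracted ROI instead of a random subset
chemicals = cm.sample(n_chemicals=None, ms_levels=2)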
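The idea of turning ROIs into unknown chemicals can be sketched without ViMMS or a real mzML file. The code below is a simplified illustration, not the ROI algorithm ViMMS uses: build_rois, the 10 ppm tolerance, the minimum length of 2, and the in-memory scan format are all assumptions made for this example.

```python
def build_rois(scans, mz_tol_ppm=10.0, min_length=2):
    """Group peaks from consecutive scans into regions of interest (ROIs).

    scans: list of (rt, [(mz, intensity), ...]) ordered by rt.
    Returns a list of ROIs, each a list of (rt, mz, intensity) tuples.
    """
    open_rois = []  # ROIs that received a peak in the previous scan
    finished = []   # ROIs that can no longer be extended
    for rt, peaks in scans:
        extended = []
        used = set()  # indices of open ROIs already extended by this scan
        for mz, intensity in sorted(peaks):
            match = None
            for idx, roi in enumerate(open_rois):
                if idx in used:
                    continue
                last_mz = roi[-1][1]
                if abs(mz - last_mz) / last_mz * 1e6 <= mz_tol_ppm:
                    match = idx
                    break
            if match is None:
                extended.append([(rt, mz, intensity)])  # start a new ROI
            else:
                used.add(match)
                open_rois[match].append((rt, mz, intensity))
                extended.append(open_rois[match])
        # Close every ROI that found no matching peak in this scan.
        finished.extend(roi for idx, roi in enumerate(open_rois)
                        if idx not in used)
        open_rois = extended
    finished.extend(open_rois)
    return [roi for roi in finished if len(roi) >= min_length]


scans = [
    (1.0, [(100.0000, 500.0), (250.1000, 80.0)]),
    (2.0, [(100.0005, 900.0)]),
    (3.0, [(100.0002, 400.0), (300.2000, 50.0)]),
]
rois = build_rois(scans)

# Summarise each ROI as a toy "unknown chemical": mean m/z, apex RT, apex intensity.
toy_unknowns = [
    {"mz": sum(p[1] for p in roi) / len(roi),
     "rt": max(roi, key=lambda p: p[2])[0],
     "max_intensity": max(p[2] for p in roi)}
    for roi in rois
]
```

In this toy input only the trace near m/z 100 spans more than one scan, so it is the only ROI kept; the two single-scan peaks are discarded as noise.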