Creating Chemicals¶
Chemicals are the essential components run through the simulated LC-MS/MS process in ViMMS. There are two main types: Known Chemicals and Unknown Chemicals.
Known Chemicals¶
Known chemicals are substances with identified properties, represented by their formulae. They can be sampled from a database such as HMDB, or from a specified distribution, using classes that extend the Formula Sampler. Current options include:
DatabaseFormulaSampler: samples formulas from a provided database.
UniformMZFormulaSampler: samples formulas uniformly in a defined m/z range.
PickEverythingFormulaSampler: samples all formulas from a database.
EvenMZFormulaSampler: creates evenly spaced m/z values, primarily for test cases.
MZMLFormulaSampler: samples m/z values from a histogram of m/z derived from a user-supplied mzML file.
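To make the difference between these sampling strategies concrete, here is a minimal standard-library sketch of uniform, evenly spaced, and histogram-based m/z sampling. It is not ViMMS code: the function names, parameters, and bin layout are hypothetical and only illustrate the idea behind each sampler.

```python
import random


def uniform_mz(n, min_mz, max_mz, seed=0):
    """Draw n m/z values independently and uniformly from [min_mz, max_mz]."""
    rng = random.Random(seed)
    return [rng.uniform(min_mz, max_mz) for _ in range(n)]


def even_mz(n, min_mz, step):
    """Return n evenly spaced m/z values starting at min_mz."""
    return [min_mz + i * step for i in range(n)]


def histogram_mz(n, bin_edges, counts, seed=0):
    """Pick a bin in proportion to its count, then a uniform value inside it."""
    rng = random.Random(seed)
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    chosen = rng.choices(bins, weights=counts, k=n)
    return [rng.uniform(lo, hi) for lo, hi in chosen]


uniform_values = uniform_mz(5, 100, 500)
even_values = even_mz(5, 100, 50)
# Only the middle bin (200 to 300) has a non-zero count in this toy histogram.
hist_values = histogram_mz(5, [100, 200, 300, 400], [0, 10, 0])
```

The histogram variant mirrors the idea of sampling from an empirical m/z distribution: bins that were well populated in the source data are drawn more often.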
After generating a list of formula objects, you can use the Chemical Mixture Creator class to produce chemical datasets for simulation:
from vimms.Chemicals import ChemicalMixtureCreator
from vimms.ChemicalSamplers import UniformMZFormulaSampler
# formula sampler that draws m/z values uniformly between 100 and 500
df = UniformMZFormulaSampler(min_mz=100, max_mz=500)
cm = ChemicalMixtureCreator(df)
chemicals = cm.sample(100, 2)  # sample 100 chemicals up to MS2
ViMMS offers many options to specify formula samplers and customize the generated RT, intensity, chromatograms, and MS2 peaks for a Chemical object.
You can explore these functionalities further with our notebooks demonstrating the creation of purely simulated chemicals and HMDB-sampled chemicals.
Unknown Chemicals¶
Unknown chemicals are those without identifiable properties, typically extracted from existing mzML files, for example from prior runs on an actual mass spectrometer. Each picked peak is presumed to correspond to one chemical whose identity remains unknown. Because fragmentation strategies operate without needing to know chemical identities, this assumption is sufficient for the simulation process.
For an example of how to extract unknown chemicals from existing mzML files, see this notebook.
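The point that fragmentation strategies do not need chemical identities can be illustrated with a small sketch. The `PickedPeak` class and the `top_n_precursors` function below are hypothetical and not part of ViMMS. They show an intensity-based, Top-N style selection that relies only on measured m/z, retention time, and intensity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PickedPeak:
    """Hypothetical stand-in for an unknown chemical: measurements only, no formula."""
    mz: float
    rt: float  # retention time in seconds
    intensity: float


def top_n_precursors(peaks, n, min_intensity=0.0):
    """Select the n most intense peaks at or above a threshold."""
    eligible = [p for p in peaks if p.intensity >= min_intensity]
    return sorted(eligible, key=lambda p: p.intensity, reverse=True)[:n]


peaks = [
    PickedPeak(mz=150.05, rt=60.0, intensity=2.0e5),
    PickedPeak(mz=301.14, rt=62.5, intensity=8.0e5),
    PickedPeak(mz=422.21, rt=61.0, intensity=5.0e3),
    PickedPeak(mz=275.09, rt=63.0, intensity=4.5e5),
]
selected = top_n_precursors(peaks, n=2, min_intensity=1.0e4)
selected_mz = [p.mz for p in selected]
```

Nothing in the selection step refers to a formula or a name, which is why peaks extracted from real data can stand in for chemicals during simulation.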
Multi-sample Mixtures¶
For experiments involving multiple samples, MultipleMixtureCreator can introduce group-specific intensity changes and missing values. Provide a master list of chemicals and a description of each sample group:
from vimms.Chemicals import MultipleMixtureCreator
# master_list: a previously created list of chemicals, e.g. from ChemicalMixtureCreator.sample
mm = MultipleMixtureCreator(
    master_list,
    group_list=["control", "case"],
    group_dict={
        "control": {},
        "case": {"missing_probability": 0.1, "changing_probability": 0.2},
    },
)
mixtures = mm.generate_chemical_lists()
Each entry in mixtures represents the chemicals for a single sample.
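As a rough picture of what group-specific missing values and intensity changes mean, the following standard-library sketch derives one sample per group from a master list. It is not the ViMMS implementation: the `make_sample` helper, the fixed `fold_change`, and the independent per-chemical draws are assumptions made for illustration.

```python
import random


def make_sample(master, missing_probability=0.0, changing_probability=0.0,
                fold_change=2.0, rng=None):
    """Return {name: intensity} for one sample derived from the master list."""
    rng = rng or random.Random()
    sample = {}
    for name, intensity in master.items():
        if rng.random() < missing_probability:
            continue  # chemical is absent from this sample
        if rng.random() < changing_probability:
            intensity *= fold_change  # chemical changes intensity in this group
        sample[name] = intensity
    return sample


master = {"chem_a": 1.0e5, "chem_b": 2.0e5, "chem_c": 5.0e4}
groups = {
    "control": {},
    "case": {"missing_probability": 0.1, "changing_probability": 0.2},
}
rng = random.Random(42)
mixtures = {name: make_sample(master, rng=rng, **params)
            for name, params in groups.items()}
```

With empty settings the control sample is an unchanged copy of the master list, while the case sample may drop some chemicals and rescale others.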
Building Datasets from mzML¶
Existing mzML files can be converted into chemical lists using ChemicalMixtureFromMZML. The ROIs are extracted and converted to UnknownChemical objects so that real data can be resimulated or combined with synthetic chemicals.
from vimms.Chemicals import ChemicalMixtureFromMZML
cm = ChemicalMixtureFromMZML("example.mzML")
chemicals = cm.sample(n_chemicals=None, ms_levels=2)
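The ROI extraction step can be pictured with a simplified sketch: peaks from consecutive scans are chained together when their m/z values agree within a tolerance, and each sufficiently long chain is summarised as one unknown chemical. This is an illustration under stated assumptions, not the ROI algorithm used by ViMMS. The `build_rois` and `summarise` helpers, the ppm tolerance, and the minimum length are hypothetical.

```python
def build_rois(scans, mz_tol_ppm=10.0, min_length=2):
    """Group peaks from consecutive scans into ROIs by m/z proximity.

    scans: list of (rt, [(mz, intensity), ...]) ordered by retention time.
    Returns a list of ROIs, each a list of (rt, mz, intensity) tuples.
    """
    open_rois, finished = [], []
    for rt, peaks in scans:
        still_open = list(open_rois)  # ROIs not yet extended by this scan
        extended = []
        for mz, intensity in sorted(peaks):
            match = next(
                (roi for roi in still_open
                 if abs(mz - roi[-1][1]) / roi[-1][1] * 1e6 <= mz_tol_ppm),
                None,
            )
            if match is not None:
                still_open = [roi for roi in still_open if roi is not match]
                match.append((rt, mz, intensity))
                extended.append(match)
            else:
                extended.append([(rt, mz, intensity)])  # start a new ROI
        finished.extend(still_open)  # ROIs with no peak in this scan are closed
        open_rois = extended
    finished.extend(open_rois)
    return [roi for roi in finished if len(roi) >= min_length]


def summarise(roi):
    """Reduce an ROI to the values an unknown chemical would carry."""
    apex = max(roi, key=lambda point: point[2])
    mean_mz = sum(point[1] for point in roi) / len(roi)
    return {"mz": mean_mz, "rt": apex[0], "max_intensity": apex[2]}


scans = [
    (10.0, [(200.0000, 1.0e4), (350.1000, 5.0e3)]),
    (11.0, [(200.0010, 4.0e4), (350.1005, 9.0e3)]),
    (12.0, [(200.0005, 2.0e4), (500.2000, 1.0e3)]),
]
rois = build_rois(scans)
unknowns = [summarise(roi) for roi in rois]
```

In this toy input the peak at m/z 500.2 appears in only one scan, so it is discarded by the minimum-length filter, while the two persistent signals each become one entry in `unknowns`.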