
Molecule Labeling package

Proposal for a new LabeledMolecule package to handle isotope-labeled molecules. Implemented by Rasmus Fogh, 2007.


Limits to current model

We can already handle the following

  • Uniform labeling
  • Atom-specific labeling at a few positions.
  • Molecules built out of residue building blocks with a single building block at a given position.

We can handle mixtures of different labeling schemes in RefSampleComponent, but we have to do it with separate MolComponents for each residue.

We cannot handle a protein built from a mixture of building blocks, e.g. with half the alanines labeled and half unlabeled, randomly at each position. Unfortunately this is exactly what some of the modern labeling schemes do. With 1,3-13C-glycerol labeling, for example, you get a mixture of several labeling patterns for each amino acid, with each individual residue (each leucine, say) assigned randomly to one of them.

New needs

We must allow NMR experiments to be specified with their associated labeling schemes. Unfortunately, but understandably, the NMR programmers do not want to introduce full handling of Samples, RefSampleComponents etc. just to satisfy this requirement.


Ideally we would want to give the following information:
  • For any atom position, what is the isotopic distribution in a given sample/experiment?
  • For any pair (triplet, ...) of atoms, what is the isotopic distribution (i.e. what percentage is 13C-13C, 12C-12C, 12C-13C, 13C-12C)?

To handle the fully general case we would have to store the isotope percentages explicitly for every atom pair (or even triplet etc.) in the molecule, which is ridiculous. Alternatively we could store the percentage of every molecule-level isotopomer, which is impossible. The draft tries to handle the realistic cases in a manageable way.

The model below gives an independent isotope distribution for every atom in a ResLabel.
For every residue position it gives a distribution of ResLabels, which is independent of the distributions at other residue positions. By combining several ResLabels for every residue position and several MolLabels for the molecule as a whole, you can describe both independent and correlated isotope substitution at different positions.
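To illustrate how a mixture of species with independent per-atom distributions can still describe correlated substitution, here is a minimal sketch. The dictionary layout and the function name pair_distribution are assumptions made for this example only; they are not the LabeledMolecule API.

```python
from itertools import product


def pair_distribution(mixture, atom_a, atom_b):
    """Joint isotope distribution for two atoms over a mixture of species.

    mixture: list of (weight, species) pairs, where species maps an atom
    name to {isotope: fraction}. Within one species the two atoms are
    treated as independent; correlation arises only from the mixing.
    """
    joint = {}
    for weight, species in mixture:
        pairs = product(species[atom_a].items(), species[atom_b].items())
        for (iso_a, p_a), (iso_b, p_b) in pairs:
            key = (iso_a, iso_b)
            joint[key] = joint.get(key, 0.0) + weight * p_a * p_b
    return joint


# Hypothetical example: half the residues are fully 13C at CA and CB,
# the other half are fully 12C at both positions.
labeled = {"CA": {"13C": 1.0}, "CB": {"13C": 1.0}}
unlabeled = {"CA": {"12C": 1.0}, "CB": {"12C": 1.0}}
mixture = [(0.5, labeled), (0.5, unlabeled)]

joint = pair_distribution(mixture, "CA", "CB")
print(joint)
```

Each atom is 50% 13C on its own, yet the joint distribution contains only 13C-13C and 12C-12C pairs. A single species with independent 50% labeling at both atoms would instead give 25% for each of the four combinations.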

The model allows the same situation to be described in different ways - notably you can specify isotope distributions within a residue either on a ChemComp basis (in the ChemCompLabel package) or molecule by molecule in the new LabeledMolecule package. There is room for a certain amount of reorganisation. The system is complex because it tries to make the most common situation simple to enter - the underlying data could be stored in a simpler system.

The Model draft

LabeledMolecule Package

LabeledMolecule package:
The LabeledMixture and MolLabelFraction classes define a mixture of MolLabels, each of which corresponds to a labeled species. To find the isotope distribution for a given atom (or pair) you loop over MolLabels, ResLabels and AtomLabels, find the matching record(s), and average to find the isotope fractions. The rules are a little complex:

  • A ResLabel matches an Atom if 'atom.residue.resId in resLabel.resIds'
  • For two Atoms in different residues the isotope distributions are calculated as a pair within each MolLabel. For two Atoms in the same residue, information for the pair of atoms is taken as a pair from each ResLabel.
  • The SingleAtomLabel gives the fraction of a given Atom that is of a given massNumber (the element is a given). If there is at least one SingleAtomLabel within a ResLabel that matches a given atom, only information from SingleAtomLabels within that ResLabel is considered.
  • The UniformAtomLabel is valid for all atoms of that element (e.g. for all carbons) in the ResLabel (except those overridden by SingleAtomLabels). If there is at least one UniformAtomLabel within a ResLabel that matches a given atom, information derived from the sourceName (see below) is ignored.
  • Where no AtomLabel is present, the sourceName and speciesSerial refer to the isotope composition specified in the corresponding ChemCompCharge.LabeledSpecies. If speciesSerial is not set, the ResLabel refers to the entire set of LabeledSpecies for that ChemCompCharge, as if there had been a ResLabel for each. If sourceName is not set either, the isotope distribution is assumed to be natural abundance (modified according to the AtomLabels).
  • If the fractions within a ResLabel, MolLabel or LabeledMixture sum up to less than 100%, it is assumed that the remainder is at natural abundance.
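The precedence among the rules above, for a single atom within a single ResLabel, might be sketched as follows. This is only an illustration under stated assumptions: the nested dictionaries stand in for SingleAtomLabel, UniformAtomLabel and an already-resolved LabeledSpecies, the names atom_distribution and fill_natural are hypothetical, and the natural abundances are approximate values.

```python
# Approximate natural abundances (assumed values, for illustration only).
NATURAL = {"C": {12: 0.989, 13: 0.011}, "N": {14: 0.996, 15: 0.004}}


def fill_natural(fractions, element):
    """Top up a partial distribution with natural abundance for the remainder."""
    remainder = 1.0 - sum(fractions.values())
    result = dict(fractions)
    if remainder > 1e-9:
        for mass, abundance in NATURAL[element].items():
            result[mass] = result.get(mass, 0.0) + remainder * abundance
    return result


def atom_distribution(res_label, atom_name, element):
    """Isotope distribution for one atom within one ResLabel-like record.

    Precedence: single-atom labels, then uniform labels for the element,
    then the source species (if given), then natural abundance.
    """
    single = res_label.get("single", {}).get(atom_name)
    if single is not None:
        return fill_natural(single, element)
    uniform = res_label.get("uniform", {}).get(element)
    if uniform is not None:
        return fill_natural(uniform, element)
    source = res_label.get("source", {}).get(atom_name)
    if source is not None:
        return fill_natural(source, element)
    return dict(NATURAL[element])


# Hypothetical record: CA fully 13C, all other carbons 50% 13C, nitrogen unspecified.
res_label = {"single": {"CA": {13: 1.0}}, "uniform": {"C": {13: 0.5}}}

ca = atom_distribution(res_label, "CA", "C")
cb = atom_distribution(res_label, "CB", "C")
n = atom_distribution(res_label, "N", "N")
```

The sketch leaves out the averaging over several ResLabels, MolLabels and species in a mixture, and the case where speciesSerial is unset and the whole set of LabeledSpecies applies.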

ChemCompLabel package:
The LabeledSpecies class defines one of several species in a mixture. For each species the Label class (should maybe be renamed) gives the fraction of a given atom that is of a given isotope. Missing atoms, and the remainder where atom fractions do not sum to 100%, are assumed to be at natural abundance.

The classes ChemCompLabelVar and ChemAtomVar should be removed from the package. All they did was to duplicate the information in the ChemComp package, so people might as well go to ChemComp for information. For the same reason the ChemCompVarXxx class should probably be removed in the ChemCompCoord and ChemCompCharge packages.

RefSampleComponent package:
RefSampleComponent.SpecificLabelGroup and RefSampleComponent.IsotopeLabel are replaced by the RefSampleComponent.MolComponent.labeledMixture link.

Outstanding questions

Default isotope pattern
We think that any isotope contribution that is not specified (fractions do not add up to 100%, sourceName and AtomLabel not specified, ...) should be assumed to be at natural abundance. This makes it easy to specify labeling at a few places only. It does mean that e.g. an AtomLabel that says "CA" "13C" "0.33" corresponds to an actual labeling fraction of ca. 33.7% (0.33 + 0.67 x 0.011), given that the remaining two thirds are at natural abundance. If you do not like this you are free to enter "CA" "13C" "0.33"; "CA" "12C" "0.67". There are a couple of alternatives.

  • We could assume the most common isotope for the rest. There are two problems with this. We would have to specify natural abundance explicitly whenever we had it, and things would get a little messy for elements with several high-abundance isotopes (Cl or Pb, for instance).
  • We could enforce that isotope composition had to be given completely. We would then still have to specify natural abundance, and we would have to have more atom records than in the alternatives. Possibly we could add a couple of special cases (yet more) so that natural abundance could be specified in a simple manner.
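The difference between the proposed default and the two alternatives can be made concrete with a small sketch. The function complete, its policy names and the abundance values (approximate) are assumptions for illustration, not part of the model.

```python
# Approximate natural abundances (assumed values, for illustration only).
NATURAL = {"C": {12: 0.989, 13: 0.011}, "Cl": {35: 0.758, 37: 0.242}}


def complete(fractions, element, policy="natural"):
    """Fill in the unspecified remainder of an isotope distribution.

    policy "natural":     remainder is at natural abundance (the proposal)
    policy "most_common": remainder goes to the most abundant isotope
    policy "strict":      incomplete input is rejected
    """
    remainder = 1.0 - sum(fractions.values())
    if remainder < -1e-9:
        raise ValueError("isotope fractions sum to more than 1")
    if remainder <= 1e-9:
        return dict(fractions)
    if policy == "strict":
        raise ValueError("isotope fractions must sum to 1")
    result = dict(fractions)
    if policy == "natural":
        for mass, abundance in NATURAL[element].items():
            result[mass] = result.get(mass, 0.0) + remainder * abundance
    elif policy == "most_common":
        top = max(NATURAL[element], key=NATURAL[element].get)
        result[top] = result.get(top, 0.0) + remainder
    else:
        raise ValueError("unknown policy: %s" % policy)
    return result


natural = complete({13: 0.33}, "C", policy="natural")
most_common = complete({13: 0.33}, "C", policy="most_common")
print(natural, most_common)
```

With the natural-abundance policy the 13C fraction comes out slightly above the entered 0.33, while the most-common-isotope policy leaves it at exactly 0.33 and puts the rest on 12C. For an element such as Cl, the most-common-isotope policy would assign the whole remainder to 35Cl, which is the messy case mentioned above.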

The class and attribute names are still up for discussion. As just one example, we have used 'fraction' in the new model, where the RefSampleComponent model used to have 'incorporation' for isotope fraction.