FormatExchange LinkResonances

A draft proposal for organizing data structures and interface for FormatExchange LinkResonances.

Overview

The task of linking resonances starts with a sequence, assigned peaks, and assigned shifts read in from external files. The peak and shift assignments are passed to FormatExchange as chainCode, resNum, resType, and atomCode, in the nomenclature of the files. The task of linking resonances is to create a data structure that connects assigned peaks, assigned shifts, and atoms, and that contains enough information to convert the entire structure to the format of the data model. Since we are, effectively, mapping the original assignment strings to a final assignment, we should store the original assignment strings, so that the user can compare the chosen assignment with the original file contents, and so that we can redo the mapping with new information.

Nomenclature

It is confusing and rather unfortunate that FormatExchange and the CCPN data model use the word Resonance for different things. For the time being I shall use 'peakdim' for the thing that FormatExchange calls 'resonance', and 'Linkresonance' for the thing that CCPN calls Resonance.

Data Structures

Atoms

As soon as we know the sequence, we effectively know the chainCodes, resNums, resCodes and official atom names, whether they are stored in the FormatExchange XML or not. We need not consider this point further.

Chemical Shifts

The simplest way of handling assignment would be to add the information necessary for LinkResonances to the Chemical Shift XML elements and editing tables. The chainCode, resNum, resCode, and atomCode already being stored would represent the correct assignment. AtomCodes would be CCPN assignment names (see below) rather than ChemComp atom names. We would need the following extra elements (and columns):

nucleus:
e.g. '13C', '15N', '1H', ...

originalAssignment:
The original assignment in the file. Could be just a string, or split into components. We must be able both to print out the original string and to redo the conversion from file content to proposed assignment.

residueOffset:
Does this atom belong to this residue, the previous one, or the next one? Necessary to handle spin system type assignments. Could be stored and/or shown as an integer (default to zero) or an enumeration ('same'/'next'/'prev').

directlyRead:
This need not be displayed, but should be stored to track whether this shift record was read in from file or generated from a peakdim record. In the latter case the Chemical shift should be removed if no longer needed for a peakdim assignment.
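
As an illustration of the extra elements listed above, a Chemical Shift record could be sketched as below. This is only a sketch: the class name ShiftRecord, the helper offset_label, the Python types and the example values are assumptions made for illustration, not part of the existing FormatExchange code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShiftRecord:
    """One Chemical Shift row plus the proposed LinkResonances fields.

    Hypothetical layout; field names follow the element names in the text.
    """
    value: float
    chainCode: Optional[str]
    resNum: Optional[int]
    resCode: Optional[str]
    atomCode: Optional[str]
    nucleus: str                 # e.g. '13C', '15N', '1H'
    originalAssignment: str      # assignment text exactly as found in the file
    residueOffset: int = 0       # 0 = same residue, -1 = previous, +1 = next
    directlyRead: bool = True    # False if generated from a peakdim record


# Display labels for the enumeration variant of residueOffset.
OFFSET_LABELS = {0: 'same', 1: 'next', -1: 'prev'}


def offset_label(record):
    """Return the enumeration label for a record's residueOffset."""
    return OFFSET_LABELS.get(record.residueOffset, str(record.residueOffset))


# Made-up example values, for illustration only.
shift = ShiftRecord(value=56.2, chainCode='A', resNum=12, resCode='Ile',
                    atomCode='CA', nucleus='13C', originalAssignment='I12CA')
print(offset_label(shift))
```

Storing residueOffset as an integer and translating it to 'same'/'next'/'prev' only for display keeps both of the options mentioned above open.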

PeakDims

PeakDims need a chainCode, like Chemical shifts have, to store the correct assignment in all cases. They also need an originalAssignment like Chemical shifts.

Procedure and Interface

The tasks that need doing map directly onto the kind of display pages that we need, so I shall treat them together, in the order in which the tasks should be carried out. It may of course be that not all panes are relevant in practice; I would welcome input on that, especially from Marc.

There might be cases where there is no separate sequence file and the sequence must be determined from shift and peak data. This would require separate consideration. Also, individual shift lists and peak lists might have naming conventions that do not match. Here I assume that everything that is read together is treated as consistent, in order to avoid complicating the interface.

Residue types

The first task is to gather all the residue type codes and map them onto ChemComp ccpCodes. Something similar needs doing for reading sequence files, so some of the code is presumably there somewhere. It corresponds to identifying a naming system to use, in Wim's LinkResonance code. Because Sparky is freeform text, there might be more work to do here than for sequence reading. I would propose a table with columns for the original residue type code (as read in the files), the number of distinct resNums in which the type code is found, and the corresponding ccpCode. The latter would be editable. The table should probably be sorted with the most common codes first - there might be codes that appear rarely that are typos or just annotations ('AMX', for instance).
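
The proposed residue type table could be built roughly as follows. This is a minimal sketch: the function name residue_type_table, the tuple layout of the input and the contents of code_map are assumptions, and the example data are invented.

```python
from collections import defaultdict


def residue_type_table(assignments, code_map):
    """Build rows (original type code, number of distinct residues, ccpCode).

    assignments: iterable of (chainCode, resNum, resType) as read from files.
    code_map: dict from original type code to ccpCode (hypothetical defaults).
    Rows are sorted with the most common codes first; unmapped codes get None.
    """
    residues = defaultdict(set)
    for chain_code, res_num, res_type in assignments:
        residues[res_type].add((chain_code, res_num))
    rows = [(res_type, len(found), code_map.get(res_type))
            for res_type, found in residues.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Invented example: 'AMX' is an annotation rather than a residue type.
assignments = [('A', 12, 'ILE'), ('A', 12, 'ILE'), ('A', 13, 'SER'),
               ('A', 32, 'SER'), ('A', 99, 'AMX')]
rows = residue_type_table(assignments, {'ILE': 'Ile', 'SER': 'Ser'})
for row in rows:
    print(row)
```

Codes with no ccpCode (None in the last column) are the ones the user would need to edit or discard.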

Atom Codes

For each residue type one should gather all the atom codes found and map them. Again, a similar task will likely have been done in reading e.g. coordinates. The mapping here is not to ChemComp atom names, but to assignment names, which include e.g. Hba/Hbb for non-specific prochirals, and HG2* for methyl groups (see below). There should be a table with the resCodes (including None and All). On selecting a row another table should appear with the relevant atomCodes as read in the file, the number of different residues in which each is found, and the atom name to use. The latter should be editable. The None residue should have atoms that lack a resCode, and the All table should have atomCodes that are found in more than one resCode.

Residue and Atom code mappings must be set up automatically. For Atom codes we need to have a default mapping that includes the most common pseudoatom codes. From case01 I notice atoms like QD and QG, which are standard Wüthrich pseudoatom codes and would translate to HD* and HG* respectively. The default mapping should assume that assignments are *not* stereospecific, unless there is evidence to the contrary.
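
A default atom code mapping with user overrides could look something like the sketch below. Only the QD and QG entries come from the text above; the names DEFAULT_PSEUDOATOMS and map_atom_code, and the idea of returning a flag for "found in the table", are assumptions for illustration.

```python
# Default pseudoatom translations. Only the two codes mentioned in the text
# are included; a real table would need to cover the other common codes.
DEFAULT_PSEUDOATOMS = {'QD': 'HD*', 'QG': 'HG*'}


def map_atom_code(atom_code, user_map=None):
    """Return (assignment name, found_in_table) for an atom code from a file.

    user_map holds edits made in the atom code table and overrides defaults.
    Codes not in either table are passed through unchanged and flagged False,
    so that the interface can mark them as needing attention.
    """
    table = dict(DEFAULT_PSEUDOATOMS)
    table.update(user_map or {})
    key = atom_code.strip()
    if key in table:
        return table[key], True
    return key, False


print(map_atom_code('QD'))
print(map_atom_code('HB2'))
print(map_atom_code('HB2', user_map={'HB2': 'HBa'}))
```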

Chains and sequences

The sequence from peaks and shifts might not match the one from the molecule. There could be problems with where the residue numbering starts, there could be mistakes in either, or there could be too many or inconsistent chain codes. The program should make the best sequence alignment to start with. For user input the molecule and the NMR-derived sequence should be put side by side with the best alignment. It should be possible to move (or renumber) blocks of residues at once. NOTE: The display for the correct sequence should be the seqCode and seqInsertCode, not the seqId.
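
As a starting point for the automatic alignment, the simplest case (a constant shift in residue numbering, no gaps) could be handled as sketched below. This is deliberately not a full sequence alignment; the function name best_numbering_offset, the assumption that the molecule is numbered from 1, and the example data are all illustrative.

```python
def best_numbering_offset(molecule, nmr_residues):
    """Find the constant numbering offset that gives the most matches.

    molecule: list of ccpCodes in chain order; residue i is molecule[i - 1]
              (numbering from 1 is an assumption of this sketch).
    nmr_residues: dict resNum -> ccpCode derived from peaks and shifts.
    Returns (offset, matches), where resNum + offset is the molecule number.
    Ties are resolved in favour of the smallest renumbering.
    """
    if not molecule or not nmr_residues:
        return (0, 0)
    lowest = 1 - max(nmr_residues)
    highest = len(molecule) - min(nmr_residues)
    best = (0, -1)
    for offset in range(lowest, highest + 1):
        matches = sum(
            1 for num, code in nmr_residues.items()
            if 1 <= num + offset <= len(molecule)
            and molecule[num + offset - 1] == code
        )
        if matches > best[1] or (matches == best[1]
                                 and abs(offset) < abs(best[0])):
            best = (offset, matches)
    return best


# Invented example: the NMR numbering starts nine residues too high.
molecule = ['Met', 'Ile', 'Ser', 'Val', 'Thr']
nmr = {11: 'Ile', 12: 'Ser', 13: 'Val'}
print(best_numbering_offset(molecule, nmr))
```

Blocks of residues that need different offsets, or residues with mistaken types, would still have to be handled by the side-by-side display described above.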

The resCodes, chainCodes and sequence numbers might often be fully handled automatically, in which case they would come up green. The atomCodes are more likely to require individual attention.

Resonance linking

When the automatic mapping is over, the user should check the mapping records (as above) and rerun the mapping as necessary. Once that is done it is time to view and edit individual assignments. The Chemical Shifts table would hold both chemical shifts as read in and information needed for resonance linking. This requires that we calculate a shift for assignments that are not present in the shift list but only in the peak list. That ought not to be a problem - Analysis does it anyway. As discussed under 'Data Structures' we would need the three additional columns nucleus, originalAssignment, and residueOffset. The editable columns would be chainCode, resNum, resCode, atomCode, nucleus, and residueOffset.
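
Generating shift records for assignments that occur only in peak lists might be sketched as follows. The unweighted mean used here is an assumption chosen for brevity and is not meant to reproduce how Analysis averages shifts; the function name derive_shifts and the record layout are likewise hypothetical, and the positions are invented.

```python
from statistics import mean, pstdev


def derive_shifts(shift_keys, peakdims):
    """Create shift records for assignments found only in peak lists.

    shift_keys: set of (chainCode, resNum, atomCode) already in the shift list.
    peakdims: iterable of ((chainCode, resNum, atomCode), position in ppm).
    Returns a dict key -> record; directlyRead is False for every record, so
    that it can be removed once no peakdim needs it any longer.
    """
    positions = {}
    for key, position in peakdims:
        if key not in shift_keys:
            positions.setdefault(key, []).append(position)
    return {
        key: {'value': mean(values), 'std': pstdev(values),
              'directlyRead': False}
        for key, values in positions.items()
    }


# Invented example: HA is in the shift list, HB2 only in the peak list.
existing = {('A', 13, 'HA')}
peakdims = [(('A', 13, 'HA'), 4.41), (('A', 13, 'HB2'), 3.80),
            (('A', 13, 'HB2'), 3.84)]
derived = derive_shifts(existing, peakdims)
print(derived)
```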

The peaks and peak assignment tables would not need to be modified, except that Peak Assignment should have an additional, editable, 'chain Code' column.

We would need one extra table. It should open up when you select a Chemical Shift, either automatically or by pressing a suitable button. The columns should be the same as for the Chemical Shift, except that the leftmost column should be replaced by a text column 'Identity', and the 'residueOffset' column should be removed. The first line of the table should be the chemical shift line, with Identity='Chemical Shift n' where n is the shift number. The other lines should show all PeakDims that share the same assignment. Here the first column should say e.g. 'Peak <PeakListName> <PeakId> F2'. The content of chainCode, resNum, resCode and atomCode should be the same as for the first line and should be editable; 'nucleus' should show the nucleus as deduced from the peak in question and should not be editable. 'Shift value' and 'std' should show the position of the peak and its uncertainty. We would need to decide if 'Shift value' should be editable for peakdims. The purpose of this table is to show all objects that share a given assignment, and to allow you to reassign those that do not fit.

LinkResonances

The LinkResonances process would create CCPN Resonances and ResonanceGroups (=spin systems) from the information in the Chemical Shifts of FormatExchange, and set up links to CCPN Shifts, PeakDims, and Atoms. The way this process works defines, as it were, the meaning of the information stored in the relevant parts of the FormatExchange XML.

The combination of chainCode, resNum, and atomCode uniquely defines a resonance. The resCode must always be the same for a given chainCode+resNum (validation!). The chemical shift entry (there can be only one) and the various PeakDim entries must all be assigned to this resonance. On creating the resonance, Resonance.isotopeCode should be set to the Chemical Shift 'nucleus', and Resonance.name should be set to the atomCode (note that atomCode may be empty in some cases). Note that this means the nucleus has to be set (validation!). There should be a Shift object created that is linked to the Resonance, and the relevant PeakDims of the relevant Peaks should be assigned to the Resonance.
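
The grouping and the validation points marked above could be sketched as below. The dictionary record layout, the 'kind' key and the function name group_by_resonance are assumptions for illustration; records without a resNum would need separate treatment and are not handled here.

```python
from collections import defaultdict


def group_by_resonance(records):
    """Group shift and peakdim records by (chainCode, resNum, atomCode).

    records: dicts with keys kind ('shift' or 'peakdim'), chainCode, resNum,
             resCode, atomCode and nucleus (hypothetical layout).
    Returns (groups, errors), where errors lists the validation failures:
    conflicting resCodes for one residue, more than one shift per resonance,
    and shifts with no nucleus set.
    """
    groups = defaultdict(list)
    res_codes = {}
    errors = []
    for rec in records:
        residue = (rec['chainCode'], rec['resNum'])
        groups[residue + (rec['atomCode'],)].append(rec)
        first = res_codes.setdefault(residue, rec['resCode'])
        if first != rec['resCode']:
            errors.append(
                f"resCode conflict for {residue}: {first} vs {rec['resCode']}")
    for key, members in groups.items():
        shifts = [m for m in members if m['kind'] == 'shift']
        if len(shifts) > 1:
            errors.append(f"more than one chemical shift for {key}")
        if any(not s.get('nucleus') for s in shifts):
            errors.append(f"nucleus not set for {key}")
    return dict(groups), errors


# Invented example with one deliberate resCode conflict.
records = [
    {'kind': 'shift', 'chainCode': 'A', 'resNum': 13, 'resCode': 'Ser',
     'atomCode': 'HB2', 'nucleus': '1H'},
    {'kind': 'peakdim', 'chainCode': 'A', 'resNum': 13, 'resCode': 'Ser',
     'atomCode': 'HB2', 'nucleus': '1H'},
    {'kind': 'peakdim', 'chainCode': 'A', 'resNum': 13, 'resCode': 'Thr',
     'atomCode': 'HA', 'nucleus': '1H'},
]
groups, errors = group_by_resonance(records)
print(sorted(groups), errors)
```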

What further happens depends on the chainCode, resNum, resCode, atomCode, and residueOffset. We first assume that residueOffset is 0.

  • If all four correspond to a single atom in the MolSystem, the Resonance is linked to that atom through a single ResonanceSet and AtomSet. Examples: 12 Ile HN, 13 Ser HB2. The latter is a stereospecific assignment.
  • If all four correspond to a group of atoms in fast exchange (a methyl group or atoms from a flipping aromatic ring), there will be a pre-existing AtomSet. The Resonance is then linked to that AtomSet through a single ResonanceSet. Examples: 47 Thr HG2*, 54 Leu HD2*. The latter is a stereospecific assignment.
  • If all four correspond to a non-stereospecific assignment, the Resonance is linked to *both* the possible assignments through one ResonanceSet and two AtomSets. The ResonanceSet may be shared between two Resonances if both alternatives are present. Examples: 32 Ser Hbb, 49 Val HGb*. The latter is a methyl group.
  • If all four correspond to both resonances in a prochiral group, assignment is made to both atoms. Examples: 32 Ser HB*, 49 Val HG*.

The cases shown above are all standard in Analysis, and there will be functions in AssignmentBasic that do exactly this. As I remember, the standard functions in AssignmentBasic will automatically create a SpinSystem (=ResonanceGroup) for each residue.
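
To make the four residueOffset = 0 cases concrete, the sketch below tells them apart from the assignment name alone. It does not use or imitate the AssignmentBasic functions: the TEMPLATES table is a hypothetical, heavily truncated stand-in for the information that would really come from the ChemComp descriptions, and the function name classify and the returned labels are assumptions.

```python
# Hypothetical, truncated residue templates (protons only).
TEMPLATES = {
    'Ser': {'atoms': {'H', 'HA', 'HB2', 'HB3'},
            'groups': set(),
            'prochiral': {'HB': ('HB2', 'HB3')}},
    'Val': {'atoms': {'H', 'HA', 'HB'},
            'groups': {'HG1*', 'HG2*'},
            'prochiral': {'HG': ('HG1*', 'HG2*')}},
    'Thr': {'atoms': {'H', 'HA', 'HB'},
            'groups': {'HG2*'},
            'prochiral': {}},
}


def classify(res_code, atom_code):
    """Return (case label, candidate atoms or atom groups) for an assignment."""
    template = TEMPLATES[res_code]
    if atom_code in template['atoms']:
        return 'single atom', (atom_code,)
    if atom_code in template['groups']:
        return 'fast-exchange group', (atom_code,)
    stem = atom_code.rstrip('*')
    # Names such as Hbb or HGb*: lower-case a/b marks a non-stereospecific
    # assignment to one member of a prochiral pair.
    if stem and stem[-1] in 'ab':
        pair = template['prochiral'].get(stem[:-1].upper())
        if pair:
            return 'non-stereospecific', pair
    # Names such as HB* or HG*: both members of the prochiral pair.
    if atom_code.endswith('*'):
        pair = template['prochiral'].get(stem.upper())
        if pair:
            return 'both prochirals', pair
    return 'not found in residue', ()


for res_code, atom_code in [('Ser', 'HB2'), ('Thr', 'HG2*'), ('Ser', 'Hbb'),
                            ('Val', 'HGb*'), ('Ser', 'HB*'), ('Val', 'HG*')]:
    print(res_code, atom_code, classify(res_code, atom_code))
```

The 'not found in residue' result corresponds to the case where the atomCode does not match any atom in the residue.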

  • If chainCode and resNum match an existing residue, but the atomCode is not found in the residue, the resonance is added to the ResonanceGroup that corresponds to the residue. The resonance is interpreted as 'belongs to this residue, assignment unknown', and the Resonance.name shows what we know about the assignment.
  • If resNum is set, but resNum does *not* match a residue in the MolSystem, this represents a spin system that has not been assigned to the molecule. A ResonanceGroup is created to represent it (unless there is one already). If the resCode matches a ccpCode, this is used to set ResonanceGroup.molType and .ccpCode, otherwise the resCode is set in the ResonanceGroup.details. The name of the ResonanceGroup is set by combining the chainCode (if any) and the resNum. The Resonance is then added to the ResonanceGroup. If the ccpCode is set and the atomCode matches a name for this residue, or if the atomCode matches a name shared by most residues (like H, N, CA, ...), do Resonance.assignNames = (atomCode,).
  • If resNum is not set, this is an individual resonance that does not map to a spin system. Simply create the CCPN Resonance, and set the name to the original assignment string from the file.
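
The decision between the three cases just listed could be sketched as follows. The mol_system dictionary stands in for a lookup in the real MolSystem, and the function names, the returned labels and the exact way chainCode and resNum are combined into a name are all assumptions; the text only says that the two are combined.

```python
def unassigned_case(chain_code, res_num, atom_code, mol_system):
    """Decide how a record that is not a standard assignment is handled.

    mol_system: dict mapping (chainCode, resNum) to the set of atom and
                assignment names of that residue (hypothetical stand-in).
    """
    if res_num is None:
        return 'individual resonance'
    atoms = mol_system.get((chain_code, res_num))
    if atoms is None:
        return 'unassigned spin system'
    if atom_code not in atoms:
        return 'residue known, atom unknown'
    return 'standard assignment'


def resonance_group_name(chain_code, res_num):
    """Combine chainCode (if any) and resNum; the format is an assumption."""
    return f"{chain_code or ''}{res_num}"


# Invented example MolSystem with a single residue.
mol_system = {('A', 13): {'H', 'HA', 'HB2', 'HB3'}}
print(unassigned_case('A', 13, 'HX', mol_system))
print(unassigned_case('A', 101, 'CA', mol_system), resonance_group_name('A', 101))
print(unassigned_case(None, None, None, mol_system))
```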


Finally we need to consider cases where residueOffset is not zero. In practice it will then be either +1 or -1, corresponding to the next or the previous residue in the sequence. We might get this kind of thing as a mapping for atomCodes like CAi-1 or CA-1 or N+1 (or even 'PCA' for 'previous carbon alpha'), in any case where the user is assigning atoms from a preceding or following residue as part of a given spin system. What needs doing here is to create a new ResonanceGroup (or find one, if it exists), link it to the current ResonanceGroup as the preceding (or following) residue as the case may be, and add the resonance to it. Since this is rather complex, I would suggest that it be implemented with the help of Cambridge CCPN people once everything else has been set.
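
Splitting such atom codes into an atom name and a residueOffset could be done along the lines below. This is a sketch under assumptions: the regular expression only covers the 'CAi-1', 'CA-1' and 'N+1' styles quoted above, codes like 'PCA' are handled through an explicit table (SPECIAL_CODES) that the user would have to be able to edit, and all names are hypothetical. Linking the ResonanceGroups themselves is not attempted here.

```python
import re

# Atom name, optional 'i', then a signed integer, e.g. CAi-1, CA-1, N+1.
OFFSET_PATTERN = re.compile(
    r'^(?P<atom>[A-Za-z][A-Za-z0-9*]*?)i?(?P<sign>[+-])(?P<num>\d+)$')

# Codes that cannot be recognised by pattern; example taken from the text.
SPECIAL_CODES = {'PCA': ('CA', -1)}


def split_offset(atom_code):
    """Return (atom name, residueOffset) for an atom code read from a file."""
    code = atom_code.strip()
    if code in SPECIAL_CODES:
        return SPECIAL_CODES[code]
    match = OFFSET_PATTERN.match(code)
    if match:
        offset = int(match.group('num'))
        if match.group('sign') == '-':
            offset = -offset
        return match.group('atom'), offset
    return code, 0


for code in ['CAi-1', 'CA-1', 'N+1', 'PCA', 'HB2']:
    print(code, split_offset(code))
```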