
Interfacing with Varian

Providing interface functionality from Varian to CCPN software.

Description of CCPN software and interface with Varian software

Overall

The general idea would be to get Varian software to have an "Export to CCPN" option.  (Possibly later on we might also want an "Import from CCPN" option.)

CCPN does not define a specific file format.  Instead, we support a software library based on a data model.  The underlying storage mechanism is largely irrelevant to application developers, because the CCPN software deals with (most of) that.

The CCPN data model describes scientific metadata, that is, data derived from the raw data (for example, peak positions).  It does not deal directly with the raw or processed spectral data, but it does contain descriptions of that data (for example, where it is on disk).  To some extent, one needs to understand the data model in order to work with the CCPN software.

The CCPN data model is class-based.  So, for example, there is a Peak class, which describes what an NMR peak should be (e.g. its position, its intensity, etc.).  Each class description in the data model corresponds to a class in the software.

We usually call the software library "the API", although it is much more than an Application Programming Interface: it is also an implementation of that interface.  We currently provide the API in Python, Java and C (the C API relies on the Python API).

Varian Export


There is the question of which language's API to use for the export, and of how to get at the metadata that the Varian software uses.  In short, we need to convert from the Varian data model to the CCPN one.  We will leave out the Varian-specific parts (although we might decide to store some of them in what is called application-specific data).
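To make the idea of "converting from the Varian data model" concrete, here is a minimal Python sketch of a parameter-mapping step.  It is only an illustration under stated assumptions: the table CONVERTERS and the functions varianToCcpnIsotope and convertParams are hypothetical, and nt, sfrq, tn and pw are used as examples of Varian parameter names (number of transients, observe frequency in MHz, observe nucleus, pulse width).  The real mapping would need checking against the Varian documentation.  Anything not mapped is set aside, to be dropped or kept as application-specific data.

```python
import re
from typing import Callable, Dict, Tuple


def varianToCcpnIsotope(tn: str) -> str:
    """Turn a Varian nucleus name such as 'H1' into a CCPN-style code such as '1H'."""
    match = re.fullmatch(r"([A-Z][a-z]?)(\d+)", tn)
    if match is None:
        raise ValueError(f"unrecognised nucleus name: {tn!r}")
    element, massNumber = match.groups()
    return massNumber + element


# Hypothetical table: Varian parameter -> (CCPN attribute, converter).
# Only a few parameters are shown; a real export would need many more.
CONVERTERS: Dict[str, Tuple[str, Callable[[str], object]]] = {
    "nt": ("numScans", int),      # number of transients -> Experiment.numScans
    "sfrq": ("sf", float),        # observe frequency (MHz) -> ExpDimRef.sf
    "tn": ("isotopeCodes", lambda v: (varianToCcpnIsotope(v),)),  # -> ExpDimRef.isotopeCodes
}


def convertParams(params: Dict[str, str]) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Split Varian parameters into CCPN attribute values and Varian-specific leftovers."""
    ccpnValues: Dict[str, object] = {}
    varianOnly: Dict[str, str] = {}
    for key, value in params.items():
        if key in CONVERTERS:
            attrName, convert = CONVERTERS[key]
            ccpnValues[attrName] = convert(value)
        else:
            # Not part of the CCPN model: drop it, or keep it as application-specific data
            varianOnly[key] = value
    return ccpnValues, varianOnly


ccpnValues, varianOnly = convertParams({"nt": "16", "sfrq": "599.87", "tn": "H1", "pw": "7.5"})
print(ccpnValues)   # {'numScans': 16, 'sf': 599.87, 'isotopeCodes': ('1H',)}
print(varianOnly)   # {'pw': '7.5'}
```

A table-driven mapping keeps the Varian-to-CCPN correspondence in one place, which would make it easier for whoever owns the export code to maintain.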

The Varian graphical interface, VnmrJ, is written in Java.  So one option would be to write the export using the Java API.  (This is in fact what we do for the Bruker export.)  This would work assuming that the VnmrJ code has direct access to the Varian metadata.

Another option would be to use the Varian Magical macro-programming language somehow.  This would only work if we could make Java calls (directly or indirectly) from Magical, which is presumably possible.  For example, suppose there were a Magical macro called createCcpnExperiment that created a CCPN Experiment (described below).  That macro could call a wrapper function in some Varian-CCPN Java code, which in turn would call the CCPN Java API.

A third option would be for VnmrJ to spawn a separate process that parses the Varian files on disk directly.  To some extent the CCPN Format Converter already parses these files, but this would need a fuller version.  In that case we could, in principle, use the CCPN Python API.  This approach is not ideal, because it would mean parsing files that the Varian software already parses.

So what CCPN needs to know to get started is which of these methods makes most sense, and once that is decided, we need to know how to get at the Varian metadata using this method.

Issues that need addressing

CCPN would prefer that ownership of the export code (and hence its maintenance) rest with Varian, although of course CCPN will help write the initial version.

There is also the question of how to distribute the software.  If we use the Java API then we can just provide a jar file.  (Presumably Varian already distributes other third-party jar files with VnmrJ.)  With the Python API it would be harder: Varian would need to distribute the CCPN Python API (and Python has nothing quite as convenient as a jar file) and possibly Python itself.  This would tend to favour the Java API.

The discussion below covers experiments and related classes.  It does not cover molecules, but if VnmrJ knows the molecular description then that would also be good to export.

Short description of relevant parts of data model


The data model and libraries are all documented extensively, so this is only a short primer.

The CCPN data model currently (November 2008) has 424 classes, split into 46 packages.  The classes have attributes, and there are links between classes.

The top-level CCPN class is called a MemopsRoot.  Informally, we generally call it a Project (because that was its previous name).  A user works on one Project at a time. 

Underneath MemopsRoot is a class called NmrProject.  This contains the NMR-specific data (as opposed to, say, the molecular description).

For export, one could either work with an existing Project (and/or NmrProject) or create new ones.

Underneath NmrProject is where it gets interesting.  First is Experiment, which describes the experiment carried out in the spectrometer.  Underneath that is DataSource, which describes one data set.  Each Experiment can have multiple DataSources: one for the raw data and one or more for the processed data (so the same raw data can be processed in more than one way).  Underneath DataSource is PeakList, which in turn contains Peaks (this of course only makes sense for processed DataSources).  For processed data sets, the CCPN data model currently does not describe how the processing was done.
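The containment tree just described can be sketched with plain Python dataclasses.  These are hypothetical stand-ins that only mirror the parent-child structure; they are not the CCPN API, and the isProcessed flag and newPeakList method are illustrative assumptions (the real model records raw versus processed data through DataSource.dataType).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for the CCPN classes, NOT the real CCPN API.
# Each class simply holds a list of its children.


@dataclass
class Peak:
    position: List[float]              # one position per dimension
    intensity: Optional[float] = None


@dataclass
class PeakList:
    peaks: List[Peak] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    isProcessed: bool                  # hypothetical flag; the real model uses dataType
    peakLists: List[PeakList] = field(default_factory=list)

    def newPeakList(self) -> PeakList:
        # Peak lists only make sense for processed data
        if not self.isProcessed:
            raise ValueError("only processed DataSources have peak lists")
        peakList = PeakList()
        self.peakLists.append(peakList)
        return peakList


@dataclass
class Experiment:
    name: str
    numDim: int
    dataSources: List[DataSource] = field(default_factory=list)


@dataclass
class NmrProject:
    name: str
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class MemopsRoot:
    name: str
    nmrProjects: List[NmrProject] = field(default_factory=list)


# One experiment with its raw data and the same data processed two ways
root = MemopsRoot("myProject")
nmrProject = NmrProject("myProject")
root.nmrProjects.append(nmrProject)
experiment = Experiment("my HSQC", numDim=2)
nmrProject.experiments.append(experiment)
fid = DataSource("fid", isProcessed=False)
spectrumA = DataSource("processed A", isProcessed=True)
spectrumB = DataSource("processed B", isProcessed=True)
experiment.dataSources.extend([fid, spectrumA, spectrumB])
spectrumA.newPeakList().peaks.append(Peak([8.2, 120.5], intensity=1.0e6))
```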

Also under MemopsRoot is a class called NmrExpPrototype, which in turn contains a class called RefExperiment.  A RefExperiment describes an NMR experiment by its magnetisation transfer pathway.  CCPN has created a list of around a hundred such reference experiments.  Each Experiment optionally has a link to a RefExperiment.  Getting this information included in the export would be extremely useful.

In the data model, containment of one class in another is called a child-parent link.  For example, DataSource is a child of Experiment, which is a child of NmrProject.  Generally, most parents can have multiple children of a given type.  So, for example, an NmrProject can have multiple Experiments.  All other links between classes are called cross-links, and are the only ones listed below under "important links".
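The difference between the two kinds of link can be shown with a small Python sketch.  Again these are hypothetical stand-ins, not the CCPN API: the parent attribute and the newExperiment/newRefExperiment factory methods are assumptions for illustration.  Creating a child through its parent records the child-parent link in both directions, while refExperiment is an optional cross-link from an Experiment into a different branch of the tree (NmrExpPrototype, then RefExperiment).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins, NOT the CCPN API. eq=False gives identity
# comparison, and repr=False on back-links avoids recursive reprs.


@dataclass(eq=False)
class RefExperiment:
    name: str
    parent: "NmrExpPrototype" = field(repr=False)


@dataclass(eq=False)
class NmrExpPrototype:
    name: str
    refExperiments: List[RefExperiment] = field(default_factory=list, repr=False)

    def newRefExperiment(self, name: str) -> RefExperiment:
        # Child-parent link: the child points to its parent, the parent lists the child
        refExperiment = RefExperiment(name, parent=self)
        self.refExperiments.append(refExperiment)
        return refExperiment


@dataclass(eq=False)
class Experiment:
    name: str
    numDim: int
    parent: "NmrProject" = field(repr=False)
    refExperiment: Optional[RefExperiment] = None   # cross-link (optional)


@dataclass(eq=False)
class NmrProject:
    name: str
    experiments: List[Experiment] = field(default_factory=list, repr=False)

    def newExperiment(self, name: str, numDim: int) -> Experiment:
        experiment = Experiment(name, numDim, parent=self)
        self.experiments.append(experiment)
        return experiment


prototype = NmrExpPrototype("example prototype")
refExperiment = prototype.newRefExperiment("example reference experiment")
nmrProject = NmrProject("myProject")
experiment = nmrProject.newExperiment("my HSQC", numDim=2)
experiment.refExperiment = refExperiment   # set the cross-link
```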

Sample CCPN API usage

Here is an example of how to create a processed DataSource with the CCPN API, by creating all the objects from scratch (except possibly the MemopsRoot and NmrProject).  This was used in a workshop we conducted in February 2008.

The code is in Python.  The Java code would be similar but more verbose (because of the Java typing system).

Below are brief, informal descriptions of some of the relevant classes.


Experiment

This describes an NMR experiment.

Important attributes (all mandatory):

  • name: a free-text description (so the user could choose this when exporting)
  • numDim: number of dimensions


Important links:

  • refExperiment (optional)


Less important attributes (all optional):

  • date: date experiment was run
  • nmrTubeType: e.g. "Shigemi 10mm"
  • numScans
  • etc.
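The split between mandatory and optional attributes can be sketched as follows.  This is a hypothetical stand-in for Experiment, not the CCPN API: mandatory attributes must be supplied when the object is created, optional ones default to None (meaning "not set"), and the validation rules in __post_init__ are assumptions for illustration.

```python
import datetime
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for Experiment, NOT the CCPN API.


@dataclass
class Experiment:
    name: str                                # mandatory: free-text description
    numDim: int                              # mandatory: number of dimensions
    date: Optional[datetime.date] = None     # optional: date experiment was run
    nmrTubeType: Optional[str] = None        # optional, e.g. "Shigemi 10mm"
    numScans: Optional[int] = None           # optional

    def __post_init__(self) -> None:
        # Assumed checks on the mandatory attributes
        if not self.name:
            raise ValueError("name is mandatory")
        if self.numDim < 1:
            raise ValueError("numDim must be at least 1")


experiment = Experiment("HNCA for export", numDim=3, numScans=16)
```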

ExpDim


This describes one dimension of an Experiment, and is a child of Experiment.  When you create an Experiment, its ExpDims are created automatically.

Important attributes (all mandatory):

  • dim: automatically set (counts from 1, not 0)
  • isAcquisition (default False): whether this dim is the acquisition dimension
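The automatic creation of ExpDims, numbered from 1, can be mimicked in a short Python sketch.  The classes are hypothetical stand-ins rather than the CCPN API, getExpDim is a hypothetical helper, and marking dimension 1 as the acquisition dimension is purely an assumption for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-ins, NOT the CCPN API: creating an Experiment
# creates its ExpDims automatically.


@dataclass
class ExpDim:
    dim: int                       # counts from 1, not 0
    isAcquisition: bool = False


@dataclass
class Experiment:
    name: str
    numDim: int
    expDims: List[ExpDim] = field(init=False)

    def __post_init__(self) -> None:
        # One ExpDim per dimension, numbered 1..numDim
        self.expDims = [ExpDim(dim) for dim in range(1, self.numDim + 1)]

    def getExpDim(self, dim: int) -> ExpDim:
        # Hypothetical helper; dim counts from 1, list indices from 0
        return self.expDims[dim - 1]


experiment = Experiment("my HNCA", numDim=3)
# Assumption for illustration: dimension 1 is the acquisition dimension
experiment.getExpDim(1).isAcquisition = True
```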


ExpDimRef

This describes the referencing for values that can appear on an axis of an NMR spectrum, and is a child of ExpDim.

Important attributes (only sf is mandatory):

  • sf: spectrometer frequency in MHz (normally)
  • isotopeCodes: codes identifying the isotopes observed on this axis (e.g. '1H')


Less important attributes (all optional):

  • baseFrequency: nominal base frequency in MHz
  • constantTimePeriod: total constant-time period available
  • isAxisReversed (default True)
  • isFolded (default False)
  • etc.
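One reason sf matters is that it converts frequency offsets between Hz and ppm: a chemical shift in ppm is the offset in Hz divided by the spectrometer frequency in MHz.  The sketch below is a hypothetical stand-in for ExpDimRef (not the CCPN API), with hzToPpm and ppmToHz as illustrative helper methods.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical stand-in for ExpDimRef, NOT the CCPN API.


@dataclass
class ExpDimRef:
    sf: float                              # spectrometer frequency in MHz (mandatory)
    isotopeCodes: Tuple[str, ...] = ()     # e.g. ("1H",)

    def hzToPpm(self, hz: float) -> float:
        # 1 ppm of an sf-MHz reference frequency is sf Hz
        return hz / self.sf

    def ppmToHz(self, ppm: float) -> float:
        return ppm * self.sf


protonRef = ExpDimRef(sf=600.0, isotopeCodes=("1H",))
print(protonRef.hzToPpm(1200.0))   # 2.0
```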


DataSource

This describes the stored data matrix for an NMR spectrum, and is a child of Experiment.

Important attributes (all mandatory):

  • name
  • numDim (may differ from that of the Experiment, e.g. because of projection)
  • dataType: e.g. 'FID' for raw data
  • numShapes (default 0): number of shapes in each matrix decomposition component
  • numSparsePoints (default 0): number of time increments acquired


Important links (optional):

  • dataStore: where actual data are stored
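Here is a minimal Python sketch of creating the raw-data DataSource with the attributes listed above.  It is a hypothetical stand-in, not the CCPN API: newRawDataSource is an illustrative helper, a plain file path stands in for the dataStore link, and the example path is made up.  The dimension check encodes our assumption that a DataSource may have fewer dimensions than its Experiment (e.g. a projection) but never more.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for DataSource, NOT the CCPN API.


@dataclass
class DataSource:
    name: str
    numDim: int
    dataType: str                       # e.g. 'FID' for raw data
    numShapes: int = 0
    numSparsePoints: int = 0
    dataStore: Optional[str] = None     # hypothetical: a path instead of a DataStore object


def newRawDataSource(experimentNumDim: int, name: str, numDim: int,
                     path: Optional[str] = None) -> DataSource:
    """Hypothetical helper creating the raw-data DataSource of an Experiment."""
    # Assumption: fewer dimensions than the Experiment is allowed, more is not
    if not 1 <= numDim <= experimentNumDim:
        raise ValueError(f"numDim {numDim} incompatible with Experiment numDim {experimentNumDim}")
    return DataSource(name=name, numDim=numDim, dataType="FID", dataStore=path)


fid = newRawDataSource(experimentNumDim=2, name="fid", numDim=2, path="/data/hsqc.fid/fid")
```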


FidDataDim (for raw data)

This describes an FID dimension with a regular grid of values, and is a child of DataSource.

Important attributes (all mandatory):

  • dim
  • isComplex
  • numPoints
  • pointOffset (default 0): number of points to ignore at start of FID
  • numPointsValid: number of valid data points (starting at pointOffset)
  • valuePerPoint: the difference between successive time values, whether or not the points are complex


Important links (mandatory):

  • expDim: the ExpDim corresponding to this FidDataDim
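The relationship between numPoints, pointOffset and numPointsValid can be sketched as follows.  This is a hypothetical stand-in for FidDataDim, not the CCPN API.  It assumes valuePerPoint is in seconds and that point i (counting from 0 in the stored FID) was sampled at time i * valuePerPoint; validTimes is an illustrative helper, not part of the model.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for FidDataDim, NOT the CCPN API.


@dataclass
class FidDataDim:
    dim: int
    isComplex: bool
    numPoints: int
    numPointsValid: int
    valuePerPoint: float      # time between successive points (seconds assumed)
    pointOffset: int = 0      # points to ignore at the start of the FID

    def __post_init__(self) -> None:
        # The valid region starts at pointOffset and must fit inside numPoints
        if self.pointOffset + self.numPointsValid > self.numPoints:
            raise ValueError("pointOffset + numPointsValid exceeds numPoints")

    def validTimes(self) -> List[float]:
        # Assumption: point i (counting from 0) was sampled at t = i * valuePerPoint
        first = self.pointOffset
        return [i * self.valuePerPoint for i in range(first, first + self.numPointsValid)]


fidDim = FidDataDim(dim=1, isComplex=True, numPoints=8, numPointsValid=5,
                    valuePerPoint=0.001, pointOffset=2)
```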


FreqDataDim (for processed data)

This describes a frequency domain dimension, and is a child of DataSource.

Important attributes (all mandatory):

  • dim
  • isComplex
  • numPoints
  • pointOffset (default 0): number of points that were removed at start, e.g. after Fourier transform
  • numPointsOrig: number of points before points were removed, e.g. after Fourier transform
  • valuePerPoint: conversion between point number and frequency (the latter normally in Hz)


Important links (mandatory):

  • expDim: the ExpDim corresponding to this FreqDataDim
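For a processed dimension, the attributes above determine the full spectral width (numPointsOrig * valuePerPoint, in Hz if valuePerPoint is in Hz) and how a stored point maps back to its position before points were removed.  The sketch below is a hypothetical stand-in for FreqDataDim, not the CCPN API.  It assumes the stored points form one contiguous slice of the original points; spectralWidth and origPoint are illustrative helpers.

```python
from dataclasses import dataclass

# Hypothetical stand-in for FreqDataDim, NOT the CCPN API.


@dataclass
class FreqDataDim:
    dim: int
    isComplex: bool
    numPoints: int            # points actually stored
    numPointsOrig: int        # points before any were removed
    valuePerPoint: float      # frequency per point, normally Hz
    pointOffset: int = 0      # points removed at the start

    def __post_init__(self) -> None:
        # Assumption: stored points are a contiguous slice of the original ones
        if self.pointOffset + self.numPoints > self.numPointsOrig:
            raise ValueError("stored points do not fit inside numPointsOrig")

    def spectralWidth(self) -> float:
        # Full width covered by the original (untruncated) dimension
        return self.numPointsOrig * self.valuePerPoint

    def origPoint(self, point: int) -> int:
        # Map a stored point (counting from 1) to its number before truncation
        return point + self.pointOffset


# E.g. keep the middle half of a 1024-point dimension
freqDim = FreqDataDim(dim=1, isComplex=False, numPoints=512, numPointsOrig=1024,
                      valuePerPoint=7.8125, pointOffset=256)
print(freqDim.spectralWidth())   # 8000.0
print(freqDim.origPoint(1))      # 257
```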

DataDimRef (for processed data)

This describes referencing information for a given dimension, and is a child of FreqDataDim.

Important attributes (all mandatory):

  • refPoint (default 0.0): point number corresponding to refValue (note that points start at 1, not 0)
  • refValue (default 0.0): reference value (in expDimRef unit) at refPoint

Important links (mandatory):

  • expDimRef: the ExpDimRef corresponding to this DataDimRef
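Putting the referencing together: with valuePerPoint taken from the FreqDataDim (in Hz) and sf from the linked ExpDimRef (in MHz), refPoint and refValue fix the conversion between point number and ppm.  The Python sketch below uses hypothetical helper functions, not the CCPN API.  It assumes the usual NMR orientation (consistent with isAxisReversed defaulting to True), in which chemical shift decreases as the point number increases, and that points count from 1.

```python
# Hypothetical helpers, NOT the CCPN API. Assumed units: valuePerPoint in Hz,
# sf in MHz, so values are in ppm. Assumed orientation: ppm decreases as the
# point number increases.


def pointToValue(point: float, refPoint: float, refValue: float,
                 valuePerPoint: float, sf: float) -> float:
    """Convert a (possibly fractional) point number to a value in ppm."""
    return refValue + (refPoint - point) * valuePerPoint / sf


def valueToPoint(value: float, refPoint: float, refValue: float,
                 valuePerPoint: float, sf: float) -> float:
    """Inverse of pointToValue."""
    return refPoint - (value - refValue) * sf / valuePerPoint


# Point 1 is referenced to 12.0 ppm; 7.8125 Hz per point at 500 MHz
params = dict(refPoint=1.0, refValue=12.0, valuePerPoint=7.8125, sf=500.0)
print(pointToValue(65, **params))    # 11.0
print(valueToPoint(11.0, **params))  # 65.0
```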