Quick guide to CCP software development

What is the 'data model'?

    The data model is an abstract description of the data commonly used in NMR (other areas, such as protein production, are also being included). For example, the data model describes an 'Experiment' object which is linked to one or more 'ExpDim' objects that describe the different dimensions of an NMR experiment. This abstract description is represented and maintained graphically using the Unified Modeling Language (UML).
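
    As a rough illustration of the 'Experiment'/'ExpDim' link described above, the relationship could be sketched in Python as follows. The class layout, the attribute names (name, dim, exp_dims, num_dim) and the example experiment are assumptions made for this sketch; they are not the classes generated from the actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExpDim:
    """One dimension of an NMR experiment (hypothetical, simplified)."""
    dim: int  # 1-based dimension number


@dataclass
class Experiment:
    """An NMR experiment linked to its dimensions (hypothetical, simplified)."""
    name: str
    exp_dims: List[ExpDim] = field(default_factory=list)

    @property
    def num_dim(self) -> int:
        # The dimensionality follows from the linked ExpDim objects.
        return len(self.exp_dims)


# A two-dimensional experiment is an Experiment linked to two ExpDim objects.
hsqc = Experiment(name="15N-HSQC", exp_dims=[ExpDim(1), ExpDim(2)])
```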

What are 'packages'?

    The data model is split into packages. Each package describes a 'unit' of information that can be shared by other packages. For example, template molecules are described in the 'Molecule' package, and molecular systems containing 'real' molecules are described in the 'MolSystem' package. The 'Nmr' package uses information from the 'MolSystem' package, which could equally be shared by an 'Xray' package if one were available. For this reason the data of each package is stored in a separate location.
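
    The idea of storing each package separately while letting packages refer to each other can be sketched as below. This is only an illustration: the JSON files, the file names, the keys and the example content are assumptions chosen for brevity and do not reflect the storage format actually used.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical project content, grouped by package name. The 'Nmr' data
# refers to the 'MolSystem' data by code instead of copying it.
project = {
    "Molecule": {"molecules": [{"name": "molA"}]},
    "MolSystem": {"molSystems": [{"code": "MS1", "molecule": "molA"}]},
    "Nmr": {"experiments": [{"name": "15N-HSQC", "molSystem": "MS1"}]},
}


def save_packages(project: dict, root: Path) -> list:
    """Write each package to its own file and return the paths written."""
    paths = []
    for package_name, data in project.items():
        path = root / f"{package_name}.json"
        path.write_text(json.dumps(data, indent=2))
        paths.append(path)
    return paths


def load_package(root: Path, package_name: str) -> dict:
    """Load a single package without reading the others."""
    return json.loads((root / f"{package_name}.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    written = sorted(p.name for p in save_packages(project, root))
    # A hypothetical 'Xray' package could load the same MolSystem data.
    mol_system = load_package(root, "MolSystem")
```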

What is the 'API'?

    API stands for Application Programming Interface. The API lets programs create and manipulate the objects described by the data model in computer memory, so that the data is always organized in a way that is consistent with the 'data model'. The API therefore also handles consistency checking of the objects (e.g. an Nmr 'Experiment' object has to be linked to at least one 'ExpDim' (experiment dimension)). The API is currently only available in Python, but a Java version is in development.
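
    The kind of consistency check mentioned above (an 'Experiment' needs at least one 'ExpDim') might look roughly like this. The names ConsistencyError, check_valid and is_valid are hypothetical and only serve the sketch; the real API may expose such checks differently.

```python
class ConsistencyError(Exception):
    """Raised when an object violates a rule of the (hypothetical) model."""


class ExpDim:
    def __init__(self, dim: int) -> None:
        self.dim = dim  # 1-based dimension number


class Experiment:
    def __init__(self, name: str, exp_dims=()) -> None:
        self.name = name
        self.exp_dims = list(exp_dims)

    def check_valid(self) -> None:
        """Enforce the rule: an Experiment needs at least one ExpDim."""
        if not self.exp_dims:
            raise ConsistencyError(
                f"Experiment {self.name!r} must have at least one ExpDim"
            )


def is_valid(experiment: Experiment) -> bool:
    """Return True if the experiment passes the consistency check."""
    try:
        experiment.check_valid()
    except ConsistencyError:
        return False
    return True
```

    Putting such rules inside the API means that every program using it gets the same guarantees without re-implementing the checks.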

Which programs use this 'API'?

    Currently only the ccpNmr applications interact directly with data inside the data model. These applications include 'FormatConverter' and 'Analysis', with 'Processing' available soon. The new ARIA software, validation software from the CMBI, and CLOUDS will work with the data model indirectly (and eventually directly). Our hope is that more applications will adopt the data model API now that working applications are available that show its advantages.

How do I get my data into (and out of) the 'data model'?

    The ccpNmr FormatConverter application allows you to import data from existing formats into the data model. Export functions are also available, so it can also be used to convert between existing formats. Currently Ansig, Dyana, Cns, Cyana, Fasta, NmrDraw, NmrStar, NmrView, Pdb, Pipp, Pronto, Sparky and XEasy are fully or partially supported.
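
    Converting between formats by going through a common in-memory representation can be sketched as below. This is not the FormatConverter code: the (name, residues) tuples stand in for data model objects, the "tab" output format is made up for the example, and the function and registry names are assumptions.

```python
from typing import Callable, Dict, List, Tuple

# (name, residues): a stand-in for objects held in the data model.
Sequence = Tuple[str, str]


def read_fasta(text: str) -> List[Sequence]:
    """Parse FASTA text into (name, residues) records."""
    records: List[Sequence] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def write_fasta(records: List[Sequence]) -> str:
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def write_tab(records: List[Sequence]) -> str:
    """Hypothetical tab-separated export, one record per line."""
    return "".join(f"{name}\t{seq}\n" for name, seq in records)


# Each format needs one reader and/or one writer, not one converter per pair.
READERS: Dict[str, Callable[[str], List[Sequence]]] = {"fasta": read_fasta}
WRITERS: Dict[str, Callable[[List[Sequence]], str]] = {
    "fasta": write_fasta,
    "tab": write_tab,
}


def convert(text: str, source: str, target: str) -> str:
    """Import from one format into the common model, then export to another."""
    return WRITERS[target](READERS[source](text))


example = ">protA\nMKV\nLLA\n"
converted = convert(example, "fasta", "tab")
```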

What is the advantage of having data inside the 'data model'?

  • All programs that work with the data model 'understand' each other. For example, you can read data into the data model with the ccpNmr FormatConverter, start using ccpNmr Analysis straight away (provided it understands the raw spectrum data format), and transfer the information to ARIA for a structure calculation.

  • Scripts that work on the data model can be used by every application that uses a data model API. For example, if a good automatic assignment script were written, it could be run from any application based on the data model.

  • Import/export to foreign formats comes for free (see above). This allows you to keep all your data in one place throughout a project while moving back and forth between different programs. Final export to an NmrStar file ready for deposition will also be included.

What else is planned?

  • Further data modelling of other 'bio' areas.

  • Development of a LIMS (laboratory information management system), which will also cover information on sample preparation, protein production, ...

  • New code for the data model to do things like automatic assignment, ...

  • Full support for databases, so that you can store all information in a relational database while you work on it.

Work done by the CCPN team.
Last modified: Tue May 25 18:59:48 CEST 2004