
Trajectory package

Draft of data model package for dynamics trajectories and/or compact storage of ensembles. IMPLEMENTED Rasmus Fogh 2011


We need a data model package to store both dynamics trajectories and very large structure ensembles in a memory-efficient manner. After discussion, I have decided to modify the MolStructure package to cater for both needs. This will break some existing code, but the extent should be minimal (see below).

The proposed package is shown in the accompanying diagram.

Data Matrix Organisation

Data matrices have the model as the slowest-varying dimension, the atom as the next slowest, and any additional dimensions after that. The data matrices are stored packed into 1D arrays. We might use array.array under the hood, but that is an implementation detail. We prefer not to use a native matrix type, such as NumPy for Python: while this would be more efficient, it would introduce an unpleasant dependency into the API and would require different behavior for each language. All functions that pass data into or out of matrices also use packed 1D arrays. Matrix data can be queried and set per Model, per Atom, or for arbitrary submatrices by using the generic functions on Model, Atom, or FloatMatrixObject. The getCoordinates, getBFactors and getOccupancies functions are simply wrappers with user-friendly names that call getSubmatrixData. All data matrices are adjusted as appropriate when Models are created or deleted.
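To make the packing order concrete, here is a minimal sketch of how an element's position in the packed 1D array follows from the [model][atom][extra] layout. It assumes a plain `array.array("d")` as storage; `flat_offset`, `model_block` and `atom_block` are hypothetical helpers for illustration, not the data model's getSubmatrixData API.

```python
from array import array


def flat_offset(model_index, atom_index, extra_index, n_atoms, extra_size):
    """Offset of one element in a matrix packed as [model][atom][extra].

    The model is the slowest-varying dimension and the trailing extra
    dimension (e.g. x, y, z for coordinates) the fastest.
    """
    return (model_index * n_atoms + atom_index) * extra_size + extra_index


def model_block(data, model_index, n_atoms, extra_size):
    """All values for one model: a single contiguous slice."""
    start = model_index * n_atoms * extra_size
    return data[start:start + n_atoms * extra_size]


def atom_block(data, atom_index, n_models, n_atoms, extra_size):
    """All values for one atom across models, packed model by model."""
    result = array("d")
    for model_index in range(n_models):
        start = flat_offset(model_index, atom_index, 0, n_atoms, extra_size)
        result.extend(data[start:start + extra_size])
    return result


# 2 models x 3 atoms x 3 coordinates, filled with 0.0, 1.0, 2.0, ...
coords = array("d", (float(i) for i in range(2 * 3 * 3)))
print(flat_offset(1, 2, 1, n_atoms=3, extra_size=3))  # 16
print(list(model_block(coords, 1, 3, 3)))  # [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
print(list(atom_block(coords, 2, 2, 3, 3)))  # [6.0, 7.0, 8.0, 15.0, 16.0, 17.0]
```

Because the model is the slowest-varying dimension, one model's data is contiguous, while one atom's data is spread out with a stride of one model.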

The Coord class

The Coord class is maintained for compatibility purposes only, and is no longer stored. Coord objects can be queried, modified and created almost as before (see below), but serve as views on the underlying data. Derived attributes on Model, Atom, and Coord give access to the coordinate, B-factor and occupancy data per model, per atom, or per Coord. Coord objects are created as needed. The new DataMatrix class, a child of StructureEnsemble, stores the actual data as matrices for bFactors, occupancies, and coordinates; these matrices are created automatically when the first Model object is created, and are resized as needed to hold B-factor, occupancy and coordinate data.
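The idea of a Coord as a view rather than a stored object can be sketched as follows. This is a hypothetical `CoordView` class, assuming coordinates packed as [model][atom][xyz] in an `array.array`; it is not the actual Coord implementation, only an illustration of reads and writes going straight through to the shared matrix.

```python
from array import array


class CoordView:
    """Hypothetical Coord-like view: x, y, z live in a shared packed array.

    Nothing is copied; reading or writing an attribute goes straight to
    the underlying coordinate matrix, packed as [model][atom][xyz].
    """

    def __init__(self, data, model_index, atom_index, n_atoms):
        self._data = data
        self._offset = (model_index * n_atoms + atom_index) * 3

    def _get(self, axis):
        return self._data[self._offset + axis]

    def _set(self, axis, value):
        self._data[self._offset + axis] = value

    x = property(lambda self: self._get(0), lambda self, v: self._set(0, v))
    y = property(lambda self: self._get(1), lambda self, v: self._set(1, v))
    z = property(lambda self: self._get(2), lambda self, v: self._set(2, v))


n_models, n_atoms = 2, 4
coordinates = array("d", [0.0] * (n_models * n_atoms * 3))
coord = CoordView(coordinates, model_index=1, atom_index=2, n_atoms=n_atoms)
coord.y = 1.5                   # writes into the shared matrix
print(coordinates[19])          # 1.5
del coord                       # the view goes away, the data stays
print(coordinates[19])          # 1.5
```

Deleting the view leaves the matrix untouched, which matches the behaviour described for deleting a Coord.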

Additional Data Matrices

It is also possible to create new Data Matrices for e.g. velocities (nModel*nAtom*3), partial charges (nModel*nAtom) or energy terms (nModel). All Matrix sizes are adjusted when Models are created or deleted. New matrix elements are set to 0.0 by default. Matrices where all elements are equal to the default value are automatically stored in a compact, space-saving form. The default value can be defined in the matrix object. Standard names for data matrices are
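A minimal sketch of compact storage for all-default matrices, assuming the packed array is only allocated on the first non-default write and that growing by one Model appends one model's worth of default values. `CompactMatrix` and its methods are hypothetical names chosen for illustration.

```python
from array import array


class CompactMatrix:
    """Hypothetical data matrix with compact storage while all-default.

    While every element equals the default, only the shape is kept; the
    packed array is allocated on the first non-default write.
    """

    def __init__(self, n_models, per_model, default=0.0):
        self.n_models = n_models
        self.per_model = per_model   # e.g. nAtom*3 for velocities, 1 for energies
        self.default = default
        self._data = None            # None means "all elements are default"

    @property
    def is_compact(self):
        return self._data is None

    def __len__(self):
        return self.n_models * self.per_model

    def get(self, index):
        return self.default if self._data is None else self._data[index]

    def set(self, index, value):
        if self._data is None:
            if value == self.default:
                return               # still all-default, stay compact
            self._data = array("d", [self.default] * len(self))
        self._data[index] = value

    def add_model(self):
        """Grow by one model; new elements take the default value."""
        self.n_models += 1
        if self._data is not None:
            self._data.extend([self.default] * self.per_model)

    def purge(self):
        """Drop the packed array again if every value equals the default."""
        if self._data is not None and all(v == self.default for v in self._data):
            self._data = None


energies = CompactMatrix(n_models=3, per_model=1)  # one energy term per model
print(energies.is_compact)               # True
energies.set(1, -12.5)                   # first non-default value allocates storage
print(energies.is_compact)               # False
energies.add_model()
print(len(energies), energies.get(3))    # 4 0.0
energies.set(1, 0.0)
energies.purge()
print(energies.is_compact)               # True
```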

Data matrices can optionally define the unit used. The default units are:
  • coordinates: Angstrom
  • velocities: Angstrom/s
  • times: s
  • temperatures: K
  • partialCharges: electron charge unit

Additions and modifications for these names and units are welcome.

Efficient usage

To get the speed and memory advantages of the new arrangement, all references to the Coord class should be removed. Data should be set up one model at a time using Model.setSubmatrixData (which is quicker than the equivalent function on Atom), and can be modified one by one with Atom.setSubmatrixValues.
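One plausible reason why setting data per model is quicker, assuming the packed [model][atom][extra] layout: a model's values form one contiguous slice, whereas an atom's values are scattered with one small write per model. The two hypothetical functions below only illustrate this access pattern; they are not Model.setSubmatrixData or Atom.setSubmatrixValues.

```python
from array import array


def set_model_data(data, model_index, values, n_atoms, extra_size):
    """Write one model's values: a single contiguous slice assignment."""
    size = n_atoms * extra_size
    if len(values) != size:
        # array slice assignment would silently resize the array otherwise
        raise ValueError(f"expected {size} values, got {len(values)}")
    start = model_index * size
    data[start:start + size] = array("d", values)


def set_atom_data(data, atom_index, values, n_models, n_atoms, extra_size):
    """Write one atom's values: one small, strided write per model."""
    if len(values) != n_models * extra_size:
        raise ValueError("expected one block of values per model")
    for model_index in range(n_models):
        start = (model_index * n_atoms + atom_index) * extra_size
        chunk = values[model_index * extra_size:(model_index + 1) * extra_size]
        data[start:start + extra_size] = array("d", chunk)


n_models, n_atoms = 2, 3
coords = array("d", [0.0] * (n_models * n_atoms * 3))
set_model_data(coords, 0, [1.0] * 9, n_atoms, 3)          # model 0, all atoms
set_atom_data(coords, 1, [5.0] * 3 + [7.0] * 3, n_models, n_atoms, 3)
print(list(coords[3:6]), list(coords[12:15]))  # [5.0, 5.0, 5.0] [7.0, 7.0, 7.0]
```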

Additional features

  • StructureEnsemble.orderedAtoms is created automatically and gives the atoms in the order of their creation. This is also the order that determines where atom values appear in data matrices. The order of Model data in the data matrices is given by Model creation order, as preserved in StructureEnsemble.sortedModels().
  • Model.index and Atom.index give the index of the Model (or Atom) in the data matrices. Indices are reset automatically when Models or Atoms are deleted. NB: Atom indices are reset at the start of the deletion process, so if an atom deletion fails (which should not normally happen) the indices may be corrupted.
  • StructureEnsemble.purge() will compact data matrices where all values equal the default value, and will remove all Coord objects from memory.
  • Coord objects are created automatically by any method that accesses the Atom.coords or Model.coords link; all Coords belonging to the starting object are created.
  • Atom.newCoord will return an existing Coord object if present, and will otherwise create one. This is the most memory-efficient way of getting hold of a single Coord object.
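Model deletion and index reset can be sketched as follows, assuming each Model's data is one contiguous block in the packed array and Models are kept in creation order. `delete_model` and the dict-based model records are hypothetical stand-ins for illustration only.

```python
from array import array


def delete_model(data, models, model_index, per_model):
    """Remove one model's contiguous block and renumber remaining models.

    `models` is a list of hypothetical model records (dicts) in creation
    order; afterwards each record's 'index' again equals its position.
    """
    start = model_index * per_model
    del data[start:start + per_model]   # shrink the packed matrix
    del models[model_index]
    for new_index, model in enumerate(models):
        model["index"] = new_index      # reset indices after deletion


models = [{"name": name, "index": i} for i, name in enumerate("ABC")]
data = array("d", [0.0, 0.0, 1.0, 1.0, 2.0, 2.0])  # 3 models x 2 values
delete_model(data, models, 1, per_model=2)
print(list(data))                                   # [0.0, 0.0, 2.0, 2.0]
print([(m["name"], m["index"]) for m in models])    # [('A', 0), ('C', 1)]
```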

Breaking MolStructure code:

  • altLocationCode
    Now in Atom instead of Coord, and part of the Atom key. As a consequence all altLocationCodes must be known before you start to enter coordinates, and all altLocationCodes must be present in all Models. For data where altLocationCode is always equal to the default this will make no difference to old code. If your code supports other cases it must be rewritten.
  • Atoms cannot be added once DataMatrices (and hence Models) exist, so all Atoms must be created first.
  • Atoms cannot be deleted once DataMatrices (and hence Models) exist.
  • Coord x, y, z are mandatory with a default value of 0.0.
  • Data for all Coords are added (with default values) when a model is generated, hence all atoms have coordinates etc. in all models.
  • Deleting a Coord will remove the object from memory but will make no difference to the underlying data.

Subject to the limitations above, it is possible for code that enters data by creating Coord objects to keep working.