Meeting 11/8/06 Machinery change and migration

Meeting with Anne, Rasmus, Wayne, Wim, and Peter Keller to agree on the details of the coming machinery change and how to migrate to it.

Inheritance and Genericity

Issue raised by Peter Keller (see elaboration here). The meeting agreed with his considerations.

It was agreed:

  • Disallow overloading between classes.
    If you call function a.m when m is defined both in class A and in a superclass, the function that is executed depends only on the actual type of object a, and not on the declared type of the variable a or on the actual or formal types of the calling parameters. This will require some wrapping function in Java that prevents Java's normal overloading mechanism from calling a superclass m instead. Note that overloading within a single class is allowed.
  • Parameters - including return types - may be overridden by a parameter of a conforming type (See genericity discussion by Peter Keller for definition). The name and number of parameters may not be overridden. Sub- and super-classes may have different numbers of overloaded versions of a given function, provided that interclass overloading is suppressed.
  • Attributes and roles may be overridden only if they are defined as abstract. The exact limits should be set as fairly restrictive. ACTION Rasmus.
  • This should solve the immediate problems. It might become necessary to allow the equivalent of interfaces inside the Memops model to ensure that different classes implement a given functionality. Actually implementing separate interfaces in the code (e.g. Java) is less likely to be needed, but could still be done. Anything further is for the long-term future, if that.
    The measures in this paragraph should be fairly simple to implement. One could allow multiple inheritance in the model (with some constraints) without using multiple inheritance in the generated code. The 'extra' superclasses could be limited to abstract elements, or their contents could be copied down to their subclasses before generation, and/or they could be mapped to interfaces or multiple inheritance depending on language.
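
The copy-down option mentioned above can be illustrated with a minimal Python sketch. The model representation (a dict mapping class names to a main superclass, a list of 'extra' superclasses, and attribute names) and the function name copy_down are hypothetical, invented for illustration only; they are not the Memops data structures.

```python
# Hypothetical toy model: each class has one main superclass, optional
# 'extra' superclasses, and its own attribute names.
model = {
    "Named": {"main": None, "extra": [], "attrs": ["name"]},
    "Dated": {"main": None, "extra": [], "attrs": ["created"]},
    "Sample": {"main": "Named", "extra": ["Dated"], "attrs": ["volume"]},
}


def copy_down(model):
    """Return a single-inheritance copy of the model.

    The directly defined attributes of each 'extra' superclass are copied
    into the subclass and the 'extra' list is emptied, so that generated
    code needs only single inheritance. The input model is not modified.
    """
    flat = {}
    for name, cls in model.items():
        attrs = list(cls["attrs"])
        for extra in cls["extra"]:
            for attr in model[extra]["attrs"]:
                if attr not in attrs:
                    attrs.append(attr)
        flat[name] = {"main": cls["main"], "extra": [], "attrs": attrs}
    return flat


flat_model = copy_down(model)
print(flat_model["Sample"]["attrs"])  # ['volume', 'created']
```

In this sketch "Sample" still inherits "name" from its main superclass "Named" in the ordinary way; only the contents of the extra superclass "Dated" are copied down.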

New machinery and migration


Rasmus presented an overview (see here). The meeting generally accepted the move, with the following additional points:

DataObjTypes

The main reason is the simplification of the machinery that comes from replacing the special-casing of ApplicationData with the new but general machinery of DataObjTypes. The move will be done at the same time as the change to the new Implementation package, so that ApplicationData can be made into a DataObjType. The newAppData functions will not be retained, and there will be no special helper functions. The reasons are that 1) no one is using newAppData much at present, and 2) any helper functions would be quite complicated to implement, since a given DataObjType can be present in more than one place in a class, DataObjTypes can be nested, ...

New Collection types

The new types will be added - the question is where they will be used. The changeover is quite difficult since (unlike ApplicationData) there is no easy way to grep the places that need changing. A script or debug change to make it easier may not be possible, much as it would have been popular. The Python Profiler should be able to show where a given function is called, though it will obviously miss code that is not executed. It was decided to:

  • Change child links to internal Dict storage, with unique, unordered return type (set), and a sortedChildLink function to return a list sorted in local key order. This change is important to speed up uniqueness checking and getByKey, the 'sorted' function will give you an appropriate list, and the child links are easier to keep track of when you need to change them.
  • Keep crosslinks unique and ordered in the first instance. Before making the change we will look for cases where the changeover to sets might be a good idea (or is required), but any change will be made gradually.
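
The child link decision above could look roughly as follows in Python. This is a minimal sketch, not the generated API: the class Parent and the method addChild are hypothetical, and the names getByKey and sortedChildren simply follow the wording of these notes.

```python
class Parent:
    """Toy parent object holding one child link (hypothetical names)."""

    def __init__(self):
        # Internal storage: local key -> child object.
        self._children = {}

    def addChild(self, key, child):
        # The uniqueness check is a single dict lookup, not a list scan.
        if key in self._children:
            raise ValueError("duplicate key: %r" % (key,))
        self._children[key] = child

    def getByKey(self, key):
        # Returns None when no child has this key.
        return self._children.get(key)

    @property
    def children(self):
        # Unique, unordered return type (children must be hashable).
        return frozenset(self._children.values())

    def sortedChildren(self):
        # List of children sorted in local key order.
        return [self._children[key] for key in sorted(self._children)]


parent = Parent()
parent.addChild(2, "second")
parent.addChild(1, "first")
print(parent.sortedChildren())  # ['first', 'second']
```

Code that relied on the old list order would call sortedChildren explicitly, while getByKey and the uniqueness check no longer need to scan a list.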

New Implementation package

This is the biggest and most visible change. The change will bring a number of simplifications and improvements, but the main reason is that it will fix data extent handling, which is currently broken in the Database implementation. As it happens, PIMS programmers are showing increased interest in ways to subdivide data into chunks.

It was agreed to make the change. As a general principle, packages with only one or two classes should add a new TopObject instead of using the single class as its own TopObject.

After the meeting the following was proposed for discussion:

New "Classification" package:

A new 'Classification' package would take over the classes InstrumentType, ExperimentType, ComponentCategory, HazardPhrase, SampleCategory, and possibly Target.Status and Target.ScoreBoard. The RiskAssessment package would disappear.

The TopObject of the 'Classification' package should be 'Classification'.

New TopObjects

The following packages can use an existing class as TopObject. In all cases the TopObject has the same name as the package, and there are no reasons for renaming:
  • Previously ContentStored (HeadObject disappears):
    ChemComp, ChemCompCoord
  • Previously NormalStored:
    ChemCompCharge, ChemCompLabel, ExpBlueprint, Coordinates, Molecule, MolSystem, NmrExpPrototype, Protocol, NmrReference, NameMapping

The following have special considerations:
  • The Coordinates (ContentStored) package can use the current HeadObject, "Structure", as the new TopObject.
    "Structure" is a bit generic, though. Also the package name, 'Coordinates' is plural. How about renaming package and TopObject both to "MolStructure"?
  • The NmrConstraints (ContentStored) package can use the current HeadObject as the new TopObject. The class should be renamed "NmrConstraint". The package is currently called "NmrConstraints" (plural). How about renaming it to singular?
  • The NmrReference (ContentStored) package can use the current topmost object "ChemCompNmrRef" as TopObject. How about renaming the package to "ChemCompNmrRef", in analogy with "ChemCompCoord" and "ChemCompCharge"?
  • The DbRef package would improve greatly if the current DbName class was made into the TopObject. The name of the class, and of the package, should probably be changed to "RefDatabase".

The following packages need a new TopObject. Few of them can (easily) have the same name for TopObject class and package, and most of the names are very much under discussion:

  • Annotation - TopObject "AnnotationSet"?
  • BmrbEntry - TopObject "Bmrb"? Rename package to "Bmrb"?
  • ChemElement - TopObject "ChemElement"? Requires renaming ChemElement class, to "Element"? "Atom"?
    Alternatively have TopObject "ChemElementSet"?
  • Citation - TopObject "CitationSet"?
  • Crystallization - TopObject "Crystallization"?
  • Experiment - TopObject "Biochemistry"? "LabExperiment"? "ExperimentGroup" (using the existing ExperimentGroup class)? "Laboratory"? "LIMS"?
  • Holder - TopObject "SampleHolder"?
  • Instrument - TopObject "Instrumentation"?
  • Location - TopObject "SampleTracking"?
  • Method - TopObject "MethodSet"? "Procedure"?
  • Nmr - TopObject "Nmr"?
  • People - TopObject "People"?
  • RefSampleComponent - TopObject "RefSampleComponent"?
  • RefStereoChemistry - TopObject "RefStereoChemistry" and rename class of that name to "StereoChemistry"?
    Or call TopObject "StereoChemistry"?
  • Sample - TopObject "SampleTracking"? "SampleComposition"? "SampleDescription"?
  • Target - TopObject "TargetTracking"?
  • Taxonomy - TopObject "Taxonomy"?
    Maybe "NaturalSource" should be renamed for other reasons later
  • Analysis - TopObject "Analysis"?
  • AccessControl - TopObject "AccessControl"?

The following packages will disappear:

RiskAssessment, Poirot, Crystallography


Migration

It is critical to make the migration as painless on the developers as possible.

First phase

The first phase is to introduce TopObjects in the model, while keeping the current machinery (ACTION: Rasmus). This must be ready before the end of August, so that the ExtendNMR course in Paris can be taught with something that looks more or less right. To speed the transfer, two sets of extra functions should be generated using the current machinery:

  • Functions that will be there after the change but are not there yet. This includes the 'sortedChildren' function, the Project.currentNmr (etc.), and functions that will get, find, etc. e.g. ChemComps directly from Project, bypassing the HeadObject.
  • Functions that are currently called but will no longer exist in the new model. Typically these would be functions corresponding to child links from Project that will be replaced by links to new TopObjects, e.g. project.getResonances etc. These functions will be derived, using the currentNmr (etc.) links.
  • A 'WARNING - deprecated function xxx called in line xxx' should be printed out when a deprecated function is called. In case this proves difficult, the Python Profiler can provide information on which functions are called where.

    Once the change is made, deprecated functions should be replaced with the proper ones to the greatest possible extent.
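
A deprecated derived function that reports its caller, as described above, might be sketched like this in Python. The classes Nmr and Project here are toy stand-ins rather than the generated code, and the use of the standard warnings module is an assumption made for illustration; the meeting did not decide on a mechanism.

```python
import warnings


class Nmr:
    """Toy stand-in for a new TopObject that owns the resonances."""

    def __init__(self):
        self.resonances = []


class Project:
    """Toy stand-in for Project with a link to the current Nmr TopObject."""

    def __init__(self):
        self.currentNmr = Nmr()

    def getResonances(self):
        # Deprecated function, derived from the currentNmr link.
        # stacklevel=2 makes the warning report the caller's file and line.
        warnings.warn(
            "deprecated function getResonances called",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.currentNmr.resonances


# Show the warning on every call, not just the first one per location.
warnings.simplefilter("always", DeprecationWarning)

project = Project()
project.currentNmr.resonances.append("resonance1")
print(project.getResonances())  # ['resonance1']
```

The warning is written to standard error together with the file name and line number of the calling code, which gives the kind of list of call sites needed for the later clean-up.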

To accomplish this, a new branch is split out from branch2, so that there is a working, bugfixing branch while the new work is being done.

Second Phase

The next phase will be to switch to the new Implementation package, the DataObjTypes, the new MetaModel, XML Model storage, and generation machinery. This corresponds to merging the current branch back into the trunk, and continuing from there. This is the largest change and will take a couple of months to make, migrate, and debug.

Third phase


Once the new machinery is in place, with collection types kept unchanged, the new collection types will be added, the collection types of the model will be changed, and the code will be updated.

Outreach

A lot of communication and documentation will be necessary to keep the various developers on side, including precise descriptions of what to do, with tips and examples. To start with, a letter with the PowerPoint presentation from the meeting and some general comments should go to the CCPNMR mailing list ASAP (ACTION: Rasmus).

Future

This is a major disruptive change. It is justified because it has been preannounced for over a year, and because the people affected are still mostly within CCPN (barely). Future machinery changes, like future model changes, will have to consult more widely and pay much more attention to backwards compatibility of code. In short, we may never get another opportunity to make such drastic changes.