
Repository Lookup and TopObjects

The new machinery includes mutable keys for TopObjects and a lookup path for repositories. This page discusses the implications and what behaviour to implement. IMPLEMENTED Rasmus Fogh 2007


The new machinery introduces:

  • Mutable keys for TopObjects
  • A GUID as a permanent, globally unique ID for TopObjects.
  • Several repositories for data with a lookup path defined for each package.
  • Multiple extents for all packages, with a root.currentPackageExtent link.

We may not have fully considered all the implications.

Object identification

Hard links explicitly identify the right TopObject (e.g. NmrProject) and use the guid to do it. This ensures you always link to the same object. For reference data you will typically use derived links, either going through the current link (e.g. for ChemElements) or using the keys (e.g. for ChemComps). Either will get you the current version of the reference data, even if it is now a different object. getByKey goes through the TopObject and uses the keys for navigation; maybe the more flexible getByNavigation should be preferred, at least in Python. When modeling and coding you must consider what behaviour you want if the TopObject key changes, or if a new package extent (be it with the same or different keys) is introduced.
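The difference between a hard link (by guid) and a derived link (by key) can be illustrated with a minimal sketch. The classes and method names below (TopObject, Root, get_by_guid, find_by_key) are hypothetical stand-ins invented for this illustration; they are not the generated Memops API.

```python
import uuid


class TopObject:
    """Hypothetical stand-in for a TopObject: permanent guid, mutable key."""

    def __init__(self, class_name, key):
        self.guid = uuid.uuid4().hex  # assigned once, never changed
        self.class_name = class_name
        self.key = key                # may be changed later


class Root:
    """Hypothetical root that indexes all TopObjects by guid."""

    def __init__(self):
        self._by_guid = {}

    def add(self, top_object):
        self._by_guid[top_object.guid] = top_object

    def get_by_guid(self, guid):
        # Hard link: always the same object, whatever its key is now.
        return self._by_guid[guid]

    def find_by_key(self, class_name, key):
        # Derived link: resolved at access time, so it returns whichever
        # object currently carries this key, or None if there is none.
        for obj in self._by_guid.values():
            if obj.class_name == class_name and obj.key == key:
                return obj
        return None


root = Root()
project = TopObject("NmrProject", "myProject")
root.add(project)
hard_link = project.guid

project.key = "renamedProject"  # the key changes, the guid does not
```

After the rename, the stored guid still resolves to the same object, while a lookup by the old key finds nothing. That is the behaviour you have to decide on when choosing between the two kinds of link.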

Effect of lookup path

The question is: what happens if you have the 'same' TopObject in several repositories, e.g. if there is a copy of Ala in both your personal chemcomps repository and the group's general repository?

The problem is different depending on whether the two Ala have the same guid or the same key (or both). There are several possibilities: you could say that this kind of duplication is illegal, you could use the copy found first and mask the others, or you could even make some modifications (like making the repository link part of the key once loaded) that would allow you to have several objects with the same key in memory simultaneously. One problem is that if there are several 'identical' TopObjects floating around, there is a big risk that different objects from different packages might latch on to different versions, with all the resulting confusion. It has to be decided how we want this to behave.
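As an illustration of the 'use the copy found first and mask the others' option, here is a minimal sketch for the same-guid case only. Repositories are modelled as plain dictionaries, and the function names and guid strings are made up for the example.

```python
def resolve(search_path, guid):
    """Return (repository name, data) from the first repository holding guid."""
    for name, contents in search_path:
        if guid in contents:
            return name, contents[guid]
    return None


def masked_copies(search_path, guid):
    """Names of repositories whose copy of guid is hidden by an earlier one."""
    holders = [name for name, contents in search_path if guid in contents]
    return holders[1:]


# Hypothetical repositories: dicts mapping guid -> stored data.
personal = {"guid-ala": {"key": ("protein", "Ala"), "note": "edited copy"}}
group = {
    "guid-ala": {"key": ("protein", "Ala"), "note": "reference copy"},
    "guid-gly": {"key": ("protein", "Gly"), "note": "reference copy"},
}
search_path = [("personal", personal), ("group", group)]

found_in, data = resolve(search_path, "guid-ala")
```

Here the personal copy of Ala wins and the group copy is masked, while Gly is still found in the group repository. The sketch does not address two objects that share a key but have different guids.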

Loading behaviour

In the end I decided on the following:

  • Every package has a search path consisting of one or more repositories. To ensure this, the DataLocation with packageName=='any' must always be present.
  • A data extent can be present in several repositories. It is identified by its guid, which is also used for its file name in file implementations. For modification tracking purposes, the data are considered to 'live' in the first repository on the path where they are found. The 'isModified' attribute tracks whether the data are synchronised with the repository where they are supposed to 'live'.
    NBNB TBD saveTo, loadFrom, remove, etc. handle isModified correctly. Modifying Repository objects and DataLocation objects does not yet do so. Meanwhile it is recommended not to modify Repositories and DataLocations in the same sessions you use for modifying real data.
  • Shell objects representing all known TopObjects are stored in the Implementation extent. Operations that require knowledge of all extents of a given package will search the repositories and create new shell objects where none currently exist. In file implementations this does not require loading of the files. NBNB TBD - check what happens if a shell TopObject no longer corresponds to anything on disk.
  • The shell object for the TopObject contains only the guid and the key. A TopObject will be loaded automatically if you need information not in the shell object. Out-of-package links to TopObjects are stored only in the importing package and are generally handled like other crosslinks. Links that are part of the TopObject key are of course stored in the Implementation package XML, and the implementation makes use of this to avoid loading in some cases.
  • Accessing a to-many link from an imported package to its importing package will force a load of all the package extents on the importing side. Child links of MemopsRoot are exceptions to this rule.
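The rules above can be sketched roughly as follows: a per-package search path with the 'any' entry as fallback, shell objects that know only guid and key, loading on first access from the first repository on the path, and an is_modified flag that is cleared on saving. All class, function and attribute names here are hypothetical and simplified; the real implementation works on DataLocation and Repository objects and on files.

```python
class Repository:
    """Hypothetical repository: 'files' maps guid -> stored attribute dict."""

    def __init__(self, name, files=None):
        self.name = name
        self.files = files if files is not None else {}


def search_path_for(package_name, data_locations):
    """Package-specific path if defined, else the mandatory 'any' entry."""
    return data_locations.get(package_name, data_locations["any"])


class ShellTopObject:
    """Holds only guid and key until other information is requested."""

    def __init__(self, guid, key, search_path):
        self.guid = guid
        self.key = key
        self.search_path = search_path
        self.home = None          # repository where the data 'live'
        self.is_loaded = False
        self.is_modified = False
        self._data = {}

    def _load(self):
        # The data 'live' in the first repository on the path that has them.
        for repository in self.search_path:
            if self.guid in repository.files:
                self._data = dict(repository.files[self.guid])
                self.home = repository
                self.is_loaded = True
                return
        raise LookupError("no repository on the path holds " + self.guid)

    def get(self, name):
        if not self.is_loaded:
            self._load()          # automatic load on first real access
        return self._data[name]

    def set(self, name, value):
        if not self.is_loaded:
            self._load()
        self._data[name] = value
        self.is_modified = True   # now out of sync with the home repository

    def save(self):
        if not self.is_loaded:
            return                # nothing loaded, so nothing to write back
        self.home.files[self.guid] = dict(self._data)
        self.is_modified = False


personal = Repository("personal")
group = Repository("group", {"guid-ala": {"name": "Alanine"}})
data_locations = {"any": [personal, group]}

path = search_path_for("ccp.molecule.ChemComp", data_locations)
ala = ShellTopObject("guid-ala", ("protein", "Ala"), path)
```

Reading ala.key does not trigger a load, whereas ala.get("name") does, and afterwards ala.home is the group repository because the personal repository holds no copy.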

Original loading behaviour

The original plan was for the loading behaviour given below. It was abandoned because the external guid-key mapping was too hard to implement, and because storing links between TopObjects on both sides effectively broke the package separation.

  • Interpackage links to TopObjects are always stored, also on the 'wrong' side of a two-way link. This includes the 'current' link. The link stores the guid, classname, and key, which is enough to create a 'shell' TopObject without loading anything.
  • When you access the internals of a TopObject, the implementation searches for the right repository using the guid and key, and loads the file.
  • Root.getChemComps (e.g.), Root.findAllChemComps, and Root.findFirstChemComp will trigger the creation in memory of all relevant TopObjects. findFirst will look through the existing TopObjects first, and will only load more if no match is found. The program will look through a guid-key map, maintained in every repository, and will use the information there to create shell TopObjects without loading their actual file.
  • The on-disk guid-key map will be updated when saving. New TopObjects will be added. When calling saveAll deleted topObjects will be removed.
  • Deleted TopObjects will have their files deleted on disk when you call saveAll, but not otherwise.
  • Directory structure will reflect the package organisation (e.g. 'repositoryTop/ccp/molecule/ChemComp').
  • File names will reflect the guid. As a help to users, there will be soft links from a name that reflects the key to the actual file. For systems that do not allow soft links and keys that do not form legal file names these soft links will be skipped.
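The directory and file naming in the last two points could be sketched like this. The '.xml' extension, the rule for what counts as a legal file name, and the function names are assumptions made for the illustration, not taken from the original plan.

```python
import re
from pathlib import PurePosixPath

# Assumed rule for a 'legal' file name: letters, digits and a few safe
# punctuation characters, not starting with a dot.
_LEGAL_NAME = re.compile(r"^[A-Za-z0-9_+-][A-Za-z0-9_.+-]*$")


def extent_directory(repository_top, package_name):
    """Directory mirroring the package organisation, e.g. ccp/molecule/ChemComp."""
    return PurePosixPath(repository_top, *package_name.split("."))


def extent_file(repository_top, package_name, guid):
    """Path of the actual data file, named after the guid."""
    return extent_directory(repository_top, package_name) / (guid + ".xml")


def key_link(repository_top, package_name, key_name):
    """Path of the user-friendly soft link, or None if the key is not usable."""
    if not _LEGAL_NAME.match(key_name):
        return None  # skip the link, as the plan prescribes
    return extent_directory(repository_top, package_name) / (key_name + ".xml")


target = extent_file("repositoryTop", "ccp.molecule.ChemComp", "guid-ala")
link = key_link("repositoryTop", "ccp.molecule.ChemComp", "protein_Ala")
skipped = key_link("repositoryTop", "ccp.molecule.ChemComp", "protein/Ala")
```

Only path computation is shown; creating the soft link itself would additionally depend on whether the file system supports it.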



The desired behaviour of backup and restore needs to be considered in the context of these rules. Suggestions are welcome.