
XML file names

File names for XML storage

New Rules:

  • File names will be given as the TopObject fullKey followed by the TopObject guid, and end with '.xml'.
  • The guid stays the same. The format is projName_userName_2007-12-11-11-36-07_00001
  • MemopsRoot.name and MemopsRoot.currentUserId (the source of projName and userName) are limited to purely alphanumeric characters (a-zA-Z0-9). This is a new constraint.
  • The fullKey is stringified. In each part all non-alphanumeric characters are replaced by underscore ('_'). The key can no longer be recovered from the file name, but we do not intend to recover it anyway, and this should give us semi-readable names.
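  • The parts of the fullKey and the guid are joined together using '+' (plus sign) as a separator
    and '.xml' is appended. The resulting file name is truncated so that only the last 254 characters are kept; since the guid comes last, truncation removes characters from the start of the fullKey part.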
  • When reading files, only the guid is checked. The fullKey part is purely for readability and is never used for identifying or reading files.
  • We will *not* rely on the same key always generating the same encoded string. Apart from the compatibility issues above, there is also the problem of ensuring that different encoders give fully identical results, including letter case.
  • Checking for key uniqueness requires us to read the files, but I have set it up so that we create only the TopObject and not the other objects, and so do not need to read the entire file.
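
The new rules can be sketched in a few lines of Python. This is only an illustration, not the Memops implementation: the function names make_file_name and guid_from_file_name are hypothetical, and the fullKey is assumed to arrive as a sequence of parts that can be passed through str().

```python
import re

MAX_LENGTH = 254  # the rules keep the last 254 characters of the name


def make_file_name(full_key, guid):
    """Build a file name from fullKey parts and a guid (hypothetical helper)."""
    # Replace every non-alphanumeric character in each key part with '_'.
    parts = [re.sub(r'[^a-zA-Z0-9]', '_', str(part)) for part in full_key]
    name = '+'.join(parts + [guid]) + '.xml'
    # Truncate from the front so that the guid at the end is preserved.
    return name[-MAX_LENGTH:]


def guid_from_file_name(file_name):
    """Recover the guid, which is everything after the last '+' (hypothetical helper)."""
    if not file_name.endswith('.xml'):
        raise ValueError('not an XML file name: %r' % file_name)
    stem = file_name[:-len('.xml')]
    # The guid itself never contains '+': projName and userName are
    # alphanumeric, and the rest is digits, '-' and '_'.
    return stem.rsplit('+', 1)[-1]


guid = 'zzz_user_2007-12-13-16-55-57_00001'
name = make_file_name(['my molecule'], guid)
print(name)                       # my_molecule+zzz_user_2007-12-13-16-55-57_00001.xml
print(guid_from_file_name(name))  # zzz_user_2007-12-13-16-55-57_00001
```

Only the guid is ever read back from the name. The sketch assumes the guid plus '.xml' is well under 254 characters, so that truncation can only affect the fullKey part.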



Old Rules:

  • File names will be given as the TopObject fullKey followed by the TopObject guid, and end with '.xml'.
  • The guid stays the same. The format is projName_userName_2007-12-11-11-36-07_00001
  • MemopsRoot.name and MemopsRoot.currentUserId (the source of projName
    and userName) are limited to purely alphanumeric characters (a-zA-Z0-9). This is a new constraint.
  • The fullKey is stringified and each part is urlencoded. This keeps a-z, A-Z, 0-9, and ".-_" (dot, minus, and underscore) and uses %xy escaping for everything else (alternative C).
    As alternative D we could escape spaces using the plus sign ('+').
  • The parts of the fullKeys and the guid are joined together using '=' (equal sign) as a separator
    and '.xml' is appended (alternative A). As alternative B we could use the plus sign ('+') as a separator instead.
  • File names could contain a-z, A-Z, 0-9, ".-_" (dot, minus, and underscore), '%' (per cent) as an escape marker, and plus and/or equal sign ('+' and/or '=') as escape characters / separators, depending on the alternative we choose.
  • When reading files only the guid would be checked. The fullKey part is purely for readability and is never used for identifying or reading files. NB - this requires a small machinery change
  • Keys with object links in them will be rather illegible, but this cannot be helped.
  • Parsing the file names will be used to get back the guid, but *not* the keys - there are too many backwards compatibility issues with key changes, class name changes, ... In the future we could in theory start parsing the file names to find the fullKeys, but I do not plan to do so. Some fullKeys contain object links, and getting the object from the string sounds like too much work.
  • We will *not* rely on the same key always generating the same encoded string. Apart from the compatibility issues above, there is also the problem of ensuring that different URLEncoders give fully identical results, including letter case.
  • Checking for key uniqueness requires us to read the files, but I will set it up so that we create only the TopObject and not the other objects, and so do not need to read the entire file.


Examples:

  • Alanine, Alternative A
    protein=Ala=msd_ccpnRef_2007-12-11-10-20-09_00022.xml
  • Alanine, Alternative B
    protein+Ala+msd_ccpnRef_2007-12-11-10-20-09_00022.xml
  • Molecule 'my molecule', Alternative A and C
    my%20molecule=zzz_user_2007-12-13-16-55-57_00001.xml
  • Molecule 'my molecule', Alternative B and C
    my%20molecule+zzz_user_2007-12-13-16-55-57_00001.xml
  • Molecule 'my molecule', Alternative A and D
    my+molecule=zzz_user_2007-12-13-16-55-57_00001.xml
  • Identifier for a resonance (that might be part of a key). Alternative C
    %3Cccp.nmr.Nmr.Resonance%20%5B%27nmr%27%2C%201%5D%3E
  • Identifier for a resonance (that might be part of a key). Alternative D
    %3Cccp.nmr.Nmr.Resonance+%5B%27nmr%27%2C+1%5D%3E
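
For comparison, the old-rule examples above can be reproduced with Python's standard urllib.parse module. This is a sketch only: the helper old_style_file_name and its parameters are hypothetical, and the original encoder may have differed in detail. Note, for instance, that recent Python versions also leave '~' unescaped.

```python
from urllib.parse import quote, quote_plus


def old_style_file_name(full_key, guid, separator='=', plus_for_space=False):
    """Build an old-rule file name (hypothetical helper).

    separator='=' is alternative A, separator='+' is alternative B.
    plus_for_space=False is alternative C (%20 for spaces),
    plus_for_space=True is alternative D ('+' for spaces).
    """
    if plus_for_space:
        encode = quote_plus
    else:
        # safe='' so that '/' is escaped too, not left as-is.
        def encode(text):
            return quote(text, safe='')
    parts = [encode(str(part)) for part in full_key]
    return separator.join(parts + [guid]) + '.xml'


guid = 'zzz_user_2007-12-13-16-55-57_00001'
print(old_style_file_name(['my molecule'], guid))                       # A and C
print(old_style_file_name(['my molecule'], guid, separator='+'))        # B and C
print(old_style_file_name(['my molecule'], guid, plus_for_space=True))  # A and D

# The resonance identifier from the examples, alternatives C and D.
resonance = "<ccp.nmr.Nmr.Resonance ['nmr', 1]>"
print(quote(resonance, safe=''))
print(quote_plus(resonance))
```

Combining alternatives B and D would use '+' both as separator and as the space escape, which makes the boundary between key parts ambiguous. The new rules avoid this by using '+' only as a separator.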