Readme

Detailed description of the program

<Usage>
To install, place the code in a directory and compile clusterpose.f with a Fortran compiler (e.g. g77).

To run the Fortran program, type "clusterpose" and answer the questions.

To run the Python program, type "python clusterpose.py" (requires the Python Tkinter module).

Required user-supplied information:
  List (or directory) of PDB files.
  List of atoms to use for comparison.
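
The list of PDB files can be written by hand. As a minimal sketch, assuming all PDB files sit in one directory and end in ".pdb", a short Python script could write one file name per line. The function name "write_index_file" and its parameters are hypothetical and are not part of clusterpose. The 132-character check treats the line length mentioned in the source comments as a maximum, which is also an assumption.

```python
from pathlib import Path


def write_index_file(pdb_dir, index_path, pattern="*.pdb"):
    """Write one PDB file name per line, sorted by name, and return the names.

    Hypothetical helper, not part of clusterpose itself.
    """
    names = sorted(str(p) for p in Path(pdb_dir).glob(pattern))
    for name in names:
        # Assumption: the Fortran program reads at most 132 characters per line.
        if len(name) > 132:
            raise ValueError("file name longer than 132 characters: " + name)
    Path(index_path).write_text("".join(name + "\n" for name in names))
    return names
```

Sorting by name is only a convenience. If the structures should be added in a particular order (for example by energy), the lines would need to be arranged in that order instead.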

Comments from the Fortran source code:

C       CLUSTERPOSE
C       A CLUSTERING ROUTINE WITH ANNEALING AND CONSTELLATION DETERMINATION
C
C       R. DIAMOND
C       MRC LABORATORY FOR MOLECULAR BIOLOGY
C       HILLS ROAD
C       CAMBRIDGE CB2 2QH
C       ENGLAND
C
C       LAST UPDATE 9/2/95
C
C       SUBROUTINES HOW, TRIDI, EIGVAL, AND EIGVEC ARE DUE TO D. W. MATULA
C
C       TO MAKE A CCP4 VERSION OF THIS PROGRAM DE-COMMENT EVERY LINE WHICH
C       BEGINS WITH THE CHARACTER STRING CCP4 AND COMMENT-OUT THE PRECEDING
C       LINE.
C
C       THE PROGRAM IS DESIGNED TO BE RUN INTERACTIVELY USING A DIALOGUE, BUT
C       MAY ALSO TAKE ALL ITS COMMANDS EXCEPT THE FIRST FROM A CONTROL FILE.
C
C       INITIALLY, EACH STRUCTURE IS REGARDED AS A CLUSTER OF ONE. AT EACH
C       SUBSEQUENT STAGE THE TWO CLUSTERS YIELDING THE SMALLEST RESIDUAL IF
C       UNITED TO FORM A NEW CLUSTER ARE SO UNITED, THE RESIDUAL FUNCTION USED
C       TO CONTROL THIS PROCESS BEING ONE OF THREE POSSIBILITIES. ADDITIONALLY,
C       STRUCTURES MAY BE AGGREGATED BY ADDING THEM ONE AT A TIME IN THE ORDER
C       SPECIFIED BY THE INDEX FILE. THIS MAY BE OF INTEREST IF THAT FILE HAS
C       THE STRUCTURES ALREADY ARRANGED IN ENERGY ORDER, FOR EXAMPLE.
C
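
The merging scheme described above can be sketched in Python as follows. This is an illustration only and not a translation of the Fortran code: the function "agglomerate" and the caller-supplied "residual" function are hypothetical, and no superposition or annealing is performed here. It shows only the rule that the pair of clusters with the smallest residual is united at each stage.

```python
from itertools import combinations


def agglomerate(n_items, residual):
    """Greedy pairwise merging of n_items structures.

    residual(a, b) receives two lists of structure indices and returns the
    residual that would result from uniting them (supplied by the caller).
    Returns the merge history as (cluster_a, cluster_b, residual) tuples.
    """
    clusters = [[i] for i in range(n_items)]  # each structure starts alone
    history = []
    while len(clusters) > 1:
        # Pick the pair of clusters whose union gives the smallest residual.
        score, i, j = min(
            (residual(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        history.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history
```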
C       IT IS ASSUMED THAT EACH STRUCTURE IS GIVEN IN BROOKHAVEN FORMAT,
C       EACH IN ITS OWN FILE. EACH RECORD IS READ A80, THEN REREAD WITH
C       FORMAT (A4,I7,A19,3F8.3,A22) IF THE RECORD IS AN ATOM. THIS IS AT
C       STATEMENT 162.
C
C       AN INDEX FILE IS REQUIRED WHICH CONTAINS A LIST OF NAMES OF THESE
C       FILES, ONE TO EACH LINE, 132 CHARACTERS EACH.
C
C       ONLY `ATOM' RECORDS (COLS 1-4) ARE READ, ALL OTHER TYPES
C       OF RECORD BEING IGNORED, EXCEPT THAT `END' IN COLS 1-3 IS RECOGNISED
C       AS A TERMINATOR. THESE FILES ARE REQUIRED TO HAVE THE SAME ATOMS LISTED
C       IN THE SAME ORDER.
C
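
The record layout described above can be illustrated with a minimal Python sketch. It is not a translation of the code at statement 162, and the function names are hypothetical. It assumes the Fortran fields (A4,I7,A19,3F8.3,A22) map onto columns 1-4, 5-11, 12-30 and 31-54, and that the numeric fields are never blank.

```python
def parse_atom_record(line):
    """Return (line_number, label, (x, y, z)) for an ATOM record, else None."""
    record = line.rstrip("\n").ljust(54)
    if record[0:4] != "ATOM":          # only ATOM records (cols 1-4) are used
        return None
    serial = int(record[4:11])         # I7: the "line number" in cols 5-11
    label = record[11:30]              # A19: atom label text in cols 12-30
    x, y, z = (float(record[30 + 8 * k:38 + 8 * k]) for k in range(3))  # 3F8.3
    return serial, label, (x, y, z)


def read_atoms(lines):
    """Collect ATOM records until a record with END in cols 1-3 is met."""
    atoms = []
    for line in lines:
        if line[0:3] == "END":
            break
        atom = parse_atom_record(line)
        if atom is not None:
            atoms.append(atom)
    return atoms
```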
C       ATOMS TO BE FITTED MAY BE SELECTED IN ONE OF TWO WAYS
C       1.      BY DECLARING AN ATOM INCLUSION FILE WHICH LISTS THE LINE
C               NUMBERS OF THE ATOMS TO BE INCLUDED, ONE PER RECORD, OR
C       2.      BY DECLARING FIRST AND LAST INCLUSIVE LINE NUMBERS OF
C               INCLUDED ATOMS.
C       IN EITHER CASE `LINE NUMBER' REFERS TO THE NUMBER IN COLS 5-11 OF THE
C       INPUT FILES.
C       IN CASE 1 ATOMS MAY BE LISTED IN ANY ORDER, AND A ZERO LINE NUMBER
C       TERMINATES A DOMAIN. UP TO EIGHT DOMAINS MAY BE SO DEFINED.
C       IN CASE 2 UP TO EIGHT PAIRS OF FIRST AND LAST LINE NUMBERS MAY BE
C       GIVEN (FORMAT 16I5). A "FIRST" OF ZERO TERMINATES THE LIST, AND THE
C       DEFAULT OPTION (FIRST PAIR ZERO OR BLANK) INCLUDES EVERYTHING IN
C       ONE DOMAIN. CLUSTERING IS DONE ON A ONE-DOMAIN-ALL-STRUCTURES BASIS.
C       ANY ATOM MAY BE INCLUDED IN MORE THAN ONE DOMAIN SO THAT, FOR EXAMPLE,
C       DOMAIN 3 COULD BE THE UNION OF DOMAINS 1 AND 2. INCLUDING AN ATOM MORE
C       THAN ONCE WITHIN ONE DOMAIN HAS A WEIGHTING EFFECT. HOWEVER, IF THIS
C       WERE TO PRODUCE A DOMAIN SIZE GREATER THAN NN THIS WOULD BE AN
C       UNTRAPPED ERROR (A STRUCTURE SIZE EXCEEDING NN WOULD BE TRAPPED, AND
C       DOMAIN SIZE CANNOT OTHERWISE EXCEED THE STRUCTURE SIZE).
C      
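
For case 2 above, the following Python sketch shows how a record of first/last pairs in FORMAT 16I5 might be interpreted. It is an illustration under stated assumptions, not the program's own code: the function name is hypothetical, blank fields are treated as zero (as Fortran integer input conventionally does), and an empty result stands for the default of one domain containing everything.

```python
def parse_domain_ranges(record):
    """Read up to eight (first, last) line-number pairs from a 16I5 record."""
    padded = record.rstrip("\n").ljust(80)
    fields = [padded[i:i + 5] for i in range(0, 80, 5)]    # sixteen I5 fields
    values = [int(f) if f.strip() else 0 for f in fields]  # blank counts as 0
    pairs = []
    for k in range(0, 16, 2):
        first, last = values[k], values[k + 1]
        if first == 0:                 # a "first" of zero terminates the list
            break
        pairs.append((first, last))
    return pairs                       # [] means: everything in one domain
```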
C       THE PROGRAM IS A DEVELOPMENT OF POLYPOSE AND CONTAINS MANY OF THE SAME
C       FACILITIES. HOWEVER, IT DOES NOT PROVIDE A CHOICE OF TREATMENTS WHEN
C       ENANTIOMERS ARE ENCOUNTERED, THOUGH, IF DETECTED, THESE ARE REPORTED.
C       NOR DOES IT PROVIDE FOR THE REPORTING OF INTER-DOMAIN RELATIONSHIPS,
C       NOR FOR ORIENTING THE RESULTS ON THE PRINCIPAL AXES OF INERTIA,
C       NOR FOR SEARCHES FOR ALTERNATIVE SOLUTIONS. THIS LAST IS NOT RELEVANT
C       IN THE CONTEXT OF CLUSTERING BECAUSE EVERY FIT WHICH IS PERFORMED IS A
C       FIT OF ONE (INTERNALLY RIGID) CLUSTER ON ONE OTHER, TO WHICH THE
C       SOLUTION IS ALWAYS UNIQUE.
C      
C       CHECKING OF ATOM LABELS IS NOW AN OPTION ON ANY CONTIGUOUS SUBSET
C       OF CHARACTERS IN COLUMNS 12 TO 30 OF THE INPUT FILES, OR NONE.
C      
C       FACILITIES PROVIDED WHICH HAVE NO COUNTERPART IN POLYPOSE INCLUDE THE
C       OPTION TO CALCULATE MULTI-DIMENSIONAL CONSTELLATIONS OF POINTS
C       REPRESENTING THE STRUCTURES CLUSTERED, AND THE FACILITY TO CALCULATE
C       INTER-CLUSTER CROSS-TERM ERRORS FOR EACH FITTED ATOM ON THE FORMATION
C       OF EACH NEW CLUSTER, THUS IDENTIFYING `HOT SPOTS' WHERE STRUCTURES
C       BEING COMBINED ARE PARTICULARLY DIFFERENT.
C
C       THE RESIDUALS WHICH MAY BE USED AS CLUSTERING CRITERIA ARE R1, R2 AND
C       R4 AS DEFINED IN THE CLUSTERING PAPER.
C
C               R1 IS THE RMS OF ALL INTER-STRUCTURE DIFFERENCES IN THE
C               NEWLY FORMED CLUSTER.
C
C               R2 IS THE RMS DIFFERENCE BETWEEN ALL STRUCTURES IN THE NEWLY
C               FORMED  CLUSTER AND THE AVERAGE STRUCTURE OF THE NEWLY FORMED
C               CLUSTER.
C
C               R4 IS THE RMS DIFFERENCE BETWEEN ALL STRUCTURES IN ONE OF THE
C               CLUSTERS BEING COMBINED AND ALL OF THE STRUCTURES IN THE OTHER
C               CLUSTER.
C
C               ADDITIONALLY, STRUCTURES MAY BE CLUSTERED BY SUCCESSIVE
C               ADDITION OF STRUCTURES, ONE AT A TIME, IN THE ORDER IN WHICH
C               THEY STAND IN THE INDEX FILE. THIS MAY BE OF SOME INTEREST IF
C               THEIR ORDERING IS BASED ON ENERGIES, FOR EXAMPLE.
C
C               R3 MAY ALSO BE DEFINED AS THE MEAN DIFFERENCE BETWEEN ALL
C               STRUCTURES IN THE NEWLY FORMED CLUSTER AND THE AVERAGE
C               STRUCTURE OF THE NEWLY FORMED CLUSTER AND IS SIMILAR TO R2.
C               R3 MAY BE CALCULATED BUT IS NOT AVAILABLE AS A CLUSTERING
C               CRITERION. (R3 IS THE MEAN OVER STRUCTURES OF THE RMS
C               DIFFERENCE OVER ATOMS, WHEREAS R2 IS THE RMS OVER STRUCTURES
C               OF THE RMS DIFFERENCE OVER ATOMS.)
C      
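
The verbal definitions of R1 to R4 above can be made concrete with a small Python sketch. It is only an illustration: it assumes the structures have already been superposed, each being a list of (x, y, z) tuples with the same atoms in the same order, and the normalisation used here may differ from that in the clustering paper. The program itself obtains R1, R2 and R4 analytically, which this sketch does not attempt. All function names are hypothetical.

```python
import math
from itertools import combinations


def mean_sq(a, b):
    """Mean squared distance over atoms between two structures."""
    return sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v)) / len(a)


def average_structure(structs):
    """Atom-by-atom average of a list of structures."""
    n = len(structs)
    return [tuple(sum(s[i][k] for s in structs) / n for k in range(3))
            for i in range(len(structs[0]))]


def r1(structs):
    """RMS over all pairs of structures in the cluster."""
    pairs = list(combinations(structs, 2))
    return math.sqrt(sum(mean_sq(a, b) for a, b in pairs) / len(pairs))


def r2(structs):
    """RMS over structures of the RMS difference from the average structure."""
    avg = average_structure(structs)
    return math.sqrt(sum(mean_sq(s, avg) for s in structs) / len(structs))


def r3(structs):
    """Mean over structures of the RMS difference from the average structure."""
    avg = average_structure(structs)
    return sum(math.sqrt(mean_sq(s, avg)) for s in structs) / len(structs)


def r4(cluster_a, cluster_b):
    """RMS over all pairs with one structure taken from each cluster."""
    total = sum(mean_sq(a, b) for a in cluster_a for b in cluster_b)
    return math.sqrt(total / (len(cluster_a) * len(cluster_b)))
```

Because a mean never exceeds the corresponding RMS, r3 computed this way is never larger than r2, which is consistent with the remark that the two are similar.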
C       SOME DISCUSSION OF THE CHARACTERISTICS OF THE CLUSTERING PROCESS
C       FLOWING FROM THE CHOICE OF CRITERION IS GIVEN IN THE PAPER.
C              
C       WHICHEVER CRITERION IS USED TO CONTROL THE CLUSTERING, R1, R2 AND R4
C       ARE REPORTED ON THE FORMATION OF EACH CLUSTER, THESE VALUES BEING
C       OBTAINABLE ANALYTICALLY.  R3 CAN BE CALCULATED AS A BY-PRODUCT OF THE
C       FIRST STEPS IN THE CONSTELLATION CALCULATIONS, WHICH ARE AVAILABLE
C       FOR EVERY CLUSTER, IF REQUIRED, AND THIS IS OFFERED AS AN OPTION,
C       INDEPENDENTLY OF THE OPTION TO CALCULATE THE CONSTELLATION COORDINATES.
C              
C       TRANSFORMED COORDINATES MAY BE CALCULATED FOR THE FINAL CLUSTER, IE
C       WHEN EVERY STRUCTURE IN THE ENSEMBLE HAS ENTERED THE SAME CLUSTER.
C       IF NO ANNEALING IS DONE THEN THE RELATIONSHIPS BETWEEN STRUCTURES
C       ESTABLISHED AT INTERMEDIATE STAGES ARE PRESERVED AND MAY BE DISPLAYED
C       FROM THE FINAL COORDINATES. IF ANNEALING IS DONE THEN THE FINAL
C       CLUSTER WILL BE THE SAME AS WOULD BE PROVIDED BY POLYPOSE. DIFFERENCES
C       BETWEEN ANNEALED AND UNANNEALED COORDINATES ARE USUALLY SLIGHT.
C
C       IF THE OPTION TO CALCULATE THE TRANSFORMED COORDINATES IS TAKEN, THEN
C       THESE ARE WRITTEN TO FILES WITH NAMES OF THE FORM nameDnn.CLU WHERE
C       `name' IS THE NAME OF THE CORRESPONDING INPUT FILE UP TO BUT EXCLUDING
C       A `.' OR THE FIRST TRAILING BLANK. nn IS A TWO DIGIT DOMAIN NUMBER
C       WITH RESPECT TO WHICH THE FITTING HAS BEEN DONE.
C
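
The naming rule above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that the name is cut at the first "." it contains, so an input name with a "." in its directory part would need different handling.

```python
def output_name(input_name, domain, ext="CLU"):
    """Build nameDnn.ext from an input file name and a domain number."""
    name = input_name.rstrip()         # drop trailing blanks
    stem = name.split(".", 1)[0]       # keep text before the first '.'
    return "%sD%02d.%s" % (stem, domain, ext)
```

For example, under these assumptions an input file "model1.pdb" fitted on domain 3 would give "model1D03.CLU".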
C       SUCH OUTPUT FILES CONTAIN A `FITTED ATOMS' PORTION (AS `REMARK's)
C       FOLLOWED BY A `WHOLE STRUCTURE' PORTION (AS `ATOM's) WHICH MAY BE
C       LARGELY REPETITIVE. THE `FITTED ATOMS' PORTION MAY EITHER BE IN PDB
C       FORMAT (`LONG' OPTION) OR ONLY A LIST OF LINE NUMBERS (`SHORT' OPTION).
C
C       ANY ONE OF THE STRUCTURES MAY BE DESIGNATED (BY KEEP) TO RETAIN ITS
C       ORIGINAL ORIENTATION, STRUCTURE 1 BEING THE DEFAULT OPTION.
C
C       IF THE OPTION TO CALCULATE THE AVERAGE OF THE ROTATED STRUCTURES
C       IS TAKEN, A FILE IS WRITTEN CONTAINING THE AVERAGE OF ALL THE ROTATED
C       STRUCTURES IN THE FINAL CLUSTER AND THE RMS DISTANCE OF THE INDIVIDUAL
C       ATOMS IN THESE STRUCTURES FROM THE AVERAGE STRUCTURE. THIS FILE IS
C       GIVEN THE FILE NAME nameDnn.AVE DERIVED AS ABOVE FROM THE INPUT FILE
C       WHICH IS DESIGNATED (BY KEEP) AS BEING THE ONE WHICH IS TO RETAIN ITS
C       ORIGINAL ORIENTATION. THERE IS ONE SUCH FILE FOR EACH DOMAIN FITTED.
C       IF THIS OPTION IS TAKEN, R1, R2 AND R3 FOR THE FINAL CLUSTER ARE ALSO
C       CALCULATED FROM THE OUTPUT COORDINATES, AND THE CONSISTENCY OF THESE
C       FIGURES WITH THOSE CALCULATED ANALYTICALLY DURING CLUSTERING FORMS A
C       VERY STRONG CHECK ON THE PERFORMANCE OF THE PROCESS.  IT ALSO MEANS
C       THAT IF IT IS KNOWN THAT THE FINAL AVERAGE STRUCTURE WILL BE REQUIRED
C       ANYWAY, THERE IS LESS NEED TO OBTAIN R3 EN-PASSANT.
C
C       IF THE OPTION TO CALCULATE THE ATOM-BY-ATOM CROSS-TERMS BETWEEN THE
C       TWO CLUSTERS FORMING ANY NEW CLUSTER IS TAKEN THEN THESE APPEAR AT THE
C       END OF THE MAIN OUTPUT FILE.
C
C       THE OPTIONS TO OUTPUT THE ROTATION MATRICES, THE AVERAGE STRUCTURE,
C       THE TRANSFORMED COORDINATES, AND THE ATOM-BY-ATOM CROSS-TERM
C       ERRORS ARE PROGRESSIVE. OPTING FOR ANY OF THESE INCLUDES THE
C       PRECEDING OPTIONS ALSO.

<Versions>
This is a version from 1995, provided by David Neuhaus.

<Copyright>
Robert Diamond 1995

<License>
This software is distributed under the GNU General Public License (GPL); see www.gnu.org/copyleft/gpl.html.