Input File Format

PDB Input File Format

The PROSESS server requires either a PDB-formatted file (for newly determined structures) or a PDB accession number (for previously determined structures) as input. A PDB file may contain a single protein structure or chain, or an ensemble of up to 100 structures from an NMR structure calculation. The maximum number of residues is 10,000. Files with and without a PDB header are both acceptable.
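
A quick local check against these limits can catch an oversized file before upload. The following is a minimal sketch and not part of PROSESS: the function name check_pdb_limits is hypothetical, and it assumes that ensemble members are delimited by MODEL records (as in standard PDB files) and that the residue limit applies to each model.

```python
def check_pdb_limits(pdb_text, max_models=100, max_residues=10000):
    """Return (n_models, largest_residue_count, ok) for PDB-formatted text.

    Assumption: the residue limit is applied per model; PROSESS may count
    residues differently.
    """
    n_models = 0
    residues = set()
    largest = 0
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            # A new ensemble member starts: close out the previous one.
            n_models += 1
            largest = max(largest, len(residues))
            residues = set()
        elif record == "ATOM":
            # Chain ID (column 22), residue number (columns 23-26),
            # insertion code (column 27) identify one residue.
            residues.add((line[21:22], line[22:26], line[26:27]))
    largest = max(largest, len(residues))
    # A file without MODEL records holds a single structure.
    n_models = max(n_models, 1)
    ok = n_models <= max_models and largest <= max_residues
    return n_models, largest, ok
```

The function only counts; it does not validate coordinates or atom names.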

The PDB Format


Example #1: The following is an example of a PDB format without a file header. This is typical of a structure that has been generated "in-house" via crystallography, NMR or homology modeling/prediction.

ATOM      1  N   MET A   1     -17.368   2.952  -4.017  1.00  0.00           N   
ATOM      2  CA  MET A   1     -17.520   3.028  -2.540  1.00  0.00           C   
ATOM      3  C   MET A   1     -16.387   3.833  -1.910  1.00  0.00           C   
ATOM      4  O   MET A   1     -15.417   4.188  -2.581  1.00  0.00           O   
ATOM      5  CB  MET A   1     -17.536   1.606  -1.975  1.00  0.00           C   
ATOM      6  CG  MET A   1     -18.575   1.395  -0.886  1.00  0.00           C   
ATOM      7  SD  MET A   1     -20.263   1.452  -1.517  1.00  0.00           S   
ATOM      8  CE  MET A   1     -21.121   2.182  -0.123  1.00  0.00           C   
ATOM      9 1H   MET A   1     -17.691   3.856  -4.415  1.00  0.00           H   
ATOM     10 2H   MET A   1     -16.362   2.788  -4.223  1.00  0.00           H   
ATOM     11 3H   MET A   1     -17.955   2.165  -4.357  1.00  0.00           H   
ATOM     12  HA  MET A   1     -18.461   3.510  -2.316  1.00  0.00           H   
ATOM     13 1HB  MET A   1     -17.743   0.915  -2.779  1.00  0.00           H   
ATOM     14 2HB  MET A   1     -16.563   1.383  -1.563  1.00  0.00           H   
ATOM     15 1HG  MET A   1     -18.409   0.430  -0.432  1.00  0.00           H   
ATOM     16 2HG  MET A   1     -18.457   2.167  -0.141  1.00  0.00           H   
ATOM     17 1HE  MET A   1     -21.436   3.183  -0.379  1.00  0.00           H   
ATOM     18 2HE  MET A   1     -21.986   1.584   0.119  1.00  0.00           H   
ATOM     19 3HE  MET A   1     -20.458   2.220   0.727  1.00  0.00           H   
ATOM     20  N   ASP A   2     -16.518   4.119  -0.619  1.00  0.00           N   
ATOM     21  CA  ASP A   2     -15.506   4.883   0.101  1.00  0.00           C   
ATOM     22  C   ASP A   2     -14.311   4.004   0.457  1.00  0.00           C   
ATOM     23  O   ASP A   2     -14.378   3.188   1.377  1.00  0.00           O   
ATOM     24  CB  ASP A   2     -16.107   5.491   1.369  1.00  0.00           C   
ATOM     25  CG  ASP A   2     -15.339   6.706   1.848  1.00  0.00           C   
ATOM     26  OD1 ASP A   2     -14.149   6.558   2.193  1.00  0.00           O   
ATOM     27  OD2 ASP A   2     -15.929   7.808   1.878  1.00  0.00           O   
ATOM     28  H   ASP A   2     -17.315   3.810  -0.139  1.00  0.00           H   
ATOM     29  HA  ASP A   2     -15.171   5.680  -0.546  1.00  0.00           H   
ATOM     30 1HB  ASP A   2     -17.126   5.788   1.171  1.00  0.00           H   
ATOM     31 2HB  ASP A   2     -16.099   4.749   2.155  1.00  0.00           H   
ATOM     32  N   ARG A   3     -13.217   4.176  -0.280  1.00  0.00           N   
ATOM     33  CA  ARG A   3     -12.002   3.404  -0.047  1.00  0.00           C   
ATOM     34  C   ARG A   3     -10.947   4.248   0.658  1.00  0.00           C   
ATOM     35  O   ARG A   3     -10.179   4.962   0.013  1.00  0.00           O   
ATOM     36  CB  ARG A   3     -11.444   2.881  -1.373  1.00  0.00           C   
ATOM     37  CG  ARG A   3     -12.122   1.613  -1.868  1.00  0.00           C   
ATOM     38  CD  ARG A   3     -12.676   1.788  -3.273  1.00  0.00           C   

Example #2: The following is an example of a PDB format with a file header. This is typical of a structure downloaded from the PDB.

HEADER    ELECTRON TRANSPORT                      19-MAR-90   2TRX              
TITLE     CRYSTAL STRUCTURE OF THIOREDOXIN FROM ESCHERICHIA COLI AT             
TITLE    2 1.68 ANGSTROMS RESOLUTION                                            
COMPND    MOL_ID: 1;                                                            
COMPND   2 MOLECULE: THIOREDOXIN;                                               
COMPND   3 CHAIN: A, B;                                                         
COMPND   4 ENGINEERED: YES                                                      
SOURCE    MOL_ID: 1;                                                            
SOURCE   2 ORGANISM_SCIENTIFIC: ESCHERICHIA COLI;                               
SOURCE   3 ORGANISM_TAXID: 562                                                  
KEYWDS    ELECTRON TRANSPORT                                                    
EXPDTA    X-RAY DIFFRACTION                                                     
AUTHOR    S.K.KATTI,D.M.LEMASTER,H.EKLUND                                       
REVDAT   4   24-FEB-09 2TRX    1       VERSN                                    
REVDAT   3   01-APR-03 2TRX    1       JRNL                                     
REVDAT   2   15-JAN-93 2TRX    1       HEADER COMPND                            
REVDAT   1   15-OCT-91 2TRX    0                                                
JRNL        AUTH   S.K.KATTI,D.M.LEMASTER,H.EKLUND                              
JRNL        TITL   CRYSTAL STRUCTURE OF THIOREDOXIN FROM ESCHERICHIA            
JRNL        TITL 2 COLI AT 1.68 A RESOLUTION.                                   
JRNL        REF    J.MOL.BIOL.                   V. 212   167 1990              
JRNL        REFN                   ISSN 0022-2836                               
JRNL        PMID   2181145                                                      
JRNL        DOI    10.1016/0022-2836(90)90313-B                                 
REMARK   1                                                                      
REMARK   1 REFERENCE 1                                                          
REMARK   1  AUTH   A.HOLMGREN,B.-O.SODERBERG,H.EKLUND,C.-I.BRANDEN              
REMARK   1  TITL   THREE-DIMENSIONAL STRUCTURE OF ESCHERICHIA COLI              
REMARK   1  TITL 2 THIOREDOXIN-S2 TO 2.8 ANGSTROMS RESOLUTION                   
REMARK   1  REF    PROC.NATL.ACAD.SCI.USA        V.  72  2305 1975              
REMARK   1  REFN                   ISSN 0027-8424                               
REMARK   1 REFERENCE 2                                                          
REMARK   1  AUTH   B.-O.SODERBERG,A.HOLMGREN,C.-I.BRANDEN                       
REMARK   1  TITL   STRUCTURE OF OXIDIZED THIOREDOXIN TO 4.5 ANGSTROMS           
REMARK   1  TITL 2 RESOLUTION                                                   
REMARK   1  REF    J.MOL.BIOL.                   V.  90   143 1974              
REMARK   1  REFN                   ISSN 0022-2836                               
REMARK   1 REFERENCE 3                                                          
REMARK   1  AUTH   A.HOLMGREN,B.-O.SODERBERG                                    
REMARK   1  TITL   CRYSTALLIZATION AND PRELIMINARY CRYSTALLOGRAPHIC             
REMARK   1  TITL 2 DATA FOR THIOREDOXIN FROM ESCHERICHIA COLI B                 
REMARK   1  REF    J.MOL.BIOL.                   V.  54   387 1970              
REMARK   1  REFN                   ISSN 0022-2836                               
REMARK   2                                                                      
REMARK   2 RESOLUTION.    1.68 ANGSTROMS.                                       
REMARK   3                                                                      
REMARK   3 REFINEMENT.                                                          
REMARK   3   PROGRAM     : PROFFT                                               
REMARK   3   AUTHORS     : KONNERT,HENDRICKSON,FINZEL                           
REMARK   3                                                                      
REMARK   3  DATA USED IN REFINEMENT.                                            
REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 1.68                           
REMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 8.00                           
REMARK   3   DATA CUTOFF            (SIGMA(F)) : 3.000                          
REMARK   3   COMPLETENESS FOR RANGE        (%) : NULL                           
REMARK   3   NUMBER OF REFLECTIONS             : 25969                          
REMARK   3                                                                      
REMARK   3  NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT.                    
REMARK   3   PROTEIN ATOMS            : 1688                                    
REMARK   3   NUCLEIC ACID ATOMS       : 0                                       
REMARK   3   HETEROGEN ATOMS          : 2                                       
REMARK   3   SOLVENT ATOMS            : 196                                     
REMARK   3                                                                      
REMARK   3  B VALUES.                                                           
REMARK   3   FROM WILSON PLOT           (A**2) : NULL                           
REMARK   3   MEAN B VALUE      (OVERALL, A**2) : NULL                           
REMARK   3   OVERALL ANISOTROPIC B VALUE.                                       
REMARK   3    B11 (A**2) : NULL                                                 
REMARK   3    B22 (A**2) : NULL                                                 
REMARK   3    B33 (A**2) : NULL                                                 
REMARK   3    B12 (A**2) : NULL                                                 
REMARK   3    B13 (A**2) : NULL                                                 
REMARK   3    B23 (A**2) : NULL                                                 
REMARK   3                                                                      
REMARK   3  RMS DEVIATIONS FROM IDEAL VALUES.                                   
REMARK   3   DISTANCE RESTRAINTS.                    RMS    SIGMA               
REMARK   3    BOND LENGTH                     (A) : 0.015 ; 0.020               
REMARK   3    ANGLE DISTANCE                  (A) : 0.035 ; 0.030               
REMARK   3    INTRAPLANAR 1-4 DISTANCE        (A) : 0.055 ; 0.050               
REMARK   3    H-BOND OR METAL COORDINATION    (A) : NULL  ; NULL                
REMARK   3                                                                      
REMARK   3   PLANE RESTRAINT                  (A) : 0.021 ; 0.020               
REMARK   3   CHIRAL-CENTER RESTRAINT       (A**3) : 0.131 ; 0.150               
REMARK   3                                                                      
REMARK   3   NON-BONDED CONTACT RESTRAINTS.                                     
REMARK   3    SINGLE TORSION                  (A) : 0.165 ; 0.500               
REMARK   3    MULTIPLE TORSION                (A) : 0.174 ; 0.500               
REMARK   3    H-BOND (X...Y)                  (A) : NULL  ; NULL                
REMARK   3    H-BOND (X-H...Y)                (A) : 0.180 ; 0.500               
REMARK   3                                                                      
REMARK   3   CONFORMATIONAL TORSION ANGLE RESTRAINTS.                           
REMARK   3    SPECIFIED                 (DEGREES) : NULL  ; NULL                
REMARK   3    PLANAR                    (DEGREES) : 4.000 ; 3.000               
REMARK   3    STAGGERED                 (DEGREES) : 16.300; 15.000              
REMARK   3    TRANSVERSE                (DEGREES) : NULL  ; NULL                
REMARK   3                                                                      
REMARK   3  ISOTROPIC THERMAL FACTOR RESTRAINTS.    RMS    SIGMA                
REMARK   3   MAIN-CHAIN BOND               (A**2) : 1.380 ; 1.000               
REMARK   3   MAIN-CHAIN ANGLE              (A**2) : 2.280 ; 1.000               
REMARK   3   SIDE-CHAIN BOND               (A**2) : 1.970 ; 1.000               
REMARK   3   SIDE-CHAIN ANGLE              (A**2) : 3.270 ; 1.500               
REMARK   3                                                                                                                                        
REMARK 280 CRYSTAL                                                              
REMARK 280 SOLVENT CONTENT, VS   (%): 54.58                                     
REMARK 280 MATTHEWS COEFFICIENT, VM (ANGSTROMS**3/DA): 2.71                     
REMARK 280                                                                      
REMARK 280 CRYSTALLIZATION CONDITIONS: NULL                                     
REMARK 290                                                                      
REMARK 290 CRYSTALLOGRAPHIC SYMMETRY                                            
REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: C 1 2 1                          
REMARK 290                                                                      
REMARK 290      SYMOP   SYMMETRY                                                
REMARK 290     NNNMMM   OPERATOR                                                
REMARK 290       1555   X,Y,Z                                                   
REMARK 290       2555   -X,Y,-Z                                                 
REMARK 290       3555   X+1/2,Y+1/2,Z                                           
REMARK 290       4555   -X+1/2,Y+1/2,-Z                                         
REMARK 290                                                                      
REMARK 290     WHERE NNN -> OPERATOR NUMBER                                     
REMARK 290           MMM -> TRANSLATION VECTOR                                  
REMARK 290                                                                      
REMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS                            
REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM             
REMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY                
REMARK 290 RELATED MOLECULES.                                                   
REMARK 290   SMTRY1   1  1.000000  0.000000  0.000000        0.00000            
REMARK 290   SMTRY2   1  0.000000  1.000000  0.000000        0.00000            
REMARK 290   SMTRY3   1  0.000000  0.000000  1.000000        0.00000            
REMARK 290   SMTRY1   2 -1.000000  0.000000  0.000000        0.00000            
REMARK 290   SMTRY2   2  0.000000  1.000000  0.000000        0.00000            
REMARK 290   SMTRY3   2  0.000000  0.000000 -1.000000        0.00000            
REMARK 290   SMTRY1   3  1.000000  0.000000  0.000000       44.75000            
REMARK 290   SMTRY2   3  0.000000  1.000000  0.000000       25.53000            
REMARK 290   SMTRY3   3  0.000000  0.000000  1.000000        0.00000            
REMARK 290   SMTRY1   4 -1.000000  0.000000  0.000000       44.75000            
REMARK 290   SMTRY2   4  0.000000  1.000000  0.000000       25.53000            
REMARK 290   SMTRY3   4  0.000000  0.000000 -1.000000        0.00000            
REMARK 290                                                                      
DBREF  2TRX A    1   108  UNP    P0AA25   THIO_ECOLI       1    108             
DBREF  2TRX B    1   108  UNP    P0AA25   THIO_ECOLI       1    108             
SEQRES   1 A  108  SER ASP LYS ILE ILE HIS LEU THR ASP ASP SER PHE ASP          
SEQRES   2 A  108  THR ASP VAL LEU LYS ALA ASP GLY ALA ILE LEU VAL ASP          
SEQRES   3 A  108  PHE TRP ALA GLU TRP CYS GLY PRO CYS LYS MET ILE ALA          
SEQRES   4 A  108  PRO ILE LEU ASP GLU ILE ALA ASP GLU TYR GLN GLY LYS          
SEQRES   5 A  108  LEU THR VAL ALA LYS LEU ASN ILE ASP GLN ASN PRO GLY          
SEQRES   6 A  108  THR ALA PRO LYS TYR GLY ILE ARG GLY ILE PRO THR LEU          
SEQRES   7 A  108  LEU LEU PHE LYS ASN GLY GLU VAL ALA ALA THR LYS VAL          
SEQRES   8 A  108  GLY ALA LEU SER LYS GLY GLN LEU LYS GLU PHE LEU ASP          
SEQRES   9 A  108  ALA ASN LEU ALA                                              
SEQRES   1 B  108  SER ASP LYS ILE ILE HIS LEU THR ASP ASP SER PHE ASP          
SEQRES   2 B  108  THR ASP VAL LEU LYS ALA ASP GLY ALA ILE LEU VAL ASP          
SEQRES   3 B  108  PHE TRP ALA GLU TRP CYS GLY PRO CYS LYS MET ILE ALA          
SEQRES   4 B  108  PRO ILE LEU ASP GLU ILE ALA ASP GLU TYR GLN GLY LYS          
SEQRES   5 B  108  LEU THR VAL ALA LYS LEU ASN ILE ASP GLN ASN PRO GLY          
SEQRES   6 B  108  THR ALA PRO LYS TYR GLY ILE ARG GLY ILE PRO THR LEU          
SEQRES   7 B  108  LEU LEU PHE LYS ASN GLY GLU VAL ALA ALA THR LYS VAL          
SEQRES   8 B  108  GLY ALA LEU SER LYS GLY GLN LEU LYS GLU PHE LEU ASP          
SEQRES   9 B  108  ALA ASN LEU ALA                                              
HET     CU  A 109       1                                                       
HET     CU  B 109       1                                                       
HET    MPD  A 601       8                                                       
HET    MPD  B 602       8                                                       
HET    MPD  B 603       8                                                       
HET    MPD  B 604       8                                                       
HET    MPD  A 605       8                                                       
HET    MPD  A 606       8                                                       
HET    MPD  A 607       8                                                       
HETNAM      CU COPPER (II) ION                                                  
HETNAM     MPD (4S)-2-METHYL-2,4-PENTANEDIOL                                    
FORMUL   3   CU    2(CU 2+)                                                     
FORMUL   5  MPD    7(C6 H14 O2)                                                 
FORMUL  12  HOH   *140(H2 O)                                                    
HELIX    1 A1A SER A   11  LEU A   17  1DISORDERED IN MOLECULE B           7    
HELIX    2 A2A CYS A   32  TYR A   49  1BENT BY 30 DEGREES AT RES 39      18    
HELIX    3 A3A ASN A   59  ASN A   63  1                                   5    
HELIX    4 31A THR A   66  TYR A   70  5DISTORTED H-BONDING C-TERMINS      5    
HELIX    5 A4A SER A   95  LEU A  107  1                                  13    
HELIX    6 A1B SER B   11  LEU B   17  1DISORDERED IN MOLECULE B           7    
HELIX    7 A2B CYS B   32  TYR B   49  1BENT BY 30 DEGREES AT RES 39      18    
HELIX    8 A3B ASN B   59  ASN B   63  1                                   5    
HELIX    9 31B THR B   66  TYR B   70  5DISTORTED H-BONDING C-TERMINS      5    
HELIX   10 A4B SER B   95  LEU B  107  1                                  13    
SHEET    1 B1A 5 LYS A   3  THR A   8  0                                        
SHEET    2 B1A 5 LEU A  53  ASN A  59  1  O  VAL A  55   N  ILE A   5           
SHEET    3 B1A 5 GLY A  21  TRP A  28  1  N  TRP A  28   O  LEU A  58           
SHEET    4 B1A 5 PRO A  76  LYS A  82 -1  O  THR A  77   N  PHE A  27           
SHEET    5 B1A 5 VAL A  86  GLY A  92 -1  O  ALA A  87   N  LEU A  80           
SHEET    1 B1B 5 LYS B   3  THR B   8  0                                        
SHEET    2 B1B 5 LEU B  53  ASN B  59  1  O  VAL B  55   N  ILE B   5           
SHEET    3 B1B 5 GLY B  21  TRP B  28  1  N  TRP B  28   O  LEU B  58           
SHEET    4 B1B 5 PRO B  76  LYS B  82 -1  O  THR B  77   N  PHE B  27           
SHEET    5 B1B 5 VAL B  86  GLY B  92 -1  O  ALA B  87   N  LEU B  80           
SSBOND   1 CYS A   32    CYS A   35                          1555   1555  2.09  
SSBOND   2 CYS B   32    CYS B   35                          1555   1555  2.05  
LINK        CU    CU A 109                 N   SER A   1     1555   1555  2.05  
LINK        CU    CU A 109                 N   ASP A   2     1555   1555  2.06  
LINK        CU    CU A 109                 OD1 ASP A   2     1555   1555  2.00  
LINK        CU    CU A 109                 O   HOH A 405     1555   1555  2.65  
LINK        CU    CU B 109                 N   ASP B   2     1555   1555  2.05  
LINK        CU    CU B 109                 O   HOH B 478     1555   1555  2.63  
LINK        CU    CU B 109                 OD1 ASP B   2     1555   1555  2.06  
LINK        CU    CU B 109                 N   SER B   1     1555   1555  2.09  
LINK        CU    CU A 109                 OD1 ASP A  10     1555   4545  1.97  
LINK        CU    CU A 109                 OD2 ASP A  10     1555   4545  2.62  
LINK        CU    CU B 109                 OD1 ASP B  10     1555   4546  2.08  
LINK        CU    CU B 109                 OD2 ASP B  10     1555   4546  2.54  
CISPEP   1 ILE A   75    PRO A   76          0         0.60                     
CISPEP   2 ILE B   75    PRO B   76          0        -2.42                     
SITE     1 AC1  5 SER A   1  ASP A   2  LYS A   3  ASP A  10                    
SITE     2 AC1  5 HOH A 405                                                     
SITE     1 AC2  5 SER B   1  ASP B   2  LYS B   3  ASP B  10                    
SITE     2 AC2  5 HOH B 478                                                     
SITE     1 AC3  4 ASP A  10  ASP A  43  GLU A  44  HOH A 442                    
SITE     1 AC4  6 GLU A  44  HOH A 524  GLU B  30  TRP B  31                    
SITE     2 AC4  6 GLY B  33  LYS B  36                                          
SITE     1 AC5  5 TYR B  70  ILE B  72  THR B  77  THR B  89                    
SITE     2 AC5  5 VAL B  91                                                     
SITE     1 AC6  3 ILE B  60  ALA B  67  ILE B  72                               
SITE     1 AC7  4 MET A  37  ILE A  38  ALA A  93  LEU A  94                    
SITE     1 AC8  4 TYR A  70  GLY A  71  THR A  89  VAL A  91                    
SITE     1 AC9  8 ILE A  60  ALA A  67  ILE A  72  ARG A  73                    
SITE     2 AC9  8 GLY A  74  ILE A  75  HOH A 494  HOH B 528                    
CRYST1   89.500   51.060   60.450  90.00 113.50  90.00 C 1 2 1       8          
ORIGX1      1.000000  0.000000  0.000000        0.00000                         
ORIGX2      0.000000  1.000000  0.000000        0.00000                         
ORIGX3      0.000000  0.000000  1.000000        0.00000                         
SCALE1      0.011173  0.000000  0.004858        0.00000                         
SCALE2      0.000000  0.019585  0.000000        0.00000                         
SCALE3      0.000000  0.000000  0.018039        0.00000                         
ATOM      1  N   SER A   1      21.389  25.406  -4.628  1.00 23.22           N  
ATOM      2  CA  SER A   1      21.628  26.691  -3.983  1.00 24.42           C  
ATOM      3  C   SER A   1      20.937  26.944  -2.679  1.00 24.21           C  
ATOM      4  O   SER A   1      21.072  28.079  -2.093  1.00 24.97           O  
ATOM      5  CB  SER A   1      21.117  27.770  -5.002  1.00 28.27           C  
ATOM      6  OG  SER A   1      22.276  27.925  -5.861  1.00 32.61           O  
ATOM      7  N   ASP A   2      20.173  26.028  -2.163  1.00 21.39           N  
ATOM      8  CA  ASP A   2      19.395  26.125  -0.949  1.00 21.57           C  
ATOM      9  C   ASP A   2      20.264  26.214   0.297  1.00 20.89           C  
ATOM     10  O   ASP A   2      19.760  26.575   1.371  1.00 21.49           O  
ATOM     11  CB  ASP A   2      18.439  24.914  -0.856  1.00 22.14           C  
ATOM     12  CG  ASP A   2      19.199  23.629  -0.576  1.00 23.23           C  
ATOM     13  OD1 ASP A   2      20.107  23.371  -1.387  1.00 22.71           O  
ATOM     14  OD2 ASP A   2      18.905  22.959   0.420  1.00 23.61           O  
ATOM     15  N   LYS A   3      21.530  25.857   0.207  1.00 19.20           N  
ATOM     16  CA  LYS A   3      22.310  25.875   1.488  1.00 18.91           C  
ATOM     17  C   LYS A   3      23.353  26.982   1.459  1.00 18.43           C  
ATOM     18  O   LYS A   3      24.203  26.950   2.370  1.00 20.34           O  
ATOM     19  CB  LYS A   3      23.006  24.540   1.741  1.00 20.31           C  
ATOM     20  CG  LYS A   3      21.971  23.407   1.921  1.00 22.14           C  
ATOM     21  CD  LYS A   3      22.677  22.143   2.401  1.00 24.45           C  
ATOM     22  CE  LYS A   3      21.620  21.104   2.844  1.00 25.84           C  
ATOM     23  NZ  LYS A   3      20.830  20.757   1.615  1.00 25.55           N  
ATOM     24  N   ILE A   4      23.299  27.821   0.461  1.00 17.03           N  
ATOM     25  CA  ILE A   4      24.287  28.908   0.332  1.00 17.28           C  
ATOM     26  C   ILE A   4      23.779  30.213   0.927  1.00 17.70           C  
ATOM     27  O   ILE A   4      22.691  30.658   0.487  1.00 19.79           O  
ATOM     28  CB  ILE A   4      24.592  29.122  -1.211  1.00 19.04           C  
ATOM     29  CG1 ILE A   4      24.953  27.791  -1.886  1.00 19.62           C  
ATOM     30  CG2 ILE A   4      25.689  30.221  -1.338  1.00 19.70           C  
ATOM     31  CD1 ILE A   4      26.177  27.022  -1.384  1.00 21.32           C  
ATOM     32  N   ILE A   5      24.492  30.834   1.831  1.00 15.41           N  
ATOM     33  CA  ILE A   5      24.075  32.125   2.372  1.00 15.87           C
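
Both examples carry the same fixed-column ATOM records, so a reader can treat files with and without a header identically by ignoring every line that is not an ATOM record. The following is a minimal sketch under that assumption. The helper name parse_atoms is hypothetical (it is not PROSESS code), and the column positions follow the standard PDB coordinate layout.

```python
def parse_atoms(pdb_text):
    """Extract ATOM records from PDB text, skipping any header records."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # HEADER, REMARK, SEQRES, HETATM, etc. are ignored here
        atoms.append({
            "serial": int(line[6:11]),      # columns 7-11
            "name": line[12:16].strip(),    # columns 13-16
            "res_name": line[17:20],        # columns 18-20
            "chain": line[21],              # column 22
            "res_seq": int(line[22:26]),    # columns 23-26
            "x": float(line[30:38]),        # columns 31-38
            "y": float(line[38:46]),        # columns 39-46
            "z": float(line[46:54]),        # columns 47-54
        })
    return atoms
```

Slicing by column rather than splitting on whitespace matters because adjacent PDB fields can run together without a separating space.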

Chemical Shift Input File Format

PROSESS accepts and processes backbone and side chain 1H, 13C or 15N chemical shift data in almost any combination (HA only, HN only, HA+HN only, HA+HN+sidechain H, CA only, CA+CB only, CA+CO only, HA+CA+CB, HN+CA+CB, HN+15N only, HN+15N+CA, HN+15N+CA+CB, etc.). This allows PROSESS to handle anything from small peptides (where only H shifts are typically measured) to large proteins (where only N or C shifts might be available).

The input file must include sequence data and chemical shift data in either BMRB STAR 2.1 (or 2.1.1) format or SHIFTY format. The minimum sequence length is 3 residues and the maximum is 1000 residues.

The BMRB Format

Examples of allowable BMRB file formats (with and without different headers) are shown below:

Example #1: This is an example of a generic BMRB file extracted from the BMRB. The entire file is ~500 lines, and only a portion is shown here. The header is not important for PROSESS data processing; only the chemical shift list (at the bottom of the file) is used. PROSESS ignores most (if not all) of the header text.


data_548

#######################
#  Entry information  #
#######################

save_entry_information
   _Saveframe_category      entry_information

   _Entry_title            
;
Sequence-Specific 1H NMR Assignment and Secondary Structure of Neuropeptide Y in
 Aqueous Solution
;

   loop_
      _Author_ordinal
      _Author_family_name
      _Author_given_name
      _Author_middle_initials
      _Author_family_title

      1 Saudek Vladimir .  . 
      2 Pelton John     T. . 

   stop_

   _BMRB_accession_number   548
   _BMRB_flat_file_name     bmr548.str
   _Entry_type              revision
   _Submission_date         1995-07-31
   _Accession_date          1996-04-12
   _Entry_origination       BMRB
   _NMR_STAR_version        2.1
   _Experimental_method     NMR

ETC.
ETC.

   loop_
      _Atom_shift_assign_ID
      _Residue_seq_code
      _Residue_label
      _Atom_name
      _Atom_type
      _Chem_shift_value
      _Chem_shift_value_error
      _Chem_shift_ambiguity_code

        1  1 TYR   HA    H 4.53 . 1 
        2  1 TYR   HB2   H 3.05 . 2 
        3  1 TYR   HB3   H 3.28 . 2 
        4  1 TYR   HD1   H 7.28 . 1 
        5  1 TYR   HD2   H 7.28 . 1 
        6  1 TYR   HE1   H 6.93 . 1 
        7  1 TYR   HE2   H 6.93 . 1 
        8  2 PRO   HA    H 4.59 . 1 
        9  2 PRO   HB2   H 2.01 . 2 
       10  2 PRO   HB3   H 2.39 . 2 
       11  2 PRO   HG2   H 1.48 . 1 
       12  2 PRO   HG3   H 1.48 . 1 
       13  2 PRO   HD2   H 3.38 . 2 
       14  2 PRO   HD3   H 3.74 . 2 
       15  3 SER   H     H 8.42 . 1 
       16  3 SER   HA    H 4.38 . 1 
       17  3 SER   HB2   H 3.83 . 1 
       18  3 SER   HB3   H 3.83 . 1

Example #2: This is an example of a slightly shortened BMRB format where only the assigned chemical shift section of the BMRB file is provided.

##############################
#  assigned chemical shifts  #
##############################



save_assigned_chem_shift_list_1
   _Saveframe_category               assigned_chemical_shifts


   loop_
      _Software_label

      $NMRPipe 

   stop_

   loop_
      _Sample_label

      $sample_1 
      $sample_2 

   stop_

   _Sample_conditions_label         $sample_conditions_1
   _Chem_shift_reference_set_label  $chemical_shift_reference_1
   _Mol_system_component_name        entity_1

   loop_
      _Atom_shift_assign_ID
      _Residue_author_seq_code
      _Residue_seq_code
      _Residue_label
      _Atom_name
      _Atom_type
      _Chem_shift_value
      _Chem_shift_value_error
      _Chem_shift_ambiguity_code

        1  1  1 GLY HA2  H   4.44 0.0300 2 
        2  1  1 GLY HA3  H   3.72 0.0300 2 
        3  1  1 GLY CA   C  44.81 0.4000 1 
        4  2  2 SER H    H   8.70 0.0300 1 
        5  2  2 SER N    N 121.24 0.4000 1 
        6  4  4 MET HA   H   4.30 0.0300 1 
        7  4  4 MET HB2  H   2.11 0.0300 2 
        8  4  4 MET HB3  H   1.94 0.0300 2 
        9  4  4 MET HG2  H   2.30 0.0300 2 
       10  4  4 MET HG3  H   2.30 0.0300 2 
       11  4  4 MET C    C 172.22 0.4000 1 
       12  4  4 MET CA   C  55.62 0.4000 1 
       13  4  4 MET CB   C  29.60 0.4000 1

Example #3: This is an example of the simplest BMRB format that PROSESS accepts. Only the chemical shift list is provided with no preceding data tags. The number of columns in this example is 9.

        1  1  1 GLY HA2  H   4.44 0.0300 2 
        2  1  1 GLY HA3  H   3.72 0.0300 2 
        3  1  1 GLY CA   C  44.81 0.4000 1 
        4  2  2 SER H    H   8.70 0.0300 1 
        5  2  2 SER N    N 121.24 0.4000 1 
        6  4  4 MET HA   H   4.30 0.0300 1 
        7  4  4 MET HB2  H   2.11 0.0300 2 
        8  4  4 MET HB3  H   1.94 0.0300 2 
        9  4  4 MET HG2  H   2.30 0.0300 2 
       10  4  4 MET HG3  H   2.30 0.0300 2 
       11  4  4 MET C    C 172.22 0.4000 1 
       12  4  4 MET CA   C  55.62 0.4000 1
       13  4  4 MET CB   C  29.60 0.4000 1

Example #4: This is another example of a simplified BMRB format that PROSESS also accepts. The number of data columns in this example is 8. The minimum number of columns that PROSESS accepts is 8. If no data is available for the chemical shift error or ambiguity, these values can be replaced by a period (as seen in this example).

loop_
      _Atom_shift_assign_ID
      _Residue_author_seq_code
      _Residue_seq_code
      _Residue_label
      _Atom_name
      _Atom_type
      _Chem_shift_value
      _Chem_shift_value_error
      _Chem_shift_ambiguity_code

        1  1 GLY HA2  H   4.44 . . 
        2  1 GLY HA3  H   3.72 . . 
        3  1 GLY CA   C  44.81 . . 
        4  2 SER H    H   8.70 . . 
        5  2 SER N    N 121.24 . . 
        6  4 MET HA   H   4.30 . . 
        7  4 MET HB2  H   2.11 . . 
        8  4 MET HB3  H   1.94 . . 
        9  4 MET HG2  H   2.30 . . 
       10  4 MET HG3  H   2.30 . . 
       11  4 MET C    C 172.22 . . 
       12  4 MET CA   C  55.62 . . 
       13  4 MET CB   C  29.60 . .
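
The 8- and 9-column layouts, and the use of a period for missing values, can be handled by one row parser. The following is a minimal sketch, not the parser PROSESS itself uses: the name parse_shift_row is hypothetical, and it assumes that in the 9-column layout the second column is the author sequence code and can be dropped.

```python
def parse_shift_row(row):
    """Parse one row of a simplified BMRB chemical shift list."""
    fields = row.split()
    if len(fields) == 9:
        # Assumed: column 2 is the author sequence code; drop it so the
        # 9-column layout lines up with the 8-column one.
        del fields[1]
    if len(fields) != 8:
        raise ValueError("expected 8 or 9 columns, got %d" % len(row.split()))
    shift_id, res_num, res_name, atom, atom_type, value, error, ambiguity = fields
    return {
        "id": int(shift_id),
        "res_num": int(res_num),
        "res_name": res_name,
        "atom": atom,
        "type": atom_type,
        "shift": float(value),
        # A period marks a missing value.
        "error": None if error == "." else float(error),
        "ambiguity": None if ambiguity == "." else int(ambiguity),
    }
```

Rows with fewer than 8 columns raise a ValueError, mirroring the stated minimum column count.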

Example #5: Here is another example of an acceptable BMRB format. In this case the tags of the assignment loop are written in upper case (instead of the usual mixed case). The number of data columns is 9, even though the author sequence code and residue sequence code columns hold the same values.

loop_
      _ATOM_SHIFT_ASSIGN_ID
      _RESIDUE_AUTHOR_SEQ_CODE
      _RESIDUE_SEQ_CODE
      _RESIDUE_LABEL
      _ATOM_NAME
      _ATOM_TYPE
      _CHEM_SHIFT_VALUE
      _CHEM_SHIFT_VALUE_ERROR
      _CHEM_SHIFT_AMBIGUITY_CODE

        1  1  1 GLY HA2  H   4.44 0.0300 . 
        2  1  1 GLY HA3  H   3.72 0.0300 . 
        3  1  1 GLY CA   C  44.81 0.4000 . 
        4  2  2 SER H    H   8.70 0.0300 . 
        5  2  2 SER N    N 121.24 0.4000 . 
        6  4  4 MET HA   H   4.30 0.0300 . 
        7  4  4 MET HB2  H   2.11 0.0300 . 
        8  4  4 MET HB3  H   1.94 0.0300 . 
        9  4  4 MET HG2  H   2.30 0.0300 . 
       10  4  4 MET HG3  H   2.30 0.0300 . 
       11  4  4 MET C    C 172.22 0.4000 . 
       12  4  4 MET CA   C  55.62 0.4000 . 
       13  4  4 MET CB   C  29.60 0.4000 .

Example #6: In this example the data is presented in a tab-delimited format rather than following the usual 3-character spacing found in most BMRB files. Comments have also been added below the chemical shift assignment loop and above the data columns. This format (and modest variations of it) is also accepted by PROSESS.

loop_
      _ATOM_CHEM_SHIFT.ID
      _ATOM_CHEM_SHIFT.COMP_INDEX_ID
      _ATOM_CHEM_SHIFT.COMP_ID
      _ATOM_CHEM_SHIFT.ATOM_ID
      _ATOM_CHEM_SHIFT.ATOM_TYPE
      _ATOM_CHEM_SHIFT.VAL
      _ATOM_CHEM_SHIFT.VAL_ERR
      _ATOM_CHEM_SHIFT.AMBIGUITY_CODE
      _ATOM_CHEM_SHIFT.OCCUPANCY
#
# some comments placed here
# more comments
#
1	1  	GLY 	HA2  	H   	4.44     0.0300     2 
2	1  	GLY 	HA3  	H   	3.72     0.0300     2 
3	1  	GLY 	CA   	C   	44.81    0.4000     1 
4	2  	SER 	H    	H   	8.70     0.0300     1 
5	2  	SER 	N    	N   	121.24   0.4000     1 
6	4  	MET 	HA   	H   	4.30     0.0300     1 
7	4  	MET 	HB2  	H   	2.11     0.0300     2 
8	4  	MET 	HB3  	H   	1.94     0.0300     2 
9	4  	MET 	HG2  	H   	2.30     0.0300     2 
10	4  	MET 	HG3  	H   	2.30     0.0300     2 
11	4  	MET 	C    	C   	172.22   0.4000     1 
12	4  	MET 	CA   	C  	    55.62    0.4000     1 
13	4  	MET 	CB   	C  	    29.60    0.4000     1

Example #7: In this example the data is presented in a single-space-delimited format rather than following the usual 3-character spacing found in most BMRB files. Comments have also been added below the chemical shift assignment loop and above the data columns. This format (and modest variations of it) is also accepted by PROSESS.

loop_
      _ATOM_CHEM_SHIFT.ID
      _ATOM_CHEM_SHIFT.COMP_INDEX_ID
      _ATOM_CHEM_SHIFT.COMP_ID
      _ATOM_CHEM_SHIFT.ATOM_ID
      _ATOM_CHEM_SHIFT.ATOM_TYPE
      _ATOM_CHEM_SHIFT.VAL
      _ATOM_CHEM_SHIFT.VAL_ERR
      _ATOM_CHEM_SHIFT.VAL_ERROR
      _ATOM_CHEM_SHIFT.AMBIGUITY_CODE
      _ATOM_CHEM_SHIFT.OCCUPANCY
      _ATOM_CHEM_SHIFT.DETAILS
#
# some comments placed here
# more comments

1 1 1 GLY HA2 H 4.44 0.03 2. 
2 1 1 GLY HA3 H 3.72 0.03 2. 
3 1 1 GLY CA C 44.81 0.4 1. 
4 2 2 SER H H 8.70 0.03 1. 
5 2 2 SER N N 121.24 0.4 1. 
6 4 4 MET HA H 4.30 0.03 2. 
7 4 4 MET HB2 H 2.11 0.03 2. 
8 4 4 MET HB3 H 1.94 0.03 2. 
9 4 4 MET HG2 H 2.30 0.03 2. 
10 4 4 MET HG3 H 2.30 0.03 1. 
11 4 4 MET C C 172.22 0.4 1. 
12 4 4 MET CA C 55.62 0.4 1. 
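
The tab-delimited and single-space-delimited variants in Examples #6 and #7 can both be tokenized by splitting on any run of whitespace. The Python sketch below shows one way to do this while skipping comment lines and loop tags. It is an illustration only: the function name data_rows is hypothetical, and how PROSESS itself reads these files is not described here.

```python
def data_rows(text):
    """Return the whitespace-separated fields of every data line in a shift loop."""
    rows = []
    for line in text.splitlines():
        s = line.strip()
        # Skip blank lines, '#' comments, '_tag' names and the loop_/stop_ keywords.
        if not s or s[0] in "#_" or s.lower() in ("loop_", "stop_"):
            continue
        rows.append(s.split())  # split() treats tabs and spaces alike
    return rows


sample = (
    "loop_\n"
    "      _ATOM_CHEM_SHIFT.ID\n"
    "#\n"
    "# some comments placed here\n"
    "1\t1  \tGLY \tHA2  \tH   \t4.44     0.0300     2 \n"
    "3 1 GLY CA C 44.81 0.4000 1\n"
)
rows = data_rows(sample)
print(rows[0])
```

Because the fields are split on whitespace rather than read from fixed positions, column alignment does not matter in this sketch.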

NMR Exchange Format (NEF)


You can find information about NMR exchange format (NEF) here.

Use this file to test if the RCI server works with a NEF file.


The SHIFTY Format

The SHIFTY file format is a simplified chemical shift data entry format developed in the Sykes Lab in 1991 and is one of the more common alternate formats for chemical shift information. Examples of allowable SHIFTY formats are shown below. Any combination of shifts may be listed in any order, as long as the columns are labeled with a header. The header on the first line is essential. It can be aligned with the column positions or written as a single-spaced row. At a minimum, a SHIFTY file must have 3 columns: a residue number column, a single-letter residue name column and a chemical shift column. Unmeasured or undetectable chemical shifts can be entered as either 0.00 or *.

Example #1:

#NUM AA HA HN N15 CA CB CO
1 M 4.6128 8.3509 128.1401 55.5746 33.1840 174.0504 
2 F 5.1658 9.1754 128.0914 56.8722 43.2068 172.6446 
3 Q 5.0880 7.8251 122.4598 54.4658 32.9175 174.3090 
4 Q 4.6980 8.4214 119.1251 54.3607 33.5503 173.9477 
5 E 5.1262 8.3247 122.6401 54.8529 31.9685 176.1557 
6 V 4.5204 8.4684 123.4184 61.4330 34.6444 173.0311 
7 T 4.9002 8.2696 119.8067 62.2487 70.0431 174.1138 
8 I 4.1698 8.8360 129.2597 61.8793 37.2884 176.4472 
9 T 4.4136 8.2868 115.9694 60.8221 70.1452 174.6432 
10 A 4.2796 8.0655 127.7723 50.9885 19.0033 176.6414 
11 P 4.3562 0.0000 0.0000 65.5591 31.2252 177.2392 
12 N 4.8824 7.8942 112.1161 52.5902 39.2484 177.0207 
13 G 3.7309 7.5941 106.4993 46.8305 0.0000 174.5358 
14 L 4.6853 9.7859 121.2612 53.1092 41.6631 175.3041 
15 D 4.6986 7.0435 114.6080 52.0224 40.8042 177.3864 
16 T 4.0677 7.8732 114.9997 67.0623 68.7506 177.2631 
17 R 3.9316 8.0671 119.4180 60.4646 30.5755 177.9282 
18 P 4.2658 0.0000 0.0000 65.3875 30.9009 178.6357 
19 A 4.0015 8.5778 121.5522 55.2170 18.1581 179.5463 
20 A 4.0493 7.9442 119.6336 55.1010 18.1309 179.7605 
21 Q 4.0158 7.9651 115.7440 58.4227 28.2881 178.1323 
22 F 4.1284 8.6923 121.2872 61.8092 39.3486 177.1596 
23 V 4.0272 8.4435 118.5810 65.9995 31.2267 178.5363 
24 K 3.9445 7.8277 117.7576 58.7971 31.7623 178.6483

Example #2: Here is an example where only HA, HN and N15 shifts are presented. The header is aligned with the columns in this case, although the alignment is not necessary.


#NUM AA  HA     HN      N15 
1 M 4.6128 8.3509 128.1401  
2 F 5.1658 9.1754 128.0914  
3 Q 5.0880 7.8251 122.4598  
4 Q 4.6980 8.4214 119.1251  
5 E 5.1262 8.3247 122.6401  
6 V 4.5204 8.4684 123.4184  
7 T 4.9002 8.2696 119.8067

Example #3: An acceptable SHIFTY file can use any of the following column headers, in which the first field of the header is written as #, NUM, > or #NUM:

#NUM AA HA HN N15 CA CB CO
1 M 4.6128 8.3509 128.1401 55.5746 33.1840 174.0504 
2 F 5.1658 9.1754 128.0914 56.8722 43.2068 172.6446 
3 Q 5.0880 7.8251 122.4598 54.4658 32.9175 174.3090 
4 Q 4.6980 8.4214 119.1251 54.3607 33.5503 173.9477 
5 E 5.1262 8.3247 122.6401 54.8529 31.9685 176.1557 
or
NUM AA HA HN N15 CA CB CO
1 M 4.6128 8.3509 128.1401 55.5746 33.1840 174.0504 
2 F 5.1658 9.1754 128.0914 56.8722 43.2068 172.6446 
3 Q 5.0880 7.8251 122.4598 54.4658 32.9175 174.3090 
4 Q 4.6980 8.4214 119.1251 54.3607 33.5503 173.9477 
5 E 5.1262 8.3247 122.6401 54.8529 31.9685 176.1557 
or
> AA HA HN N15 CA CB CO
1 M 4.6128 8.3509 128.1401 55.5746 33.1840 174.0504 
2 F 5.1658 9.1754 128.0914 56.8722 43.2068 172.6446 
3 Q 5.0880 7.8251 122.4598 54.4658 32.9175 174.3090 
4 Q 4.6980 8.4214 119.1251 54.3607 33.5503 173.9477 
5 E 5.1262 8.3247 122.6401 54.8529 31.9685 176.1557 
or
#NUM AA HA HN N15 CA CB CO
1 M 4.6128 8.3509 128.1401 55.5746 33.1840 174.0504 
2 F 5.1658 9.1754 128.0914 56.8722 43.2068 172.6446 
3 Q 5.0880 7.8251 122.4598 54.4658 32.9175 174.3090 
4 Q 4.6980 8.4214 119.1251 54.3607 33.5503 173.9477 
5 E 5.1262 8.3247 122.6401 54.8529 31.9685 176.1557
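
The SHIFTY rules above (a mandatory header line, at least three columns, and 0.00 or * for unmeasured shifts) can be sketched in Python as follows. The function name parse_shifty is hypothetical and the code is only an illustration of the format, not the parser used by PROSESS. It assumes the first header field is a single token such as #NUM, NUM or >.

```python
def parse_shifty(text):
    """Parse SHIFTY text into (residue number, residue letter, {nucleus: shift}) tuples.

    Shifts entered as '*' or as zero are stored as None (unmeasured).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if len(header) < 3:
        raise ValueError("a SHIFTY header needs at least 3 columns")
    nuclei = header[2:]  # header[0] is the number label, header[1] is AA
    records = []
    for ln in lines[1:]:
        f = ln.split()
        if len(f) != len(header):
            raise ValueError(f"row does not match header: {ln!r}")
        shifts = {}
        for nucleus, token in zip(nuclei, f[2:]):
            missing = token == "*" or float(token) == 0.0
            shifts[nucleus] = None if missing else float(token)
        records.append((int(f[0]), f[1], shifts))
    return records


sample = """> AA HA HN N15
10 A 4.2796 8.0655 127.7723
11 P 4.3562 0.0000 *
"""
records = parse_shifty(sample)
print(records[1])
```

Here the proline at position 11 has no amide proton or nitrogen shift, so both entries come back as None regardless of which placeholder was used.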

NOE Input Format

PROSESS accepts NOE data and, based on the input data, will calculate a number of statistics including the total number of NOEs, the average number of NOEs per residue, the number of upper-bound violations and the number of lower-bound violations. Currently PROSESS accepts only one format for NOE input: the XPLOR NIH format.

Example: XPLOR NIH Format:

!C1
  assign (resid  1  and name HA    ) (resid    2  and name HG2#  )   4.0   2.2  1.0
  assign (resid  1  and name HB1   ) (resid    2  and name HA  )   4.0   2.2  1.0
  assign (resid  1  and name HB2   ) (resid    2  and name HA  )   4.0   2.2  1.0
  assign (resid  1  and name HB#   ) (resid   70  and name HA  )   4.0   2.2  1.0
  assign (resid  1  and name HB1   ) (resid   70  and name HN  )   4.0   2.2  1.0
  assign (resid  1  and name HB2   ) (resid   70  and name HN  )   4.0   2.2  1.0
  !T2
  assign (resid  2  and name HA    ) (resid    3  and name HN  )   2.2   0.4  0.5
  assign (resid  2  and name HB    ) (resid    3  and name HN  )   4.0   2.2  1.0
  assign (resid    2  and name HG2#  ) (resid    3  and name HN  )   4.0   2.2  1.0
  assign (resid    2  and name HA    ) (resid   70  and name HB#  )   4.0   2.2  1.0
  assign (resid    2  and name HA    ) (resid   99  and name HB2  )   4.0   2.2  1.0
  !C3
  assign (resid  3  and name HA    ) (resid    4  and name HN  )   2.2   0.4  0.5
  assign (resid  3  and name HB1   ) (resid    4  and name HN  )   4.0   2.2  1.0
  assign (resid  3  and name HB2   ) (resid    4  and name HN  )   4.0   2.2  1.0
  assign (resid  3  and name HB2   ) (resid    4  and name HA  )   4.0   2.2  1.0
  assign (resid  3  and name HB1   ) (resid   99  and name HA    )   4.0   2.2  1.0
  assign (resid  3  and name HB1   ) (resid   99  and name HB1   )   4.0   2.2  1.0
  assign (resid  3  and name HB2   ) (resid   99  and name HB1   )   4.0   2.2  1.0
  assign (resid  3  and name HN    ) (resid   99  and name HB1  )   4.0   2.2  1.0
  assign (resid  3  and name HN    ) (resid   99  and name HB2  )   4.0   2.2  1.0
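
To show how restraints in this layout can be read, here is a minimal Python sketch that extracts the two atom selections and the distance bounds from each assign line. It assumes the usual X-PLOR convention in which the three numbers are d, dminus and dplus, so the lower bound is d - dminus and the upper bound is d + dplus. The function name parse_noes and the regular expression are hypothetical, cover only the simple "resid N and name X" selections shown above, and do not reflect how PROSESS computes its statistics.

```python
import re

# Matches: assign (resid N and name X) (resid M and name Y) d dminus dplus
ASSIGN = re.compile(
    r"assign\s*\(\s*resid\s+(\d+)\s+and\s+name\s+([^\s)]+)\s*\)\s*"
    r"\(\s*resid\s+(\d+)\s+and\s+name\s+([^\s)]+)\s*\)\s*"
    r"([\d.]+)\s+([\d.]+)\s+([\d.]+)",
    re.IGNORECASE,
)


def parse_noes(text):
    """Return (resid1, atom1, resid2, atom2, lower, upper) for each assign line."""
    restraints = []
    for line in text.splitlines():
        s = line.strip()
        if not s or s.startswith("!"):  # '!' starts a comment line
            continue
        m = ASSIGN.match(s)
        if m:
            d, dminus, dplus = float(m[5]), float(m[6]), float(m[7])
            restraints.append(
                (int(m[1]), m[2], int(m[3]), m[4], d - dminus, d + dplus)
            )
    return restraints


sample = """!C1
  assign (resid  1  and name HA    ) (resid    2  and name HG2#  )   4.0   2.2  1.0
  !T2
  assign (resid  2  and name HA    ) (resid    3  and name HN  )   2.2   0.4  0.5
"""
noes = parse_noes(sample)
residues = {r for n in noes for r in (n[0], n[2])}
print(len(noes), "restraints involving", len(residues), "residues")
```

From a list like this, simple counts such as the total number of NOEs follow directly. Violation counts would additionally require the interatomic distances from the structure.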
  

FASTA Format Description

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length. An example sequence in FASTA format is:

Example #1:

>Name of sequence
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHI

Example #2: Another equally valid version of the FASTA format is shown here:

>
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHI

Sequences are expected to be represented in the standard IUB/IUPAC single letter amino acid code.
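
The FASTA layout described above can be read with a few lines of Python. The sketch below is illustrative only: the function name read_fasta is hypothetical, and restricting the sequence to the 20 standard amino acid letters is an assumption made here for simplicity (the IUPAC code also defines additional letters such as X).

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard residues only


def read_fasta(text):
    """Return (description, sequence) from single-record FASTA text."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("the first line must begin with '>'")
    name = lines[0][1:].strip()  # may be empty, as in Example #2
    seq = "".join(ln.strip() for ln in lines[1:]).upper()
    unknown = sorted(set(seq) - STANDARD_AA)
    if unknown:
        raise ValueError(f"unexpected residue codes: {unknown}")
    return name, seq


name, seq = read_fasta(">Name of sequence\nELRLRYCAPAGF\nALLKCNDADYDG\n")
print(name, len(seq))
```

The sequence lines are joined before validation, so line length does not affect the result.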