SitEx


Analyzing the projection of protein structure onto the exon-intron structure of the corresponding gene has, over the years, led to several fundamental conclusions about the structural and functional organization of proteins. Building on these results, we decided to map protein functional sites onto exon structure. We therefore created the SitEx database, which stores this mapping, and added BLAST search and 3D similar-structure search with PDB3DScan for polypeptides encoded by a single exon that participate in forming a functional site.
SitEx is intended to help:
  1. study the positions of functional sites within the exon structure;
  2. carry out a comprehensive analysis of protein function;
  3. identify exons that took part in exon shuffling and came from bacterial genomes;
  4. study the peculiarities of how polypeptide structures are encoded.
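A minimal Python sketch of the basic mapping idea, assuming exon boundaries are already known as the 1-based start position of each exon's segment in the protein sequence. The names `exon_starts`, `split_by_exons` and `map_site_to_exons` are hypothetical and not part of SitEx. A residue whose codon is split by an intron is simply assigned to the exon whose segment contains its position, which is a simplifying assumption.

```python
from bisect import bisect_right


def exon_index(position, exon_starts):
    """Return the 1-based number of the exon encoding a 1-based residue position.

    exon_starts holds the 1-based position of the first residue of each exon's
    segment in the protein sequence, sorted ascending (hypothetical input).
    """
    if position < exon_starts[0]:
        raise ValueError("position lies before the first exon")
    return bisect_right(exon_starts, position)


def split_by_exons(sequence, exon_starts):
    """Split a protein sequence into the polypeptides encoded by each exon."""
    bounds = [start - 1 for start in exon_starts] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def map_site_to_exons(site_positions, exon_starts):
    """Map each functional-site residue position to the exon that encodes it."""
    return {pos: exon_index(pos, exon_starts) for pos in site_positions}


# Toy example: a 22-residue sequence whose three exons start at residues 1, 8, 15.
seq = "MKTAYIAKQRQISFVKSHFSRQ"
starts = [1, 8, 15]
print(split_by_exons(seq, starts))
print(map_site_to_exons([3, 8, 20], starts))
```

With these toy boundaries the sequence splits into `MKTAYIA`, `KQRQISF` and `VKSHFSRQ`, and site residues 3, 8 and 20 map to exons 1, 2 and 3. Each per-exon polypeptide is the kind of unit that a per-exon sequence or structure search would take as its query.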
Comparison of functional site structures in a pair of human proteins

D-glyceraldehyde-3-phosphate dehydrogenase (GAPDH, PDB: 1U8F, EC 1.2.1.12) binds nicotinamide adenine dinucleotide (NAD) in its active site (AC2), which is formed at the boundary of two NAD-binding domains and is encoded by all 8 exons of the coding sequence. Using 3DExonScan, which is integrated with SitEx, we found structural similarity (Z-score 3.9, RMSD 3.4) with the polypeptide encoded by the sixth exon of alcohol dehydrogenase (PDB: 1D1T, EC 1.1.1.1), although no sequence similarity was detected. This polypeptide is partly involved in binding NAD (site EC2). Comparing the positions of the NAD-binding amino acids showed that they lie in similar regions of the spatial structure of the two polypeptides and are similar in amino acid composition. The binding sites of the acetate and zinc ions are located precisely in these regions. Note that the region of the 1U8F structure aligned with the 1D1T polypeptide is encoded by 4 exons. The alcohol dehydrogenase site is likewise formed by two dehydrogenase domains.
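For reference, the RMSD reported above is the root-mean-square distance between corresponding atoms of two superimposed structures. The sketch below computes it for coordinates that are assumed to be already superimposed and paired by a structural alignment; it does not reproduce the superposition or the Z-score computation performed by the search tool.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D coordinates.

    Assumes both lists are already superimposed and that coords_a[i] and
    coords_b[i] are equivalent atoms (e.g. C-alpha atoms paired by a
    structural alignment). No superposition is performed here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))
```

A single pair of atoms 5 units apart gives an RMSD of 5.0. A low RMSD on its own says nothing about sequence similarity, which is why the GAPDH and alcohol dehydrogenase polypeptides can superimpose well while sharing no detectable sequence similarity.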

Comparison with prokaryotes as a specific type of exon analysis

The search for conserved functional units of proteins is considered an exemplary application of SitEx. Human uroporphyrinogen decarboxylase (URO-D; PDB: 1R3S) is a single-domain protein encoded by 10 exons. Its binding site (AC1) consists of 18 amino acids encoded by 8 exons. Bacillus subtilis has a protein with the same function (PDB: 2INF) that is similar in sequence (E-value ≈ 10⁻⁶⁴) and structure (Z-score = 6.1). A 3DExonScan search revealed high structural similarity between the two proteins only for the polypeptide encoded by the sixth exon of the human protein (Z-score = 4.9). High sequence similarity between the two proteins was found only for exons 2, 5, 6 and 10 of the human protein (E-value ≈ 10⁻⁸).

According to Fan et al., of the 17 amino acids in the human functional site, 7 differ from those in the active center of Bacillus subtilis uroporphyrinogen decarboxylase. Substitutions were observed only for hydrophobic amino acids, whereas the amino acids that directly bind coproporphyrin remained conserved; they are located in the polypeptides encoded by exons 2, 6 and 10 of human uroporphyrinogen decarboxylase.
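The kind of site comparison described above can be sketched as follows, assuming the two sites are given as one-letter residue strings already aligned position by position. The hydrophobic set `AVILMFW` is one common convention chosen for illustration, and the residues in the example are made up; they are not the actual URO-D site.

```python
# Assumption: one common hydrophobic set; conventions differ between scales.
HYDROPHOBIC = frozenset("AVILMFW")


def compare_site(human_site, other_site):
    """Compare two functional sites aligned residue by residue.

    Both arguments are one-letter residue strings of equal length, position i
    in one corresponding to position i in the other. Returns the substitutions
    as (index, human_residue, other_residue) tuples and a flag that is True
    when every substituted position is hydrophobic in both sequences.
    """
    if len(human_site) != len(other_site):
        raise ValueError("sites must be aligned residue by residue")
    substitutions = [
        (i, h, o)
        for i, (h, o) in enumerate(zip(human_site, other_site))
        if h != o
    ]
    only_hydrophobic = all(
        h in HYDROPHOBIC and o in HYDROPHOBIC for _, h, o in substitutions
    )
    return substitutions, only_hydrophobic


# Made-up example residues, not the real URO-D site.
subs, only_hydrophobic = compare_site("RLDYKF", "RIDYKL")
print(len(subs), "substitutions; hydrophobic only:", only_hydrophobic)
```

Here the two substitutions (L to I, F to L) both involve hydrophobic residues, while the polar positions are conserved, which is the pattern the text reports for the real sites.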

Investigation of the exon structure encoding a promiscuous protein domain

The Carboxylesterase type B domain (Pfam: PF00135) occurs in proteins with different functions and can be found in combination with various other protein domains. Acetylcholinesterase (PDB: 2X8B) is taken as an example. Using BLAST, we found sequence similarity between the Acetylcholinesterase sequence and the exon sequences encoding the following proteins: Butyrylcholinesterase (PDB: 1P0I), Bile salt-dependent lipase (PDB: 1AQL), Carboxylesterase 1 (PDB: 2H7C), Neuroligin-2 (PDB: 3BL8) and Neuroligin-4 (PDB: 3BE8). All these proteins contain the Carboxylesterase type B domain, which is encoded by a different number of exons in each of them. Notably, exon 2 of the Neuroligin-2 coding sequence shares no sequence similarity with the exons encoding this domain in the other proteins. We examined the distribution of functional sites over the exon structure of the coding sequences. These proteins share a common ligand in their binding sites, N-acetyl-D-glucosamine (NAG). For its binding sites CoefE = 0 in all structures except 3BL8 (Table 1). In 3BL8 this ligand binds to a protein region encoded by non-neighboring exons (CoefE > 0) because of the additional exon 2. The other ligands bound by these proteins differ. CoefE values greater than 0 indicate that the binding-site amino acids for the bulky ligands taurocholic acid (TCH) and coenzyme A (COA) are not encoded by neighboring exons, whereas those for the smaller ligands (FUC, fucose; BUA, butanoic acid; BMA and MAN, mannose; FLC, citrate anion) are encoded by one exon or by neighboring exons (CoefE = 0). CoefA values close to 1 indicate that the amino acids of the corresponding binding sites are far apart from each other. CoefA equals 0 for binding sites in which only a single amino acid of the site occurs in a given chain (Table 1).
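The distinction drawn above between sites encoded by one or neighboring exons and sites spread over non-neighboring exons can be checked with a small helper. This sketch reproduces only the zero versus non-zero distinction; it does not implement the SitEx formulas for CoefE or CoefA, which are not given here. It takes the exon number of each site residue and, as an assumption, treats any run of consecutive exons as "neighboring". The function names are hypothetical.

```python
def intervening_exons(exon_numbers):
    """Exons between the site's first and last exon that encode none of its residues."""
    exons = set(exon_numbers)
    if not exons:
        raise ValueError("the site has no residues")
    return sorted(set(range(min(exons), max(exons) + 1)) - exons)


def encoded_by_neighbor_exons(exon_numbers):
    """True if the site residues come from one exon or from consecutive exons.

    Assumption: any run of consecutive exons counts as "neighboring"; this
    mirrors only the CoefE = 0 vs CoefE > 0 distinction, not its value.
    """
    return not intervening_exons(exon_numbers)


# Hypothetical sites: residues from exons 4-5, and from exons 1 and 3 only.
print(encoded_by_neighbor_exons([4, 5, 5]))  # True
print(intervening_exons([1, 1, 3]))  # [2]
```

The second call mimics the 3BL8 situation described above in schematic form: an extra exon lying between the exons that contribute site residues makes the site non-contiguous in exon terms.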

Table 1. Binding sites for ligands in the Carboxylesterase type B domain

Site  Ligand  CoefE      CoefA        N(exons)  N(domains)  N(chains)
AC1   NAG     0          0.6          1         1           1
AC4   NAG     0          0.4          1         1           1
AC5   NAG     0          0            1         1           1
AC7   NAG     0          0.953        1         1           1
AC8   NAG     0          0.955        1         1           1
AC9   NAG     0          0.857        2         1           1
AC3   FUC     0          0.912        1         1           1
BC5   BUA     0          0.985        1         1           1
AC1   NAG     0          0            1         1           1
AC3   TCH     0/0        0.5/0        1/1       1           2
AC4   TCH     0.25       0.932        3         1           1
AC5   TCH     0.714/0    0.982/0      2/1       1           2
AC1   NAG     0          0.25         1         1           1
CC5   COA     0.33/0.25  0.953/0.91   4/3       1           2
DC1   COA     0.5/0.375  0.972/0.959  4/6       1           2
AC1   NAG     0.714      0.991        2         1           1
AC2   NAG     0.714      0.996        2         1           1
AC4   NAG     0          0.4          1         1           1
AC5   NAG     0.33/0     0.968/0      2/1       1           2
AC7   NAG     0.33       0.977        2         1           1
AC8   BMA     0          0            1         1           1
AC9   MAN     0          0.939        2         1           1
AD1   MAN     0          0            1         1           1
BC2   MAN     0          0.968        2         1           1
AC1   NAG     0          0            1         1           1
AC2   NAG     0          0.886        1         1           1
AC5   FLC     0          0.933        2         1           1
AC9   NA      0          0.5          1         1           1
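To summarize Table 1 by ligand, the per-chain values written as `a/b` can be split and the largest CoefE taken for each ligand. The sketch below copies the Site, Ligand and CoefE columns of the table in table order; `coef_values` and `max_coefe_by_ligand` are hypothetical helper names.

```python
# Site, Ligand and CoefE columns of Table 1, in table order.
# A value such as "0.33/0.25" holds one number per chain.
TABLE_1 = [
    ("AC1", "NAG", "0"), ("AC4", "NAG", "0"), ("AC5", "NAG", "0"),
    ("AC7", "NAG", "0"), ("AC8", "NAG", "0"), ("AC9", "NAG", "0"),
    ("AC3", "FUC", "0"), ("BC5", "BUA", "0"), ("AC1", "NAG", "0"),
    ("AC3", "TCH", "0/0"), ("AC4", "TCH", "0.25"), ("AC5", "TCH", "0.714/0"),
    ("AC1", "NAG", "0"), ("CC5", "COA", "0.33/0.25"), ("DC1", "COA", "0.5/0.375"),
    ("AC1", "NAG", "0.714"), ("AC2", "NAG", "0.714"), ("AC4", "NAG", "0"),
    ("AC5", "NAG", "0.33/0"), ("AC7", "NAG", "0.33"), ("AC8", "BMA", "0"),
    ("AC9", "MAN", "0"), ("AD1", "MAN", "0"), ("BC2", "MAN", "0"),
    ("AC1", "NAG", "0"), ("AC2", "NAG", "0"), ("AC5", "FLC", "0"),
    ("AC9", "NA", "0"),
]


def coef_values(cell):
    """Split a cell such as '0.33/0.25' into floats, one per chain."""
    return [float(part) for part in cell.split("/")]


def max_coefe_by_ligand(rows):
    """Return {ligand: largest CoefE over all its sites and chains}."""
    result = {}
    for _site, ligand, coef_e in rows:
        result[ligand] = max(result.get(ligand, 0.0), *coef_values(coef_e))
    return result


summary = max_coefe_by_ligand(TABLE_1)
for ligand, value in summary.items():
    print(f"{ligand}: max CoefE = {value}")
```

With these rows only NAG (0.714), TCH (0.714) and COA (0.5) reach a non-zero maximum CoefE; FUC, BUA, BMA, MAN, FLC and NA stay at 0, in line with the statement that the smaller ligands are encoded by one exon or by neighboring exons. The non-zero NAG values are consistent with the text's note that one structure, 3BL8, is an exception for this ligand.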

Institute of Cytology and Genetics, Computer Proteomics Group (c), 2009-2016