17,048 WGHs are found in the 1,668 eukaryotic genomes. The three phyla with the most FACs also have the most WGHs: Arthropoda, Ascomycota and Streptophyta encode 2,328, 5,444 and 5,171 WGHs, respectively. The four eukaryotic genomes with the most WGHs all belong to the phylum Streptophyta: Oryza sativa subsp. japonica (Rice) (828 WGHs), Arabidopsis thaliana (Mouse-ear cress) (678 WGHs), Vitis vinifera (Grape) (602 WGHs) and Zea mays (Maize) (284 WGHs). Interestingly, the human and mouse genomes encode 272 and 224 WGHs, respectively. Apart from these genomes and two other plant genomes, i.e. Oryza sativa subsp. indica (Rice) (258 WGHs) and Physcomitrella patens
subsp. patens (Moss) (226 WGHs), the remaining 6 eukaryotic genomes encoding more than 200 WGHs all belong to the fungal phylum Ascomycota. No cellulosome components were identified in the eukaryotic genomes. Of the 272 human WGHs, 200 (~73.53%) are homologous to mouse WGHs with NCBI BLAST E-values < 1e-23. So the majority of these enzymes were already present in the human and mouse genomes before the two lineages diverged 75 million years ago [36].

Identified glydromes in metagenomes

Overall, 63 FACs and 6,072 WGHs are found in the 42 metagenomes, except for TM7b, which was sampled from the human mouth. The two metagenomes with the most glycosyl hydrolases are from termite guts (12 FACs and 1,150 WGHs) and Diversa silage soil (13 FACs and 820 WGHs). Because the number of proteins per metagenome varies widely, from 452 in termite gut fosmids to 185,274 in Diversa silage soil, we calculated the percentage of proteins encoding glycosyl hydrolases in each metagenome. On average, 0.65% of the proteins in a metagenome are glycosyl hydrolases. We noted that all the metagenomes in which more than 1% of the proteins are glycosyl hydrolases come from animal guts (including human, mouse and termite). This is confirmed by an independent study using BLAST mapping [37]. No cellulosome components were identified in any metagenome.

Utility

The query interface of GASdb

All the annotated glydromes are organized into GASdb, an easy-to-use database (Figure 2). A user can find proteins of interest by browsing, or by searching with keywords or BLAST. The overall organization of each glydrome can be displayed, and high-resolution images of each protein can be downloaded for publication, as shown in Figure 3. A user can also display the signal peptide and functional domains of a given protein, together with those of its homologs identified by BLAST with an E-value cutoff of 1e-20 (Figure 3).

Figure 2 The database interfaces: the main page, the browsing page, the searching page, and the BLAST page.

Figure 3 The display pages for the domain architectures of the glydrome of Clostridium acetobutylicum, and for the domain architectures of the protein Clostridium acetobutylicum CelA and its homolog.
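Both the human-mouse homology figure (E-value < 1e-23) and the homolog display in GASdb (E-value cutoff 1e-20) rest on keeping only BLAST hits below an E-value cutoff. The sketch below illustrates that filtering step under stated assumptions: hits are in BLAST+ tabular format (`-outfmt 6`, where the 1st column is the query ID and the 11th is the E-value), and the function names and the toy protein identifiers are hypothetical, not part of GASdb or the authors' pipeline.

```python
from typing import Iterable, List, Set


def queries_with_hits(blast_tab_lines: Iterable[str], evalue_cutoff: float) -> Set[str]:
    """Return the query IDs that have at least one hit with E-value < cutoff.

    Assumption: BLAST+ tabular output (-outfmt 6), 12 tab-separated columns,
    query ID in column 1 and E-value in column 11.
    """
    hits: Set[str] = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < evalue_cutoff:
            hits.add(query_id)
    return hits


def homolog_percentage(n_with_hits: int, n_total: int) -> float:
    """Percentage of query proteins with a homolog, rounded to 2 decimals."""
    return round(100.0 * n_with_hits / n_total, 2)


# Hypothetical hits of human WGHs (hWGH*) against mouse WGHs (mWGH*).
example_rows: List[str] = [
    "hWGH1\tmWGH7\t81.2\t420\t79\t0\t1\t420\t1\t420\t3e-150\t520",
    "hWGH2\tmWGH3\t35.0\t210\t136\t4\t5\t214\t9\t218\t2e-21\t95.1",
    "hWGH3\tmWGH9\t60.5\t300\t118\t1\t1\t300\t1\t300\t1e-80\t300",
]

print(sorted(queries_with_hits(example_rows, 1e-23)))  # ['hWGH1', 'hWGH3']
print(sorted(queries_with_hits(example_rows, 1e-20)))  # ['hWGH1', 'hWGH2', 'hWGH3']
print(homolog_percentage(200, 272))                    # 73.53
```

The last line reproduces the reported ratio of 200 homologous WGHs out of 272 human WGHs. Note that the stricter 1e-23 cutoff drops the weaker `hWGH2` hit that the 1e-20 cutoff would keep.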
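Because metagenome sizes differ by orders of magnitude, the metagenome comparison uses the percentage of proteins that are glycosyl hydrolases rather than raw counts. A minimal sketch of that normalization and of the 1% threshold follows. It assumes the percentage is the WGH count divided by the total number of predicted proteins (whether FACs are counted separately is not stated here). The function names are hypothetical, and the "toy gut sample" entry uses made-up numbers; only the Diversa silage soil figures come from the text.

```python
from typing import Dict, Tuple

GH_THRESHOLD_PERCENT = 1.0  # threshold used in the text to single out gut metagenomes


def gh_percentage(n_wgh: int, n_proteins: int) -> float:
    """Percentage of a metagenome's proteins annotated as glycosyl hydrolases."""
    if n_proteins <= 0:
        raise ValueError("a metagenome must contain at least one protein")
    return 100.0 * n_wgh / n_proteins


def enriched_metagenomes(counts: Dict[str, Tuple[int, int]],
                         threshold: float = GH_THRESHOLD_PERCENT) -> Dict[str, float]:
    """Return {name: percentage} for metagenomes above the threshold.

    `counts` maps a metagenome name to (number of WGHs, number of proteins).
    """
    result: Dict[str, float] = {}
    for name, (n_wgh, n_proteins) in counts.items():
        pct = gh_percentage(n_wgh, n_proteins)
        if pct > threshold:
            result[name] = round(pct, 2)
    return result


counts = {
    "Diversa silage soil": (820, 185_274),  # figures quoted in the text
    "toy gut sample": (150, 10_000),        # made-up numbers for illustration
}

print(round(gh_percentage(*counts["Diversa silage soil"]), 2))  # 0.44
print(enriched_metagenomes(counts))  # {'toy gut sample': 1.5}
```

Under this assumption, the silage soil metagenome has the second-largest WGH count yet stays well below 1%, which is why raw counts alone would be misleading.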