
Real Biological Scenarios:
Unless otherwise noted, all examples use the Mus musculus section of the GeneSpeed Database.
1. I have mounting evidence that the gene/pathway I am studying is regulated by a homeobox transcription factor. I would like to obtain a complete list of genes for this family, including un-annotated members if such exist, and use this list as a base list to start my expression (micro-array) analysis.
This is one of the primary reasons why we developed the GeneSpeed Database. First go to the New Search page and select the organism you wish to search. On the next page, choose the ‘Search by Classification’ search type. This will take you to the Superclass selection page, where you should choose ‘HTH’ because the Homeobox is a type of Helix Turn Helix transcription factor. This will bring you to the Class selection page, where you will see a class called ‘Homeodomain/Homeobox’. Select this and choose the output options you prefer. Please note that you may want to select a reasonably stringent (low) e-score cutoff to obtain more relevant hits. That is it; you now have your list! Because there are many domains associated with homeodomain transcription factors, you may further refine the search to include only those genes with a definitive homeobox domain by performing an InterPro sub-search. Just tick the 'IPR Sub-Search' check box alongside your domain of interest (homeobox in this case) and then click the 'IPR Sub-Search' button. This will cut the list from ~1,500 to around 250 genes (assuming the e-score cutoff was 1e-2). The GeneSpeed Database has also incorporated the complete Novartis GeneAtlas v2 micro-array set, so you may also study the expression levels of any of your genes among the 79 human and 61 mouse tissues.
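If you export your results, the two filtering steps described above (an e-score cutoff followed by a sub-search on one domain) can be reproduced offline. The following is a minimal Python sketch, not GeneSpeed's own code: the `Hit` record, its field names, the Unigene-style IDs, the non-homeobox domain names, and all e-scores are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One gene/domain row. Field names are hypothetical, not GeneSpeed's schema."""
    unigene_id: str
    domain: str
    e_score: float


def filter_by_cutoff(hits, cutoff=1e-2):
    """Keep hits whose e-score is at or below the cutoff (lower = more stringent)."""
    return [h for h in hits if h.e_score <= cutoff]


def genes_with_domain(hits, domain):
    """Return the sorted, de-duplicated gene IDs that have a hit to `domain`."""
    return sorted({h.unigene_id for h in hits if h.domain == domain})


# Made-up sample rows for illustration only.
hits = [
    Hit("Mm.0001", "Homeobox", 3e-25),
    Hit("Mm.0001", "OtherDomainA", 5e-4),
    Hit("Mm.0002", "Homeobox", 0.5),        # fails the 1e-2 cutoff
    Hit("Mm.0003", "OtherDomainB", 1e-30),  # passes the cutoff, but no homeobox hit
]

kept = filter_by_cutoff(hits, cutoff=1e-2)
homeobox_genes = genes_with_domain(kept, "Homeobox")
print(homeobox_genes)
```

Applying the cutoff first and the domain filter second mirrors the order of the steps on the site; on real exports the two steps could equally be combined into one pass.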
2. My recent Affymetrix micro-array expression studies have yielded a list of 300 genes that fit a profile particularly relevant to my experimental model. 60 of these 300 genes are unknown and have no annotation associated with them. How may I use the GeneSpeed Database to identify what these genes are?
Affymetrix arrays are based on the Unigene dataset, so you can easily get a list of Unigene IDs for these 60 genes from your analysis software or from the Affymetrix website. The ‘Account’ page of the GeneSpeed site offers the option to save a custom gene list using external Unigene IDs. Saving your custom gene list will cause GeneSpeed to update all of your Unigene IDs to the most current IDs available. Now proceed to display your custom gene list with any of the various output options. The output will display all the domains contained in the genes of your saved list. Knowing which functional domains are present in the genes in your list gives you a reasonable grasp of their likely function. At this point you can continue to study the details of these domains using the tools built into the GeneSpeed site. These include performing an InterPro sub-search, which will tell you what other genes in the database contain that domain. You may also use the built-in Gene Ontology link to see where in the gene ontology tree the domain resides. There are also InterPro and Pfam links, which give further detailed information about each particular domain. The GeneSpeed Database has also incorporated the complete Novartis GeneAtlas v2 micro-array set, so you may study the expression levels of any of your genes among the 79 human and 61 mouse tissues.
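Once the domain output for a custom gene list has been exported, it is often convenient to regroup it as one entry per gene. The sketch below assumes the export can be read as simple (Unigene ID, domain, e-score) rows; that row layout, the function name, and every ID and value shown are hypothetical.

```python
from collections import defaultdict


def domains_by_gene(rows, gene_ids):
    """Map each gene in `gene_ids` to its (domain, e_score) pairs, best hit first.

    `rows` is an iterable of (unigene_id, domain, e_score) tuples; this layout
    is an assumption for illustration, not a GeneSpeed export format.
    """
    wanted = set(gene_ids)
    grouped = defaultdict(list)
    for unigene_id, domain, e_score in rows:
        if unigene_id in wanted:
            grouped[unigene_id].append((domain, e_score))
    # Genes without any domain hit still appear, with an empty list.
    return {g: sorted(grouped[g], key=lambda pair: pair[1]) for g in gene_ids}


# Made-up rows and IDs for illustration only.
rows = [
    ("Mm.1001", "DomainA", 1e-12),
    ("Mm.1001", "DomainB", 2e-40),
    ("Mm.1002", "DomainA", 1e-3),
    ("Mm.9999", "DomainC", 1e-8),  # not in the custom list, so it is ignored
]
custom_list = ["Mm.1001", "Mm.1002", "Mm.1003"]

summary = domains_by_gene(rows, custom_list)
for gene, domains in summary.items():
    print(gene, domains)
```

Keeping genes with no hits in the result makes it easy to see which of the unknown genes remain uncharacterized after the domain lookup.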
3. My favorite gene Crebbp (Creb binding protein) has several domains in its protein product. How may I use the GeneSpeed Database to study its currently annotated domains as well as possibly discover new un-annotated ones?
This is a task that the GeneSpeed Database is well suited to and performs very efficiently. First go to the New Search page and do a keyword search for “Crebbp”. This will show all the domains present in Crebbp. Notice that several domains (zf-TAZ, KIX, Bromodomain, and DUF906) have very low e-values (<1e-20) and can therefore be considered significant, that is, genuinely present in this protein. There are also other domains (Podocalyxin and Extensin-2) that show moderate similarity (<1e-10). You will also observe many others with e-values greater than 1e-10. In this way you can get a fairly accurate picture of the domains within your gene (Crebbp in this case) and of how similar these domains are to the domains that have been characterized as such. You may be surprised to find domain similarities that you would not have known existed had you looked the gene up in, e.g., Pfam via its Unigene ID. Bear in mind that each such prediction may be either real or false. Familiarize yourself with the E-score cutoff values in relation to domain input size, and you will quickly be able to decide whether you may trust a given domain prediction.
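The three similarity bands used in this scenario (below 1e-20, below 1e-10, and everything else) can be applied to an exported domain list with a few lines of Python. This is only a sketch: the tier labels are ours, and the e-values below are placeholders chosen to fall inside the bands described above, not the values GeneSpeed reports for Crebbp.

```python
def similarity_tier(e_value):
    """Label an e-value using the bands from the text; the labels are our own."""
    if e_value < 1e-20:
        return "strong"
    if e_value < 1e-10:
        return "moderate"
    return "weak"


# Domain names come from the scenario; the e-values are placeholders.
crebbp_domains = {
    "zf-TAZ": 1e-45,
    "Bromodomain": 1e-30,
    "Podocalyxin": 1e-12,
    "HypotheticalDomain": 1e-3,
}

tiers = {name: similarity_tier(e) for name, e in crebbp_domains.items()}
for name, tier in tiers.items():
    print(f"{name}: {tier}")
```

The band edges are a rule of thumb from this example rather than fixed thresholds; as noted above, the appropriate cutoff depends on the size of the input domain.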
4. I study evolutionary motifs in my favorite gene family (PHD Zinc Finger). How may I use GeneSpeed to determine which regulatory domains characterize this family?
Because we have performed a detailed characterization of transcription factors (TFs), this is a simple task to perform. First go to the New Search page and choose the ‘Search by Classification’ link. Then browse by Zinc (as the Superclass of TF) and then by PHD (as the Class of TF). The output will be a list of all genes containing similarity to the PHD domain. For each of these genes (each row in the output table), there is a column labeled ‘Domain No’. This column lists the number of domains present for each gene. Clicking on this number will list the information on all of these domains, including the domain name, size, original BLAST sequence utilized, and e-score, as well as various web links (InterPro, Pfam, Ensembl, Gene Ontology, Locus Link) that provide even more detailed information on each of the respective domains. Remember that the E-score is a measure of domain similarity and depends on the size of the input domain (in amino acids). Imagine that you observe a relatively low E-score (say, 1e-6) for a domain of, e.g., 120 amino acids. Perfect or near-perfect hits should score well below 1e-6 for a domain of that size. However, an E-score of 1e-6 is still far below what a random amino acid sequence would produce. It is therefore likely that the domain retains a quite ancient signature that is barely recognizable but still relevant. Use this to your advantage, but exercise caution at every decision.
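To see why the E-score depends on the input size, it helps to look at how a BLAST E-value is computed. The sketch below assumes that GeneSpeed's E-scores follow the standard BLAST statistics, in which the expected number of chance hits is E = m * n * 2^(-S'), with m the query length, n the database length, and S' the bit score. The database length used here is a made-up figure, not the size of the GeneSpeed database.

```python
def expected_hits(bit_score, query_len, db_len):
    """Standard BLAST relation: E = m * n * 2**(-S').

    m = query (domain) length, n = database length in residues,
    S' = normalized bit score of the alignment.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


DB_LEN = 10_000_000  # hypothetical database size in residues

# The same bit score gives a larger E-value for a longer query ...
e_short = expected_hits(50, query_len=40, db_len=DB_LEN)
e_long = expected_hits(50, query_len=120, db_len=DB_LEN)
print(e_short, e_long)

# ... but every additional bit of score halves the E-value.
e_one_more_bit = expected_hits(51, query_len=120, db_len=DB_LEN)
print(e_one_more_bit)
```

Under this relation the E-value grows only linearly with domain length but falls exponentially with the score. A long domain that matches well along its whole length can accumulate a much higher score than a short one, which is why near-perfect hits to a 120 amino acid domain are expected to score far below 1e-6.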
5. I have evidence that a novel protein I am studying may be a regulatory factor involved in a biological process I am interested in. I found the gene that encodes the protein in the Unigene database; however, Unigene just lists it as a Riken clone (6230401O10Rik). How may I use the GeneSpeed Database to help me identify what type of regulatory factor it could be, and whether my prediction may be correct?
Because the GeneSpeed Database is based on Unigene, this is a trivial task. Perform a ‘Keyword’ search with “6230401O10Rik” on the GeneSpeed Search page or use the Unigene ID directly. Choose ‘Select all’ for output and observe the results. First, notice that there are four rows representing this gene. Each row represents a different domain that has similarity to a section of its protein (the domain column will also display the number 4 to indicate this). Next, observe that the first row represents a SCAN domain, which is classified as a C2H2 Zinc Finger transcription factor domain. Also notice that the second row indicates that this protein contains a specific C2H2 domain as well. Because the e-score values for these domains are reasonably significant (8e-53 and 3e-10 respectively), we can conclude with reasonable confidence that this is a C2H2 type transcription factor. In addition, notice that there is also weak similarity to collagen and transposase domains. These ‘other’ domains can often provide clues to other functions or binding potential the protein may have. Study these domains further by observing the domain names, sizes and original BLAST sequence. You may also use the other links provided (InterPro, Pfam, Ensembl, Gene Ontology, Locus Link) to gain further understanding of the details of these domains and thus of the Riken clone you are studying. It may be that one of the domains has a sister for which a GO function has been annotated. This allows you to compare that gene function with your initial predictions regarding the presumed biological process. The GeneSpeed Database has also incorporated the complete Novartis GeneAtlas v2 micro-array set, so you may study the expression levels of any of your genes among the 79 human and 61 mouse tissues.
GeneSpeed Background Menu: