Analyze EST data

Introduction

Biozon combines several sources of evidence to map EST sequences to their corresponding protein products: UniGene clusters, substring analysis, annotated protein-coding regions in existing DNA sequences, and protein database searches. Gene Ontology terms, SwissProt keywords, and protein similarity data are used to detect ESTs that are associated with specific functional descriptors.

Mapping

We say EST s directly maps to protein p if any of the following holds:
  1. s encodes p
  2. s is a substring of a DNA sequence s' and lies near a region of s' that encodes p
  3. s is in a UniGene cluster to which NCBI assigns p
  4. s is in a UniGene cluster with a sequence s', and s' encodes p
  5. s is in a UniGene cluster with a sequence s', and s' is a substring of a DNA sequence s'' near a region of s'' that encodes p
In modes 2 and 5, "near a region that encodes p" currently means that the shorter sequence (s in mode 2, s' in mode 5) appears as a substring of the longer one and is no more than 50 base pairs away from overlapping that coding region.
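The five modes and the 50 bp proximity rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Biozon's implementation: the record types (`DnaRecord`, `CodingRegion`), the cluster format (a set of member IDs paired with the proteins NCBI assigns), 0-based half-open coordinates, and measuring "away from overlapping" as the number of bases between the match and the coding region are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field

# Proximity limit for modes 2 and 5, taken from the text above.
MAX_GAP_BP = 50


@dataclass
class CodingRegion:
    """Hypothetical record: a region of a DNA sequence that encodes one protein."""
    protein: str  # protein ID
    start: int    # assumed 0-based, half-open [start, end)
    end: int


@dataclass
class DnaRecord:
    """Hypothetical record: a DNA sequence (EST, mRNA, ...) and its annotated coding regions."""
    seq_id: str
    sequence: str
    coding_regions: list = field(default_factory=list)


def is_near(short_seq, record, region, max_gap=MAX_GAP_BP):
    """True if short_seq occurs in record.sequence within max_gap bases of region.

    Assumption: distance = number of bases between the match and the region (0 if they overlap).
    """
    pos = record.sequence.find(short_seq)
    while pos != -1:
        gap = max(0, region.start - (pos + len(short_seq)), pos - region.end)
        if gap <= max_gap:
            return True
        pos = record.sequence.find(short_seq, pos + 1)  # try the next occurrence
    return False


def proteins_near(seq, seq_id, db):
    """Proteins p such that seq lies near a region encoding p in some other record of db."""
    return {
        region.protein
        for rec in db.values()
        if rec.seq_id != seq_id
        for region in rec.coding_regions
        if is_near(seq, rec, region)
    }


def direct_maps(est, db, clusters):
    """Return {protein_id: set of modes (1-5)} for one EST.

    db: {seq_id: DnaRecord}
    clusters: list of (member_id_set, proteins_assigned_by_ncbi), a hypothetical UniGene format
    """
    hits = {}

    def add(proteins, mode):
        for p in proteins:
            hits.setdefault(p, set()).add(mode)

    add({r.protein for r in est.coding_regions}, 1)        # 1: s encodes p
    add(proteins_near(est.sequence, est.seq_id, db), 2)    # 2: s near a region of s' encoding p
    for member_ids, ncbi_proteins in clusters:
        if est.seq_id not in member_ids:
            continue
        add(ncbi_proteins, 3)                              # 3: NCBI assigns p to the cluster
        for other_id in member_ids - {est.seq_id}:
            other = db.get(other_id)
            if other is None:
                continue
            add({r.protein for r in other.coding_regions}, 4)    # 4: s' encodes p
            add(proteins_near(other.sequence, other_id, db), 5)  # 5: s' near a region of s''
    return hits


# Illustrative data only.
mrna = DnaRecord("NM_X", "T" * 30 + "GATTACA" + "T" * 60, [CodingRegion("P1", 50, 90)])
est = DnaRecord("EST1", "GATTACA")
est3 = DnaRecord("EST3", "GGGGGG", [CodingRegion("P3", 0, 6)])
db = {r.seq_id: r for r in (mrna, est, est3)}
clusters = [({"EST1", "EST3"}, {"P9"})]
print(direct_maps(est, db, clusters))  # {'P1': {2}, 'P9': {3}, 'P3': {4}}
```

Recording the set of modes per protein, rather than a bare yes/no, keeps the evidence for each mapping available for display.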

Modes 4 and 5 supplement the information NCBI provides for UniGene clusters: in some cases a member of a UniGene cluster directly encodes a protein but is not annotated as such by NCBI.

Similarity data

We say an EST s maps to protein p if s directly maps to p, or if s directly maps to a protein p' that is similar to p (such indirect relations are clearly marked in the output). Similarity is taken from the Biozon similarity data, using an E-value threshold of 0.1.
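The similarity extension can be sketched as follows. This is a minimal illustration, assuming similarity hits are available as hypothetical `(p_prime, p, e_value)` tuples and that a hit counts when its E-value is at most 0.1. Each indirect protein records the directly mapped p' it came through, mirroring how such relations are marked in the output; picking the lowest-E-value p' when several qualify is our own assumption.

```python
E_VALUE_THRESHOLD = 0.1  # threshold stated in the text; "at most" is an assumption


def maps_to(direct, similarities, threshold=E_VALUE_THRESHOLD):
    """Extend an EST's direct mappings with similar proteins.

    direct: set of protein IDs the EST directly maps to.
    similarities: iterable of (p_prime, p, e_value) hits (hypothetical format).
    Returns {protein: via}, where via is None for a direct mapping and the
    directly mapped protein p' for an indirect one.
    """
    result = {p: None for p in direct}
    # Best hits first, so each indirect protein is attributed to its closest p'.
    for p_prime, p, e_value in sorted(similarities, key=lambda hit: hit[2]):
        if p_prime in direct and p not in result and e_value <= threshold:
            result[p] = p_prime
    return result


direct = {"P1"}
hits = [("P1", "P2", 1e-30), ("P1", "P3", 0.5), ("P4", "P5", 1e-50)]
print(maps_to(direct, hits))  # {'P1': None, 'P2': 'P1'}
```

Direct mappings are inserted first, so a protein that is both directly mapped and similar to another mapped protein is always reported as direct.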

Input file

To analyze your EST sequences, upload a list of GenBank or RefSeq identifiers, either accession numbers (ACs) or GI numbers, one per line (see example file). If you have a short list (up to 10 ESTs), you can also paste it into the text box.
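A short Python sketch can check an input list before submission. The patterns are a loose, assumed approximation (a GI is all digits; an accession is letters, an optional underscore as in RefSeq, at least five digits, and an optional version suffix), not NCBI's full accession grammar, and `parse_est_list` with its `pasted` flag is a hypothetical helper.

```python
import re

MAX_PASTED_ESTS = 10  # text-box limit stated above

# Loose patterns (an assumption, not NCBI's complete accession format).
GI_RE = re.compile(r"\d+")
ACCESSION_RE = re.compile(r"[A-Za-z]{1,6}_?\d{5,}(\.\d+)?")


def parse_est_list(text, pasted=False):
    """Split a one-identifier-per-line EST list into (kind, identifier) pairs.

    Blank lines are skipped; lines matching neither pattern raise ValueError.
    """
    ids = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        token = raw.strip()
        if not token:
            continue
        if GI_RE.fullmatch(token):
            ids.append(("gi", token))
        elif ACCESSION_RE.fullmatch(token):
            ids.append(("accession", token.upper()))
        else:
            raise ValueError(f"line {line_no}: not an accession or GI: {token!r}")
    if pasted and len(ids) > MAX_PASTED_ESTS:
        raise ValueError(f"paste at most {MAX_PASTED_ESTS} ESTs; upload a file instead")
    return ids


print(parse_est_list("AA123456\n\n 12345678 \nNM_000546.5\n"))
# [('accession', 'AA123456'), ('gi', '12345678'), ('accession', 'NM_000546.5')]
```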

Each EST is analyzed to find its protein products. The first results page displays a summary table with one entry per EST in the list. Each entry gives a high-level summary of all proteins that can be linked to that EST (such as their definitions and descriptors). For the detailed list of proteins and how each one is linked to the EST, click 'View more'. For more information on the output format, see here.

Target proteins

We also provide an option to check whether each EST can be mapped to a set of target proteins. The target proteins are characterized by a set of GO terms and keywords provided by the user. This function is not yet fully supported: the initial form currently lets you choose neuro-related proteins. If an EST can be mapped to a target protein, the result table shows 'yes' in the corresponding column (Is Target?).
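A minimal sketch of the Is Target? check, assuming a protein counts as a target when any of its GO terms or SwissProt keywords appears in the user's target set. Matching is by exact string; a real implementation might also follow the GO hierarchy, which this sketch does not. The annotation dictionary layout and the `is_target` helper are hypothetical, and the example GO IDs and keywords are illustrative only.

```python
def is_target(protein_annotations, mapped_proteins, target_go_terms, target_keywords):
    """True if any protein the EST maps to carries a target GO term or keyword.

    protein_annotations: {protein_id: (go_term_set, keyword_set)} (hypothetical format).
    Exact-match comparison only; GO ancestry is not considered in this sketch.
    """
    for p in mapped_proteins:
        go_terms, keywords = protein_annotations.get(p, (set(), set()))
        if go_terms & target_go_terms or keywords & target_keywords:
            return True
    return False


# Illustrative annotations only.
annotations = {
    "P1": ({"GO:0007268"}, {"Synapse"}),
    "P2": ({"GO:0006412"}, {"Ribosomal protein"}),
}
column = "yes" if is_target(annotations, {"P1", "P2"}, {"GO:0007268"}, {"Synapse"}) else "no"
print(column)  # yes
```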