Figure 1. Flowchart for GRouping and
Identification of Sequence Tags (GRIST). High-quality matches (HQM)
under our default conditions require at least 97% identity over a minimum
length of 50 bp for NCBI RefSeq/NR (non-redundant) database matches and
96% identity over a minimum 100 bp length for NCBI dbEST database
matches. BLAST matches against NR are filtered to ignore multigene
clones (such as bacterial artificial chromosomes [BACs]) and known
artifacts. NR matches are checked for GeneID and are grouped with
RefSeq matches for the same GeneID, which compensates for short or
incomplete RefSeqs. UniGene assignments are made independently by BLAST
against dbEST. UniGene assignments for the top eight HQM dbEST matches
for each clone are identified, and those that occur at frequencies of
at least 15% for the whole group are reported. This can help identify
UniGene problems, overlapping genes, and variant transcripts.
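
The default thresholds and the UniGene reporting rule described in this caption can be sketched as follows. This is a minimal illustration, not the GRIST implementation: the `Hit` record, the function names, and the database labels are hypothetical, hits are assumed to be sorted best-first for each clone, and the 15% frequency is assumed to be computed over all UniGene assignments collected from the top eight HQM dbEST matches of every clone in the group.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Hit:
    """Hypothetical record for one BLAST match (not a GRIST data structure)."""
    database: str                  # "refseq_nr" or "dbest" (labels assumed here)
    identity: float                # percent identity of the alignment
    length: int                    # aligned length in bp
    unigene: Optional[str] = None  # UniGene cluster of a dbEST hit, if known


# Default HQM thresholds from the caption: (minimum % identity, minimum length in bp).
THRESHOLDS = {"refseq_nr": (97.0, 50), "dbest": (96.0, 100)}


def is_hqm(hit: Hit) -> bool:
    """Return True if the hit meets the default high-quality match criteria."""
    min_identity, min_length = THRESHOLDS[hit.database]
    return hit.identity >= min_identity and hit.length >= min_length


def reported_unigenes(
    hits_by_clone: Dict[str, List[Hit]],
    top_n: int = 8,
    min_fraction: float = 0.15,
) -> Dict[str, float]:
    """Pool UniGene assignments from each clone's top HQM dbEST hits and
    keep those reaching the minimum frequency across the whole group."""
    counts: Counter = Counter()
    for hits in hits_by_clone.values():
        # Assumes each clone's hits are already ordered best-first.
        hqm = [h for h in hits if h.database == "dbest" and is_hqm(h)]
        for h in hqm[:top_n]:
            if h.unigene is not None:
                counts[h.unigene] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        cluster: n / total
        for cluster, n in counts.items()
        if n / total >= min_fraction
    }
```

Reporting every cluster above the cutoff, instead of only the most frequent one, is what allows a group with mixed assignments to surface UniGene problems, overlapping genes, or variant transcripts.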
