BLAST main parameters

E value threshold:

The E value (Expectation value) reports the significance of each hit. It is the number of different hits with scores equivalent to or better than S (the defined alignment score) that are expected to occur by chance in a database search. The lower the E value, the more significant the score. BLAST uses the E value threshold during the similarity search: database sequences that match the query sequence at an E value lower than this threshold are reported in the BLAST output. Among these sequences, those with an E value lower than the best match E value x 10^5 are used for the family identification.
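The two-stage selection described above can be sketched in Python. This is only an illustration, not the server's actual code: the `Hit` type, the `select_hits` function and the default threshold of 10 are assumptions made for the example, and the factor is read as 10^5. Keeping the best match itself in the family set even when its E value is 0.0 is also an assumption.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical container for one BLAST hit."""
    subject_id: str
    evalue: float


def select_hits(
    hits: List[Hit],
    evalue_threshold: float = 10.0,   # assumed default, adjust as needed
    family_factor: float = 1e5,       # "best match E value x 10^5"
) -> Tuple[List[Hit], List[Hit]]:
    """Return (reported hits, hits kept for family identification)."""
    # Stage 1: keep hits whose E value is lower than the threshold,
    # most significant (lowest E value) first.
    reported = sorted(
        (h for h in hits if h.evalue < evalue_threshold),
        key=lambda h: h.evalue,
    )
    if not reported:
        return [], []

    # Stage 2: keep hits whose E value is lower than best E value x factor.
    best = reported[0].evalue
    cutoff = best * family_factor
    # Assumption: the best match is always kept, which also covers best == 0.0.
    family = [h for h in reported if h.evalue < cutoff or h.evalue == best]
    return reported, family


if __name__ == "__main__":
    example = [Hit("a", 1e-50), Hit("b", 1e-47), Hit("c", 1e-40),
               Hit("d", 0.5), Hit("e", 20.0)]
    reported, family = select_hits(example)
    print([h.subject_id for h in reported])
    print([h.subject_id for h in family])
```

In this example, hit "e" is dropped by the threshold, and only hits within five orders of magnitude of the best E value remain for the family step.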


Filter:

Mask off segments of the query sequence that have low compositional complexity, as determined by the SEG program of Wootton and Federhen (1993) or, for BLASTN, by the DUST program of Tatusov and Lipman (in preparation). Filtering can eliminate statistically significant but biologically uninteresting reports from the BLAST output (e.g., hits against common acidic-, basic- or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences. Filtering is applied only to the query sequence (or its translation products), not to database sequences. Default filtering is DUST for BLASTN and SEG for other programs. It is not unusual for SEG to mask nothing at all when applied to sequences in SWISS-PROT, so filtering should not be expected to always have an effect. Furthermore, in some cases sequences are masked in their entirety, which indicates that the statistical significance of any matches reported against the unfiltered query sequence should be regarded as suspect.
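To give a rough idea of what masking low-complexity segments means, here is a minimal Python sketch. It is not the SEG or DUST algorithm: it simply replaces residues with "X" wherever a sliding window has low Shannon entropy. The function names, the window length of 12 and the cutoff of 2.2 bits are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12,
                        threshold: float = 2.2, mask_char: str = "X") -> str:
    """Mask every residue that falls in at least one low-entropy window.

    Simplified stand-in for SEG-style filtering, for illustration only.
    """
    masked = [False] * len(seq)
    # Slide the window along the query; sequences shorter than the
    # window are returned unchanged because the range is empty.
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask_char if m else ch for ch, m in zip(seq, masked))


if __name__ == "__main__":
    query = "ACDEFGHIKLMN" + "P" * 12   # diverse start, proline-rich tail
    print(mask_low_complexity(query))
```

A query made only of prolines is masked in its entirety by this sketch, which mirrors the situation described above where any reported match against the unfiltered query should be regarded as suspect.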


Descriptions:

Restricts the number of short descriptions of matching sequences reported to the number specified; default limit is 100 descriptions. See also E value threshold.


Alignments:

Restricts the number of database sequences for which high-scoring segment pairs (HSPs) are reported to the number specified; the default limit is 100. If more database sequences than this satisfy the statistical significance threshold for reporting (see E value threshold above), only the matches with the greatest statistical significance are reported.
These sequences are used for the family identification.
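The effect of the Descriptions and Alignments limits can be sketched as follows. This is an illustration only: the `limit_report` function and the (identifier, E value) tuple representation are assumptions, and ranking by E value alone is a simplification of how significance is ordered.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (sequence identifier, E value), hypothetical layout


def limit_report(hits: List[Hit],
                 max_descriptions: int = 100,
                 max_alignments: int = 100) -> Tuple[List[Hit], List[Hit]]:
    """Keep only the most significant hits for each part of the report."""
    # Most significant first: the lowest E value ranks highest.
    ranked = sorted(hits, key=lambda hit: hit[1])
    descriptions = ranked[:max_descriptions]
    alignments = ranked[:max_alignments]
    return descriptions, alignments


if __name__ == "__main__":
    # 150 hypothetical hits that all pass the E value threshold.
    hits = [(f"s{i}", 10.0 ** -i) for i in range(150)]
    descriptions, alignments = limit_report(hits)
    print(len(descriptions), len(alignments))
    print(descriptions[0])
```

With 150 hits passing the threshold and both limits left at 100, the 50 least significant hits are dropped from the report.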


