Sequence Format
Sequences can be specified in two different formats:
FASTA format
- The FASTA format is a text-based format representing either nucleic acid sequences or peptide sequences.
- A sequence begins with a single description line, followed by one or more lines of sequence data.
example:
>sequence_1 example sequence 1
STAGKVIKCKAAVLWEVKKPFSIEDVEVAPPKAYEVRIKMVAVGICRTDDHVVSGNLVTP
LPVILGHEAAGIVESVGEGVTTVKPGDKVIPLFTPQCGKCRVCKNPESNYCLKNDLGNPR
GTLQDGTRRFTCRGKPIHHFLGTSTFSQY
- The description line starts with a '>' sign, followed immediately by a name. An optional comment may follow the name, separated by a space character.
example: >name comment
- Multiple sequences can be specified by concatenating single sequences.
example:
>sequence_1 example sequence 1
STAGKVIKCKAAVLWEVKKPFSIEDVEVAPPKAYEVRIKMVAVGICRTDDHVVSGNLVTP
LPVILGHEAAGIVESVGEGVTTVKPGDKVIPLFTPQCGKCRVCKNPESNYCLKNDLGNPR
GTLQDGTRRFTCRGKPIHHFLGTSTFSQY
>sequence_2 example sequence 2
LPVILGHEAAGIVESVGEGVTTVKPGDKVIPLFTPQCGKCRVCKNPESNYCLKNDLGNPR
>sequence_3
VESVGEGVTTVKPGDKVIPLFTPQCGKCRVCKNPESNYCLKNDLGNPRIHHFLGTSTF
EVAPPKAYEVRIKMVGVTTVKPGDKVIPLFTPQCGKCRVCKNPES
- Amino acids are indicated by the standard IUPAC one-letter codes.
- Nucleotides are indicated by the IUPAC ambiguity codes.
- The letter 'X' is used for a position where any amino acid is accepted.
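The FASTA rules above can be sketched as a small parser. This is only an illustration, not the parser used by any particular tool: the function name parse_fasta and the (name, comment, sequence) tuple layout are assumptions made for this example, as is the choice to skip blank lines.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (name, comment, sequence) tuples."""
    records = []
    name, comment, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if name is not None:
                records.append((name, comment, "".join(chunks)))
            # The name follows '>' immediately; the comment is everything
            # after the first space (empty if there is no space).
            name, _, comment = line[1:].partition(" ")
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before a description line")
        else:
            chunks.append(line)  # sequence data may span several lines
    if name is not None:
        records.append((name, comment, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = (
        ">sequence_1 example sequence 1\n"
        "STAGKVIKCKAAVLWEVKKPFSIEDVEVAPPKAYEVRIKMVAVGICRTDDHVVSGNLVTP\n"
        "GTLQDGTRRFTCRGKPIHHFLGTSTFSQY\n"
        ">sequence_3\n"
        "VESVGEGVTTVKPGDKVIPLFTPQCGKCRVCKNPESNYCLKNDLGNPRIHHFLGTSTF\n"
    )
    for name, comment, sequence in parse_fasta(example):
        print(name, repr(comment), len(sequence))
```

Concatenated records are separated only by the next '>' line, so the sketch needs no end-of-record marker.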
Raw sequence
- Only the sequence itself is allowed; no description line or other additional information.
example:
STAGKVIKCKAAVLWEVKKPFSIEDVEVAPPKAYEVRIKMVAVGICRTDDHVVSGNLVTP
LPVILGHEAAGIVESVGEGVTTVKPGDKVIPLFTPQCGKCRVCKNPESNYCLKNDLGNPR
GTLQDGTRRFTCRGKPIHHFLGTSTFSQY
- Only a single sequence can be specified.
- Amino acids are indicated by the standard IUPAC one-letter codes.
- Nucleotides are indicated by the IUPAC ambiguity codes.
- The letter 'X' is used for a position where any amino acid is accepted.
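A raw sequence can be checked along the following lines. This is a sketch under stated assumptions: the name read_raw_sequence is hypothetical, the protein alphabet is taken to be the 20 standard amino acid codes plus 'X', the nucleotide alphabet is taken to be the bases plus the IUPAC ambiguity codes, and lowercase input is assumed to be acceptable. The letters actually accepted by a given tool may differ.

```python
# Assumption: the 20 standard amino acid one-letter codes plus 'X' (any amino acid).
PROTEIN_CODES = set("ACDEFGHIKLMNPQRSTVWY" "X")
# Assumption: the bases A, C, G, T, U plus the IUPAC nucleotide ambiguity codes.
NUCLEOTIDE_CODES = set("ACGTU" "RYSWKMBDHVN")


def read_raw_sequence(text, alphabet=PROTEIN_CODES):
    """Return the single raw sequence in text, joined into one string."""
    if ">" in text:
        raise ValueError("raw input must not contain a description line")
    # Line breaks and other whitespace are dropped; case is normalised
    # to upper case (an assumption, see the lead-in).
    sequence = "".join(text.split()).upper()
    invalid = sorted(set(sequence) - alphabet)
    if invalid:
        raise ValueError("unexpected characters: " + "".join(invalid))
    return sequence


if __name__ == "__main__":
    raw = "STAGKVIKCKAAVLWEVKKPFSIEDVEVAPPKAYEVRIKMVAVGICRTDDHVVSGNLVTP\nGTLQDGTRRFTCRGKPIHHFLGTSTFSQY\n"
    print(len(read_raw_sequence(raw)))
```

Rejecting any '>' character is what enforces the single-sequence rule here, since a second sequence could only be introduced by a description line.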