Multiple Sequence Alignments (SeqGroup)

This module provides the SeqGroup class, with methods to work with multiple sequence files, including multiple sequence alignments.

Currently, the Fasta, sequential Phylip and interleaved Phylip formats are supported.

class SeqGroup(sequences=None, format='fasta', fix_duplicates=True, **kwargs)

Class to store a set of sequences (aligned or not).

__init__(sequences=None, format='fasta', fix_duplicates=True, **kwargs)
Parameters:
  • sequences – Path to the file containing the sequences or, alternatively, the text string containing them.

  • format – Format of the input sequences. Supported formats are: fasta, phylip (sequential Phylip) and iphylip (interleaved Phylip). Standard Phylip limits sequence names to 10 characters. To keep longer names, use the relaxed Phylip variants: phylip_relaxed and iphylip_relaxed.
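The 10-character limit matters most when several long names share a prefix. The sketch below uses a hypothetical helper, format_phylip_name (not part of ete3), and assumes that strict names are truncated and space-padded to exactly 10 characters. Whether SeqGroup itself truncates, pads, or rejects long names is not stated here.

```python
# Hypothetical helper (not part of ete3): models the name column of a
# strict versus relaxed Phylip record, ASSUMING strict names are
# truncated and space-padded to exactly 10 characters.
def format_phylip_name(name: str, relaxed: bool = False) -> str:
    if relaxed:
        # Relaxed Phylip: keep the full name; one space separates it
        # from the sequence.
        return name + " "
    # Strict Phylip: a fixed-width, 10-character name field.
    return name[:10].ljust(10)


names = ["Homo_sapiens_chr1", "Homo_sapiens_chr2"]
strict = [format_phylip_name(n) for n in names]
relaxed = [format_phylip_name(n, relaxed=True) for n in names]
print(strict)   # ['Homo_sapie', 'Homo_sapie']  (the two names collide)
print(relaxed)  # ['Homo_sapiens_chr1 ', 'Homo_sapiens_chr2 ']
```

Under this assumption, two distinct taxa become indistinguishable in strict output, which is the practical reason to prefer the relaxed variants for long identifiers.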

Example:

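from ete3 import SeqGroup

seqs_str = ('>seq1\n'
            'AAAAAAAAAAA\n'
            '>seq2\n'
            'TTTTTTTTTTTTT\n')
# Parse the Fasta text directly from the string (no file needed).
seqs = SeqGroup(seqs_str, format='fasta')
print(seqs.get_seq('seq1'))  # AAAAAAAAAAA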
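The example above passes the sequences as a text string rather than a file path. One way a loader can accept either is sketched below. read_fasta_source is a hypothetical helper, not ete3's parser, and it assumes that any string naming an existing file is a path and anything else is literal Fasta text.

```python
import os
from typing import List, Tuple


# Hypothetical helper (not ete3's parser): decide whether `source` is a
# file path or literal Fasta text, then split it into (name, sequence) pairs.
def read_fasta_source(source: str) -> List[Tuple[str, str]]:
    if os.path.isfile(source):
        # `source` names an existing file: read its contents.
        with open(source) as handle:
            text = handle.read()
    else:
        # Otherwise treat `source` itself as Fasta-formatted text.
        text = source

    records: List[Tuple[str, str]] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # a sequence may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


seqs_str = ('>seq1\n'
            'AAAAAAAAAAA\n'
            '>seq2\n'
            'TTTTTTTTTTTTT\n')
print(read_fasta_source(seqs_str))
# [('seq1', 'AAAAAAAAAAA'), ('seq2', 'TTTTTTTTTTTTT')]
```

Checking for an existing file first is a simple heuristic. A string that happens to match a file name would be read as a path, so an explicit flag is the safer choice when inputs are untrusted.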

get_entries()

Return the list of entries currently stored, each as a (name, sequence, comments) tuple.

get_seq(name)

Return the sequence associated with the given entry name.

iter_entries()

Return an iterator over all sequences in the collection.

Each item is a tuple with the sequence name, sequence, and sequence comments.
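The three read accessors differ mainly in what they hand back. The small dict-backed model below makes that concrete. EntryStore is hypothetical, not ete3's implementation, and it assumes get_entries() returns the same (name, sequence, comments) tuples that iter_entries() yields, collected into a list instead of produced lazily.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# (name, sequence, comments): the entry shape described for iter_entries().
Entry = Tuple[str, str, List[str]]


class EntryStore:
    """Hypothetical, dict-backed model of the read accessors (not ete3's code)."""

    def __init__(self, entries: Iterable[Entry]) -> None:
        # Dicts preserve insertion order, so entries come back in load order.
        self._data: Dict[str, Tuple[str, List[str]]] = {
            name: (seq, comments) for name, seq, comments in entries
        }

    def iter_entries(self) -> Iterator[Entry]:
        # Lazy: yields one (name, sequence, comments) tuple at a time.
        for name, (seq, comments) in self._data.items():
            yield name, seq, comments

    def get_entries(self) -> List[Entry]:
        # Eager: the same tuples, collected into a list.
        return list(self.iter_entries())

    def get_seq(self, name: str) -> str:
        # Lookup by entry name; raises KeyError if the name is unknown.
        return self._data[name][0]


store = EntryStore([("seq1", "ACGT", []), ("seq2", "ACGTTA", ["sample B"])])
print(store.get_seq("seq2"))  # ACGTTA
for name, seq, comments in store.iter_entries():
    print(name, len(seq), comments)
# seq1 4 []
# seq2 6 ['sample B']
```

For large alignments, iterating lazily avoids building a second full list of tuples in memory.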

set_seq(name, seq, comments=None)

Add a new sequence, or update the entry that already has the given name.
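Because entries are keyed by name, one call covers both insertion and update. The sketch below models those semantics with a plain dict. The module-level set_seq function here is hypothetical and only stands in for the method. It shows that repeating a name replaces the entry in place, and why a None default for comments is preferable to a mutable list default.

```python
from typing import Dict, List, Optional, Tuple


# Hypothetical model of "add or update" (not ete3's code): entries are keyed
# by name, so setting an existing name replaces that entry instead of
# creating a second one.
def set_seq(store: Dict[str, Tuple[str, List[str]]], name: str, seq: str,
            comments: Optional[List[str]] = None) -> None:
    # A None default plus a fresh list avoids sharing one mutable
    # default list between calls.
    store[name] = (seq, comments if comments is not None else [])


store: Dict[str, Tuple[str, List[str]]] = {}
set_seq(store, "seq1", "AAAA")
set_seq(store, "seq2", "TTTT")
set_seq(store, "seq1", "GGGG", ["edited"])  # update: seq1 keeps its position
print(list(store.items()))
# [('seq1', ('GGGG', ['edited'])), ('seq2', ('TTTT', []))]
```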

write(format='fasta', outfile=None)

Return the text representation of the sequences in the requested format.

Parameters:
  • format – Output format. The format names accepted by the constructor are also valid here.

  • outfile – If given, the result is written to that file.
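The write() pattern (return text, or write to a file when outfile is given) can be sketched for Fasta and strict sequential Phylip as follows. write_sequences is hypothetical, not SeqGroup's writer. Its Phylip layout (a header with the sequence count and alignment length, then 10-character name fields) follows the common strict convention and is assumed rather than taken from ete3. Returning None after writing a file is likewise a design choice of this sketch.

```python
from typing import List, Optional, Tuple


# Hypothetical writer (not ete3's): renders (name, sequence) pairs as Fasta
# or strict sequential Phylip, then returns the text or writes it to a file.
def write_sequences(records: List[Tuple[str, str]], format: str = "fasta",
                    outfile: Optional[str] = None) -> Optional[str]:
    fmt = format.lower()
    if fmt == "fasta":
        text = "".join(f">{name}\n{seq}\n" for name, seq in records)
    elif fmt == "phylip":
        lengths = {len(seq) for _, seq in records}
        if len(lengths) != 1:
            # Phylip describes an alignment: all sequences must be equally long.
            raise ValueError("Phylip output requires sequences of equal length")
        # Header line: number of sequences, then alignment length.
        header = f"{len(records)} {lengths.pop()}\n"
        # Strict Phylip (assumed): names cut or padded to a 10-character field.
        text = header + "".join(f"{name[:10]:<10}{seq}\n" for name, seq in records)
    else:
        raise ValueError(f"unsupported format: {format!r}")

    if outfile is None:
        return text  # no file given: hand the text back to the caller
    with open(outfile, "w") as handle:
        handle.write(text)
    return None


aln = [("seq1", "ACGT-A"), ("seq2", "ACGTTA")]
print(write_sequences(aln, format="phylip"), end="")
# 2 6
# seq1      ACGT-A
# seq2      ACGTTA
```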