Taxonomy databases
NCBITaxa
- class NCBITaxa(dbfile=None, taxdump_file=None, memory=False, update=True)
A local transparent connector to the NCBI taxonomy database.
- __init__(dbfile=None, taxdump_file=None, memory=False, update=True)
Open and keep a connection to the NCBI taxonomy database.
If the database is not present on the system, it is first downloaded from the NCBI site and converted to ETE's format.
- annotate_tree(t, taxid_attr='name', tax2name=None, tax2track=None, tax2rank=None, ignore_unclassified=False)
Annotate a tree containing taxids as leaf names.
The annotation adds the properties: ‘taxid’, ‘sci_name’, ‘lineage’, ‘named_lineage’ and ‘rank’.
- Parameters:
t – a Tree (or Tree derived) instance.
taxid_attr (name) – Name of the node attribute that holds the taxid number associated with each node (e.g. species in PhyloTree instances).
tax2name, tax2track, tax2rank – Pre-calculated dictionaries that translate taxid numbers to names, lineage tracks and ranks, respectively.
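When several trees share the same taxids, the translation dictionaries can be computed once and passed to every annotate_tree call. The following is a minimal sketch, not the library's own code: the helper name build_annotation_caches is hypothetical, and it assumes that the dictionaries have the shapes returned by get_lineage_translator (taxid to lineage track) and get_taxid_translator (taxid to scientific name).

```python
def build_annotation_caches(ncbi, taxids):
    """Precompute tax2name and tax2track for a collection of taxids.

    `ncbi` is any object exposing get_lineage_translator() and
    get_taxid_translator() as documented here (assumed: an NCBITaxa instance).
    """
    # taxid -> hierarchically sorted list of parent taxids
    tax2track = ncbi.get_lineage_translator(list(taxids))

    # Gather every taxid appearing in any lineage, so that ancestor
    # names are available as well.
    all_taxids = set()
    for track in tax2track.values():
        all_taxids.update(track)

    # taxid -> scientific name
    tax2name = ncbi.get_taxid_translator(sorted(all_taxids))
    return tax2name, tax2track


# Hypothetical usage with an open connector `ncbi` and a list `trees`:
# tax2name, tax2track = build_annotation_caches(ncbi, [9606, 9598])
# for tree in trees:
#     ncbi.annotate_tree(tree, tax2name=tax2name, tax2track=tax2track)
```

Passing the caches explicitly should avoid repeating the same database lookups for each tree, assuming the taxids really are shared.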
- get_broken_branches(t, taxa_lineages, n2content=None)
Return a list of NCBI lineage names that are not monophyletic in the provided tree, together with the list of affected branches and their sizes.
This method is currently experimental.
- get_descendant_taxa(parent, intermediate_nodes=False, rank_limit=None, collapse_subspecies=False, return_tree=False)
Return list of descendant taxids of the given parent.
Parent can be given as taxid or scientific species name.
If intermediate_nodes=True, the list will also have the internal nodes.
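The effect of intermediate_nodes can be illustrated on a toy hierarchy. The sketch below is not the library's implementation: it works on a hypothetical parent-to-children dictionary with made-up integer ids, assumes that the default call reports leaf taxa only, and leaves the parent itself out of the result (the description above does not say whether the real method includes it).

```python
def descendant_taxa(children, parent, intermediate_nodes=False):
    """Collect descendants of `parent` in a toy {taxid: [child taxids]} map."""
    found = []
    # Depth-first traversal; reversing keeps children in their listed order.
    stack = list(reversed(children.get(parent, [])))
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        # Leaves are always reported; internal nodes only on request.
        if not kids or intermediate_nodes:
            found.append(node)
        stack.extend(reversed(kids))
    return found


# Made-up ids: 1 is the root, 2 and 3 are internal nodes, 4-6 are leaves.
toy = {1: [2, 3], 2: [4, 5], 3: [6]}
leaves_only = descendant_taxa(toy, 1)
with_internal = descendant_taxa(toy, 1, intermediate_nodes=True)
```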
- get_fuzzy_name_translation(name, sim=0.9)
Return taxid, species name and match score from the NCBI database.
The results are for the best match for name in the NCBI database of taxa names, with a word similarity >= sim.
- Parameters:
name – Species name (does not need to be exact).
sim (0.9) – Minimum word similarity required to report a match (from 0 to 1).
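A common pattern is to try an exact lookup first and fall back to fuzzy matching only when it fails. The sketch below is one hedged way to combine get_name_translator and get_fuzzy_name_translation. The helper resolve_name is hypothetical, and two behaviours are assumptions rather than facts stated above: that the exact translator maps each name to a taxid or a list of taxids, and that a failed fuzzy search is signalled by a taxid of None.

```python
def resolve_name(ncbi, name, sim=0.9):
    """Return (taxid, matched_name, score) for `name`, or None if no match."""
    exact = ncbi.get_name_translator([name])
    if name in exact:
        taxids = exact[name]
        # Assumption: the value may be a single taxid or a list of taxids.
        taxid = taxids[0] if isinstance(taxids, (list, tuple)) else taxids
        return taxid, name, 1.0

    taxid, matched, score = ncbi.get_fuzzy_name_translation(name, sim=sim)
    # Assumption: a missing match is reported as a taxid of None.
    if taxid is None:
        return None
    return taxid, matched, score


# Hypothetical usage with an open connector `ncbi`:
# print(resolve_name(ncbi, "Homo sapien", sim=0.8))
```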
- get_lineage(taxid)
Return lineage track corresponding to the given taxid.
The lineage track is a hierarchically sorted list of parent taxids.
- get_lineage_translator(taxids)
Return dict with lineage tracks corresponding to the given taxids.
Each lineage track is a hierarchically sorted list of parent taxids.
- get_name_translator(names)
Return dict with taxids corresponding to the given scientific names.
Exact name match is required for translation.
- get_taxid_translator(taxids, try_synonyms=True)
Return dict with the scientific names corresponding to the taxids.
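get_lineage and get_taxid_translator are often used together to turn a lineage track into readable names. A minimal sketch, assuming `ncbi` is an open connector; the helper named_lineage is hypothetical and simply chains the two documented calls.

```python
def named_lineage(ncbi, taxid):
    """Return the scientific names along the lineage of `taxid`, in lineage order."""
    lineage = ncbi.get_lineage(taxid)           # sorted list of parent taxids
    names = ncbi.get_taxid_translator(lineage)  # dict: taxid -> scientific name
    # The dict carries no ordering guarantee, so iterate over the lineage itself.
    # Fall back to the taxid as text if a name is missing.
    return [names.get(t, str(t)) for t in lineage]


# Hypothetical usage with an open connector `ncbi`:
# print(" > ".join(named_lineage(ncbi, 9606)))
```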
- get_topology(taxids, intermediate_nodes=False, rank_limit=None, collapse_subspecies=False, annotate=True)
Return the minimal pruned NCBI taxonomy tree containing taxids.
- Parameters:
intermediate_nodes (False) – If True, single child nodes representing the complete lineage of leaf nodes are kept. Otherwise, the tree is pruned to contain the first common ancestor of each group.
rank_limit (None) – If a valid NCBI rank name is provided, the tree is pruned at that level. For instance, use rank_limit="species" to get rid of sub-species or strain leaf nodes.
collapse_subspecies (False) – If True, any item under the species rank will be collapsed into the species upper node.
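The "first common ancestor" mentioned for intermediate_nodes can be read directly from lineage tracks: it is the last taxid shared by all tracks when they are compared level by level. The sketch below is a hypothetical helper, not part of the library, and assumes that each track is ordered from the root down to the taxid itself. The ids are made up.

```python
def first_common_ancestor(lineages):
    """Return the deepest taxid shared by all root-first lineage tracks, or None."""
    lineages = list(lineages)
    if not lineages:
        return None
    common = None
    # zip() stops at the shortest track; walk the levels until tracks diverge.
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break
        common = level[0]
    return common


# Made-up lineage tracks keyed by taxid.
tracks = {40: [1, 10, 20, 40], 50: [1, 10, 20, 50], 30: [1, 10, 30]}
ancestor_all = first_common_ancestor(tracks.values())
ancestor_pair = first_common_ancestor([tracks[40], tracks[50]])

# With an open connector `ncbi`, the tracks could come from:
# tracks = ncbi.get_lineage_translator([9606, 9598])
```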
GTDBTaxa
- class GTDBTaxa(dbfile=None, taxdump_file=None, memory=False)
Local transparent connector to the GTDB taxonomy database.
- annotate_tree(t, taxid_attr='name', tax2name=None, tax2track=None, tax2rank=None, ignore_unclassified=False)
Annotate a tree containing taxids as leaf names.
It annotates by adding the properties ‘taxid’, ‘sci_name’, ‘lineage’, ‘named_lineage’ and ‘rank’.
- Parameters:
t – Tree to annotate.
taxid_attr – Node attribute (property) containing the taxid associated with each node (e.g. species in PhyloTree instances).
tax2name, tax2track, tax2rank – Pre-calculated dictionaries with translations from taxids to names, lineage tracks and ranks.
- get_broken_branches(t, taxa_lineages, n2content=None)
Return a list of GTDB lineage names that are not monophyletic in the provided tree, together with the list of affected branches and their sizes.
This method is currently experimental.
- get_descendant_taxa(parent, intermediate_nodes=False, rank_limit=None, collapse_subspecies=False, return_tree=False)
Given a parent taxid or scientific species name, return a list of all its descendant taxids. If intermediate_nodes is set to True, internal nodes are also included.
- get_name_lineage(taxnames)
Given a valid taxname, return its corresponding lineage track as a hierarchically sorted list of parent taxnames.
- get_rank(taxids)
Given a list of GTDB string taxids, return a dictionary with their corresponding ranks. Example:
>>> gtdb.get_rank(['c__Thorarchaeia', 'RS_GCF_001477695.1'])
{'c__Thorarchaeia': 'class', 'RS_GCF_001477695.1': 'subspecies'}
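The dictionary returned by get_rank can be inverted to group taxids by rank, which is handy when a list mixes genome accessions with higher-level taxa. A minimal sketch, assuming `gtdb` is an open GTDBTaxa connector; the helper group_by_rank is hypothetical.

```python
from collections import defaultdict


def group_by_rank(gtdb, taxids):
    """Group GTDB taxids by the rank reported by get_rank()."""
    groups = defaultdict(list)
    for taxid, rank in gtdb.get_rank(list(taxids)).items():
        groups[rank].append(taxid)
    # Return a plain dict so missing ranks raise KeyError instead of
    # silently creating empty lists.
    return dict(groups)


# Hypothetical usage with an open connector `gtdb`:
# group_by_rank(gtdb, ['c__Thorarchaeia', 'RS_GCF_001477695.1'])
```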
- get_topology(taxnames, intermediate_nodes=False, rank_limit=None, collapse_subspecies=False, annotate=True)
Return the minimal pruned GTDB taxonomy tree containing all the given taxids.
- Parameters:
intermediate_nodes – If True, single child nodes representing the complete lineage of leaf nodes are kept. Otherwise, the tree is pruned to contain the first common ancestor of each group.
rank_limit – If a valid rank name is provided, the tree is pruned at that level. For instance, use rank_limit="species" to get rid of sub-species or strain leaf nodes.
collapse_subspecies – If True, any item under the species rank will be collapsed into the species upper node.