Tree (main class)¶
- class Tree(data=None, children=None, parser=None)¶
The Tree class is used to store a tree structure.
A tree consists of a collection of Tree instances connected in a hierarchical way. Trees can be loaded from the New Hampshire Newick format (newick).
- __init__()¶
- Parameters:
data – A string or file object with the description of the tree as a newick, or a dict with the contents of a single node.
children – List of nodes to add as children of this one.
parser – A description of how to parse a newick to create a tree. It can be a single number specifying the format or a structure with a fine-grained description of how to interpret nodes (see newick.pyx).
Examples:
    t1 = Tree()  # empty tree
    t2 = Tree({'name': 'A'})
    t3 = Tree('(A:1,(B:1,(C:1,D:1):0.5):0.5);')
    t4 = Tree(open('/home/user/my-tree.nw'))
- add_child(self, child=None, name=None, dist=None, support=None)¶
Add a new child to this node and return it.
If a child node is not supplied, a new node instance will be created.
- Parameters:
child – Node to be added as a child.
name – Name that will be given to the child.
dist – Distance from the node to the child.
support – Support value of child partition.
- add_children(self, nodes)¶
Set the given nodes as children of this node.
- add_face(self, face, column=0, position='branch_right', collapsed_only=False)¶
- add_face_smartview(self, face, column, position='branch_right', collapsed_only=False)¶
Add a fixed face to the node.
Faces of this type are always attached to nodes, independently of the layout function.
- Parameters:
face – Face to add.
column – Column number where the face will go. Starts at 0.
position – Position to place the face in the node. Possible values are: “branch_right”, “branch_top”, “branch_bottom”, “aligned”.
- add_face_treeview(self, face, column, position='branch-right')¶
Add a fixed face to the node.
Faces of this type are always attached to nodes, independently of the layout function.
- Parameters:
face – Face to add.
column – Column number where the face will go (an integer, starting at 0).
position – Position to place the face in the node. Possible values are: “branch-right”, “branch-top”, “branch-bottom”, “float”, “float-behind”, “aligned”.
- add_feature(self, pr_name, pr_value)¶
Add or update a node’s feature.
- add_features(self, **features)¶
Add or update several features.
- add_prop(self, name, value)¶
Add or update node’s property to the given value.
- add_props(self, **props)¶
Add or update several properties.
- add_sister(self, sister=None, name=None, dist=None)¶
Add a sister to this node and return it.
If sister node is not supplied, a new Tree instance will be created.
- ancestors(self, root=None, include_root=True)¶
Yield all ancestor nodes of this node (up to the root if given).
- check_monophyly(self, values, prop='name', unrooted=False)¶
Return tuple (is_monophyletic, clade_type, leaves_extra).
- Parameters:
values – List of values of the selected nodes.
prop – Node property being used to check monophyly (e.g. ‘species’ for species trees, ‘name’ for gene family trees, or any custom property present in the tree).
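The idea behind the is_monophyletic and leaves_extra parts of the result can be sketched in plain Python, without ETE. This is only an illustration under stated assumptions: the tree is encoded as a nested tuple of leaf names, `leaf_sets` and `monophyly` are hypothetical helper names, and the clade_type classification is left out.

```python
def leaf_sets(tree):
    """Return the set of leaf names under every node of a nested-tuple tree."""
    sets = []

    def collect(node):
        if isinstance(node, str):          # a leaf is just its name
            leaves = frozenset([node])
        else:                              # an internal node is a tuple of children
            leaves = frozenset().union(*(collect(child) for child in node))
        sets.append(leaves)
        return leaves

    collect(tree)
    return sets


def monophyly(tree, values):
    """Return (is_monophyletic, leaves_extra) for the given leaf names."""
    wanted = frozenset(values)
    # Smallest clade that contains every requested leaf.
    smallest = min((s for s in leaf_sets(tree) if wanted <= s), key=len)
    extra = smallest - wanted              # leaves that break monophyly
    return (not extra, extra)


tree = ((("A", "B"), "C"), ("D", "E"))
print(monophyly(tree, ["A", "B"]))   # A and B form a clade on their own
print(monophyly(tree, ["A", "C"]))   # B sits inside the smallest clade with A and C
```

If any requested value is missing from the tree, this sketch raises a ValueError (there is no clade containing all of them).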
- children¶
- collapsed_faces¶
- common_ancestor(self, nodes)¶
Return the last node common to the lineages of the given nodes.
All the nodes should have self as an ancestor, or an error is raised.
- compare(self, ref_tree, use_collateral=False, min_support_source=0.0, min_support_ref=0.0, has_duplications=False, expand_polytomies=False, unrooted=False, max_treeko_splits_to_be_artifact=1000, ref_tree_attr='name', source_tree_attr='name')¶
Compare this tree with another using the Robinson-Foulds symmetric difference and the number of shared edges. Trees of different sizes and with duplicated items are allowed.
Returns: a Python dictionary with the results.
- cophenetic_matrix(self)¶
Return a cophenetic distance matrix of the tree.
The cophenetic matrix is a matrix representation of the distance between each node.
If we have a tree like:
           ╭╴A
       ╭╴y╶┤
       │   ╰╴B
    ╴z╶┤
       │   ╭╴C
       ╰╴x╶┤
           │   ╭╴D
           ╰╴w╶┤
               ╰╴E
where w, x, y, z are internal nodes, then:
d(A,B) = d(y,A) + d(y,B)
and:
d(A,E) = d(z,A) + d(z,E) = (d(z,y) + d(y,A)) + (d(z,x) + d(x,w) + d(w,E))
To compute it, we use an idea from https://gist.github.com/jhcepas/279f9009f46bf675e3a890c19191158b
First, for each node we find its path to the root. For example:
    A -> A, y, z
    E -> E, w, x, z
and turn these paths into unordered sets. Then we XOR the two sets to keep only the elements that are in one set or the other, but not in both. In this case A, E, y, x, w.
The distance between the two nodes is the sum of the distances from each of those nodes to its parent.
One more optimization: since the distances are symmetric and the distance from a node to itself is zero, we use itertools.combinations rather than itertools.permutations. This cuts the number of distance computations from n(n-1) ordered pairs to n(n-1)/2 unordered pairs (still O(n^2), which is not great, but in practice it speeds things up for large trees).
For this tree, we will return the two dimensional array:
         A                  B                  C                  D                  E
    A    0                  d(A,y) + d(B,y)    d(A,z) + d(C,z)    d(A,z) + d(D,z)    d(A,z) + d(E,z)
    B    d(B,y) + d(A,y)    0                  d(B,z) + d(C,z)    d(B,z) + d(D,z)    d(B,z) + d(E,z)
    C    d(C,z) + d(A,z)    d(C,z) + d(B,z)    0                  d(C,x) + d(D,x)    d(C,x) + d(E,x)
    D    d(D,z) + d(A,z)    d(D,z) + d(B,z)    d(D,x) + d(C,x)    0                  d(D,w) + d(E,w)
    E    d(E,z) + d(A,z)    d(E,z) + d(B,z)    d(E,x) + d(C,x)    d(E,w) + d(D,w)    0
We will also return the one dimensional array with the leaves in the order in which they appear in the matrix (i.e. the column and/or row headers).
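The XOR-of-root-paths idea described above can be sketched in plain Python. This is an illustration, not ETE's implementation: the `PARENT` dictionary, the branch lengths in it, and the function names are assumptions made up for the example tree with internal nodes w, x, y, z.

```python
from itertools import combinations

# Hypothetical encoding: child -> (parent, branch length to that parent).
# The root "z" has no entry because it has no parent.
PARENT = {
    "y": ("z", 0.5), "x": ("z", 0.5), "w": ("x", 0.5),
    "A": ("y", 1.0), "B": ("y", 1.0), "C": ("x", 1.0),
    "D": ("w", 1.0), "E": ("w", 1.0),
}


def path_to_root(node):
    """Return the set of nodes from `node` up to (and including) the root."""
    path = {node}
    while node in PARENT:
        node = PARENT[node][0]
        path.add(node)
    return path


def cophenetic(leaves):
    """Return (matrix, leaves), where matrix[i][j] is the leaf-to-leaf distance."""
    paths = {leaf: path_to_root(leaf) for leaf in leaves}
    index = {leaf: i for i, leaf in enumerate(leaves)}
    matrix = [[0.0] * len(leaves) for _ in leaves]
    for a, b in combinations(leaves, 2):       # each unordered pair once
        only_one = paths[a] ^ paths[b]         # shared ancestors cancel out
        d = sum(PARENT[n][1] for n in only_one)
        matrix[index[a]][index[b]] = d         # fill both halves: symmetric
        matrix[index[b]][index[a]] = d
    return matrix, leaves


matrix, names = cophenetic(["A", "B", "C", "D", "E"])
for name, row in zip(names, matrix):
    print(name, row)
```

The root is on every path, so it never survives the XOR and the lookup in `PARENT` is always valid.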
- copy(self, method='cpickle')¶
Return a copy of the current node.
- Parameters:
method – Protocol used to copy the node structure.
The following values are accepted for the method:
- “newick”: Tree topology, node names, branch lengths and branch support values will be copied as represented in the newick string (copy by newick string serialisation).
- “newick-extended”: Tree topology and all node properties will be copied based on the extended newick format representation. Only node properties will be copied, thus excluding other node attributes. As this method is also based on newick serialisation, properties will be converted into text strings when making the copy.
- “cpickle”: The whole node structure and its content is cloned based on cPickle object serialisation (slower, but recommended for full tree copying).
- “deepcopy”: The whole node structure and its content is copied based on the standard “copy” Python functionality (this is the slowest method, but it allows copying complex objects even if attributes point to lambda functions, etc.).
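The difference between the last two methods can be illustrated with the standard library alone. The sketch below does not use ETE: the nested dictionary is a hypothetical stand-in for a node structure, and it only shows why serialisation-based copying fails on lambda functions while the standard copy module handles them.

```python
import copy
import pickle

# Hypothetical stand-in for a small tree: plain nested dictionaries.
tree = {
    "name": "root",
    "props": {},
    "children": [{"name": "A", "props": {}, "children": []},
                 {"name": "B", "props": {}, "children": []}],
}

# Pickle round trip: an independent clone of the whole structure.
clone = pickle.loads(pickle.dumps(tree))
clone["children"][0]["name"] = "renamed"   # does not touch the original

# A lambda stored in a property cannot be pickled...
tree["props"]["scorer"] = lambda x: x * 2
try:
    pickle.dumps(tree)
    pickled_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    pickled_ok = False

# ...but deepcopy accepts it (functions are copied by reference).
deep = copy.deepcopy(tree)

print(tree["children"][0]["name"], pickled_ok, deep["props"]["scorer"](3))
```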
- del_feature(self, pr_name)¶
Permanently delete a node’s feature.
- del_prop(self, prop_name)¶
Permanently delete a node’s property.
- delete(self, prevent_nondicotomic=True, preserve_branch_length=False)¶
Delete node from the tree structure, keeping its children.
The children from the deleted node are transferred to the old parent.
- Parameters:
prevent_nondicotomic – If True (default), it will also delete parent nodes until no single-child nodes occur.
preserve_branch_length – If True, branch lengths of the deleted nodes are transferred (summed up) to the parent’s branch, thus keeping original distances among nodes.
Example:
    t = Tree('(C,(B,A)H)root;')
    print(t.to_str(props=['name']))
    #       ╭╴C
    # ╴root╶┤
    #       │   ╭╴B
    #       ╰╴H╶┤
    #           ╰╴A

    t['H'].delete()  # delete the "H" node
    print(t.to_str(props=['name']))
    #       ╭╴C
    #       │
    # ╴root╶┼╴B
    #       │
    #       ╰╴A
- descendants(self, strategy='levelorder', is_leaf_fn=None)¶
Yield all descendant nodes.
- describe(self)¶
Return a string with information on this node and its connections.
- detach(self)¶
Detach this node (and descendants) from its parent and return itself.
The detached node conserves all its structure of descendants, and can be attached to another node with the add_child() function. This mechanism can be seen as a “cut and paste”.
- dist¶
- edges(self, cached_content=None)¶
Yield a pair of sets of leaves for every partition of the tree.
For every node, there are leaves that lie on one side of that node, and leaves that lie on the other. This generator yields all those pairs of sets.
- Parameters:
cached_content – Dictionary that to each node associates the leaves that descend from it. If passed, it won’t be recomputed.
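How a node-to-leaves cache gives the two sides of each partition can be sketched as follows. This is a plain-Python illustration, not ETE's code: the `CHILDREN` dictionary and the function names are hypothetical, and whether ETE also yields the trivial partition at the root is not something this sketch claims.

```python
# Hypothetical encoding of the tree '((B,A)H,C)root;' as node -> children.
CHILDREN = {"root": ["H", "C"], "H": ["B", "A"]}


def cache_leaves(node, cache):
    """Fill cache[node] with the set of leaf names under node."""
    kids = CHILDREN.get(node, [])
    if not kids:
        cache[node] = {node}               # a leaf contains only itself
    else:
        cache[node] = set()
        for kid in kids:
            cache_leaves(kid, cache)
            cache[node] |= cache[kid]      # merge what each child contains
    return cache


def edge_partitions(root):
    """Yield (leaves on one side, leaves on the other) for every node."""
    cache = cache_leaves(root, {})         # computed once, reused for all nodes
    for below in cache.values():
        yield below, cache[root] - below


pairs = list(edge_partitions("root"))
for one_side, other_side in pairs:
    print(sorted(one_side), sorted(other_side))
```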
- expand_polytomies(self, map_prop='name', polytomy_size_limit=5, skip_large_polytomies=False)¶
Return all combinations of solutions of the multifurcated nodes.
If the tree has one or more polytomies, this function returns the list of all trees (in newick format) resulting from the combination of all possible solutions of the multifurcated nodes.
Warning
Please note that the number of possible binary trees grows exponentially with the number and size of polytomies. Using this function with large multifurcations is not feasible:
    polytomy size: 3    number of binary trees: 3
    polytomy size: 4    number of binary trees: 15
    polytomy size: 5    number of binary trees: 105
    polytomy size: 6    number of binary trees: 945
    polytomy size: 7    number of binary trees: 10395
    polytomy size: 8    number of binary trees: 135135
    polytomy size: 9    number of binary trees: 2027025
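The counts in the warning follow the double factorial (2n - 3)!!, which is the number of rooted binary trees on n labelled leaves. A short sketch to reproduce them (the function name is made up for this example):

```python
def n_binary_trees(polytomy_size):
    """Number of rooted binary resolutions of a polytomy: (2n - 3)!!."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for odd in range(3, 2 * polytomy_size - 2, 2):
        count *= odd
    return count


for size in range(3, 10):
    print(f"polytomy size: {size}  number of binary trees: {n_binary_trees(size)}")
```

When a tree has several polytomies, the total number of combinations is the product of the counts for each one.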
- explore(self, name=None, layouts=None, show_leaf_name=True, show_branch_length=True, show_branch_support=True, include_props=('name', 'dist'), exclude_props=None, host='localhost', port=None, quiet=True, compress=False, keep_server=False, open_browser=True)¶
Launch an interactive smartview session to visualize the tree.
- Parameters:
name (str) – Name used to store and refer to the tree.
layouts (list) – Layouts that will be available from the front end. It is important to name functions (__name__), as they will be addressed by that name in the explorer.
include_props (list) – Properties to show in the nodes popup. If None, show all.
exclude_props (list) – Properties to exclude from the nodes popup.
port – Server listening port. If None, use next available port >= 5000.
- faces¶
- static from_parent_child_table(parent_child_table)¶
Convert a parent-child table into an ETE Tree instance.
- Parameters:
parent_child_table – List of tuples containing parent-child relationships. For example: [(‘A’, ‘B’, 0.1), (‘A’, ‘C’, 0.2), (‘C’, ‘D’, 1), (‘C’, ‘E’, 1.5)], where each tuple represents: [parent, child, child-parent-dist]
Example:
    t = Tree.from_parent_child_table([('A', 'B', 0.1),
                                      ('A', 'C', 0.2),
                                      ('C', 'D', 1),
                                      ('C', 'E', 1.5)])
    print(t.to_str(props=['name', 'dist']))
    #      ╭╴B,0.1
    # ╴A,⊗╶┤
    #      │       ╭╴D,1.0
    #      ╰╴C,0.2╶┤
    #              ╰╴E,1.5
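To make the table format concrete, here is a plain-Python sketch that turns such a table into a newick string. It is not how ETE builds the tree; `table_to_newick` is a hypothetical helper that assumes the table describes a single tree with exactly one root.

```python
def table_to_newick(parent_child_table):
    """Convert (parent, child, dist) tuples into a newick string."""
    children = {}   # parent -> list of its children, in table order
    dist = {}       # child -> distance to its parent
    for parent, child, d in parent_child_table:
        children.setdefault(parent, []).append(child)
        dist[child] = d

    # The root is the only parent that never appears as a child.
    roots = [p for p in children if p not in dist]
    if len(roots) != 1:
        raise ValueError("expected exactly one root, found %d" % len(roots))

    def fmt(node):
        label = f"{node}:{dist[node]}" if node in dist else str(node)
        kids = children.get(node)
        if not kids:
            return label
        return "(" + ",".join(fmt(kid) for kid in kids) + ")" + label

    return fmt(roots[0]) + ";"


table = [('A', 'B', 0.1), ('A', 'C', 0.2), ('C', 'D', 1), ('C', 'E', 1.5)]
print(table_to_newick(table))
```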
- static from_skbio(skbio_tree, map_attributes=None)¶
Convert a scikit-bio TreeNode object into an ETE Tree object.
- Parameters:
skbio_tree – A scikit-bio TreeNode instance.
map_attributes – List of attribute names in the scikit-bio tree that should be mapped into the ETE tree instance. (name, id and branch length are always mapped)
Example:
    t = Tree.from_skbio(skbioTree, map_attributes=['value'])
- get_cached_content(self, prop=None, container_type=set, leaves_only=True)¶
Return dict that assigns to each node a set of its leaves.
The dictionary serves as a cache for operations that require many traversals of the nodes under this tree.
Instead of assigning a set, it can assign a list (with container_type). And instead of the leaves themselves, it can be any of their properties (like their names, with prop).
- Parameters:
prop – Node property that should be cached (e.g. name, distance, etc.). If None, it caches the node itself.
container_type – Type of container for the leaves (set, list).
leaves_only – If False, for each node it stores all its descendant nodes, not only its leaves.
- get_children(self)¶
Return an independent list of the node’s children.
- get_closest_leaf(self, topological=False, is_leaf_fn=None)¶
Return the node’s closest descendant leaf, and its distance.
- Parameters:
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lengths).
- get_distance(self, node1, node2, topological=False)¶
Return the distance between the given nodes.
- Parameters:
node1 – A node within the same tree structure.
node2 – Another node within the same tree structure.
topological – If True, distance will refer to the number of nodes between node1 and node2.
- get_farthest_leaf(self, topological=False, is_leaf_fn=None)¶
Return the node’s farthest descendant (a leaf), and its distance.
- Parameters:
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lengths).
- get_farthest_node(self, topological=False)¶
Return the farthest descendant or ancestor node, and its distance.
- Parameters:
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lengths).
- get_midpoint_outgroup(self, topological=False)¶
Return the node dividing into two distance-balanced partitions.
- Parameters:
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lengths).
- get_monophyletic(self, values, prop='name')¶
Yield nodes matching the provided monophyly criteria.
For a node to be considered a match, all prop values within the node, and exclusively them, should be grouped.
- Parameters:
values – List of values of the selected nodes.
prop – Property being used to check monophyly (for example ‘species’ for species trees, ‘name’ for gene family trees).
- get_prop(self, prop, default=None)¶
Return the node’s property prop (an attribute or in self.props).
- get_sisters(self)¶
Return an independent list of sister nodes.
- get_topology_id(self, prop='name')¶
Return a unique ID representing the topology of the tree.
Two trees with the same topology will produce the same id. This is useful to detect the number of unique topologies over a bunch of trees, without requiring full distance methods.
The id is, by default, calculated based on the terminal node’s names. Any other node property could be used instead.
If trees are unrooted, make sure that the root node is not binary or use the tree.unroot() function before generating the topology id.
- id¶
Return node_id (list of relative hops from root to node).
- property img_style¶
Tree._get_style(self)
- init_from_ete(self, data)¶
- init_from_newick(self, data, parser=None)¶
- property is_collapsed¶
Tree._get_collapsed(self)
- property is_initialized¶
Tree._get_initialized(self)
- is_leaf¶
Return True if the current node is a leaf.
- is_monophyletic(self, nodes)¶
Return True if the nodes form a monophyletic group.
- is_root¶
Return True if the current node has no parent.
- iter_prepostorder(self, is_leaf_fn=None)¶
Yield all nodes in a tree in both pre and post order.
Each iteration returns a postorder flag (True if node is being visited in postorder) and a node instance.
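A combined pre/post-order walk can be sketched in plain Python as below. This is an illustration, not ETE's implementation: the `CHILDREN` dictionary and the function name are hypothetical, and yielding leaves only once (with the flag set to False) is an assumption of the sketch.

```python
# Hypothetical encoding of the tree '((B,A)H,C)root;' as node -> children.
CHILDREN = {"root": ["H", "C"], "H": ["B", "A"]}


def prepostorder(node):
    """Yield (postorder_flag, node); internal nodes appear twice."""
    yield (False, node)                    # first visit, on the way down
    kids = CHILDREN.get(node, [])
    for kid in kids:
        yield from prepostorder(kid)
    if kids:
        yield (True, node)                 # second visit, on the way back up


visits = list(prepostorder("root"))
for is_post, name in visits:
    print("post" if is_post else "pre ", name)
```

A walk like this is handy when work must happen both before and after the children are processed, for example opening and closing parentheses when writing a newick string.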
- ladderize(self, topological=False, reverse=False)¶
Sort branches according to the size of each partition.
- leaf_names(self, is_leaf_fn=None)¶
Yield the leaf names under this node.
- leaves(self, is_leaf_fn=None)¶
Yield the terminal nodes (leaves) under this node.
- level¶
Return the number of nodes between this node and the root.
- lineage(self, root=None, include_root=True)¶
Yield all nodes in the lineage of this node (up to root if given).
- name¶
- pop_child(self, child_idx=-1)¶
- populate(self, size, names=None, model='yule', dist_fn=None, support_fn=None)¶
Populate current node with a dichotomic random topology.
- Parameters:
size – Number of leaves to add. The necessary intermediate nodes will be created too.
names – Collection (list or set) of names to name the leaves. If None, leaves will be named using short letter sequences.
model –
Model used to generate the topology. It can be:
“yule” or “yule-harding”: Every step a randomly selected leaf grows two new children.
“uniform” or “pda”: Every step a randomly selected node (leaf or interior) grows a new sister leaf.
dist_fn – Function to produce values to set as distance in newly created branches, or None for no distances.
support_fn – Function to produce values to set as support in newly created internal branches, or None for no supports.
Example to create a tree with 100 leaves, uniformly random distances between 0 and 1, and all valid supports set to 1:
    import random

    t = Tree()
    random.seed(42)  # set seed if we want a reproducible result
    t.populate(100, dist_fn=random.random, support_fn=lambda: 1)
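The “yule” model described above can be sketched without ETE. In this illustration nodes are plain dictionaries and `yule_topology` is a hypothetical function name; it only shows the growth rule (a random leaf gets two children at every step), not how populate() is implemented.

```python
import random


def yule_topology(size, rng):
    """Grow a random binary topology with `size` leaves (Yule model)."""
    root = {"children": []}
    leaves = [root]                        # a lone root counts as one leaf
    while len(leaves) < size:
        # Pick a random leaf and turn it into an internal node.
        parent = leaves.pop(rng.randrange(len(leaves)))
        kids = [{"children": []}, {"children": []}]
        parent["children"] = kids
        leaves.extend(kids)                # net gain: one leaf per step
    return root, leaves


def count_nodes(node):
    """Count all nodes under (and including) node."""
    return 1 + sum(count_nodes(kid) for kid in node["children"])


root, leaves = yule_topology(100, random.Random(42))
print(len(leaves), count_nodes(root))
```

A binary tree with n leaves has n - 1 internal nodes, so the total node count is 2n - 1.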
- props¶
props: dict
- prune(self, nodes, preserve_branch_length=False)¶
Prune the topology conserving only the given nodes.
It will only retain the minimum number of nodes that conserve the topological relationships among the requested nodes. The root node is always conserved.
- Parameters:
nodes – List of node names or objects that should be kept.
preserve_branch_length (bool) – If True, branch lengths of the deleted nodes are transferred (summed up) to its parent’s branch, thus keeping original distances among nodes.
Examples:
    t = Tree('(((((A,B)C)D,E)F,G)H,(I,J)K)root;')
    print(t.to_str(props=['name']))
    #                       ╭╴A
    #               ╭╴D╶╌╴C╶┤
    #           ╭╴F╶┤       ╰╴B
    #           │   │
    #       ╭╴H╶┤   ╰╴E
    #       │   │
    # ╴root╶┤   ╰╴G
    #       │
    #       │   ╭╴I
    #       ╰╴K╶┤
    #           ╰╴J

    t1 = t.copy()
    t1.prune(['A', 'B'])
    print(t1.to_str(props=['name']))
    #       ╭╴A
    # ╴root╶┤
    #       ╰╴B

    t2 = t.copy()
    t2.prune(['A', 'B', 'C'])
    print(t2.to_str(props=['name']))
    #           ╭╴A
    # ╴root╶╌╴C╶┤
    #           ╰╴B

    t3 = t.copy()
    t3.prune(['A', 'B', 'I'])
    print(t3.to_str(props=['name']))
    #           ╭╴A
    #       ╭╴C╶┤
    # ╴root╶┤   ╰╴B
    #       │
    #       ╰╴I

    t4 = t.copy()
    t4.prune(['A', 'B', 'F', 'H'])
    print(t4.to_str(props=['name']))
    #               ╭╴A
    # ╴root╶╌╴H╶╌╴F╶┤
    #               ╰╴B
- remove_child(self, child)¶
Remove child from this node and return it.
After calling this function, parent and child nodes still exist, but are no longer connected.
- remove_children(self)¶
Remove all children from this node and return a list with them.
- remove_sister(self, sister=None)¶
Remove a sister node and return it.
It has the same effect as self.up.remove_child(sister). If a sister node is not supplied, the first sister will be deleted.
- Parameters:
sister – A node instance to be removed as a sister.
- render(self, file_name, layout=None, w=None, h=None, tree_style=None, units='px', dpi=90)¶
Render the tree as an image.
- Parameters:
file_name – Name of the output image. Valid extensions are “svg”, “pdf”, and “png”.
layout – Layout function or layout function name to use.
tree_style – A TreeStyle instance containing the image properties.
units – Units for height (h) or width (w). They can be “px” for pixels, “mm” for millimeters, “in” for inches.
h – Height of the image in units.
w – Width of the image in units.
dpi – Resolution in dots per inch.
- resolve_polytomy(self, descendants=True)¶
Convert node to a series of dichotomies if it is a polytomy.
A polytomy is a node that has more than 2 children. This function changes them to a ladderized series of dichotomous branches. The tree topology modification is arbitrary (no important results should depend on it!).
- Parameters:
descendants – If True, resolve all polytomies under this node too. Otherwise, do it only for the current node.
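What a "ladderized series" of dichotomies looks like can be shown with a small plain-Python sketch. It works on lists of leaf names rather than ETE nodes, `ladderize_polytomy` is a hypothetical name, and the direction of the ladder is an arbitrary choice of this example.

```python
def ladderize_polytomy(children):
    """Turn a list of more than 2 children into nested pairs (a ladder)."""
    children = list(children)
    if len(children) <= 2:
        return tuple(children)             # already dichotomous (or smaller)
    # Keep the first child and hang the resolved rest next to it.
    return (children[0], ladderize_polytomy(children[1:]))


print(ladderize_polytomy(["A", "B", "C", "D"]))
```

Any other grouping of the four children would be an equally valid resolution, which is why results should not depend on the one that is chosen.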
- reverse_children(self)¶
Reverse current children order.
- robinson_foulds(self, t2, prop_t1='name', prop_t2='name', unrooted_trees=False, expand_polytomies=False, polytomy_size_limit=5, skip_large_polytomies=False, correct_by_polytomy_size=False, min_support_t1=0.0, min_support_t2=0.0)¶
Return the Robinson-Foulds distance to the tree, and related info.
The returned tuple contains the Robinson-Foulds symmetric distance (rf), but also more information:
(rf, rf_max, common, edges_t1, edges_t2, discarded_edges_t1, discarded_edges_t2)
- Parameters:
t2 – Target tree (tree to compare to the reference tree).
prop_t1 – Property to use in the reference tree as the node name when comparing trees.
prop_t2 – Property to use in the target tree as the node name when comparing trees.
unrooted_trees – If True, consider trees as unrooted.
expand_polytomies – If True, all polytomies in the reference and target tree will be expanded into all possible binary trees. Robinson-Foulds distance will be calculated between all tree combinations and the minimum value will be returned. See also Tree.expand_polytomies().
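The symmetric-difference idea behind the Robinson-Foulds distance can be sketched for rooted trees in plain Python. This is a simplified illustration, not ETE's implementation: trees are nested tuples of leaf names, `clades` and `rf_rooted` are hypothetical helpers, and unrooted trees, support thresholds and polytomy expansion are not handled.

```python
def clades(tree):
    """Return the set of leaf-name sets found under each internal node."""
    found = set()

    def collect(node):
        if isinstance(node, str):          # a leaf is just its name
            return frozenset([node])
        leaves = frozenset().union(*(collect(child) for child in node))
        found.add(leaves)
        return leaves

    collect(tree)
    return found


def rf_rooted(t1, t2):
    """Count the clades present in exactly one of the two trees."""
    return len(clades(t1) ^ clades(t2))


balanced = (("A", "B"), ("C", "D"))
shuffled = (("A", "C"), ("B", "D"))
ladder = ((("A", "B"), "C"), "D")

print(rf_rooted(balanced, balanced))   # identical trees share every clade
print(rf_rooted(balanced, shuffled))
print(rf_rooted(balanced, ladder))
```

The clade with all the leaves is in both trees whenever they share the same leaf set, so it never contributes to the distance.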
- root¶
Return the absolute root node of the current tree structure.
- search_ancestors(self, **conditions)¶
Yield ancestor nodes matching the given conditions.
- search_descendants(self, **conditions)¶
Yield descendant nodes matching the given conditions.
- search_leaves_by_name(self, name)¶
Yield leaf nodes matching the given name.
- search_nodes(self, **conditions)¶
Yield nodes matching the given conditions.
Example:
    for node in tree.search_nodes(dist=0.0, name='human'):
        print(node.props['support'])
- set_outgroup(self, node, bprops=None, dist=None)¶
Change tree so the given node is set as outgroup.
The original root node will be used as the new root node, so any reference to it in the code will still be valid.
- Parameters:
node – Node to set as outgroup (future first child of the root).
bprops – List of branch properties (other than “dist” and “support”).
dist – Distance from the node, where we put the new root of the tree.
- set_style(self, node_style)¶
Set ‘node_style’ as the fixed style for the current node.
- show(self, layout=None, tree_style=None, name='ETE')¶
Start an interactive session to visualize the current node.
- size¶
size: ‘(double, double) ‘
- property sm_style¶
Tree._get_sm_style(self)
- sort_descendants(self, prop='name')¶
Sort branches by leaf node values (names or any other given prop).
- standardize(self, delete_orphan=True, preserve_branch_length=True)¶
Resolve multifurcations and remove single-child nodes.
This function changes the current tree structure to produce a standardized topology: nodes with only one child are removed and multifurcations are automatically resolved.
- support¶
- swap_children(self)¶
Like reverse_children(), but only when there are exactly two children.
- to_str(self, show_internal=True, compact=False, props=None, px=None, py=None, px0=0, cascade=False)¶
Return a string containing an ascii drawing of the tree.
- Parameters:
show_internal – If True, show the internal nodes too.
compact – If True, use exactly one line per tip.
props – List of node properties to show. If None, show all.
px, py, px0 – Paddings (x, y, x for leaves). Overrides compact.
cascade – Use a cascade representation. Overrides show_internal, compact, px, py, px0.
- to_ultrametric(self, topological=False)¶
Convert tree to ultrametric (all leaves equidistant from root).
- traverse(self, strategy='levelorder', is_leaf_fn=None)¶
Traverse the tree structure under this node and yield the nodes.
There are three possible strategies. There is a breadth-first search (BFS) named “levelorder”, and two depth-first searches (DFS) named “preorder” and “postorder”.
- Parameters:
strategy – Way in which the tree will be traversed. Can be: “preorder” (first the parent and then its children), “postorder” (first the children and then the parent), and “levelorder” (nodes visited in order from root to leaves).
is_leaf_fn – Function to check if a node is terminal (a “leaf”). The function should receive a node and return True/False. Use this to traverse a tree by dynamically collapsing internal nodes.
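The three strategies differ only in the order in which nodes come out. A plain-Python sketch on a tiny tree (the `CHILDREN` dictionary and the function names are hypothetical and independent of ETE):

```python
from collections import deque

# Hypothetical encoding of the tree '((B,A)H,C)root;' as node -> children.
CHILDREN = {"root": ["H", "C"], "H": ["B", "A"]}


def levelorder(node):
    """Breadth-first: visit nodes level by level, from the root down."""
    queue = deque([node])
    while queue:
        current = queue.popleft()
        yield current
        queue.extend(CHILDREN.get(current, []))


def preorder(node):
    """Depth-first: first the parent, then its children."""
    yield node
    for kid in CHILDREN.get(node, []):
        yield from preorder(kid)


def postorder(node):
    """Depth-first: first the children, then the parent."""
    for kid in CHILDREN.get(node, []):
        yield from postorder(kid)
    yield node


print(list(levelorder("root")))
print(list(preorder("root")))
print(list(postorder("root")))
```

Postorder is the natural choice when a node's value depends on its children (sizes, cached leaves), and preorder when children depend on their parent (depth, distance to the root).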
- unroot(self, bprops=None)¶
Unroot the tree, that is, make the root not have 2 children.
The convention in phylogenetic trees is that if the root has 2 children, it is a “rooted” tree (the root is a real ancestor). Otherwise (typically a root with 3 children), the root is just an arbitrary place to hang the tree.
- up¶
up: ete4.core.tree.Tree
- write(self, outfile=None, props=(), parser=None, format_root_node=False, is_leaf_fn=None)¶
Return or write to file the newick representation.
- Parameters:
outfile (str) – Name of the output file. If present, it will write the newick to that file instead of returning it as a string.
props (list) – Properties to write for all nodes using the Extended Newick Format. If None, write all available properties.
parser – Parser used to encode the tree in newick format.
format_root_node (bool) – If True, write content of the root node too. For compatibility reasons, this is False by default.
Example:
t.write(props=['species', 'sci_name'])