Tree operations¶
Tree-related operations.
Sorting, changing the root to a node, moving branches, removing (prunning)…
- class Walker(root)¶
Represents the position when traversing a tree.
- add_next_branch(self)¶
- property first_visit¶
- go_back(self)¶
- property has_unvisited_branches¶
- property node¶
- property node_id¶
- add_branch_values(root, dist_fn, support_fn)¶
Add distances and support values to the branches.
- add_leaf_names(root, names)¶
Add names to the leaves.
- assert_root_consistency(root, bprops=None)¶
Raise AssertionError if the root node of a tree looks inconsistent.
- closest_descendant_leaf(tree, dist_max=-1, selector=None, is_leaf_fn=None, topological=False)¶
Return the closest descendant leaf from the tree and its distance.
- Parameters:
tree – Starting node for which to find its closest descendant leaf.
dist_max – If > 0, do not consider nodes farther than this distance.
selector – Function that returns True for the selected leaves. If None, all leaves will be selected as candidates to consider.
is_leaf_fn – Function that takes a node and returns True if it is considered a leaf. If None, node.is_leaf is used.
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lenghts).
- closest_leaf(leaf, selector=None, is_leaf_fn=None, topological=False, relative=False)¶
Return the closest leaf to the given leaf, and their distance.
Note that if you want to find the closest descendant leaf from a node, the appropriate function is closest_descentant_leaf(node).
- Parameters:
leaf – Leaf for which to find its closest leaf.
selector – Function that returns True for the selected leaves. If None, all leaves will be selected as candidates to consider.
is_leaf_fn – Function that takes a node and returns True if it is considered a leaf. If None, node.is_leaf is used.
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lenghts).
relative – If True, it will return the closest relative leaf, that is, the one in the smallest branch including the original leaf.
- closest_relative_leaf(leaf, selector=None, is_leaf_fn=None, topological=False)¶
Return the closest relative leaf to the given leaf, and their distance.
Convenient common function, which calls closest_leaf() with relative=True.
- common_ancestor(nodes)¶
Return the last node common to the lineages of the given nodes.
If the given nodes don’t have a common ancestor, it will return None.
- Parameters:
nodes – List of nodes whose common ancestor we want to find.
- create_dichotomic_sister(tree)¶
Make tree start with a dichotomy, with the old tree and a new sister.
- distance_matrix(tree, selector=None, topological=False, squared=False)¶
Return a matrix of paired distances between all the selected leaves.
- Parameters:
tree – Tree (starting node) for which to compute the matrix.
selector – Function that returns True for the selected leaves. If None, all leaves will be selected.
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lenghts).
squared – If True, the output matrix will be squared and symmetrical. Otherwise, only the upper triangle is returned (to save memory).
- evolutionary_distinctness(tree, leaves, topological=False)¶
Return the evolutionary distinctness for the given leaves.
The
leaves
argument is typically just a list with one leaf (for which we want to know its evolutionary distinctness). But the precomputations can be used to quickly find the value of many.
- farthest_descendant(tree, is_leaf_fn=None, topological=False)¶
Return the farthest descendant and its distance.
- farthest_nodes(tree, topological=False)¶
Return the farthest nodes and the diameter of the tree.
- get_common_values(t1, t2, prop='name', strict=False)¶
Return the common leaf property prop values of trees t1 and t2.
If strict, raise AssertionError if t1 and t2 don’t share leaves.
- get_distance_fn(topological, asserted=True)¶
Return a function that returns node distances (branch lengths).
- Parameters:
topological – If True, the distance of a node is just 1 (a step).
asserted – If True, raises AssertionError on undefined distances.
- insert_intermediate(node, intermediate, bprops=None, dist=None)¶
Insert, between node and its parent, an intermediate node.
- interchange_references(node1, node2)¶
Interchange the references of the given nodes.
node1 will point where node2 was, and viceversa.
- join_branch(node, bprops=None)¶
Substitute node for its only child.
- ladderize(tree, topological=False, reverse=False)¶
Sort branches according to the size of each partition.
- Parameters:
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lenghts).
reverse – If True, sort with biggest partitions first.
Example:
t = Tree('(f,((d,((a,b),c)),e));') print(t) # ╭╴f # ──┤ ╭╴d # │ ╭──┤ ╭──┬╴a # ╰──┤ ╰──┤ ╰╴b # │ ╰╴c # ╰╴e t.ladderize() print(t) # ──┬╴f # ╰──┬╴e # ╰──┬╴d # ╰──┬╴c # ╰──┬╴a # ╰╴b
- leaves_vs_random(tree, leaves, metric, tolerance=0.05, nmin=5, nmax=1000)¶
Helper function to compute NRI, NTI, and similar indices.
It returns:
(E(metric(random_leaves)) - metric(leaves)) / SD(metric(random_leaves))
where E is the expected value and SD the standard deviation (taken from a number < nmax of random samples of leaves, all of the same size as the original list of leaves).
- Parameters:
metric – Function that takes a list of leaves and returns some value associated to them. For example, the mean pairwise distance for NRI, and the mean closest distance for NTI.
- make_name(i, chars='abcdefghijklmnopqrstuvwxyz')¶
Return a short name corresponding to the index i.
- make_partitions(tree, common_vals, prop='name')¶
Return a set of partitions of the given tree.
A “partition” is an id for each node, based on the leaves that it has at each side. The id is unique no matter the topology.
- mean_distance(tree, weight_fn=None, leaf=None, topological=False)¶
Return the weighted mean distance between leaves, or from given leaf.
This is also called the “Mean Phylogenetic Distance” (MPD) for phylogenetic trees.
To “select” certain leaves, weight_fn can be used for example like:
weight_fn=lambda node: 1 if node.name in names else 0
But it can be used generally as relative leaf importance for averaging.
The algorithm is quite fast: for n leaves, it runs in O(n * log(n)).
- Parameters:
tree – Tree (starting node) for which to compute the mean.
weight_fn – Function that returns the weight of each leaf. If None, all leaves will have weight 1.
leaf – Leaf for which to compute the weighted mean distance to leaves. If None, a weighted mean for all leaves is made.
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lenghts).
- midpoint(tree, topological=False)¶
Return the node in the middle and its distance from the exact center.
- move(node, shift=1)¶
Change the position of the current node with respect to its parent.
- nearest_taxon_index(tree, leaves, topological=False, tolerance=0.05, nmin=5, nmax=1000)¶
Return the Nearest Taxon Index (NTI).
The nearest taxon index (NTI) is a standardized measure of the phylogenetic distance to the nearest taxon for each taxon in the sample and quantifies the extent of terminal clustering, independent of deep level clustering:
(mnY(n) - mn(Yobs)) / sdY(n)
where Yobs is the phylogenetic distance to the nearest taxon in the phylogeny of the pool, mn(Yobs) is the mean of all n taxa, and mnY(n) and sdY(n) are the mean and standard deviation expected for n taxa randomly distributed on the phylogeny of the pool.
- Parameters:
tree – Tree (starting node).
leaves – Observed taxa.
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lenghts).
tolerance – Maximum relative error on the result value (NTI).
nmin – Minimum iterations to estimate the mean of closest distances.
nmax – Maximum iterations to estimate the mean of closest distances.
Return the Net Relatedness Index (NRI).
The net relatedness index (NRI) is a standardized measure of the mean pairwise phylogenetic distance of taxa in a sample, relative to a phylogeny of an appropriate species pool, and quantifies overall clustering of taxa on a tree:
(mnX(n) - mn(Xobs)) / sdX(n)
where Xobs is the phylogenetic distance between two taxa (the sum of all intervening branch lengths) in the phylogeny of the pool, mn(Xobs) is the mean of all possible pairs of n taxa, and mnX(n) and sdX(n) are the mean and standard deviation expected for n taxa randomly distributed on the phylogeny of the pool.
- Parameters:
tree – Tree (starting node).
leaves – Observed taxa.
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lenghts).
tolerance – Maximum relative error on the result value (NRI).
nmin – Minimum iterations to estimate the mean of pairwise distances.
nmax – Maximum iterations to estimate the mean of pairwise distances.
- partition_id(values1, values2)¶
Return a unique id based on the given sets of values.
- phylogenetic_diversity(tree, topological=False)¶
Return the phylogenetic diversity of the tree.
- populate(tree, size, names=None, model='yule', dist_fn=None, support_fn=None)¶
Populate tree with a dichotomic random topology.
- Parameters:
size – Number of leaves to add. All necessary intermediate nodes will be created too.
names – Collection (list or set) of names to name the leaves. If None, leaves will be named using short letter sequences.
model –
Model used to generate the topology. It can be:
”yule” or “yule-harding”: Every step a randomly selected leaf grows two new children.
”uniform” or “pda”: Every step a randomly selected node (leaf or interior) grows a new sister leaf.
dist_fn – Function to produce values to set as distance in newly created branches, or None for no distances.
support_fn – Function to produce values to set as support in newly created internal branches, or None for no supports.
- populate_uniform(root, size)¶
Populate with the uniform model a topology with size leaves.
- populate_yule(root, size)¶
Populate with the Yule-Harding model a topology with size leaves.
- rehang(root, child_pos, bprops=None)¶
Rehang root on its child at position child_pos and return it.
- remove(node)¶
Remove the given node from its tree.
- resolve_polytomy(tree, descendants=True)¶
Convert tree to a series of dicotomies if it is a polytomy.
A polytomy is a node that has more than 2 children. This function changes them to a ladderized series of dicotomic branches. The tree topology modification is arbitrary (no important results should depend on it!).
- Parameters:
descendants – If True, resolve all polytomies in the tree, including all root descendants. Otherwise, do it only for the root.
- robinson_foulds(t1, t2, prop='name', normalized=False, strict=False)¶
Return the Robinson-Foulds distance between trees t1 and t2.
The distance is A + B, where:
A: number of partitions implied by t1 but not t2
B: number of partitions implied by t2 but not t1
Every node implies a partition (the leaves that it has at each side).
- Parameters:
prop – Property of the leaves used to identify them in partitions.
normalized – If True, divide by the maximum possible distance.
strict – If True, check that t1 and t2 have unique and same leaves.
- root_at(node, bprops=None)¶
Set the given node as the root of the tree.
The original root node will be used as the new root node, so any reference to it in the code will still be valid.
- Parameters:
node – Node to set as root. Its reference will be lost.
bprops – List of branch properties (other than “dist” and “support”).
- set_midpoint_outgroup(tree, topological=False)¶
- set_outgroup(node, bprops=None, dist=None)¶
Change tree so the given node is set as outgroup.
The original root node will be used as the new root node, so any reference to it in the code will still be valid.
- Parameters:
node – Node to set as outgroup (future first child of the root).
bprops – List of branch properties (other than “dist” and “support”).
dist – Distance from the node, where we put the new root of the tree.
- sort(tree, key=None, reverse=False)¶
Sort the tree in-place.
- swap_props(n1, n2, props)¶
Swap properties between nodes n1 and n2.
- to_dendrogram(tree)¶
Convert tree to dendrogram (remove all distance values).
- to_ultrametric(tree, topological=False)¶
Convert tree to ultrametric (all leaves equidistant from root).
- traverse(tree, order=-1, is_leaf_fn=None)¶
Traverse the tree and yield nodes in pre (< 0) or post (> 0) order.
- traverse_bfs(tree, is_leaf_fn=None)¶
Yield nodes with a breadth-first search (level order traversal).
- traverse_full(tree, order=-1, is_leaf_fn=None, topological=False, descend=None)¶
Traverse tree depth-first and yield (node, seen status, total distance).
Similar to traverse(), but more fully featured (and complex).
- Parameters:
tree – Tree (starting node) to traverse.
order – When to yield (-1 preorder, +1 postorder, 0 prepostorder).
is_leaf_fn – Function that takes a node and returns True if it is considered a leaf. If None, node.is_leaf is used.
topological – If True, the distance between nodes will be the number of nodes between them (instead of the sum of branch lenghts).
descend – If not None, a list whose first element is always checked before going deeper in the traversal. To dynamically cut/avoid branches.
- unroot(tree, bprops=None)¶
Unroot the tree (make the root not have 2 children).
The convention in phylogenetic trees is that if the root has 2 children, it is a “rooted” tree (the root is a real ancestor). Otherwise (typically a root with 3 children), the root is just an arbitrary place to hang the tree.
- update_size(node)¶
Update the size of the given node.
- update_sizes_all(tree)¶
Update sizes of all the nodes in the tree.
- update_sizes_from(node)¶
Update the sizes from the given node to the root of the tree.
- walk(tree)¶
Yield an iterator as it traverses the tree.