Clustering¶
- class ClusterTree(data=None, children=None, text_array=None, fdist=<function spearman_dist>)[source]¶
Bases: Tree
A ClusterTree is a Tree that represents a clustering result.
- __init__(data=None, children=None, text_array=None, fdist=<function spearman_dist>)[source]¶
- Parameters:
data – A string or file object with the description of the tree as a newick, or a dict with the contents of a single node.
children – List of nodes to add as children of this one.
parser – A description of how to parse a newick to create a tree. It can be a single number specifying the format or a structure with a fine-grained description of how to interpret nodes (see newick.pyx).
Examples:
    t1 = Tree()  # empty tree
    t2 = Tree({'name': 'A'})
    t3 = Tree('(A:1,(B:1,(C:1,D:1):0.5):0.5);')
    t4 = Tree(open('/home/user/my-tree.nw'))
- property deviation¶
- get_dunn(clusters, fdist=None)[source]¶
Calculates the Dunn index for the given set of descendant nodes.
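As a rough illustration of what a Dunn index measures, the sketch below computes the ratio of the smallest between-cluster distance to the largest within-cluster diameter. It is not taken from the ClusterTree source: the helper names (`euclidean`, `centroid`, `dunn_index`), the use of plain lists as profiles, and the choice of centroid linkage and centroid diameter are all assumptions made for the example.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centroid(profiles):
    """Element-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def dunn_index(clusters, fdist=euclidean):
    """Smallest centroid-to-centroid distance over the largest centroid diameter.

    `clusters` is a list of clusters, each a list of profiles (hypothetical
    stand-in for the profiles attached to descendant nodes).
    """
    centroids = [centroid(members) for members in clusters]
    # Assumed diameter definition: twice the mean distance to the centroid.
    diameters = [
        2 * sum(fdist(p, c) for p in members) / len(members)
        for members, c in zip(clusters, centroids)
    ]
    min_inter = min(fdist(a, b) for a, b in combinations(centroids, 2))
    max_diameter = max(diameters)
    # Degenerate case: every cluster collapses onto its centroid.
    return min_inter / max_diameter if max_diameter > 0 else float("inf")


clusters = [[[0, 0], [0, 2]], [[10, 0], [10, 2]]]
dunn = dunn_index(clusters)
```

Higher values indicate compact, well-separated clusters. Any other distance function with the same two-argument shape can be passed as `fdist`.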
- get_silhouette(fdist=None)[source]¶
Calculates the node’s silhouette value using the given distance function. By default, Euclidean distance is used. It also calculates the deviation profile, mean profile, and inter/intra-cluster distances.
- It sets the following features on the analyzed node:
node.intracluster
node.intercluster
node.silhouette
Intracluster distances a(i) are calculated as the Centroid Diameter.
Intercluster distances b(i) are calculated as the Centroid linkage distance.
- Citation:
Rousseeuw, P.J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53-65.
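The silhouette calculation described above can be sketched in a few lines. The example uses Rousseeuw's formula s(i) = (b(i) - a(i)) / max(a(i), b(i)). It assumes that a(i) is twice the distance from an item to its own cluster centroid and that b(i) is the distance from the item to the nearest other cluster centroid. These definitions, the function names, and the list-based profiles are illustrative assumptions, not the library's code.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centroid(profiles):
    """Element-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def centroid_silhouette(cluster, other_clusters, fdist=euclidean):
    """Return (silhouette, intracluster, intercluster), averaged over items."""
    own = centroid(cluster)
    others = [centroid(members) for members in other_clusters]
    a_values, b_values, s_values = [], [], []
    for profile in cluster:
        a = 2 * fdist(profile, own)                  # assumed centroid diameter
        b = min(fdist(profile, c) for c in others)   # assumed centroid linkage
        # a == b also covers the a == b == 0 case, avoiding division by zero.
        s = 0.0 if a == b else (b - a) / max(a, b)
        a_values.append(a)
        b_values.append(b)
        s_values.append(s)
    n = len(cluster)
    return sum(s_values) / n, sum(a_values) / n, sum(b_values) / n


silhouette, intra, inter = centroid_silhouette(
    [[0, 0], [0, 2]],        # cluster being scored
    [[[10, 0], [10, 2]]],    # the other clusters
)
```

Values close to 1 mean items sit much nearer to their own centroid than to any other; values near 0 or below suggest overlapping clusters.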
- property intercluster_dist¶
- property intracluster_dist¶
- link_to_arraytable(arraytbl)[source]¶
Links the given arraytable to the tree and returns a list of nodes for which profiles could not be found in the arraytable.
Row names in the arraytable object are expected to match leaf names.
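The name-matching behaviour described above can be pictured with plain Python containers. In this sketch a dict keyed by row name stands in for the arraytable and a list of strings stands in for the leaf names. Both, along with the `link_profiles` helper, are hypothetical and only mirror the documented contract: leaves whose names have no matching row are reported back.

```python
def link_profiles(leaf_names, table):
    """Match leaf names against row names.

    `table` maps row name -> profile (a hypothetical stand-in for an
    arraytable). Returns the linked profiles and the names left unmatched.
    """
    linked = {}
    missing = []
    for name in leaf_names:
        if name in table:
            linked[name] = table[name]
        else:
            missing.append(name)
    return linked, missing


table = {"A": [1.0, 2.0], "C": [3.0, 4.0], "D": [5.0, 6.0]}
linked, missing = link_profiles(["A", "B", "C"], table)
```

Here `missing` holds the leaf with no row in the table, and the extra row `D` is simply ignored because no leaf carries that name.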
- property profile¶
- set_distance_function(fn)[source]¶
Sets the distance function used to calculate cluster distances and the silhouette index.
- Parameters:
fn – Function accepting two numpy arrays as arguments.
Example::
    import numpy

    # Set a simple Euclidean distance between two profiles.
    my_dist_fn = lambda x, y: numpy.linalg.norm(x - y)
    tree.set_distance_function(my_dist_fn)
- property silhouette¶