Clustering

class ClusterTree(data=None, children=None, text_array=None, fdist=<function spearman_dist>)

Bases: Tree

A ClusterTree is a Tree that represents a clustering result.

__init__(data=None, children=None, text_array=None, fdist=<function spearman_dist>)
Parameters:
  • data – A string or file object with the description of the tree as a newick, or a dict with the contents of a single node.

  • children – List of nodes to add as children of this one.

  • parser – A description of how to parse a newick to create a tree. It can be a single number specifying the format or a structure with a fine-grained description of how to interpret nodes (see newick.pyx).

Examples:

t1 = Tree()  # empty tree
t2 = Tree({'name': 'A'})
t3 = Tree('(A:1,(B:1,(C:1,D:1):0.5):0.5);')
t4 = Tree(open('/home/user/my-tree.nw'))
property deviation
get_dunn(clusters, fdist=None)

Calculates the Dunn index for the given set of descendant nodes.
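The Dunn index is the smallest distance between clusters divided by the largest within-cluster spread, so higher values indicate compact, well-separated clusters. The following is a minimal standalone sketch of that ratio, not the library's implementation. It assumes centroid linkage for the between-cluster distance and centroid diameter (twice the mean distance of members to their centroid) for the spread; the names `dunn_index` and `euclidean` are hypothetical.

```python
import numpy as np


def euclidean(x, y):
    """Euclidean distance between two 1-D profiles (illustrative helper)."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))


def dunn_index(clusters, fdist=euclidean):
    """Dunn index for a list of clusters, each a 2-D array with one profile per row.

    Assumption: centroid linkage between clusters, centroid diameter within them.
    """
    clusters = [np.asarray(c, dtype=float) for c in clusters]
    centroids = [c.mean(axis=0) for c in clusters]

    # Within-cluster spread: twice the mean member-to-centroid distance.
    diameters = [
        2.0 * np.mean([fdist(row, cen) for row in c])
        for c, cen in zip(clusters, centroids)
    ]

    # Between-cluster distance: distance between every pair of centroids.
    n = len(centroids)
    inter = [fdist(centroids[i], centroids[j]) for i in range(n) for j in range(i + 1, n)]

    max_diameter = max(diameters)
    if max_diameter == 0:
        return float("inf")  # every cluster collapses to a single point
    return min(inter) / max_diameter
```

Passing a different `fdist` changes both the numerator and the denominator, which is why the same clustering can score differently under different distance functions.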

get_silhouette(fdist=None)

Calculates the node’s silhouette value using the given distance function. If fdist is None, the distance function already set on the tree is used (the constructor default shown above is spearman_dist). It also calculates the deviation profile, mean profile, and inter/intra-cluster distances.

It sets the following features into the analyzed node:
  • node.intracluster

  • node.intercluster

  • node.silhouette

Intracluster distances a(i) are calculated as the Centroid Diameter.

Intercluster distances b(i) are calculated as the Centroid linkage distance.

Citation:

Rousseeuw, P.J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53-65.
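As a rough illustration of the quantities described above, the sketch below computes a(i) as the centroid diameter of one cluster, b(i) as the centroid linkage distance to the nearest other cluster, and combines them with Rousseeuw's formula s = (b - a) / max(a, b). It works on plain numpy arrays rather than tree nodes, and the names `centroid_silhouette` and `euclidean` are hypothetical, not part of the library.

```python
import numpy as np


def euclidean(x, y):
    """Euclidean distance between two 1-D profiles (illustrative helper)."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))


def centroid_silhouette(cluster, others, fdist=euclidean):
    """Return (silhouette, intracluster, intercluster) for one cluster.

    cluster: 2-D array, one profile per row.
    others:  list of 2-D arrays, the remaining clusters.
    """
    cluster = np.asarray(cluster, dtype=float)
    centroid = cluster.mean(axis=0)

    # a: centroid diameter, i.e. twice the mean member-to-centroid distance.
    a = 2.0 * float(np.mean([fdist(row, centroid) for row in cluster]))

    # b: centroid linkage distance to the closest of the other clusters.
    b = min(fdist(centroid, np.asarray(o, dtype=float).mean(axis=0)) for o in others)

    denom = max(a, b)
    s = 0.0 if denom == 0 else (b - a) / denom
    return s, a, b
```

Values of s close to 1 mean the cluster is tight relative to its distance from its neighbours; values near or below 0 suggest the cluster overlaps another one.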

property intercluster_dist
property intracluster_dist
leaf_profiles()

Yield the profiles associated with the leaves under this node.

Link the given arraytable to the tree and return a list of nodes for which profiles could not be found in the arraytable.

Row names in the arraytable object are expected to match leaf names.
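The name-matching rule can be pictured with plain Python data. This sketch does not call the library: the function `match_profiles` and the use of a dict in place of an arraytable are assumptions made for illustration only. It pairs each leaf name with the row of the same name and collects the names for which no row exists.

```python
import numpy as np


def match_profiles(leaf_names, rows):
    """Pair leaf names with same-named rows; report names that have no row.

    leaf_names: iterable of leaf names.
    rows:       dict mapping row name -> sequence of numbers (stand-in for an arraytable).
    """
    linked = {}
    missing = []
    for name in leaf_names:
        if name in rows:
            linked[name] = np.asarray(rows[name], dtype=float)
        else:
            missing.append(name)  # no profile found for this leaf
    return linked, missing


linked, missing = match_profiles(
    ["A", "B", "C"],
    {"A": [0.1, 0.9], "C": [0.4, 0.2]},
)
print(missing)  # ['B']
```

Checking the returned list of unmatched names before computing any cluster statistic helps catch mismatches between the tree's leaf names and the table's row names.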

property profile
set_distance_function(fn)

Set the distance function used to calculate cluster distances and the silhouette index.

Parameters:

fn – Function accepting two numpy arrays as arguments and returning the distance between them.

Example:

import numpy as np

# Set a simple Euclidean distance: the norm of the difference vector.
my_dist_fn = lambda x, y: np.linalg.norm(x - y)
tree.set_distance_function(my_dist_fn)
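Any callable that takes two numpy arrays and returns a single number can serve as the distance function. The two sketches below are illustrative and not part of the library: `pearson_dist` is a correlation-based distance (1 minus the Pearson coefficient), and `euclidean_nan` is a Euclidean distance that skips positions where either profile is NaN. The assumption that profiles may contain missing values is ours.

```python
import numpy as np


def pearson_dist(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    r = np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1]
    return float(1.0 - r)


def euclidean_nan(x, y):
    """Euclidean distance over the positions where both profiles are defined."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    valid = ~(np.isnan(x) | np.isnan(y))  # keep positions without missing values
    return float(np.sqrt(np.sum((x[valid] - y[valid]) ** 2)))
```

Either function could then be passed as `fn`. A correlation-based distance compares the shape of two profiles and ignores their magnitude, while a Euclidean distance is sensitive to both.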
property silhouette