SciPy 0.6.0 API Documentation Generated by Endo, 2007-10-17
Vector Quantization Module
Provides several routines used in creating a code book from a set of observations and comparing a set of observations to a code book.
All routines expect one "observation vector" to be stored in each row of the obs matrix. Similarly, the codes are stored row-wise in the code book matrix.
Generate a code book with minimum distortion.
| Parameters: |
|
|---|---|
| Returns: |
|
| SeeAlso: |
|
>>> from numpy import array
>>> from scipy.cluster.vq import vq, kmeans, whiten
>>> features = array([[ 1.9,2.3],
... [ 1.5,2.5],
... [ 0.8,0.6],
... [ 0.4,1.8],
... [ 0.1,0.1],
... [ 0.2,1.8],
... [ 2.0,0.5],
... [ 0.3,1.5],
... [ 1.0,1.0]])
>>> whitened = whiten(features)
>>> book = array((whitened[0],whitened[2]))
>>> kmeans(whitened,book)
(array([[ 2.3110306 , 2.86287398],
[ 0.93218041, 1.24398691]]), 0.85684700941625547)
>>> from numpy import random
>>> random.seed((1000,2000))
>>> codes = 3
>>> kmeans(whitened,codes)
(array([[ 2.3110306 , 2.86287398],
[ 1.32544402, 0.65607529],
[ 0.40782893, 2.02786907]]), 0.5196582527686241)
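
To make the idea of refining a code book toward lower distortion concrete, here is a minimal NumPy sketch of plain Lloyd-style iteration. It is not the SciPy implementation. The name `lloyd_kmeans`, the `n_iter` parameter, and the definition of distortion as the mean Euclidean distance from each observation to its nearest code are all assumptions made for illustration.

```python
import numpy as np

def pairwise_dist(obs, book):
    """Euclidean distance from every observation (rows) to every code (columns)."""
    diff = obs[:, np.newaxis, :] - book[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def lloyd_kmeans(obs, book, n_iter=20):
    """Hypothetical helper: refine `book` with a fixed number of Lloyd iterations."""
    obs = np.asarray(obs, dtype=float)
    book = np.array(book, dtype=float)  # copy, so the caller's array is untouched
    for _ in range(n_iter):
        # Assignment step: label each observation with its nearest code.
        labels = pairwise_dist(obs, book).argmin(axis=1)
        # Update step: move each code to the mean of its members.
        for k in range(len(book)):
            members = obs[labels == k]
            if len(members) > 0:  # an empty cluster keeps its previous code
                book[k] = members.mean(axis=0)
    # Assumed distortion measure: mean distance to the nearest code.
    distortion = pairwise_dist(obs, book).min(axis=1).mean()
    return book, distortion

obs = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
book, distortion = lloyd_kmeans(obs, obs[[0, 2]])
print(book)
print(distortion)
```

A fixed iteration count keeps the sketch short; a real routine would more likely stop once the change in distortion falls below a threshold.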
Classify a set of points into k clusters using the k-means algorithm.
The algorithm works by minimizing the Euclidean distance between data points and their cluster means. This version is more complete than kmeans (it has several initialisation methods).
| Parameters: |
|
|---|---|
| Returns: |
|
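
The description above mentions several initialisation methods without listing them. As an illustration only, the sketch below shows two common ways to choose starting centroids. Both function names and both strategies are assumptions and are not taken from this page.

```python
import numpy as np

def init_from_points(data, k, rng):
    """Assumed strategy: use k distinct observations as the starting centroids."""
    idx = rng.choice(len(data), size=k, replace=False)
    return data[idx].copy()

def init_from_gaussian(data, k, rng):
    """Assumed strategy: draw k centroids from a per-feature Gaussian fitted to the data."""
    mu = data.mean(axis=0)
    sigma = data.std(axis=0)
    return mu + sigma * rng.standard_normal((k, data.shape[1]))

rng = np.random.default_rng(0)
data = rng.random((50, 2))            # 50 observations, 2 features
start_a = init_from_points(data, 3, rng)
start_b = init_from_gaussian(data, 3, rng)
print(start_a.shape, start_b.shape)   # (3, 2) (3, 2)
```

Starting from actual observations guarantees that every centroid lies inside the data, while sampling from a fitted Gaussian can place a centroid in an empty region.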
Python version of vq algorithm.
The algorithm simply computes the Euclidean distance between each observation and every frame in the code_book.
| Parameters: |
|
|---|---|
| Note: | This function is slower than the C versions, but it works for all input types. If the inputs have the wrong types for the C versions of the function, this one is called as a last resort. It is about 20 times slower than the C versions. |
| Returns: |
|
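
The brute-force approach described above can be sketched as a loop over observations. This is a minimal illustration, not the module's source. `brute_force_vq` is a hypothetical name, and the sample arrays are the ones used in the vq example elsewhere on this page.

```python
import numpy as np

def brute_force_vq(obs, code_book):
    """Hypothetical helper: nearest code and its distance for each observation."""
    obs = np.asarray(obs, dtype=float)
    code_book = np.asarray(code_book, dtype=float)
    codes = np.zeros(len(obs), dtype=int)
    dists = np.zeros(len(obs))
    for i, row in enumerate(obs):
        # Distance from this one observation to every code in the book.
        d = np.sqrt(((code_book - row) ** 2).sum(axis=1))
        codes[i] = d.argmin()
        dists[i] = d[codes[i]]
    return codes, dists

features = np.array([[1.9, 2.3, 1.7],
                     [1.5, 2.5, 2.2],
                     [0.8, 0.6, 1.7]])
code_book = np.array([[1.0, 1.0, 1.0],
                      [2.0, 2.0, 2.0]])
codes, dists = brute_force_vq(features, code_book)
print(codes)  # [1 1 0]
print(dists)
```

Because only one row of distances exists at a time, the extra memory needed is proportional to the number of codes.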
2nd Python version of vq algorithm.
The algorithm simply computes the Euclidean distance between each observation and every frame in the code_book.
| Parameters: |
|
|---|---|
| Note: | This could be faster when the number of codes is small, but it becomes a real memory hog when the code book is large. It requires N x M x O storage, where N = number of observations, M = number of features, and O = number of codes. |
| Returns: |
|
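
The N x M x O storage cost in the note is easiest to see in a broadcasting formulation. The sketch below is an assumed reading of that note, not the module's source, and `broadcast_vq` is a hypothetical name.

```python
import numpy as np

def broadcast_vq(obs, code_book):
    """Hypothetical helper: compute all observation/code distances in one shot."""
    obs = np.asarray(obs, dtype=float)
    code_book = np.asarray(code_book, dtype=float)
    # diff holds one feature vector per (observation, code) pair: shape (N, O, M).
    diff = obs[:, np.newaxis, :] - code_book[np.newaxis, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # shape (N, O)
    codes = dist.argmin(axis=1)
    return codes, dist[np.arange(len(obs)), codes]

features = np.array([[1.9, 2.3, 1.7],
                     [1.5, 2.5, 2.2],
                     [0.8, 0.6, 1.7]])
code_book = np.array([[1.0, 1.0, 1.0],
                      [2.0, 2.0, 2.0]])
codes, dists = broadcast_vq(features, code_book)
print(codes)  # [1 1 0]
```

With 8-byte floats, the intermediate `diff` array for 10,000 observations, 16 features, and 1,024 codes would take 10,000 x 16 x 1,024 x 8 bytes, or about 1.3 GB.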
Vector Quantization: assign feature sets to codes in a code book.
Vector quantization determines which code in the code book best represents an observation of a target. The features of each observation are compared to each code in the book, and the observation is assigned the code closest to it. The observations are contained in the obs array. These features should be "whitened", or normalized by the standard deviation of each feature across all observations, before being quantized. The code book can be created using the kmeans algorithm or something similar.
| Parameters: |
|
|---|---|
| Returns: |
|
This currently forces 32-bit math precision for speed. Does anyone know of a situation where this undermines the accuracy of the algorithm?
>>> from numpy import array
>>> from scipy.cluster.vq import vq
>>> code_book = array([[1.,1.,1.],
... [2.,2.,2.]])
>>> features = array([[ 1.9,2.3,1.7],
... [ 1.5,2.5,2.2],
... [ 0.8,0.6,1.7]])
>>> vq(features,code_book)
(array([1, 1, 0],'i'), array([ 0.43588989, 0.73484692, 0.83066239]))
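
On the question of 32-bit precision, one case where single precision could matter is data whose magnitude is large compared with the differences between observations and codes. The sketch below is an illustration under that assumption, not an answer from the module's authors. `distance` is a hypothetical helper and the values are contrived.

```python
import numpy as np

def distance(a, b, dtype):
    """Hypothetical helper: Euclidean distance computed entirely in `dtype`."""
    a = np.asarray(a, dtype=dtype)
    b = np.asarray(b, dtype=dtype)
    return float(np.sqrt(((a - b) ** 2).sum()))

a = [100000001.0, 0.0]
b = [100000000.0, 0.0]
print(distance(a, b, np.float64))  # 1.0
print(distance(a, b, np.float32))  # 0.0
```

Near 1e8 adjacent single-precision values are 8 apart, so both points round to the same number and their distance collapses to zero.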
Normalize a group of observations on a per feature basis.
Before running kmeans algorithms, it is beneficial to "whiten", or scale, the observation data on a per feature basis. This is done by dividing each feature by its standard deviation across all observations.
| Parameters: |
|
|---|---|
| Returns: |
|
>>> from numpy import array
>>> from scipy.cluster.vq import whiten
>>> features = array([[ 1.9,2.3,1.7],
... [ 1.5,2.5,2.2],
... [ 0.8,0.6,1.7]])
>>> whiten(features)
array([[ 3.41250074, 2.20300046, 5.88897275],
[ 2.69407953, 2.39456571, 7.62102355],
[ 1.43684242, 0.57469577, 5.88897275]])
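
The per-feature scaling described above amounts to a single division by the column-wise standard deviation. In this minimal sketch, `whiten_sketch` and its `ddof` parameter are assumptions, since the page does not say which standard deviation convention is used. With `ddof=1` (sample standard deviation) the sketch reproduces the values printed in the example above.

```python
import numpy as np

def whiten_sketch(obs, ddof=0):
    """Hypothetical helper: divide each feature (column) by its standard deviation."""
    obs = np.asarray(obs, dtype=float)
    std_dev = obs.std(axis=0, ddof=ddof)  # one value per feature
    return obs / std_dev

features = np.array([[1.9, 2.3, 1.7],
                     [1.5, 2.5, 2.2],
                     [0.8, 0.6, 1.7]])
whitened = whiten_sketch(features, ddof=1)
print(whitened)
print(whitened.std(axis=0, ddof=1))  # each feature now has unit standard deviation
```

The sketch does not guard against a feature with zero standard deviation, which would cause a division by zero.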
| Local name | Refers to |
|---|---|
| arange | numpy.arange |
| argmin | numpy.argmin |
| array | numpy.array |
| common_type | numpy.common_type |
| compress | numpy.compress |
| double | numpy.double |
| equal | numpy.equal |
| mean | numpy.mean |
| minimum | numpy.minimum |
| N | numpy |
| newaxis | numpy.newaxis |
| randint | numpy.random.randint |
| shape | numpy.shape |
| single | numpy.single |
| sqrt | numpy.sqrt |
| std | numpy.std |
| take | numpy.take |
| warnings | warnings |
| zeros | numpy.zeros |