
Wednesday, February 22, 2012

Vector quantization

 http://en.wikipedia.org/wiki/Vector_quantization

Vector quantization

From Wikipedia, the free encyclopedia
Vector quantization is a classical quantization technique from signal processing which allows the modeling of probability density functions by the distribution of prototype vectors. It was originally used for data compression. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms.
The density matching property of vector quantization is powerful, especially for identifying the density of large and high-dimensional data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error, and rare data high error. This is why VQ is suitable for lossy data compression. It can also be used for lossy data correction and density estimation.
Vector quantization is based on the competitive learning paradigm, so it is closely related to the self-organizing map model.

Training

A simple training algorithm for vector quantization is the following; a code sketch appears after the steps:
  1. Pick a sample point at random
  2. Move the nearest quantization vector centroid towards this sample point, by a small fraction of the distance
  3. Repeat
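As a rough illustration, here is a minimal sketch of this update loop in Python (the function, the learning-rate parameter lr, and the fixed-step stopping rule are illustrative assumptions, not part of the original description):

import numpy as np

def train_vq_simple(data, num_centroids=16, lr=0.05, steps=10000, seed=0):
    """Online VQ training: nudge the nearest centroid toward a randomly drawn sample."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook with randomly chosen data points (data is an (n, d) array).
    centroids = data[rng.choice(len(data), num_centroids, replace=False)].astype(float)
    for _ in range(steps):
        x = data[rng.integers(len(data))]                    # 1. pick a sample point at random
        nearest = np.argmin(np.sum((centroids - x) ** 2, axis=1))
        centroids[nearest] += lr * (x - centroids[nearest])  # 2. move it a small fraction closer
    return centroids                                         # 3. repeat for a fixed number of steps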
A more sophisticated algorithm reduces the bias in the density matching estimation, and ensures that all points are used, by including an extra sensitivity parameter:
  1. Increase each centroid's sensitivity by a small amount
  2. Pick a sample point at random
  3. Find the quantization vector centroid with the smallest value of distance minus sensitivity
    1. Move the chosen centroid toward the sample point by a small fraction of the distance
    2. Set the chosen centroid's sensitivity to zero
  4. Repeat
It is desirable to use a cooling schedule to produce convergence: see Simulated annealing.
The algorithm can be iteratively updated with 'live' data, rather than by picking random points from a data set, but this will introduce some bias if the data is temporally correlated over many samples.
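A sketch of the sensitivity-based variant follows, with a simple decaying step size standing in for the cooling schedule mentioned above (the decay constants and parameter names are illustrative assumptions):

import numpy as np

def train_vq_sensitive(data, num_centroids=16, steps=20000, lr0=0.1, eps=1e-3, seed=0):
    """VQ training with a per-centroid sensitivity term and a decaying step size."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), num_centroids, replace=False)].astype(float)
    sensitivity = np.zeros(num_centroids)
    for t in range(steps):
        sensitivity += eps                                  # 1. raise every centroid's sensitivity
        x = data[rng.integers(len(data))]                   # 2. pick a sample point at random
        dist = np.sqrt(np.sum((centroids - x) ** 2, axis=1))
        chosen = np.argmin(dist - sensitivity)              # 3. smallest (distance minus sensitivity)
        lr = lr0 / (1.0 + t / 1000.0)                       # crude cooling schedule
        centroids[chosen] += lr * (x - centroids[chosen])   # 3.1 move the chosen centroid
        sensitivity[chosen] = 0.0                           # 3.2 reset its sensitivity
    return centroids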

[edit] Applications

Vector quantization is used for lossy data compression, lossy data correction and density estimation.
Lossy data correction, or prediction, is used to recover data missing from some dimensions. It works by finding the nearest group using the dimensions that are available, and then predicting the missing dimensions by assuming they take the same values as that group's centroid.
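A minimal sketch of this nearest-centroid prediction, assuming a codebook of centroids has already been trained and a boolean mask marks which dimensions of a sample were observed (the function and variable names are my own):

import numpy as np

def predict_missing(x_partial, observed_mask, centroids):
    """Fill the unobserved dimensions of x_partial with those of the nearest centroid,
    where 'nearest' is measured only over the observed dimensions."""
    obs = np.asarray(observed_mask, dtype=bool)
    d2 = np.sum((centroids[:, obs] - x_partial[obs]) ** 2, axis=1)
    nearest = centroids[np.argmin(d2)]
    return np.where(obs, x_partial, nearest)   # keep the known values, borrow the rest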
For density estimation, the area/volume that is closer to a particular centroid than to any other is inversely proportional to the density (due to the density matching property of the algorithm).

Use in data compression

Vector quantization, also called "block quantization" or "pattern matching quantization", is often used in lossy data compression. It works by encoding values from a multidimensional vector space into a finite set of values from a discrete subspace of lower dimension. A lower-space vector requires less storage space, so the data is compressed. Due to the density matching property of vector quantization, the compressed data have errors that are inversely proportional to their density.
The transformation is usually done by projection or by using a codebook. In some cases, a codebook can be also used to entropy code the discrete value in the same step, by generating a prefix coded variable-length encoded value as its output.
The set of discrete amplitude levels is quantized jointly rather than each sample being quantized separately. Consider a K-dimensional vector [x_1, x_2, ..., x_K] of amplitude levels. It is compressed by choosing the nearest matching vector from a set of N-dimensional vectors [y_1, y_2, ..., y_N].
All possible combinations of the N-dimensional vector [y_1, y_2, ..., y_N] form the vector space to which all the quantized vectors belong.
[Block diagram: a simple vector quantizer]

Only the index of the codeword in the codebook is sent instead of the quantized values. This conserves space and achieves more compression.
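A minimal sketch of this codebook encode/decode step, assuming a codebook that has already been trained (for example with a procedure like the one in the Training section above):

import numpy as np

def vq_encode(blocks, codebook):
    """Map each input vector to the index of its nearest codeword.
    blocks: (n, k) array of input vectors; codebook: (N, k) array of codewords."""
    d2 = np.sum((blocks[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
    return np.argmin(d2, axis=1)        # only these indices are stored or transmitted

def vq_decode(indices, codebook):
    """Reconstruct (lossily) by looking each index up in the codebook."""
    return codebook[indices]

With a codebook of N = 256 codewords, for instance, each block is represented by a single byte, regardless of the block dimension k.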
Twin vector quantization (VQF) is part of the MPEG-4 standard dealing with time domain weighted interleaved vector quantization.

Video codecs based on vector quantization

A number of video codecs were based on vector quantization; all of them have since been superseded by the MPEG family.

Audio codecs based on vector quantization

See also


Part of this article was originally based on material from the Free On-line Dictionary of Computing and is used with permission under the GFDL.

References

  1. ^ "Vorbis I Specification". Xiph.org. 2007-03-09. Retrieved 2007-03-09.

External links


Delaunay triangulation algorithms

http://baike.baidu.com/view/1691145.htm

Delaunay triangulation algorithms

Contents
1. Definitions of triangulation and Delaunay triangulation
1.1 Definition of triangulation
1.2 Definition of Delaunay triangulation
1.3 Criteria for Delaunay triangulation
1.4 Properties of the Delaunay triangulation
1.5 Local optimization (LOP)
2. Algorithms for Delaunay triangulation
2.1 Lawson algorithm
2.2 Bowyer-Watson algorithm

1. Definitions of triangulation and Delaunay triangulation

How to subdivide a set of scattered points into a non-uniform triangular mesh is the triangulation problem for scattered point sets. Triangulation of a scattered point set is an extremely important preprocessing technique for both numerical analysis and computer graphics.

1.1 Definition of triangulation

[Definition] Triangulation: Let V be a finite set of points in the two-dimensional real plane, let an edge e be a closed line segment whose endpoints are points of V, and let E be a set of such edges. A triangulation T = (V, E) of the point set V is then a planar graph G that satisfies the following conditions:
  1. Apart from its endpoints, no edge of the graph contains any point of V.
  2. No two edges intersect.
  3. Every face of the planar graph is a triangle, and the union of all the triangular faces is the convex hull of V.

1.2 Definition of Delaunay triangulation

The triangulation most widely used in practice is the Delaunay triangulation, a special kind of triangulation. We start from the notion of a Delaunay edge:
[Definition] Delaunay edge: Let e be an edge in E with endpoints a and b. The edge e is called a Delaunay edge if there exists a circle passing through a and b whose interior (note: the interior of the circle; at most three points of V may lie on the circle itself) contains no other point of V. This property is also called the empty-circle property.
[Definition] Delaunay triangulation: If a triangulation T of the point set V contains only Delaunay edges, then T is called a Delaunay triangulation.

1.3 Criteria for Delaunay triangulation

To satisfy the definition of a Delaunay triangulation, two important criteria must hold:
  1. Empty-circle property: the Delaunay triangulation is unique (provided no four points are cocircular), and the circumcircle of any triangle in the Delaunay triangulation contains no other point of the set.
  2. Max-min-angle property: among all the triangulations that a scattered point set can form, the Delaunay triangulation maximizes the minimum angle of its triangles. In this sense the Delaunay triangulation is the triangulation that is "closest to regular". Concretely, if the diagonal of the convex quadrilateral formed by two adjacent triangles is swapped, the minimum of the six interior angles does not increase.

1.4 Properties of the Delaunay triangulation

A Delaunay triangulation has the following desirable properties:
  1. Nearest neighbours: each triangle is formed from the three mutually nearest points, and no edges (triangle sides) intersect.
  2. Uniqueness: no matter where in the region the construction starts, the final result is the same.
  3. Optimality: if the diagonal of the convex quadrilateral formed by any two adjacent triangles were exchanged, the smallest of the six interior angles would not increase.
  4. Regularity: if the minimum angles of the triangles of each possible triangulation are sorted in ascending order, the Delaunay triangulation yields the largest such sequence.
  5. Locality: inserting, deleting or moving a vertex affects only the neighbouring triangles.
  6. Convex hull boundary: the outermost boundary of the triangulation forms a convex polygon, the convex hull of the point set.

1.5 Local optimization (LOP)

To construct a Delaunay triangulation, Lawson proposed the Local Optimization Procedure (LOP); in general, applying LOP to a triangulation ensures that it becomes a Delaunay triangulation. The basic procedure is as follows (the circumcircle test at its core is sketched in code after the list):
  1. Merge two triangles that share an edge into one quadrilateral.
  2. Apply the empty-circumcircle check: test whether the fourth vertex lies inside the circumcircle of the triangle.
  3. If it does, flip the diagonal of the quadrilateral. This completes one step of the local optimization.
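The empty-circumcircle check at the heart of LOP is commonly written as a 3x3 determinant. A minimal sketch, assuming 2-D points given as (x, y) pairs (the function names and the counter-clockwise convention are my own assumptions):

import numpy as np

def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc (abc counter-clockwise)."""
    m = np.array([
        [a[0] - d[0], a[1] - d[1], (a[0] - d[0])**2 + (a[1] - d[1])**2],
        [b[0] - d[0], b[1] - d[1], (b[0] - d[0])**2 + (b[1] - d[1])**2],
        [c[0] - d[0], c[1] - d[1], (c[0] - d[0])**2 + (c[1] - d[1])**2],
    ])
    return np.linalg.det(m) > 0

def lop_should_flip(a, b, c, d):
    """Triangles abc and abd share the edge ab; LOP flips ab to cd when d lies
    inside the circumcircle of abc, i.e. when the empty-circle test is violated."""
    # Orient abc counter-clockwise so that the sign of the determinant is meaningful.
    if (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]) < 0:
        a, b = b, a
    return in_circumcircle(a, b, c, d)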

2. Algorithms for Delaunay triangulation

The Delaunay triangulation is a criterion that a triangulation satisfies rather than a single method; there are many algorithms for constructing it.

2.1 Lawson algorithm

The incremental point-insertion algorithm of Lawson was proposed in 1977. Its idea is simple and it is easy to program. The basic principle is: first construct a large triangle or polygon that encloses all the data points; insert a point into it and connect that point to the three vertices of the triangle that contains it, forming three new triangles; then check each of them against the empty-circumcircle criterion and apply Lawson's Local Optimization Procedure (LOP), i.e. flip diagonals, so that the resulting mesh is a Delaunay triangulation.
This point-insertion construction is theoretically rigorous, its result is unique, and the mesh satisfies the empty-circle property, so it is quite satisfactory. Because the mesh is built by inserting points one at a time, whenever a non-Delaunay edge is encountered it can be deleted and adjusted to form a new Delaunay edge. After the mesh is built, adding a new point does not require re-triangulating all the points; only the triangles affected by the new point need to be re-meshed locally, and this local re-meshing is simple. Likewise, points can be deleted or moved quickly and dynamically. In practice, however, this construction becomes slow when the point set is large, and if the domain of the points is non-convex or contains interior holes, illegal triangles may be produced.
As the figures below illustrate, when the scattered points form a ring, the Lawson algorithm produces illegal triangles.
[Figures: the scattered point set; the correct triangulation; the triangulation produced by the Lawson algorithm]

2.2 Bowyer-Watson algorithm

The basic steps of the algorithm are:
  1. Construct a super-triangle that contains all of the scattered points and add it to the triangle list.
  2. Insert the scattered points one at a time. In the triangle list, find every triangle whose circumcircle contains the inserted point (the "influence triangles" of that point), delete the edges shared by the influence triangles, and connect the inserted point to all vertices of the influence triangles; this completes the insertion of one point into the Delaunay triangle list.
  3. Optimize the newly formed local triangles according to the optimization criterion and add them to the Delaunay triangle list.
  4. Repeat step 2 until all the scattered points have been inserted.
The key step, step 2, is sketched in the code below.
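A compact, self-contained sketch of this procedure (the super-triangle construction, the function names, and the assumption that the points are in general position, i.e. no three collinear and no four cocircular, are my own; this is an illustration rather than a production implementation):

import numpy as np

def circumcircle(a, b, c):
    """Return (center, squared radius) of the circumcircle of triangle abc.
    Assumes the three points are not collinear."""
    ax, ay = a; bx, by = b; cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    center = np.array([ux, uy])
    return center, np.sum((np.asarray(a) - center) ** 2)

def bowyer_watson(points):
    """Delaunay triangulation of 2-D points; returns triangles as index triples."""
    pts = [np.asarray(p, dtype=float) for p in points]
    n = len(pts)
    # 1. Super-triangle large enough to contain every input point.
    m = max(float(np.max(np.abs(np.vstack(pts)))), 1.0)
    pts += [np.array([-20 * m, -m]), np.array([20 * m, -m]), np.array([0.0, 20 * m])]
    triangles = [(n, n + 1, n + 2)]
    # 2. Insert the points one at a time.
    for i in range(n):
        p = pts[i]
        bad = []
        for t in triangles:
            center, r2 = circumcircle(pts[t[0]], pts[t[1]], pts[t[2]])
            if np.sum((p - center) ** 2) < r2:      # circumcircle contains the new point
                bad.append(t)
        # Boundary of the cavity: edges that belong to exactly one "influence" triangle.
        counts = {}
        for t in bad:
            for e in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0])):
                key = tuple(sorted(e))
                counts[key] = counts.get(key, 0) + 1
        boundary = [e for e, c in counts.items() if c == 1]
        # Remove the influence triangles and connect the new point to the cavity boundary.
        triangles = [t for t in triangles if t not in bad]
        triangles += [(e[0], e[1], i) for e in boundary]
    # 3. Discard triangles that still use a super-triangle vertex.
    return [t for t in triangles if max(t) < n]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    print(bowyer_watson(rng.random((20, 2))))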

Monday, February 20, 2012

Hierarchical clustering

http://en.wikipedia.org/wiki/Hierarchical_clustering

Hierarchical clustering

From Wikipedia, the free encyclopedia
In statistics, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:
  • Agglomerative: This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
  • Divisive: This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.
In the general case, the complexity of agglomerative clustering is \mathcal{O}(n^3), which makes it too slow for large data sets. Divisive clustering with an exhaustive search is \mathcal{O}(2^n), which is even worse. However, for some special cases, optimal efficient agglomerative methods (of complexity \mathcal{O}(n^2)) are known: SLINK[1] for single-linkage and CLINK[2] for complete-linkage clustering.


Cluster dissimilarity

In order to decide which clusters should be combined (for agglomerative), or where a cluster should be split (for divisive), a measure of dissimilarity between sets of observations is required. In most methods of hierarchical clustering, this is achieved by use of an appropriate metric (a measure of distance between pairs of observations), and a linkage criterion which specifies the dissimilarity of sets as a function of the pairwise distances of observations in the sets.

Metric

The choice of an appropriate metric will influence the shape of the clusters, as some elements may be close to one another according to one distance and farther away according to another. For example, in a 2-dimensional space, the distance between the point (1,0) and the origin (0,0) is always 1 according to the usual norms, but the distance between the point (1,1) and the origin (0,0) can be 2, \sqrt{2} or 1 under Manhattan distance, Euclidean distance or maximum distance respectively.
Some commonly used metrics for hierarchical clustering are:[3]
Euclidean distance: \|a-b\|_2 = \sqrt{\sum_i (a_i-b_i)^2}
Squared Euclidean distance: \|a-b\|_2^2 = \sum_i (a_i-b_i)^2
Manhattan distance: \|a-b\|_1 = \sum_i |a_i-b_i|
Maximum distance: \|a-b\|_\infty = \max_i |a_i-b_i|
Mahalanobis distance: \sqrt{(a-b)^{\top} S^{-1} (a-b)}, where S is the covariance matrix
Cosine similarity: \frac{a \cdot b}{\|a\| \|b\|}
For text or other non-numeric data, metrics such as the Hamming distance or Levenshtein distance are often used.
A review of cluster analysis in health psychology research found that the most common distance measure in published studies in that research area is the Euclidean distance or the squared Euclidean distance.[citation needed]
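For concreteness, the metrics in the table above can be written directly in NumPy (a and b are 1-D arrays, S a covariance matrix; note that the last entry is a similarity rather than a distance):

import numpy as np

def euclidean(a, b):         return np.sqrt(np.sum((a - b) ** 2))
def sq_euclidean(a, b):      return np.sum((a - b) ** 2)
def manhattan(a, b):         return np.sum(np.abs(a - b))
def maximum_distance(a, b):  return np.max(np.abs(a - b))
def mahalanobis(a, b, S):    return np.sqrt((a - b) @ np.linalg.inv(S) @ (a - b))
def cosine_similarity(a, b): return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# The example from the text: the point (1,1) versus the origin under three norms.
p, o = np.array([1.0, 1.0]), np.zeros(2)
print(manhattan(p, o), euclidean(p, o), maximum_distance(p, o))   # 2.0, 1.414..., 1.0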

Linkage criteria

The linkage criterion determines the distance between sets of observations as a function of the pairwise distances between observations.
Some commonly used linkage criteria between two sets of observations A and B are:[4][5]
Maximum or complete-linkage clustering: \max \,\{\, d(a,b) : a \in A,\, b \in B \,\}
Minimum or single-linkage clustering: \min \,\{\, d(a,b) : a \in A,\, b \in B \,\}
Mean or average linkage clustering (UPGMA): \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a,b)
Minimum energy clustering: \frac{2}{nm} \sum_{i,j=1}^{n,m} \|a_i - b_j\|_2 - \frac{1}{n^2} \sum_{i,j=1}^{n} \|a_i - a_j\|_2 - \frac{1}{m^2} \sum_{i,j=1}^{m} \|b_i - b_j\|_2
where d is the chosen metric; a code sketch of the first three criteria appears after the following list. Other linkage criteria include:
  • The sum of all intra-cluster variance.
  • The increase in variance for the cluster being merged (Ward's criterion).[6]
  • The probability that candidate clusters spawn from the same distribution function (V-linkage).
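A sketch of the first three linkage criteria, with A and B given as arrays of observations and d as the chosen metric (Euclidean here, purely for illustration):

import numpy as np

def pairwise(A, B, d):
    """All pairwise distances d(a, b) for a in A and b in B."""
    return np.array([[d(a, b) for b in B] for a in A])

def complete_linkage(A, B, d): return pairwise(A, B, d).max()
def single_linkage(A, B, d):   return pairwise(A, B, d).min()
def average_linkage(A, B, d):  return pairwise(A, B, d).mean()   # UPGMA

euclid = lambda a, b: np.sqrt(np.sum((a - b) ** 2))
A = np.array([[0.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 0.0], [4.0, 0.0]])
print(single_linkage(A, B, euclid), complete_linkage(A, B, euclid), average_linkage(A, B, euclid))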

Discussion

Hierarchical clustering has the distinct advantage that any valid measure of distance can be used. In fact, the observations themselves are not required: all that is used is a matrix of distances.

Example for Agglomerative Clustering

For example, suppose this data is to be clustered, and the Euclidean distance is the distance metric.
[Figure: raw data]
The hierarchical clustering dendrogram would be as such:
[Figure: traditional dendrogram representation]
Cutting the tree at a given height will give a partitioning clustering at a selected precision. In this example, cutting after the second row of the dendrogram will yield clusters {a} {b c} {d e} {f}. Cutting after the third row will yield clusters {a} {b c} {d e f}, which is a coarser clustering, with a smaller number of larger clusters.
This method builds the hierarchy from the individual elements by progressively merging clusters. In our example, we have six elements {a} {b} {c} {d} {e} and {f}. The first step is to determine which elements to merge in a cluster. Usually, we want to take the two closest elements, according to the chosen distance.
Optionally, one can also construct a distance matrix at this stage, where the number in the i-th row j-th column is the distance between the i-th and j-th elements. Then, as clustering progresses, rows and columns are merged as the clusters are merged and the distances updated. This is a common way to implement this type of clustering, and has the benefit of caching distances between clusters. A simple agglomerative clustering algorithm is described in the single-linkage clustering page; it can easily be adapted to different types of linkage (see below).
Suppose we have merged the two closest elements b and c. We now have the following clusters {a}, {b, c}, {d}, {e} and {f}, and want to merge them further. To do that, we need to take the distance between {a} and {b c}, and therefore define the distance between two clusters. Usually the distance between two clusters \mathcal{A} and \mathcal{B} is one of the following:
  • The maximum distance between elements of each cluster (also called complete-linkage clustering): \max \{\, d(x,y) : x \in \mathcal{A},\, y \in \mathcal{B}\,\}.
  • The minimum distance between elements of each cluster (also called single-linkage clustering): \min \{\, d(x,y) : x \in \mathcal{A},\, y \in \mathcal{B} \,\}.
  • The mean distance between elements of each cluster (also called average linkage clustering, used e.g. in UPGMA): {1 \over {|\mathcal{A}|\cdot|\mathcal{B}|}}\sum_{x \in \mathcal{A}}\sum_{ y \in \mathcal{B}} d(x,y).
  • The sum of all intra-cluster variance.
  • The increase in variance for the cluster being merged (Ward's method[6]).
  • The probability that candidate clusters spawn from the same distribution function (V-linkage).
Each agglomeration occurs at a greater distance between clusters than the previous agglomeration, and one can decide to stop clustering either when the clusters are too far apart to be merged (distance criterion) or when there is a sufficiently small number of clusters (number criterion).
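As a short illustration of this agglomerative procedure on six elements labelled a to f, the sketch below uses SciPy's scipy.cluster.hierarchy routines (SciPy is not mentioned in the article, and the 2-D coordinates are invented so that b,c and d,e,f form natural groups):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

labels = list("abcdef")
# Hypothetical coordinates: b and c close together, d, e, f forming another group.
X = np.array([[0.0, 0.0],   # a
              [4.0, 4.0],   # b
              [4.2, 4.1],   # c
              [9.0, 0.0],   # d
              [9.1, 0.2],   # e
              [8.0, 1.5]])  # f

Z = linkage(X, method="single", metric="euclidean")   # agglomerative merges, closest pair first
# Cut the tree with a distance criterion: stop merging once clusters are farther apart than 2.0.
assignments = fcluster(Z, t=2.0, criterion="distance")
for lab, cl in zip(labels, assignments):
    print(lab, "-> cluster", cl)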

Software

Free

Commercial

See also

Notes

  1. ^ R. Sibson (1973). "SLINK: an optimally efficient algorithm for the single-link cluster method". The Computer Journal (British Computer Society) 16 (1): 30–34.
  2. ^ D. Defays (1977). "An efficient algorithm for a complete link method". The Computer Journal (British Computer Society) 20 (4): 364–366.
  3. ^ "The DISTANCE Procedure: Proximity Measures". SAS/STAT 9.2 Users Guide. SAS Institute. Retrieved 2009-04-26.
  4. ^ "The CLUSTER Procedure: Clustering Methods". SAS/STAT 9.2 Users Guide. SAS Institute. Retrieved 2009-04-26.
  5. ^ Székely, G. J. and Rizzo, M. L. (2005) Hierarchical clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method, Journal of Classification 22, 151-183.
  6. ^ a b Ward, Joe H. (1963). "Hierarchical Grouping to Optimize an Objective Function". Journal of the American Statistical Association 58 (301): 236–244. doi:10.2307/2282967. JSTOR 2282967. MR0148188.
  7. ^ Fernández, Alberto; Gómez, Sergio (2008). "Solving Non-uniqueness in Agglomerative Hierarchical Clustering Using Multidendrograms". Journal of Classification 25: 43–65. doi:10.1007/s00357-008-9004-x.

References and further reading
