
Friday, April 12, 2013

ctags

[Preface]
    ctags is an indexing tool that works hand in hand with vim and can greatly speed up C/C++ programming, for example by jumping quickly to the definition of a function or variable. For the more advanced features, consult the documentation.

[Goal]
    Install and configure ctags for vim on RHEL 5.

[Prerequisites]
    vim is already installed.

[Installing ctags]
    1. First check whether ctags is already installed:
[root@ericsson:~]#which ctags
/usr/bin/ctags
    If you get output like the above, ctags is already installed and you can skip the steps below.
    If the command is not found, ctags is most likely not installed yet. Install it as follows:
    1. Download ctags from http://ctags.sourceforge.net
    2. Upload it to the Linux machine and unpack it; assume the directory is /tmp/ctags
    3. Make sure /tmp/ctags/configure is executable, then run it: /tmp/ctags/configure
    4. Run: make; make install
    If everything goes smoothly, these four steps will install ctags successfully.

[Configuring ctags]
    Assume the source code you want to index lives in /workspace/weather.
    Run the following to generate the tags file:

[root@ericsson:/workspace/weather]#pwd
/workspace/weather
[root@ericsson:/workspace/weather]#ls
dispacth.cpp  dispacth.h  main.cpp
[root@ericsson:/workspace/weather]#ctags -R *
[root@ericsson:/workspace/weather]#ls
dispacth.cpp  dispacth.h  main.cpp  tags
   
    vim reads three configuration files:
        system vimrc file: "/etc/vimrc"
        user vimrc file: "$HOME/.vimrc"
        user exrc file: "$HOME/.exrc"
    On my machine I configure vim globally, so I modified /etc/vimrc and appended the following line at the end:
set tags=/workspace/weather/tags


   
       If you have several tags files, separate them with commas (the tags files may share the same name). Once the tags variable is set, use it as follows:

[root@ericsson:/workspace/weather]#ls
a.out  dispacth.cpp  dispacth.h  main.cpp  tags
[root@ericsson:/workspace/weather]#vi -t ttcs
    ttcs is a function defined in main.cpp. If ctags is installed and configured correctly, vim opens directly at the function's definition; if there are several matching tags, a selection list is shown. If you get "E257: cstag: tag not found", go back to the source directory and rerun: ctags -R *

How to use cscope and ctags

(assuming cscope, ctags and vim are all installed;
check with:
which cscope)

First, generate database files:

1.
find . -name "*.h" -o -name "*.c"  > cscope.files
cscope -bkq -i cscope.files
2.
ctags -R *

Second, modify ~/.vimrc

add:

set tags=tags;                              (you may need to point to the path explicitly)
set autochdir

Third,

in vim, run:
 :cscope add cscope.out

Wednesday, February 22, 2012

Vector quantization

 http://en.wikipedia.org/wiki/Vector_quantization

Vector quantization

From Wikipedia, the free encyclopedia
Vector quantization is a classical quantization technique from signal processing which allows the modeling of probability density functions by the distribution of prototype vectors. It was originally used for data compression. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms.
The density matching property of vector quantization is powerful, especially for identifying the density of large and high-dimensioned data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error, and rare data high error. This is why VQ is suitable for lossy data compression. It can also be used for lossy data correction and density estimation.
Vector quantization is based on the competitive learning paradigm, so it is closely related to the self-organizing map model.


Training

A simple training algorithm for vector quantization is:
  1. Pick a sample point at random
  2. Move the nearest quantization vector centroid towards this sample point, by a small fraction of the distance
  3. Repeat
A more sophisticated algorithm reduces the bias in the density matching estimation, and ensures that all points are used, by including an extra sensitivity parameter:
  1. Increase each centroid's sensitivity by a small amount
  2. Pick a sample point at random
  3. Find the quantization vector centroid with the smallest (distance − sensitivity)
    1. Move the chosen centroid toward the sample point by a small fraction of the distance
    2. Set the chosen centroid's sensitivity to zero
  4. Repeat
It is desirable to use a cooling schedule to produce convergence: see Simulated annealing.
The algorithm can be iteratively updated with 'live' data, rather than by picking random points from a data set, but this will introduce some bias if the data is temporally correlated over many samples.
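To make the simple training loop above concrete, here is a small C++ sketch of competitive-learning VQ training. The function names (dist2, nearest, trainVQ), the fixed random seed, and the step size eps are illustrative choices of mine, not part of the article:

#include <vector>
#include <random>
#include <cstddef>

using Vec = std::vector<double>;

// Squared Euclidean distance between two vectors of equal length.
double dist2(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Index of the codebook vector (centroid) closest to x.
std::size_t nearest(const std::vector<Vec>& codebook, const Vec& x) {
    std::size_t best = 0;
    for (std::size_t k = 1; k < codebook.size(); ++k)
        if (dist2(codebook[k], x) < dist2(codebook[best], x)) best = k;
    return best;
}

// Simple competitive-learning VQ training:
//   1. pick a sample at random,
//   2. move the nearest centroid a small fraction (eps) of the way toward it,
//   3. repeat.
void trainVQ(std::vector<Vec>& codebook, const std::vector<Vec>& data,
             int iterations, double eps = 0.05) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    for (int it = 0; it < iterations; ++it) {
        const Vec& x = data[pick(rng)];
        Vec& c = codebook[nearest(codebook, x)];
        for (std::size_t i = 0; i < c.size(); ++i)
            c[i] += eps * (x[i] - c[i]);          // step toward the sample
    }
}

The sensitivity-based variant described above differs only in that nearest() would subtract each centroid's sensitivity from its distance and the sensitivities would be updated on every iteration.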

Applications

Vector quantization is used for lossy data compression, lossy data correction and density estimation.
Lossy data correction, or prediction, is used to recover data missing from some dimensions. It is done by finding the nearest group with the data dimensions available, then predicting the result based on the values for the missing dimensions, assuming that they will have the same value as the group's centroid.
For density estimation, the area/volume that is closer to a particular centroid than to any other is inversely proportional to the density (due to the density matching property of the algorithm).

Use in data compression

Vector quantization, also called "block quantization" or "pattern matching quantization", is often used in lossy data compression. It works by encoding values from a multidimensional vector space into a finite set of values from a discrete subspace of lower dimension. A lower-space vector requires less storage space, so the data is compressed. Due to the density matching property of vector quantization, the compressed data have errors that are inversely proportional to their density.
The transformation is usually done by projection or by using a codebook. In some cases, a codebook can be also used to entropy code the discrete value in the same step, by generating a prefix coded variable-length encoded value as its output.
The set of discrete amplitude levels is quantized jointly rather than each sample being quantized separately. Consider a K-dimensional vector [x1,x2,...,xk] of amplitude levels. It is compressed by choosing the nearest matching vector from a set of N-dimensional vectors [y1,y2,...,yn].
All possible combinations of the N-dimensional vector [y1,y2,...,yn] form the vector space to which all the quantized vectors belong.
Block diagram: a simple vector quantizer. (Figure omitted.)

Only the index of the codeword in the codebook is sent instead of the quantized values. This conserves space and achieves more compression.
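A minimal sketch of what this looks like in code: the encoder stores or transmits only the index of the nearest codeword, and the decoder simply looks that codeword up again. The names vqEncode and vqDecode are hypothetical, not from the article:

#include <vector>
#include <cstddef>

using Vec = std::vector<double>;

// Encoder: replace a vector by the index of its nearest codeword.
std::size_t vqEncode(const std::vector<Vec>& codebook, const Vec& x) {
    std::size_t best = 0;
    double bestD = 0.0;
    for (std::size_t k = 0; k < codebook.size(); ++k) {
        double d = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i)
            d += (x[i] - codebook[k][i]) * (x[i] - codebook[k][i]);
        if (k == 0 || d < bestD) { best = k; bestD = d; }
    }
    return best;   // only this index is stored or transmitted
}

// Decoder: recover the (lossy) approximation from the index.
const Vec& vqDecode(const std::vector<Vec>& codebook, std::size_t index) {
    return codebook[index];
}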
Twin vector quantization (VQF) is part of the MPEG-4 standard dealing with time domain weighted interleaved vector quantization.

Video codecs based on vector quantization

and old versions of its spiritual successors:
All of which are superseded by the MPEG family.

Audio codecs based on vector quantization



Part of this article was originally based on material from the Free On-line Dictionary of Computing and is used with permission under the GFDL.



Delaunay Triangulation Algorithms

http://baike.baidu.com/view/1691145.htm

Delaunay Triangulation Algorithms

Contents
1. Definitions of triangulation and Delaunay triangulation
1.1 Definition of triangulation
1.2 Definition of Delaunay triangulation
1.3 Criteria for Delaunay triangulation
1.4 Properties of Delaunay triangulation
1.5 Local optimization procedure
2. Algorithms for Delaunay triangulation
2.1 The Lawson algorithm
2.2 The Bowyer-Watson algorithm

1. Definitions of triangulation and Delaunay triangulation

How to partition a set of scattered points into a non-uniform triangular mesh is the triangulation problem for scattered point sets. Triangulation of scattered points is an extremely important preprocessing step for both numerical analysis and computer graphics. (Figure omitted.)

1.1 Definition of triangulation

[Definition] Triangulation: Let V be a finite set of points in the two-dimensional real plane, let an edge e be a closed line segment whose endpoints are points of V, and let E be a set of such edges. A triangulation T = (V, E) of the point set V is a planar graph G satisfying:
  1. Apart from its endpoints, no edge of the graph contains any point of V.
  2. No two edges intersect.
  3. Every face of the planar graph is a triangle, and the union of all triangular faces is the convex hull of V.

1.2 Definition of Delaunay triangulation

The triangulation used most often in practice is the Delaunay triangulation, which is a special kind of triangulation. Start with Delaunay edges:
[Definition] Delaunay edge: Let e be an edge in E with endpoints a and b. e is called a Delaunay edge if there exists a circle passing through a and b whose interior (note: the open interior; at most three points of V may lie on the circle itself) contains no other point of V. This property is also called the empty circle property.
[Definition] Delaunay triangulation: If a triangulation T of the point set V contains only Delaunay edges, it is called a Delaunay triangulation.

1.3 Criteria for Delaunay triangulation

To satisfy the definition of a Delaunay triangulation, two important criteria must hold:
  1. Empty circle property: the Delaunay triangulation is unique (assuming no four points are concyclic), and the circumcircle of any triangle in the Delaunay mesh contains no other point of the set. (Figure omitted.)
  2. Max-min angle property: among all triangulations a scattered point set can form, the Delaunay triangulation maximizes the minimum angle of its triangles. In this sense the Delaunay mesh is the one "closest to being regular". Concretely, if the diagonal of the convex quadrilateral formed by two adjacent triangles is swapped, the minimum of the six interior angles does not increase. (Figure omitted.)

1.4 Properties of Delaunay triangulation

The Delaunay triangulation has the following desirable properties:
  1. Closest: each triangle is formed by three mutually nearest points, and no edges intersect.
  2. Uniqueness: no matter where in the region the construction starts, the final result is the same.
  3. Optimality: if the diagonal of the convex quadrilateral formed by any two adjacent triangles is swapped, the smallest of the six interior angles does not increase.
  4. Most regular: if the minimum angle of every triangle in a mesh is sorted in ascending order, the Delaunay triangulation yields the largest such sequence.
  5. Locality: inserting, deleting, or moving a vertex affects only the neighbouring triangles.
  6. Convex hull: the outer boundary of the triangulation forms a convex polygonal hull.

1.5 Local optimization procedure (LOP)

In theory, to construct a Delaunay triangulation, Lawson proposed the Local Optimization Procedure (LOP). In general, a triangulation processed with LOP becomes a Delaunay triangulation. The basic procedure is:
  1. Merge two triangles that share an edge into one quadrilateral.
  2. Apply the empty circumcircle check: test whether the fourth vertex lies inside the circumcircle of the triangle.
  3. If it does, swap the diagonal of the quadrilateral; this completes one local optimization step.
The LOP process is illustrated in the original article. (Figure omitted.)
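The heart of LOP is this empty-circumcircle test. Below is a small C++ sketch of the test and of the flip decision for two triangles abc and abd sharing the edge ab; it uses the standard in-circle determinant, assumes a, b, c are given in counter-clockwise order, and the helper names are illustrative:

struct Pt { double x, y; };

// Standard in-circumcircle determinant. For a counter-clockwise triangle
// (a, b, c), the result is > 0 exactly when d lies inside its circumcircle.
bool inCircumcircle(Pt a, Pt b, Pt c, Pt d) {
    double ax = a.x - d.x, ay = a.y - d.y;
    double bx = b.x - d.x, by = b.y - d.y;
    double cx = c.x - d.x, cy = c.y - d.y;
    double det = (ax * ax + ay * ay) * (bx * cy - cx * by)
               - (bx * bx + by * by) * (ax * cy - cx * ay)
               + (cx * cx + cy * cy) * (ax * by - bx * ay);
    return det > 0.0;
}

// LOP step for triangles abc and abd sharing edge ab: if d lies inside the
// circumcircle of abc, the diagonal ab should be replaced by cd (the "swap"
// in step 3 above).
bool shouldFlipDiagonal(Pt a, Pt b, Pt c, Pt d) {
    return inCircumcircle(a, b, c, d);
}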

2. Algorithms for Delaunay triangulation

Delaunay triangulation is a criterion that a triangulation may satisfy; there are many algorithms for constructing one.

2.1 The Lawson algorithm

The point-by-point insertion algorithm of Lawson was proposed in 1977. Its idea is simple and easy to program. Basic principle: first build a large triangle or polygon that encloses all of the data points, then insert a point into it; connect the point to the three vertices of the triangle that contains it, forming three new triangles; then apply the empty circumcircle test to each of them, and optimize with Lawson's Local Optimization Procedure (LOP), i.e. swap diagonals so that the resulting mesh is a Delaunay triangulation.
This insertion-based construction is theoretically rigorous, produces a unique result, and yields a mesh satisfying the empty circle property, so it is quite satisfactory. Because the mesh is built one point at a time, whenever a non-Delaunay edge is encountered it can be removed and adjusted to form a new Delaunay edge. Once the mesh is built, adding a new point does not require re-triangulating all points; only the triangles influenced by the new point need to be re-meshed locally, and this local re-meshing is simple. Likewise, deleting or moving a point can be done quickly and dynamically. In practice, however, the construction becomes slow when the point set is large, and if the region of the point set is non-convex or contains inner rings, illegal triangles can be produced.
For example, when the scattered points form a ring, the Lawson algorithm can produce illegal triangles. (Figures omitted: the scattered point set; the correct triangulation; the triangulation produced by the Lawson algorithm.)

2.2 The Bowyer-Watson algorithm

The basic steps of the Bowyer-Watson algorithm are:
  1. Construct a super-triangle that contains all of the scattered points and put it into the triangle list.
  2. Insert the scattered points one at a time: find the triangles in the list whose circumcircle contains the inserted point (the "influenced" triangles of that point), delete the common edges of the influenced triangles, and connect the inserted point to every vertex of the influenced triangles. This completes the insertion of one point into the Delaunay triangle list.
  3. Optimize the newly formed local triangles according to the optimization criterion and put them into the Delaunay triangle list.
  4. Repeat step 2 until all scattered points have been inserted.
The key step 2 is illustrated in the original article. (Figure omitted.)
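The following self-contained C++ sketch follows these four steps directly (super-triangle, influenced triangles found via the circumcircle test, re-triangulation of the hole, cleanup). It is a naive, roughly quadratic version with no robustness handling, it assumes the points are in general position, and all names are illustrative:

#include <vector>
#include <cmath>
#include <algorithm>

struct Pt  { double x, y; };
struct Tri { int a, b, c; };          // indices into the point array
struct Edge { int u, v; };

static bool sameEdge(const Edge& e1, const Edge& e2) {
    return (e1.u == e2.u && e1.v == e2.v) || (e1.u == e2.v && e1.v == e2.u);
}

// True if p lies inside the circumcircle of triangle (a, b, c).
static bool inCircumcircle(const Pt& a, const Pt& b, const Pt& c, const Pt& p) {
    double d = 2.0 * (a.x*(b.y - c.y) + b.x*(c.y - a.y) + c.x*(a.y - b.y));
    double ux = ((a.x*a.x + a.y*a.y)*(b.y - c.y) + (b.x*b.x + b.y*b.y)*(c.y - a.y)
               + (c.x*c.x + c.y*c.y)*(a.y - b.y)) / d;
    double uy = ((a.x*a.x + a.y*a.y)*(c.x - b.x) + (b.x*b.x + b.y*b.y)*(a.x - c.x)
               + (c.x*c.x + c.y*c.y)*(b.x - a.x)) / d;
    double r2 = (a.x - ux)*(a.x - ux) + (a.y - uy)*(a.y - uy);
    double d2 = (p.x - ux)*(p.x - ux) + (p.y - uy)*(p.y - uy);
    return d2 < r2;
}

// Bowyer-Watson incremental Delaunay triangulation (naive sketch).
std::vector<Tri> bowyerWatson(std::vector<Pt> pts) {
    // Step 1: super-triangle large enough to contain every input point.
    double m = 0;
    for (const Pt& p : pts) m = std::max({m, std::fabs(p.x), std::fabs(p.y)});
    int n = (int)pts.size();
    pts.push_back({-20*m - 10, -10*m - 10});
    pts.push_back({ 20*m + 10, -10*m - 10});
    pts.push_back({ 0,          20*m + 10});
    std::vector<Tri> tris = {{n, n + 1, n + 2}};

    // Steps 2-4: insert the points one by one.
    for (int i = 0; i < n; ++i) {
        std::vector<Tri> bad, keep;      // "influenced" triangles vs. the rest
        for (const Tri& t : tris)
            (inCircumcircle(pts[t.a], pts[t.b], pts[t.c], pts[i]) ? bad : keep).push_back(t);

        // Boundary of the hole: edges belonging to exactly one influenced triangle.
        std::vector<Edge> boundary;
        for (const Tri& t : bad) {
            Edge es[3] = {{t.a, t.b}, {t.b, t.c}, {t.c, t.a}};
            for (const Edge& e : es) {
                int count = 0;
                for (const Tri& t2 : bad) {
                    Edge es2[3] = {{t2.a, t2.b}, {t2.b, t2.c}, {t2.c, t2.a}};
                    for (const Edge& e2 : es2) if (sameEdge(e, e2)) ++count;
                }
                if (count == 1) boundary.push_back(e);
            }
        }

        // Re-triangulate the hole by connecting the new point to its boundary.
        tris = keep;
        for (const Edge& e : boundary) tris.push_back({e.u, e.v, i});
    }

    // Cleanup: drop triangles that still touch the super-triangle.
    std::vector<Tri> out;
    for (const Tri& t : tris)
        if (t.a < n && t.b < n && t.c < n) out.push_back(t);
    return out;
}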

Monday, February 20, 2012

Hierarchical clustering

http://en.wikipedia.org/wiki/Hierarchical_clustering

Hierarchical clustering

From Wikipedia, the free encyclopedia
In statistics, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:
  • Agglomerative: This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
  • Divisive: This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.
In the general case, the complexity of agglomerative clustering is O(n^3), which makes it too slow for large data sets. Divisive clustering with an exhaustive search is O(2^n), which is even worse. However, for some special cases, optimal efficient agglomerative methods (of complexity O(n^2)) are known: SLINK[1] for single-linkage and CLINK[2] for complete-linkage clustering.


Cluster dissimilarity

In order to decide which clusters should be combined (for agglomerative), or where a cluster should be split (for divisive), a measure of dissimilarity between sets of observations is required. In most methods of hierarchical clustering, this is achieved by use of an appropriate metric (a measure of distance between pairs of observations), and a linkage criterion which specifies the dissimilarity of sets as a function of the pairwise distances of observations in the sets.

Metric

The choice of an appropriate metric will influence the shape of the clusters, as some elements may be close to one another according to one distance and farther away according to another. For example, in a 2-dimensional space, the distance between the point (1,0) and the origin (0,0) is always 1 according to the usual norms, but the distance between the point (1,1) and the origin (0,0) can be 2, \sqrt{2} or 1 under Manhattan distance, Euclidean distance or maximum distance respectively.
Some commonly used metrics for hierarchical clustering are:[3]
Euclidean distance: \|a-b\|_2 = \sqrt{\sum_i (a_i-b_i)^2}
Squared Euclidean distance: \|a-b\|_2^2 = \sum_i (a_i-b_i)^2
Manhattan distance: \|a-b\|_1 = \sum_i |a_i-b_i|
Maximum distance: \|a-b\|_\infty = \max_i |a_i-b_i|
Mahalanobis distance: \sqrt{(a-b)^{\top} S^{-1} (a-b)}, where S is the covariance matrix
Cosine similarity: \frac{a \cdot b}{\|a\| \|b\|}
For text or other non-numeric data, metrics such as the Hamming distance or Levenshtein distance are often used.
A review of cluster analysis in health psychology research found that the most common distance measure in published studies in that research area is the Euclidean distance or the squared Euclidean distance.[citation needed]
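For concreteness, here are a few of the metrics from the table above written out in C++ (a sketch only; the Mahalanobis distance is omitted because it also needs the inverse covariance matrix S, and the function names are my own):

#include <vector>
#include <cmath>
#include <algorithm>
#include <cstddef>

using Vec = std::vector<double>;

double euclidean(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

double manhattan(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += std::fabs(a[i] - b[i]);
    return s;
}

double maximumDistance(const Vec& a, const Vec& b) {
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) m = std::max(m, std::fabs(a[i] - b[i]));
    return m;
}

double cosineSimilarity(const Vec& a, const Vec& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];  na += a[i] * a[i];  nb += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}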

Linkage criteria

The linkage criterion determines the distance between sets of observations as a function of the pairwise distances between observations.
Some commonly used linkage criteria between two sets of observations A and B are:[4][5]
Maximum or complete linkage clustering: \max \, \{\, d(a,b) : a \in A,\, b \in B \,\}
Minimum or single-linkage clustering: \min \, \{\, d(a,b) : a \in A,\, b \in B \,\}
Mean or average linkage clustering (UPGMA): \frac{1}{|A| |B|} \sum_{a \in A} \sum_{b \in B} d(a,b)
Minimum energy clustering: \frac{2}{nm} \sum_{i,j=1}^{n,m} \|a_i - b_j\|_2 - \frac{1}{n^2} \sum_{i,j=1}^{n} \|a_i - a_j\|_2 - \frac{1}{m^2} \sum_{i,j=1}^{m} \|b_i - b_j\|_2
where d is the chosen metric. Other linkage criteria include:
  • The sum of all intra-cluster variance.
  • The increase in variance for the cluster being merged (Ward's criterion).[6]
  • The probability that candidate clusters spawn from the same distribution function (V-linkage).

Discussion

Hierarchical clustering has the distinct advantage that any valid measure of distance can be used. In fact, the observations themselves are not required: all that is used is a matrix of distances.

Example for Agglomerative Clustering

For example, suppose this data is to be clustered, and the Euclidean distance is the distance metric.
Cutting the tree at a given height will give a partitioning clustering at a selected precision. In this example, cutting after the second row will yield clusters {a} {b c} {d e} {f}. Cutting after the third row will yield clusters {a} {b c} {d e f}, which is a coarser clustering, with a smaller number of larger clusters.
(Figure omitted: the raw data.)
The hierarchical clustering dendrogram would be as such:
(Figure omitted: traditional dendrogram representation.)
This method builds the hierarchy from the individual elements by progressively merging clusters. In our example, we have six elements {a} {b} {c} {d} {e} and {f}. The first step is to determine which elements to merge in a cluster. Usually, we want to take the two closest elements, according to the chosen distance.
Optionally, one can also construct a distance matrix at this stage, where the number in the i-th row j-th column is the distance between the i-th and j-th elements. Then, as clustering progresses, rows and columns are merged as the clusters are merged and the distances updated. This is a common way to implement this type of clustering, and has the benefit of caching distances between clusters. A simple agglomerative clustering algorithm is described in the single-linkage clustering page; it can easily be adapted to different types of linkage (see below).
Suppose we have merged the two closest elements b and c; we now have the following clusters {a}, {b, c}, {d}, {e} and {f}, and want to merge them further. To do that, we need the distance between {a} and {b c}, and therefore must define the distance between two clusters. Usually the distance between two clusters \mathcal{A} and \mathcal{B} is one of the following:
  • The maximum distance between elements of each cluster (also called complete-linkage clustering):
 \max \{\, d(x,y) : x \in \mathcal{A},\, y \in \mathcal{B}\,\}.
  • The minimum distance between elements of each cluster (also called single-linkage clustering):
 \min \{\, d(x,y) : x \in \mathcal{A},\, y \in \mathcal{B} \,\}.
  • The mean distance between elements of each cluster (also called average linkage clustering, used e.g. in UPGMA):
 {1 \over {|\mathcal{A}|\cdot|\mathcal{B}|}}\sum_{x \in \mathcal{A}}\sum_{ y \in \mathcal{B}} d(x,y).
  • The sum of all intra-cluster variance.
  • The increase in variance for the cluster being merged (Ward's method[6])
  • The probability that candidate clusters spawn from the same distribution function (V-linkage).
Each agglomeration occurs at a greater distance between clusters than the previous agglomeration, and one can decide to stop clustering either when the clusters are too far apart to be merged (distance criterion) or when there is a sufficiently small number of clusters (number criterion).
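The procedure just described can be written as a short, naive O(n^3) C++ sketch using single linkage (swapping std::min for std::max in the inner loop gives complete linkage). It assumes a precomputed pairwise distance matrix, stops at a chosen number of clusters rather than building the full dendrogram, and all names are illustrative:

#include <vector>
#include <cstddef>
#include <limits>
#include <algorithm>

// Naive agglomerative clustering with single linkage. `dist` is a full
// pairwise distance matrix; clusters are merged until `targetClusters`
// remain, and each cluster is returned as a set of point indices.
std::vector<std::vector<std::size_t>>
agglomerate(const std::vector<std::vector<double>>& dist, std::size_t targetClusters) {
    std::size_t n = dist.size();
    std::vector<std::vector<std::size_t>> clusters(n);
    for (std::size_t i = 0; i < n; ++i) clusters[i] = {i};   // each point starts alone

    while (clusters.size() > targetClusters) {
        std::size_t bestA = 0, bestB = 1;
        double best = std::numeric_limits<double>::infinity();

        // Find the pair of clusters with the smallest single-linkage distance,
        // i.e. the minimum pairwise distance between their members.
        for (std::size_t a = 0; a < clusters.size(); ++a)
            for (std::size_t b = a + 1; b < clusters.size(); ++b) {
                double d = std::numeric_limits<double>::infinity();
                for (std::size_t i : clusters[a])
                    for (std::size_t j : clusters[b])
                        d = std::min(d, dist[i][j]);
                if (d < best) { best = d; bestA = a; bestB = b; }
            }

        // Merge the two closest clusters.
        clusters[bestA].insert(clusters[bestA].end(),
                               clusters[bestB].begin(), clusters[bestB].end());
        clusters.erase(clusters.begin() + bestB);
    }
    return clusters;
}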


Notes

  1. ^ R. Sibson (1973). "SLINK: an optimally efficient algorithm for the single-link cluster method". The Computer Journal (British Computer Society) 16 (1): 30–34.
  2. ^ D. Defays (1977). "An efficient algorithm for a complete link method". The Computer Journal (British Computer Society) 20 (4): 364–366.
  3. ^ "The DISTANCE Procedure: Proximity Measures". SAS/STAT 9.2 Users Guide. SAS Institute. Retrieved 2009-04-26.
  4. ^ "The CLUSTER Procedure: Clustering Methods". SAS/STAT 9.2 Users Guide. SAS Institute. Retrieved 2009-04-26.
  5. ^ Székely, G. J. and Rizzo, M. L. (2005) Hierarchical clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method, Journal of Classification 22, 151-183.
  6. ^ a b Ward, Joe H. (1963). "Hierarchical Grouping to Optimize an Objective Function". Journal of the American Statistical Association 58 (301): 236–244. doi:10.2307/2282967. JSTOR 2282967. MR0148188.
  7. ^ Fernández, Alberto; Gómez, Sergio (2008). "Solving Non-uniqueness in Agglomerative Hierarchical Clustering Using Multidendrograms". Journal of Classification 25: 43–65. doi:10.1007/s00357-008-9004-x.


Friday, November 11, 2011

Reinforcement learning

In reinforcement learning, the learning agent learns, through training, trial and error, and feedback, the best policy for achieving a goal in its environment. We do not tell the agent directly what to do or which action to take; instead, the agent discovers for itself which actions yield the most reward by trying them.
Reinforcement learning has four components: a policy, a reward function, a value function, and a model of the environment.
Designing a reinforcement learning algorithm raises three questions:
       1. How to represent the state space and the action space.
       2. How to choose the reinforcement signal and how to update the values of different state-action pairs through learning.
       3. How to select suitable actions based on those values.
       When reinforcement learning is applied to robot navigation in unknown environments, these problems become harder because of the complexity and uncertainty of the environment.
Reinforcement learning means learning a mapping from environment states to actions so that the cumulative reward the actions obtain from the environment is maximized. Unlike supervised learning, which tells the learner which behaviour to take by means of positive and negative examples, reinforcement learning discovers the optimal behaviour policy through trial and error. Commonly used reinforcement learning algorithms include TD (Temporal Difference) learning, Q-learning, and Sarsa.
In standard reinforcement learning the agent, acting as the learning system, obtains the current state s of the external environment, takes a tentative action u on the environment, and receives the environment's evaluation r of that action (the reward) together with the new environment state. If an action u leads to a positive reward (immediate payoff), the agent's tendency to take that action later is strengthened; otherwise it is weakened. Through repeated interaction between the learning system's control behaviour and the environment's state feedback and evaluation, the mapping from states to actions (the policy) is continually revised by learning so as to optimize the system's performance.
Hence the goal of reinforcement learning is to learn a mapping from environment states to actions such that the actions chosen by the agent obtain the maximum reward from the environment, i.e. such that the external environment's evaluation of the learning system (or the overall system performance) is, in some sense, the best possible.
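As a concrete example of the algorithms named above, the tabular Q-learning update rule Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)) can be sketched in a few lines of C++. The class name, the default alpha and gamma values, and the omission of an exploration strategy are illustrative assumptions, not part of the text above:

#include <vector>
#include <algorithm>

// Tabular Q-learning update:
//   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
struct QLearner {
    std::vector<std::vector<double>> Q;   // Q[state][action]
    double alpha;                         // learning rate
    double gamma;                         // discount factor

    QLearner(int numStates, int numActions, double alpha_ = 0.1, double gamma_ = 0.9)
        : Q(numStates, std::vector<double>(numActions, 0.0)),
          alpha(alpha_), gamma(gamma_) {}

    // One learning step after observing (state s, action a, reward r, next state s').
    void update(int s, int a, double r, int sNext) {
        double bestNext = *std::max_element(Q[sNext].begin(), Q[sNext].end());
        Q[s][a] += alpha * (r + gamma * bestNext - Q[s][a]);
    }

    // Greedy action for state s (exploration, e.g. epsilon-greedy, is omitted here).
    int bestAction(int s) const {
        return int(std::max_element(Q[s].begin(), Q[s].end()) - Q[s].begin());
    }
};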

http://en.wikipedia.org/wiki/Reinforcement_learning

Sunday, October 30, 2011

Rectangle intersection algorithm

Assume each rectangle is represented by a pair of points (minx, miny), (maxx, maxy).
Then for two rectangles rect1{(minx1,miny1)(maxx1, maxy1)} and rect2{(minx2,miny2)(maxx2, maxy2)}:

If they intersect, the result is necessarily a rectangle; the point pair of the intersection rectangle rect{(minx,miny)(maxx, maxy)} is given by:
minx = max(minx1, minx2)
miny = max(miny1, miny2)
maxx = min(maxx1, maxx2)
maxy = min(maxy1, maxy2)

If the two rectangles do not intersect, the computed coordinates will necessarily satisfy
minx > maxx
or
miny > maxy

Both the intersection test and the intersection rectangle itself can therefore be obtained from this single computation.
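The same computation written out as a small C++ sketch (the struct and function names are my own, illustrative choices):

#include <algorithm>

struct Rect {
    double minx, miny, maxx, maxy;
};

// Computes the intersection of r1 and r2. Returns false (and leaves `out`
// unspecified) when the rectangles do not overlap.
bool intersect(const Rect& r1, const Rect& r2, Rect& out) {
    out.minx = std::max(r1.minx, r2.minx);
    out.miny = std::max(r1.miny, r2.miny);
    out.maxx = std::min(r1.maxx, r2.maxx);
    out.maxy = std::min(r1.maxy, r2.maxy);
    // Non-overlap shows up as an "inverted" rectangle.
    return out.minx <= out.maxx && out.miny <= out.maxy;
}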