Monday, April 18, 2011

False sharing

 http://en.wikipedia.org/wiki/False_sharing

In computer science, false sharing is a performance-degrading usage pattern that can arise in systems with distributed, coherent caches, at the granularity of the smallest resource block managed by the caching mechanism.

When a system participant attempts to periodically access data that will never be altered by another party, but that data shares a cache block with data that is altered, the caching protocol may force the first participant to reload the whole unit despite a lack of logical necessity. 

The caching system is unaware of activity within this block and forces the first participant to bear the caching system overhead required by true shared access of a resource.

By far the most common usage of this term is in modern multiprocessor CPU caches, where memory is cached in lines whose size is a small power of two (e.g., 64 aligned, contiguous bytes). If two processors operate on independent data that fall within the same line, the cache coherency mechanism may force the whole line across the bus or interconnect on every write, causing memory stalls in addition to wasting system bandwidth. False sharing is an inherent artifact of automatically synchronized cache protocols and can also exist in environments such as distributed file systems or databases, but its current prevalence is limited to RAM caches.
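To make the "same line" condition concrete, here is a minimal sketch that checks whether two addresses map to the same cache line. The 64-byte line size is an assumption taken from the example figure above (real hardware may differ), and `kCacheLineSize`, `cache_line_index`, `same_cache_line` and `pointers_share_line` are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed line size (64 bytes, as in the text); check your CPU for the real value.
constexpr std::size_t kCacheLineSize = 64;

// Index of the cache line that contains the given address.
constexpr std::uintptr_t cache_line_index(std::uintptr_t addr)
{
  return addr / kCacheLineSize;
}

// True if both addresses fall in the same line, so a write to one
// can invalidate other processors' cached copies of the other.
constexpr bool same_cache_line(std::uintptr_t a, std::uintptr_t b)
{
  return cache_line_index(a) == cache_line_index(b);
}

// Convenience wrapper for object addresses, e.g. two struct members.
inline bool pointers_share_line(const volatile void* a, const volatile void* b)
{
  return same_cache_line(reinterpret_cast<std::uintptr_t>(a),
                         reinterpret_cast<std::uintptr_t>(b));
}
```

With these assumptions, addresses 0 and 63 share a line while 63 and 64 do not, so two adjacent 4-byte integers will usually (though not always) end up in the same line.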

Example

struct foo
{
  volatile int x;
  volatile int y;
};
 
foo f;
 
int sum_a()
{
  int s = 0;
  for (int i = 0; i < 1000000; ++i)
    s += f.x;
  return s;
}
 
void inc_b()
{
  for (int i = 0; i < 1000000; ++i)
    ++f.y;
}
Here, sum_a may need to continually re-read x from main memory (instead of from cache) even though inc_b's modification of y should be irrelevant.
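One common way to avoid the problem in this example is to place x and y on separate cache lines. The sketch below assumes a 64-byte line and a C++11 compiler with `alignas`; `padded_foo` and `kCacheLineSize` are hypothetical names and not part of the original example.

```cpp
#include <cstddef>

// Assumed line size (64 bytes); the real value depends on the CPU.
constexpr std::size_t kCacheLineSize = 64;

struct padded_foo
{
  // Each member starts on its own cache-line boundary, so a write
  // to y no longer invalidates the line that holds x.
  alignas(kCacheLineSize) volatile int x;
  alignas(kCacheLineSize) volatile int y;
};

// The struct inherits the strictest member alignment and grows to two lines.
static_assert(alignof(padded_foo) == kCacheLineSize, "struct is line-aligned");
static_assert(sizeof(padded_foo) == 2 * kCacheLineSize, "one line per member");
```

The trade-off is memory: the struct grows from 8 bytes to 128 bytes under these assumptions, which is usually only worth it for data that different threads write frequently.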

False sharing, in its simplest form, occurs when two processors repeatedly write to two different words of the same cache block in an interleaved fashion. This causes the cache block to bounce back and forth between the two caches as if the contents of the block were truly being shared. False sharing usually increases with the block size and tends to drive miss rates up with increasing block size.

Thursday, April 14, 2011

View login history

 last | grep  username

Monday, April 4, 2011

device query

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

There are 2 devices supporting CUDA

Device 0: "Tesla C2070"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.20
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5636292608 bytes
  Multiprocessors x Cores/MP = Cores:            14 (MP) x 32 (Cores/MP) = 448 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.15 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                Yes
  Device is using TCC driver mode:               No

Device 1: "Tesla C1060"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.20
  CUDA Capability Major/Minor version number:    1.3
  Total amount of global memory:                 4294770688 bytes
  Multiprocessors x Cores/MP = Cores:            30 (MP) x 8 (Cores/MP) = 240 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.30 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   No
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 2, Device = Tesla C2070, Device = Tesla C1060


PASSED

Press <Enter> to Quit...
-----------------------------------------------------------

Sunday, April 3, 2011

matlab

http://www.madio.net/forum-viewthread-tid-31247-extra-page%3D1%26filter%3Dauthor%26dateline%3D604800-page-1.html

cscope E567 no cscope connection

First generate the database file from the command line:
cscope -Rbq

Then add the database from inside vim:
:cscope add cscope.out



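[Repost] cscope usage

My company's codebase is fairly large and grep alone felt too slow, so I looked online for how to use cscope. It turned out to be quite practical.
The following commands generate the cscope index files:
find . -name "*.h" -o -name "*.c"  > cscope.files
cscope -bkq -i cscope.files
After that you can use it in vim. The query types shown in bold in the original post are the ones used most often.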
    find  : Query cscope.  All cscope query options are available
            except option #5 ("Change this grep pattern").
        USAGE   :cs find {querytype} {name}
                0 or s: Find this C symbol
                1 or g: Find this definition
                2 or d: Find functions called by this function
                3 or c: Find functions calling this function
                4 or t: Find this text string
                6 or e: Find this egrep pattern
                7 or f: Find this file
                8 or i: Find files #including this file

Saturday, April 2, 2011