Codeb2cc's Blog

Cogito ergo sum

HBase Shell Commands

HBase shell commands are mainly categorized into 6 parts:

General HBase Shell Commands

status

Show cluster status. Can be ‘summary’, ‘simple’, or ‘detailed’. The default is ‘summary’.

hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'

version

Output this HBase version. Usage:

hbase> version

whoami

Show the current hbase user. Usage:

hbase> whoami

Tables Management Commands

alter

Alter column family schema; pass the table name and a dictionary specifying the new column family schema. Dictionaries are described in the output of the main help command, and the dictionary must include the name of the column family to alter. For example, to change or add the ‘f1’ column family in table ‘t1’ so that it keeps a maximum of 5 cell VERSIONS, do:

hbase> alter 't1', NAME => 'f1', VERSIONS => 5

You can operate on several column families:

hbase> alter 't1', 'f1', {NAME => 'f2', IN_MEMORY => true}, {NAME => 'f3', VERSIONS => 5}

To delete the ‘f1’ column family in table ‘t1’, use one of:

hbase> alter 't1', NAME => 'f1', METHOD => 'delete'
hbase> alter 't1', 'delete' => 'f1'

You can also change table-scope attributes like MAX_FILESIZE, READONLY, MEMSTORE_FLUSHSIZE, DEFERRED_LOG_FLUSH, etc. These can be put at the end; for example, to change the max size of a region to 128MB, do:

hbase> alter 't1', MAX_FILESIZE => '134217728'

You can add a table coprocessor by setting a table coprocessor attribute:

hbase> alter 't1', 'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'

Since you can have multiple coprocessors configured for a table, a sequence number will be automatically appended to the attribute name to uniquely identify it.

The coprocessor attribute must match the pattern below in order for the framework to understand how to load the coprocessor classes:

[coprocessor jar file location] | class name | [priority] | [arguments]

You can also set configuration settings specific to this table or column family:

hbase> alter 't1', CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}
hbase> alter 't1', {NAME => 'f2', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}

You can also remove a table-scope attribute:

hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'MAX_FILESIZE'
hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'coprocessor$1'

You can combine several alterations in a single command:

hbase> alter 't1', { NAME => 'f1', VERSIONS => 3 }, { MAX_FILESIZE => '134217728' }, { METHOD => 'delete', NAME => 'f2' }, OWNER => 'johndoe', METADATA => { 'mykey' => 'myvalue' }

Reflection in Golang

Static Typed Go

Go is a statically typed language: every variable has a declared type, and values cannot be assigned freely between variables of different types. For example:

var a int
var b string

a = 1
b = "codeb2cc"

a = b

a and b are variables of different types, so assigning one to the other directly fails with the error cannot use b (type string) as type int in assignment. Assigning between different types requires a conversion, just as in C/C++. In Go, a value x can be converted to a target type T when one of the following holds:

  • x is assignable to a variable of type T
  • x's type and T have identical underlying types
  • x's type and T are unnamed pointer types whose pointer base types have identical underlying types
  • x's type and T are both integer or floating-point types
  • x's type and T are both complex types
  • x is an integer or a slice of bytes or runes, and T is a string type
  • x is a string and T is a slice of bytes or runes
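
As a quick sketch, the more common of these rules look like this in code (all names here are illustrative):

```go
package main

import "fmt"

func main() {
	// Both integer/floating-point types.
	var a int = 100
	c := float64(a)

	// Integer -> string (yields the UTF-8 encoding of the code point).
	r := string(rune(97))

	// string <-> byte slice.
	b := []byte("codeb2cc")
	s := string(b)

	// Identical underlying types: a named type defined on int.
	type celsius int
	t := celsius(a)

	fmt.Println(c, r, s, t) // 100 a codeb2cc 100
}
```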

For convertible values, var c = float64(100) is all it takes to obtain a value of the target type. In real development, though, we need more than conversions between basic types. Take, for instance, an interface type like:

type I interface {
    Read(b Buffer) bool
    Write(b Buffer) bool
    Close()
}

Any type that implements these three methods is considered valid and usable through the interface. But since we cannot know the concrete type of the value that will eventually be passed in, we have to turn it into a type we know before using it. A conversion only cares about the data representation, so what we need here is a type assertion:

var x I
var y File

x = y             // legal if File implements I
z, ok := x.(File) // type assertion recovers the concrete type
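
Put together as a runnable sketch (File here is a hypothetical type standing in for any implementation of I, with plain byte slices standing in for the unspecified Buffer type):

```go
package main

import "fmt"

// I mirrors the interface from the post, with []byte in place of Buffer.
type I interface {
	Read(p []byte) bool
	Write(p []byte) bool
	Close()
}

// File is a hypothetical concrete type that implements I.
type File struct{ name string }

func (f *File) Read(p []byte) bool  { return true }
func (f *File) Write(p []byte) bool { return true }
func (f *File) Close()              {}

// concreteName asserts the dynamic type back out of the interface value.
func concreteName(x I) (string, bool) {
	f, ok := x.(*File)
	if !ok {
		return "", false
	}
	return f.name, true
}

func main() {
	var x I = &File{name: "demo"} // *File implements I, so plain assignment works

	name, ok := concreteName(x)
	fmt.Println(name, ok) // demo true

	// The comma-ok form never panics; a bare x.(T) would panic on mismatch.
	_, ok = x.(interface{ Flush() })
	fmt.Println(ok) // false
}
```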

Python GIF Processing

Our weather site needs to push meteorological updates to Weibo. The raw base-reflectivity data comes as single frames, and for a better effect I planned to compose the animated GIF myself, but it turns out there is no good ready-made GIF library for Python: PIL supports reading and writing GIFs but not merging them, so I had to collect some references and write it myself. Below are links to the relevant material and an example of merging GIFs; it is essentially packing the data according to the standard, and converting other formats to GIF can be handled the same way.

Standards and References

GIF87

GIF89

Wikipedia

Application Extension Spec: NETSCAPE2.0

byte   1       : 33 (hex 0x21) GIF Extension code
byte   2       : 255 (hex 0xFF) Application Extension Label
byte   3       : 11 (hex 0x0B) Length of Application Block
                 (eleven bytes of data to follow)
bytes  4 to 11 : "NETSCAPE"
bytes 12 to 14 : "2.0"
byte  15       : 3 (hex 0x03) Length of Data Sub-Block
                 (three bytes of data to follow)
byte  16       : 1 (hex 0x01)
bytes 17 to 18 : 0 to 65535, an unsigned integer in
                 lo-hi byte format. This indicates the
                 number of times the loop should
                 be executed.
byte  19       : 0 (hex 0x00) a Data Sub-Block Terminator.
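
Following that layout, here is a small sketch that packs the extension block (the function name is mine; a loop count of 0 means loop forever):

```go
package main

import "fmt"

// netscapeLoopBlock packs the 19-byte NETSCAPE2.0 application
// extension described above. loops is written lo-hi (little-endian);
// 0 means loop forever.
func netscapeLoopBlock(loops uint16) []byte {
	buf := []byte{
		0x21, // GIF Extension Introducer
		0xFF, // Application Extension Label
		0x0B, // length of the Application Block (11 bytes)
	}
	buf = append(buf, "NETSCAPE2.0"...)            // application identifier + auth code
	buf = append(buf, 0x03, 0x01)                  // Data Sub-Block length, sub-block ID
	buf = append(buf, byte(loops), byte(loops>>8)) // loop count, lo-hi
	buf = append(buf, 0x00)                        // Data Sub-Block Terminator
	return buf
}

func main() {
	fmt.Printf("% X\n", netscapeLoopBlock(0))
}
```

Prepending this block right after the GIF header and logical screen descriptor of the merged file is what makes the animation loop.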

Memcached Optimization

After a new feature shipped, we kept seeing cache misses. I suspected entries were being evicted by LRU before they expired, but Memcached had 8G of memory configured, which should be plenty for our current traffic. I used to think Memcached was simple enough to configure and forget, but taking this chance to dig into its details revealed some pitfalls, big ones. Here are two of them.

Basic Concept: Slab/Chunk/Page

Before getting to the pitfalls, a few key concepts need to be clear. In Memcached, data is stored in units of chunks: every item occupies exactly one chunk. Since incoming items vary in size, sizing each chunk to exactly its data would fragment memory badly and hurt performance, so Memcached instead uses a fixed series of chunk sizes and writes each item into the best-fitting chunk. A slab (class) is the collection of chunks of one given size.

Memory itself is managed in units of pages, so each slab class actually owns a number of pages, and chunks are written into the pages belonging to their slab. A frequently asked question is the maximum size of a single item in Memcached: it is exactly the page size (minus the key and flags, strictly speaking); a chunk can only live inside a single page, though one page holds many chunks. Newer versions let you set the page size with the -I option, in the range 1K to 128M.

The set of slab classes (i.e. chunk sizes) is controlled by the growth factor -f and the initial size -n. On startup Memcached computes all the classes from these two values, essentially looping over slab_size * factor until the page size is reached, keeping 8-byte alignment along the way. See the source:

slabs_init
while (++i < POWER_LARGEST && size <= settings.item_size_max / factor) {
    /* Make sure items are always n-byte aligned */
    if (size % CHUNK_ALIGN_BYTES)
        size += CHUNK_ALIGN_BYTES - (size % CHUNK_ALIGN_BYTES);

    slabclass[i].size = size;
    slabclass[i].perslab = settings.item_size_max / slabclass[i].size;
    size *= factor;
    if (settings.verbose > 1) {
        fprintf(stderr, "slab class %3d: chunk size %9u perslab %7u\n",
                i, slabclass[i].size, slabclass[i].perslab);
    }
}

The default growth factor is 1.25, the initial size is 48 bytes, and the page size is 1M (1048576 bytes); with those defaults you get 42 slab classes: 96, 120, 152, ..., 771184, 1048576. Note that the first class is 96 rather than 48: besides the data itself, Memcached wraps every entry in an item descriptor structure, and that item header is itself 48 bytes.
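
The computation can be reproduced with a quick sketch of the loop above (the function and its rounding details are my reconstruction of slabs_init, plus the final full-page class that memcached appends after the loop):

```go
package main

import "fmt"

// slabSizes mimics memcached's slabs_init: start at the item header
// plus the -n chunk size, grow by factor with 8-byte alignment,
// then append one last class of the full page size.
func slabSizes(factor float64, chunkSize, itemHeader, pageSize int) []int {
	const align = 8
	size := chunkSize + itemHeader // 48 + 48 = 96 with the defaults
	var sizes []int
	for float64(size) <= float64(pageSize)/factor {
		if size%align != 0 {
			size += align - size%align // keep chunks 8-byte aligned
		}
		sizes = append(sizes, size)
		size = int(float64(size) * factor)
	}
	return append(sizes, pageSize) // the largest class: one item per page
}

func main() {
	sizes := slabSizes(1.25, 48, 48, 1048576)
	fmt.Println(len(sizes), sizes[0], sizes[1], sizes[len(sizes)-1])
	// 42 96 120 1048576
}
```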

Git Submodule Update All

The plugins in my Vim config hadn't been updated for a long time. When I set out to upgrade them all today, git submodule update had no effect, and inside each plugin directory the checkout was in a detached HEAD state. After reading the Submodules section of the Git documentation, it turns out that git submodule update only syncs submodules to the commits recorded in the parent repository; if the parent repository hasn't committed newer versions of the submodules, the command does nothing. To update all local submodules, re-attach each of them to a branch first:

git submodule foreach git checkout master
git submodule foreach git pull