Xiamen Outing

From August 14 to 16, the Technical Support Department had its outing in Xiamen. By my count this is the third big-department outing since I joined Taobao: once to Lin'an, once to Anji, and this time we finally made it out of Zhejiang Province, to the beautiful seaside city of Xiamen. We took the high-speed train both ways; with a six-hour ride and a group of more than 100 people, playing cards and chatting, the journey didn't feel long at all.

The Taobao DBA team has gained quite a few new members, and we managed the rare feat of taking a group photo on Gulangyu; a pity that a few people still couldn't make it.
The Taobao DBA team on the Xiamen outing

The buildings on Gulangyu each have their own character, the Xiamen University campus is lovely, and the release pond at Nanputuo Temple is crowded with turtles, but Xiamen's sea was a bit of a letdown.
A little boy at the Xiamen seaside

Of course, while the seawater was unremarkable, the seafood was anything but. On the evening of the 15th we went to the legendary Xiao Yanjing ("Little Glasses"); business was astonishingly good, and we waited outside in the open air for almost an hour before our number was called, but the food lived up to the restaurant's reputation. There was a small episode on the way: since there were so many of us, we needed two taxis. A few colleagues got into another cab, and the moment they said "Xiao Yanjing", the driver declared the place full and put them out of the car. Awkward. The service level in Xiamen leaves something to be desired; it's easy to get snubbed if you're not careful.

Cassandra 0.7 Is Worth Looking Forward To

The Cassandra wiki has long described some of the 0.7 features, several of them quite attractive, and on August 13 Cassandra 0.7 beta1 was finally released; it can be downloaded here.

The new features I personally care most about:

1. Keyspace and ColumnFamily definitions can be added and changed online; you no longer need to stop the cluster and edit the configuration file.
2. Secondary indexes are supported: you can build an index on a column and query by column value through the get_indexed_slices interface.
3. A column family can now be truncated.
4. replica_placement_strategy and replication_factor can be set per keyspace.
5. The row cache improves read performance by about 8x. In tests of earlier versions, Cassandra's write performance was impressive but its read performance fell short.
6. Hadoop-format output is supported, which makes it easier for a data warehouse to extract data from Cassandra.

In addition, the configuration file has switched from XML to YAML, which reads more smoothly. There are many detailed improvements in the code as well; I'll need to go through them slowly when I have time. Looking forward to 0.7 going GA soon.
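For a feel of the new format, a per-keyspace definition in the 0.7-era cassandra.yaml looked roughly like the sketch below. The `replica_placement_strategy` and `replication_factor` attributes match feature 4 above; the strategy class name, keyspace names, and values are illustrative, not copied from a real config:

```yaml
# Sketch of a keyspace definition in 0.7's cassandra.yaml.
# Class name and numbers are illustrative only.
keyspaces:
    - name: Keyspace1
      replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
      replication_factor: 3
      column_families:
          - name: Standard1
            compare_with: BytesType
```

Note that, per the NEWS excerpt below, 0.7 ignores these definitions on normal startup; they are loaded once via JMX and thereafter live in the system table.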

0.7.0
=====

Features
--------
    - Row keys are now bytes: keys stored by versions prior to 0.7.0 will be
      returned as UTF-8 encoded bytes. OrderPreservingPartitioner and
      CollatingOrderPreservingPartitioner continue to expect that keys contain
      UTF-8 encoded strings, but RandomPartitioner no longer expects strings.
    - A new ByteOrderedPartitioner supports bytes keys with arbitrary content,
      and orders keys by their byte value. 
    - Truncate thrift method allows clearing an entire ColumnFamily at once
    - DatacenterShardStrategy is ready for use, enabling
      ConsistencyLevel.DCQUORUM and DCQUORUMSYNC.  See comments in
      `cassandra.yaml`.
    - row size limit increased from 2GB to 2 billion columns
    - Hadoop OutputFormat support
    - Streaming data for repair or node movement no longer requires 
      anticompaction step first
    - keyspace is per-connection in the thrift API instead of per-call
    - optional round-robin scheduling between keyspaces for multitenant
      clusters
    - dynamic endpoint snitch mitigates the impact of impaired nodes
    - significantly faster reads from row cache
    - introduced IntegerType that is both faster than LongType and
      allows integers of both less and more bits than Long's 64

Configuration
------------
    - Configuration file renamed to cassandra.yaml and log4j.properties to
      log4j-server.properties
    - Added 'bin/config-converter' to convert existing storage-conf.xml or
      cassandra.xml files to a cassandra.yaml file. When executed, it will
      create a cassandra.yaml file in any directory containing a matching
      xml file.
    - The ThriftAddress and ThriftPort directives have been renamed to
      RPCAddress and RPCPort respectively.
    - The keyspaces defined in cassandra.yaml are ignored on startup as a
      result of CASSANDRA-44.  A JMX method has been exposed in the 
      StorageServiceMBean to force a schema load from cassandra.yaml. It
      is a one-shot affair though and you should conduct it on a seed node
      before other nodes. Subsequent restarts will load the schema from the 
      system table and attempts to load the schema from YAML will be ignored.  
      You should only have to do this for one node since new nodes will receive
      schema updates on startup from the seed node you updated manually. 
    - EndPointSnitch was renamed to RackInferringSnitch.  A new SimpleSnitch
      has been added.
    - RowWarningThresholdInMB replaced with in_memory_compaction_limit_in_mb
    - GCGraceSeconds is now per-ColumnFamily instead of global
    - Configuration of DatacenterShardStrategy is now a part of the keyspace
      definition using the strategy_options attribute.
      The datacenter.properties file is no longer used.

JMX
---
    - StreamingService moved from o.a.c.streaming to o.a.c.service
    - GMFD renamed to GOSSIP_STAGE
    - {Min,Mean,Max}RowCompactedSize renamed to {Min,Mean,Max}RowSize
      since it no longer has to wait until compaction to be computed

Thrift API
----------
    - Row keys are now 'bytes': see the Features list.
    - The return type for login() is now AccessLevel.
    - The get_string_property() method has been removed.
    - The get_string_list_property() method has been removed.

Other
-----
    - If extending AbstractType, make sure you follow the singleton pattern
      followed by Cassandra core AbstractType extensions.
      e.g. BytesType has a variable called 'instance' and an empty constructor
      with default access

0.7.0-beta1
 * sstable versioning (CASSANDRA-389)
 * switched to slf4j logging (CASSANDRA-625)
 * access levels for authentication/authorization (CASSANDRA-900)
 * add ReadRepairChance to CF definition (CASSANDRA-930)
 * fix heisenbug in system tests, especially common on OS X (CASSANDRA-944)
 * convert to byte[] keys internally and all public APIs (CASSANDRA-767)
 * ability to alter schema definitions on a live cluster (CASSANDRA-44)
 * renamed configuration file to cassandra.yaml, and log4j.properties to
   log4j-server.properties, which must now be loaded from
   the classpath (which is how our scripts in bin/ have always done it)
   (CASSANDRA-971)
 * change get_count to require a SlicePredicate. create multi_get_count
   (CASSANDRA-744)
 * re-organized endpointsnitch implementations and added SimpleSnitch
   (CASSANDRA-994)
 * Added preload_row_cache option (CASSANDRA-946)
 * add CRC to commitlog header (CASSANDRA-999)
 * removed multiget thrift method (CASSANDRA-739)
 * removed deprecated batch_insert and get_range_slice methods (CASSANDRA-1065)
 * add truncate thrift method (CASSANDRA-531)
 * http mini-interface using mx4j (CASSANDRA-1068)
 * optimize away copy of sliced row on memtable read path (CASSANDRA-1046)
 * replace constant-size 2GB mmaped segments and special casing for index 
   entries spanning segment boundaries, with SegmentedFile that computes 
   segments that always contain entire entries/rows (CASSANDRA-1117)
 * avoid reading large rows into memory during compaction (CASSANDRA-16)
 * added hadoop OutputFormat (CASSANDRA-1101)
 * efficient Streaming (no more anticompaction) (CASSANDRA-579)
 * split commitlog header into separate file and add size checksum to
   mutations (CASSANDRA-1179)
 * avoid allocating a new byte[] for each mutation on replay (CASSANDRA-1219)
 * revise HH schema to be per-endpoint (CASSANDRA-1142)
 * add joining/leaving status to nodetool ring (CASSANDRA-1115)
 * allow multiple repair sessions per node (CASSANDRA-1190)
 * add dynamic endpoint snitch (CASSANDRA-981)
 * optimize away MessagingService for local range queries (CASSANDRA-1261)
 * make framed transport the default so malformed requests can't OOM the 
   server (CASSANDRA-475)
 * significantly faster reads from row cache (CASSANDRA-1267)
 * take advantage of row cache during range queries (CASSANDRA-1302)
 * make GCGraceSeconds a per-ColumnFamily value (CASSANDRA-1276)
 * keep persistent row size and column count statistics (CASSANDRA-1155)
 * add IntegerType (CASSANDRA-1282)
 * page within a single row during hinted handoff (CASSANDRA-1327)
 * push DatacenterShardStrategy configuration into keyspace definition,
   eliminating datacenter.properties. (CASSANDRA-1066)
 * optimize forward slices starting with '' and single-index-block name 
   queries by skipping the column index (CASSANDRA-1338)
 * streaming refactor (CASSANDRA-1189)

The btrfs Filesystem

A colleague recently came back from LSF (the Linux Storage and Filesystem Summit) in the US and shared his summary, which was excellent. It is a discussion-driven event with no talks, no photos, and no recordings, so the attendees are basically all fairly senior Linux kernel and filesystem developers, including quite a few Chinese participants. Compared with DBAs like us, who merely understand and operate some database product, these are people worth looking up to. Perhaps in the not-too-distant future the Taobao DBA team will be more than a team that maintains some database product, and will contribute to the open-source database field as well; that is exactly the thinking behind our upcoming hiring. The Taobao DBA team has already set up an open-source database research group, currently focused on MySQL/InnoDB. If you are an excellent C/C++ developer, if you are interested in how open-source databases such as MySQL/InnoDB fare in large-scale, high-concurrency web applications, and if you want to one day contribute patches to an open-source database, you are welcome to join us, whether or not you have DBA experience and even if you are a fresh graduate (my email: jiangfeng # taobao.com, you know what to do).

But I digress. What I want to note here is that his report drew my attention to the btrfs filesystem. For a while now I have been wondering: when deploying MySQL on Linux, which filesystem is the best fit? Or rather, which filesystems suit which database environments? My thinking had mostly centered on filesystems I already knew well, such as xfs and ext4. On a business trip to Beijing last week I had meant to ask this colleague whether ext4 was ready for production database deployments, but he happened to be at LSF in the US, so I missed the chance. Judging from his summary, though, ext4 has reached production-level stability and should be worth trying; we will probably schedule some tests, but that is a story for another day.

As for btrfs, taken literally the name means a filesystem that organizes its metadata with B-trees. Compared with ext2/ext3, which store directories and similar structures as linear lists, this makes lookup, insertion, and deletion all very efficient. Another important btrfs trait is that file space is allocated and managed in extents rather than in traditional fixed blocks. Reading this, a DBA will immediately notice how consistent this is with how Oracle organizes and manages data (and if you trace the lineage, btrfs was in fact originally opened up by Oracle in 2007, so they are of the same stock). For more details on btrfs, see here and here. btrfs also has special support for SSDs, currently a hot topic, and is therefore widely seen as the future direction of Linux filesystems; it is still a development version, but it is worth watching.
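To make the directory-lookup point concrete, here is a small Python sketch. This is not btrfs code, just an illustration of the complexity gap: an ext2/ext3-style directory scans its entry list front to back, while a tree- or index-based directory finds a name in a logarithmic number of comparisons, which `bisect` stands in for below.

```python
import bisect

def linear_lookup(entries, name):
    """ext2/ext3-style directory: scan the entry list front to back, O(n)."""
    for i, entry in enumerate(entries):
        if entry == name:
            return i
    return -1

def indexed_lookup(sorted_entries, name):
    """Index-style directory: binary search stands in for a B-tree, O(log n)."""
    i = bisect.bisect_left(sorted_entries, name)
    if i < len(sorted_entries) and sorted_entries[i] == name:
        return i
    return -1

# A directory with 100,000 entries: the linear scan touches every entry
# in the worst case, while the indexed lookup needs ~17 comparisons.
entries = sorted("file%06d" % n for n in range(100000))
assert linear_lookup(entries, "file099999") == 99999
assert indexed_lookup(entries, "file099999") == 99999
```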

In fact a simple analogy gives a more intuitive feel for btrfs: think of it as a Linux version of ZFS. Much of its design and many of its features are borrowed from ZFS. Its main features are:

  • Copy on Write
  • Extent based file storage (2^64 max file size)
  • Space efficient packing of small files
  • Space efficient indexed directories
  • Dynamic inode allocation
  • Writable snapshots
  • Subvolumes (separate internal filesystem roots)
  • Object level mirroring and striping
  • Checksums on data and metadata (multiple algorithms available)
  • Compression
  • Integrated multiple device support, with several raid algorithms
  • Online filesystem check
  • Very fast offline filesystem check
  • Efficient incremental backup and FS mirroring
  • Online filesystem defragmentation
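The writable snapshots and subvolumes are what make btrfs especially interesting for database work, e.g. near-instant MySQL backups. Assuming the unified `btrfs` tool from btrfs-progs, the workflow looks roughly like the sketch below; the device name and paths are made up, and this is an illustration rather than a tested procedure:

```
# Illustrative only: device and paths are hypothetical.
mkfs.btrfs /dev/sdb1                          # format the device as btrfs
mount /dev/sdb1 /data                         # mount the new filesystem
btrfs subvolume create /data/mysql            # subvolume holding MySQL data
btrfs subvolume snapshot /data/mysql /data/mysql-snap   # writable CoW snapshot
```

Because the snapshot is copy-on-write, taking it is cheap regardless of data size; only blocks modified afterwards consume extra space.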

Driver's License in Hand

Yesterday afternoon I picked up my driver's license at the vehicle administration office, and in the evening I nervously drove the car home. Early this morning I got press-ganged into chauffeur duty: from Chengxi to Sandun to Yuhang, back to Chengxi, back again to Yuhang, and finally back to Chengxi for dinner. Over 100 kilometers in one go. Hmm, a bit much ^_^

I actually passed the written test back in April last year but kept putting off the driving practice until this March, when I finally spent a week of evenings practicing reverse parking. I had planned to take the test at the end of March, but so many people in Hangzhou were taking driving tests that I couldn't get a slot, and then in April I took a ten-day vacation to Thailand. With all the delays, the pole test and the field test were pushed to June 5. I had never practiced on the test course before, so I practiced at the test center for two days right before the exam. What made me nervous was that starting June 1, Hangzhou added the 100-meter gear-shifting test; I had practiced it for only half a day and had no confidence at all.

Luckily, of the four of us in the car during the test, two drew the undulating road and one the S-curve, while I drew the right-angle turn; the easiest items had all been packed into one car. Then came the road test. Because I had booked a leisure trip to Xitang for the Dragon Boat Festival well in advance, my instructor, who had planned to register me for the road test on the 17th, pushed it to the 30th. Tragically, with that delay I drew the night test, on Gudun Road, one of the busiest streets in Chengxi, during the evening rush hour after 6 p.m. I jokingly called it "the hardest road test in history", and I had attended road practice only twice, driving less than an hour in total. Having survived that "ordeal" and luckily passed, I found that handling today's heavy traffic actually came easily.

In both the field test and the road test, the two people who trained in the same car as me and normally drove the best were the ones who failed. Skill isn't everything; safety first.
