Basic software management with RPM & YUM

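Last week I spent some time studying how to build RPM packages, and today I shared what I learned with the team. Until now we have installed MySQL in batches with shell scripts, and after repeated improvements to those scripts, batch deployment is already reasonably efficient. Still, even for something as simple as installing MySQL, every further gain in efficiency pays off more in a large-scale environment.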

Pursue simplicity and take it to the extreme. Let us encourage each other in this.

Monitoring disk health with smartmontools

Nearly all modern disks support S.M.A.R.T. The Wikipedia entry describes it as follows:

S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology; sometimes written as SMART) is a monitoring system for computer hard disk drives to detect and report on various indicators of reliability, in the hope of anticipating failures.

When a failure is anticipated by S.M.A.R.T., the drive should be replaced, and can sometimes be returned to the manufacturer, who can use these failed drives to discover where faults lie and how to prevent them from recurring on the next generation of hard disk drives.

smartmontools is an open source project hosted on SourceForge that can read a disk's S.M.A.R.T. data and monitor it on a schedule. It contains two tools: smartctl and smartd.

$ sudo apt-get install smartmontools

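smartctl reads and modifies certain drive attributes and statistics, while smartd runs as a daemon that collects this information periodically and monitors drive status. Running smartctl on my ThinkPad X200 gives the following output: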


ningoo@ning:/sys/block/sda/queue$ sudo smartctl --all /dev/sda
smartctl 5.40 2010-03-16 r3077 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     FUJITSU MJA2320BH G2
Serial Number:    K95DTA52FWAA
Firmware Version: 0084001C
User Capacity:    320,072,933,376 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3f
Local Time is:    Tue Mar 22 19:14:04 2011 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 (1009) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 143) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   253   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x000f   100   100   060    Pre-fail  Always       -       4364
203 Run_Out_Cancel          0x0002   100   100   000    Old_age   Always       -       1533219504675
240 Head_Flying_Hours       0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 1
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 1222 hours (50 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 03 97 95 a6 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 08 01 01 00 00 00 ff      02:19:49.864  NOP [Reserved subcommand]
  60 08 00 90 95 a6 40 08      02:19:49.862  READ FPDMA QUEUED
  60 10 00 c8 3c a6 40 08      02:19:49.850  READ FPDMA QUEUED
  60 20 00 a8 3c a6 40 08      02:19:49.841  READ FPDMA QUEUED
  60 08 00 88 95 a6 40 08      02:19:49.833  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
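
As a minimal sketch (not part of smartmontools), text output like the above can be parsed in Python to pull out the overall health verdict and the raw values of attribute-table rows. The function names parse_health and parse_attributes are hypothetical, and the parsing assumes the ten-column row layout seen in the attribute rows of the output above.

```python
import re

def parse_health(text):
    """Return the overall-health verdict (e.g. 'PASSED'), or None if absent."""
    m = re.search(r"overall-health self-assessment test result:\s*(\S+)", text)
    return m.group(1) if m else None

def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for rows of the attribute table.

    Assumes ten whitespace-separated columns per row:
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            raw = fields[9]
            # Keep non-numeric raw values as strings instead of failing.
            attrs[int(fields[0])] = (fields[1], int(raw) if raw.isdigit() else raw)
    return attrs

# Sample lines copied from the smartctl output above.
sample = """\
SMART overall-health self-assessment test result: PASSED
199 UDMA_CRC_Error_Count    0x003e   200   253   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x000f   100   100   060    Pre-fail  Always       -       4364
"""
print(parse_health(sample))
print(parse_attributes(sample))
```

In practice the text would come from running smartctl itself, for example through the subprocess module, rather than from a pasted sample.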

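For disks behind a RAID controller, reading S.M.A.R.T. data also requires support from the controller. Intel SSDs additionally record some of their own health indicators, such as wear, in their S.M.A.R.T. data, so smartctl can be used to obtain key indicators such as the remaining life of SSDs attached to a RAID controller. With these indicators the health of an SSD can be known in advance, which eases the worry about the write endurance limit of SSD cells a little and makes SSDs a safer bet in production. 褚霸 and the DBAs at B2B have already blogged about the detailed usage, so I will not repeat it here. MegaCli retrieves information from LSI RAID controllers (other RAID controllers have similar tools):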

sudo  /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aALL

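Worth noting: SSD wear is usually read from attribute E9 (233), Media_Wearout_Indicator, whose value ranges from 0 to 100 and is therefore fairly coarse. Drives that have been in production at B2B for a long time, with 40TB of test data written, still show a high value (99: only one hit point lost, this boss is hard to beat ^_^). For finer-grained data, the Intel engineers who visited us last time said that E2 (226, Timed Workload Indicator), E3 (227, Timed Workload Read Ratio) and E4 (228, Timed Workload Timer) can be used for a more precise calculation. Before starting a stress test, first reset E2/E3/E4 to zero: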

smartctl -t vendor,0x40 -a /dev/hdx

Of course, if you use a RAID controller, this still has to be combined with the controller's corresponding tooling:

smartctl -t vendor,0x40 -d megaraid,30 /dev/sdx

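Then run a test (at least 60 minutes), reboot the system, and run smartctl again. Suppose it reports E2 = 22: the wear caused by this test is then 22 / 1024 / 100 ≈ 0.000215, i.e. about 0.021% (by this formula, one point of E9 corresponds to 1024 points of E2, so E2 is about a thousand times finer-grained than E9, which is why its change is visible even in a fairly short test). E3 = 99 means the read ratio of the workload was 99%. The duration of the test is E4 = 981 minutes. From these values, the lifetime of this SSD under such a workload works out to E4*1024*100/E2/1440/365 ≈ 8.69 years (counting 365 days per year). Of course, this result assumes that E2/E3/E4/E9 all change linearly. If they behave like the battery indicator on some phones, where the first bar lasts a day and the last bar only an hour, these estimates mean nothing. To find out, we can only wait until some SSD reaches the normal end of its life and then look back to see whether these indicators were reliable.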
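
The arithmetic above can be sketched in Python as follows. This only illustrates the linear extrapolation described in the text; the function names are hypothetical, and the inputs are the E2 and E4 raw values quoted above.

```python
def wear_fraction(e2_raw):
    """Fraction of rated endurance consumed during the test.

    E2 / 1024 is a percentage, so divide by 100 again to get a fraction.
    """
    return e2_raw / 1024.0 / 100.0

def projected_lifetime_years(e2_raw, e4_minutes):
    """Years until 100% wear, assuming wear keeps growing linearly."""
    if e2_raw <= 0:
        raise ValueError("E2 must be positive to project a lifetime")
    minutes_to_full_wear = e4_minutes * 1024.0 * 100.0 / e2_raw
    # 1440 minutes per day, 365 days per year, as in the text.
    return minutes_to_full_wear / 1440.0 / 365.0

print("wear: %.4f%%" % (wear_fraction(22) * 100))
print("lifetime: %.2f years" % projected_lifetime_years(22, 981))
```

The linearity assumption is the weak point, exactly as the paragraph above warns: if wear accelerates late in the drive's life, the projection is too optimistic.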

Installing MySQLdb on Ubuntu

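I wanted to practice by writing a few scripts in Python, so I installed Python's MySQLdb module on Ubuntu. I expected it to be trivial, but hit a few small snags, so I am recording them here for future reference.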

First, python-dev must be installed; otherwise compiling MySQLdb later fails because the header files cannot be found:

building '_mysql' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC
 -Dversion_info=(1,2,3,'final',0) -D__version__=1.2.3 
-I/u01/mysql/include/mysql -I/usr/include/python2.6 -c _mysql.c
-o build/temp.linux-i686-2.6/_mysql.o -DUNIV_LINUX
In file included from _mysql.c:29:
pymemcompat.h:10: fatal error: Python.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

sudo apt-get install python-dev

Second, setuptools must be installed first; otherwise MySQLdb cannot be built:

ImportError: No module named setuptools

Download setuptools, then build and install it:

python setup.py build
sudo python setup.py install

Download MySQLdb.
Edit site.cfg so that mysql_config points to the correct location, then build and install:

python setup.py build
sudo python setup.py install

Finally, libmysqlclient-dev must also be installed; otherwise importing the module fails:

ImportError: libmysqlclient_r.so.16: cannot open shared object file:
 No such file or directory

sudo apt-get install libmysqlclient-dev

With everything installed, here is a simple hello-world style query:

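#!/usr/bin/env python
# Print every user/host pair from the mysql.user table.
import MySQLdb

# host_name, ningoo and password are placeholders; substitute real values.
db = MySQLdb.connect(host="host_name", db="mysql", user="ningoo", passwd="password")
c = db.cursor()
n = c.execute("select user,host from user")  # n is the number of rows returned
for row in c.fetchall():
    for col in row:
        print(col)
c.close()
db.close()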

Using SystemTap on Ubuntu

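I had heard of SystemTap long ago but had hardly ever used it; a recent introduction by 褚霸 sparked my interest in it. I recently installed Ubuntu 10.10 as the main operating system on my work laptop, so it is a good place to experiment with and learn SystemTap.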

Install systemtap

sudo apt-get install systemtap

Ubuntu Desktop does not install the kernel debug info packages by default, and without them systemtap cannot trace the kernel. Check the kernel version:

ningoo@ning:~/stap$ uname -r
2.6.35-22-generic

Download the kernel debug info package matching that version and install it:

sudo dpkg -i linux-image-2.6.35-22-generic-dbgsym_2.6.35-22.35_i386.ddeb 

At this point kernel tracing already works, but module information needs some additional work:

sudo apt-get install elfutils

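# Run as root: the links are created under /usr/lib/debug
find /usr/lib/debug -name '*.ko' -print | while read -r file
do
      # Extract the build ID from the module's ELF notes
      buildid=$(eu-readelf -n "$file" | grep 'Build ID:' | awk '{print $3}')
      [ -n "$buildid" ] || continue
      # The first two hex digits name the directory, the rest name the link
      dir=$(echo "$buildid" | cut -c1-2)
      fn=$(echo "$buildid" | cut -c3-)
      mkdir -p "/usr/lib/debug/.build-id/$dir"
      ln -sf "$file" "/usr/lib/debug/.build-id/$dir/$fn"
      ln -sf "$file" "/usr/lib/debug/.build-id/$dir/${fn}.debug"
done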
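
To make the naming scheme used by the loop above explicit, here is a small Python sketch of how a build ID maps to the two link paths. The function name build_id_links and the sample build ID are hypothetical and serve only as illustration; the sketch computes paths and does not create any links.

```python
import posixpath

def build_id_links(build_id, root="/usr/lib/debug/.build-id"):
    """Return the two link paths the shell loop creates for one build ID.

    The first two hex digits become the directory name; the remaining
    digits become the link name, with and without a '.debug' suffix.
    """
    if len(build_id) < 3:
        raise ValueError("build ID too short: %r" % build_id)
    prefix, rest = build_id[:2], build_id[2:]
    base = posixpath.join(root, prefix, rest)
    return base, base + ".debug"

# Hypothetical build ID, for illustration only.
for path in build_id_links("0123456789abcdef0123456789abcdef01234567"):
    print(path)
```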

Hello world

ningoo@ning:~/stap$ sudo stap -ve 'probe begin { log("hello world") exit() }'
Pass 1: parsed user script and 72 library script(s)
 using 18896virt/12868res/1880shr kb, in 130usr/20sys/150real ms.
Pass 2: analyzed script: 1 probe(s), 2 function(s), 0 embed(s), 0 global(s) 
 using 19160virt/13132res/1908shr kb, in 10usr/0sys/5real ms.
Pass 3: using cached /home/ningoo/.systemtap/cache
 /f1/stap_f10ab2aeba4f2da2c03646b27b4d3627_757.c
Pass 4: using cached /home/ningoo/.systemtap/cache
 /f1/stap_f10ab2aeba4f2da2c03646b27b4d3627_757.ko
Pass 5: starting run.
hello world
Pass 5: run completed in 0usr/30sys/297real ms.

References:
http://sourceware.org/systemtap/documentation.html
http://sourceware.org/systemtap/wiki/SystemtapOnUbuntu
