本系列第二和第三篇文章演示了同步CDC和异步HotLog CDC的实现,这两种模式都是在source database中捕获增量数据,所以配置只涉及到一个数据库,相对来说比较简单。而异步CDC的另外两种模式:Distributed HotLog和AutoLog,需要分别配置source database和staging datase,捕获增量数据主要是在staging database中执行,以尽可能的少影响source database的性能。
异步CDC主要是通过stream的技术来实现的,基本上是将Streams的一些配置步骤做了一层封装。如果对于Streams的配置很熟悉的话,配置异步CDC应该说还是一件比较简单的事情。
本文将演示异步Distribute HotLog的配置。文中Source database=ning,Staging Database=test。
一.版本
BANNER
----------------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Prod
PL/SQL Release 10.2.0.3.0 - Production
CORE 10.2.0.3.0 Production
TNS for 32-bit Windows: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production
二.设置发布者
1.在source database创建一个用户,并授予相应的权限。
User created.
SYS@ning>grant execute_catalog_role to cdcpub;
Grant succeeded.
SYS@ning>grant select_catalog_role to cdcpub;
Grant succeeded.
SYS@ning>grant create table to cdcpub;
Grant succeeded.
SYS@ning>grant create session to cdcpub;
Grant succeeded.
SYS@ning>grant dba to cdcpub;
Grant succeeded.
SYS@ning>grant execute on dbms_cdc_publish to cdcpub;
Grant succeeded.
SYS@ning>execute dbms_streams_auth.grant_admin_privilege(grantee=>'cdcpub');
PL/SQL procedure successfully completed.
2.在Staging database创建一个用户,并授予相应权限。
User created.
SYS@test>grant create session to cdcpub;
Grant succeeded.
SYS@test>grant create table to cdcpub;
Grant succeeded.
SYS@test>grant unlimited tablespace to cdcpub;
Grant succeeded.
SYS@test>grant select_catalog_role to cdcpub;
Grant succeeded.
SYS@test>grant execute_catalog_role to cdcpub;
Grant succeeded.
SYS@test>grant dba to cdcpub;
Grant succeeded.
SYS@test>execute dbms_streams_auth.grant_admin_privilege(grantee=>'cdcpub');
PL/SQL procedure successfully completed.
三.设置Source Database的初始化参数
根据文档推荐,为source database设置如下初始化参数:
COMPATIBLE=9.2.0或者10.1.0或者10.2.0 根据source database的版本确定
JAVA_POOL_SIZE=50000000
OPEN_LINKS=4 或者更高
JOB_QUEUE_PROCESSES=(当前值) + 2
PARALLEL_MAX_SERVERS=(当前值) + (5 * (change set数目))
PROCESSES=(当前值) + (7 * (change set数目))
SESSIONS=(当前值) + (2 * (change set数目))
STREAMS_POOL_SIZE=Max(当前值,50 MB) + ((change set数目) * (21 MB))
UNDO_RETENTION=3600
根据网络配置确定是否需要设置GLOBAL_NAMES=TRUE
如果source database是9iR2则还需要设置如下参数
LOGMNR_MAX_PERSISTENT_SESSIONS=chang source数目
四.设置Staging database的初始化参数
COMPATIBLE=10.2.0 staging database必须是10gR2版本
JAVA_POOL_SIZE=50000000
OPEN_LINKS=4 或者更高
JOB_QUEUE_PROCESSES=(当前值) + 2
PARALLEL_MAX_SERVERS=(当前值) + (5 * (change set数目))
PROCESSES=(当前值) + (7 * (change set数目))
SESSIONS=(当前值) + (2 * (change set数目))
STREAMS_POOL_SIZE=Max(当前值,50 MB) + ((change set数目) * (21 MB))
五.发布变化数据
1.配置网络,一边source database和staging database可以相互访问
1.例如要发布source database用户ning下的sales表
Name Null? Type
----------------------------------------- -------- ----------------------------
ID NUMBER(38)
PRODUCTID NUMBER(38)
PRICE NUMBER(10,2)
QUANTITY NUMBER(38)
2.授予cdcpub用户对于该表的权限
Grant succeeded.
3.将Source Database置于Force logging
由于异步模式是从redo logfile中获得增量数据的,那么nologging操作就会影响到数据的捕捉,所以最好能
将数据库置于force logging模式。
Database altered.
为了捕获update操作中各个column的redo数据,必须在数据库级别启用supplimental日志模式。
Database altered.
同时在源表上为需要捕捉的列创建supplemental日志组
2 add supplemental log group log_group_sales
3 (id,productid,price,quantity);
Table altered.
如果打算捕捉所有列,也可以为所有列创建supplemental日志组
2 add supplemental log data(all) columns;
Table altered.
4.在source database创建到staging database的database link
Connected.
CDCPUB@ning>create database link test
2 connect to cdcpub identified by cdcpub using 'test';
Database link created.
5.在staging database创建到source database的database link
Connected.
CDCPUB@test>create database link ning
2 connect to cdcpub identified by cdcpub using 'ning';
Database link created.
注意database link的名字要和using的tnsname一致,一开始我使用了不同的名字,结果在后面创建change source的时候一直报错
ERROR at line 1:
ORA-26675: cannot create Streams capture process CDC$C_NING
ORA-06512: at “SYS.DBMS_CAPTURE_ADM_INTERNAL”, line 121
ORA-06512: at line 1
ORA-06512: at “SYS.DBMS_CDC_IPUBLISH”, line 133
ORA-06512: at “SYS.DBMS_CDC_PUBLISH”, line 194
ORA-06512: at line 2
郁闷了我大半个小时,后来将database link改成和tnsname一致后就没有这个错误了。
6.在staging database中创建change source
2 dbms_cdc_publish.create_hotlog_change_source(
3 change_source_name=>'ning',
4 description=>'distributed hotlog source',
5 source_database=>'ning');
6 end;
7 /
PL/SQL procedure successfully completed.
7.在staging database中创建change set
2 dbms_cdc_publish.create_change_set(
3 change_set_name=>'ning_sales_dhotlog',
4 description=>'dhotlog change set for ning.sales',
5 change_source_name=>'ning',
6 stop_on_ddl=>'y');
7 end;
8 /
PL/SQL procedure successfully completed.
8.在staging database中创建change table
2 dbms_cdc_publish.create_change_table(
3 owner=>'cdcpub',
4 change_table_name=>'sales_ct_dhotlog',
5 change_set_name=>'ning_sales_dhotlog',
6 source_schema=>'ning',
7 source_table=>'sales',
8 column_type_list=>'id int,productid int,price number(10,2),quantity int',
9 capture_values=>'both',
10 rs_id=>'y',
11 row_id=>'n',
12 user_id=>'n',
13 timestamp=>'n',
14 object_id=>'n',
15 source_colmap=>'n',
16 target_colmap=>'y',
17 options_string=>'tablespace users');
18 end;
19 /
begin
*
ERROR at line 1:
ORA-29540: class oracle/CDC/PublishApi does not exist
ORA-06512: at "SYS.DBMS_CDC_PUBLISH", line 611
ORA-06512: at line 2
faint,真是多灾多难,又出错了。Google了下, jlandzpa也遇到过这个问题,他是删除了CDC组件重装后解决的该问题。应该是staging database安装的时候CDC的编译出了问题,有个api的类没有正确编译。
Connected.
SYS@test>@$ORACLE_HOME\rdbms\admin\rmcdc.sql
SYS@test>@$ORACLE_HOME\rdbms\admin\initcdc.sql
继续执行,ok
Connected.
CDCPUB@test>begin
2 dbms_cdc_publish.create_change_table(
3 owner=>'cdcpub',
4 change_table_name=>'sales_ct_dhotlog',
5 change_set_name=>'ning_sales_dhotlog',
6 source_schema=>'ning',
7 source_table=>'sales',
8 column_type_list=>'id int,productid int,price number(10,2),quantity int',
9 capture_values=>'both',
10 rs_id=>'y',
11 row_id=>'n',
12 user_id=>'n',
13 timestamp=>'n',
14 object_id=>'n',
15 source_colmap=>'n',
16 target_colmap=>'y',
17 options_string=>'tablespace users');
18 end;
19 /
PL/SQL procedure successfully completed.
9.在staging database中启用change source
2 dbms_cdc_publish.alter_hotlog_change_source(
3 change_source_name=>'ning',
4 enable_source=>'Y');
5 end;
6 /
PL/SQL procedure successfully completed.
10.在staging database中启用change set
2 dbms_cdc_publish.alter_change_set(
3 change_set_name=>'ning_sales_dhotlog',
4 enable_capture=>'y');
5 end;
6 /
PL/SQL procedure successfully completed.
查看alert日志,发现streams的capture和apply进程已经启动成功
Streams APPLY A001 started with pid=14, OS id=2904
Streams Apply Reader started P000 with pid=20 OS id=3912
Streams Apply Server started P001 with pid=21 OS id=2448
11.在staging database中将change table的读取权限授予订阅者
先创建订阅者帐号,然后授权
User created.
CDCPUB@test>grant create session to cdcsub;
Grant succeeded.
CDCPUB@test>grant create table to cdcsub;
Grant succeeded.
CDCPUB@test>grant select on cdcpub.sales_ct_dhotlog to cdcsub;
Grant succeeded.
五.订阅变化数据
订阅的步骤和同步CDC模式基本一致,这里就不重复了,需要注意的是,在异步分布式HotLog和异步AutoLog中,订阅者和订阅过程都是在staging database中完成的。具体步骤请参考关于Change Data Capture(二)中关于订阅变化数据的部分。
我也同意 Kamus 的观点, 知道了Stream 以后, CDC基本没有意义
CDC主要应该是面向ETL的,方便的捕获增量数据给多个订阅者。可能也存在宣传方面的原因,毕竟现在DWBI很火,而在ETL方面,增量抽取一直是一个比较头大的问题,尤其是没有时间截的数据。像Informatica就有通过将全行数据散列保存,然后通过比较散列来判断数据是已经存在的,还是新的增量数据的方法。
我对于CDC的推出一直有所疑惑,这个跟replication,跟streams,跟logical dataguard抢生意的东西推出的目的何在?
意义在于多个元数据库的数据发布到staging database,然后再由ODI这样的工具统一处理?
Asynchronous Distributed HotLog Mode相当于Streams,而Asynchronous AutoLog Mode则相当于Logical Standby
In the asynchronous Distributed HotLog mode, change data is captured from the online redo log file on the source database.
仍然在source database捕获数据, 性能不理想.
对于 Asynchronous AutoLog Archive Change Data Capture Configuration,
缺点是要求 数据库版本必须一样, 唉! 麻烦…
The source database and the staging database must be running on the same hardware, operating system, and Oracle version