这篇文章描述为RAC环境创建STANDBY数据库。

由于篇幅限制,加上碰到了很多的bug,只能将文章拆分成多篇。

由于错误太多,导致一篇文章无法完全记录下来,这一篇继续记录STANDBY数据库创建过程中碰到的问题。

刚刚在进行RAC环境的DUPLICATE DATABASE的时候,就碰到了很多问题,由于二者命令比较相似,本来认为这次不会碰到太多的问题,没有想到的是,这次碰到的问题居然比DUPLICATE碰到的问题多出一倍。而且基本上所有碰到的问题都是DUPLICATE操作时不曾遇到的。

执行DUPLICATE DATABASE FOR STANDBY时出现下面的错误:

bash-3.00$ rman target sys/test@rac11gauxiliary sys/test@rac11g1_s

Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:28:54 2008

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: RAC11G(DBID=1712482917)

connected to auxiliary database: RAC11G(not mounted)

RMAN> run

2> {

3> allocate channel c1 device type disk connect sys/test@rac11g1;

4> allocate channel c2 device type disk connect sys/test@rac11g2;

5> allocate auxiliary channel ac1 device type disk;

6> allocate auxiliary channel ac2 device type disk;

7> duplicate target database for standby

8> db_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'

9> dorecover

10> from active database

11> spfile

12> parameter_value_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'

13> set log_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'

14> set fal_client='RAC11G_S'

15> set fal_server='RAC11G'

16> set log_archive_dest_1='LOCATION=+DATA/RAC11GVALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'

17> set log_archive_dest_2='SERVICE=rac11gLGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'

18> set standby_archive_dest='+DATA/RAC11G'

19> set db_unique_name='rac11g_s';

20> }

using target database control file instead of recovery catalog

allocated channel: c1

channel c1: SID=619 instance=rac11g1 device type=DISK

allocated channel: c2

channel c2: SID=119 instance=rac11g2 device type=DISK

allocated channel: ac1

channel ac1: SID=112 device type=DISK

allocated channel: ac2

channel ac2: SID=39 device type=DISK

Starting Duplicate Db at 09-SEP-08

contents of Memory Script.:

{

  backup as copy reuse

  file  '/data/oracle/product/11.1/database/dbs/orapwrac11g2' auxiliary format

'/data/oracle/product/11.1/database/dbs/orapwrac11g1'   file

'/dev/vx/rdsk/datavg/rac11g_spfile' auxiliary format

'+DATA/rac11g/spfilerac11g.ora'   ;

  sql clone "alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''";

}

executing Memory Script

Starting backup at 09-SEP-08

RMAN-03009: failure of backup command on c1 channel at 09/09/2008 16:29:06

ORA-19505: failed to identify file "/data/oracle/product/11.1/database/dbs/orapwrac11g2"

ORA-27037: unable to obtain file status

SVR4 Error: 2: No such file or directory

Additional information: 3

continuing other job steps, job failed will not be re-run

released channel: c1

released channel: c2

released channel: ac1

released channel: ac2

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of Duplicate Db command at 09/09/2008 16:29:08

RMAN-03015: error occurred in stored script. Memory Script

RMAN-03009: failure of backup command on c1 channel at 09/09/2008 16:29:06

ORA-19505: failed to identify file "/data/oracle/product/11.1/database/dbs/orapwrac11g2"

ORA-27037: unable to obtain file status

SVR4 Error: 2: No such file or directory

Additional information: 3

RMAN> exit

Recovery Manager complete.

这个错误是由于连接源数据库的服务名采用了RAC的总体配置,因此很可能连接到实例2上进行恢复,有两种方法解决这个问题,一个是在任意节点上保证密码文件orapwrac11g1和orapwrac11g2都存在,另一种方法更加简单,可以连接源数据库的时候指定唯一一个实例进行连接,比如这里将RAC11G替换为RAC11G1。

bash-3.00$ rman target sys/test@rac11gauxiliary sys/test@rac11g1_s

Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:31:00 2008

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: RAC11G(DBID=1712482917)

connected to auxiliary database: RAC11G(not mounted)

RMAN> run

2> {

3> allocate channel c1 device type disk connect sys/test@rac11g1;

4> allocate channel c2 device type disk connect sys/test@rac11g2;

5> allocate auxiliary channel ac1 device type disk;

6> allocate auxiliary channel ac2 device type disk;

7> duplicate target database for standby

8> db_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'

9> dorecover

10> from active database

11> spfile

12> parameter_value_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'

13> set log_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'

14> set fal_client='RAC11G_S'

15> set fal_server='RAC11G'

16> set log_archive_dest_1='LOCATION=+DATA/RAC11GVALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'

17> set log_archive_dest_2='SERVICE=rac11gLGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'

18> set standby_archive_dest='+DATA/RAC11G'

19> set db_unique_name='rac11g_s';

20> }

using target database control file instead of recovery catalog

allocated channel: c1

channel c1: SID=619 instance=rac11g1 device type=DISK

allocated channel: c2

channel c2: SID=119 instance=rac11g2 device type=DISK

allocated channel: ac1

channel ac1: SID=113 device type=DISK

allocated channel: ac2

channel ac2: SID=39 device type=DISK

Starting Duplicate Db at 09-SEP-08

contents of Memory Script.:

{

  backup as copy reuse

  file  '/data/oracle/product/11.1/database/dbs/orapwrac11g2' auxiliary format

'/data/oracle/product/11.1/database/dbs/orapwrac11g1'   file

'/dev/vx/rdsk/datavg/rac11g_spfile' auxiliary format

'+DATA/rac11g/spfilerac11g.ora'   ;

  sql clone "alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''";

}

executing Memory Script

Starting backup at 09-SEP-08

Finished backup at 09-SEP-08

sql statement: alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''

contents of Memory Script.:

{

  sql clone "alter system set  log_file_name_convert =

''/dev/vx/rdsk/datavg'', ''+DATA/RAC11G'' comment=

'''' scope=spfile";

  sql clone "alter system set  fal_client =

''RAC11G_S'' comment=

'''' scope=spfile";

  sql clone "alter system set  fal_server =

''RAC11G'' comment=

'''' scope=spfile";

  sql clone "alter system set  log_archive_dest_1 =

''LOCATION=+DATA/RAC11GVALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'' comment=

'''' scope=spfile";

  sql clone "alter system set  log_archive_dest_2 =

''SERVICE=rac11gLGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'' comment=

'''' scope=spfile";

  sql clone "alter system set  standby_archive_dest =

''+DATA/RAC11G'' comment=

'''' scope=spfile";

  sql clone "alter system set  db_unique_name =

''rac11g_s'' comment=

'''' scope=spfile";

  shutdown clone immediate;

  startup clone nomount ;

}

executing Memory Script

sql statement: alter system set  log_file_name_convert =  ''/dev/vx/rdsk/datavg'', ''+DATA/RAC11G'' comment= '''' scope=spfile

released channel: c1

released channel: c2

released channel: ac1

released channel: ac2

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of Duplicate Db command at 09/09/2008 16:31:11

RMAN-03015: error occurred in stored script. Memory Script

RMAN-03009: failure of sql command on clone_default channel at 09/09/2008 16:31:11

RMAN-11003: failure during parse/execution of SQL statement: alter system set  log_file_name_convert =  '/dev/vx/rdsk/datavg', '+DATA/RAC11G' comment= '' scope=spfile

ORA-17510: Attempt to do i/o beyond file size

ORA-17510: Attempt to do i/o beyond file size

RMAN> exit

Recovery Manager complete.

检查了一下,这个问题应该是ASM的bug造成的。关于ORA-17510的错误有一些,但是没有和当前情况相似的。

由于找不到解决这个问题的办法,只好想办法绕过这个bug。于是从这里以后,采用了先在本地创建一个SPFILE的方式,将所有需要修改的参数都提前修改完成,避免在DUPLICATE命令的时候设置SPFILE参数。这样Oracle就不会执行ALTER SYSTEM语句,也就不会导致SPFILE的动态扩展。

随后利用SQLPLUS创建了本地的SPFILE,然后启动到NOMOUNT状态,通过RMAN连接却报错:

bash-3.00$ rman target sys/test@rac11gauxiliary sys/test@rac11g1_s

Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:44:09 2008

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: RAC11G(DBID=1712482917)

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-00554: initialization of internal recovery manager package failed

RMAN-04006: error from auxiliary database: ORA-01031: insufficient privileges

这是因为RMAN刚才的恢复操作覆盖了密码文件,所以这里必须重建。从源数据库再次拷贝密码文件到本地,再次执行:

bash-3.00$ rman target sys/test@rac11gauxiliary sys/test@rac11g1_s

Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 17:25:24 2008

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: RAC11G(DBID=1712482917)

connected to auxiliary database: RAC11GS (not mounted)

RMAN> run

2> {

3> allocate channel c1 device type disk connect sys/test@rac11g1;

4> allocate channel c2 device type disk connect sys/test@rac11g2;

5> allocate auxiliary channel ac1 device type disk;

6> allocate auxiliary channel ac2 device type disk;

7> duplicate target database for standby

8> dorecover

9> from active database;

10> }

using target database control file instead of recovery catalog

allocated channel: c1

channel c1: SID=621 instance=rac11g1 device type=DISK

allocated channel: c2

channel c2: SID=628 instance=rac11g2 device type=DISK

allocated channel: ac1

channel ac1: SID=333 instance=rac11g1 device type=DISK

allocated channel: ac2

channel ac2: SID=306 instance=rac11g1 device type=DISK

Starting Duplicate Db at 09-SEP-08

contents of Memory Script.:

{

  backup as copy reuse

  file  '/data/oracle/product/11.1/database/dbs/orapwrac11g1' auxiliary format

'/data/oracle/product/11.1/database/dbs/orapwrac11g1'   ;

}

executing Memory Script

Starting backup at 09-SEP-08

Finished backup at 09-SEP-08

contents of Memory Script.:

{

  backup as copy current controlfile for standby auxiliary format  '+DATA/rac11g/rac11g_control_1';

  restore clone controlfile to  '+DATA/rac11g/rac11g_control_2' from

'+DATA/rac11g/rac11g_control_1';

  restore clone controlfile to  '+DATA/rac11g/rac11g_control_3' from

'+DATA/rac11g/rac11g_control_1';

  sql clone 'alter database mount standby database';

}

executing Memory Script

Starting backup at 09-SEP-08

channel c1: starting datafile copy

copying standby control file

released channel: c1

released channel: c2

released channel: ac1

released channel: ac2

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of Duplicate Db command at 09/09/2008 17:25:35

RMAN-03015: error occurred in stored script. Memory Script

RMAN-03009: failure of backup command on c1 channel at 09/09/2008 17:25:35

ORA-17629: Cannot connect to the remote database server

ORA-17627: ORA-01031: insufficient privileges

ORA-17629: Cannot connect to the remote database server

RMAN> exit

Recovery Manager complete.

这个问题导致的原因比较复杂。在本地测试连接,没有发现任何问题。在远端用sys测试连接辅助实例,也没有发现任何异常的情况。

但是只要执行DUPLICATE命令就会报这个错误。

由于没有其他可供参考的信息,只能根据现有的错误信息进行推测。首先报错发生在CHANNEL c1上,而且是连接远端数据库服务器。说明问题是在执行过程中,源数据库尝试连接辅助实例时出现的错误,怀疑问题可能与密码文件有关。

可是通过命令行测试没有发现任何的问题。而且前面采用SPFILE方式的时候,也经历了类似的步骤,为什么没有报错。

通过观察Oracle执行的脚本发现了问题,Oracle执行了下面的操作:

contents of Memory Script.:

{

  backup as copy reuse

  file  '/data/oracle/product/11.1/database/dbs/orapwrac11g1' auxiliary format

'/data/oracle/product/11.1/database/dbs/orapwrac11g1'   ;

}

executing Memory Script

这个操作时前面通过SPFILE方式所没有的,这也是为什么原来没有报错,而这里报错的原因。

不过问题又出现了,Oracle这个操作的目录是将远端的密码文件同步到本地服务器上,使得后续的SYS连接可以正常登陆。这个操作本来是没有任何问题的,而且实际上前面我就是采用了操作系统上面的类似操作,将源数据库的密码文件同步到本地的。那么为什么Oracle的rman执行了这个操作后,源数据库再连接辅助实例的时候反而报错了呢。

检查了源数据库的配置后,发现了问题的所在,源数据库采用了裸设备的方式,而本地的$ORACLE_HOME/dbs目录下的密码文件只是一个链接而已。

$ ls -l

total 38404

-rw-rw----   1 oracle   oinstall    1552 Sep  5 16:09 hc_rac11g1.dat

-rw-r--r--   1 oracle   oinstall    8385 Sep 11  1998 init.ora

-rw-r--r--   1 oracle   oinstall   12920 May  3  2001 initdw.ora

-rw-r-----   1 oracle   oinstall      43 Jul 16 15:48 initrac11g1.ora

lrwxrwxrwx   1 oracle   oinstall      34 Jul 16 14:39 orapwrac11g1 -> /dev/vx/rdsk/datavg/rac11g_pwdfile

-rw-r-----   1 oracle   oinstall 19611648 Sep  8 18:27 snapcf_rac11g1.f

很可能Oracle在这里执行备份的时候出现了错误,没有采用裸设备备份而是当作操作系统文件进行了拷贝,导致恢复到辅助实例后,密码文件不可用。

于是先去掉了连接,直接将裸设备中的密码文件拷贝到本地目录,使得RMAN可以正常的拷贝这个文件:

$ dd if=/dev/vx/rdsk/datavg/rac11g_pwdfile f=/data/rac11g_pwfile

204800+0 records in

204800+0 records out

bash-3.00$ rm orapwrac11g1

bash-3.00$ rm orapwrac11g2

bash-3.00$ cp /data/rac11g_pwfile orapwrac11g1

再次执行DUPLICATE,则权限不足的问题被解决。

oracle视频教程请关注: