在使用Oracle數(shù)據(jù)庫的過程中,可能會遇到各種各樣的錯誤或異常,很多異常的提示并不具體,我們有必要了解一下Oracle的ErrorStack跟蹤方式。
ErrorStack是Oracle提供的一種對于錯誤堆棧進(jìn)行跟蹤的方法,通過設(shè)置跟蹤可以將一些錯誤的后臺信息詳盡地轉(zhuǎn)儲出來,寫入跟蹤文件,對于錯誤的研究與診斷非常有效。
設(shè)置ErrorStack主要有4個級別:
0 - 僅轉(zhuǎn)儲錯誤堆棧(0級已經(jīng)被逐漸廢棄)
1 - 轉(zhuǎn)儲錯誤堆棧和函數(shù)調(diào)用堆棧
2 - Level1 + ProcessState
3 - Level2 + Context area(顯示所有cursors,著重顯示當(dāng)前cursor)
ErrorStack可以在實(shí)例級或會話級別設(shè)置,也可以在參數(shù)文件中設(shè)置,這個設(shè)置僅當(dāng)某個特定的錯誤出現(xiàn)時才被觸發(fā),如設(shè)置ORA-00942事件的跟蹤:
alter session set events '942 trace name errorstack level 1';
一個客戶曾經(jīng)出現(xiàn)如下ORA-01438錯誤,提示數(shù)據(jù)的精度超過允許值,是后臺JOB調(diào)度的任務(wù):
Mon Jul 13 10:27:312009
Errors in file /admin/erpdb/bdump/erpdb1_j000_447020.trc:
ORA-12012: error on auto execute of job 22
ORA-01438: value larger than specified precision allowed for this column
ORA-06512: at 'ERP.TIMRDU', line 13
ORA-06512: at line 1
跟蹤文件中默認(rèn)的不會記錄具體的SQL、綁定變量等信息,我們可以通過ErrorStack進(jìn)行后臺跟蹤,獲得更詳細(xì)的信息,執(zhí)行如下代碼中的SQL:
alter system set events='1438 trace name errorstack forever,level 3';
然后可以手工執(zhí)行出錯的存儲過程(代碼如下),獲得跟蹤文件,再關(guān)閉跟蹤:
alter system set events='1438 trace name errorstack off';
在Oracle 10g中,這樣的操作會被記錄到日志文件中:
Mon Jul 13 10:48:392009
OS Pid: 541528 executed alter system set events '1438 trace name Errorstack forever,level 3'
Mon Jul 13 10:56:06 2009
Errors in file /admin/erpdb/udump/erpdb1_ora_267056.trc:
ORA-01438: value larger than specified precision allowed for this column
Mon Jul 13 10:56:08 2009
Trace dumping is performing id=[cdmp_20090713105608]
Mon Jul 13 10:57:15 2009
OS Pid: 541528 executed alter system set events '1438 trace name Errorstack off'
接下來分析獲得的跟蹤文件,就可以獲得SQL文本線索,找到根本問題。在這個案例中,我們得到的跟蹤文件,其關(guān)鍵SQL內(nèi)容如下,通過這個SQL對照數(shù)據(jù)表很快就找到了精度超過的Number型字段:
***SESSION ID:(857.16304) 2009-07-13 10:56:06.429
*** 2009-07-13 10:56:06.429
ksedmp: internal or fatal error
ORA-01438: value larger than specified precision allowed for this column
Current SQL statement for this session:
INSERT INTO CONTRAPAYM (IHCODE,GTICODE,IDX,HCODE,PORDATE,FCODE,FCY,CCODE,ECODE,FLAG,MATCHFLAG,BCODE,MONTHZL,STATUS,AUTOFLAG ,REMARK) SELECTDISTINCT IHCODE,'0000000000000000' GTICODE,(IDX+100) IDX,HCODE,:B3 PORDATE,FCODE,:B2FCY,CCODE,ECODE, FLAG,0 MATCHFLAG,BCODE,MONTHZL,STATUS,4 AUTOFLAG ,'Proc_AdjustContractpayment' REMARK FROM CONTRAPAYM WHERE IHCODE=:B1 ANDIDX=(SELECT MAX(IDX) FROM CONTRACTPAYMENT WHERE IHCODE=:B1 ) AND ROWNUM=1
----- PL/SQL Call Stack -----
object line object
handle number name
7000002ca366e80 100 procedure ERP.PROC_ADJCONTRAPAYM
700000336a1a070 236 procedure ERP.PROC_AUTOBATPROC
7000002ca367df0 5 procedure ERP.TIMRDU
700000342eb7c20 1 anonymous block
在跟蹤文件中,還有大量的堆棧信息,對于復(fù)雜的問題,還可以通過進(jìn)一步細(xì)致的堆棧分析進(jìn)行深入追蹤。
可以很容易地測試這一功能的使用,比如使用如下代碼中的測試過程:
SQL> alter system set events='1438 trace name errorstack forever,level 3';
SQL> connect eygle/eygle
SQL> create table t (name varchar2(10),id number);
Table created.
SQL> insert into t values(a,1);
insert into t values(a,1)
*
ERROR at line 1:
ORA-00984: column not allowed here
SQL> alter system set events '984 trace name errorstack off';
在告警日志文件中就可以獲得如下信息:
Mon Jul 13 22:55:592009
OS Pid: 2431 executed alter system set events '984 trace name ERRORSTACK level3'
Mon Jul 13 22:59:12 2009
Errors in file /opt/oracle/admin/mmstest/udump/mmstest_ora_2520.trc:
ORA-00984: column not allowed here
Mon Jul 13 23:01:01 2009
OS Pid: 2431 executed alter system set events '984 trace name errorstack off'
獲得的跟蹤文件里記錄了insert的相關(guān)信息:
*** 2009-07-13 22:59:12.928
ksedmp: internal or fatal error
ORA-00984: column not allowed here
Current SQL statement for this session:
insert into t values(a,1)
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (?means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+27 all ksedst1() 0? 1 ?
以下引用在ITPUB上的一個討論,是使用ErrorStack跟蹤解決問題的典型案例(原文鏈接指向ITPUB):
問題概述
在進(jìn)行數(shù)據(jù)庫例行維護(hù)時,遇到數(shù)據(jù)庫正常導(dǎo)出,但導(dǎo)入時遇到有張表不能導(dǎo)入的問題。這張表存儲工藝路線,大約共有160萬條記錄。表的不能導(dǎo)入影響一部分功能的實(shí)現(xiàn),各單位反映強(qiáng)烈。
初步分析
首先排除是系統(tǒng)bug所帶來的問題,因?yàn)榍捌趯?dǎo)入導(dǎo)出未發(fā)現(xiàn)問題;另外由于是對同一數(shù)據(jù)庫進(jìn)行的導(dǎo)入導(dǎo)出,也排除了字符集問題。
單獨(dú)再次導(dǎo)入此表,出現(xiàn)錯誤如下;
[oracle@zhx25 ~]$ imp system/ceshi fromuser=dba_mgr touser=dba_mgr tables=t_routdef ignore=y file=1.dmp
Import: Release 10.2.0.1.0 - Production on Tue Mar 18 14:19:49 2008
Copyright (c) 1982, 2005, Oracle.All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
Export file created by EXPORT:V10.02.01 via conventional path
import done in ZHS16GBK character set and ZHS16GBK NCHAR character set
. importing DBA_MGR's objects into DBA_MGR
. . importing table 'T_ROUTDEF'
IMP-00058: ORACLE error 1438 encountered
ORA-01438: value larger than specified precision allowed for this column
IMP-00028: partial import of previous table rolled back: 20501 rows rolled back
解決方向:有可能是數(shù)據(jù)中存在問題,很有可能是輸入一些特殊字符,這些字符和ORACLE自身使用的通配符一致,所以導(dǎo)致系統(tǒng)不能識別。
問題解決
1、對問題進(jìn)行跟蹤
對系統(tǒng)設(shè)置1438 errorstack跟蹤:
SQL> alter system set events='1438 trace name Errorstack forever,level 10';
System altered
然后重新執(zhí)行IMP;
2、查看UDUMP下日志,如下:
/opt/oracle/admin/orcl/udump/orcl_ora_32355.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /opt/oracle/product/10.2.0
System name:Linux
Node name:zhx25.cccc.com
Release:2.6.9-22.ELsmp
Version:#1 SMP Mon Sep 19 18:00:54 EDT 2005
Machine:x86_64
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 17
Unix process pid: 32355, image: oracle@zhx25.cccc.com (TNS V1-V3)
*** SERVICE NAMESYS$USERS) 2008-03-18 17:11:02.014
*** SESSION ID135.949) 2008-03-18 17:11:02.014
*** 2008-03-18 17:11:02.014
ksedmp: internal or fatal error
ORA-01438: value larger than specified precision allowed for this column
Current SQL statement for this session:
INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO 'T_ROUTDEF' ('PRTNO', 'SEQNO', 'SEQDESC', 'PRTEFLAG', 'SEQCC', 'SEQNXTNO', 'SEQDEPT', 'WKCCODE', 'WKCALTCODE', 'SEQSETHR', 'LAB_SEQSETHR', 'SEQRUNHR', 'LAB_SEQRUNHR', 'SEQQUEHR', 'LAB_SEQQUEHR', 'SEQTRANHR', 'LAB_SEQTRANHR', 'SEQNOQLTRAT', 'SEQCHGDAT', 'SEQCHGRSN', 'SEQCHGCTLR', 'MACHCODE', 'MACHALTCODE', 'RESCODE', 'KEYSEQ', 'SEQCRYLOT', 'INITOVERRATE', 'COMWORK', 'SEQLOTFLAG', 'SEQEFDAT', 'SEQIEFDAT', 'SEQEFLOTNO', 'SEQIEFLOTNO', 'SEQECONO', 'PRTSEQCST', 'SEQSELFCST', 'INVYORN', 'SEQQTY', 'SEQNOTE', 'NOPRTAVELZ', 'OUTSEQFLAG', 'WXCST', 'SEQGSCHGCTLR', 'SEQGSCHGDAT', 'SEQSHCTL', 'SEQSHDAT', 'SEQSHSIGN', 'PRICE', 'ACCNO', 'ROUSTAT', 'FIXMFLAG', 'BANCL' VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17, :18, :19, :20, :21, :22, :23, :24, :25, :26, :27, :28, :29, :30, :31, :32, :33, :34, :35, :36, :37, :38, :39, :40, :41, :42, :43, :44, :45, :46, :47, :48, :49, :50, :51, :52)
----- Call Stack Trace -----
calling
call
entry
argument values in hex
location
type
point
(? means dubious value)
…………
oacdef = 6a9477a0
Offsi = 48, Offsi = 1440
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a9477d0
Offsi = 48, Offsi = 1472
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a947800
Offsi = 48, Offsi = 1504
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a947830
Offsi = 48, Offsi = 1536
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a947860
Offsi = 48, Offsi = 1568
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a947890
Offsi = 48, Offsi = 1600
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a9478c0
Offsi = 48, Offsi = 1632
kkscoacd
Bind#0
oacdty=01 mxl=32(24) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000010 frm=01 csi=852 siz=3344 off=0
kxsbbbfp=2a974dc398
bln=32
avl=23
flg=05
value='?
Bind#1
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000000 frm=00 csi=00 siz=0 off=32
kxsbbbfp=2a974dc3b8
bln=22 avl=22 flg=01
value=###
An invalid number has been seen.Memory contents are :
Dump of memory from 0x0000002A974CDE88 to 0x0000002A974CDE9E
2A974CDE80 3331312E 0406C102 [.113....]
2A974CDE90 C9BBC6C8 52024D01 30310248 00003304[.....M.RH.10.3..]
Bind#2
oacdty=01 mxl=32(30) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000010 frm=01 csi=852 siz=0 off=56
kxsbbbfp=2a974dc3d0
bln=32
avl=00
flg=01
Bind#3
oacdty=01 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000010 frm=01 csi=852 siz=0 off=88
kxsbbbfp=2a974dc3f0
bln=32
avl=01
flg=01
value='M'
…………
從跟蹤文件可以看到,問題是系統(tǒng)認(rèn)為地址為2A974CDE88的數(shù)值非法,在導(dǎo)入會提示ORA-1438錯誤。本值在數(shù)據(jù)庫中定義為number(8)類型,但從結(jié)果來看,系統(tǒng)把這個值當(dāng)成了number(24)類型,并且由于格式不對,不能取出值。
3、找到DMP中相關(guān)數(shù)據(jù)
用winhex打開dmp文件,搜索十六進(jìn)制串“2E313133 02C10604 C8C6BBC9 014D0252 48023130 0433F3FF”(對trace文件中的值要進(jìn)行順序調(diào)整)
4、對導(dǎo)出數(shù)據(jù)進(jìn)行更改
由于“2E313133 02C10604 C8C6BBC9 014D0252 48023130 0433F3FF”不是一個有效的number類型,需要對它進(jìn)行調(diào)整。調(diào)整如下,只要確保調(diào)整的值是number類型的即可。
5、對數(shù)據(jù)庫相關(guān)數(shù)據(jù)項(xiàng)的類型進(jìn)行調(diào)整
把數(shù)據(jù)項(xiàng)的類型由number(*)改為number(38);
6、重新進(jìn)行數(shù)據(jù)導(dǎo)入
[oracle@zhx25 ~]$ imp system/ceshi fromuser=dba_mgr touser=dba_mgr tables=t_routdef ignore=y file=1.dmp
Import: Release 10.2.0.1.0 - Production on Tue Mar 18 14:19:49 2008
Copyright (c) 1982, 2005, Oracle.All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
Export file created by EXPORT:V10.02.01 via conventional path
import done in ZHS16GBK character set and ZHS16GBK NCHAR character set
. importing DBA_MGR's objects into DBA_MGR
. . importing table
'T_ROUTDEF'
1646404
本文通過介紹ErrorStack的應(yīng)用,將這一常用的診斷工具介紹給大家,作為DBA,在遇到復(fù)雜問題時,通過這些手段深入分析,不僅有助于問題的解決,還將完善自己的知識體系,祝大家學(xué)習(xí)能夠不斷進(jìn)步、深入。