When a user commits or rolls back, the session's redo information must be written out to the redo log file.
The user process posts LGWR to perform the write, and LGWR notifies the user process once the write has completed.
The log file sync wait event is the user process waiting for that write-complete notification from LGWR.
For a rollback, the event records the time from when the user issues the rollback command until the rollback completes.
Excessive waits on this event usually mean either that LGWR writes are slow or that the system commits too frequently.
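As a quick system-wide check, the cumulative impact of this event can be read from v$system_event; a minimal sketch (on 8i/9i the time columns are reported in centiseconds):

SELECT event, total_waits, time_waited, average_wait
  FROM v$system_event
 WHERE event = 'log file sync';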
To investigate the problem, pay attention to the following (a query sketch follows this list):
the log file parallel write wait event
statistics such as user commits and user rollbacks, which show how often the application commits or rolls back
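A minimal sketch of the corresponding queries (all values are cumulative since instance startup):

-- How long LGWR's own writes are taking
SELECT event, total_waits, time_waited, average_wait
  FROM v$system_event
 WHERE event = 'log file parallel write';

-- How often the application commits or rolls back
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('user commits', 'user rollbacks');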
Possible remedies:
1. Improve LGWR performance
Use fast disks where possible, and do not place the redo log files on RAID 5 storage.
2. Commit in batches (see the PL/SQL sketch after this list)
3. Use the NOLOGGING/UNRECOVERABLE options where appropriate
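As an illustration of batch commits, the following PL/SQL sketch commits once every 1,000 rows instead of once per row; the table t, its columns, and the batch size of 1,000 are hypothetical:

DECLARE
  l_cnt PLS_INTEGER := 0;
BEGIN
  FOR r IN (SELECT id FROM t) LOOP          -- t is a hypothetical table
    UPDATE t SET processed = 'Y' WHERE id = r.id;
    l_cnt := l_cnt + 1;
    IF MOD(l_cnt, 1000) = 0 THEN            -- commit per batch, not per row
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;                                    -- final commit for the last partial batch
END;
/

Every COMMIT that is removed is one fewer synchronous log file sync wait for that session.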
The average redo write size can be calculated with the following formula:
avg. redo write size = ('redo blocks written' / 'redo writes') * 512 bytes
If the system generates a large amount of redo but each write is small, LGWR is usually being posted too frequently.
That can lead to heavy contention on the redo-related latches, and Oracle may be unable to take full advantage of piggyback (group) commits.
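A minimal sketch that evaluates this formula directly against v$sysstat (both statistics are cumulative since instance startup):

SELECT ROUND(blk.value / wrt.value * 512) AS avg_redo_write_bytes
  FROM v$sysstat blk, v$sysstat wrt
 WHERE blk.name = 'redo blocks written'
   AND wrt.name = 'redo writes';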
Let's pull some data out of a statspack report to look at this problem.
1. Key information
DB Name      DB Id       Instance     Inst Num Release     OPS Host
------------ ----------- ------------ -------- ----------- --- ------------
DB           1222010599  oracle              1 8.1.7.4.5   NO  sun

              Snap Id     Snap Time      Sessions
              ------- ------------------ --------
 Begin Snap:     3473 13-Oct-04 13:43:00      540
   End Snap:     3475 13-Oct-04 14:07:28      540
    Elapsed:               24.47 (mins)

Cache Sizes
~~~~~~~~~~~
  db_block_buffers: 102400        log_buffer: 20971520
     db_block_size: 8192    shared_pool_size: 600M

Load Profile
~~~~~~~~~~~~                        Per Second       Per Transaction
                                ---------------      ---------------
            Redo size:               28,458.11              2,852.03
......
2. Wait events
Event                               Waits   Timeouts   Time (cs)   (ms)   /txn
---------------------------- ------------ ---------- ----------- ------ ------
log file sync                      14,466          2       4,150      3    1.0
db file sequential read            17,202          0       2,869      2    1.2
latch free                         24,841     13,489       2,072      1    1.7
direct path write                     121          0       1,455    120    0.0
db file parallel write              1,314          0       1,383     11    0.1
log file sequential read            1,540          0          63      0    0.1
....
log file switch completion              1          0           3     30    0.0
refresh controlfile command            23          0           1      0    0.0
LGWR wait for redo copy                46          0           0      0    0.0
....
log file single write                   4          0           0      0    0.0
We can see that log file sync and db file parallel write waits appear here together.
Evidently the log file sync waits are being held up by the completion of db file parallel write.
Disk I/O is clearly the bottleneck: in this system the redo logs and the data files were in fact placed on the same RAID volume, which caused the performance problem and needed to be corrected.
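Whether the online redo logs really share storage with the data files can be checked by listing both sets of file paths side by side; a minimal sketch (the mount points you see will of course be system-specific):

SELECT member AS file_name, 'REDO LOG' AS file_type FROM v$logfile
UNION ALL
SELECT file_name, 'DATA FILE' FROM dba_data_files
ORDER BY 1;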
3. Statistics
Statistic                                      Total   per Second    per Trans
--------------------------------- ---------------- ------------ ------------
....
redo blocks written                          93,853         63.9          6.4
redo buffer allocation retries                    1          0.0          0.0
redo entries                                135,837         92.5          9.3
redo log space requests                           1          0.0          0.0
redo log space wait time                          3          0.0          0.0
redo ordering marks                               0          0.0          0.0
redo size                                41,776,508     28,458.1      2,852.0
redo synch time                               4,174          2.8          0.3
redo synch writes                            14,198          9.7          1.0
redo wastage                              4,769,200      3,248.8        325.6
redo write time                               3,698          2.5          0.3
redo writer latching time                         0          0.0          0.0
redo writes                                  14,572          9.9          1.0
....
sorts (disk)                                      4          0.0          0.0
sorts (memory)                              179,856        122.5         12.3
sorts (rows)                              2,750,980      1,874.0        187.8
....
transaction rollbacks                            36          0.0          0.0
transaction tables consistent rea                 0          0.0          0.0
transaction tables consistent rea                 0          0.0          0.0
user calls                                1,390,718        947.4         94.9
user commits                                 14,136          9.6          1.0
user rollbacks                                  512          0.4          0.0
write clones created in backgroun                 0          0.0          0.0
write clones created in foregroun                11          0.0          0.0
          -------------------------------------------------------------
avg. redo write size = ('redo blocks written' / 'redo writes') * 512 bytes = (93,853 / 14,572) * 512 ≈ 3 KB
This average is far too small, which again indicates that the system commits too frequently.
Latch Sleep breakdown for DB: DPSHDB  Instance: dpshdb  Snaps: 3473 -3475
-> ordered by misses desc

                                      Get                            Spin &
Latch Name                       Requests      Misses      Sleeps   Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
row cache objects              12,257,850     113,299          64  113235/64/0/0/0
shared pool                     3,690,715      60,279      15,857  52484/588/6546/661/0
library cache                   4,912,465      29,454       8,876  23823/2682/2733/216/0
cache buffers chains           10,314,526       2,856          33  2823/33/0/0/0
redo writing                       76,550         937           1  936/1/0/0/0
session idle bit                2,871,949         225           1  224/1/0/0/0
messages                          107,950         159           2  157/2/0/0/0
session allocation                184,386          44           6  38/6/0/0/0
checkpoint queue latch             96,583           1           1  0/1/0/0/0
          -------------------------------------------------------------
Because of the overly frequent commits, LGWR is posted far too often, and we can see contention on the redo writing latch here.
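Outside of statspack, the same contention can be confirmed by querying v$latch for the redo-related latches; a minimal sketch:

SELECT name, gets, misses, sleeps
  FROM v$latch
 WHERE name IN ('redo writing', 'redo allocation', 'redo copy');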
A detailed description of redo writing latch contention can be found on Steve Adams' site:
http://www.ixora.com.au/notes/lgwr_latching.htm, quoted below:
When LGWR wakes up, it first takes the redo writing latch to update the SGA variable that shows whether it is active. This prevents other Oracle processes from posting LGWR needlessly. LGWR then takes the redo allocation latch to determine how much redo might be available to write (subject to the release of the redo copy latches). If none, it takes the redo writing latch again to record that it is no longer active, before starting another rdbms ipc message wait.
If there is redo to write, LGWR then inspects the latch recovery areas for the redo copy latches (without taking the latches) to determine whether there are any incomplete copies into the log buffer. For incomplete copies above the sync RBA, LGWR just defers the writing of that block and subsequent log buffer blocks. For incomplete copies below the sync RBA, LGWR sleeps on a LGWR wait for redo copy wait event, and is posted when the required copy latches have been released. The time taken by LGWR to take the redo writing and redo allocation latches and to wait for the redo copy latches is accumulated in the redo writer latching time statistic.
(Prior to release 8i, foreground processes held the redo copy latches more briefly because they did not retain them for the application of the change vectors. Therefore, LGWR would instead attempt to assure itself that there were no ongoing copies into the log buffer by taking all the redo copy latches.)
After each redo write has completed, LGWR takes the redo allocation latch again in order to update the SGA variable containing the base disk block for the log buffer. This effectively frees the log buffer blocks that have just been written, so that they may be reused.