天天看點

ORACLE等待事件: log file parallel write

<b>log file parallel write</b><b>概念介紹</b>

log file parallel write 事件是LGWR程序專屬的等待事件,發生在LGWR将日志緩沖區(log_buffer)中的重做日志資訊寫入聯機重做日志檔案組的成員檔案,LGWR在該事件上等待該寫入過程的完成。該事件的等待表示重做日志所處的磁盤裝置<b>緩慢或存在争用</b>。下面看看官方一些資料是如何解釋log file parallel write等待事件的。

<b>log file parallel write</b>

<b> </b>

Writing redo records to the redo log files from the log buffer.

Wait Time: Time it takes for the I/Os to complete. Even though redo records are written in parallel, the parallel write is not complete until the last I/O is on disk.

<b> Parameters:</b>

<b></b> 

·         <b>P1 = </b>files

·         <b>P2 = </b>blocks

·         <b>P3 = </b>requests

        If you have more than one log member per group then the files are written to in parallel (if possible). This is the number of redo log members (files) that the writes apply to.

        The number of REDO blocks being written to each log member. eg: A value of 10 indicates 10 redo blocks for each log member.

·         <b>requests</b> Number of I/O requests.

        The number of distinct IO requests. The number of blocks to be written is divided up into a number of IO requests.

        The actual wait time is the time taken for all the outstanding I/O requests to complete. Even though the writes may be issued in parallel, LGWR needs to wait for the last I/O to be on disk before the parallel write is considered complete. Hence the wait time depends on the time it takes the OS to complete all requests.

注意:自己看到這個等待事件parallel write,以及網上有“LGWR程序對同一組多個重做日志檔案‘同時’寫,是通過異步I/O來實作的,是以等待事件log file parallel write應該是在同一組下有多個重做日志檔案時才會出現 ”這種說法,其實不管重做日志是一個或多個成員,都會出現log file parallel write等待事件。隻要有buffer log寫到redo log裡面就有這個等待,它不是說并行寫多個redo檔案,單個檔案也是這樣的。

<b>log file parallel write</b><b>出現原因</b>

如果資料庫出現了這個等待事件,意味着重做日志(redo log)所處的磁盤裝置I/O<b>緩慢或存在争用</b>:

· 磁盤I/O性能比較差

· REDO檔案的分布導緻了I/O争用,例如,同一組的REDO成員檔案放在相同的磁盤上。

<b>檢視</b><b>log file parallel write</b><b>等待事件</b><b></b>

<b></b>

上面time_waited、average_wait的機關為百分之一秒(in hundredths of a second), 'log file parallel write'事件的平均等待時間大于 10ms (or 1cs),這通常意味着存在較慢的I/O吞吐量。 LGWR程序寫入慢會影響使用者事務送出的時間。

If the log file parallel write average wait time is greater than 10ms (or 1cs), this normally indicates slow I/O throughput.The cure is the same as for the db file parallel write waits. Enable asynchronous writes if your redo logs are on raw devices and the operating system supports asynchronous I/O. For redo logs on file systems, use synchronous direct writes.

其它一些檢視redo log相關資訊的SQL

<b>如何減少</b><b>log file parallel write</b><b>等待時間</b><b></b>

要減少log file parallerl write等待時間,可以從下面兩個方面入手解決(其它細節參考下面英文資料):

(1)将重做日志檔案組放置到高速I/O磁盤上(不要将日志組成員檔案放置IO不一樣的磁盤上)。不要将重做日志放在RAID5的磁盤上。

(2)盡可能的降低重做數量:

       1:盡可能使用nologging選項,例如索引的重建、重組。包括create table...as select...等操作

       2:熱備可能生成大量的重做資訊,是以熱備份應該在非高峰時間運作。

       3:應用程式的commit的頻率小一些也能有一些改善。

<b>Reducing Waits / Wait times:</b>

You might want to reduce "log file parallel write" wait times in order to reduce user waits which depend on LGWR.

· Ensure tablespaces are NOT left in HOT BACKUP mode longer than needed.

· Tablespaces in HOT BACKUP mode cause more redo to be generated for each change which can vastly increase the rate of redo generarion.

· NOLOGGING / UNRECOVERABLE operations may be possible for certain operations to reduce the overall rate of redo generation

· Redo log members should ideally be on high speed disks

· eg: Historically RAID5 was not a good candidate for redo log members, although some of the RAID 5 drawbacks have been addressed more recently through larger intelligent disk caches, etc.

· Redo log members should be on disks with little/no IO activity from other sources.

· (including low activity from other sources against the same disk controller)

· With file systems like ASM and ZFS, you may also have to experiment with the disk stripe size to make more friendly with the type of IO LGWR usually performs.

· Check that the system has not become CPU (or memory) bound otherwise it may take LGWR more time to recognize IO has been completed.

· The higher the value for CPU_COUNT, the more strands can be used for writing redo. Having an increased number of strands can lead to an increase the average waits on "log file parallel write" (which in turn can affect "log file sync" waits). Filesystems that do not support Concurrent IO (CIO) are more likely to be affected by this, compared to ASM or raw devices. However, it can also impact filesystems that support CIO. On servers with CPU_COUNT greater than 128 LGWR behaviour may need manually adjusting through underscore parameters. If you think you are encountering long "log file parallel write" due to high CPU counts then contact Oracle Support.

---------------------------------------------------------------------------------------------------------

Unfortunately, you cannot spawn more than one LGWR process. In this case, it is critical that nothing else is sharing the mount point of the redo log files. Make sure the controller that services the mount point is not overloaded. Moving the redo logs to higher speed disks will also help.

We strongly suggest that you avoid putting redo logs on RAID5 disks, but we also understand that many times you don’t have a choice or a say in it.

Besides improving the I/O throughput, you can also work on lowering the amount of redo entries. This will provide some relief, but not the cure. Whenever possible, use the NOLOGGING option. Indexes should be created or rebuilt with the NOLOGGING option. CTAS operations should also use this option.

Note:

       The NOLOGGING option doesn’t apply to normal DML operations such as inserts, updates, and deletes. Objects created with the NOLOGGING option are unrecoverable unless a backup is taken prior to the corruption. If you have to take an additional backup, then the I/Os that you save by not logging will be spent on backup.        Database in FORCE LOGGING mode will log all changes (except for changes in temporary tablespaces), regardless of the tablespace and object settings.

       A lower commit frequency at the expense of higher rollback segment usage can also provide some relief.

       A high commit frequency causes the LGWR process to be overactive and when coupled with slow I/O throughput will only magnify the log file parallel write waits.

       The application may be processing a large set of data in a loop and committing each change, which causes the log buffer to be flushed too frequently. In this case, modify the application to commit at a lower frequency. There could also be many short sessions that log in to the database, perform. a quick DML operation, and log out.

下面提供一個使用者做log file sync waits、log file parallel write troubleshoot的一個腳本。官方文檔ID為1064487.1

參考資料:

<a href="http://www.oraclecommunity.net/profiles/blogs/wait-events-log-file-sync-and-log-file-parallel-write">http://www.oraclecommunity.net/profiles/blogs/wait-events-log-file-sync-and-log-file-parallel-write</a>

<a href="http://blog.itpub.net/15880878/viewspace-722994/">http://blog.itpub.net/15880878/viewspace-722994/</a>