Thursday, November 27, 2014

Troubleshooting Log File Sync Waits

If you notice a very large number of waits with a short average wait time per wait, that's an indication that the application is issuing too many commit statements against the database. You must change the commit behavior by batching the commits.
Instead of committing after each row, for example, you can have the commits occur after every 500 rows.
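For example, a PL/SQL block along these lines batches the commits (a rough sketch only; the staging_orders and orders tables are hypothetical and assumed to have matching columns):

-- Sketch only: commit every 500 rows instead of after each row.
-- staging_orders and orders are hypothetical tables with matching columns.
DECLARE
  l_count PLS_INTEGER := 0;
BEGIN
  FOR r IN (SELECT * FROM staging_orders) LOOP
    INSERT INTO orders VALUES r;
    l_count := l_count + 1;
    IF MOD(l_count, 500) = 0 THEN
      COMMIT;   -- one log file sync wait per 500 rows, not one per row
    END IF;
  END LOOP;
  COMMIT;       -- pick up the remaining rows
END;
/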

If you notice that the large amount of wait time accumulated for the log file sync event was caused by long waits for writing to the redo log file (a high average time waited for this event), it's more a matter of how fast your I/O subsystem is. You can distribute the redo log files across different disks to reduce contention. You can also see whether you can dedicate disks entirely to the redo logs instead of allowing other files on those disks; this will reduce I/O contention when the LGWR is writing the buffers to disk. Finally, as a long-term solution, you can look into placing the redo logs on faster devices, say, by moving them from a RAID 5 to a RAID 1 device.
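To see which of the two cases you are dealing with at the instance level, you can compare the wait counts and average wait times for 'log file sync' and the LGWR's own I/O event, 'log file parallel write', in V$SYSTEM_EVENT. A quick sketch (AVERAGE_WAIT is reported in centiseconds):

-- Many waits with a short average points at commit frequency;
-- a high 'log file parallel write' average points at slow redo I/O.
col event for a30
select event, total_waits, time_waited, average_wait
from v$system_event
where event in ('log file sync', 'log file parallel write')
/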

How It Works

Oracle (actually the LGWR background process) automatically flushes a session’s redo information to the redo log file whenever a session issues a COMMIT statement. The database writes commit records to disk before it returns control to the client. The server process thus waits for the completion of the write to the redo log. 


The log file sync wait event includes the wait during the writing of the log buffer to disk by LGWR and the posting of that information to the session. The server process will have to wait until it gets confirmation that the LGWR process has completed writing the log buffer contents out to the redo log file.

The log file sync events are caused by contention during the writing of the log buffer contents to the redo log files. Check the V$SESSION_WAIT view to ascertain whether Oracle is incrementing the SEQ# column for the waiting session. If the SEQ# column is not incrementing, the session is stuck in a single long wait, which points to the LGWR process as the culprit; it may be stuck.
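A quick way to check this is to sample V$SESSION_WAIT a few times in a row for the sessions currently waiting on the event:

-- Run this several times; a SEQ# that does not change between samples
-- indicates a session stuck on a single long wait.
select sid, seq#, event, state, seconds_in_wait
from v$session_wait
where event = 'log file sync'
/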

P1 = buffer#
All changes up to this buffer number (in the log buffer) must be flushed to disk and the writes confirmed, to ensure that the transaction is committed and will be preserved across an instance crash. The wait is for LGWR to flush the log buffer up to this buffer#.

As the log file sync wait event is caused by contention during redo writes by the LGWR process, see if you can use the NOLOGGING option to get rid of these waits. Of course, in a production system, you can't use the NOLOGGING option when the database is processing user requests, so this option is of limited use in most cases.
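Where it is applicable, for example a bulk load during a maintenance window, the pattern looks roughly like this (the table names are examples only, and data loaded this way cannot be recovered by media recovery):

-- Sketch of a minimal-redo bulk load; NOLOGGING only helps
-- direct-path operations such as INSERT /*+ APPEND */.
alter table sales_hist nologging;

insert /*+ append */ into sales_hist
select * from sales_stage;

commit;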

Troubleshooting Buffer Busy Wait Events

The buffer busy waits Oracle wait event occurs when an Oracle session needs to access a block in the buffer cache but cannot, because the buffer copy of the data block is locked by another session.

What causes the buffer busy wait event?

1. The block is being read into the buffer by another session, so the waiting session must wait for the block read to complete.

2. Another session has the buffer block locked in a mode that is incompatible with the waiting session's request.

While the block is being changed, it is marked as unreadable by other sessions. The changes being made should last only a few hundredths of a second (for example, a disk read should take under 20 milliseconds and a block modification under one millisecond), so it takes a lot of buffer busy waits to cause a problem. Some common examples are listed below, and the query that follows the list helps identify the affected segments:

1. A hot block issue, such as the first block on the free list of a table with heavy concurrent inserts. All sessions insert into that block at the same time until it fills up, and then they start inserting into the next free block on the list, and so on.
2. Multiple users running an inefficient SQL statement that performs a full table scan on the same large table at the same time. One user reads the block off disk, and the other users wait on buffer busy waits (or 'read by other session' in 10g and higher) for the physical I/O to complete.
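To confirm which segments are actually accumulating buffer busy waits (assuming 10g or later, where segment-level statistics are kept), a query along these lines helps:

-- Segments with the most buffer busy waits since instance startup
col owner for a10
col object_name for a30
select owner, object_name, object_type, value
from v$segment_statistics
where statistic_name = 'buffer busy waits'
and value > 0
order by value desc
/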

To find more information about buffer busy waits, start with the sessions of interest in V$SESSION_WAIT:

-- Wait event and the file#/block# being waited on for a given user's sessions
col event for a30
select event, p1 file#, p2 block#
from v$session_wait
where sid in
(select sid from v$session where username = '&username')
/

To get more information about the SQL statement being executed and the block being waited for, trace the session or gather data from V$SESSION and V$SESSION_WAIT:

-- Hash value of the SQL being executed and the block each waiting session needs
SELECT s.sql_hash_value, sw.p1 file#, sw.p2 block#, sw.p3 reason
FROM v$session_wait sw, v$session s
WHERE sw.event = 'buffer busy waits'
AND sw.sid = s.sid
/

P1 = file#
P2 = block#
P3 = reason code


Now determine the object being waited for, using the P1 (file#) and P2 (block#) values from the above query:

-- Segment that owns the file#/block# reported above
col owner for a10
col segment_name for a15
select owner, segment_name
from dba_extents
where file_id = &p1
and &p2 between block_id and block_id + blocks - 1
/

Troubleshooting buffer busy waits

Now that you know the database object involved in the wait, consider the following causes of contention and their solutions:

Undo Header – If using Automatic Undo Management, increase the size of the undo tablespace. 
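For example, if the undo tablespace is undersized, you can add space to it (the tablespace and file names below are assumptions; adjust them for your environment):

-- Check the current size of the undo tablespace, then add a datafile.
select tablespace_name, sum(bytes)/1024/1024 mb
from dba_data_files
where tablespace_name = 'UNDOTBS1'
group by tablespace_name
/

alter tablespace undotbs1
add datafile '/u01/oradata/ORCL/undotbs02.dbf' size 2g autoextend on;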

Data Block – Data blocks are the blocks that hold the row data in a table or index. The typical problem is that multiple sessions are requesting a block that is either not in the cache or is held in an incompatible mode ('read by other session' in 10g and higher).

Tune inefficient queries that read too many blocks into the buffer cache. These queries could flush out blocks that may be useful for other sessions in the buffer cache. By tuning queries, the number of blocks that need to be read into the cache is minimized, reducing aging out of the existing “good” blocks in the cache.
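One simple way to spot candidate statements is to look at V$SQL ordered by buffer gets (a sketch, assuming 10g or later for the SQL_ID column):

-- Top 10 statements by logical I/O since they were loaded into the shared pool
col sql_text for a60
select *
from (select sql_id, buffer_gets, executions,
             substr(sql_text, 1, 60) sql_text
      from v$sql
      order by buffer_gets desc)
where rownum <= 10
/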

Resolve Hot Blocks – If the queries above consistently return the same block or set of blocks, this is considered a hot block scenario. Delete some of the hot rows and insert them back into the table; most of the time the rows will be placed in a different block. The DBA may need to adjust PCTFREE and/or PCTUSED to ensure the rows are placed into different blocks. Also talk with the developers to understand why a particular set of blocks is hot.
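As a rough illustration, increasing PCTFREE spreads future inserts across more blocks (the table name is hypothetical, and existing rows are not redistributed unless the table is reorganized):

-- Leave more free space per block so fewer hot rows share a block.
alter table app.orders pctfree 30;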

Place Table in Memory – Cache the table or keep it in the KEEP pool. When multiple sessions request blocks that reside on disk, it takes too long for one session to read them into the buffer cache, and the other sessions that need the same block register 'buffer busy waits'. If the block is already in the buffer cache, this possibility is eliminated. Another alternative is to increase the buffer cache size; a larger buffer cache means less I/O from disk, which reduces situations where one session is reading a block from the disk subsystem while other sessions wait for that block.
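A sketch of that approach, assuming a small, frequently read lookup table and enough memory for a KEEP pool (the names and sizes are examples only):

-- Size the KEEP pool (requires ALTER SYSTEM privilege), then assign
-- the hot table to it so its blocks tend to stay cached.
alter system set db_keep_cache_size = 256m;
alter table app.country_codes storage (buffer_pool keep);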

Fix Low Cardinality Indexes – Look for ways to reduce the number of low cardinality indexes, i.e., indexes with a small number of unique values, which can result in excessive block reads. This is especially problematic when concurrent DML operates on a table with low cardinality indexes and causes contention on a few index blocks.
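Assuming optimizer statistics are reasonably current, the data dictionary can show indexes whose key counts are very low relative to the table size (the thresholds below are examples only):

-- Indexes with very few distinct keys on reasonably large tables
col index_name for a30
col table_name for a30
select owner, index_name, table_name, distinct_keys, num_rows
from dba_indexes
where owner = '&owner'
and num_rows > 10000
and distinct_keys < 10
/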

Data Segment Header – Each segment has one header block that contains segment information, e.g., free and available block details and the high-water mark. At times this block can become a point of contention when multiple sessions attempt to insert into or delete from the same table.
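You can confirm that the waited-for block is a segment header by checking the same &p1 (file#) and &p2 (block#) values against DBA_SEGMENTS:

-- Is the block being waited on a segment header block?
col owner for a10
col segment_name for a15
select owner, segment_name, segment_type
from dba_segments
where header_file = &p1
and header_block = &p2
/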