SQL Server latches and their causes
SQL Server locks, discussed in the article All about locking in SQL Server, are applied to data for the duration of a logical operation to preserve logical transaction consistency. SQL Server latches, by contrast, are a special type of low-level system lock that is held only for as long as a physical operation on a memory page lasts, in order to protect memory consistency.
SQL Server latches are an internal mechanism that protects shared memory resources, such as pages and in-memory data structures inside the buffer pool, by coordinating access to those resources and protecting them from corruption. Because latches are not exposed outside the SQL Server Operating System (SQLOS), they can be managed only by SQL Server itself and not by users (unlike locks, whose behavior can be influenced with hints such as NOLOCK). Whenever SQL Server has to access a page or an internal memory structure that is not safe for unsynchronized multi-threaded access, it places a latch on it. In this way, latches coordinate the execution of multiple threads working against the same in-memory resources.
Like locks, SQL Server latches come in several modes:
- Destroy Latch (DT): the most restrictive latch mode, acquired before the contents of the latched structure are destroyed, for example when a buffer is about to be removed from the cache. A DT latch blocks every other mode, including the KP latch
- Exclusive Latch (EX): acquires exclusive control of a page that is being written. It prevents any other latch, except KP, from being acquired on the page while the EX latch is held
- Update Latch (UP): similar to an exclusive latch, except that it still allows read operations on the page while explicitly blocking any other write operation
- Keep Latch (KP): ensures that the latched structure stays in the buffer and cannot be destroyed while another latch is being placed on it. It is compatible with every mode except DT
- Shared Latch (SH): acquired on a page when a read request for that page is granted
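As with locks, some latch modes are compatible with each other and some are not. The table below shows whether a latch requested in one mode can be granted while a latch in another mode is already held on the same resource:

| Held \ Requested | KP | SH | UP | EX | DT |
|---|---|---|---|---|---|
| KP | Yes | Yes | Yes | Yes | No |
| SH | Yes | Yes | Yes | No | No |
| UP | Yes | Yes | No | No | No |
| EX | Yes | No | No | No | No |
| DT | No | No | No | No | No |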
There are many different types of SQL Server latches, but they fall into three general categories: I/O latches, buffer latches, and non-buffer latches.
I/O latches
I/O latches are acquired when an outstanding I/O operation is in progress for a page in the buffer pool, or more precisely, when data has to be read from or written to physical storage. SQL Server uses the PAGEIOLATCH_XX wait types to report that a process is waiting for an I/O latch to be released.
So when a page has to be brought from storage into the buffer pool, a PAGEIOLATCH latch is acquired on that page, and if storage is slow to deliver it, the PAGEIOLATCH wait counts and wait times increase.
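As a rough way to see how much time has been spent waiting on I/O latches, the cumulative wait statistics can be filtered for the PAGEIOLATCH_XX family. This is a minimal sketch, assuming the caller has VIEW SERVER STATE permission; the `avg_wait_ms` column alias is made up for illustration, and the numbers are totals since the instance last started or the statistics were last cleared.

```sql
-- Cumulative PAGEIOLATCH_* waits, heaviest first
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       -- average wait per waiting task; NULLIF guards against division by zero
       1.0 * wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE N'PAGEIOLATCH[_]%'   -- [_] matches a literal underscore
  AND waiting_tasks_count > 0
ORDER BY wait_time_ms DESC;
```

Because the values are cumulative, comparing two snapshots taken some minutes apart usually says more about current I/O pressure than a single reading.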
Buffer latches
To understand buffer latches, it helps to first understand the idea behind the memory buffer pool, which is designed to maximize SQL Server performance. The buffer pool is a range of physical memory where data read from disk is kept in data pages. Data in SQL Server tables is stored in pages, and each page has a fixed size of 8192 bytes (8 KB). Whenever a data page has to be read or written, it is first brought into the buffer pool. Any further access to that page is then served directly from memory, which improves SQL Server performance by minimizing disk I/O.
This buffer pool design is the reason SQL Server physical memory usage can be high even when there is no SQL Server activity. Pages are brought into the buffer pool on demand and stay cached until the memory is needed, at which point the least recently used pages are evicted first (an LRU-style policy rather than strict first-in, first-out).
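To see what is currently held in the buffer pool, the cached pages can be counted per database. The following is a sketch under stated assumptions: it needs VIEW SERVER STATE permission, the column aliases are illustrative, and scanning `sys.dm_os_buffer_descriptors` can itself take a while on servers with a lot of memory.

```sql
-- Pages currently cached in the buffer pool, grouped by database
SELECT CASE WHEN database_id = 32767 THEN N'ResourceDb'   -- internal resource database
            ELSE DB_NAME(database_id)
       END AS database_name,
       COUNT_BIG(*) AS cached_pages,
       -- each page is 8 KB, so pages * 8 / 1024 gives megabytes
       CAST(COUNT_BIG(*) * 8 / 1024.0 AS decimal(18, 1)) AS cached_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_pages DESC;
```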
SQL Server uses the buffer manager to manage the buffer pool. The buffer manager is therefore in charge of the hash tables, the buffer array that contains the pages, and the pages stored in the buffer. SQLOS accesses data stored in memory exclusively through the buffer manager.
Pages that have been modified in the buffer pool by an insert, delete or update command are called "dirty" pages, while unmodified pages are called "clean" pages. When a page has to be accessed in memory, SQLOS acquires a buffer latch on that page. Unlike a lock, the latch is not held for the duration of the transaction but only during the critical section of the physical operation, and it is released as soon as it is no longer needed. SQL Server uses the PAGELATCH_XX wait types to report that a process is waiting for a buffer latch to be released.
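One way to observe buffer latch waits as they happen is to look at the tasks that are currently waiting on a PAGELATCH_XX wait type. This is a minimal sketch, assuming VIEW SERVER STATE permission; it shows only a point-in-time view, so it may return no rows on a quiet system.

```sql
-- Tasks currently waiting on a buffer (page) latch
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.resource_description,   -- database_id:file_id:page_id for page latches
       wt.blocking_session_id
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type LIKE N'PAGELATCH[_]%'   -- excludes PAGEIOLATCH_* waits
ORDER BY wt.wait_duration_ms DESC;
```

If many sessions report the same `resource_description`, they are competing for the same page, which is the typical signature of a hot page.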
Non-buffer latches
Non-buffer latches protect any in-memory structure other than the pages stored in the buffer pool. SQL Server uses the LATCH_XX wait types to report that a process is waiting for a non-buffer latch to be released. Non-buffer latches are not encountered often, which makes them the least documented category, but here are some situations that can lead to non-buffer latch contention:
- Excessive parallelism: when a high degree of parallelism is used on servers with 12 or more logical processors, most if not all queries can qualify for parallel execution plans. In that case, non-buffer latches (LATCH_XX) are acquired to synchronize the internal memory structures used by the parallel plans
- Too many auto-grow/auto-shrink operations: in systems with poorly planned database sizing or storage capacity (bad default database settings), auto-grow operations can run frequently. In addition, when auto-shrink is turned on, the database is shrunk frequently. During growth and shrink operations, SQL Server acquires latches of the FCB, FGCB_ADD_REMOVE and FGCB_ALLOC classes to protect the file control block and to synchronize access to the information stored for the filegroup
- Very high frequency of DML operations on heap and BLOB data structures: when excessive DML operations are performed on heap and BLOB data, the internal memory structures responsible for allocating and deallocating pages for the heap have to be kept synchronized. In such situations, excessive LATCH_EX waits can be encountered, and the ALLOC_CREATE_FREESPACE_CACHE, ALLOC_FREESPACE_CACHE and ALLOC_EXTENT_CACHE latch classes may show up as the prevailing ones in the sys.dm_os_latch_stats DMV
So when LATCH_XX waits are excessive or are among the prevalent wait types, it is worth checking which non-buffer latch classes dominate on the instance, using the following query:
-- Share of total non-buffer latch wait time per latch class
SELECT latch_class,
       wait_time_ms,
       waiting_requests_count,
       100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS [% of latches]
FROM sys.dm_os_latch_stats
WHERE latch_class NOT IN ('BUFFER')
  AND wait_time_ms > 0
ORDER BY wait_time_ms DESC;
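For the auto-grow/auto-shrink case described above, a quick check of the relevant settings can help confirm or rule out that cause. The sketch below assumes it is run in the context of the database being investigated; the column aliases are illustrative only.

```sql
-- Databases that have auto-shrink turned on
SELECT name, is_auto_shrink_on
FROM sys.databases
WHERE is_auto_shrink_on = 1;

-- Size and growth settings for the files of the current database
SELECT name AS logical_file_name,
       type_desc,
       CAST(size AS bigint) * 8 / 1024 AS size_mb,   -- size is stored in 8 KB pages
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(20)) + ' %'
            ELSE CAST(CAST(growth AS bigint) * 8 / 1024 AS varchar(20)) + ' MB'
       END AS growth_setting                         -- 0 means the file does not auto-grow
FROM sys.database_files;
```

Small fixed growth increments or percentage growth on large files tend to produce many growth events, which is where the FCB and FGCB latch classes come into play.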
SuperLatches
Superlatches (also called sublatches) were introduced in SQL Server 2005 to improve efficiency in highly concurrent OLTP workloads with a particular usage pattern: very frequent shared (SH) read-only access to a page, with little or no write access. SQL Server uses superlatches only on NUMA systems with 32 or more logical processors. The mechanism lets SQL Server deal with latch contention by dynamically promoting a latch to a superlatch, which is an array of sublatches. An SH request can then be served by a single sublatch, while the other sublatches can remain in different modes. Once promoted, the superlatch is just a pointer to that array of latches.
A superlatch behaves as a single latch with sublatch structures, and there can be one sublatch per partition per logical CPU core. Once a superlatch has been created, a worker thread only has to acquire the shared (SH) sublatch assigned to its own scheduler. As a result, a shared (SH) superlatch uses fewer resources and gives more efficient access to pages than a non-partitioned shared latch. The reason is that acquiring the sublatch does not require any synchronization of global state, because it touches only local NUMA memory.
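Whether superlatches are in use on an instance can be checked through the latch performance counters that SQL Server exposes. This is a hedged sketch: it filters on the Latches counter object and returns whatever counters that object publishes, which on the versions I am aware of include Number of SuperLatches, SuperLatch Promotions/sec and SuperLatch Demotions/sec.

```sql
-- Latch-related performance counters, including the superlatch counters
SELECT RTRIM(object_name)  AS object_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%:Latches%'   -- matches default and named instances
ORDER BY counter_name;
```

The "/sec" counters are stored as cumulative raw values, so a rate has to be derived from the difference between two samples divided by the elapsed seconds.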
Latch Contention
Latch contention is common on systems with a large number of CPUs. It occurs when multiple threads concurrently try to acquire incompatible latches on the same in-memory structure. Since latches are controlled by an internal SQL Server mechanism, SQLOS decides on its own when to use them. Because latch behavior is deterministic, factors such as application design or database schema structure can significantly affect how latches are used.
On high-throughput systems designed for a large number of CPUs, and therefore for high concurrency, some latch contention is expected as a regular occurrence, because in-memory structures are frequently accessed and protected by latches. It becomes a problem when latch waits are long enough to lower CPU utilization, which in turn reduces throughput.
Recognizing the signs of latch contention is important, so let's look at some of its symptoms.
The expected relationship between latches and throughput is that transactions per second rise as logical processors are added, while average latch wait time also rises, but slowly and no faster than the throughput. This is the desired system behavior (charted in the source article), and it indicates that the logical processors are not getting in each other's way. In such a scenario, adding more logical processors means that more work can be done.
If transactions/sec drops when additional logical processors are enabled while, at the same time, average latch wait time grows faster than system throughput, there is a high probability of a latch contention problem. In the typical case (also charted in the source article), adding logical processors helps up to a certain point, after which longer latch wait times start to occur. Beyond that point, additional logical processors bring no benefit, and eventually transactions/sec starts to decline. Adding logical processors then has a negative rather than a positive effect, because the system spends much of its time in a waiting state.
Latch contention that affects OLTP performance mostly arises when high concurrency is combined with some of the following factors:
- Application design based on high concurrency: a client application issues a high number of concurrent requests against the database
- SQL Server logical files layout: allocation structures such as the Global Allocation Map (GAM), Shared Global Allocation Map (SGAM), Page Free Space (PFS) and Index Allocation Map (IAM) can become points of page latch contention when many concurrent threads compete for them
- Database schema design: read, write and delete data access patterns, index B+tree depth, design of clustered and non-clustered indexes, row size and row density per page
- The performance of the I/O subsystem: a quite frequent cause, since with a slow I/O subsystem SQL Server must wait for data to be moved into the buffer pool. Excessive PAGEIOLATCH_XX waits are indicative of a slow I/O subsystem
- Large number of logical CPUs assigned to SQL Server: latch contention severe enough to push SQL Server performance to an unacceptable level is typically seen on systems with more than 16 logical CPUs, and the more logical CPUs are available, the higher the level of contention can be
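Since the number of logical CPUs and the degree of parallelism both appear among the contributing factors, it can be useful to confirm what the instance actually sees. The following is a small sketch, assuming VIEW SERVER STATE permission; it only reads configuration and does not change anything.

```sql
-- Logical CPUs and schedulers visible to this SQL Server instance
SELECT cpu_count,
       hyperthread_ratio,
       scheduler_count
FROM sys.dm_os_sys_info;

-- Instance-wide max degree of parallelism currently in effect (0 = no explicit limit)
SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'max degree of parallelism';
```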
Translated from: https://www.sqlshack.com/all-about-latches-in-sql-server/