Mechanical Sympathy
Memory Barriers/Fences
In this article I'll discuss the most fundamental technique in concurrent programming, known as memory barriers or fences, which make the memory state within a processor visible to other processors.
CPUs have employed many techniques to try to accommodate the fact that
CPU execution unit performance has greatly outpaced main memory
performance. In my “Write Combining”
article I touched on just one of these techniques. The most common
technique employed by CPUs to hide memory latency is to pipeline
instructions and then spend significant effort, and resources, trying
to re-order these pipelines to minimise stalls related to cache misses.
When a program is executed it does not matter if its instructions are
re-ordered, provided the same end result is achieved. For example,
within a loop it does not matter when the loop counter is updated if no
operation within the loop uses it. The compiler and CPU are free to
re-order the instructions to best utilise the CPU, provided the counter is
updated by the time the next iteration is about to commence. Also, over the
execution of a loop, this variable may be held in a register and never
pushed out to cache or main memory, and so it is never visible to another
CPU.
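To make the loop-counter point concrete, here is a minimal sketch (class and method names are illustrative, not from the Disruptor):

```java
public class LoopExample
{
    // The JIT compiler is free to keep both `i` and `total` in registers
    // for the duration of the loop; neither value needs to reach cache or
    // main memory until the method returns, so another CPU would never
    // observe the intermediate counter values.
    static long sum(final int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
        {
            total += i; // exactly when `i` is incremented is invisible here
        }
        return total;
    }

    public static void main(final String[] args)
    {
        System.out.println(sum(10)); // prints 45
    }
}
```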
CPU cores contain multiple execution units. For example, a modern Intel
CPU contains 6 execution units, each of which can do some combination of
arithmetic, conditional logic, and memory manipulation. These execution
units operate in parallel, allowing instructions to be executed in
parallel. This introduces another level of non-determinism to program
order as observed from another CPU.
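Java offers no direct view of the execution units, but modern Java (9+) does expose the barriers used to constrain this re-ordering via the static fence methods on java.lang.invoke.VarHandle. A hedged sketch follows; class and field names are illustrative, and a real multi-threaded reader would also need at least an opaque read of `ready`:

```java
import java.lang.invoke.VarHandle;

public class FencedPublish
{
    int payload;   // plain field
    boolean ready; // plain field

    void publish(final int value)
    {
        payload = value;
        VarHandle.releaseFence(); // store barrier: the payload store cannot
                                  // be re-ordered past this point
        ready = true;
    }

    int consume()
    {
        while (!ready)
        {
            Thread.onSpinWait();
        }
        VarHandle.acquireFence(); // load barrier: the payload load cannot
                                  // be re-ordered before this point
        return payload;
    }

    public static void main(final String[] args)
    {
        final FencedPublish p = new FencedPublish();
        p.publish(42);
        System.out.println(p.consume()); // prints 42
    }
}
```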
Finally, when a cache-miss occurs, a modern CPU can make an assumption
about the result of a memory load and continue executing speculatively,
based on this assumption, until the load returns the actual data.
Provided “program order” is preserved, the CPU and compiler are free to do whatever they see fit to improve performance.
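In Java, the simplest way to ask for these barriers is a volatile field: a volatile store acts as a store barrier and a volatile load as a load barrier. A minimal two-thread sketch (field and class names are illustrative):

```java
public class VolatilePublish
{
    static int data;               // plain field
    static volatile boolean ready; // volatile field

    public static void main(final String[] args) throws InterruptedException
    {
        final Thread writer = new Thread(() ->
        {
            data = 42;    // plain store
            ready = true; // volatile store: the store barrier ensures `data`
                          // is visible before `ready`
        });
        writer.start();

        while (!ready) // volatile load: load barrier
        {
            Thread.onSpinWait();
        }
        System.out.println(data); // guaranteed to print 42
        writer.join();
    }
}
```

Without the volatile qualifier the reader could spin forever, or observe `ready` as true while still seeing a stale `data`.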
Figure 1.
private volatile long sequence = RingBuffer.INITIAL_CURSOR_VALUE;

// from inside the run() method
T event = null;
long nextSequence = sequence + 1L;
while (running)
{
    try
    {
        final long availableSequence = barrier.waitFor(nextSequence);
        while (nextSequence <= availableSequence)
        {
            event = ringBuffer.get(nextSequence);
            boolean endOfBatch = nextSequence == availableSequence;
            eventHandler.onEvent(event, nextSequence, endOfBatch);
            nextSequence++;
        }
        sequence = nextSequence - 1L;
        // store barrier inserted here !!!
    }
    catch (final Exception ex)
    {
        exceptionHandler.handle(ex, nextSequence, event);
        sequence = nextSequence;
        // store barrier inserted here !!!
        nextSequence++;
    }
}
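The volatile writes to sequence above are what insert the store barriers. The same publication idea can be sketched with java.util.concurrent.atomic.AtomicLong, whose set() is a volatile store and get() a volatile load; this is a rough stand-in, not the Disruptor's actual Sequence class, and the initial value of -1 assumes the usual cursor convention:

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequencePublish
{
    // Stand-in for the volatile sequence field in Figure 1; -1 is assumed
    // as the initial cursor value.
    static final AtomicLong sequence = new AtomicLong(-1L);

    public static void main(final String[] args)
    {
        long nextSequence = sequence.get() + 1L; // volatile load: load barrier
        // ... process events up to nextSequence ...
        sequence.set(nextSequence);              // volatile store: store barrier
        System.out.println(sequence.get());      // prints 0
    }
}
```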