What if locks are still mostly busy? Not only does waiting time to acquire lock increase, but also time to execute a critical section.

Why? Lock acquire interferes the lock release, due to atomic instructions. When waiting threads executes atomic instructions, that also delays release operation to free the lock.

MCS (Mellor-Crummey and Scott) Locks

  • Assign each waiting thread a separate memory location to spin-wait
  • To release a lock, the bit is set for one thread, telling it that it is the next to acquire the lock
  • Create a linked list of waiters using compare-and-swap
    • Front of the list holds the lock
    • New thread uses compare-and-swap to add to the tail
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
 Node {
    Node *next;
    std::atomic<bool> need_to_wait;
}

McsLock {
    Node *tail = NULL; // end of the list, shared variable
    thread_local Node *self; // something like thread local variable
}

McsLock::acquire() {
    self->next = NULL;

    // Common pattern of using compare_and_swap:
    // 1. store current value on local variable
    // 2. try compare_and_swap
    // 3. if fail (which means local variable is not synced),
    //    refresh current value

    // Read current tail
    Node *old_tail = tail;
    // Tail might be changed or not
    while (!compare_and_swap(&tail, old_tail, self)) {
        // If someone else changed the tail, do it again.
        old_tail = tail;
    }

    // If old_tail = NULL, then I am the front of the list => acquired.
    if (old_tail != NULL) {
        // need to wait until I am front of the list
        self->need_to_wait.store(true, release);
        old_tail->next = self;
        // spin wait on (self->need_to_wait)
        // each thread spin wait on different self
        while (self->need_to_wait);
    }
}

McsLock::release() {
    // If self = tail, then I am the only one => make list empty and free lock
    if (!compare_and_swap(&tail, self, NULL)) {
        // spin wait on (self->next), which is per-thread value
        // each thread spin wait on different self
        while (self->next == NULL);
        // proceed to next thread
        self->next->need_to_wait.store(false, relaxed);
    }
}


Back