2023-12-22

RAII and locks in the kernel

C was the only go-to programming language when you entered the kernel realm; notice that I specifically said "was" and not "has been"? Now is the time to swap that old, (t)rusty C11 for the newer, more futuristic C++17 standard and start playing with those sweet, sweet, juicy std::vector, std::string, and std::tuple you all deserve like a good boy! Sorry, I was just messing around. There is no way that happens this soon, not for at least a few ~~centuries~~ decades to come, and I know some people who would even threaten you if they saw those quirky lambdas in your driver code. But today I want to introduce a technique that most people call black magic in kernel programming. I say black magic, but it is nothing new at all; in fact, plenty of folks in the kernel development community have already used it in their production code, myself included.

Disclaimer: Readers should take every word here with nothing more than a grain of salt, given the inexperience of the author.

The problem

Locking is quite easy to understand, yet pretty hard to use correctly given how many types of locking mechanisms are on the market: primitive/queued/executive spin locks, push locks, mutexes, semaphores and so on. Each of these mechanisms, especially those that support shared locking, usually requires the user to call a matching pair of lock/unlock functions, and that is before you worry about IRQL, overhead, and deadlocks. Let us take a look at a reader/writer push lock as an example:

EX_PUSH_LOCK gGlobalLock;

VOID Initializer() {
    // Initialize the lock once, before anyone tries to acquire it.
    ExInitializePushLock(&gGlobalLock);
    // ... works ...
}

NTSTATUS SomeFunction() {
    NTSTATUS status = STATUS_SUCCESS;
    BOOLEAN critical = FALSE;

    // ... works ...
    KeEnterCriticalRegion();
    ExAcquirePushLockShared(&gGlobalLock);
    // ... do something with resource ...
    if (critical)
        goto Cleanup;  // error-prone: this path skips the release below
    ExReleasePushLockShared(&gGlobalLock); // what if we bailed out before we could call this?
    KeLeaveCriticalRegion();

Cleanup:
    return status;
}

If the handler code is messy enough, one could forget whether the lock was acquired shared or exclusive, or even mistake it for a different kind of locking mechanism, since a C compiler has weak type checking and all of the locks are referenced through pointers.

And by chance did you forget anything? That is right, the IRQL. Did you call KeAcquireSpinLockAtDpcLevel to squeeze out a little performance, then stare at a god-knows-why BSOD for hours because you released it with KeReleaseSpinLock instead of KeReleaseSpinLockFromDpcLevel? Did you release all the push locks in those linked-list traversal functions that were not even your code to begin with? The list goes on and on, and these errors are hard to detect because the compiler will not warn you about any of them. Here is a table of the matching acquire/release routines for some locking mechanisms I often encounter:

Acquire	Release	IRQL
KeAcquireSpinLockAtDpcLevel	KeReleaseSpinLockFromDpcLevel	IRQL >= DISPATCH_LEVEL
ExAcquireSpinLockShared	ExReleaseSpinLockShared	IRQL <= DISPATCH_LEVEL
ExAcquireSpinLockSharedAtDpcLevel	ExReleaseSpinLockSharedFromDpcLevel	IRQL >= DISPATCH_LEVEL
ExAcquirePushLockExclusive (must call KeEnterCriticalRegion beforehand)	ExReleasePushLockExclusive (must call KeLeaveCriticalRegion afterwards)	IRQL <= APC_LEVEL
ExAcquirePushLockShared (must call KeEnterCriticalRegion beforehand)	ExReleasePushLockShared (must call KeLeaveCriticalRegion afterwards)	IRQL <= APC_LEVEL
ExAcquireFastMutex	ExReleaseFastMutex	IRQL <= APC_LEVEL

The bigger your project is, the more likely you are to make a mistake. There are many solutions to this, and one of them is RAII, which we are going to delve into shortly.

RAII

Resource acquisition is initialization, or RAII, describes an object that acquires its resources in its constructor and releases them automatically in its destructor. If you do OOP in any language that supports it, you are likely familiar with this already. So does this even matter in kernel space, where we do not have the luxury of OOP? You would be right... and wrong at the same time to think that way. It is true that the kernel cannot handle OOP, but only to an extent: exceptions, RTTI, and the standard library are typically off the table, while plain classes with constructors and destructors work fine.
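To make the idea concrete before the kernel version, here is a minimal user-mode sketch. FakeLock, ScopedFakeLock, and DoWork are hypothetical names invented for illustration; the "lock" is just a counter, so this only demonstrates when the destructor runs, not real synchronization.

```cpp
#include <cassert>

// Hypothetical stand-in for a real lock: it only counts how often it is held.
struct FakeLock {
    int held = 0;
};

// Minimal RAII guard: acquire in the constructor, release in the destructor.
class ScopedFakeLock {
public:
    explicit ScopedFakeLock(FakeLock& lock) : m_lock(lock) { ++m_lock.held; }
    ~ScopedFakeLock() {
        assert(m_lock.held > 0);  // we must still own it when leaving scope
        --m_lock.held;
    }

    // One guard owns one acquisition, so copying is disabled.
    ScopedFakeLock(const ScopedFakeLock&) = delete;
    ScopedFakeLock& operator=(const ScopedFakeLock&) = delete;

private:
    FakeLock& m_lock;
};

// Returns whether the lock was held inside the function body.
// Both return paths go through ~ScopedFakeLock, so neither leaks the lock.
bool DoWork(FakeLock& lock, bool critical) {
    ScopedFakeLock guard(lock);
    bool wasHeld = (lock.held == 1);
    if (critical)
        return wasHeld;  // early exit: the destructor still releases
    // ... do something with the resource ...
    return wasHeld;
}
```

Compare this with the goto Cleanup path in the push lock example: here there is no release call to forget, because leaving the scope is the release.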

Believe it or not, I have implemented RAII patterns in almost every kernel driver project of mine, mostly WDM and minifilter drivers. I cannot share the entire source code here, but to put it simply:

typedef union _SMART_LOCK {
    ULONG_PTR Type;
    PEX_SPIN_LOCK Spin;
    PEX_PUSH_LOCK Push;
} SMART_LOCK, *PSMART_LOCK;

class SmartLock {
public:
    SmartLock() : m_Lock(nullptr), m_IsLocked(FALSE), m_Exclusive(FALSE), m_OldIrql(0) {}
    explicit SmartLock(PSMART_LOCK Lock);
    ~SmartLock() { Release(); }

    BOOLEAN Acquire(PSMART_LOCK NewLock);
    VOID LockForRead();
    VOID LockForWrite();
    VOID Release();

    // Non-copyable and non-movable: one wrapper owns one acquisition.
    SmartLock(const SmartLock&) = delete;
    SmartLock(SmartLock&&) = delete;
    SmartLock& operator=(const SmartLock&) = delete;
    SmartLock& operator=(SmartLock&&) = delete;

private:
    PSMART_LOCK m_Lock;
    BOOLEAN m_IsLocked;
    BOOLEAN m_Exclusive;
    KIRQL m_OldIrql;

    BOOLEAN lockable() const;
    BOOLEAN SetLock(BOOLEAN Mode);
};

The Acquire method is fairly simple: check whether the locked flag is set, and if it is not, bind the wrapper to the new lock. Observe the following pseudocode:

BOOLEAN SmartLock::Acquire(PSMART_LOCK Lock)
{
	// Only rebind to a new lock while nothing is currently held.
	if (!m_IsLocked) {
		m_Lock = Lock;
		return TRUE;
	}
	// Still holding the previous lock: keep m_Lock so Release() can drop it.
	return FALSE;
}

Below is the snippet for LockForRead. LockForWrite is the same; you just swap in ExAcquireSpinLockExclusiveAtDpcLevel, ExAcquireSpinLockExclusive, or something like that. Inside these methods you can also implement an IRQL check to decide which API is the correct one to call.

VOID SmartLock::LockForRead()
{
	if (lockable()) {
		SetLock(TRUE);
		m_Exclusive = FALSE;
		if ((m_OldIrql = KeGetCurrentIrql()) >= DISPATCH_LEVEL) {
			// your favorite Dpc locking routine here
		}
		else {
			// your favorite below Dpc locking routine here
		}
	}
}

VOID SmartLock::LockForWrite()
{
	if (lockable()) {
		SetLock(TRUE);
		m_Exclusive = TRUE;
		if ((m_OldIrql = KeGetCurrentIrql()) >= DISPATCH_LEVEL) {
			// your favorite Dpc locking routine here
		}
		else {
			// your favorite below Dpc locking routine here
		}
	}
}
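The placeholders in the two methods above can be filled in many ways. As a rough sketch of the IRQL branching only, here is a user-mode simulation. Nothing in it is a kernel API: every Sim* name is a hypothetical stand-in invented for illustration, and the "IRQL" is a plain global variable. The point it tries to show is that the guard remembers the IRQL from acquire time and uses that saved value, not the current one, to pick the release routine.

```cpp
#include <cassert>

// User-mode simulation; all Sim* names are hypothetical, not WDK routines.
using SimIrql = unsigned char;
constexpr SimIrql SIM_PASSIVE_LEVEL  = 0;
constexpr SimIrql SIM_DISPATCH_LEVEL = 2;

SimIrql g_SimIrql = SIM_PASSIVE_LEVEL;  // simulated current IRQL

struct SimSpinLock { int readers = 0; };

// Simulates an acquire that raises the IRQL and hands back the previous one.
SimIrql SimAcquireShared(SimSpinLock& l) {
    SimIrql old = g_SimIrql;
    g_SimIrql = SIM_DISPATCH_LEVEL;
    ++l.readers;
    return old;
}

// Simulates the cheaper acquire for callers already at dispatch level.
void SimAcquireSharedAtDpc(SimSpinLock& l) {
    assert(g_SimIrql >= SIM_DISPATCH_LEVEL);
    ++l.readers;
}

// Simulates a release that restores the saved IRQL.
void SimReleaseShared(SimSpinLock& l, SimIrql old) {
    --l.readers;
    g_SimIrql = old;
}

// Simulates a release that leaves the IRQL untouched.
void SimReleaseSharedFromDpc(SimSpinLock& l) { --l.readers; }

class SimReadGuard {
public:
    explicit SimReadGuard(SimSpinLock& l) : m_lock(l), m_oldIrql(g_SimIrql) {
        if (m_oldIrql >= SIM_DISPATCH_LEVEL)
            SimAcquireSharedAtDpc(m_lock);
        else
            m_oldIrql = SimAcquireShared(m_lock);
    }
    ~SimReadGuard() {
        // Branch on the IRQL saved at acquire time, never on the current one.
        if (m_oldIrql >= SIM_DISPATCH_LEVEL)
            SimReleaseSharedFromDpc(m_lock);
        else
            SimReleaseShared(m_lock, m_oldIrql);
    }
    SimReadGuard(const SimReadGuard&) = delete;
    SimReadGuard& operator=(const SimReadGuard&) = delete;

private:
    SimSpinLock& m_lock;
    SimIrql m_oldIrql;
};

// Returns the simulated IRQL observed while the guard is alive.
SimIrql IrqlWhileReading(SimSpinLock& l) {
    SimReadGuard guard(l);
    return g_SimIrql;
}
```

In this simulation the caller's IRQL is the same before and after IrqlWhileReading, whichever branch was taken. Whether a run-time IRQL check is worth its cost in a real driver is a separate judgment call.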

Release comes next; a spin lock is used for demonstration for the sake of brevity:

VOID SmartLock::Release()
{
	if (m_IsLocked) { // prevents double-release
		// Branch on the IRQL saved at acquire time: while a spin lock is held
		// we are always at DISPATCH_LEVEL or above, so the current IRQL cannot
		// tell us which acquire routine was used.
		if (m_Exclusive) {
			if (m_OldIrql >= DISPATCH_LEVEL) {
				ExReleaseSpinLockExclusiveFromDpcLevel(m_Lock->Spin);
			}
			else {
				ExReleaseSpinLockExclusive(m_Lock->Spin, m_OldIrql);
			}
		}
		else {
			if (m_OldIrql >= DISPATCH_LEVEL) {
				ExReleaseSpinLockSharedFromDpcLevel(m_Lock->Spin);
			}
			else {
				ExReleaseSpinLockShared(m_Lock->Spin, m_OldIrql);
			}
		}
		SetLock(FALSE);
	}
}
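Release picks its routine at run time from m_Exclusive. One possible variation, sketched here in user mode, fixes the acquire/release pairing at compile time instead, so a shared acquire can never be followed by an exclusive release. FakeSpinLock, SharedPolicy, ExclusivePolicy, and PairedGuard are hypothetical names for this sketch; they are not from the code above or from the WDK, and the lock only tracks state.

```cpp
#include <cassert>

// Hypothetical stand-in for a reader/writer spin lock; it only tracks state.
struct FakeSpinLock {
    int  sharedHolders = 0;
    bool exclusive = false;
};

// Each policy bundles one acquire routine with its matching release routine.
struct SharedPolicy {
    static void Acquire(FakeSpinLock& l) {
        assert(!l.exclusive);
        ++l.sharedHolders;
    }
    static void Release(FakeSpinLock& l) {
        assert(l.sharedHolders > 0);
        --l.sharedHolders;
    }
};

struct ExclusivePolicy {
    static void Acquire(FakeSpinLock& l) {
        assert(!l.exclusive && l.sharedHolders == 0);
        l.exclusive = true;
    }
    static void Release(FakeSpinLock& l) {
        assert(l.exclusive);
        l.exclusive = false;
    }
};

// The guard can only ever call the Release that belongs to its Acquire.
template <typename Policy>
class PairedGuard {
public:
    explicit PairedGuard(FakeSpinLock& l) : m_lock(l) { Policy::Acquire(m_lock); }
    ~PairedGuard() { Policy::Release(m_lock); }
    PairedGuard(const PairedGuard&) = delete;
    PairedGuard& operator=(const PairedGuard&) = delete;

private:
    FakeSpinLock& m_lock;
};

// Reports how many shared holders exist while a shared guard is alive.
int HoldersWhileReading(FakeSpinLock& l) {
    PairedGuard<SharedPolicy> guard(l);
    return l.sharedHolders;
}

// Reports whether the lock is exclusively held while an exclusive guard is alive.
bool ExclusiveWhileWriting(FakeSpinLock& l) {
    PairedGuard<ExclusivePolicy> guard(l);
    return l.exclusive;
}
```

The trade-off is flexibility: a single SmartLock object can switch between read and write mode, while a policy-based guard commits to one mode for its whole lifetime.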

When you want to lock a resource with SmartLock, it looks somewhat like this:

SMART_LOCK gMyLinkedListLock; // .Spin must point at an initialized EX_SPIN_LOCK
PLIST_ENTRY gMyLinkedListHead;
typedef struct _MY_LINKED_LIST {
	union {
		ULONG Flags;
		struct {
			ULONG Reserved : 32;
		}u;
	};
	LIST_ENTRY InListLinks;
	ULONG SomeInfo;
}MY_LINKED_LIST;

NTSTATUS Handler() 
{
	SmartLock locker(&gMyLinkedListLock);
	locker.LockForRead();
	for (PLIST_ENTRY pEntry = gMyLinkedListHead->Flink;
		pEntry != gMyLinkedListHead; // stop once we wrap around to the head
		pEntry = pEntry->Flink) {
		MY_LINKED_LIST* entry = CONTAINING_RECORD(pEntry, MY_LINKED_LIST, InListLinks);
		// do something
	}
	locker.Release();

	locker.LockForWrite();
	// ...

	return STATUS_SUCCESS; // the lock is released automatically by ~SmartLock
}

The code is not complete, but you can get the idea, and it is not that difficult to implement yourself if you already have some fundamental knowledge of OOP.
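If you would rather not call Release by hand between the read and the write phase, nested scopes can do the same job. The sketch below is a user-mode analogue built on the C++17 standard library (std::shared_mutex, std::shared_lock, std::unique_lock), which is not available in kernel mode; gListLock, gList, and SumThenAppend are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <shared_mutex>

std::shared_mutex gListLock;    // user-mode stand-in for the kernel lock
std::list<unsigned> gList;      // stand-in for the LIST_ENTRY chain

// Sums the list under a shared lock, then appends the sum under an
// exclusive lock. Each lock lives exactly as long as its enclosing block.
unsigned SumThenAppend() {
    unsigned sum = 0;
    {
        std::shared_lock<std::shared_mutex> reader(gListLock);
        for (unsigned value : gList)
            sum += value;
    }   // reader is released here, before the writer is taken
    {
        std::unique_lock<std::shared_mutex> writer(gListLock);
        gList.push_back(sum);
    }   // writer is released here
    return sum;
}
```

The braces carry the meaning here: there is no point in the function where a lock is held without a visible scope around it.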