Thread safety in software libraries: the example of Anjay

Why is multithreading used?

In modern software engineering, it is rare to see an application written to run as a single, monolithic sequence of operations. Multiple concurrent tasks are the norm – these may take the form of processes (multiprocessing) or, more commonly, threads (multithreading). Developing multiprocess or multithreaded applications is essential for server, desktop, and mobile computing, where multicore CPUs have become standard and a single thread cannot utilize all of the available processing power.

In embedded systems, single-core microcontrollers are still common, but organizing the application into multiple threads still has advantages. In a single-threaded application, an elaborate event loop would be necessary, polling all possible types of input devices at once – for example, waiting for button input and for network packets would need to be handled in the same routine. In a multithreaded application, each type of IO can be handled in a separate routine and the thread scheduler – usually implemented as part of the real-time operating system (RTOS) on embedded systems – will handle the actual polling and switching between threads as necessary. Multithreaded applications also usually feel more responsive, as unrelated tasks may happen concurrently: for example, even if the network management code is blocked waiting for incoming packets, the user interface can run uninterrupted. Time-sensitive tasks (e.g., sending keep-alive packets) can also run at their requested time slots, even if other tasks are blocked on other operations.

However, having multiple threads running concurrently on the same machine requires care when information is exchanged or shared between threads. For example, suppose that one thread is dedicated to handling the network interface (e.g., Ethernet, WiFi, or cellular) and another thread implements the actual business logic. If a task switch happens at the wrong time, or if the tasks actually run in parallel on separate CPU cores, the business logic may “see” a network packet that has only partially been written to memory, with part of it missing or containing data from another, unrelated packet. This may lead to erroneous behavior. When similar problems affect, e.g., configuration data or internal state that is assumed to always be valid, the application becomes very prone to crashes.

Problems related to illegal concurrent accesses to the same data, often called “race conditions”, can become even more confusing and hard to debug on larger systems, with different hardware paths to access the same memory (NUMA), multiple CPU cores with separate local caches that need to be kept in sync, or even different types of CPU cores (heterogeneous computing; often employed in a “high-performance cores + energy-efficient cores” configuration on mobile devices) taking turns in running the same code.

Thread safety and synchronization primitives

To deal with this problem, several techniques and data structures, called synchronization primitives, have been developed that allow information to be shared between threads safely.

Some of the most common of these are:

Mutex – short for “mutual exclusion” – is an object that only a single thread can “lock” at a time. To solve the example problem described above, a mutex could be used to guard accesses to the memory where network packets are stored. The network interface thread would need to lock the mutex to write the data that came from the network. Similarly, the business logic thread would need to lock the mutex to read it. When a thread attempts to lock a mutex that is already locked by another thread, it will be blocked until the existing lock is released. That way, as long as every access goes through the mutex, inconsistent information will never be read.
Semaphore is a counter that can be incremented (“signaled”, “released”) and decremented (“waited”, “acquired”) atomically; an attempt to decrement a semaphore whose value is already 0 will block until another thread increments it. It can be thought of as a generalization of a mutex – when the initial and maximum values are both 1, the semaphore can be used as a mutex, with the “wait” and “signal” operations acting as “lock” and “unlock”, respectively. Higher semaphore counter values can be useful when guarding other types of resources, e.g., enforcing a limit on the number of concurrent users.
Read-write lock is another variant of the mutex that allows any number of “readers” to access the same resource concurrently, but a “writer” lock is exclusive – it ensures that no other readers or writers access the same resource.
Condition variable is an object that allows one or more threads to wait until another thread performs a “notify” operation. This is often used to wake up threads that handle various external events – in the example problem from above, the network interface thread may notify a condition variable to wake up the business logic thread, informing it that a new packet is available. Condition variables usually work together with mutexes – the “wait” operation is designed to be called while a mutex is locked by the calling thread. The mutex is atomically unlocked for the duration of the wait and locked again before “wait” returns once the thread has been notified. When a mutex and a condition variable are tightly coupled together into a single structure that exposes the operations of both, that structure is also called a monitor.
Various types of message queues are used to pass data between threads without explicit locking – messages are added and retrieved as atomic operations. These can be very effective, although, depending on the specific design, they may enforce making some otherwise unnecessary copies of the passed data.
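The interplay of a mutex and a condition variable in the network packet example can be illustrated with a short Python sketch. This is only a minimal illustration built on the standard threading module; the PacketBuffer class, its method names, and the run_demo() helper are hypothetical and invented for this example.

```python
import threading
from collections import deque


class PacketBuffer:
    """Shared packet storage guarded by a mutex and a condition variable.

    Hypothetical example class, not part of any real library.
    """

    def __init__(self):
        # threading.Condition creates its own lock; "with self._cond" locks it
        self._cond = threading.Condition()
        self._packets = deque()

    def put(self, packet):
        """Called by the network interface thread."""
        with self._cond:
            self._packets.append(packet)
            self._cond.notify()  # wake up one waiting thread, if any

    def get(self):
        """Called by the business logic thread; blocks until a packet exists."""
        with self._cond:
            # wait() unlocks the mutex while sleeping and locks it again
            # before returning; the loop re-checks the actual condition
            while not self._packets:
                self._cond.wait()
            return self._packets.popleft()


def run_demo():
    buffer = PacketBuffer()
    received = []

    def business_logic():
        for _ in range(3):
            received.append(buffer.get())

    consumer = threading.Thread(target=business_logic)
    consumer.start()
    for payload in (b"one", b"two", b"three"):
        buffer.put(payload)  # the main thread plays the network interface
    consumer.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

Because every access to the packet storage happens with the lock held, the consumer can never observe a half-added packet, and it sleeps instead of busy-polling while the buffer is empty.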

These synchronization primitives are typically provided by the operating system or the standard library of high-level programming languages – for example, mutexes are available on various platforms as:

pthread_mutex_t (Unix-like operating systems)

CreateMutex and related APIs (Win32)

SemaphoreHandle_t (semaphore type that can be used as a mutex – FreeRTOS and derived systems)

k_mutex (Zephyr)

osMutexDef_t (CMSIS RTOS abstraction layer for ARM-based microcontrollers)

std::mutex (C++11)

rtos::Mutex (Mbed OS)

synchronized keyword (Java – every non-primitive object has an implicit monitor)

threading.Lock (Python)

Using the synchronization primitives properly wherever data is shared or passed between threads is strictly necessary when writing multithreaded applications. Failing to do so will result in the application behaving erratically. However, it should be noted that thread synchronization by itself introduces a whole new category of possible bugs, such as “deadlocks” – infinite freezes of some or all threads – which may occur, e.g., when two threads each attempt to lock a mutex that is already held by the other.
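The two-mutex deadlock scenario can be reproduced in a controlled way. The Python sketch below is only an illustration: the function names are hypothetical, and a timeout is used for the second lock purely so that the demonstration terminates instead of freezing. It also shows one common remedy, which is to make every thread take the locks in the same order.

```python
import threading


def opposite_order_attempt(timeout=0.2):
    """Two threads take two locks in opposite order.

    Returns a dict telling whether each thread obtained its second lock.
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            barrier.wait()  # both threads now hold their first lock
            # A plain second.acquire() would block forever here (deadlock);
            # the timeout exists only so that this demo can finish.
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            barrier.wait()  # keep "first" locked until both attempts are over
            if acquired:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("t1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("t2", lock_b, lock_a)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return got_second


def consistent_order_attempt():
    """Both threads lock lock_a first and lock_b second, so neither can
    end up holding one lock while waiting forever for the other."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    completed = []

    def worker(name):
        with lock_a:
            with lock_b:
                completed.append(name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(completed)
```

In the first function, neither thread can ever obtain its second lock while the other one holds it; in the second, both threads always run to completion.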

When the code contains all the synchronization necessary to be safely usable with multiple threads running, this is called thread safety. A function or an object is said to be “thread-safe” when it can be called or used from multiple threads without explicit synchronization at the place of usage – usually, it means that all the necessary synchronization is encapsulated inside its implementation.
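As a minimal sketch of what “encapsulating the synchronization” may look like, consider the hypothetical counter class below (the names are invented for this example). Callers never touch the lock; every method that reads or modifies the shared value takes it internally.

```python
import threading


class ThreadSafeCounter:
    """Hypothetical thread-safe object: all locking is done internally."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        # The read-modify-write sequence is protected as a whole
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def count_in_parallel(num_threads=4, increments=10_000):
    """Increment one shared counter from several threads, without any
    synchronization at the call site."""
    counter = ThreadSafeCounter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

The result of count_in_parallel() is always num_threads * increments, because no increment can be lost between reading and writing the value.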

Thread safety in software libraries

Both approaches – “thread-safe” objects and functions that perform all the locking internally, and APIs that require explicit locking at the call site when necessary – have their own advantages and disadvantages.

Thread-safe objects are generally easier to use and debug. User code doesn’t need to worry about synchronization, which makes it less prone to errors. The implementation of a thread-safe object also keeps all the synchronization code in one place, which usually makes it easier to spot possible problems.

On the other hand, requiring synchronization at the call site generally yields better performance and smaller executable code. Locking and unlocking mutexes takes CPU time, even if there are no race conditions to prevent. When an object is known to be used only within a single thread, locking can be forgone, making the code faster and smaller. Similarly, when multiple sequential operations are performed on the same object, they can be placed between a single pair of lock and unlock operations instead of locking on every function call. This has the additional advantage that the calling code does not need to handle the possibility of another thread modifying the data between calls, which would otherwise be possible.
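The “single pair of lock and unlock operations” pattern can be sketched as follows. The Stock class is a hypothetical, deliberately thread-unsafe object, and reserve_all() is an invented helper; the point is only that the check and the update happen under one lock taken at the call site.

```python
import threading


class Stock:
    """Hypothetical object that is NOT thread-safe: callers must synchronize."""

    def __init__(self, items):
        self.items = items

    def available(self):
        return self.items

    def take(self, amount):
        self.items -= amount


def reserve_all(stock, stock_lock, requests):
    """Run one thread per request; return the list of granted amounts."""
    granted = []

    def worker(amount):
        # One lock/unlock pair covers both the check and the update, so no
        # other thread can change the stock between the two calls
        with stock_lock:
            if stock.available() >= amount:
                stock.take(amount)
                granted.append(amount)

    threads = [threading.Thread(target=worker, args=(a,)) for a in requests]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return granted
```

If available() and take() each locked internally instead, two threads could both pass the check before either of them updated the stock, and the stock could go negative.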

When both the called code and the call site reside within a single codebase, the developer can decide which paradigm to use in each case, taking the “big picture” into consideration. This is not always true for library code, which is written once to be used in multiple different applications with greatly varying requirements, possibly by multiple developers with different levels of proficiency.

Historically, many libraries did not even take thread safety into consideration. For example, parts of the standard ISO C library, such as the strtok() function, are inherently unsafe to use in multithreaded applications because they use global variables to persist state between calls. No locking in the implementation could solve the problem, and ensuring thread safety at the call site would be very difficult, especially considering that other library code may call the same function. Reentrant versions of these APIs (e.g., strtok_r()) are a common extension, provided as standard on Unix-like systems, among others. These allow a user-provided variable to be used instead of a global one – however, if this state is to be shared between multiple threads, the user is still responsible for synchronizing such accesses.
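The difference between hidden global state and caller-provided state can be imitated in a few lines of Python. The functions below are hypothetical stand-ins written for this illustration; they only mimic the calling convention of strtok() and strtok_r() and do not reproduce their exact tokenizing rules.

```python
# Module-level state shared by every caller, like the hidden state of strtok()
_remaining = ""


def next_token_global(text=None, sep=","):
    """Pass text to start a new string, or None to continue the previous one."""
    global _remaining
    if text is not None:
        _remaining = text
    if not _remaining:
        return None
    token, _, _remaining = _remaining.partition(sep)
    return token


class TokenizerState:
    """Caller-owned state, playing the role of the extra strtok_r() argument."""

    def __init__(self, text):
        self.remaining = text


def next_token_reentrant(state, sep=","):
    if not state.remaining:
        return None
    token, _, state.remaining = state.remaining.partition(sep)
    return token


def interleaved_global():
    first = next_token_global("a,b")
    next_token_global("x,y")      # another caller starts its own string...
    second = next_token_global()  # ...and silently hijacks our continuation
    return first, second


def interleaved_reentrant():
    ours, theirs = TokenizerState("a,b"), TokenizerState("x,y")
    first = next_token_reentrant(ours)
    next_token_reentrant(theirs)  # does not touch "ours"
    second = next_token_reentrant(ours)
    return first, second
```

The interleaving is written out sequentially here, but the same corruption occurs whenever a second thread (or unrelated library code) calls the global variant between two of our calls.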

There is no single standard on whether library code should be universally thread-safe or leave all the synchronization to the user. Library authors often decide that on a case-by-case basis, depending on the intended use cases for given APIs. For example, in standard libraries of many languages, IO objects (for accessing files, network sockets, etc.) are generally thread-safe, while collections (lists, sets, maps) generally are not. However, it is not uncommon for there to be a way to override this behavior. For example, many Unix-like systems provide “unlocked” variants of the standard C IO APIs (e.g., getchar_unlocked()) that do not lock the internal mutex. Conversely, some languages provide thread-safe wrappers for the built-in collection types (e.g., Java’s Collections.synchronizedList()). Sometimes the same functionality is provided in both thread-safe and thread-unsafe variants (e.g., StringBuffer and StringBuilder in Java).

Thread safety in Anjay

From the beginning, Anjay has been designed with the intention that a single Anjay instance will generally be used from a single thread, so the Anjay APIs were deliberately not made thread-safe. However, we never ruled out the possibility that multiple Anjay instances may coexist within a single process. For this reason, we avoided using global state – and in the places where it is used (e.g., the random number generator for the TLS backend), accesses to it have always been properly synchronized.

In a couple of our own projects, however – for example, in Anjay-zephyr-client – we needed to synchronize accesses to some Anjay-related objects. This is because we decided to run periodic updates of the data model in a separate thread, while the same data model could also be accessed by the main LwM2M thread. This is done by locking a mutex at the call site.

Realizing that the event loop necessary to use Anjay may be difficult to implement, we wanted to introduce an API that would make it easier to get an LwM2M client up and running quickly – one that would already contain all the necessary event loop logic, intended to run in a dedicated thread. That, in turn, necessitated an ability to interact with this thread in some way, e.g., to allow fatal error handling or data model object manipulation. This required the entire public API of Anjay to be thread-safe.

For this reason, we introduced optional thread safety, starting with Anjay 2.13. By default, it is disabled, and Anjay functions just like previous versions.

Thread safety may be enabled at compile time by turning on the WITH_THREAD_SAFETY CMake option, for example:

cmake -DWITH_THREAD_SAFETY=ON .

This makes all public Anjay APIs automatically lock a mutex on entry and unlock it on exit. For compatibility reasons, the mutex is also unlocked before calling any of the user-provided callback functions (e.g., the data model handlers) and locked again immediately after they return.
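The general pattern of “lock on entry, unlock around user callbacks” can be sketched in Python as shown below. This is not Anjay’s actual implementation or API: the Client class, its methods, and the handler signature are all hypothetical, and the sketch only illustrates one consequence of the pattern, namely that a callback may call back into the public API without deadlocking on a non-reentrant mutex.

```python
import threading


class Client:
    """Hypothetical library object; NOT Anjay's real API or implementation."""

    def __init__(self, read_handler):
        self._lock = threading.Lock()  # non-reentrant on purpose
        self._read_handler = read_handler  # user-provided callback
        self._notifications = []

    def notify_changed(self, path):
        """Public API: locks on entry, unlocks on exit."""
        with self._lock:
            self._notifications.append(path)

    def pending_notifications(self):
        with self._lock:
            return list(self._notifications)

    def serve_read(self, path):
        """Public API that needs to run user code while being processed."""
        with self._lock:
            # ... internal state would be examined here, under the lock ...
            self._lock.release()  # unlock before running user code
            try:
                value = self._read_handler(self, path)
            finally:
                self._lock.acquire()  # lock again before touching our state
            # ... internal state may be updated here, under the lock ...
            return value


def demo():
    def handler(client, path):
        # User code calling back into the public API. If the lock were
        # still held by serve_read(), this call would block forever.
        client.notify_changed(path)
        return 42

    client = Client(handler)
    value = client.serve_read("/3/0/9")
    return value, client.pending_notifications()
```

A side effect of this design is that library state may change while a callback is running, so the code after the callback cannot assume that the state it saw before is still current.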

The public API is not affected by this change in any way – thread safety can be enabled without any modifications to existing code. If you utilize multiple threads, it then becomes safe to call Anjay APIs on an Anjay object that is being used in another thread, without explicit synchronization.

Please bear in mind, however, that accesses to, e.g., user-provided data model objects are not synchronized, and you still need to guard accesses to them yourself.

Summary

Of course, this article barely scratches the surface of the large topic that is multithreading, concurrency, and thread safety. Becoming proficient with writing concurrent code and dealing with thread safety is an absolute must for software developers of today. With modern software relying so heavily on reusing existing libraries as much as possible, it is essential to understand how the external code you are using interacts with concurrency. Here at AVSystem, we understand that and hope that the latest release of Anjay will help you ensure that your applications will always run smoothly and correctly.

Author:

Mateusz Kwiatkowski
Software Engineer
