On writing your own RTOS

This is a consideration of issues associated with designing your own RTOS. It’s a follow-on from the RTOS post I wrote a little while ago. It isn’t a practical project, although you could make it so, if you choose. If you do, please let me know how you get on. I don’t know your needs, so I will use my own, as they were when I was a working designer, when I need examples. I may also refer to our in-house RTOS, which I will refer to as PAX. The -X implies that, like many OSs, it is based on Unix. In this case, a very tiny Unix. It was originally written for an 8031 platform, to support task-oriented architectures.

Every working firmware application contains the core functionality of an RTOS: hardware initialisation, peripheral device drivers, a kernel, (software) timers, tasks and a scheduler. The latter might not be immediately obvious, but even in the all-in-one applications we wrote in the 80s, there are methods that execute repetitively during normal operation. They are the tasks, and the mechanism by which they are executed is the scheduler.

Given the title of this post, we’ve already decided our applications do not need the facilities a commercial RTOS provides. After all, if there’s already an existing solution that properly meets our needs, we’d use it, wouldn’t we? It would almost certainly be cheaper. But our applications run on platforms with limited resources. We have less than 64k of RAM to play with, maybe as little as 256 bytes, in some cases. So we need minimum RTOS functionality, which we will design ourselves.

Hardware initialisation code and peripheral device drivers are usually provided free by our microcontroller or build-tool vendor. Our only concern is the API. Will a future platform, or a different micro-controller, make it difficult to maintain a consistent API for our existing and future applications? If so, do we need to provide wrapper ‘classes’, making a consistent API easier to maintain, even though every OS call will have to execute our wrapper code too? We can do as I did, and refactor the provided driver code to suit our preferred API. But when the vendor updates their code, we will probably need to shadow these updates in our own code….

Our RTOS will provide tasks, but will our applications be task-based? PAX was created to support apps that employed task-centric design. But I designed applications based on OO principles, coded in C. Each class occupied one .c file, with its public API defined in the .h file. My classes each owned zero or more tasks, and used them as necessary. So I used tasks, as attributes, and my applications were class-based. It’s a philosophical point, but it does affect how tasks are designed into our RTOS. We’re designing our own RTOS, so we can orient it to suit the way we work, and to improve the way we work.

More important than this: my decision to use OO meant that the methods of one class must be able to call the methods of another class (instance) during normal operation. This is important when we come to consider task-switching.

Preemptive task-switching supports isolated tasks, as Windows supports the multiple applications (tasks) I’m running as I write this. It makes inter-task messaging difficult and inefficient. Such messaging is easily supported by PCs, but less so by our tiny microcontrollers. Co-operative task-switching supports co-operating and communicating tasks, and is better suited to (almost) all firmware applications. Without it, I could not have allowed direct communication between my classes, not knowing whether they had been task-switched while their internal states were in transition. So our RTOS will use co-operative task switching. Is that right for your needs too?

Co-operative task-switching means that each task runs until it terminates, before the next task can be scheduled and run. Its big flaw is obvious: if a task runs too long, or never terminates, the application will fail. In practice, I never found this to be an issue, but if you’re worried, you’ll need to use a hardware watchdog timer, restarted every time a task is started. But what will you do if the timer expires? Reboot, I suspect….

With tasks as attributes of object instances, they are just a way to activate and execute instance methods as required. They need no parameters. [The RTOS has no information as to what parameters such a task might need; the class/instance takes care of that via its state/class/instance variables.] This simplifies implementation, and why not? As simple as it can be, but no simpler, is our mantra, here and wherever else it applies. So our scheduler needs little apart from a list of function pointers to the task methods. Contrast this with PAX, which provided an extended parameter list, intended to support all kinds of different data being passed to tasks.

Tasks also need (optional) timed execution, but we’ll deal with that when we talk about timers. For completeness, we also need tasks to be executable on a round-robin basis.

In firmware, the sort that’s simple enough for us to consider writing our own RTOS, we don’t need dynamic memory. Our programs execute forever, in theory, so fragmentation, maybe leading to a crash, is a serious issue for us. It’s a risk we don’t need to take. Timers, private class variables, and the like, are created during initialisation, and exist for the lifetime of the program.

Talking of simplicity, our scheduler needs only to loop through all tasks, and execute those that are active. Timed tasks are checked to see if execution is due before they are run. It is inefficient to loop through all tasks, as some of them will be inactive. But the alternative is a separate active task list, which consumes memory, and takes processor cycles to traverse. Some time ago, I tried looping through all tasks, just to see if the inefficiency caused problems. I expected it to, but it didn’t. So I adopted this much simpler way of working. See if it works for your RTOS too.

Interrupts are an interesting topic in RTOS design. Ideally, we might work without interrupts, ensuring a single thread of execution: our application. But I never found a satisfactory way of doing this. However, I did settle on one simplification that I found worthwhile. I did not allow nested interrupts, thus guaranteeing no more than two threads: interrupts and the app. We still have to consider re-entrancy effects, but in their simplest form. And, with all interrupt routines hidden away in the (RTOS) driver classes that own the interrupting hardware, it is easy to control what methods can be called from interrupts, and take the necessary steps to ensure this doesn’t cause problems.

While we must have a kernel, it doesn’t do much. It’s the main() of our RTOS (and of our application, as we can only have one main()). It executes hardware initialisation methods, then RTOS initialisation, application initialisation, and finally the kernel executes the scheduler, which runs forever. That’s about it for the kernel.

The only core RTOS feature we haven’t yet covered is timers. They’re loads of fun! I have spent many happy hours thinking about timers, over the years, and I’ve implemented them many times too, to work in various environments. Timers can be active or passive. Active timers carry out a predefined function when they terminate. This is very handy for the designer, but it comes at a price. Each timer must be checked every system tick, to see if it has expired. Yes, we could implement linked-lists of active timers, listed in order of their expiry, but the overhead is still considerable when it is traversed every system tick.

Passive timers do nothing. We query them when we want to know if they’ve expired. If they have, we do what needs doing. Ah, you’re thinking, what’s the difference between a timer task querying them, to produce active timers, and the application querying them when it thinks it should? The application knows how long the timer is supposed to run for. Some timers have a watchdog function, and are never expected to expire. When we need active timers, we use a timed task. Overheads minimised; (necessary) functionality supplied; job done. Will this design decision suit your RTOS too?

I keep asking ‘will this suit you too?’, but that’s what design decisions are about. They suit one project, one application, one way of working, but not all. I’m trying to make these decisions visible, and also emphasising that my decisions might not be best for you.

Timers are used and queried often. They need to be efficient as well as providing all the features we need from them. [We’re firmware designers, after all, and processor cycles don’t grow on trees!] When we delve deep into timer implementation, we find rollover to be a problem. That is when the timer-counter rolls over from its maximum value to zero, and also if the system tick timer rolls over. The picky way we have to code this is horrible. But there is an alternative for some of us, if our build tool supports 64-bit integers.

Timers count system ticks, and the system supports a regular interrupt to update its count. Ticks are between 1 and 10 ms in most applications. I used the former. A 32-bit timer or tick-counter only runs for 49.7 days (1 ms ticks), and our applications run for much longer than that, so rollover will occur. But 42-bits will give you 140 years, which is infinite, for our purposes. Using a 64-bit integer for timers and tick-counters means we can discount rollover completely. Our oft-called timer query code has suddenly become much more efficient. Behind the scenes, it uses library functions to manipulate 64-bit integers on a 32-bit (or less) microcontroller, but these are usually written in (efficient) assembly language, and the overall improvements in simplicity are well worth having.

Timer state flags? We need one bit — one flag — to say whether the timer is running or not. So split the 64-bit integer as you see fit, as long as the timer count is greater than 41 bits, and the flag space is one bit or more. But does a timer store its start time, its expiry time, or both? And should it store its duration too, so that it can easily be restarted? To store start and end times as well as duration is 24 bytes per timer, which is roughly 2400 bytes. [Typical applications use around 100 timers, in my experience.] This is a lot of RAM to most firmware designers, so we have to think carefully about what we really need to store. My most recent implementation stored only the expiry tick-count and a 1-bit running/stopped flag, relying on the application to know the duration if restart is needed. This amounts to about 800 bytes per application. This is still significant, but manageable in most cases. Tiny applications need fewer timers, decreasing the overhead.

Our timer API is also important. What do we, timer users, need? I suggest stop, start, restart and status (query). [Start is when we start the timer from now; restart is when we start it from the time it expired. We need the latter for repetitive timers, so that they don’t drift if we are a little late processing their expiry.] The only wrinkle we might decide to implement is in the status method. Mine returns RUNNING if the timer is active, STOP if it isn’t, and EXPIRED if this is the first query since expiry happened. The latter is very useful, and easier to code than it sounds: if the timer count is less than the system tick count [i.e. timer has expired], but the flag is set to RUNNING, reset the flag and return EXPIRED. That’s it.

And that about completes our tour of RTOS, and I hope it’s given you a good grounding in what’s involved. Maybe enough to go ahead and write your own? Please do, if that’s what you/team/applications need, and please let me know how you go on.

On writing your own RTOS