Thanks to jbauman I realized a fundamental flaw in my idea of what a "permission" is (perm_cap_t). So this is about to undergo a major overhaul. OOPS!
Kernel API - This is the kernel "spec". In reality this is the only piece that I'm designing here: the interaction between userland and the kernel.
Kernel Implementation - This is a draft of how I plan to implement the API. Primarily this is critical as part of the design process, to ensure the API is implementable in a reasonable fashion. This may also serve as an extremely poor disambiguator for the API.
Userland Implementation - This is a discussion/explanation of how the API may be used to implement a working userland with all of the abstractions/protection and whatnot. Similar to the kernel implementation, this ensures that the API is actually a useful set of abstractions.
Basis for the system
Why a microkernel?
There are many advantages to a microkernel. Properly implemented, a microkernel requires far less "trusted" code. There are many, many discussions on this topic out there, so I won't go into it too deeply. A tiny microkernel (unlike Mach) also attempts to act as a minimal shim to the hardware. This has the additional advantages of simplicity, ease of implementation, speed, and opportunities for userland optimization WITHOUT mucking with the kernel.
A monolithic kernel can ALWAYS beat a microkernel at any given task, simply because not crossing the security boundary is more efficient. The problem is that as we pull one feature after another into the kernel for speed, the kernel gets more and more bloated until it barely functions. Sure it's fast, but we can't touch it to improve it, and we don't trust it to work any more. Then we try to shim security into this already broken system, making it even more broken, and eventually get something even slower, like Windows Vista. In today's world we can afford the tiny speed hit of that kernel boundary, especially if it saves us the enormous speed hit of tons and tons of other security features, kernel extensions, and similar fragile and partial solutions. The idea is to do it right, and do it right once. Given minimal abstractions the kernel becomes nothing more than a security layer, making writing a secure OS on top of it much, much simpler, and costing little in speed for the privilege.
What makes this different from others?
There have been numerous attempts at microkernels and exokernels: L4, V, Symbian, QNX, Amoeba, Mach, XOK, etc. These have met with varying degrees of success, and there are many reasons for this. Many people blame speed, but with systems like Windows XP/Vista and Linux being the most popular OSes around, that argument doesn't hold up.
The two newest and probably most interesting players in the microkernel world are L4 and Xen. Yes, Xen. A hypervisor is fundamentally a microkernel where the kernel API is the system API, and there is no IPC. Xen is actually even closer to a microkernel since "processes" running under it are often aware of that fact and use a special API to talk to it.
L4 is the continuation of both the micro and exo ideals. Their goal is to build a minimal non-portable kernel which has the absolute minimal set of abstractions to still run efficiently. They suffered a recent setback when Hurd discovered that L4's synchronous IPC mechanism was not sufficient for their security needs. I also feel that L4's in-kernel IPC is more heavyweight than necessary, and that the scheduling system doesn't take full advantage of now-ubiquitous SMP systems.
My kernel spec is heavily inspired by the L4 ideal, though. Here's a quick summary of what it actually has.
- In-kernel asynchronous 1-N and N-1 "messaging" (no data, just a "kick")
- Stackless, upcall-based scheduler activation threading model
- Userland control of VM
- Kernel-supported opaque capabilities
- Lack of ANY addressing mechanism besides capabilities
- Userland interrupt handling
- Untrusted userland device drivers
- Lack of any in-kernel device drivers
- Portable API (and I expect code)
- Multi-call shimmable kernel API
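To make the first item a bit more concrete, here is a minimal userland model of what a data-less "kick" could look like. This is only a sketch under my own assumptions, not the spec: the names `listener_t`, `channel_t`, `channel_attach`, `channel_kick` and `listener_take` are hypothetical, and I'm assuming that kicks which arrive before the receiver runs simply collapse into one pending flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LISTENERS 4

/* Hypothetical receiver-side state. A kick carries no data, so all that
 * needs to be remembered is whether at least one kick has arrived. */
typedef struct {
    bool pending;
} listener_t;

/* Hypothetical sender-side object. One channel fans out to every attached
 * listener (1-N); several channels may share one listener (N-1). */
typedef struct {
    listener_t *listeners[MAX_LISTENERS];
    size_t count;
} channel_t;

/* Attach a listener to a channel; fails when the fixed table is full. */
bool channel_attach(channel_t *ch, listener_t *l)
{
    if (ch->count == MAX_LISTENERS)
        return false;
    ch->listeners[ch->count++] = l;
    return true;
}

/* Sender side: marks every listener pending. Nothing is copied and the
 * sender never waits for a receiver, which is the asynchronous part. */
void channel_kick(channel_t *ch)
{
    for (size_t i = 0; i < ch->count; i++)
        ch->listeners[i]->pending = true;
}

/* Receiver side: reports whether any kick arrived since the last call,
 * then clears the flag. */
bool listener_take(listener_t *l)
{
    bool was_pending = l->pending;
    l->pending = false;
    return was_pending;
}
```

Any actual payload would live in memory the two sides already share, with the kick only saying "go look".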
Asynchronous messaging + capabilities gives us a highly secure model. When combined with userland interrupt handling we can write untrusted userland device drivers, something neither EROS nor L4 can do. Since devices are always changing, this lets us run crappy drivers for crappy hardware (like, say, wireless cards) without endangering the rest of the system.
Userland control of VM is the basis for the entire system, similar to L4. It is also the basis for the messaging system, allowing users to optimize their messaging however they want and keeping us away from the traps of Mach. Unlike L4, I do not expect the kernel to be rewritten for each architecture. Rather, it should be portable, with minimal hardware shims at the bottom. I also hope to make writing portable userland code reasonably feasible.
The kernel API acts exactly like the API to another process. Combined with opaque capabilities, this means that the system can trivially be shimmed with an extra layer for debugging or some other special requirement. As a bonus, the API supports multiple calls in a single trip to the kernel, saving overhead and making for a simple yet fast API.
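As a rough illustration of the multi-call and shimming ideas, the sketch below batches several calls into one crossing and slides a counting shim in front of a stand-in "kernel". Everything here is hypothetical and made up for the example: `kcap_t`, `call_t`, `entry_fn`, `fake_kernel_entry`, `shim_t` and `shim_entry` are not part of the spec, and the "work" each call does is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opaque capability handle: the caller never looks inside. */
typedef uintptr_t kcap_t;

/* Hypothetical record for one call within a batch. */
typedef struct {
    kcap_t    target;  /* capability the call is addressed to */
    uint32_t  op;      /* operation code, meaning left unspecified here */
    uintptr_t arg;     /* single argument, enough for the sketch */
    intptr_t  result;  /* filled in by whoever services the call */
} call_t;

/* Anything that can service a batch has this shape, whether it is the
 * kernel or an ordinary process standing in front of it. */
typedef size_t (*entry_fn)(call_t *calls, size_t n, void *ctx);

/* Stand-in for the kernel. ctx points at a counter of boundary crossings,
 * so one batch costs exactly one crossing no matter how many calls. */
size_t fake_kernel_entry(call_t *calls, size_t n, void *ctx)
{
    size_t *crossings = ctx;
    (*crossings)++;
    for (size_t i = 0; i < n; i++)
        calls[i].result = (intptr_t)(calls[i].arg + 1); /* placeholder work */
    return n; /* number of calls serviced */
}

/* A debugging shim: same interface, counts calls, forwards the batch. */
typedef struct {
    entry_fn next;       /* the layer underneath */
    void    *next_ctx;   /* that layer's context */
    size_t   calls_seen; /* total calls that passed through */
} shim_t;

size_t shim_entry(call_t *calls, size_t n, void *ctx)
{
    shim_t *s = ctx;
    s->calls_seen += n;
    return s->next(calls, n, s->next_ctx);
}
```

Because the caller only ever holds opaque handles and an entry point, it cannot tell whether it is talking to `fake_kernel_entry` directly or through `shim_entry`, which is the property that makes the extra layer cheap to add.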