In a previous post, I tried to provide a general overview of the work required to implement a full-featured GUI toolkit. Spoiler: there’s quite a bit of it.
One of the major strengths of Rust is how easy it is to build and share components, using cargo and crates.io. In this post, I would like to try and enumerate several of the subprojects involved in building a desktop GUI framework: these projects can be thought of as infrastructure, and ideally they can live as independent crates shared by various GUI projects.
Windowing and the event loop
This is the heart of things: the code that allows the application developer to interact with the window server and the underlying desktop environment. We will call this our core platform abstraction layer.
One of the main challenges here is determining your scope. What should be part of this abstraction, and what should be separate? The dividing lines are not always clear.
Let’s start with those areas that clearly belong to this layer. In very broad strokes:
- window management: this layer is responsible for creating and interacting with windows. This means configuring window properties, hiding and maximizing windows, displaying child/modal windows, and providing access to some platform object that can be used to create a drawing environment.
- application/run loop management: closely related, since windows need to live on a runloop: this abstraction needs to allow starting up a runloop that connects to the window server, receives input events, and delivers them to the application.
- mouse/pointer/keyboard/other hardware input: these are delivered by the platform’s window manager to particular windows, and need to be handled here.
- monitor/display & HiDPI information: in order to position windows, we need to know about the connected displays, their relative positions, and their resolution.
- IME and platform text input: this is closely intertwined with keyboard input, and would be very hard to implement without also being able to control delivery of keyboard events.
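To make the shape of this layer a little more concrete, here is a minimal sketch of how the pieces above might fit together. Every name in it (`Event`, `ControlFlow`, `WindowHandler`, `RunLoop`, `Recorder`) is hypothetical and is not the API of winit or druid-shell; the "run loop" simply drains a queue of pre-recorded events rather than connecting to a real window server.

```rust
use std::collections::VecDeque;

/// Hypothetical input events that a platform might deliver to a window.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Resized { width: u32, height: u32 },
    PointerMoved { x: f64, y: f64 },
    KeyDown { key: char },
    CloseRequested,
}

/// What the application tells the run loop after handling an event.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ControlFlow {
    Continue,
    Exit,
}

/// Implemented by the application; the platform layer calls into it.
pub trait WindowHandler {
    fn handle_event(&mut self, event: &Event) -> ControlFlow;
}

/// Stand-in for a real run loop: instead of talking to a window server,
/// it drains a queue of pre-recorded events.
pub struct RunLoop {
    pending: VecDeque<Event>,
}

impl RunLoop {
    pub fn new(events: Vec<Event>) -> Self {
        RunLoop { pending: events.into() }
    }

    /// Delivers events in order until the handler asks to exit or the
    /// queue is empty. Returns the number of events delivered.
    pub fn run(&mut self, handler: &mut dyn WindowHandler) -> usize {
        let mut delivered = 0;
        while let Some(event) = self.pending.pop_front() {
            delivered += 1;
            if handler.handle_event(&event) == ControlFlow::Exit {
                break;
            }
        }
        delivered
    }
}

/// Example application state: remembers the window size and typed keys.
#[derive(Default)]
pub struct Recorder {
    pub size: (u32, u32),
    pub typed: String,
}

impl WindowHandler for Recorder {
    fn handle_event(&mut self, event: &Event) -> ControlFlow {
        match event {
            Event::Resized { width, height } => self.size = (*width, *height),
            Event::KeyDown { key } => self.typed.push(*key),
            Event::PointerMoved { .. } => {}
            Event::CloseRequested => return ControlFlow::Exit,
        }
        ControlFlow::Continue
    }
}
```

A real implementation would also have to expose a platform handle for drawing, display information, and IME hooks; the point here is only the inversion of control, where the platform owns the loop and the application supplies a handler.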
Additionally, there are a number of features that probably or maybe belong here, but which could conceivably live elsewhere:
- window/application menus: this is tricky: on Windows the menubar is closely related to the window, and menu events are delivered to the main runloop, so integration there makes sense. On other platforms it makes less sense: on macOS we can make it work, but on Linux things are strange: there, we’re going to be responsible for drawing menus ourselves, and that starts to push against the boundary of our responsibilities.
- copy/paste and drag-and-drop: these are related; both often use similar APIs to describe arbitrary data and data formats. Drag-and-drop is also closely related to window management: on macOS you specify drag regions as special subviews in your window, and on Windows drag-and-drop events are delivered to your window’s WNDPROC routine. Copy and paste generally doesn’t need to involve any windowing-specific code, but it might want to share code and types with drag-and-drop, so you might include it.
- special modal windows: alerts/messages, file open/save: there are normally special APIs for these types of windows, and it may make sense to take advantage of them. This has the same problem as menus, above: these APIs exist on some platforms, and not on others.
There are various other possible inclusions as well, such as API for setting the mouse cursor image, or starting timers, or providing support for connecting to windows that are created and managed elsewhere, something that is important for cases like VST plugins.
If these features are not included, then there needs to be some mechanism for them to be implemented externally, either as extensions or as some additional layer. This means that the abstraction layer needs to be designed with this extensibility in mind, which is an additional major challenge.
The main project attempting to fill this role is winit. It takes a fairly minimal approach, leaving things like menus up to the consumer, but sees active development and is used as a target for a large number of Rust GUI experiments.
For Druid, we chose not to use winit, instead writing druid-shell. druid-shell tries to do more, and is more opinionated. Not using winit has disadvantages, but it allows us the freedom to break API and experiment. My hope would be that work from druid-shell might eventually make its way into winit.
There is one major problem with druid-shell that I hope to address soon, however, and that is that it is tightly coupled to piet, our 2D drawing abstraction. It is clear to me now that there should be a strong division between window management and painting.
There are some major features still missing at this layer, across the ecosystem. The poor story around IME/input methods is perhaps the biggest one, along with missing support for modal/popup windows. In addition, there may be room for a crate modeled on the W3C PointerEvent API, for unifying representations of mouse and touch events, with the ability to map native platform events into some common representation.
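As a rough illustration of what such a pointer crate might offer, the sketch below maps two hypothetical native event types into one common `PointerEvent`. The native structs, the id scheme (mouse is 0, touches are `finger_id + 1`), and the "finger 0 is primary" rule are all assumptions made for the example. The pressure convention for devices without pressure sensing (0.5 while a button is down, 0 otherwise) follows the W3C specification.

```rust
/// Hypothetical native events, standing in for what a platform delivers.
#[derive(Debug, Clone, Copy)]
pub struct NativeMouseEvent {
    pub x: f64,
    pub y: f64,
    pub left_down: bool,
}

#[derive(Debug, Clone, Copy)]
pub struct NativeTouchEvent {
    pub finger_id: u32,
    pub x: f64,
    pub y: f64,
    pub force: f64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PointerKind {
    Mouse,
    Touch,
}

/// Common representation, loosely modeled on the W3C PointerEvent fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PointerEvent {
    pub pointer_id: u32,
    pub kind: PointerKind,
    pub x: f64,
    pub y: f64,
    pub pressure: f64,
    pub is_primary: bool,
}

impl From<NativeMouseEvent> for PointerEvent {
    fn from(e: NativeMouseEvent) -> Self {
        PointerEvent {
            // Assumption: the mouse always uses id 0.
            pointer_id: 0,
            kind: PointerKind::Mouse,
            x: e.x,
            y: e.y,
            // No pressure sensor: 0.5 while pressed, 0.0 otherwise.
            pressure: if e.left_down { 0.5 } else { 0.0 },
            is_primary: true,
        }
    }
}

impl From<NativeTouchEvent> for PointerEvent {
    fn from(e: NativeTouchEvent) -> Self {
        PointerEvent {
            // Assumption: shift touch ids by one so they never collide
            // with the mouse id.
            pointer_id: e.finger_id + 1,
            kind: PointerKind::Touch,
            x: e.x,
            y: e.y,
            pressure: e.force.clamp(0.0, 1.0),
            // Simplification: treat finger 0 as the primary touch.
            is_primary: e.finger_id == 0,
        }
    }
}
```

Widget code can then be written once against `PointerEvent`, with each platform backend responsible only for the `From` conversions.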
Once you have a window, you need some way of drawing its contents. As touched on briefly above, the core abstraction layer should be agnostic to how drawing happens: the same core code should allow someone to create and interact with a window that is drawing its content using the platform’s 2D drawing API, a 3D API like Vulkan or Metal, or some higher-level abstraction over these such as piet or wgpu.
In any case, you have lots of reasonable options for drawing, and I have previously enumerated many of them.
Although related to painting, it is worth considering text separately, since its implementation can be shared between painting implementations. By “text” we mean a number of related sub-problems, which together allow you to draw and manipulate text.
- font enumeration and loading: The first major task is finding out what fonts are available on the system, and exposing an API for finding fonts based on family name, PostScript name, or other metadata. This means loading and reading those files. It also means being able to load additional font files that may be bundled with an application or loaded dynamically, such as over the network.
- Reading glyphs from font files, including metrics and glyph information like advances and origins as well as the glyph outlines themselves, with or without hinting.
- font fallback: When a given string is being laid out in a given font, the font may not include glyphs for all of the characters in the string. In this case you need to fall back to some other font, which does include those characters.
- shaping: This is the process of turning a sequence of characters into a sequence of glyphs + positions.
- layout: Once you have the ability to convert strings into sequences of glyphs, you can start composing those sequences into larger blocks. The simplest version of this is breaking lines so that text fits inside some rectangle, aligned to some edge (or centered). Additionally it may include justification (adjusting the drawing of glyphs to better fill space on a line) or wrapping around arbitrary paths, or even truncation and elision of text that does not fit some provided space. This also includes exposing layout metrics to other parts of the text system, such as the bounding box to be used when drawing a selection rect, or the position of an underline or strikethrough.
- rasterization: Once you have a sequence of glyphs, you have to take the glyphs (either ‘outline glyphs’, which are quadratic or cubic bezier paths, or ‘image glyphs’ (used, e.g. for emoji) that may be bitmaps or SVGs) and convert them to pixels at a given font size.
- rich text: Text does not need to be all the same font, or all the same size, or all the same color. This also affects layout, because if different fonts or different sizes of the same font occupy the same line, you need to account for that when calculating line spacing.
- editing: In addition to drawing text, a GUI toolkit needs to support text editing. This is its own gigantic topic, but at the lowest level this means being able to map from points in “pixel space” to offsets within the underlying string, which in turn means being able to handle things like unicode word and grapheme segmentation.
- BiDi: Many scripts are written left-to-right, but several major ones are written right-to-left. When text in one direction is contained in text in the other direction (such as when a French quotation is included in an Arabic paragraph) the text is said to be “bidirectional”. This complicates layout.
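To illustrate one of these sub-problems, font fallback, here is a minimal sketch that splits a string into runs, each assigned to the first font in a fallback chain that covers its characters. The `Font` type and its `covers` function are hypothetical stand-ins for real cmap lookups, and the sketch works per `char`; a real implementation would work on grapheme clusters and take script and locale into account.

```rust
/// Hypothetical font: a name plus a predicate saying whether the font
/// has a glyph for a given character (a stand-in for a cmap lookup).
pub struct Font {
    pub name: &'static str,
    pub covers: fn(char) -> bool,
}

/// A run of text to be drawn with a single font. `font` is an index
/// into the fallback chain, or `None` if no font covers the run.
#[derive(Debug, PartialEq)]
pub struct Run<'a> {
    pub text: &'a str,
    pub font: Option<usize>,
}

/// Returns the index of the first font in the chain that covers `c`.
pub fn pick_font(chain: &[Font], c: char) -> Option<usize> {
    chain.iter().position(|f| (f.covers)(c))
}

/// Splits `text` into maximal runs of characters that resolve to the
/// same font in the fallback chain.
pub fn split_runs<'a>(text: &'a str, chain: &[Font]) -> Vec<Run<'a>> {
    let mut runs = Vec::new();
    let mut start = 0;
    // Font choice for the run currently being built, if any.
    let mut current: Option<Option<usize>> = None;
    for (i, c) in text.char_indices() {
        let font = pick_font(chain, c);
        match current {
            Some(f) if f == font => {}
            Some(f) => {
                // Font changed: close the previous run at this offset.
                runs.push(Run { text: &text[start..i], font: f });
                start = i;
                current = Some(font);
            }
            None => current = Some(font),
        }
    }
    if let Some(f) = current {
        runs.push(Run { text: &text[start..], font: f });
    }
    runs
}
```

Each resulting run can then be handed to the shaper separately, since shaping happens within a single font.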
We’re getting there. For enumeration, there is currently font-kit and fontdb. For reading font files, there are a number of projects such as ttf-parser and fonttools-rs. Rasterization is included in many of these projects, including font-kit, as well as in some more specialized, standalone projects like fontdue and pathfinder. For shaping, things are still fairly rough (a shaper is a huge project) but there are Rust bindings for HarfBuzz (harfbuzz_rs) as well as a reimplementation in rustybuzz, and two ambitious “green field” projects, Allsorts and swash. The former may not be appropriate for GUI use (it is designed for laying out documents, which is less performance sensitive than layout for a GUI) but the latter (if we project the current pace of work forward) is certainly interesting: in addition to shaping, it aims to provide cross-platform font enumeration and access, and there is a nascent companion project to do paragraph layout. If these projects mature, they could form the basis of an all-Rust text stack.
Text layout is one major missing component. There is no compelling general purpose story here. Doing this well is complicated, and requires being able to work closely with the shaper. Especially if layout is doing hyphenation, the layout engine needs to be locale-aware, in order to use the correct word segmentation logic for a given locale. The story with text editing is similar. Although we have written a fairly complete text editor, we have not (yet) turned this into a reusable component.
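For a sense of where the simplest version of layout starts, here is a minimal sketch of greedy line breaking. It is deliberately naive: the `measure` callback is a hypothetical stand-in for shaping a candidate line and summing its advances, and break opportunities are just ASCII/Unicode whitespace rather than locale-aware segmentation or hyphenation.

```rust
/// Greedily breaks `text` into lines no wider than `max_width`.
///
/// `measure` is a hypothetical callback returning the width of a piece
/// of text; in a real stack it would shape the text and sum advances.
/// A single word wider than `max_width` is placed on its own line and
/// allowed to overflow.
pub fn break_lines(
    text: &str,
    max_width: f64,
    measure: impl Fn(&str) -> f64,
) -> Vec<String> {
    let mut lines = Vec::new();
    let mut line = String::new();
    for word in text.split_whitespace() {
        let candidate = if line.is_empty() {
            word.to_string()
        } else {
            format!("{} {}", line, word)
        };
        if line.is_empty() || measure(&candidate) <= max_width {
            // The word fits (or it is the first word on the line).
            line = candidate;
        } else {
            // The word does not fit: finish this line, start a new one.
            lines.push(std::mem::replace(&mut line, word.to_string()));
        }
    }
    if !line.is_empty() {
        lines.push(line);
    }
    lines
}
```

Re-measuring the whole candidate line on every word is wasteful, but it reflects a real constraint: because of kerning and ligatures, the width of a line is not simply the sum of the widths of its words, which is part of why layout has to work closely with the shaper.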
A final related problem is around string types. The String type in std is not necessarily a good fit for GUI use: it always allocates, even
for small strings, and always reallocates on clone. Strings in a GUI may be
frequently copied and shared. As an example, the contents of a text editing view
will have one “canonical” copy owned by the application, but then that text will
also need to be accessed by the text layout and shaping code, and will
additionally also be cached by the platform’s IME system. This last case is hard
to avoid, but in other cases it is nice if multiple references to the same text
can reuse the same underlying storage. This is another possibly useful
crate: a string type designed for GUI use, with copy-on-write semantics,
small string optimization, efficient editing operations, and good support
for operations like navigating by word/grapheme boundaries.
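The copy-on-write part of this idea can be sketched with nothing but the standard library. The `SharedText` type below is hypothetical; it makes clones cheap by sharing one buffer and only copies when a shared buffer is edited. It does not attempt the small string optimization, efficient editing of large buffers (which would likely want a rope), or grapheme-aware navigation.

```rust
use std::sync::Arc;

/// Hypothetical string type with cheap clones and copy-on-write edits.
#[derive(Clone, Debug, PartialEq)]
pub struct SharedText(Arc<String>);

impl SharedText {
    pub fn new(s: &str) -> Self {
        SharedText(Arc::new(s.to_string()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// True if both handles point at the same allocation.
    pub fn shares_storage_with(&self, other: &SharedText) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }

    /// Inserts `s` at byte offset `offset`. The buffer is copied first
    /// only if another handle still shares it. Panics if `offset` is
    /// not on a char boundary, like `String::insert_str`.
    pub fn insert_str(&mut self, offset: usize, s: &str) {
        Arc::make_mut(&mut self.0).insert_str(offset, s);
    }
}
```

With this shape, the application, the layout code, and the shaper can each hold a `SharedText` for the same contents without any copying until one of them edits.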
Hugely, tragically, saddeningly underemphasized. None of the current Rust GUI projects that I’m aware of have any accessibility support, nor a plan to add it.
Accessibility is a big project, and this is an area where, ideally, an ecosystem crate could exist that would be reused by many projects. In this direction, I am aware of one ray of light, in the form of AccessKit. This crate aims to provide an abstraction over the platform accessibility APIs that is independent of any particular framework. In essence it would maintain an alternate view tree of accessibility attributes that the framework would update in response to user actions. Druid is participating in the project’s design discussions, and we will be adopting it as soon as that is feasible. Realistically though this is a big project, and I do not expect it to be feature complete for some time.
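The idea of an alternate tree that the framework keeps up to date can be sketched as follows. To be clear, none of these types are AccessKit’s actual API; `NodeId`, `Role`, `Node`, and `AccessTree` are hypothetical names chosen only to show the data flow, with the framework pushing incremental updates and an assistive technology querying the result.

```rust
use std::collections::HashMap;

/// Hypothetical stable identifier for a node in the accessibility tree.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId(pub u64);

#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    Window,
    Button,
    TextField,
}

/// Accessibility attributes for one element of the UI.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub role: Role,
    pub label: String,
    pub children: Vec<NodeId>,
}

/// The alternate tree: owned by the accessibility layer, updated by
/// the GUI framework whenever the UI changes.
#[derive(Default)]
pub struct AccessTree {
    nodes: HashMap<NodeId, Node>,
    focus: Option<NodeId>,
}

impl AccessTree {
    /// Applies an incremental update: inserts or replaces the given
    /// nodes and records which node currently has focus.
    pub fn update(&mut self, changed: Vec<(NodeId, Node)>, focus: Option<NodeId>) {
        for (id, node) in changed {
            self.nodes.insert(id, node);
        }
        self.focus = focus;
    }

    /// Roughly what a screen reader might announce for the focused node.
    pub fn describe_focus(&self) -> Option<String> {
        let node = self.nodes.get(&self.focus?)?;
        Some(format!("{}, {:?}", node.label, node.role))
    }
}
```

The important property is that the framework only sends the nodes that changed; translating the tree into each platform’s accessibility API is the part that a shared crate would take on.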
AccessKit is our only current hope here, but it is an ambitious project in the very early stages of development. I’ve had several conversations with its author, Matt Campbell, who I think has both the experience and the motivation to do this work, but it will certainly be challenging. Accessibility is non-negotiable, and there is no plan B. As a community we should be trying to support this project however we’re able.
Localization and internationalization
Another extremely important area that is often overlooked.
For the localization of text, the necessary infrastructure is in place. By this I am thinking of Mozilla’s Fluent project, a modern localization system with an official Rust implementation. Fluent is a set of low-level tools for parsing locales, loading and parsing localization resources, and finally generating localized strings.
Fluent is not particularly ergonomic to use directly, but that is not a major
problem. It is intended as a set of foundational components, prioritizing
performance and flexibility, and there is room for some higher level
abstractions built on top of it that are more focused on ergonomics. One
potential example of this is the LocalizedString type in Druid, although it is tied closely to Druid’s data model.
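As a very rough picture of what a more ergonomic layer might look like from the caller’s side, here is a sketch of a lookup with a locale fallback chain and named arguments. It does not use Fluent and does not reflect Fluent’s API or Druid’s LocalizedString: the `Localizer` type is hypothetical, the `{$name}` placeholder syntax is a simplification, and real localization needs plural rules, genders, and much more.

```rust
use std::collections::HashMap;

/// Hypothetical high-level localizer: resolves a message key against a
/// chain of locales and substitutes named arguments.
pub struct Localizer {
    /// locale -> (message key -> template)
    bundles: HashMap<String, HashMap<String, String>>,
    /// Locales to try, most preferred first.
    chain: Vec<String>,
}

impl Localizer {
    pub fn new(chain: &[&str]) -> Self {
        Localizer {
            bundles: HashMap::new(),
            chain: chain.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Registers a message template for a locale.
    pub fn add(&mut self, locale: &str, key: &str, template: &str) {
        self.bundles
            .entry(locale.to_string())
            .or_default()
            .insert(key.to_string(), template.to_string());
    }

    /// Looks up `key` in the first locale of the chain that has it,
    /// replacing each `{$name}` placeholder with its argument value.
    /// Falls back to the key itself if no locale has the message.
    pub fn get(&self, key: &str, args: &[(&str, &str)]) -> String {
        let template = self
            .chain
            .iter()
            .find_map(|locale| self.bundles.get(locale)?.get(key))
            .map(String::as_str)
            .unwrap_or(key);
        let mut out = template.to_string();
        for (name, value) in args {
            out = out.replace(&format!("{{${}}}", name), value);
        }
        out
    }
}
```

In a real crate the parsing, fallback negotiation, and formatting would be delegated to Fluent, with a layer like this only smoothing over the call sites.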
For localization of other assets (such as images) there is no existing story that I know of; this is closely related to asset bundling and access, which we’ll talk about later.
For internationalization more generally (i.e. not just of bundled strings) there is another exciting recent development, in the form of ICU4X, which is a new Rust implementation of the Internationalization Components of Unicode. This should eventually be able to serve as the basis for things such as locale-aware date/time/currency/number formatting, collation, and the various other locale-specific considerations covered by the CLDR.
The foundations are coming together, and (I’m noticing a pattern here) the main remaining project is developing nice ergonomic APIs on top of those foundations.
Packaging and distribution
Once you’ve written an application, you need to be able to distribute it, which means creating the appropriate package or installer for the platforms you are targeting. There are several important steps here:
Manifests and required assets
To distribute a GUI application you generally need to generate some sort of manifest file, which provides information about your application to the system. You may also need to provide certain localization resources, such as localizations of your application’s name, in some format understood by the host platform. You then also need to include the appropriate icons, and any other required resources.
Along with the resources required by the system, your application will generally also include custom resources of various kinds, such as your strings files (localized versions of the strings used in your application) as well as any custom images, sounds, or other assets.
During development, you can access these directly, but in production that is trickier. Different platforms may have different sandboxing rules or different conventions around where application assets are stored, and so you generally won’t be able to just access some path relative to your executable and find the files you need.
It is likely that this is another area where some single crate could be used across the larger ecosystem. This crate would abstract over the particulars of storage on a given platform, and instead would let the user query and retrieve files by name or type or other attributes. It could also handle things like caching, managing scaled (HiDPI) versions of image assets, asset localization, and similar details.
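A minimal sketch of the query side of such a crate might look like the following. The `AssetCatalog` type and its methods are hypothetical, and the sketch covers only one of the details mentioned above: choosing the best scaled (HiDPI) variant of a named asset. The `root` is passed in by hand here; in a real crate it would be determined per platform.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical asset catalog: lets the application ask for assets by
/// name instead of by path.
pub struct AssetCatalog {
    /// Platform-specific directory that holds the bundled assets.
    root: PathBuf,
    /// asset name -> available (scale factor, relative file) variants
    variants: HashMap<String, Vec<(u32, PathBuf)>>,
}

impl AssetCatalog {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        AssetCatalog { root: root.into(), variants: HashMap::new() }
    }

    /// Registers one scaled variant of an asset, e.g. "icon" at 2x.
    pub fn register(&mut self, name: &str, scale: u32, file: &str) {
        self.variants
            .entry(name.to_string())
            .or_default()
            .push((scale, PathBuf::from(file)));
    }

    /// Resolves an asset for a display scale factor: prefers the
    /// smallest variant at or above the wanted scale (so it is never
    /// upscaled), otherwise falls back to the largest one available.
    pub fn resolve(&self, name: &str, wanted_scale: u32) -> Option<PathBuf> {
        let variants = self.variants.get(name)?;
        let best = variants
            .iter()
            .filter(|v| v.0 >= wanted_scale)
            .min_by_key(|v| v.0)
            .or_else(|| variants.iter().max_by_key(|v| v.0))?;
        Some(self.root.join(&best.1))
    }
}
```

Application code would then ask for `"icon"` at the current display scale and never need to know where the bundle lives on disk.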
Code signing and permissions
Finally, certain platforms may require applications to register for permission to use certain APIs (such as for accessing the microphone, or accessing various parts of the file system), and platforms may additionally require applications to be signed with some sort of private key or developer certificate before they can be run.
The tugger project(s) appears to be attempting to serve this role, or most of it, and seems promising/ambitious, although work is still ongoing. There is also cargo-bundle, which is focused more narrowly on the creation of packages and installers, and has no code signing or resource management (which is a totally reasonable design choice).
In both cases, it feels like there is still a lot of work to do. It doesn’t feel like that work is clearly enumerated, but there is at least forward motion.
I’m sure there are other useful crates or potential crates I have omitted: for instance it would be nice to have some persistent and efficient key-value store for things like user preferences or persistent application state. I believe, however, that I’ve covered most of the larger foundational items. Without these, “serious” Rust GUI applications will continue to be out of reach.
We are, perhaps, closer than I had expected us to be. That is not to say that we’re exactly close.
Coordination (a new Working Group?)
There have been various attempts in the past to start a GUI working group, but they haven’t tended to last very long. A major part of this problem is that GUI means different things to different people, and trying to collaborate on such a wide range of topics and projects is difficult.
In writing this post, though, I’ve come to wonder whether we might not be ready for a more narrowly scoped working group, specifically to discuss and coordinate work on these sorts of shared infrastructure components. At the very least, it would be good to have more open communication channels between some of the major GUI projects.
As always, I’m certain to have left out some important work. Please feel free to let me know of any glaring omissions.