Show events.asc syntax highlighted
The EmStar Event System
=======================
The EmStar Team <emstar-design at cens dot ucla dot edu>
$Id: events.asc,v 1.4 2005/07/28 05:41:27 ndbusek Exp $
EmStar has been designed to build complex, reactive systems. We chose
an event-driven structure to handle concurrency. This page describes
some of the details on how to use this event system. Most of the
EmStar libraries are built to integrate with the event system,
enabling easy integration of different libraries.
A Word About Threads
--------------------
Before getting to the usage details, we should address the main
competitor to event-drive structures: threads. Many systems use
threads to manage concurrency. Threads bring two advantages:
1. Threads are truly concurrent; a slow acting thread will not
impede other independent threads from operating in a timely way.
2. Threads enable a more direct programming style, such as a
sequence of blocking calls, that in an event model would be a sequence
of chained callbacks.
However, there is a hidden cost to threads: synchronization and
deadlock avoidance. Because threads run concurrently, they must
negotiate use of shared data structures. For simple uses of threads,
that don't need to coordinate shared data structures, threads are
easier to use, but when inter-thread communication is needed, the
complexity skyrockets.
Reactivity, one of the key design requirements of EmStar, essentially
boils down to inter-thread communication as the common case. For
example, when a lengthy operation is undertaken, reactivity means that
if arriving data makes that operation irrelevant, it should be
interrupted. Implementing this correctly with threads is much more
difficult and error-prone than the corresponding event-based
implementation. For a more complete and lucid discussion of the
difficulties of threads, see
http://home.pacbell.net/ouster/threads.pdf[Ousterhout 96] and
http://www.faqs.org/docs/artu/ch07s03.html#id2923889[ESR 03].
The Big Picture of EmStar Design
--------------------------------
EmStar follows the UNIX design philosophy: combat complexity with many
small modules connected by clean interfaces. EmStar modules are
implemented in separate processes with separate process
spaces. Clients of EmStar modules use thin, lightweight client
libraries to connect to other modules. This structure provides fault
isolation between clients and servers, and simplifies the client
libraries, reducing the possibility that bugs or unforseen
interactions in library code cause problems for applications. More
information on the big picture can be found in
http://lecs.cs.ucla.edu/%7Egirod/papers/emstar-usenix04.pdf[Girod 03]
and http://lecs.cs.ucla.edu/Publications/papers/emstar.pdf[Elson 03].
Individual modules form a dependency graph that is expressed in the
EmRun config file. This dependency graph must be free of cycles. That
is, a module can't be a client of some other module that is a client
of the original module. Such a loop in the dependency graph would
potentially cause deadlock, because while a process is making a client
call, it can't be responsive as a server. FUSD will detect this case
and prevent it by refusing the client connection. While this
restriction appears at first to be severe, in fact it is always
possible to get around this limitation by structuring your system
differently. If the dependency graph disallows module A being a client
of module B, the transaction can usually be reversed by module A
instead providing a service to module B. Notification and callback
can be exchanged for an active call. In practice, we have rarely found
this issue to be a problem.
While separate modules run concurrently, the modules themselves are
typically implemented using a single-threaded, event-driven
structure. An event-driven implementation simplifies the
implementation of reactive modules by eliminating the need to worry
about synchronization and deadlock: no event handler will ever be
interrupted to run another event handler, and no event handlers run
"concurrently". The underlying sources of events (e.g. IPC from other
modules, events from devices, etc.) are serialized at the kernel
interface. Thus, so long as every event handler leaves shared data
structures in a valid state when it terminates, there is never a need
to worry about whether a data structure will be valid when it is
accessed.
The main caveat to this structure is that in order to remain
responsive, no event handler can run for a long time. Direct system
calls that are likely to block should be replaced by a
notification-based scheme. For example, in place of a blocking read()
call, use an event to trigger when the file is readable and then call
read(). In practice, the key phrase above is "likely to block". If
domain knowledge suggests that blocking is unlikely (or would only
happen because of a bug), the system call is generally used directly.
The other caveat is processes that are by nature long-lived. For
instance, a module might need to perform a lengthy computation for
which segmentation into bite-sized chunks is impractical. In these
cases, where concurrency really becomes handy, a thread is used. A
message queue event can be used to enable the computation thread to
communicate with the "main" thread that is running the event
system. Even in this reduced case, working out the synchronization can
be tricky -- but not nearly as bad as using threads pervasively.
Relationship to GLib
--------------------
The EmStar event system is a wrapper around GLib, that provides a
slightly different interface. However, because it is a wrapper, it is
100% compatible with GLib events, and GLib-based code can be used
together with EmStar libraries. The purpose of the interface changes is
to simplify them and tune it for the common cases of embedded
networked systems. GLib was designed to run GTK GUI apps, and the
native simplified interface they provide didn't have quite the right
API for building network protocols and reactive systems. The other
change relative to GLib has to do with memory management. GLib uses a
cumbersome reference-counting scheme to track memory usage. Rather
than support a full reference counting scheme, we implement a much
simpler scheme that handles a subset of the problem that represents
the common case for our applications.
EmStar Initialization and the Main Event Loop
---------------------------------------------
An EmStar main() function takes a routine form, that is typically
cut-and-pasted from one module to the next. The main() function is
typically composed of the following steps:
1. Initialize the module's state variables
2. Call misc_init(). This will initialize certain EmStar libraries and facilities.
3. Process command-line arguments
4. Initialize the events to be handled
5. Call emrun_init(). This will connect to EmRun and enable
graceful shutdown and enable EmRun to start other modules that depend
on this one.
6. Call g_main(). This will start up the event loop. This
function never returns; all activity from now on occurs when events
are triggered.
An example of this can be found in the
http://cvs.cens.ucla.edu/viewcvs/viewcvs.cgi/emstar/skeletons/motor_controller/motor_controller.c?rev=1.2&content-type=text/vnd.viewcvs-markup[skeletons]
directory.
Event Initialization
~~~~~~~~~~~~~~~~~~~~
Events are created and installed by event constructors. These
constructors typically take an options structure that defines the
configuration of the event, and fill in an "event reference", which is
a handle on the event. This handle can be used to reconfigure the
event, or perform other operations on it. The nature of these event
references will be discussed in detail in the next section.
Examples of events are:
1. A timer that fires once per second.
2. A timer that fires 4.25 seconds from now.
3. An event that triggers when a particuar file descriptor becomes readable.
4. An event that encapsulates an interface to a neighbor discovery service,
and reports a new list of neighbors whenever a change occurs in the neighborhood.
As you can see, these events encapsulate varying amounts of complexity
and domain knowledge; whereas timers are very simple, the neighbor
discovery client event encapsulates the knowledge of how to connect to
the neighbor discovery service, how to receive notification of new
data, and how to read the new data. This capability to encapsulate
arbitrary mechanism within an "event" enables very simple APIs to be
constructed that encapsulate complex access protocols.
The other advantage of events is that independent events can be
integrated together very easily. Individual EmStar events rarely need to
know anything about the other events in the system in order to
integrate properly. Through the GLib event system, all events from
timers to notification events configure hooks in GLib's central
select() loop, and then are dispatched accordingly.
Event References
~~~~~~~~~~~~~~~~
By convention, all event constructors take the address of an "event
reference" as their final argument. This reference is a handle that
enables the event to be asynchronously reconfigured, accessed, or
destroyed. For example, the g_timer_add() function creates a timer
event. Using the reference, the timer may be asynchronously canceled
using the g_event_destroy() function. For another example, the
g_status_dev() function creates a status device. A reference to that
status device can be used to cause notification on that device, or to
destroy it. In the sense that it is passed to "method" functions, the
event reference is similar to an "instance" of an object class.
GLib internally uses reference counting to address the variety of
conditions that can arise when GLib applications need asynchronous
destruction or multiple independent references. However, reference
counting can be very tricky to use, leading to subtle memory
bugs. Automated forms of reference counting always fall short as well,
most notably when confronted with cyclic linked structures.
After gaining some experience with early versions of EmStar, we found
that we never used the "multiple reference" capability of reference
counting, but that we did need to support asynchronous
destruction. That is, we found that all of our applications could get
by storing only one copy of a pointer to the event, but that our
software often needed to destroy an event from several different
places in the code. While events are never run concurrently, it's
possible that an event might be destroyed down inside a call chain,
and then be referenced later by the caller.
Given this insight, we implemented event references to support
asynchronous destruction, but not multiple reference. Thus, each event
has a single "owner" who holds the pointer to it. Anyone may obtain
that pointer from the owner's structure, and use it, even to destroy
the event. If the pointer obtained is NULL, that means the event does
not exist. Once obtained, the pointer should only be used while it is
known to be valid; if you call out to unknown library code, you should
re-acquire the pointer on your return. When the event is destroyed,
the pointer in the owner's structure will automatically be zeroed, so
that it can't be used again.
This sounds complicated, but it is actually very simple, as long as a
few simple rules are followed.
Rules for event "owners":
1. Allocate the event reference in a *fixed* location (typically in the
owner's state structure).
2. Initialize the event reference to NULL.
3. Pass the (fixed) address of the event reference to the event constructor.
4. *Never* "move" or reallocate the structure containing the reference.
5. Before deallocating the structure containing the reference, call
the appropriate destroy() function to destroy the event.
Rules for using event references:
1. Don't store a copy of an event reference, except temporarily. If you
call out to code that might destroy the event, acquire a fresh copy
before using it again. When possible, use the reference directly from
its fixed location.
2. If the reference is NULL, that means the event has been destroyed.
However, you need not worry about this in general, because methods
and destructors silently ignore NULL references. (For example, it
is always safe to destroy a destroyed reference.)
These rules place slight restrictions on the memory that holds event
references, and disallow multiple independent copies of the event
reference (i.e. requires a single "owner"). However, compared with the
potential for reference-counting related bugs in a full-blown
reference counted framework, this seemed a small price to pay.
NOTE: If you will never need to destroy or access the event, you can
pass NULL in to the event reference argument of the constructor. In
that case, no reference will be saved.
NOTE: Some data structure libraries, such as STL, may move memory
"under the covers". If stored event references are moved this will
cause problems. Most EmStar modules use dynamically allocated structures
kept in linked lists to avoid this problem.
The Main Event Loop
~~~~~~~~~~~~~~~~~~~
After initialization, all activity in an EmStar system originates with
the event loop. This is similar in structure to the scheduler in a
multitasking OS. The implementation of the main loop is a part of
GLib; it is documented in the
http://developer.gnome.org/doc/API/2.0/glib/glib-The-Main-Event-Loop.html[GLib
API reference]. Applications that need to control the details of the
scheduling can use optional arguments provided throughout the EmStar
API. Event-based programming can take a bit to get used to for
programmers accustomed to thread-based programming, but once the
control flow is understood, it should become natural.
Installing a Timer Event
------------------------
The simplest form of event is the timer event. This event has a single
callback that is called when the specified time expires. The callback
may set the timer to retrigger, or terminate the event, depending on
its return value. In addition, a timer can be canceled at any time by
calling g_event_destroy() with the timer's event reference.
Setting a Timer
~~~~~~~~~~~~~~~
To set a timer, call the g_timer_add() function. For example:
------
g_event_t *timer_ref = NULL;
int cb_data = 5;
int status =
g_timer_add(1000, /* 1 second (1000 ms) */
cb_func, /* our callback function */
&cb_data, /* a data pointer */
NULL, /* some rarely-used options */
&timer_ref); /* an event reference */
------
This call will set a one-second timer. If for some reason it cannot be
set, it will return -1. If it is not canceled before 1 second, the
callback will be called. The following call would cancel the timer:
g_event_destroy(timer_ref);
An example of a callback function is:
-------
int cb_func(void *data, int interval, g_event_t *event)
{
int *count = (int *)data;
if (*count == 0) {
elog(LOG_INFO, "No more timeouts");
return TIMER_DONE;
}
(*count)--;
elog(LOG_INFO, "%d more timeouts", *count);
return TIMER_RENEW;
}
-------
Given that cb_data starts out 5, after 1 second, this callback will
print '4 more timeouts', then wait one second, print '3 more
timeouts', then wait one seconds, etc. until it prints 'No more
timeouts', and destroys the event. You can see this behavior in action
by running the program in skeletons/tutorial/timer.c
(obj.i686-linux/skeletons/timer). You may notice that after the timer
finishes, the program hangs. This is because, having destroyed the
timer, there are no further events registered, so the event loop is
idle, and will stay that way forever.
There are just a few additional details about timers. First, in
addition to renewing the timer for the same time interval, a timer
callback can also return TIMER_RENEW_MS(x) to reset the timer for a
different interval.
Second, if no callback is specified when setting the timer, the timer
event will still be created, and will exist for the specified time,
and then be destroyed. This can actually be useful, because the
function g_timer_isset(timer_ref) will tell you if the timer is still
set. Thus, this effectively sets a variable "true" for the specified
length of time.
Watching a File Descriptor
--------------------------
The second-simplest event to construct is one that watches for
notification on a file descriptor.
See more files for this project here