Chapter 5: Performance Evaluation

- Running at the ragged edge. -

Page Contents

    5.1.  Setup Details
    5.2.  Results from Performance Tests
    5.3.  An Analysis of the Results


In the redesigned packet routing system, multiple concurrent threads combined with message queues and asynchronous communication yield higher throughput and lower latency than the original design. This chapter describes the experiments used to compare the routing performance of the original packet routing system with that of the new, multi-threaded design. Notable packet arrival patterns and dropped-packet patterns are discussed and illustrated.



5.1. Setup Details

The new and original packet routing systems were analyzed on a 50 MHz 486 computer with 16 MB of RAM, identical to the acquisition computer used on the plane during Antarctic field surveys. Three 233 MHz Pentium computers were used as visualization stations. During performance testing the RAD display requested only radar packets; each radar packet is 12,386 bytes. The CAM and RTQC displays each requested all serial packets. Serial packet sizes are as follows (Table 5.1):

    Avionics System (AVN)               162 bytes
    GPS Clock (CLK)                      84 bytes
    Global Positioning System 1 (GPS1)  235 bytes
    Global Positioning System 2 (GPS2)  178 bytes
    Gravity meter (GRV)                  81 bytes
    Laser Altimeter (LAS)                73 bytes
    Magnetometer (MAG)                   85 bytes
    Pressure Transducer (PRS)            84 bytes

This setup was chosen to resemble the real-time monitoring setup employed during Antarctic geophysical surveying. In the field, one visualization computer accepts and displays radar packets as they are acquired. Three additional visualization computers keep track of the multiple streams of acquired serial data, each requesting a subset of serial packet types.
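The packet sizes listed above are enough to estimate the raw radar byte rate and the size ratio between radar and serial packets. The following Python sketch is illustrative only: the constant and function names are hypothetical and are not part of the original system, and the byte rate ignores any protocol overhead.

```python
# Packet sizes in bytes, taken from the setup description above.
RADAR_PACKET_BYTES = 12386
SERIAL_PACKET_BYTES = {
    "AVN": 162, "CLK": 84, "GPS1": 235, "GPS2": 178,
    "GRV": 81, "LAS": 73, "MAG": 85, "PRS": 84,
}


def radar_bytes_per_second(rate_hz):
    """Raw radar payload rate for a given packet rate (overhead ignored)."""
    return RADAR_PACKET_BYTES * rate_hz


def radar_to_serial_ratio(serial_type):
    """How many times larger a radar packet is than one serial packet type."""
    return RADAR_PACKET_BYTES / SERIAL_PACKET_BYTES[serial_type]


if __name__ == "__main__":
    print(radar_bytes_per_second(4))                  # 49544 bytes/s at 4 Hz
    print(round(radar_to_serial_ratio("GPS1"), 1))    # 52.7 (largest serial packet)
```

Even against the largest serial packet (GPS1), a radar packet is more than 50 times larger, which is why radar traffic dominates the load on the acquisition computer.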

The original packet routing code was modified slightly to allow insertion of a packet's start time within each packet acquired by the spool process. All display programs read this start time to calculate a packet's travel interval (Figure 4.10), writing both the packet arrival time and packet travel interval out to disk before sending a PING for another data packet. The arrival times recorded by the display programs were calculated by using the acquisition computer's real-time clock.
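The travel interval calculation described above amounts to subtracting the embedded start time from the arrival time, with both read from the same clock. A minimal Python sketch follows; the function names and the log record layout are assumptions made for illustration, not the format used by the actual display programs.

```python
def travel_interval_ms(start_time_s, arrival_time_s):
    """Travel interval in milliseconds; both times come from the same clock."""
    return (arrival_time_s - start_time_s) * 1000.0


def format_log_record(packet_type, start_time_s, arrival_time_s):
    """One line of a hypothetical display log: type, arrival time, interval."""
    interval = travel_interval_ms(start_time_s, arrival_time_s)
    return f"{packet_type} {arrival_time_s:.3f} {interval:.1f}"


if __name__ == "__main__":
    # A radar packet stamped at 10.000 s that arrives at 10.250 s.
    print(format_log_record("RAD", 10.0, 10.25))
```

Using a single clock for both timestamps avoids having to correct for clock offsets between the acquisition computer and the visualization stations.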

The new packet routing system is also able to record packet drop intervals (Figure 4.10). A drop interval is the time between packet drops, measured in milliseconds, for a specific mailbox. The drop interval along with packet drop time can be written to a file for later analysis, although this task was not executed during performance comparisons with the original system since disk writes do not occur within the original code. Packet drop intervals provide a window into when and where the new system is dropping packets during packet routing.
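Given the drop times recorded for one mailbox, the drop intervals defined above are simply the differences between consecutive drop times. The sketch below assumes the drop times are available as a list in milliseconds; the function name is hypothetical.

```python
def drop_intervals_ms(drop_times_ms):
    """Time between consecutive packet drops for one mailbox, in ms."""
    return [later - earlier
            for earlier, later in zip(drop_times_ms, drop_times_ms[1:])]


if __name__ == "__main__":
    # Three drops in quick succession followed by a long quiet period
    # would appear as short intervals followed by a long one.
    print(drop_intervals_ms([100, 350, 360, 900]))
```

Short intervals indicate a burst of drops, while long intervals indicate that the mailbox was keeping up in between.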

Performance testing began by generating packets at the data acquisition rates currently employed during geophysical surveying in Antarctica. The packet generation rate was then increased until both spool processes (from the original and the new routing systems) could no longer accept packets at the increased rate (Table 5.1). The fastest rate chosen was one that still allowed all packets to be recorded to disk. The radar packet rate was increased separately from the serial packet rate, since the two data types are acquired via different input systems.

The programs executing on the experimental visualization computers were identical for both systems except for the 'SETUP'-type messages sent before packet routing actually began. The action of sending a 'PING' message and receiving a data packet as a reply was the same for both packet routing systems. Both the original and the new version of the spool process received and recorded to disk all of the generated packets, but each system used its own routing mechanism to transfer the generated packets to the packet-requesting display processes. Performance evaluation was based on data packets actually received by the individual display sub-stations. Thus, a reasonable comparison of performance was achieved.



5.2. Results from Performance Tests

Throughput measurements for serial and radar packets were recorded at the three experimental visualization computers. The RTQC and CAM display programs usually maintained identical throughputs; when their rates differed, the two were averaged. Increasing the radar packet transmission rate from 4 Hz to 16 Hz while holding the serial packet rate constant at 34 Hz affected the throughput of both serial and radar packets. The new system maintained the expected throughput of radar and serial packets at transmission rates from 4 Hz through 12 Hz (Figure 5.1a), except for one run at 8 Hz in which radar throughput dropped by 50%. At 16 Hz, radar throughput fell below the expected rate, to 14.5 Hz, and serial packet throughput fell from the expected 34 Hz to 28 Hz (Figure 5.1b). The original system usually showed a steady increase in radar packet throughput, but was always below optimum and below the level of the new system (Figure 5.1a). As radar packet transmissions increased, the original system's serial packet throughput decreased from 15.2 Hz to 11 Hz, more than 50% below the expected serial packet throughput (Figure 5.1b).
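The throughput figures quoted above can be related to the arrival times logged at each display. The Python sketch below is one plausible way to do so and is not taken from the original analysis: it assumes throughput is the number of inter-arrival intervals divided by the elapsed time, and all function names are hypothetical.

```python
def throughput_hz(arrival_times_s):
    """Packets per second over the span of the recorded arrival times."""
    if len(arrival_times_s) < 2:
        return 0.0
    elapsed = arrival_times_s[-1] - arrival_times_s[0]
    return (len(arrival_times_s) - 1) / elapsed if elapsed > 0 else 0.0


def average_throughput(cam_hz, rtqc_hz):
    """Average of the CAM and RTQC rates, used when the two differ."""
    return (cam_hz + rtqc_hz) / 2.0


def percent_below_expected(measured_hz, expected_hz):
    """Shortfall of the measured rate relative to the expected rate."""
    return 100.0 * (expected_hz - measured_hz) / expected_hz


if __name__ == "__main__":
    print(throughput_hz([0.0, 0.25, 0.5, 0.75, 1.0]))   # 4.0
    print(percent_below_expected(14.5, 16))             # 9.375
    print(round(percent_below_expected(28, 34), 1))     # 17.6
```

With the 16 Hz figures from the text, the new system's radar throughput is about 9% below the expected rate and its serial throughput about 18% below.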

Increasing serial packet transmissions from 34 Hz to 116 Hz while maintaining a constant 4 Hz radar packet rate also affected radar and serial packet throughput measured at the visualization stations (Figures 5.2a and 5.2b). The new packet routing system maintained near optimal serial and radar packet throughputs during 34 Hz and 56 Hz transmission rates. At 86 Hz, radar throughput could remain optimal if the RAD Communication-thread was given higher (+1) running priority than the CAM and RTQC Communication-threads, but serial throughput dropped to 55 Hz, 36% below the expected rate. At equal priority levels, radar throughput dropped to 3.6 Hz, 10% below the expected level, with serial throughput at 75 Hz, 12% below optimum. Beyond 86 Hz a sharp decrease in measured performance was noted for both radar and serial packet throughputs.

At a serial transmission rate of 116 Hz, radar packet throughput could be raised to 3.5 Hz by increasing the priority of the RAD Communication-thread, but this caused serial throughput to drop to about 20 Hz, more than 75% below the expected rate. At equal priorities, radar throughput could be maintained at only 1.8 Hz and serial throughput at 44 Hz, still more than 50% below optimum.

The original packet routing system consistently maintained a serial packet throughput below 50% of the expected rate, regardless of increases in the serial packet transmission rate. Its radar packet throughput stayed at the expected rate for serial transmission rates of 34 Hz (SOAR's current system configuration) and 56 Hz. Performance proved variable at 86 Hz and 116 Hz, with radar throughput oscillating between 2 and 4 packets per second. This unstable behavior is illustrated in Figure 5.3 for 5-minute runs at an 86 Hz serial packet rate (Figure 5.3a illustrates data collected using the current system configuration). Figure 5.3b shows a pattern of routing that gave a near-optimal radar throughput rate of 3.9 Hz. The same initial parameters produced the run illustrated in Figure 5.3c, in which radar throughput decreased to 2 Hz, 50% below the expected rate. During the first 1.5 minutes and again between 2.5 and 4 minutes, no radar packets (blue diamonds on the plots) arrived at the RAD display (Figure 5.3c). As mentioned previously, a 50% decrease in radar packet throughput was also seen in one run at an 8 Hz radar / 34 Hz serial packet rate.

The new system offers 100% packet arrival at display stations for the currently configured system and consistently performs better than the original system as data packet transmissions increase (Figures 5.1 and 5.2b). In contrast, the original system optimizes the transfer of radar packets at the expense of serial packet throughput, and maintains a fairly constant serial packet throughput regardless of the data acquisition rate (Figures 5.1a, 5.1b, 5.2a, 5.2b). Less than 50% of all serial packets acquired are actually routed through the original system to the awaiting visualization sub-systems. The original system routes better than 80% of all radar packets acquired except during unstable runs (Figure 5.3b). At the 116 Hz serial packet rate, both systems drop a considerable fraction of packets and perform about the same (Figures 5.2a and 5.2b).

Packet travel intervals (i.e., latency), the other major statistic collected, show how long individual packets remain en route before arriving for visualization at a display station (Figures 5.4 and 5.5). The new system clearly routes packets more quickly than the original system (Figures 5.4b and 5.5b). This lower latency underlies the new system's ability to handle higher rates of packet routing when data acquisition rates increase.



5.3. An Analysis of the Results

What happens when the packet routing systems are forced to accept increasing data acquisition rates of radar (large) or serial (small) packets? The new system can optimize the throughput of data packets at a particular display program by running that program's individual communication thread at a higher priority. Radar packet throughput to the RAD display was increased by running the RAD display's communication thread at a priority one level (+1) higher (Table 5.2). Concurrent with this increase was a decrease in throughput at the CAM and RTQC display stations. Since radar packets are more than 50 times larger than any serial packet, the cost of holding their travel time under 100 ms has to be extracted from the system somewhere. A system can use scheduling to optimally arrange packets for routing. Interleaving tiny serial packets between very large radar packets can be accomplished, but at a cost. When the system is running at its ragged edge, the cost is a greater number of dropped serial packets. The CAM and RTQC communication threads are preempted by the RAD communication thread and must wait to send their tiny serial packets until after the larger radar packets are sent. This tug-of-war between displays for better packet throughput may be avoided by increasing the speed of the acquisition computer. When a machine is running at its limit, knowing how to prioritize processes can give optimal performance results.


Table 5.2. The effect on throughput of setting communication thread priorities before starting packet routing at high packet acquisition rates. After giving the RAD Communication-thread a higher priority, radar packet throughput increases to 100%, concurrently with a comparable decrease in serial packet throughput.

Display Name/Packet Rate    Throughput                          Throughput
                            Thread Priorities:                  Thread Priorities:
                            RAD: 12r; CAM: 12r; RTQC: 12r       RAD: 12r; CAM: 11r; RTQC: 11r

Serial Packets: 86 Hz
Radar Packets: 4 Hz
  CAM                       64.4 packets/sec (75%)              59.2 packets/sec (69%)
  RAD                       3.6 packets/sec (90%)               4.0 packets/sec (100%)
  RTQC                      65.0 packets/sec (76%)              60.8 packets/sec (71%)

Serial Packets: 34 Hz
Radar Packets: 16 Hz
  CAM                       29.6 packets/sec (87%)              23.3 packets/sec (69%)
  RAD                       13.8 packets/sec (86%)              16.0 packets/sec (100%)
  RTQC                      29.6 packets/sec (87%)              23.9 packets/sec (70%)
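The percentages in Table 5.2 can be reproduced from the measured rates by dividing each display's throughput by the expected rate for the packet type it requests (radar for RAD, serial for CAM and RTQC). The Python sketch below does this with the table's values; the data layout and function name are hypothetical and serve only as a check on the arithmetic.

```python
# (serial_hz, radar_hz, display, equal_priority_pps, rad_favored_pps)
TABLE_5_2 = [
    (86, 4, "CAM", 64.4, 59.2),
    (86, 4, "RAD", 3.6, 4.0),
    (86, 4, "RTQC", 65.0, 60.8),
    (34, 16, "CAM", 29.6, 23.3),
    (34, 16, "RAD", 13.8, 16.0),
    (34, 16, "RTQC", 29.6, 23.9),
]


def percent_of_expected(display, serial_hz, radar_hz, measured_pps):
    """Measured throughput as a rounded percentage of the expected rate."""
    expected = radar_hz if display == "RAD" else serial_hz
    return round(100 * measured_pps / expected)


if __name__ == "__main__":
    for serial_hz, radar_hz, display, equal_pps, favored_pps in TABLE_5_2:
        equal_pct = percent_of_expected(display, serial_hz, radar_hz, equal_pps)
        favored_pct = percent_of_expected(display, serial_hz, radar_hz, favored_pps)
        print(f"{display:5s} {equal_pct:3d}% -> {favored_pct:3d}%")
```

In both rate configurations, favoring the RAD thread brings radar throughput to 100% while serial throughput at CAM and RTQC falls to roughly 70%.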



A visual comparison of two performance runs (Figures 5.6a and 5.6b), differing only in the priority of the RAD display's Communication-thread, shows a dramatic reduction in the average transfer time of radar packets. Giving radar packets higher priority decreased the average radar packet travel time from 180 milliseconds to 50 milliseconds (Figure 5.6b). Visualizing the time series brings another routing phenomenon to light. At the beginning of two similarly spaced (possibly periodic) intervals, radar packet transfer time suddenly increases (Figure 5.6b). Why is this happening? What significance does it hold for understanding when and why packets are dropped?

A time series plot overlaying packet drop intervals on packet arrival intervals reveals two different kinds of packet dropping and their relationship to packet arrivals at the display stations (Figure 5.7). One set of packets, a mixture of radar and serial, is dropped by Spool when the Coordinator cannot remove packets from the COORD mailbox fast enough. A second set, consisting of radar packets, is dropped by the RAD Communication-thread when 'PING' messages (packet requests) do not arrive fast enough from the RAD visualization station. During this run, the RTQC and CAM Communication-threads did not drop any packets. These Communication-threads received packet requests at rates below packet delivery rates to their mailboxes.
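Both kinds of dropping described above occur when a mailbox fills faster than its consumer empties it. The Python sketch below models this with a bounded queue that records the time of each drop. It is a simplified illustration under stated assumptions: the class name, the fixed capacity, and the policy of discarding the newly arriving packet are all hypothetical, since the actual mailbox implementation and drop policy are not described here.

```python
from collections import deque


class Mailbox:
    """Bounded FIFO that drops incoming packets when full (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.drop_times_ms = []   # one entry per dropped packet

    def post(self, packet, now_ms):
        """Try to enqueue a packet; record the drop time if the box is full."""
        if len(self.queue) >= self.capacity:
            self.drop_times_ms.append(now_ms)
            return False
        self.queue.append(packet)
        return True

    def take(self):
        """Remove and return the oldest packet, or None if the box is empty."""
        return self.queue.popleft() if self.queue else None


if __name__ == "__main__":
    box = Mailbox(capacity=2)
    # The producer posts every 10 ms, but the consumer has not run yet.
    for t, name in [(0, "a"), (10, "b"), (20, "c")]:
        box.post(name, t)
    print(box.drop_times_ms)   # the third packet found the mailbox full
```

In this model a slow Coordinator corresponds to infrequent `take` calls on the COORD mailbox, and late 'PING' messages correspond to infrequent `take` calls on a Communication-thread's mailbox.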

As stated above, drop intervals were not recorded during performance comparisons because of the high time overhead of disk writes. Nevertheless, comparing a time series plot that includes dropped packets with one that omits them shows that the same type of anomaly in the packet arrival pattern signals similar bursts of dropped packets. This anomaly is also evident for the smaller serial packets, but only at high serial packet acquisition rates. A close-up plot of a small anomaly shows three packet streams coexisting until some perturbation causes the interval pattern to lose synchronicity, resulting in a burst of dropped packets (Figure 5.7b).

Scheduling and run priorities were ignored at the outset of testing, but it became apparent that some system interference was affecting the output of runs at higher packet generation rates. The nameloc system utility provides a QNX network with global naming capability. Both the new and original systems use global names during inter-process communication. nameloc needs to be running on at least one node in a QNX distributed network for global names to work.

The nameloc process probes each node in a QNX network to maintain a current list of network-wide global names. The time interval between probes can be set when nameloc is first started. Running nameloc on more than one node increases network traffic because each instance of nameloc probes the network. Sometimes, a small improvement in packet throughput was seen when nameloc was limited to the acquisition node only. During testing, network probing was set to twice per second and then reduced to once per second. No conclusive results could be obtained regarding the benefits of running nameloc on fewer nodes or of changing the probing interval. The possible periodicity seen in Figures 5.6 and 5.7 could be related to nameloc probing the network. During the run illustrated in Figure 5.7, nameloc was set to probe twice per second; in the run illustrated in Figure 5.6, probing occurred once per second.
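Whether the anomalies are truly periodic could be examined by measuring how regular the spacing between their onset times is. The Python sketch below is one simple way to do this and was not part of the original analysis: the function names, the 10% tolerance, and the example onset times are all hypothetical. It requires at least three anomalies, since a single gap cannot show regularity.

```python
def gaps(times_s):
    """Spacing between consecutive event times, in seconds."""
    return [later - earlier for earlier, later in zip(times_s, times_s[1:])]


def is_roughly_periodic(times_s, rel_tolerance=0.1):
    """True if every gap lies within rel_tolerance of the mean gap."""
    spacing = gaps(times_s)
    if len(spacing) < 2:
        return False              # one gap cannot establish periodicity
    mean_gap = sum(spacing) / len(spacing)
    if mean_gap <= 0:
        return False
    return all(abs(g - mean_gap) <= rel_tolerance * mean_gap for g in spacing)


if __name__ == "__main__":
    # Hypothetical anomaly onset times (seconds into a run).
    print(is_roughly_periodic([30.0, 95.0, 150.0]))
    print(is_roughly_periodic([30.0, 120.0, 150.0]))
```

A regular spacing would be consistent with a periodic cause such as network probing, but it would not by itself identify nameloc as that cause.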

Due to the synchronous nature of the original design, packet throughputs obtained using the original packet routing program do not seem to be affected by the network probing of the nameloc utility. Since Communication-threads run asynchronously, small network perturbations may more easily disrupt their interwoven operation. Regular packet transmission rates from the packet generators and constant 'PING' messages from the display programs inject synchronicity into the running system, so the executing threads eventually develop a distinguishable pattern. This pattern appears as their ability to interlace packet transfers across the Ethernet network. Any disruption in this pattern is evident as a perturbation in the arrival pattern of packets at the display computers (Figures 5.6 and 5.7).

