CS 5523 Operating Systems
Laboratory 3 Post Mortem
Comparing Threaded Servers
- Timing errors:
- Timing errors usually involve introducing side effects
that are significant compared with the numbers you are trying to measure.
Most of the timing errors involved some sort of printing in the timing loop.
- Most people explicitly avoid printing in the client's timing loop, but
many put print statements in the server's thread loop. This is
a disaster as far as timing is concerned, because it delays the
thread response by an unpredictable amount.
- Writing to the screen is even more of a disaster because the screen
is a very slow SHARED device. This has the effect of synchronizing all
of the threads on each request-response exchange, and the effect gets
worse as the number of threads grows. You have eliminated the parallelism.
- Use flags and conditional compilation to handle debugging statements.
The server should produce NO informational or debugging output
of any kind when you are running timing tests.
- Writing to disk after each connection. This isn't as bad as
some other problems, but it introduces delay, so the client isn't
maintaining the steady load you would expect. Since you know in advance
how many requests the client will make in total (requests per connection
times connections), allocate an array for the times (it is small) and
store each time there. When the client is completely done, it should
then write the times to disk.
- If you must write to disk, write to /tmp, which is usually a local
disk (you should check), to avoid introducing spurious load and delay
from NFS.
- Avoid using sprintf during timing. It is very slow, especially for
floating point conversions.
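A minimal sketch of the advice above, assuming a C client. Debugging output is compiled out unless DEBUG is defined, and the times go into a preallocated array that is formatted and written only after the run is over. The names DPRINTF, fake_request, time_requests and dump_times are hypothetical, and clock_gettime is just one possible timer, not necessarily the one used in the lab.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Compile with -DDEBUG for debugging output. Timing builds leave DEBUG
   undefined, so the calls below compile to nothing. */
#ifdef DEBUG
#define DPRINTF(...) fprintf(stderr, __VA_ARGS__)
#else
#define DPRINTF(...) ((void)0)
#endif

/* Hypothetical stand-in for one request-response exchange. */
void fake_request(void) {
    volatile int x = 0;
    for (int i = 0; i < 1000; i++)
        x += i;
}

/* Elapsed time in microseconds between two clock readings. */
static double elapsed_us(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) * 1e6 +
           (double)(b.tv_nsec - a.tv_nsec) / 1e3;
}

/* Time `count` calls of request() into an array allocated up front.
   Nothing is printed or formatted inside the timed interval.
   Returns NULL if the allocation fails; the caller frees the array. */
double *time_requests(size_t count, void (*request)(void)) {
    double *times = malloc(count * sizeof *times);
    if (times == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        request();
        clock_gettime(CLOCK_MONOTONIC, &end);
        times[i] = elapsed_us(start, end);
        DPRINTF("request %zu done\n", i);   /* debug builds only */
    }
    return times;
}

/* Write all of the times after the run. Returns 0 on success, -1 on error. */
int dump_times(const char *path, const double *times, size_t count) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    for (size_t i = 0; i < count; i++) {
        if (fprintf(fp, "%.3f\n", times[i]) < 0) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

A client would call time_requests once per run and then dump_times with a path under /tmp, so that all of the formatting and disk traffic happens after the last measurement.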
- Data analysis errors:
- Taking the median of the medians and presenting that as the median of
the entire data set.
- In general, medians should be smaller and more stable than averages for
right-skewed distributions such as response times, where a few very slow
responses pull the average up. Some people had just the opposite effect.
This indicates an incorrect calculation or too few samples to base
statistics on.
- Medians and averages that were consistently different by an order
of magnitude. If you couldn't find your problem, you should at least
point it out as a problem.
- Many people structured their output so that it was nearly impossible
to import into a spreadsheet or other program for analysis. You
should print out your tables without intermediate text so that you
can actually see the numbers in columns.
- You should combine the numbers from all of the client processes
for final analysis. Standard deviation or quartiles are good to present too.
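The median-of-medians mistake is easy to see on a small made-up data set. The sketch below is an illustration only: the function names median and pooled_median are hypothetical, and the numbers in the note that follows are invented, not lab data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort on doubles. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n values. Sorts a private copy so the caller's data is
   untouched. Returns -1.0 if allocation fails (times are never negative). */
double median(const double *v, size_t n) {
    assert(n > 0);
    double *copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return -1.0;
    memcpy(copy, v, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_double);
    double m = (n % 2) ? copy[n / 2]
                       : (copy[n / 2 - 1] + copy[n / 2]) / 2.0;
    free(copy);
    return m;
}

/* Median of all clients' samples combined into one data set. */
double pooled_median(const double **sets, const size_t *sizes, size_t nsets) {
    size_t total = 0;
    for (size_t i = 0; i < nsets; i++)
        total += sizes[i];
    assert(total > 0);
    double *all = malloc(total * sizeof *all);
    if (all == NULL)
        return -1.0;
    size_t off = 0;
    for (size_t i = 0; i < nsets; i++) {
        memcpy(all + off, sets[i], sizes[i] * sizeof *all);
        off += sizes[i];
    }
    double m = median(all, total);
    free(all);
    return m;
}
```

Suppose one client recorded {1, 2, 3} and another recorded {4, 5, 100, 200, 300}. The per-client medians are 2 and 100, so the median of the medians is 51, while the median of the eight pooled values is 4.5. Only the pooled figure describes the entire data set.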
- Synchronization errors:
- Buffer errors. The size of the buffer determines how many connections
ahead the server can get. Some people pre-allocated the buffer to a large
(or small) size instead of allocating at runtime as specified.
- Your server threads should release the slots before doing the communication.
- If your producer thread broadcasts rather than signals when it puts an
item in the buffer, all waiting threads will be awakened and have to contend
for the item mutex, even though only one of them can take the item. This
imposes a significant synchronization load with 50 workers.
- Your producers should never signal (broadcast) on slots. Your consumers
should never signal (broadcast) on items.
- Putting printf statements in your thread loops synchronizes all of the
threads on the shared screen device. This invalidates all of your results.
- One mutex lock for everything results in incorrect or extremely delayed
synchronization depending on how it was used.
- Failure to synchronize empty slots can result in the server overwriting
file descriptors before they are consumed.
- Some people used select in the threads even though each thread
was handling one file descriptor. One reason to use threads is to
eliminate the use of select in handling multiple file descriptors.
Even though the threads aren't using the same file descriptor mask,
the use of select introduces unwanted overhead and usually unwanted
thread synchronization.
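Several of the points above (run-time allocation, signal rather than broadcast, producers touching only items and consumers touching only slots) can be illustrated with a small bounded buffer of file descriptors. This is a sketch under assumptions, not the structure the lab specification required: it uses a single mutex that protects only the buffer indices and is held only briefly, never during communication, plus two condition variables. The names fdbuf_t, fdbuf_init, fdbuf_put and fdbuf_get are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    int *fds;               /* circular buffer of descriptors */
    size_t size;            /* capacity, chosen at run time */
    size_t in, out, count;  /* insert index, remove index, items held */
    pthread_mutex_t lock;   /* protects the fields above; held briefly */
    pthread_cond_t items;   /* signaled when an item is added */
    pthread_cond_t slots;   /* signaled when a slot is freed */
} fdbuf_t;

/* Allocate the buffer at run time. Returns 0 on success, -1 on error. */
int fdbuf_init(fdbuf_t *b, size_t size) {
    if (size == 0 || (b->fds = malloc(size * sizeof *b->fds)) == NULL)
        return -1;
    b->size = size;
    b->in = b->out = b->count = 0;
    if (pthread_mutex_init(&b->lock, NULL) != 0 ||
        pthread_cond_init(&b->items, NULL) != 0 ||
        pthread_cond_init(&b->slots, NULL) != 0) {
        free(b->fds);
        return -1;  /* sketch: objects already initialized are not destroyed */
    }
    return 0;
}

/* Producer side: waits on slots, signals items, never signals slots.
   Lock and unlock results are not checked in this sketch. */
void fdbuf_put(fdbuf_t *b, int fd) {
    pthread_mutex_lock(&b->lock);
    while (b->count == b->size)           /* an empty slot is required */
        pthread_cond_wait(&b->slots, &b->lock);
    b->fds[b->in] = fd;
    b->in = (b->in + 1) % b->size;
    b->count++;
    pthread_cond_signal(&b->items);       /* one new item wakes one worker */
    pthread_mutex_unlock(&b->lock);
}

/* Consumer side: waits on items, signals slots, never signals items.
   The slot is released before this function returns. */
int fdbuf_get(fdbuf_t *b) {
    pthread_mutex_lock(&b->lock);
    while (b->count == 0)
        pthread_cond_wait(&b->items, &b->lock);
    int fd = b->fds[b->out];
    b->out = (b->out + 1) % b->size;
    b->count--;
    pthread_cond_signal(&b->slots);
    pthread_mutex_unlock(&b->lock);
    return fd;
}
```

A worker thread would call fdbuf_get and only then communicate on the returned descriptor, so the slot is already free while the request is being served. Because fdbuf_put waits for an empty slot, the producer cannot overwrite a descriptor that has not been consumed.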
- Inadequate testing:
- Some people failed to understand the relationship of the
processes/connections/messages values and so picked numbers that didn't make
any sense. The number of processes roughly corresponds to the number of
parallel connections that are established. However, this assumes steady
state. Some people used 2 connections and 2 messages. In this
case the client won't be able to fork all of the processes before some
of the processes have completed, so the test does not produce any
meaningful results. To do this correctly, one would need to actually
synchronize all of the processes at a barrier before going on.
- The basic question of this lab is: under what circumstances do you
expect thread-per-connection to be better or worse than a worker pool? Most
people did not make a hypothesis and develop tests to confirm or reject
the hypothesis. The tests seemed to be chosen at random.
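One way to build the barrier mentioned above is with a pipe: every child blocks reading the pipe until the parent, having finished all of its forks, closes the write end. The following is a sketch of that technique, not the lab's client. The names run_clients_at_barrier and demo_work are hypothetical, and demo_work stands in for the real connection and message loop.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for one client process's work; 0 means success. */
int demo_work(int id) {
    (void)id;
    return 0;
}

/* Fork nprocs children that all wait until every fork has completed,
   then run work(i). Returns the number of children that exited with
   status 0, or -1 if the pipe or any fork failed. */
int run_clients_at_barrier(int nprocs, int (*work)(int id)) {
    int gate[2];
    if (pipe(gate) == -1)
        return -1;
    int started = 0;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;                  /* release and reap those already made */
        if (pid == 0) {             /* child */
            char c;
            ssize_t r;
            close(gate[1]);         /* otherwise EOF would never arrive */
            do {                    /* blocks until the parent closes gate[1] */
                r = read(gate[0], &c, 1);
            } while (r > 0 || (r == -1 && errno == EINTR));
            close(gate[0]);
            _exit(work(i) == 0 ? 0 : 1);
        }
        started++;
    }
    close(gate[0]);
    close(gate[1]);                 /* opens the gate: every child sees EOF */
    int ok = 0;
    for (int i = 0; i < started; i++) {
        int status;
        pid_t w;
        while ((w = wait(&status)) == -1 && errno == EINTR)
            ;                       /* retry if interrupted by a signal */
        if (w != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return started == nprocs ? ok : -1;
}
```

The barrier only guarantees that the processes start together. The connection and message counts still have to be large enough for the run to reach steady state.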
- Writeup:
- Did not state conditions under which experiment was run (machines, times of day, etc.)
- Did not give an analysis of what happened and what you expected to
happen.
- Did not plot consecutive graphs using the same units.
- Gave architectural diagrams for server 1 and 2 that were the same.
- Did not label the axes of the plots.
- Spelling, grammar and organization.
- Some people did not seriously or correctly address the problem of
when the servers should exit. In general a client should NOT be able
to cause a server to exit. The server should only exit if there is
an irrecoverable error due to resources (memory, descriptors, etc.) that
would jeopardize future correct execution. Remember the Mars Pathfinder!
- Programming errors and style:
- Indentation
- Big loops
- Lint errors
- Didn't catch errors on system calls
Last Revision: April 6, 2002 at 1:15pm by Kay A. Robbins