CS 5523 Operating Systems
A Proxy Server to Monitor Web Traffic

Objectives:

Overview: In this laboratory you will develop a proxy server that monitors the traffic generated by the browsers that use it. The web monitor is a server that listens on a port for TCP/IP socket requests from clients (web browsers). Upon receiving a client (browser) request, the monitor creates another TCP/IP connection to the actual destination web server and transmits the client's request. The monitor forwards the web server's response back to the client. The monitor should also support an option to dump all of the traffic to a file. In Part I, the web server name is passed as a command line argument to the monitor and all client requests are forwarded to the specified server. In Part II, the monitor acts as a proxy server and can handle client requests to arbitrary servers. You will be assigned 10 unique port numbers to use during testing. (See this link.) This laboratory must be done in Java.

Requirements:

Part I: Simple Pass Through (Tunnel). Write a tunnel that takes two command line arguments: a port number p and a destination web server name. The tunnel listens on port p for a TCP/IP socket connection request. When a client connection has been made, the tunnel makes a connection to port 80 of the destination web server. The tunnel forwards all messages received from the client to the destination web server and vice versa. If either the client or the web server closes its connection, the tunnel closes its connections to both the client and the server. The tunnel then writes status information to standard error indicating the name of the client and the time that elapsed from when the tunnel received the first information from the client to when the connection was closed.

Test your program by accessing the tunnel through your web browser. Suppose you want to test your program by retrieving http://vip.cs.utsa.edu/classes/cs5523s2002/home.html, and your port number is 10355. Start your tunnel program on machine X with:

    java tunnel 10355 vip.cs.utsa.edu

In your web browser, access the page with this URL:

    http://X:10355/classes/cs5523s2002/home.html

Test your tunnel with a variety of destination web servers and web pages. Why does the tunnel not have to parse the client's request in order to forward it to the web server?

Notes: CDK Figures 4.5 and 4.6 have example programs for doing network communication in Java. You should start by getting these programs to run.

You will not receive full credit if your implementation assumes that the client will first send all of its information and that the server will then respond. A tunnel should make no assumptions about the ordering of incoming information. A typical method of handling this is to have two threads in the tunnel --- one for each direction.
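
As a starting point, the two-thread arrangement described above might be sketched as follows. This is only a sketch under stated assumptions, not a reference solution: the class name TunnelSketch and the helpers pump, startPump, closeQuietly, and handle are hypothetical names chosen for illustration. For brevity, the elapsed time is measured from when the connection is accepted, whereas the assignment asks for the time from the first information received from the client.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

class TunnelSketch {
    // Copy bytes from in to out until end of stream; returns the number of bytes copied.
    static long pump(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            out.flush();
            total += n;
        }
        return total;
    }

    static void closeQuietly(Socket s) {
        try { s.close(); } catch (IOException ignored) { }
    }

    // Start one direction of the tunnel. When this direction ends for any
    // reason, both sockets are closed so that the opposite pump stops too.
    static Thread startPump(Socket from, Socket to) {
        Thread t = new Thread(() -> {
            try {
                pump(from.getInputStream(), to.getOutputStream());
            } catch (IOException e) {
                // Expected when the other direction closed the sockets first.
            } finally {
                closeQuietly(from);
                closeQuietly(to);
            }
        });
        t.start();
        return t;
    }

    // Serve one client: connect to port 80 of host and relay in both directions.
    static void handle(Socket client, String host) throws IOException, InterruptedException {
        long start = System.currentTimeMillis(); // simplification: starts at accept time
        Socket server;
        try {
            server = new Socket(host, 80);
        } catch (IOException e) {
            closeQuietly(client);
            throw e;
        }
        Thread clientToServer = startPump(client, server);
        Thread serverToClient = startPump(server, client);
        clientToServer.join();
        serverToClient.join();
        long elapsed = System.currentTimeMillis() - start;
        System.err.println(client.getInetAddress().getHostName() + " " + elapsed + " ms");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java TunnelSketch port host");
            return;
        }
        try (ServerSocket listener = new ServerSocket(Integer.parseInt(args[0]))) {
            while (true) {
                Socket client = listener.accept();
                new Thread(() -> {
                    try {
                        handle(client, args[1]);
                    } catch (IOException | InterruptedException e) {
                        System.err.println("tunnel error: " + e);
                    }
                }).start();
            }
        }
    }
}
```

Because each direction has its own thread, neither side has to finish sending before the other starts. Closing both sockets in the finally clause is one simple way to meet the requirement that a close on either side shuts down both connections.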

Part II: A Proxy Monitor. Modify the tunnel program developed in Part I so that it acts as a proxy and uses HTTP redirect to make the connection between the client and the destination server. You will no longer need to pass the destination web server as a command line argument to the monitor. Instead, you will need to parse the incoming HTTP request from the client. If the first token is not GET, treat the connection as an error and close all of the connections. You will need to extract the destination web server address from the URL and strip the proxy headers. Explain how a client request to a proxy differs from a client request to a web server. How does the monitor use the proxy protocol when it forwards the client's request?
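
The parsing step described above could be sketched roughly as follows. The class ProxyRequestSketch and its members are hypothetical names, and the sketch makes several assumptions not stated in the assignment: the request line has exactly three tokens, the URL is absolute (http://host/path), the port defaults to 80, and any header whose name starts with "Proxy-" counts as a proxy header.

```java
import java.net.URI;

class ProxyRequestSketch {
    final String host;        // destination web server taken from the URL
    final int port;           // port from the URL, or 80 if none was given
    final String requestLine; // request line to forward to the web server

    ProxyRequestSketch(String host, int port, String requestLine) {
        this.host = host;
        this.port = port;
        this.requestLine = requestLine;
    }

    // Parse a proxy-style request line such as
    //   GET http://vip.cs.utsa.edu/classes/cs5523s2002/home.html HTTP/1.0
    // Throws IllegalArgumentException if the line is not a usable GET request.
    static ProxyRequestSketch parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 3 || !tokens[0].equals("GET")) {
            throw new IllegalArgumentException("not a GET request: " + line);
        }
        URI uri = URI.create(tokens[1]); // throws IllegalArgumentException on bad syntax
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("no host in URL: " + tokens[1]);
        }
        int port = (uri.getPort() == -1) ? 80 : uri.getPort();
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        // The web server expects only the path, not the full URL.
        return new ProxyRequestSketch(uri.getHost(), port, "GET " + path + " " + tokens[2]);
    }

    // Assumption: a header line is a proxy header if its name begins with
    // "Proxy-" (case-insensitive), for example "Proxy-Connection".
    static boolean isProxyHeader(String headerLine) {
        return headerLine.regionMatches(true, 0, "Proxy-", 0, 6);
    }
}
```

With this sketch, parse applied to the request line in the comment would yield host vip.cs.utsa.edu, port 80, and the forwarded line "GET /classes/cs5523s2002/home.html HTTP/1.0". The monitor would then open its connection to that host and port, and skip any header line for which isProxyHeader returns true.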

Part III: A Proxy with Logging. Modify the proxy monitor developed in Part II so that it performs logging. The program takes two command line arguments --- the port number on which it listens and the level of logging that it should perform. Logging levels are specified by the strings none (no logging, as in Part II), headers (all headers are dumped to a log file) and all (headers are dumped to a log file and each returned resource is saved as a file). Each header should be prefaced by the client name and the server name. Develop a naming convention for your resource files that encodes the server name and the path name on the server in the file name. Explain your naming convention in your report.
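
To illustrate what such a naming convention might look like, here is one possible scheme, offered only as an example and not as the required answer. The class ResourceNameSketch and the method fileNameFor are hypothetical names. The scheme keeps letters, digits, '.' and '-' unchanged and replaces every other byte with '%' followed by two hexadecimal digits, so the original server name and path can be recovered from the file name.

```java
import java.nio.charset.StandardCharsets;

class ResourceNameSketch {
    // Build a single flat file name from the server name and the path.
    // Safe characters are copied; every other byte becomes %XX (hex).
    static String fileNameFor(String host, String path) {
        StringBuilder name = new StringBuilder();
        for (byte b : (host + path).getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '.' || c == '-';
            if (safe) {
                name.append((char) c);
            } else {
                name.append('%').append(String.format("%02X", c));
            }
        }
        return name.toString();
    }
}
```

For example, the server vip.cs.utsa.edu and the path /classes/cs5523s2002/home.html would map to vip.cs.utsa.edu%2Fclasses%2Fcs5523s2002%2Fhome.html. Because the path always begins with '/', the first %2F marks where the server name ends. Very long paths could exceed file system name limits, which this sketch does not handle.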

Last Revision: April 6, 2002, at 1:15 pm by Kay A. Robbins. This material may be used for educational purposes provided that the source is credited.