CS 5523 Operating Systems
A Proxy Server to Monitor Web Traffic

Objectives:

Overview: In this laboratory you will develop a proxy server that monitors the traffic generated by the browsers that use it. The web monitor is a server that listens on a port for TCP/IP socket requests from clients (web browsers). Upon receiving a client (browser) request, the monitor creates another TCP/IP connection to the actual destination web server and transmits the client's request. The monitor forwards the web server's response back to the client. The monitor should also support an option to dump all of the traffic to a file.

Parts I, II and III are independent of each other and provide the groundwork and skills needed for the remaining three parts. Part IV develops a tunnel in which the web server name is passed as a command line argument to the monitor and all client requests are forwarded to the specified server. In Part V, the monitor acts as a proxy server and can handle client requests to arbitrary servers. Part VI adds logging. You will be assigned 10 unique port numbers to use during testing. (See this link.) This laboratory must be done in Java.

Requirements:

Part I: Parsing the Initial Request Line

Develop and test a class called InitialLine that has the following public methods:

    public InitialLine(String line);
    public boolean isValid( ); // Returns true if line is a valid initial line
    public String getArgument( ); // Returns the argument or query
    public String getCommand( ); // Returns the command
    public String getHost( ); // Returns the server part of line
    public String getLine( );  // Returns line
    public String getPath( );  // Path
    public int getPort( );  // Returns the port
    public String getProtocol( ); // Returns the protocol
    public String getVersion( ); // Returns the protocol version
For example, if the line is:
    GET http://vip.cs.utsa.edu:8080/classes/cs5523s2003/index.html HTTP/1.0
The command is GET, the host is vip.cs.utsa.edu, the path is /classes/cs5523s2003/index.html, the port is 8080, the protocol is http and the version is HTTP/1.0.

The initial line is valid if it contains exactly three components. The first component is either GET or POST, the second component is a valid URL, and the third component is HTTP/1.0 or HTTP/1.1.

Thoroughly test this class using a standalone test program. Include the test program and test results as part of your hand in for Part I. You may find the Java URL and StringTokenizer classes useful here.
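As a starting point, the example line above can be taken apart with the two suggested classes. The following is only a minimal sketch: RequestLineSketch, split and parseUrl are hypothetical names that are not part of the required InitialLine interface, and the validity rules for the command and version are left out. Recent JDKs mark the URL constructor as deprecated, but it still works.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.StringTokenizer;

// Hypothetical demo class; it is not the InitialLine class the assignment asks for.
class RequestLineSketch {

    // Returns the three whitespace-separated components,
    // or null if the line does not have exactly three.
    static String[] split(String line) {
        StringTokenizer st = new StringTokenizer(line);
        if (st.countTokens() != 3) {
            return null;
        }
        return new String[] { st.nextToken(), st.nextToken(), st.nextToken() };
    }

    // Parses the URL component, returning null if java.net.URL rejects it.
    static URL parseUrl(String text) {
        try {
            return new URL(text);
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] parts = split(
            "GET http://vip.cs.utsa.edu:8080/classes/cs5523s2003/index.html HTTP/1.0");
        URL url = parseUrl(parts[1]);
        System.out.println("command  = " + parts[0]);
        System.out.println("protocol = " + url.getProtocol());
        System.out.println("host     = " + url.getHost());
        System.out.println("port     = " + url.getPort());   // -1 if the URL names no port
        System.out.println("path     = " + url.getPath());
        System.out.println("query    = " + url.getQuery());  // null if the URL has no query
        System.out.println("version  = " + parts[2]);
    }
}
```

Note that getPort returns -1 when the URL has no explicit port, so your class has to decide what getPort( ) should report in that case.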

Part II: Getting an HTTP Header

Develop and test a class called HTTPHeader that has the following public methods:

    public HTTPHeader(DataInputStream in) throws IOException;
    public HTTPHeader(DataInputStream in, int linelengthlimit) throws IOException;
    public String getHeaderLine(int n); // Returns the n-th header line 
    public int getNumberLines( ); // Returns the number of header lines
    public void print(PrintStream out); // Output the header in readable form
The class should read the HTTPHeader from in and store it. The first form of the constructor does not put any restriction on the length of the header lines, while the second form throws an exception if it encounters lines longer than linelengthlimit. Detect the end of the header by an empty line. Make sure that you do not read past the end of the header on in. Do not use any buffered I/O classes in your implementation. Include the initial line as part of the header.

Thoroughly test this class using a standalone test program. Include the test program and test results as part of your hand in for Part II. You may find the Java Vector class useful here.
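The requirement not to read past the end of the header can be met by reading one byte at a time. The sketch below shows one possible approach under stated assumptions: HeaderReadSketch, readLine, readHeader and describe are hypothetical names, lines are assumed to end in CRLF or a bare LF, and the header bytes are treated as ISO-8859-1. It is not the required HTTPHeader class.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it is not the HTTPHeader class the assignment asks for.
class HeaderReadSketch {

    // Reads one line a byte at a time, dropping the line terminator.
    // Returns null if the stream ends before any byte is read.
    static String readLine(DataInputStream in, int limit) throws IOException {
        StringBuilder sb = new StringBuilder();
        boolean sawByte = false;
        int b;
        while ((b = in.read()) != -1) {
            sawByte = true;
            if (b == '\n') {
                break;
            }
            if (b == '\r') {
                continue;
            }
            if (sb.length() >= limit) {
                throw new IOException("header line longer than " + limit);
            }
            sb.append((char) b);
        }
        return sawByte ? sb.toString() : null;
    }

    // Collects lines until the empty line (or end of stream); the empty line is consumed
    // but not stored, and nothing after it is read.
    static List<String> readHeader(DataInputStream in, int limit) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = readLine(in, limit)) != null && !line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    // Test helper: reads a header from an in-memory stream and reports what is left unread.
    static String describe(String raw, int limit) {
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1)));
        try {
            List<String> lines = readHeader(in, limit);
            return lines.size() + " lines, " + in.available() + " bytes left";
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String raw = "GET /index.html HTTP/1.0\r\nHost: example.com\r\n\r\nBODY!";
        System.out.println(describe(raw, 1024)); // prints "2 lines, 5 bytes left"
    }
}
```

An in-memory stream such as the one in describe is a convenient way to check that the bytes following the empty line are still available to the next reader.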

Part III: Copy Thread

Develop and test a class called CopyFromTo that extends Thread and has the following public methods:

    public CopyFromTo (DataInputStream in, DataOutputStream out); 
    public int getNumberBytes( );
    public void run( );
The run method of a CopyFromTo thread copies all of the bytes from in to out until the end of file or until an error occurs. It then closes both in and out. The getNumberBytes method returns the number of bytes successfully read and written so far.

Thoroughly test this class using a standalone test program. You should create multiple CopyFromTo threads to copy from one file to another. Include the test program and test results as part of your hand in for Part III. Remember that you have to call the start method to start the thread running after it is created. You might find the synchronized attribute useful for getNumberBytes. Why?
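The start-then-wait pattern for a copying thread can be sketched as follows. This is an illustration only, under stated assumptions: CopySketch, getCopied and copyAndWait are hypothetical names, the 4096-byte buffer is an arbitrary choice, and the sketch uses plain InputStream and OutputStream with in-memory data rather than the DataInputStream and DataOutputStream signature and the files that the assignment requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical demo class; it is not the CopyFromTo class the assignment asks for.
class CopySketch extends Thread {
    private final InputStream in;
    private final OutputStream out;
    private int copied;

    CopySketch(InputStream in, OutputStream out) {
        this.in = in;
        this.out = out;
    }

    // May be called from another thread while run() is still copying.
    synchronized int getCopied() {
        return copied;
    }

    private synchronized void add(int n) {
        copied += n;
    }

    @Override
    public void run() {
        byte[] buf = new byte[4096];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                add(n); // count only bytes that were both read and written
            }
        } catch (IOException e) {
            // Stop copying on the first error.
        } finally {
            try { in.close(); } catch (IOException ignored) { }
            try { out.close(); } catch (IOException ignored) { }
        }
    }

    // Test helper: copies data into sink on a separate thread and waits for it to finish.
    static int copyAndWait(byte[] data, ByteArrayOutputStream sink) {
        CopySketch t = new CopySketch(new ByteArrayInputStream(data), sink);
        t.start(); // run() executes on the new thread, not the caller's
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t.getCopied();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(copyAndWait("hello world".getBytes(), sink) + " bytes copied");
    }
}
```

A test program for your own class would start several such threads at once and compare each output file with its input.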

Part IV: Simple Pass Through (Tunnel)

Write a tunnel program that takes two command line arguments: a port number p and a destination web server name. The pass-through monitor listens on port p for a TCP/IP socket connection request. When a client connection has been made, the monitor makes a connection to port 80 of the destination web server. The monitor then creates two CopyFromTo threads: one forwards all messages received from the client to the destination web server (the one passed to the tunnel as a command line argument), and the other forwards all messages from the server to the client. If either the client or the web server closes a connection, the tunnel closes its connections to both the client and the server. The tunnel then writes status information to standard error indicating the name of the client and the time it took from when the tunnel received the first information from the client to when the connection was closed.

Test your program first by using a modified TCPClient and TCPServer and then by accessing the tunnel through your web browser. Suppose you want to test your program by retrieving http://vip.cs.utsa.edu/classes/cs5523s2003/home.html, and your port number is 10355. Start your tunnel program on machine X with:

   java tunnel 10355 vip.cs.utsa.edu 
In your web browser access this URL with:
    http://X:10355/classes/cs5523s2003/home.html

Test your tunnel with a variety of destination web servers and web pages. Why does the tunnel not have to parse the client's request in order to forward it to the web server?

Notes: CDK Figures 4.5 and 4.6 have example programs for doing network communication in Java. You should start by getting these programs to run.

You will not receive full credit if your implementation assumes that the client will first send all of its information and that the server will then respond. A tunnel should make no assumptions about the ordering of incoming information. A typical method of handling this is to have two threads in the tunnel, one for each direction.

Part V: A Proxy Monitor

Modify the tunnel program developed in Part IV so that it acts as a proxy and uses HTTP redirect to make the connection between the client and the destination server. You will no longer need to pass the destination web server as a command line argument to the monitor. Instead, you will need to parse the incoming HTTP request from the client. If the first token is not GET, treat the connection as an error and close all of the connections. You will need to extract the destination web server address from the URL and peel off proxy headers. Explain how the client request to a proxy is different from a client request to a web server. How does the monitor use the proxy protocol when it forwards the client's request?
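Extracting the destination from the URL in the request line can be sketched with java.net.URI. This is a minimal illustration under stated assumptions: ProxyRequestSketch, hostOf, portOf and originForm are hypothetical names, port 80 is assumed when the URL names no port (as in Part IV), and the handling of the remaining header lines is not shown.

```java
import java.net.URI;

// Hypothetical demo class showing one way to pick apart a proxy-style request URL.
class ProxyRequestSketch {

    // Host name of the destination web server named in the URL.
    static String hostOf(String absoluteUrl) {
        return URI.create(absoluteUrl).getHost();
    }

    // Port named in the URL, or 80 (an assumption of this sketch) if none is given.
    static int portOf(String absoluteUrl) {
        int port = URI.create(absoluteUrl).getPort();
        return port == -1 ? 80 : port;
    }

    // Builds a request line that carries only the path (and query) part of the URL.
    static String originForm(String command, String absoluteUrl, String version) {
        URI uri = URI.create(absoluteUrl);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return command + " " + path + " " + version;
    }

    public static void main(String[] args) {
        String url = "http://vip.cs.utsa.edu/classes/cs5523s2003/home.html";
        System.out.println(hostOf(url) + ":" + portOf(url));
        System.out.println(originForm("GET", url, "HTTP/1.0"));
    }
}
```

URI.create throws an unchecked IllegalArgumentException on a malformed URL, so a monitor built this way would still need to catch that and treat the connection as an error.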

Part VI: A Proxy with Logging

Modify the proxy monitor developed in Part V so that it performs logging. The program takes two command line arguments: the port number on which it listens and the level of logging that it should perform. Logging levels are specified by the strings none (no logging, as in Part V), headers (all headers are dumped to a log file) and all (headers are dumped to a log file and each returned resource is saved as a file). Each header should be prefaced by the client name and the server name. Develop a naming convention for your resource files that includes the server name and the path name on the server encoded in the name. Explain your naming convention in your report.
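The three logging levels map naturally onto a Java enum. The sketch below is illustrative only: LoggingSketch, Level, parseLevel and resourceFileName are hypothetical names, and the file naming shown (host name followed by the path with each slash replaced by an underscore) is just one possible convention, not a required one. It cannot tell a slash from an underscore that was already in the path, which is the kind of limitation your report should discuss.

```java
import java.util.Locale;

// Hypothetical demo class for the logging level argument and a sample naming convention.
class LoggingSketch {

    enum Level { NONE, HEADERS, ALL }

    // Maps the command line strings none, headers and all to enum constants.
    // Throws IllegalArgumentException for any other string.
    static Level parseLevel(String arg) {
        return Level.valueOf(arg.toUpperCase(Locale.ROOT));
    }

    // One possible (lossy) convention: server name, then the path with '/' turned into '_'.
    static String resourceFileName(String host, String path) {
        return host + path.replace('/', '_');
    }

    public static void main(String[] args) {
        System.out.println(parseLevel("headers"));
        System.out.println(
            resourceFileName("vip.cs.utsa.edu", "/classes/cs5523s2003/home.html"));
    }
}
```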

Last Revision: January 25, 2003, at 5:15 pm by Kay A. Robbins. This material may be used for educational purposes provided that the source is credited.