Overview: In this laboratory you will develop a proxy server that monitors the traffic generated by the browsers that use it. The web monitor is a server that listens on a port for TCP/IP socket requests from clients (web browsers). Upon receiving a client (browser) request, the monitor creates another TCP/IP connection to the actual destination web server and transmits the client's request. The monitor forwards the web server's response back to the client. The monitor should also support an option to dump all of the traffic to a file.
Parts I, II and III are independent of each other and provide the groundwork and skills needed for the remaining three parts. Part IV develops a tunnel in which the web server name is passed as a command line argument to the monitor and all client requests are forwarded to the specified server. In Part V, the monitor acts as a proxy server and can handle client requests to arbitrary servers. Part VI adds logging. You will be assigned 10 unique port numbers to use during testing. (See this link.) This laboratory must be done in Java.
Requirements:
Part I: Parsing the Initial Request Line
Develop and test a class called InitialLine that has the following
public methods:
public InitialLine(String line);
public boolean isValid( ); // Returns true if line is a valid header line
public String getArgument( ); // Returns the argument or query
public String getCommand( ); // Returns the command
public String getHost( ); // Returns the server part of line
public String getLine( ); // Returns line
public String getPath( ); // Returns the path
public int getPort( ); // Returns the port
public String getProtocol( ); // Returns the protocol
public String getVersion( ); // Returns the protocol version
For example, if the line is:
GET http://vip.cs.utsa.edu:8080/classes/cs5523s2003/index.html HTTP/1.0
The command is GET, the host is vip.cs.utsa.edu,
the path is /classes/cs5523s2003/index.html, the port
is 8080, the protocol is http and the version is
HTTP/1.0.
The initial line is valid if it contains exactly three components. The first
component is either GET or POST. The second
component is a valid URL and the third component is HTTP/1.0
or HTTP/1.1.
Thoroughly test this class using a standalone test program. Include the
test program and test results as part of your hand in for Part I.
You may find the Java URL and StringTokenizer
classes useful here.
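The validity rules and the example above can be sketched as follows. This is only a minimal illustration, not the required InitialLine class: the class name InitialLineSketch, the parse method and the array layout of its result are assumptions made for this example, and falling back to the protocol's default port when the URL names none is one possible choice.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.StringTokenizer;

// Hypothetical helper for illustration; not the required InitialLine class.
class InitialLineSketch {

    // Returns {command, protocol, host, port, path, version}, or null when
    // the line does not satisfy the validity rules stated in Part I.
    static String[] parse(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        if (tokens.countTokens() != 3) {
            return null; // exactly three components are required
        }
        String command = tokens.nextToken();
        String target = tokens.nextToken();
        String version = tokens.nextToken();
        if (!command.equals("GET") && !command.equals("POST")) {
            return null;
        }
        if (!version.equals("HTTP/1.0") && !version.equals("HTTP/1.1")) {
            return null;
        }
        try {
            URL url = new URL(target);
            // getPort() is -1 when the URL names no port; this sketch then
            // falls back to the protocol default (an assumption).
            int port = (url.getPort() != -1) ? url.getPort() : url.getDefaultPort();
            return new String[] {
                command, url.getProtocol(), url.getHost(),
                String.valueOf(port), url.getPath(), version
            };
        } catch (MalformedURLException e) {
            return null; // second component is not a valid URL
        }
    }

    public static void main(String[] args) {
        String[] parts = parse(
            "GET http://vip.cs.utsa.edu:8080/classes/cs5523s2003/index.html HTTP/1.0");
        System.out.println(String.join(" | ", parts));
    }
}
```

The query part of a URL, which getArgument would return, is available from URL.getQuery() and is left out here to keep the sketch short.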
Part II: Getting an HTTP Header
Develop and test a class called HTTPHeader that has the following
public methods:
public HTTPHeader(DataInputStream in) throws IOException;
public HTTPHeader(DataInputStream in, int linelengthlimit) throws IOException;
public String getHeaderLine(int n); // Returns the n-th header line
public int getNumberLines( ); // Returns the number of header lines
public void print(PrintStream out); // Output the header in readable form
The class should read the HTTP header from in and
store it.
The first form of the constructor does not put any restriction on the
length of the header lines, while the second form throws an exception
if it encounters lines longer than linelengthlimit.
Detect the end of the header by an empty line.
Make sure that you do not read past the end of the header on in.
Do not use any buffered I/O classes in your implementation.
Include the initial line as part of the header.
Thoroughly test this class using a standalone test program. Include the
test program and test results as part of your hand in for Part II.
You may find the Java Vector class useful here.
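One way to avoid reading past the end of the header without buffered I/O is to consume the stream one byte at a time and stop right after the empty line. The sketch below illustrates that idea only. The class name HeaderReadSketch and its two static methods are hypothetical, and treating a limit of 0 or less as "no limit" is an assumption of this example rather than part of the assignment.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Vector;

// Hypothetical helper for illustration; not the required HTTPHeader class.
class HeaderReadSketch {

    // Reads one line a byte at a time and stops immediately after '\n',
    // so nothing beyond the line is consumed. A limit <= 0 means no limit
    // (an assumption of this sketch).
    static String readLine(DataInputStream in, int limit) throws IOException {
        StringBuilder line = new StringBuilder();
        while (true) {
            int b = in.read();
            if (b == -1) {
                throw new EOFException("stream ended inside the header");
            }
            if (b == '\n') {
                return line.toString();
            }
            if (b != '\r') {
                line.append((char) b);
            }
            if (limit > 0 && line.length() > limit) {
                throw new IOException("header line longer than " + limit);
            }
        }
    }

    // Collects lines, including the initial line, until the empty line.
    static Vector<String> readHeader(DataInputStream in, int limit) throws IOException {
        Vector<String> lines = new Vector<>();
        String line;
        while (!(line = readLine(in, limit)).isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "GET / HTTP/1.0\r\nAccept: */*\r\n\r\nBODY".getBytes("US-ASCII");
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        for (String headerLine : readHeader(in, 0)) {
            System.out.println(headerLine);
        }
        // The body bytes are still unread at this point.
        System.out.println("unread bytes: " + in.available());
    }
}
```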
Part III: Copy Thread
Develop and test a class called CopyFromTo that extends
Thread and has the following public methods:
public CopyFromTo (DataInputStream in, DataOutputStream out);
public int getNumberBytes( );
public void run( );
The run method of a
CopyFromTo thread copies all of the bytes from in
to out until the end of file or until an error. It then closes
both in and out. The getNumberBytes method
returns the number of bytes successfully read and written so far.
Thoroughly test this class using a standalone test program. You should
create multiple CopyFromTo threads to copy from one file
to another. Include the
test program and test results as part of your hand in for Part III.
Remember that you have to call the start method to start
the thread running after it is created. You might find the
synchronized attribute useful for getNumberBytes.
Why?
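A possible shape for the copy loop and its byte counter is sketched below. It is an illustration under assumptions, not the required CopyFromTo class: the name CopySketch, the 4096-byte array and the private add helper are choices made for this example. The counter is written by the copying thread and may be read by another thread, which is the situation the synchronized attribute is meant for.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical class for illustration; not the required CopyFromTo class.
class CopySketch extends Thread {
    private final DataInputStream in;
    private final DataOutputStream out;
    private int count = 0; // bytes read and written so far

    CopySketch(DataInputStream in, DataOutputStream out) {
        this.in = in;
        this.out = out;
    }

    // Both accessors are synchronized because the counter is updated by
    // this thread while other threads may call getNumberBytes.
    public synchronized int getNumberBytes() {
        return count;
    }

    private synchronized void add(int n) {
        count += n;
    }

    @Override
    public void run() {
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                out.flush();
                add(n); // counted only after a successful write
            }
        } catch (IOException e) {
            // An error ends the copy, as the specification allows.
        } finally {
            try { in.close(); } catch (IOException ignored) { }
            try { out.close(); } catch (IOException ignored) { }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CopySketch copier = new CopySketch(
            new DataInputStream(new ByteArrayInputStream(new byte[10000])),
            new DataOutputStream(sink));
        copier.start(); // start(), not run(), so the copy runs in its own thread
        copier.join();
        System.out.println("copied " + copier.getNumberBytes() + " bytes");
    }
}
```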
Part IV: Simple Pass Through (Tunnel)
Write a tunnel program
that takes two command line arguments: a port number p and
a destination web server name. The pass-through monitor
listens on port p for a TCP/IP socket connection request.
When a client connection has been made, the monitor makes a connection
to port 80 of the destination web server.
The monitor then creates two CopyFromTo threads,
one that forwards
all messages received from the client to the specified destination web server
(that was passed to the tunnel as a command line argument) and the
other to handle communication from the server to the client.
If either the client or web server closes a connection, the
tunnel closes its connections to both the client and server. The tunnel then
writes status
information to standard error indicating the name of the client
and the time it took
from when the tunnel received the first information from the client to
when the connection was closed.
Test your program first by using a modified TCPClient and
TCPServer and then by accessing the tunnel
through your web browser. Suppose you
want to test your program
by retrieving
http://vip.cs.utsa.edu/classes/cs5523s2003/home.html,
and your port number is 10355.
Start your tunnel program on machine X with:
java tunnel 10355 vip.cs.utsa.edu
In your web browser, access this URL with:
http://X:10355/classes/cs5523s2003/home.html
Test your tunnel with a variety of destination web servers and web pages. Why does the tunnel not have to parse the client's request in order to forward it to the web server?
Notes: CDK Figures 4.5 and 4.6 have example programs for doing network communication in Java. You should start by getting these programs to run.
You will not receive full credit if your implementation assumes that the client will first send all of its information and that the server will then respond. A tunnel should make no assumptions about the ordering of incoming information. A typical way of handling this is to have two threads in the tunnel, one for each direction.
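The two-thread arrangement can be sketched as below. This is a simplified illustration, not a complete tunnel. The class name TunnelSketch and the pump method are hypothetical, the sketch serves one client at a time, and it measures elapsed time from the moment the connection is accepted rather than from the first data received, which differs from the requirement above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical class for illustration; a real solution would use CopyFromTo.
class TunnelSketch {

    // Starts a thread that copies bytes from one socket to the other.
    // When the copy ends for any reason, both sockets are closed, which
    // also unblocks the thread handling the opposite direction.
    static Thread pump(Socket from, Socket to) {
        Thread t = new Thread(() -> {
            byte[] chunk = new byte[4096];
            try {
                InputStream in = from.getInputStream();
                OutputStream out = to.getOutputStream();
                int n;
                while ((n = in.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                    out.flush();
                }
            } catch (IOException e) {
                // A closed or failed socket simply ends this direction.
            } finally {
                try { from.close(); } catch (IOException ignored) { }
                try { to.close(); } catch (IOException ignored) { }
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        int port = Integer.parseInt(args[0]);
        String webServer = args[1];
        try (ServerSocket listener = new ServerSocket(port)) {
            while (true) {
                Socket client = listener.accept();
                long start = System.currentTimeMillis(); // simplification: time of accept
                Socket server = new Socket(webServer, 80);
                Thread clientToServer = pump(client, server);
                Thread serverToClient = pump(server, client);
                clientToServer.join();
                serverToClient.join();
                System.err.println(client.getInetAddress().getHostName() + ": "
                    + (System.currentTimeMillis() - start) + " ms");
            }
        }
    }
}
```

Because each direction has its own thread, neither side has to finish sending before the other starts.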
Part V: A Proxy Monitor
Modify the tunnel program developed in Part IV so that it acts as a proxy and uses
HTTP redirect to make the connection between the client and the destination
server. You will no longer need to pass the destination web server as a command
line argument to the monitor. Instead, you will need to parse the incoming
HTTP from the client. If the first token is not GET, close all of the
connections and treat the connection as an error. You will need to extract
the destination web server address from the URL and peel off proxy headers.
Explain how the client request to a proxy is different from a client request
to a web server. How does the monitor use the proxy protocol when it forwards
the client's request?
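As a starting point for the questions above: a browser configured to use a proxy sends the full URL in the initial line, while a request sent directly to a web server carries only the path. The sketch below shows that one rewriting step and a simple test for proxy-specific header lines. The class name ProxyRewriteSketch and both method names are hypothetical, and treating every header whose name starts with "Proxy-" as a proxy header is an assumption of this example.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration; not a complete proxy.
class ProxyRewriteSketch {

    // Turns "GET http://host/path HTTP/1.0" into "GET /path HTTP/1.0".
    static String toOriginForm(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected three components");
        }
        try {
            URL url = new URL(parts[1]);
            String file = url.getFile(); // path plus "?query" when present
            if (file.isEmpty()) {
                file = "/";
            }
            return parts[0] + " " + file + " " + parts[2];
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not an absolute URL: " + parts[1], e);
        }
    }

    // True for header lines such as "Proxy-Connection: Keep-Alive".
    static boolean isProxyHeader(String headerLine) {
        return headerLine.regionMatches(true, 0, "Proxy-", 0, 6);
    }

    public static void main(String[] args) {
        System.out.println(toOriginForm(
            "GET http://vip.cs.utsa.edu/classes/cs5523s2003/home.html HTTP/1.0"));
        System.out.println(isProxyHeader("Proxy-Connection: Keep-Alive"));
    }
}
```

The host and port needed to open the connection to the destination server come from the same URL, for example through the InitialLine class of Part I.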
Part VI: A Proxy with Logging
Modify the proxy monitor developed in Part V so that it performs logging.
The program takes two command line arguments --- the port number on which
it listens and the level of logging that it should perform. Logging levels
are specified by the strings none (no logging, as in Part V),
headers (all headers are dumped to a log file)
and all (headers are dumped to a log file and each returned
resource is saved as a file). Each header should be prefaced
by the client name and the server name. Develop a naming
convention for your resource files that includes the server name and
the path name on the server encoded in the name. Explain your naming
convention in your report.
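As one example of what such a convention could look like, the sketch below joins the server name and the path and replaces every character that is awkward in a file name with an underscore. The class name ResourceNameSketch, the method fileNameFor and the convention itself are assumptions made for illustration. Note that this mapping is lossy: two different paths can produce the same file name, so a real convention may need an escape scheme instead.

```java
// Hypothetical naming convention for illustration only.
class ResourceNameSketch {

    // Keeps letters, digits, '.' and '-'; everything else becomes '_'.
    static String fileNameFor(String host, String path) {
        return (host + path).replaceAll("[^A-Za-z0-9.-]", "_");
    }

    public static void main(String[] args) {
        System.out.println(
            fileNameFor("vip.cs.utsa.edu", "/classes/cs5523s2003/home.html"));
    }
}
```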
References for HTTP 1.0:
http://www.jmarshall.com/easy/http/
http://www.w3.org/Protocols