The goal of this assignment is to build a threaded HTTP server that behaves as a client for requests it receives, proxying those requests to the proper sites, and performing some simple filtering or other tasks.
In all problems you should make an effort to handle errors in a reasonable way; your programs will be tested with invalid inputs and incorrect usage. Give the user a meaningful message if they misuse the program or an error occurs.
The sample input/output shown in the problem statements is not an exhaustive list of tests; you should invent and try others.
Your code should have some comments. They should indicate who wrote the program, what its purpose is, and how to run it, and they should explain what any important or potentially confusing code is doing.
Assignment Due: Wed 1 Dec
To submit: submit cs491n http <list of files>
The <list of files> should include all your source files as well as any build instructions or a Makefile needed to construct the code. A README file with comments about building and testing would be helpful as well. Document any unusual behavior and be sure to clarify what your part 2 is doing and what switch/configuration is required to activate it. If a config file is needed, provide a sample along with an explanation of what it does.
You turn in only one project. It should behave as described in part 1 by default, and enable your part 2 features only when the corresponding commandline switches are given.
Write an HTTP server program in C, C++ or Java that uses TCP sockets and threading to handle multiple clients. It should turn incoming requests around, sending them out to the appropriate sites and returning the results to the client, thereby acting as a proxy for the requests. By default, it should let all requests pass through transparently, simply forwarding all data in both directions.
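A browser that is configured to use a proxy puts the full URL in the request line (for example, GET http://something.com/ HTTP/1.1), so the proxy has to pull the host and port out of that URL before it can connect onward. The following is only a minimal sketch of that step. The function name split_http_url and its buffer-based interface are assumptions made for illustration and are not part of the assignment.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "http://host[:port][/path]" into its parts.
 * Returns 0 on success, or -1 if the URL is not an http:// URL, the port
 * is invalid, or a part does not fit in the caller's buffer.
 * The port defaults to 80 and the path to "/" when the URL omits them. */
int split_http_url(const char *url, char *host, size_t host_len,
                   int *port, char *path, size_t path_len)
{
    const char *prefix = "http://";
    size_t prefix_len = strlen(prefix);

    if (strncmp(url, prefix, prefix_len) != 0)
        return -1;

    const char *h = url + prefix_len;     /* start of the host name */
    size_t hlen = strcspn(h, ":/");       /* host ends at ':', '/' or end */
    if (hlen == 0 || hlen >= host_len)
        return -1;
    memcpy(host, h, hlen);
    host[hlen] = '\0';

    const char *rest = h + hlen;
    *port = 80;
    if (*rest == ':') {
        char *end;
        if (!isdigit((unsigned char)rest[1]))
            return -1;
        long p = strtol(rest + 1, &end, 10);
        if (p < 1 || p > 65535)
            return -1;
        *port = (int)p;
        rest = end;
    }

    /* Whatever follows must be the path, or nothing at all. */
    if (*rest != '\0' && *rest != '/')
        return -1;
    const char *p_start = (*rest == '/') ? rest : "/";
    if (strlen(p_start) >= path_len)
        return -1;
    strcpy(path, p_start);
    return 0;
}
```

With the host and port in hand, the proxy can open its own TCP connection to the remote site and forward the request. A return value of -1 is a natural place to send the client an error response rather than dropping the connection silently.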
For threading in C or C++ you are expected to use POSIX Threads as discussed in class. If using Java, use the Java threading system. All new connections to the proxy should be handled by a thread. Depending on how a given browser implements proxy connections, this may mean very many or very few threads at any one time.
Your proxy should accept an optional commandline switch: -p <port>. By default, your proxy should listen on port 8003 unless told otherwise.
Different browsers hide the proxy setup in different places in the GUI. In one version of Mozilla, clicking on Edit->Preferences brings up a window with many configuration options. In that dialog, Advanced->Proxies yields a panel with the option of setting an HTTP Proxy. Entering "127.0.0.1" as the host and 8003 as the port causes all HTTP requests from the browser to go to that network location. In addition, the version of Mozilla tested did odd things, such as streaming many sequential requests over a single connection to the proxy.
Read up on the HTTP 1.1 spec and the issues involved in keepalive.
It might be helpful to have two different browsers available: one that you set to talk to your proxy, and another that you use for reading online documentation while working. Remember that when your proxy isn't running, the browser configured to use it can't load anything. Also remember browser caching; be sure the browser is really reloading pages when you think it is (in many browsers, shift-clicking Reload forces a reload).
Step one will be proving you can write a server that responds to HTTP requests. Have it accept connections and send back the same thing for every request, such as:
HTTP/1.1 200 OK
Content-type: text/html
# blank line between header and body required
<h1>Boo!</h1>
Start working on the threading issue at this early stage.
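As a starting point for step one, here is a minimal sketch of a server that hands every accepted connection to its own POSIX thread and always sends back the same page. The names serve_canned, spawn_handler and run_server are assumptions for illustration. The sketch uses CRLF line endings because that is what the HTTP specification calls for, and it reads only once from the client, which is a simplification.

```c
#include <errno.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static const char CANNED_RESPONSE[] =
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"                        /* blank line between header and body */
    "<h1>Boo!</h1>\n";

/* Keep calling write() until every byte has been sent. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Thread entry point: read (and ignore) a request, reply, close. */
void *serve_canned(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char request[4096];

    if (read(fd, request, sizeof request) > 0)
        write_all(fd, CANNED_RESPONSE, sizeof CANNED_RESPONSE - 1);
    close(fd);                    /* closing the socket ends the response */
    return NULL;
}

/* Start one detached thread for one client connection. */
int spawn_handler(int client_fd)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, serve_canned,
                       (void *)(intptr_t)client_fd) != 0) {
        close(client_fd);
        return -1;
    }
    pthread_detach(tid);          /* nobody will join this thread */
    return 0;
}

/* Accept loop; returns only if setup or accept() fails. */
int run_server(int port)
{
    int yes = 1;
    struct sockaddr_in addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0) {
        perror("socket");
        return -1;
    }
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons((uint16_t)port);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 16) < 0) {
        perror("bind/listen");
        close(lfd);
        return -1;
    }
    for (;;) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0) {
            if (errno == EINTR)
                continue;
            perror("accept");
            break;
        }
        if (spawn_handler(cfd) != 0)
            fprintf(stderr, "could not start a thread for a client\n");
    }
    close(lfd);
    return -1;
}
```

Calling run_server(8003) from main() and pointing a browser at the proxy should show the "Boo!" page for any URL. The thread body is the piece that later grows into the real proxy logic.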
As mentioned a few times, not every browser behaves the same way. Be sure to note which one you're using and what sort of behavior you discover it exhibiting in a README.
Q: Browser behavior weirdness.
A: Some browsers may chain many sequential requests together
over a single connection to the proxy. Closing those sockets
early, after responding to the first request, may convince the
browser to use different connections to the proxy for every
request.
Q: Testing help?
A: Try connecting to your proxy manually with telnet or
nc:
$ telnet 127.0.0.1 <yourport>
# blah blah blah
GET http://something.com/ # type in request manually
Q: What are some valid response codes for a server to
give a client?
A: There are quite a few you might give the client
in different error conditions.
Here is a list (not that you should need most of them).
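To make the answer about response codes concrete, the sketch below maps a handful of standard HTTP status codes to their reason phrases and formats a small error page. Which codes your proxy actually needs is up to you. The function names reason_phrase and format_error_response are assumptions for illustration, and the sketch does not HTML-escape the detail text.

```c
#include <stdio.h>
#include <string.h>

/* A few standard HTTP status codes that a proxy is likely to send. */
const char *reason_phrase(int code)
{
    switch (code) {
    case 200: return "OK";
    case 400: return "Bad Request";
    case 403: return "Forbidden";
    case 404: return "Not Found";
    case 500: return "Internal Server Error";
    case 501: return "Not Implemented";
    case 502: return "Bad Gateway";
    case 504: return "Gateway Timeout";
    default:  return "Unknown";
    }
}

/* Hypothetical helper: write a complete error response into out.
 * Returns the number of bytes written (not counting the terminating
 * NUL), or -1 if the response does not fit.
 * Note: detail is inserted as-is and is not HTML-escaped. */
int format_error_response(char *out, size_t out_len, int code,
                          const char *detail)
{
    char body[256];
    int n;

    n = snprintf(body, sizeof body, "<h1>%d %s</h1>\n<p>%s</p>\n",
                 code, reason_phrase(code), detail);
    if (n < 0 || (size_t)n >= sizeof body)
        return -1;

    n = snprintf(out, out_len,
                 "HTTP/1.1 %d %s\r\n"
                 "Content-Type: text/html\r\n"
                 "Content-Length: %zu\r\n"
                 "Connection: close\r\n"
                 "\r\n"
                 "%s",
                 code, reason_phrase(code), strlen(body), body);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return n;
}
```

For example, a proxy that cannot connect to the remote site might send the result of format_error_response(buf, sizeof buf, 502, "could not reach the remote site") to the client before closing the connection.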
As a second part of the project, upgrade the proxy to enable it to perform an "interesting" task with the data passing through it. By default, it should continue to run as a forwarding proxy, taking no actions unless the user who starts the proxy requests them with commandline options.
Choose something for your proxy to do, such as (but not limited to):
The function should be useful, and require the proxy to do some thinking when requests and responses are passing through it. It should, in some way, alter the content of the material passing through. It should be user-configurable, probably through a config file or additional arguments.
Your proxy should provide messages to the screen or a logfile about what it is doing, such as: blocked access to X, filtered out Y, etc. You'll certainly want something like this for debugging, so clean it up and leave it in.
Provide an additional commandline switch to enable each feature; every feature should be disabled by default.
Different functions require different levels of effort. Some will require simply editing some header information; others may require parsing and modifying the HTML to make them work.
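Because several threads may want to report what they are doing at the same time, it can help to funnel all messages through one small function. The following is a minimal sketch under that assumption. The name proxy_log, the timestamp format and the use of a single mutex are illustrative choices, not requirements.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: write one timestamped line to out (stderr or an
 * open logfile). The mutex keeps lines from different threads from
 * being interleaved. */
void proxy_log(FILE *out, const char *fmt, ...)
{
    char stamp[32];
    time_t now = time(NULL);
    struct tm tm_buf;
    va_list ap;

    /* localtime_r is the thread-safe variant of localtime. */
    if (localtime_r(&now, &tm_buf) == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &tm_buf) == 0)
        strcpy(stamp, "unknown time");

    va_start(ap, fmt);
    pthread_mutex_lock(&log_lock);
    fprintf(out, "[%s] ", stamp);
    vfprintf(out, fmt, ap);
    fputc('\n', out);
    fflush(out);                  /* make the line visible right away */
    pthread_mutex_unlock(&log_lock);
    va_end(ap);
}
```

A handler thread could then call, for instance, proxy_log(stderr, "blocked access to %s", host) whenever a part 2 feature takes an action.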
You might check with Greg (a quick email is fine) to see if your idea for part 2 is appropriate (he might be able to give you some hints and suggest if it is too easy or too hard).