The World-Wide Web is a metaphorical description for the sophisticated interactions among computers. The core technology that creates this phenomenon is the Internetworking Protocol suite, sometimes called The Internet. Fundamentally, the internetworking protocols define a relationship between pieces of software called the client-server model. In this case some programs (like browsers) are clients. Other programs (like web servers, databases, etc.) are servers.
This client-server model of programming is very powerful and adaptable. It is powerful because it makes giant, centralized servers available to large numbers of remote, widely distributed users. It is adaptable because we don’t need to send software to everyone’s computer to make a change to the centralized service.
Essentially, every client-server application involves a client application program, a server application, and a protocol for communication between the two processes. In most cases, these protocols are part of the popular and enduring suite of internetworking protocols based on TCP/IP. For more information on TCP/IP, see Internetworking with TCP/IP [Comer95].
We’ll digress into the fundamentals of TCP/IP in About TCP/IP. We’ll look at what’s involved in a web server in The World Wide Web and the HTTP protocol. We’ll look briefly at web services in Web Services. We’ll look at slightly lower-level protocols in Writing Web Clients: The urllib2 Module. Finally, we’ll show how you can use low-level sockets in Socket Programming. Generally, you can almost always leverage an existing protocol, but it’s still relatively simple to invent your own.
The essence of TCP/IP is a multi-layered view of the world. This view separates the mechanics of operating a simple Local Area Network (LAN) from the interconnection between networks, called internetworking.
Hardware. The lowest level of network services is provided by mechanisms like Ethernet (see the IEEE 802.3 standards), which covers wiring between computers. The Ethernet standards include alternatives like 10BaseT (for twisted pairs of thin wires) and 10Base2 (for thicker coaxial cabling). Network services may also be wireless, using the IEEE 802.11 standards. In all cases, though, these network services provide for simple naming of devices and moving bits from device to device.
What makes these “low level” is that these services are limited by having to know the hardware name of the receiving device; usually called the MAC address. When you buy a new network card for your computer, you – effectively – change your computer’s hardware name.
The TCP/IP standards put several layers of control on top of these data passing mechanisms. While these additional layers allow interconnection between networks, they also provide a standard library for using all of the various kinds of network hardware that is available.
Internetworking Protocol. First, the Internet Protocol (IP) standard specifies addresses that are independent of the underlying hardware. The IP also breaks messages into packets and reassembles the packets in order to be independent of any network limitations on transmission lengths.
Additionally, the IP standard specifies how to route packets among networks, allowing packets to pass over bridges and routers between networks. This is the fundamental reason why internetworking was created in the first place.
Finally, IP provides a formal Network Interface Layer to divorce IP and all higher level protocols from the mechanics of the actual network. This allows for independent evolution of the application software (like the World Wide Web) and the various network alternatives (wired, wireless, broadband, dial-up, etc.).
Transmission Control Protocol. The Transmission Control Protocol (TCP) relies on IP. It provides a reliable stream of bytes from one application process to another. It does this by breaking the data into packets and using IP to route those packets from source to receiver. It also uses IP to send status information and resend lost or corrupted packets. TCP keeps complete control so that the bytes that are sent are received exactly once and in the correct order.
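The byte-stream behavior can be seen with a pair of sockets on one machine. This is a minimal sketch, assuming Python 3 and a loopback connection; the `echo_once` function and the message text are made up for the illustration.

```python
import socket
import threading

def echo_once(listener):
    """Accept one connection and send back whatever bytes arrive."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            chunk = conn.recv(1024)
            if not chunk:          # empty read: the client finished sending
                break
            conn.sendall(chunk)

# Port 0 asks the operating system for any free port on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server_thread = threading.Thread(target=echo_once, args=(listener,))
server_thread.start()

client = socket.create_connection(listener.getsockname())
client.sendall(b"first ")
client.sendall(b"second")
client.shutdown(socket.SHUT_WR)    # tell the server no more bytes are coming

received = b""
while True:
    chunk = client.recv(1024)
    if not chunk:                  # the server closed its side
        break
    received += chunk
client.close()
server_thread.join()
listener.close()
```

The two `sendall()` calls may travel as one packet or several; the client still reads back a single ordered stream of bytes.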
Many applications, in turn, depend on the TCP/IP protocol capabilities. The HyperText Transfer Protocol (HTTP), used to view a web page, works by creating a TCP/IP connection (called a socket) between browser and web server. A request is sent from browser to web server. The web server responds to the browser request. When the web page content is complete, the socket is closed and the socket connection can be discarded.
Python Modules. Python provides a number of complete client protocols that are built on TCP/IP in the following modules: urllib, httplib, ftplib, gopherlib, poplib, imaplib, nntplib, smtplib, telnetlib. Each of these exploits one or more protocols in the TCP/IP family, including HTTP, FTP, GOPHER, POP, IMAP, NNTP, SMTP and Telnet. The urllib and urllib2 modules make use of multiple protocols, including HTTP and FTP, which are commonly provided by web servers.
We’ll look into the details of just one of these higher-level protocols built on TCP/IP. We’ll look at HTTP and how this serves web pages for people. We’ll look at using this to create a web service, also.
Protocols like SMTP, POP and IMAP are used to route and read email. One can argue that SMTP is perhaps the most used protocol ever invented, since every email on the internet is pushed around by SMTP.
One of the most widely-used protocols built on top of TCP/IP is probably HTTP. It is the backbone of the World Wide Web. The HTTP protocol defines two parties: the client (or browser) and the server. The browser is generally some piece of software like Firefox, Opera or Safari. The web server is usually based on the Apache web server, but there are several others in common use.
The HyperText Transfer Protocol (HTTP) specifies a request and a reply. Our client (usually a browser) sends a request. The web server sends us a reply. [And yes, the World Wide Web is that simple. The sophistication comes from all the clever things that browsers and servers do with this simple protocol.]
Requests. An HTTP request includes a number of pieces of information. A few of these pieces of information are of particular interest to a web application.
| operation: | The operation (or method) is generally GET or POST. There are other commands specified in the protocol (like PUT or DELETE), but they aren’t provided by browsers. The operation isn’t visible to the person using the browser. Generally, any URL you enter into a browser is accessed with a GET method. When you fill in a form and click a button, then the form is often sent as a POST request. |
|---|---|
| url: | The URL locates the resource. It includes a scheme, a host, a path, and other optional information like a query. When we browse http://homepage.mac.com/s_lott, the http: is the scheme (or protocol) being used, homepage.mac.com is the host, and /s_lott is the path. |
| headers: | There are a number of headers which are included in the request; these describe the browser, and what the browser is capable of. The headers summarize some of the browser’s preferences, like the language and locale. They also describe any additional data that is attached to the request. The “content-length” header, in particular, tells you that form input or a file upload is attached. |
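The pieces of a URL can be pulled apart with the standard library. This is a small sketch assuming Python 3, where the Python 2 urlparse module became urllib.parse; the `?lang=en` query is invented here to show where a query lands.

```python
from urllib.parse import urlsplit   # Python 2: from urlparse import urlsplit

# The query string is a made-up addition to the URL used in the text.
parts = urlsplit("http://homepage.mac.com/s_lott?lang=en")

scheme = parts.scheme    # the protocol being used
host = parts.netloc      # the server to contact
path = parts.path        # the resource on that server
query = parts.query      # optional extra information after the "?"
```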
Reply. An HTTP reply includes a number of pieces of information. Its Content-Type header carries a MIME-type string that tells the browser what kind of document will follow. This string is often text/html or text/plain.
The reply also includes the status code and a number of headers. Often the headers are version information that the browser can reveal via the Page Info menu item in the browser. Finally, the reply includes the actual document, either plain text, HTML or an image.
There are a number of HTTP status codes. Generally, a successful request has a status code of 200, indicating that the request is complete, and the page is being sent.
The 30x status codes indicate that the page was moved; the "Location" header provides the URL to which the browser will redirect.
The 40x status codes indicate problems with the request. Generally, the resource was not found.
The 50x status codes indicate problems on the server while it was handling the request.
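A client program often needs only the family a status code belongs to. The sketch below is one hypothetical way to do that grouping; the `status_class` name and the labels it returns are not part of any library.

```python
def status_class(code):
    """Return a rough description of an HTTP status code's family."""
    families = {
        2: "success",        # 200 and friends
        3: "redirect",       # 30x: look at the Location header
        4: "client error",   # 40x: something is wrong with the request
        5: "server error",   # 50x: the server failed
    }
    # Integer division by 100 keeps only the leading digit.
    return families.get(code // 100, "other")
```

For instance, `status_class(404)` falls in the "client error" family and `status_class(302)` in the "redirect" family.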
Since the World Wide Web is a client-server protocol, we can create clients or servers (or both). Generally, the clients are web browsers.
There are, however, numerous applications where we want to get data from a server on the web, but we don’t want to use a browser. We might have a daily extract of data, or an hourly summary of Twitter postings.
These can be done by writing a web client. Fundamentally, a web client engages in an HTTP request and processes the reply that comes from the web server.
When the response is in a structured markup language (like HTML or XML), then we’ll need to parse this resulting file format. We looked at XML parsing in XML Files: The xml.etree and xml.sax Modules. HTML parsing is similar.
Resources. A central piece of the design for the World-Wide Web is the concept of a Uniform Resource Locator (URL) and Uniform Resource Identifier (URI). A URL provides several pieces of information for getting at a piece of data located somewhere on the internet. A URL has several data elements. Here’s an example URL: http://www.python.org/download/
It turns out that we have a choice of several schemes for accessing data, making it very pleasant to use URL’s. The protocols include
HTTP Interaction. A great deal of information on the World Wide Web is available using simple URI’s. In any well-designed web site, we can simply GET the resource that the URL identifies.
A large number of transactions are available through HTTP requests. Many web pages provide HTML that will be presented to a person using a browser.
In some cases, a web page provides an HTML form to a person. The person may fill in a form and click a button. This executes an HTTP POST transaction. The urllib2 module allows us to write Python programs which, in effect, fill in the blanks on a form and submit that request to a web server.
Also note that some web sites manage interaction with people via cookies. This, too, can be handled with urllib2.
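Filling in a form from a program amounts to encoding the field values and attaching them to a request. This is a sketch assuming Python 3, where the urllib2 pieces live in urllib.request; the URL and the field values are placeholders, and nothing is sent over the network here.

```python
from urllib.parse import urlencode     # Python 2: from urllib import urlencode
from urllib.request import Request     # Python 2: from urllib2 import Request

# Hypothetical form fields; a real form defines its own names.
form = [("fahrenheit", "212"), ("action", "submit")]
body = urlencode(form).encode("ascii")

# Supplying data turns the request into a POST.
# Nothing is transmitted until the request is passed to urlopen().
request = Request("http://localhost:8008/", data=body)
method = request.get_method()
```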
Example. By using URL’s in our programs, we can write software that reads local files as well as it reads remote files. We’ll show just a simple situation where a file of content can be read by our application. In this case, we located a file provided by an HTTP server and an FTP server. We can download this file and read it from our own local computer, also.
As an example, we’ll look at the Collaborative International Dictionary of English, CIDE. Here are three places that these files can be found, each using different protocols. However, using the urllib2 module, we can read and process this file using any protocol and any server.
| FTP: | ftp://aeneas.mit.edu/pub/gnu/dictionary/cide.a This URL describes the aeneas.mit.edu server that has the CIDE files, and will respond to the FTP protocol. |
|---|---|
| HTTP: | http://ftp.gnu.org/gnu/gcide/gcide-0.46/cide.a This URL names the ftp.gnu.org server that has the CIDE files, and responds to the HTTP protocol. |
| FILE: | file:///Users/slott/Documents/dictionary/cide.a This URL names a file on my local computer. Your computer may not have this path or this file. |
urlreader.py
#!/usr/bin/env python
"""Get the "A" section of the GNU CIDE Collaborative International Dictionary of English
"""
import urllib2
#baseURL= "ftp://aeneas.mit.edu/pub/gnu/dictionary/cide.a"
baseURL= "http://ftp.gnu.org/gnu/gcide/gcide-0.46/cide.a"
#baseURL= "file:///Users/slott/Documents/dictionary/cide.a"
# urlopen's second argument is POST data, not a file mode, so only the URL is passed.
dictXML= urllib2.urlopen( baseURL )
print len(dictXML.read())
dictXML.close()
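The file: scheme can be tried without any network access. The following sketch assumes Python 3 (urllib.request in place of urllib2) and uses a temporary stand-in file with made-up content rather than the real dictionary.

```python
import os
import pathlib
import tempfile
from urllib.request import urlopen   # Python 2: from urllib2 import urlopen

# Create a small stand-in file; its content is invented for the example.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"<p>A</p>")
    local_path = tmp.name

# Turn the absolute path into a file: URL and read it like any other URL.
file_url = pathlib.Path(local_path).as_uri()
reader = urlopen(file_url)
content = reader.read()
reader.close()
os.remove(local_path)
```

The reading code is the same whether the URL names an HTTP server, an FTP server or a local file; only the URL string changes.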
A web application is usually embedded in a web server. The point of a web application is to respond to HTTP requests with appropriate replies. The HTTP protocol is fairly simple, making it possible – in principle – to write a complete web server in Python.
In the long run, however, a web server written entirely in Python doesn’t scale well. To provide reasonable levels of service to large numbers of users, there are a great many optimizations that are essential.
One of the most important optimizations relates to the nature of the various downloads from a web server. When we request a page, the initial download in response to the GET request is usually an HTML document. Embedded in the HTML are references to numerous other files, including style sheets, Javascript libraries, images and other media.
The HTML is often built dynamically and requires a sophisticated Python-based application. The rest of the content, however, is more-or-less static, and does not require deep sophistication. The static media needs to be sent as simply as possible.
This dichotomy between small, complex dynamic HTML content and large, simple static content leads us to a two-part design. We want to use Python only for the HTML, and use some other, faster, application for the static content. It works out best if we embed our Python application in a web server like Apache. We can delegate the static content to Apache. We reserve the dynamic HTML creation for our Python web application programs.
We usually use a component called mod_wsgi to extend Apache with Python. The idea is to configure Apache to separate requests for static media content from the requests for the dynamic HTML pages. Apache serves the static content from local files. Apache delegates (via mod_wsgi) some web requests to our Python application.
Privilege. Note that web servers usually listen on port 80. Writing applications that use this port (or any other port numbered below 1024) requires special operating system privileges.
Writing privileged applications is beyond the scope of this book. For that reason, we’ll focus on writing applications which do one of two things.
A web server handles half of the HTTP conversation. We have a number of choices of ways to implement this half of the protocol.
We can write our own from scratch. Python provides us some seed modules from which we can build a working server. In some applications, where the volume is low, this is entirely appropriate.
See the BaseHTTPServer, SimpleHTTPServer and CGIHTTPServer modules for simple web servers. Also see the wsgiref package for a more sophisticated web server.
As noted above, this is relatively inefficient because we’ll be using the vast power of Python to serve a lot of static content files.
Also, it’s difficult to listen for web requests on port 80 using a Python application.
We can plug into the Apache server.
Apache supports a wide variety of Gateway Interface technologies, including CGI and SCGI. Using the Python cgi module, we can create a CGI or SCGI script. This is an inefficient use of system resources because each request starts a complete, fresh Python interpreter.
We can plug into the Apache server with mod_python. This Apache module embeds a Python interpreter directly in Apache. This embedded interpreter then runs your Python programs as part of Apache’s response to HTTP requests. This is very secure and very fast. This is a relatively direct connection with Apache.
One of the most popular (and flexible) connections to Apache is mod_wsgi. We can use the mod_wsgi Apache extension in one of two ways. We can embed Python into Apache, or we can have Python running as a separate daemon process.
Using Python as a separate daemon means that the Apache process is free to serve other web requests while our Python process is doing the complex work of creating the HTML.
Generally, using cgi or mod_wsgi is still rather complex. There are numerous details of parsing requests, handling sessions, identifying users, managing logs, etc., which are common problems with common solutions.
Web Frameworks. Rather than invent all of the supporting technology for a web site, it’s easiest to use a web application framework. If we use a framework, we can focus on the content and presentation of our web site and leave the housekeeping to the folks who write the framework.
A web framework will connect to Apache; it will handle the details of parsing a web request and providing a suitable response. Using a web framework means that we do much, much less programming. Python has dozens of popular, successful web frameworks. You can look at Zope, Pylons, Django and TurboGears for some examples of dozens of ways that the Python community has simplified the construction of web applications.
We can’t easily cover any of the web frameworks in this book. But we can take a quick look at BaseHTTPServer, just to show what’s involved in HTTP.
Fundamentally, a web server is an application that listens for and handles requests sent using the HTTP protocol. The handler is required to formulate a suitable response.
This “listen and handle” loop is implemented by an instance of the BaseHTTPServer.HTTPServer class. We construct the server by providing a handler class. Each HTTP request will lead to creation of an instance of the handler class.
The BaseHTTPServer.HTTPServer class has two methods to provide the overall “main loop” of a web server.
- handle_request()
- This method of a server will handle just one request. It’s handy for debugging. Or, you could create your own “serve forever” loop.
- serve_forever()
- This method of a server will handle requests until the server is stopped forcibly. A forcible stop is usually an external kill signal (or the equivalent in Windows).
An HTTPServer object requires a subclass of BaseHTTPServer.BaseHTTPRequestHandler. The base class does a number of standard operations related to handling web service requests.
Generally, you’ll need to override just a few methods. Since most browsers will only send GET or POST requests, you only need to provide do_GET() and do_POST() methods.
This class has a number of instance variables which characterize the specific request that is currently being handled.
- client_address
- An internet address as used by Python. This is a 2-tuple: (host address, port number).
- command
- The command in the request. This will usually be GET or POST.
- path
- The requested path.
- request_version
- The protocol version string sent by the browser. Generally it will be 'HTTP/1.0' or 'HTTP/1.1'.
- headers
- This is a collection of headers, usually an instance of mimetools.Message. This is a mapping-like class that gives you access to the individual headers in the request. The header "cookie", for instance, will have the cookies being sent back by the browser. You will need to decode the value of the cookie, usually using the Cookie module.
- rfile
- If there is an input stream, this is a file-like object that can read that stream. Do not read this without providing a specific size to read. Generally, you want to get headers['Content-Length'] and read this number of bytes. If you do not specify the number of bytes to read, and there is no supplemental data, your program will wait for data on the underlying socket. Data which will never appear.
- wfile
This is the response socket, which the browser is reading. The response protocol requires that it be used as follows:
- Use self.send_response( number ) or self.send_response( number, text ). Usually you simply send 200.
- Use self.send_header( header, value ) to send specific headers, like "Content-type" or "Content-length". The "Set-cookie" header provides cookie values to the browser. The "Location" header is used for a 30x redirect response.
- Use self.end_headers() to finish sending headers and start sending the resulting page.
- Then (and only then) you can use self.wfile.write() to send the page content.
- Use self.wfile.close() if this is a HTTP/1.0 connection.
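The response steps listed above can be sketched as a tiny handler that is exercised once on the local machine. This assumes Python 3, where BaseHTTPServer became http.server and httplib became http.client; the `HelloHandler` name and the page content are invented for the illustration.

```python
import threading
from http.client import HTTPConnection                        # Python 2: httplib
from http.server import BaseHTTPRequestHandler, HTTPServer    # Python 2: BaseHTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body><p>hello</p></body></html>"
        self.send_response(200)                          # 1. the status
        self.send_header("Content-type", "text/html")    # 2. the headers
        self.send_header("Content-length", str(len(body)))
        self.end_headers()                               # 3. end of headers
        self.wfile.write(body)                           # 4. the page content

    def log_message(self, format, *args):
        pass   # keep the example quiet instead of logging to sys.stderr

# Port 0 lets the operating system pick a free, unprivileged port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
worker = threading.Thread(target=server.handle_request)   # serve exactly one request
worker.start()

conn = HTTPConnection("127.0.0.1", port)
conn.request("GET", "/")
reply = conn.getresponse()
status = reply.status
content_type = reply.getheader("Content-type")
page = reply.read()
conn.close()
worker.join()
server.server_close()
```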
Your class should provide some class level values which are provided to the browser.
- server_version
- A string to identify your server and version. This string can have multiple clauses, each separated by whitespace. Each clause is of the form product/version. The default is 'BaseHTTP/0.3'.
- error_message_format
- This is the web page to send back by the send_error method. The send_error method uses the error code to create a dictionary with three keys: "code", "message" and "explain". The "code" item in the dictionary has the numeric error code. The "message" item is the short message from the self.responses dictionary. The "explain" item is the long message from the self.responses dictionary. Since a dictionary is provided, the formatting string for this error message must include dictionary-oriented conversion strings: %(code)d, %(message)s and %(explain)s.
- protocol_version
- This is the HTTP version being used. This defaults to 'HTTP/1.0'. If you set this to 'HTTP/1.1', then you should also use the "Content-Length" header to provide the browser with the precise size of the page being sent.
- responses
A dictionary, keyed by status code. Each entry is a two-tuple with a short message and a long explanation. These two values become the message and the explain in an error message.
The message for status code 200, for example, is 'OK'. The explanation is somewhat longer.
This class has a number of methods which you’ll want to use from within your do_GET() and do_POST() methods.
- send_error(code[, message])
- Send an error response. By default, this is a complete, small page that shows the code, message and explanation. If you do not provide a message, the short message from the self.responses[code] mapping will be used.
- send_response(code[, message])
Sends a response in pieces. If you do not provide a message, the short message from the self.responses[code] mapping will be used.
This method is the first step in sending a response. This must be followed by self.send_header() if any headers are present. It must be followed by self.end_headers(). Then the page content can be sent.
- send_header(name, value)
- Send one HTTP header and its value. Use this to send specific headers, like "Content-type" or "Content-length". If you are doing a redirect, you’ll need to include the "Location" header.
- end_headers()
- Finish sending the headers; get ready to send the page content. Generally, this is followed by writing to self.wfile.
- log_request(code[, size])
- Uses self.log_message() to write an entry into the log file for a normal response. This is done automatically by send_response().
- log_error(format, args...)
- Uses self.log_message() to write an entry into the log file for an error response. This is done automatically by send_error().
- log_message(format, args...)
- Writes an entry into the log file. You might want to override this if you want a different format for the error log, or you want it to go to a different destination than sys.stderr.
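Here is a hedged sketch of these methods working together: a redirect built with send_response() and a "Location" header, and an error page built with send_error(). It assumes Python 3 module names (http.server, http.client); the paths /old and /new and the `MovedHandler` class are hypothetical.

```python
import threading
from http.client import HTTPConnection                        # Python 2: httplib
from http.server import BaseHTTPRequestHandler, HTTPServer    # Python 2: BaseHTTPServer

class MovedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old":
            # A redirect is a 30x status plus a Location header.
            self.send_response(302)
            self.send_header("Location", "/new")
            self.end_headers()
        elif self.path == "/new":
            body = b"moved here\n"
            self.send_response(200)
            self.send_header("Content-type", "text/plain")
            self.send_header("Content-length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # send_error() builds the complete error page.
            self.send_error(404)

    def log_message(self, format, *args):
        pass   # override the default logging to sys.stderr

def fetch(port, path):
    """Make one GET request; return (status, Location header or None)."""
    conn = HTTPConnection("127.0.0.1", port)
    conn.request("GET", path)
    reply = conn.getresponse()
    result = (reply.status, reply.getheader("Location"))
    reply.read()
    conn.close()
    return result

def serve_two(server):
    """Handle exactly two requests, then return."""
    for _ in range(2):
        server.handle_request()

server = HTTPServer(("127.0.0.1", 0), MovedHandler)
port = server.server_address[1]
worker = threading.Thread(target=serve_two, args=(server,))
worker.start()
redirect = fetch(port, "/old")
missing = fetch(port, "/missing")
worker.join()
server.server_close()
```

The http.client connection does not follow redirects, which is why the 302 status and its "Location" header are visible to the caller.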
Example. The following example shows the skeleton for a simple HTTP server. This server merely displays the GET or POST request that it receives. A Python-based web server can’t ever be fast enough to replace Apache. However, for some applications, you might find it convenient to develop a small, simple application which handles HTTP.
webserver.py
import BaseHTTPServer
class MyHandler( BaseHTTPServer.BaseHTTPRequestHandler ):
    server_version= "MyHandler/1.1"
    def do_GET( self ):
        # Pass the values as arguments; log_message() does the % formatting itself.
        self.log_message( "Command: %s Path: %s Headers: %r",
            self.command, self.path, self.headers.items() )
        self.dumpReq( None )
    def do_POST( self ):
        self.log_message( "Command: %s Path: %s Headers: %r",
            self.command, self.path, self.headers.items() )
        if 'content-length' in self.headers:
            # Read exactly the number of bytes the browser says it sent.
            length= int( self.headers['content-length'] )
            self.dumpReq( self.rfile.read( length ) )
        else:
            self.dumpReq( None )
    def dumpReq( self, formInput=None ):
        response= "<html><head></head><body>"
        response+= "<p>HTTP Request</p>"
        response+= "<p>self.command= <tt>%s</tt></p>" % ( self.command )
        response+= "<p>self.path= <tt>%s</tt></p>" % ( self.path )
        response+= "</body></html>"
        self.sendPage( "text/html", response )
    def sendPage( self, type, body ):
        self.send_response( 200 )
        self.send_header( "Content-type", type )
        self.send_header( "Content-length", str(len(body)) )
        self.end_headers()
        self.wfile.write( body )
def httpd(handler_class=MyHandler, server_address = ('', 8008), ):
    srvr = BaseHTTPServer.HTTPServer(server_address, handler_class)
    srvr.serve_forever()
if __name__ == "__main__":
    httpd( )
Fundamentally, a web server is an application that listens for and handles requests sent using the HTTP protocol. The handler is required to formulate a suitable response.
Python Enhancement Proposal PEP 333 defines a standard approach to handling web requests, called the Web Server Gateway Interface, WSGI. This standard allows us to build large, sophisticated web sites as a composition of many smaller components.
It’s best to think of WSGI as a system of pipes for routing requests and responses.
To make this composition work, each WSGI application must adhere to a standardized definition.
A WSGI application must have the following signature.
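- application(environ, start_response)
- The environ parameter is a dictionary with the WSGI environment that defines the request. The start_response parameter is a function that the application calls to begin sending the response.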
The start_response() function is what your application uses to start sending an HTTP response. This includes the status and the various headers.
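- start_response(status, headers)
- The status parameter is a string with the status code and reason, like '200 OK'. The headers parameter is a list of (name, value) tuples, one for each response header.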
All WSGI-compatible applications must do two things. They must see to it that the start_response() function is called. They must return a list of strings.
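Because a WSGI application is just a callable, it can be exercised without any server at all. This is a minimal sketch assuming Python 3, where the returned strings must be byte strings; `hello_app` and `fake_start_response` are hypothetical names, and the fake only records what a real server would receive.

```python
def hello_app(environ, start_response):
    """A WSGI application: call start_response(), then return the body."""
    body = b"hello\n"
    start_response("200 OK", [("Content-type", "text/plain"),
                              ("Content-length", str(len(body)))])
    return [body]   # a list of byte strings under Python 3

# A stand-in for the server side: record what start_response is given.
captured = {}
def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

# A deliberately tiny environment; a real server supplies many more items.
result = hello_app({"REQUEST_METHOD": "GET", "PATH_INFO": "/"},
                   fake_start_response)
```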
When we think of a WSGI application as a pipe, we see that an application will accomplish the above requirements one of two ways.
The WSGI environment includes the following items that define the request.
| REQUEST_METHOD: | The HTTP request method, generally "GET" or "POST". |
|---|---|
| SCRIPT_NAME: | The initial portion of the request URL path. This may be empty, depending on the structure of your applications. |
| PATH_INFO: | The remainder of the request URL path, designating the resource within your application. |
| QUERY_STRING: | The portion of the request URL that follows the “?”. |
| CONTENT_TYPE: | The value of any Content-Type header in the HTTP request. If an upload is being done, this may have a value. |
| CONTENT_LENGTH: | The value of any Content-Length header in the HTTP request. If an upload is being done, this may have a value. |
| SERVER_NAME, SERVER_PORT: | The host name and port number. |
| SERVER_PROTOCOL: | The protocol the client used; either "HTTP/1.0" or "HTTP/1.1". |
The WSGI environment includes the following WSGI-specific items.
| wsgi.version: | The tuple (1,0), representing WSGI version 1.0. |
|---|---|
| wsgi.url_scheme: | The “scheme” portion of the URL; either "http" or "https". |
| wsgi.input: | An input file from which the HTTP request body can be read. Generally, the body of a POST request will contain the input fields from the associated HTML form. |
| wsgi.errors: | An output file to which error output can be written. This is generally the main log file for the server. |
| wsgi.multithread: | True if the application object may be simultaneously invoked by another thread in the same process. An application might use this information to determine how to manage database connections or other resources. |
| wsgi.multiprocess: | True if an equivalent application object may be simultaneously invoked by another process. |
| wsgi.run_once: | True if the server or gateway expects that the application will only be invoked once by the containing process; i.e., is this a one-shot CGI-style script. |
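The environment items can be explored with a hand-built dictionary. This sketch assumes Python 3 and uses wsgiref.util.setup_testing_defaults() to fill in the items not set explicitly; the `describe_request` helper, the /convert path and the form data are made up for the example.

```python
import io
from wsgiref.util import setup_testing_defaults

def describe_request(environ):
    """Pull the commonly used request items out of a WSGI environment."""
    # CONTENT_LENGTH may be absent or empty when there is no request body.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return {
        "method": environ["REQUEST_METHOD"],
        "path": environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""),
        "query": environ.get("QUERY_STRING", ""),
        "body": environ["wsgi.input"].read(length),
    }

# A hypothetical POST request with 14 bytes of form data.
environ = {
    "REQUEST_METHOD": "POST",
    "PATH_INFO": "/convert",
    "QUERY_STRING": "units=metric",
    "CONTENT_LENGTH": "14",
    "wsgi.input": io.BytesIO(b"fahrenheit=212"),
}
setup_testing_defaults(environ)   # adds defaults for items not set above
summary = describe_request(environ)
```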
There are numerous WSGI-based applications and frameworks. We’ll look at some components based on the wsgiref implementation. A good alternative is the werkzeug implementation. For more information, see http://werkzeug.pocoo.org/.
Here’s an example of a WSGI application that dumps its environment as the response page.
import cgi
def dump_all_app(environ, start_response):
    status = '200 OK'
    headers = [('Content-type', 'text/html')]
    start_response(status, headers)
    env_dump= [
        "<tt>%s=%r</tt><br/>" % (k,cgi.escape(str(environ[k]))) for k in environ
    ]
    return [
        "<html>",
        "<head><title>dump_all</title></head>",
        "<body><p>"] + env_dump + ["</p></body>",
        "</html>"
    ]
WSGI Server. Separate from the WSGI applications is the WSGI server. This is built around a single application that will respond to requests on a specific port. This example uses the wsgiref implementation.
from wsgiref.simple_server import make_server
httpd = make_server('', 8008, dump_all_app)
print "Serving HTTP on port 8008..."
# Respond to requests until process is killed
httpd.serve_forever()
Composite Applications. The beauty of WSGI is that it allows the construction of Composite Applications.
There are two general design patterns.
Dispatching or Routing. In this case, a WSGI application selects among other applications and forwards the request to one or more other applications.
A URL parsing application, for example, can use wsgiref.util.shift_path_info() as part of transforming a URL into an application.
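A dispatcher along these lines might look like the following sketch. It assumes Python 3 byte-string bodies; `make_dispatcher`, `blog_app`, `not_found` and the /blog path are hypothetical names chosen for the illustration.

```python
from wsgiref.util import shift_path_info

def blog_app(environ, start_response):
    start_response("200 OK", [("Content-type", "text/plain")])
    return [("blog:" + environ["PATH_INFO"]).encode("utf-8")]

def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-type", "text/plain")])
    return [b"not found\n"]

def make_dispatcher(routes, default):
    """Build a WSGI application that picks another application by path."""
    def dispatcher(environ, start_response):
        # Moves the first path segment from PATH_INFO onto SCRIPT_NAME.
        name = shift_path_info(environ)
        app = routes.get(name, default)
        return app(environ, start_response)
    return dispatcher

dispatcher = make_dispatcher({"blog": blog_app}, not_found)

statuses = []
def record(status, headers):
    statuses.append(status)

environ = {"SCRIPT_NAME": "", "PATH_INFO": "/blog/2009"}
found = dispatcher(environ, record)
missing = dispatcher({"SCRIPT_NAME": "", "PATH_INFO": "/nope"}, record)
```

After dispatching, the selected application sees only the remainder of the path in PATH_INFO, so it does not need to know where it was mounted.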
Middleware or Pipe. In this case, a WSGI application enriches the environment and passes the request to another application.
For example, authorization and authentication are a kind of pipe. The authorization application forwards valid requests with user information or responds with an error.
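The middleware pattern is a function that wraps one WSGI application in another. In this sketch the environment key "example.request_id" and the function names are invented for illustration; the wrapped application simply reports the value it was given.

```python
import itertools

def add_request_id(app):
    """Wrap a WSGI application so each request carries a sequence number."""
    counter = itertools.count(1)
    def middleware(environ, start_response):
        # Enrich the environment, then pass the request down the pipe.
        environ["example.request_id"] = next(counter)
        return app(environ, start_response)
    return middleware

def show_id_app(environ, start_response):
    start_response("200 OK", [("Content-type", "text/plain")])
    return [("request %d" % environ["example.request_id"]).encode("ascii")]

piped_app = add_request_id(show_id_app)

def ignore(status, headers):
    pass   # stand-in for the server's start_response

first = piped_app({}, ignore)
second = piped_app({}, ignore)
```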
Each individual aspect of a complex web application can be separated into a distinct WSGI application. These individual aspects include things like the following.
Authentication. A fork-style application can handle the HTTP Authorization header (HTTP_AUTHORIZATION in the WSGI environment). If the request lacks a proper header, this application can respond with a status 401. It can delegate basic authentication to one application and digest authentication to another application.
Once authenticated, an application can enrich the environment with the authenticated user information, perhaps fetching any saved session information.
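A rough sketch of basic authentication as WSGI middleware follows. The user table, realm name and function names are hypothetical, and a real application would never keep plain-text passwords in the source like this.

```python
import base64

USERS = {"alice": "secret"}   # hypothetical; for illustration only

def basic_auth(app, users=USERS):
    """Wrap a WSGI application with a basic authentication check."""
    def middleware(environ, start_response):
        header = environ.get("HTTP_AUTHORIZATION", "")
        scheme, _, encoded = header.partition(" ")
        if scheme.lower() == "basic":
            try:
                decoded = base64.b64decode(encoded).decode("utf-8")
            except (ValueError, UnicodeDecodeError):
                decoded = ""
            user, _, password = decoded.partition(":")
            if user and users.get(user) == password:
                # Enrich the environment for the applications downstream.
                environ["REMOTE_USER"] = user
                return app(environ, start_response)
        start_response("401 Unauthorized",
                       [("WWW-Authenticate", 'Basic realm="example"'),
                        ("Content-type", "text/plain")])
        return [b"authentication required\n"]
    return middleware

def whoami_app(environ, start_response):
    start_response("200 OK", [("Content-type", "text/plain")])
    return [environ["REMOTE_USER"].encode("utf-8")]

protected = basic_auth(whoami_app)

statuses = []
def record(status, headers):
    statuses.append(status)

token = base64.b64encode(b"alice:secret").decode("ascii")
allowed = protected({"HTTP_AUTHORIZATION": "Basic " + token}, record)
refused = protected({}, record)
```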
Authorization. A pipeline application can determine if the user is actually allowed to perform the requested function. If the user is not authorized, it can produce a redirection to a login page. If the user is authorized, it can redirect to another application that does “real” work.
Caching. A pipeline application can check for a given URL and return a previous result for known URL’s that haven’t expired. For new, unknown URL’s (or expired URL’s) the request can be passed on to an application that does the “real” work.
Form Data Parsing. A pipeline application can parse the form data and enrich the environment with data from the various form fields. After parsing, another application can be called to process the form input.
Upload Storing. A pipeline application can capture the uploaded file and save it in an upload directory for further processing. It can enrich the environment with information about the uploaded file. After saving, another application can be called to process the uploaded file.
The HTTP protocol is defined as being stateless. Each request-reply transaction is independent, with no memory of any prior transaction. If a web server is only providing access to static pages of content, this stateless transaction is precisely what we expect.
However, if we want a richer, more sophisticated, data processing application, we expect the application to be stateful. Indeed, one of the primary reasons for using computers is to store and retrieve information. Stored information represents the state of a database or file.
Also, an individual transaction often involves the server retaining state as we enter data, correct that data, and finally commit the change to the database.
The core issue is this. Given the stateless HTTP transactions and numerous concurrent clients, how do we distinguish the sequence of requests for a single user?
Cookies. The HTTP/1.1 standard introduced the concept of a cookie. A cookie is a small packet of data that is sent to a browser as part of a response header. The browser must then include the cookie as part of each subsequent request. This permits a web server to recognize a specific browser session, and assure that the user’s interactions are stateful.
By making the HTTP session stateful, a web application can respond in more meaningful ways.
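The cookie exchange can be sketched with the standard library's cookie support (http.cookies in Python 3; the Python 2 module was named Cookie). The cookie name "session" and both function names are assumptions made for this illustration.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id):
    """Build the response header that hands a session id to the browser."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["path"] = "/"
    # OutputString() is the header value without the "Set-Cookie:" label.
    return ("Set-Cookie", cookie["session"].OutputString())


def session_from_request(environ):
    """Recover the session id from the Cookie header of a later request."""
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    morsel = cookie.get("session")
    return morsel.value if morsel else None
```

A WSGI application would add the first function's result to its response headers, and call the second with its environ on each subsequent request.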
Sessions. To create stateful web applications, we need to introduce the concept of a session. The web application must do the following kinds of things.
All data that must be reflected back to the user must be kept in an object that is unique to each session. Clearly, these session objects will accumulate as a web server runs.
For speed of access, the sessions are kept in a simple dictionary. Periodically, the web server must examine the sessions and discard any that are older than some reasonable threshold. For private information (like financial or medical records) 20 minutes is deemed old enough. For other things, session objects may last for several hours.
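A session dictionary with periodic cleanup might look like the following sketch. The class name, the 20-minute default and the injectable clock (which makes the expiry logic easy to test) are all assumptions, not part of any library.

```python
import time
import uuid


class SessionStore(object):
    """Keep per-session data in a dictionary and discard stale sessions."""

    def __init__(self, max_age=20 * 60, clock=time.time):
        self.max_age = max_age      # seconds a session may sit idle
        self.clock = clock          # replaceable, so tests can control time
        self.sessions = {}          # session id -> (last_used, data)

    def create(self):
        """Allocate a new, empty session and return its identifier."""
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = (self.clock(), {})
        return session_id

    def get(self, session_id):
        """Return the session's data, or None if unknown or expired."""
        entry = self.sessions.get(session_id)
        if entry is None or self.clock() - entry[0] > self.max_age:
            self.sessions.pop(session_id, None)
            return None
        data = entry[1]
        self.sessions[session_id] = (self.clock(), data)  # refresh timestamp
        return data

    def purge(self):
        """Discard every expired session; return how many were removed."""
        now = self.clock()
        expired = [sid for sid, (used, _) in self.sessions.items()
                   if now - used > self.max_age]
        for sid in expired:
            del self.sessions[sid]
        return len(expired)
```

The server would call purge() from time to time, for example once every few hundred requests.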
While the full extent of web applications is beyond the scope of this book, we can look at the essential ingredients in processing form input in a web server.
Here’s an example of a simple form. This form will send a POST request to the path . when the user clicks the Convert button.
The input will include three name-value pairs with keys of fahrenheit (from the <input type="text">), celsius (from the other <input> tag) and action (from the <button type="submit">).
<html><head><title>Conversion</title></head>
<body><form action="." method="POST">
<label>Fahrenheit</label> <input type="text" name="fahrenheit"/>
<br/>
<label>Celsius</label> <input type="text" name="celsius"/>
<br/>
<button type="submit" name="action" value="submit">Convert</button>
</form>
</body>
</html>
Browser Processing. Given a form, a browser displays the elements. It then allows the user to interact with the form.
When the user clicks submit, the contents of the form are transformed into an HTTP request.
The method attribute of the form determines what request method is used and how the form’s data is packaged for transmission to the web server.
For method="GET", the request is a GET, and the contents of the form are URL-encoded and put into the URL after a ?.
The request might look like this.
http://localhost:8008/?fahrenheit=&celsius=12.0&action=submit
A WSGI application will find this data in environ["QUERY_STRING"].
The easiest way to handle this data is to use cgi.parse() in the cgi module.
data = cgi.parse( environ=environ )
For method="POST", the request is a POST, and the contents of the form are URL-encoded and put into the request as a stream of data.
A WSGI application will find this data in the file-like object environ["wsgi.input"]. This object has the data associated with the request. The number of bytes is given by environ["CONTENT_LENGTH"].
The easiest way to handle this data is to use cgi.parse() in the cgi module.
data = cgi.parse( environ["wsgi.input"], environ )
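The two cases can be folded into one helper. This sketch targets Python 3, where the URL decoding lives in urllib.parse.parse_qs and the cgi module is no longer part of the standard library in recent releases. The function name form_data is an assumption.

```python
from urllib.parse import parse_qs


def form_data(environ):
    """Return the form fields as a dict mapping names to lists of strings."""
    if environ.get("REQUEST_METHOD", "GET") == "POST":
        # POST: the URL-encoded fields are the request body.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = environ["wsgi.input"].read(length).decode("utf-8")
    else:
        # GET: the URL-encoded fields follow the ? in the URL.
        raw = environ.get("QUERY_STRING", "")
    # keep_blank_values=True reports empty fields as [""] instead of dropping them.
    return parse_qs(raw, keep_blank_values=True)
```

Each value is a list because a form may legitimately send the same field name more than once.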
Application Processing. Generally, the best design pattern is to build applications that have the following outline. This isn’t complete, but it is a useful starting point. We’ll add to this below.
When the user clicks a URL, the browser sends a GET request. The application responds with an empty form.
The user fills in the form, clicks the submit button. The browser sends a POST request, often to the same URL. The application validates the form input. If the input is valid, the application responds with the resulting page. If the input is not valid, the application responds with the form and any error messages.
The form’s data is parsed with cgi.parse().
Here’s an example WSGI application that shows the POST and GET processing.
import cgi
import sys

form = """\
<html><head><title>title</title></head>
<body>
<p>%(messages)s</p>
<form action="." method="POST">
<label>label1</label> <input type="text" name="field1" value="%(field1)s"/>
<br/>
<label>label2</label> <input type="text" name="field2" value="%(field2)s"/>
<br/>
<button type="submit" name="action" value="submit">Convert</button>
</form>
</body>
</html>
"""

def conversion( environ, start_response ):
    # For a GET, display the empty form.
    if environ['REQUEST_METHOD'] == "GET":
        status = '200 OK' # HTTP Status
        headers = [('Content-type', 'text/html')] # HTTP Headers
        start_response(status, headers)
        return [ form % { 'field1' : '', 'field2' : '', 'messages':'' } ]
    # For a POST, parse the input, validate it, and try to process it.
    else:
        data= cgi.parse( environ['wsgi.input'], environ )
        try:
            # Each value is a list of strings; a missing field becomes "".
            field1= data.get('field1',[""])[0]
            field2= data.get('field2',[""])[0]
            # Validate...
            # Do processing...
            status = '200 OK' # HTTP Status
            headers = [('Content-type', 'text/html')] # HTTP Headers
            start_response(status, headers)
            return [ form % { 'field1' : field1, 'field2' : field2, 'messages':'' } ]
        except Exception, e:
            status = '400 Bad Request' # HTTP Status
            headers = [('Content-type', 'text/html')] # HTTP Headers
            # exc_info must be the (type, value, traceback) tuple.
            start_response(status, headers, sys.exc_info())
            return [ form % { 'field1' : '', 'field2' : '', 'messages':repr(data) } ]
The Post and Back Problem. Note that if you submit the form as a POST and click your browser’s back button, after looking at the next page, the form gets submitted again. Your browser will confirm that you want to submit form data again.
This behavior is usually prevented by using the “Redirect-after-Post” (also called the “Post-Redirect-Get”) design pattern. The response to a page processed with POST is a status 303 (See Other) redirect response. This response must include a header with a label of Location and a value that is a URL to which the browser will address a GET request. This makes the back button behave nicely.
The complete overview, then, is the following.
When the user clicks a URL, the browser sends a GET request.
The application responds with the form, including any messages.
The user fills in the form, clicks the submit button. The browser sends a POST request, often to the same URL.
The application validates the form input.
If the input is valid, the application does the expected processing. The session is updated with completion messages. The application sends a "303 See Other" response. This causes the browser to do a GET to the given location.
If the input is not valid, the application responds with the form and any error messages.
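The outline above can be sketched as a single WSGI application. This is an illustration for Python 3, not a complete implementation: the form text, the celsius field check and the redirect target "/" are placeholders, and it uses status 303 See Other, which HTTP defines for redirecting a POST to a GET.

```python
from urllib.parse import parse_qs

FORM = ("<html><body><p>{msg}</p>"
        "<form action='.' method='POST'>...</form></body></html>")


def prg_app(environ, start_response):
    """Post-Redirect-Get: GET shows the form, a valid POST redirects."""
    if environ["REQUEST_METHOD"] == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
        try:
            float(data.get("celsius", [""])[0])      # placeholder validation
        except ValueError:
            # Invalid input: respond with the form and an error message.
            start_response("200 OK", [("Content-Type", "text/html")])
            return [FORM.format(msg="celsius must be a number").encode("utf-8")]
        # Valid input: do the processing here, then redirect to a GET.
        start_response("303 See Other", [("Location", "/")])
        return [b""]
    # GET: respond with the form, including any messages.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [FORM.format(msg="").encode("utf-8")]
```

In a fuller version the completion message would be stored in the session before the redirect, so that the following GET can display it.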
In more complex applications, there may be multiple pages, or multiple-step transactions. There may also be a “confirm” page at the end which summarizes the transaction before the real work is done. This requires accumulating considerable information in the session.
When we looked at HTTP in The World Wide Web and the HTTP protocol, we were interested in its original use case of serving web pages for people. We can build on HTTP, creating an interface between software components, something called a web service. A web service leverages the essential request-reply nature of HTTP, but takes the elaborate human-centric HTML web page out of the response. Instead of sending back something for people to read, web services send just the facts without a sophisticated presentation.
Web services allow us to have a multi-server architecture. A central web server provides interaction with people. When a person’s browser makes a request, this central web server can make web service requests of other servers to gather the information. After gathering the information, the central server aggregates it and builds the final HTML-based presentation, which is the reply sent to the human user.
Web services are an adaptation of HTTP; see The World Wide Web and the HTTP protocol for a summary. Web services rely on a number of other technologies. There are several competing alternatives, and we’ll look at web services in general before looking at a specific technique.
There are a number of ways of approaching the problem of coordinating work between clients and servers. All of these alternatives have their advantages and disadvantages.
We’ll focus on REST because it can be done largely using urllib2 features (see Writing Web Clients: The urllib2 Module) and the JSON library.
RESTful Web Services. The essence of REST is that we are accessing a resource that resides on another, remote computer. In order to do this, we must transfer a representation of that object’s state.
We have, therefore, three separate issues that we have to address.
Representing an object’s state. We can use XML for this. There are other notations including JSON and YAML which are also used to represent an object’s state. We’ll focus on JSON because it’s widely used and very simple.
This representation issue happens on both client and server side of the transaction. When the client wants to create or update a resource, it must represent the object. When the server wants to provide a resource, it must represent the object, also.
Making the client request. This means marshalling the arguments, making the request, and unmarshalling the response. Since REST is based on HTTP, this is a kind of HTTP client access using one of the four methods: GET, POST, PUT, DELETE.
Serving requests. This means unmarshalling arguments, doing something useful, and marshalling a response. Since this is based on HTTP, this is a kind of HTTP server.
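Marshalling and unmarshalling with JSON takes one call in each direction. The dictionary here is a made-up example of an object's state.

```python
import json

# The state of some object, as plain Python data.
spin = {"number": 17, "color": "black", "even": False, "high": False}

# Marshal: Python object -> text that can travel in an HTTP body.
text = json.dumps(spin, sort_keys=True)

# Unmarshal: text received from the other side -> Python object.
restored = json.loads(text)
```

Both the client and the server use the same pair of calls; only the direction differs.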
A GET request, generally, doesn’t have any arguments; it identifies a resource, which is marshalled and returned.
A POST request creates a new resource. The associated data is a URL-encoded version of the resource to create. Often the created resource is marshalled and returned as a kind of confirmation.
A PUT request will replace a resource. The associated data is a URL-encoded version of the resource that will replace or update the existing resource.
A DELETE request will remove a resource. Generally, this doesn’t have any arguments; it identifies a resource, which is removed.
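The four methods map naturally onto operations on a collection. The following sketch keeps resources in a dictionary and leaves out all of the HTTP plumbing; the class name, the integer keys and the particular status strings are illustrative choices.

```python
import itertools


class ResourceStore(object):
    """An in-memory collection that responds to the four REST methods."""

    def __init__(self):
        self.items = {}
        self.ids = itertools.count(1)

    def handle(self, method, key=None, data=None):
        """Return a (status, body) pair for one request."""
        if method == "POST":
            # Create a new resource and return it as confirmation.
            key = next(self.ids)
            self.items[key] = data
            return "201 Created", {"id": key, "resource": data}
        if key not in self.items:
            return "404 Not Found", None
        if method == "GET":
            return "200 OK", self.items[key]
        if method == "PUT":
            # Replace the existing resource.
            self.items[key] = data
            return "200 OK", data
        if method == "DELETE":
            del self.items[key]
            return "204 No Content", None
        return "405 Method Not Allowed", None
```

A WSGI application would take method from environ["REQUEST_METHOD"], take key from the URL path, and marshal the body with JSON.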
Let’s imagine that we’ve built an extremely good simulation of a roulette wheel. We’d like to package this as a web service so that many people can share this in their simulations of Roulette.
In some cases, a web service is built into a more complete web application framework. Often the server will have a human interface as well as a web service interface. The human interface will use HTML. The web service interface will use JSON.
We’ll simplify things slightly, and create a family of WSGI applications to route requests, handle the JSON replies and handle HTML replies.
The Resource. The resource we’re serving is a roulette wheel. We created this Python module that defines the wheel. Each spin creates a dictionary that shows a number of bets which are won by this spin.
This is a separate module that includes just the class definition that we’ll be serving.
wheel.py
import random

class Wheel( object ):
    redNumbers= set( [1,3,5,7,9,12,14,16,18,19,21,23,25,27,30,32,34,36] )
    domain = range(1,37) + [ "0", "00" ]
    def __init__( self ):
        self.rng= random.Random()
        self.last_spin= None
        self.count= 0
    def spin( self ):
        # Use this wheel's own generator, not the module-level one.
        n = self.rng.choice( Wheel.domain )
        if n in ( "0", "00" ):
            self.last_spin= {
                "number": n,
                "color": "green",
                "even": False,
                "high": None,
                "twelve": None,
                "column": None,
            }
        else:
            color = "red" if n in Wheel.redNumbers else "black"
            self.last_spin= {
                "number": n,
                "color": color,
                "even": n%2==0,
                "high": n>=19, # high is 19-36; low is 1-18
                "twelve": (n-1)//12, # 0 for 1-12, 1 for 13-24, 2 for 25-36
                "column": n%3,
            }
        self.count += 1
        return self.last_spin
The WSGI Applications. We’ll define several WSGI Applications that will create a comprehensive wheel web service.
This first example is the top-level “routing” application that parses the URL and delegates the work to another application. This is not a very flexible design. There are numerous better examples of very flexible routing using regular expression matching and other techniques.
wheelservice.py, part 1
import wsgiref.util
import cgi
import sys
import traceback
import json
import wheel

# A global object so we can maintain state.
theWheel= wheel.Wheel()

def routing( environ, start_response ):
    """Route based on top-level name of URL."""
    try:
        top_name = wsgiref.util.shift_path_info( environ )
        if top_name == "html":
            return person( environ, start_response )
        elif top_name == "json":
            return service( environ, start_response )
        else:
            start_response( '404 NOT FOUND', [("content-type","text/plain")] )
            return [ "Resource not found." ]
    except Exception, e:
        environ['wsgi.errors'].write( "Exception %r" % e )
        traceback.print_exc( file=environ['wsgi.errors'] )
        status = "500 ERROR"
        response_headers = [("content-type","text/plain")]
        start_response(status, response_headers, sys.exc_info())
        return ["Application Problems. ", repr(e) ]
A common extension to this routing is to respond to the /favicon.ico request by providing a graphic image file that can be displayed in the URL box of the browser.
This second example is the next-level “person” and “service” applications. These do the real work of the overall service. They either respond to a person (using HTML) or to a web services client (using JSON).
wheelservice.py, part 2
def person( environ, start_response, exc_info=None ):
    """Print some information about the stateful wheel."""
    global theWheel
    status = '200 OK'
    headers = [('Content-type', 'text/html')]
    start_response(status, headers, exc_info)
    return [
        "<html>",
        "<head><title>Wheel Service</title></head>",
        "<body>",
        "<p>Wheel service is spinning.</p>",
        "<p>Served %d spins.</p>" % (theWheel.count,),
        "</body>",
        "</html>"
    ]

def service( environ, start_response ):
    """Update the stateful wheel."""
    global theWheel
    spin= theWheel.spin()
    status= '200 OK'
    # The reply body is JSON, so label it as such.
    headers = [("content-type","application/json")]
    start_response( status, headers )
    return [ json.dumps(spin) ]
The WSGI Service. The WSGI service simply wraps our composite application routing() and serves it.
wheelservice.py, part 3
from wsgiref.simple_server import make_server
httpd = make_server('', 8008, routing)
print "Serving HTTP on port 8008..."
# Respond to requests until process is killed
httpd.serve_forever()
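Because a WSGI application is just a callable, it can be exercised without starting a server at all. The helper below is a sketch for Python 3; call_app and the tiny hello application are hypothetical stand-ins, and the same helper could be pointed at routing() instead.

```python
from wsgiref.util import setup_testing_defaults


def call_app(app, path="/", method="GET"):
    """Invoke a WSGI app once; return (status, headers, body)."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)    # fills in the remaining WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


def hello(environ, start_response):
    """Stand-in application used only to demonstrate call_app()."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"path was " + environ["PATH_INFO"].encode("utf-8")]


status, headers, body = call_app(hello, "/json/")
```

This makes it practical to write unit tests for each application in the pipeline before wiring them together.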
Let’s imagine that a colleague has built a web service which provides us with an extremely good simulation of a roulette wheel. Our colleague has provided us with the following summary of this web service.
| host: | 10.0.1.5. While IP address numbers are the lowest-common denominator in naming, some people will create Domain Name Servers (DNS) which provide interesting names instead of numeric addresses. If you are testing on a single computer, you will use localhost. |
|---|---|
| port number: | 8008. While the basic HTTP service is defined to run on port 80, you may have other web services which, for security reasons, aren’t available on port 80. Port numbers from 1024 and up may be allocated for other purposes, so port numbers are often changed as part of the configuration of a program. |
| path: | /json/ for the basic web services request. This isn’t the best definition for this resource, since it can’t easily be expanded. |
| method: | GET. |
| response: | JSON-encoded dictionary with attributes of the spin. |
This gives us a final URL of http://localhost:8008/json/ for access to this service.
To create a web services client, we can use the urllib2 module to access this service.
wheelclient.py
import urllib2
import json

def get_a_spin( ):
    result= urllib2.urlopen( "http://localhost:8008/json/" )
    assert result.code == 200
    assert result.msg == "OK"
    # print result.headers # to see information about the service
    data= result.read()
    return json.loads( data )

spin= get_a_spin()
print spin
Write a simple, special-purpose FTP client that establishes a connection with an FTP server, gets a directory and ends the connection. Interactive FTP programs call the directory commands “DIR” and “LS”; the protocol commands actually sent to the server are LIST and NLST. The responses may be long and complex, so this program must be prepared to read many lines of a response.
For more information, RFC 959 has complete information on all of the commands an FTP server should handle. Generally, the directory commands (LIST and NLST) and the file transfer commands (RETR and STOR, which interactive clients call GET and PUT) are sufficient to do simple FTP transfers.
Your client will need to open a socket on port 21. It will send the command line, and then read and print all of the reply information. In many cases, you will need to send additional commands (for example, USER and PASS to log in) in order to get a satisfactory response from an FTP server.
To test this, you’ll need to either activate an FTP server on your computer, or locate another computer that offers FTP services.
You can easily write a desktop application that uses web technology, but doesn’t use the Internet. Here’s how it would work.
Your application is built as a very small web server, based on BaseHTTPServer.HTTPServer or wsgiref.simple_server. This application prepares HTML pages and forms for the user.
The user will interact with the application through a standard browser like Firefox, Opera or Safari. Rather than connect to a remote web server somewhere in the Internet, this browser will connect to a small web server running on your desktop.
The URL will be http://localhost:8008/.
You can package your application with a simple shell script (or .BAT file) which does two things. (This can also be done as a simple Python program using the subprocess module.)
Since this is a single-user application, there won’t be multiple, concurrent sessions, which greatly simplifies web application implementation.
Example Application. We could, for example, write a small application that did Fahrenheit to Celsius conversion. We would create a Python web server and a “wrapper” script that launched the server and launched a browser.
The Input Form. While the full power of HTML is beyond the scope of this book, we’ll provide a simple form using the <form>, <label>, <button> and <input> tags.
Here’s an example of a simple form. This form will send a POST request to the path . when the user clicks the Convert button.
The input will include name-value pairs with keys of fahrenheit, celsius and action. The value will be a list of strings. Since the form only has a simple text field with a given name, there will be a single string in each list.
The cgi.parse() function can parse the encoded form input.
<html><head><title>Conversion</title></head>
<body><form action="." method="POST">
<label>Fahrenheit</label> <input type="text" name="fahrenheit"/>
<br/>
<label>Celsius</label> <input type="text" name="celsius"/>
<br/>
<button type="submit" name="action" value="submit">Convert</button>
</form>
</body>
</html>
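The conversion itself is best kept apart from the WSGI plumbing. One possible shape, assuming the parsed form data maps each field name to a list of strings and that the user fills in exactly one of the two fields, is sketched here. The function name convert is an assumption.

```python
def convert(data):
    """Return (fahrenheit, celsius) from parsed form data.

    data maps field names to lists of strings, as a form parser returns.
    Raises ValueError for non-numeric input or an ambiguous form.
    """
    f_text = data.get("fahrenheit", [""])[0].strip()
    c_text = data.get("celsius", [""])[0].strip()
    if f_text and not c_text:
        f = float(f_text)
        return f, (f - 32.0) * 5.0 / 9.0
    if c_text and not f_text:
        c = float(c_text)
        return c * 9.0 / 5.0 + 32.0, c
    raise ValueError("Fill in exactly one of the two fields")
```

The WSGI application can catch ValueError and redisplay the form with the exception's message.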
The WSGI Applications. You’ll need to write at least one WSGI application to handle the form input.
An Overview of the WSGI Server. You’ll use wsgiref.simple_server.make_server() to create a server from your form-handling application. You’ll need to provide an address like ('', 8008), and the name of your application. This object’s serve_forever() method will then handle HTTP requests on port 8008.
We’ll create an alternative implementation of the simple Roulette server shown in Web Services.
We’ll define a simple REST-based protocol for placing bets, spinning the wheel and retrieving the results of the placed bets.
Each BET will be considered a “resource”. We’ll use POST to create the resource, and GET to check on the resource after the spin.
The spin is a subtle issue. In a sense, we’re merely getting a value. However, executing the spin, changes the state of the various bets. Therefore, the spin should be a kind of POST transaction.
Session and State. A web site that interacts with a browser generally uses cookies to maintain state so that the person doesn’t have to be aware of how state is maintained.
For web services applications, it’s considerably simpler to maintain state explicitly. A WS client program can use explicit session identification.
We’ll handle this through a simple GET request which provides a unique session identifier.
A more secure method would include HTTP Digest Authentication. However, that’s beyond the scope of this book.
Resource Details. Our top-level application will examine the request URL. We’ll consider the top-level URL as the “resource” type. Our top-level application can then route the request based on this resource name.
| session: | A POST request to /session/ will allocate a new session. The response is a JSON document with the session identifier. The data sent is a JSON document with information like the bettor’s name. The session identifier must be used in all further transactions to identify the specific bettor. |
|---|---|
| bet: | A POST request to /bet/session/ will create a new bet. The processing could look like the following. A GET request to /bet/session/confirmation/ will return the status of the requested bet. The response is a JSON document with the bet and the outcome. If the wheel has not been spun, the outcome is None. Otherwise, the outcome is the bet’s payout multiplied by the amount of the bet. |
| spin: | A POST request to /spin/session/ indicates that the bets are placed and the wheel can be spun. This will respond with a JSON document that has the wheel spin confirmation number. Currently, there’s not much use for this information except to acknowledge that the wheel was spun. After a POST request to /spin/session/, a client will have to do a GET request to retrieve bet results. |
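The top-level routing needs to break the URL path into the resource type, the session identifier and (for bets) the confirmation. A minimal sketch of that step follows; the function name parse_path and the decision to raise ValueError for unknown resources are assumptions.

```python
def parse_path(path):
    """Split '/resource/session/confirmation/' into its three parts.

    Missing trailing parts are returned as None.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] not in ("session", "bet", "spin"):
        raise ValueError("unknown resource: %r" % path)
    resource = parts[0]
    session = parts[1] if len(parts) > 1 else None
    confirmation = parts[2] if len(parts) > 2 else None
    return resource, session, confirmation
```

The router can then combine the resource name with environ["REQUEST_METHOD"] to choose the application that handles the request.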
Write a simple client which places a number of bets on the Roulette server, using the Martingale betting strategy.
The Martingale strategy is relatively simple. A bet is placed on just one even-money (1:1) proposition (Red, Black, Even, Odd, High or Low). A base betting amount (the table minimum) is used. If the outcome of the spin is a winner, the betting amount is reset to the base. If the outcome of the spin is a loser, the betting amount is doubled.
Note that this “double up on a loss” betting will lead to situations where the ideal bet is beyond the table maximum. In that case, your simulation must adjust the bets to be the table maximum.
Also note that a proper simulation has a budget for betting. When the budget is exhausted, the simulation has to stop playing.
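The bet-sizing rules can be sketched separately from the web service calls. In this illustration the list of booleans stands in for results that would really come from the Roulette server, the payout is assumed to be even money, and all names and default amounts are hypothetical.

```python
def next_bet(current, won, base, table_max):
    """Martingale rule: reset after a win, double (up to the maximum) after a loss."""
    if won:
        return base
    return min(current * 2, table_max)


def play(outcomes, base=10, table_max=500, budget=1000):
    """Apply the strategy to a sequence of outcomes (True means a win).

    Returns a list of (amount_bet, budget_after) pairs, and stops early
    when the budget is exhausted.
    """
    bet = base
    history = []
    for won in outcomes:
        bet = min(bet, budget)          # cannot bet more than we have left
        if bet <= 0:
            break
        budget += bet if won else -bet  # even-money payout assumed
        history.append((bet, budget))
        bet = next_bet(bet, won, base, table_max)
    return history
```

In the real client, each pass through the loop would POST a bet, POST a spin, and GET the bet's outcome to decide whether it won.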
We can write a web application which uses our Roulette Server. This will lead to a fairly complex (but typical) architecture, with two servers and a client.
We’ll have the Roulette Server from the Complete Roulette Server exercise, running on some non-privileged port (like 36000). This server accepts bets and spins the wheel on behalf of a client process. It has no user interaction; it simply maintains state, in the form of bets placed.
We’ll have a web server, similar to the Desktop Web Application exercise, running on port 8008. This application can present a simple form for placing a bet or spinning the wheel.
The user can fill in the fields to define a bet. When the user clicks the Bet button, the web application will make a request to the Roulette Server and present the results in the HTML page that is returned to the user.
If the bet is valid, the web application will make a request to the Roulette Server to spin the wheel. It will present the results in the HTML page that is returned to the user.
This interaction between web application and Roulette server can all be done with urllib2.
We can then use a browser to contact our web server. This client will browse “http://localhost:8008” to get a web page with a simple form for placing bets and spinning the wheel.
A simple HTML form might look like the following.
<html><head><title>Roulette</title></head>
<body>
<p>Results from previous request go here</p>
<form action="." method="POST">
<label>Amount</label> <input type="text" name="amount"/>
<br/>
<label>Proposition</label> <select name="proposition">
<option>Red</option>
<option>Black</option>
<option>Even</option>
<option>Odd</option>
<option>High</option>
<option>Low</option>
</select>
<br/>
<button type="submit" name="action" value="bet">Bet</button>
</form>
</body>
</html>
A GET request can present the form.
A POST request must parse the input to find the values of the two fields ("proposition" and "amount"). It must validate input on the form, make requests to the server, and present the results.
In Chessboard Locations we described some of the basic mechanics of chess play. A chess server would allow exactly two clients to establish a connection. It would then accept chess moves from each client and respond to both clients with the new board position.
We can create a simple web service that has a number of methods for handling chess moves. To do this, we’ll need to create a basic ChessBoard class which has a number of methods that establish players, move pieces, and report on the board’s status.
It’s essential that the ChessBoard be a single object that maintains the state of the game. When two players are connected, each will need to see a common version of the chessboard.
Here are some of the methods that are essential to making this work.
A web service will offer a simple RESTful resource for interacting.
| game: |
|
|---|
A client application for this web service can use urllib2 to make the various POST and GET requests of the server.
Generally the use case for the client would have the following outline.
The client process will attempt a connection. If that fails, the server is somehow unable to start a new game.
The client will display the board state and wait for the user to make a move. Once the move is entered, the client will make web services requests to provide moves from the given player and display the resulting board status.
Also, the client must “poll” the server to see if the other player has entered their move.
If a web page includes the following HTML, it will periodically refresh itself, polling the server. <meta http-equiv="refresh" content="60">. This is included within the <head> tags. This will poll once every 60 seconds.
For a desktop application, the polling is usually done by waiting a few seconds via time.sleep().
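A polling loop for a desktop client might be sketched as follows. The fetch and is_done arguments are placeholders for a web service request and a test on its result; passing the sleep function as a parameter is a design choice that lets the loop be tested without real delays.

```python
import time


def poll(fetch, is_done, interval=2.0, max_tries=30, sleep=time.sleep):
    """Call fetch() until is_done(result) is true; return that result.

    Returns None if max_tries attempts pass without success.
    """
    for attempt in range(max_tries):
        state = fetch()
        if is_done(state):
            return state
        sleep(interval)     # wait before asking the server again
    return None
```

A chess client could call this with a function that GETs the board status and a test for whether the other player has moved.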
Socket-level programming isn’t our first choice for solving client-server problems. Sockets are nicely supported by Python, however, giving us a way to create a new protocol when the vast collection of existing internetworking protocols are inadequate.
Client-server applications include a client-side program, a server, a connection and a protocol for communication between the two processes. One of the most popular and enduring suites of client-server protocols is based on the Internetworking protocol: TCP/IP. For more information on TCP/IP, see Internetworking with TCP/IP [Comer95].
All of the TCP/IP protocols are based on the basic socket. A socket is a handy metaphor for the way that the Transmission Control Protocol (TCP) reliably moves a stream of bytes between two processes.
The socket module includes a number of functions to create and connect sockets. Once connected, a socket behaves essentially like a file: it can be read from and written to. When we are finished with a socket, we can close it, releasing the network resources that were tied up by our processing.
When a client application communicates with a server, the client does three things: it establishes the connection, it sends the request and it reads the reply from the server. For some client-server relationships, like a database server, there may be multiple requests and replies. For other client-server requests, for example, the HTTP protocol, a single request may involve a number of replies.
To establish a connection, the client needs two basic facts about the server: the IP address and a port number. The IP address identifies the specific computer (or host) that will handle the request. The port number identifies the application program that will process the request on that host. A typical host will respond to requests on numerous ports. The port numbers prevent requests from being sent to the wrong application program. Port numbers are defined by several standards. Examples include FTP (port 21) and HTTP (port 80).
A client program makes requests to a server by using the following outline of processing.
Developing an Address. An IP address is numeric. However, the Internet provides domain names, via Domain Name Services (DNS). This permits useful text names to be associated with numeric IP addresses. We’re more used to "www.python.org". DNS resolves this to an IP address. The socket module provides functions for DNS name resolution.
The most common operation in developing an address is decoding a host name to create the numeric IP address. The socket module provides several functions for working with host names and IP addresses.
Typically, the socket.gethostbyname() function is used to develop the IP address of a specific server name. It does this by making a DNS inquiry to transform the host name into an IP address.
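A small wrapper around socket.gethostbyname() might look like this. The function name resolve and the choice to translate a lookup failure into ValueError are assumptions for the illustration.

```python
import socket


def resolve(host):
    """Return the IPv4 address for host as a dotted-quad string."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror as e:
        # Raised when the name cannot be resolved.
        raise ValueError("cannot resolve %r: %s" % (host, e))
```

If the argument is already a numeric IPv4 address, it is returned unchanged and no DNS inquiry is needed; a name such as "www.python.org" requires a working resolver.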
Port Numbers. The port number is usually defined by your application. For instance, the FTP application uses port number 21. Port numbers from 0 to 1023 are assigned by RFC 1700 standard and are called the well known ports. Port numbers from 1024 to 49151 are available to be registered for use by specific applications. The Internet Assigned Numbers Authority (IANA) tracks these assigned port numbers. See http://www.iana.org/assignments/port-numbers. You can use the private port numbers, from 49152 to 65535, without fear of running into any conflicts. Port numbers above 1024 may conflict with installed software on your host, but are generally safe.
Port numbers below 1024 are restricted so that only privileged programs can use them. This means that you must have root or administrator access to run a program which provides services on one of these ports. Consequently, many application programs which are not run by root, but run by ordinary users, will use port numbers starting with 1024.
It is very common to use ports from 8000 and above for services that don’t require root or administrator privileges to run. Technically, port 8000 has a defined use, and that use has nothing to do with HTTP. Port 8008 and 8080 are the official alternatives to port 80, used for developing web applications. However, in spite of an official use, port 8000 is often used for web applications.
The usual approach is to have a standard port number for your application, but allow users to override this in the event of conflicts. This can be a command-line parameter or it can be in a configuration file.
Generally, a client program must accept an IP address as a command-line parameter. A network is a dynamic thing: computers are brought online and offline constantly. A “hard-wired” IP address is an inexcusable mistake.
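One way to follow this advice is to take the host as a required command-line argument and the port as an option with a standard default. This sketch uses the argparse module; the default of 8008, the option names and the function name are illustrative.

```python
import argparse

DEFAULT_PORT = 8008     # the application's standard port; users may override it


def parse_address(argv=None):
    """Return (host, port) from the command line (argv=None means sys.argv)."""
    parser = argparse.ArgumentParser(description="Example client")
    parser.add_argument("host", help="server host name or IP address")
    parser.add_argument("-p", "--port", type=int, default=DEFAULT_PORT,
                        help="server port (default %(default)s)")
    args = parser.parse_args(argv)
    return args.host, args.port
```

The resulting tuple is in the form that a socket's connect() method expects as an address.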
Create and Connect a Socket. A socket is one end of a network connection. Data passes bidirectionally through a socket between client and server. The socket module defines the SocketType, which is the class for all sockets. The socket() function creates a socket object.
A SocketType object has a number of method functions. Some of these are relevant for server-side processing and some for client-side processing. The client side method functions for establishing a connection include the following.
Sending the Request and Receiving the Reply. Sending requests and processing replies is done by writing to the socket and reading data from the socket. Often, the response processing is done by reading the file object that is created by a socket’s makefile() method. Since the value returned by makefile() is a conventional file, then readlines() and writelines() methods can be used on this file object.
A SocketType object has a number of method functions. Some of these are relevant for server-side processing and some for client-side processing. The client side method functions for sending (and receiving) data include the following.
Example. The following examples show a simple client application using the socket module.
This is the Client class definition.
#!/usr/bin/env python
import socket

class Client( object ):
    rbufsize= -1
    wbufsize= 0
    def __init__( self, address=('localhost',7000) ):
        self.server=socket.socket( socket.AF_INET, socket.SOCK_STREAM )
        self.server.connect( address )
        self.rfile = self.server.makefile('rb', self.rbufsize)
        self.wfile = self.server.makefile('wb', self.wbufsize)
    def makeRequest( self, text ):
        """send a message and get a 1-line reply"""
        self.wfile.write( text + '\n' )
        # readline() returns after one line; read() would wait for end-of-file.
        data= self.rfile.readline()
        self.server.close()
        return data

print "Connecting to Echo Server"
c= Client()
response= c.makeRequest( "Greetings" )
print repr(response)
print "Finished"
A Client object is initialized with a specific server address. The host ( "localhost" ) and port number ( 7000 ) are default values in the class __init__() function. The address of "localhost" is handy for testing a client and a server on your PC. First the socket is created, then it is connected to the server’s address. If no exceptions are raised, then an input file and an output file are created to use this socket.
The makeRequest() function sends a message and then reads the reply.
When a server program starts, it creates a socket on which it listens for requests. The server has a three-step response to a client. First, it accepts the connection; then it reads and processes the client’s request. Finally, it sends a reply to the client. For some client-server relationships, like a database server, there may be multiple requests and replies. Since database requests may take a long time to process, the server is usually multi-threaded so that it can handle concurrent requests. In the case of HTTP, displaying a single page usually leads to multiple requests and replies, one for each image, style sheet, or script that the page refers to.
A server program handles requests from a client by using the following outline of processing.
Create and Listen on a Socket. The following methods are relevant when creating server-side sockets. These server side method functions are used for establishing the public socket that is waiting for client connections. In each definition, the variable s is a socket object.
s.accept(): Accept a client connection, returning a new socket connected to the client, along with the client’s address.
The original bound socket, which was set in listen mode, is left alone and is still listening for the next connection.
Once the socket connection has been accepted, processing is a simple matter of reading and writing on the daughter socket.
We won’t show an example of writing a server program using simple sockets. The best way to make use of server-side sockets is to use the SocketServer module.
Generally, we use the SocketServer module for simple socket processing. Usually, we create a TCPServer using this module. This can simplify the processing of requests and replies. The SocketServer module is the basis for the SimpleHTTPServer (see The World Wide Web and the HTTP protocol).
Much of server-side processing is encapsulated in two classes of the SocketServer module. You will subclass the StreamRequestHandler class to process TCP/IP requests. This subclass will include the methods that do the essential work of the program.
You will then create an instance of the TCPServer class and give it your RequestHandler subclass. The instance of TCPServer will manage the public socket and all of the basic processing. For each connection, it will create an instance of your subclass of StreamRequestHandler to handle the connection.
Define a RequestHandler. Defining a handler is done by creating a subclass of StreamRequestHandler or BaseRequestHandler and adding a handle() method function. The BaseRequestHandler defines a simple framework that TCPServer can use when data is received on a socket.
Generally, we use a subclass of StreamRequestHandler. This class has methods that create files from the socket. This allows the handle() method function to simply read and write files. Specifically, the superclass will ensure that the variables self.rfile and self.wfile are available.
For example, the echo service runs on port 7. The echo service simply reads the data provided in the socket, and echoes it back to the sender. Many Linux boxes have this service enabled by default. We can build the basic echo handler by creating a subclass of StreamRequestHandler.
#!/usr/bin/env python
"""My Echo"""
import SocketServer

class EchoHandler( SocketServer.StreamRequestHandler ):
    def handle(self):
        # self.request is the socket connected to this client
        data= self.request.recv(1024)
        print "Input: %r" % ( data, )
        self.request.send("Heard: %r\n" % ( data, ) )

# An empty host name listens on all local interfaces, port 7000
server= SocketServer.TCPServer( ("",7000), EchoHandler )
print "Starting Server"
server.serve_forever()
This class can be used by a TCPServer instance to handle requests. In this example, the TCPServer instance named server creates an instance of EchoHandler each time a connection is made on port 7000. The derived socket is given to the handler instance, as the instance variable self.request.
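The echo handler works directly with the socket in self.request. As a rough sketch of the file-based alternative mentioned earlier, the variant below uses self.rfile and self.wfile instead. The class name LineEchoHandler is hypothetical, the import is written so the sketch also runs where the module is named socketserver (its Python 3 name), and the server handles one connection on a free local port so that the whole exchange can run in one script.

```python
import socket
import threading
try:
    import socketserver                   # Python 3 name
except ImportError:
    import SocketServer as socketserver   # Python 2 name

class LineEchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        line = self.rfile.readline()          # one request line, as bytes
        self.wfile.write(b"Heard: " + line)   # the reply keeps the newline

# Port 0 asks the operating system for any free port.
server = socketserver.TCPServer(("127.0.0.1", 0), LineEchoHandler)
worker = threading.Thread(target=server.handle_request)  # one connection only
worker.start()

client = socket.create_connection(server.server_address)
client.sendall(b"Greetings\n")
rfile = client.makefile("rb")
reply = rfile.readline()
rfile.close()
client.close()

worker.join()
server.server_close()
print(repr(reply))
```

Reading a line at a time gives the handler a natural message boundary, which a fixed-size recv() does not.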
A more sophisticated handler might decode input commands and perform unique processing for each command. For example, if we were building an on-line Roulette server, there might be three basic commands: a place bet command, a show status command and a spin the wheel command. There might be additional commands to join a table, chat with other players, perform credit checks, etc.
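One way to sketch this kind of command decoding is a dictionary that maps command words to functions. Everything here is invented for illustration: the command words BET, STAT and SPIN, the OK and ERR reply prefixes, the state dictionary, and the 0 to 36 wheel are assumptions, not part of any real Roulette protocol.

```python
import random

def place_bet(state, args):
    """BET <space> <amount>: record a bet."""
    if len(args) != 2 or not args[1].isdigit():
        return "ERR usage: BET <space> <amount>"
    state["bets"].append((args[0], int(args[1])))
    return "OK bet accepted"

def show_status(state, args):
    """STAT: report how many bets are on the table."""
    return "OK %d bet(s) placed" % len(state["bets"])

def spin_wheel(state, args):
    """SPIN: pick a number and clear the table."""
    result = random.randint(0, 36)   # assumes a single-zero wheel
    state["bets"] = []
    return "OK result %d" % result

COMMANDS = {"BET": place_bet, "STAT": show_status, "SPIN": spin_wheel}

def dispatch(state, line):
    """Decode one command line and return the reply text."""
    words = line.split()
    if not words:
        return "ERR empty command"
    handler = COMMANDS.get(words[0].upper())
    if handler is None:
        return "ERR unknown command %s" % words[0]
    return handler(state, words[1:])

table = {"bets": []}
for request in ("BET red 10", "STAT", "SPIN", "DANCE"):
    print(dispatch(table, request))
```

A handle() method could call a function like dispatch() once for each line it reads from self.rfile, writing each reply to self.wfile.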
Methods of TCPServer. In order to process requests, there are two methods of a TCPServer that are of interest.
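The sketch below assumes that the two methods in question are handle_request(), which processes a single connection, and serve_forever(), which keeps processing connections until shutdown() is called from another thread. The handler class UpperHandler and the helper function ask() are hypothetical names used only for this illustration.

```python
import socket
import threading
try:
    import socketserver                   # Python 3 name
except ImportError:
    import SocketServer as socketserver   # Python 2 name

class UpperHandler(socketserver.StreamRequestHandler):
    """Reply with the request line converted to upper case."""
    def handle(self):
        self.wfile.write(self.rfile.readline().upper())

def ask(address, line):
    """Open a connection, send one line, and return the one-line reply."""
    conn = socket.create_connection(address)
    try:
        conn.sendall(line)
        rfile = conn.makefile("rb")
        try:
            return rfile.readline()
        finally:
            rfile.close()
    finally:
        conn.close()

server = socketserver.TCPServer(("127.0.0.1", 0), UpperHandler)

# handle_request() accepts and processes exactly one connection, then returns.
worker = threading.Thread(target=server.handle_request)
worker.start()
first = ask(server.server_address, b"one\n")
worker.join()

# serve_forever() keeps handling connections until shutdown() is called
# from a different thread.
worker = threading.Thread(target=server.serve_forever)
worker.start()
second = ask(server.server_address, b"two\n")
third = ask(server.server_address, b"three\n")
server.shutdown()
worker.join()
server.server_close()

print(repr(first))
print(repr(second))
print(repr(third))
```

A long-running service would normally call serve_forever() in its main thread; the background threads here exist only so that the client and the server can share one script.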
Generally, HTTP-based web services do almost everything we need; and they do this kind of thing in a simple and standard way. Using sockets is done either to invent something new or to cope with something very old. Using web services is often a better choice than inventing your own protocol.
If you can’t, for some reason, make suitable use of web services, here are some lessons gleaned from reading the Internetworking Requests for Comments (RFCs).
Many protocols involve a request-reply conversational style. The client connects to the server and makes requests. The server replies to each request. Some protocols (for example, FTP) may involve a long conversation. Other protocols (for example, HTTP) involve a single request and a single reply. Many web sites build a page from a series of these request-reply exchanges, but some web sites send a single, tidy response.
Many of the Internet standard requests are short 1- to 4-character commands. The syntax is kept intentionally very simple, using spaces for delimiters. Complex syntax with optional clauses and sophisticated punctuation is often an aid for people. In most web protocols, a sequence of simple commands is used instead of a single, complex statement.
The responses are often 3-digit numbers plus explanatory comments. The application depends on the 3-digit number. The explanatory comments can be written to a log or displayed for a human user. The status numbers are often coded as follows:
| Code | Meaning |
|---|---|
| 1yz | Preliminary reply; more replies will follow. |
| 2yz | Completed. |
| 3yz | More information required. In the case of FTP, this is typically the start of a dialog. In the case of HTTP, it is often a redirect. |
| 4yz | Request not completed; trying again makes sense. This is a transient problem like a deadlock, timeout, or file system problem. In the case of HTTP, this is also used for an authentication problem. |
| 5yz | Request not completed because it’s in error; trying again doesn’t make sense. This is a syntax problem or other error with the request. |
The middle digit within the response provides some additional information.
| Code | Meaning |
|---|---|
| x0z | The response message is syntax-related. |
| x1z | The response message is informational. |
| x2z | The response message is about the connection. |
| x3z | The response message is about accounting or authentication. |
| x5z | The response message is file-system related. |
These codes allow a program to specify multi-part replies using 1yz codes. The status of a client-server dialog is managed with 3yz codes that request additional information. 4yz codes are problems that might get fixed. 5yz codes are problems that can never be fixed (the request doesn’t make sense, has illegal options, etc.)
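A client can act on these digits mechanically. The following sketch turns the two tables into lookups. The function names classify and should_retry, the short labels in the dictionaries, and the sample reply lines are all illustrative.

```python
# First digit: overall outcome of the request.
SEVERITY = {
    "1": "preliminary",
    "2": "completed",
    "3": "more information required",
    "4": "transient failure",
    "5": "permanent failure",
}

# Middle digit: what the message is about.
CATEGORY = {
    "0": "syntax",
    "1": "informational",
    "2": "connection",
    "3": "accounting or authentication",
    "5": "file system",
}

def classify(reply_line):
    """Split 'NNN comment' into (severity, category, comment)."""
    code, _, comment = reply_line.partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError("not a 3-digit status: %r" % (reply_line,))
    return (SEVERITY.get(code[0], "unknown"),
            CATEGORY.get(code[1], "unspecified"),
            comment)

def should_retry(reply_line):
    """Only 4yz replies describe problems that might go away."""
    return reply_line[:1] == "4"

print(classify("550 No such file"))
print(should_retry("421 Service not available"))
```

The program logic depends only on the digits; the comment text is passed through untouched so that it can be logged or shown to a person.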
Note that protocols like FTP (RFC 959) provide a useful convention for handling multi-line replies: the first line has a - after the status number to indicate that additional lines follow; each subsequent line is indented. The final line repeats the status number. This rule allows us to detect the first of many lines, and absorb all lines until the matching status number is read.
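A reader for this convention can be sketched in a few lines. The function name read_reply is hypothetical, and the sample reply text is made up to exercise the rule rather than captured from a real server.

```python
def read_reply(lines):
    """Collect one complete reply from an iterable of text lines.

    Returns (code, texts), where texts holds the comment from each line.
    """
    lines = iter(lines)
    first = next(lines).rstrip("\r\n")
    code, texts = first[:3], [first[4:]]
    if first[3:4] != "-":                  # "NNN text": a single-line reply
        return code, texts
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(code + " "):    # the final line repeats the code
            texts.append(line[4:])
            return code, texts
        texts.append(line.strip())         # an indented continuation line
    raise ValueError("reply ended before the closing %s line" % code)

sample = iter([
    "211-Features supported\r\n",
    " SIZE\r\n",
    " MDTM\r\n",
    "211 End\r\n",
    "200 OK\r\n",
])
print(read_reply(sample))   # consumes the four lines of the 211 reply
print(read_reply(sample))   # then the single-line 200 reply
```

Because the function takes any iterable of lines, the file object returned by a socket's makefile() method, opened in text mode, could be passed to it directly.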