Web Client Programming

April 24, 2010: We are pleased to announce that Version 4 of this course is now under development. For updates and an early peek at the content, please check out the Software Carpentry blog at http://www.software-carpentry.org/blog/.

1) Introduction

2) You Can Skip This Lecture If...

3) Small Pieces, Loosely Joined

4) Distributed Is Different

5) Partial Failure

6) Under the Hood

7) Sockets

Figure 22.1: Sockets

8) Client/Server vs. Peer-to-Peer

9) Socket Client

import socket

buffer_size = 1024     # bytes
host = '127.0.0.1'     # local machine
port = 19073           # hope nobody else is using it...
message = b'ping!'     # what to send (sockets carry bytes, not text)

# AF_INET means 'Internet socket'.
# SOCK_STREAM means 'TCP'.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))

# Send the message; sendall() keeps going until every byte has been sent.
sock.sendall(message)

# Receive and display the reply.
data = sock.recv(buffer_size)
print('client received', repr(data.decode()))

# Tidy up.
sock.close()

client received 'pong!'

10) Socket Server

import socket

buffer_size = 1024     # bytes
host = ''              # empty string means 'all interfaces on this machine'
port = 19073           # must agree with client

# Create and bind a socket.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((host, port))

# Wait for a connection request.
# The argument to listen() is the number of pending connections to queue.
s.listen(1)
sock, addr = s.accept()
print('Connected by', addr)

# Receive and display a message.
data = sock.recv(buffer_size)
print('server saw', data.decode())

# Replace 'i' with 'o' in the reply, so 'ping!' becomes 'pong!'.
data = data.replace(b'i', b'o')
sock.sendall(data)

# Tidy up both the connection and the listening socket.
sock.close()
s.close()

Connected by ('127.0.0.1', 1297)
server saw ping!
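
The client and server above normally run in two separate terminals. As a rough sketch of the same exchange in a single Python 3 process, the server side can run in a background thread. The helper names serve_once and ping are illustrative only, and binding to port 0 (so the operating system picks a free port) is an assumption made here to avoid clashes; the original listings use the fixed port 19073.

```python
import socket
import threading

def serve_once(listener):
    """Accept one connection, swap 'i' for 'o', and send the result back."""
    conn, addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.replace(b'i', b'o'))

def ping(port, message=b'ping!'):
    """Connect to the local server, send a message, and return the reply."""
    with socket.create_connection(('127.0.0.1', port)) as sock:
        sock.sendall(message)
        return sock.recv(1024)

# Bind to port 0 so the operating system chooses a free port (an assumption
# of this sketch; the listings above hard-code the port instead).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))
listener.listen(1)
port = listener.getsockname()[1]

# Run the server in the background, then act as the client.
server = threading.Thread(target=serve_once, args=(listener,))
server.start()
reply = ping(port)
server.join()
listener.close()

print('client received', repr(reply.decode()))
```

The listening socket is created before the thread starts, so the client cannot try to connect before the server is ready to accept.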

11) The Hypertext Transfer Protocol

Figure 22.2: HTTP Request Cycle

12) HTTP Request Line

13) Headers

14) Body

15) HTTP Response

Figure 22.4: HTTP Response

16) HTTP Response Codes

Code  Name                   Meaning
100   Continue               Client should continue sending data
200   OK                     The request has succeeded
204   No Content             The server has completed the request, but doesn't need to return any data
301   Moved Permanently      The requested resource has moved to a new permanent location
307   Temporary Redirect     The requested resource is temporarily at a different location
400   Bad Request            The request is badly formatted
401   Unauthorized           The request requires authentication
404   Not Found              The requested resource could not be found
408   Request Timeout        The server gave up waiting for the client
500   Internal Server Error  An error occurred in the server that prevented it from fulfilling the request
601   Connection Timed Out   The server did not respond before the connection timed out (non-standard: HTTP itself only defines codes from 100 to 599)

Table 22.1: HTTP Response Codes
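
The first digit of a response code gives its general class: 1xx is informational, 2xx is success, 3xx is redirection, 4xx is a client error, and 5xx is a server error. A small Python 3 sketch of that rule follows. The function name describe and the CLASSES dictionary are illustrative, and the standard names are looked up in the standard library's http.HTTPStatus rather than taken from the table.

```python
from http import HTTPStatus

# Map the first digit of a status code to its class.
CLASSES = {
    1: 'informational',
    2: 'success',
    3: 'redirection',
    4: 'client error',
    5: 'server error',
}

def describe(code):
    """Return (class, standard name); the name is None if HTTPStatus does not know the code."""
    kind = CLASSES.get(code // 100, 'unknown')
    try:
        name = HTTPStatus(code).phrase
    except ValueError:
        name = None
    return kind, name

for code in (200, 301, 404, 500):
    kind, name = describe(code)
    print(code, name, '(' + kind + ')')
```

A client can often decide what to do from the class alone, for example retrying later on a 5xx response but not on a 4xx one.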

17) HTTP Example

import socket

buffer_size = 1024

# Every HTTP line ends with carriage return + line feed,
# and a blank line marks the end of the headers.
http_request = b"GET /greeting.html HTTP/1.0\r\n\r\n"

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('www.third-bit.com', 80))

sock.sendall(http_request)

# Keep reading until the server closes the connection.
response = b''
while True:
    data = sock.recv(buffer_size)
    if not data:
        break
    response += data
sock.close()

print(response.decode())

HTTP/1.1 200 OK
Date: Fri, 03 Mar 2006 18:12:55 GMT
Server: Apache/2.0.54 (Debian GNU/Linux)
Last-Modified: Fri, 03 Mar 2006 18:12:23 GMT
Content-Length: 92
Content-Type: text/html

<html>
<head><title>Greeting Page</title></head>
<body>
<h1>Greetings!</h1>
</body>
</html>
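
The response shown above has three parts: a status line, a block of headers, and a body, with a blank line separating the headers from the body. As a minimal Python 3 sketch of how a client might pull those parts apart, the code below works on a hard-coded copy of that response, so no network access is needed. The function name parse_response is illustrative, and real responses have complications (repeated headers, chunked bodies) that this sketch ignores.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, code, reason, headers, body)."""
    # The first blank line separates the headers from the body.
    head, _, body = raw.partition(b'\r\n\r\n')
    lines = head.decode('iso-8859-1').split('\r\n')

    # Status line: protocol version, numeric code, human-readable reason.
    version, code, reason = lines[0].split(' ', 2)

    # Header names are case-insensitive, so store them in lower case.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

# A copy of the response shown above (some headers omitted for brevity).
raw = (b'HTTP/1.1 200 OK\r\n'
       b'Content-Length: 92\r\n'
       b'Content-Type: text/html\r\n'
       b'\r\n'
       b'<html>\n'
       b'<head><title>Greeting Page</title></head>\n'
       b'<body>\n'
       b'<h1>Greetings!</h1>\n'
       b'</body>\n'
       b'</html>\n')

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers['content-type'])
print(len(body), 'bytes in body; Content-Length says', headers['content-length'])
```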

18) Fetching Pages

19) urllib Example

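import urllib.request

instream = urllib.request.urlopen("http://www.third-bit.com/greeting.html")
lines = instream.readlines()
instream.close()
for line in lines:
    # Each line arrives as bytes and already ends with its own newline.
    print(line.decode(), end='')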

20) Building A Spider

import sys, urllib.request, re

# Fetch the page named on the command line.
url = sys.argv[1]
instream = urllib.request.urlopen(url)
page = instream.read().decode('utf-8', errors='replace')
instream.close()

# Find every double-quoted href attribute.
links = re.findall(r'href="[^"]+"', page)
temp = set()
for x in links:
    x = x[6:-1]    # strip off 'href="' and '"'
    if x.startswith('http://'):
        temp.add(x)

# Print the unique absolute links in sorted order.
for x in sorted(temp):
    print(x)

$ python spider.py http://www.google.ca
http://groups.google.ca/grphp?hl=en&tab=wg&ie=UTF-8
http://news.google.ca/nwshp?hl=en&tab=wn&ie=UTF-8
http://scholar.google.com/schhp?hl=en&tab=ws&ie=UTF-8
http://www.google.ca/fr
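
The regular expression in the spider only matches href attributes written in lower case with double quotes. One possible alternative, sketched here in Python 3, is to let the standard library's html.parser.HTMLParser find the attributes instead. The class name LinkCollector and the sample page are made up for illustration; the page is a hard-coded string so the sketch runs without network access.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the unique absolute http:// links found in href attributes."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases attribute names and strips the quotes.
        for name, value in attrs:
            if name == 'href' and value and value.startswith('http://'):
                self.links.add(value)

# A made-up page with double-quoted, single-quoted, upper-case,
# duplicate, and relative links.
page = '''<html><body>
<a href="http://example.org/b">B</a>
<a href='http://example.org/a'>A</a>
<a HREF="http://example.org/b">B again</a>
<a href="/relative">ignored</a>
</body></html>'''

collector = LinkCollector()
collector.feed(page)
links = sorted(collector.links)
for link in links:
    print(link)
```

Like the spider above, this keeps only links that start with 'http://' and reports each one once.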

21) Passing Parameters

22) Special Characters

23) Encoding Example

import urllib.parse
print(urllib.parse.urlencode({'surname' : 'Von Neumann', 'forename' : 'John'}))

surname=Von+Neumann&forename=John
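
Characters such as '&', '=', and '?' have special meaning in a URL, so they must be escaped when they appear inside a parameter value. The Python 3 sketch below encodes a value containing those characters and then decodes it again with urllib.parse. The extra 'query' parameter and the file name are invented for illustration.

```python
from urllib.parse import urlencode, parse_qs, quote, unquote

# Encode parameters, including a value full of special characters.
params = {'surname': 'Von Neumann', 'forename': 'John', 'query': 'a&b=c?'}
encoded = urlencode(params)
print(encoded)

# Decoding recovers the original values (each one wrapped in a list,
# because a parameter name may appear more than once in a query string).
decoded = parse_qs(encoded)
print(decoded['query'])

# quote() escapes a single path component; spaces become %20 rather than '+'.
path = quote('my file#1.html')
print(path)
print(unquote(path))
```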

24) Screen Scraping (And Why Not)

25) Web Services

Figure 22.5: Web Services

26) Example: Amazon

import sys, amazon

# Display author's name nicely.
def prettyName(arg):
    if isinstance(arg, (list, tuple)):
        if len(arg) > 1:
            # Several authors: 'A, B and C'.
            arg = ', '.join(arg[:-1]) + ' and ' + arg[-1]
        else:
            # Zero or one author: no 'and' needed.
            arg = ''.join(arg)
    return arg

if __name__ == '__main__':

    # Get information.
    key, asin = sys.argv[1], sys.argv[2]
    amazon.setLicense(key)
    items = amazon.searchByASIN(asin)

    # Display information.
    for item in items:
        productName = item.ProductName
        ourPrice = item.OurPrice
        authors = prettyName(item.Authors.Author)
        print('%s: %s (%s)' % (authors, productName, ourPrice))

$ python findbook.py 123ABCDEFGHIJKL4MN56 0974514071
Greg Wilson: Data Crunching : Solve Everyday Problems Using Java, Python, and more. ($18.87)

27) Summary