Originally, I sat down to write an article titled:
When the story finally moved into the domain of HTTP, it started to grow bigger. I enjoyed documenting HTTP, but in doing so I was straying badly from what I originally intended to do with the story. So I decided to split the story into parts. In this episode, we shall talk about the evolution of HTTP through the ages.
The Hypertext Transfer Protocol (HTTP) is the protocol of the WWW. It was developed jointly by the W3C (the World Wide Web Consortium, custodian of the WWW) and the Internet Engineering Task Force (IETF) for distributed, collaborative, hypermedia information systems. It was essentially designed to retrieve linked resources. Understanding HTTP is perhaps the most crucial step in understanding how the WWW works. So let’s have a detailed look at HTTP.
The HTTP Protocol
HTTP had a really humble beginning as an application-layer protocol in the TCP/IP suite for transmitting an HTML file. Period. The protocol has undergone several changes since then.
The original protocol, later known as HTTP/0.9, was simplicity itself. The client would connect to the server and issue a GET request to retrieve an HTML document. The overall request-response cycle was as follows:
- Client connects to server on port 80
- Client sends a GET request to retrieve an HTML document in the following format:
GET /hello.html [cr][lf]
- Server responds with the content of the requested HTML file. The response will invariably be an HTML document.
- Once the document is transferred to the client, the server closes the connection. The client may close the connection while transmission is in progress; the server won’t register this as an error.
- If another document is required, the connection process starts over.
There were no verbs other than GET and no request or response headers. It lacked all the fancy features that later versions of HTTP were to see. While most servers can still handle HTTP/0.9, there is no good reason to keep using it.
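To make the single-line request concrete, here is a minimal sketch in Python. The function name build_http09_request is my own invention for illustration; the only thing taken from the description above is the wire format, a GET, a path and a [cr][lf].

```python
def build_http09_request(path: str) -> bytes:
    """Return the one-line HTTP/0.9 request for `path` (hypothetical helper)."""
    # HTTP/0.9 has no version token and no headers:
    # just the verb, the path and a terminating CRLF.
    return f"GET {path}\r\n".encode("ascii")


request = build_http09_request("/hello.html")
print(request)
```

Everything the server sends back after this line is the document itself, which is why the client has to wait for the connection to close to know the transfer is complete.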
If the WWW was to become synonymous with the internet itself, a lot more was needed than just HTML output, and HTTP had to change first. Version 1.0 was a major change. The official RFC for HTTP/1.0, RFC 1945, came out as late as 1996. However, browsers and servers had already moved on to newer ideas long before, so HTTP/1.0 merely seems to be a consolidation of ideas already in practice. The major highlights of version 1.0 included:
- New Verbs: Whereas HTTP/0.9 had just one request format, the new version defined several verbs: GET, POST and HEAD. These verbs extended the ways in which a client can connect to the server and use its functionality. Not only can we read from the server (GET), we can also submit data to it (POST) and fetch just the headers to reduce traffic (HEAD). Some implementations of the time also supported adding content (PUT) and deleting it (DELETE); RFC 1945 lists these only in an appendix, and they were standardised, together with the debugging verb TRACE, in HTTP/1.1. More on these verbs later.
- Request Header: The request is no longer limited to a verb line. It can also carry a number of key-value pairs, each sent on a separate line (ending with [cr][lf]). The verb line and the key-value pairs together are described as the Request Header. The request header ends with an empty line, which signals the server to start processing. While quite a few request headers are standardized, the basic idea is to supply extra information so that the server can serve the request more effectively. Examples of this approach:
- A user agent specification can help the server send a different version of a document depending on who is requesting it.
- We can specify the date of the cached version of a page, and the server may send new content only if it has changed since then.
- HTTP Response: Once the server is ready with a response, good or bad, it sends it in three parts:
- Status Code: A numeric code that sums up the server’s response.
- 200 OK: The most favoured response. It means the request is granted and the server is going to send a proper response.
- 1xx: Informational responses. They may mean that the request is still being processed or that more input is needed; the request is far from complete.
- 2xx: The request was received, understood and accepted. 200 OK means granted; 201 Created means a request to create a resource succeeded (recall PUT).
- 3xx: Redirection. 301 indicates that the resource has moved permanently to a new location, 302 indicates a temporary redirection, and 304 indicates that the content has not been modified since the date sent in the request.
- 4xx: Request error. A badly formatted request, a resource that was not found or an unauthorized request falls in this category.
- 5xx: Server-side error. This may be caused by a problem on the server, such as a configuration issue, or the server being busy or unavailable.
- Response Header: The response header is meta information about the actual content that follows. It typically includes the date the content was last changed, the length of the content and the type of content (html, image, zip etc.). The response header ends with an empty line, which acts as a separator between the response header and the actual content.
- Actual Content: The server sends the actual content right after the response header. The client typically collects and saves it.
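The request and response formats described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names (build_request, parse_response, status_class), the user agent string and the sample response are all made up, and the parser ignores details such as repeated or folded headers.

```python
def build_request(method: str, path: str, headers: dict) -> bytes:
    """Build an HTTP/1.0 request header: verb line, key-value lines, empty line."""
    lines = [f"{method} {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; the extra CRLF is the empty line
    # that tells the server the request header is complete.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")


def parse_response(raw: bytes):
    """Split a response into (status code, reason, headers, content)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # empty line separates header and content
    status_line, *header_lines = head.decode("latin-1").split("\r\n")
    parts = status_line.split(" ", 2)           # e.g. "HTTP/1.0 200 OK"
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers, body


STATUS_CLASSES = {1: "informational", 2: "success", 3: "redirection",
                  4: "request error", 5: "server error"}


def status_class(code: int) -> str:
    """Map a status code to its 1xx..5xx family."""
    return STATUS_CLASSES.get(code // 100, "unknown")


request = build_request("GET", "/hello.html", {
    "User-Agent": "ExampleBrowser/1.0",                    # hypothetical client name
    "If-Modified-Since": "Sat, 29 Oct 1994 19:43:31 GMT",  # date of our cached copy
})
sample = (b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n"
          b"Content-Length: 13\r\n\r\n<p>Hello</p>\n")
code, reason, headers, body = parse_response(sample)
print(request.decode("latin-1"))
print(code, reason, status_class(code), headers, body)
```

Header names are lower-cased in the parsed dictionary because HTTP treats them as case-insensitive; that is a convenience of this sketch, not something the protocol requires of a client.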
How HTTP/1.0 Interaction works
- Client connects to server on port 80 (typically)
- Client sends the Request Header, including the verb.
- Client sends an empty line to indicate end of request.
- Server checks the request.
- Server sends status code
- Server sends Response Header
- Server sends an empty line
- Server sends the actual content
- Server terminates the connection.
- Any related or connected resource needs to be requested by starting a new connection.
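The whole exchange can be tried out without touching the network. The sketch below is a toy under stated assumptions: socket.socketpair() stands in for the real TCP connection to port 80, serve_once is a made-up one-request server and the document table is invented. What it does show faithfully is the order of the steps above, ending with the server closing the connection.

```python
import socket
import threading


def serve_once(conn: socket.socket, documents: dict) -> None:
    """Toy HTTP/1.0 server: answer one request, then close the connection."""
    data = b""
    # Read until the empty line that ends the request header.
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(1024)
        if not chunk:
            break
        data += chunk
    request_line = data.split(b"\r\n", 1)[0].decode("latin-1")
    parts = request_line.split(" ")
    path = parts[1] if len(parts) >= 2 else ""
    body = documents.get(path)
    if body is None:
        status, body = "404 Not Found", b"not found"
    else:
        status = "200 OK"
    # Status line, response header, empty line, then the actual content.
    head = f"HTTP/1.0 {status}\r\nContent-Length: {len(body)}\r\n\r\n"
    conn.sendall(head.encode("latin-1") + body)
    conn.close()  # HTTP/1.0: the server terminates the connection


def fetch(conn: socket.socket, path: str):
    """Send one GET request and read until the server closes the connection."""
    conn.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("latin-1"))
    raw = b""
    while True:
        chunk = conn.recv(1024)
        if not chunk:  # an empty read means the server has disconnected
            break
        raw += chunk
    conn.close()
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("latin-1")
    return status_line, body


def demo(path: str):
    """Run one request-response cycle over an in-process socket pair."""
    client, server = socket.socketpair()
    documents = {"/hello.html": b"<p>Hello</p>"}  # hypothetical content
    worker = threading.Thread(target=serve_once, args=(server, documents))
    worker.start()
    result = fetch(client, path)
    worker.join()
    return result


print(demo("/hello.html"))
```

Note that demo creates a fresh socket pair on every call, which mirrors the last step in the list: each additional resource costs a brand new connection.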
HTTP/1.1, the latest version, was released as RFC 2068 in 1997 and later revised as RFC 2616 in 1999. Although it is now almost a decade old, it added several new provisions to the already stable HTTP/1.0:
- New Verbs: OPTIONS, TRACE, DELETE, PUT
- Host name Identification: HTTP/1.1 allows the host to be identified. Prior to this version, the server never knew the host name. This provision paved the way for sharing a single IP address among various hosts, because the server can now send different content depending on the Host provided.
- Content Negotiation: HTTP/1.1 allows the server to maintain different versions of a single resource. For example, a document might be available in English as well as Hindi, as a PDF or a Word document, or in a desktop version and a PDA version. Further, HTTP allows this negotiation to be either server driven or agent driven (client or browser).
- Server Driven Negotiation: Here the server decides or guesses the right version depending on the OS, the browser version, country information based on the client IP and other information provided in the request header.
- Agent Driven Negotiation: Here the browser itself selects the resource version or type it requires, using a suitable request header, and the server doesn’t decide.
- Persistent Connections: One of the major issues with the previous versions of HTTP was that the server closed the connection as soon as the response was sent. To display a web page, a browser usually needs more than just the HTML page, and for every other connected document (.css, .js, images, applets etc.) it had to create a new connection to the server. Setting up these connections takes considerable time. With HTTP/1.1 you can ask the server to keep the connection alive. Earlier this was implemented by sending a special request header, Connection: Keep-Alive.
However, with the official specification out, persistence is the default behaviour of an HTTP/1.1 server, and the connection is now closed either on timeout or when an explicit header requests it.
- Chunk Transfer: This allows the server to send the content as a series of chunks, each prefixed with its size, optionally followed by additional headers after the last chunk. This is a good solution when the content is dynamic and its size is not known beforehand.
- Byte Range: Byte ranges allow the client to request a document in parts. This seems to be one of the biggest advantages, in the sense that it allows us to resume broken downloads and also to download faster by fetching several parts in parallel. (Check out this link for an example of Byte Range download).
- Other Changes: Several other changes were made to make HTTP/1.1 a better protocol.
- Proxy and cache management: Gives better control over who accesses the document and how.
- Digest Authentication
- More Response Codes
- New Headers: It introduced new headers such as Retry-After and Max-Forwards.
- New Media type: message/http, multipart/byteranges
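Of the features above, chunked transfer is the one that is easiest to misread, so here is a minimal sketch of how a chunked body is put together and taken apart. The function names encode_chunked and decode_chunked are my own, and the decoder is simplified: it assumes the whole body is already in memory and it ignores any trailing headers after the last chunk.

```python
def encode_chunked(parts) -> bytes:
    """Encode an iterable of byte strings as a chunked message body."""
    out = b""
    for part in parts:
        if part:  # a zero-length chunk would signal the end, so skip empties
            # Each chunk: size in hexadecimal, CRLF, the data, CRLF.
            out += f"{len(part):x}\r\n".encode("ascii") + part + b"\r\n"
    # A chunk of size 0 marks the end of the body (no trailers in this sketch).
    return out + b"0\r\n\r\n"


def decode_chunked(raw: bytes) -> bytes:
    """Reassemble the content from a complete chunked message body."""
    body = b""
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The size line may carry ";extensions" after the hex size; drop them.
        size_text = raw[pos:line_end].split(b";", 1)[0].decode("ascii").strip()
        size = int(size_text, 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; trailing headers, if any, are ignored here
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and the CRLF that follows it
    return body


wire = encode_chunked([b"Hello", b", world"])
print(wire)
print(decode_chunked(wire))
```

Because every chunk announces its own length, the server can start sending before it knows the total size, and the connection can stay open afterwards for the next request.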
So Where do we stand today
Let us sum up where we stand today and what the latest innovations in the HTTP protocol mean in practice.
- Non-IP virtual Hosts
Virtual hosts can be used without needing additional IP addresses.
- Content Negotiation means more content types and better selection
Using content negotiation means that resources can be stored in various formats, and the browser automatically gets the ‘best’ one (e.g. the correct language). If a best match cannot be determined, the browser or server can offer a list of choices to the user.
- Faster Response
Persistent connections will mean that accessing pages with inline or embedded documents should be quicker.
- Better handling of interrupted downloads
The ability to request byte ranges will let browsers continue interrupted downloads.
- Better Behaviour and Performance from Caches
Caches will be able to use persistent connections to increase performance when talking to both browsers and servers. Use of conditionals and content negotiation will mean caches can identify responses more quickly.
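Two of the points above, virtual hosts and resumed downloads, boil down to a single header each. The sketch below is only an illustration: select_site, resume_header, the host names and the directory paths are all hypothetical, and the host parsing assumes a plain name with an optional port rather than an IPv6 literal.

```python
def select_site(request_headers: dict, sites: dict, default: str) -> str:
    """Pick the content root for a request from its Host header."""
    host = request_headers.get("Host", "")
    # Drop an optional ":port" suffix and compare case-insensitively.
    name = host.split(":", 1)[0].lower()
    return sites.get(name, default)


def resume_header(bytes_already_saved: int) -> dict:
    """Build the request header that asks for the rest of a partial download."""
    # "bytes=N-" means: everything from offset N to the end of the document.
    return {"Range": f"bytes={bytes_already_saved}-"}


# Two hypothetical sites sharing one IP address.
sites = {"blog.example.com": "/srv/blog", "shop.example.com": "/srv/shop"}
print(select_site({"Host": "Blog.Example.com:8080"}, sites, "/srv/default"))
print(resume_header(1024))
```

A server that honours the Range request answers with status 206 Partial Content instead of 200, so a client should check the status code before appending the received bytes to the file it already has.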