The WWW was born to be just another route to the internet; fate made it the internet.
At its inception it was meant for publishing electronic documents, and the model was quite similar to that of the print industry. The focus was on formatting documents, and documents changed infrequently. Access might require authentication, but authentication was simple and the need for real-time data was non-existent.
As dependence on the internet grew and requirements became more complicated, the need for smarter communication became clear. There were seemingly two options:
- Create a new protocol.
- Hack an existing protocol.
And if your choice was the second, your choice had to be HTTP.
Why hack HTTP and not just create a new protocol?
Here we get into guesswork, although the guesses are likely to be very close to fact. Why guesswork? Because hacks are rarely documented and never systematic.
The first reason, which favours the use of any existing protocol and not just HTTP, is the availability of a large infrastructure that would otherwise have to be created across the globe.
HTTP emphasised presenting formatted information, something that was the common denominator of all programming requirements. Using HTTP as the starting point made sense. The other important reasons that favoured HTTP are:
- Presence of customizable request and response headers.
- Output in presentable format that can support layouts, links, tables etc.
- Availability of forms to get user input.
- HTTP needed a solution for such dynamic requirements anyway.
As we have already discussed, HTTP soon felt the need for a search engine, and the first proposed solution was to modify the HTTP server itself. While the solution worked, it was clearly a less than desirable design and very far from ideal. Not only did it require modification of the server, it was also apparent that this kind of solution would necessitate many more changes to the server for many different reasons.
What we really needed was what we term the open-closed principle: a design that allows other applications to run alongside the server and support it, thus extending its functionality without modifying the server for every business requirement.
Roadmap to http hack
Before we understand the solution, let us quickly picture how a simple HTTP communication works. We will look at four different scenarios of HTTP communication.
Scenario 1: Requesting an html page
The client sends the following request to the server.
GET /profile/vivek.html HTTP/1.1 [cr][lf]
ACCEPT: */*[cr][lf]
[… more headers …]
[cr][lf]
The server responds to the client with a response header and the content.
HTTP/1.1 200 OK [cr][lf]
CONTENT-TYPE: text/html[cr][lf]
CONTENT-LENGTH: 2012[cr][lf]
[… more headers …]
[cr][lf]
[… actual content follows … ][content ends; connection closes]
Notice that the request was for an HTML page and the server responds with a status code of 200, indicating that the request was successful. Next it sends CONTENT-TYPE as text/html, which indicates that the generic type of the output is text and the format of the document is HTML. The header gives the client sufficient information to save the output as a .html file and render it as an HTML document.
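The exchange in this scenario can be sketched in a few lines of Python. This is only an illustration of the message layout, not how any particular browser or server is written: the function names build_request and parse_response and the default host example.com are made up for this sketch, and the Host header is added because HTTP/1.1 requires one.

```python
def build_request(path, host="example.com"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request.

    `host` is a placeholder value chosen for this sketch.
    """
    lines = [
        f"GET {path} HTTP/1.1",   # method, path, then protocol version
        f"Host: {host}",
        "Accept: */*",
        "",                       # the empty line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body
```

Given the response bytes of scenario 1, parse_response would hand back 200, a dictionary in which "content-type" maps to "text/html", and the page content as bytes.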
Scenario 2: Requesting an image
A typical request for an image is no different from the previous one.
GET /profile/vivek.jpg HTTP/1.1 [cr][lf]
ACCEPT: */* [cr][lf]
[… more headers …]
[cr][lf]
The server once again checks for the document and sends a response:
HTTP/1.1 200 OK [cr][lf]
CONTENT-TYPE: image/jpeg [cr][lf]
CONTENT-LENGTH: 42085 [cr][lf]
[… more headers …]
[cr][lf]
[… byte by byte content of image … ]
Notice that this time the server reports the generic content type as image, and the subtype further reveals that the image is a JPEG. The client can therefore save the image in a file with the extension .jpg. The content is rendered as an image using an appropriate viewer.
Scenario 3: Requesting an executable
Once again the request remains the same.
GET /downloads/time-piece.exe HTTP/1.1 [cr][lf]
ACCEPT: */* [cr][lf]
[… more headers …]
[cr][lf]
Here we have placed a request for an executable called time-piece.exe. The server checks for it and responds:
HTTP/1.1 200 OK [cr][lf]
CONTENT-TYPE: application/octet-stream [cr][lf]
CONTENT-LENGTH: 47200 [cr][lf]
[… more headers …]
[cr][lf]
[… byte by byte content of the executable …]
As we can see, this time the MIME type reported is application/octet-stream. This is an indication to the client that the response is arbitrary binary data and is not supposed to be displayed in the browser; it is supposed to be downloaded. Browsers typically present a save option.
Let us look at a fourth scenario. Here the request is made for an image. However, the server sends headers such that the image is downloaded rather than displayed in the browser.
Scenario 4: Request that allows you to download the image
GET /misc/india-map.jpg HTTP/1.1 [cr][lf]
ACCEPT: */* [cr][lf]
[… more headers …]
[cr][lf]
This time the server sends the response in the following manner.
HTTP/1.1 200 OK [cr][lf]
CONTENT-TYPE: application/octet-stream [cr][lf]
CONTENT-LENGTH: 48029 [cr][lf]
[… more headers …]
[cr][lf]
[… actual image content …]
Notice that the request and response proceed in the same way as in scenario 2, except for one change. This time the server describes the content type as application/octet-stream. This is a signal to the client to save the output rather than display it in the browser.
So what is the bottom line?
Clients place a request for a resource. All requests are placed in the same way.
The server response typically includes a CONTENT-TYPE. The client's reaction to the response is typically governed by the CONTENT-TYPE specified in the response header and NOT by the request headers. That also implies that things can change between the request and the response.
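The bottom line can be put into a small Python sketch. It is deliberately simplified and assumes the client looks only at the content type, as in the four scenarios above; real browsers also consider other signals, for example a Content-Disposition header. The function name client_action and the returned strings are invented for this illustration.

```python
def client_action(content_type):
    """Decide what a simplified client does with a response.

    Only the Content-Type header of the RESPONSE is consulted;
    the requested path or file extension plays no role.
    """
    # Drop optional parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    major_type = media_type.split("/")[0]

    if media_type == "text/html":
        return "render as HTML"
    if major_type == "image":
        return "display as image"
    # application/octet-stream and anything unknown: do not display.
    return "offer to save"
```

With this function, the same request for /misc/india-map.jpg leads to "display as image" in scenario 2 and to "offer to save" in scenario 4, purely because the server chose a different content type.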
CGI – The http Hack
Since modifying the server was neither desirable nor practical, NCSA developed the Common Gateway Interface (CGI) specification. CGI soon became a standard way in which applications can interact with the server and generate dynamic content.
CGI is a specification and not an application or a particular programming language. The specification is designed for an application that receives the HTTP request from the server (request details in environment variables, any request body on STDIN) and presents its output on STDOUT. As long as this requirement is met, any application, whether compiled, interpreted or scripted, can act as a CGI program.
So how does it typically work?
- The client requests the CGI application, let us say a binary executable file.
- This time the server, instead of providing the binary executable for download, actually executes the application on the server.
- The CGI application writes its output to STDOUT.
- The output from the CGI application is piped to the HTTP response stream.
- Thus the client gets the output of the executable rather than the executable itself.
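The steps above can be sketched from the server's point of view in Python. This is a rough illustration under several assumptions: the names run_cgi, build_response and DEMO_COMMAND are invented here, the status is always 200, and a tiny inline Python program stands in for the CGI executable. A real server would also set up environment variables and handle errors.

```python
import subprocess
import sys


def run_cgi(command, request_body=b""):
    """Execute a CGI program and capture everything it writes to STDOUT."""
    completed = subprocess.run(
        command,
        input=request_body,        # request body, if any, goes to STDIN
        stdout=subprocess.PIPE,    # capture STDOUT instead of printing it
        check=True,                # raise if the program fails
    )
    return completed.stdout


def build_response(cgi_output):
    """Pipe the CGI output into the response, after a status line.

    The CGI program is expected to emit its own headers (at least
    Content-Type), a blank line, and then the content.
    """
    return b"HTTP/1.1 200 OK\r\n" + cgi_output


# Stand-in for a CGI executable: a one-line program that writes a
# header, a blank line and some content to STDOUT.
DEMO_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'Content-Type: text/plain\\r\\n\\r\\nhello')",
]
```

Calling build_response(run_cgi(DEMO_COMMAND)) yields a complete response whose content is whatever the program printed, which is exactly the point: the client receives the output of the executable, not the executable.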
Let us understand the whole scenario in HTTP request and response format:
GET /whats-the-time.exe HTTP/1.1 [cr][lf]
ACCEPT: */* [cr][lf]
[…other headers…]
[cr][lf]
You will notice that the request header remains unchanged. However, this will work differently this time. The server will actually execute the application whats-the-time.exe on the server. This application will check the current time on the server and print it in HTML format on STDOUT. The server will respond to the client as:
HTTP/1.1 200 OK [cr][lf]
CONTENT-TYPE: text/html [cr][lf]
CONTENT-LENGTH: 1280 [cr][lf]
[… other headers …]
[cr][lf]
[… the STDOUT output of the application …]
Notice that this time we requested a .exe. However, the content type in the response indicates text/html output. The application executes on the server and generates dynamic information, which is sent to the client.
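To make this concrete, here is a minimal sketch of what a program like whats-the-time might do, written in Python rather than as a compiled executable. The function name render_time_page and the exact HTML are assumptions made for illustration; the essential part is that the program prints a content-type header, a blank line, and then HTML on STDOUT.

```python
import sys
from datetime import datetime


def render_time_page(now):
    """Return the CGI output: a header, a blank line, then the HTML."""
    body = (
        "<html><body>"
        f"<p>Server time: {now:%Y-%m-%d %H:%M:%S}</p>"
        "</body></html>"
    )
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The server captures whatever is written here and sends it on
    # to the client as the response.
    sys.stdout.buffer.write(render_time_page(datetime.now()).encode("utf-8"))
```

Every request runs the program again, so each client sees the time at the moment of its own request.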
But how will the server know whether to execute the executable as a CGI program or to offer it as a download? To simplify the issue, a directory was designated to store all the CGI applications. A request for a file in the designated directory is treated as CGI; any other request is treated as an ordinary request.
The first CGI application was written in the C language. The CGI directory was accordingly named cgi-bin, as the CGI program was a binary executable. Ever since, it has been a convention to name the CGI directory cgi-bin.
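The directory rule can be sketched as a single check on the requested path. This is a simplified illustration that assumes the CGI directory sits directly under the site root; the names CGI_DIRECTORY and is_cgi_request are invented for the sketch.

```python
from pathlib import PurePosixPath

CGI_DIRECTORY = "cgi-bin"  # conventional name, assumed to be at the site root


def is_cgi_request(url_path):
    """Return True when the path points into the designated CGI directory."""
    parts = PurePosixPath(url_path).parts
    # For "/cgi-bin/app.exe", parts is ("/", "cgi-bin", "app.exe").
    return len(parts) > 1 and parts[1] == CGI_DIRECTORY
```

Under this rule /cgi-bin/whats-the-time.exe would be executed, while /downloads/time-piece.exe would still be served as a download.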
This kind of application opened a Pandora's box of new possibilities, including:
- User registration and authentication.
- Online data manipulation, query and reporting. This is the most important aspect and the backbone of all e-commerce, B2B and B2C applications.
- Accessing network resources using a dynamic interface.
Beyond CGI
A host of technologies evolved over time, many claiming to be superior to and more efficient than the original CGI. In philosophy, however, they remain very similar to the original idea of CGI:
- All these technologies essentially map a request to some external application.
- The application processes the request, interacts with a database or other server resources and produces the output.
- The output is then sent to the client.
They did, however, differ from CGI in some subtle aspects:
- Most of the newer technologies use scripts rather than an executable. For efficiency, these scripts can be compiled to an intermediate code.
- They use multiple threads rather than multiple processes.
- The respective handlers are typically mapped to a particular extension rather than a particular folder.
The alternative technologies that work on these principles include ASP and ASP.NET from Microsoft, JSP and servlets from the Java world, PHP and a host of other open source technologies. CGI scripts are still written mostly in languages like Perl.
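Mapping handlers to an extension rather than a folder can be sketched as a simple lookup. The table below is purely illustrative: the handler descriptions and the names HANDLERS and handler_for are invented, and real servers configure such mappings in their own ways.

```python
import posixpath

# Hypothetical extension-to-handler table for illustration only.
HANDLERS = {
    ".php": "PHP handler",
    ".jsp": "JSP handler",
    ".asp": "ASP handler",
}


def handler_for(url_path):
    """Pick a handler from the file extension, wherever the file lives."""
    _, extension = posixpath.splitext(url_path)
    return HANDLERS.get(extension.lower(), "static file")
```

Here /shop/cart.php and /admin/report.php both reach the same handler although they sit in different folders, and a request such as /profile/vivek.html falls through to plain file serving.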
Client side hacks
Although the server-side hacks initially served their purpose well, as the load on servers began to increase, the problems with this approach became more than apparent.
- Every decision needs to be taken on the server (HTML does not have decision-making capabilities).
- This caused undue load on the server.
- Round-trip communication between client and server was costly in terms of both time and bandwidth.
- If the technology could support it, many decisions could actually be taken on the client side, where they often make more sense.
- A client-side technology, however, can never replace the need for server-side technology, as several pieces of information are available only at the server end.
The first client-side technology to revolutionize the scenario was the Java applet. A Java applet is essentially an application written in Java that is embedded in a web page in a manner quite similar to a picture. Unlike a picture, however, it is an intelligent application. The browser also needs special plug-in support to understand and execute an applet in addition to HTML.
Soon a number of new technologies offered a similar approach.
Client-side technologies typically require special browser add-ons. These intelligent objects are downloaded by the browser using standard HTTP requests and are stored on the local machine in a temporary cache. Once loaded, these applications are executed with the help of their respective add-ons. Support for these technologies typically depends on the browser and its capacity to be extended.
Popular client-side technologies include:
- Java applet – being the first of its kind.
- Scripts – JavaScript being the most popular. Other scripts include VBScript, JScript and so on.
- Adobe Flash – Originally Macromedia Flash. Offers a rich graphical experience and has continued to rule its world.
- Microsoft Silverlight – A comparatively new entrant that offers functionality similar to Flash. It is backed by the .NET programming languages, which makes it more appealing. However, being a Microsoft technology, its chances of whole-hearted support from the industry look bleak.
Ajax
A wonderful hack which draws on support from both the client side and the server side and gets the best of both worlds. A complete discussion of Ajax is beyond the scope of the current discussion, but it certainly merits one of its own.
Hack and hell
Believe it or not, the two words often go hand in hand. A typical web application is a mix of many different technologies:
- HTML Document
- Server-side scripts merged into the HTML document. These scriptlets are processed by the server-side applications to produce a clean HTML document.
- Client-side scripts. These scripts are executed by the browser or its add-ons.
- Client-side objects. These objects are embedded within the page and displayed. They are not HTML, although they physically sit together with the page.
A web application is typically made up of different components developed in different technologies, and they execute in different environments and on different machines. Their development requires a varied set of expertise, and negotiating between them is often complicated. Deploying and configuring these technologies on different machines is another herculean task.
There are real differences in how different browsers interpret various client-side components such as HTML, CSS, JavaScript etc.
All these things create a real hell for web developers and confusion for end users.
We still dream of seeing a more perfect web world.