story of www – between http and html

Long Long ago (well, just two decades to be precise) when Google didn’t exist, hotmail was not thought of and domains were free (yes free!!!), there were few first class citizens of the web – Archie, Veronica and Jughead who played around Gopher and FTP playgrounds (networks). People exchanged emails and finger pointed at the online people and machines. They called it Internet. It was a simple world and every body had just one role. People used internet using ftp and telnet, exchanged mails using pop and smtp and found the status information using finger. (if you want to check more about this fascinating era have a look at  Internet Life before www)

Then came the www. It was destined not just to change the internet for ever; but also to push most of the older protocol to the books of history. Gopher was forgotten; Archie, Veronica, Jughead returned back to the pages of comic. Other protocols such as ftp, pop, smtp remained active but became an alternative approach of what was actually their primary roles. Steadily http redefined internet as the world wide web. It all started with one server and one browser.

The first http server was named httpd. It was developed by CERN. Ever since, many web servers name their main executable as httpd. Later, it was taken over by software development group of NCSA (National Centre of Supercomputing Application). NCSA also developed Mosaic, the browser that revolutionized the internet. Later the same team that developed Mosaic went on writing Netscape.

 

www started for publishing hyperlinked information to the web so that many people can access the information at one central location rather than exchanging using e-mails.

 

There were three core elements that redefined internet as www:

 

Core Elements of WWW

1. HTML

The first public document on the HTML specification was titled HTML Tags in late 1991. In this document, Tim Berners-Lee  described 20 elements (tags) for describing a simple html document. Thirteen of those 20 elements still exists in HTML. The idea of a hypertext system for internet was an direct adaption of Tim’s own 1980 prototype called ENQUIRE implemented for CERN researchers for sharing documents. Later in year 1989, Robert Cailliau, another CERN Engineer also proposed a hypertext system for similar functionality. In year 1990, they collaborated on a joint project – World Wide Web (W3) which was accepted by CERN. In 1993, HTML was officially declared as an application of SGML and necessary DTD were created. Over years HTML would undergo a lot more change such as acknowledging custom tag to display inline image with the text. Versions of HTML and specification would come. However all those additions concentrated on formatting the document. Ironically the Hypertext Mark up Language lacked on of the most fundamental elements of a language – ability to express and decide. It has no concept of variables, loops conditions and so on…

 

2. A Browser

The credit to popularise the idea of WWW goes to a large extent to Mosaic, often credited as the first graphical browser (at least the first real popular one). It was Mosaic that proposed the idea of  displaying image inline with text. The browser was released in the year 1993 and was officially discontinued in the year 1997. However, the road was set for newer browsers that will continue to struggle for their supremacy.

 

3.  HTTP

HTTP started as a humble transmission protocol to transmit the HTML page across the web. Client would simply connect to internet and send a simple command –

GET /welcome.html

And server would respond with an HTML content. Disconnects. Period. There is no other option in request; no other possibility of a response. It was meant to pull textual (html) content from web pages which will be hyperlinked together. No images, no other content. Just HTML. But it was supposed to change soon.

Because pictures have been integral part of literature and content, soon HTTP grew more flexible and more complex. HTML became just one of the contents that can be sent. More Verbs, more control. It underwent several changes from its inception (version 0.9) to its current release (version 1.1). A detailed discussion on HTTP through ages is available here. However, a brief account of what is available in the current release of HTTP is as follows:

  • Allows a range of requests apart from the original GET. The other allowed include POST, HEAD,PUT,DELETE etc.
  • Request and Response can be contents other than HTML page. In fact it can be almost anything that can pass over the wire.
  • Connection can be persistent making HTTP faster.
  • Transfer can be in chunks and byte range; allowing resuming downloads.
  • Better support for proxies.

 

www – Is it really the next generation ready?

 

Its important to realize that neither www nor http were designed to carry out the task future had in store for them.

It appears that the whole idea was designed for a publishing industry. The idea was to publish hypertext documents which would typically be an electronic version of the text books. Just like the real books the chances of revisions are likely to be far and wide and when they come a new version can either replace the old one or perhaps can be published at a new location. The content would be mostly read only and you really don’t need much of programming in it. Another advantage of such publication was to reduce transfer of information to every body using emails that seem to be only alternative pre www.

 

This idea is clear from the design of both HTML and HTTP. Let us have a look at the design once again from a different perspective.

  • HTML is no programming language. It doesn’t have the ability to decide or loop. And the new world changes more frequently than a text book. Users interaction often needed validations, conditions. Strangely all those things are completely unavailable in HTML.
  • HTTP is connection less. In simple terms it means.
    • It doesn’t have any built in ability to remember what it served last.
    • There is no direct way to co-relate between different requests made by an user agent.
    • It can’t distinguish weather the requests are made by the same user agent or different ones.
    • There is no required sequence in which requests are supposed to me made.

However, world needed to change. Perhaps the very first dynamic requirement was for searching the web.

When the need of a search engine was felt; the first proposed solution was to modify the web server to add this functionality. Before late it was clear that such a solution is not only troublesome but also doesn’t fit in the big picture. As there could just be so many other requirement that will necessitate a similar change to server. A good solution would be to allow separate applications to run an assist server.

So there we stand. We have a technology that is supposed to bear the load of future generation of programming. We stood on the threshold of new generation of programming system – Web Programming. Desktop seem to be the way of past. And our best bet is www. And the two main components of www, namely html and http, are just not fit enough. Unfortunately its too late now to turn back or consider alternatives. To conclude:

Web programming is all about making www the platform of future generation of programming system. HTTP and HTML are not just ready to shoulder the responsibility and yet there is no real alternative other than to perhaps hack into it.

We will look at the story of how www transformed from a publishing protocol to almighty web application platform in our next instalment titled HTTP Hacked

Leave a Reply

Your email address will not be published. Required fields are marked *