Internet Internationalized

Finally, the Internet is changing, and it’s not a change that we come across every other day. It is the internationalization of the Internet, and we stand to witness history being made. The ICANN chairman, Peter Dengate Thrush, describes the event as the

biggest technical change to the Internet since it was created four decades ago


story of www – between http and html

Long, long ago (well, just two decades, to be precise), when Google didn’t exist, Hotmail had not been thought of and domains were free (yes, free!), there were a few first-class citizens of the net: Archie, Veronica and Jughead, who played around the Gopher and FTP playgrounds (networks). People exchanged emails and pointed a finger at the people and machines that were online. They called it the Internet. It was a simple world and everybody had just one role. People accessed the Internet using ftp and telnet, exchanged mail using pop and smtp and looked up status information using finger. (If you want to know more about this fascinating era, have a look at Internet Life before www.)

Then came the www. It was destined not just to change the Internet for ever, but also to push most of the older protocols into the history books. Gopher was forgotten; Archie, Veronica and Jughead returned to the pages of the comics. Other protocols such as ftp, pop and smtp remained active, but became an alternative way of doing what had once been their primary job. Steadily, http redefined the Internet as the world wide web. It all started with one server and one browser.

The first http server was named httpd. It was developed at CERN. Ever since, many web servers have named their main executable httpd. Later, the software development group of NCSA (National Center for Supercomputing Applications) built its own server, NCSA HTTPd. NCSA also developed Mosaic, the browser that revolutionized the Internet. Later, members of the team that developed Mosaic went on to write Netscape.

 

The www started as a way of publishing hyperlinked information on the web, so that many people could access the information at one central location rather than exchanging it by e-mail.

 

There were three core elements that redefined internet as www:

 

Core Elements of WWW

1. HTML

The first public document on the HTML specification, titled HTML Tags, appeared in late 1991. In this document, Tim Berners-Lee described 20 elements (tags) for describing a simple html document. Thirteen of those 20 elements still exist in HTML. The idea of a hypertext system for the Internet was a direct adaptation of Tim’s own 1980 prototype called ENQUIRE, implemented for CERN researchers to share documents. Later, in 1989, Robert Cailliau, another CERN engineer, also proposed a hypertext system with similar functionality. In 1990, they collaborated on a joint project, World Wide Web (W3), which was accepted by CERN. In 1993, HTML was officially declared an application of SGML and the necessary DTDs were created. Over the years HTML would undergo many more changes, such as accepting a custom tag for displaying images inline with the text. New versions of HTML and its specification would come. However, all those additions concentrated on formatting the document. Ironically, the Hypertext Markup Language lacked one of the most fundamental elements of a language: the ability to express and decide. It has no concept of variables, loops, conditions and so on.

 

2. A Browser

The credit for popularising the idea of the WWW goes to a large extent to Mosaic, often credited as the first graphical browser (or at least the first really popular one). It was Mosaic that proposed the idea of displaying images inline with text. The browser was released in 1993 and was officially discontinued in 1997. However, the road was set for newer browsers that would continue to struggle for supremacy.

 

3.  HTTP

HTTP started as a humble protocol for transmitting HTML pages across the web. A client would simply connect to the server and send a simple command:

GET /welcome.html

And the server would respond with the HTML content, then disconnect. Period. There was no other option in the request and no other possibility of a response. It was meant to pull textual (html) content from web pages that would be hyperlinked together. No images, no other content. Just HTML. But that was about to change.

Because pictures have always been an integral part of literature and content, HTTP soon grew more flexible and more complex. HTML became just one of the content types that can be sent. More verbs, more control. It underwent several changes from its inception (version 0.9) to its current release (version 1.1). A detailed discussion on HTTP through ages is available here. However, a brief account of what is available in the current release of HTTP is as follows:

  • Allows a range of requests apart from the original GET. The others include POST, HEAD, PUT, DELETE etc.
  • Requests and responses can carry content other than an HTML page. In fact it can be almost anything that can pass over the wire.
  • Connections can be persistent, making HTTP faster.
  • Transfer can be in chunks and byte ranges, allowing downloads to be resumed.
  • Better support for proxies.

 

www – Is it really the next generation ready?

 

It’s important to realize that neither www nor http was designed to carry out the task the future had in store for them.

It appears that the whole idea was designed for the publishing industry. The idea was to publish hypertext documents, which would typically be electronic versions of text books. Just as with real books, revisions were likely to be few and far between, and when they came a new version could either replace the old one or perhaps be published at a new location. The content would be mostly read-only and would not need much programming. Another advantage of such publication was that information no longer had to be sent to everybody by email, which seemed to be the only alternative before the www.

 

This idea is clear from the design of both HTML and HTTP. Let us have a look at the design once again from a different perspective.

  • HTML is not a programming language. It doesn’t have the ability to decide or loop. And the new world changes more frequently than a text book. User interaction often needs validations and conditions. Strangely, all those things are completely unavailable in HTML.
  • HTTP is stateless (it is often loosely described as connectionless). In simple terms it means:
    • It doesn’t have any built-in ability to remember what it served last.
    • There is no direct way to correlate different requests made by a user agent.
    • It can’t distinguish whether the requests are made by the same user agent or by different ones.
    • There is no required sequence in which requests are supposed to be made.
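
The consequences listed above can be illustrated with a toy handler. This is only a sketch under stated assumptions, not a real server: `handle_request` and the `DOCUMENTS` table are hypothetical names made up for this illustration, and the "site" is just a dictionary.

```python
# Hypothetical in-memory "site"; the paths and contents are made up.
DOCUMENTS = {
    "/welcome.html": "<html><body>Welcome</body></html>",
    "/about.html": "<html><body>About</body></html>",
}


def handle_request(request_line: str) -> str:
    """Answer one request using nothing but the request itself.

    The function keeps no record of earlier calls, which is the sense in
    which the protocol cannot remember what it served last.
    """
    method, _, path = request_line.strip().partition(" ")
    if method != "GET":
        return "unsupported request"
    return DOCUMENTS.get(path, "document not found")


# Two users, or one user asking twice: the server sees identical input
# and has no way to tell the two cases apart.
first = handle_request("GET /welcome.html")
second = handle_request("GET /welcome.html")
assert first == second

# Requests may arrive in any order; nothing forces /welcome.html first.
print(handle_request("GET /about.html"))
```

Anything that looks like memory (a login, a shopping basket) therefore has to be carried inside each request by the client, since the handler has nowhere of its own to keep it.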

However, the world needed to change. Perhaps the very first dynamic requirement was searching the web.

When the need for a search engine was felt, the first proposed solution was to modify the web server to add this functionality. Before long it was clear that such a solution was not only troublesome but also didn’t fit into the big picture, because there could be so many other requirements that would necessitate a similar change to the server. A good solution would be to allow separate applications to run and assist the server.

So there we stand. We have a technology that is supposed to bear the load of the future generation of programming. We stand on the threshold of a new generation of programming systems: Web Programming. The desktop seems to be the way of the past, and our best bet is the www. Yet the two main components of the www, namely html and http, are just not fit enough. Unfortunately, it’s too late now to turn back or consider alternatives. To conclude:

Web programming is all about making the www the platform for the future generation of programming systems. HTTP and HTML are just not ready to shoulder the responsibility, and yet there is no real alternative other than perhaps to hack into them.

We will look at the story of how the www transformed from a publishing protocol into an almighty web application platform in our next instalment, titled HTTP Hacked.

Http Through Ages

Originally, I sat down to write an article titled:

story of www – between http and html

 

When the story finally moved into the domain of http, it started to grow bigger. I enjoyed documenting http. However, in doing so I was deviating badly from what I originally intended to do with the story. So I decided to split the story into parts. In this episode, we shall talk about the evolution of HTTP through the ages.

 

The Hyper Text Transfer Protocol is the protocol of the WWW. It was developed jointly by the W3C (the custodian of the WWW) and the Internet Engineering Task Force for distributed, collaborative, hypermedia information systems. It was essentially designed to retrieve linked resources. Understanding HTTP is perhaps the most crucial step in understanding how the www works. So let’s have a detailed look at HTTP.

 

The HTTP Protocol

HTTP had a really humble beginning as an application layer protocol in the TCP/IP suite for transmitting an HTML file. Period. The protocol underwent several changes over a period of time:

HTTP/0.9

The protocol was simplicity itself. A client would connect to the server and issue a GET request to retrieve an HTML document. The overall request-response cycle is as follows:

  1. Client connects to server on port 80
  2. Client sends a GET request to retrieve an HTML document in the following format
    GET /hello.html [cr][lf]

  3. Server responds with the content of the requested HTML file. The response will invariably be an HTML document.
  4. Once the document is transferred to the client, the server closes the connection. The client may close the connection while the transmission is in progress; the server won’t register this as an error.
  5. If another document is required, the connection process starts again.

There were no verbs other than GET, and no request or response headers. It lacked all the fancy features that the future of HTTP was to see. While most servers are still capable of handling HTTP/0.9, there is no good reason why it should still be used.
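
The HTTP/0.9 cycle above is small enough to simulate without a network. The following is a rough sketch, not a faithful server: `build_http09_request`, `serve_http09` and the `PAGES` table are hypothetical names, and the HTML error pages are an assumption about how a server might report problems when the protocol has no status codes.

```python
# Hypothetical document store standing in for files on disk.
PAGES = {"/hello.html": b"<html><body>Hello</body></html>"}


def build_http09_request(path: str) -> bytes:
    """Return the single request line HTTP/0.9 allows: GET, a path, CRLF."""
    return f"GET {path}\r\n".encode("ascii")


def serve_http09(request: bytes, pages: dict) -> bytes:
    """Simulate the server side: read one line, reply with raw HTML.

    There is no status line and no header in the reply; in the real
    protocol, closing the connection is the only end-of-document marker.
    """
    line = request.decode("ascii").rstrip("\r\n")
    verb, _, path = line.partition(" ")
    if verb != "GET":
        # Assumption: with no status codes, errors are plain HTML too.
        return b"<html><body>Bad request</body></html>"
    return pages.get(path, b"<html><body>Not found</body></html>")


request = build_http09_request("/hello.html")
print(request)                       # the bytes sent over the wire
print(serve_http09(request, PAGES))  # the whole response is the document
```

Note that the client cannot tell a missing page from a real one except by reading the HTML, which is one of the gaps the status codes of later versions fill.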

 

HTTP/1.0

If the www was to become a synonym for the Internet itself, a lot more was needed than just HTML output. HTTP had to change first. Version 1.0 was a major change. The official RFC for HTTP/1.0, RFC 1945, came out as late as 1996. However, browsers and servers had already moved on to newer ideas long before. HTTP/1.0 merely seems to be a consolidation of ideas already in practice. The major highlights of version 1.0 included:

  • New Verbs: Whereas HTTP/0.9 had just one request format, the new version defined three verbs, GET, POST and HEAD, and its appendix described further ones, such as PUT and DELETE, that some implementations already used. These commands (verbs) in turn extended the ways in which a client can connect to a server and use its functionality. Not only can we read from the server, we can also send data to it (POST), add new content (PUT), delete content (DELETE) and test a resource while reducing traffic (HEAD). More on these verbs later.
  • Request Header: The request is no longer limited to a verb line. It can also carry a number of key-value pairs. Each key-value pair is sent on a separate line (ending with a [cr][lf]). The verb line and the key-value pairs are together described as the Request Header. A Request Header terminates with an empty line. The empty line indicates the end of the request header and signals the server to start processing. While quite a few request headers are standardized, the basic idea is to supply extra information so that the server can serve the request more effectively. Examples of this approach could be:
    • A user agent specification can help the server send a different version of a document depending on who is requesting it.
    • We can specify the date of the cached version of a page, and the server may send new content only if it has changed since that cached version.
  • HTTP Response: Once the server is ready with a response, good or bad, it sends it in three parts:
    • Status Code: A numeric code that sums up the server’s response.
      • 200 OK: The most favoured response. It means the request is granted and the server is going to send a proper response.
      • 1xx: A series of informational responses. They may mean that the request is still being processed or that more input is needed; the request is far from complete.
      • 2xx: Indicates that the request was received, understood and accepted. 200 means OK; 201 indicates that a request to create a resource succeeded (recall PUT).
      • 3xx: Redirection. 301 indicates that the resource has moved permanently to a new location, 302 indicates a temporary redirection, and 304 indicates that the content has not been modified since the date sent in the request.
      • 4xx: Request Error. A badly formatted request, a resource that was not found or an unauthorized request falls into this category.
      • 5xx: Server Side Error. This may be caused by a problem with the server, such as a configuration issue, or by the server being busy or unavailable.
    • Response Header: The response header is typically meta information about the actual content that is supposed to follow. It typically includes the date when the content was changed, the length of the content and the type of the content (html, image, zip etc.). The response header ends with an empty line. The empty line acts as a separator between the Response Header and the actual content.
    • Actual Content: The server sends the actual content right after the Response Header. The client typically collects and saves it.
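
The three parts of a response can be pulled apart with a few lines of code. What follows is a minimal sketch, assuming a complete response is already in memory; `parse_response`, `status_class` and the `SAMPLE` bytes are hypothetical names and values written for this illustration, not taken from any server.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.0 response into (status, reason, headers, body)."""
    # The first empty line separates the header block from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line, for example "HTTP/1.0 200 OK".
    parts = lines[0].split(" ", 2)
    status = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, reason, headers, body


def status_class(status: int) -> str:
    """Map a status code to its family by the first digit."""
    families = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return families.get(status // 100, "unknown")


# A hand-written sample response; the values are illustrative only.
SAMPLE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)

status, reason, headers, body = parse_response(SAMPLE)
print(status, reason, status_class(status))
print(headers["content-type"], len(body))
```

Header names are lower-cased here because they are case-insensitive on the wire; a sketch this small ignores repeated headers and folded header lines.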
How HTTP/1.0 Interaction works
  1. Client connects to server on port 80 (typically)
  2. Client sends the Request Header including the verb.
  3. Client sends an empty line to indicate end of request.
  4. Server checks the request.
  5. Server sends status code
  6. Server sends Response Header
  7. Server sends an empty line
  8. Server sends the actual content
  9. Server terminates the connection.
  10. Any related or connected resource needs to be requested by starting a new connection.
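
The client side of steps 2 and 3 can be sketched as follows. This is an illustration only: `build_request` is a hypothetical helper, the user agent name is made up, and the date is just a sample value in the usual HTTP date format.

```python
def build_request(method: str, path: str, headers: dict) -> bytes:
    """Build an HTTP/1.0 request: verb line, header lines, empty line."""
    lines = [f"{method} {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # The trailing empty line tells the server the request is complete.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "GET",
    "/hello.html",
    {
        "User-Agent": "ExampleBrowser/0.1",  # made-up agent name
        "If-Modified-Since": "Sat, 29 Oct 1994 19:43:31 GMT",
    },
)
print(request.decode("ascii"))
```

With the If-Modified-Since header present, a server that has nothing newer can answer with a 304 status and no content, which is the caching idea mentioned under Request Header.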

HTTP/1.1

The latest version of HTTP, 1.1, was released as RFC 2068 in 1997 and later improved in RFC 2616 in 1999. This latest HTTP, which is itself almost a decade old, added certain new provisions to the already stable HTTP/1.0:

  1. New Verbs: OPTIONS and TRACE are new, CONNECT is reserved for proxies that tunnel connections, and DELETE and PUT, which HTTP/1.0 only described in an appendix, are now part of the standard.
  2. Host Name Identification: HTTP/1.1 allows the identification of the host. Prior to this version, the server never knew the host name. This provision paved the way for sharing a single IP address among various hosts, because the server can now send different content depending on the Host provided.
  3. Content Negotiation: HTTP/1.1 allows the server to maintain different versions of a single resource. For example, a document might be available in English as well as Hindi, as a pdf or a word document, or in a desktop version and a pda version. Further, HTTP allows this negotiation to be either server driven or agent driven (client or browser).
    1. Server Driven Negotiation: Here the server decides or guesses the right version depending on the OS, the browser version, country information based on the client IP and other information provided in the request header.
    2. Agent Driven Negotiation: Here the browser directly requests the version or type of the resource it requires, using a suitable request header, and the server doesn’t decide.
  4. Persistent Connections: One of the major issues with the previous versions of HTTP was that the connection was closed by the server as soon as the response was sent. To display a web page, a browser often needs more than just the html page. For every other connected document (.css, .js, images, applets etc.) it needed to create a new connection to the server, and creating such connections is quite time consuming. With HTTP/1.1 you can ask the server to keep the connection alive. Earlier this was implemented by sending a special request header in this format:
    connection:keep-alive [cr][lf]

    However, with the official documentation out, a persistent connection is the default behaviour for an HTTP server, and the connection is now closed either on time out or by sending an explicit header requesting it:

    connection:close [cr][lf]

  5. Chunk Transfer: This allows the server to send the content as a series of chunks, each preceded by its size, optionally followed by additional headers at the end. This is a good solution when the content is dynamic and its size is not known beforehand.
  6. Byte Range: Byte ranges allow the user to request a document in parts. This seems to be one of the biggest advantages, in the sense that it allows us to resume broken downloads and also to download faster by fetching several parts in parallel threads. (Check out this link for an example of Byte Range download).
  7. Other Changes: There are several other changes made to make HTTP/1.1 a better protocol.
    1. Proxy and cache management: Gives better control over who accesses the document and how.
    2. Digest Authentication
    3. More Response Codes
    4. New Headers: New headers such as Retry-After and Max-Forwards were introduced.
    5. New Media type: message/http, multipart/byteranges
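
Of the provisions above, chunked transfer is the one whose wire format is easiest to show in code. The sketch below reassembles a chunked body that is already fully in memory; `decode_chunked` and the `SAMPLE` bytes are hypothetical, and chunk extensions and trailing headers are deliberately ignored to keep it short.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked body into the original content.

    Each chunk is "<size in hex>CRLF<data>CRLF"; a chunk of size 0 ends
    the body. Anything after the final chunk is ignored in this sketch.
    """
    content = b""
    position = 0
    while True:
        line_end = body.index(b"\r\n", position)
        # Drop any ";extension" that may follow the size.
        size_field = body[position:line_end].split(b";")[0]
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            return content
        start = line_end + 2
        content += body[start:start + size]
        position = start + size + 2  # skip the CRLF after the data


# Hand-written sample: two chunks of 5 and 7 bytes, then the end marker.
SAMPLE = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(SAMPLE))
```

Because each chunk announces its own length, the server can start sending before it knows the total size, and the connection can still stay open for the next request.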

So Where do we stand today

 

Let us sum up where we stand today and what the latest innovations in the HTTP protocol mean in practice.

  • Non-IP virtual Hosts
    Virtual hosts can be used without needing additional IP addresses.
  • Content Negotiation means more content types and better selection
    Using content negotiation means that resources can be stored in various formats, and the browser automatically gets the ‘best’ one (e.g. the correct language). If a best match cannot be determined, the browser or server can offer a list of choices to the user.
  • Faster Response
    Persistent connections will mean that accessing pages with inline or embedded documents should be quicker.
  • Better handling of interrupted downloads
    The ability to request byte ranges will let browsers continue interrupted downloads.
  • Better Behaviour and Performance from Caches
    Caches will be able to use persistent connections to increase performance both when talking to browsers and servers. Use of conditionals and content negotiation will mean caches can identify responses quicker.
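
Resuming an interrupted download with a byte range can be sketched in a few lines. This is a simulation under stated assumptions rather than real network code: `range_header` and `serve_range` are hypothetical helpers, the "file" is a short byte string, and only the simplest forms of the Range header are handled.

```python
def range_header(bytes_already_saved: int) -> dict:
    """Header asking the server for everything after the saved part."""
    return {"Range": f"bytes={bytes_already_saved}-"}


def serve_range(content: bytes, header_value: str):
    """Toy server side: return (status, Content-Range value, partial body).

    Only "bytes=<start>-" and "bytes=<start>-<end>" are handled here.
    """
    _unit, _, spec = header_value.partition("=")
    start_text, _, end_text = spec.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else len(content) - 1
    end = min(end, len(content) - 1)
    part = content[start:end + 1]
    # 206 is the "Partial Content" status used for range responses.
    return 206, f"bytes {start}-{end}/{len(content)}", part


FILE = b"0123456789abcdefghij"   # stands in for a file on the server
saved = FILE[:8]                 # the download broke after 8 bytes

headers = range_header(len(saved))
status, content_range, rest = serve_range(FILE, headers["Range"])
print(status, content_range)
assert saved + rest == FILE      # the two parts join into the whole file
```

The same mechanism is what lets a download manager ask for several ranges of one file over separate connections and stitch them together afterwards.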

C – A Brief History

C is quirky, flawed, and an enormous success – Dennis Ritchie

The story of the success of C is often told and retold. C is deemed the mother of all modern programming languages, since almost every surviving modern programming language traces its origin to, or bears a resemblance to, this language. However, there are two points worth noting about this very high-profile mother of modern programming languages. First, the story of C begins, interestingly, with a failure.


Hello World – “Aum” of Programming

It has been quite a long time since I set up my developer blog. Blog name decided. Blog engine installed. Themes selected. Then I changed my CMS engine a couple of times and, needless to add, updated themes a lot more often. What I never did, however, was post anything to my blog. Not because I didn’t have an idea of what I wanted to do with my blog, but for another reason. In fact, I had already collected quite a few interesting things, and some must have gone out of date because of the delay. So why didn’t I post? Well, the answer is ridiculously simple (or would you say simply ridiculous?). I couldn’t decide what the first post should be. Yes, the first post. And then it all struck me last night. And now that I think of it, why didn’t it strike me earlier? Well, now you know. What I really wanted to start with was a “Hello World”. Of course, since time immemorial, it has been our tradition to start with a “Hello World”. So how old exactly is this time immemorial?
