Pennyroyal: 2010

Thursday, December 23, 2010

Microsoft Montage experiment

The overview on the official Montage site gives the best description of it, so in their words:

Montage is a shareable, personal, visual album of the web. You are able to design your personal Montage around a topic by adding content that pulls information from a variety of sources including, RSS feeds, Twitter, Bing News, YouTube, video and Bing Images. Your Montage is constantly evolving as you arrange each area with the content of your choice - which is easy, intuitive and fun; and can be on any topic from, movies, sports, to robots.

I found Montage extremely interesting and fun, and it opens up a lot of possibilities: you can design your resume, a portfolio and so on. The best part is that it's so easy to learn that anyone can use it with ease. I did a small experiment and created a montage of "kurt cobain"; the link is:

http://bit.ly/gHhQaQ

Saturday, December 11, 2010

Extract contacts from Gmail using google AuthSub

Google provides a lot of APIs for extracting contacts from your Gmail account (and a lot of other things as well). The easiest method I have found to extract Gmail contacts is AuthSub.

It's not mandatory to register your app/website with Google in order to use AuthSub. However, if you don't, Google will show the user a warning message which doesn't look friendly at all, so it's better to register your site. To do so, go to https://www.google.com/accounts/ManageDomains. There you can add a new domain and manage the domains you have added previously. Keep in mind that in order to use other Google APIs, you'll need to register your site with Google and get an app secret and key.

In order to understand how AuthSub works, look at the following image (taken from Google).

Now, let's get on with the code. You need to tell Google what information you want to extract from the user's account. Google calls this the "scope". For contacts, the scope is https://www.google.com/m8/feeds/. The scope should be URL encoded. When the user has authenticated, Google returns an access token in the URL. Hence, to check whether the user has authenticated, simply check that the URL contains a "token" parameter and that it's not empty, i.e.;

if (isset($_GET['token']) && !empty($_GET['token'])) {

    Authenticated();

} else {

    notAuthenticated();

}

If you look at the figure above, you'll see that the first step is to request an access token. In order to do that, we need to generate an authentication URL. That can be done using the following code;

function notAuthenticated(){

   $returnURL = "http://www.example.com/getGmailContacts.php";

   $GoogleScope = "https://www.google.com/m8/feeds/";

   // the scope must be URL encoded, just like the return URL
   $link = 'https://www.google.com/accounts/AuthSubRequest?scope='.urlencode($GoogleScope);

   $link .= '&session=1&secure=0&next='.urlencode($returnURL);

   echo "<a href='$link'>Click here to authenticate request</a>";

}

The parameters used in the above code are:
session: (optional) Boolean flag indicating whether the one-time-use token may be exchanged for a session token (1) or not (0)
secure: (optional) Boolean flag indicating whether the authorization transaction should issue a secure token (1) or a non-secure token (0). Secure tokens are available to registered applications only.
next: (required) URL the user should be redirected to after a successful login. This value should be a page on the web application site, and can include query parameters

There are some other parameters as well; the details can be seen at: http://code.google.com/apis/accounts/docs/AuthSub.html#AuthSubRequest
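To make the role of each parameter concrete, here is a minimal JavaScript sketch that assembles the same kind of link. The helper name buildAuthSubUrl is my own invention and is not part of any Google library; the endpoint, scope and return URL are the ones used in this post.

```javascript
// Sketch: assemble an AuthSubRequest link from the parameters described above.
// buildAuthSubUrl is a hypothetical helper name, not part of any Google library.
function buildAuthSubUrl(returnUrl, scope) {
  var base = 'https://www.google.com/accounts/AuthSubRequest';
  var params = [
    'scope=' + encodeURIComponent(scope),    // the scope must be URL encoded
    'session=1',                             // token may be exchanged for a session token
    'secure=0',                              // ask for a non-secure token
    'next=' + encodeURIComponent(returnUrl)  // where the user is sent after approving
  ];
  return base + '?' + params.join('&');
}

var link = buildAuthSubUrl(
  'http://www.example.com/getGmailContacts.php',
  'https://www.google.com/m8/feeds/'
);
console.log(link);
```

Encoding both values keeps characters such as ":" and "/" from being misread as part of the outer URL.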

Now the user will be asked to log in to their account and will then be shown a page asking whether to allow the request. Once the user has approved the request, they will be taken back to the "return url" with a "token" parameter in the URL.

We have covered the first 4 points shown in the figure above. Let's see the code for the remaining 2 points.

function Authenticated(){

       $token = $_GET['token'];

       $Contacts = array ();

       $GMailAuthSubUrl = "https://www.google.com/accounts/AuthSubSessionToken";

       $GMailContactsUrl = "https://www.google.com/m8/feeds/contacts/default/full?max-results=1000";

The above code sets up the request for the session token. The "max-results" parameter in $GMailContactsUrl tells Google how many contacts you want to retrieve. As with the Yahoo API, give a very big number here to get all your contacts in one call. Now, let's make the call using curl;

       $headers = array('Authorization: AuthSub token='.$token,
                         'Content-Type: application/x-www-form-urlencoded');

       $cURLHandle = curl_init();
       curl_setopt($cURLHandle, CURLOPT_RETURNTRANSFER, 1);
       curl_setopt($cURLHandle, CURLOPT_TIMEOUT, 60);
       curl_setopt($cURLHandle, CURLOPT_SSL_VERIFYPEER, FALSE);
       curl_setopt($cURLHandle, CURLOPT_URL, $GMailAuthSubUrl);
       curl_setopt($cURLHandle, CURLOPT_HTTPHEADER, $headers);
       $response = curl_exec($cURLHandle);

After making the above call, we will get the session token in the response. We will now use this session token to get the user's contacts.

     $newToken = substr($response, 6);
     $headers = array('Authorization: AuthSub token='.$newToken,
                       'Accept-Charset: utf-8, iso-8859-2, iso-8859-1',
                       'Content-Type: application/x-www-form-urlencoded');

     $cURLHandle = curl_init();
     curl_setopt($cURLHandle, CURLOPT_RETURNTRANSFER, 1);
     curl_setopt($cURLHandle, CURLOPT_TIMEOUT, 60);
     curl_setopt($cURLHandle, CURLOPT_SSL_VERIFYPEER, FALSE);
     curl_setopt($cURLHandle, CURLOPT_URL, $GMailContactsUrl);
     curl_setopt($cURLHandle, CURLOPT_HTTPHEADER, $headers);
     $response = curl_exec($cURLHandle);
     curl_close($cURLHandle);


As you can see, in the above curl call we have called the Gmail contacts URL with the session token. Now, let's get the contacts from the response.

     $namespaceChanged = str_replace("gd:email", "gdemail", $response);
     $retrievedContacts = new SimpleXMLElement($namespaceChanged);

     echo "<ul>";

     if (!empty($retrievedContacts->entry)) {

        foreach ($retrievedContacts->entry as $contact) {

           $email = strip_tags($contact->gdemail['address']);

           $name = strip_tags($contact->title);

           echo "<li>".($name!=""?$name:$email)." ( ".$email." ) </li>";

        }

    }

    echo "</ul>";

}
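The substr($response, 6) step above relies on the response body starting with "Token=". As a slightly more defensive alternative, here is a small JavaScript sketch that reads a body made of "key=value" lines. The sample body is made up, and the exact response format is an assumption on my part, so check it against what Google actually returns.

```javascript
// Sketch: pull named values out of a response body with one "key=value" per line.
// The sample body below is made up; the exact response format is an assumption.
function parseKeyValueBody(body) {
  var result = {};
  body.split(/\r?\n/).forEach(function (line) {
    var pos = line.indexOf('=');
    if (pos > 0) {
      // Split on the first "=" only, so values may contain "=" themselves.
      result[line.slice(0, pos).trim()] = line.slice(pos + 1).trim();
    }
  });
  return result;
}

var sample = 'Token=abc123\nExpiration=20101231T000000Z\n';
var parsed = parseKeyValueBody(sample);
console.log(parsed.Token);
```

Looking the value up by name also avoids picking up a trailing newline or any extra lines that follow the token.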

I hope this post was useful.

Thursday, December 9, 2010

Extract contacts from Yahoo

Whenever you are building a social media website, one very important component is letting the user send invitations to the contacts in their email account. Yahoo is one of the biggest email service providers. In this post, I will give a brief tutorial on how to set up your website/application on Yahoo and how to extract the contacts from a Yahoo account. In the coming posts, I will also discuss how to extract contacts from Windows Live Mail (Hotmail) and Gmail. But let's focus on Yahoo for the time being.

Let's start with the registration of your website/application on Yahoo. The steps are;
1. Go to the "My Projects" section located at https://developer.apps.yahoo.com/projects (you need to log in).
2. Click on "New Project". This will show a popup with two options. If your application is an external website, select "Standard" and click "Continue". Since this tutorial is for web applications, I am assuming that you selected "Standard".
3. On the following screen, select "Web based" under "Kind of application" if it's a website. Then enter the name and description of your website. In "Application URL", enter the URL of the main page of your website. In "Application Domain", enter the domain name of your website. Keep in mind that you have to enter the domain name only, without any path attached to it.
4. Since we are going to extract the user's contacts, we need to attach the corresponding permissions to our API key. Under "Access Scope", select the second option, i.e. "This app requires access to private user data". Once you select it, a list of permissions is shown below. Since we are only interested in the user's contacts, scroll down to the "Yahoo! Contacts" section and select "Read" from the dropdown menu next to it. Then scroll down to the "Relationships" section and select "Read/Write" from the dropdown menu next to it. Then click on "Get an API key".
5. The next page will show you the options for getting your application/website verified. Follow the steps mentioned there to verify your application. You may skip this step; however, doing so will show a nasty warning whenever a user tries to extract contacts through your website.
6. Once done, you'll be shown your application/consumer key and secret. The application ID is shown at the top of the same page, under the application name.

Now the website/application registration is done. Let's move on to the coding part. Yahoo provides various APIs to do the job; I will be using the Yahoo PHP SDK. You can download the PHP SDK from http://developer.yahoo.com/social/sdk/#php. You'll need the files in the "lib" folder of the archive you download.

The code is very simple, though in order to understand it you should be familiar with OAuth. As a first step, you need to include the "Yahoo.inc" file in your code. Make sure that all the files from the "lib" folder are placed in the same folder.

<?php

include_once ("Yahoo.inc");

$APP_KEY = "Your application key comes here";

$APP_SECRET = "Your application secret comes here";

$APP_ID = "Your application ID comes here";

Next, let's see whether the user's session is already initialized.

$isActive = YahooSession::hasSession($APP_KEY, $APP_SECRET, $APP_ID);

If the user's session is not already initialized, we need to initialize it first. To do that, we take the user to the Yahoo login page. Once logged in, the user is shown a screen to authorize access to their contacts from our application, and once the user has authorized, they are taken back to the page of our website which deals with the extraction of contacts. You might be wondering how on earth that will happen. It's actually very simple: Yahoo identifies our application through the app secret, app key and app ID that we pass with the URL. And where should the user return once access is authorized? Again, we pass the return URL to Yahoo. Enough talk, let's see the code;

$returnUrl = "the complete url of the page on which you want to return";

In this case, I'll be returning to this same page. So if the name of this page is getYahooContacts.php and the domain is www.example.com, then the returnUrl should be "http://www.example.com/getYahooContacts.php"

if(!$isActive){

$authorization_url = YahooSession::createAuthorizationUrl($APP_KEY, $APP_SECRET, $returnUrl);

echo "<a href='$authorization_url'>Click here to get yahoo contacts</a>"; 

}else{

/*get and display contacts*/

}

Once the user has authorized the access, we will be brought back to the same page (depending upon the $returnUrl). So, let's now talk about getting the user's contacts.

In order to get the contacts, we first need the session object of the logged-in user, and from it the user object. We can achieve that using the Yahoo API;

$session = YahooSession::requireSession($APP_KEY, $APP_SECRET, $APP_ID);

$user = $session->getSessionedUser();

Now, the built-in function of the Yahoo API, "getContacts($start, $count)", called on the user object, returns the user's contacts. The catch is that it returns "$count" contacts starting from the offset "$start". If you want all of the user's contacts, one way is to loop and keep asking for contacts until you have them all. However, you can use a simple trick: give a very big number as "$count" and 0 as "$start", and you get all the contacts in one single call (as long as the user has no more contacts than the number you pass). Let's see the final code;

$contacts = new stdClass();

$contacts = $user->getContacts(0, 1000);
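For comparison, the "loop until you have everything" approach mentioned above could look roughly like the following JavaScript sketch. Here fetchPage(start, count) is a hypothetical stand-in for a paged call such as getContacts, and the fake data exists only to exercise the loop.

```javascript
// Sketch of the "loop until done" alternative.
// fetchPage(start, count) is a hypothetical stand-in for a paged contacts call;
// it must return an array with at most `count` items.
function fetchAllContacts(fetchPage, pageSize) {
  var all = [];
  var start = 0;
  while (true) {
    var page = fetchPage(start, pageSize);
    all = all.concat(page);
    if (page.length < pageSize) {
      break; // a short (or empty) page means there is nothing left
    }
    start += pageSize;
  }
  return all;
}

// Fake data source with 25 contacts, used only to exercise the loop.
var fakeContacts = [];
for (var i = 0; i < 25; i++) {
  fakeContacts.push('contact' + i);
}
function fakeFetchPage(start, count) {
  return fakeContacts.slice(start, start + count);
}

var everything = fetchAllContacts(fakeFetchPage, 10);
console.log(everything.length);
```

The loop costs more round trips than the "very big number" trick, but it does not silently drop contacts when a user has more than the number you picked.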

The trickiest part is extracting the email addresses from the object Yahoo returns, as it is a fairly complex one. You'd need to spend some time dumping the contents of the entire object to see where the things you require are placed. The following code does that for you, but those of you who want to see what other information is there (and believe me, there is a lot) should dump the entire $contacts object and go through it.

$contactsData = "<ul>"; 

foreach($contacts->contacts->contact as $singleContact){

   $name = "";

   $email = "";

   $yahooid = "";

   foreach($singleContact->fields as $singleField){

      if($singleField->type == "name"){

        $name = $singleField->value->givenName;

        if($singleField->value->familyName != "")

          $name .= " ".$singleField->value->familyName;

      }

      if($singleField->type == "email"){

        $email = $singleField->value;

      }else if($singleField->type == "yahooid"){

        $email = $singleField->value."@yahoo.com";

        $yahooid = $singleField->value;

      }

   }

   if($name == "")

     $name = $yahooid;

   $contactsData .= "<li>" . $email ." ( ". $name . " ) </li>";

}

$contactsData .= "</ul>";

echo $contactsData;
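If the nesting is hard to follow, here is the same field walk as a JavaScript sketch over a hand-written object. The object shape simply mirrors what the PHP loop above reads (fields, type, value, givenName, familyName), the data is made up, and summarizeContact is a name I chose for illustration.

```javascript
// Sketch: same field walk as the PHP loop, over a hand-written object.
// The object shape mirrors what the PHP code above reads; the data is made up.
function summarizeContact(contact) {
  var name = '', email = '', yahooid = '';
  contact.fields.forEach(function (field) {
    if (field.type === 'name') {
      name = field.value.givenName;
      if (field.value.familyName !== '') {
        name += ' ' + field.value.familyName;
      }
    }
    if (field.type === 'email') {
      email = field.value;
    } else if (field.type === 'yahooid') {
      email = field.value + '@yahoo.com';
      yahooid = field.value;
    }
  });
  if (name === '') {
    name = yahooid; // fall back to the Yahoo ID when no name is stored
  }
  return email + ' ( ' + name + ' )';
}

var sampleContact = {
  fields: [
    { type: 'name', value: { givenName: 'Kurt', familyName: 'Cobain' } },
    { type: 'email', value: 'kurt@example.com' }
  ]
};
console.log(summarizeContact(sampleContact));
```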

Hope you find this post helpful

Sunday, November 21, 2010

mooCarousel... a mootools based generic js carousel script

A MooTools-based image carousel. This is a very lightweight and extremely configurable plugin. It supports both horizontal and vertical carousels with multiple options: you can change the background color of the carousel, the animation style, the animation duration, the arrow image themes etc. View it at;

http://saadnawaz.github.com/mooCarousel/#demos

CSS to show code snippet

You can show your code snippets in HTML pages using PRE or CODE tags. Following is a very simple CSS code to display the code in a code snippet.

pre{

border: 1px dashed #B11718;

background: white;

color: black;

font-family: Courier;

font-size: 11pt;

padding: 10px 30px; 

}

If you want to add more styles, then use the widget written by Alex Gorbatchev. The details of how to use the library are given at http://www.bloggersentral.com/2009/04/how-to-show-code-in-blog-post.html

Friday, November 5, 2010

HTML_Template_IT

If you are working on a large web application, you should keep the user interface and the data separate. A very good open source templating package for this purpose is HTML_Template_IT. I found a very nice tutorial on it, so I thought of sharing it. View the following URL to see the tutorial.

http://www.weberdev.com/ViewArticle/Using-PEAR-HTML_Template_IT-For-Modular-Interface-Design

Tuesday, October 26, 2010

image height, width with Chrome

If one wants to get the height and width of an image, the simple syntax would be;

var img = document.getElementById("image");

alert("width = "+img.width);

alert("height = "+img.height);

Now suppose I execute the above JavaScript code on the domready event. It works perfectly fine in Opera, Firefox and even IE; however, it won't work in Chrome. The reason is simple: Chrome fires the domready event before loading the images. So if you place some images and some text in an HTML page, you'll see that domready is fired as soon as the text is loaded. Since the images are not loaded yet, the above code returns 0 for both height and width. The solution is simple: to make sure the code works in Chrome, execute it on window load. Hence, in MooTools, the following won't work in Chrome (it will display 0 for both height and width);

window.addEvent('domready', function(){

        alert("(in domready)width = "+$('image').getStyle('width') + " height = "+ $('image').getStyle('height'));

       //alert("(in domready)width = "+$('image').width + " height = "+ $('image').height);

    });

The following code will work in Chrome, i.e. it gives the proper height and width of the element;

window.addEvent('load', function(){
        alert("(in load)width = "+$('image').getStyle('width') + " height = "+ $('image').getStyle('height'));
        //alert("(in load)width = "+$('image').width + " height = "+ $('image').height);
    });
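If you only care about one image, you don't have to wait for the whole window to load. Here is a minimal plain JavaScript sketch (no MooTools) that waits for the image itself. The helper name whenImageReady is made up; it relies on the standard complete property and load event of image elements, and assumes a browser that supports addEventListener.

```javascript
// Sketch: run a callback once an image's dimensions are known.
// whenImageReady is a hypothetical helper; `img` is any <img> element.
function whenImageReady(img, callback) {
  if (img.complete) {
    // Already loaded (for example, served from cache): sizes are available now.
    callback(img.width, img.height);
  } else {
    // Not loaded yet: wait for the image's own load event.
    img.addEventListener('load', function () {
      callback(img.width, img.height);
    });
  }
}
```

In a page you would call it as whenImageReady(document.getElementById("image"), function (w, h) { alert("width = " + w + " height = " + h); });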

mooPager... a mootools based generic js pagination script

Almost every web project needs to display data in pages. So, I thought of writing a simple yet generic pagination script which can paginate any type of content, including divs, images etc.

Check out the demo at the following URL;

http://saadnawaz.github.com/mooPager/

The source can be downloaded from http://github.com/saadnawaz/mooPager/

Mootools using Events in Class

Following is a simple tutorial on how to use events in a MooTools class. MooTools classes provide a way to use events in your class: simply add "Implements: Events" to your class declaration. For a new class it looks like the following;

var myClass = new Class ({

              Implements: Events,

              initialize: function(elements){

                         //your code

}

});

And if you want to add support for events to an existing class, you can use the following syntax;

myClass.implement(Events);

Now that we have added support for events to the class, it's time to see how to add and fire an event from a class. Let's assume there is a div with id "el" and there are multiple divs inside this particular div. For the time being, we just want to hook an event on the div "el" and have our function fire when the mouse is clicked inside it.

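var clicked = function(divObj){

         // divObj is the element passed along by fireEvent below
         alert(divObj.get('id'));

}

var myClass = new Class ({

              Implements: Events,

              initialize: function(elements){

                         //your code

                         $('el').addEvent('click', this.click.bind(this));

              },

              click: function(event){

                       // event.target is the div on which the mouse was clicked
                       this.fireEvent('click', [event.target]);

              }

});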

window.addEvent('domready', function(){
var obj = new myClass();
obj.addEvent('click', clicked);
});

Now, in the above method, we have passed the parameter divObj (the div on which the mouse was clicked) through fireEvent. If you want to pass more than one parameter to a function through fireEvent, you'll have to use an array, i.e. this.fireEvent('click', [arg1, arg2]);. When the above code is executed, it displays the id of the div on which the mouse was clicked. See the above script in action at the following URL;

http://jsfiddle.net/vCYmk/1/
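To see the addEvent/fireEvent idea without the library, here is a tiny plain JavaScript sketch. TinyEvents is a made-up name and this is not how MooTools implements Events internally; it only illustrates how handlers are stored per event name and how an array of arguments is spread over the handler's parameters.

```javascript
// Sketch of the addEvent / fireEvent idea in plain JavaScript.
// TinyEvents is a made-up name; this is not how MooTools implements Events.
function TinyEvents() {
  this.handlers = {};
}
TinyEvents.prototype.addEvent = function (type, fn) {
  (this.handlers[type] = this.handlers[type] || []).push(fn);
  return this;
};
TinyEvents.prototype.fireEvent = function (type, args) {
  // Accept either a single argument or an array of arguments.
  var list = Array.isArray(args) ? args : (args === undefined ? [] : [args]);
  (this.handlers[type] || []).forEach(function (fn) {
    fn.apply(this, list);
  }, this);
  return this;
};

var seen = [];
var obj = new TinyEvents();
obj.addEvent('click', function (id, index) {
  seen.push(id + ':' + index);
});
obj.fireEvent('click', ['el', 2]); // two arguments travel as an array
console.log(seen);
```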

Monday, October 25, 2010

How to write a Mootools class

Following is a very nice and detailed article on how to write a mootools class. A very good article for beginners.

http://mootorial.com/wiki/mootorial/09-howtowriteamootoolsclass

Thursday, October 21, 2010

Applying same style on multiple elements in mootools

Lately I have been trying to see how to apply the same style to multiple elements using MooTools. MooTools does provide a class for that, known as Fx.Elements. If you want to change only one style, use tween, and if you want to change multiple styles, use morph. The issue is that if you look at the official documentation, you will see that Fx.Elements requires you to know the total number of elements, and you have to mention the index of each one as well. But what if you want to apply the same effect to all the elements and you don't know how many there are? The method to do that is fairly simple. Use the following code;



$$('.simpleBox').morph({

height: 50,

opacity: 0.3

});

where simpleBox is the name of the class applied to all the elements on which you want to apply the style/effect. See the following URL to see it in action;

http://www.jsfiddle.net/xwUFN/2/

Friday, October 15, 2010

Event Delegation with Mootools

Ever been stuck in a situation where you have hundreds of elements and you want to execute the same function for each one of them on some event? One possible solution is simply to go through each and every element and hook the event. However, that performs poorly and doesn't look very elegant. Consider a simple scenario: there are a lot of images on your page and you want to apply a simple effect so that when the mouse is moved over any image, the opacity of the image is set to 1, and when the mouse is moved out, the opacity is set to 0.4. Let's say that the images are contained in a div with the id "parentDiv". Consider the following code, which achieves this in the traditional way;

var increaseOpac = function(image){

     image.set({'opacity': 1}); 

};

var decreaseOpac = function(image){

     image.set({'opacity': 0.4});  

};

$$('#parentDiv > img').each(function(img, index){

   img.addEvent('mouseover', increaseOpac.bind(this, img));

   img.addEvent('mouseout', decreaseOpac.bind(this, img)); 

} );

The above code will work like a charm; however, it's not a good way to do it. This method makes the browser keep track of one pair of handlers per image. And if you want to add/remove elements dynamically, you'll have to hook the events on each newly created element as well. Hence, we should go for event delegation. With event delegation you simply add the event on the parent element and it is delegated down to its children. Now, even if you add a new element dynamically, the event will automatically apply to it as well, since it sits under the same parent. Let's try to achieve the same thing with event delegation;

var increaseOpac = function(event, image){

     image.set({'opacity': 1});  

};

var decreaseOpac = function(event, image){

     image.set({'opacity': 0.4});  

};

window.addEvent('domready', function(){

    $('parentDiv').addEvent('mouseover:relay(img)', increaseOpac);

    $('parentDiv').addEvent('mouseout:relay(img)', decreaseOpac);

});

Now, with event delegation, when the mouse is moved over any image inside parentDiv, the event is fired on the parent but relayed to the element matching the selector, which in our case is "img". Another interesting thing is that we don't need to bind any value: the image over which the mouse is moved is passed to the event handler automatically.

For event delegation in mootools, you'll need to download "mootools more" along with mootools core.
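For the curious, the same idea can be sketched with standard DOM calls instead of the MooTools ":relay" syntax. The helper name delegate is made up, and the sketch assumes a browser that supports addEventListener, Element.closest and Node.contains.

```javascript
// Sketch: event delegation with standard DOM calls instead of MooTools.
// delegate is a hypothetical helper name.
function delegate(parent, type, selector, handler) {
  parent.addEventListener(type, function (event) {
    // Find the nearest ancestor (or the target itself) matching the selector.
    var match = event.target.closest(selector);
    if (match && parent.contains(match)) {
      handler(event, match);
    }
  });
}
```

In a page you would call it as delegate(document.getElementById('parentDiv'), 'mouseover', 'img', function (event, image) { image.style.opacity = 1; }); so a single listener on the parent serves every image, including ones added later.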

Thursday, October 14, 2010

Mootools Function.Bind

There are a lot of cases where you need to pass some value to the function you have hooked on an event. For example, consider an image on which you want to hook an event so that, when the mouse is moved over the image, a value is passed to the function telling it what the opacity of the image should be set to. Consider the following code for this;

var opacVal = 0.9;

$('someImage').addEvent('mousemove', 

                        function(element, opacVal){

                            element.set({'opacity': opacVal});

                        }.bind( this, [$('someImage'), opacVal])

);

However, there is a problem with the above approach. Since we have forcefully bound the parameters of the function, the event object is not passed to it. If you want the event object as well, use the bindWithEvent function of MooTools, which passes the event as the first argument, followed by the bound values. Consider the following code;

var opacVal = 0.9;

$('someImage').addEvent('mousemove', 

                       function(event, element, opacVal){

                          element.set({'opacity': opacVal});

                          event.stop();

                       }.bindWithEvent( this, [$('someImage'), opacVal])

);
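As a side note, the native Function.prototype.bind in newer browsers (not the MooTools version used above) behaves a little differently: the pre-bound arguments come first and whatever the caller passes, such as the event, is appended after them. A minimal sketch with made-up stand-in objects instead of a real element and event:

```javascript
// Sketch using the native Function.prototype.bind (not the MooTools version).
// Pre-bound arguments come first; whatever the caller passes is appended after them.
function setOpacity(element, opacVal, event) {
  element.opacity = opacVal;
  return event; // returned only so the example can show that it arrived
}

var fakeImage = { opacity: 1 };          // stand-in for a real element
var handler = setOpacity.bind(null, fakeImage, 0.9);

var fakeEvent = { type: 'mousemove' };   // stand-in for a real event object
var received = handler(fakeEvent);
console.log(fakeImage.opacity, received.type);
```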

Tuesday, October 12, 2010

Change url dynamically through javascript

If you want to change some part of the URL dynamically through JavaScript without reloading the page, keep in mind that you can't change the complete URL. However, there is a workaround: you can change the anchor part of the URL.

The window.location.hash property sets or returns the value from the anchor part of the url. Hence, if you want to change the value of the anchor, simply use the following syntax;

window.location.hash = "something";

One of the reasons for placing some value in the anchor part of the url dynamically is to make the url shareable. For example, consider you are making a slide show of images. The images are shown on the page without reloading the page and the id of the image being displayed is placed in the anchor part of the url, so that the user may bookmark/share the url and when the url is opened the slide show starts from the image whose id is placed in the url.
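A minimal sketch of the slide show idea follows. The helper names slideIdFromHash and hashForSlide and the "#slide-<number>" format are my own assumptions for illustration; any format works as long as you can parse it back.

```javascript
// Sketch: map the anchor part of a URL to a slide id and back.
// slideIdFromHash / hashForSlide are hypothetical helpers;
// "#slide-<number>" is an assumed format.
function slideIdFromHash(hash, fallback) {
  var match = /^#?slide-(\d+)$/.exec(hash || '');
  return match ? parseInt(match[1], 10) : fallback;
}

function hashForSlide(id) {
  return 'slide-' + id;
}

console.log(slideIdFromHash('#slide-7', 1));
console.log(hashForSlide(7));
```

In a page you would set window.location.hash = hashForSlide(id) whenever the slide changes, and call slideIdFromHash(window.location.hash, 1) on page load to decide where to start.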

Wednesday, October 6, 2010

Oracle sequence and cache option

Let's first look at the syntax for creating a sequence;

CREATE SEQUENCE sequence_name

    MINVALUE value

    MAXVALUE value

    START WITH value

    INCREMENT BY value

    CACHE value;

In order to get the current value of a sequence, we have currval and in order to get the next value of a sequence, we have nextval.

Let's say we create a sequence using the following syntax;

CREATE SEQUENCE pubnum

    MAXVALUE 1000

    START WITH 1

    INCREMENT BY 1;

Now, to get the current and next sequence number in a single query, one might use a query similar to the following;

select pubnum.currval, pubnum.nextval from dual;

However, the above query simply increments the sequence and gives you the incremented value for both the current value and the next value. This shows that we have to use some other mechanism. Fortunately, Oracle exposes the details of all the sequences you have created in a data dictionary view called user_sequences. It has a column named "LAST_NUMBER" which tells you the next number to be generated from the sequence. Now, let's say the last sequence number generated from the pubnum sequence is 2 and we execute the following query;

select last_number from user_sequences where sequence_name = 'PUBNUM';

The above should return 3; however, it returns 21, which is surprising. The reason is hidden in the CREATE statement of the sequence. Remember the CACHE option. It pre-allocates a set of sequence numbers and keeps them in memory so that sequence numbers can be accessed faster. When the last of the sequence numbers in the cache has been used, Oracle reads another set of numbers into the cache. When you don't specify a value for the CACHE option, the default value of 20 is used.

The pre-allocation is the key point: Oracle stores the next 20 numbers in the cache (assuming the cache size is 20). The current value of the sequence was 2; however, because 20 numbers were placed in the cache, last_number in user_sequences gave us 21 instead of 3. The value of last_number will remain 21 until the sequence reaches 20. As soon as the current value of the sequence becomes 21, last_number will point to 41 (the next 20 numbers are prefetched). Hence, if you want this to work, you have to specify the NOCACHE option when creating the sequence. Look at the following updated syntax;

CREATE SEQUENCE pubnum1

    MAXVALUE 1000

    START WITH 1

    INCREMENT BY 1

    NOCACHE;

Let's suppose the current value of the pubnum1 is 3. If you run the following query;

select last_number from user_sequences where sequence_name = 'PUBNUM1';

Now the above will return 4, which is the correct value. So, in order to get the current value and the next value of the sequence using a single query, we can use the following query (the "- 1" relies on the sequence being created with INCREMENT BY 1 and NOCACHE);

select last_number - 1 as "Current value", last_number as "Next Value" from user_sequences where sequence_name = 'PUBNUM1';
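The cache arithmetic above is easy to check with a toy model. The following JavaScript sketch is only a simulation of the numbers discussed in this post, not Oracle's implementation; ToySequence and its fields are made-up names, and it assumes a positive increment.

```javascript
// Toy model of the numbers discussed above; it is not Oracle's implementation.
function ToySequence(start, increment, cacheSize) {
  this.increment = increment;
  this.cacheSize = cacheSize;   // use 1 to model NOCACHE
  this.next = start;            // value the next call to nextval() will return
  this.lastNumber = start;      // what user_sequences.last_number would report
}
ToySequence.prototype.nextval = function () {
  if (this.next >= this.lastNumber) {
    // Cache exhausted: reserve another batch and move last_number past it.
    this.lastNumber = this.next + this.cacheSize * this.increment;
  }
  var value = this.next;
  this.next += this.increment;
  return value;
};

var cached = new ToySequence(1, 1, 20);
cached.nextval();
cached.nextval();                 // current value is now 2
console.log(cached.lastNumber);

var uncached = new ToySequence(1, 1, 1);
uncached.nextval();
uncached.nextval();
uncached.nextval();               // current value is now 3
console.log(uncached.lastNumber);
```

With a cache of 20 the model reports 21 after two calls, and with a cache of 1 it reports 4 after three calls, matching the two queries above.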

Saturday, October 2, 2010

how google, yahoo, bing retrieve results sooo quickly

The other day I was working with multi curl in PHP to retrieve results from Google, Yahoo and Bing. Isn't it surprising that these search engines return results covering millions of documents in less than a second? Everyone thinks it's because of powerful, distributed servers, but there is also a very simple factor: all of the above search engines return only the top 1000 documents. If you try to go beyond that, they just don't allow it. For example, try the following URLs;

http://www.google.com.pk/search?q=nirvana&hl=en&client=firefox-a&hs=v61&rls=org.mozilla:en-US:official&prmd=vli&ei=LCinTP7ENZKiuQPAp8SADQ&start=640&sa=N

http://www.bing.com/search?q=nirvana&go=&qs=n&sk=&sc=4-7&first=1596&FORM=PERE7

http://search.yahoo.com/search;_ylt=A0oGdUnHKKdMFTkAMGhXNyoA?p=nirvana&ei=UTF-8&fr=yfp-t-963&xargs=0&pstart=1&b=1101&xa=cymfuMaNmNDApYTZxBtEDg--,1286109767

In the Google search, "start=640" means return documents starting from the 640th document.
In the Bing search, "first=1596" means return documents starting from the 1596th document.
In the Yahoo search, "b=1101" means return documents starting from the 1101st document.

Now, if you visit the Google URL mentioned above, you'll see that pages beyond 65 are not even visible, and if you try to go beyond that, it simply comes back to the 65th page. Similarly, Yahoo and Bing don't show page links beyond page 100.

This shows that Google doesn't return more than 650 documents if you use "show 10 results per page", and 700 at most if you use "show 100 results per page", whereas Bing and Yahoo both only go up to 1000.

Another interesting thing is that even Google shows an error page if you try to go beyond 1000. Try the following URL;

http://www.google.com.pk/search?q=nirvana&num=100&hl=en&lr=&client=firefox-a&hs=Mqi&rls=org.mozilla:en-US:official&prmd=b&ei=Lj2nTPDODYTovQOI5d37DA&start=2000&sa=N

The above simply displays the error "Sorry, Google does not serve more than 1000 results for any query. (You asked for results starting from 2000.)".

What these search engines seem to be doing is storing, during indexing, the count of documents containing each term. Then all they have to do is combine the document counts of the query terms and show that number at the top (the "... out of 1,554,874 results" part).

I remember building a simple search engine based on the Reuters dataset (http://www.daviddlewis.com/resources/testcollections/reuters21578/), and on a 3.02 GHz 64-bit machine with 2 GB RAM it returned 1000 documents in 250 milliseconds. So I wonder whether the reason behind the quick retrieval by Google, Bing and Yahoo is distributed servers and complex algorithms, or just the simple thought that humans won't even go through 100 results for a search query, so let's retrieve only the top 1000.