Pennyroyal: February 2011

Thursday, February 10, 2011

Thrift installation on Ubuntu (for use with Ruby)

(This tutorial was tested on Thrift 0.5.0)

I was working with Thrift (http://thrift.apache.org/), hit a couple of issues during the installation, and had to look through a couple of links to figure out the correct method, so I thought I'd sum it up here.

Download Thrift (http://thrift.apache.org/)
Extract the downloaded archive and change into the extracted folder in a terminal
To install Thrift, you need the following dependencies (as listed on the Thrift wiki page):

g++ 3.3.5+
boost 1.33.1+ (1.34.0 for building all tests)
Runtime libraries for lex and yacc might be needed for the compiler
The dependencies for each language are different; you can see them at http://wiki.apache.org/thrift/ThriftRequirements
GNU build tools: autoconf 2.59+ (2.60+ recommended), automake 1.9+, libtool
For C, you might also need to install "libglib2.0-dev"

To install the above dependencies, you can use the following command:
sudo apt-get install g++ libboost-dev libevent-dev automake pkg-config libtool flex bison
You might need to bootstrap first; to do that, run ./bootstrap.sh (assuming you are in the extracted Thrift directory)
Now run the following commands, in sequence, to configure and install Thrift:

./configure
make
sudo make install

Now you are good to go with Thrift. The final missing piece is the thrift gem, which you can install with the command => sudo gem install thrift
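If you want to double-check that both pieces actually landed on your machine, here is a minimal sketch of a sanity check. The helper name first_output_line is made up for illustration, and I'm assuming the compiler answers to "thrift -version" and that "gem list thrift" shows the gem if it is installed; adjust the commands if your setup differs.

```python
import subprocess


def first_output_line(command):
    """Run a command and return its first output line, or None if it cannot be started."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools print their version on stderr
            universal_newlines=True,
        )
    except OSError:
        # The executable is missing from PATH (or is not runnable).
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    # Assumed commands: the Thrift compiler and the RubyGems listing for the gem.
    checks = {
        "thrift compiler": ["thrift", "-version"],
        "thrift gem": ["gem", "list", "thrift"],
    }
    for label, command in checks.items():
        line = first_output_line(command)
        status = "not found" if line is None else (line or "(no output)")
        print(label + ": " + status)
```

A "not found" next to the compiler usually means the install step did not complete or /usr/local/bin is not on your PATH.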

Hope you find the post helpful. I will try to post an example of using Thrift with Ruby soon :)

Wednesday, February 9, 2011

MySQL is scalable and NoSQL is not the answer to everything

I was kinda getting sick of hearing that "SQL is not scalable" and "you can't have a schema-free engine in SQL", so I decided to look at what the fuss is all about. I won't add numbers of my own here since I haven't done any benchmarking, but from my experience with MySQL I can say one thing for sure: it IS scalable, you CAN have a schema-free engine in MySQL, and if you use it right, i.e. with techniques like sharding and memcache, you can achieve pretty high performance while keeping security and data safety. Here are a few links I came across on the topic:

A nice article on why you shouldn't use NoSQL for everything: http://bit.ly/cVUm4B
Another nice article on how you can create a schema-free engine in MySQL: http://bit.ly/9zqBcK
An open source library for using a key/value structure with MySQL: http://bit.ly/gYvBWF
And finally, solid proof that MySQL is scalable and that you can do wonders if you use it right: FriendFeed uses MySQL, and here is a description of their mechanism: http://bit.ly/U2A8M
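To make "sharding and memcache" a bit more concrete, here is a toy sketch of the idea. It is not taken from any of the linked articles: plain Python dicts stand in for the MySQL shards and for memcached, and the class and method names (ShardedStore, shard_for) are made up for illustration.

```python
import zlib


class ShardedStore:
    """Toy key/value store: dicts stand in for MySQL shards and for memcached."""

    def __init__(self, shard_count=4):
        # Each dict plays the role of a separate MySQL server (an assumption
        # made so the sketch runs without a database).
        self.shards = [{} for _ in range(shard_count)]
        self.cache = {}  # stand-in for memcached
        self.cache_hits = 0

    def shard_for(self, key):
        # crc32 is stable across runs, so a key always maps to the same shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value
        self.cache.pop(key, None)  # drop any stale cached copy

    def get(self, key):
        # Cache-aside read: try the cache first, fall back to the shard.
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]
        value = self.shards[self.shard_for(key)].get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

The point is only that the application decides which server owns a key and keeps hot reads away from the database. A real setup also has to deal with adding shards, replication, and cache expiry, which this sketch ignores.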

Extracting the summary of a URL (Facebook's post link feature)

I initially thought of sharing the Ruby on Rails code I wrote for this particular functionality, but I'm not a big fan of Rails, so here is a simple step-by-step description of how to achieve it (by the way, this isn't rocket science).

First of all, you need a DOM parser (there are a lot of parsers available in various languages; choose the one you like)
Fetch the page at the URL and parse it into its DOM
Now use XPath (or whatever DOM querying mechanism the library provides) to pull out the pieces described below
If you have ever paid a bit of attention to the summary Facebook generates for a link, you'll notice that it consists of 3 things: i) title, ii) description, iii) image. Let me describe each of them individually.

Title: This is the easiest of all. You can simply extract the title from the <title> tag under the <head> tag.
Description: This is also very easy to achieve. You can extract the description from the content attribute of the <meta name="description" content="..."> tag.
Images: This is the tricky part. You need to extract all the images available on the page, choose the largest one (by height and width), and use it as the thumbnail. If you want to provide a "choose thumbnail" feature like Facebook's, simply extract all images with height and width greater than 50px and show them in a simple slideshow. One might think this is time consuming, and I agree. There is a small shortcut, but it doesn't apply to all links: since Facebook introduced Open Graph, the websites that have moved to it use tags like "<meta property='og:image' ...>" to tell Facebook the representative image of the page. If the tag exists, you can simply use the image given in it and save all the trouble of going through the complete DOM.

After this, all you need to do is display the extracted content in your own style.

I will try to post PHP code for this if I get time; otherwise I might even post the Rails code.