Thursday, October 16, 2008

Yahoo is going open?

Today I got a chance to listen to a very interesting talk by Dr. Larry Heck, Vice President of Search & Advertising Sciences at Yahoo. The talk was titled "Large Scale Data Analysis for Web Search & Online Advertising R&D Using Pig™ and Hadoop™". It was interesting for a few reasons. First, he did a great job explaining how Yahoo search and Yahoo advertising work, and he described some of the algorithms behind them. The most useful thing I heard is that Yahoo seems to be moving towards open source and is trying to open up all of its search APIs, which would be very good news. As a first step, they have open sourced work on, and are contributing to, Apache Hadoop and Apache Pig. The performance improvements they have gained since moving to these Apache tools are amazing: they have been able to cut some tasks that took more than a few days down to less than an hour.

Part of the abstract from his talk:

Pig is an open-source platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig™ programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. Its infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., Hadoop).
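To get a feel for what "a compiler that produces sequences of Map-Reduce programs" means, here is a toy, single-process Python sketch of the map, shuffle and reduce phases for one simple query: counting how often each search query appears in a log. This is only an illustration of the Map-Reduce idea, roughly what a Pig `GROUP ... BY` followed by `COUNT` expresses; it is not Pig's actual compiler output or Hadoop's API. The log format and all function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: (user, query) records from a search log.
Record = Tuple[str, str]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    # Emit (key, 1) for every record, keyed by the query string.
    for _user, query in records:
        yield query, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key; a real framework does this between
    # the map and reduce phases, across many machines.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Each key is aggregated independently, which is what makes
    # this step easy to parallelize.
    return {key: sum(values) for key, values in groups.items()}


def count_by_query(records: Iterable[Record]) -> Dict[str, int]:
    # Chain the three phases, as one Map-Reduce job would.
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    log = [("alice", "hadoop"), ("bob", "pig"), ("carol", "hadoop")]
    print(count_by_query(log))  # {'hadoop': 2, 'pig': 1}
```

Because both the map step (per record) and the reduce step (per key) are independent, a system like Hadoop can spread them over many machines. That is the "amenable to substantial parallelization" property the abstract mentions.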

1 comment:

Shane said...

Don't forget that Yahoo! and a lot of other Hadoop users/contributors are running Hadoop Camp at ApacheCon US 2008 in just 2 short weeks. They have their own 2 day mini-track of focused presentations about a whole range of Hadoop and related projects. Good stuff.