Wikia Acquires Distributed Web Crawler Grub
by Duncan Riley on July 27, 2007

Wikia has acquired the distributed web crawling engine Grub from LookSmart.

Grub utilizes spare computing cycles on users’ computers to index the web, a concept similar to that used by SETI@home in the search for extraterrestrial life.

Grub originally started in 2000 and was purchased in 2003 by LookSmart. Grub was shut in 2005 following some minor controversy over the closed source code and the failure to comply with Robots.txt blocking requests.
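
For readers unfamiliar with robots.txt blocking, here is a minimal sketch of how a crawler can check a site's rules before fetching a page. It uses Python's standard `urllib.robotparser`. The robots.txt content, the `example-crawler` user agent, and the example.com URLs are all made up for illustration, and nothing here reflects how Grub itself was implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: example-crawler
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The site blocks "example-crawler" everywhere, so a polite crawler skips it.
blocked = may_fetch(parser, "example-crawler", "http://example.com/index.html")

# Other agents fall under the "*" rules: only /private/ is off limits.
allowed = may_fetch(parser, "other-bot", "http://example.com/index.html")
private = may_fetch(parser, "other-bot", "http://example.com/private/page.html")
```

A crawler that honors these rules calls a check like `may_fetch` before every request and drops any URL for which it returns `False`.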

Wikia will open source the Grub code “as soon as possible” and use the data from Grub to feed the Search Wikia Project.

The price of the acquisition was not disclosed.

(via R/WW)


Comments

  • I have a feeling this project will do much better now under its new ownership. A distributed Web crawler is a very neat system, especially one that works similar to *@home systems. Wikia’s “Distributed Lab” is going to house projects focused on distributed computing, crawling, and indexing (more interesting is the Semantic Lab).

  • crawling and indexing and using my resources for the betterment of my kind, I will give it a shot.

    James, what is a Semantic Lab?

    Vijay

  • http://www.faroo.com goes even a step further: distributed crawling and distributed search.

  • I wonder what Grub.com sold for?

  • My guess is they gave it away for near nothing. After all Wikia has leased LookSmart’s AdCenter platform and some search tech.
    Good things ahead for Wikia I feel. LookSmart’s stock in the industry is also rising with partnerships with the likes of NYTimes, Reed, Facebook, Wikia, ASK and Active Athlete.
    I can’t wait to see Wikia Search fully deployed.

  • I can see a really interesting revenue model forming here:
    Since distributed indexing will cut down costs, they could actually share some ad revenue with their users based on how much processing power they contribute. I could see that getting a lot of people hooked to the system.

  • On the surface a distributed crawler seems like a great idea. That being said, though, if the algorithm used to index crawled data isn’t any good, all the processor cycles in the world will get Wikia nowhere.

    BTW, thanks for making me feel dirty for using the word “algorithm,” Ask.

  • Truthfully, I don’t see the use in this acquisition. While distributed crawling using the community is useful, search is a hardware-intensive business because of the serving of search results, not the crawling of the web. Having a distributed crawler does little to impact the overall effectiveness of a search system or cut down on hardware costs.

  • I don’t understand how they plan to use this.

    Using distributed crawling will make it even easier for people to generate spam links as search results for everything under the sun. First thing many people will write is something that returns their set of pages for every single query.

  • I like this acquisition b/c I’ve always thought Grub was an interesting idea. What in the world is Looksmart up to these days? It seems they closed down most of their interesting services.

  • Dang – the Grub website has been down most of the day today – overwhelmed by potential volunteers?

  • Exactly

  • >> Grub was shut in 2005 following some minor controversy over the closed source code and the failure to comply with Robots.txt blocking requests.

    It wasn’t a minor controversy if Grub got stuck on your site, disobeyed robots.txt and then crawled it to death ;-)

  • Wikia’s plans for Grub go beyond what we were working on at the time LookSmart acquired us. Crawling in a distributed fashion isn’t that cool in and of itself – it’s what you can do with the data in a distributed fashion that makes things really exciting.

    The best place to start getting involved with Grub and the other pieces of the project is http://search.wikia.com/. The guy in charge is Jer Miller, of Jabber fame. We are looking for a few people to start working on porting the client back to Linux and starting a few projects that can reuse the crawl data, which will be open to everyone.

  • check this link it’s very interesting about web crawling

    http://crawlthe...b.blogspot.com/
