Vast – Aggregating Listings From The Whole Web
by Nik Cubrilovic on March 14, 2006

Vast is a search service that crawls the entire web and structures the data that it finds so that it can be categorized fully and indexed. The team launched a developer preview of the service tonight with three initial verticals available – Cars, Jobs and Profiles. Vast does an extremely impressive job of finding and categorizing data in the long tail – making it very simple for others to find listings from anywhere on the web.

The first thing that amazed me was how big Vast was. They crawl over 3 billion web pages and their intelligent algorithms have indexed 4.4 million cars for sale, 4.7 million job listings and 8.6 million user profiles – and this is just the beginning. This already makes Vast the largest car sale database on the web, the 2nd or 3rd largest job listing site, and in the top 5 for number of personal listings (profiles from social networks, blogs etc.).

The key to Vast is that they have done an excellent job of aggregating the long tail as well as the top sites, and they now want developers to ‘steal this site’ (as it says on their homepage). Vast is making all this information available through an API which developers and site owners can use to integrate with other services or to build their own. The licensing terms for the API are as liberal as they can get – they just ask that you don’t do anything illegal and that you attribute Vast as the source of your data. Developers can now build their own implementation of the world’s largest auto search site in a matter of days and do what they like with it (place ads on it, etc.). In fact, the Vast.com site itself is an implementation of this API that took only a couple of weeks to write.

There are a few secrets that make Vast very good at aggregating and organizing listings. The first is that they are very good at crawling the whole web: not just the pages that every crawler can see, but also beyond authentication screens and into what is known as the ‘deep web’. Vast is also very good at extracting information out of all types of web pages – those with complex markup, JavaScript, Flash, different document types, etc. – so the index is very comprehensive.

The second is the artificial intelligence they have developed, which recognizes a listing, pulls the vital metadata out of what is usually a messy web page, and indexes it. For cars this is the make, model, year, price and location information (amongst many other fields). Vast isn’t built with particular verticals in mind; it learns what it needs to find, what a job listing looks like and what an auto listing looks like, so it will only take the Vast team a month or so to set up a new vertical and have the crawlers find all the information required for it.
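
To make the idea of pulling structured fields out of a messy page concrete, here is a minimal sketch in Python. It is not Vast's method: Vast's system is described as learned, and its internals are not public. This sketch swaps in a few hand-written regular expressions instead. The `CarListing` type, the `KNOWN_MAKES` list and the `extract_car_listing` function are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, tiny vocabulary of makes; a real system would learn or load far more.
KNOWN_MAKES = ("Honda", "Ford", "Toyota")


@dataclass
class CarListing:
    """Structured fields recovered from free-form listing text."""
    make: Optional[str]
    model: Optional[str]
    year: Optional[int]
    price: Optional[int]


def extract_car_listing(text: str) -> CarListing:
    """Rule-based stand-in for a learned extractor (illustrative only)."""
    make = model = None
    for candidate in KNOWN_MAKES:
        # Treat the word right after a known make as the model name.
        found = re.search(rf"\b{re.escape(candidate)}\s+(\w+)", text, re.IGNORECASE)
        if found:
            make, model = candidate, found.group(1)
            break

    # Price: digits (with optional thousands separators) after a dollar sign.
    price_match = re.search(r"\$\s?(\d[\d,]*)", text)
    price = int(price_match.group(1).replace(",", "")) if price_match else None

    # Year: a 19xx/20xx number that is not part of a dollar amount.
    year_match = re.search(r"(?<![$\d,])\b((?:19|20)\d{2})\b(?!,\d)", text)
    year = int(year_match.group(1)) if year_match else None

    return CarListing(make=make, model=model, year=year, price=price)


listing = extract_car_listing("2003 Honda Odyssey, great condition, $18,999 OBO")
print(listing)
```

Keeping the year and price patterns separate, and refusing to read a year out of a dollar amount, is one simple guard against mixing the two fields up.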

They have also developed good spam filtering methods. Vast CEO Naval Ravikant told me that 70% of the listings that they find are spam or duplicates. The results that you get out of Vast are generally very clean.
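
As a rough illustration of the duplicate-removal half of that problem, the Python sketch below collapses listings that agree on a handful of normalized fields. The article does not say how Vast detects duplicates or spam, so this is an assumption-laden toy: the `KEY_FIELDS` tuple and the `dedupe_listings` function are hypothetical, and exact matching on normalized fields is a much simpler technique than a production system would need.

```python
from typing import Dict, Iterable, List

# Hypothetical choice of fields that identify "the same" car listing.
KEY_FIELDS = ("make", "model", "year", "price", "location")


def dedupe_listings(listings: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep the first listing seen for each normalized key; drop later copies."""
    seen = set()
    unique = []
    for listing in listings:
        # Normalize case and surrounding whitespace so trivial variations match.
        key = tuple(str(listing.get(field, "")).strip().lower() for field in KEY_FIELDS)
        if key in seen:
            continue
        seen.add(key)
        unique.append(listing)
    return unique


crawled = [
    {"make": "Honda", "model": "Odyssey", "year": 2003, "price": 18999,
     "location": "San Francisco", "source": "site-a.example"},
    {"make": "honda", "model": "ODYSSEY ", "year": 2003, "price": 18999,
     "location": "san francisco", "source": "site-b.example"},
    {"make": "Ford", "model": "Taurus", "year": 1992, "price": 1500,
     "location": "Oakland", "source": "site-a.example"},
]
print(len(dedupe_listings(crawled)))
```

The `source` field is deliberately left out of the key, so the same car syndicated to two sites counts as one listing.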

The Vast business model is that at some point later on they will have paid listings within the search results, but only for searches that do not return enough listings – so they will be adding sponsored listings to ‘fill up’ the long tail of their own results. Integrators will just see these listings like any others. All listings link back to the original site, so there is less of a threat of site owners wanting to block the Vast crawler, as it is beneficial to them to have their listings indexed and syndicated out.

Vast was founded in April of 2005, but the project has been in development since 2000 and has taken multiple forms, always based around search. Vast is 21 people in total, spread around the world, with a development hub in Belgrade, Serbia and an office in San Francisco, California. They were funded by Leapfrog Ventures and Clearstone Ventures last April. Vast has established a unique culture which I really admire – the company is very engineering focused, with every employee having a programming background and contributing code to the project. The team in San Francisco is Naval along with a few others such as a COO and VP of Products (who both also code). Their strong engineering focus comes through in their product, which even at such an early stage is very well implemented, clean and stable, with a huge (vast!) index.

Naval said that before he joined the team, he was looking for somebody who was crawling the web and dragging information out of thousands of sites, and he was surprised that not many people were taking that approach. Vast’s competition at the moment in the jobs space would be SimplyHired and Indeed, both of which index other sites – the difference being that those sites are not indexing the long tail, and a lot of human intervention is used to add source sites and to filter them. Vast has identified 25,878 source sites for its job listings – and it is finding more and more of them every day.

Google is in a position to attempt to recognize data in the way Vast does, since they are already crawling the whole web – but the Google Base approach has been to set up a marketplace where others list their goods on its site. The technology Vast has for finding listings in the long tail as well as on the mainstream sites means they are not dependent on any one or few sources of data, and they will always be a superset of other sites out there attempting to aggregate listings, simply because of how well they crawl into the long tail.

Vast has it all for success: a smart and experienced team, some fantastic and innovative technology, and a market strategy with its open API that will see it being used everywhere and widely adopted.

Comments

  • Isn’t this basically the same thing as edgeio but without the need of tagging your posts?

  • Mikkom,

    Not at all. EdgeIO focuses on blogs and covers a lot of categories. It has a few thousand listings. EdgeIO seems to be building a destination site as well, with some RSS exports.

    Vast crawls the whole web and at the moment only covers three categories. Vast has about 16 Million listings. Vast is not meant to be a destination site but more of a content router for other sites.

    EdgeIO has a good team and a powerful vision of the future, but the two sites are pretty different.

  • Seems someone didn’t get the memo about proper web 2.0 logo design. Blech. ;-)

    Great idea however, and at quick glance the execution seems top notch.

  • “later on at some point”

    Why? Right now they’re running a bloody charity – how are they funding the project/themselves?

  • Hi Naval and Nik

    Vast does seem top-notch, well-done and all the best.

    I haven’t checked out the vast site itself yet, but based on Nik’s post, would the API be providing geography-filtered search?

    The reason I ask is that car sale search doesn’t lend itself to crossing regional boundaries, does it? To a slightly lesser extent, neither does job search (though I completely agree there are many scenarios where job search requires crossing boundaries). Profiles of course must cross boundaries. Aha, this kind of brings about a new question: how did you guys decide on these first few verticals?

    cheers
    Gaurav

  • Seems to me the real issue here is the ratio between frequency of crawling and useful shelf life of the listing.

    Well-priced, good cars sell fast. If they’re crawling at an interval longer than the useful shelf life of the listing then they’re going to be missing the good stuff and only featuring overpriced 1992 Ford Tauruses (Tauri?).

  • Also a related question…

    How do you expire listings?

    Of the 4.4 million cars for sale, how many of them are still for sale?

    Of the 4.7 million job listings, how many of them are still open?

  • Some good questions – I am sure Naval will answer them once he is up. In the meantime you can check out his posts on both the Vast blog:

    http://blog.vast.com/

    and Naval’s own blog at:

    Startupboy.com (a good read)

  • Just a small point – Google doesn’t crawl the “whole web” – they only crawl maybe 1/3 of it at best.

  • I just looked for a 2003 Honda Odyssey. The price was listed as $2003. Seemed like a great deal. After going to the site, the price was $18,999. I guess they are confusing year with price.

    It looks like there is some more work to do, but overall it looks like a very useful resource that I will be using.

  • The thing here is that if this is successful, I assume they’ll just allow you to post there. Feels like a backhanded way to crush craigslist.

  • Mark, new logo is coming, but logos don’t make or break companies (hopefully).

    Harrball, we are financed. Don’t worry, we want to make money too.

    Gaurav, everything you see on Vast is built on the API, so of course geo-specific search is included.

    Erik, right now stuff expires every 45-60 days. We refresh the crawl on a 7-day cycle, but now that we are launched and public, that pace is going to pick up.

    Yosef, yes there are some extraction errors, but in general, given the scope of the problem, there is a lot less junk in there than in traditional search engines.

    Saul, we do not and will not take direct listings from individuals. Craigslist is a community and a place to post. Vast is neither – it’s a search engine and syndication service for other people who want to carry classified ads.

  • Naval,
    How are you different from Kosmix? They do aggregation on certain verticals like health, travel, politics etc. They could easily do it for cars and profiles too. How do you plan to play your cards?

    Balaji

  • Naval,

    Any plans to aggregate classified ads outside the US, e.g. China, India, Russia, etc.???

  • Naval,

    Will your API allow for organic classified ads to appear above VAST’s ads??? And, how will sponsored classified ads from VAST appear on an API site??? Will such ads appear at the top, with different backgrounds, etc.??? I’m wondering how your API might stack up against Oodle’s API.

  • Balaji, Vast and Kosmix are extremely different. Play with each site for a few minutes and you will see. Kosmix is competing with Google. We’re closer to Google Base, although unlike Google Base we’re crawler based and not focused on a destination site.

    Observer, we will be doing other countries, around the end of this year

    Observer again :-) , the API is available for commercial uses. You can’t change the sort order of our results, but you can put your own ads or Google ads above the results, however you want. I can’t speak to Oodle’s API, but given that they’re building a destination site, I think you need special approval from them. Our entire site is built on the same API that we’re offering you, and we’re hoping that you’ll take it.

  • How does Vast compare to Oodle? I have been using Oodle (oodle.com) for about a year now. They cover not only cars and jobs, but also apartments, tickets, services etc. They are all over the US and they claim to have 10 million listings…

    Does Vast have an alert system? I find alerts extremely useful – Oodle’s email alert has already helped me twice to locate a classified literally minutes after its posting.

  • Deal Seeker,

    Vast only has three categories but already 16 million listings across those three since it crawls the whole web, not just the top few sites. Vast’s API is also fully open – it’s a developer service more than a destination site, unlike Oodle. That being said, email alerts and RSS are coming.

  • Naval,
    Thanks for the explanation. Still, I was wondering what happens if Kosmix enters this space. Technically, they should be able to do your stuff very easily. The only difference is that you are focusing on a different vertical compared to Kosmix. However, if you look at Kosmix, they are debuting a new vertical every month. They started with health, and now they have travel and politics. Given that, if they choose to do classifieds like you, what is the barrier to entry on your part?

    Balaji

  • Hi Naval and All,
    I found some issues with the Vast API. I’m pretty sure it’s what you intended, but I just got a bit disappointed with what you can do with the data – basically, it’s only for search.

  • Balaji, Kosmix does categorization, not deep crawling, extraction, image extraction, normalization, de-duplication, and fast-refreshing. On the other hand, Vast doesn’t keep a full cache of the whole web or do full-text analysis. They are solving very different problems. I would be surprised if Kosmix can do what we can and if we can do what they can. These are completely different problems.

    Migs, drop me a note at my first name at vast dot com and we’ll do what we can for you on the API front so you can accomplish what you want.

  • Hi Naval,

    The Vast idea of targeting a particular vertical is extremely good. It’s becoming the order of the day, with many other search engines targeting particular audiences. I would like to know what the thinking is behind ‘steal this site’. Why doesn’t Vast want to rule this space for some time before exposing it to everyone?

    @S
    http://www.pcmspace.com

  • Enter your Zip Code?

    Another fucking website that’s only useable in the United States.

    There ought to be a death penalty against Web 2.0 companies that launch with this type of mentality.

  • To Balaji & Naval from me @ Kosmix. . . Naval was pretty spot on.

    From what I can tell, Vast’s goal is to find structured data from all over the Web, normalize the data, and present it in a single interface. Kosmix, on the other hand, searches the general Web and returns regular Web pages, much like traditional search engines. We create a new set of meta-data about Web pages, i.e. our categories.

    Sounds like we’re complementary search companies, rather than direct competitors.

  • Naval clarified API usage for me. Since it’s now good, I should be coming up with my app.

  • In India, vertical niches in the classifieds segment such as Shaadi.com, Bharatmatrimony.com, Jeevansaathi.com, Naukri.com, ClickJobs.com etc. are flourishing. The more generic classifieds sites are now also hotting up. The latest addition is India’s answer to Craigslist and Oodle- Khichdee.com

    Khichdee is a famous Indian delicacy, containing a combination of various pulses, spices and curries. Khichdee.com is, in a way, a mixture of classifieds that have already been published either online or offline. These classifieds have been re-organized and parameterized into categories and are presented to viewers in an easy-to-search format.

    For ages now, classifieds have been successfully used by the world, for announcing their intentions to buy or sell a commodity or service. Local newspapers have always thrived on this service of classified listings and many have survived and grown on the revenues generated by classifieds.

    In the past few years, classifieds have gone online and have outperformed the local newspapers. More and more people are now putting up their goods / services for sale on on-line classifieds. An increasing number of buyers are now looking for products and services online rather than browsing through a set of local news publications, owing to the fact that online search is faster and has the natural advantages of database querying.

    Khichdee.com attempts to take this phenomenon, one step further. With khichdee.com, the attempt is to bring all the classifieds that are published on-line or offline at one place. So as a buyer, you get the entire set of classifieds published online for your preferred product or service in your preferred city, at one place. This reduces the bother of browsing and navigating through a large number of websites, newspapers and magazines. As a seller, you get probably, the largest number of buyers interested in your product or service not just from your city, but from across the globe. As a publisher of online classifieds, you get added traffic because Khichdee also publishes the URL where the ad appeared originally.

    http://www.khichdee.com
