November 24, 2005

Deep Web Search - Two Approaches

Michael Arrington


Silicon Beat had an interesting article yesterday about dCloak (press release), a product from Palo Alto-based Dipsie designed to help websites make their content more indexable by search engines.

Much, if not most, of the web is considered “dark” because the data is not readily available via a permanent URI. Cookies, forms, JavaScript and Flash can affect what content (if any) is shown on a page. Today, this information is not indexable by search engines.
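To make the problem concrete, here is a minimal sketch of why form-gated content stays dark. It assumes a made-up three-part site (the `STATIC_PAGES` and `FORM_RESULTS` names and all URLs are hypothetical) and a deliberately naive crawler that only follows links. It is an illustration of the general idea, not how any particular search engine works.

```python
from html.parser import HTMLParser

# Hypothetical toy site: two pages reachable by links, plus job listings
# that only appear after the search form is submitted.
STATIC_PAGES = {
    "/": '<a href="/about">About</a>'
         '<form action="/search" method="post"><input name="q"></form>',
    "/about": '<a href="/">Home</a>',
}
FORM_RESULTS = {"engineer": "Listing: software engineer, Palo Alto"}


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags; forms are ignored."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start="/"):
    """Follow links only, the way a simple link-following crawler would."""
    seen, queue = set(), [start]
    while queue:
        url = queue.pop()
        if url in seen or url not in STATIC_PAGES:
            continue
        seen.add(url)
        parser = LinkCollector()
        parser.feed(STATIC_PAGES[url])
        queue.extend(parser.links)
    return seen


indexed = crawl()
print(sorted(indexed))
```

The crawler reaches the home and about pages, but nothing in `FORM_RESULTS` ever gets a URL it can visit, so the listings never enter the index.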

dCloak is a product that websites can use to make this content visible to the engines.

There are, however, some concerns that the technology could be used by spammers to further promote their content.

Glenbrook Networks is another company trying to tackle the Deep Web problem. However, they’re attacking it from the search engine side, which, unlike dCloak, wouldn’t require website changes.

Glenbrook has developed a suite of proprietary unstructured information retrieval and extraction technologies that trawl the Deep Web (see Jeff Clavier’s post here). The company has built a showcase around job search in the Bay Area (they were actually the first ones to deliver a job/Google Maps mashup), and is reportedly building an application in the local search space.

The interesting thing about the Deep Web problem is that both sides, search engines and websites, want to make the content indexable. It will be interesting to see how this space evolves.



Comments


  1. Steve

    Looks very interesting. Perhaps with something like this we can actually build web sites the way we want to, without worrying about how the search engines will or will not index the content.
    We are working on several Flash-designed sites, but always consider not using Flash because the engines do not index the content in Flash.

  2. tvorba web stranok

    useful topic