Deep Web Search – Two Approaches

Silicon Beat had an interesting article yesterday about dCloak from Palo Alto-based Dipsie (press release), a product designed to help websites make their content more indexable by search engines.

Much, if not most, of the web is considered “dark” because its data is not reachable via a permanent URI. Cookies, forms, JavaScript, and Flash can all affect what content (if any) is shown on a page, and today that information is not indexable by search engines.
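To make that concrete, here is a minimal sketch (in Python, against a made-up page) of why a conventional crawler misses such content: it only collects static <a href> links, so result pages that exist only behind a form submission are never discovered. The page markup and link-collecting logic below are purely illustrative.

from html.parser import HTMLParser

# A page whose useful content sits behind a search form: there is no
# static <a href> pointing at the result pages, so a link-following
# crawler never learns they exist.
PAGE = """
<html><body>
  <a href="/about.html">About us</a>
  <form action="/search" method="post">
    <input name="q"><input type="submit" value="Search">
  </form>
</body></html>
"""

class LinkCollector(HTMLParser):
    # Collects only the plain hyperlinks a basic crawler would follow.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)  # ['/about.html'] -- the /search results stay "dark"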

dCloak is a product that websites can use to make this content visible to the engines.

There are, however, some concerns that the technology could be used by spammers to further promote their content.

Glenbrook Networks is another company trying to tackle the Deep Web problem. However, they’re attacking it from the search-engine side, which, unlike dCloak, doesn’t require any changes to the websites themselves.

Glenbrook has developed a suite of proprietary unstructured information retrieval and extraction technology that trawls the Deep Web (see Jeff Clavier’s post here). The company has built a showcase around job search in the Bay Area (they were actually the first to deliver a jobs/Google Maps mashup) and is reportedly building an application in the local search space.
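As a rough illustration of the search-engine-side approach, a crawler can notice a site’s search form and submit it with guessed query terms, fetching result pages that have no permanent URI of their own. The sketch below is an assumption about what such a probe might look like, not Glenbrook’s actual technology; the URL, field name, and terms are placeholders.

import urllib.parse
import urllib.request

def probe_form(action_url, field_name, terms):
    # Hypothetical sketch: POST a site's search form with guessed terms
    # so that otherwise "dark" result pages can be fetched and indexed.
    pages = {}
    for term in terms:
        data = urllib.parse.urlencode({field_name: term}).encode()
        with urllib.request.urlopen(action_url, data=data) as response:
            pages[term] = response.read()
    return pages

# Example (placeholder values):
# probe_form("http://example.com/search", "q", ["java", "nurse", "sales"])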

The interesting thing about the Deep Web problem is that both sides, search engines and websites, want to make the content indexable. It will be interesting to see how this space evolves.