
Udi Manber, VP of Engineering at Google (and former CEO of Amazon’s A9), just announced at the Searchology event that Google will “soon” announce a cross-language search engine. A screenshot of the user interface is above.
Queries will be automatically translated into other languages to retrieve more results, and all results will then be translated back into the original query language. The goal is to return many more results, particularly for queries made in less popular languages.
The screenshot is extremely poor quality (I apologize), but it shows Arabic results on the left, translated from the English results on the right.
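The flow described in the post (translate the query, search in the other language, translate the hits back) can be sketched roughly as follows. This is only an illustration, not Google's implementation: the word-for-word dictionary `TOY_DICT`, the two-document corpus `DOCS`, and every function name here are assumptions made up for the example. A real system would use a machine translation engine and a full search index.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionaries standing in for a machine translation engine.
TOY_DICT: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "de"): {"car": "auto", "insurance": "versicherung"},
    ("de", "en"): {"auto": "car", "versicherung": "insurance", "günstige": "cheap"},
}

# Hypothetical toy corpus of (language, text) pairs standing in for a search index.
DOCS: List[Tuple[str, str]] = [
    ("en", "car insurance guide"),
    ("de", "günstige auto versicherung"),
]


def translate(text: str, src: str, tgt: str) -> str:
    """Word-for-word translation; unknown words are passed through unchanged."""
    if src == tgt:
        return text
    table = TOY_DICT[(src, tgt)]
    return " ".join(table.get(word, word) for word in text.lower().split())


def search(query: str, lang: str) -> List[str]:
    """Return documents in `lang` that share at least one word with the query."""
    terms = set(query.lower().split())
    return [
        text
        for doc_lang, text in DOCS
        if doc_lang == lang and terms & set(text.lower().split())
    ]


def cross_language_search(
    query: str, query_lang: str, target_langs: List[str]
) -> List[Tuple[str, str, str]]:
    """Translate the query into each target language, search there,
    then translate every hit back into the original query language."""
    results = []
    for lang in target_langs:
        translated_query = translate(query, query_lang, lang)
        for hit in search(translated_query, lang):
            results.append((lang, hit, translate(hit, lang, query_lang)))
    return results


if __name__ == "__main__":
    for lang, original, translated in cross_language_search(
        "car insurance", "en", ["en", "de"]
    ):
        print(lang, "|", original, "|", translated)
```

Each result keeps both the original text and its back-translation, which mirrors the two-column layout visible in the screenshot.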








Looks interesting and very useful. Hopefully you can get another screenshot.
Neat. Not really useful to me at all, but still neat.
They should be focusing on quality and not quantity. Do any of us not have enough search results already? Do you ever get to page 3?
While I can see this tool being useful for research into different cultures, I think they are moving away from their core product, which is really quality search results.
Very interesting with respect to the cross-language search technology and IP from qNaturally. Here is a link to one of their patents:
http://www.goog...p;dq=qnaturally
Looks neat and not too packed at all.
I moved to vivisimo.com for any search that is more like research.
To find brand names I am still using Google. Try the results tree in vivisimo and you will get it. You get a nice tree of “what there is about” your topic and not just an endless list of sites.
Google will never do that: it would help people find exactly what they need, and faster, but it would also let bidders place ads exactly where they need to be, and therefore bring less money to Google, since there would be less competition for common search terms.
I definitely agree with Sahar. The point should be quality results, not just more pages returned. Furthermore, I see this relying too heavily on automated translation which is, as far as the technology is concerned, by and large still in its infancy.
I think the number of results returned is not “real” anyway.
Who ever sees the last page of the 1.9 million found? The answer: nobody (0%).
The truth is they have been working on quality: they only ever find the first 10 results, but pre-cache the next 10.
They are doing the best they can (and better than anyone else).
Automated translation is one of those things that AI research has been trying to tackle for fifty years, with decent but not spectacular results. I personally don’t want to see broken English in my search results.
Dude, SMT (statistical machine translation) is so far away from reality via Google that it’s not even funny. Even though they’ve got Franz and another bunch of super smart dudes working on it, this thing is *not* what those folks are working on. This is like putting Google queries into a low-end Systran system and spitting back results: impressive if your query is about one word long, but beyond that not too useful. I’m totally confused about why they would bother launching in this way. It seems antithetical to what they’re all about; in this case it makes the world’s information more inaccessible, via potentially bad translation and wrongful query redirections…
Google is poor in Arabic… I used it several times to translate; sometimes it was good, other times it was wrecked.
This is HUGE for any social networks/portals with an international audience looking to cut a search deal with Google.
Does TC moderate comments? My previous comment did not get posted.
Ok, it must be the URL in my original comment. Here you go:
This will be really neat, and especially useful for people who cannot read English, the language in which most valuable content on the Internet is written. However, the value depends on how well this will work – I agree with dave that SMT is far away from reality. (And it’s interesting to know this feature was not built by the Google Translate team.) That’s why sites like Yeeyan and other human translators add value.
And Michael, click Thunder for the Chinese translation of this post.
If you have any concerns with the translation, please do let me know.
Any sufficiently advanced idea should be indistinguishable from magic. So let the magic happen.
This can be good news for many people, like the Chinese, Turks, etc.
But I still prefer to use English, and I think many do too.
I feel like English users do not see the real utility of such a tool. Many people search the internet for pure information, which is second-best to full comprehension – and in many languages the amount of content available on the web is very little. Thus if you do not know English (let’s say you are a regular teenager in a developing country) you will be able to query pages in English, German, Japanese, French, Spanish, Russian… and get back results in your language.
I think this is amazing, even if the translation is not perfect. It will certainly help many people learn English and other languages, little by little.
It is also good for gathering web intelligence – finally you will be able to know how many Persian bloggers are blogging about ‘sex’ without knowing a word of Persian, or how many French bloggers are blogging about ‘the war in Iraq’.
I have a feeling that Google at some point will be capable of drawing some information from all the thousands of books it scans in many different languages. Why not figure out how to translate things by comparing all the translations of the oeuvre of Tolstoy in many different languages (including alternate translations!)?
The thing I do not like about the common search engines is that they do not recognize documents with similar content. It often happens on the Web that a post or document is spread out over more than 50 websites. That is great for the author but not for the searcher, because it blows up your search result unnecessarily. With InfoCodex this will not happen, because the linguistical database recognizes similar documents and puts them into groups.
http://www.ywes...oCodexProcedure
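As a rough illustration of grouping similar documents so that copies of the same post collapse into one entry, here is a minimal sketch. It does not reproduce InfoCodex's linguistical database, whose workings are not described here; it swaps in a much simpler and well-known technique, Jaccard similarity over word shingles. The function names, the shingle size `k=3`, and the `threshold=0.5` cut-off are all assumptions chosen for the example.

```python
from typing import List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word sequences (shingles) in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(docs: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Greedily group document indices; a document joins the first group
    whose representative (first member) is at least `threshold` similar."""
    signatures = [shingles(doc) for doc in docs]
    groups: List[List[int]] = []
    for index, signature in enumerate(signatures):
        for group in groups:
            if jaccard(signature, signatures[group[0]]) >= threshold:
                group.append(index)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([index])
    return groups


if __name__ == "__main__":
    docs = [
        "google announces cross language search at searchology",
        "google announces cross language search at searchology today",
        "vivisimo clusters results into a topic tree",
    ]
    print(group_similar(docs))
```

A search front end could then show one representative per group instead of every copy.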
Three things a modern Search engine should do:
1. Automatically classify a document according to its content.
2. Automatically generate an abstract of a document.
3. Generate a Heat-Map of the Contents of a Search Result.
http://www.ywes...x_22.2.2007.pdf
I’m interested in a system that can:
- identify relevant material via in-depth crawling of selected cultural heritage institutions, accepting and processing any semantic web encoding of the information retrieved;
- crawl the Internet to identify websites with cultural heritage information, locating relevant texts, images and videos, regardless of the source and target languages used to write the query and/or describe the results;
- automatically classify the results in a semantic-web compliant fashion, based on document content, its metadata, its context, and the occurrence of relevant cultural heritage (CH) concepts in the document;
- automatically extract relevant information, which will then be used to create cross-links between related material, such as the biography of an artist, exhibitions of his/her work, critical analyses, etc.;
- organize and further analyse the material crawled to serve focused queries generated from user-formulated information needs;
- interact with the user to obtain a more specific definition of initial information requirements; and finally
- organize and display search results in an integrated, user-friendly manner, allowing users to access and exploit the information retrieved regardless of language barriers.
I’ve found an interesting site about these aspects at: http://www.multimatch.eu
The thing about cross-language search is that it does not help if you just translate a search term or a search sentence into another language. That alone will not help you find a single document in another language. What you need is a linguistical database where the terms and words have already been translated and connected to each other via a neural network. The user does not just want to search in one language at a time, but maybe in 5 or 10 languages simultaneously. That is the trick, and Google is very, very, very far away from doing so.
Google has a link based strategy and they – obviously – only think of making money with their link based strategy.
What is needed is software that recognizes not the links but the _content_ of the document and then delivers a search result according to the content.
Delivering search results based on descriptive links is nice and good but far away from any deep understanding of the content of a document or website.
As the Internet grows, information grows. That results in higher importance of quality-of-information, not quantity-of-information.
If anyone is interested in getting information about MultiMatch, please contact us – we lead the dissemination activities of the project. MultiMatch is a Specific Targeted Research Project (STREP) funded by the European Community, and it plans to develop a multilingual search engine specifically designed for access, organisation and personalised presentation of cultural heritage information. The target users are defined and limited to this domain.
The project consortium comprises universities, research institutions and content providers (list of partners: http://www.mult...consortium.html). The consortium will organize two workshops in Europe related to its activities and to the issues raised here.
The Cross-Language Evaluation Forum has seen several international Information Retrieval and NLP organizations working on this effort for years.
all results will then be translated back into the original query language
I totally agree with Deniz. Another reason why cross-language information retrieval would be great comes to mind immediately: there may be a lot of redundant information available in many languages, but this does not apply to all information.
For instance, I live in Japan but my Japanese skills are very limited. It is sometimes quite difficult to find information about local things, as most of the small businesses (and many bigger ones, too) only have their web page in Japanese, or if there is an English page it contains only a fraction of the information on the Japanese pages. So if I could get my hands on those pages through a search engine, I’d be pretty delighted.
That is, if they really could make the thing work properly.
Interesting comments, but I think some have misunderstood what they see in the fuzzy picture. Some people are talking about how great this would be for searching content in other languages without knowing the language. If you look at it closely, you will see that you can only search Arabic if you know the Arabic search words to type in. You don’t type a search in English and then see it translate the English search words into Arabic and then search Arabic sites.
The cross-language search is only useful to those of us who are bi-lingual, and who may have some limitations in what we can read in our second language and want to have it machine translated back into our native language.
This is a step in the right direction, but more work needs to be done to allow users to search the world’s content from the luxury of their native language only.
Having said that I like where they are going.
I should probably correct my last post, as I was assuming that they are not translating the query and searching websites in other languages. It could be, from this example, that they have typed in the words they want to find in Arabic and asked it to search English sites, and the hits are displayed in Arabic (machine translation) on the left and English (original content) on the right.
Let me take another stab at the difference between multilingual search and cross-language search:
What is cross-language search?
A1. The question is being translated: Examples of companies that do a simple one-on-one translation: Eurospider, Convera, Google Cross-Language, Temis.
A2. Content is indexed across several languages ergo the question is being asked across several languages: InfoCodex.
Shortcomings of A1.:
**********************
a) The translation is never sharp. What meaning does “Automobile” have, what are the possible translations into German:
- Automobil
- Auto
- Kraftfahrzeug
- Motorfahrzeug
- Personenkraftwagen
- PKW
etc.
b) What meaning does “Insurance” have when translated into German:
- Versicherung
- Haftpflichtversicherung
- Assekuranz
etc.
c) What meaning does “IT” have when translated to German:
- IT
- Informatik
- EDV
- Elektronische Datenverarbeitung
etc.
The A1 solution will only ever give you one translation of a word, even though a common word almost always has several. With a linguistical database you do not have this problem, i.e. your search returns more results.
Advantages of A2:
*******************
a) A linguistical database with synonym groups will help you put a document or a search term into context (Verschlagwortung).
b) A linguistical database with a taxonomy will help you do more than just a one-on-one translation.
c) With a linguistical database you can search in more than just one language. You can search in 5 languages at the same time.
See: http://www.ywes...oCodexProcedure
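The contrast between A1 and A2 can be made concrete with a small sketch that uses the German examples listed above. It is not InfoCodex's actual database: the `SYNONYM_GROUPS` table layout and the function names `one_to_one`, `expand` and `matches` are hypothetical, chosen only to show why expanding a term into its whole synonym group finds documents that a single one-on-one translation misses.

```python
from typing import Dict, List

# Hypothetical synonym groups, filled with the examples from the comment above.
SYNONYM_GROUPS: Dict[str, Dict[str, List[str]]] = {
    "automobile": {
        "en": ["automobile"],
        "de": ["Automobil", "Auto", "Kraftfahrzeug", "Motorfahrzeug",
               "Personenkraftwagen", "PKW"],
    },
    "insurance": {
        "en": ["insurance"],
        "de": ["Versicherung", "Haftpflichtversicherung", "Assekuranz"],
    },
    "it": {
        "en": ["IT"],
        "de": ["IT", "Informatik", "EDV", "Elektronische Datenverarbeitung"],
    },
}


def one_to_one(term: str, lang: str) -> List[str]:
    """A1 style: keep only a single translation of the term."""
    return SYNONYM_GROUPS[term.lower()][lang][:1]


def expand(term: str, langs: List[str]) -> List[str]:
    """A2 style: return every variant of the term in all requested languages."""
    group = SYNONYM_GROUPS[term.lower()]
    variants: List[str] = []
    for lang in langs:
        variants.extend(group.get(lang, []))
    return variants


def matches(document: str, variants: List[str]) -> bool:
    """True if all words of at least one variant occur in the document."""
    words = {word.lower() for word in document.split()}
    return any(
        all(part in words for part in variant.lower().split())
        for variant in variants
    )


if __name__ == "__main__":
    doc = "Günstige PKW Versicherung"
    print(matches(doc, one_to_one("Automobile", "de")))  # single translation
    print(matches(doc, expand("Automobile", ["de"])))    # whole synonym group
```

Passing several language codes to `expand` gives the "search in 5 languages at the same time" behaviour from point c), as far as the table has entries for them.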
Nice design; I hope it is useful and many searchers love it.
Nice, I like it!
http://www.akuv...hung-webmaster/