A mysterious yet intriguing project from Russia has come across our inbox. It is a search-engine optimization analysis tool for Websites called TheRarestWords. For any given URL, like Microsoft’s or Techcrunch’s, it shows you the rarest keywords on the homepage (i.e., the ones most likely to give your site some search-engine juice), other sites with related keywords, and a list of categories the site would fit under based on those keywords. For Microsoft, some of the rare keywords it identifies are “silverlight,” “biztalk,” “onecare,” “skydrive,” “popfly,” “ballmer,” and “ozzie.” You can try your site by going to http://therarestwords.com/YOURSITE.com.
TheRarestWords then tries to tap into crowd intelligence by letting anyone add a 100-character definition for each keyword, which could give it a semantic edge in trying to categorize each site. This could also be gamed pretty easily, but this looks to be just a Web project at this point. It could also be used to create a Wiki dictionary like Lingoz or Wiktionary, but that does not seem to be the focus of the project.
The developer is a mysterious Russian who does not want to give out his name. You can find more info on his blog and on this forum post. Mircea Goia from MyTestBox dug into it for us and reports:
The author and the sole founder – who is from Russia and wants to have a low profile for now – says it is just a hobby that was started in December 2007 and he calls it a “linguistic experiment”.
Their spider (called TheRarestParser/0.2a) started scouting the internet in May and extracted words from many websites. It looked at which ones are used most often on those websites and which ones are rarely used, or not at all. For now it extracts only the words from the first page of a domain. It doesn’t go deeper than that; however, the spider managed to index 20 million words from many domains.
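To make the idea concrete, here is a minimal sketch of how "rarest words" could be ranked once homepage text has been collected. This is not the author's actual algorithm, which has not been published; it assumes rarity simply means "appears on the fewest crawled homepages." The function names and the sample page text are made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def document_frequency(pages):
    """Count, for each word, how many pages it appears on."""
    df = Counter()
    for text in pages.values():
        df.update(set(tokenize(text)))
    return df

def rarest_words(domain, pages, top_n=5):
    """Return the words on one page that appear on the fewest pages overall."""
    df = document_frequency(pages)
    words = set(tokenize(pages[domain]))
    # Sort by how many pages use the word, then alphabetically for stable output.
    return sorted(words, key=lambda w: (df[w], w))[:top_n]

# Hypothetical homepage text, invented for this example (not real crawl data).
pages = {
    "microsoft.com": "windows software silverlight biztalk news",
    "techcrunch.com": "startup news software friendfeed",
    "gigaom.com": "startup news software qualcomm",
}
print(rarest_words("microsoft.com", pages, top_n=3))
# prints ['biztalk', 'silverlight', 'windows']
```

Words such as "news" and "software" show up on every sample page, so they sink to the bottom of the ranking, while words unique to one homepage rise to the top.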
The author wants to implement new options like:
* Trend spotting (which of the words are gaining popularity – like “django” is becoming more popular, “python” is still strong, and which are losing it like “perl”)
* Help with SEO for mom-and-dad kinds of business sites (it could be useful from this standpoint, the author says)
* Auto-categorization of your sites against a big list of categories (actually, at this time it has already been implemented, but the algorithm still needs to be perfected)
The interface is confusing the first time you go there, but there is some interesting data you can pull from it. For instance, you can have an SEO fight between any two sites by typing in the address: http://therarestwords.com/vs/your-site.com/competitors-site.com. This feature shows which rare words your site has that your competitor doesn’t and vice versa.
For example, here’s TechCrunch vs. GigaOm. This is only a snapshot of what is on each frontpage, but we are more likely to get search traffic right now for terms like “friendfeed,” “gamestop,” and “blogosphere,” while they are kicking our butts on “qualcomm,” “powerset,” and “sarcasm.” (At least that was the case before I put up this post. I really can’t let Om beat us on sarcasm.)
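The comparison itself boils down to two set differences. The sketch below assumes each site's rare words are already available as a list; the `seo_fight` function name is hypothetical, and the shared word "startup" is added only to show that common words drop out. It is an illustration, not the site's actual code.

```python
def seo_fight(rare_a, rare_b):
    """Return (words only site A has, words only site B has), each sorted."""
    a, b = set(rare_a), set(rare_b)
    return sorted(a - b), sorted(b - a)

# Rare words quoted in the post, plus one assumed shared word ("startup").
techcrunch = ["friendfeed", "gamestop", "blogosphere", "startup"]
gigaom = ["qualcomm", "powerset", "sarcasm", "startup"]

only_tc, only_go = seo_fight(techcrunch, gigaom)
print(only_tc)  # prints ['blogosphere', 'friendfeed', 'gamestop']
print(only_go)  # prints ['powerset', 'qualcomm', 'sarcasm']
```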











Just tried with our http://therares...rds.com/113.com
It says:
So, I guess it needs some kind of preconfiguration, for some, if not any, url…
dang, i love this stuff… who could ever think that the future is predictable?
there are so many applications beyond the seo thing
some guy in russia, or some guy in bangalore, or some gal somewhere, born to change the world, doesn’t even know it, just doing something interesting
love it
TheRarestWords — with the most common PHP error: “Fatal error: Call to undefined function”
Still, it sounds like a damn good idea…
Well, the site is now overloaded. TechCrunch effect?
Just tried it with TechCrunch…
rubyonrails = “Overhyped framework for Web 2.0 applications”
Someone should tell 37signals that!
TheRarestWords definition of arrington is not a very nice one.
It’s a crowd fun! Click and edit! Yes, you! See the words? Those are the rarest (or popular) words, found at that site…
So which is it, the rarest or the most popular? We’re talking about words and not valuable metals, right? Most popular means most used.
I’ve been seeing a bit of referral traffic for some of my obscure sites from TheRarestWords for a couple of months. It is quite an interesting service, and the definitions are great:
Request:
Something to do after finding the holy grail.
Software:
Is like underware no one knows whats in it.
For fun, here are the rarest words on The Big Word Project… http://therares...wordproject.com
The site is still in beta…the author is working on this project in his spare time.
…he is bootstrapping and for now he has only one home machine to do the work (and sometimes uses Amazon for crawling).
If the site is not available then it could be the Techcrunch effect
@113.com
You can add your URL to the TheRarestWords database…or you can wait until the spider crawls your site.
It could be better prepared for the rest of the world if it could parse international characters …
Well, the author works on this project and is rolling out new features as we speak. (I just found out about the SEO fight from Techcrunch because I was asleep when they rolled out that feature; my timezone is different from the Russian timezone.)
So I guess they will improve their spider soon…
Nice.. will prepare some exotic food for the spider…
Also been noticing spider traffic from this site – they seem to have redesigned it since I last went (I’d just assumed it was something spammy). One problem it has with our site is in picking up members’ usernames which always tend to be weird/unusual/not real words!
Perhaps they can compare ‘rare’ to ‘real’..
Interesting way to focus in on specialist sites though, although currently it seems pretty easy to game them (not that you’d need to with it being a hobby site at the mo).
Too slow, but it works very well.
i have just tried accessing it, and it was a 404, rubbish