
An anonymous tipster wrote to us this morning to tell us that Cuil, the ill-fated “Google Killer,” has unleashed its Twiceler indexing bot on websites across the globe and, in the process, has brought many sites down.
“I don’t know what spawned it, but when Cuil attempts to index a site, it does so by completely hammering it with traffic,” the tipster wrote. “So much that it completely brings the site down. We’re 24 hours into this ‘index’ of the site, and I’ve had to restrict traffic to the site down to 2 packets per second, while discarding the rest, or otherwise it makes the site unusable.”
The Admin Zone forums are abuzz over Cuil’s overzealous method for indexing. Countless posters on the site have said that their websites have been brought down because of the Twiceler robot and one user said it “leeched enormous amounts of bandwidth — nearly 2GB this month until it was blocked. It visited nearly 70,000 times!”
Website owners are also saying that the way Cuil indexes sites isn’t scientific in any way and is actually quite “amateurish.” According to those who experienced the Twiceler onslaught, the bot seems to “randomly hit a site and continue to guess and generate pseudo-random URLs in an attempt to find pages that aren’t accessible by links. And by doing this, they completely bring a site down to where it’s not functional.”
Upset site owners contacted Cuil to ask why Twiceler was hitting their sites so often. James Akers, an operations engineer at Cuil, responded: “Twiceler is an experimental crawler that we are developing for our new search engine. It is important to us that it obey robots.txt, and that it not crawl sites that do not wish to be crawled. If you wish I will [be] glad to add your site to our list of sites to exclude, but I need you to tell the site name to block as email return addresses frequently [differ] from the domains that wish to be blocked.”
Akers also claims that Cuil has seen a “number of crawlers” that pretend to be Twiceler, and site owners should consult the company’s IP addresses page to determine if it’s really Cuil causing all the trouble.
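For site owners who want to run that check against their own logs, here is a minimal Python sketch. It is only an illustration: the ranges are reserved documentation placeholders rather than Cuil's actual addresses, and `is_published_crawler_ip` is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT Cuil's real addresses.
# In practice these would be copied from the crawler operator's published list.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_published_crawler_ip(addr, ranges=PUBLISHED_RANGES):
    """Return True if addr falls inside one of the published ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in ranges)
```

A request that carries the Twiceler user-agent string but arrives from an address outside the published ranges would be a candidate for one of the impostor crawlers Akers describes.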
Cuil has yet to respond to a request for comment, but it doesn’t look like the pelting of sites by the company’s Twiceler bot is an isolated incident. And if it’s true that Twiceler is requesting pages that don’t even exist simply to increase the size of its index, Cuil should work quickly to modify the bot before it receives even more negative publicity.

Call it the Cuill failbot!
Every time I see these guys in the news, it’s negative. I thought companies with >$30mil in funding were supposed to get hyped early, then crucified late. These guys have it backwards.
Cuil is a Cat 5 DISASTER!!!!!!!!!!!!
We need to terminate Cuil — a class action lawsuit will be filed!
Wow, that’s better than a ninja ctcp attack! 31337!
Sounds like the early days of Google. At first, webmasters complained about the traffic surge caused by Google’s crawler; however, it never brought anything down.
People who use, or invest, in cuil are fuils.
If you can’t handle more than 2 packets per second, you shouldn’t be on the web.
I think he meant 2 hits per second, not 2 packets per second. Or possibly he meant he limited Cuil to 2 packets/second.
In any event, a webmaster whose server can’t handle more than 2 hits per second per site is a lot less deserving of censure than an indexing bot that hits a website with a *sustained* (all-day) hitrate of 2 hits/second!
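The kind of limit being discussed here (roughly 2 requests per second for one client, with the excess discarded) can be sketched as a token bucket. This is a hedged illustration in Python, not what the tipster actually ran; the `TokenBucket` class and its parameters are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum tokens that can accumulate
        self.tokens = capacity    # start with a full bucket
        self.last = 0.0           # timestamp of the previous request

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) may pass."""
        elapsed = max(0.0, now - self.last)
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: the request is dropped
```

With `TokenBucket(rate=2.0, capacity=2.0)`, five requests arriving at the same instant would see the first two pass and the remaining three dropped; half a second later, one more request would be allowed.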
This sucks. Cuil is well, uncool at best – this makes it very uncool.
Yay!!! Cuil didn’t want to kill Google anymore, now Cuil wants to kill us all & rule the world.
I bet in the next 2 weeks we may see Darth Vader showing up in Cuil’s frontpage.
Hope so. Anyway, there is ALWAYS a way to stop malicious code, including bots. Like many other sites, DOS could happen to Cuil at any time…
Also, this is a great opportunity for a sharp IT lawyer to make some serious money [a class action lawsuit could be an option.] Cuil’s bot is certainly malicious, causing real damage.
Is there any actual proof that Cuil is causing these problems?
There’s a massive amount of bad publicity following Cuil. Why don’t they just leave it to self destruct – it’s horrible and massively inaccurate.
TheTechNewsBlog
http://www.thetechnewsblog.com
Yes, actually, there is proof. The proof is in the pudding: Totalchoicehosting.com has blocked Twiceler from indexing all of its servers for the past year because of this very issue. It occasionally gets through the blocks and brings down servers, but they stay pretty up to date about it.
Hope they never try to kill my website…
I’m sure that they already have.
Hey, I made a RSS feed mashup of almost all the TechCrunch Network blogs:
http://tinyurl.com/tcmashup
Follow me on Twitter: http://www.twit...er.com/Eleclion
Amusing how the weather changes considering half the people on techcrunch were all about Cuil when it first came out. It isn’t even an Ask killer.
Gebadia [Is it Ms. or Mr.?] Anyway, this is similar to this: the “weather” -as you call it, would also change if we realize that a very aggressive rottweiler is loose in the neighborhood…
Cuil has killed themselves. Wasn’t impressed with this search engine the few times I tried it out. Now forgetaboutit! These guys are toast.
So wait, we have to e-mail Cuil and tell them not to DoS us?
Collin, wake up!!! the meaning of the message is DOS Cuil!!! –Many on this blog understand it.
PS — A neat idea would be to modify the Twiceler and turn it on Cuil…
“…Twiceler is trying to find pages on sites that don’t even exist to simply increase the index size…”
So that’s how they are able to make outlandish claims about their index! Well, this makes sense now!
I’ve been crawled by Cuil thousands of times, they’ve never taken my site down…
Maybe I am LEETER than all of them.
More than a year ago, Twiceler/Cuil already brought our site down several times. After finding out the reason for our ‘server problems’ we contacted Cuil and they stopped visiting our site. One year later, they seem to have a rather flat learning curve…
I’m not sure if this is old news, but I did a cuil search of myself (I “cuiled” myself… yeah, that’ll catch on) and I noticed that around 40% of the results on any given page were duplicates. Literally, it would give me the same result, same exact URL (so that the color changed after I visited it) three or more times. At one point I thought I hadn’t even hit “next page” because it looked the same.
So, it seems in this case they are indexing URLs that don’t exist, and as an added filler, displaying duplicate results.
They really should have waited to launch. We’re going to tear them apart before they even have a chance to leave the ground.
Why not disable the cuil bot via .htaccess ?
I am going to write up an .htaccess file for disabling Cuil on T3chlusive so that others don’t have to put up with this rogue bot.
Wait, there is a robots.txt generator here
http://www.mcan.../robots-txt.asp
You can set the crawl delay to 10 seconds.
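As a rough illustration of the crawl-delay idea, Python's standard `urllib.robotparser` can show how a well-behaved crawler would read such a rule. The `twiceler` user-agent token and the example.com URL are assumptions made for this sketch, and `Crawl-delay` is a non-standard directive that a crawler is free to ignore.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask one crawler to wait 10 seconds between
# requests, and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: twiceler
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# What a compliant crawler identifying itself as Twiceler would see.
delay = parser.crawl_delay("Twiceler")
allowed = parser.can_fetch("Twiceler", "http://example.com/page.html")
```

Replacing the `Crawl-delay` line with `Disallow: /` would ask the crawler to stay away entirely. Either way, the file only helps if the bot honours it, which is why some commenters here fall back on blocking at the server or router.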
Cuil or KILL ?? Sounds similar .. How to avoid twiceler bot to access your website
I can’t believe this thing is still live. I laugh every time I look at their Alexa rankings:
http://www.alex...etails/cuil.com
I wonder what their investors are saying now…
that site is a joke
Cuil reminds me of the :CueCat, and I think it has just as much future.
ROR!!! That was random!
Sadly, we can’t hack their hardware to do cool things for us.
Naw, in order to be like CueCat, they’d have to respond with threats of a lawsuit because their crawler was blocked. :)
ah…wsj reports google will launch its own web browser called “chrome”
One of the major benefits of Spinn3r is that we crawl for dozens of startups so we spread the load….
Instead of all these crawlers hitting all of these sites we just hit once and then they pound us….
http://feedblog...tting-too-hard/
Not sure if I’m missing something, but I don’t see how ‘forums are abuzz’ with only 2 pages of posts, the last one being over a month old! I’m not a big fan of Cuil, but I haven’t noticed an issue with it on any of our websites.
They are still in the start of the project…
I am confident Cuil will somehow manage to rebound from that while driving up their stock price
http://www.mysp...atedndetermined
Cuil absolutely thrashed our site (gaia.com, social network of 200,000) for 10+ hours on Saturday, causing 70+ major issues (norm for a day is 0.2) and serious downtime. We’re gathering our notes this weekend, and we’ll be in touch with them. And our architecture is being adjusted to better guard against all search engines now.
While Cuil didn’t bring down any of our sites or servers, it was bothersome enough that we sent Twiceler’s operators a request to cease crawling our address space. They acknowledged it and stopped indexing. Cuil is a royal PITA in my opinion, and I am glad that it hasn’t come back since we sent the C&D to them.
Just send email to: crawler@cuill.com and tell them what to block (IP range or netblock) I got a reply on our C&D within a couple of hours back in June.
—
Wed 5/21/2008 2:14 PM
Dear Allen,
Twiceler is the crawler that we are developing for our new search engine. It is important to us that it obey robots.txt, and that it not crawl sites that do not wish to be crawled.
Recently we have seen a number of crawlers masquerading as Twiceler, so please check that the IP address of the crawler in question is one of ours.
You can see our IP addresses at http://cuill.co...eler/robot.html
I am happy to add your IPs (xx.xx.132.1 – xx.xx.135.254, right ?) to our list of IPs to exclude and I apologize for any inconvenience this has caused you.
Sincerely,
James Akers
Operations Engineer
Cuill, Inc.
I can attest to the fact that I have seen this cuil bot bring down multiple sites that we host at engineyard.com. Not very cool.
Yep – I’ve had this happen to our site, http://www.pipelineworks.com, as well, but this was many months ago. I mailed them to complain but didn’t get much of a response; they said it might have been someone using their browser agent rather than actually them, which I found doubtful. In the end I just blocked their browser agent…
Allen – that was exactly the same boiler plate reply I received…
I find the search results really rigged in the favour of bigger sites. Smaller, less established sites are on the back pages (if at all).
That’s cause the smaller sites explode when cuil tries to index them ;-)
Sounds like Cuil needs a lot of work… I haven’t been impressed by their search results at all. I’d like to see the googlebot work faster than a single hit every 1-3 minutes.
Interesting info – when they started they were supposed to have a very good crawler that crawled in ways much smarter and better than Google. It sounds like they have not solved the load-balancing problem of crawling without overloading sites or, more likely, they are desperate to crawl more data.
My guess is that they are desperate for more data, since their search engine clearly does not index anywhere near the 120 billion pages they claim, and/or they have realised they can’t rank well without good backlink analysis. Since they are short of time, they must have decided to crawl a lot of URLs at the same time rather than spread the requests out, which is not a very good idea.
It should be said though that it is possible they were faked – some bad people use fake user-agents for their own purposes and often it is that kind of people who overload sites :(
It has some good stuff, but also has a *LONG LONG* way to go, apparently. The developers might be humble, but their sales probably talked big.
maybe
From earlier TechCrunch posts, I figured you guys had probably just jumped the gun and placed Cuil in the deadpool.
Many search engines have made this kind of mistake. Even Google and Baidu did so in the past. Baidu had brought down many Chinese websites, and even made some non-Chinese websites block all Chinese IPs from then on. So what? Then they started to improve their crawler.
Can’t wait till some rich site owner sues the hell out of these guys for lost profit if their site goes down.
Six other webmasters and I are compiling a class action lawsuit against Cuil this week. We have server logs from HostGator that identify Cuil’s massive impact on server load to back this up. They’re going down!
I’ll personally be seeking $25,000 in damages (lost revenue when my website crashed). I can’t speak for the other webmasters, but it will be substantial.
Aside from the fact that Cuil sucks and displays irrelevant (and frequently odd) search results — the name is awful. How does one pronounce “Cuil”?
This is $33 million up in smoke. Good work boys. You fail.
I actually had this happen on a number of websites. Because of this massive attack (well, it basically is one), I blocked all the Cuil IP addresses at the router level, before the traffic can even make its impact on the server.
The odd thing, however, is that the URLs it was trying were construed as register_globals attacks, and PHP was creating huge dump files because of its activity.
Not anymore! =D
Best thing about Cuil:
In the rare instance that I get a visitor from Cuil, Google Analytics counts it as a referrer rather than a search engine.
Game.
Set.
Match.
Is this just another company up for a public spanking? I bet that within a month more technical details will be revealed (Ruby’s fault, as usual). In other words, another Twitter saga is unfolding…