Once Nearly Invisible To Search Engines, Flash Files Can Now Be Found And Indexed
by Erick Schonfeld on June 30, 2008

For most people on the Web, if Google or Yahoo cannot find something, it doesn’t exist. That has been one of the biggest drawbacks to creating a Website or application that displays itself as a Flash (SWF) file. Search engines could see the file, but they could not see what was in it. Until now.

Adobe has come up with a way for the search engines to read SWF files and index all of the information they contain. That means any text or links in a Flash application can now be indexed. This is a huge step forward for Adobe and anyone who develops in Flash/Flex. Michele Turner, Adobe’s VP of marketing for its platform business, explains:

We are releasing technology to Google and Yahoo that enables them to crawl and index SWF files. They are now searchable. This will open up millions of Flash files to search.

Adobe has created a special Flash player for the search engines that acts like a virtual user going through each application. It actually goes through the runtime of each Flash application and translates it into something the search engines can understand. So all of those fancy interactive Flash Websites and other rich Internet applications that have been invisible to search engines can now be seen by them.

Turner acknowledges that this invisibility so far “has been a big problem for those developing rich applications.” After all, it doesn’t matter how pretty your Website is if nobody can find it. Flash applications and Websites (many ironically created by ad agencies) have not been able to take advantage of any of the search-engine juice that so many online ad campaigns depend upon. This should be seen as part of Adobe’s larger efforts to remove any remaining restrictions associated with Flash (in April, for instance, it opened up the Flash runtime as part of its Open Screen Project).

Google is already rolling out the SWF-indexing technology, while Yahoo still “has some work to do,” says Turner. Even so, this won’t solve all the problems with Flash content showing up on search engines.

Becoming visible is one thing; actually ranking highly is another. Google currently can find about 73 million Flash files on the Web. But until Adobe makes it easy for the average Webmaster or blogger to link deeply into those Flash files, they are not likely to appear at the top of many search results.

Update: More info from Adobe here and Google here.


Comments

Comments Pages: [1] 2 » Show All

This is really huge for SEO purposes, but it still doesn’t help the fact that most people don’t like using a video game to find information.

 
Technology or Hype - June 30th, 2008 at 9:10 pm PDT

Macromedia (even pre-Adobe) has always provided examples on how all text and “interesting” strings could be pulled from a SWF file. This hasn’t stopped search engines from doing a lousy job at indexing them.
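As a rough illustration of the string-pulling approach mentioned above, here is a minimal Python sketch. It assumes only the publicly documented SWF header layout (a 3-byte signature, a 1-byte version, a 4-byte length, with `CWS` files zlib-compressed after those 8 bytes). The helper names `swf_body` and `printable_strings` and the sample bytes are made up for this example, and the scan is a naive "strings"-style pass, not what Macromedia's or Adobe's tools actually do.

```python
import re
import struct
import zlib


def swf_body(data: bytes) -> bytes:
    """Return the uncompressed bytes that follow the 8-byte SWF header."""
    signature = data[:3]
    if signature == b"FWS":  # uncompressed SWF
        return data[8:]
    if signature == b"CWS":  # zlib-compressed SWF
        return zlib.decompress(data[8:])
    raise ValueError("not a SWF file (or an unsupported compression)")


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Naive scan for runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Hypothetical stand-in for a real file: a fake compressed body with two strings.
body = b"\x00\x01Hello Flash\x00\x02http://example.com/page\x00ab\x00"
sample = b"CWS" + bytes([9]) + struct.pack("<I", 8 + len(body)) + zlib.compress(body)

print(printable_strings(swf_body(sample)))
```

A scan like this finds text stored literally in the file, but it says nothing about when or where that text appears on screen, which is the harder problem raised in the next paragraph.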

This is partly because “exercising” all execution paths in a Flash app, as in any other “application” (as opposed to “document”) on the web is an intractable problem.

Since that’s the case, what heuristics does this special Flash Player follow to pull out content in more meaningful ways? It’d be nice for Adobe to release this new “auto-exercising” Flash Player to the public, so we can see if any improvements have been made.

 

Seems to be a smart move to compete with Silverlight. Since Silverlight is xml based it is natively indexable by search.

 

They gave Yahoo the technology over Live Search. Wow, they really know who’s good. Those who don’t believe MSFT has a chance, read “the smoking gun” over on TechcrunchIT.

 

Yeah, well… no deeplinking means i don’t want to see flash pages coming up in my google searches. I don’t want to browse through a flash app in order to find a piece of information google has found but isn’t able to point precisely at. Yeah it’s better than nothing, maybe it’ll be of use in indexing video metadata or such.

 

@2 That’s part of what’s important about the new special player. It lets search engines move through the Flash movie like a human would and it exposes the data so it can be added to the index.

Just like with HTML, no one knows exactly what or how Google is indexing. But now Google potentially has the same level of knowledge about Flash movies as it does about the HTML DOM. It’s up to the community to figure out the SEO problem, just like with HTML.

=Ryan
rstewart@adobe.com

 

Ryan #6.
Great comments. Very helpful. Much appreciated.

 

Sounds great, and the SEO people will love it, so will designers.

However, where do web standards and proper mark-up come into play, particularly when search engines also rely heavily on the hierarchy of the information for keyword and key phrase ranking?

How do we know that Google and Yahoo won’t still rank a Flash web site that little bit lower because good semantic mark-up isn’t followed as nicely as on a standard web page?

 

Becoming visible is one thing, actually ranking highly is another. Google currently can find about 73 million Flash files on the Web. But until Adobe makes it easy for the average Webmaster or blogger to link deeply into those Flash files, they are not likely to appear at the top of many search results.

Well said!

For those who didn’t catch that, SEO is mostly determined by inbound links. And since you can’t deep link into a Flash file, it makes it difficult to add relevant PageRank to the text in a Flash file. In my opinion, Flash is nothing more than a gratuitous way to store information in a proprietary format. Better left to advanced JavaScript these days.

 

@william: Lots of modern ad agencies are taking a hybrid approach, using techniques like Faust to display semantic mark-up progressively enhanced by Flash.

http://blog.space150.com/2007/.....-standards

 

Can anybody make a Flash blog? Like, make everything work from inside the Flash file. Can you do that? Make it generate its own pages just like a real blog; it would connect to your server through an API. Am I just making no sense or is this possible?

 

shit. seo used to be a good reason to convince people flash was awful. now we’ll be bombarded with even more flash content.

 
Technology or Hype - June 30th, 2008 at 9:53 pm PDT

Ryan, don’t get me wrong. I love Flash, from the VM all the way up to the language level.

However, what you’ve noted up above is nicely worded, but *heavily* misleading. While the general public may buy it, I would expect better from a technology evangelist that understands the underpinnings of all the technologies involved.

I made the comparison between a static document (which is primarily what Google does a great job of indexing today), which is a declarative object. There’s no guessing about the DOM, because there’s one page to pull strings out of. It then has links to other pages, and all search engines know how to deal with that.

If you then look at an “application” (whether it be tied to a DOM via AJAX, or DisplayObject(s) via Flex), it is an intractable problem to make sense of what the navigation paths are, and what the useful “content” is.
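To make the difficulty concrete, here is a toy Python sketch of a crawler walking an application's states. Everything in it is hypothetical: the `APP` table, the state names, and the `explore` helper are invented for illustration, and nothing here reflects how Adobe's special player actually works. The point is only that a crawler has to cap its exploration, and that every state's text looks equally important to it.

```python
from collections import deque

# Hypothetical app: each state maps to (visible text, {action: next state}).
APP = {
    "home": ("Welcome", {"click_about": "about", "hover_logo": "home+tip"}),
    "home+tip": ("Welcome | Logo tooltip", {"mouse_out": "home"}),
    "about": ("About us", {"click_home": "home", "click_team": "team"}),
    "team": ("Our team", {"click_home": "home"}),
}


def explore(app, start, max_states):
    """Breadth-first walk over reachable states, stopping after max_states."""
    seen = {start}
    queue = deque([start])
    texts = {}
    while queue and len(texts) < max_states:
        state = queue.popleft()
        text, actions = app[state]
        texts[state] = text
        for next_state in actions.values():
            if next_state not in seen:
                seen.add(next_state)
                queue.append(next_state)
    return texts


for state, text in explore(APP, "home", max_states=10).items():
    print(state, "->", text)
```

Note that the traversal records the tooltip state exactly like a real "page". A real application also has far more states than any fixed budget can cover, so which ones get visited depends entirely on the crawler's heuristics.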

For example, if I make an app, how can you tell the difference between text that appears in tooltips on mouse over, as compared to another view, that represents a “page” of information? Especially if this is an older SWF, not built on the Flex framework, it can be nearly impossible to tell.

I think the approach taken here is the wrong one. What Adobe *should* do, is allow developers to mark up the parts of the UI that are relevant, and use Adobe’s strength in the marketplace to convey that as a standard for search engines. In this fashion, either [Metadata] compiler directives, or some other form of markup can declare what’s meant to be indexed, and what’s just visual fluff.

Perhaps a sitemap.xml-style standard, but specifically taking into account the usage cases for RIAs. Sure, it’ll take some thinking to get even the first iteration close to being “right”, but it’s definitely something that only Adobe can do, if we’re to get any “traction”* on this. Combine that with the existing URL/hash-based “history support” in Flex, and we could get to a point one day that search engines could truly find “content” within RIAs.

However, in my (perhaps not so) humble opinion, exercising all possible paths in a SWF via a special player is simply *not* the right way to go. Turing would be rolling over in his grave ;-)

* Bonus points to other commenters who can write a technically correct post *and* use “traction” in the same sentence.

 
Technology or Hype - June 30th, 2008 at 9:54 pm PDT

To clarify, I think Ryan understands the technology perfectly. I take issue with the way he’s tried to craft the wording in his comment above, to make it sound like what Adobe is announcing has more weight than it really does. Sorry Ryan/Adobe — I’m calling you out on this one ;-)

 

Google and Yahoo. Hmm. So what happens to MSN and other search engines? Google and Yahoo may be the most popular ones, however they are not the only ones!!

Good for SEO purposes, but not good enough, YET!

 

Gotta say - I agree with @12.

 

@13/14 I pretty much agree with most of what you said. Although I do think Google has generally gotten pretty good at doing SEO on Ajaxy RIAs, so while static HTML pages are still the most ideal for SEO, people feel comfortable enough adding some Ajax interactivity because Google generally gets it right. This provides the same level of introspection that Google has for a heavily Ajax site.

Is this the long term search answer for RIAs? I hope not. I think there is a better way, the metadata/sitemaps.xml example you gave might be it. But Google basically holds all the cards so this is an effort to make Flash play better in Google’s world by giving Google/Yahoo more information about the Flash movie.

@9 and @11, in the Flex Framework we’ve made it semi-easy to add deep linking to Flash applications (http://labs.adobe.com/wiki/index.php/Flex_3:Feature_Introductions:_Deep_Linking). Not perfect, but it does give developers control over the URL which I think is going to become really, really important after today. Hopefully we’ll see it more widely used.

 

Now let’s get this to show up correctly on:
- my mobile phone
- my text browser
- my flash disabled browser

And everything will be hunky dory. Still not going to recommend Flash in most cases, due to accessibility issues.

Thanks though,
Jeff

 

For what it’s worth, I’m also waiting to see what actual effect it makes in search results. It’s already pretty hard to appear on the first few pages of a search. Traversing application state is still certainly a nice thing in itself, though.

(heh, it does make Ryan’s job easier when someone says “SWF can’t SEO!” now… quicker to say “Oh yes it can!” than “Hey, just try searching on ‘seo myths filetype:swf’ sometime…” ;-)

jd/adobe

 

Took a really long time to come, but it is here at last. Never thought that Flash would survive and thrive. Surprises never cease.

 
Technology or Hype - June 30th, 2008 at 10:19 pm PDT

As a small side note — JD’s blog, when it was over at markme.com/jd, was the first blog I had ever read. This was before TC, before all the other hoopla of today. Ah, nostalgia… Good to see JD commenting over here in drama-land(tm) :-)

 

I just wonder what the Google and Yahoo rules will be for how an application is crawled. I mean, in the end it is your data that you want crawled and made accessible, and when it is indexed you want it pointing to the right experience.

Another thing I haven’t seen addressed: where is Live Search in this deal?

 

Great that information… Im photographer and invite you to see my Peru photo galleries…there are more 5,000 pictures about Peru, Machupicchu.
Thanks for your visit
http://www.peruinside.com/

Jorge Alvarado
peruinside@gmail.com

 

Adobe is playing catch up to Silverlight. Still a long way to go.

 

Many of the reactions to this news seem to imply that the commenters think Adobe is the only winner in opening up SWFs. In reality it benefits Google even more. Why is that? What’s behind a SWF, at least in the case of a rich internet application like blist, is interesting and valuable content. Google has proven that anything they can index they can monetize.

When two sides of a problem both have a financial stake in seeing a solution, that problem usually will be solved.

Google has a very valuable business but can only index half of the web. The other “dark” half is hidden by authentication and/or technologies its crawlers can’t navigate. This news is just one step in Google’s plans to index as much of the Internet as they can.

 

Great, only Yahoo and Google get access to something that should be open sourced.

 

Too bad Flash isn’t truly open source, as Adobe keeps playing games with it.

 

Nobody really puts any content within flash files anymore. Would this virtual flash user read data service responses? Most flash files are empty shells without any content whatsoever.

The content is in XML/JSON/AMFPHP whatever. But no content inside the SWFs. So in that aspect flash is not ‘catching up to SilverLight’. Flash has been there since Flash MX (6).

Great news, but obviously a long way to go.

 

I hope Google will give me a way to express my preference of not seeing links to Flash pages in SERPs.

 

Finally…And who said search couldn’t get better. As Will Arnett would say best “Come On!”.

 

That’s great that the big two can index flash files, but what about for the rest of independent developers that may want to crawl web for whatever reason, they aren’t good enough to analyze flash?

 

If users need to download a new special Flash player for that, I’m not sure it will be so popular.

 

@24, you are retarded sir. Microsoft is always lagging about 2 years behind any real innovation. Read a book or two.

 

A gr8 achievement for Adobe…
They have finally overcome one of the biggest problems of flash applications…
Getting read by the search engines..
And the problem of getting to the top will be definitely solved in the near future by adobe guys.
If they have done this…They can do the next step as well…
Gr8 Going….. :)

 

… and a huge step backward for the open web.

 

@24 Wow you are seriously saying Silverlight is more advanced than Flash? I hope you are kidding

 

#12 has it right.

now that google can index these flash files, can we find a way to make a flash website load quickly? say in under 30 seconds? without prompting me to download flash v20.201?

 

Hooray for “mystery meat” navigation!

The rest of this comment is loading. Please wait….

 

Suppose it’s a consequence of making Flash an open specification.

 

Deep linking is possible in Flash using swfaddress. Takes a bit of work mind you, but basically involves adding a meaningful string after the # in a url. This tells the flash movie which subsection the user means to navigate to.
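The idea behind that kind of hash-based deep link can be sketched in a few lines of Python. This is not SWFAddress's actual API; `section_from_url` and the example URLs are hypothetical names used only to show how the part after the `#` could be mapped to a subsection.

```python
from urllib.parse import urlsplit


def section_from_url(url: str, default: str = "/") -> str:
    """Map the fragment after '#' to a normalized section path."""
    fragment = urlsplit(url).fragment
    if not fragment:
        return default  # no deep link: show the start of the movie
    return "/" + fragment.strip("/")


print(section_from_url("http://example.com/site.html#/portfolio/2008"))
print(section_from_url("http://example.com/site.html"))
```

The movie would then use the returned path to decide which subsection to display, and would update the fragment as the user navigates so that the URL stays linkable.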

 

No, this is so we can now have AdWords in our Flash…

 

From the Google Q&A:

“Googlebot does not execute some types of JavaScript. So if your web page loads a Flash file via JavaScript, Google may not be aware of that Flash file, in which case it will not be indexed.”

As web developers, we use the SWFObject JavaScript Flash Player detection and embed script. This is in response to the engagement issue IE has with Flash. I know many other developers use this as well. This statement leads me to believe our Flash objects “may not” be indexed…
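A small Python sketch shows why that could happen, assuming a crawler that reads only the static markup and never runs scripts. The `SwfFinder` class, the `find_swfs` helper, and both sample pages are hypothetical; this is not how Googlebot is implemented, only a model of a parser that does not execute JavaScript.

```python
from html.parser import HTMLParser


class SwfFinder(HTMLParser):
    """Collect .swf URLs that appear in static object/embed/param tags."""

    def __init__(self):
        super().__init__()
        self.swf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("object", "embed", "param"):
            return
        for name, value in attrs:
            if name in ("data", "src", "value") and value and value.lower().endswith(".swf"):
                self.swf_urls.append(value)


def find_swfs(html: str) -> list[str]:
    finder = SwfFinder()
    finder.feed(html)
    finder.close()
    return finder.swf_urls


# Hypothetical pages: one embeds the movie in markup, one only via a script call.
STATIC_PAGE = '<object type="application/x-shockwave-flash" data="movie.swf"></object>'
SCRIPTED_PAGE = (
    '<div id="flash"></div>'
    '<script>swfobject.embedSWF("movie.swf", "flash", "400", "300", "9.0.0");</script>'
)

print(find_swfs(STATIC_PAGE))
print(find_swfs(SCRIPTED_PAGE))
```

In this model the script-only page yields no SWF URL at all, because the file name exists only inside script text that is never executed.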

 

@34 Read a book about innovation? That’s funny!

Why would they release this to just Yahoo and Google and not everybody? What are they afraid of?

@13 Hardly anybody uses any kind of markup whatsoever for search engines, why would flash be any different? The semantic web has very little TRACTION. I am sure the Adobe parser gives some sort of contextual info, and you can always analyze the text to see what it looks like, something that search engines already do. Anyways, tooltips can be relevant info too.

 

Great, even more poorly designed websites with pages I can’t link to and laserbeam sound effects whenever I mouse over a tab.

It’s great to know I can get google results for the 15-minute splash screen introduction every sane person clicks “skip intro” on.

 

Will search bots really load 900k Flash files to index them?

 

Technically speaking, indexing SWF files is impossible.

Read this:
http://blog.iconara.net/2008/0.....swf-files/

 
 

So Google have finally implemented Adobe’s search SDK, which Adobe subsequently removed without bothering to use 301 redirects to the new location.
http://www.timnash.co.uk/07/2008/swf-indexing/

The only thing new is they have issued a press release and persuaded 100s of bloggers to blog about old news, on something that doesn’t actually work properly.

Embargoed press releases are funny when they are used to fob off all the tech bloggers on something that isn’t really ground breaking, and misleading.

 

One of the challenges with both Flash and PDF files as they appear in Google and other search engines is the control over the experience when a user actually clicks the link. In the case of Flash, you may see an entry page which is actually not what the search engine sees. In PDF, you are “entered” into the beginning of a document that may take awhile to download (or not work at all, unfortunately).

In the magazine publishing market, allowing readers to find content in archives, and providing a reasonable “bite of the apple” (i.e., a satisfying reading experience), while controlling the intellectual property is a challenge. Digital magazine providers such as Texterity (www.texterity.com) enable a “Look Inside the Magazine” that allows the search bots to see clean and structured XML, while the reader gets the “visual experience” of the page… With all the words, but the pictures too, and as originally shown in the publication.

For example, search google for the term “worlds loudest ipod” (http://www.google.com/search?hl=en&q=worlds+loudest+ipod), and you’ll be shown a blurb from Make Magazine. Click on the link, and you’ll see a visual representation. Now, page forward 3-4 spreads, and eventually you’ll be asked to subscribe or login.

It’s an interesting approach for publishers who want both to “show the world” their content, provide a reasonable sample, but ultimately would like to get new readers and subscribers as well.

 

