The Financial Times Launches Its Own Business News Search Engine (Newssift).
by Erick Schonfeld on March 18, 2009

The Financial Times Group, which is owned by the British publisher Pearson, is not exactly the place you’d expect to find the latest search engine. But a startup deep within the bowels of the organization called FT Search is launching one at 8PM ET tonight called Newssift. It is a semantic search engine that sifts through business news, and it is not half bad, especially for bigger companies and broader topics.

Robin Johnson, the CEO of FT Search, used to run the Financial Times in the U.S. as its president. He’s been working on Newssift for the past two years, and currently employs a team of 25 people. “The object was to create a tool to allow a busy business person to assess what is the skinny on a problem they do not know the answer to,” he tells me.

Newssift indexes about 4,000 business news sources, from online newspapers and blogs to news portals and research sites. It is ingesting about 120,000 articles a day right now and applying semantic tags to each one. In the end it can categorize each article by business topic, organization, place, person, and theme. When you type in a search term, each of those columns gets filled in with associated keywords, allowing you to drill down to exactly what you want even if you are not sure at the outset what you are looking for. The back-end clustering search is powered by Endeca. The tagging and data extraction are done by technology from Nstein, the sentiment analysis is provided by Lexalytics, and ReelTwo does the categorization.

A search for “Sun Microsystems” brings up further suggestions for refinement, including “IBM,” “Jonathan Schwartz,” and “market share.” You sort of graze around, adding new keywords as they are presented to you. Each keyword you select is added to your string, and corresponding article results appear below. A sentiment pie chart indicates what percentage of the stories are positive, negative, or neutral. Another one breaks the results down by source (Online News, Magazines, Newspapers, Blogs, Research). Clicking on any shaded area filters the results further.
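To make the drill-down concrete, here is a minimal Python sketch of keyword refinement plus the percentage breakdowns behind the pie charts. It is an illustration only, not Newssift's or Endeca's implementation: the `Article` class, the helper names, and the four sample articles are all invented for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    source_type: str   # e.g. "Blogs", "Newspapers", "Online News"
    sentiment: str     # "positive", "negative" or "neutral"
    tags: set = field(default_factory=set)


def refine(articles, selected):
    """Keep only the articles tagged with every selected keyword."""
    wanted = set(selected)
    return [a for a in articles if wanted <= a.tags]


def suggestions(articles, selected):
    """Count the tags still available to narrow the current results."""
    counts = Counter(tag for a in articles for tag in a.tags)
    for tag in selected:
        counts.pop(tag, None)  # already in the search string
    return counts


def breakdown(articles, attr):
    """Percentage share of each value of `attr`, as a pie chart would show."""
    counts = Counter(getattr(a, attr) for a in articles)
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()} if total else {}


# Hypothetical sample data, made up for this sketch.
ARTICLES = [
    Article("a", "Blogs", "negative", {"Sun Microsystems", "IBM", "market share"}),
    Article("b", "Newspapers", "positive", {"Sun Microsystems", "Jonathan Schwartz"}),
    Article("c", "Online News", "neutral", {"Sun Microsystems", "IBM"}),
    Article("d", "Blogs", "positive", {"IBM"}),
]
```

With this sample data, `refine(ARTICLES, ["Sun Microsystems", "IBM"])` keeps articles "a" and "c", and `breakdown(...)` on that result reports a 50/50 split between negative and neutral. Filtering on a pie slice would be one more pass over the same list, keeping only the articles whose attribute matches the clicked value.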

Searches can be saved, creating an interesting prospective news search tool. You can create your own memetracker for any industry or topic. I am not sure I would use Newssift every day to stay on top of the latest news, but I can see it as a useful research tool when I have to really dig deep into a topic. It does better with business news than technology. Still, it is worth checking out in that it employs several subtle navigational techniques that make it more of a discovery engine than a search engine.

Comments

  • The sentiment index is quite intriguing. I wonder how accurate it is, however. Is it just based on keywords or is some semantic analysis being done on top of them? If it is accurate, I can see this being a great tool for marketing/PR departments.

  • It’s really great that they even show the public opinion on the news.

  • A segmented search engine. At least it is a tool which allows a busy business person to assess what is the skinny on a problem they do not know the answer to. I can see it as a useful research tool when I have to really dig deep into a business topic. Wishing them every success. http://www.yourclickmarket.com/

  • Pearson is preparing to launch more tools like this one soon…

  • Hmmm? Maybe there’s something I’m missing but it looks pretty clunky to me.

  • Interesting. Sounds very much like http://www.your...ancialworld.com, which also does semantic processing on financial news.

  • So far, for this kind of financial search I use business.com, since I think it has high relevancy. As for this new search engine, I’m sure I am going to test it to find more connections to my online business.

  • Excellent work. Lots of developer hours behind this one.

    A very similar site here.. wotnews.com

  • This is interesting. I know that “sub” search engines exist, and I have a feeling they will become more prominent. Google is great and all, but it’s so…daunting. As the internet gets even broader in scope, do you think we will see more of these smaller search engines?

  • Tried it; way too clunky and not responsive. Also, the layout and search options are confusing, with too many options. Just make it simple. This is an odd mistake considering they are catering to the “busy business person” who probably isn’t very web savvy.

    Make it simple stupid!

    I tried searching for an article about the Bank of Canada from today, and there is nothing. They need to add more sources to this to make it relevant. From the looks of it they’ve got a few newswires and the Financial Times (what a shocker there) but nothing of relevancy. Either they expand their reach or I’ll stick with Google and Business.com, as they index more than a couple of sources.

  • I think they’ve done a great job on this – the categories are populated really quickly and it’s fast.

    The semantic features of this are really cool – definitely a lot of possibility here, and it’s executed well.

    Now, they just need a logo that doesn’t look like it’s from the mid-90’s :)

  • One thing the author omitted in this article is the fact that the power of Endeca at Newssift/FT Search is based on entity extraction (Organizations, People and Places) by Nstein, NOT Endeca.

    Endeca is able to leverage the abundance of metadata created by Nstein’s text mining to deliver fast and accurate search results.

  • I disagree with so many positive comments about this initiative. I think that for such a significant investment (25 people, 2 years and 4 different technology providers) the results fall short of what could be done.

    Here is my first analysis:
    - The sentiment analysis is useless. It’s another example on how applying a sentiment to a full article instead of analyzing individually each comment doesn’t really provide any value to the reader or the decision maker.
    - Many of the tools listed in the article use only two (1. morphology and 2. grammatical) out of four (3. logic and 4. disambiguation) known semantic approaches. Statistical or other rule based approaches are used to approximate the 3rd and 4th methods. This can result in poor resolution generally but on sentiment in particular as is the case with Lexalytics. Consider the sentence “The seats in my lousy Chevy are actually really great”. Lexalytics sees “lousy” and “great” and correctly identifies them as adjectives (e.g. the grammatical approach) but fails to attribute them correctly to the nouns in the sentence (e.g. a semantic logic approach). The “lousy” adjective is easy since it is right in front of Chevy. It’s that pesky “great” which is as far away and to the right of “seats” as possible. Without a true semantic logic step Lexalytics would only tally up one positive adjective and one negative adjective and score it neutral. But that is not what this consumer is really saying. Overall they think their car is lousy and only one aspect of the car is great. That is not neutral in my book, it is negative.
    - The categorization seems to work only for economic and financial articles (and not even that well on borderline content).
    - The entity recognition has a so-so performance considering that FT for sure owns, and probably manages well, up-to-date lists. It is missing, probably to keep the precision high, many organizations and people. At the same time, just try a search for “GE rating” and look how many not normalized duplicates you see in the list of people. Or it lists the same organization in many different formats, sometimes incorrectly. A search for Microsoft gets you Microsoft Corporation but also Microsoft Windows, Microsoft Visual Studio, Microsoft Photosynth, etc. These are products not organizations.
    - They are using a basic html cleaner: people and companies are taken from the peripheral content too and not only (as it should) from the main and more relevant section of the content.

    I want to challenge here any content provider who wants to replicate what FT did. Please contact us at http://www.expertsystem.net and in 4 months, with 4 people and 1 core technology we can deliver a system that is better performing than FT at a fraction (really) of the cost.

    • We can feel your frustration Luca for not having a technology up to the task. Good luck.

      • Bill,

        Maybe you are right, but aren’t you curious to take a look? I just need a 30-minute demo. :)

        Thanks for your message anyway and I really hope you will be successful. We need more and more examples of smarter search and analysis websites.

    • Dear Luca,

      I work at Newssift as the Director of Language Technology. We appreciate any and all feedback on the Newssift beta as we work to refine the tool for our users. I’d like to respond and clarify some of the issues you raise:

      - “The sentiment analysis is useless. It’s another example on how applying a sentiment to a full article instead of analyzing individually each comment doesn’t really provide any value to the reader or the decision maker.”

      LB: This comment betrays a fundamental misunderstanding of the difference between the two semantic elements in our Sentiment engine – polarity and opinion. It may be the case that we have made this distinction insufficiently clear to users, but it is not the case that the results are “useless”. I think the disconnect is that “polarity”, which is the objective global positive or negative effect of an event, is virtually always applied on the document level whereas “opinion” can have both a document and an individual-level application.

      “Many of the tools listed in the article use only two (1. morphology and 2. grammatical) out of four (3. logic and 4. disambiguation) known semantic approaches. Statistical or other rule based approaches are used to approximate the 3rd and 4th methods. This can result in poor resolution generally but on sentiment in particular as is the case with Lexalytics. Consider the sentence “The seats in my lousy Chevy are actually really great”. Lexalytics sees “lousy” and “great” and correctly identifies them as adjectives (e.g. the grammatical approach) but fails to attribute them correctly to the nouns in the sentence (e.g. a semantic logic approach). The “lousy” adjective is easy since it is right in front of Chevy. It’s that pesky “great” which is as far away and to the right of “seats” as possible. Without a true semantic logic step Lexalytics would only tally up one positive adjective and one negative adjective and score it neutral. But that is not what this consumer is really saying. Overall they think their car is lousy and only one aspect of the car is great. That is not neutral in my book, it is negative.”

      LB: This example is antithetical to this writer’s previous point – it was claimed that document-level sentiment does not work yet an example of a sentence was taken out of context to demonstrate that point. As a matter of fact the lexical weight of those elements in the example would very likely be overshadowed by further positive discussions of the “Chevy” in the greater context of an article.

      - The categorization seems to work only for economic and financial articles (and not even that well on borderline content).

      LB: If the writer is referring to Business Topics “categorization”, this is exactly the point. Newssift focuses on sources and articles that are relevant to a business audience. Articles that are not relevant to Business Topics do not get classified into Business Topics. We do not refer to this as either a recall or a source issue.

      - The entity recognition has a so-so performance considering that FT for sure owns, and probably manages well, up-to-date lists. It is missing, probably to keep the precision high, many organizations and people. At the same time, just try a search for “GE rating” and look how many not normalized duplicates you see in the list of people. Or it lists the same organization in many different formats, sometimes incorrectly. A search for Microsoft gets you Microsoft Corporation but also Microsoft Windows, Microsoft Visual Studio, Microsoft Photosynth, etc. These are products not organizations.
      - They are using a basic html cleaner: people and companies are taken from the peripheral content too and not only (as it should) from the main and more relevant section of the content.

      LB: I am unable to duplicate this search experience, most likely because of our improvements in annotation and source filtering. Newssift is in beta, and our technology team continues to test entity extraction accuracy and our numbers continue to improve. We also consider consumer feedback, from people such as yourself, as we refine and improve Newssift.
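
The “lousy Chevy” sentence debated in the thread above can be turned into a small Python sketch that contrasts a plain adjective tally with attaching each adjective to a noun. This is a toy illustration of the commenter's argument, not a description of how Lexalytics or Newssift actually score text: the two-word lexicon, the noun list, and both attachment rules are assumptions made up for this example.

```python
import re

POLARITY = {"lousy": -1, "great": 1}   # toy opinion lexicon (assumed)
NOUNS = {"seats", "chevy"}             # toy noun list (assumed)
COPULAS = {"is", "are", "was", "were"}

SENTENCE = "The seats in my lousy Chevy are actually really great"


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tally(text):
    """Bag-of-words score: add up the polarity of every opinion word."""
    return sum(POLARITY.get(tok, 0) for tok in tokenize(text))


def attribute(text):
    """Attach each opinion word to a noun using two toy rules."""
    tokens = tokenize(text)
    # Rule inputs: the first noun stands in for the subject, and the
    # first copula marks where the predicate starts.
    subject = next((t for t in tokens if t in NOUNS), None)
    copula_at = next(
        (i for i, t in enumerate(tokens) if t in COPULAS), len(tokens)
    )
    scores = {}
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        if i + 1 < len(tokens) and tokens[i + 1] in NOUNS:
            target = tokens[i + 1]   # attributive: "lousy Chevy"
        elif i > copula_at and subject is not None:
            target = subject         # predicative: "seats ... are great"
        else:
            continue                 # no rule applies, leave unattached
        scores[target] = scores.get(target, 0) + POLARITY[tok]
    return scores
```

Here `tally(SENTENCE)` returns 0, the “neutral” score the commenter objects to, while `attribute(SENTENCE)` returns `{"chevy": -1, "seats": 1}`, keeping the negative view of the car separate from the positive view of the seats. How the two per-noun scores should roll up into one document-level label is exactly the point the two commenters disagree on, and the sketch does not settle it.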
