Sunday, July 20, 2008

We need semantics in scientific search

Online scientific publishers must build semantic search expertise within their technology teams and hire semantic web gurus from academia or the corporate world. That's one of the ways we will enhance the value of scientific content. Peter Mika's article describes why we need semantics in search. Peter is a researcher at Yahoo! Research.

Here are some of the limitations of current search platforms:

"Even though search is considered a functional technology, there are limits to a syntax-based approach. The following list shows some examples of these limitations.

  • It is almost impossible to return search results that relate to the secondary sense of a term—especially if a dominant sense exists—for example, try searching for George Bush the beer brewer as compared to the President.
  • The capabilities of computational advertising, which is largely also an IR problem (for example, retrieving matching ads from a fixed inventory), are clearly impacted because of the sparsity of advertisements.
  • When no clear key exists, search engines are unable to perform queries on descriptions of objects. For example, try searching for the author of this article with the keywords ‘semantic web researcher working for yahoo.’
  • Current search technology is unable to satisfy any complex queries requiring information integration such as analysis, prediction, scheduling, etc. An example of such integration-based tasks is opinion mining regarding products or services. While there have been some successes in opinion mining with pure sentiment analysis, it is often the case that users like to know what specific aspects of a product or service are being described in positive or negative terms and to have the search results appear aggregated and organized. Information integration is not possible without structured representations of content.
  • Multimedia queries are also difficult to answer, as multimedia objects are typically described with only a few keywords (tagging) or sentences. This is typically too little text for the statistical methods of IR to be effective."
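
To make the third limitation in the list concrete, here is a minimal sketch of my own (not from Peter's article) that contrasts pure keyword matching with a lookup over structured descriptions. The two toy pages, the `facts` field, and the function names `keyword_search` and `structured_search` are all made up for illustration; a real semantic search stack would use something like RDF metadata rather than Python dictionaries.

```python
import re

# Toy corpus. Both pages and the "facts" metadata are invented for this sketch.
DOCS = [
    {
        "id": "page-1",
        "text": "Peter Mika home page: publications, talks and contact details.",
        # Hypothetical structured description of what the page is about.
        "facts": {"type": "person", "role": "researcher",
                  "field": "semantic web", "employer": "yahoo"},
    },
    {
        "id": "page-2",
        "text": "Yahoo hires a web researcher to study semantic markup adoption.",
        "facts": {"type": "news", "topic": "hiring"},
    },
]


def keyword_search(query, docs):
    """Syntax-only search: a page matches if its text contains every query term."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    hits = []
    for doc in docs:
        words = set(re.findall(r"[a-z0-9]+", doc["text"].lower()))
        if all(term in words for term in terms):
            hits.append(doc["id"])
    return hits


def structured_search(constraints, docs):
    """Description-based search: a page matches if its facts satisfy every constraint."""
    return [
        doc["id"]
        for doc in docs
        if all(doc["facts"].get(key) == value for key, value in constraints.items())
    ]


# The keyword query finds the page that happens to contain the words,
# not the page about the person being described.
keyword_hits = keyword_search("semantic web researcher yahoo", DOCS)

# The structured query describes the object and finds the person's page.
structured_hits = structured_search(
    {"type": "person", "role": "researcher",
     "field": "semantic web", "employer": "yahoo"},
    DOCS,
)

print("keyword:", keyword_hits)
print("structured:", structured_hits)
```

In this toy setup the keyword query lands on the news page, because that page happens to contain all four words, while the structured query lands on the person's page even though none of the descriptive words appear in its text. That is the gap between matching strings and matching descriptions of objects.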

