Aspect Extraction of Web Pages Based on Analyzing
Hyperlink Structures
Nowadays, a large number of web pages are scattered across the web, and the
web has become a huge database. To extract information from the web,
we often use search engines. With these search engines, the user
inputs keywords describing the contents of the target web page, and the
search engine returns web pages matching the keywords. The user then has to
find the target web page from the title and summary
sentence of each search result. This search technique is more
effective when searchers have some prior knowledge of the contents of the
target web pages and can choose suitable search keywords.
However, when the user has only fragmentary knowledge relevant to the
target web page, it becomes very difficult to find the web page that the
user really wants with this method.
To solve this problem, one approach is to add information
about the different sides of the web pages returned as search
results. Consequently, the search candidates are narrowed down and searching
becomes easier.
In this paper, we focus on "how a web page is recognized by
others". We call this recognition by others "the aspect of the web
page". By using the "aspect" for information retrieval, the search candidates
are narrowed down and searching becomes easier.
However, the "aspect" cannot be extracted from the web page itself. It is
extracted from the contents of the web pages that link to the page. In this
research, several sides of a web page are described as aspects, and we propose
a method for extracting the aspects from the web pages that link to the
page. We also discuss the implementation issues of our prototype system.
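
To make the idea of reading aspects from linking pages concrete, the following is a minimal sketch and not the method proposed in this paper. It assumes that the anchor text of each hyperlink pointing at the target page can stand in for how the linking page describes it, and it simply counts those descriptions. The names AnchorCollector and candidate_aspects are hypothetical and introduced only for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects the anchor text of every link whose href equals target_url."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.anchors = []     # anchor texts found so far
        self._buffer = None   # text pieces of the currently open matching link

    def handle_starttag(self, tag, attrs):
        # Start buffering only for links that point at the target page.
        if tag == "a" and dict(attrs).get("href") == self.target_url:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            # Join the pieces and collapse runs of whitespace.
            self.anchors.append(" ".join("".join(self._buffer).split()))
            self._buffer = None


def candidate_aspects(linking_pages, target_url):
    """Count how often each description of target_url occurs.

    linking_pages is an iterable of HTML strings for pages assumed to
    link to target_url. Using anchor text as the description is an
    assumption of this sketch, not something the paper specifies.
    """
    counts = Counter()
    for html in linking_pages:
        parser = AnchorCollector(target_url)
        parser.feed(html)
        parser.close()
        counts.update(text.lower() for text in parser.anchors if text)
    return counts
```

Calling candidate_aspects with the HTML of a few linking pages and the target URL returns a Counter whose most frequent entries can be read as candidate aspects. Exact href matching is a deliberate simplification; relative or differently normalized URLs would need to be resolved first.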