Aspect Extraction of Web Pages Based on Analyzing
Hyperlink Structures



Nowadays, a large number of web pages are scattered across the web, and the web has become a huge database. To extract information from the web, we often use search engines. With a search engine, the user inputs keywords about the contents of the target web page, and the search engine returns the web pages matching those keywords. The user then has to find the target web page from the title and summary sentence of each search result. This search technique is effective if the searcher has some prior knowledge of the contents of the target web page and can choose suitable search keywords for it. However, when the user has only fragmentary knowledge relevant to the target web page, it becomes very difficult to find the page the user really wants with this method. One way to solve this problem is to add information about the different sides of the web pages returned as search results. In this paper, we focus on "how a web page is recognized by others", and we call this recognition by others "the aspect of the web page". Using aspects for information retrieval narrows down the search candidates and makes searching easier. However, the aspect cannot be extracted from the web page itself; it is extracted from the contents of the web pages that link to the page. In this research, the several sides of a web page are described as aspects, and we propose a method for extracting these aspects from the web pages that link to the page. We also discuss the implementation issues of our prototype system.
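
To make the idea concrete, the following is a minimal sketch of taking aspect candidates from the pages that link to a target page. It is an illustration under our own assumptions, not the method proposed in this paper: the function name `extract_aspects`, the in-memory `pages` dictionary, the small stopword list, and the use of simple term frequency are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not taken from the paper.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "on", "for", "is", "with"}


def extract_aspects(target_url, pages, top_k=3):
    """Return candidate aspect terms for target_url, one list per linking page.

    pages maps each URL to a (text, outgoing_links) pair. Only pages whose
    outgoing links include target_url contribute; the target's own text is
    never used, since an aspect describes how others see the page.
    """
    aspects = {}
    for url, (text, out_links) in pages.items():
        if url == target_url or target_url not in out_links:
            continue
        words = [w for w in re.findall(r"[a-z]+", text.lower())
                 if w not in STOPWORDS]
        # Assumption: the most frequent terms stand in for the aspect.
        aspects[url] = [w for w, _ in Counter(words).most_common(top_k)]
    return aspects
```

Each linking page yields its own list of terms, so a target page referenced from both a travel page and a research page would receive two different aspect candidates. A real system would have to crawl or index the linking pages and would likely restrict the text to the region around the link.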