Abstract This thesis proposes a framework for a Hidden Web crawler that demonstrates how to efficiently crawl, classify, and index Hidden Web pages in eight phases. The framework has two distinguishing features: 1) a classification phase that groups Hidden Web and Publicly Indexable Web (PIW) pages into distinct classes, so that the crawler performs well in both domain-specific and general crawling modes, and 2) the ability to handle both single-attribute and multi-attribute databases. The framework introduces three novel algorithms, whose effectiveness is evaluated through experiments on real web sites.