An Agent Based Context Driven Focused Crawler: Architecture of CDFC
Although a focused crawler offers a potential solution to the problem of exhaustive crawling, its keyword-driven approach still causes it to download a large number of web pages regardless of whether they are logically related to the topic. A keyword-based strategy alone is therefore not sufficient for the design of a focused crawler; contextual relevance matters more as far as the user's requirement is concerned. This paper proposes an architecture of a context driven focused crawler (CDFC) that searches for and downloads only highly related web pages based on their context. The architecture is based on the augmented hypertext document system, which makes the context of documents available at the server side. Since only relevant and credible documents are downloaded, a much smaller number than a keyword-driven crawler would fetch, the proposed architecture significantly reduces storage space, search time and network traffic.
Keywords: Search Engine, Focused Crawler, Context, Software Agents
Lecturer, Department of Computer Engineering, YMCA Institute of Engineering
Professor & Head, Department of Computer Engineering, YMCA Institute of Engineering