A Framework to Derive Web-Page Context from Hyperlink Structure: Deriving Web-Page Context

Because an anchor in an HTML document points to a related document, picture, or media application, anchor-text is a potential source of information about the associated web page. However, anchor-text is sometimes absent altogether, or the anchor tag contains only a single word or an image. In these situations, the text surrounding the link, the link-context, becomes important because it can be used to derive the context of the target web page. In this paper, a dataset of about a hundred web pages from different categories of the Open Directory Project (ODP) is surveyed and analyzed. The results show that cohesive text surrounding the anchor, in the form of full sentences, and non-cohesive text present elsewhere in the in-link web pages provide rich semantic information about a target web page, which in turn can be considered the context of that page. Since a target web page generally has several in-links, a filtering mechanism based on linguistic analysis of all context sentences, which selects the context sentence that best describes the target page, has been developed and is described and evaluated in this paper.
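
As a rough illustration of what a link-context is, the sketch below pulls the anchor-text and the sentence surrounding each link to a given target out of an in-link page. It is not the framework evaluated in the paper: the names `LinkContextParser` and `link_contexts` are hypothetical, the example URL is made up, and splitting sentences on `.`, `!` and `?` is a simplifying assumption that replaces any real linguistic analysis.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collects page text and records where anchors to `target` sit in it."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.chunks = []          # text pieces in document order
        self.length = 0           # characters collected so far
        self.anchor_start = None  # offset where the open matching anchor began
        self.spans = []           # (start, end) offsets of matching anchor-texts

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == self.target:
            self.anchor_start = self.length

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_start is not None:
            self.spans.append((self.anchor_start, self.length))
            self.anchor_start = None

    def handle_data(self, data):
        self.chunks.append(data)
        self.length += len(data)


def link_contexts(html, target):
    """Return (anchor_text, surrounding_sentence) for each link to `target`."""
    parser = LinkContextParser(target)
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    results = []
    for start, end in parser.spans:
        anchor_text = text[start:end].strip()
        # Assumption: a sentence ends at the nearest '.', '!' or '?'.
        left = max(text.rfind(c, 0, start) for c in ".!?") + 1
        ends = [i for i in (text.find(c, end) for c in ".!?") if i != -1]
        right = min(ends) + 1 if ends else len(text)
        sentence = " ".join(text[left:right].split())
        results.append((anchor_text, sentence))
    return results


page = ('<p>Intro text. The <a href="http://example.org/x">site</a> '
        'lists open jazz archives. More.</p>')
print(link_contexts(page, "http://example.org/x"))
# prints [('site', 'The site lists open jazz archives.')]
```

Here the single-word anchor-text "site" says little about the target, while the surrounding sentence carries the description, which is the situation the abstract refers to. Choosing the best sentence among several in-links would need a separate filtering step that this sketch does not attempt.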

Keywords: Hyperlinks, Anchor-text, In-links, Link-context, Cohesive Text, Linguistic Analysis
Stream: Knowledge and Technology
Presentation Type: Virtual Presentation in English
Paper: A paper has not yet been submitted.

Naresh Chauhan

Lecturer, Department of Computer Engineering, YMCA Institute of Engineering
Faridabad, Haryana, India

Naresh Chauhan received his M.Tech. (Information Technology) from GGS Indraprastha University, Delhi, in 2004. He served at Bharat Electronics Ltd. and Motorola India Ltd. for 5 years. He has been working as a Lecturer in the Department of Computer Engineering at YMCA Institute of Engineering, Faridabad, for the last 3 years. He is pursuing a Ph.D. on Internet technologies. His research interests include Internet technologies, software engineering, and real-time systems.

A.K. Sharma

Professor & Head, Department of Computer Engineering, YMCA Institute of Engineering
Faridabad, Haryana, India

Prof. A.K. Sharma received his M.Tech. (Computer Sci. & Tech.) with Hons. from the University of Roorkee in 1989 and his Ph.D. (Fuzzy Expert Systems) from JMI, New Delhi, in 2000. From July 1992 to April 2002 he served as Assistant Professor, and in April 2002 he became Professor in Computer Engineering at YMCA Institute of Engineering, Faridabad. He received his second Ph.D., in Information Technology, from ABV I.I.I.T. & M, Gwalior, in 2004. His research interests include fuzzy systems, object-oriented programming, knowledge representation, and Internet technologies.

Ref: T08P0069