When are links useful? Experiments in text classification
 
 
          

M.J. Fisher and R.M. Everson
In: Proceedings of the 25th BCS-IRSG European Colloquium on IR Research, 2003.

Abstract

Link analysis methods have become popular for information access tasks, especially information retrieval, where the link information in a document collection is used to complement the traditionally used content information. However, there has been little firm evidence to confirm the utility of link information. We show that link information can be useful when the document collection has a sufficiently high link density and links are of sufficiently high quality. We report experiments on text classification of the Cora and WebKB data sets using Probabilistic Latent Semantic Analysis and Probabilistic Hypertext Induced Topic Selection. Comparison with manually assigned classes shows that link information enhances classification in data with sufficiently high link density, but is detrimental to performance at low link densities or if the quality of the links is degraded. We introduce a new frequency-based method for selecting the most useful citations from a document collection for use in the model.

