British Library and information schools: the research of the Department of Information Science, City University London
Edited by David Bawden
1. Introduction
Web search is an important part of the working life of an information professional, but evaluation remains a poorly understood issue. How do such professionals evaluate the retrieval effectiveness of a given search engine with regard to a particular information need, or compare the retrieval effectiveness of several search engines? There has been some research in web search evaluation, but few attempts to apply evaluation methods practically in a real environment. There is a need for structured and formal evaluation techniques that yield quantitative data, from which searchers can clearly see differences between search engines. Such techniques, based on precision and recall measures, have existed for over 40 years ([1] Aitchison and Cleverdon, 1963), but they do not tackle all the issues that arise when evaluating web search. In this paper we show why such traditional IR measures on their own do not provide enough information for the researcher when evaluating web search, and show how diagnostic measures (such as recording the number of broken links) can be used to augment them. The paper puts forward a methodology which was initially derived while working in the commercial sector, and has subsequently been refined over six years of teaching search and evaluation to Library and Information Science postgraduate students. We argue that this methodology gives a much better idea of the retrieval effectiveness of web search engines, and also allows the examination of other processes in web search (such as crawling) which are not part of online search and are not addressed by measures such as recall and precision.
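The traditional measures referred to above, together with one simple diagnostic measure of the kind proposed here, can be sketched as follows. This is an illustrative sketch only, not the authors' actual evaluation code; the function names and the example result lists are hypothetical.

```python
def precision(retrieved, relevant):
    """Traditional IR measure: fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)

def recall(retrieved, relevant):
    """Traditional IR measure: fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(relevant)

def broken_link_rate(status_codes):
    """Diagnostic measure: fraction of result links that are dead (HTTP status >= 400)."""
    if not status_codes:
        return 0.0
    return sum(1 for s in status_codes if s >= 400) / len(status_codes)

# Hypothetical top-5 results for one query, and the judged relevant set.
retrieved = ["d1", "d7", "d3", "d9", "d4"]
relevant = {"d1", "d3", "d5", "d8"}

print(precision(retrieved, relevant))        # 2 of 5 retrieved are relevant -> 0.4
print(recall(retrieved, relevant))           # 2 of 4 relevant were found -> 0.5
print(broken_link_rate([200, 404, 200, 500]))  # 2 of 4 links dead -> 0.5
```

The point of the diagnostic measure is that it captures something precision and recall cannot: a result page may be judged relevant from its surrogate (title and snippet) yet point to a page the crawler indexed but which has since disappeared.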
2. Previous research in evaluation of web search
There is significant interest in the information retrieval research community in evaluating web search, including various tracks in the TREC conference series such as the VLC2 ([11] Hawking et al. , 1999) and Web Tracks ([6] Craswell and Hawking, 2005). Strong arguments are made for the use of scientific methods to evaluate web search either in a live environment ([12] Hawking et al. , 2001) or on a static frozen collection ([6] Craswell and Hawking,...