J Intell Inf Syst (2006) 26: 25–40
DOI 10.1007/s10844-006-5449-8

Fuzzy semantic tagging and flexible querying of XML documents extracted from the Web

Patrice Buche · Juliette Dibie-Barthelemy · Ollivier Haemmerle · Gaelle Hignette

© Springer Science + Business Media, LLC 2006

Abstract The relational database model is widely used in real applications. We propose
a way of complementing such a database with an XML data warehouse. The approach
we propose is generic, and driven by a domain ontology. The XML data warehouse is
built from data extracted from the Web, which are semantically tagged using terms belonging to the domain ontology. The semantic tagging is fuzzy, since, instead of tagging
the values of the Web document with one value of the domain ontology, we propose
to use tags expressed in terms of a possibility distribution representing a set of possible terms, each term being weighted by a possibility degree. The querying of the XML
data warehouse is also fuzzy: the end-users can express their preferences by means of
fuzzy selection criteria. We present our approach on a first application domain: predictive
microbiology.

Keywords Flexible querying · Semantic tagging · Fuzzy data

P. Buche · J. Dibie-Barthelemy · G. Hignette
INRA, Departement Mathematiques et Informatique Appliquees, Unite Met@risk,
16 rue Claude Bernard, F-75231 Paris, Cedex 05
e-mail: {patrice.buche, juliette.dibie, gaelle.hignette}@inapg.fr

O. Haemmerle
GRIMM-ISYCOM, Universite de Toulouse le Mirail, Departement de Mathematiques-Informatique,
5 allees Antonio Machado, F-31058 Toulouse Cedex
e-mail: [email protected]

1. Introduction

The relational database model has been widely studied since the 1980s and is now the
most popular database model used in real applications because of its efficiency. In a wide
range of application domains, thematic relational databases have been developed, and they
often contain a great deal of reference data. Many of those databases are built on the Open
World Assumption, which means that the lack of an answer does not imply that the answer
is negative but rather unknown. The corollary of the Open World Assumption is the
incompleteness issue, which has been widely studied. The incompleteness issue can
be mitigated in two main ways. The first consists in enlarging the set of answers to a query,
for example by generalizing the query in order to return relevant answers when there is no
exact answer to the query. The second one consists in...