J Intell Inf Syst (2006) 26: 25-40
DOI 10.1007/s10844-006-5449-8

Fuzzy semantic tagging and flexible querying of XML documents extracted from the Web

Patrice Buche, Juliette Dibie-Barthelemy, Ollivier Haemmerle, Gaelle Hignette

© Springer Science + Business Media, LLC 2006

Abstract The relational database model is widely used in real applications. We propose

a way of complementing such a database with an XML data warehouse. The approach

we propose is generic, and driven by a domain ontology. The XML data warehouse is

built from data extracted from the Web, which are semantically tagged using terms belonging to the domain ontology. The semantic tagging is fuzzy, since, instead of tagging

the values of the Web document with one value of the domain ontology, we propose

to use tags expressed in terms of a possibility distribution representing a set of possible terms, each term being weighted by a possibility degree. The querying of the XML

data warehouse is also fuzzy: the end-users can express their preferences by means of

fuzzy selection criteria. We present our approach on a first application domain: predictive

microbiology.

Keywords Flexible querying · Semantic tagging · Fuzzy data

1. Introduction

The relational database model has been widely studied since the 80s and it is now the

most popular database model used in real applications because of its efficiency. In a large

area of application domains, thematic relational databases have been developed and they

often contain a great deal of reference data. A lot of those databases are built on the Open

World Assumption, which means that the lack of answer does not imply that the answer

is negative but rather unknown. The corollary of the Open World Assumption is the incompleteness issue, which has been widely studied. Palliating the incompleteness issue can

be achieved in two main ways. The first one consists in enlarging the answers of a query,

for example by generalizing the query in order to give relevant answers when there is no

exact answer to the query. The second one consists in...

P. Buche, J. Dibie-Barthelemy, G. Hignette
INRA, Departement Mathematiques et Informatique Appliquees, Unite Met@risk, 16 rue Claude Bernard, F-75231 Paris, Cedex 05
e-mail: {patrice.buche, juliette.dibie, gaelle.hignette}@inapg.fr

O. Haemmerle
GRIMM-ISYCOM, Universite de Toulouse le Mirail, Departement de Mathematiques-Informatique, 5 allees Antonio Machado, F-31058 Toulouse Cedex
e-mail: [email protected]
