Improving term extraction with linguistic analysis in the biomedical domain

Authors: Wiktoria Golik, Robert Bossy, Zorana Ratkovic, Claire Nédellec

Research in Computing Science, Vol. 70, pp. 157-172, 2013.

Abstract: This paper presents a linguistically based approach to term extraction from biomedical corpora. The method analyzes terms and their contexts against linguistic constraints, with a focus on participles and prepositional complements. Its purpose is to obtain terms that are relevant for knowledge acquisition applications, such as the creation and enrichment of terminologies and ontologies. We report on evaluations carried out with two complementary strategies: comparison with a reference terminology and manual validation. Both strategies were applied to two corpora that differ in genre and life science domain, namely pharmacology patents and scientific articles on animal physiology. Our work shows that the developments based on linguistic analysis significantly improve the extraction results. The method is especially effective for gerunds and prepositional modifiers.
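The abstract does not spell out the extraction rules themselves. As a rough illustration of what constraints on participles and prepositional complements can look like, here is a minimal, hypothetical sketch that works on tokens already tagged with Penn Treebank part-of-speech tags. The tag sets, the restriction of prepositional attachment to "of" (`ALLOWED_PREPS`), and the function names are assumptions made for this example; they are not the authors' system.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)

NOUNS = {"NN", "NNS", "NNP", "NNPS"}
# Words allowed inside a term: nouns, adjectives, and -ing/-ed participles.
PREMODIFIERS = {"JJ", "VBG", "VBN"} | NOUNS
# Hypothetical constraint: only "of" complements are kept inside a term.
ALLOWED_PREPS = {"of"}


def noun_run(tokens: List[Token], start: int) -> int:
    """Return the exclusive end of the longest PREMODIFIER* NOUN span at start.

    Returns start itself when no such span exists.
    """
    j = start
    while j < len(tokens) and tokens[j][1] in PREMODIFIERS:
        j += 1
    # A term must end on a noun: a trailing participle such as "treated"
    # in "cells treated with ..." is dropped from the candidate.
    while j > start and tokens[j - 1][1] not in NOUNS:
        j -= 1
    return j


def candidate_terms(tokens: List[Token]) -> List[str]:
    """Collect candidate terms, optionally extended by an "of" complement."""
    terms = []
    i = 0
    while i < len(tokens):
        end = noun_run(tokens, i)
        if end == i:  # no term starts here
            i += 1
            continue
        # Attach a prepositional complement only if the preposition is allowed
        # and is directly followed by another noun span.
        if (end < len(tokens) and tokens[end][1] == "IN"
                and tokens[end][0].lower() in ALLOWED_PREPS):
            comp_end = noun_run(tokens, end + 1)
            if comp_end > end + 1:
                end = comp_end
        terms.append(" ".join(word for word, _ in tokens[i:end]))
        i = end
    return terms


sentence = [
    ("The", "DT"), ("expression", "NN"), ("of", "IN"), ("growth", "NN"),
    ("hormone", "NN"), ("receptor", "NN"), ("in", "IN"), ("cells", "NNS"),
    ("treated", "VBN"), ("with", "IN"), ("binding", "VBG"), ("proteins", "NNS"),
]
print(candidate_terms(sentence))
# ['expression of growth hormone receptor', 'cells', 'binding proteins']
```

In this sketch a gerund that precedes a noun ("binding proteins") stays in the term, a participle that follows the head noun ("treated") is excluded, and only "of" complements are attached. A real system would also need to handle determiners inside complements and more prepositions, which this toy version omits.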

Keywords: term extraction, biomedical corpora, linguistic approach

PDF: Improving term extraction with linguistic analysis in the biomedical domain