Detecting the association of health problems in consumer-level medical text
Journal of Information Science
Published online on October 19, 2016
Abstract
Consumers often do not know the complicated links between related health problems, which can cause difficulty when they seek complete information about such problems. This study detects associations among health problems by extending the meaning of health terms with methods based on the latent Dirichlet allocation (LDA) probabilistic topic model, the Medical Subject Headings (MeSH) thesaurus structure and Wikipedia concept mapping. The terms representing health problems are selected from, and extended by, consumer-level medical text. Because the vocabulary of consumer-level medical text differs from that of professional-level medical text, the findings can be easily understood by the general public and are suitable for consumer-oriented applications. The methods were evaluated in two ways: (1) correlation analysis against expert ratings to show overall performance and (2) precision at N (P@N) to reflect the ability to detect strong associations. The LDA topic-model-based method outperforms the other two types. The incongruence in judgment between the best method and the expert ratings was examined, and the evidence shows that the automatic method sometimes detects real associations beyond those identified by human experts.
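As a rough illustration of the second evaluation measure, P@N can be read as the fraction of the N term pairs ranked highest by a method that experts also consider strongly associated. The sketch below assumes that each candidate pair carries a numeric association score from the method and a binary expert label for "strong association". The function name, the binary labeling and the sample values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def precision_at_n(method_scores: Sequence[float],
                   expert_strong: Sequence[bool],
                   n: int) -> float:
    """Fraction of the n highest-scored pairs that experts marked as strong."""
    if len(method_scores) != len(expert_strong):
        raise ValueError("scores and labels must have the same length")
    if n <= 0 or n > len(method_scores):
        raise ValueError("n must be between 1 and the number of pairs")
    # Rank pair indices by the method's association score, highest first.
    ranked = sorted(range(len(method_scores)),
                    key=lambda i: method_scores[i],
                    reverse=True)
    # Count how many of the top-n pairs carry a "strong" expert label.
    hits = sum(1 for i in ranked[:n] if expert_strong[i])
    return hits / n


# Illustrative values only (assumed, not data from the study).
scores = [0.91, 0.15, 0.78, 0.40, 0.66]
strong = [True, False, True, False, False]
p_at_3 = precision_at_n(scores, strong, 3)
```

A high P@N at small N would suggest that a method's most confident associations agree with expert judgment, even if its scores for weaker pairs correlate less well with the ratings.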