Large‐scale extraction of drug–disease pairs from the medical literature

Journal of the American Society for Information Science and Technology

Published online on

Abstract

Accurate, large‐scale automatic extraction of drug–disease pairs from the medical literature plays an important role in drug repurposing. However, most existing extraction methods are supervised, and manually labeling drug–disease pair datasets is costly and time‐consuming, while many drug–disease pairs remain buried in free text. In this work, we first use a pattern‐based method to automatically extract drug–disease pairs with treatment and inducement relationships from free text. We then propose a network embedding algorithm that calculates the degree of correlation of each drug–disease pair. In the experiments, we apply the method to 27 million medical abstracts and titles available on PubMed, extracting 138,318 unique treatment pairs and 75,396 unique inducement pairs. Our algorithm achieves a precision of 0.912 and a recall of 0.898 in extracting the frequent treatment drug–disease pairs, and a precision of 0.923 and a recall of 0.833 in extracting the frequent inducement drug–disease pairs. In addition, the proposed information network embedding algorithm efficiently reflects the degree of correlation of drug–disease pairs. In the fine‐grained evaluation of extracting frequent pairs, our algorithm achieves a precision of 0.802 and a recall of 0.783.
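
As a rough illustration of what pattern‐based extraction of treatment and inducement pairs can look like, the Python sketch below matches a few lexical patterns against lowercased text and counts each (drug, disease) pair per relation. The abstract does not list the actual patterns or vocabularies, so the `DRUGS` and `DISEASES` lists, the four patterns, and the `extract_pairs` helper are all hypothetical; this is not the authors' implementation, and it does not cover the network embedding step.

```python
import re
from collections import Counter

# Hypothetical vocabularies; a real system would use far larger lexicons.
DRUGS = ["aspirin", "metformin", "cisplatin"]
DISEASES = ["headache", "type 2 diabetes", "nephrotoxicity"]

# Named groups that match any known drug or disease mention.
drug = "(?P<drug>" + "|".join(map(re.escape, DRUGS)) + ")"
disease = "(?P<disease>" + "|".join(map(re.escape, DISEASES)) + ")"

# Hypothetical lexical patterns for the two relation types.
PATTERNS = {
    "treatment": [
        rf"{drug} (?:in|for) the treatment of {disease}",
        rf"{disease} (?:is|was|are|were) treated with {drug}",
    ],
    "inducement": [
        rf"{drug}[- ]induced {disease}",
        rf"{disease} (?:induced|caused) by {drug}",
    ],
}


def extract_pairs(texts):
    """Count (drug, disease) matches per relation across all texts."""
    counts = {relation: Counter() for relation in PATTERNS}
    for text in texts:
        lowered = text.lower()
        for relation, patterns in PATTERNS.items():
            for pattern in patterns:
                for match in re.finditer(pattern, lowered):
                    pair = (match.group("drug"), match.group("disease"))
                    counts[relation][pair] += 1
    return counts
```

Counting occurrences rather than only collecting a set makes it possible to separate frequent pairs from rare ones, which is the distinction the reported precision and recall figures rely on.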