Efficiently assessing the health relatedness of text passages is important for mining the web at scale, whether to conduct health sociological analyses or to develop a health search engine. We propose a new efficient and effective termhood score for predicting the health relatedness of phrases and sentences, which achieves 69% recall at over 90% precision on a web dataset of cause-effect statements. It is more effective than state-of-the-art medical entity linkers and as effective as BERT-based approaches while being much faster. Using our method, we compile the Webis Health CauseNet 2022, a new resource of 7.8 million health-related cause-effect statements such as "Studies show that stress induces insomnia", in which the cause ("stress") and effect ("insomnia") are labeled.
- Download the dataset from Zenodo.