Construction of Word Network from Traditional Chinese Medicine Corpus
Abstract
In this paper, we automatically construct a quantified network of traditional Chinese medicine (TCM) terms, using cosine distance as the measure of relatedness between words. Scanning the corpus yields a set of word vectors whose pairwise relationships can be measured. Clustering these vectors produces a three-level network in the form of a category tree, whose leaves correspond to different types of words, such as herbs, diseases, and theories of medicine. From each category, we selected the words nearest to the cluster center and invited experts to judge whether each word was a correct, previously uncollected TCM term; the resulting new-word extraction rate was around 70%. Because the network is almost entirely machine-generated, its construction is much more efficient, and the knowledge it encodes may suggest several new approaches to TCM.
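As a rough illustration of the candidate-selection step described in the abstract, the sketch below ranks the words in one cluster by cosine distance to the cluster centroid. It is not the authors' implementation: the function names, the toy two-dimensional vectors, the placeholder terms, and the choice of the arithmetic mean as the cluster center are all assumptions made for the example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a list of vectors (assumed cluster center)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_to_center(cluster, k):
    """Return the k words whose vectors lie closest to the cluster centroid."""
    center = centroid(list(cluster.values()))
    ranked = sorted(cluster, key=lambda w: cosine_distance(cluster[w], center))
    return ranked[:k]


# Hypothetical toy cluster: placeholder terms with made-up 2-D vectors.
toy_cluster = {
    "term_a": [1.0, 0.0],
    "term_b": [0.9, 0.1],
    "term_c": [0.0, 1.0],
}

# The top-k words would be the candidates handed to experts for review.
candidates = nearest_to_center(toy_cluster, 2)
print(candidates)
```

In practice the vectors would come from the corpus and have many more dimensions, and the value of `k` per cluster would be a tuning choice that the abstract does not specify.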
DOI
10.12783/dtcse/itms2016/9460