Study on Tibetan-Chinese Comparable Corpus Extraction

Yuan SUN, Li-li GUO

Abstract


Tibetan-Chinese comparable corpus extraction is foundational work for Tibetan-Chinese cross-language question answering, information retrieval, machine translation, and related research. This paper explores ways to address the scarcity of Tibetan-Chinese comparable corpora, which will promote knowledge sharing between different languages. We propose a method for extracting Tibetan-Chinese comparable corpora. The main contributions are as follows: (1) a Tibetan-Chinese comparable corpus extraction model based on multiple features of bilingual websites; (2) an extraction method based on entity linking over naturally annotated resources. Experimental results show that the approach is effective.
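The abstract does not say which features the multi-feature model uses or how they are fused. The following is therefore only a minimal sketch of one common way to do it: weighted linear fusion of several similarity features for a candidate Tibetan/Chinese page pair drawn from a bilingual website. Everything here is a hypothetical assumption and is not taken from the paper. That includes the three features (URL token overlap, publication-date proximity, shared numbers), the weights, the threshold, and the names `Page`, `fused_score` and `is_comparable`.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    """A crawled page from a bilingual website (hypothetical container)."""
    url: str
    published: date
    text: str


# Hypothetical feature weights and decision threshold; in practice these
# would be tuned on a labelled set of Tibetan-Chinese page pairs.
WEIGHTS = {"url": 0.4, "date": 0.3, "numbers": 0.3}
THRESHOLD = 0.6


def _jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def _url_tokens(url: str) -> set:
    """Split a URL into lowercase tokens on common separators."""
    return set(re.split(r"[/:._\-?=&]+", url.lower())) - {""}


def _numbers(text: str) -> set:
    """Extract integers; the regex digit class also matches Tibetan digits
    (U+0F20 to U+0F29), and int() converts them to ordinary values."""
    return {int(n) for n in re.findall(r"\d+", text)}


def url_similarity(u1: str, u2: str) -> float:
    """Bilingual sites often mirror URL paths across language versions."""
    return _jaccard(_url_tokens(u1), _url_tokens(u2))


def date_proximity(d1: date, d2: date, window_days: int = 7) -> float:
    """1.0 for same-day publication, decaying linearly to 0 over the window."""
    gap = abs((d1 - d2).days)
    return max(0.0, 1.0 - gap / window_days)


def number_overlap(t1: str, t2: str) -> float:
    """Numbers (years, counts) are largely language-independent anchors."""
    return _jaccard(_numbers(t1), _numbers(t2))


def fused_score(p1: Page, p2: Page) -> float:
    """Weighted linear combination of the per-feature similarities."""
    features = {
        "url": url_similarity(p1.url, p2.url),
        "date": date_proximity(p1.published, p2.published),
        "numbers": number_overlap(p1.text, p2.text),
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())


def is_comparable(p1: Page, p2: Page) -> bool:
    """Accept the pair as comparable if the fused score clears the threshold."""
    return fused_score(p1, p2) >= THRESHOLD
```

For example, consider a Tibetan page at `http://tb.example.cn/news/2016/0412/123.html` and a Chinese page at `http://zh.example.cn/news/2016/0412/123.html` published one day apart, both mentioning 2016 and 35. This pair scores about 0.88 and is accepted. A Chinese sports page from a different year with no shared numbers falls well below the threshold. A linear fusion keeps each feature's contribution inspectable. A learned classifier over the same feature vector would be a natural alternative.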

Keywords


Tibetan-Chinese, Comparable corpus, Multi-feature fusion algorithm


DOI
10.12783/dtcse/aics2016/8212
