Study on Tibetan-Chinese Comparable Corpus Extraction

Yuan SUN; Li-li GUO

doi:10.12783/dtcse/aics2016/8212

Abstract

Tibetan-Chinese comparable corpus extraction is foundational work for Tibetan-Chinese cross-language question answering, information retrieval, machine translation, and related research. This paper explores ways to address the scarcity of Tibetan-Chinese comparable corpora, which should promote knowledge sharing between different languages. We propose a method for extracting a Tibetan-Chinese comparable corpus. The main contributions are: (1) a Tibetan-Chinese comparable corpus extraction model based on multiple features of bilingual websites; (2) an extraction method based on entity links from naturally annotated resources. Experimental results show that the approach is effective.

Keywords

Tibetan-Chinese, Comparable corpus, Multi-feature fusion algorithm

COMPUTER SCIENCE and ENGINEERING
