Data-driven Approaches to Author's Profiling Identification for Russian Texts on Base of Complex Machine Learning Models in Combinations with Siamese Networks

ALEKSANDR SBOEV; IVAN MOLOSHNIKOV; DMITRY GUDOVSKIKH; ROMAN RYBKA

doi:10.12783/dtcse/ceic2018/24526

Abstract

In this work, data-driven approaches to author profiling for Russian texts are investigated on the basis of a unified data corpus. The corpus was specially collected by crowdsourcing and currently contains texts from 1161 men and 2043 women. We describe the adaptation of complex models based on convolutional neural networks, gradient boosting methods, LSTM and Siamese networks, combined with different input data and features (morphological data, vectors of character n-gram frequencies, Linguistic Inquiry and Word Count (LIWC) and others), to form a vector of derived features used to identify the gender and age of a text's author. A method for improving accuracy by encoding the input with a Siamese network is presented and analyzed.
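One of the input features listed above is a vector of character n-gram frequencies. The abstract does not specify how it is built, so the following is only a minimal sketch under stated assumptions: overlapping n-grams over the lowercased text, n = 3, and relative (not raw) frequencies. The helper names `char_ngram_freqs` and `to_vector` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_ngram_freqs(text: str, n: int = 3) -> dict[str, float]:
    """Return relative frequencies of overlapping character n-grams.

    Hypothetical helper: lowercasing and the default n=3 are illustrative
    assumptions, not the paper's documented settings.
    """
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        # Text shorter than n: no n-grams, so return an empty profile.
        return {}
    counts = Counter(grams)
    total = len(grams)
    # Normalize counts so the frequencies of one text sum to 1.
    return {gram: count / total for gram, count in counts.items()}


def to_vector(freqs: dict[str, float], vocabulary: list[str]) -> list[float]:
    """Project an n-gram profile onto a fixed vocabulary (absent n-grams -> 0.0)."""
    return [freqs.get(gram, 0.0) for gram in vocabulary]


if __name__ == "__main__":
    # Usage on a short Russian phrase; in practice the vocabulary would be
    # fixed from the training corpus so every text maps to the same dimensions.
    freqs = char_ngram_freqs("Мама мыла раму", n=3)
    vocabulary = sorted(freqs)
    vector = to_vector(freqs, vocabulary)
    print(len(vocabulary), round(sum(vector), 6))
```

Fixing the vocabulary across the whole corpus (for example, the most frequent n-grams in the training set) is what makes these per-text profiles comparable as model inputs; the sketch leaves that selection step out.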

Keywords

Data-driven modeling, Author profiling, Age detection, Gender identification, Deep neural networks, Siamese networks.

COMPUTER SCIENCE and ENGINEERING
