Data-driven Approaches to Author’s Profiling Identification for Russian Texts on Base of Complex Machine Learning Models in Combinations with Siamese Networks
Abstract
In this work, data-driven approaches to author profiling for Russian texts are investigated on the basis of a unified data corpus. The corpus was specially collected by crowdsourcing and currently contains texts from 1161 men and 2043 women. The paper describes the adaptation of complex models based on convolutional neural networks, gradient boosting methods, LSTM, and Siamese networks, combined with different input data and features (morphological data, vectors of character n-gram frequencies, Linguistic Inquiry and Word Count, and others), to form a vector of derived features for identifying the gender and age of a text's author. A method for improving accuracy by encoding with a Siamese network is presented and analyzed.
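The abstract lists character n-gram frequency vectors as one input and a Siamese network as an encoder, but gives no architecture or parameters. The code below is a minimal sketch under assumptions of our own: trigrams, hashing into 64 dimensions, an untrained one-layer tanh encoder with fixed random weights, Euclidean distance between embeddings, and the standard contrastive loss with margin 1.0. It shows how a character n-gram vector could be built and passed through one shared encoder for a pair of texts. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
import random
import zlib
from collections import Counter


def char_ngram_freqs(text: str, n: int = 3) -> dict[str, float]:
    """Relative frequencies of character n-grams in lower-cased, space-padded text."""
    padded = f" {text.lower()} "
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    if not grams:
        return {}
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}


def to_vector(freqs: dict[str, float], dim: int = 64) -> list[float]:
    """Hashing trick (an assumption): fold n-gram frequencies into a fixed-length vector.
    zlib.crc32 is used because Python's built-in str hash is salted per process."""
    vec = [0.0] * dim
    for gram, f in freqs.items():
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += f
    return vec


def make_encoder(in_dim: int, out_dim: int, seed: int = 0):
    """Hypothetical untrained encoder: one linear layer + tanh with fixed random weights.
    The same closure is applied to both texts of a pair, which makes the setup 'Siamese'."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(x: list[float]) -> list[float]:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return encode


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(distance: float, same_class: bool, margin: float = 1.0) -> float:
    """Contrastive loss: pull same-class pairs together, push different-class pairs
    apart until their distance reaches `margin`."""
    if same_class:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


if __name__ == "__main__":
    encode = make_encoder(64, 8)  # one shared encoder for both branches
    texts = ["Сегодня я ходила в магазин.", "Вчера мы ходили в парк."]
    emb = [encode(to_vector(char_ngram_freqs(t))) for t in texts]
    d = euclidean(emb[0], emb[1])
    print(f"distance={d:.4f}",
          f"loss_if_same={contrastive_loss(d, True):.4f}",
          f"loss_if_different={contrastive_loss(d, False):.4f}")
```

In training, the contrastive loss would be minimized over pairs of texts labeled as having the same or different author gender (or age group), and the learned embeddings could then be appended to the other derived features.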
Keywords
Data-driven modeling, Author’s profiling, Age detection, Gender identification, Deep neural networks, Siamese networks
DOI
10.12783/dtcse/ceic2018/24526