IJCCE 2012 Vol.1(1): 47-50 ISSN: 2010-3743
DOI: 10.7763/IJCCE.2012.V1.14
Author’s Native Language Identification from Web-Based Texts
Parham Tofıghı, Cemal Köse, and Leila Rouka
Abstract—Despite the rapid growth of Internet technologies and applications, text is still the most common Internet medium; social networking and other web applications, for example, are mostly text based. We developed a framework to determine an anonymous author’s native language from short, multi-genre texts such as those found in many Internet applications. In this framework, four types of feature sets (lexical, syntactic, structural, and content-specific) are extracted, and three machine learning algorithms (C4.5 decision tree, support vector machine, and Naïve Bayes) are trained to identify the author’s native language from the proposed features. To evaluate the framework, we used English, Persian, Turkish, and German online news texts. The experimental results showed that the proposed approach identified the author’s native language in web-based texts with a satisfactory accuracy of 70% to 80%, and that support vector machines outperformed the other two classification techniques.
Index Terms—Native language identification, web-based texts, stylometry, classification techniques
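To make the feature-extraction step in the abstract more concrete, the following is a minimal sketch of how a few lexical and structural features might be computed from a short text. It is not the authors' implementation: the function name `extract_features`, the `FUNCTION_WORDS` list, and the specific feature names are assumptions chosen for illustration, since the abstract does not list the actual features used.

```python
import re
from collections import Counter

# Hypothetical function-word list (an assumption); the abstract does not
# specify which words or features the framework actually uses.
FUNCTION_WORDS = ("the", "a", "of", "in", "to", "and")


def extract_features(text):
    """Return a small dict of illustrative lexical and structural features."""
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Non-empty sentences, split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Non-empty paragraphs, separated by blank lines.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]

    n_words = len(words) or 1  # avoid division by zero on empty input
    counts = Counter(words)

    features = {
        # Lexical features
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "vocabulary_richness": len(counts) / n_words,
        # Structural features
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        "paragraph_count": float(len(paragraphs)),
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = counts[fw] / n_words
    return features


sample = "The cat sat. The dog ran to the park."
for name, value in extract_features(sample).items():
    print(name, round(value, 3))
```

A feature dictionary of this kind could then be turned into a numeric vector and passed to any of the three classifier families named in the abstract; which features carry the most signal for native language identification is a question the abstract itself does not answer.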
The authors are with the Department of Computer Engineering, Faculty of Engineering, Karadeniz Technical University, 61080 Trabzon, Turkey (e-mail: parham.tofighi@gmail.com).
Cite: Parham Tofıghı, Cemal Köse, and Leila Rouka, "Author’s Native Language Identification from Web-Based Texts," International Journal of Computer and Communication Engineering vol. 1, no. 1, pp. 47-50, 2012.