DOES GOOGLE TRANSLATE AFFECT LEXICON-BASED SENTIMENT ANALYSIS OF MALAY SOCIAL MEDIA TEXT?
DOI: https://doi.org/10.24191/mjoc.v7i2.19486
Keywords: Facebook, Google Translate, Lexicon-Based, Machine Translation, Sentiment Analysis
Abstract
Many sentiment resources exist for English, but resources are limited for resource-poor languages such as Malay. One approach to improving sentiment analysis is to translate text in the focus language into a resource-rich language such as English using machine translation (MT). However, when text is translated from one language into another, sentiment is preserved only to varying degrees. The objective of this paper is to assess how MT with Google Translate affects sentiment analysis of Malay social media text from Facebook pages of caregivers of persons with autism. A total of 3,525 Malay-language Facebook comments were collected from May to October 2020. The comments were manually translated into English to create dataset_manual, and Google Translate was used to translate the same Malay comments automatically into English to create dataset_auto. The sentiment polarity of each comment was labeled to form a ground-truth dataset. A lexicon-based approach was then applied to both dataset_manual and dataset_auto to determine the sentiment polarity of each comment. Results show that sentiment analysis on dataset_auto is significantly degraded (65.9%): sentiment-bearing expressions are often mistranslated into neutral expressions. In contrast, sentiment analysis on dataset_manual still captured the sentiment of the Facebook comments without taking them out of context, with 92.5% of comments related to autism spectrum disorder showing positive sentiment.
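The comparison described above (lexicon-based polarity on two English versions of the same comments, each checked against ground-truth labels) can be sketched as follows. This is a minimal illustration only, assuming a tiny hypothetical word list and simple positive-minus-negative counting; it does not reproduce the lexicon or tooling used in the study, and the example comments and labels are invented, not drawn from the dataset. It shows the mechanism the abstract reports: when a sentiment word is rendered as a neutral word, the lexicon no longer matches it and agreement with the ground truth drops.

```python
"""Minimal lexicon-based polarity scoring and agreement with ground truth.

All lexicon entries and example comments below are hypothetical
illustrations, not resources or data from the study.
"""
import re

# Hypothetical toy lexicon; a real analysis would use a full English sentiment lexicon.
POSITIVE = {"good", "great", "helpful", "happy", "thank", "love", "proud"}
NEGATIVE = {"bad", "sad", "angry", "difficult", "tired", "worried"}


def polarity(text: str) -> str:
    """Label text positive/negative/neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def agreement(texts: list, gold: list) -> float:
    """Fraction of texts whose lexicon label matches the ground-truth label."""
    hits = sum(polarity(t) == g for t, g in zip(texts, gold))
    return hits / len(gold)


# Hypothetical ground truth and two hypothetical English renderings of the same comments.
GOLD = ["positive", "negative", "positive"]
MANUAL = ["Thank you, this is very helpful", "I am so tired and worried", "So proud of my son"]
# In this invented "automatic" version, two sentiment words became neutral ones.
AUTO = ["This is very useful", "I am so exhausted", "So proud of my son"]

if __name__ == "__main__":
    print("manual agreement:", agreement(MANUAL, GOLD))
    print("auto agreement:", agreement(AUTO, GOLD))
```

Counting lexicon hits is the simplest lexicon-based scheme; it ignores negation and intensifiers, which fuller lexicon-based methods usually handle.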
License
Copyright (c) 2025 Vanessa Enjop, Rosanita Adnan, Nursuriati Jamil, Sanizah Ahmad, Zarina Zainol, Siti Arpah Ahmad

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.




