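Pengaruh Text Preprocessing terhadap Analisis Sentimen Komentar Masyarakat pada Media Sosial Twitter (Studi Kasus Pandemi COVID-19) [The Effect of Text Preprocessing on Sentiment Analysis of Public Comments on Twitter (Case Study: The COVID-19 Pandemic)]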

 (*) Syifa Khairunnisa (Universitas Telkom, Bandung, Indonesia)
 Adiwijaya Adiwijaya (Universitas Telkom, Bandung, Indonesia)
 Said Al Faraby (Universitas Telkom, Bandung, Indonesia)

(*) Corresponding Author

Submitted: February 15, 2021; Published: April 25, 2021

Abstract

COVID-19 is a pandemic that has unsettled many people and prompted a large volume of public comments on Twitter. These comments can be used for sentiment analysis to determine the polarity of the sentiment expressed: positive, negative, or neutral. A problem with Twitter data is that tweets contain many non-standard words, such as abbreviations, because of the character limit on a single tweet. Preprocessing is therefore the most important initial stage in sentiment analysis of Twitter data, because it affects classification performance. This study focuses on preprocessing techniques, testing several scenarios that combine them to determine which combination yields the highest accuracy and how preprocessing affects sentiment analysis. Features are extracted using N-grams and weighted using TF-IDF, with Mutual Information as the feature selection method. The classifier is a Support Vector Machine (SVM), chosen because it can classify high-dimensional data such as the text data used in this study. The results indicate that the best performance is obtained by two combinations, (1) cleaning and stemming and (2) word normalization, cleaning, and stemming, both with an accuracy of 77.77%. Unigrams yield higher accuracy than bigrams. Mutual Information reduces overfitting by removing irrelevant features, so that training and test accuracy remain fairly stable.
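The preprocessing and feature-extraction stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the slang dictionary `SLANG`, the cleaning rules, and the function names `clean`, `normalize`, `ngrams`, and `tfidf` are assumptions chosen for illustration, and the TF-IDF formula is the plain textbook variant. Stemming, Mutual Information feature selection, and the SVM classifier are left out because they depend on language-specific or third-party tools that the abstract does not name.

```python
import math
import re
from collections import Counter

# Hypothetical slang lexicon; the study's actual normalization dictionary is not given.
SLANG = {"yg": "yang", "gak": "tidak", "bgt": "banget"}


def clean(tweet: str) -> str:
    """Lowercase the tweet and strip URLs, mentions, digits, and punctuation."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def normalize(tokens):
    """Replace abbreviated or non-standard words using the (assumed) lexicon."""
    return [SLANG.get(token, token) for token in tokens]


def ngrams(tokens, n):
    """Return word n-grams: n=1 gives unigrams, n=2 gives bigrams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs):
    """Weight each term by (relative term frequency) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


# Usage: clean -> tokenize -> normalize -> n-grams -> TF-IDF weights.
tweets = [
    "Gak bisa keluar rumah yg lama bgt!! https://t.co/abc @user",
    "Semoga pandemi cepat selesai #COVID19",
]
features = [ngrams(normalize(clean(t).split()), 1) for t in tweets]
weights = tfidf(features)
```

Swapping the order or dropping one of `clean` and `normalize` is the kind of scenario variation the abstract describes; calling `ngrams(..., 2)` instead of `ngrams(..., 1)` switches from unigram to bigram features.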

Keywords

COVID-19; Twitter; Sentiment Analysis; Preprocessing; Support Vector Machine

Copyright (c) 2021 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338, Tel. 061-7875998
Email: mib.stmikbd@gmail.com
