stmaiteam

Interpretable Cybersecurity Event Detection in Turkish, A Novel Dataset

Emre Tolga Ayan, Muhammed Said Zengin, Hacı Ali Duru, Gamze Deniz and Batuhan Bardak

Twitter is a social media platform with more than 200 million daily active users. Several studies monitor and analyze this rich, real-time Twitter data to predict different kinds of events. Predicting cybersecurity events from social media is of utmost importance for cybersecurity experts, companies, and Security Operation Centers (SOCs). In this paper, we study the prediction of cybersecurity events from Turkish-language tweets. Because of the lack of a Turkish dataset in this domain, we collect a new Turkish dataset of cybersecurity-related tweets from Twitter, and cybersecurity analysts manually annotate a subset of 1.9K of them. The experimental results show that the BERT-based method outperforms all baseline models. In addition, we use SHAP values to explain the predictions of the BERT model and increase its transparency.

Related Material:

Dataset: Click Here
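
To give a feel for what the SHAP values mentioned in the abstract measure, the sketch below computes exact Shapley values for the tokens of a short tweet. It is only an illustration under stated assumptions: the scorer, its token weights, and the interaction bonus are hypothetical stand-ins, not the BERT model or the dataset from the paper, and the SHAP library approximates these values rather than enumerating every token subset as done here.

```python
from itertools import combinations
from math import factorial

# Hypothetical token weights for a toy "cybersecurity event" scorer.
# These numbers are assumptions for illustration, not values from the paper.
WEIGHTS = {"fidye": 2.0, "siber": 1.0, "tehdit": 1.5, "merhaba": 0.0}
# Hypothetical interaction: extra score when both tokens appear together.
INTERACTION = ({"siber", "tehdit"}, 1.0)


def score(present):
    """Toy model: additive token weights plus one pairwise interaction."""
    total = sum(WEIGHTS.get(tok, 0.0) for tok in present)
    pair, bonus = INTERACTION
    if pair <= set(present):
        total += bonus
    return total


def shapley_values(tokens, value_fn):
    """Exact Shapley values; assumes the tokens are distinct."""
    n = len(tokens)
    result = {}
    for tok in tokens:
        others = [t for t in tokens if t != tok]
        phi = 0.0
        for k in range(n):
            # Weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Marginal contribution of tok when added to this coalition.
                phi += weight * (value_fn(subset + (tok,)) - value_fn(subset))
        result[tok] = phi
    return result


tweet = ["fidye", "siber", "tehdit", "merhaba"]
phi = shapley_values(tweet, score)
for tok in tweet:
    print(f"{tok}: {phi[tok]:+.2f}")
```

In this toy setting the interaction bonus is split evenly between "siber" and "tehdit", the neutral token "merhaba" receives zero, and the attributions sum to the difference between the score of the full tweet and the score of an empty one.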