Author: Houston, Jackson
ProQuest Information and Learning Co
University of Minnesota. Computer Science
Title: Hate Speech Detection in Twitter : A Selectively Trained Ensemble Method
Publication: 2020
Description: 1 online resource (46 pages)
Content type: text
Media type: computer
Carrier type: online resource
Notes: Source: Masters Abstracts International, Volume: 82-02
Advisor: Maclin, Richard
Thesis (M.S.)--University of Minnesota, 2020
Includes bibliographical references
This thesis tests classification models from natural language processing and machine learning on the task of identifying hate speech. We tested on multiple annotated data sets (Davidson et al. 2017) of tweets labeled as hate speech, offensive speech, both, or neither. Hate speech has become an unavoidable topic in the current social media environment because of poorly monitored comment sections and news feeds. Studies showing its negative effects on people's well-being have also begun to surface (Gelber and McNamara 2015). Being able to identify hate speech accurately and precisely has therefore grown in importance. Hate speech is often contextual, subjective, and a matter of opinion, which makes creating an accurate model of such speech all the more difficult. We found that an ensemble of a classic Naive Bayes classifier (Pedregosa et al. 2019c), Random Forest (Pedregosa et al. 2019b), K-Means (Pedregosa et al. 2019d), and Bernoulli (Pedregosa et al. 2019a) performed better than similar studies in precision, accuracy, recall, and F-score (Malmasi and Zampieri 2018). The ensemble also outperformed the strongest individual model, Random Forest, by a small but useful margin. We believe this is because the nuance and context behind hate speech are more than one model can fully encompass. In addition to the ensemble strategy, training on data labeled 'clean' (neither hate speech nor offensive) or 'dirty' (hate speech) with higher confidence ratings increased the precision of our model by around 10% in some cases, compared with training on the complete data set, which includes tweets with a blurred sentiment such as those that are offensive but not hate speech. An accurate and precise model such as this will allow organizations to protect their users from such language and prevent the negative effects of hate speech.
Additionally, it will allow us to identify more hate speech tweets or statements, providing more data for future research into deeper trends than the tweet text alone, such as replies, retweets, and user biographies.
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2021
Mode of access: World Wide Web
Subjects: Computer science
Ensemble
Hate speech
Machine learning
Natural language processing
Selective training
Twitter
Electronic books.
0984
ISBN/ISSN: 9798662501133