Catalog Search | National Taiwan Normal University Library


Author	Houston, Jackson
	ProQuest Information and Learning Co
	University of Minnesota. Computer Science
Title	Hate Speech Detection in Twitter : A Selectively Trained Ensemble Method
Publication	2020


Description	1 online resource (46 pages)
Content type	text
Media type	computer
Carrier type	online resource
Notes	Source: Masters Abstracts International, Volume: 82-02
	Advisor: Maclin, Richard
	Thesis (M.S.)--University of Minnesota, 2020
	Includes bibliographical references
	This thesis tests classification models from natural language processing and machine learning on the task of identifying hate speech. We tested on multiple annotated data sets (Davidson et al. 2017) of tweet data labeled as hate speech, offensive speech, both, or neither. Hate speech has become an unavoidable topic in the current social media environment due to poorly monitored comment sections and news feeds. With that, studies showing the negative effects that it has on people's well-being have also begun to surface (Gelber and McNamara 2015). Therefore, being able to identify hate speech accurately and precisely has grown in importance. Hate speech is often contextual, subjective, and a matter of opinion, which makes creating an accurate model of such speech all the more difficult. We have found that an ensemble of a classic Naive Bayes classifier (Pedregosa et al. 2019c), Random Forest (Pedregosa et al. 2019b), K-Means (Pedregosa et al. 2019d), and Bernoulli (Pedregosa et al. 2019a) performed better than similar studies in precision, accuracy, recall, and f-score (Malmasi and Zampieri 2018). The ensemble performed better than the strongest of the individual models, Random Forest, by a small but useful margin. We believe this is because the nuance and context behind hate speech are more than one model can fully encompass. In addition to the ensemble strategy, training on data labeled 'clean' (not hate speech or offensive) or 'dirty' (hate speech) with higher confidence ratings increased the precision of our model by around 10% in some cases, compared to training on the complete data set, which includes tweets with a blurred sentiment, such as offensive but not hate speech tweets. Having an accurate and precise model such as this will allow organizations to protect their users from such language and prevent the negative effects of hate speech. Additionally, it will allow us to identify more hate speech tweets or statements, giving us more data to research in the future and to find deeper trends than simply the tweet text, such as replies, retweets, and user biographies.
	Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2021
	Mode of access: World Wide Web
Subject	Computer science
	Ensemble
	Hate speech
	Machine learning
	Natural language processing
	Selective training
	Twitter
	Electronic books.
	0984
ISBN/ISSN	9798662501133

