Profanity use in online communities
http://dl.acm.org/citation.cfm?id=2208610
Authors:
Sara Sood- Professor at Pomona College, Claremont, CA, USA. Research includes understanding the expression and impact of emotion in online communication.

Judd Antin- Yahoo! Research, Santa Clara, California, United States, research includes influence of information about competence on contributions in online social dilemmas.

Elizabeth Churchill- Yahoo! Research, Santa Clara, California, United States, research includes social media, computer-mediated communication, mobile and personal technologies, and ubiquitous computing.

Summary:
Profanity is a major problem in many online communities. Site developers who want to maintain and grow a community often build profanity filters to keep these words from appearing; a site without filters signals to the community that profanity is acceptable, which can drive users away. Sood, Antin, and Churchill investigate three questions: how well current profanity filter systems perform, whether profanity occurs more often in some communities than in others, and what social context profanity appears in. They gathered comments and associated metadata from Yahoo! Buzz, then employed workers on Amazon Mechanical Turk to label each comment for the presence of profanity. Once the workers reached consensus on the labels, the authors tested different automated profanity detection systems to see which performed best. They determined that list-based approaches perform poorly because of misspellings, rapidly shifting language, and the context-specific nature of profanity.
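The list-based weakness the authors describe is easy to demonstrate. Below is a minimal sketch of a hypothetical blocklist filter (the word list and example comments are my own, not the study's data): an exact-match lookup catches a listed word but misses a one-character misspelling.

```python
# Hypothetical minimal list-based profanity filter, illustrating why the
# paper finds this approach brittle. Blocklist and comments are invented.

BANNED = {"damn", "hell"}  # hypothetical blocklist


def list_based_filter(comment: str) -> bool:
    """Return True if any token exactly matches a blocklisted word."""
    tokens = comment.lower().split()
    return any(tok.strip(".,!?") in BANNED for tok in tokens)


print(list_based_filter("damn this site"))  # True: exact match is caught
print(list_based_filter("d4mn this site"))  # False: the misspelling slips through
```

Any obfuscation (character substitution, inserted punctuation, new slang) defeats the exact match, which is why the authors argue for systems that go beyond static lists.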
They then explored disguised profanity further by looking specifically at the @ character, which is used in email addresses, in Twitter-style conversation, and to disguise profanity. They developed an algorithm to sort occurrences of the @ character into these categories and concluded that 40% of all @ signs were being used inappropriately. The authors then checked which topic profanity was most likely to appear in, and politics ranked highest. Of the profanity in political comments, 27.14% was used as an insult, 4.83% as a non-insult, 31.12% as a directed insult, and 5.79% as a non-directed insult.
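The paper's actual disambiguation algorithm is not reproduced here; the following is a rough sketch, under my own assumptions, of how occurrences of the @ character could be routed into the three categories the authors mention. The regexes and category names are illustrative, not the study's method.

```python
import re

# Hypothetical @-character disambiguation, loosely inspired by the idea in
# the paper. Patterns and labels are my assumptions, not the authors' code.

EMAIL = re.compile(r"\b[\w.]+@[\w.]+\.\w+\b")   # e.g. bob@example.com
MENTION = re.compile(r"(?:^|\s)@\w+")           # e.g. "@alice hello"


def categorize_at(text: str) -> str:
    """Classify the role of the @ character in a comment."""
    if EMAIL.search(text):
        return "email"
    if MENTION.search(text):
        return "mention"
    if "@" in text:
        # @ embedded inside a word, e.g. a vowel replaced to dodge a filter
        return "possible-disguised-profanity"
    return "no-at"


print(categorize_at("reach me at bob@example.com"))  # email
print(categorize_at("@alice what do you think?"))    # mention
print(categorize_at("that's cr@p"))                  # possible-disguised-profanity
```

The key observation is positional: an @ inside a word, with letters on both sides but no domain following it, is a strong hint of disguised profanity rather than an address or a mention.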
The next step, once they had determined where and how often profanity occurs, was to examine the context in which these words occur. Nearly all of the profanity appeared in negative rants. The authors drew two conclusions. First, current profanity filter systems do not perform nearly well enough, because there are so many ways a user can bypass them; the @ sign alone can fool these systems in countless ways. Second, profanity systems are not custom-tailored to communities, even though profanity varies across areas of interest such as politics. Because these words are almost always used in negative rants or insults, they add little to the discussion, and new innovation is needed on this front to counter the negative effects profanity has on communities.
Related Papers:
The paper I chose is not especially novel, as many of the related works I gathered delve into the same topic. The first paper linked, Designing for improved social responsibility, user participation and content in on-line communities, discusses how websites design specific systems to facilitate communal growth. The second paper discusses whether satire posted on websites is automatically detectable; it covers similar ground, since both works examine content posted to a site and classify it as one thing or another. Filtering objectionable internet content addresses exactly what its title describes.
Evaluation:
Sood, Antin, and Churchill's evaluation of this project was average. They gathered mostly quantitative data: a large set of comments from Yahoo! Buzz, from which they produced graphs and statistics. I would call the data somewhat subjective because it came from only one site; profane comments might look different on other sites, and the nature of the research left little room for a fully objective evaluation. There was also a qualitative element in how the Amazon Mechanical Turk workers reached consensus on whether comments contained profanity.
Discussion:
I enjoyed this paper because I am interested in the microcosms of different websites. I do believe, though, that more testing was needed to support their conclusions. Politics can produce heated discussion, but certain websites are built around that fact, which skews the data. In the future there may be truly impressive algorithms that filter out any combination of profane slang in comments, but there is a tradeoff: the more specific we make a filter, the more we risk flagging words that were never meant as profanity.