The folks at Yahoo Labs have managed to create an algorithm that is apparently one of the better automated abuse filters built in recent years. Most automated filters work by detecting certain keywords, phrases, and commonly used expressions, which has led users to find creative ways to swear, troll, and bully online.
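To see why that keyword approach is so easy to evade, here is a minimal, purely illustrative sketch in Python. The blocklist and example comments are invented for this article, not taken from any real filter.

```python
# A toy keyword-style filter of the kind the article describes.
BLOCKED_WORDS = {"idiot", "moron"}  # hypothetical blocklist

def keyword_filter(comment: str) -> bool:
    """Flag a comment only if it contains a blocked word verbatim."""
    return any(word in BLOCKED_WORDS for word in comment.lower().split())

print(keyword_filter("you are an idiot"))             # True  -- exact match caught
print(keyword_filter("you are an id1ot"))             # False -- creative spelling slips through
print(keyword_filter("go back where you came from"))  # False -- no single "bad" word present
```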
With Yahoo’s system, however, the team applied machine learning to a dataset of abusive and offensive comments flagged by Yahoo’s editors. This allowed the algorithm to process words as “vectors” instead of treating each word as simply good or bad. As a result, the algorithm can pick out strings of words that are considered offensive, even when the individual words themselves are inoffensive.
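Here is a rough sketch of that vector idea, again purely illustrative: the three-dimensional “embeddings” and example comments below are invented, and Yahoo’s actual model is learned from editor-flagged data rather than a hand-picked set of reference phrases. The point is only to show how a whole phrase can score as offensive even though no single word in it would trip a keyword check.

```python
import numpy as np

# Hypothetical word vectors; real systems learn hundreds of dimensions from data.
EMBEDDINGS = {
    "go":    np.array([0.1, 0.8, 0.2]),
    "back":  np.array([0.2, 0.7, 0.1]),
    "where": np.array([0.0, 0.3, 0.1]),
    "you":   np.array([0.1, 0.2, 0.0]),
    "came":  np.array([0.2, 0.6, 0.1]),
    "from":  np.array([0.1, 0.5, 0.2]),
    "have":  np.array([0.0, 0.1, 0.3]),
    "a":     np.array([0.0, 0.0, 0.1]),
    "nice":  np.array([-0.3, -0.2, 0.4]),
    "day":   np.array([-0.2, -0.1, 0.3]),
}

def embed(comment: str) -> np.ndarray:
    """Represent a comment as the average of its word vectors."""
    vectors = [EMBEDDINGS[w] for w in comment.lower().split() if w in EMBEDDINGS]
    return np.mean(vectors, axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two comment vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in for the vector profile of comments editors flagged as abusive (toy data).
abusive_reference = embed("go back where you came from")

for comment in ["go back where you came from", "have a nice day"]:
    score = cosine(embed(comment), abusive_reference)
    print(f"{comment!r}: similarity to flagged examples = {score:.2f}")
```

In this toy setup the insult made of harmless words scores as highly similar to the flagged examples, while the friendly comment does not, which is the behaviour a word-by-word blocklist cannot reproduce.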
So far the team has had pretty good success with the algorithm, reporting an accuracy of around 90%. Whether or not Yahoo will implement this system remains to be seen, but as Alex Krasodomski-Jones, a researcher with the U.K.-based Centre for Analysis of Social Media, says, “Given 10 tweets, a group of humans will rarely all agree on which ones should be classed as abusive, so you can imagine how difficult it would be for a computer.”