Research: Classifying illegal content on dark web forums
The activities of cybercriminals on the dark web are becoming one of the most critical issues affecting societies worldwide. The anonymity and privacy offered by the dark web have made it the preferred online environment for a wide range of illegal activities. Cybercriminals currently use darknets for illicit drug trading, forgery, pedophilia, piracy, terrorism, and even hiring hitmen.
A recently published paper explores illegal activities taking place on the dark web, particularly on dark web forums. The authors used the Tor browser to analyze the content posted by users on various dark web forums and to identify the illicit subjects that forum users discuss most, such as drug trading, forgery, terrorism, piracy, and the trading of stolen documents. This article looks at the main findings presented in the paper.
Methodology for analyzing the content of dark web forums:
This research conducted a detailed analysis of both the structure and the content of dark web forums. This study was the first to systematically analyze the typology of dark web forums. The authors developed a dedicated spider to crawl dark web pages and obtain a graph of the hyperlinks connecting the various forum pages.
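To illustrate the idea of a hyperlink graph, the following is a minimal sketch and not the authors' spider. It assumes the pages have already been fetched (for example through a Tor proxy) and only shows how links could be turned into graph edges. The names `LinkExtractor`, `build_link_graph`, and `SAMPLE_PAGES`, as well as the .onion addresses, are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_link_graph(pages):
    """Map each page URL to the sorted list of .onion URLs it links to.

    `pages` is a dict of {url: html} that a crawler has already fetched.
    """
    graph = {}
    for url, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        targets = set()
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            host = urlparse(absolute).hostname or ""
            if host.endswith(".onion"):  # keep hidden-service links only
                targets.add(absolute)
        graph[url] = sorted(targets)
    return graph


# Hypothetical sample pages; the addresses are invented for illustration.
SAMPLE_PAGES = {
    "http://forumaaa.onion/index":
        '<a href="/thread/1">t1</a> <a href="http://forumbbb.onion/">b</a>',
    "http://forumaaa.onion/thread/1":
        '<a href="/index">home</a> <a href="https://example.com/">clearnet</a>',
}

graph = build_link_graph(SAMPLE_PAGES)
```

In this sketch each key of `graph` is a page and each value lists the hidden-service pages it points to; links to the regular web are dropped, which is a design choice of the example rather than something the paper states.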
A supervised learning approach was used to categorize the content of dark web forums on both a per-page and a per-forum (hidden service) basis. More specifically, an SVM classifier combined with active learning was used to label forum content by topic and legality. Each forum was labeled based on the labels assigned to its individual pages. Notably, around 60% of the content of dark web forums was legal.
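The article does not say which active learning strategy the authors used. As a hedged illustration, the sketch below shows one common strategy, margin-based uncertainty sampling: the pages whose linear decision score lies closest to zero are sent to a human for labeling, because the classifier is least sure about them. The weights, bias, page ids, and function names are all hypothetical, and the weights stand in for a linear SVM that would normally be trained on the labeled pages.

```python
def decision_score(weights, bias, features):
    """Signed score w.x + b of a linear classifier; the sign is the predicted class."""
    return sum(weights.get(term, 0.0) * count for term, count in features.items()) + bias


def select_queries(weights, bias, unlabeled, batch_size):
    """Return the ids of the `batch_size` pages whose scores lie closest to zero."""
    ranked = sorted(
        unlabeled,
        key=lambda page_id: abs(decision_score(weights, bias, unlabeled[page_id])),
    )
    return ranked[:batch_size]


# Hypothetical weights of an already trained linear model for one topic label.
WEIGHTS = {"exploit": 1.5, "password": 1.0, "recipe": -1.25}
BIAS = -0.5

# Hypothetical unlabeled pages, represented as bag-of-words term counts.
UNLABELED = {
    "page-1": {"exploit": 3, "password": 1},  # score  5.0, far from the boundary
    "page-2": {"recipe": 2},                  # score -3.0, far from the boundary
    "page-3": {"password": 1},                # score  0.5, near the boundary
    "page-4": {"exploit": 1, "recipe": 1},    # score -0.25, nearest the boundary
}

# Pages to hand to a human annotator in the next labeling round.
queries = select_queries(WEIGHTS, BIAS, UNLABELED, batch_size=2)
```

After the selected pages are labeled, the model would be retrained and the selection repeated, so that labeling effort goes to the pages that are hardest to classify.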
Types of content on dark web forums:
The following are the labels, or types, of illegal content on dark web forums, which were obtained via the methodology outlined above. Each label is followed by the percentage of the total studied forum content that it accounts for:
– Pornography (3.9%): this label includes pornographic content of any form, including pedophilic content. It covers forum threads with links to download videos and images, as well as discussions of users’ sexual experiences and fantasies.
– Counterfeit (7.3%): this label includes forum threads that offer counterfeit documents or describe how counterfeits can be created. Discussions mainly involve credit cards, IDs, and passports.
– Drugs (5.2%): this label includes forum pages and threads that mainly discuss the best drug vendors on darknet marketplaces. Users sometimes also offered drugs for sale via forum threads. This category also includes discussions about how to quit drugs.
– Fraud (8.1%): this label includes forum threads that discuss various frauds, e.g. bitcoin fraud attempts, successful scams, and similar experiences. It also covers discussions about stolen credit cards, hacked PayPal accounts, etc.
– Hacking (13.6%): this label includes forum threads that discuss any topic related to compromising computer or smartphone systems, covering white hat as well as black hat techniques. Occasionally, users marketed their hacking services via forum threads.
– Money laundering (3.5%): this label includes discussions of and offers for money laundering services. These often involve cryptocurrencies earned via illegal means.
– Terrorism (2.1%): this label mainly includes forum pages and threads managed by Islamic extremists. Discussions sometimes aim at recruiting new members for jihadist terrorist groups.
– Weapons (3.2%): this label includes forum threads that discuss weapons or offer them for sale. The weapons discussed and sold range from rifles all the way up to bombs.
– Murder, hitmen for hire (0.5%): this label includes forum pages and threads promising to carry out an assassination in exchange for cryptocurrency. According to most darknet users, most of these threads are scams.
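As a quick arithmetic check on the figures above, the nine listed shares add up to 47.4% of the studied content, and Hacking is the largest single label. The sketch below only sums the percentages quoted in this article; the variable names are chosen for the example. Since a label can contain legal as well as illegal content, this total should not be read as the share of illegal content.

```python
# Shares of the total studied forum content, as quoted in the list above.
LABEL_SHARES = {
    "Pornography": 3.9,
    "Counterfeit": 7.3,
    "Drugs": 5.2,
    "Fraud": 8.1,
    "Hacking": 13.6,
    "Money laundering": 3.5,
    "Terrorism": 2.1,
    "Weapons": 3.2,
    "Murder": 0.5,
}

# Round to one decimal place to hide floating-point noise in the sum.
total_share = round(sum(LABEL_SHARES.values()), 1)

# Label that accounts for the largest share of the content.
largest_label = max(LABEL_SHARES, key=LABEL_SHARES.get)
```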
The dark web is a shelter for individuals discussing embarrassing, illegal, or controversial topics. One example is a group created to support pedophiles by creating content and encouraging its members to engage in discussions that can help them live with their predisposition without harming others and without breaking the law. This example shows that one of the aforementioned labels alone can be insufficient to assess the legality of content, because most labels can include legal as well as illegal content. Accordingly, a binary classifier was used to assess the legality of forum content. This classifier relies on an SVM to classify content via a keyword-based label classifier.
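The separation between topic and legality can be illustrated with a toy example. The sketch below is a simple keyword rule, not the authors' SVM, and every keyword list and function name in it is an assumption made for illustration. It assigns a topic label first and then makes an independent legal or illegal decision, so two pages with the same label can end up with different legality.

```python
import re

# Hypothetical keyword lists; a real system would learn these from labeled pages.
TOPIC_KEYWORDS = {
    "Drugs": {"cocaine", "mdma", "vendor", "rehab"},
    "Hacking": {"exploit", "malware", "pentest"},
}
ILLEGAL_INTENT_TERMS = {"selling", "buy", "shipping", "hire"}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def topic_label(text):
    """Pick the topic with the most keyword hits, or "Other" if none match."""
    tokens = tokenize(text)
    hits = {label: len(tokens & words) for label, words in TOPIC_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "Other"


def legality(text):
    """Binary decision that is made independently of the topic label."""
    return "illegal" if tokenize(text) & ILLEGAL_INTENT_TERMS else "legal"


def classify(text):
    """Return a (topic, legality) pair for one page of text."""
    return topic_label(text), legality(text)


offer = classify("Selling MDMA, worldwide shipping from a trusted vendor")
support = classify("Looking for rehab advice after quitting cocaine")
```

Here both pages receive the Drugs label, but only the sales offer is marked illegal, which mirrors the point that a discussion about quitting drugs falls under the same topic while remaining legal.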
It is essential to note that determining the legality of forum content is a daunting task for several reasons. First, laws and legal systems differ greatly from one country to another. Moreover, content occasionally lies in a grey area between what is legal and what is illegal. As the authors’ goal was to present a rough estimate of the legality of dark web forums’ content, they followed a conservative approach when assessing legality. When the content’s purpose was clearly illegal, it was labeled as illegal, even if it might not technically be so. As such, some might consider this classification biased.
This paper includes comprehensive data that classifies illegal content on dark web forums, which could be extremely helpful if made publicly available. Future work could use this data to develop a dark web search engine that searches indexed forum data, similar to search engines such as TORCH. Moreover, this data could help law enforcement agencies identify pedophiles and cybercriminals.