Research: ToRank – A novel system for identification of suspicious and malicious Tor hidden services
The Tor network currently hosts thousands of hidden services, and a significant proportion of them can be associated with harmful or suspicious activities. Many onion domains have been linked to the distribution of malware. In 2013, Freedom Hosting, a company that hosted Tor hidden services, was found to be serving malware on all hidden services hosted on its servers. The malware exploited a critical vulnerability in certain Firefox versions that were included in the Tor Browser Bundle. It collected users’ Windows hostnames and MAC addresses and sent this data to a Virginia-based server, exposing the real IP addresses of users in the process. Since then, several similar incidents have been reported.
Dozens of other Tor hidden services have been associated with child pornography. Over the past few years, the FBI and international law enforcement agencies have managed to convict dozens of operators of such hidden services. Moreover, hacking services are offered for sale on various darknet marketplaces. As such, law enforcement agencies all over the world are seeking innovative means of surveilling and investigating crimes facilitated by publicly accessible Tor hidden services.
A recently published research paper introduces a novel algorithm, named ToRank, that is reported to rank Tor hidden services much more effectively than well-known algorithms commonly used for ranking websites on the surface web (Clearnet). ToRank is capable of identifying Tor hidden services associated with malicious and/or suspicious activities. The authors of the paper also present a thorough analysis of the content hosted on the Tor network and a detailed dataset, DUTA-10K, which includes much more data than the recent Darknet Usage Text Addresses (DUTA) dataset.
Evaluation of ToRank:
The authors of the paper present a quantitative comparison with some of the existing ranking systems used on the surface web, including PageRank, Katz, and HITS. The results show that removing the hidden services ranked highest by ToRank does more harm to the robustness of the Tor network than removing those ranked highest by any of the aforementioned systems, which indicates that ToRank is better at picking out the most influential hidden services. The approach of ToRank depends on passive analysis of recursive network traffic traces collected from multiple relay nodes. Practically speaking, a sensor is deployed in front of a Tor node to monitor network queries, as well as responses originating from users and Tor hidden services. The data obtained about fast-flux hidden services is selectively stored in a central data collector.
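The comparison described above can be illustrated with a minimal sketch in Python. It assumes that "harm to robustness" is measured as the size of the largest connected group of services left after the top-ranked ones are removed; this measure, the toy link graph, and all function names are assumptions made for illustration. PageRank is used as the example ranking because the ToRank formula itself is not given in this article.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a list of (source, target) links."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

def largest_component(edges, removed):
    """Size of the largest weakly connected component once nodes are removed."""
    neighbours = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        if src in removed or dst in removed:
            continue
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    nodes -= set(removed)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # depth-first walk of one component
            node = stack.pop()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best

def damage_after_removal(edges, scores, k):
    """Remove the k highest-scored nodes and report what stays connected."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return largest_component(edges, set(top))

# Hypothetical link graph: four services link to one hub, which links back to "a".
EDGES = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "hub"), ("hub", "a")]

if __name__ == "__main__":
    print("connected before removal:", largest_component(EDGES, set()))
    print("connected after removing top-1:", damage_after_removal(EDGES, pagerank(EDGES), 1))
```

Under this measure, a ranking is better when removing its top entries leaves a smaller connected remainder. Any other scoring function can be plugged in by passing a different `scores` dictionary.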
Since the volume of traffic on Tor can be overwhelming, ToRank relies on several prefiltering rules formulated to single out network queries that potentially involve fast-flux hidden services and prohibited content, while excluding requests to legitimate hidden services. ToRank’s prefiltering approach is fairly conservative. Even so, it reduces the volume of monitored Tor traffic to a manageable amount without discarding data about hidden services that are actually associated with suspicious or malicious network flux.
Once data about potentially suspicious hidden services has been obtained for a certain epoch (e.g. 24 hours), a more fine-grained form of analysis is performed. First, suspicious hidden services that are related to each other are grouped into clusters. For instance, the system groups together hidden services that are associated with the same hosting service, are related to the same administrator aliases, or are associated with the same malicious or suspicious network flux. This monitoring approach enables ToRank to capture queries associated with flux hidden services that are advertised through a myriad of channels including, for instance, darknet forums, crypto markets, message boards, and surface websites centered on topics related to the Tor network. Moreover, unlike other ranking systems that are based on active probing, ToRank passively monitors Tor hidden services, as well as Tor users, without the system itself interacting with the hidden services.
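The grouping step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes each hidden service is described by a small record of attributes (the `host` and `alias` keys and all sample values are hypothetical) and merges any two services that share an attribute value, using a union-find structure.

```python
from collections import defaultdict

def cluster_services(records):
    """Group onion addresses that share any attribute value (union-find)."""
    parent = {addr: addr for addr in records}

    def find(addr):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    first_seen = {}  # (attribute, value) -> first address seen with that value
    for addr, attrs in records.items():
        for key, value in attrs.items():
            if value is None:  # unknown attributes never link two services
                continue
            owner = first_seen.setdefault((key, value), addr)
            parent[find(addr)] = find(owner)

    clusters = defaultdict(set)
    for addr in records:
        clusters[find(addr)].add(addr)
    # Largest clusters first, with a stable order inside each cluster.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))

# Hypothetical records for one 24-hour epoch.
RECORDS = {
    "aaa.onion": {"host": "host-1", "alias": "admin-x"},
    "bbb.onion": {"host": "host-1", "alias": "admin-y"},
    "ccc.onion": {"host": "host-2", "alias": "admin-y"},
    "ddd.onion": {"host": "host-3", "alias": None},
}

if __name__ == "__main__":
    for cluster in cluster_services(RECORDS):
        print(cluster)
```

Here `aaa.onion` and `bbb.onion` share a hosting service, and `bbb.onion` and `ccc.onion` share an administrator alias, so all three end up in one cluster even though the first and third have nothing directly in common.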
Active probing of fast-flux Tor hidden services can be detected by an adversary who controls relay nodes that carry traffic for those hidden services. If the adversary realizes that active probing is taking place in an attempt to trace their malicious hidden service, they may stop responding to network queries coming from the probing system to avoid revealing further information, and may eventually take the hidden service down. ToRank, on the other hand, is capable of detecting malicious and suspicious hidden services in a stealthy manner.
Analysis of the DUTA-10K dataset has shown that only 20% of Tor’s publicly accessible hidden services are associated with suspicious or malicious activities. Interestingly, the analysis also found that 48% of all hidden services on the Tor network are associated with legal, non-harmful content, which runs contrary to the popular belief that most of the content on the Tor network is illegal, malicious, or otherwise suspicious in nature. ToRank has also revealed that onion domains associated with suspicious or malicious activities are often multiple clones of the same site hosted under different onion addresses, which could be used as an additional flag to identify them.
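The clone observation suggests a simple check, sketched below in Python. It is an assumption made for illustration, not the method used in the paper: pages are reduced to lowercase visible text and hashed, so only sites whose text is identical after normalization are grouped. The sample addresses and pages are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(html):
    """Hash the visible text of a page, ignoring tags, case and spacing."""
    text = re.sub(r"<[^>]+>", " ", html)              # drop HTML tags
    text = re.sub(r"\s+", " ", text).strip().lower()  # normalize whitespace and case
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def find_clones(pages):
    """Return groups of addresses whose pages share the same fingerprint."""
    groups = defaultdict(list)
    for addr, html in pages.items():
        groups[fingerprint(html)].append(addr)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

# Hypothetical crawl results: two shops serve the same text under different addresses.
PAGES = {
    "shop1.onion": "<h1>Cheap Cards</h1><p>Buy now</p>",
    "shop2.onion": "<h1>cheap cards</h1>\n<p>Buy   now</p>",
    "blog.onion": "<p>Privacy news</p>",
}

if __name__ == "__main__":
    print(find_clones(PAGES))
```

Exact hashing misses clones that differ by even one word, such as a changed payment address. Catching those would need a similarity measure rather than an equality test.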
ToRank is open source software that has been made publicly available by its developers so that anyone can benefit from it and even contribute to improving it further. It is released as open data under the CC BY license. You can download ToRank by visiting the following link.
ToRank is a novel ranking system for suspicious or malicious Tor hidden services that is reported to be much more effective than previously developed systems. The work behind it has also added a considerable amount of data to the most recent Darknet Usage Text Addresses (DUTA) dataset. Most importantly, the analysis has shown that 48% of all Tor hidden services are legal and non-harmful. ToRank, the improved DUTA dataset, and the findings obtained through this analysis can be extremely helpful for law enforcement agencies trying to prevent criminal activities taking place on the Tor network.