Detection Methods

Doctor Web anti-virus solutions use several malicious software detection methods simultaneously, which helps them thoroughly check suspicious files and control software behavior.

Signature analysis

The scan starts with signature analysis, which compares segments of file code with known virus signatures. A signature is a finite contiguous sequence of bytes that is necessary and sufficient to identify a specific virus. To reduce the size of the signature dictionary, Dr.Web anti-virus solutions use signature checksums instead of complete signature sequences. Checksums uniquely identify the signatures, ensuring correct virus detection and neutralization. Dr.Web virus databases are compiled in such a way that some entries can be used to detect not just specific viruses, but entire classes of threats.
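The idea of matching by checksum rather than by full signature can be sketched as follows. This is only an illustration under stated assumptions: the fixed signature length, the use of CRC32 as the checksum, the function names, and the sample signature are all hypothetical and are not taken from the Dr.Web virus database format. CRC32 is used here because it is in the Python standard library; it does not guarantee uniqueness.

```python
import zlib

SIGNATURE_LENGTH = 8  # hypothetical fixed signature length, in bytes


def build_checksum_db(signatures):
    """Map the CRC32 of each known signature to a threat name."""
    return {zlib.crc32(sig): name for sig, name in signatures.items()}


def scan(data, checksum_db, length=SIGNATURE_LENGTH):
    """Return (offset, name) for every window whose checksum is in the database."""
    hits = []
    for offset in range(len(data) - length + 1):
        window = data[offset:offset + length]
        name = checksum_db.get(zlib.crc32(window))
        if name is not None:
            hits.append((offset, name))
    return hits


# Made-up signature and threat name, for illustration only.
KNOWN = {b"\xde\xad\xbe\xef\x13\x37\xc0\xde": "Demo.Virus.1"}
DB = build_checksum_db(KNOWN)

sample = b"harmless header" + b"\xde\xad\xbe\xef\x13\x37\xc0\xde" + b"tail"
print(scan(sample, DB))
```

The database stores only a 4-byte checksum per entry instead of the full byte sequence, which is the size reduction described above.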

Origins Tracing

After the signature analysis, Dr.Web anti-virus solutions use the unique Origins Tracing method to detect new and modified viruses that use known infection mechanisms. This protects Dr.Web users against such threats as the notorious blackmailer Trojan.Encoder.18 (also known as gpcode). In addition to detecting new and modified viruses, Origins Tracing can considerably reduce the number of false positives produced by the heuristic analyzer. Names of objects detected by the Origins Tracing algorithm have .Origin added to them.

Execution emulation

Program code emulation is used to detect polymorphic and encrypted viruses when a search by checksums cannot be performed directly or is very difficult (because a reliable signature cannot be built). This method simulates the execution of the analyzed code in an emulator, a software model of the processor and runtime environment. The emulator operates within a protected memory region (an emulator buffer), in which execution of the analyzed application is modelled instruction by instruction. However, none of these instructions is actually executed by the CPU. When the emulator receives a file infected with a polymorphic virus, the result of the emulation is the decrypted virus code, which is then easily identified by searching against signature checksums.
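A toy model may help show why emulation exposes an encrypted body. The sketch below assumes an invented four-instruction machine (mov, xorb, inc, jlt) and a single-byte XOR "decryptor". Neither reflects a real processor or the Dr.Web emulator. The instructions are interpreted in Python against a private buffer and are never run natively.

```python
def emulate(program, memory, max_steps=10_000):
    """Interpret a toy instruction set against a private copy of memory."""
    buf = bytearray(memory)      # emulator buffer: the input itself is not modified
    regs = {"r0": 0, "r1": 0}
    pc = 0                       # index of the next instruction
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        steps += 1
        if op == "mov":          # mov reg, imm      -> reg = imm
            regs[args[0]] = args[1]
        elif op == "xorb":       # xorb reg, imm     -> buf[reg] ^= imm
            buf[regs[args[0]]] ^= args[1]
        elif op == "inc":        # inc reg           -> reg += 1
            regs[args[0]] += 1
        elif op == "jlt":        # jlt reg, imm, tgt -> jump to tgt if reg < imm
            if regs[args[0]] < args[1]:
                pc = args[2]
                continue
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return bytes(buf)


KEY = 0x5A                                   # hypothetical encryption key
PAYLOAD = b"MALICIOUS-BODY"                  # stands in for the real virus body
encrypted = bytes(b ^ KEY for b in PAYLOAD)  # what is actually stored in the file

decryptor = [
    ("mov", "r0", 0),                        # 0: r0 = current index
    ("xorb", "r0", KEY),                     # 1: decrypt one byte
    ("inc", "r0"),                           # 2: next byte
    ("jlt", "r0", len(encrypted), 1),        # 3: loop until the end
    ("halt",),                               # 4: stop
]

decrypted = emulate(decryptor, encrypted)
print(PAYLOAD in encrypted, PAYLOAD in decrypted)
```

The stored bytes do not contain the payload, so a direct signature search fails. The emulator buffer does contain it after the decryptor has been interpreted, at which point an ordinary checksum search can be applied. The max_steps limit is a design choice that keeps a looping sample from stalling the scan.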

Heuristic analysis

Detection by the heuristic analyzer is based on knowledge (heuristics) of features (attributes) that are either typical of virus code or, on the contrary, extremely rare in viruses. Each attribute has a weight that reflects its severity and reliability. The weight is positive if the attribute is indicative of malicious code and negative if the attribute is uncharacteristic of a computer threat. From the total weight of a file, the heuristic analyzer calculates the probability of infection by an unknown virus. If the threshold is exceeded, the heuristic analyzer concludes that the analyzed object is probably infected with an unknown virus.
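The weighting scheme can be illustrated with a minimal sketch. The attribute names, weights, and threshold below are hypothetical values chosen for the example. The real attributes and the way the total weight is converted to a probability are not described here, so the sketch compares the total weight with the threshold directly.

```python
# Hypothetical attributes and weights, for illustration only.
WEIGHTS = {
    "writes_to_other_executables": 40,   # positive: typical of malicious code
    "entry_point_in_last_section": 25,
    "self_decrypting_loop": 30,
    "valid_digital_signature": -50,      # negative: uncharacteristic of threats
    "standard_compiler_prologue": -15,
}
THRESHOLD = 60  # hypothetical decision threshold


def heuristic_score(attributes):
    """Sum the weights of all observed attributes; unknown attributes count as 0."""
    return sum(WEIGHTS.get(attr, 0) for attr in attributes)


def verdict(attributes):
    """Flag the object as suspicious only if the total weight exceeds the threshold."""
    return "suspicious" if heuristic_score(attributes) > THRESHOLD else "clean"


print(verdict({"writes_to_other_executables", "self_decrypting_loop"}))
print(verdict({"writes_to_other_executables", "self_decrypting_loop",
               "valid_digital_signature"}))
```

In the second call the negative weight of the hypothetical valid_digital_signature attribute pulls the total below the threshold, which is how attributes that are rare in viruses reduce false positives.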

The heuristic analyzer also uses the FLY-CODE technology, a versatile algorithm for unpacking files. The technology makes it possible to form heuristic assumptions about the presence of malicious objects in files compressed not only by packers that Dr.Web is aware of, but also by new, previously unexplored ones. While checking packed objects, Dr.Web anti-virus solutions also use structural entropy analysis. This technology detects threats by the arrangement of pieces of code; thus, one database entry allows identification of a substantial portion of the threats packed with the same polymorphic packer.
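The internals of structural entropy analysis are not described here. As a loosely related illustration only, the sketch below computes Shannon entropy per fixed-size window, a common generic way to see how compressed or encrypted regions are arranged within a file. The window size and function names are assumptions, and this should not be read as the Dr.Web algorithm.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Entropy in bits per byte: 0.0 for constant data, up to 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(data, window=256):
    """Entropy of each consecutive window; the shape of this list is the 'structure'."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


# A low-entropy region (padding) followed by a high-entropy region (packed data).
blob = b"\x00" * 256 + bytes(range(256))
print(entropy_profile(blob))
```

Files processed by the same packer tend to produce similar profiles even when the packed bytes differ, which gives an intuition for how a single entry could cover many samples.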

Like any other system of hypothesis testing under uncertainty, the heuristic analyzer may commit type I errors (raise false positives) or type II errors (miss viruses). For this reason, objects detected by the heuristic analyzer are treated as “suspicious”.

Machine learning

Machine learning is used to detect and neutralize malicious objects that are missing from the virus databases. The advantage of this method is that it detects malicious code without executing it, judging only by its features.

Threat detection is based on classifying malicious objects according to specific features. Support vector machines (SVMs) underlie the machine learning technologies used to classify code fragments written in scripting languages and add them to the databases. Detected objects are then analyzed to determine whether they have features of malicious code. Machine learning technology makes the process of updating these features and the virus databases automatic. Large amounts of data are processed faster thanks to the connection to the cloud service, and continuous training of the system provides preventive protection from the latest threats. The technology can also function without a constant connection to the cloud.
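To make the SVM idea concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the hinge loss, written in plain Python. The two features (number of eval-like calls and share of obfuscated strings), the training data, and the hyperparameters are all invented for the example. The actual features and training procedure are not documented here.

```python
def decision_value(w, b, x):
    """Signed distance-like score: positive means the malicious side of the boundary."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(samples, labels, epochs=1000, lr=0.01, reg=0.01):
    """Sub-gradient descent on L2-regularized hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            violated = y * decision_value(w, b, x) < 1   # inside the margin or wrong side
            w = [wi * (1 - lr * reg) for wi in w]        # regularization shrinks w
            if violated:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (malicious) or -1 (clean)."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical features per script: (eval-like calls, share of obfuscated strings).
SAMPLES = [(3.0, 0.9), (4.0, 0.8), (5.0, 0.7),   # labelled malicious
           (0.0, 0.1), (1.0, 0.0), (0.0, 0.2)]   # labelled clean
LABELS = [1, 1, 1, -1, -1, -1]

W, B = train_linear_svm(SAMPLES, LABELS)
print([predict(W, B, x) for x in SAMPLES])
```

Classification only evaluates a dot product over extracted features, so the script being classified is never executed.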

The machine learning method saves significant operating system resources, since it does not require code execution to detect threats, and dynamic training of the classifier can be carried out without constant updates of the virus databases that are used for signature analysis.

Cloud-based threat detection technologies

Cloud-based detection methods make it possible to scan any object (file, application, browser extension, etc.) by its hash value. A hash is a sequence of numbers and letters of a given length that is computed from the object and identifies it. When analyzed by hash value, objects are checked against the existing database and then classified into categories: clean, suspicious, malicious, etc.

This technology reduces file scanning time and saves device resources. The decision on whether the object is malicious is made almost instantly, because it is not the object itself that is analyzed, but its unique hash value. If there is no connection to the Dr.Web servers, the files are scanned locally, and the cloud scan resumes when the connection is restored.
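The lookup-with-fallback flow can be sketched as below. Everything here is a stand-in: SHA-256 is assumed as the hash function, the cloud service is simulated by a dictionary, and the names classify, CloudUnavailable, and local_scan are hypothetical rather than part of any Dr.Web API. Resuming the cloud scan after reconnection is not modelled.

```python
import hashlib


class CloudUnavailable(Exception):
    """Raised by a lookup function when the cloud service cannot be reached."""


def sha256_hex(data):
    """Fixed-length hexadecimal digest that stands in for the object itself."""
    return hashlib.sha256(data).hexdigest()


def classify(data, cloud_lookup, local_scan):
    """Ask the cloud for a verdict by hash; fall back to a local scan if needed."""
    digest = sha256_hex(data)
    try:
        verdict = cloud_lookup(digest)       # only the digest leaves the device
    except CloudUnavailable:
        return local_scan(data)              # no connection: scan locally
    return verdict if verdict is not None else local_scan(data)


# Simulated cloud database and stub scanners, for illustration only.
FAKE_CLOUD = {sha256_hex(b"known bad sample"): "malicious"}


def online_lookup(digest):
    return FAKE_CLOUD.get(digest)


def offline_lookup(digest):
    raise CloudUnavailable()


def local_scan(data):
    return "suspicious" if b"bad" in data else "clean"


print(classify(b"known bad sample", online_lookup, local_scan))
print(classify(b"known bad sample", offline_lookup, local_scan))
print(classify(b"ordinary document", online_lookup, local_scan))
```

Only a short digest is sent and compared, regardless of the object's size, which is why the verdict can be returned quickly.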

Thus, the Doctor Web cloud service collects information from numerous users and quickly updates data on previously unknown threats, which increases the effectiveness of device protection.