J1. Options Used in PCRE Regular Expressions


Regular expressions are used in the configuration file and in Dr.Web Security Control Center to specify objects that are excluded from scanning in the Scanner settings.

Regular expressions are written as follows:

qr{EXP}options

where EXP is the expression itself, options is the sequence of options (a string of letters), and qr{} are literal characters that delimit the expression. For example:

qr{pagefile\.sys}i (Windows NT OS swap file)
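To make the notation concrete, the following Python sketch splits a string of the form qr{EXP}options into the expression and the PCRE option names listed in this appendix. The helper name parse_qr and its error handling are hypothetical and are not part of the product; how the product itself treats braces inside EXP is not stated here, so the sketch simply assumes that the last closing brace ends the expression.

```python
import re

# Option letters and their PCRE equivalents, as listed in this appendix.
OPTION_NAMES = {
    "a": "PCRE_ANCHORED",
    "i": "PCRE_CASELESS",
    "x": "PCRE_EXTENDED",
    "m": "PCRE_MULTILINE",
    "u": "PCRE_UNGREEDY",
    "d": "PCRE_DOTALL",
    "e": "PCRE_DOLLAR_ENDONLY",
}


def parse_qr(text):
    """Split 'qr{EXP}options' into (EXP, list of PCRE option names).

    Hypothetical helper for illustration only. Assumes that the last
    closing brace in the string terminates the expression.
    """
    m = re.fullmatch(r"qr\{(.*)\}([a-z]*)", text, re.DOTALL)
    if m is None:
        raise ValueError("not in qr{EXP}options form: %r" % text)
    expression, letters = m.groups()
    unknown = [c for c in letters if c not in OPTION_NAMES]
    if unknown:
        raise ValueError("unknown option letters: %s" % "".join(unknown))
    return expression, [OPTION_NAMES[c] for c in letters]


expression, options = parse_qr(r"qr{pagefile\.sys}i")
print(expression, options)
```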

The options are described below. For more details, see http://www.pcre.org/pcre.txt.

Option 'a' is equivalent to PCRE_ANCHORED

If this option is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched (the "subject string"). The same result can also be achieved by appropriate constructs in the pattern itself.
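The effect of anchoring can be approximated in Python, whose re module is not PCRE but offers a comparable distinction: re.search() scans the whole subject string, while re.match() tries the pattern only at the start. The sketch below is an analogy under that assumption, not the product's own matching code; the subject strings are illustrative.

```python
import re

pattern = re.compile(r"pagefile\.sys")
subject = r"C:\pagefile.sys"

# Unanchored search: the pattern may match at any offset in the subject.
unanchored = pattern.search(subject)

# Anchored attempt: re.match() tries the pattern only at the start of the subject.
anchored = pattern.match(subject)

# The same restriction expressed inside the pattern itself, using \A.
in_pattern = re.search(r"\Apagefile\.sys", subject)

# An anchored attempt succeeds when the subject begins with the expression.
anchored_hit = pattern.match("pagefile.sys.bak")

print(unanchored is not None, anchored is not None, anchored_hit is not None)
```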

Option 'i' is equivalent to PCRE_CASELESS

If this option is set, letters in the pattern match both upper and lower case letters. This option can be changed within a pattern by a (?i) option setting.
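As an illustration, Python's re.IGNORECASE flag and the inline (?i) setting behave analogously to this option for ASCII letters. The sketch assumes Python's re module as a stand-in for PCRE; the subject string is an example only.

```python
import re

subject = r"C:\PAGEFILE.SYS"

# Without the option, lower case letters in the pattern match only lower case.
case_sensitive = re.search(r"pagefile\.sys", subject)

# With the option set as a flag, letters match both cases.
caseless = re.search(r"pagefile\.sys", subject, re.IGNORECASE)

# The same setting written inside the pattern.
inline = re.search(r"(?i)pagefile\.sys", subject)

print(case_sensitive is not None, caseless is not None, inline is not None)
```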

Option 'x' is equivalent to PCRE_EXTENDED

If this option is set, whitespace data characters in the pattern are ignored, except when they are escaped or appear inside a character class. Whitespace does not include the VT character (code 11). In addition, all characters from an unescaped # outside a character class up to and including the next newline character are ignored, which makes it possible to include comments in complicated patterns. This option can be changed within a pattern by a (?x) option setting. Note, however, that this applies only to data characters. Whitespace may not appear within special character sequences in a pattern, for example within the (?( sequence that introduces a conditional subpattern.
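A commented pattern of this kind can be sketched with Python's re.VERBOSE flag, which follows similar rules: unescaped whitespace and # comments outside character classes are ignored. Python's re module is used here only as an approximation of PCRE, and the file extensions and directory name are made-up examples.

```python
import re

# The same expression written with layout and comments, and in compact form.
commented = re.compile(r"""
    \.            # a literal dot
    (?: tmp       # temporary files
      | bak )     # or backup copies
    $             # end of the name
""", re.VERBOSE)
compact = re.compile(r"\.(?:tmp|bak)$")

# An unescaped space in the pattern is dropped, so it no longer matches a space.
dropped_space = re.search(r"Program Files", "Program Files", re.VERBOSE)

# A space that must be matched is escaped or placed inside a character class.
escaped_space = re.search(r"Program\ Files", "Program Files", re.VERBOSE)
class_space = re.search(r"Program[ ]Files", "Program Files", re.VERBOSE)

print(commented.search("report.bak") is not None,
      dropped_space is not None,
      escaped_space is not None)
```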

Option 'm' is equivalent to PCRE_MULTILINE

By default, PCRE treats the subject string as consisting of a single line of characters (even if it actually contains newlines). The "start of line" metacharacter "^" matches only at the beginning of the string, while the "end of line" metacharacter "$" matches only at the end of the string or before a terminating newline (unless PCRE_DOLLAR_ENDONLY is set).

When PCRE_MULTILINE is set, the "start of line" metacharacter also matches immediately after any newline in the subject string, and the "end of line" metacharacter also matches immediately before any newline, in addition to the very beginning and end of the subject string. This option can be changed within a pattern by a (?m) option setting. If there are no "\n" characters in the subject string, or if neither ^ nor $ is present in the pattern, the PCRE_MULTILINE option has no effect.
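The difference between the two modes can be seen in a short Python sketch. Python's re.MULTILINE flag is assumed here to be a close analogue of PCRE_MULTILINE; the subject string is an invented three-line example.

```python
import re

subject = "first.tmp\nsecond.log\nthird.tmp"

# Default mode: ^ and $ refer to the whole subject string, so no single
# line of a multi-line subject can satisfy ^...$ on its own.
single_line = re.findall(r"^\w+\.tmp$", subject)

# Multiline mode: ^ and $ also match after and before internal newlines.
multi_line = re.findall(r"^\w+\.tmp$", subject, re.MULTILINE)

# The same setting written inside the pattern.
inline = re.findall(r"(?m)^\w+\.tmp$", subject)

print(single_line, multi_line, inline)
```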

Option 'u' is equivalent to PCRE_UNGREEDY

This option inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by "?". The same result can also be achieved by the (?U) option in the pattern.
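Python's re module has no flag corresponding to this option, so the sketch below only shows the two behaviors that the option swaps: a greedy quantifier takes the longest possible match, and a non-greedy one (written with "?" in default mode) takes the shortest. With the option set, the plain quantifier would behave like the second case and the "?" form like the first. The subject string is an example only.

```python
import re

subject = "<a><b>"

# Greedy quantifier: .* extends as far as possible before the final ">".
greedy = re.search(r"<.*>", subject).group()

# Non-greedy quantifier: .*? stops at the first ">" that allows a match.
lazy = re.search(r"<.*?>", subject).group()

print(greedy, lazy)
```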

Option 'd' is equivalent to PCRE_DOTALL

If this option is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This option can be changed within a pattern by a (?s) option setting. A negated class such as [^a] always matches newline characters, regardless of the setting of this option.
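The following Python sketch illustrates the behavior, assuming that Python's re.DOTALL flag and inline (?s) setting mirror PCRE_DOTALL closely enough for this purpose. The subject string is an invented example.

```python
import re

subject = "begin\nend"

# Default: the dot does not match a newline, so the pattern fails.
default_dot = re.search(r"begin.end", subject)

# With the option set as a flag, the dot matches the newline as well.
dotall = re.search(r"begin.end", subject, re.DOTALL)

# The same setting written inside the pattern.
inline = re.search(r"(?s)begin.end", subject)

# A negated class matches the newline regardless of the option.
negated = re.search(r"begin[^a]end", subject)

print(default_dot is not None, dotall is not None, negated is not None)
```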

Option 'e' is equivalent to PCRE_DOLLAR_ENDONLY

If this option is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this option, a dollar also matches immediately before a newline at the end of the string (but not before any other newline characters). The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set.
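Python's re module has no flag for this option, but the two behaviors can be compared with its own constructs: in default mode "$" also matches before a newline at the end of the string, while Python's \Z matches only at the very end. Note that this is a Python-specific stand-in and that \Z has a different meaning in PCRE itself. The subject strings are examples only.

```python
import re

with_newline = "pagefile.sys\n"
without_newline = "pagefile.sys"

# Default "$": also matches immediately before a newline at the end of the string.
dollar_before_newline = re.search(r"\.sys$", with_newline)

# End-only behavior (Python's \Z): the trailing newline prevents a match.
end_only_before_newline = re.search(r"\.sys\Z", with_newline)

# Both forms match when the expression really ends the subject string.
dollar_at_end = re.search(r"\.sys$", without_newline)
end_only_at_end = re.search(r"\.sys\Z", without_newline)

print(dollar_before_newline is not None, end_only_before_newline is not None)
```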