136 lines
6.8 KiB
Plaintext
136 lines
6.8 KiB
Plaintext
---
|
|
-- Perl Compatible Regular Expressions.
|
|
--
|
|
-- One of Lua's quirks is its string patterns. While they have great performance
|
|
-- and are tightly integrated into the Lua interpreter, they are very different
|
|
-- in syntax and not as powerful as standard regular expressions. So we have
|
|
-- integrated Perl compatible regular expressions into Lua using PCRE and a
|
|
-- modified version of the Lua PCRE library written by Reuben Thomas and Shmuel
|
|
-- Zeigerman. These are the same sort of regular expressions used by Nmap
|
|
-- version detection. The main modification to their library is that the NSE
|
|
-- version only supports PCRE expressions instead of both PCRE and POSIX
|
|
-- patterns. In order to maintain a high script execution speed, the library
|
|
-- interfacing with PCRE is kept very thin. It is not integrated as seamlessly
|
|
-- as the Lua string pattern API. This allows script authors to decide when to
|
|
-- use PCRE expressions versus Lua patterns. The use of PCRE involves a
|
|
-- separate pattern compilation step, which saves execution time when patterns
|
|
-- are reused. Compiled patterns can be cached in the NSE registry and reused
|
|
-- by other scripts.
|
|
--
|
|
-- The documentation for this module is derived from that supplied by the PCRE
|
|
-- Lua lib.
|
|
--
|
|
-- Warning: PCRE has a history of security vulnerabilities allowing attackers
|
|
-- who are able to compile arbitrary regular expressions to execute arbitrary
|
|
-- code. More such vulnerabilities may be discovered in the future. These have
|
|
-- never affected Nmap because it doesn't give attackers any control over the
|
|
-- regular expressions it uses. Similarly, NSE scripts should never build
|
|
-- regular expressions with untrusted network input. Matching hardcoded regular
|
|
-- expressions against the untrusted input is fine.
|
|
-- @author Reuben Thomas
|
|
-- @author Shmuel Zeigerman
|
|
|
|
module "pcre"
|
|
|
|
--- Returns a compiled regular expression.
|
|
--
|
|
-- The resulting compiled regular expression is ready to be matched against
|
|
-- strings. Compiled regular expressions are subject to Lua's garbage
|
|
-- collection.
|
|
--
|
|
-- The compilation flags are set bitwise. If you want to set the 3rd
|
|
-- (corresponding to the number 4) and the 1st (corresponding to 1) bit for
|
|
-- example you would pass the number 5 as a second argument. The compilation
|
|
-- flags accepted are those of the PCRE C library. These include flags for case
|
|
-- insensitive matching (<code>1</code>), matching line beginnings
|
|
-- (<code>^</code>) and endings (<code>$</code>) even in multiline strings
|
|
-- (i.e. strings containing newlines) (<code>2</code>) and a flag for matching
|
|
-- across line boundaries (<code>4</code>). No compilation flags yield a
|
|
-- default value of <code>0</code>.
|
|
-- @param pattern a string describing the pattern, such as <code>"^foo$"</code>.
|
|
-- @param flags a number describing which compilation flags are set.
|
|
-- @param locale a string describing the locale which should be used to compile
|
|
-- the regular expression (optional). The value is a string which is passed to
|
|
-- the C standard library function <code>setlocale</code>. For more
|
|
-- information on this argument refer to the documentation of
|
|
-- <code>setlocale</code>.
|
|
-- @usage local regex = pcre.new("pcre-pattern",0,"C")
|
|
function new(pattern, flags, locale)
|
|
|
|
--- Returns a table of the available PCRE option flags (numbers) keyed by their
|
|
-- names (strings).
|
|
--
|
|
-- Possible names of the available strings can be retrieved from the
|
|
-- documentation of the PCRE library used to link against Nmap. The key is the
|
|
-- option name in the manual minus the <code>PCRE_</code> prefix.
|
|
-- <code>PCRE_CASELESS</code> becomes <code>CASELESS</code> for example.
|
|
function flags()
|
|
|
|
--- Returns the version of the PCRE library in use as a string.
|
|
--
|
|
-- For example <code>"6.4 05-Sep-2005"</code>.
|
|
function version()
|
|
|
|
--- Matches a string against a compiled regular expression.
|
|
--
|
|
-- Returns the start point and the end point of the first match of the compiled
|
|
-- regular expression in the string.
|
|
-- @param string the string to match against.
|
|
-- @param start where to start the match in the string (optional).
|
|
-- @param flags execution flags (optional).
|
|
-- @return <code>nil</code> if no match, otherwise the start point of the first
|
|
-- match.
|
|
-- @return the end point of the first match.
|
|
-- @return a table which contains false in the positions where the pattern did
|
|
-- not match. If named sub-patterns were used, the table also contains substring
|
|
-- matches keyed by their sub-pattern name.
|
|
-- @usage
|
|
-- i, j = regex:match("string to be searched", 0, 0)
|
|
-- if (i) then ... end
|
|
function match(string, start, flags)
|
|
|
|
--- Matches a string against a compiled regular expression, returning positions
|
|
-- of substring matches.
|
|
--
|
|
-- This function is like <code>match</code> except that a table returned as a
|
|
-- third result contains offsets of substring matches rather than substring
|
|
-- matches themselves. That table will not contain string keys, even if named
|
|
-- sub-patterns are used. For example, if the whole match is at offsets 10, 20
|
|
-- and substring matches are at offsets 12, 14 and 16, 19 then the function
|
|
-- returns <code>10, 20, {12,14,16,19}</code>.
|
|
-- @param string the string to match against.
|
|
-- @param start where to start the match in the string (optional).
|
|
-- @param flags execution flags (optional).
|
|
-- @return <code>nil</code> if no match, otherwise the start point of the match
|
|
-- of the whole string.
|
|
-- @return the end point of the match of the whole string.
|
|
-- @return a table containing a list of substring match start and end positions.
|
|
-- @usage
|
|
-- i, j, substrings = regex:exec("string to be searched", 0, 0)
|
|
-- if (i) then ... end
|
|
function exec(string, start, flags)
|
|
|
|
--- Matches a string against a regular expression multiple times.
|
|
--
|
|
-- Tries to match the regular expression pcre_obj against string up to
|
|
-- <code>n</code> times (or as many as possible if <code>n</code> is not given
|
|
-- or is not a positive number), subject to the execution flags
|
|
-- <code>ef</code>. Each time there is a match, <code>func</code> is called as
|
|
-- <code>func(m, t)</code>, where <code>m</code> is the matched string and
|
|
-- <code>t</code> is a table of substring matches. This table contains false in
|
|
-- the positions where the corresponding sub-pattern did not match. If named
|
|
-- sub-patterns are used then the table also contains substring matches keyed
|
|
-- by their correspondent sub-pattern names (strings). If <code>func</code>
|
|
-- returns a true value, then <code>gmatch</code> immediately returns;
|
|
-- <code>gmatch</code> returns the number of matches made.
|
|
-- @param string the string to match against.
|
|
-- @param func the function to call for each match.
|
|
-- @param n the maximum number of matches to do (optional).
|
|
-- @param ef execution flags (optional).
|
|
-- @return the number of matches made.
|
|
-- @usage
|
|
-- local t = {}
|
|
-- local function match(m) t[#t + 1] = m end
|
|
-- local n = regex:gmatch("string to be searched", match)
|
|
function pcre_obj:gmatch(string, func, n, ef)
|