Commit 722dc78d authored by Sergei Golubchik

pcre-8.36

parents 32ec8625 553b437d
ChangeLog for PCRE
------------------
Version 8.36 26-September-2014
------------------------------
1. Got rid of some compiler warnings in the C++ modules that were shown up by
-Wmissing-field-initializers and -Wunused-parameter.
2. The tests for quantifiers being too big (greater than 65535) were being
applied after reading the number, and stupidly assuming that integer
overflow would give a negative number. The tests are now applied as the
numbers are read.
3. Tidy code in pcre_exec.c where two branches that used to be different are
now the same.
4. The JIT compiler did not generate match limit checks for certain
bracketed expressions with quantifiers. This may lead to exponential
backtracking, instead of returning with PCRE_ERROR_MATCHLIMIT. This
issue should be resolved now.
5. Fixed an issue that occurs when nested alternatives are optimized
with table jumps.
6. Inserted two casts and changed some ints to size_t in the light of some
reported 64-bit compiler warnings (Bugzilla 1477).
7. Fixed a bug concerned with zero-minimum possessive groups that could match
an empty string, which sometimes were behaving incorrectly in the
interpreter (though correctly in the JIT matcher). This pcretest input is
an example:
'\A(?:[^"]++|"(?:[^"]*+|"")*+")++'
NON QUOTED "QUOT""ED" AFTER "NOT MATCHED
the interpreter was reporting a match of 'NON QUOTED ' only, whereas the
JIT matcher and Perl both matched 'NON QUOTED "QUOT""ED" AFTER '. The test
for an empty string was breaking the inner loop and carrying on at a lower
level, when possessive repeated groups should always return to a higher
level as they have no backtrack points in them. The empty string test now
occurs at the outer level.
8. Fixed a bug that was incorrectly auto-possessifying \w+ in the pattern
^\w+(?>\s*)(?<=\w) which caused it not to match "test test".
9. Give a compile-time error for \o{} (as Perl does) and for \x{} (which Perl
doesn't).
10. Change 8.34/15 introduced a bug that caused the amount of memory needed
to hold a pattern to be incorrectly computed (too small) when there were
named back references to duplicated names. This could cause "internal
error: code overflow" or "double free or corruption" or other memory
handling errors.
11. When named subpatterns had the same prefixes, back references could be
confused. For example, in this pattern:
/(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/
the reference to 'Name' was incorrectly treated as a reference to a
duplicate name.
12. A pattern such as /^s?c/mi8 where the optional character has more than
one "other case" was incorrectly compiled such that it would only try to
match starting at "c".
13. When a pattern starting with \s was studied, VT was not included in the
list of possible starting characters; this should have been part of the
8.34/18 patch.
14. If a character class started [\Qx]... where x is any character, the class
was incorrectly terminated at the ].
15. If a pattern that started with a caseless match for a character with more
than one "other case" was studied, PCRE did not set up the starting code
unit bit map for the list of possible characters. Now it does. This is an
optimization improvement, not a bug fix.
16. The Unicode data tables have been updated to Unicode 7.0.0.
17. Fixed a number of memory leaks in pcregrep.
18. Avoid a compiler warning (from some compilers) for a function call with
a cast that removes "const" from an lvalue by using an intermediate
variable (to which the compiler does not object).
19. Incorrect code was compiled if a group that contained an internal recursive
back reference was optional (had quantifier with a minimum of zero). This
example compiled incorrect code: /(((a\2)|(a*)\g<-1>))*/ and other examples
caused segmentation faults because of stack overflows at compile time.
20. A pattern such as /((?(R)a|(?1)))+/, which contains a recursion within a
group that is quantified with an indefinite repeat, caused a compile-time
loop which used up all the system stack and provoked a segmentation fault.
This was not the same bug as 19 above.
21. Add PCRECPP_EXP_DECL declaration to operator<< in pcre_stringpiece.h.
Patch by Mike Frysinger.
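As an illustration of what the match limit in item 4 protects against, here is a minimal sketch of a toy backtracking matcher that counts its internal calls and gives up once a limit is exceeded. It is not PCRE code: the pattern subset (literals, '.', and '*'), the names match_ctx, match_here and match_star, and the MATCH_* codes are all assumptions made for this example. The idea is only that every entry to the matching function is counted, so runaway backtracking ends with a distinct "limit hit" result rather than running on.

```c
/* Hypothetical result codes for this sketch (not PCRE's error values). */
#define MATCH_OK          1
#define MATCH_FAIL        0
#define MATCH_LIMIT_HIT (-1)

/* Call counter plus the limit it is checked against. */
typedef struct {
    unsigned long calls;
    unsigned long limit;
} match_ctx;

static int match_here(match_ctx *ctx, const char *re, const char *s);

/* Match c* followed by the rest of the pattern: consume greedily,
   then back off one character at a time. */
static int match_star(match_ctx *ctx, int c, const char *re, const char *s)
{
    const char *t = s;
    while (*t != '\0' && (*t == c || c == '.')) t++;
    for (;;) {
        int rc = match_here(ctx, re, t);
        if (rc != MATCH_FAIL) return rc;  /* success, or limit hit: stop now */
        if (t == s) return MATCH_FAIL;
        t--;
    }
}

/* Anchored match of the whole of s against re. */
static int match_here(match_ctx *ctx, const char *re, const char *s)
{
    /* The limit check sits on every entry, so no path can bypass it. */
    if (++ctx->calls > ctx->limit) return MATCH_LIMIT_HIT;
    if (re[0] == '\0') return *s == '\0' ? MATCH_OK : MATCH_FAIL;
    if (re[1] == '*') return match_star(ctx, re[0], re + 2, s);
    if (*s != '\0' && (re[0] == '.' || re[0] == *s))
        return match_here(ctx, re + 1, s + 1);
    return MATCH_FAIL;
}
```

With a limit of 1000, the pattern "a*a*a*a*a*b" against twenty "a" characters stops with MATCH_LIMIT_HIT; with a much larger limit the same call eventually returns MATCH_FAIL.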
Version 8.35 04-April-2014
--------------------------
......@@ -27,9 +125,9 @@ Version 8.35 04-April-2014
6. Improve character range checks in JIT. Characters are read by an imprecise
function now, which returns with an unknown value if the character code is
above a certain treshold (e.g: 256). The only limitation is that the value
must be bigger than the treshold as well. This function is useful, when
the characters above the treshold are handled in the same way.
above a certain threshold (e.g: 256). The only limitation is that the value
must be bigger than the threshold as well. This function is useful when
the characters above the threshold are handled in the same way.
7. The macros whose names start with RAWUCHAR are placeholders for a future
mode in which only the bottom 21 bits of 32-bit data items are used. To
......
News about PCRE releases
------------------------
Release 8.36 26-September-2014
------------------------------
This is primarily a bug-fix release. However, in addition, the Unicode data
tables have been updated to Unicode 7.0.0.
Release 8.35 04-April-2014
--------------------------
......
......@@ -45,14 +45,16 @@ the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. The distribution also
includes a set of C++ wrapper functions (see the pcrecpp man page for details),
courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
C++.
C++. Other C++ wrappers have been created from time to time. See, for example:
https://github.com/YasserAsmi/regexp, which aims to be simple and similar in
style to the C API.
In addition, there is a set of C wrapper functions (again, just for the 8-bit
library) that are based on the POSIX regular expression API (see the pcreposix
man page). These end up in the library called libpcreposix. Note that this just
provides a POSIX calling interface to PCRE; the regular expressions themselves
still follow Perl syntax and semantics. The POSIX API is restricted, and does
not give full access to all of PCRE's facilities.
The distribution also contains a set of C wrapper functions (again, just for
the 8-bit library) that are based on the POSIX regular expression API (see the
pcreposix man page). These end up in the library called libpcreposix. Note that
this just provides a POSIX calling interface to PCRE; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE's facilities.
The header file for the POSIX-style functions is called pcreposix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
......@@ -988,4 +990,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 17 January 2014
Last updated: 24 October 2014
......@@ -9,19 +9,19 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre_major, [8])
m4_define(pcre_minor, [35])
m4_define(pcre_minor, [36])
m4_define(pcre_prerelease, [])
m4_define(pcre_date, [2014-04-04])
m4_define(pcre_date, [2014-09-26])
# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [3:3:2])
m4_define(libpcre16_version, [2:3:2])
m4_define(libpcre32_version, [0:3:0])
m4_define(libpcreposix_version, [0:2:0])
m4_define(libpcrecpp_version, [0:0:0])
m4_define(libpcre_version, [3:4:2])
m4_define(libpcre16_version, [2:4:2])
m4_define(libpcre32_version, [0:4:0])
m4_define(libpcreposix_version, [0:3:0])
m4_define(libpcrecpp_version, [0:1:0])
AC_PREREQ(2.57)
AC_INIT(PCRE, pcre_major.pcre_minor[]pcre_prerelease, , pcre)
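The libtool triplets above are current:revision:age, and in this release only the revision field changes. As a rough sketch of how such a triplet maps to a file name, the code below assumes the usual libtool convention on Linux and similar ELF platforms, where the installed suffix is (current - age).age.revision. The names lt_version, parse_libtool_version and so_suffix are hypothetical helpers for this example and are not part of PCRE or libtool.

```c
#include <stdio.h>

/* Hypothetical helper type: a libtool current:revision:age triplet. */
typedef struct {
    int current;
    int revision;
    int age;
} lt_version;

/* Parse "C:R:A". Returns 1 on success, 0 on malformed input
   (too few fields, or trailing characters after the third number). */
static int parse_libtool_version(const char *s, lt_version *v)
{
    char trailing;
    return sscanf(s, "%d:%d:%d%c",
                  &v->current, &v->revision, &v->age, &trailing) == 3;
}

/* Assumed Linux/ELF convention: suffix is (current - age).age.revision. */
static void so_suffix(const lt_version *v, char *buf, size_t len)
{
    snprintf(buf, len, "%d.%d.%d",
             v->current - v->age, v->age, v->revision);
}
```

Under that assumption, the new libpcre_version value 3:4:2 yields the suffix 1.2.4, and the old value 3:3:2 yields 1.2.3, so the leading number that forms the soname is unchanged.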
......
......@@ -39,8 +39,10 @@ arguments are as follows:
<i>where</i> Points to where to put the data
</pre>
The <i>where</i> argument must point to an integer variable, except for
PCRE_CONFIG_MATCH_LIMIT and PCRE_CONFIG_MATCH_LIMIT_RECURSION, when it must
point to an unsigned long integer. The available codes are:
PCRE_CONFIG_MATCH_LIMIT, PCRE_CONFIG_MATCH_LIMIT_RECURSION, and
PCRE_CONFIG_PARENS_LIMIT, when it must point to an unsigned long integer,
and for PCRE_CONFIG_JITTARGET, when it must point to a const char*.
The available codes are:
<pre>
PCRE_CONFIG_JIT Availability of just-in-time compiler
support (1=yes 0=no)
......
......@@ -57,6 +57,10 @@ The following information is available:
PCRE_INFO_JITSIZE Size of JIT compiled code
PCRE_INFO_LASTLITERAL Literal last data unit required
PCRE_INFO_MINLENGTH Lower bound length of matching strings
PCRE_INFO_MATCHEMPTY Return 1 if the pattern can match an empty string,
0 otherwise
PCRE_INFO_MATCHLIMIT Match limit if set, otherwise PCRE_ERROR_UNSET
PCRE_INFO_MAXLOOKBEHIND Length (in characters) of the longest lookbehind assertion
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
PCRE_INFO_NAMETABLE Pointer to name table
......@@ -72,6 +76,7 @@ The following information is available:
2 if the first character is at the start of the data
string or after a newline, and
0 otherwise
PCRE_INFO_RECURSIONLIMIT Recursion limit if set, otherwise PCRE_ERROR_UNSET
PCRE_INFO_REQUIREDCHAR Literal last data unit required
PCRE_INFO_REQUIREDCHARFLAGS Returns 1 if the last data character is set (which can then
be retrieved using PCRE_INFO_REQUIREDCHAR); 0 otherwise
......@@ -79,14 +84,18 @@ The following information is available:
The <i>where</i> argument must point to an integer variable, except for the
following <i>what</i> values:
<pre>
PCRE_INFO_DEFAULT_TABLES const unsigned char *
PCRE_INFO_FIRSTTABLE const unsigned char *
PCRE_INFO_DEFAULT_TABLES const uint8_t *
PCRE_INFO_FIRSTCHARACTER uint32_t
PCRE_INFO_FIRSTTABLE const uint8_t *
PCRE_INFO_JITSIZE size_t
PCRE_INFO_MATCHLIMIT uint32_t
PCRE_INFO_NAMETABLE PCRE_SPTR16 (16-bit library)
PCRE_INFO_NAMETABLE PCRE_SPTR32 (32-bit library)
PCRE_INFO_NAMETABLE const unsigned char * (8-bit library)
PCRE_INFO_OPTIONS unsigned long int
PCRE_INFO_SIZE size_t
PCRE_INFO_FIRSTCHARACTER uint32_t
PCRE_INFO_STUDYSIZE size_t
PCRE_INFO_RECURSIONLIMIT uint32_t
PCRE_INFO_REQUIREDCHAR uint32_t
</pre>
The yield of the function is zero on success or:
......@@ -95,6 +104,7 @@ The yield of the function is zero on success or:
the argument <i>where</i> was NULL
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid
PCRE_ERROR_UNSET the option was not set
</PRE>
</P>
<P>
......
......@@ -703,6 +703,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
Bassa_Vah,
Batak,
Bengali,
Bopomofo,
......@@ -712,6 +713,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
......@@ -722,11 +724,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Grantha,
Greek,
Gujarati,
Gurmukhi,
......@@ -746,40 +751,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Khojki,
Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
Mahajani,
Malayalam,
Mandaic,
Manichaean,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Modi,
Mongolian,
Mro,
Myanmar,
Nabataean,
New_Tai_Lue,
Nko,
Ogham,
Ol_Chiki,
Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Pahawh_Hmong,
Palmyrene,
Pau_Cin_Hau,
Phags_Pa,
Phoenician,
Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
......@@ -797,8 +818,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
</P>
<P>
......
......@@ -171,6 +171,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
Bassa_Vah,
Batak,
Bengali,
Bopomofo,
......@@ -180,6 +181,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
......@@ -190,11 +192,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Grantha,
Greek,
Gujarati,
Gurmukhi,
......@@ -214,40 +219,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Khojki,
Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
Mahajani,
Malayalam,
Mandaic,
Manichaean,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Modi,
Mongolian,
Mro,
Myanmar,
Nabataean,
New_Tai_Lue,
Nko,
Ogham,
Ol_Chiki,
Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Pahawh_Hmong,
Palmyrene,
Pau_Cin_Hau,
Phags_Pa,
Phoenician,
Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
......@@ -265,8 +286,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
</P>
<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
......
......@@ -5326,21 +5326,25 @@ BACKSLASH
Those that are not part of an identified script are lumped together as
"Common". The current list of scripts is:
Arabic, Armenian, Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo,
Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma,
Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic,
Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira-
gana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali,
Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Car-
ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei-
form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero-
glyphs, Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha,
Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana,
Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian,
Lydian, Malayalam, Mandaic, Meetei_Mayek, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Mongolian, Myanmar, New_Tai_Lue, Nko,
Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic,
Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samari-
tan, Saurashtra, Sharada, Shavian, Sinhala, Sora_Sompeng, Sundanese,
Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet,
Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai,
Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Lin-
ear_A, Linear_B, Lisu, Lycian, Lydian, Mahajani, Malayalam, Mandaic,
Manichaean, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Myanmar, Nabataean,
New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_North_Arabian,
Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya,
Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician,
Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha-
vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac,
Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu,
Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi,
Yi.
Each character has exactly one Unicode general category property, spec-
......@@ -7777,21 +7781,25 @@ PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P
SCRIPT NAMES FOR \p AND \P
Arabic, Armenian, Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo,
Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma,
Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic,
Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira-
gana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali,
Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Car-
ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei-
form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero-
glyphs, Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha,
Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana,
Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian,
Lydian, Malayalam, Mandaic, Meetei_Mayek, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Mongolian, Myanmar, New_Tai_Lue, Nko,
Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic,
Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samari-
tan, Saurashtra, Sharada, Shavian, Sinhala, Sora_Sompeng, Sundanese,
Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet,
Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai,
Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Lin-
ear_A, Linear_B, Lisu, Lycian, Lydian, Mahajani, Malayalam, Mandaic,
Manichaean, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Myanmar, Nabataean,
New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_North_Arabian,
Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya,
Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician,
Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha-
vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac,
Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu,
Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi,
Yi.
......
.TH PCRE_CONFIG 3 "05 November 2013" "PCRE 8.34"
.TH PCRE_CONFIG 3 "20 April 2014" "PCRE 8.36"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
......@@ -24,8 +24,10 @@ arguments are as follows:
\fIwhere\fP Points to where to put the data
.sp
The \fIwhere\fP argument must point to an integer variable, except for
PCRE_CONFIG_MATCH_LIMIT and PCRE_CONFIG_MATCH_LIMIT_RECURSION, when it must
point to an unsigned long integer. The available codes are:
PCRE_CONFIG_MATCH_LIMIT, PCRE_CONFIG_MATCH_LIMIT_RECURSION, and
PCRE_CONFIG_PARENS_LIMIT, when it must point to an unsigned long integer,
and for PCRE_CONFIG_JITTARGET, when it must point to a const char*.
The available codes are:
.sp
PCRE_CONFIG_JIT Availability of just-in-time compiler
support (1=yes 0=no)
......
.TH PCRE_FULLINFO 3 "24 June 2012" "PCRE 8.30"
.TH PCRE_FULLINFO 3 "21 April 2014" "PCRE 8.36"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
......@@ -43,6 +43,10 @@ The following information is available:
PCRE_INFO_JITSIZE Size of JIT compiled code
PCRE_INFO_LASTLITERAL Literal last data unit required
PCRE_INFO_MINLENGTH Lower bound length of matching strings
PCRE_INFO_MATCHEMPTY Return 1 if the pattern can match an empty string,
0 otherwise
PCRE_INFO_MATCHLIMIT Match limit if set, otherwise PCRE_ERROR_UNSET
PCRE_INFO_MAXLOOKBEHIND Length (in characters) of the longest lookbehind assertion
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
PCRE_INFO_NAMETABLE Pointer to name table
......@@ -58,6 +62,7 @@ The following information is available:
2 if the first character is at the start of the data
string or after a newline, and
0 otherwise
PCRE_INFO_RECURSIONLIMIT Recursion limit if set, otherwise PCRE_ERROR_UNSET
PCRE_INFO_REQUIREDCHAR Literal last data unit required
PCRE_INFO_REQUIREDCHARFLAGS Returns 1 if the last data character is set (which can then
be retrieved using PCRE_INFO_REQUIREDCHAR); 0 otherwise
......@@ -65,14 +70,18 @@ The following information is available:
The \fIwhere\fP argument must point to an integer variable, except for the
following \fIwhat\fP values:
.sp
PCRE_INFO_DEFAULT_TABLES const unsigned char *
PCRE_INFO_FIRSTTABLE const unsigned char *
PCRE_INFO_DEFAULT_TABLES const uint8_t *
PCRE_INFO_FIRSTCHARACTER uint32_t
PCRE_INFO_FIRSTTABLE const uint8_t *
PCRE_INFO_JITSIZE size_t
PCRE_INFO_MATCHLIMIT uint32_t
PCRE_INFO_NAMETABLE PCRE_SPTR16 (16-bit library)
PCRE_INFO_NAMETABLE PCRE_SPTR32 (32-bit library)
PCRE_INFO_NAMETABLE const unsigned char * (8-bit library)
PCRE_INFO_OPTIONS unsigned long int
PCRE_INFO_SIZE size_t
PCRE_INFO_FIRSTCHARACTER uint32_t
PCRE_INFO_STUDYSIZE size_t
PCRE_INFO_RECURSIONLIMIT uint32_t
PCRE_INFO_REQUIREDCHAR uint32_t
.sp
The yield of the function is zero on success or:
......@@ -81,6 +90,7 @@ The yield of the function is zero on success or:
the argument \fIwhere\fP was NULL
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of \fIwhat\fP was invalid
PCRE_ERROR_UNSET the option was not set
.P
There is a complete description of the PCRE native API in the
.\" HREF
......
......@@ -708,6 +708,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
Bassa_Vah,
Batak,
Bengali,
Bopomofo,
......@@ -717,6 +718,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
......@@ -727,11 +729,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Grantha,
Greek,
Gujarati,
Gurmukhi,
......@@ -751,40 +756,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Khojki,
Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
Mahajani,
Malayalam,
Mandaic,
Manichaean,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Modi,
Mongolian,
Mro,
Myanmar,
Nabataean,
New_Tai_Lue,
Nko,
Ogham,
Ol_Chiki,
Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Pahawh_Hmong,
Palmyrene,
Pau_Cin_Hau,
Phags_Pa,
Phoenician,
Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
......@@ -802,8 +823,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
.P
Each character has exactly one Unicode general category property, specified by
......
......@@ -139,6 +139,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
Bassa_Vah,
Batak,
Bengali,
Bopomofo,
......@@ -148,6 +149,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
......@@ -158,11 +160,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Grantha,
Greek,
Gujarati,
Gurmukhi,
......@@ -182,40 +187,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Khojki,
Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
Mahajani,
Malayalam,
Mandaic,
Manichaean,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Modi,
Mongolian,
Mro,
Myanmar,
Nabataean,
New_Tai_Lue,
Nko,
Ogham,
Ol_Chiki,
Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Pahawh_Hmong,
Palmyrene,
Pau_Cin_Hau,
Phags_Pa,
Phoenician,
Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
......@@ -233,8 +254,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
.
.
......
......@@ -47,8 +47,8 @@ supporting internal functions that are not used by other modules. */
#endif
#define NLBLOCK cd /* Block containing newline information */
#define PSSTART start_pattern /* Field containing processed string start */
#define PSEND end_pattern /* Field containing processed string end */
#define PSSTART start_pattern /* Field containing pattern start */
#define PSEND end_pattern /* Field containing pattern end */
#include "pcre_internal.h"
......@@ -549,6 +549,7 @@ static const char error_texts[] =
"group name must start with a non-digit\0"
/* 85 */
"parentheses are too deeply nested (stack check)\0"
"digits missing in \\x{} or \\o{}\0"
;
/* Table to identify digits and hex digits. This is used when compiling
......@@ -1259,6 +1260,7 @@ else
case CHAR_o:
if (ptr[1] != CHAR_LEFT_CURLY_BRACKET) *errorcodeptr = ERR81; else
if (ptr[2] == CHAR_RIGHT_CURLY_BRACKET) *errorcodeptr = ERR86; else
{
ptr += 2;
c = 0;
......@@ -1328,6 +1330,11 @@ else
if (ptr[1] == CHAR_LEFT_CURLY_BRACKET)
{
ptr += 2;
if (*ptr == CHAR_RIGHT_CURLY_BRACKET)
{
*errorcodeptr = ERR86;
break;
}
c = 0;
overflow = FALSE;
while (MAX_255(*ptr) && (digitab[*ptr] & ctype_xdigit) != 0)
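The two hunks above reject \o{} and \x{} before any digits are read. A standalone sketch of the same idea for the hexadecimal case follows. It is not the PCRE implementation: the function name parse_hex_braces, the HEX_* status codes, the ASCII-only digit conversion and the 0x10FFFF ceiling are all assumptions made for illustration.

```c
#include <ctype.h>

/* Hypothetical status codes for this sketch; not PCRE's ERRnn values. */
enum { HEX_OK = 0, HEX_NO_DIGITS = 1, HEX_BAD_CHAR = 2, HEX_TOO_BIG = 3 };

/* Parse the "{hhh}" part that follows "\x". p points at the '{'.
   On success, *value receives the number and *end points past the '}'. */
static int parse_hex_braces(const char *p, unsigned long *value,
                            const char **end)
{
    unsigned long c = 0;

    if (*p != '{') return HEX_BAD_CHAR;
    p++;

    /* Reject empty braces up front, before the digit loop runs. */
    if (*p == '}') return HEX_NO_DIGITS;

    while (isxdigit((unsigned char)*p)) {
        /* Digit conversion assumes an ASCII-compatible character set. */
        int d = isdigit((unsigned char)*p)
              ? *p - '0'
              : tolower((unsigned char)*p) - 'a' + 10;
        c = c * 16 + (unsigned long)d;
        if (c > 0x10FFFFUL) return HEX_TOO_BIG;  /* assumed ceiling */
        p++;
    }

    if (*p != '}') return HEX_BAD_CHAR;
    *value = c;
    *end = p + 1;
    return HEX_OK;
}
```

Checking for the closing brace first keeps the "no digits" case separate from the "unexpected character" case, so each can be reported with its own message.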
......@@ -1583,30 +1590,30 @@ read_repeat_counts(const pcre_uchar *p, int *minp, int *maxp, int *errorcodeptr)
int min = 0;
int max = -1;
/* Read the minimum value and do a paranoid check: a negative value indicates
an integer overflow. */
while (IS_DIGIT(*p)) min = min * 10 + (int)(*p++ - CHAR_0);
if (min < 0 || min > 65535)
while (IS_DIGIT(*p))
{
min = min * 10 + (int)(*p++ - CHAR_0);
if (min > 65535)
{
*errorcodeptr = ERR5;
return p;
}
/* Read the maximum value if there is one, and again do a paranoid on its size.
Also, max must not be less than min. */
}
if (*p == CHAR_RIGHT_CURLY_BRACKET) max = min; else
{
if (*(++p) != CHAR_RIGHT_CURLY_BRACKET)
{
max = 0;
while(IS_DIGIT(*p)) max = max * 10 + (int)(*p++ - CHAR_0);
if (max < 0 || max > 65535)
while(IS_DIGIT(*p))
{
max = max * 10 + (int)(*p++ - CHAR_0);
if (max > 65535)
{
*errorcodeptr = ERR5;
return p;
}
}
if (max < min)
{
*errorcodeptr = ERR4;
......@@ -1615,9 +1622,6 @@ if (*p == CHAR_RIGHT_CURLY_BRACKET) max = min; else
}
}
/* Fill in the required variables, and pass back the pointer to the terminating
'}'. */
*minp = min;
*maxp = max;
return p;
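The point of the change above is that the size test now runs inside the digit loop, so the accumulator is rejected before it can grow large enough to overflow. A minimal standalone sketch of that pattern, using a hypothetical read_count helper and a MAX_REPEAT constant in place of PCRE's own types and error codes, might look like this:

```c
#include <ctype.h>

#define MAX_REPEAT 65535L

/* Read a decimal repeat count starting at *pp and advance *pp past the
   digits consumed. Returns the value, or -1 as soon as it exceeds
   MAX_REPEAT. Because the check runs after every digit, the accumulator
   never grows beyond MAX_REPEAT * 10 + 9, which always fits in a long. */
static long read_count(const char **pp)
{
    const char *p = *pp;
    long n = 0;

    while (isdigit((unsigned char)*p)) {
        n = n * 10 + (*p++ - '0');
        if (n > MAX_REPEAT) {
            *pp = p;
            return -1;
        }
    }
    *pp = p;
    return n;
}
```

Checking only after the loop, as the old code did, relies on signed overflow producing a negative number, which C does not guarantee.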
......@@ -2370,6 +2374,7 @@ for (code = first_significant_code(code + PRIV(OP_lengths)[*code], TRUE);
if (c == OP_RECURSE)
{
const pcre_uchar *scode = cd->start_code + GET(code, 1);
const pcre_uchar *endgroup = scode;
BOOL empty_branch;
/* Test for forward reference or uncompleted reference. This is disabled
......@@ -2384,20 +2389,16 @@ for (code = first_significant_code(code + PRIV(OP_lengths)[*code], TRUE);
if (GET(scode, 1) == 0) return TRUE; /* Unclosed */
}
/* If we are scanning a completed pattern, there are no forward references
and all groups are complete. We need to detect whether this is a recursive
call, as otherwise there will be an infinite loop. If it is a recursion,
just skip over it. Simple recursions are easily detected. For mutual
recursions we keep a chain on the stack. */
/* If the reference is to a completed group, we need to detect whether this
is a recursive call, as otherwise there will be an infinite loop. If it is
a recursion, just skip over it. Simple recursions are easily detected. For
mutual recursions we keep a chain on the stack. */
do endgroup += GET(endgroup, 1); while (*endgroup == OP_ALT);
if (code >= scode && code <= endgroup) continue; /* Simple recursion */
else
{
recurse_check *r = recurses;
const pcre_uchar *endgroup = scode;
do endgroup += GET(endgroup, 1); while (*endgroup == OP_ALT);
if (code >= scode && code <= endgroup) continue; /* Simple recursion */
for (r = recurses; r != NULL; r = r->prev)
if (r->group == scode) break;
if (r != NULL) continue; /* Mutual recursion */
......@@ -3038,7 +3039,7 @@ switch(c)
end += 1 + 2 * IMM2_SIZE;
break;
}
list[2] = end - code;
list[2] = (pcre_uint32)(end - code);
return end;
}
return NULL; /* Opcode not accepted */
......@@ -3079,6 +3080,7 @@ const pcre_uint8 *class_bitset;
const pcre_uint8 *set1, *set2, *set_end;
pcre_uint32 chr;
BOOL accepted, invert_bits;
BOOL entered_a_group = FALSE;
/* Note: the base_list[1] contains whether the current opcode has greedy
(represented by a non-zero value) quantifier. This is different from
......@@ -3132,8 +3134,10 @@ for(;;)
case OP_ONCE:
case OP_ONCE_NC:
/* Atomic sub-patterns and assertions can always auto-possessify their
last iterator. */
return TRUE;
last iterator. However, if the group was entered as a result of checking
a previous iterator, this is not possible. */
return !entered_a_group;
}
code += PRIV(OP_lengths)[c];
......@@ -3152,6 +3156,8 @@ for(;;)
code = next_code + 1 + LINK_SIZE;
next_code += GET(next_code, 1);
}
entered_a_group = TRUE;
continue;
case OP_BRAZERO:
......@@ -3171,6 +3177,9 @@ for(;;)
code += PRIV(OP_lengths)[c];
continue;
default:
break;
}
/* Check for a supported opcode, and load its properties. */
......@@ -3409,8 +3418,7 @@ for(;;)
rightop >= FIRST_AUTOTAB_OP && rightop <= LAST_AUTOTAB_RIGHT_OP &&
autoposstab[leftop - FIRST_AUTOTAB_OP][rightop - FIRST_AUTOTAB_OP];
if (!accepted)
return FALSE;
if (!accepted) return FALSE;
if (list[1] == 0) return TRUE;
/* Might be an empty repeat. */
......@@ -4683,7 +4691,8 @@ for (;; ptr++)
previous = NULL;
if ((options & PCRE_MULTILINE) != 0)
{
if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
if (firstcharflags == REQ_UNSET)
zerofirstcharflags = firstcharflags = REQ_NONE;
*code++ = OP_CIRCM;
}
else *code++ = OP_CIRC;
......@@ -4863,7 +4872,7 @@ for (;; ptr++)
if (lengthptr != NULL && class_uchardata > class_uchardata_base)
{
xclass = TRUE;
*lengthptr += class_uchardata - class_uchardata_base;
*lengthptr += (int)(class_uchardata - class_uchardata_base);
class_uchardata = class_uchardata_base;
}
#endif
......@@ -5313,7 +5322,7 @@ for (;; ptr++)
whatever repeat count may follow. In the case of reqchar, save the
previous value for reinstating. */
if (class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
if (!inescq && class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
{
ptr++;
zeroreqchar = reqchar;
......@@ -6008,8 +6017,8 @@ for (;; ptr++)
while (cd->hwm > cd->start_workspace + cd->workspace_size -
WORK_SIZE_SAFETY_MARGIN - (this_hwm - save_hwm))
{
int save_offset = save_hwm - cd->start_workspace;
int this_offset = this_hwm - cd->start_workspace;
size_t save_offset = save_hwm - cd->start_workspace;
size_t this_offset = this_hwm - cd->start_workspace;
*errorcodeptr = expand_workspace(cd);
if (*errorcodeptr != 0) goto FAILED;
save_hwm = (pcre_uchar *)cd->start_workspace + save_offset;
......@@ -6090,8 +6099,8 @@ for (;; ptr++)
while (cd->hwm > cd->start_workspace + cd->workspace_size -
WORK_SIZE_SAFETY_MARGIN - (this_hwm - save_hwm))
{
int save_offset = save_hwm - cd->start_workspace;
int this_offset = this_hwm - cd->start_workspace;
size_t save_offset = save_hwm - cd->start_workspace;
size_t this_offset = this_hwm - cd->start_workspace;
*errorcodeptr = expand_workspace(cd);
if (*errorcodeptr != 0) goto FAILED;
save_hwm = (pcre_uchar *)cd->start_workspace + save_offset;
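Note on the two workspace hunks above: both remember `save_hwm` and `this_hwm` as offsets from `cd->start_workspace` before calling `expand_workspace`, then rebuild the pointers afterwards, and the change stores those offsets as `size_t` instead of `int`. The following is a minimal sketch of that pattern, not PCRE code. The `workspace` struct and the `grow_buffer` and `grow_keeping_mark` helpers are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable buffer; stands in for PCRE's compile workspace. */
typedef struct {
  unsigned char *start;
  size_t size;
} workspace;

/* Double the buffer. Returns 0 on success, nonzero on failure.
   realloc() may move the block, so old pointers into it become invalid. */
static int grow_buffer(workspace *ws)
{
  unsigned char *p = realloc(ws->start, ws->size * 2);
  if (p == NULL) return 1;
  ws->start = p;
  ws->size *= 2;
  return 0;
}

/* Grow the buffer while keeping *mark at the same logical position.
   The position is remembered as a size_t offset, which can hold any
   in-bounds pointer difference without narrowing. */
static int grow_keeping_mark(workspace *ws, unsigned char **mark)
{
  size_t offset;
  assert(*mark >= ws->start && *mark <= ws->start + ws->size);
  offset = (size_t)(*mark - ws->start);
  if (grow_buffer(ws) != 0) return 1;
  *mark = ws->start + offset;   /* rebuild the pointer in the new block */
  return 0;
}
```

Storing the offset rather than the pointer is what makes the reallocation safe; the wider type only removes the implicit narrowing that 64-bit compilers warn about.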
......@@ -6689,7 +6698,8 @@ for (;; ptr++)
ptr++;
}
namelen = (int)(ptr - name);
if (lengthptr != NULL) *lengthptr += IMM2_SIZE;
if (lengthptr != NULL && (options & PCRE_DUPNAMES) != 0)
*lengthptr += IMM2_SIZE;
}
/* Check the terminator */
......@@ -6750,9 +6760,11 @@ for (;; ptr++)
for (; i < cd->names_found; i++)
{
slot += cd->name_entry_size;
if (STRNCMP_UC_UC(name, slot+IMM2_SIZE, namelen) != 0) break;
if (STRNCMP_UC_UC(name, slot+IMM2_SIZE, namelen) != 0 ||
(slot+IMM2_SIZE)[namelen] != 0) break;
count++;
}
if (count > 1)
{
PUT2(code, 2+LINK_SIZE, offset);
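Note on the hunk above: `STRNCMP_UC_UC` compares only the first `namelen` code units, so a table entry that merely begins with the name would also compare equal. The added test `(slot+IMM2_SIZE)[namelen] != 0` requires the entry to end where the name ends. A minimal sketch of the same check on plain `char` strings follows; `same_name` is a hypothetical helper, not a PCRE function.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the NUL-terminated table entry is exactly the name given by
   (name, namelen), 0 otherwise. `name` need not be NUL-terminated. */
static int same_name(const char *name, size_t namelen, const char *entry)
{
  assert(name != NULL && entry != NULL);
  /* strncmp alone accepts any entry that starts with the name... */
  if (strncmp(name, entry, namelen) != 0) return 0;
  /* ...so also require the entry to end where the name ends. */
  return entry[namelen] == '\0';
}
```

With this check, looking up `ab` no longer counts an entry `abc` as a duplicate of it.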
......@@ -7101,6 +7113,12 @@ for (;; ptr++)
/* Count named back references. */
if (!is_recurse) cd->namedrefcount++;
/* If duplicate names are permitted, we have to allow for a named
reference to a duplicated name (this cannot be determined until the
second pass). This needs an extra 16-bit data item. */
if ((options & PCRE_DUPNAMES) != 0) *lengthptr += IMM2_SIZE;
}
/* In the real compile, search the name table. We check the name
......@@ -7147,6 +7165,8 @@ for (;; ptr++)
for (i++; i < cd->names_found; i++)
{
if (STRCMP_UC_UC(slot + IMM2_SIZE, cslot + IMM2_SIZE) != 0) break;
count++;
cslot += cd->name_entry_size;
}
......@@ -8244,12 +8264,16 @@ for (;;)
/* If it was a capturing subpattern, check to see if it contained any
recursive back references. If so, we must wrap it in atomic brackets.
In any event, remove the block from the chain. */
Because we are moving code along, we must ensure that any pending recursive
references are updated. In any event, remove the block from the chain. */
if (capnumber > 0)
{
if (cd->open_caps->flag)
{
*code = OP_END;
adjust_recurse(start_bracket, 1 + LINK_SIZE,
(options & PCRE_UTF8) != 0, cd, cd->hwm);
memmove(start_bracket + 1 + LINK_SIZE, start_bracket,
IN_UCHARS(code - start_bracket));
*start_bracket = OP_ONCE;
......@@ -9254,11 +9278,18 @@ subpattern. */
if (errorcode == 0 && re->top_backref > re->top_bracket) errorcode = ERR15;
/* Unless disabled, check whether single character iterators can be
auto-possessified. The function overwrites the appropriate opcode values. */
/* Unless disabled, check whether any single character iterators can be
auto-possessified. The function overwrites the appropriate opcode values, so
the type of the pointer must be cast. NOTE: the intermediate variable "temp" is
used in this code because at least one compiler gives a warning about loss of
"const" attribute if the cast (pcre_uchar *)codestart is used directly in the
function call. */
if ((options & PCRE_NO_AUTO_POSSESS) == 0)
auto_possessify((pcre_uchar *)codestart, utf, cd);
{
pcre_uchar *temp = (pcre_uchar *)codestart;
auto_possessify(temp, utf, cd);
}
/* If there were any lookbehind assertions that contained OP_RECURSE
(recursions or subroutine calls), a flag is set for them to be checked here,
......
......@@ -3242,7 +3242,7 @@ md->callout_data = NULL;
if (extra_data != NULL)
{
unsigned int flags = extra_data->flags;
unsigned long int flags = extra_data->flags;
if ((flags & PCRE_EXTRA_STUDY_DATA) != 0)
study = (const pcre_study_data *)extra_data->study_data;
if ((flags & PCRE_EXTRA_MATCH_LIMIT) != 0) return PCRE_ERROR_DFA_UMLIMIT;
......
......@@ -1167,11 +1167,16 @@ for (;;)
if (rrc == MATCH_KETRPOS)
{
offset_top = md->end_offset_top;
eptr = md->end_match_ptr;
ecode = md->start_code + code_offset;
save_capture_last = md->capture_last;
matched_once = TRUE;
mstart = md->start_match_ptr; /* In case \K changed it */
if (eptr == md->end_match_ptr) /* Matched an empty string */
{
do ecode += GET(ecode, 1); while (*ecode == OP_ALT);
break;
}
eptr = md->end_match_ptr;
continue;
}
......@@ -1241,10 +1246,15 @@ for (;;)
if (rrc == MATCH_KETRPOS)
{
offset_top = md->end_offset_top;
eptr = md->end_match_ptr;
ecode = md->start_code + code_offset;
matched_once = TRUE;
mstart = md->start_match_ptr; /* In case \K reset it */
if (eptr == md->end_match_ptr) /* Matched an empty string */
{
do ecode += GET(ecode, 1); while (*ecode == OP_ALT);
break;
}
eptr = md->end_match_ptr;
continue;
}
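Note on the two `MATCH_KETRPOS` hunks above: the outer loop now compares `eptr` with `md->end_match_ptr` before updating `eptr`, and leaves the loop when an iteration of the possessive group consumed nothing. The sketch below shows only that control flow in a much simplified form. The `match_once_fn` callback type, `NO_MATCH`, `repeat_possessive` and `match_a_star` are hypothetical names, and none of this is PCRE's matcher.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-iteration matcher: tries to match the group at `pos`
   and returns the new position, or NO_MATCH if the group does not match. */
typedef size_t (*match_once_fn)(const char *subject, size_t pos);

#define NO_MATCH ((size_t)-1)

/* Repeat the group possessively: take every iteration that matches and never
   give one back. An iteration that matches the empty string ends the loop,
   since repeating it would make no progress. Returns the final position. */
static size_t repeat_possessive(const char *subject, size_t pos,
                                match_once_fn match_once)
{
  assert(subject != NULL && match_once != NULL);
  for (;;)
    {
    size_t end = match_once(subject, pos);
    if (end == NO_MATCH) break;     /* group failed: stop repeating */
    if (end == pos) break;          /* matched an empty string: stop */
    pos = end;
    }
  return pos;
}

/* Example group equivalent to a* : consumes a run of 'a', possibly empty. */
static size_t match_a_star(const char *subject, size_t pos)
{
  while (subject[pos] == 'a') pos++;
  return pos;
}
```

Doing the empty-match test in the loop that drives the repeats is the point of the change: the decision to stop is made at the level that owns the repetition.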
......@@ -1979,6 +1989,19 @@ for (;;)
}
}
/* OP_KETRPOS is a possessive repeating ket. Remember the current position,
and return the MATCH_KETRPOS. This makes it possible to do the repeats one
at a time from the outer level, thus saving stack. This must precede the
empty string test - in this case that test is done at the outer level. */
if (*ecode == OP_KETRPOS)
{
md->start_match_ptr = mstart; /* In case \K reset it */
md->end_match_ptr = eptr;
md->end_offset_top = offset_top;
RRETURN(MATCH_KETRPOS);
}
/* For an ordinary non-repeating ket, just continue at this level. This
also happens for a repeating ket if no characters were matched in the
group. This is the forcible breaking of infinite loops as implemented in
......@@ -2001,18 +2024,6 @@ for (;;)
break;
}
/* OP_KETRPOS is a possessive repeating ket. Remember the current position,
and return the MATCH_KETRPOS. This makes it possible to do the repeats one
at a time from the outer level, thus saving stack. */
if (*ecode == OP_KETRPOS)
{
md->start_match_ptr = mstart; /* In case \K reset it */
md->end_match_ptr = eptr;
md->end_offset_top = offset_top;
RRETURN(MATCH_KETRPOS);
}
/* The normal repeating kets try the rest of the pattern or restart from
the preceding bracket, in the appropriate order. In the second case, we can
use tail recursion to avoid using another stack frame, unless we have an
......@@ -5681,8 +5692,6 @@ for (;;)
switch(ctype)
{
case OP_ANY:
if (max < INT_MAX)
{
for (i = min; i < max; i++)
{
if (eptr >= md->end_subject)
......@@ -5703,33 +5712,6 @@ for (;;)
eptr++;
ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++);
}
}
/* Handle unlimited UTF-8 repeat */
else
{
for (i = min; i < max; i++)
{
if (eptr >= md->end_subject)
{
SCHECK_PARTIAL();
break;
}
if (IS_NEWLINE(eptr)) break;
if (md->partial != 0 && /* Take care with CRLF partial */
eptr + 1 >= md->end_subject &&
NLBLOCK->nltype == NLTYPE_FIXED &&
NLBLOCK->nllen == 2 &&
UCHAR21(eptr) == NLBLOCK->nl[0])
{
md->hitend = TRUE;
if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL);
}
eptr++;
ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++);
}
}
break;
case OP_ALLANY:
......@@ -6519,7 +6501,7 @@ tables = re->tables;
if (extra_data != NULL)
{
register unsigned int flags = extra_data->flags;
unsigned long int flags = extra_data->flags;
if ((flags & PCRE_EXTRA_STUDY_DATA) != 0)
study = (const pcre_study_data *)extra_data->study_data;
if ((flags & PCRE_EXTRA_MATCH_LIMIT) != 0)
......
......@@ -2281,7 +2281,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69,
ERR70, ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79,
ERR80, ERR81, ERR82, ERR83, ERR84, ERR85, ERRCOUNT };
ERR80, ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERRCOUNT };
/* JIT compiling modes. The function list is indexed by them. */
......
This source diff could not be displayed because it is too large.
......@@ -149,6 +149,8 @@ static void TestBigComment() {
// small stack size
int main(int argc, char** argv) {
(void)argc;
(void)argv;
TestScanner();
TestBigComment();
......
......@@ -174,6 +174,7 @@ template<> struct __type_traits<pcrecpp::StringPiece> {
#endif
// allow StringPiece to be logged
std::ostream& operator<<(std::ostream& o, const pcrecpp::StringPiece& piece);
PCRECPP_EXP_DECL std::ostream& operator<<(std::ostream& o,
const pcrecpp::StringPiece& piece);
#endif /* _PCRE_STRINGPIECE_H */
......@@ -142,6 +142,8 @@ static void CheckComparisonOperators() {
}
int main(int argc, char** argv) {
(void)argc;
(void)argv;
CheckComparisonOperators();
CheckSTLComparator();
......
......@@ -863,7 +863,6 @@ do
case OP_NOTUPTOI:
case OP_NOT_HSPACE:
case OP_NOT_VSPACE:
case OP_PROP:
case OP_PRUNE:
case OP_PRUNE_ARG:
case OP_RECURSE:
......@@ -881,6 +880,31 @@ do
case OP_THEN_ARG:
return SSB_FAIL;
/* A "real" property test implies no starting bits, but the fake property
PT_CLIST identifies a list of characters. These lists are short, as they
are used for characters with more than one "other case", so there is no
point in recognizing them for OP_NOTPROP. */
case OP_PROP:
if (tcode[1] != PT_CLIST) return SSB_FAIL;
{
const pcre_uint32 *p = PRIV(ucd_caseless_sets) + tcode[2];
while ((c = *p++) < NOTACHAR)
{
#if defined SUPPORT_UTF && defined COMPILE_PCRE8
if (utf)
{
pcre_uchar buff[6];
(void)PRIV(ord2utf)(c, buff);
c = buff[0];
}
#endif
if (c > 0xff) SET_BIT(0xff); else SET_BIT(c);
}
}
try_next = FALSE;
break;
/* We can ignore word boundary tests. */
case OP_WORD_BOUNDARY:
......@@ -1106,24 +1130,17 @@ do
try_next = FALSE;
break;
/* The cbit_space table has vertical tab as whitespace; we have to
ensure it is set as not whitespace. Luckily, the code value is the same
(0x0b) in ASCII and EBCDIC, so we can just adjust the appropriate bit. */
/* The cbit_space table has vertical tab as whitespace; we no longer
have to play fancy tricks because Perl added VT to its whitespace at
release 5.18. PCRE added it at release 8.34. */
case OP_NOT_WHITESPACE:
set_nottype_bits(start_bits, cbit_space, table_limit, cd);
start_bits[1] |= 0x08;
try_next = FALSE;
break;
/* The cbit_space table has vertical tab as whitespace; we have to not
set it from the table. Luckily, the code value is the same (0x0b) in
ASCII and EBCDIC, so we can just adjust the appropriate bit. */
case OP_WHITESPACE:
c = start_bits[1]; /* Save in case it was already set */
set_type_bits(start_bits, cbit_space, table_limit, cd);
start_bits[1] = (start_bits[1] & ~0x08) | c;
try_next = FALSE;
break;
......
......@@ -213,6 +213,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Avestan0 STR_A STR_v STR_e STR_s STR_t STR_a STR_n "\0"
#define STRING_Balinese0 STR_B STR_a STR_l STR_i STR_n STR_e STR_s STR_e "\0"
#define STRING_Bamum0 STR_B STR_a STR_m STR_u STR_m "\0"
#define STRING_Bassa_Vah0 STR_B STR_a STR_s STR_s STR_a STR_UNDERSCORE STR_V STR_a STR_h "\0"
#define STRING_Batak0 STR_B STR_a STR_t STR_a STR_k "\0"
#define STRING_Bengali0 STR_B STR_e STR_n STR_g STR_a STR_l STR_i "\0"
#define STRING_Bopomofo0 STR_B STR_o STR_p STR_o STR_m STR_o STR_f STR_o "\0"
......@@ -223,6 +224,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_C0 STR_C "\0"
#define STRING_Canadian_Aboriginal0 STR_C STR_a STR_n STR_a STR_d STR_i STR_a STR_n STR_UNDERSCORE STR_A STR_b STR_o STR_r STR_i STR_g STR_i STR_n STR_a STR_l "\0"
#define STRING_Carian0 STR_C STR_a STR_r STR_i STR_a STR_n "\0"
#define STRING_Caucasian_Albanian0 STR_C STR_a STR_u STR_c STR_a STR_s STR_i STR_a STR_n STR_UNDERSCORE STR_A STR_l STR_b STR_a STR_n STR_i STR_a STR_n "\0"
#define STRING_Cc0 STR_C STR_c "\0"
#define STRING_Cf0 STR_C STR_f "\0"
#define STRING_Chakma0 STR_C STR_h STR_a STR_k STR_m STR_a "\0"
......@@ -238,11 +240,14 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Cyrillic0 STR_C STR_y STR_r STR_i STR_l STR_l STR_i STR_c "\0"
#define STRING_Deseret0 STR_D STR_e STR_s STR_e STR_r STR_e STR_t "\0"
#define STRING_Devanagari0 STR_D STR_e STR_v STR_a STR_n STR_a STR_g STR_a STR_r STR_i "\0"
#define STRING_Duployan0 STR_D STR_u STR_p STR_l STR_o STR_y STR_a STR_n "\0"
#define STRING_Egyptian_Hieroglyphs0 STR_E STR_g STR_y STR_p STR_t STR_i STR_a STR_n STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0"
#define STRING_Elbasan0 STR_E STR_l STR_b STR_a STR_s STR_a STR_n "\0"
#define STRING_Ethiopic0 STR_E STR_t STR_h STR_i STR_o STR_p STR_i STR_c "\0"
#define STRING_Georgian0 STR_G STR_e STR_o STR_r STR_g STR_i STR_a STR_n "\0"
#define STRING_Glagolitic0 STR_G STR_l STR_a STR_g STR_o STR_l STR_i STR_t STR_i STR_c "\0"
#define STRING_Gothic0 STR_G STR_o STR_t STR_h STR_i STR_c "\0"
#define STRING_Grantha0 STR_G STR_r STR_a STR_n STR_t STR_h STR_a "\0"
#define STRING_Greek0 STR_G STR_r STR_e STR_e STR_k "\0"
#define STRING_Gujarati0 STR_G STR_u STR_j STR_a STR_r STR_a STR_t STR_i "\0"
#define STRING_Gurmukhi0 STR_G STR_u STR_r STR_m STR_u STR_k STR_h STR_i "\0"
......@@ -262,12 +267,15 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Kayah_Li0 STR_K STR_a STR_y STR_a STR_h STR_UNDERSCORE STR_L STR_i "\0"
#define STRING_Kharoshthi0 STR_K STR_h STR_a STR_r STR_o STR_s STR_h STR_t STR_h STR_i "\0"
#define STRING_Khmer0 STR_K STR_h STR_m STR_e STR_r "\0"
#define STRING_Khojki0 STR_K STR_h STR_o STR_j STR_k STR_i "\0"
#define STRING_Khudawadi0 STR_K STR_h STR_u STR_d STR_a STR_w STR_a STR_d STR_i "\0"
#define STRING_L0 STR_L "\0"
#define STRING_L_AMPERSAND0 STR_L STR_AMPERSAND "\0"
#define STRING_Lao0 STR_L STR_a STR_o "\0"
#define STRING_Latin0 STR_L STR_a STR_t STR_i STR_n "\0"
#define STRING_Lepcha0 STR_L STR_e STR_p STR_c STR_h STR_a "\0"
#define STRING_Limbu0 STR_L STR_i STR_m STR_b STR_u "\0"
#define STRING_Linear_A0 STR_L STR_i STR_n STR_e STR_a STR_r STR_UNDERSCORE STR_A "\0"
#define STRING_Linear_B0 STR_L STR_i STR_n STR_e STR_a STR_r STR_UNDERSCORE STR_B "\0"
#define STRING_Lisu0 STR_L STR_i STR_s STR_u "\0"
#define STRING_Ll0 STR_L STR_l "\0"
......@@ -278,18 +286,24 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Lycian0 STR_L STR_y STR_c STR_i STR_a STR_n "\0"
#define STRING_Lydian0 STR_L STR_y STR_d STR_i STR_a STR_n "\0"
#define STRING_M0 STR_M "\0"
#define STRING_Mahajani0 STR_M STR_a STR_h STR_a STR_j STR_a STR_n STR_i "\0"
#define STRING_Malayalam0 STR_M STR_a STR_l STR_a STR_y STR_a STR_l STR_a STR_m "\0"
#define STRING_Mandaic0 STR_M STR_a STR_n STR_d STR_a STR_i STR_c "\0"
#define STRING_Manichaean0 STR_M STR_a STR_n STR_i STR_c STR_h STR_a STR_e STR_a STR_n "\0"
#define STRING_Mc0 STR_M STR_c "\0"
#define STRING_Me0 STR_M STR_e "\0"
#define STRING_Meetei_Mayek0 STR_M STR_e STR_e STR_t STR_e STR_i STR_UNDERSCORE STR_M STR_a STR_y STR_e STR_k "\0"
#define STRING_Mende_Kikakui0 STR_M STR_e STR_n STR_d STR_e STR_UNDERSCORE STR_K STR_i STR_k STR_a STR_k STR_u STR_i "\0"
#define STRING_Meroitic_Cursive0 STR_M STR_e STR_r STR_o STR_i STR_t STR_i STR_c STR_UNDERSCORE STR_C STR_u STR_r STR_s STR_i STR_v STR_e "\0"
#define STRING_Meroitic_Hieroglyphs0 STR_M STR_e STR_r STR_o STR_i STR_t STR_i STR_c STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0"
#define STRING_Miao0 STR_M STR_i STR_a STR_o "\0"
#define STRING_Mn0 STR_M STR_n "\0"
#define STRING_Modi0 STR_M STR_o STR_d STR_i "\0"
#define STRING_Mongolian0 STR_M STR_o STR_n STR_g STR_o STR_l STR_i STR_a STR_n "\0"
#define STRING_Mro0 STR_M STR_r STR_o "\0"
#define STRING_Myanmar0 STR_M STR_y STR_a STR_n STR_m STR_a STR_r "\0"
#define STRING_N0 STR_N "\0"
#define STRING_Nabataean0 STR_N STR_a STR_b STR_a STR_t STR_a STR_e STR_a STR_n "\0"
#define STRING_Nd0 STR_N STR_d "\0"
#define STRING_New_Tai_Lue0 STR_N STR_e STR_w STR_UNDERSCORE STR_T STR_a STR_i STR_UNDERSCORE STR_L STR_u STR_e "\0"
#define STRING_Nko0 STR_N STR_k STR_o "\0"
......@@ -298,12 +312,17 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Ogham0 STR_O STR_g STR_h STR_a STR_m "\0"
#define STRING_Ol_Chiki0 STR_O STR_l STR_UNDERSCORE STR_C STR_h STR_i STR_k STR_i "\0"
#define STRING_Old_Italic0 STR_O STR_l STR_d STR_UNDERSCORE STR_I STR_t STR_a STR_l STR_i STR_c "\0"
#define STRING_Old_North_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_N STR_o STR_r STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
#define STRING_Old_Permic0 STR_O STR_l STR_d STR_UNDERSCORE STR_P STR_e STR_r STR_m STR_i STR_c "\0"
#define STRING_Old_Persian0 STR_O STR_l STR_d STR_UNDERSCORE STR_P STR_e STR_r STR_s STR_i STR_a STR_n "\0"
#define STRING_Old_South_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_S STR_o STR_u STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
#define STRING_Old_Turkic0 STR_O STR_l STR_d STR_UNDERSCORE STR_T STR_u STR_r STR_k STR_i STR_c "\0"
#define STRING_Oriya0 STR_O STR_r STR_i STR_y STR_a "\0"
#define STRING_Osmanya0 STR_O STR_s STR_m STR_a STR_n STR_y STR_a "\0"
#define STRING_P0 STR_P "\0"
#define STRING_Pahawh_Hmong0 STR_P STR_a STR_h STR_a STR_w STR_h STR_UNDERSCORE STR_H STR_m STR_o STR_n STR_g "\0"
#define STRING_Palmyrene0 STR_P STR_a STR_l STR_m STR_y STR_r STR_e STR_n STR_e "\0"
#define STRING_Pau_Cin_Hau0 STR_P STR_a STR_u STR_UNDERSCORE STR_C STR_i STR_n STR_UNDERSCORE STR_H STR_a STR_u "\0"
#define STRING_Pc0 STR_P STR_c "\0"
#define STRING_Pd0 STR_P STR_d "\0"
#define STRING_Pe0 STR_P STR_e "\0"
......@@ -313,6 +332,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Pi0 STR_P STR_i "\0"
#define STRING_Po0 STR_P STR_o "\0"
#define STRING_Ps0 STR_P STR_s "\0"
#define STRING_Psalter_Pahlavi0 STR_P STR_s STR_a STR_l STR_t STR_e STR_r STR_UNDERSCORE STR_P STR_a STR_h STR_l STR_a STR_v STR_i "\0"
#define STRING_Rejang0 STR_R STR_e STR_j STR_a STR_n STR_g "\0"
#define STRING_Runic0 STR_R STR_u STR_n STR_i STR_c "\0"
#define STRING_S0 STR_S "\0"
......@@ -321,6 +341,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Sc0 STR_S STR_c "\0"
#define STRING_Sharada0 STR_S STR_h STR_a STR_r STR_a STR_d STR_a "\0"
#define STRING_Shavian0 STR_S STR_h STR_a STR_v STR_i STR_a STR_n "\0"
#define STRING_Siddham0 STR_S STR_i STR_d STR_d STR_h STR_a STR_m "\0"
#define STRING_Sinhala0 STR_S STR_i STR_n STR_h STR_a STR_l STR_a "\0"
#define STRING_Sk0 STR_S STR_k "\0"
#define STRING_Sm0 STR_S STR_m "\0"
......@@ -341,8 +362,10 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Thai0 STR_T STR_h STR_a STR_i "\0"
#define STRING_Tibetan0 STR_T STR_i STR_b STR_e STR_t STR_a STR_n "\0"
#define STRING_Tifinagh0 STR_T STR_i STR_f STR_i STR_n STR_a STR_g STR_h "\0"
#define STRING_Tirhuta0 STR_T STR_i STR_r STR_h STR_u STR_t STR_a "\0"
#define STRING_Ugaritic0 STR_U STR_g STR_a STR_r STR_i STR_t STR_i STR_c "\0"
#define STRING_Vai0 STR_V STR_a STR_i "\0"
#define STRING_Warang_Citi0 STR_W STR_a STR_r STR_a STR_n STR_g STR_UNDERSCORE STR_C STR_i STR_t STR_i "\0"
#define STRING_Xan0 STR_X STR_a STR_n "\0"
#define STRING_Xps0 STR_X STR_p STR_s "\0"
#define STRING_Xsp0 STR_X STR_s STR_p "\0"
......@@ -361,6 +384,7 @@ const char PRIV(utt_names)[] =
STRING_Avestan0
STRING_Balinese0
STRING_Bamum0
STRING_Bassa_Vah0
STRING_Batak0
STRING_Bengali0
STRING_Bopomofo0
......@@ -371,6 +395,7 @@ const char PRIV(utt_names)[] =
STRING_C0
STRING_Canadian_Aboriginal0
STRING_Carian0
STRING_Caucasian_Albanian0
STRING_Cc0
STRING_Cf0
STRING_Chakma0
......@@ -386,11 +411,14 @@ const char PRIV(utt_names)[] =
STRING_Cyrillic0
STRING_Deseret0
STRING_Devanagari0
STRING_Duployan0
STRING_Egyptian_Hieroglyphs0
STRING_Elbasan0
STRING_Ethiopic0
STRING_Georgian0
STRING_Glagolitic0
STRING_Gothic0
STRING_Grantha0
STRING_Greek0
STRING_Gujarati0
STRING_Gurmukhi0
......@@ -410,12 +438,15 @@ const char PRIV(utt_names)[] =
STRING_Kayah_Li0
STRING_Kharoshthi0
STRING_Khmer0
STRING_Khojki0
STRING_Khudawadi0
STRING_L0
STRING_L_AMPERSAND0
STRING_Lao0
STRING_Latin0
STRING_Lepcha0
STRING_Limbu0
STRING_Linear_A0
STRING_Linear_B0
STRING_Lisu0
STRING_Ll0
......@@ -426,18 +457,24 @@ const char PRIV(utt_names)[] =
STRING_Lycian0
STRING_Lydian0
STRING_M0
STRING_Mahajani0
STRING_Malayalam0
STRING_Mandaic0
STRING_Manichaean0
STRING_Mc0
STRING_Me0
STRING_Meetei_Mayek0
STRING_Mende_Kikakui0
STRING_Meroitic_Cursive0
STRING_Meroitic_Hieroglyphs0
STRING_Miao0
STRING_Mn0
STRING_Modi0
STRING_Mongolian0
STRING_Mro0
STRING_Myanmar0
STRING_N0
STRING_Nabataean0
STRING_Nd0
STRING_New_Tai_Lue0
STRING_Nko0
......@@ -446,12 +483,17 @@ const char PRIV(utt_names)[] =
STRING_Ogham0
STRING_Ol_Chiki0
STRING_Old_Italic0
STRING_Old_North_Arabian0
STRING_Old_Permic0
STRING_Old_Persian0
STRING_Old_South_Arabian0
STRING_Old_Turkic0
STRING_Oriya0
STRING_Osmanya0
STRING_P0
STRING_Pahawh_Hmong0
STRING_Palmyrene0
STRING_Pau_Cin_Hau0
STRING_Pc0
STRING_Pd0
STRING_Pe0
......@@ -461,6 +503,7 @@ const char PRIV(utt_names)[] =
STRING_Pi0
STRING_Po0
STRING_Ps0
STRING_Psalter_Pahlavi0
STRING_Rejang0
STRING_Runic0
STRING_S0
......@@ -469,6 +512,7 @@ const char PRIV(utt_names)[] =
STRING_Sc0
STRING_Sharada0
STRING_Shavian0
STRING_Siddham0
STRING_Sinhala0
STRING_Sk0
STRING_Sm0
......@@ -489,8 +533,10 @@ const char PRIV(utt_names)[] =
STRING_Thai0
STRING_Tibetan0
STRING_Tifinagh0
STRING_Tirhuta0
STRING_Ugaritic0
STRING_Vai0
STRING_Warang_Citi0
STRING_Xan0
STRING_Xps0
STRING_Xsp0
......@@ -509,146 +555,169 @@ const ucp_type_table PRIV(utt)[] = {
{ 20, PT_SC, ucp_Avestan },
{ 28, PT_SC, ucp_Balinese },
{ 37, PT_SC, ucp_Bamum },
{ 43, PT_SC, ucp_Batak },
{ 49, PT_SC, ucp_Bengali },
{ 57, PT_SC, ucp_Bopomofo },
{ 66, PT_SC, ucp_Brahmi },
{ 73, PT_SC, ucp_Braille },
{ 81, PT_SC, ucp_Buginese },
{ 90, PT_SC, ucp_Buhid },
{ 96, PT_GC, ucp_C },
{ 98, PT_SC, ucp_Canadian_Aboriginal },
{ 118, PT_SC, ucp_Carian },
{ 125, PT_PC, ucp_Cc },
{ 128, PT_PC, ucp_Cf },
{ 131, PT_SC, ucp_Chakma },
{ 138, PT_SC, ucp_Cham },
{ 143, PT_SC, ucp_Cherokee },
{ 152, PT_PC, ucp_Cn },
{ 155, PT_PC, ucp_Co },
{ 158, PT_SC, ucp_Common },
{ 165, PT_SC, ucp_Coptic },
{ 172, PT_PC, ucp_Cs },
{ 175, PT_SC, ucp_Cuneiform },
{ 185, PT_SC, ucp_Cypriot },
{ 193, PT_SC, ucp_Cyrillic },
{ 202, PT_SC, ucp_Deseret },
{ 210, PT_SC, ucp_Devanagari },
{ 221, PT_SC, ucp_Egyptian_Hieroglyphs },
{ 242, PT_SC, ucp_Ethiopic },
{ 251, PT_SC, ucp_Georgian },
{ 260, PT_SC, ucp_Glagolitic },
{ 271, PT_SC, ucp_Gothic },
{ 278, PT_SC, ucp_Greek },
{ 284, PT_SC, ucp_Gujarati },
{ 293, PT_SC, ucp_Gurmukhi },
{ 302, PT_SC, ucp_Han },
{ 306, PT_SC, ucp_Hangul },
{ 313, PT_SC, ucp_Hanunoo },
{ 321, PT_SC, ucp_Hebrew },
{ 328, PT_SC, ucp_Hiragana },
{ 337, PT_SC, ucp_Imperial_Aramaic },
{ 354, PT_SC, ucp_Inherited },
{ 364, PT_SC, ucp_Inscriptional_Pahlavi },
{ 386, PT_SC, ucp_Inscriptional_Parthian },
{ 409, PT_SC, ucp_Javanese },
{ 418, PT_SC, ucp_Kaithi },
{ 425, PT_SC, ucp_Kannada },
{ 433, PT_SC, ucp_Katakana },
{ 442, PT_SC, ucp_Kayah_Li },
{ 451, PT_SC, ucp_Kharoshthi },
{ 462, PT_SC, ucp_Khmer },
{ 468, PT_GC, ucp_L },
{ 470, PT_LAMP, 0 },
{ 473, PT_SC, ucp_Lao },
{ 477, PT_SC, ucp_Latin },
{ 483, PT_SC, ucp_Lepcha },
{ 490, PT_SC, ucp_Limbu },
{ 496, PT_SC, ucp_Linear_B },
{ 505, PT_SC, ucp_Lisu },
{ 510, PT_PC, ucp_Ll },
{ 513, PT_PC, ucp_Lm },
{ 516, PT_PC, ucp_Lo },
{ 519, PT_PC, ucp_Lt },
{ 522, PT_PC, ucp_Lu },
{ 525, PT_SC, ucp_Lycian },
{ 532, PT_SC, ucp_Lydian },
{ 539, PT_GC, ucp_M },
{ 541, PT_SC, ucp_Malayalam },
{ 551, PT_SC, ucp_Mandaic },
{ 559, PT_PC, ucp_Mc },
{ 562, PT_PC, ucp_Me },
{ 565, PT_SC, ucp_Meetei_Mayek },
{ 578, PT_SC, ucp_Meroitic_Cursive },
{ 595, PT_SC, ucp_Meroitic_Hieroglyphs },
{ 616, PT_SC, ucp_Miao },
{ 621, PT_PC, ucp_Mn },
{ 624, PT_SC, ucp_Mongolian },
{ 634, PT_SC, ucp_Myanmar },
{ 642, PT_GC, ucp_N },
{ 644, PT_PC, ucp_Nd },
{ 647, PT_SC, ucp_New_Tai_Lue },
{ 659, PT_SC, ucp_Nko },
{ 663, PT_PC, ucp_Nl },
{ 666, PT_PC, ucp_No },
{ 669, PT_SC, ucp_Ogham },
{ 675, PT_SC, ucp_Ol_Chiki },
{ 684, PT_SC, ucp_Old_Italic },
{ 695, PT_SC, ucp_Old_Persian },
{ 707, PT_SC, ucp_Old_South_Arabian },
{ 725, PT_SC, ucp_Old_Turkic },
{ 736, PT_SC, ucp_Oriya },
{ 742, PT_SC, ucp_Osmanya },
{ 750, PT_GC, ucp_P },
{ 752, PT_PC, ucp_Pc },
{ 755, PT_PC, ucp_Pd },
{ 758, PT_PC, ucp_Pe },
{ 761, PT_PC, ucp_Pf },
{ 764, PT_SC, ucp_Phags_Pa },
{ 773, PT_SC, ucp_Phoenician },
{ 784, PT_PC, ucp_Pi },
{ 787, PT_PC, ucp_Po },
{ 790, PT_PC, ucp_Ps },
{ 793, PT_SC, ucp_Rejang },
{ 800, PT_SC, ucp_Runic },
{ 806, PT_GC, ucp_S },
{ 808, PT_SC, ucp_Samaritan },
{ 818, PT_SC, ucp_Saurashtra },
{ 829, PT_PC, ucp_Sc },
{ 832, PT_SC, ucp_Sharada },
{ 840, PT_SC, ucp_Shavian },
{ 848, PT_SC, ucp_Sinhala },
{ 856, PT_PC, ucp_Sk },
{ 859, PT_PC, ucp_Sm },
{ 862, PT_PC, ucp_So },
{ 865, PT_SC, ucp_Sora_Sompeng },
{ 878, PT_SC, ucp_Sundanese },
{ 888, PT_SC, ucp_Syloti_Nagri },
{ 901, PT_SC, ucp_Syriac },
{ 908, PT_SC, ucp_Tagalog },
{ 916, PT_SC, ucp_Tagbanwa },
{ 925, PT_SC, ucp_Tai_Le },
{ 932, PT_SC, ucp_Tai_Tham },
{ 941, PT_SC, ucp_Tai_Viet },
{ 950, PT_SC, ucp_Takri },
{ 956, PT_SC, ucp_Tamil },
{ 962, PT_SC, ucp_Telugu },
{ 969, PT_SC, ucp_Thaana },
{ 976, PT_SC, ucp_Thai },
{ 981, PT_SC, ucp_Tibetan },
{ 989, PT_SC, ucp_Tifinagh },
{ 998, PT_SC, ucp_Ugaritic },
{ 1007, PT_SC, ucp_Vai },
{ 1011, PT_ALNUM, 0 },
{ 1015, PT_PXSPACE, 0 },
{ 1019, PT_SPACE, 0 },
{ 1023, PT_UCNC, 0 },
{ 1027, PT_WORD, 0 },
{ 1031, PT_SC, ucp_Yi },
{ 1034, PT_GC, ucp_Z },
{ 1036, PT_PC, ucp_Zl },
{ 1039, PT_PC, ucp_Zp },
{ 1042, PT_PC, ucp_Zs }
{ 43, PT_SC, ucp_Bassa_Vah },
{ 53, PT_SC, ucp_Batak },
{ 59, PT_SC, ucp_Bengali },
{ 67, PT_SC, ucp_Bopomofo },
{ 76, PT_SC, ucp_Brahmi },
{ 83, PT_SC, ucp_Braille },
{ 91, PT_SC, ucp_Buginese },
{ 100, PT_SC, ucp_Buhid },
{ 106, PT_GC, ucp_C },
{ 108, PT_SC, ucp_Canadian_Aboriginal },
{ 128, PT_SC, ucp_Carian },
{ 135, PT_SC, ucp_Caucasian_Albanian },
{ 154, PT_PC, ucp_Cc },
{ 157, PT_PC, ucp_Cf },
{ 160, PT_SC, ucp_Chakma },
{ 167, PT_SC, ucp_Cham },
{ 172, PT_SC, ucp_Cherokee },
{ 181, PT_PC, ucp_Cn },
{ 184, PT_PC, ucp_Co },
{ 187, PT_SC, ucp_Common },
{ 194, PT_SC, ucp_Coptic },
{ 201, PT_PC, ucp_Cs },
{ 204, PT_SC, ucp_Cuneiform },
{ 214, PT_SC, ucp_Cypriot },
{ 222, PT_SC, ucp_Cyrillic },
{ 231, PT_SC, ucp_Deseret },
{ 239, PT_SC, ucp_Devanagari },
{ 250, PT_SC, ucp_Duployan },
{ 259, PT_SC, ucp_Egyptian_Hieroglyphs },
{ 280, PT_SC, ucp_Elbasan },
{ 288, PT_SC, ucp_Ethiopic },
{ 297, PT_SC, ucp_Georgian },
{ 306, PT_SC, ucp_Glagolitic },
{ 317, PT_SC, ucp_Gothic },
{ 324, PT_SC, ucp_Grantha },
{ 332, PT_SC, ucp_Greek },
{ 338, PT_SC, ucp_Gujarati },
{ 347, PT_SC, ucp_Gurmukhi },
{ 356, PT_SC, ucp_Han },
{ 360, PT_SC, ucp_Hangul },
{ 367, PT_SC, ucp_Hanunoo },
{ 375, PT_SC, ucp_Hebrew },
{ 382, PT_SC, ucp_Hiragana },
{ 391, PT_SC, ucp_Imperial_Aramaic },
{ 408, PT_SC, ucp_Inherited },
{ 418, PT_SC, ucp_Inscriptional_Pahlavi },
{ 440, PT_SC, ucp_Inscriptional_Parthian },
{ 463, PT_SC, ucp_Javanese },
{ 472, PT_SC, ucp_Kaithi },
{ 479, PT_SC, ucp_Kannada },
{ 487, PT_SC, ucp_Katakana },
{ 496, PT_SC, ucp_Kayah_Li },
{ 505, PT_SC, ucp_Kharoshthi },
{ 516, PT_SC, ucp_Khmer },
{ 522, PT_SC, ucp_Khojki },
{ 529, PT_SC, ucp_Khudawadi },
{ 539, PT_GC, ucp_L },
{ 541, PT_LAMP, 0 },
{ 544, PT_SC, ucp_Lao },
{ 548, PT_SC, ucp_Latin },
{ 554, PT_SC, ucp_Lepcha },
{ 561, PT_SC, ucp_Limbu },
{ 567, PT_SC, ucp_Linear_A },
{ 576, PT_SC, ucp_Linear_B },
{ 585, PT_SC, ucp_Lisu },
{ 590, PT_PC, ucp_Ll },
{ 593, PT_PC, ucp_Lm },
{ 596, PT_PC, ucp_Lo },
{ 599, PT_PC, ucp_Lt },
{ 602, PT_PC, ucp_Lu },
{ 605, PT_SC, ucp_Lycian },
{ 612, PT_SC, ucp_Lydian },
{ 619, PT_GC, ucp_M },
{ 621, PT_SC, ucp_Mahajani },
{ 630, PT_SC, ucp_Malayalam },
{ 640, PT_SC, ucp_Mandaic },
{ 648, PT_SC, ucp_Manichaean },
{ 659, PT_PC, ucp_Mc },
{ 662, PT_PC, ucp_Me },
{ 665, PT_SC, ucp_Meetei_Mayek },
{ 678, PT_SC, ucp_Mende_Kikakui },
{ 692, PT_SC, ucp_Meroitic_Cursive },
{ 709, PT_SC, ucp_Meroitic_Hieroglyphs },
{ 730, PT_SC, ucp_Miao },
{ 735, PT_PC, ucp_Mn },
{ 738, PT_SC, ucp_Modi },
{ 743, PT_SC, ucp_Mongolian },
{ 753, PT_SC, ucp_Mro },
{ 757, PT_SC, ucp_Myanmar },
{ 765, PT_GC, ucp_N },
{ 767, PT_SC, ucp_Nabataean },
{ 777, PT_PC, ucp_Nd },
{ 780, PT_SC, ucp_New_Tai_Lue },
{ 792, PT_SC, ucp_Nko },
{ 796, PT_PC, ucp_Nl },
{ 799, PT_PC, ucp_No },
{ 802, PT_SC, ucp_Ogham },
{ 808, PT_SC, ucp_Ol_Chiki },
{ 817, PT_SC, ucp_Old_Italic },
{ 828, PT_SC, ucp_Old_North_Arabian },
{ 846, PT_SC, ucp_Old_Permic },
{ 857, PT_SC, ucp_Old_Persian },
{ 869, PT_SC, ucp_Old_South_Arabian },
{ 887, PT_SC, ucp_Old_Turkic },
{ 898, PT_SC, ucp_Oriya },
{ 904, PT_SC, ucp_Osmanya },
{ 912, PT_GC, ucp_P },
{ 914, PT_SC, ucp_Pahawh_Hmong },
{ 927, PT_SC, ucp_Palmyrene },
{ 937, PT_SC, ucp_Pau_Cin_Hau },
{ 949, PT_PC, ucp_Pc },
{ 952, PT_PC, ucp_Pd },
{ 955, PT_PC, ucp_Pe },
{ 958, PT_PC, ucp_Pf },
{ 961, PT_SC, ucp_Phags_Pa },
{ 970, PT_SC, ucp_Phoenician },
{ 981, PT_PC, ucp_Pi },
{ 984, PT_PC, ucp_Po },
{ 987, PT_PC, ucp_Ps },
{ 990, PT_SC, ucp_Psalter_Pahlavi },
{ 1006, PT_SC, ucp_Rejang },
{ 1013, PT_SC, ucp_Runic },
{ 1019, PT_GC, ucp_S },
{ 1021, PT_SC, ucp_Samaritan },
{ 1031, PT_SC, ucp_Saurashtra },
{ 1042, PT_PC, ucp_Sc },
{ 1045, PT_SC, ucp_Sharada },
{ 1053, PT_SC, ucp_Shavian },
{ 1061, PT_SC, ucp_Siddham },
{ 1069, PT_SC, ucp_Sinhala },
{ 1077, PT_PC, ucp_Sk },
{ 1080, PT_PC, ucp_Sm },
{ 1083, PT_PC, ucp_So },
{ 1086, PT_SC, ucp_Sora_Sompeng },
{ 1099, PT_SC, ucp_Sundanese },
{ 1109, PT_SC, ucp_Syloti_Nagri },
{ 1122, PT_SC, ucp_Syriac },
{ 1129, PT_SC, ucp_Tagalog },
{ 1137, PT_SC, ucp_Tagbanwa },
{ 1146, PT_SC, ucp_Tai_Le },
{ 1153, PT_SC, ucp_Tai_Tham },
{ 1162, PT_SC, ucp_Tai_Viet },
{ 1171, PT_SC, ucp_Takri },
{ 1177, PT_SC, ucp_Tamil },
{ 1183, PT_SC, ucp_Telugu },
{ 1190, PT_SC, ucp_Thaana },
{ 1197, PT_SC, ucp_Thai },
{ 1202, PT_SC, ucp_Tibetan },
{ 1210, PT_SC, ucp_Tifinagh },
{ 1219, PT_SC, ucp_Tirhuta },
{ 1227, PT_SC, ucp_Ugaritic },
{ 1236, PT_SC, ucp_Vai },
{ 1240, PT_SC, ucp_Warang_Citi },
{ 1252, PT_ALNUM, 0 },
{ 1256, PT_PXSPACE, 0 },
{ 1260, PT_SPACE, 0 },
{ 1264, PT_UCNC, 0 },
{ 1268, PT_WORD, 0 },
{ 1272, PT_SC, ucp_Yi },
{ 1275, PT_GC, ucp_Z },
{ 1277, PT_PC, ucp_Zl },
{ 1280, PT_PC, ucp_Zp },
{ 1283, PT_PC, ucp_Zs }
};
const int PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table);
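Note on the table above: the first field of each `utt` entry is the byte offset of the script or property name inside `utt_names`. Inserting `Bassa_Vah` (9 characters plus the terminating NUL) after `Bamum` therefore moves `Batak` from 43 to 53 and shifts every later offset, which is why nearly the whole table changes. A minimal sketch of how such offsets follow from the name pool is shown below; `name_offsets` is a hypothetical helper, not part of PCRE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill offsets[] with the starting offset of each NUL-terminated name in a
   pool of `poolsize` bytes. Returns the number of names found (at most max). */
static size_t name_offsets(const char *pool, size_t poolsize,
                           size_t *offsets, size_t max)
{
  size_t pos = 0, count = 0;
  assert(pool != NULL && offsets != NULL);
  while (pos < poolsize && count < max)
    {
    offsets[count++] = pos;
    pos += strlen(pool + pos) + 1;   /* skip the name and its NUL */
    }
  return count;
}
```

Run on a pool holding `Avestan`, `Balinese`, `Bamum`, `Bassa_Vah` and `Batak`, this gives offsets 0, 8, 17, 23 and 33, which match the table's 20, 28, 37, 43 and 53 once the 20 bytes of names that precede `Avestan` are added.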
......
This source diff could not be displayed because it is too large.
......@@ -511,7 +511,7 @@ int RE::TryMatch(const StringPiece& text,
return 0;
}
pcre_extra extra = { 0, 0, 0, 0, 0, 0 };
pcre_extra extra = { 0, 0, 0, 0, 0, 0, 0, 0 };
if (options_.match_limit() > 0) {
extra.flags |= PCRE_EXTRA_MATCH_LIMIT;
extra.match_limit = options_.match_limit();
......@@ -660,6 +660,8 @@ int RE::NumberOfCapturingGroups() const {
/***** Parsers for various types *****/
bool Arg::parse_null(const char* str, int n, void* dest) {
(void)str;
(void)n;
// We fail if somebody asked us to store into a non-NULL void* pointer
return (dest == NULL);
}
......
......@@ -455,7 +455,7 @@ exit(rc);
s pattern string to add
after if not NULL points to item to insert after
Returns: new pattern block
Returns: new pattern block or NULL on error
*/
static patstr *
......@@ -471,6 +471,7 @@ if (strlen(s) > MAXPATLEN)
{
fprintf(stderr, "pcregrep: pattern is too long (limit is %d bytes)\n",
MAXPATLEN);
free(p);
return NULL;
}
p->next = NULL;
......@@ -2549,7 +2550,11 @@ while (fgets(buffer, PATBUFSIZE, f) != NULL)
afterwards, as a precaution against any later code trying to use it. */
*patlastptr = add_pattern(buffer, *patlastptr);
if (*patlastptr == NULL) return FALSE;
if (*patlastptr == NULL)
{
if (f != stdin) fclose(f);
return FALSE;
}
if (*patptr == NULL) *patptr = *patlastptr;
/* This loop is needed because compiling a "pattern" when -F is set may add
......@@ -2561,7 +2566,10 @@ while (fgets(buffer, PATBUFSIZE, f) != NULL)
{
if (!compile_pattern(*patlastptr, pcre_options, popts, TRUE, filename,
linenumber))
{
if (f != stdin) fclose(f);
return FALSE;
}
(*patlastptr)->string = NULL; /* Insurance */
if ((*patlastptr)->next == NULL) break;
*patlastptr = (*patlastptr)->next;
......@@ -2962,8 +2970,8 @@ if (locale == NULL)
locale_from = "LC_CTYPE";
}
/* If a locale has been provided, set it, and generate the tables the PCRE
needs. Otherwise, pcretables==NULL, which causes the use of default tables. */
/* If a locale is set, use it to generate the tables the PCRE needs. Otherwise,
pcretables==NULL, which causes the use of default tables. */
if (locale != NULL)
{
......@@ -2971,7 +2979,7 @@ if (locale != NULL)
{
fprintf(stderr, "pcregrep: Failed to set locale %s (obtained from %s)\n",
locale, locale_from);
return 2;
goto EXIT2;
}
pcretables = pcre_maketables();
}
......@@ -2986,7 +2994,7 @@ if (colour_option != NULL && strcmp(colour_option, "never") != 0)
{
fprintf(stderr, "pcregrep: Unknown colour setting \"%s\"\n",
colour_option);
return 2;
goto EXIT2;
}
if (do_colour)
{
......@@ -3026,7 +3034,7 @@ else if (strcmp(newline, "anycrlf") == 0 || strcmp(newline, "ANYCRLF") == 0)
else
{
fprintf(stderr, "pcregrep: Invalid newline specifier \"%s\"\n", newline);
return 2;
goto EXIT2;
}
/* Interpret the text values for -d and -D */
......@@ -3039,7 +3047,7 @@ if (dee_option != NULL)
else
{
fprintf(stderr, "pcregrep: Invalid value \"%s\" for -d\n", dee_option);
return 2;
goto EXIT2;
}
}
......@@ -3050,7 +3058,7 @@ if (DEE_option != NULL)
else
{
fprintf(stderr, "pcregrep: Invalid value \"%s\" for -D\n", DEE_option);
return 2;
goto EXIT2;
}
}
......@@ -3251,7 +3259,8 @@ for (; i < argc; i++)
if (jit_stack != NULL) pcre_jit_stack_free(jit_stack);
#endif
if (main_buffer != NULL) free(main_buffer);
free(main_buffer);
free((void *)pcretables);
free_pattern_chain(patterns);
free_pattern_chain(include_patterns);
......
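The pcregrep.c hunks above replace several early `return 2;` statements with `goto EXIT2;`, and the final hunk frees `main_buffer`, `pcretables` and the pattern chains in one place. That suggests the intent is for every error path to pass through a single cleanup block. A minimal sketch of that idiom follows; all names in it (`run`, `tracked_free`, `freed_count`, the `EXIT` label) are hypothetical and are not taken from pcregrep.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration only: none of these names come from pcregrep. */
static int freed_count = 0;

/* Wrapper around free() that counts non-NULL releases, so the effect of
   the cleanup block can be observed. */
static void tracked_free(void *p)
{
    if (p != NULL) freed_count++;
    free(p);
}

/* Returns 0 on success and 2 on a usage error. Every path, including the
   error paths, leaves through the EXIT label so both buffers are freed. */
static int run(const char *colour)
{
    int rc = 0;
    char *main_buffer = malloc(64);
    char *tables = malloc(64);

    if (main_buffer == NULL || tables == NULL)
    {
        rc = 2;
        goto EXIT;
    }

    if (strcmp(colour, "always") != 0 && strcmp(colour, "auto") != 0)
    {
        fprintf(stderr, "example: Unknown colour setting \"%s\"\n", colour);
        rc = 2;
        goto EXIT;   /* a plain "return 2;" here would leak both buffers */
    }

EXIT:
    tracked_free(main_buffer);   /* free(NULL) is a no-op, so no guard is needed */
    tracked_free(tables);
    return rc;
}
```

Calling `run("auto")` takes the success path and `run("purple")` takes the error path. Both go through the same cleanup, which is also why the `if (main_buffer != NULL)` guard before `free` could be dropped in the hunk above.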
......@@ -172,7 +172,8 @@ static const int eint[] = {
REG_BADPAT, /* invalid range in character class */
REG_BADPAT, /* group name must start with a non-digit */
/* 85 */
REG_BADPAT /* parentheses too deeply nested (stack check) */
REG_BADPAT, /* parentheses too deeply nested (stack check) */
REG_BADPAT /* missing digits in \x{} or \o{} */
};
/* Table of texts corresponding to POSIX error codes */
......
......@@ -111,7 +111,7 @@
bababbc
babababc
/^\ca\cA\c[\c{\c:/
/^\ca\cA\c[;\c:/
\x01\x01\e;z
/^[ab\]cde]/
......@@ -4938,6 +4938,12 @@ however, we need the complication for Perl. ---/
/((?(R1)a+|(?1)b))/
aaaabcde
/((?(R)a|(?1)))*/
aaa
/((?(R)a|(?1)))+/
aaa
/a(*:any
name)/K
abc
......@@ -5666,4 +5672,52 @@ AbcdCBefgBhiBqz
/(a\Kb)*/+
ababc
/(?:x|(?:(xx|yy)+|x|x|x|x|x)|a|a|a)bc/
acb
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
/^\w+(?>\s*)(?<=\w)/
test test
/(?P<same>a)(?P<same>b)/gJ
abbaba
/(?P<same>a)(?P<same>b)(?P=same)/gJ
abbaba
/(?P=same)?(?P<same>a)(?P<same>b)/gJ
abbaba
/(?:(?P=same)?(?:(?P<same>a)|(?P<same>b))(?P=same))+/gJ
bbbaaabaabb
/(?:(?P=same)?(?:(?P=same)(?P<same>a)(?P=same)|(?P=same)?(?P<same>b)(?P=same)){2}(?P=same)(?P<same>c)(?P=same)){2}(?P<same>z)?/gJ
bbbaaaccccaaabbbcc
/(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/
acl
bdl
adl
bcl
/\sabc/
\x{0b}abc
/[\Qa]\E]+/
aa]]
/[\Q]a\E]+/
aa]]
/-- End of testinput1 --/
......@@ -132,4 +132,6 @@ is required for these tests. --/
/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/B
/(((a\2)|(a*)\g<-1>))*a?/B
/-- End of testinput11 --/
......@@ -32,4 +32,10 @@
/[[:blank:]]/WBZ
/\x{212a}+/i8SI
KKkk\x{212a}
/s+/i8SI
SSss\x{17f}
/-- End of testinput16 --/
......@@ -19,4 +19,10 @@
/[[:blank:]]/WBZ
/\x{212a}+/i8SI
KKkk\x{212a}
/s+/i8SI
SSss\x{17f}
/-- End of testinput19 --/
......@@ -4035,6 +4035,8 @@ backtracking verbs. --/
/(?(R&6yh)abc)/
/(((a\2)|(a*)\g<-1>))*a?/BZ
/-- Test the ugly "start or end of word" compatibility syntax --/
/[[:<:]]red[[:>:]]/BZ
......@@ -4062,4 +4064,18 @@ backtracking verbs. --/
/(((((a)))))/Q
/^\w+(?>\s*)(?<=\w)/BZ
/\othing/
/\o{}/
/\o{whatever}/
/\xthing/
/\x{}/
/\x{whatever}/
/-- End of testinput2 --/
......@@ -421,8 +421,8 @@
/^[\p{Arabic}]/8
\x{06e9}
\x{060b}
\x{061c}
** Failers
\x{061c}
X\x{06e9}
/^[\P{Yi}]/8
......@@ -1493,4 +1493,7 @@
/[q-u]+/8iW
Ss\x{17f}
/^s?c/mi8
scat
/-- End of testinput6 --/
......@@ -835,4 +835,7 @@ of case for anything other than the ASCII letters. --/
/[Q-U]+/8iWBZ
/^s?c/mi8I
scat
/-- End of testinput7 --/
......@@ -4831,4 +4831,10 @@
/[ab]{2,}?/
aaaa
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
/-- End of testinput8 --/
......@@ -223,7 +223,7 @@ No match
babababc
No match
/^\ca\cA\c[\c{\c:/
/^\ca\cA\c[;\c:/
\x01\x01\e;z
0: \x01\x01\x1b;z
......@@ -8235,6 +8235,16 @@ MK: M
0: aaaab
1: aaaab
/((?(R)a|(?1)))*/
aaa
0: aaa
1: a
/((?(R)a|(?1)))+/
aaa
0: aaa
1: a
/a(*:any
name)/K
abc
......@@ -9313,4 +9323,92 @@ No match
0+ c
1: ab
/(?:x|(?:(xx|yy)+|x|x|x|x|x)|a|a|a)bc/
acb
No match
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
1: AFTER
2:
/^\w+(?>\s*)(?<=\w)/
test test
0: tes
/(?P<same>a)(?P<same>b)/gJ
abbaba
0: ab
1: a
2: b
0: ab
1: a
2: b
/(?P<same>a)(?P<same>b)(?P=same)/gJ
abbaba
0: aba
1: a
2: b
/(?P=same)?(?P<same>a)(?P<same>b)/gJ
abbaba
0: ab
1: a
2: b
0: ab
1: a
2: b
/(?:(?P=same)?(?:(?P<same>a)|(?P<same>b))(?P=same))+/gJ
bbbaaabaabb
0: bbbaaaba
1: a
2: b
0: bb
1: <unset>
2: b
/(?:(?P=same)?(?:(?P=same)(?P<same>a)(?P=same)|(?P=same)?(?P<same>b)(?P=same)){2}(?P=same)(?P<same>c)(?P=same)){2}(?P<same>z)?/gJ
bbbaaaccccaaabbbcc
No match
/(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/
acl
0: acl
1: a
bdl
0: bdl
1: <unset>
2: b
adl
0: dl
bcl
0: l
/\sabc/
\x{0b}abc
0: \x0babc
/[\Qa]\E]+/
aa]]
0: aa]]
/[\Q]a\E]+/
aa]]
0: aa]]
/-- End of testinput1 --/
......@@ -709,4 +709,28 @@ Memory allocation (code space): 14
62 End
------------------------------------------------------------------
/(((a\2)|(a*)\g<-1>))*a?/B
------------------------------------------------------------------
0 39 Bra
2 Brazero
3 32 SCBra 1
6 27 Once
8 12 CBra 2
11 7 CBra 3
14 a
16 \2
18 7 Ket
20 11 Alt
22 5 CBra 4
25 a*
27 5 Ket
29 22 Recurse
31 23 Ket
33 27 Ket
35 32 KetRmax
37 a?+
39 39 Ket
41 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -709,4 +709,28 @@ Memory allocation (code space): 28
62 End
------------------------------------------------------------------
/(((a\2)|(a*)\g<-1>))*a?/B
------------------------------------------------------------------
0 39 Bra
2 Brazero
3 32 SCBra 1
6 27 Once
8 12 CBra 2
11 7 CBra 3
14 a
16 \2
18 7 Ket
20 11 Alt
22 5 CBra 4
25 a*
27 5 Ket
29 22 Recurse
31 23 Ket
33 27 Ket
35 32 KetRmax
37 a?+
39 39 Ket
41 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -709,4 +709,28 @@ Memory allocation (code space): 10
76 End
------------------------------------------------------------------
/(((a\2)|(a*)\g<-1>))*a?/B
------------------------------------------------------------------
0 57 Bra
3 Brazero
4 48 SCBra 1
9 40 Once
12 18 CBra 2
17 10 CBra 3
22 a
24 \2
27 10 Ket
30 16 Alt
33 7 CBra 4
38 a*
40 7 Ket
43 33 Recurse
46 34 Ket
49 40 Ket
52 48 KetRmax
55 a?+
57 57 Ket
60 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -871,7 +871,7 @@ Options: utf
No first char
Need char = 'x'
Subject length lower bound = 5
Starting chars: \x09 \x0a \x0c \x0d \x20 \xc2
Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \xc2
AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ
......@@ -883,15 +883,15 @@ Options: utf
No first char
Need char = ' '
Subject length lower bound = 3
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3
\xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2
\xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1
\xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0
\xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
\x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4
\xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3
\xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2
\xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1
\xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
\x{a2} \x{84}
0: \x{a2} \x{84}
A Z
......
......@@ -118,4 +118,24 @@ Starting chars: \x0a \x0b \x0c \x0d \x85
End
------------------------------------------------------------------
/\x{212a}+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: K k \xe2
KKkk\x{212a}
0: KKkk\x{212a}
/s+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: S s \xc5
SSss\x{17f}
0: SSss\x{17f}
/-- End of testinput16 --/
......@@ -752,7 +752,7 @@ Options: utf
No first char
Need char = 'x'
Subject length lower bound = 5
Starting chars: \x09 \x0a \x0c \x0d \x20 \x85 \xa0
Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ
......@@ -764,20 +764,20 @@ Options: utf
No first char
Need char = ' '
Subject length lower bound = 3
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83
\x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93
\x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3
\xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2
\xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1
\xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0
\xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf
\xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee
\xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd
\xfe \xff
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
\x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84
\x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94
\x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \xa4
\xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3
\xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2
\xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1
\xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0
\xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef
\xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe
\xff
\x{a2} \x{84}
0: \x{a2} \x{84}
A Z
......
......@@ -749,7 +749,7 @@ Options: utf
No first char
Need char = 'x'
Subject length lower bound = 5
Starting chars: \x09 \x0a \x0c \x0d \x20 \x85 \xa0
Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ
......@@ -761,20 +761,20 @@ Options: utf
No first char
Need char = ' '
Subject length lower bound = 3
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83
\x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93
\x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3
\xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2
\xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1
\xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0
\xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf
\xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee
\xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd
\xfe \xff
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
\x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84
\x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94
\x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \xa4
\xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3
\xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2
\xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1
\xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0
\xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef
\xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe
\xff
\x{a2} \x{84}
0: \x{a2} \x{84}
A Z
......
......@@ -85,4 +85,24 @@ No starting char list
End
------------------------------------------------------------------
/\x{212a}+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: K k \xff
KKkk\x{212a}
0: KKkk\x{212a}
/s+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: S s \xff
SSss\x{17f}
0: SSss\x{17f}
/-- End of testinput19 --/
......@@ -5821,13 +5821,13 @@ No match
No match
/a{11111111111111111111}/I
Failed: number too big in {} quantifier at offset 22
Failed: number too big in {} quantifier at offset 8
/(){64294967295}/I
Failed: number too big in {} quantifier at offset 14
Failed: number too big in {} quantifier at offset 9
/(){2,4294967295}/I
Failed: number too big in {} quantifier at offset 15
Failed: number too big in {} quantifier at offset 11
"(?i:a)(?i:b)(?i:c)(?i:d)(?i:e)(?i:f)(?i:g)(?i:h)(?i:i)(?i:j)(k)(?i:l)A\1B"I
Capturing subpattern count = 1
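The three changed offsets in the `{}` quantifier failures above (8, 9 and 11) each point just past the sixth digit of the number. That is consistent with ChangeLog item 2: the limit of 65535 is now tested as the digits are read, instead of after the whole number has been accumulated. The sketch below shows the general technique. `read_repeat_count` and `MAX_REPEAT` are hypothetical names, and this is not PCRE's actual code.

```c
#include <stddef.h>

#define MAX_REPEAT 65535   /* limit quoted in the ChangeLog */

/* Hypothetical helper. Reads decimal digits starting at s[*pos] and checks
   the limit after every digit, so the accumulator never grows beyond
   65535 * 10 + 9 and cannot overflow an int. Returns 0 on success with the
   value in *out, or -1 if the number is too big. In both cases *pos is left
   just past the last digit consumed. */
static int read_repeat_count(const char *s, size_t *pos, int *out)
{
    int n = 0;
    while (s[*pos] >= '0' && s[*pos] <= '9')
    {
        n = n * 10 + (s[*pos] - '0');
        (*pos)++;
        if (n > MAX_REPEAT) return -1;   /* stop at the first offending digit */
    }
    *out = n;
    return 0;
}
```

For the pattern text `a{11111111111111111111}`, starting at position 2, this sketch stops with the position at 8, which matches the new offset reported in the test output.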
......@@ -14093,6 +14093,30 @@ Failed: malformed number or name after (?( at offset 4
/(?(R&6yh)abc)/
Failed: group name must start with a non-digit at offset 5
/(((a\2)|(a*)\g<-1>))*a?/BZ
------------------------------------------------------------------
Bra
Brazero
SCBra 1
Once
CBra 2
CBra 3
a
\2
Ket
Alt
CBra 4
a*
Ket
Recurse
Ket
Ket
KetRmax
a?+
Ket
End
------------------------------------------------------------------
/-- Test the ugly "start or end of word" compatibility syntax --/
/[[:<:]]red[[:>:]]/BZ
......@@ -14149,4 +14173,37 @@ Failed: parentheses are too deeply nested (stack check) at offset 0
/(((((a)))))/Q
** Missing 0 or 1 after /Q
/^\w+(?>\s*)(?<=\w)/BZ
------------------------------------------------------------------
Bra
^
\w+
Once_NC
\s*+
Ket
AssertB
Reverse
\w
Ket
Ket
End
------------------------------------------------------------------
/\othing/
Failed: missing opening brace after \o at offset 1
/\o{}/
Failed: digits missing in \x{} or \o{} at offset 1
/\o{whatever}/
Failed: non-octal character in \o{} (closing brace missing?) at offset 3
/\xthing/
/\x{}/
Failed: digits missing in \x{} or \o{} at offset 3
/\x{whatever}/
Failed: non-hex character in \x{} (closing brace missing?) at offset 3
/-- End of testinput2 --/
......@@ -719,9 +719,9 @@ No match
0: \x{6e9}
\x{060b}
0: \x{60b}
\x{061c}
0: \x{61c}
** Failers
No match
\x{061c}
No match
X\x{06e9}
No match
......@@ -2457,4 +2457,8 @@ No match
Ss\x{17f}
0: Ss\x{17f}
/^s?c/mi8
scat
0: sc
/-- End of testinput6 --/
......@@ -2287,4 +2287,12 @@ No match
End
------------------------------------------------------------------
/^s?c/mi8I
Capturing subpattern count = 0
Options: caseless multiline utf
First char at start or follows newline
Need char = 'c' (caseless)
scat
0: sc
/-- End of testinput7 --/
......@@ -7777,4 +7777,12 @@ Matched, but offsets vector is too small to show all matches
1: aaa
2: aa
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
/-- End of testinput8 --/
......@@ -192,7 +192,31 @@ enum {
ucp_Miao,
ucp_Sharada,
ucp_Sora_Sompeng,
ucp_Takri
ucp_Takri,
/* New for Unicode 7.0.0: */
ucp_Bassa_Vah,
ucp_Caucasian_Albanian,
ucp_Duployan,
ucp_Elbasan,
ucp_Grantha,
ucp_Khojki,
ucp_Khudawadi,
ucp_Linear_A,
ucp_Mahajani,
ucp_Manichaean,
ucp_Mende_Kikakui,
ucp_Modi,
ucp_Mro,
ucp_Nabataean,
ucp_Old_North_Arabian,
ucp_Old_Permic,
ucp_Pahawh_Hmong,
ucp_Palmyrene,
ucp_Psalter_Pahlavi,
ucp_Pau_Cin_Hau,
ucp_Siddham,
ucp_Tirhuta,
ucp_Warang_Citi
};
#endif
......