Commit 095b7b92 authored by Sergei Golubchik

Merge branch 'merge/merge-pcre' into 10.0

parents 359ae59a e7591a1b
ChangeLog for PCRE
------------------
Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
development is happening in the PCRE2 10.xx series.
Version 8.38 23-November-2015
-----------------------------
1. If a group that contained a recursive back reference also contained a
forward reference subroutine call followed by a non-forward-reference
subroutine call, for example /.((?2)(?R)\1)()/, pcre_compile() failed to
compile correct code, leading to undefined behaviour or an internally
detected error. This bug was discovered by the LLVM fuzzer.
2. Quantification of certain items (e.g. atomic back references) could cause
incorrect code to be compiled when recursive forward references were
involved. For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/.
This bug was discovered by the LLVM fuzzer.
3. A repeated conditional group whose condition was a reference by name caused
a buffer overflow if there was more than one group with the given name.
This bug was discovered by the LLVM fuzzer.
4. A recursive back reference by name within a group that had the same name as
another group caused a buffer overflow. For example:
/(?J)(?'d'(?'d'\g{d}))/. This bug was discovered by the LLVM fuzzer.
5. A forward reference by name to a group whose number is the same as the
current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused
a buffer overflow at compile time. This bug was discovered by the LLVM
fuzzer.
6. A lookbehind assertion within a set of mutually recursive subpatterns could
provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.
7. Another buffer overflow bug involved duplicate named groups with a
reference between their definition, with a group that reset capture
numbers, for example: /(?J:(?|(?'R')(\k'R')|((?'R'))))/. This has been
fixed by always allowing for more memory, even if not needed. (A proper fix
is implemented in PCRE2, but it involves more refactoring.)
8. There was no check for integer overflow in subroutine calls such as (?123).
9. The table entry for \l in EBCDIC environments was incorrect, leading to its
being treated as a literal 'l' instead of causing an error.
10. There was a buffer overflow if pcre_exec() was called with an ovector of
size 1. This bug was found by american fuzzy lop.
11. If a non-capturing group containing a conditional group that could match
an empty string was repeated, it was not identified as matching an empty
string itself. For example: /^(?:(?(1)x|)+)+$()/.
12. In an EBCDIC environment, pcretest was mishandling the escape sequences
\a and \e in test subject lines.
13. In an EBCDIC environment, \a in a pattern was converted to the ASCII
instead of the EBCDIC value.
14. The handling of \c in an EBCDIC environment has been revised so that it is
now compatible with the specification in Perl's perlebcdic page.
15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
ASCII/Unicode. This has now been added to the list of characters that are
recognized as white space in EBCDIC.
16. When PCRE was compiled without UCP support, the use of \p and \P gave an
error (correctly) when used outside a class, but did not give an error
within a class.
17. \h within a class was incorrectly compiled in EBCDIC environments.
18. A pattern with an unmatched closing parenthesis that contained a backward
assertion which itself contained a forward reference caused buffer
overflow. An example pattern is: /(?=di(?<=(?1))|(?=(.))))/.
19. JIT should return with error when the compiled pattern requires more stack
space than the maximum.
20. A possessively repeated conditional group that could match an empty string,
for example, /(?(R))*+/, was incorrectly compiled.
21. Fix infinite recursion in the JIT compiler when certain patterns such as
/(?:|a|){100}x/ are analysed.
22. Some patterns with character classes involving [: and \\ were incorrectly
compiled and could cause reading from uninitialized memory or an incorrect
error diagnosis.
23. Pathological patterns containing many nested occurrences of [: caused
pcre_compile() to run for a very long time.
24. A conditional group with only one branch has an implicit empty alternative
branch and must therefore be treated as potentially matching an empty
string.
25. If (?R was followed by - or +, incorrect behaviour occurred instead of a
diagnostic.
26. Arrange to give up on finding the minimum matching length for overly
complex patterns.
27. Similar to (4) above: in a pattern with duplicated named groups and an
occurrence of (?| it is possible for an apparently non-recursive back
reference to become recursive if a later named group with the relevant
number is encountered. This could lead to a buffer overflow. Wen Guanxing
from Venustech ADLAB discovered this bug.
28. If pcregrep was given the -q option with -c or -l, or when handling a
binary file, it incorrectly wrote output to stdout.
29. The JIT compiler did not restore the control verb head in case of *THEN
control verbs. This issue was found by Karl Skomski with a custom LLVM
fuzzer.
30. Error messages for syntax errors following \g and \k were giving inaccurate
offsets in the pattern.
31. Added a check for integer overflow in conditions (?(<digits>) and
(?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
fuzzer.
32. Handling recursive references such as (?2) when the reference is to a group
later in the pattern uses code that is very hacked about and error-prone.
It has been re-written for PCRE2. Here in PCRE1, a check has been added to
give an internal error if it is obvious that compiling has gone wrong.
33. The JIT compiler should not check repeats after a {0,1} repeat byte code.
This issue was found by Karl Skomski with a custom LLVM fuzzer.
34. The JIT compiler should restore the control chain for empty possessive
repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.
35. Match limit check added to JIT recursion. This issue was found by Karl
Skomski with a custom LLVM fuzzer.
36. Yet another case similar to 27 above has been circumvented by an
unconditional allocation of extra memory. This issue is fixed "properly" in
PCRE2 by refactoring the way references are handled. Wen Guanxing
from Venustech ADLAB discovered this bug.
37. Fix two assertion fails in JIT. These issues were found by Karl Skomski
with a custom LLVM fuzzer.
38. Fixed a corner case of range optimization in JIT.
39. An incorrect error "overran compiling workspace" was given if there were
exactly enough group forward references such that the last one extended
into the workspace safety margin. The next one would have expanded the
workspace. The test for overflow was not including the safety margin.
40. A match limit issue is fixed in JIT which was found by Karl Skomski
with a custom LLVM fuzzer.
41. Remove the use of /dev/null in testdata/testinput2, because it doesn't
work under Windows. (Why has it taken so long for anyone to notice?)
42. In a character class such as [\W\p{Any}] where both a negative-type escape
("not a word character") and a property escape were present, the property
escape was being ignored.
43. Fix crash caused by very long (*MARK) or (*THEN) names.
44. A sequence such as [[:punct:]b], that is, a POSIX character class followed
by a single ASCII character in a class item, was incorrectly compiled in
UCP mode. The POSIX class got lost, but only if the single character
followed it.
45. [:punct:] in UCP mode was matching some characters in the range 128-255
that should not have been matched.
46. If [:^ascii:] or [:^xdigit:] or [:^cntrl:] are present in a non-negated
class, all characters with code points greater than 255 are in the class.
When a Unicode property was also in the class (if PCRE_UCP is set, escapes
such as \w are turned into Unicode properties), wide characters were not
correctly handled, and could fail to match.
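Items 8 and 31 above both add a check for integer overflow while a run of decimal digits is accumulated. The following is a minimal sketch of that kind of check, not PCRE's internal code; the name parse_group_number and its return convention are hypothetical and chosen only for illustration.

```c
#include <limits.h>

/* Hypothetical helper, not part of PCRE: parse a run of decimal digits at *pp
   into *out. Returns 0 on success, or -1 if there are no digits or the value
   would exceed INT_MAX. On success *pp is advanced past the digits. */
static int parse_group_number(const char **pp, int *out)
{
    const char *p = *pp;
    int n = 0;

    if (*p < '0' || *p > '9') return -1;      /* no digits at all */
    while (*p >= '0' && *p <= '9') {
        int digit = *p - '0';
        /* Test before multiplying so the signed arithmetic never overflows. */
        if (n > (INT_MAX - digit) / 10) return -1;
        n = n * 10 + digit;
        p++;
    }
    *pp = p;
    *out = n;
    return 0;
}
```

Called on the text after "(?" in a pattern such as (?123), a helper like this would yield 123 and leave the pointer on the closing parenthesis; an absurdly long digit string is rejected instead of wrapping around.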
Version 8.37 28-April-2015
--------------------------
......
News about PCRE releases
------------------------
Release 8.38 23-November-2015
-----------------------------
This is a bug-fix release. Note that this library (now called PCRE1) is now being
maintained for bug fixes only. New projects are advised to use the new PCRE2
libraries.
Release 8.37 28-April-2015
--------------------------
......
......@@ -764,9 +764,9 @@ required. For details, please see this web site:
http://www.zaconsultants.net
There is also a mirror here:
http://www.vsoft-software.com/downloads.html
You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
executable, is in EBCDIC and native z/OS file formats and this is the
recommended download site.
==========================
-Last Updated: 10 February 2015
+Last Updated: 25 June 2015
......@@ -512,6 +512,14 @@ echo "aaaaa" >>testtemp1grep
(cd $srcdir; $valgrind $pcregrep --line-offsets '(?<=\Ka)' $builddir/testtemp1grep) >>testtrygrep 2>&1
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 108 ------------------------------" >>testtrygrep
(cd $srcdir; $valgrind $pcregrep -lq PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 109 -----------------------------" >>testtrygrep
(cd $srcdir; $valgrind $pcregrep -cq lazy ./testdata/grepinput*) >>testtrygrep
echo "RC=$?" >>testtrygrep
# Now compare the results.
$cf $srcdir/testdata/grepoutput testtrygrep
......
......@@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre_major, [8])
-m4_define(pcre_minor, [37])
+m4_define(pcre_minor, [38])
m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2015-04-28])
+m4_define(pcre_date, [2015-11-23])
# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:5:2])
-m4_define(libpcre16_version, [2:5:2])
-m4_define(libpcre32_version, [0:5:0])
+m4_define(libpcre_version, [3:6:2])
+m4_define(libpcre16_version, [2:6:2])
+m4_define(libpcre32_version, [0:6:0])
m4_define(libpcreposix_version, [0:3:0])
m4_define(libpcrecpp_version, [0:1:0])
......
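The libtool triples above are current:revision:age. As a worked illustration, on ELF/Linux systems libtool derives the shared-library file suffix as (current - age).age.revision; other platforms map the triple differently. The helper name libtool_linux_suffix is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: format the ELF/Linux-style version suffix that libtool
   derives from current:revision:age, i.e. (current - age).age.revision.
   Returns the snprintf() result. */
static int libtool_linux_suffix(int current, int revision, int age,
                                char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%d.%d.%d", current - age, age, revision);
}
```

With the values from this hunk, 3:6:2 gives the suffix 1.2.6, 2:6:2 gives 0.2.6, and 0:6:0 gives 0.0.6. Only the revision field changes in this commit, which is consistent with a release that fixes bugs without altering the library interface.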
......@@ -764,9 +764,9 @@ required. For details, please see this web site:
http://www.zaconsultants.net
There is also a mirror here:
http://www.vsoft-software.com/downloads.html
You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
executable, is in EBCDIC and native z/OS file formats and this is the
recommended download site.
==========================
-Last Updated: 10 February 2015
+Last Updated: 25 June 2015
......@@ -329,7 +329,8 @@ A second use of backslash provides a way of encoding non-printing characters
in patterns in a visible manner. There is no restriction on the appearance of
non-printing characters, apart from the binary zero that terminates a pattern,
but when a pattern is being prepared by text editing, it is often easier to use
one of the following escape sequences than the binary character it represents:
one of the following escape sequences than the binary character it represents.
In an ASCII or Unicode environment, these escapes are as follows:
<pre>
\a alarm, that is, the BEL character (hex 07)
\cx "control-x", where x is any ASCII character
......@@ -353,19 +354,33 @@ data item (byte or 16-bit value) following \c has a value greater than 127, a
compile-time error occurs. This locks out non-ASCII characters in all modes.
</P>
<P>
The \c facility was designed for use with ASCII characters, but with the
extension to Unicode it is even less useful than it once was. It is, however,
recognized when PCRE is compiled in EBCDIC mode, where data items are always
bytes. In this mode, all values are valid after \c. If the next character is a
lower case letter, it is converted to upper case. Then the 0xc0 bits of the
byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
characters also generate different values.
When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
generate the appropriate EBCDIC code values. The \c escape is processed
as specified for Perl in the <b>perlebcdic</b> document. The only characters
that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
other character provokes a compile-time error. The sequence \c@ encodes
character code 0; the letters (in either case) encode characters 1-26 (hex 01
to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
\c? becomes either 255 (hex FF) or 95 (hex 5F).
</P>
<P>
Thus, apart from \c?, these escapes generate the same character code values as
they do in an ASCII environment, though the meanings of the values mostly
differ. For example, \cG always generates code value 7, which is BEL in ASCII
but DEL in EBCDIC.
</P>
<P>
The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
because 127 is not a control character in EBCDIC, Perl makes it generate the
APC character. Unfortunately, there are several variants of EBCDIC. In most of
them the APC character has the value 255 (hex FF), but in the one Perl calls
POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
values, PCRE makes \c? generate 95; otherwise it generates 255.
</P>
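The mapping described in the paragraphs above can be sketched as a small lookup. This is an illustration only, not PCRE's implementation; the name control_escape_value and the question_value parameter (the value chosen for the character following a question mark: 127 in ASCII, 255 or 95 in the EBCDIC variants) are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: return the code value produced by \c followed by x,
   or -1 if x is not one of the permitted characters. question_value is the
   result wanted for '?': 127 in ASCII, 255 or 95 in EBCDIC variants. */
static int control_escape_value(char x, int question_value)
{
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char punct[] = "[\\]^_";
    const char *p;

    if (x == '\0') return -1;          /* strchr would match the terminator */
    if (x == '@') return 0;
    if (x == '?') return question_value;
    if ((p = strchr(upper, x)) != NULL) return 1 + (int)(p - upper);
    if ((p = strchr(lower, x)) != NULL) return 1 + (int)(p - lower);
    if ((p = strchr(punct, x)) != NULL) return 27 + (int)(p - punct);
    return -1;                         /* any other character is an error */
}
```

The lookup tables are used instead of arithmetic on character codes because the letters are not contiguous in EBCDIC, so an expression such as x - 'A' + 1 would only be safe in ASCII.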
<P>
After \0 up to two further octal digits are read. If there are fewer than two
digits, just those that are present are used. Thus the sequence \0\x\07
specifies two binary zeros followed by a BEL character (code value 7). Make
digits, just those that are present are used. Thus the sequence \0\x\015
specifies two binary zeros followed by a CR character (code value 13). Make
sure you supply two digits after the initial zero if the pattern character that
follows is itself an octal digit.
</P>
......@@ -3249,9 +3264,9 @@ Cambridge CB2 3QH, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 08 January 2014
+Last updated: 14 June 2015
<br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
......
-.TH PCREPATTERN 3 "08 January 2014" "PCRE 8.35"
+.TH PCREPATTERN 3 "14 June 2015" "PCRE 8.38"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION DETAILS"
......@@ -308,7 +308,8 @@ A second use of backslash provides a way of encoding non-printing characters
in patterns in a visible manner. There is no restriction on the appearance of
non-printing characters, apart from the binary zero that terminates a pattern,
but when a pattern is being prepared by text editing, it is often easier to use
one of the following escape sequences than the binary character it represents:
one of the following escape sequences than the binary character it represents.
In an ASCII or Unicode environment, these escapes are as follows:
.sp
\ea alarm, that is, the BEL character (hex 07)
\ecx "control-x", where x is any ASCII character
......@@ -331,18 +332,30 @@ but \ec{ becomes hex 3B ({ is 7B), and \ec; becomes hex 7B (; is 3B). If the
data item (byte or 16-bit value) following \ec has a value greater than 127, a
compile-time error occurs. This locks out non-ASCII characters in all modes.
.P
The \ec facility was designed for use with ASCII characters, but with the
extension to Unicode it is even less useful than it once was. It is, however,
recognized when PCRE is compiled in EBCDIC mode, where data items are always
bytes. In this mode, all values are valid after \ec. If the next character is a
lower case letter, it is converted to upper case. Then the 0xc0 bits of the
byte are inverted. Thus \ecA becomes hex 01, as in ASCII (A is C1), but because
the EBCDIC letters are disjoint, \ecZ becomes hex 29 (Z is E9), and other
characters also generate different values.
When PCRE is compiled in EBCDIC mode, \ea, \ee, \ef, \en, \er, and \et
generate the appropriate EBCDIC code values. The \ec escape is processed
as specified for Perl in the \fBperlebcdic\fP document. The only characters
that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], ^, _, or ?. Any
other character provokes a compile-time error. The sequence \ec@ encodes
character code 0; the letters (in either case) encode characters 1-26 (hex 01
to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
\ec? becomes either 255 (hex FF) or 95 (hex 5F).
.P
Thus, apart from \ec?, these escapes generate the same character code values as
they do in an ASCII environment, though the meanings of the values mostly
differ. For example, \ecG always generates code value 7, which is BEL in ASCII
but DEL in EBCDIC.
.P
The sequence \ec? generates DEL (127, hex 7F) in an ASCII environment, but
because 127 is not a control character in EBCDIC, Perl makes it generate the
APC character. Unfortunately, there are several variants of EBCDIC. In most of
them the APC character has the value 255 (hex FF), but in the one Perl calls
POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
values, PCRE makes \ec? generate 95; otherwise it generates 255.
.P
After \e0 up to two further octal digits are read. If there are fewer than two
digits, just those that are present are used. Thus the sequence \e0\ex\e07
specifies two binary zeros followed by a BEL character (code value 7). Make
digits, just those that are present are used. Thus the sequence \e0\ex\e015
specifies two binary zeros followed by a CR character (code value 13). Make
sure you supply two digits after the initial zero if the pattern character that
follows is itself an octal digit.
.P
......@@ -3283,6 +3296,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 08 January 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 14 June 2015
+Copyright (c) 1997-2015 University of Cambridge.
.fi
......@@ -6685,7 +6685,8 @@ if (md->offset_vector != NULL)
register int *iend = iptr - re->top_bracket;
if (iend < md->offset_vector + 2) iend = md->offset_vector + 2;
while (--iptr >= iend) *iptr = -1;
-md->offset_vector[0] = md->offset_vector[1] = -1;
+if (offsetcount > 0) md->offset_vector[0] = -1;
+if (offsetcount > 1) md->offset_vector[1] = -1;
}
/* Set up the first character to match, if available. The first_char value is
......
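The hunk above stops writing to both of the first two offset slots unconditionally, which is what overflowed when the caller supplied a vector of size 1. A stand-alone sketch of the guarded reset follows; the function name reset_match_slots is hypothetical, and the real code operates on the fields of the match data block.

```c
/* Hypothetical stand-alone version of the guarded reset: mark the overall
   match as unset without writing past a vector that has fewer than two
   slots. */
static void reset_match_slots(int *offsets, int offsetcount)
{
    if (offsetcount > 0) offsets[0] = -1;
    if (offsetcount > 1) offsets[1] = -1;
}
```

With offsetcount equal to 1 only the first slot is touched, and with 0 nothing is written at all.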
......@@ -984,7 +984,7 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
#ifndef EBCDIC
#define HSPACE_LIST \
CHAR_HT, CHAR_SPACE, 0xa0, \
CHAR_HT, CHAR_SPACE, CHAR_NBSP, \
0x1680, 0x180e, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, \
0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x202f, 0x205f, 0x3000, \
NOTACHAR
......@@ -1010,7 +1010,7 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
#define HSPACE_BYTE_CASES \
case CHAR_HT: \
case CHAR_SPACE: \
case 0xa0 /* NBSP */
case CHAR_NBSP
#define HSPACE_CASES \
HSPACE_BYTE_CASES: \
......@@ -1037,11 +1037,12 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
/* ------ EBCDIC environments ------ */
#else
#define HSPACE_LIST CHAR_HT, CHAR_SPACE
#define HSPACE_LIST CHAR_HT, CHAR_SPACE, CHAR_NBSP, NOTACHAR
#define HSPACE_BYTE_CASES \
case CHAR_HT: \
case CHAR_SPACE
case CHAR_SPACE: \
case CHAR_NBSP
#define HSPACE_CASES HSPACE_BYTE_CASES
......@@ -1215,6 +1216,7 @@ same code point. */
#define CHAR_ESC '\047'
#define CHAR_DEL '\007'
#define CHAR_NBSP '\x41'
#define STR_ESC "\047"
#define STR_DEL "\007"
......@@ -1229,6 +1231,7 @@ a positive value. */
#define CHAR_NEL ((unsigned char)'\x85')
#define CHAR_ESC '\033'
#define CHAR_DEL '\177'
#define CHAR_NBSP ((unsigned char)'\xa0')
#define STR_LF "\n"
#define STR_NL STR_LF
......@@ -1606,6 +1609,7 @@ only. */
#define CHAR_VERTICAL_LINE '\174'
#define CHAR_RIGHT_CURLY_BRACKET '\175'
#define CHAR_TILDE '\176'
#define CHAR_NBSP ((unsigned char)'\xa0')
#define STR_HT "\011"
#define STR_VT "\013"
......@@ -1762,6 +1766,10 @@ only. */
/* Escape items that are just an encoding of a particular data value. */
#ifndef ESC_a
#define ESC_a CHAR_BEL
#endif
#ifndef ESC_e
#define ESC_e CHAR_ESC
#endif
......@@ -2446,6 +2454,7 @@ typedef struct compile_data {
BOOL had_pruneorskip; /* (*PRUNE) or (*SKIP) encountered */
BOOL check_lookbehind; /* Lookbehinds need later checking */
BOOL dupnames; /* Duplicate names exist */
BOOL dupgroups; /* Duplicate groups exist: (?| found */
BOOL iscondassert; /* Next assert is a condition */
int nltype; /* Newline type */
int nllen; /* Newline string length */
......
......@@ -1064,6 +1064,7 @@ pcre_uchar *alternative;
pcre_uchar *end = NULL;
int private_data_ptr = *private_data_start;
int space, size, bracketlen;
BOOL repeat_check = TRUE;
while (cc < ccend)
{
......@@ -1071,9 +1072,10 @@ while (cc < ccend)
size = 0;
bracketlen = 0;
if (private_data_ptr > SLJIT_MAX_LOCAL_SIZE)
return;
break;
if (*cc == OP_ONCE || *cc == OP_ONCE_NC || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND)
if (repeat_check && (*cc == OP_ONCE || *cc == OP_ONCE_NC || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND))
{
if (detect_repeat(common, cc))
{
/* These brackets are converted to repeats, so no global
......@@ -1081,6 +1083,8 @@ while (cc < ccend)
if (cc >= end)
end = bracketend(cc);
}
}
repeat_check = TRUE;
switch(*cc)
{
......@@ -1136,6 +1140,13 @@ while (cc < ccend)
bracketlen = 1 + LINK_SIZE + IMM2_SIZE;
break;
case OP_BRAZERO:
case OP_BRAMINZERO:
case OP_BRAPOSZERO:
repeat_check = FALSE;
size = 1;
break;
CASE_ITERATOR_PRIVATE_DATA_1
space = 1;
size = -2;
......@@ -1162,12 +1173,17 @@ while (cc < ccend)
size = 1;
break;
CASE_ITERATOR_TYPE_PRIVATE_DATA_2B
case OP_TYPEUPTO:
if (cc[1 + IMM2_SIZE] != OP_ANYNL && cc[1 + IMM2_SIZE] != OP_EXTUNI)
space = 2;
size = 1 + IMM2_SIZE;
break;
case OP_TYPEMINUPTO:
space = 2;
size = 1 + IMM2_SIZE;
break;
case OP_CLASS:
case OP_NCLASS:
size += 1 + 32 / sizeof(pcre_uchar);
......@@ -1316,6 +1332,13 @@ while (cc < ccend)
cc += 1 + LINK_SIZE + IMM2_SIZE;
break;
case OP_THEN:
stack_restore = TRUE;
if (common->control_head_ptr != 0)
*needs_control_head = TRUE;
cc ++;
break;
default:
stack_restore = TRUE;
/* Fall through. */
......@@ -2220,6 +2243,7 @@ while (current != NULL)
SLJIT_ASSERT_STOP();
break;
}
SLJIT_ASSERT(current > (sljit_sw*)current[-1]);
current = (sljit_sw*)current[-1];
}
return -1;
......@@ -3209,7 +3233,7 @@ bytes[len] = byte;
bytes[0] = len;
}
static int scan_prefix(compiler_common *common, pcre_uchar *cc, pcre_uint32 *chars, pcre_uint8 *bytes, int max_chars)
static int scan_prefix(compiler_common *common, pcre_uchar *cc, pcre_uint32 *chars, pcre_uint8 *bytes, int max_chars, pcre_uint32 *rec_count)
{
/* Recursive function, which scans prefix literals. */
BOOL last, any, caseless;
......@@ -3227,9 +3251,14 @@ pcre_uchar othercase[1];
repeat = 1;
while (TRUE)
{
if (*rec_count == 0)
return 0;
(*rec_count)--;
last = TRUE;
any = FALSE;
caseless = FALSE;
switch (*cc)
{
case OP_CHARI:
......@@ -3291,7 +3320,7 @@ while (TRUE)
#ifdef SUPPORT_UTF
if (common->utf && HAS_EXTRALEN(*cc)) len += GET_EXTRALEN(*cc);
#endif
max_chars = scan_prefix(common, cc + len, chars, bytes, max_chars);
max_chars = scan_prefix(common, cc + len, chars, bytes, max_chars, rec_count);
if (max_chars == 0)
return consumed;
last = FALSE;
......@@ -3314,7 +3343,7 @@ while (TRUE)
alternative = cc + GET(cc, 1);
while (*alternative == OP_ALT)
{
max_chars = scan_prefix(common, alternative + 1 + LINK_SIZE, chars, bytes, max_chars);
max_chars = scan_prefix(common, alternative + 1 + LINK_SIZE, chars, bytes, max_chars, rec_count);
if (max_chars == 0)
return consumed;
alternative += GET(alternative, 1);
......@@ -3556,6 +3585,7 @@ int i, max, from;
int range_right = -1, range_len = 3 - 1;
sljit_ub *update_table = NULL;
BOOL in_range;
pcre_uint32 rec_count;
for (i = 0; i < MAX_N_CHARS; i++)
{
......@@ -3564,7 +3594,8 @@ for (i = 0; i < MAX_N_CHARS; i++)
bytes[i * MAX_N_BYTES] = 0;
}
max = scan_prefix(common, common->start, chars, bytes, MAX_N_CHARS);
rec_count = 10000;
max = scan_prefix(common, common->start, chars, bytes, MAX_N_CHARS, &rec_count);
if (max <= 1)
return FALSE;
......@@ -4311,8 +4342,10 @@ switch(length)
case 4:
if ((ranges[1] - ranges[0]) == (ranges[3] - ranges[2])
&& (ranges[0] | (ranges[2] - ranges[0])) == ranges[2]
&& (ranges[1] & (ranges[2] - ranges[0])) == 0
&& is_powerof2(ranges[2] - ranges[0]))
{
SLJIT_ASSERT((ranges[0] & (ranges[2] - ranges[0])) == 0 && (ranges[2] & ranges[3] & (ranges[2] - ranges[0])) != 0);
OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, ranges[2] - ranges[0]);
if (ranges[2] + 1 != ranges[3])
{
......@@ -4900,9 +4933,10 @@ else if ((cc[-1] & XCL_MAP) != 0)
if (!check_class_ranges(common, (const pcre_uint8 *)cc, FALSE, TRUE, list))
{
#ifdef COMPILE_PCRE8
SLJIT_ASSERT(common->utf);
jump = NULL;
if (common->utf)
#endif
jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255);
jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255);
OP2(SLJIT_AND, TMP2, 0, TMP1, 0, SLJIT_IMM, 0x7);
OP2(SLJIT_LSHR, TMP1, 0, TMP1, 0, SLJIT_IMM, 3);
......@@ -4911,7 +4945,10 @@ else if ((cc[-1] & XCL_MAP) != 0)
OP2(SLJIT_AND | SLJIT_SET_E, SLJIT_UNUSED, 0, TMP1, 0, TMP2, 0);
add_jump(compiler, list, JUMP(SLJIT_NOT_ZERO));
JUMPHERE(jump);
#ifdef COMPILE_PCRE8
if (common->utf)
#endif
JUMPHERE(jump);
}
OP1(SLJIT_MOV, TMP1, 0, TMP3, 0);
......@@ -5219,7 +5256,7 @@ while (*cc != XCL_END)
OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_UNUSED, 0, SLJIT_LESS_EQUAL);
SET_CHAR_OFFSET(0);
OP2(SLJIT_SUB | SLJIT_SET_U, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0xff);
OP2(SLJIT_SUB | SLJIT_SET_U, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x7f);
OP_FLAGS(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_LESS_EQUAL);
SET_TYPE_OFFSET(ucp_Pc);
......@@ -7665,6 +7702,10 @@ while (*cc != OP_KETRPOS)
OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), STR_PTR, 0);
}
/* Even if the match is empty, we need to reset the control head. */
if (needs_control_head)
OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
if (opcode == OP_SBRAPOS || opcode == OP_SCBRAPOS)
add_jump(compiler, &emptymatch, CMP(SLJIT_EQUAL, TMP1, 0, STR_PTR, 0));
......@@ -7692,6 +7733,10 @@ while (*cc != OP_KETRPOS)
OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), (framesize + 1) * sizeof(sljit_sw), STR_PTR, 0);
}
/* Even if the match is empty, we need to reset the control head. */
if (needs_control_head)
OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
if (opcode == OP_SBRAPOS || opcode == OP_SCBRAPOS)
add_jump(compiler, &emptymatch, CMP(SLJIT_EQUAL, TMP1, 0, STR_PTR, 0));
......@@ -7704,9 +7749,6 @@ while (*cc != OP_KETRPOS)
}
}
if (needs_control_head)
OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
JUMPTO(SLJIT_JUMP, loop);
flush_stubs(common);
......@@ -8441,8 +8483,7 @@ while (cc < ccend)
OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(1), STR_PTR, 0);
}
BACKTRACK_AS(braminzero_backtrack)->matchingpath = LABEL();
if (cc[1] > OP_ASSERTBACK_NOT)
count_match(common);
count_match(common);
break;
case OP_ONCE:
......@@ -9624,7 +9665,7 @@ static SLJIT_INLINE void compile_recurse(compiler_common *common)
DEFINE_COMPILER;
pcre_uchar *cc = common->start + common->currententry->start;
pcre_uchar *ccbegin = cc + 1 + LINK_SIZE + (*cc == OP_BRA ? 0 : IMM2_SIZE);
pcre_uchar *ccend = bracketend(cc);
pcre_uchar *ccend = bracketend(cc) - (1 + LINK_SIZE);
BOOL needs_control_head;
int framesize = get_framesize(common, cc, NULL, TRUE, &needs_control_head);
int private_data_size = get_private_data_copy_length(common, ccbegin, ccend, needs_control_head);
......@@ -9648,6 +9689,7 @@ set_jumps(common->currententry->calls, common->currententry->entry);
sljit_emit_fast_enter(compiler, TMP2, 0);
allocate_stack(common, private_data_size + framesize + alternativesize);
count_match(common);
OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(private_data_size + framesize + alternativesize - 1), TMP2, 0);
copy_private_data(common, ccbegin, ccend, TRUE, private_data_size + framesize + alternativesize, framesize + alternativesize, needs_control_head);
if (needs_control_head)
......@@ -9992,6 +10034,7 @@ OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, stack));
OP1(SLJIT_MOV_UI, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, limit_match));
OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(struct sljit_stack, base));
OP1(SLJIT_MOV, STACK_LIMIT, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(struct sljit_stack, limit));
OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 1);
OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LIMIT_MATCH, TMP1, 0);
if (mode == JIT_PARTIAL_SOFT_COMPILE)
......
......@@ -182,6 +182,7 @@ static struct regression_test_case regression_test_cases[] = {
{ CMUAP, 0, "\xf0\x90\x90\x80{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" },
{ CMUAP, 0, "\xf0\x90\x90\xa8{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" },
{ CMUAP, 0, "\xe1\xbd\xb8\xe1\xbf\xb8", "\xe1\xbf\xb8\xe1\xbd\xb8" },
{ MA, 0, "[3-57-9]", "5" },
/* Assertions. */
{ MUA, 0, "\\b[^A]", "A_B#" },
......
......@@ -71,6 +71,7 @@ Arguments:
startcode pointer to start of the whole pattern's code
options the compiling options
recurses chain of recurse_check to catch mutual recursion
countptr pointer to call count (to catch over complexity)
Returns: the minimum length
-1 if \C in UTF-8 mode or (*ACCEPT) was encountered
......@@ -80,7 +81,8 @@ Returns: the minimum length
static int
find_minlength(const REAL_PCRE *re, const pcre_uchar *code,
const pcre_uchar *startcode, int options, recurse_check *recurses)
const pcre_uchar *startcode, int options, recurse_check *recurses,
int *countptr)
{
int length = -1;
/* PCRE_UTF16 has the same value as PCRE_UTF8. */
......@@ -90,6 +92,8 @@ recurse_check this_recurse;
register int branchlength = 0;
register pcre_uchar *cc = (pcre_uchar *)code + 1 + LINK_SIZE;
if ((*countptr)++ > 1000) return -1; /* too complex */
if (*code == OP_CBRA || *code == OP_SCBRA ||
*code == OP_CBRAPOS || *code == OP_SCBRAPOS) cc += IMM2_SIZE;
......@@ -131,7 +135,7 @@ for (;;)
case OP_SBRAPOS:
case OP_ONCE:
case OP_ONCE_NC:
d = find_minlength(re, cc, startcode, options, recurses);
d = find_minlength(re, cc, startcode, options, recurses, countptr);
if (d < 0) return d;
branchlength += d;
do cc += GET(cc, 1); while (*cc == OP_ALT);
......@@ -415,7 +419,8 @@ for (;;)
int dd;
this_recurse.prev = recurses;
this_recurse.group = cs;
dd = find_minlength(re, cs, startcode, options, &this_recurse);
dd = find_minlength(re, cs, startcode, options, &this_recurse,
countptr);
if (dd < d) d = dd;
}
}
......@@ -451,7 +456,8 @@ for (;;)
{
this_recurse.prev = recurses;
this_recurse.group = cs;
d = find_minlength(re, cs, startcode, options, &this_recurse);
d = find_minlength(re, cs, startcode, options, &this_recurse,
countptr);
}
}
}
......@@ -514,7 +520,7 @@ for (;;)
this_recurse.prev = recurses;
this_recurse.group = cs;
branchlength += find_minlength(re, cs, startcode, options,
&this_recurse);
&this_recurse, countptr);
}
}
cc += 1 + LINK_SIZE;
......@@ -1453,6 +1459,7 @@ pcre32_study(const pcre32 *external_re, int options, const char **errorptr)
#endif
{
int min;
int count = 0;
BOOL bits_set = FALSE;
pcre_uint8 start_bits[32];
PUBL(extra) *extra = NULL;
......@@ -1539,7 +1546,7 @@ if ((re->options & PCRE_ANCHORED) == 0 &&
/* Find the minimum length of subject string. */
switch(min = find_minlength(re, code, code, re->options, NULL))
switch(min = find_minlength(re, code, code, re->options, NULL, &count))
{
case -2: *errorptr = "internal error: missing capturing bracket"; return NULL;
case -3: *errorptr = "internal error: opcode not recognized"; return NULL;
......
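The countptr parameter threaded through find_minlength() above gives the whole recursion one shared budget, so an overly complex pattern makes the analysis give up rather than run for a very long time. The same technique is sketched here on a hypothetical binary tree instead of PCRE's compiled pattern; struct node, min_length, and the limit parameter are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical expression node: a leaf contributes `len`; an inner node's
   minimum length is the sum of the minimum lengths of its children. */
struct node {
    int len;
    const struct node *left, *right;
};

/* Returns the minimum length, or -1 once more than `limit` calls have been
   made. *countptr is shared by every level of the recursion, so the budget
   applies to the analysis as a whole, not to each branch separately. */
static int min_length(const struct node *n, int *countptr, int limit)
{
    int l, r;

    if ((*countptr)++ > limit) return -1;     /* too complex: give up */
    if (n == NULL) return 0;
    if (n->left == NULL && n->right == NULL) return n->len;
    if ((l = min_length(n->left, countptr, limit)) < 0) return l;
    if ((r = min_length(n->right, countptr, limit)) < 0) return r;
    return l + r;
}
```

The caller initialises the counter to zero once, as pcre_study() does with its local count variable, and treats a negative result as "no minimum length known".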
......@@ -246,7 +246,7 @@ while ((t = *data++) != XCL_END)
case PT_PXPUNCT:
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_P ||
-(c < 256 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
+(c < 128 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
return !negated;
break;
......
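The PT_PXPUNCT change narrows the symbol test from code points below 256 to code points below 128. The sketch below restates that condition in isolation. `gen_type` and `is_pxpunct()` are hypothetical stand-ins for PCRE's property tables (the caller supplies the general category), so it illustrates the logic only and is not the library's implementation.

```c
/* Hypothetical, simplified general categories; PCRE uses ucp_P, ucp_S, ... */
typedef enum { GEN_P, GEN_S, GEN_OTHER } gen_type;

/* Non-zero when code point c with general category t counts as POSIX
punctuation under the revised rule: any P character, or an S character
that lies in the ASCII range (below 128). */
int is_pxpunct(unsigned int c, gen_type t)
{
return t == GEN_P || (c < 128 && t == GEN_S);
}
```

With this condition, '=' (a symbol below 128) still counts as punctuation, while U+00B4 (a symbol in the range 128-255) no longer does. That is consistent with the `No match` results recorded for `\x{b4}` under `/[[:punct:]]/8W` in the test output.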
......@@ -1692,9 +1692,13 @@ while (ptr < endptr)
if (filenames == FN_NOMATCH_ONLY) return 1;
+/* If all we want is a yes/no answer, stop now. */
+if (quiet) return 0;
/* Just count if just counting is wanted. */
-if (count_only) count++;
+else if (count_only) count++;
/* When handling a binary file and binary-files==binary, the "binary"
variable will be set true (it's false in all other cases). In this
......@@ -1715,10 +1719,6 @@ while (ptr < endptr)
return 0;
}
-/* Likewise, if all we want is a yes/no answer. */
-else if (quiet) return 0;
/* The --only-matching option prints just the substring that matched,
and/or one or more captured portions of it, as long as these strings are
not empty. The --file-offsets and --line-offsets options output offsets for
......@@ -2089,7 +2089,7 @@ if (filenames == FN_NOMATCH_ONLY)
/* Print the match count if wanted */
-if (count_only)
+if (count_only && !quiet)
{
if (count > 0 || !omit_zero_count)
{
......
......@@ -4621,9 +4621,9 @@ while (!done)
else switch ((c = *p++))
{
-case 'a': c = 7; break;
+case 'a': c = CHAR_BEL; break;
case 'b': c = '\b'; break;
-case 'e': c = 27; break;
+case 'e': c = CHAR_ESC; break;
case 'f': c = '\f'; break;
case 'n': c = '\n'; break;
case 'r': c = '\r'; break;
......
......@@ -5730,4 +5730,7 @@ AbcdCBefgBhiBqz
"(?1)(?#?'){8}(a)"
baaaaaaaaac
"(?|(\k'Pm')|(?'Pm'))"
abcd
/-- End of testinput1 --/
......@@ -136,4 +136,6 @@ is required for these tests. --/
/((?+1)(\1))/B
/.((?2)(?R)\1)()/B
/-- End of testinput11 --/
......@@ -340,4 +340,6 @@ not matter. --/
/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/BZ
/(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/
/-- End of testinput14 --/
......@@ -1502,4 +1502,55 @@
/\C\X*QT/8
Ӆ\x0aT
/[\pS#moq]/
=
/[[:punct:]]/8W
\xc2\xb4
\x{b4}
/[[:^ascii:]]/8W
\x{100}
\x{200}
\x{300}
\x{37e}
a
9
g
/[[:^ascii:]\w]/8W
a
9
g
\x{100}
\x{200}
\x{300}
\x{37e}
/[\w[:^ascii:]]/8W
a
9
g
\x{100}
\x{200}
\x{300}
\x{37e}
/[^[:ascii:]\W]/8W
a
9
g
\x{100}
\x{200}
\x{300}
\x{37e}
/[[:^ascii:]a]/8W
a
9
g
\x{100}
\x{200}
\x{37e}
/-- End of testinput6 --/
......@@ -838,4 +838,19 @@ of case for anything other than the ASCII letters. --/
/^s?c/mi8I
scat
/[\W\p{Any}]/BZ
abc
123
/[\W\pL]/BZ
abc
** Failers
123
/a[[:punct:]b]/WBZ
/a[[:punct:]b]/8WBZ
/a[b[:punct:]]/8WBZ
/-- End of testinput7 --/
......@@ -29,13 +29,16 @@ in EBCDIC, but can be specified as escapes. --/
/^A\ˆ/
A B
A\x41B
/-- Test \H --/
/^A\È/
AB
A\x42B
** Fail
A B
A\x41B
/-- Test \R --/
......
......@@ -9429,4 +9429,9 @@ No match
0: aaaaaaaaa
1: a
"(?|(\k'Pm')|(?'Pm'))"
abcd
0:
1:
/-- End of testinput1 --/
......@@ -231,7 +231,7 @@ Memory allocation (code space): 73
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 61
+Memory allocation (code space): 77
------------------------------------------------------------------
0 24 Bra
2 5 CBra 1
......@@ -650,18 +650,18 @@ Memory allocation (code space): 14
/[[:^alpha:][:^cntrl:]]+/8WB
------------------------------------------------------------------
-0 26 Bra
-2 [ -~\x80-\xff\P{L}]++
-26 26 Ket
-28 End
+0 30 Bra
+2 [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
+30 30 Ket
+32 End
------------------------------------------------------------------
/[[:^cntrl:][:^alpha:]]+/8WB
------------------------------------------------------------------
-0 26 Bra
-2 [ -~\x80-\xff\P{L}]++
-26 26 Ket
-28 End
+0 30 Bra
+2 [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
+30 30 Ket
+32 End
------------------------------------------------------------------
/[[:alpha:]]+/8WB
......@@ -748,4 +748,21 @@ Memory allocation (code space): 14
22 End
------------------------------------------------------------------
/.((?2)(?R)\1)()/B
------------------------------------------------------------------
0 23 Bra
2 Any
3 13 Once
5 9 CBra 1
8 18 Recurse
10 0 Recurse
12 \1
14 9 Ket
16 13 Ket
18 3 CBra 2
21 3 Ket
23 23 Ket
25 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -231,7 +231,7 @@ Memory allocation (code space): 155
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 125
+Memory allocation (code space): 157
------------------------------------------------------------------
0 24 Bra
2 5 CBra 1
......@@ -650,18 +650,18 @@ Memory allocation (code space): 28
/[[:^alpha:][:^cntrl:]]+/8WB
------------------------------------------------------------------
-0 18 Bra
-2 [ -~\x80-\xff\P{L}]++
-18 18 Ket
-20 End
+0 21 Bra
+2 [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
+21 21 Ket
+23 End
------------------------------------------------------------------
/[[:^cntrl:][:^alpha:]]+/8WB
------------------------------------------------------------------
-0 18 Bra
-2 [ -~\x80-\xff\P{L}]++
-18 18 Ket
-20 End
+0 21 Bra
+2 [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
+21 21 Ket
+23 End
------------------------------------------------------------------
/[[:alpha:]]+/8WB
......@@ -748,4 +748,21 @@ Memory allocation (code space): 28
22 End
------------------------------------------------------------------
/.((?2)(?R)\1)()/B
------------------------------------------------------------------
0 23 Bra
2 Any
3 13 Once
5 9 CBra 1
8 18 Recurse
10 0 Recurse
12 \1
14 9 Ket
16 13 Ket
18 3 CBra 2
21 3 Ket
23 23 Ket
25 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -231,7 +231,7 @@ Memory allocation (code space): 45
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 38
+Memory allocation (code space): 50
------------------------------------------------------------------
0 30 Bra
3 7 CBra 1
......@@ -650,18 +650,18 @@ Memory allocation (code space): 10
/[[:^alpha:][:^cntrl:]]+/8WB
------------------------------------------------------------------
-0 44 Bra
-3 [ -~\x80-\xff\P{L}]++
-44 44 Ket
-47 End
+0 51 Bra
+3 [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
+51 51 Ket
+54 End
------------------------------------------------------------------
/[[:^cntrl:][:^alpha:]]+/8WB
------------------------------------------------------------------
-0 44 Bra
-3 [ -~\x80-\xff\P{L}]++
-44 44 Ket
-47 End
+0 51 Bra
+3 [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
+51 51 Ket
+54 End
------------------------------------------------------------------
/[[:alpha:]]+/8WB
......@@ -748,4 +748,21 @@ Memory allocation (code space): 10
34 End
------------------------------------------------------------------
/.((?2)(?R)\1)()/B
------------------------------------------------------------------
0 35 Bra
3 Any
4 20 Once
7 14 CBra 1
12 27 Recurse
15 0 Recurse
18 \1
21 14 Ket
24 20 Ket
27 5 CBra 2
32 5 Ket
35 35 Ket
38 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -527,4 +527,6 @@ Failed: character value in \u.... sequence is too large at offset 6
End
------------------------------------------------------------------
/(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/
/-- End of testinput14 --/
......@@ -2469,4 +2469,92 @@ No match
Ӆ\x0aT
No match
/[\pS#moq]/
=
0: =
/[[:punct:]]/8W
\xc2\xb4
No match
\x{b4}
No match
/[[:^ascii:]]/8W
\x{100}
0: \x{100}
\x{200}
0: \x{200}
\x{300}
0: \x{300}
\x{37e}
0: \x{37e}
a
No match
9
No match
g
No match
/[[:^ascii:]\w]/8W
a
0: a
9
0: 9
g
0: g
\x{100}
0: \x{100}
\x{200}
0: \x{200}
\x{300}
0: \x{300}
\x{37e}
0: \x{37e}
/[\w[:^ascii:]]/8W
a
0: a
9
0: 9
g
0: g
\x{100}
0: \x{100}
\x{200}
0: \x{200}
\x{300}
0: \x{300}
\x{37e}
0: \x{37e}
/[^[:ascii:]\W]/8W
a
No match
9
No match
g
No match
\x{100}
0: \x{100}
\x{200}
0: \x{200}
\x{300}
No match
\x{37e}
No match
/[[:^ascii:]a]/8W
a
0: a
9
No match
g
No match
\x{100}
0: \x{100}
\x{200}
0: \x{200}
\x{37e}
0: \x{37e}
/-- End of testinput6 --/
......@@ -949,7 +949,7 @@ No match
/[[:^alpha:][:^cntrl:]]+/8WBZ
------------------------------------------------------------------
Bra
-[ -~\x80-\xff\P{L}]++
+[ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
Ket
End
------------------------------------------------------------------
......@@ -961,7 +961,7 @@ No match
/[[:^cntrl:][:^alpha:]]+/8WBZ
------------------------------------------------------------------
Bra
-[ -~\x80-\xff\P{L}]++
+[ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
Ket
End
------------------------------------------------------------------
......@@ -2295,4 +2295,57 @@ Need char = 'c' (caseless)
scat
0: sc
/[\W\p{Any}]/BZ
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{Any}]
Ket
End
------------------------------------------------------------------
abc
0: a
123
0: 1
/[\W\pL]/BZ
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{L}]
Ket
End
------------------------------------------------------------------
abc
0: a
** Failers
0: *
123
No match
/a[[:punct:]b]/WBZ
------------------------------------------------------------------
Bra
a
[b[:punct:]]
Ket
End
------------------------------------------------------------------
/a[[:punct:]b]/8WBZ
------------------------------------------------------------------
Bra
a
[b[:punct:]]
Ket
End
------------------------------------------------------------------
/a[b[:punct:]]/8WBZ
------------------------------------------------------------------
Bra
a
[b[:punct:]]
Ket
End
------------------------------------------------------------------
/-- End of testinput7 --/
......@@ -41,16 +41,22 @@ No match
/^A\ˆ/
A B
0: A\x20
A\x41B
0: AA
/-- Test \H --/
/^A\È/
AB
0: AB
A\x42B
0: AB
** Fail
No match
A B
No match
A\x41B
No match
/-- Test \R --/
......