Merge branch 'merge-pcre' into 10.0

Parents: 051f8bc8 · dba454ef
Commit f8736063 authored Jul 30, 2017 by Vicențiu Ciorbaru
31 changed files
--- a/pcre/ChangeLog
+++ b/pcre/ChangeLog
@@ -4,6 +4,53 @@ ChangeLog for PCRE
 Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
 development is happening in the PCRE2 10.xx series.

+Version 8.41 05-July-2017
+-------------------------
+
+1.  Fixed a typo in CMakeLists.txt (wrong number of arguments for
+PCRE_STATIC_RUNTIME; affects MSVC only).
+
+2.  Issue 1 for 8.40 below was not correctly fixed. If pcregrep in multiline
+mode with --only-matching matched several lines, it restarted scanning at the
+next line instead of moving on to the end of the matched string, which can be
+several lines after the start.
+
+3.  Fix a missing else in the JIT compiler reported by 'idaifish'.
+
+4.  A (?# style comment is now ignored between a basic quantifier and a
+following '+' or '?' (example: /X+(?#comment)?Y/).
+
+5.  Avoid use of a potentially overflowing buffer in pcregrep (patch by Petr
+Pisar).
+
+6.  Fuzzers have reported issues in pcretest. These are NOT serious (it is,
+after all, just a test program). However, to stop the reports, some easy ones
+are fixed:
+
+    (a) Check for values < 256 when calling isprint() in pcretest.
+    (b) Give an error for too big a number after \O.
+
+7.  In the 32-bit library in non-UTF mode, an attempt to find a Unicode
+property for a character with a code point greater than 0x10ffff (the Unicode
+maximum) caused a crash.
+
+8. The alternative matching function, pcre_dfa_exec(), misbehaved if it
+encountered a character class with a possessive repeat, for example [a-f]{3}+.
+
+9. When pcretest called pcre_copy_substring() in 32-bit mode, it set the buffer
+length incorrectly, which could result in buffer overflow.
+
+10. Remove redundant line of code (accidentally left in ages ago).
+
+11. Applied C++ patch from Irfan Adilovic to guard 'using std::' directives
+with namespace pcrecpp (Bugzilla #2084).
+
+12. Remove a duplication typo in pcre_tables.c.
+
+13. Fix returned offsets from regexec() when REG_STARTEND is used with a
+starting offset greater than zero.
+
+
 Version 8.40 11-January-2017
 ----------------------------


--- a/pcre/NEWS
+++ b/pcre/NEWS
 News about PCRE releases
 ------------------------

+Release 8.41 13-June-2017
+-------------------------
+
+This is a bug-fix release.
+
+
 Release 8.40 11-January-2017
 ----------------------------


--- a/pcre/configure.ac
+++ b/pcre/configure.ac
@@ -9,18 +9,18 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
 dnl be defined as -RC2, for example. For real releases, it should be empty.

 m4_define(pcre_major, [8])
-m4_define(pcre_minor, [40])
+m4_define(pcre_minor, [41])
 m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2017-01-11])
+m4_define(pcre_date, [2017-07-05])

 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.

 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:8:2])
-m4_define(libpcre16_version, [2:8:2])
-m4_define(libpcre32_version, [0:8:0])
-m4_define(libpcreposix_version, [0:4:0])
+m4_define(libpcre_version, [3:9:2])
+m4_define(libpcre16_version, [2:9:2])
+m4_define(libpcre32_version, [0:9:0])
+m4_define(libpcreposix_version, [0:5:0])
 m4_define(libpcrecpp_version, [0:1:0])

 AC_PREREQ(2.57)
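The libtool triples above are `current:revision:age`; this release only bumps `revision`, which under libtool's versioning rules means the code changed but the interface did not. As a minimal sketch, assuming libtool's usual naming scheme on Linux (other platforms use different schemes), the numbers map onto the installed file name as shown below. `so_version`, `libtool_linux_version` and `print_soname` are hypothetical names for illustration only.

```c
#include <stdio.h>

/* Version numbers libtool conventionally appends to a Linux shared library:
   lib<name>.so.<current-age>.<age>.<revision>. This is an assumption about
   the platform; other systems use different naming schemes. */
typedef struct { int major, minor, patch; } so_version;

static so_version libtool_linux_version(int current, int revision, int age)
{
  /* libtool requires 0 <= age <= current */
  so_version v = { current - age, age, revision };
  return v;
}

/* Print the file name libtool would typically install for this triple. */
static void print_soname(const char *lib, int current, int revision, int age)
{
  so_version v = libtool_linux_version(current, revision, age);
  printf("%s.so.%d.%d.%d\n", lib, v.major, v.minor, v.patch);
}
```

For example, `print_soname("libpcre", 3, 9, 2)` prints `libpcre.so.1.2.9`. Because the soname keeps only `current - age` (here `libpcre.so.1`), binaries linked against the previous 3:8:2 library continue to load.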

--- a/pcre/doc/html/pcrejit.html
+++ b/pcre/doc/html/pcrejit.html
@@ -79,9 +79,12 @@ API that is JIT-specific.
 </P>
 <P>
 If your program may sometimes be linked with versions of PCRE that are older
-than 8.20, but you want to use JIT when it is available, you can test
-the values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such
-as PCRE_CONFIG_JIT, for compile-time control of your code.
+than 8.20, but you want to use JIT when it is available, you can test the
+values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such as
+PCRE_CONFIG_JIT, for compile-time control of your code. Also beware that the
+<b>pcre_jit_exec()</b> function was not available at all before 8.32,
+and may not be available at all if PCRE isn't compiled with
+--enable-jit. See the "JIT FAST PATH API" section below for details.
 </P>
 <br><a name="SEC4" href="#TOC1">SIMPLE USE OF JIT</a><br>
 <P>
@@ -119,6 +122,20 @@ when you call <b>pcre_study()</b>:
  PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
  PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
 </pre>
+If using <b>pcre_jit_exec()</b> and supporting a pre-8.32 version of
+PCRE, you can insert:
+<pre>
+   #if PCRE_MAJOR &#62;= 8 && PCRE_MINOR &#62;= 32
+   pcre_jit_exec(...);
+   #else
+   pcre_exec(...);
+   #endif
+</pre>
+but, as described in the "JIT FAST PATH API" section below, this assumes
+that versions 8.32 and later are compiled with --enable-jit, which is not
+guaranteed.
+<br>
+<br>
 The JIT compiler generates different optimized code for each of the three
 modes (normal, soft partial, hard partial). When <b>pcre_exec()</b> is called,
 the appropriate code is run if it is available. Otherwise, the pattern is
@@ -428,6 +445,36 @@ fast path, and if invalid data is passed, the result is undefined.
 Bypassing the sanity checks and the <b>pcre_exec()</b> wrapping can give
 speedups of more than 10%.
 </P>
+<P>
+Note that the <b>pcre_jit_exec()</b> function is not available in versions of
+PCRE before 8.32 (released in November 2012). If you need to support versions
+that old you must either use the slower <b>pcre_exec()</b>, or switch between
+the two codepaths by checking the values of PCRE_MAJOR and PCRE_MINOR.
+</P>
+<P>
+Due to an unfortunate implementation oversight, even in versions 8.32
+and later there will be no <b>pcre_jit_exec()</b> stub function defined
+when PCRE is compiled with --disable-jit, which is the default, and
+there's no way to detect whether PCRE was compiled with --enable-jit
+via a macro.
+</P>
+<P>
+If you need to support versions older than 8.32, or versions that may
+not build with --enable-jit, you must either use the slower
+<b>pcre_exec()</b>, or switch between the two codepaths by checking the
+values of PCRE_MAJOR and PCRE_MINOR.
+</P>
+<P>
+Switching between the two by checking the version assumes that all the
+versions being targeted are built with --enable-jit. To also support
+builds that may use --disable-jit, you can either use <b>pcre_exec()</b>,
+add a compile-time check for JIT via <b>pcre_config()</b> (which
+assumes the runtime environment will be the same), or, as the Git
+project decided to do, simply assume that <b>pcre_jit_exec()</b> is
+present in 8.32 or later unless a compile-time flag is provided. See
+the "grep: un-break building with PCRE &#62;= 8.32 without --enable-jit"
+commit in git.git for an example of that.
+</P>
 <br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcreapi</b>(3)
@@ -443,9 +490,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC14" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 17 March 2013
+Last updated: 05 July 2017
 <br>
-Copyright &copy; 1997-2013 University of Cambridge.
+Copyright &copy; 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
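The pcrejit text above gates `pcre_jit_exec()` with `PCRE_MAJOR >= 8 && PCRE_MINOR >= 32`. A slightly more defensive sketch compares the major number first, so it would not misfire if the major number were ever larger, and adds an opt-out for libraries built with --disable-jit. The fallback version numbers, `PCRE_VERSION_AT_LEAST`, `NO_PCRE_JIT_EXEC` and `USE_PCRE_JIT_EXEC` are hypothetical names, not part of PCRE.

```c
/* In real code, include <pcre.h> before this block. The fallback values
   below are hypothetical and exist only so the sketch compiles on its own. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 8
#define PCRE_MINOR 41
#endif

/* True when the PCRE headers are at least version maj.min. Unlike testing
   PCRE_MAJOR >= 8 && PCRE_MINOR >= 32, this stays correct when the major
   number is greater than maj. Usable both in #if and in C expressions. */
#define PCRE_VERSION_AT_LEAST(maj, min) \
  (PCRE_MAJOR > (maj) || (PCRE_MAJOR == (maj) && PCRE_MINOR >= (min)))

/* NO_PCRE_JIT_EXEC is a hypothetical build flag, in the spirit of the Git
   approach described above, for libraries built with --disable-jit: those
   do not export pcre_jit_exec() even in 8.32 and later. */
#if PCRE_VERSION_AT_LEAST(8, 32) && !defined(NO_PCRE_JIT_EXEC)
#define USE_PCRE_JIT_EXEC 1
#else
#define USE_PCRE_JIT_EXEC 0
#endif
```

Calling code would then wrap `pcre_jit_exec()` in `#if USE_PCRE_JIT_EXEC` and call `pcre_exec()` in the `#else` branch; a build against a library compiled with --disable-jit would pass `-DNO_PCRE_JIT_EXEC`.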

--- a/pcre/doc/html/pcretest.html
+++ b/pcre/doc/html/pcretest.html
@@ -74,6 +74,11 @@ newline as data characters. However, in some Windows environments character 26
 maximum portability, therefore, it is safest to use only ASCII characters in
 <b>pcretest</b> input files.
 </P>
+<P>
+The input is processed using C's string functions, so it must not
+contain binary zeroes, even though in Unix-like environments, <b>fgets()</b>
+treats any bytes other than newline as data characters.
+</P>
 <br><a name="SEC3" href="#TOC1">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
 <P>
 From release 8.30, two separate PCRE libraries can be built. The original one
@@ -1149,9 +1154,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC17" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 09 February 2014
+Last updated: 23 February 2017
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
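To see why binary zeroes are a problem, the following sketch models a buffer as fgets() would leave it after reading the bytes a, b, NUL, c, d, newline: the newline-delimited line is six bytes long, but anything using C string functions sees only two. `raw_line_length` is a hypothetical helper for illustration, not part of pcretest.

```c
#include <string.h>

/* Hypothetical helper: length of the line in buf as the raw bytes define it,
   i.e. up to and including the first '\n' within the first cap bytes.
   Unlike strlen(), memchr() does not stop at an embedded NUL. */
static size_t raw_line_length(const char *buf, size_t cap)
{
  const char *nl = memchr(buf, '\n', cap);
  return nl != NULL ? (size_t)(nl - buf) + 1 : cap;
}
```

With `char line[] = "ab\0cd\n";`, `raw_line_length(line, sizeof line)` is 6 while `strlen(line)` is 2, so the characters after the zero are silently lost.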

--- a/pcre/doc/pcre.txt
+++ b/pcre/doc/pcre.txt
@@ -8365,7 +8365,11 @@ AVAILABILITY OF JIT SUPPORT
       If  your program may sometimes be linked with versions of PCRE that are
       older than 8.20, but you want to use JIT when it is available, you  can
       test the values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT
-       macro such as PCRE_CONFIG_JIT, for compile-time control of your code.
+       macro such as PCRE_CONFIG_JIT, for compile-time control of  your  code.
+       Also  beware that the pcre_jit_exec() function was not available at all
+       before 8.32, and may not be available at all  if  PCRE  isn't  compiled
+       with  --enable-jit.  See  the  "JIT  FAST  PATH  API" section below for
+       details.


 SIMPLE USE OF JIT
@@ -8407,6 +8411,18 @@ SIMPLE USE OF JIT
         PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
         PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE

+       If using pcre_jit_exec() and supporting a pre-8.32 version of PCRE, you
+       can insert:
+
+          #if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
+          pcre_jit_exec(...);
+          #else
+          pcre_exec(...);
+          #endif
+
+       but, as described in the "JIT FAST PATH API" section below, this assumes
+       that versions 8.32 and later are compiled with --enable-jit, which is
+       not guaranteed.
+
       The JIT compiler generates different optimized code  for  each  of  the
       three  modes  (normal, soft partial, hard partial). When pcre_exec() is
       called, the appropriate code is run if it is available. Otherwise,  the
@@ -8696,6 +8712,33 @@ JIT FAST PATH API
       Bypassing  the  sanity  checks  and  the  pcre_exec() wrapping can give
       speedups of more than 10%.

+       Note that the pcre_jit_exec() function is not available in versions  of
+       PCRE  before  8.32  (released in November 2012). If you need to support
+       versions that old you must either use the slower pcre_exec(), or switch
+       between  the  two  codepaths  by  checking the values of PCRE_MAJOR and
+       PCRE_MINOR.
+
+       Due to an unfortunate implementation oversight, even in  versions  8.32
+       and  later  there will be no pcre_jit_exec() stub function defined when
+       PCRE is compiled with --disable-jit, which is the default, and  there's
+       no  way  to  detect  whether  PCRE was compiled with --enable-jit via a
+       macro.
+
+       If you need to support versions older than 8.32, or versions  that  may
+       not   build   with   --enable-jit,  you  must  either  use  the  slower
+       pcre_exec(), or switch between the two codepaths by checking the values
+       of PCRE_MAJOR and PCRE_MINOR.
+
+       Switching between the two by checking the version assumes that all the
+       versions being targeted are built with --enable-jit. To also support
+       builds that may use --disable-jit, you can either use pcre_exec(), add
+       a compile-time check for JIT via pcre_config() (which assumes the
+       runtime environment will be the same), or, as the Git project decided
+       to do, simply assume that pcre_jit_exec() is present in 8.32 or later
+       unless a compile-time flag is provided. See the "grep: un-break
+       building with PCRE >= 8.32 without --enable-jit" commit in git.git for
+       an example of that.
+

 SEE ALSO

@@ -8711,8 +8754,8 @@ AUTHOR

 REVISION

-       Last updated: 17 March 2013
-       Copyright (c) 1997-2013 University of Cambridge.
+       Last updated: 05 July 2017
+       Copyright (c) 1997-2017 University of Cambridge.
 ------------------------------------------------------------------------------



--- a/pcre/doc/pcrejit.3
+++ b/pcre/doc/pcrejit.3
-.TH PCREJIT 3 "17 March 2013" "PCRE 8.33"
+.TH PCREJIT 3 "05 July 2017" "PCRE 8.41"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE JUST-IN-TIME COMPILER SUPPORT"
@@ -54,9 +54,12 @@ programs that need the best possible performance, there is also a "fast path"
 API that is JIT-specific.
 .P
 If your program may sometimes be linked with versions of PCRE that are older
-than 8.20, but you want to use JIT when it is available, you can test
-the values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such
-as PCRE_CONFIG_JIT, for compile-time control of your code.
+than 8.20, but you want to use JIT when it is available, you can test the
+values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such as
+PCRE_CONFIG_JIT, for compile-time control of your code. Also beware that the
+\fBpcre_jit_exec()\fP function was not available at all before 8.32,
+and may not be available at all if PCRE isn't compiled with
+--enable-jit. See the "JIT FAST PATH API" section below for details.
 .
 .
 .SH "SIMPLE USE OF JIT"
@@ -96,6 +99,19 @@ when you call \fBpcre_study()\fP:
  PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
  PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
 .sp
+If using \fBpcre_jit_exec()\fP and supporting a pre-8.32 version of
+PCRE, you can insert:
+.sp
+   #if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
+   pcre_jit_exec(...);
+   #else
+   pcre_exec(...);
+   #endif
+.sp
+but, as described in the "JIT FAST PATH API" section below, this assumes
+that versions 8.32 and later are compiled with --enable-jit, which is not
+guaranteed.
+.sp
 The JIT compiler generates different optimized code for each of the three
 modes (normal, soft partial, hard partial). When \fBpcre_exec()\fP is called,
 the appropriate code is run if it is available. Otherwise, the pattern is
@@ -404,6 +420,32 @@ fast path, and if invalid data is passed, the result is undefined.
 .P
 Bypassing the sanity checks and the \fBpcre_exec()\fP wrapping can give
 speedups of more than 10%.
+.P
+Note that the \fBpcre_jit_exec()\fP function is not available in versions of
+PCRE before 8.32 (released in November 2012). If you need to support versions
+that old you must either use the slower \fBpcre_exec()\fP, or switch between
+the two codepaths by checking the values of PCRE_MAJOR and PCRE_MINOR.
+.P
+Due to an unfortunate implementation oversight, even in versions 8.32
+and later there will be no \fBpcre_jit_exec()\fP stub function defined
+when PCRE is compiled with --disable-jit, which is the default, and
+there's no way to detect whether PCRE was compiled with --enable-jit
+via a macro.
+.P
+If you need to support versions older than 8.32, or versions that may
+not build with --enable-jit, you must either use the slower
+\fBpcre_exec()\fP, or switch between the two codepaths by checking the
+values of PCRE_MAJOR and PCRE_MINOR.
+.P
+Switching between the two by checking the version assumes that all the
+versions being targeted are built with --enable-jit. To also support
+builds that may use --disable-jit, you can either use \fBpcre_exec()\fP,
+add a compile-time check for JIT via \fBpcre_config()\fP (which
+assumes the runtime environment will be the same), or, as the Git
+project decided to do, simply assume that \fBpcre_jit_exec()\fP is
+present in 8.32 or later unless a compile-time flag is provided. See
+the "grep: un-break building with PCRE >= 8.32 without --enable-jit"
+commit in git.git for an example of that.
 .
 .
 .SH "SEE ALSO"
@@ -426,6 +468,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 17 March 2013
-Copyright (c) 1997-2013 University of Cambridge.
+Last updated: 05 July 2017
+Copyright (c) 1997-2017 University of Cambridge.
 .fi
--- a/pcre/doc/pcretest.1
+++ b/pcre/doc/pcretest.1
-.TH PCRETEST 1 "09 February 2014" "PCRE 8.35"
+.TH PCRETEST 1 "23 February 2017" "PCRE 8.41"
 .SH NAME
 pcretest - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -50,6 +50,10 @@ newline as data characters. However, in some Windows environments character 26
 (hex 1A) causes an immediate end of file, and no further data is read. For
 maximum portability, therefore, it is safest to use only ASCII characters in
 \fBpcretest\fP input files.
+.P
+The input is processed using C's string functions, so it must not
+contain binary zeroes, even though in Unix-like environments, \fBfgets()\fP
+treats any bytes other than newline as data characters.
 .
 .
 .SH "PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES"
@@ -1151,6 +1155,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 09 February 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 23 February 2017
+Copyright (c) 1997-2017 University of Cambridge.
 .fi
--- a/pcre/doc/pcretest.txt
+++ b/pcre/doc/pcretest.txt
@@ -39,6 +39,10 @@ INPUT DATA FORMAT
       For  maximum  portability,  therefore,  it  is safest to use only ASCII
       characters in pcretest input files.

+       The input is processed using C's string functions, so it must not
+       contain  binary  zeroes, even though in Unix-like environments, fgets()
+       treats any bytes other than newline as data characters.
+

 PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES

@@ -1083,5 +1087,5 @@ AUTHOR

 REVISION

-       Last updated: 09 February 2014
-       Copyright (c) 1997-2014 University of Cambridge.
+       Last updated: 23 February 2017
+       Copyright (c) 1997-2017 University of Cambridge.
--- a/pcre/pcre_compile.c
+++ b/pcre/pcre_compile.c
@@ -5739,6 +5739,21 @@ for (;; ptr++)
      ptr = p - 1;    /* Character before the next significant one. */
      }

+    /* We also need to skip over (?# comments, which are not dependent on
+    extended mode. */
+
+    if (ptr[1] == CHAR_LEFT_PARENTHESIS && ptr[2] == CHAR_QUESTION_MARK &&
+        ptr[3] == CHAR_NUMBER_SIGN)
+      {
+      ptr += 4;
+      while (*ptr != CHAR_NULL && *ptr != CHAR_RIGHT_PARENTHESIS) ptr++;
+      if (*ptr == CHAR_NULL)
+        {
+        *errorcodeptr = ERR18;
+        goto FAILED;
+        }
+      }
+
    /* If the next character is '+', we have a possessive quantifier. This
    implies greediness, whatever the setting of the PCRE_UNGREEDY option.
    If the next character is '?' this is a minimizing repeat, by default,
@@ -8210,7 +8225,6 @@ for (;; ptr++)

      if (mclength == 1 || req_caseopt == 0)
        {
-        firstchar = mcbuffer[0] | req_caseopt;
        firstchar = mcbuffer[0];
        firstcharflags = req_caseopt;
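The first pcre_compile.c hunk skips one `(?#...)` comment that directly follows a quantifier, so that a following `+` or `?` is still recognized (ChangeLog item 4). A standalone sketch of the same pointer logic on a plain `char` string follows; the real code works on PCRE_UCHAR and reports ERR18 for an unterminated comment, which the hypothetical `skip_comment_after_quantifier` signals by returning NULL instead.

```c
#include <stddef.h>

/* Hypothetical sketch of the 8.41 idea: p points at the last character of a
   quantifier. If "(?#...)" follows, return a pointer to the comment's closing
   ')', so that p[1] is the first character after the comment. Returns NULL
   for an unterminated comment. Like the hunk, it skips only one comment. */
static const char *skip_comment_after_quantifier(const char *p)
{
  if (p[1] == '(' && p[2] == '?' && p[3] == '#') {
    p += 4;                                  /* first character of comment text */
    while (*p != '\0' && *p != ')') p++;     /* scan to the closing ')' */
    if (*p == '\0') return NULL;             /* ran off the end of the pattern */
  }
  return p;
}
```

For `X+(?#comment)?Y`, calling it with a pointer to the `+` returns a pointer to the `)`, and the character after it is the `?` that makes the repeat lazy.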


--- a/pcre/pcre_dfa_exec.c
+++ b/pcre/pcre_dfa_exec.c
@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language (but see
 below for why this module is different).

                       Written by Philip Hazel
-           Copyright (c) 1997-2014 University of Cambridge
+           Copyright (c) 1997-2017 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -2625,7 +2625,7 @@ for (;;)
          if (isinclass)
            {
            int max = (int)GET2(ecode, 1 + IMM2_SIZE);
-            if (*ecode == OP_CRPOSRANGE)
+            if (*ecode == OP_CRPOSRANGE && count >= (int)GET2(ecode, 1))
              {
              active_count--;           /* Remove non-match possibility */
              next_active_state--;

--- a/pcre/pcre_exec.c
+++ b/pcre/pcre_exec.c
@@ -669,7 +669,7 @@ if (ecode == NULL)
    return match((PCRE_PUCHAR)&rdepth, NULL, NULL, 0, NULL, NULL, 1);
  else
    {
-    int len = (char *)&rdepth - (char *)eptr;
+    int len = (int)((char *)&rdepth - (char *)eptr);
    return (len > 0)? -len : len;
    }
  }

--- a/pcre/pcre_internal.h
+++ b/pcre/pcre_internal.h
@@ -2772,6 +2772,9 @@ extern const pcre_uint8  PRIV(ucd_stage1)[];
 extern const pcre_uint16 PRIV(ucd_stage2)[];
 extern const pcre_uint32 PRIV(ucp_gentype)[];
 extern const pcre_uint32 PRIV(ucp_gbtable)[];
+#ifdef COMPILE_PCRE32
+extern const ucd_record  PRIV(dummy_ucd_record)[];
+#endif
 #ifdef SUPPORT_JIT
 extern const int         PRIV(ucp_typerange)[];
 #endif
@@ -2780,10 +2783,16 @@ extern const int         PRIV(ucp_typerange)[];
 /* UCD access macros */

 #define UCD_BLOCK_SIZE 128
-#define GET_UCD(ch) (PRIV(ucd_records) + \
+#define REAL_GET_UCD(ch) (PRIV(ucd_records) + \
        PRIV(ucd_stage2)[PRIV(ucd_stage1)[(int)(ch) / UCD_BLOCK_SIZE] * \
        UCD_BLOCK_SIZE + (int)(ch) % UCD_BLOCK_SIZE])

+#ifdef COMPILE_PCRE32
+#define GET_UCD(ch) ((ch > 0x10ffff)? PRIV(dummy_ucd_record) : REAL_GET_UCD(ch))
+#else
+#define GET_UCD(ch) REAL_GET_UCD(ch)
+#endif
+
 #define UCD_CHARTYPE(ch)    GET_UCD(ch)->chartype
 #define UCD_SCRIPT(ch)      GET_UCD(ch)->script
 #define UCD_CATEGORY(ch)    PRIV(ucp_gentype)[UCD_CHARTYPE(ch)]
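The pcre_internal.h change guards the two-stage Unicode property lookup in the 32-bit library: in non-UTF mode a code unit can exceed 0x10ffff, and using it to index the stage-1 table would read past its end. The toy model below uses tiny hypothetical tables and a trimmed record type (not PCRE's real data or `ucd_record`) to show the shape of the guarded lookup.

```c
#include <stdint.h>

/* Toy, hypothetical tables standing in for PRIV(ucd_stage1), PRIV(ucd_stage2)
   and PRIV(ucd_records). Stage 1 covers only 0..0x10ffff, so indexing it with
   a larger value would read out of bounds. */
#define BLOCK_SIZE 128
#define MAX_CODE_POINT 0x10ffffu

typedef struct { int script; int chartype; } toy_record;

static const toy_record records[] = { {0, 0}, {1, 1} };
static const uint8_t stage1[(MAX_CODE_POINT + 1) / BLOCK_SIZE] = {0}; /* all -> block 0 */
static const uint16_t stage2[BLOCK_SIZE] = { [65] = 1 };  /* offset 65 -> record 1 */
static const toy_record dummy_record = { -1, -1 };        /* "unassigned" stand-in */

static const toy_record *real_get_ucd(uint32_t ch)
{
  return &records[stage2[stage1[ch / BLOCK_SIZE] * BLOCK_SIZE + ch % BLOCK_SIZE]];
}

/* Same shape as GET_UCD for COMPILE_PCRE32: values above the Unicode maximum
   get a fixed dummy record instead of a table lookup. */
static const toy_record *get_ucd(uint32_t ch)
{
  return ch > MAX_CODE_POINT ? &dummy_record : real_get_ucd(ch);
}
```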

--- a/pcre/pcre_jit_compile.c
+++ b/pcre/pcre_jit_compile.c
--- a/pcre/pcre_scanner_unittest.cc
+++ b/pcre/pcre_scanner_unittest.cc
@@ -57,6 +57,7 @@
 } while (0)

 using std::vector;
+using std::string;
 using pcrecpp::StringPiece;
 using pcrecpp::Scanner;


--- a/pcre/pcre_stringpiece.h.in
+++ b/pcre/pcre_stringpiece.h.in
@@ -52,12 +52,12 @@

 #include <pcre.h>

+namespace pcrecpp {
+
 using std::memcmp;
 using std::strlen;
 using std::string;

-namespace pcrecpp {
-
 class PCRECPP_EXP_DEFN StringPiece {
 private:
  const char*   ptr_;

--- a/pcre/pcre_stringpiece_unittest.cc
+++ b/pcre/pcre_stringpiece_unittest.cc
@@ -24,6 +24,7 @@
  }                                                     \
 } while (0)

+using std::string;
 using pcrecpp::StringPiece;

 static void CheckSTLComparator() {

--- a/pcre/pcre_tables.c
+++ b/pcre/pcre_tables.c
@@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
-           Copyright (c) 1997-2012 University of Cambridge
+           Copyright (c) 1997-2017 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -161,7 +161,7 @@ const pcre_uint32 PRIV(ucp_gbtable[]) = {

   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark),                /*  5 SpacingMark */
   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbL)|   /*  6 L */
-     (1<<ucp_gbL)|(1<<ucp_gbV)|(1<<ucp_gbLV)|(1<<ucp_gbLVT),
+     (1<<ucp_gbV)|(1<<ucp_gbLV)|(1<<ucp_gbLVT),

   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbV)|   /*  7 V */
     (1<<ucp_gbT),

--- a/pcre/pcre_ucd.c
+++ b/pcre/pcre_ucd.c
@@ -38,6 +38,20 @@ const pcre_uint16 PRIV(ucd_stage2)[] = {0};
 const pcre_uint32 PRIV(ucd_caseless_sets)[] = {0};
 #else

+/* If the 32-bit library is run in non-32-bit mode, character values
+greater than 0x10ffff may be encountered. For these we set up a
+special record. */
+
+#ifdef COMPILE_PCRE32
+const ucd_record PRIV(dummy_ucd_record)[] = {{
+  ucp_Common,    /* script */
+  ucp_Cn,        /* type unassigned */
+  ucp_gbOther,   /* grapheme break property */
+  0,             /* case set */
+  0,             /* other case */
+  }};
+#endif
+
 /* When recompiling tables with a new Unicode version, please check the
 types in this structure definition from pcre_internal.h (the actual
 field names will be different):

--- a/pcre/pcrecpp_unittest.cc
+++ b/pcre/pcrecpp_unittest.cc
@@ -43,6 +43,7 @@
 #include <vector>
 #include "pcrecpp.h"

+using std::string;
 using pcrecpp::StringPiece;
 using pcrecpp::RE;
 using pcrecpp::RE_Options;

--- a/pcre/pcregrep.c
+++ b/pcre/pcregrep.c
@@ -1804,11 +1804,6 @@ while (ptr < endptr)
        if (line_buffered) fflush(stdout);
        rc = 0;                      /* Had some success */

-        /* If the current match ended past the end of the line (only possible
-        in multiline mode), we are done with this line. */
-
-        if ((unsigned int)offsets[1] > linelength) goto END_ONE_MATCH;
-
        startoffset = offsets[1];    /* Restart after the match */
        if (startoffset <= oldstartoffset)
          {
@@ -1818,6 +1813,22 @@ while (ptr < endptr)
          if (utf8)
            while ((matchptr[startoffset] & 0xc0) == 0x80) startoffset++;
          }
+
+        /* If the current match ended past the end of the line (only possible
+        in multiline mode), we must move on to the line in which it did end
+        before searching for more matches. */
+
+        while (startoffset > (int)linelength)
+          {
+          matchptr = ptr += linelength + endlinelength;
+          filepos += (int)(linelength + endlinelength);
+          linenumber++;
+          startoffset -= (int)(linelength + endlinelength);
+          t = end_of_line(ptr, endptr, &endlinelength);
+          linelength = t - ptr - endlinelength;
+          length = (size_t)(endptr - ptr);
+          }
+
        goto ONLY_MATCHING_RESTART;
        }
      }
@@ -3179,9 +3190,11 @@ for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)
  cp->hint = pcre_study(cp->compiled, study_options, &error);
  if (error != NULL)
    {
-    char s[16];
-    if (patterns->next == NULL) s[0] = 0; else sprintf(s, " number %d", j);
-    fprintf(stderr, "pcregrep: Error while studying regex%s: %s\n", s, error);
+    if (patterns->next == NULL)
+      fprintf(stderr, "pcregrep: Error while studying regex: %s\n", error);
+    else
+      fprintf(stderr, "pcregrep: Error while studying regex number %d: %s\n",
+        j, error);
    goto EXIT2;
    }
 #ifdef SUPPORT_PCREGREP_JIT
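The new pcregrep loop handles a multiline match that ends on a later line: instead of restarting on the next line, it walks forward line by line until the restart offset falls inside the current line. A minimal sketch of that walk over an in-memory buffer follows, assuming only `\n` line endings (real pcregrep finds line ends with end_of_line() and supports other conventions); `advance_to_match_end` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modelled on the 8.41 loop. *line points at the start of
   the current line in buf[0..len), and offset is where the next search should
   start, measured from *line. While the offset lies beyond the end of the
   current line, move *line to the next line, count it, and rebase the offset.
   Only '\n' line endings are handled. */
static size_t advance_to_match_end(const char *buf, size_t len,
                                   const char **line, size_t offset,
                                   unsigned long *linenumber)
{
  const char *end = buf + len;
  for (;;) {
    const char *nl = memchr(*line, '\n', (size_t)(end - *line));
    size_t linelength = nl != NULL ? (size_t)(nl - *line) : (size_t)(end - *line);
    if (offset <= linelength || nl == NULL)
      return offset;               /* offset is in this line, or no line follows */
    *line += linelength + 1;       /* skip the line and its '\n' */
    offset -= linelength + 1;
    (*linenumber)++;
  }
}
```

For the buffer `"ab\ncd\nef\n"` and an offset of 7 from the first line, the helper moves to the third line and returns offset 1, which points at the `f`.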

--- a/pcre/pcreposix.c
+++ b/pcre/pcreposix.c
@@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
-           Copyright (c) 1997-2016 University of Cambridge
+           Copyright (c) 1997-2017 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -389,8 +389,8 @@ if (rc >= 0)
    {
    for (i = 0; i < (size_t)rc; i++)
      {
-      pmatch[i].rm_so = ovector[i*2];
-      pmatch[i].rm_eo = ovector[i*2+1];
+      pmatch[i].rm_so = ovector[i*2] + so;
+      pmatch[i].rm_eo = ovector[i*2+1] + so;
      }
    if (allocated_ovector) free(ovector);
    for (; i < nmatch; i++) pmatch[i].rm_so = pmatch[i].rm_eo = -1;
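With REG_STARTEND, matching begins at `string + so`, so the offsets PCRE returns are relative to that point; the pcreposix.c fix adds `so` back so that callers get offsets relative to `string` itself. A hedged sketch of that conversion follows, using a hypothetical `match_span` in place of `regmatch_t` (whose real fields have type `regoff_t`). Unlike the hunk above, this sketch also leaves unset groups, which PCRE reports as -1, at -1 rather than shifting them by `so`.

```c
#include <stddef.h>

/* Hypothetical stand-in for regmatch_t. */
typedef struct { long rm_so, rm_eo; } match_span;

/* ovector holds rc start/end pairs measured from string + so. Rebase them onto
   string itself and mark the remaining pmatch slots unused with -1.
   Call only when rc >= 0 (a successful match). */
static void fill_pmatch(match_span *pmatch, size_t nmatch,
                        const int *ovector, int rc, long so)
{
  size_t i;
  for (i = 0; i < (size_t)rc && i < nmatch; i++) {
    pmatch[i].rm_so = ovector[i*2] < 0 ? -1 : ovector[i*2] + so;
    pmatch[i].rm_eo = ovector[i*2+1] < 0 ? -1 : ovector[i*2+1] + so;
  }
  for (; i < nmatch; i++) pmatch[i].rm_so = pmatch[i].rm_eo = -1;
}
```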

--- a/pcre/pcretest.c
+++ b/pcre/pcretest.c
@@ -177,7 +177,7 @@ that differ in their output from isprint() even in the "C" locale. */
 #define PRINTABLE(c) ((c) >= 32 && (c) < 127)
 #endif

-#define PRINTOK(c) (locale_set? isprint(c) : PRINTABLE(c))
+#define PRINTOK(c) (locale_set? (((c) < 256) && isprint(c)) : PRINTABLE(c))

 /* Posix support is disabled in 16 or 32 bit only mode. */
 #if !defined SUPPORT_PCRE8 && !defined NOPOSIX
@@ -426,11 +426,11 @@ argument, the casting might be incorrectly applied. */
 #define PCRE_COPY_NAMED_SUBSTRING32(rc, re, bptr, offsets, count, \
    namesptr, cbuffer, size) \
  rc = pcre32_copy_named_substring((pcre32 *)re, (PCRE_SPTR32)bptr, offsets, \
-    count, (PCRE_SPTR32)namesptr, (PCRE_UCHAR32 *)cbuffer, size/2)
+    count, (PCRE_SPTR32)namesptr, (PCRE_UCHAR32 *)cbuffer, size/4)

 #define PCRE_COPY_SUBSTRING32(rc, bptr, offsets, count, i, cbuffer, size) \
  rc = pcre32_copy_substring((PCRE_SPTR32)bptr, offsets, count, i, \
-    (PCRE_UCHAR32 *)cbuffer, size/2)
+    (PCRE_UCHAR32 *)cbuffer, size/4)

 #define PCRE_DFA_EXEC32(count, re, extra, bptr, len, start_offset, options, \
    offsets, size_offsets, workspace, size_workspace) \
@@ -4834,7 +4834,16 @@ while (!done)
        continue;

        case 'O':
-        while(isdigit(*p)) n = n * 10 + *p++ - '0';
+        while(isdigit(*p))
+          {
+          if (n > (INT_MAX-10)/10)   /* Hack to stop fuzzers */
+            {
+            printf("** \\O argument is too big\n");
+            yield = 1;
+            goto EXIT;
+            }
+          n = n * 10 + *p++ - '0';
+          }
        if (n > size_offsets_max)
          {
          size_offsets_max = n;
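The pcretest.c hunks add two defensive checks: `isprint()` is only called for values below 256, and the `\O` argument is rejected before `n * 10 + digit` can overflow. A standalone sketch of both checks follows, with the hypothetical helpers `printable_in_locale` and `parse_decimal`; unlike pcretest it uses UCHAR_MAX for the range test and an exact overflow bound.

```c
#include <ctype.h>
#include <limits.h>

/* isprint() is only defined for EOF and values representable as unsigned
   char, so check the range first (pcretest 8.41 uses c < 256). */
static int printable_in_locale(unsigned int c)
{
  return c <= UCHAR_MAX && isprint((int)c);
}

/* Parse the decimal digits at *pp into *out, advancing *pp past them.
   Returns 0, leaving *out and *pp untouched, if the value would exceed
   INT_MAX. pcretest's own test, n > (INT_MAX-10)/10, is slightly stricter. */
static int parse_decimal(const char **pp, int *out)
{
  const char *p = *pp;
  int n = 0;
  while (isdigit((unsigned char)*p)) {
    int d = *p - '0';
    if (n > (INT_MAX - d) / 10) return 0;   /* n * 10 + d would overflow */
    n = n * 10 + d;
    p++;
  }
  *pp = p;
  *out = n;
  return 1;
}
```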

--- a/pcre/testdata/testinput1
+++ b/pcre/testdata/testinput1
@@ -5739,4 +5739,7 @@ AbcdCBefgBhiBqz
 /(?=.*X)X$/ 
    \  X
     
+/X+(?#comment)?/
+    >XXX<
+
 /-- End of testinput1 --/
--- a/pcre/testdata/testinput12
+++ b/pcre/testdata/testinput12
@@ -104,4 +104,6 @@ and a couple of things that are different with JIT. --/
 /(.|.)*?bx/
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabax

+/((?(?!))x)(?'name')(?1)/S++
+
 /-- End of testinput12 --/
--- a/pcre/testdata/testinput15
+++ b/pcre/testdata/testinput15
@@ -363,4 +363,7 @@ correctly, but that messes up comparisons). --/

 /abc/89

+//8+L
+    \xf1\xad\xae\xae
+
 /-- End of testinput15 --/
--- a/pcre/testdata/testinput8
+++ b/pcre/testdata/testinput8
@@ -4845,4 +4845,7 @@
    aaa\D
    a\D

+/(02-)?[0-9]{3}-[0-9]{3}/
+    02-123-123
+
 /-- End of testinput8 --/
--- a/pcre/testdata/testoutput1
+++ b/pcre/testdata/testoutput1
@@ -9442,4 +9442,8 @@ No match
    \  X
 0: X
     
+/X+(?#comment)?/
+    >XXX<
+ 0: X
+
 /-- End of testinput1 --/
--- a/pcre/testdata/testoutput12
+++ b/pcre/testdata/testoutput12
@@ -201,4 +201,6 @@ No match, mark = m (JIT)
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabax
 Error -8 (match limit exceeded)

+/((?(?!))x)(?'name')(?1)/S++
+
 /-- End of testinput12 --/
--- a/pcre/testdata/testoutput15
+++ b/pcre/testdata/testoutput15
@@ -1136,4 +1136,9 @@ Failed: setting UTF is disabled by the application at offset 0
 /abc/89
 Failed: setting UTF is disabled by the application at offset 0

+//8+L
+    \xf1\xad\xae\xae
+ 0: 
+ 0+ \x{6dbae}
+
 /-- End of testinput15 --/
--- a/pcre/testdata/testoutput8
+++ b/pcre/testdata/testoutput8
@@ -7801,4 +7801,8 @@ No match
 ** Show all captures ignored after DFA matching
 0: a

+/(02-)?[0-9]{3}-[0-9]{3}/
+    02-123-123
+ 0: 02-123-123
+
 /-- End of testinput8 --/