pcre-8.35

Commit 2a590514 authored Jun 05, 2014 by Sergei Golubchik
79 changed files
--- a/pcre/AUTHORS
+++ b/pcre/AUTHORS
@@ -8,7 +8,7 @@ Email domain:     cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.

-Copyright (c) 1997-2013 University of Cambridge
+Copyright (c) 1997-2014 University of Cambridge
 All rights reserved


@@ -19,7 +19,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Email domain:     freemail.hu

-Copyright(c) 2010-2013 Zoltan Herczeg
+Copyright(c) 2010-2014 Zoltan Herczeg
 All rights reserved.


@@ -30,7 +30,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Email domain:     freemail.hu

-Copyright(c) 2009-2013 Zoltan Herczeg
+Copyright(c) 2009-2014 Zoltan Herczeg
 All rights reserved.



--- a/pcre/ChangeLog
+++ b/pcre/ChangeLog
 ChangeLog for PCRE
 ------------------

+Version 8.35 04-April-2014
+--------------------------
+
+1.  A new flag is set, when property checks are present in an XCLASS.
+    When this flag is not set, PCRE can perform certain optimizations
+    such as studying these XCLASS-es.
+
+2.  The auto-possessification of character sets was improved: a normal
+    and an extended character set can be compared now. Furthermore
+    the JIT compiler optimizes more character set checks.
+
+3.  Got rid of some compiler warnings for potentially uninitialized variables
+    that show up only when compiled with -O2.
+
+4.  A pattern such as (?=ab\K) that uses \K in an assertion can set the start
+    of a match later than the end of the match. The pcretest program was not
+    handling the case sensibly - it was outputting from the start to the next
+    binary zero. It now reports this situation in a message, and outputs the
+    text from the end to the start.
+
+5.  Fast forward search is improved in JIT. Instead of the first three
+    characters, any three characters with fixed position can be searched.
+    Search order: first, last, middle.
+
+6.  Improve character range checks in JIT. Characters are read by an imprecise
+    function now, which returns with an unknown value if the character code is
+    above a certain threshold (e.g. 256). The only limitation is that the value
+    must be bigger than the threshold as well. This function is useful when
+    the characters above the threshold are handled in the same way.
+
+7.  The macros whose names start with RAWUCHAR are placeholders for a future
+    mode in which only the bottom 21 bits of 32-bit data items are used. To
+    make this more memorable for those maintaining the code, the names have
+    been changed to start with UCHAR21, and an extensive comment has been added
+    to their definition.
+
+8.  Add missing (new) files sljitNativeTILEGX.c and sljitNativeTILEGX-encoder.c
+    to the export list in Makefile.am (they were accidentally omitted from the
+    8.34 tarball).
+
+9.  The informational output from pcretest used the phrase "starting byte set"
+    which is inappropriate for the 16-bit and 32-bit libraries. As the output
+    for "first char" and "need char" really means "non-UTF-char", I've changed
+    "byte" to "char", and slightly reworded the output. The documentation about
+    these values has also been (I hope) clarified.
+
+10. Another JIT related optimization: use table jumps for selecting the correct
+    backtracking path, when more than four alternatives are present inside a
+    bracket.
+
+11. Empty match is not possible, when the minimum length is greater than zero,
+    and there is no \K in the pattern. JIT should avoid empty match checks in
+    such cases.
+
+12. In a caseless character class with UCP support, when a character with more
+    than one alternative case was not the first character of a range, not all
+    the alternative cases were added to the class. For example, s and \x{17f}
+    are both alternative cases for S: the class [RST] was handled correctly,
+    but [R-T] was not.
+
+13. The configure.ac file always checked for pthread support when JIT was
+    enabled. This is not used in Windows, so I have put this test inside a
+    check for the presence of windows.h (which was already tested for).
+
+14. Improve pattern prefix search by a simplified Boyer-Moore algorithm in JIT.
+    The algorithm provides a way to skip certain starting offsets, and is
+    usually faster than linear prefix searches.
+
+15. Change 13 for 8.20 updated RunTest to check for the 'fr' locale as well
+    as for 'fr_FR' and 'french'. For some reason, however, it then used the
+    Windows-specific input and output files, which have 'french' screwed in.
+    So this could never have worked. One of the problems with locales is that
+    they aren't always the same. I have now updated RunTest so that it checks
+    the output of the locale test (test 3) against three different output
+    files, and it allows the test to pass if any one of them matches. With luck
+    this should make the test pass on some versions of Solaris where it was
+    failing. Because of the uncertainty, the script did not used to stop if
+    test 3 failed; it now does. If further versions of a French locale ever
+    come to light, they can now easily be added.
+
+16. If --with-pcregrep-bufsize was given a non-integer value such as "50K",
+    there was a message during ./configure, but it did not stop. This now
+    provokes an error. The invalid example in README has been corrected.
+    If a value less than the minimum is given, the minimum value has always
+    been used, but now a warning is given.
+
+17. If --enable-bsr-anycrlf was set, the special 16/32-bit test failed. This
+    was a bug in the test system, which is now fixed. Also, the list of various
+    configurations that are tested for each release did not have one with both
+    16/32 bits and --enable-bsr-anycrlf. It now does.
+
+18. pcretest was missing "-C bsr" for displaying the \R default setting.
+
+19. Little endian PowerPC systems are supported now by the JIT compiler.
+
+20. The fast forward newline mechanism could enter an infinite loop on
+    certain invalid UTF-8 input. Although we don't support these cases
+    this issue can be fixed by a performance optimization.
+
+21. Change 33 of 8.34 is not sufficient to ensure stack safety because it does
+    not take account of existing stack usage. There is now a new global
+    variable called pcre_stack_guard that can be set to point to an external
+    function to check stack availability. It is called at the start of
+    processing every parenthesized group.
+
+22. A typo in the code meant that in ungreedy mode the max/min qualifier
+    behaved like a min-possessive qualifier, and, for example, /a{1,3}b/U did
+    not match "ab".
+
+23. When UTF was disabled, the JIT program reported some incorrect compile
+    errors. These messages are silenced now.
+
+24. Experimental support for ARM-64 and MIPS-64 has been added to the JIT
+    compiler.
+
+25. Change all the temporary files used in RunGrepTest to be different to those
+    used by RunTest so that the tests can be run simultaneously, for example by
+    "make -j check".
+
+
 Version 8.34 15-December-2013
 -----------------------------
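
Change 15 above describes RunTest probing for a French locale under three possible names. The probing order can be sketched on its own as follows. This is a minimal illustration, not the code in RunTest: the function name `pick_french_locale` is hypothetical, and it reads the locale list from standard input so that it can be tried without depending on the locales installed on a given machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of RunTest): print the first French
# locale name found in a list of locale names read from standard input,
# one name per line. Candidates are tried in the order fr_FR, french, fr.
pick_french_locale() {
  list=$(cat)
  for candidate in fr_FR french fr; do
    # -x requires the whole line to match, like the '^name$' anchors
    # used by RunTest, so "fr_FR.UTF-8" does not count as "fr_FR".
    if printf '%s\n' "$list" | grep -x "$candidate" >/dev/null; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Typical use on a real system would be:
#   locale -a | pick_french_locale
```

A non-zero return status means none of the three names was present, which corresponds to the case where RunTest leaves `locale` empty.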


--- a/pcre/INSTALL
+++ b/pcre/INSTALL
@@ -12,8 +12,8 @@ without warranty of any kind.
 Basic Installation
 ==================

-   Briefly, the shell commands `./configure; make; make install' should
-configure, build, and install this package.  The following
+   Briefly, the shell command `./configure && make && make install'
+should configure, build, and install this package.  The following
 more-detailed instructions are generic; see the `README' file for
 instructions specific to this package.  Some packages provide this
 `INSTALL' file but do not implement all of the features documented

--- a/pcre/LICENCE
+++ b/pcre/LICENCE
@@ -24,7 +24,7 @@ Email domain:     cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.

-Copyright (c) 1997-2013 University of Cambridge
+Copyright (c) 1997-2014 University of Cambridge
 All rights reserved.


@@ -35,7 +35,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Email domain:     freemail.hu

-Copyright(c) 2010-2013 Zoltan Herczeg
+Copyright(c) 2010-2014 Zoltan Herczeg
 All rights reserved.


@@ -46,7 +46,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Email domain:     freemail.hu

-Copyright(c) 2009-2013 Zoltan Herczeg
+Copyright(c) 2009-2014 Zoltan Herczeg
 All rights reserved.



--- a/pcre/NEWS
+++ b/pcre/NEWS
 News about PCRE releases
 ------------------------

+Release 8.35 04-April-2014
+--------------------------
+
+There have been performance improvements for classes containing non-ASCII
+characters and the "auto-possessification" feature has been extended. Other
+minor improvements have been implemented and bugs fixed. There is a new callout
+feature to enable applications to do detailed stack checks at compile time, to
+avoid running out of stack for deeply nested parentheses. The JIT compiler has
+been extended with experimental support for ARM-64, MIPS-64, and PPC-LE.
+
+
 Release 8.34 15-December-2013
 -----------------------------


--- a/pcre/README
+++ b/pcre/README
@@ -85,11 +85,12 @@ documentation is supplied in two other forms:
  1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
     doc/pcretest.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except
-     those that summarize individual functions. The other two are the text
-     forms of the section 1 man pages for the pcregrep and pcretest commands.
-     These text forms are provided for ease of scanning with text editors or
-     similar tools. They are installed in <prefix>/share/doc/pcre, where
-     <prefix> is the installation prefix (defaulting to /usr/local).
+     the listing of pcredemo.c and those that summarize individual functions.
+     The other two are the text forms of the section 1 man pages for the
+     pcregrep and pcretest commands. These text forms are provided for ease of
+     scanning with text editors or similar tools. They are installed in
+     <prefix>/share/doc/pcre, where <prefix> is the installation prefix
+     (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
@@ -372,12 +373,12 @@ library. They are also documented in the pcrebuild man page.

  Of course, the relevant libraries must be installed on your system.

-. The default size of internal buffer used by pcregrep can be set by, for
-  example:
+. The default size (in bytes) of the internal buffer used by pcregrep can be
+  set by, for example:

-  --with-pcregrep-bufsize=50K
+  --with-pcregrep-bufsize=51200

-  The default value is 20K.
+  The value must be a plain integer. The default is 20480.

 . It is possible to compile pcretest so that it links with the libreadline
  or libedit libraries, by specifying, respectively,
@@ -987,4 +988,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 05 November 2013
+Last updated: 17 January 2014
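
Since the README now requires a plain integer for --with-pcregrep-bufsize, a value such as "50K" has to be converted to bytes before it is passed to configure. The sketch below shows one way to do that conversion. The function name `to_bytes` is hypothetical and is not part of the PCRE distribution; it assumes that a trailing K means 1024 bytes, which agrees with the README change from 50K to 51200.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCRE): convert a size such as "50K"
# into a plain byte count suitable for --with-pcregrep-bufsize.
to_bytes() {
  size=$1
  case $size in
    *[Kk]) digits=${size%[Kk]}; factor=1024 ;;
    *)     digits=$size;        factor=1 ;;
  esac
  # Reject empty input, anything that is not purely decimal digits, and
  # leading zeros (which shell arithmetic would treat as octal).
  case $digits in
    ''|*[!0-9]*|0?*)
      echo "not a size: $size" >&2
      return 1
      ;;
  esac
  echo $((digits * factor))
}

to_bytes 50K      # prints 51200
to_bytes 20480    # prints 20480
```

The result could then be used as, for example, `./configure --with-pcregrep-bufsize=$(to_bytes 50K)`.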
--- a/pcre/RunGrepTest
+++ b/pcre/RunGrepTest
--- a/pcre/RunTest
+++ b/pcre/RunTest
@@ -31,6 +31,11 @@
 # except test 10. Whatever order the arguments are in, the tests are always run
 # in numerical order.
 #
+# The special argument "3S" runs test 3, stopping if it fails. Test 3 is the
+# locale test, and failure usually means there's an issue with the locale
+# rather than a bug in PCRE, so normally subsequent tests are run. "3S" is
+# useful when you want to debug or update the test.
+#
 # Inappropriate tests are automatically skipped (with a comment to say so): for
 # example, if JIT support is not compiled, test 12 is skipped, whereas if JIT
 # support is compiled, test 13 is skipped.
@@ -458,8 +463,9 @@ fi

 # Locale-specific tests, provided that either the "fr_FR" or the "french"
 # locale is available. The former is the Unix-like standard; the latter is
-# for Windows. Another possibility is "fr", which needs to be run against
-# the Windows-specific input and output files.
+# for Windows. Another possibility is "fr". Unfortunately, different versions
+# of the French locale give different outputs for some items. This test passes
+# if the output matches any one of the alternative output files.

 if [ $do3 = yes ] ; then
  locale -a | grep '^fr_FR$' >/dev/null
@@ -467,20 +473,28 @@ if [ $do3 = yes ] ; then
    locale=fr_FR
    infile=$testdata/testinput3
    outfile=$testdata/testoutput3
+    outfile2=$testdata/testoutput3A
+    outfile3=$testdata/testoutput3B
  else
    infile=test3input
    outfile=test3output
+    outfile2=test3outputA
+    outfile3=test3outputB
    locale -a | grep '^french$' >/dev/null
    if [ $? -eq 0 ] ; then
      locale=french
      sed 's/fr_FR/french/' $testdata/testinput3 >test3input
      sed 's/fr_FR/french/' $testdata/testoutput3 >test3output
+      sed 's/fr_FR/french/' $testdata/testoutput3A >test3outputA
+      sed 's/fr_FR/french/' $testdata/testoutput3B >test3outputB
    else
      locale -a | grep '^fr$' >/dev/null
      if [ $? -eq 0 ] ; then
        locale=fr
-        sed 's/fr_FR/fr/' $testdata/wintestinput3 >test3input
-        sed 's/fr_FR/fr/' $testdata/wintestoutput3 >test3output
+        sed 's/fr_FR/fr/' $testdata/intestinput3 >test3input
+        sed 's/fr_FR/fr/' $testdata/intestoutput3 >test3output
+        sed 's/fr_FR/fr/' $testdata/intestoutput3A >test3outputA
+        sed 's/fr_FR/fr/' $testdata/intestoutput3B >test3outputB
      else
        locale=
      fi
@@ -492,18 +506,20 @@ if [ $do3 = yes ] ; then
    for opt in "" "-s" $jitopt; do
      $sim $valgrind ./pcretest -q $bmode $opt $infile testtry
      if [ $? = 0 ] ; then
-        $cf $outfile testtry
-        if [ $? != 0 ] ; then
-          echo " "
-          echo "Locale test did not run entirely successfully."
-          echo "This usually means that there is a problem with the locale"
-          echo "settings rather than a bug in PCRE."
-          break;
-        else
+        if $cf $outfile testtry >teststdout || \
+           $cf $outfile2 testtry >teststdout || \
+           $cf $outfile3 testtry >teststdout
+        then
          if [ "$opt" = "-s" ] ; then echo "  OK with study"
          elif [ "$opt" = "-s+" ] ; then echo "  OK with JIT study"
          else echo "  OK"
          fi
+        else
+          echo "** Locale test did not run successfully. The output did not match"
+          echo "   $outfile, $outfile2 or $outfile3."
+          echo "   This may mean that there is a problem with the locale settings rather"
+          echo "   than a bug in PCRE."
+          exit 1
        fi
      else exit 1
      fi
@@ -989,6 +1005,6 @@ fi
 done

 # Clean up local working files
-rm -f test3input test3output testNinput testsaved* teststderr teststdout testtry
+rm -f test3input test3output test3outputA testNinput testsaved* teststderr teststdout testtry

 # End
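
The change to test 3 above accepts the run if the output matches any one of several reference files. The same idea can be sketched in isolation as follows. This is only an illustration under stated assumptions: the function name `matches_any` is hypothetical, and it compares with `cmp -s`, whereas RunTest uses whatever command `$cf` is set to.

```shell
#!/bin/sh
# Hypothetical helper (not part of RunTest): succeed if the file named
# by the first argument is identical to at least one of the files named
# by the remaining arguments.
matches_any() {
  actual=$1
  shift
  for expected in "$@"; do
    if cmp -s "$actual" "$expected"; then
      return 0
    fi
  done
  return 1
}

# Demonstration with throwaway files in a temporary directory.
demo_dir=$(mktemp -d)
printf 'alpha\n' >"$demo_dir/out"
printf 'beta\n'  >"$demo_dir/expectedA"
printf 'alpha\n' >"$demo_dir/expectedB"
if matches_any "$demo_dir/out" "$demo_dir/expectedA" "$demo_dir/expectedB"; then
  echo "OK"
else
  echo "no alternative matched"
fi
rm -rf "$demo_dir"
```

Adding a further alternative output then only means passing one more file name, which is the property change 15 in the ChangeLog relies on.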
--- a/pcre/configure.ac
+++ b/pcre/configure.ac
@@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
 dnl be defined as -RC2, for example. For real releases, it should be empty.

 m4_define(pcre_major, [8])
-m4_define(pcre_minor, [34])
+m4_define(pcre_minor, [35])
 m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2013-12-15])
+m4_define(pcre_date, [2014-04-04])

 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.

 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:2:2])
-m4_define(libpcre16_version, [2:2:2])
-m4_define(libpcre32_version, [0:2:0])
+m4_define(libpcre_version, [3:3:2])
+m4_define(libpcre16_version, [2:3:2])
+m4_define(libpcre32_version, [0:3:0])
 m4_define(libpcreposix_version, [0:2:0])
 m4_define(libpcrecpp_version, [0:0:0])

@@ -248,7 +248,7 @@ AC_ARG_ENABLE(pcregrep-libbz2,
 # Handle --with-pcregrep-bufsize=N
 AC_ARG_WITH(pcregrep-bufsize,
              AS_HELP_STRING([--with-pcregrep-bufsize=N],
-                             [pcregrep buffer size (default=20480)]),
+                             [pcregrep buffer size (default=20480, minimum=8192)]),
              , with_pcregrep_bufsize=20480)

 # Handle --enable-pcretest-libedit
@@ -461,7 +461,8 @@ sure both macros are undefined; an emulation function will then be used. */])

 # Checks for header files.
 AC_HEADER_STDC
-AC_CHECK_HEADERS(limits.h sys/types.h sys/stat.h dirent.h windows.h)
+AC_CHECK_HEADERS(limits.h sys/types.h sys/stat.h dirent.h)
+AC_CHECK_HEADERS([windows.h], [HAVE_WINDOWS_H=1])

 # The files below are C++ header files.
 pcre_have_type_traits="0"
@@ -686,11 +687,15 @@ if test "$enable_pcre32" = "yes"; then
    Define to any value to enable the 32 bit PCRE library.])
 fi

+# Unless running under Windows, JIT support requires pthreads.
+
 if test "$enable_jit" = "yes"; then
-  AX_PTHREAD([], [AC_MSG_ERROR([JIT support requires pthreads])])
-  CC="$PTHREAD_CC"
-  CFLAGS="$PTHREAD_CFLAGS $CFLAGS"
-  LIBS="$PTHREAD_LIBS $LIBS"
+  if test "$HAVE_WINDOWS_H" != "1"; then
+    AX_PTHREAD([], [AC_MSG_ERROR([JIT support requires pthreads])])
+    CC="$PTHREAD_CC"
+    CFLAGS="$PTHREAD_CFLAGS $CFLAGS"
+    LIBS="$PTHREAD_LIBS $LIBS"
+  fi
  AC_DEFINE([SUPPORT_JIT], [], [
    Define to any value to enable support for Just-In-Time compiling.])
 else
@@ -739,7 +744,12 @@ if test "$enable_pcregrep_libbz2" = "yes"; then
 fi

 if test $with_pcregrep_bufsize -lt 8192 ; then
+  AC_MSG_WARN([$with_pcregrep_bufsize is too small for --with-pcregrep-bufsize; using 8192])
  with_pcregrep_bufsize="8192"
+else
+  if test $? -gt 1 ; then
+  AC_MSG_ERROR([Bad value for  --with-pcregrep-bufsize])
+  fi
 fi

 AC_DEFINE_UNQUOTED([PCREGREP_BUFSIZE], [$with_pcregrep_bufsize], [
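
The bufsize check above relies on the exit status of `test`: 0 means the comparison is true, 1 means it is false, and a status greater than 1 means `test` could not evaluate the expression, as happens when the operand is not an integer (for example "50K"). The following standalone sketch reproduces that logic outside Autoconf. The function name `check_bufsize` is hypothetical, and plain `echo` to standard error stands in for AC_MSG_WARN and AC_MSG_ERROR.

```shell
#!/bin/sh
# Hypothetical helper (not part of configure.ac): validate a requested
# pcregrep buffer size and print the value that would be used.
check_bufsize() {
  value=$1
  if test "$value" -lt 8192 2>/dev/null; then
    echo "warning: $value is too small; using 8192" >&2
    echo 8192
  else
    # In the else branch, $? still holds the status of the test above:
    # 1 for "not less than", greater than 1 for a non-integer operand.
    status=$?
    if test "$status" -gt 1; then
      echo "error: bad value '$value'" >&2
      return 1
    fi
    echo "$value"
  fi
}

check_bufsize 51200   # prints 51200
```

With this sketch, `check_bufsize 100` prints 8192 with a warning on standard error, and `check_bufsize 50K` prints an error and returns a non-zero status.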

--- a/pcre/doc/html/README.txt
+++ b/pcre/doc/html/README.txt
@@ -85,11 +85,12 @@ documentation is supplied in two other forms:
  1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
     doc/pcretest.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except
-     those that summarize individual functions. The other two are the text
-     forms of the section 1 man pages for the pcregrep and pcretest commands.
-     These text forms are provided for ease of scanning with text editors or
-     similar tools. They are installed in <prefix>/share/doc/pcre, where
-     <prefix> is the installation prefix (defaulting to /usr/local).
+     the listing of pcredemo.c and those that summarize individual functions.
+     The other two are the text forms of the section 1 man pages for the
+     pcregrep and pcretest commands. These text forms are provided for ease of
+     scanning with text editors or similar tools. They are installed in
+     <prefix>/share/doc/pcre, where <prefix> is the installation prefix
+     (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
@@ -372,12 +373,12 @@ library. They are also documented in the pcrebuild man page.

  Of course, the relevant libraries must be installed on your system.

-. The default size of internal buffer used by pcregrep can be set by, for
-  example:
+. The default size (in bytes) of the internal buffer used by pcregrep can be
+  set by, for example:

-  --with-pcregrep-bufsize=50K
+  --with-pcregrep-bufsize=51200

-  The default value is 20K.
+  The value must be a plain integer. The default is 20480.

 . It is possible to compile pcretest so that it links with the libreadline
  or libedit libraries, by specifying, respectively,
@@ -987,4 +988,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 05 November 2013
+Last updated: 17 January 2014
--- a/pcre/doc/html/pcre.html
+++ b/pcre/doc/html/pcre.html
@@ -154,8 +154,11 @@ page.
 The user documentation for PCRE comprises a number of different sections. In
 the "man" format, each of these is a separate "man page". In the HTML format,
 each is a separate page, linked from the index page. In the plain text format,
-all the sections, except the <b>pcredemo</b> section, are concatenated, for ease
-of searching. The sections are as follows:
+the descriptions of the <b>pcregrep</b> and <b>pcretest</b> programs are in files
+called <b>pcregrep.txt</b> and <b>pcretest.txt</b>, respectively. The remaining
+sections, except for the <b>pcredemo</b> section (which is a program listing),
+are concatenated in <b>pcre.txt</b>, for ease of searching. The sections are as
+follows:
 <pre>
  pcre              this document
  pcre-config       show PCRE installation configuration information
@@ -182,8 +185,8 @@ of searching. The sections are as follows:
  pcretest          description of the <b>pcretest</b> testing command
  pcreunicode       discussion of Unicode and UTF-8/16/32 support
 </pre>
-In addition, in the "man" and HTML formats, there is a short page for each
-C library function, listing its arguments and results.
+In the "man" and HTML formats, there is also a short page for each C library
+function, listing its arguments and results.
 </P>
 <br><a name="SEC4" href="#TOC1">AUTHOR</a><br>
 <P>
@@ -201,9 +204,9 @@ two digits 10, at the domain cam.ac.uk.
 </P>
 <br><a name="SEC5" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 13 May 2013
+Last updated: 08 January 2014
 <br>
-Copyright &copy; 1997-2013 University of Cambridge.
+Copyright &copy; 1997-2014 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.

--- a/pcre/doc/html/pcreapi.html
+++ b/pcre/doc/html/pcreapi.html
@@ -166,6 +166,9 @@ man page, in case the conversion went wrong.
 <br>
 <br>
 <b>int (*pcre_callout)(pcre_callout_block *);</b>
+<br>
+<br>
+<b>int (*pcre_stack_guard)(void);</b>
 </P>
 <br><a name="SEC5" href="#TOC1">PCRE 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
 <P>
@@ -324,6 +327,15 @@ by the caller to a "callout" function, which PCRE will then call at specified
 points during a matching operation. Details are given in the
 <a href="pcrecallout.html"><b>pcrecallout</b></a>
 documentation.
+</P>
+<P>
+The global variable <b>pcre_stack_guard</b> initially contains NULL. It can be
+set by the caller to a function that is called by PCRE whenever it starts
+to compile a parenthesized part of a pattern. When parentheses are nested, PCRE
+uses recursive function calls, which use up the system stack. This function is
+provided so that applications with restricted stacks can force a compilation
+error if the stack runs out. The function should return zero if all is well, or
+non-zero to force an error.
 <a name="newlines"></a></P>
 <br><a name="SEC7" href="#TOC1">NEWLINES</a><br>
 <P>
@@ -369,7 +381,8 @@ controlled in a similar way, but by separate options.
 The PCRE functions can be used in multi-threading applications, with the
 proviso that the memory management functions pointed to by <b>pcre_malloc</b>,
 <b>pcre_free</b>, <b>pcre_stack_malloc</b>, and <b>pcre_stack_free</b>, and the
-callout function pointed to by <b>pcre_callout</b>, are shared by all threads.
+callout and stack-checking functions pointed to by <b>pcre_callout</b> and
+<b>pcre_stack_guard</b>, are shared by all threads.
 </P>
 <P>
 The compiled form of a regular expression is not altered during matching, so
@@ -489,7 +502,10 @@ documentation.
 The output is a long integer that gives the maximum depth of nesting of
 parentheses (of any kind) in a pattern. This limit is imposed to cap the amount
 of system stack used when a pattern is compiled. It is specified when PCRE is
-built; the default is 250.
+built; the default is 250. This limit does not take into account the stack that
+may already be used by the calling application. For finer control over
+compilation stack usage, you can set a pointer to an external checking function
+in <b>pcre_stack_guard</b>.
 <pre>
  PCRE_CONFIG_MATCH_LIMIT
 </pre>
@@ -1008,6 +1024,8 @@ have fallen out of use. To avoid confusion, they have not been re-used.
  81  missing opening brace after \o
  82  parentheses are too deeply nested
  83  invalid range in character class
+  84  group name must start with a non-digit
+  85  parentheses are too deeply nested (stack check)
 </pre>
 The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
 be used if the limits were changed when PCRE was built.
@@ -1265,12 +1283,15 @@ information call is provided for internal use by the <b>pcre_study()</b>
 function. External callers can cause PCRE to use its internal tables by passing
 a NULL table pointer.
 <pre>
-  PCRE_INFO_FIRSTBYTE
+  PCRE_INFO_FIRSTBYTE (deprecated)
 </pre>
 Return information about the first data unit of any matched string, for a
-non-anchored pattern. (The name of this option refers to the 8-bit library,
-where data units are bytes.) The fourth argument should point to an <b>int</b>
-variable.
+non-anchored pattern. The name of this option refers to the 8-bit library,
+where data units are bytes. The fourth argument should point to an <b>int</b>
+variable. Negative values are used for special cases. However, this means that
+when the 32-bit library is in non-UTF-32 mode, the full 32-bit range of
+characters cannot be returned. For this reason, this value is deprecated; use
+PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER instead.
 </P>
 <P>
 If there is a fixed first value, for example, the letter "c" from a pattern
@@ -1293,12 +1314,43 @@ starts with "^", or
 -1 is returned, indicating that the pattern matches only at the start of a
 subject string or after any newline within the string. Otherwise -2 is
 returned. For anchored patterns, -2 is returned.
+<pre>
+  PCRE_INFO_FIRSTCHARACTER
+</pre>
+Return the value of the first data unit (non-UTF character) of any matched
+string in the situation where PCRE_INFO_FIRSTCHARACTERFLAGS returns 1;
+otherwise return 0. The fourth argument should point to an <b>uint_t</b>
+variable.
 </P>
 <P>
-Since for the 32-bit library using the non-UTF-32 mode, this function is unable
-to return the full 32-bit range of the character, this value is deprecated;
-instead the PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER values
-should be used.
+In the 8-bit library, the value is always less than 256. In the 16-bit library
+the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
+can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
+<pre>
+  PCRE_INFO_FIRSTCHARACTERFLAGS
+</pre>
+Return information about the first data unit of any matched string, for a
+non-anchored pattern. The fourth argument should point to an <b>int</b>
+variable.
+</P>
+<P>
+If there is a fixed first value, for example, the letter "c" from a pattern
+such as (cat|cow|coyote), 1 is returned, and the character value can be
+retrieved using PCRE_INFO_FIRSTCHARACTER. If there is no fixed first value, and
+if either
+<br>
+<br>
+(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
+starts with "^", or
+<br>
+<br>
+(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
+(if it were set, the pattern would be anchored),
+<br>
+<br>
+2 is returned, indicating that the pattern matches only at the start of a
+subject string or after any newline within the string. Otherwise 0 is
+returned. For anchored patterns, 0 is returned.
 <pre>
  PCRE_INFO_FIRSTTABLE
 </pre>
@@ -1508,44 +1560,6 @@ above). The format of the <i>study_data</i> block is private, but its length
 is made available via this option so that it can be saved and restored (see the
 <a href="pcreprecompile.html"><b>pcreprecompile</b></a>
 documentation for details).
-<pre>
-  PCRE_INFO_FIRSTCHARACTERFLAGS
-</pre>
-Return information about the first data unit of any matched string, for a
-non-anchored pattern. The fourth argument should point to an <b>int</b>
-variable.
-</P>
-<P>
-If there is a fixed first value, for example, the letter "c" from a pattern
-such as (cat|cow|coyote), 1 is returned, and the character value can be
-retrieved using PCRE_INFO_FIRSTCHARACTER.
-</P>
-<P>
-If there is no fixed first value, and if either
-<br>
-<br>
-(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
-starts with "^", or
-<br>
-<br>
-(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
-(if it were set, the pattern would be anchored),
-<br>
-<br>
-2 is returned, indicating that the pattern matches only at the start of a
-subject string or after any newline within the string. Otherwise 0 is
-returned. For anchored patterns, 0 is returned.
-<pre>
-  PCRE_INFO_FIRSTCHARACTER
-</pre>
-Return the fixed first character value in the situation where
-PCRE_INFO_FIRSTCHARACTERFLAGS returns 1; otherwise return 0. The fourth
-argument should point to an <b>uint_t</b> variable.
-</P>
-<P>
-In the 8-bit library, the value is always less than 256. In the 16-bit library
-the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
-can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
 <pre>
  PCRE_INFO_REQUIREDCHARFLAGS
 </pre>
@@ -2899,9 +2913,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC26" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 12 November 2013
+Last updated: 09 February 2014
 <br>
-Copyright &copy; 1997-2013 University of Cambridge.
+Copyright &copy; 1997-2014 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.

--- a/pcre/doc/html/pcregrep.html
+++ b/pcre/doc/html/pcregrep.html
@@ -37,8 +37,10 @@ man page, in case the conversion went wrong.
 <b>pcregrep</b> searches files for character patterns, in the same way as other
 grep commands do, but it uses the PCRE regular expression library to support
 patterns that are compatible with the regular expressions of Perl 5. See
+<a href="pcresyntax.html"><b>pcresyntax</b>(3)</a>
+for a quick-reference summary of pattern syntax, or
 <a href="pcrepattern.html"><b>pcrepattern</b>(3)</a>
-for a full description of syntax and semantics of the regular expressions
+for a full description of the syntax and semantics of the regular expressions
 that PCRE supports.
 </P>
 <P>
@@ -748,9 +750,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC14" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 13 September 2012
+Last updated: 03 April 2014
 <br>
-Copyright &copy; 1997-2012 University of Cambridge.
+Copyright &copy; 1997-2014 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.

--- a/pcre/doc/html/pcrepattern.html
+++ b/pcre/doc/html/pcrepattern.html
@@ -1003,7 +1003,9 @@ matches "foobar", the first substring is still set to "foo".
 <P>
 Perl documents that the use of \K within assertions is "not well defined". In
 PCRE, \K is acted upon when it occurs inside positive assertions, but is
-ignored in negative assertions.
+ignored in negative assertions. Note that when a pattern such as (?=ab\K)
+matches, the reported start of the match can be greater than the end of the
+match.
 <a name="smallassertions"></a></P>
 <br><b>
 Simple assertions
@@ -2990,19 +2992,22 @@ match does not always guarantee that a match must be at this starting point.
 <P>
 Note that (*COMMIT) at the start of a pattern is not the same as an anchor,
 unless PCRE's start-of-match optimizations are turned off, as shown in this
-<b>pcretest</b> example:
+output from <b>pcretest</b>:
 <pre>
    re&#62; /(*COMMIT)abc/
  data&#62; xyzabc
   0: abc
-  xyzabc\Y
+  data&#62; xyzabc\Y
  No match
 </pre>
-PCRE knows that any match must start with "a", so the optimization skips along
-the subject to "a" before running the first match attempt, which succeeds. When
-the optimization is disabled by the \Y escape in the second subject, the match
-starts at "x" and so the (*COMMIT) causes it to fail without trying any other
-starting points.
+For this pattern, PCRE knows that any match must start with "a", so the
+optimization skips along the subject to "a" before applying the pattern to the
+first set of data. The match attempt then succeeds. In the second set of data,
+the escape sequence \Y is interpreted by the <b>pcretest</b> program. It causes
+the PCRE_NO_START_OPTIMIZE option to be set when <b>pcre_exec()</b> is called.
+This disables the optimization that skips along to the first character. The
+pattern is now applied starting at "x", and so the (*COMMIT) causes the match
+to fail without trying any other starting points.
 <pre>
  (*PRUNE) or (*PRUNE:NAME)
 </pre>
@@ -3221,9 +3226,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 December 2013
+Last updated: 08 January 2014
 <br>
-Copyright &copy; 1997-2013 University of Cambridge.
+Copyright &copy; 1997-2014 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.

--- a/pcre/doc/html/pcresyntax.html
+++ b/pcre/doc/html/pcresyntax.html
@@ -29,13 +29,13 @@ man page, in case the conversion went wrong.
 <li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
 <li><a name="TOC15" href="#SEC15">COMMENT</a>
 <li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
-<li><a name="TOC17" href="#SEC17">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
-<li><a name="TOC18" href="#SEC18">BACKREFERENCES</a>
-<li><a name="TOC19" href="#SEC19">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
-<li><a name="TOC20" href="#SEC20">CONDITIONAL PATTERNS</a>
-<li><a name="TOC21" href="#SEC21">BACKTRACKING CONTROL</a>
-<li><a name="TOC22" href="#SEC22">NEWLINE CONVENTIONS</a>
-<li><a name="TOC23" href="#SEC23">WHAT \R MATCHES</a>
+<li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
+<li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
+<li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
+<li><a name="TOC20" href="#SEC20">BACKREFERENCES</a>
+<li><a name="TOC21" href="#SEC21">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC22" href="#SEC22">CONDITIONAL PATTERNS</a>
+<li><a name="TOC23" href="#SEC23">BACKTRACKING CONTROL</a>
 <li><a name="TOC24" href="#SEC24">CALLOUTS</a>
 <li><a name="TOC25" href="#SEC25">SEE ALSO</a>
 <li><a name="TOC26" href="#SEC26">AUTHOR</a>
@@ -339,7 +339,8 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
 <P>
 <pre>
  \K          reset start of match
-</PRE>
+</pre>
+\K is honoured in positive assertions, but ignored in negative ones.
 </P>
 <br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
 <P>
@@ -382,11 +383,13 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
 </pre>
-The following are recognized only at the start of a pattern or after one of the
-newline-setting options with similar syntax:
+The following are recognized only at the very start of a pattern or after one
+of the newline or \R options with similar syntax. More than one of them may
+appear.
 <pre>
  (*LIMIT_MATCH=d) set the match limit to d (decimal number)
  (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
+  (*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS)
  (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
@@ -397,7 +400,28 @@ newline-setting options with similar syntax:
 Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
 limits set by the caller of pcre_exec(), not increase them.
 </P>
-<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
+<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
+<P>
+These are recognized only at the very start of the pattern or after option
+settings with a similar syntax.
+<pre>
+  (*CR)           carriage return only
+  (*LF)           linefeed only
+  (*CRLF)         carriage return followed by linefeed
+  (*ANYCRLF)      all three of the above
+  (*ANY)          any Unicode newline sequence
+</PRE>
+</P>
+<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
+<P>
+These are recognized only at the very start of the pattern or after option
+setting with a similar syntax.
+<pre>
+  (*BSR_ANYCRLF)  CR, LF, or CRLF
+  (*BSR_UNICODE)  any Unicode newline sequence
+</PRE>
+</P>
+<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
 <P>
 <pre>
  (?=...)         positive look ahead
@@ -407,7 +431,7 @@ limits set by the caller of pcre_exec(), not increase them.
 </pre>
 Each top-level branch of a look behind must be of a fixed length.
 </P>
-<br><a name="SEC18" href="#TOC1">BACKREFERENCES</a><br>
+<br><a name="SEC20" href="#TOC1">BACKREFERENCES</a><br>
 <P>
 <pre>
  \n              reference by number (can be ambiguous)
@@ -421,7 +445,7 @@ Each top-level branch of a look behind must be of a fixed length.
  (?P=name)       reference by name (Python)
 </PRE>
 </P>
-<br><a name="SEC19" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<br><a name="SEC21" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
 <P>
 <pre>
  (?R)            recurse whole pattern
@@ -440,7 +464,7 @@ Each top-level branch of a look behind must be of a fixed length.
  \g'-n'          call subpattern by relative number (PCRE extension)
 </PRE>
 </P>
-<br><a name="SEC20" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<br><a name="SEC22" href="#TOC1">CONDITIONAL PATTERNS</a><br>
 <P>
 <pre>
  (?(condition)yes-pattern)
@@ -459,7 +483,7 @@ Each top-level branch of a look behind must be of a fixed length.
  (?(assert)...   assertion condition
 </PRE>
 </P>
-<br><a name="SEC21" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
 The following act immediately they are reached:
 <pre>
@@ -482,27 +506,6 @@ pattern is not anchored.
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
 </PRE>
 </P>
-<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br>
-<P>
-These are recognized only at the very start of the pattern or after a
-(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
-<pre>
-  (*CR)           carriage return only
-  (*LF)           linefeed only
-  (*CRLF)         carriage return followed by linefeed
-  (*ANYCRLF)      all three of the above
-  (*ANY)          any Unicode newline sequence
-</PRE>
-</P>
-<br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br>
-<P>
-These are recognized only at the very start of the pattern or after a
-(*...) option that sets the newline convention or a UTF or UCP mode.
-<pre>
-  (*BSR_ANYCRLF)  CR, LF, or CRLF
-  (*BSR_UNICODE)  any Unicode newline sequence
-</PRE>
-</P>
 <br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
 <P>
 <pre>
@@ -526,9 +529,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 12 November 2013
+Last updated: 08 January 2014
 <br>
-Copyright &copy; 1997-2013 University of Cambridge.
+Copyright &copy; 1997-2014 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.

--- a/pcre/doc/html/pcretest.html
+++ b/pcre/doc/html/pcretest.html
@@ -138,6 +138,9 @@ following options output the value and set the exit code as indicated:
  newline    the default newline setting:
               CR, LF, CRLF, ANYCRLF, or ANY
               exit code is always 0
+  bsr        the default setting for what \R matches:
+               ANYCRLF or ANY
+               exit code is always 0
 </pre>
 The following options output 1 for true or 0 for false, and set the exit code
 to the same value:
@@ -373,6 +376,7 @@ sections.
  <b>/N</b>              set PCRE_NO_AUTO_CAPTURE
  <b>/O</b>              set PCRE_NO_AUTO_POSSESS
  <b>/P</b>              use the POSIX wrapper
+  <b>/Q</b>              test external stack check function
  <b>/S</b>              study the pattern after compilation
  <b>/s</b>              set PCRE_DOTALL
  <b>/T</b>              select character tables
@@ -534,7 +538,10 @@ below.
 The <b>/I</b> modifier requests that <b>pcretest</b> output information about the
 compiled pattern (whether it is anchored, has a fixed first character, and
 so on). It does this by calling <b>pcre[16|32]_fullinfo()</b> after compiling a
-pattern. If the pattern is studied, the results of that are also output.
+pattern. If the pattern is studied, the results of that are also output. In
+this output, the word "char" means a non-UTF character, that is, the value of a
+single data item (8-bit, 16-bit, or 32-bit, depending on the library that is
+being tested).
 </P>
 <P>
 The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking
@@ -568,6 +575,14 @@ successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the
 JIT compiled code is also output.
 </P>
 <P>
+The <b>/Q</b> modifier is used to test the use of <b>pcre_stack_guard</b>. It
+must be followed by '0' or '1', specifying the return code to be given from an
+external function that is passed to PCRE and used for stack checking during
+compilation (see the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation for details).
+</P>
+<P>
 The <b>/S</b> modifier causes <b>pcre[16|32]_study()</b> to be called after the
 expression has been compiled, and the results used when the expression is
 matched. There are a number of qualifying characters that may follow <b>/S</b>.
@@ -1134,9 +1149,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC17" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 12 November 2013
+Last updated: 09 February 2014
 <br>
-Copyright &copy; 1997-2013 University of Cambridge.
+Copyright &copy; 1997-2014 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.

--- a/pcre/doc/pcre.3
+++ b/pcre/doc/pcre.3
-.TH PCRE 3 "01 Oct 2013" "PCRE 8.33"
+.TH PCRE 3 "08 January 2014" "PCRE 8.35"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH INTRODUCTION
@@ -158,8 +158,11 @@ page.
 The user documentation for PCRE comprises a number of different sections. In
 the "man" format, each of these is a separate "man page". In the HTML format,
 each is a separate page, linked from the index page. In the plain text format,
-all the sections, except the \fBpcredemo\fP section, are concatenated, for ease
-of searching. The sections are as follows:
+the descriptions of the \fBpcregrep\fP and \fBpcretest\fP programs are in files
+called \fBpcregrep.txt\fP and \fBpcretest.txt\fP, respectively. The remaining
+sections, except for the \fBpcredemo\fP section (which is a program listing),
+are concatenated in \fBpcre.txt\fP, for ease of searching. The sections are as
+follows:
 .sp
  pcre              this document
  pcre-config       show PCRE installation configuration information
@@ -188,8 +191,8 @@ of searching. The sections are as follows:
  pcretest          description of the \fBpcretest\fP testing command
  pcreunicode       discussion of Unicode and UTF-8/16/32 support
 .sp
-In addition, in the "man" and HTML formats, there is a short page for each
-C library function, listing its arguments and results.
+In the "man" and HTML formats, there is also a short page for each C library
+function, listing its arguments and results.
 .
 .
 .SH AUTHOR
@@ -210,6 +213,6 @@ two digits 10, at the domain cam.ac.uk.
 .rs
 .sp
 .nf
-Last updated: 13 May 2013
-Copyright (c) 1997-2013 University of Cambridge.
+Last updated: 08 January 2014
+Copyright (c) 1997-2014 University of Cambridge.
 .fi
--- a/pcre/doc/pcre.txt
+++ b/pcre/doc/pcre.txt
--- a/pcre/doc/pcreapi.3
+++ b/pcre/doc/pcreapi.3
-.TH PCREAPI 3 "12 November 2013" "PCRE 8.34"
+.TH PCREAPI 3 "09 February 2014" "PCRE 8.35"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -116,6 +116,8 @@ PCRE - Perl-compatible regular expressions
 .B void (*pcre_stack_free)(void *);
 .sp
 .B int (*pcre_callout)(pcre_callout_block *);
+.sp
+.B int (*pcre_stack_guard)(void);
 .fi
 .
 .
@@ -286,6 +288,14 @@ points during a matching operation. Details are given in the
 \fBpcrecallout\fP
 .\"
 documentation.
+.P
+The global variable \fBpcre_stack_guard\fP initially contains NULL. It can be
+set by the caller to a function that is called by PCRE whenever it starts
+to compile a parenthesized part of a pattern. When parentheses are nested, PCRE
+uses recursive function calls, which use up the system stack. This function is
+provided so that applications with restricted stacks can force a compilation
+error if the stack runs out. The function should return zero if all is well, or
+non-zero to force an error.
 .
 .
 .\" HTML <a name="newlines"></a>
@@ -337,7 +347,8 @@ controlled in a similar way, but by separate options.
 The PCRE functions can be used in multi-threading applications, with the
 proviso that the memory management functions pointed to by \fBpcre_malloc\fP,
 \fBpcre_free\fP, \fBpcre_stack_malloc\fP, and \fBpcre_stack_free\fP, and the
-callout function pointed to by \fBpcre_callout\fP, are shared by all threads.
+callout and stack-checking functions pointed to by \fBpcre_callout\fP and
+\fBpcre_stack_guard\fP, are shared by all threads.
 .P
 The compiled form of a regular expression is not altered during matching, so
 the same compiled pattern can safely be used by several threads at once.
@@ -465,7 +476,10 @@ documentation.
 The output is a long integer that gives the maximum depth of nesting of
 parentheses (of any kind) in a pattern. This limit is imposed to cap the amount
 of system stack used when a pattern is compiled. It is specified when PCRE is
-built; the default is 250.
+built; the default is 250. This limit does not take into account the stack that
+may already be used by the calling application. For finer control over
+compilation stack usage, you can set a pointer to an external checking function
+in \fBpcre_stack_guard\fP.
 .sp
  PCRE_CONFIG_MATCH_LIMIT
 .sp
@@ -991,6 +1005,8 @@ have fallen out of use. To avoid confusion, they have not been re-used.
  81  missing opening brace after \eo
  82  parentheses are too deeply nested
  83  invalid range in character class
+  84  group name must start with a non-digit
+  85  parentheses are too deeply nested (stack check)
 .sp
 The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
 be used if the limits were changed when PCRE was built.
@@ -1248,12 +1264,15 @@ information call is provided for internal use by the \fBpcre_study()\fP
 function. External callers can cause PCRE to use its internal tables by passing
 a NULL table pointer.
 .sp
-  PCRE_INFO_FIRSTBYTE
+  PCRE_INFO_FIRSTBYTE (deprecated)
 .sp
 Return information about the first data unit of any matched string, for a
-non-anchored pattern. (The name of this option refers to the 8-bit library,
-where data units are bytes.) The fourth argument should point to an \fBint\fP
-variable.
+non-anchored pattern. The name of this option refers to the 8-bit library,
+where data units are bytes. The fourth argument should point to an \fBint\fP
+variable. Negative values are used for special cases. However, this means that
+when the 32-bit library is in non-UTF-32 mode, the full 32-bit range of
+characters cannot be returned. For this reason, this value is deprecated; use
+PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER instead.
 .P
 If there is a fixed first value, for example, the letter "c" from a pattern
 such as (cat|cow|coyote), its value is returned. In the 8-bit library, the
@@ -1271,11 +1290,38 @@ starts with "^", or
 -1 is returned, indicating that the pattern matches only at the start of a
 subject string or after any newline within the string. Otherwise -2 is
 returned. For anchored patterns, -2 is returned.
+.sp
+  PCRE_INFO_FIRSTCHARACTER
+.sp
+Return the value of the first data unit (non-UTF character) of any matched
+string in the situation where PCRE_INFO_FIRSTCHARACTERFLAGS returns 1;
+otherwise return 0. The fourth argument should point to an \fBuint_t\fP
+variable.
 .P
-Since for the 32-bit library using the non-UTF-32 mode, this function is unable
-to return the full 32-bit range of the character, this value is deprecated;
-instead the PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER values
-should be used.
+In the 8-bit library, the value is always less than 256. In the 16-bit library
+the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
+can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
+.sp
+  PCRE_INFO_FIRSTCHARACTERFLAGS
+.sp
+Return information about the first data unit of any matched string, for a
+non-anchored pattern. The fourth argument should point to an \fBint\fP
+variable.
+.P
+If there is a fixed first value, for example, the letter "c" from a pattern
+such as (cat|cow|coyote), 1 is returned, and the character value can be
+retrieved using PCRE_INFO_FIRSTCHARACTER. If there is no fixed first value, and
+if either
+.sp
+(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
+starts with "^", or
+.sp
+(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
+(if it were set, the pattern would be anchored),
+.sp
+2 is returned, indicating that the pattern matches only at the start of a
+subject string or after any newline within the string. Otherwise 0 is
+returned. For anchored patterns, 0 is returned.
 .sp
  PCRE_INFO_FIRSTTABLE
 .sp
@@ -1498,38 +1544,6 @@ is made available via this option so that it can be saved and restored (see the
 \fBpcreprecompile\fP
 .\"
 documentation for details).
-.sp
-  PCRE_INFO_FIRSTCHARACTERFLAGS
-.sp
-Return information about the first data unit of any matched string, for a
-non-anchored pattern. The fourth argument should point to an \fBint\fP
-variable.
-.P
-If there is a fixed first value, for example, the letter "c" from a pattern
-such as (cat|cow|coyote), 1 is returned, and the character value can be
-retrieved using PCRE_INFO_FIRSTCHARACTER.
-.P
-If there is no fixed first value, and if either
-.sp
-(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
-starts with "^", or
-.sp
-(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
-(if it were set, the pattern would be anchored),
-.sp
-2 is returned, indicating that the pattern matches only at the start of a
-subject string or after any newline within the string. Otherwise 0 is
-returned. For anchored patterns, 0 is returned.
-.sp
-  PCRE_INFO_FIRSTCHARACTER
-.sp
-Return the fixed first character value in the situation where
-PCRE_INFO_FIRSTCHARACTERFLAGS returns 1; otherwise return 0. The fourth
-argument should point to an \fBuint_t\fP variable.
-.P
-In the 8-bit library, the value is always less than 256. In the 16-bit library
-the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
-can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
 .sp
  PCRE_INFO_REQUIREDCHARFLAGS
 .sp
@@ -2900,6 +2914,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 12 November 2013
-Copyright (c) 1997-2013 University of Cambridge.
+Last updated: 09 February 2014
+Copyright (c) 1997-2014 University of Cambridge.
 .fi
--- a/pcre/doc/pcregrep.1
+++ b/pcre/doc/pcregrep.1
-.TH PCREGREP 1 "13 September 2012" "PCRE 8.32"
+.TH PCREGREP 1 "03 April 2014" "PCRE 8.35"
 .SH NAME
 pcregrep - a grep with Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -11,9 +11,13 @@ pcregrep - a grep with Perl-compatible regular expressions.
 grep commands do, but it uses the PCRE regular expression library to support
 patterns that are compatible with the regular expressions of Perl 5. See
 .\" HREF
+\fBpcresyntax\fP(3)
+.\"
+for a quick-reference summary of pattern syntax, or
+.\" HREF
 \fBpcrepattern\fP(3)
 .\"
-for a full description of syntax and semantics of the regular expressions
+for a full description of the syntax and semantics of the regular expressions
 that PCRE supports.
 .P
 Patterns, whether supplied on the command line or in a separate file, are given
@@ -674,6 +678,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 13 September 2012
-Copyright (c) 1997-2012 University of Cambridge.
+Last updated: 03 April 2014
+Copyright (c) 1997-2014 University of Cambridge.
 .fi
--- a/pcre/doc/pcregrep.txt
+++ b/pcre/doc/pcregrep.txt
--- a/pcre/doc/pcrepattern.3
+++ b/pcre/doc/pcrepattern.3
-.TH PCREPATTERN 3 "03 December 2013" "PCRE 8.34"
+.TH PCREPATTERN 3 "08 January 2014" "PCRE 8.35"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -1004,7 +1004,9 @@ matches "foobar", the first substring is still set to "foo".
 .P
 Perl documents that the use of \eK within assertions is "not well defined". In
 PCRE, \eK is acted upon when it occurs inside positive assertions, but is
-ignored in negative assertions.
+ignored in negative assertions. Note that when a pattern such as (?=ab\eK)
+matches, the reported start of the match can be greater than the end of the
+match.
 .
 .
 .\" HTML <a name="smallassertions"></a>
@@ -3028,19 +3030,22 @@ match does not always guarantee that a match must be at this starting point.
 .P
 Note that (*COMMIT) at the start of a pattern is not the same as an anchor,
 unless PCRE's start-of-match optimizations are turned off, as shown in this
-\fBpcretest\fP example:
+output from \fBpcretest\fP:
 .sp
    re> /(*COMMIT)abc/
  data> xyzabc
   0: abc
-  xyzabc\eY
+  data> xyzabc\eY
  No match
 .sp
-PCRE knows that any match must start with "a", so the optimization skips along
-the subject to "a" before running the first match attempt, which succeeds. When
-the optimization is disabled by the \eY escape in the second subject, the match
-starts at "x" and so the (*COMMIT) causes it to fail without trying any other
-starting points.
+For this pattern, PCRE knows that any match must start with "a", so the
+optimization skips along the subject to "a" before applying the pattern to the
+first set of data. The match attempt then succeeds. In the second set of data,
+the escape sequence \eY is interpreted by the \fBpcretest\fP program. It causes
+the PCRE_NO_START_OPTIMIZE option to be set when \fBpcre_exec()\fP is called.
+This disables the optimization that skips along to the first character. The
+pattern is now applied starting at "x", and so the (*COMMIT) causes the match
+to fail without trying any other starting points.
 .sp
  (*PRUNE) or (*PRUNE:NAME)
 .sp
@@ -3255,6 +3260,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 03 December 2013
-Copyright (c) 1997-2013 University of Cambridge.
+Last updated: 08 January 2014
+Copyright (c) 1997-2014 University of Cambridge.
 .fi
--- a/pcre/doc/pcresyntax.3
+++ b/pcre/doc/pcresyntax.3
-.TH PCRESYNTAX 3 "12 November 2013" "PCRE 8.34"
+.TH PCRESYNTAX 3 "08 January 2014" "PCRE 8.35"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -309,6 +309,8 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
 .rs
 .sp
  \eK          reset start of match
+.sp
+\eK is honoured in positive assertions, but ignored in negative ones.
 .
 .
 .SH "ALTERNATION"
@@ -354,11 +356,13 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
 .sp
-The following are recognized only at the start of a pattern or after one of the
-newline-setting options with similar syntax:
+The following are recognized only at the very start of a pattern or after one
+of the newline or \eR options with similar syntax. More than one of them may
+appear.
 .sp
  (*LIMIT_MATCH=d) set the match limit to d (decimal number)
  (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
+  (*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS)
  (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
@@ -370,6 +374,29 @@ Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
 limits set by the caller of pcre_exec(), not increase them.
 .
 .
+.SH "NEWLINE CONVENTION"
+.rs
+.sp
+These are recognized only at the very start of the pattern or after option
+settings with a similar syntax.
+.sp
+  (*CR)           carriage return only
+  (*LF)           linefeed only
+  (*CRLF)         carriage return followed by linefeed
+  (*ANYCRLF)      all three of the above
+  (*ANY)          any Unicode newline sequence
+.
+.
+.SH "WHAT \eR MATCHES"
+.rs
+.sp
+These are recognized only at the very start of the pattern or after option
+setting with a similar syntax.
+.sp
+  (*BSR_ANYCRLF)  CR, LF, or CRLF
+  (*BSR_UNICODE)  any Unicode newline sequence
+.
+.
 .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
 .rs
 .sp
@@ -457,29 +484,6 @@ pattern is not anchored.
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
 .
 .
-.SH "NEWLINE CONVENTIONS"
-.rs
-.sp
-These are recognized only at the very start of the pattern or after a
-(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
-.sp
-  (*CR)           carriage return only
-  (*LF)           linefeed only
-  (*CRLF)         carriage return followed by linefeed
-  (*ANYCRLF)      all three of the above
-  (*ANY)          any Unicode newline sequence
-.
-.
-.SH "WHAT \eR MATCHES"
-.rs
-.sp
-These are recognized only at the very start of the pattern or after a
-(*...) option that sets the newline convention or a UTF or UCP mode.
-.sp
-  (*BSR_ANYCRLF)  CR, LF, or CRLF
-  (*BSR_UNICODE)  any Unicode newline sequence
-.
-.
 .SH "CALLOUTS"
 .rs
 .sp
@@ -508,6 +512,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 12 November 2013
-Copyright (c) 1997-2013 University of Cambridge.
+Last updated: 08 January 2014
+Copyright (c) 1997-2014 University of Cambridge.
 .fi
--- a/pcre/doc/pcretest.1
+++ b/pcre/doc/pcretest.1
-.TH PCRETEST 1 "12 November 2013" "PCRE 8.34"
+.TH PCRETEST 1 "09 February 2014" "PCRE 8.35"
 .SH NAME
 pcretest - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -113,6 +113,9 @@ following options output the value and set the exit code as indicated:
  newline    the default newline setting:
               CR, LF, CRLF, ANYCRLF, or ANY
               exit code is always 0
+  bsr        the default setting for what \eR matches:
+               ANYCRLF or ANY
+               exit code is always 0
 .sp
 The following options output 1 for true or 0 for false, and set the exit code
 to the same value:
@@ -330,6 +333,7 @@ sections.
  \fB/N\fP              set PCRE_NO_AUTO_CAPTURE
  \fB/O\fP              set PCRE_NO_AUTO_POSSESS
  \fB/P\fP              use the POSIX wrapper
+  \fB/Q\fP              test external stack check function
  \fB/S\fP              study the pattern after compilation
  \fB/s\fP              set PCRE_DOTALL
  \fB/T\fP              select character tables
@@ -483,7 +487,10 @@ below.
 The \fB/I\fP modifier requests that \fBpcretest\fP output information about the
 compiled pattern (whether it is anchored, has a fixed first character, and
 so on). It does this by calling \fBpcre[16|32]_fullinfo()\fP after compiling a
-pattern. If the pattern is studied, the results of that are also output.
+pattern. If the pattern is studied, the results of that are also output. In
+this output, the word "char" means a non-UTF character, that is, the value of a
+single data item (8-bit, 16-bit, or 32-bit, depending on the library that is
+being tested).
 .P
 The \fB/K\fP modifier requests \fBpcretest\fP to show names from backtracking
 control verbs that are returned from calls to \fBpcre[16|32]_exec()\fP. It causes
@@ -513,6 +520,15 @@ the compiled pattern to be output. This does not include the size of the
 successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the
 JIT compiled code is also output.
 .P
+The \fB/Q\fP modifier is used to test the use of \fBpcre_stack_guard\fP. It
+must be followed by '0' or '1', specifying the return code to be given from an
+external function that is passed to PCRE and used for stack checking during
+compilation (see the
+.\" HREF
+\fBpcreapi\fP
+.\"
+documentation for details).
+.P
 The \fB/S\fP modifier causes \fBpcre[16|32]_study()\fP to be called after the
 expression has been compiled, and the results used when the expression is
 matched. There are a number of qualifying characters that may follow \fB/S\fP.
@@ -1135,6 +1151,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 12 November 2013
-Copyright (c) 1997-2013 University of Cambridge.
+Last updated: 09 February 2014
+Copyright (c) 1997-2014 University of Cambridge.
 .fi
--- a/pcre/doc/pcretest.txt
+++ b/pcre/doc/pcretest.txt
--- a/pcre/maria-patches/pcre_stack_guard.diff
+++ b/pcre/maria-patches/pcre_stack_guard.diff
-=== modified file 'pcre/pcre.h.in'
--- pcre/pcre.h.in	2013-09-26 14:02:17 +0000
-+++ pcre/pcre.h.in	2013-10-02 07:58:29 +0000
-@@ -486,6 +486,7 @@ PCRE_EXP_DECL void  (*pcre_free)(void *)
- PCRE_EXP_DECL void *(*pcre_stack_malloc)(size_t);
- PCRE_EXP_DECL void  (*pcre_stack_free)(void *);
- PCRE_EXP_DECL int   (*pcre_callout)(pcre_callout_block *);
-+PCRE_EXP_DECL int   (*pcre_stack_guard)(void);
- 
- PCRE_EXP_DECL void *(*pcre16_malloc)(size_t);
- PCRE_EXP_DECL void  (*pcre16_free)(void *);
-@@ -504,6 +505,7 @@ PCRE_EXP_DECL void  pcre_free(void *);
- PCRE_EXP_DECL void *pcre_stack_malloc(size_t);
- PCRE_EXP_DECL void  pcre_stack_free(void *);
- PCRE_EXP_DECL int   pcre_callout(pcre_callout_block *);
-+PCRE_EXP_DECL int   pcre_stack_guard(void);
- 
- PCRE_EXP_DECL void *pcre16_malloc(size_t);
- PCRE_EXP_DECL void  pcre16_free(void *);
-
-=== modified file 'pcre/pcre_compile.c'
--- pcre/pcre_compile.c	2013-09-26 14:02:17 +0000
-+++ pcre/pcre_compile.c	2013-10-02 07:58:29 +0000
-@@ -7107,6 +7107,12 @@ unsigned int orig_bracount;
- unsigned int max_bracount;
- branch_chain bc;
- 
-+if (pcre_stack_guard && pcre_stack_guard())
-+{
-+  *errorcodeptr= ERR23;
-+  return FALSE;
-+}
-+ 
- bc.outer = bcptr;
- bc.current_branch = code;
- 
-
-=== modified file 'pcre/pcre_globals.c'
--- pcre/pcre_globals.c	2013-09-26 14:02:17 +0000
-+++ pcre/pcre_globals.c	2013-10-02 07:58:29 +0000
-@@ -72,6 +72,7 @@ PCRE_EXP_DATA_DEFN void  (*PUBL(free))(v
- PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = LocalPcreMalloc;
- PCRE_EXP_DATA_DEFN void  (*PUBL(stack_free))(void *) = LocalPcreFree;
- PCRE_EXP_DATA_DEFN int   (*PUBL(callout))(PUBL(callout_block) *) = NULL;
-+PCRE_EXP_DATA_DEFN int   (*PUBL(stack_guard))(void) = NULL;
- 
- #elif !defined VPCOMPAT
- PCRE_EXP_DATA_DEFN void *(*PUBL(malloc))(size_t) = malloc;
-@@ -79,6 +80,7 @@ PCRE_EXP_DATA_DEFN void  (*PUBL(free))(v
- PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = malloc;
- PCRE_EXP_DATA_DEFN void  (*PUBL(stack_free))(void *) = free;
- PCRE_EXP_DATA_DEFN int   (*PUBL(callout))(PUBL(callout_block) *) = NULL;
-+PCRE_EXP_DATA_DEFN int   (*PUBL(stack_guard))(void) = NULL;
- #endif
- 
- /* End of pcre_globals.c */
-
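
Reviewer note: the pcreapi.3 hunk above describes `pcre_stack_guard` as a global function pointer, initially NULL, that is consulted whenever compilation enters a parenthesized part of a pattern and that returns non-zero to force an error. The removed MariaDB patch performed the same check at the top of the recursive compile routine. The following is a minimal, self-contained sketch of that mechanism only. It does not use or link against PCRE; every name in it (`stack_guard`, `depth_limited_guard`, `walk_group`, `check_pattern`, the error constants) is hypothetical and chosen for illustration, and the depth counter stands in for whatever real stack measurement an application would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pcre_stack_guard global.
   NULL means "no external check", as in the documentation above. */
static int (*stack_guard)(void) = NULL;

/* Illustrative guard state: a real guard would more likely compare the
   current stack address against a limit known to the application. */
static int guard_depth = 0;
static int guard_limit = 0;

/* Returns zero if all is well, non-zero to force an error. */
static int depth_limited_guard(void)
{
  return guard_depth > guard_limit;
}

/* Hypothetical result codes for this sketch (not PCRE error numbers). */
enum { OK = 0, ERR_UNBALANCED = 1, ERR_STACK_CHECK = 2 };

/* Recursively walks one group. The guard is consulted on entry, before
   any further recursion, which mirrors where the removed patch placed
   its check. */
static int walk_group(const char **p)
{
  int rc = OK;

  guard_depth++;
  if (stack_guard != NULL && stack_guard() != 0)
    rc = ERR_STACK_CHECK;

  while (rc == OK && **p != '\0' && **p != ')')
  {
    if (**p == '(')
    {
      (*p)++;                       /* step over '(' */
      rc = walk_group(p);           /* nested group: recurse */
      if (rc == OK)
      {
        if (**p != ')') rc = ERR_UNBALANCED;
        else (*p)++;                /* step over ')' */
      }
    }
    else (*p)++;                    /* ordinary character */
  }

  guard_depth--;
  return rc;
}

/* Checks a whole pattern; the top level counts as depth 1. */
static int check_pattern(const char *pattern)
{
  assert(pattern != NULL);
  const char *p = pattern;
  guard_depth = 0;
  int rc = walk_group(&p);
  if (rc == OK && *p == ')') rc = ERR_UNBALANCED;  /* stray ')' */
  return rc;
}
```

With `stack_guard` left as NULL, `check_pattern("((a))")` completes normally. After setting `stack_guard = depth_limited_guard` and `guard_limit = 2`, the same pattern is rejected at the third level of recursion, while `"(a)(b)"` still passes because its groups are siblings rather than nested. In PCRE 8.35 itself, the corresponding failure is reported as error 85, "parentheses are too deeply nested (stack check)", per the error list in the pcreapi.3 hunk.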
--- a/pcre/pcre.h.in
+++ b/pcre/pcre.h.in
@@ -5,7 +5,7 @@
 /* This is the public header file for the PCRE library, to be #included by
 applications that call the PCRE functions.

-           Copyright (c) 1997-2013 University of Cambridge
+           Copyright (c) 1997-2014 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -498,12 +498,14 @@ PCRE_EXP_DECL void  (*pcre16_free)(void *);
 PCRE_EXP_DECL void *(*pcre16_stack_malloc)(size_t);
 PCRE_EXP_DECL void  (*pcre16_stack_free)(void *);
 PCRE_EXP_DECL int   (*pcre16_callout)(pcre16_callout_block *);
+PCRE_EXP_DECL int   (*pcre16_stack_guard)(void);

 PCRE_EXP_DECL void *(*pcre32_malloc)(size_t);
 PCRE_EXP_DECL void  (*pcre32_free)(void *);
 PCRE_EXP_DECL void *(*pcre32_stack_malloc)(size_t);
 PCRE_EXP_DECL void  (*pcre32_stack_free)(void *);
 PCRE_EXP_DECL int   (*pcre32_callout)(pcre32_callout_block *);
+PCRE_EXP_DECL int   (*pcre32_stack_guard)(void);
 #else   /* VPCOMPAT */
 PCRE_EXP_DECL void *pcre_malloc(size_t);
 PCRE_EXP_DECL void  pcre_free(void *);
@@ -517,12 +519,14 @@ PCRE_EXP_DECL void  pcre16_free(void *);
 PCRE_EXP_DECL void *pcre16_stack_malloc(size_t);
 PCRE_EXP_DECL void  pcre16_stack_free(void *);
 PCRE_EXP_DECL int   pcre16_callout(pcre16_callout_block *);
+PCRE_EXP_DECL int   pcre16_stack_guard(void);

 PCRE_EXP_DECL void *pcre32_malloc(size_t);
 PCRE_EXP_DECL void  pcre32_free(void *);
 PCRE_EXP_DECL void *pcre32_stack_malloc(size_t);
 PCRE_EXP_DECL void  pcre32_stack_free(void *);
 PCRE_EXP_DECL int   pcre32_callout(pcre32_callout_block *);
+PCRE_EXP_DECL int   pcre32_stack_guard(void);
 #endif  /* VPCOMPAT */

 /* User defined callback which provides a stack just before the match starts. */

--- a/pcre/pcre_byte_order.c
+++ b/pcre/pcre_byte_order.c
@@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
-           Copyright (c) 1997-2013 University of Cambridge
+           Copyright (c) 1997-2014 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -311,9 +311,9 @@ while(TRUE)
  ptr++;
  }
 /* Control should never reach here in 16/32 bit mode. */
-#endif /* !COMPILE_PCRE8 */
-
+#else  /* In 8-bit mode, the pattern does not need to be processed. */
 return 0;
+#endif /* !COMPILE_PCRE8 */
 }

 /* End of pcre_byte_order.c */
--- a/pcre/pcre_compile.c
+++ b/pcre/pcre_compile.c
--- a/pcre/pcre_dfa_exec.c
+++ b/pcre/pcre_dfa_exec.c
@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language (but see
 below for why this module is different).

                       Written by Philip Hazel
-           Copyright (c) 1997-2013 University of Cambridge
+           Copyright (c) 1997-2014 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -1473,7 +1473,7 @@ for (;;)
          goto ANYNL01;

          case CHAR_CR:
-          if (ptr + 1 < end_subject && RAWUCHARTEST(ptr + 1) == CHAR_LF) ncount = 1;
+          if (ptr + 1 < end_subject && UCHAR21TEST(ptr + 1) == CHAR_LF) ncount = 1;
          /* Fall through */

          ANYNL01:
@@ -1742,7 +1742,7 @@ for (;;)
          goto ANYNL02;

          case CHAR_CR:
-          if (ptr + 1 < end_subject && RAWUCHARTEST(ptr + 1) == CHAR_LF) ncount = 1;
+          if (ptr + 1 < end_subject && UCHAR21TEST(ptr + 1) == CHAR_LF) ncount = 1;
          /* Fall through */

          ANYNL02:
@@ -2012,7 +2012,7 @@ for (;;)
          goto ANYNL03;

          case CHAR_CR:
-          if (ptr + 1 < end_subject && RAWUCHARTEST(ptr + 1) == CHAR_LF) ncount = 1;
+          if (ptr + 1 < end_subject && UCHAR21TEST(ptr + 1) == CHAR_LF) ncount = 1;
          /* Fall through */

          ANYNL03:
@@ -2210,7 +2210,7 @@ for (;;)
          if ((md->moptions & PCRE_PARTIAL_HARD) != 0)
            reset_could_continue = TRUE;
          }
-        else if (RAWUCHARTEST(ptr + 1) == CHAR_LF)
+        else if (UCHAR21TEST(ptr + 1) == CHAR_LF)
          {
          ADD_NEW_DATA(-(state_offset + 1), 0, 1);
          }
@@ -3466,7 +3466,7 @@ for (;;)

    if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0)
      {
-      /* Advance to a known first char. */
+      /* Advance to a known first pcre_uchar (i.e. data item) */

      if (has_first_char)
        {
@@ -3474,12 +3474,12 @@ for (;;)
          {
          pcre_uchar csc;
          while (current_subject < end_subject &&
-                 (csc = RAWUCHARTEST(current_subject)) != first_char && csc != first_char2)
+                 (csc = UCHAR21TEST(current_subject)) != first_char && csc != first_char2)
            current_subject++;
          }
        else
          while (current_subject < end_subject &&
-                 RAWUCHARTEST(current_subject) != first_char)
+                 UCHAR21TEST(current_subject) != first_char)
            current_subject++;
        }

@@ -3509,36 +3509,26 @@ for (;;)
          ANYCRLF, and we are now at a LF, advance the match position by one
          more character. */

-          if (RAWUCHARTEST(current_subject - 1) == CHAR_CR &&
+          if (UCHAR21TEST(current_subject - 1) == CHAR_CR &&
               (md->nltype == NLTYPE_ANY || md->nltype == NLTYPE_ANYCRLF) &&
               current_subject < end_subject &&
-               RAWUCHARTEST(current_subject) == CHAR_NL)
+               UCHAR21TEST(current_subject) == CHAR_NL)
            current_subject++;
          }
        }

-      /* Or to a non-unique first char after study */
+      /* Advance to a non-unique first pcre_uchar after study */

      else if (start_bits != NULL)
        {
        while (current_subject < end_subject)
          {
-          register pcre_uint32 c = RAWUCHARTEST(current_subject);
+          register pcre_uint32 c = UCHAR21TEST(current_subject);
 #ifndef COMPILE_PCRE8
          if (c > 255) c = 255;
 #endif
-          if ((start_bits[c/8] & (1 << (c&7))) == 0)
-            {
-            current_subject++;
-#if defined SUPPORT_UTF && defined COMPILE_PCRE8
-            /* In non 8-bit mode, the iteration will stop for
-            characters > 255 at the beginning or not stop at all. */
-            if (utf)
-              ACROSSCHAR(current_subject < end_subject, *current_subject,
-                current_subject++);
-#endif
-            }
-          else break;
+          if ((start_bits[c/8] & (1 << (c&7))) != 0) break;
+          current_subject++;
          }
        }
      }
@@ -3557,19 +3547,20 @@ for (;;)
      /* If the pattern was studied, a minimum subject length may be set. This
      is a lower bound; no actual string of that length may actually match the
      pattern. Although the value is, strictly, in characters, we treat it as
-      bytes to avoid spending too much time in this optimization. */
+      in pcre_uchar units to avoid spending too much time in this optimization.
+      */

      if (study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0 &&
          (pcre_uint32)(end_subject - current_subject) < study->minlength)
        return PCRE_ERROR_NOMATCH;

-      /* If req_char is set, we know that that character must appear in the
-      subject for the match to succeed. If the first character is set, req_char
-      must be later in the subject; otherwise the test starts at the match
-      point. This optimization can save a huge amount of work in patterns with
-      nested unlimited repeats that aren't going to match. Writing separate
-      code for cased/caseless versions makes it go faster, as does using an
-      autoincrement and backing off on a match.
+      /* If req_char is set, we know that that pcre_uchar must appear in the
+      subject for the match to succeed. If the first pcre_uchar is set,
+      req_char must be later in the subject; otherwise the test starts at the
+      match point. This optimization can save a huge amount of work in patterns
+      with nested unlimited repeats that aren't going to match. Writing
+      separate code for cased/caseless versions makes it go faster, as does
+      using an autoincrement and backing off on a match.

      HOWEVER: when the subject string is very, very long, searching to its end
      can take a long time, and give bad performance on quite ordinary
@@ -3589,7 +3580,7 @@ for (;;)
            {
            while (p < end_subject)
              {
-              register pcre_uint32 pp = RAWUCHARINCTEST(p);
+              register pcre_uint32 pp = UCHAR21INCTEST(p);
              if (pp == req_char || pp == req_char2) { p--; break; }
              }
            }
@@ -3597,18 +3588,18 @@ for (;;)
            {
            while (p < end_subject)
              {
-              if (RAWUCHARINCTEST(p) == req_char) { p--; break; }
+              if (UCHAR21INCTEST(p) == req_char) { p--; break; }
              }
            }

-          /* If we can't find the required character, break the matching loop,
+          /* If we can't find the required pcre_uchar, break the matching loop,
          which will cause a return or PCRE_ERROR_NOMATCH. */

          if (p >= end_subject) break;

-          /* If we have found the required character, save the point where we
+          /* If we have found the required pcre_uchar, save the point where we
          found it, so that we don't search again next time round the loop if
-          the start hasn't passed this character yet. */
+          the start hasn't passed this point yet. */

          req_char_ptr = p;
          }
@@ -3665,9 +3656,9 @@ for (;;)
  not contain any explicit matches for \r or \n, and the newline option is CRLF
  or ANY or ANYCRLF, advance the match position by one more character. */

-  if (RAWUCHARTEST(current_subject - 1) == CHAR_CR &&
+  if (UCHAR21TEST(current_subject - 1) == CHAR_CR &&
      current_subject < end_subject &&
-      RAWUCHARTEST(current_subject) == CHAR_NL &&
+      UCHAR21TEST(current_subject) == CHAR_NL &&
      (re->flags & PCRE_HASCRORLF) == 0 &&
        (md->nltype == NLTYPE_ANY ||
         md->nltype == NLTYPE_ANYCRLF ||
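
Note on the start_bits hunk in this file (@@ -3509,36 +3509,26 @@): the loop now steps one pcre_uchar at a time and stops at the first unit whose bit is set in the 256-bit map. The UTF-8 ACROSSCHAR skip is gone. A standalone sketch of that simplified scan is below, assuming a plain array of 32-bit units in place of the subject pointer. set_start_bit and skip_to_start are hypothetical helper names, not PCRE functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marks code unit value c (0..255) as a possible first unit. */
static void set_start_bit(uint8_t bits[32], unsigned c)
{
  assert(c < 256);
  bits[c / 8] |= (uint8_t)(1u << (c & 7));
}

/* Returns the index of the first unit in subject[0..len) whose bit is set
   in the 256-bit map, or len if none is. Values above 255 are clamped to
   255, as the hunk does outside 8-bit mode. */
static size_t skip_to_start(const uint32_t *subject, size_t len,
                            const uint8_t bits[32])
{
  size_t i = 0;
  while (i < len)
    {
    uint32_t c = subject[i];
    if (c > 255) c = 255;
    if ((bits[c / 8] & (1u << (c & 7))) != 0) break;
    i++;
    }
  return i;
}
```

Because of the clamp, every unit above 255 shares bit 255. The scan therefore stops at such a unit only if that bit is set.
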

--- a/pcre/pcre_exec.c
+++ b/pcre/pcre_exec.c
--- a/pcre/pcre_globals.c
+++ b/pcre/pcre_globals.c
@@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
-           Copyright (c) 1997-2012 University of Cambridge
+           Copyright (c) 1997-2014 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without

--- a/pcre/pcre_internal.h
+++ b/pcre/pcre_internal.h
--- a/pcre/pcre_jit_compile.c
+++ b/pcre/pcre_jit_compile.c
--- a/pcre/pcre_jit_test.c
+++ b/pcre/pcre_jit_test.c
--- a/pcre/pcre_printint.c
+++ b/pcre/pcre_printint.c
@@ -644,7 +644,9 @@ for(;;)
      int i;
      unsigned int min, max;
      BOOL printmap;
+      BOOL invertmap = FALSE;
      pcre_uint8 *map;
+      pcre_uint8 inverted_map[32];

      fprintf(f, "    [");

@@ -653,7 +655,12 @@ for(;;)
        extra = GET(code, 1);
        ccode = code + LINK_SIZE + 1;
        printmap = (*ccode & XCL_MAP) != 0;
-        if ((*ccode++ & XCL_NOT) != 0) fprintf(f, "^");
+        if ((*ccode & XCL_NOT) != 0)
+          {
+          invertmap = (*ccode & XCL_HASPROP) == 0;
+          fprintf(f, "^");
+          }
+        ccode++;
        }
      else
        {
@@ -666,6 +673,12 @@ for(;;)
      if (printmap)
        {
        map = (pcre_uint8 *)ccode;
+        if (invertmap)
+          {
+          for (i = 0; i < 32; i++) inverted_map[i] = ~map[i];
+          map = inverted_map;
+          }
+
        for (i = 0; i < 256; i++)
          {
          if ((map[i/8] & (1 << (i&7))) != 0)

--- a/pcre/pcre_string_utils.c
+++ b/pcre/pcre_string_utils.c
@@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
-           Copyright (c) 1997-2013 University of Cambridge
+           Copyright (c) 1997-2014 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -91,8 +91,8 @@ pcre_uchar c2;

 while (*str1 != '\0' || *str2 != '\0')
  {
-  c1 = RAWUCHARINC(str1);
-  c2 = RAWUCHARINC(str2);
+  c1 = UCHAR21INC(str1);
+  c2 = UCHAR21INC(str2);
  if (c1 != c2)
    return ((c1 > c2) << 1) - 1;
  }
@@ -131,7 +131,7 @@ pcre_uchar c2;

 while (*str1 != '\0' || *ustr2 != '\0')
  {
-  c1 = RAWUCHARINC(str1);
+  c1 = UCHAR21INC(str1);
  c2 = (pcre_uchar)*ustr2++;
  if (c1 != c2)
    return ((c1 > c2) << 1) - 1;
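
Note on the return expression in both hunks: since c1 != c2 has already been established, ((c1 > c2) << 1) - 1 evaluates to 1 when c1 is larger and to -1 otherwise. The sketch below isolates that idiom in a strcmp-like loop over plain bytes. order_sign and compare_units are hypothetical names, and the UCHAR21INC macro is replaced here by an ordinary post-increment, which is an assumption about what it does for this simple case.

```c
#include <assert.h>

/* Returns 1 if a > b and -1 otherwise; callers are expected to have
   ruled out a == b first, as the loops in the hunks do. */
static int order_sign(unsigned a, unsigned b)
{
  assert(a != b);
  return ((a > b) << 1) - 1;
}

/* Compares two NUL-terminated byte strings unit by unit. */
static int compare_units(const unsigned char *s1, const unsigned char *s2)
{
  while (*s1 != '\0' || *s2 != '\0')
    {
    unsigned c1 = *s1++;
    unsigned c2 = *s2++;
    if (c1 != c2) return order_sign(c1, c2);
    }
  return 0;
}
```

When one string is a prefix of the other, its terminating zero compares lower than any remaining unit, so the loop returns before either pointer moves past a terminator.
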

--- a/pcre/pcre_study.c
+++ b/pcre/pcre_study.c
--- a/pcre/pcre_xclass.c
+++ b/pcre/pcre_xclass.c
@@ -81,6 +81,11 @@ additional data. */

 if (c < 256)
  {
+  if ((*data & XCL_HASPROP) == 0)
+    {
+    if ((*data & XCL_MAP) == 0) return negated;
+    return (((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0;
+    }
  if ((*data & XCL_MAP) != 0 &&
    (((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0)
    return !negated; /* char found */
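
Note on the new fast path: when XCL_HASPROP is clear, a code unit below 256 is decided from the flags and bitmap alone. The bitmap bit is returned as is, without applying the negation. Together with the pcre_printint.c change that inverts the map before printing, this suggests the stored map already accounts for the negation. The sketch below reproduces only that branch. The SK_* flag values are hypothetical (the real XCL_* constants are in pcre_internal.h and not shown in this diff), and deriving negated from a NOT flag is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only. */
#define SK_NOT     0x01
#define SK_MAP     0x02
#define SK_HASPROP 0x04

/* data[0] holds the flags; if SK_MAP is set, data[1..32] hold the bitmap.
   Returns 1 or 0 for the fast path, or -1 when property checks are present
   and the slower path (outside this sketch) would be needed. */
static int low_unit_matches(const uint8_t *data, unsigned c)
{
  int negated = (data[0] & SK_NOT) != 0;
  assert(c < 256);
  if ((data[0] & SK_HASPROP) == 0)
    {
    if ((data[0] & SK_MAP) == 0) return negated;
    return (data[1 + c / 8] & (1u << (c & 7))) != 0;
    }
  return -1;
}
```
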

--- a/pcre/pcregrep.c
+++ b/pcre/pcregrep.c
@@ -12,7 +12,7 @@ distribution because other apparatus is needed to compile pcregrep for z/OS.
 The header can be found in the special z/OS distribution, which is available
 from www.zaconsultants.net or from www.cbttape.org.

-           Copyright (c) 1997-2013 University of Cambridge
+           Copyright (c) 1997-2014 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -1298,7 +1298,7 @@ switch(endlinetype)
    while (p > startptr && p[-1] != '\n') p--;
    if (p <= startptr + 1 || p[-2] == '\r') return p;
    }
-  return p;   /* But control should never get here */
+  /* Control can never get here */

  case EL_ANY:
  case EL_ANYCRLF:

--- a/pcre/pcreposix.c
+++ b/pcre/pcreposix.c
--- a/pcre/pcretest.c
+++ b/pcre/pcretest.c
--- a/pcre/testdata/saved16BE-1
+++ b/pcre/testdata/saved16BE-1
--- a/pcre/testdata/saved16LE-1
+++ b/pcre/testdata/saved16LE-1
--- a/pcre/testdata/saved32BE-1
+++ b/pcre/testdata/saved32BE-1
--- a/pcre/testdata/saved32LE-1
+++ b/pcre/testdata/saved32LE-1
--- a/pcre/testdata/testinput18
+++ b/pcre/testdata/testinput18
--- a/pcre/testdata/testinput2
+++ b/pcre/testdata/testinput2
--- a/pcre/testdata/testinput25
+++ b/pcre/testdata/testinput25
--- a/pcre/testdata/testinput3
+++ b/pcre/testdata/testinput3
--- a/pcre/testdata/testinput4
+++ b/pcre/testdata/testinput4
--- a/pcre/testdata/testinput5
+++ b/pcre/testdata/testinput5
--- a/pcre/testdata/testinput6
+++ b/pcre/testdata/testinput6
--- a/pcre/testdata/testinput7
+++ b/pcre/testdata/testinput7
--- a/pcre/testdata/testoutput12
+++ b/pcre/testdata/testoutput12
--- a/pcre/testdata/testoutput13
+++ b/pcre/testdata/testoutput13
--- a/pcre/testdata/testoutput14
+++ b/pcre/testdata/testoutput14
--- a/pcre/testdata/testoutput15
+++ b/pcre/testdata/testoutput15
--- a/pcre/testdata/testoutput16
+++ b/pcre/testdata/testoutput16
--- a/pcre/testdata/testoutput17
+++ b/pcre/testdata/testoutput17
--- a/pcre/testdata/testoutput18-16
+++ b/pcre/testdata/testoutput18-16
--- a/pcre/testdata/testoutput18-32
+++ b/pcre/testdata/testoutput18-32
--- a/pcre/testdata/testoutput19
+++ b/pcre/testdata/testoutput19
--- a/pcre/testdata/testoutput2
+++ b/pcre/testdata/testoutput2
--- a/pcre/testdata/testoutput21-16
+++ b/pcre/testdata/testoutput21-16
--- a/pcre/testdata/testoutput21-32
+++ b/pcre/testdata/testoutput21-32
--- a/pcre/testdata/testoutput22-16
+++ b/pcre/testdata/testoutput22-16
--- a/pcre/testdata/testoutput22-32
+++ b/pcre/testdata/testoutput22-32
--- a/pcre/testdata/testoutput23
+++ b/pcre/testdata/testoutput23
--- a/pcre/testdata/testoutput25
+++ b/pcre/testdata/testoutput25
--- a/pcre/testdata/testoutput3
+++ b/pcre/testdata/testoutput3
--- a/pcre/testdata/testoutput3A
+++ b/pcre/testdata/testoutput3A
--- a/pcre/testdata/testoutput3B
+++ b/pcre/testdata/testoutput3B
--- a/pcre/testdata/testoutput4
+++ b/pcre/testdata/testoutput4
--- a/pcre/testdata/testoutput5
+++ b/pcre/testdata/testoutput5
--- a/pcre/testdata/testoutput6
+++ b/pcre/testdata/testoutput6
--- a/pcre/testdata/testoutput7
+++ b/pcre/testdata/testoutput7
--- a/pcre/testdata/testoutput8
+++ b/pcre/testdata/testoutput8
--- a/pcre/testdata/wintestoutput3
+++ b/pcre/testdata/wintestoutput3