    component/ghostscript: Workaround for slaprunner paths with double slashes · 1b291415
    Jérome Perrin authored
    This tessdata path is embedded in the C++ code through preprocessor macros:
    https://github.com/ArtifexSoftware/ghostpdl/blob/gs9.54.0/base/tessocr.cpp#L188-L193
    Since // starts a comment in cpp and, as documented in
    https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html, "Comments are replaced by
    whitespace long before stringizing happens, so they never appear in stringized
    text", the STRINGIFY/STRINGIFY2 approach of including a path does not work
    when the path contains //: everything after the // is treated as a comment
    and dropped. This causes errors like the following when using ghostscript
    with OCR in webrunner:
    
        $ strace -e open -o open.strace /srv/slapgrid/slappart42/srv/runner/shared/ghostscript/4387fe7a8d2034ac5691d43b58134248/bin/gs -sDEVICE=ocr
        GPL Ghostscript 9.54.0 (2021-03-30)
        Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
        This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
        see the file COPYING for details.
        Error opening data file ./eng.traineddata
        Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
        Failed loading language 'eng'
        Tesseract couldn't load any languages!
        **** Unable to open the initial device, quitting.
        $ grep eng open.strace
        open("./eng.traineddata", O_RDONLY)     = -1 ENOENT (No such file or directory)
        open("/srv/slapgrid/slappart42/srv/eng.traineddata", O_RDONLY) = -1 ENOENT (No such file or directory)
        open("eng.traineddata", O_RDONLY)       = -1 ENOENT (No such file or directory)
    
    eng.traineddata is looked up in /srv/slapgrid/slappart42/srv/ because
    ghostscript was configured with:
    
       --with-tessdata=/srv/slapgrid/slappart42/srv//runner//shared/ghostscript/4387fe7a8d2034ac5691d43b58134248/share/tessdata/
    
    and everything after // was stripped.
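
    The truncation can be modelled in a few lines of Python. This is only a
    rough sketch of the preprocessor's behaviour, not real cpp:
    strip_line_comment and stringify are hypothetical helpers, and real comment
    removal also handles string literals and /* */ comments, which this ignores.

```python
def strip_line_comment(macro_body: str) -> str:
    """Crude model of cpp comment removal: drop everything from the first // on."""
    return macro_body.split("//", 1)[0]


def stringify(macro_body: str) -> str:
    """Crude model of STRINGIFY(x): comments are already gone when # is applied."""
    return strip_line_comment(macro_body).strip()


# Path taken from the --with-tessdata value shown above.
configured = (
    "/srv/slapgrid/slappart42/srv//runner//shared/ghostscript/"
    "4387fe7a8d2034ac5691d43b58134248/share/tessdata/"
)
compiled_in = stringify(configured)
print(compiled_in)  # /srv/slapgrid/slappart42/srv
```

    The modelled result is the directory in which the strace output above
    shows eng.traineddata being looked up.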
    
    This was reported upstream as https://bugs.ghostscript.com/show_bug.cgi?id=703905
    
    More about the case of // in slaprunner paths is in commit eb544196
    (slaprunner: document the reasons why we keep srv//slaprunner, 2019-10-10)
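
    The workaround in buildout.cfg passes the tessdata path through
    str.replace("//", "/") before handing it to configure. A minimal sketch of
    that normalization follows; collapse_double_slashes is a hypothetical name
    and the sample path is the one from the configure line above. A single
    replace pass would leave a // behind if a path contained three or more
    consecutive slashes, which the paths shown here do not.

```python
def collapse_double_slashes(path: str) -> str:
    """Same single-pass replacement as the --with-tessdata option applies."""
    return path.replace("//", "/")


raw = (
    "/srv/slapgrid/slappart42/srv//runner//shared/ghostscript/"
    "4387fe7a8d2034ac5691d43b58134248/share/tessdata/"
)
fixed = collapse_double_slashes(raw)
print(fixed)
print("//" in fixed)  # False

# Caveat: one pass does not fully collapse a run of three slashes.
print(collapse_double_slashes("/a///b"))  # /a//b
```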
buildout.cfg 1.8 KB
[buildout]
extends =
  ../fontconfig/buildout.cfg
  ../freetype/buildout.cfg
  ../libjpeg/buildout.cfg
  ../libtiff/buildout.cfg
  ../libxml2/buildout.cfg
  ../pkgconfig/buildout.cfg
  ../tesseract/buildout.cfg
  ../xz-utils/buildout.cfg

parts = ghostscript

[ghostscript]
recipe = slapos.recipe.cmmi
shared = true
url = https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs9540/ghostscript-9.54.0.tar.gz
md5sum = 5d571792a8eb826c9f618fb69918d9fc
pkg_config_depends = ${libtiff:location}/lib/pkgconfig:${libjpeg:location}/lib/pkgconfig:${fontconfig:location}/lib/pkgconfig:${fontconfig:pkg_config_depends}
# XXX --with-tessdata works around a slaprunner bug where software is installed in a path containing //
configure-options =
  --disable-cups
  --disable-threadsafe
  --with-system-libtiff
  --without-libidn
  --without-x
  --with-drivers=FILES
  --with-tessdata=$(python -c 'print("""${:tessdata-location}""".replace("//", "/"))')
environment =
  PATH=${pkgconfig:location}/bin:${xz-utils:location}/bin:%(PATH)s
  PKG_CONFIG_PATH=${:pkg_config_depends}
  CFLAGS=-I${libjpeg:location}/include
  LDFLAGS=-Wl,-rpath=${fontconfig:location}/lib -Wl,-rpath=${freetype:location}/lib -Wl,-rpath=${libtiff:location}/lib -L${libjpeg:location}/lib -Wl,-rpath=${libjpeg:location}/lib
  LD_LIBRARY_PATH=${fontconfig:location}/lib:${freetype:location}/lib:${libtiff:location}/lib:${libxml2:location}/lib

# configure gives priority to the local jpeg library and refuses to mix local libjpeg with "system" libtiff.
# We remove the local jpeg source folder so that configure picks up the slapos versions of these libraries.
pre-configure = rm -r jpeg

post-make-hook = ${tesseract-download-traineddata:post-make-hook}
tessdata-location = @@LOCATION@@/share/tessdata/
tessdata-urls = ${tesseract-download-traineddata:urls}