Lighter processing for OCR activities
When running OCR, we sometimes have issues because processing is "too heavy":

- [x] It uses 2 or 3 GB of disk space for a one-page PDF created by erp5_document_scanner, because we convert PDF -> PNG -> TIFF before sending the result to tesseract. Modern Ghostscript supports running tesseract directly, so we use that when it is available.
- [x] It uses 300% of CPU. Fixed by setting `OMP_THREAD_LIMIT` when running tesseract. This only applies when running OCR on images; the OCR embedded in Ghostscript does not seem to need it.
- [x] ... and it often crashes and is then restarted. This is fixed by updating tesseract.

Updates of Ghostscript and tesseract are part of slapos!985.

See merge request !1420
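
As an illustration of the two fixes described above, here is a minimal sketch, not the code from this merge request. The helper names (`tesseract_command`, `ghostscript_has_ocr`, `ghostscript_ocr_command`, `run_tesseract`) are hypothetical. The sketch assumes that `gs -h` lists device names separated by whitespace and that the Ghostscript build exposes its tesseract integration as a device named `ocr`; both should be checked against the installed version.

```python
import os
import subprocess


def tesseract_command(image_path, output_base="stdout", thread_limit=1):
    """Return (argv, env) for a tesseract run capped at `thread_limit` OpenMP threads."""
    env = dict(os.environ)
    # OMP_THREAD_LIMIT is the standard OpenMP variable that caps the thread count.
    env["OMP_THREAD_LIMIT"] = str(thread_limit)
    argv = ["tesseract", image_path, output_base]
    return argv, env


def ghostscript_has_ocr(gs_help_text):
    """Return True if the text printed by `gs -h` lists a device named `ocr`.

    Assumption: device names appear as whitespace-separated tokens.
    """
    return "ocr" in gs_help_text.split()


def ghostscript_ocr_command(pdf_path, text_path):
    """Return argv for OCR done inside Ghostscript (assumes an `ocr` device exists)."""
    return [
        "gs", "-q", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=ocr",
        "-sOutputFile=" + text_path,
        pdf_path,
    ]


def run_tesseract(image_path):
    """Run tesseract on an image with the thread cap applied and return its stdout."""
    argv, env = tesseract_command(image_path)
    return subprocess.run(argv, env=env, capture_output=True, check=True).stdout
```

A caller would first check `ghostscript_has_ocr(...)` on the output of `gs -h` and fall back to `run_tesseract` on a rendered image when the device is missing, which mirrors the "use it if it's available" behaviour described above.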
Status | Job ID | Stage | Name | Duration
---|---|---|---|---
passed | #233265 | external | ERP5.CodingStyleTest-Master | 00:55:50
failed | #233272 | external | ERP5.UnitTest-Master | 02:22:07
failed | #233270 | external | ERP5.UnitTest-Master.Medusa | 02:22:46
passed | #233259 | external | SlapOS.Eggs.UnitTest-Master.Python2 | 00:08:41
passed | #233261 | external | SlapOS.Eggs.UnitTest-Master.Python3 | 00:33:20