Problem installing paperless-ngx

Friend, I'm still having problems with Tika: the service doesn't start. I don't know why, but even with your script it doesn't start :sweat_smile: :upside_down_face: :smiling_face_with_tear: :grimacing:

Really strange :face_with_diagonal_mouth:
If you run it manually, it starts, right?

/usr/local/openjdk11/bin/java -jar /root/tika-server-standard-2.9.2.jar &

Edit: maybe try using a single script for all 3 services.

Good news

I was able to make them work by simplifying the entire process. It is a bit tricky, but it works perfectly. You need to do the following:

ee /root/scripts/tika.sh

#!/bin/sh
java -jar /root/tika-server-standard-2.9.2.jar &


ee /root/scripts/gotenberg.sh

#!/bin/sh

export CHROMIUM_BIN_PATH="/usr/local/bin/chrome"
export EXIFTOOL_BIN_PATH="/usr/local/bin/exiftool"
export LIBREOFFICE_BIN_PATH="/usr/local/bin/soffice"
export PDFTK_BIN_PATH="/usr/local/bin/pdftk"
export QPDF_BIN_PATH="/usr/local/bin/qpdf"
export UNOCONV_BIN_PATH="/usr/local/bin/unoconv"
#export UNOCONVERTER_BIN_PATH="/usr/local/bin/unoconv"

cd /root/gotenberg-7.4.2
./gotenberg &



ee /etc/crontab

Add these lines:


@reboot root /root/scripts/tika.sh
@reboot root /root/scripts/gotenberg.sh

I’m almost sure that a single script containing both the Tika and Gotenberg instructions would also work, but I think this way is more organized.
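For anyone who wants to try the single-script variant, here is a minimal sketch. It reuses the paths from the two scripts above; the file name /root/scripts/services.sh, the LOG_DIR variable and the "start" argument are my own assumptions and not something tested in this thread.

```sh
#!/bin/sh
# Hypothetical /root/scripts/services.sh: starts Tika and Gotenberg from one file.
# LOG_DIR is an assumption; point it anywhere root can write.
LOG_DIR="${LOG_DIR:-/var/log}"

# start_bg NAME DIR COMMAND...: run COMMAND from DIR in the background,
# detached from the terminal, appending its output to $LOG_DIR/NAME.log
start_bg() {
    name="$1"; dir="$2"; shift 2
    ( cd "$dir" && nohup "$@" >>"$LOG_DIR/$name.log" 2>&1 & )
}

# Only launch the services when called with "start", so that the file can
# also be sourced without side effects.
if [ "${1:-}" = "start" ]; then
    start_bg tika /root java -jar /root/tika-server-standard-2.9.2.jar

    export CHROMIUM_BIN_PATH="/usr/local/bin/chrome"
    export EXIFTOOL_BIN_PATH="/usr/local/bin/exiftool"
    export LIBREOFFICE_BIN_PATH="/usr/local/bin/soffice"
    export PDFTK_BIN_PATH="/usr/local/bin/pdftk"
    export QPDF_BIN_PATH="/usr/local/bin/qpdf"
    export UNOCONV_BIN_PATH="/usr/local/bin/unoconv"
    start_bg gotenberg /root/gotenberg-7.4.2 ./gotenberg
fi
```

With this layout the crontab would need a single line, "@reboot root /root/scripts/services.sh start", and the log files should make it easier to see why a service did not come up.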

EDIT: Source

https://forums.freebsd.org/threads/help-to-run-a-simple-script-on-startup.78634/

Hi buddy, I've finally started using Paperless.
While importing various documents I ran into some minor problems consuming certain kinds of files.
I resolved them by adding this to paperless.conf:
PAPERLESS_OCR_USER_ARGS={"invalidate_digital_signatures": true,"continue_on_soft_render_error": true}

and by adding this to my nginx conf file (if you don't use a reverse proxy yet, I think you will be fine without it):
client_max_body_size 100M;

Tell me if you find more improvements. This software is really amazing!

Hello my friend, thank you for your feedback. I am also using Paperless and have not had any major problems, except consuming .eml files (saved emails), which I have not been able to solve. I am still investigating hehe, greetings

The problem I have seems to be caused by Gotenberg 7.4: that version is already very old and will soon be obsolete. With all the information we have, I will try a clean installation with the latest stable versions of everything (Paperless, Tika, Gotenberg). Wish me luck :sweat_smile: :rofl: :joy: greetings

Try it in another jail; then, if it works, do the update on the Paperless one (where you installed the service).

This is a good reason to keep Paperless, Tika and Gotenberg separated :stuck_out_tongue_winking_eye: despite the extra resources they probably use in my case.

Anyway, don't expect help from Gotenberg's developer:

Hello @oxyde1989 :wave:

We don’t support anything outside the “official” Docker images. Check the debug logs to have more insights on why LibreOffice failed to start.

The key to success here is finding something in the log. I will probably test it too, because the idea of storing some old emails there sounds pretty good to me as well (and I hadn't thought about it until you mentioned it).

EDIT:
I tried consuming an .eml file; after installing bleach with pip3 I got this error:

[2024-07-26 09:29:28,475] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: cannot import name 'Margin' from 'gotenberg_client.options' (/home/paperless/.local/lib/python3.11/site-packages/gotenberg_client/options.py)
Traceback (most recent call last):
  File "/usr/home/paperless/paperless-ngx/src/documents/tasks.py", line 151, in consume_file
    msg = plugin.run()
          ^^^^^^^^^^^^
  File "/usr/home/paperless/paperless-ngx/src/documents/consumer.py", line 555, in run
    document_parser: DocumentParser = parser_class(
                                      ^^^^^^^^^^^^^
  File "/usr/home/paperless/paperless-ngx/src/paperless_mail/signals.py", line 2, in get_parser
    from paperless_mail.parsers import MailDocumentParser
  File "/usr/home/paperless/paperless-ngx/src/paperless_mail/parsers.py", line 12, in <module>
    from gotenberg_client.options import Margin
ImportError: cannot import name 'Margin' from 'gotenberg_client.options' (/home/paperless/.local/lib/python3.11/site-packages/gotenberg_client/options.py)

Same issue as yours? It seems the latest version of Paperless uses an API that is not implemented in our Gotenberg version.

This error occurs because a package is missing. Install it as the paperless user with pip3:

pip3 install bleach

After that, the error you will get is a 400 Bad Request.

I am trying the most current version of Gotenberg, but I am still stuck on the same problem we had in our first installations while trying to determine whether it is a Gotenberg problem.

Edit 1:
Everything seems to indicate that Gotenberg 7.4.2 is the version that works best for us, or at least the one that works best on BSD.

Edit 2:
What I want to figure out is how to update Paperless without having to install everything again. Do you know how to do this? Paperless is updated constantly.

Edit 3:

This is the error with .eml:

comprobante.eml: Error occurred while consuming document comprobante.eml: Error while converting email to PDF: Client error '400 Bad Request' for url 'http://localhost:3000/forms/chromium/convert/html'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
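To see whether the 400 comes from Gotenberg itself rather than from Paperless, the same route can be called by hand. This is only a sketch: it assumes Gotenberg listens on localhost:3000 as in the error above, and the function name html_route_status is made up for the example. The route expects the HTML file to be uploaded under the name index.html.

```sh
#!/bin/sh
# Assumption: Gotenberg is reachable here, as in the error message above.
GOTENBERG_URL="${GOTENBERG_URL:-http://localhost:3000}"

# html_route_status BASE_URL: upload a tiny index.html to the Chromium HTML
# route and print the HTTP status code (curl prints 000 if nothing answered).
html_route_status() {
    base="$1"
    tmp=$(mktemp -d) || return 1
    printf '<!DOCTYPE html><html><body><p>test</p></body></html>\n' > "$tmp/index.html"
    # -s: quiet, -o: discard the PDF into the temp dir, -w: print only the status
    curl -s --max-time 30 -o "$tmp/out.pdf" -w '%{http_code}\n' \
        --form "files=@$tmp/index.html" \
        "$base/forms/chromium/convert/html" || true
    rm -rf "$tmp"
}

echo "Chromium HTML route answered: $(html_route_status "$GOTENBERG_URL")"
```

If this also answers 400 (or anything other than 200), Paperless is not the culprit and the Gotenberg log is the place to look.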

Edit 4:

Good news bro: there is a Gotenberg API demo that works perfectly. The question now is how to reverse engineer that API and apply it to our server. The API demo is this:

https://demo.gotenberg.dev

You must set it in:

PAPERLESS_TIKA_GOTENBERG_ENDPOINT: https://demo.gotenberg.dev

I tried it and it works wonders

Hi! I have installed bleach but I still get the same error. Am I probably missing another dependency?

I have run some tests, and version 7.4.3 works for me (except .eml), but starting from 7.5.0 LibreOffice stops working, same as in the 8.x versions. In the next few days I will read through the breaking changes; maybe that will be useful.

Hi bro, I saw this in a discussion:

The API now waits for all modules to be ready before starting listening on the configured port.

and this is what I see in 8.9:

[SYSTEM] modules: api chromium exiftool libreoffice libreoffice-api libreoffice-pdfengine logging pdfcpu pdfengines pdftk prometheus qpdf webhook
[SYSTEM] chromium: Chromium ready to start
[SYSTEM] libreoffice-api: LibreOffice ready to start
[SYSTEM] pdfengines: exiftool libreoffice-pdfengine pdfcpu pdftk qpdf
[SYSTEM] api: server listening on port 3000
[SYSTEM] prometheus: collecting metrics

whereas in 7.4.3:

[SYSTEM] modules: api chromium gc libreoffice logging pdfcpu pdfengines pdftk prometheus qpdf unoconv unoconv-pdfengine webhook
[SYSTEM] gc: application started
[SYSTEM] api: server listening on port 3000
[SYSTEM] prometheus: collecting metrics
[SYSTEM] pdfengines: pdfcpu pdftk qpdf unoconv-pdfengine
[SYSTEM] unoconv: listener started on port 20543

As you can see, Chromium is not initialised, and if I try to force autostart I get an error:
./gotenberg --chromium-auto-start

[SYSTEM] modules: api chromium exiftool libreoffice libreoffice-api libreoffice-pdfengine logging pdfcpu pdfengines pdftk prometheus qpdf webhook
[SYSTEM] libreoffice-api: LibreOffice ready to start
[SYSTEM] prometheus: collecting metrics
[SYSTEM] pdfengines: exiftool libreoffice-pdfengine pdfcpu pdftk qpdf
[FATAL] starting chromium: launch supervisor: start process: run exec allocator: could not dial “ws://127.0.0.1:22413/devtools/browser/18398fe0-c434-4986-9f83-d09dedcbac7f”: EOF

So I think this is what prevents LibreOffice from starting properly on recent Gotenberg versions.

I have tried:

  • service dbus onestart
  • cd /usr/local/bin, then chrome --headless --disable-gpu --remote-debugging-port=9222

receiving:

root@Gotenberg:/usr/local/bin # chrome --headless --disable-gpu --remote-debugging-port=9222
[92989:26808749032192:0729/163012.949649:ERROR:bus.cc(407)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are “tcp” and on UNIX “unix”)
[92989:26808749032192:0729/163012.949757:ERROR:bus.cc(407)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are “tcp” and on UNIX “unix”)
[92989:26808749032192:0729/163012.949799:ERROR:bus.cc(407)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are “tcp” and on UNIX “unix”)
[92989:26808749032192:0729/163012.949831:ERROR:bus.cc(407)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are “tcp” and on UNIX “unix”)

Edit:
setenv DBUS_SESSION_BUS_ADDRESS unix:path=/var/run/dbus/system_bus_socket
made the errors disappear, but there is still a connection problem between Gotenberg and Chromium.

Hey buddy, thanks for the information. It seems that error occurs because Chrome does not run as the root user. I don’t know how to get Gotenberg to run Chrome as another user, or to run without Chrome. It sounds logical, since the Gotenberg Dockerfile creates a user for the installation, but I can’t replicate that on BSD. The other question is why version 7.4.3 converts the documents if Chrome is not running.

The only thing that makes sense to me is that in 7.4.3 we probably (or for sure :rofl:) don't have a working Chromium, but this doesn't prevent the api module from working… so we managed to consume without that module. With newer versions that is no longer possible: all modules must work together (whether used or not).
If you consider that the latest LibreOffice works fine with an old Gotenberg version but not with a new one… this scenario makes more sense.
Tomorrow I will try to install Gotenberg with pip, using a dedicated user like we did with Paperless… or am I totally out of my mind? xd

Hey buddy, I don’t know where to look anymore. I hope you have better results; I’m stuck.

Unluckily I don’t have any good news.
I can’t get Chrome to work in any way I know… The best result I got is starting the service manually in headless mode without errors, but also without it doing anything (for example screenshots or HTML to PDF conversion).
I tried installing puppeteer too, but it is a pain and doesn't work natively on FreeBSD.
I tried a dedicated user, and nothing changed; I tried disabling the Chrome route, and nothing changed either.
I’m out of ideas for the moment :face_exhaling:

I’m starting to think the only viable way to set up Gotenberg correctly is to run it in a VM, for example, and since my little i3 can’t handle a VM I will keep using the only working version, 7.4.3… at least that is working now! Sooner or later we will have to switch to Scale, so this problem will solve itself :sweat_smile:

Hello friend, look, there is not much of a problem with Paperless: I have tried up to the latest version and it works without problems on BSD, as does Tika. For Gotenberg we can use the demo version from the same developer, or create a local Docker instance. I would have preferred having everything on the same server, but it seems that won’t be possible xD. Anyway, version 7.4.3 is fine except for emails, greetings

Yo, at this point I have tried the Docker installation.
I set up a VM on my workstation (it has better hardware), using Alpine Linux as the OS (pretty minimal, but there are probably better choices; I'm a total newbie at this).
In literally less than 30 minutes (counting creating the VM, installing the OS, setup, etc.) everything seems to work like a charm.

For now I have allocated 2 CPUs and 8 GB of RAM to the VM, but that is probably overkill… I hope it can work well with 1 CPU and half the RAM, so I can reconsider running it directly on the NAS (maybe with Tika inside Docker too).
Like you, I obviously want all services running on the same server; I will do more tests on it.

EDIT: I'm still facing the problem

[2024-08-03 23:14:15,831] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: cannot import name ‘Margin’ from ‘gotenberg_client.options’ (/home/paperless/.local/lib/python3.11/site-packages/gotenberg_client/options.py)

After running pip3 install bleach, I tried pkg update/upgrade and all services broke:

ImportError: Shared object “libtiff.so.5” not found, required by “_imaging.cpython-311.so”

Resolved with a symlink:

ln -s /usr/local/lib/libtiff.so.6 /usr/local/lib/libtiff.so.5

but the Margin import problem is still here :face_exhaling:

EDIT2: resolved by installing
pip3 install gotenberg-client==0.5.0
after uninstalling version 0.6.0… this is kind of strange
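A quick way to check whether the installed gotenberg-client still provides the name Paperless imports, without consuming a document, is to try the same import by hand. This is a rough sketch: run it as the paperless user so the same site-packages are used; the function name check_margin_import and the PYTHON variable are just for illustration.

```sh
#!/bin/sh
# Assumption: python3 on PATH is the interpreter Paperless runs with.
PYTHON="${PYTHON:-python3}"

# Prints "ok" if the import from the traceback succeeds, "missing" otherwise.
check_margin_import() {
    if "$PYTHON" -c 'from gotenberg_client.options import Margin' 2>/dev/null; then
        echo ok
    else
        echo missing
    fi
}

echo "Margin import: $(check_margin_import)"

# Show which gotenberg-client version is installed, if any.
"$PYTHON" -m pip show gotenberg-client 2>/dev/null | grep -i '^Version' \
    || echo "gotenberg-client not found for this interpreter"
```

If it prints "missing" while a newer gotenberg-client is installed, pinning the version as above is the workaround that worked here.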

Consuming works like a charm with .eml too, and it is very fast despite the 1 CPU / 2 GB RAM on the VM :rofl: :rofl: I will definitely try to set up the same VM on TrueNAS directly!

Hello bro, I’m glad to know that your Gotenberg works well in a VM. Maybe you can run everything in the VM, including Paperless, to save resources on your NAS. I may get there by using Gotenberg's hosted API, or maybe my own API in Docker on my desktop computer. Everything else works fine on TrueNAS, although I am going to continue testing Gotenberg on BSD hehehe, greetings

Well my friend, in the end I created the VM directly in TN… I was honestly afraid, because some months ago I had huge stability problems using VMs, but I think that thanks to the extra RAM I have now (32 GB instead of 16) and the fact that I allocated only 1 virtual CPU and 1 GB of RAM, everything is going fine.
If your machine has the resources, give it a try. I have consumed some documents and didn’t see any difference between the “jail” version and the “docker” one, and I like that those parts are separated from Paperless itself.
Changing subject… are you able to consume .html files?
(Screenshot 2024-08-04 231423)
Translation:

text/html file are not supported

Is there something to set? (The fact that it is not the task that fails but the upload itself makes me think it is not a setup problem.)
Or maybe they are simply not supported (Gotenberg can do it, but maybe Paperless can't).

Edit: I forgot, this is for you if you decide to try Docker (after installing your Linux ISO in the VM):

apk add nano
nano /etc/apk/repositories
# add this line → the Alpine v3.17 community repository URL (the one ending in /alpine/v3.17/community/)

apk update
apk upgrade

apk add docker
service docker start

rc-update add docker boot

docker pull thecodingmachine/gotenberg:latest
docker run -d -p 3000:3000 thecodingmachine/gotenberg:latest

docker run -d -p 9998:9998 apache/tika:latest

that’s all :melting_face:
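Before pointing Paperless at the containers, it can help to wait until both services actually answer. A minimal sketch, assuming the ports from the docker run lines above; the helper name wait_for_url is made up for this example. Gotenberg exposes a /health route and the Tika server answers a plain GET on /tika.

```sh
#!/bin/sh
# wait_for_url URL [TRIES] [DELAY]: poll URL until it answers with a
# successful HTTP status, or give up after TRIES attempts.
wait_for_url() {
    url="$1"; tries="${2:-30}"; delay="${3:-2}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl fail on HTTP errors, so only a healthy answer counts
        if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Check every URL given on the command line; does nothing without arguments.
for u in "$@"; do
    if wait_for_url "$u"; then
        echo "up:   $u"
    else
        echo "down: $u"
    fi
done
```

Saved as, for example, check.sh, it could be called as "sh check.sh http://localhost:3000/health http://localhost:9998/tika" from inside the VM, or with the VM's address from the Paperless host.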

Hey buddy, I have good news (or maybe I don’t know if it’s good anymore hahaha). I found several very interesting articles about running Docker on FreeBSD, and I’m sharing them with you. I have only tried the Lima+QEMU+Docker one; the Docker-Machine-VirtualBox approach I have tried too, but on Windows, which is how I use Docker. Both seemed interesting to me. However, with Lima+QEMU I had problems with my processor, an AMD FX (bad for me), so I decided to create a virtual machine instead. It is embarrassing how easy it is to set up Paperless, Tika and Gotenberg in Docker after all the suffering we had on bare metal hehe. In short, I think we gained a lot of knowledge, and the VM+Docker version works very well. I’ll share the results with you:

Operating system: Alpine Linux is obviously the one that consumes the least resources. There are some crazier options, but it is better to play it safe. I tried to make the Alpine mini root filesystem version work, since it is only 3.45 MB, but so many packages were missing that I decided to go for the 64 MB Alpine Virtual version instead. Not bad for our purpose.

Hardware: My TrueNAS is not the most powerful there is, so I decided to give everything the minimum:

Virtual CPUs: 1
Cores: 1
Threads: 1
Memory Size: 512.00 MiB

Docker: I found a docker-compose.yml in the official documentation that is already prepared to work with Tika and Gotenberg and uses SQLite as the database, which is more than enough for personal use:

# Docker Compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
#   as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# SQLite is used as the database. The SQLite file is stored in the data volume.
#
# In addition to that, this Docker Compose file adds the following optional
# configurations:
#
# - Apache Tika and Gotenberg servers are started with paperless and paperless
#   is configured to use these services. These provide support for consuming
#   Office documents (Word, Excel, Power Point and their LibreOffice counter-
#   parts.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
#   and '.env' into a folder.
# - Run 'docker compose pull'.
# - Run 'docker compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.

services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.7
    restart: unless-stopped

    # The gotenberg chromium route is used to convert .eml files. We do not
    # want to allow external content like tracking pixels or even javascript.
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped

volumes:
  data:
  media:
  redisdata:

If everything goes well, we execute:

docker-compose up -d

and … BOOM!! :sunglasses: :sunglasses:

In 5 minutes we have everything ready, and it consumes everything: docx, doc, odt, txt, eml. A beauty :star_struck:

If you don’t want to use a separate .env file, here is the docker-compose file that I use, which integrates the .env settings:

version: '3.8'

services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_TIKA_ENABLED: "1"  # Make sure this is a string if the container expects it that way
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.7
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped

volumes:
  data:
  media:
  redisdata:

Docker sources on FreeBSD:
Docker-Machine-Virtualbox
LIMA-QEMU

Sources of docker-compose.yml:
https://github.com/paperless-ngx/paperless-ngx/blob/main/docker/compose/docker-compose.sqlite-tika.yml

AlpineLinux:
https://www.alpinelinux.org/downloads/

Edit 1: You can install docker-compose the following way:

apk add docker-compose

If you think about it, I started my first install via ports, downloading something like 30 GB of stuff… this seems more than embarrassing :face_exhaling: at least a triple facepalm :upside_down_face:

I will read those resources ASAP; tonight I'm too tired for anything :rofl:
I have now read both, and I understand why you got the CPU problem. They are kind of outdated too, but I'm glad they still work.

I have just a couple of considerations about virtualising Paperless too.
Yesterday I consumed a couple of large PDFs (50~70 MB each). They are the result of merging scanned paper pages saved as JPEG (by the way, it is impossible to merge files that were not originally uploaded as PDF, which makes little sense given that Paperless converts them anyway…). I will do that a lot, considering the amount of paper documents I have (and I will definitely need to optimize those images).
Even though Paperless doesn’t need Tika/Gotenberg for that, I saw CPU usage at 100% on all threads, and the process took about 5 minutes… Can you handle this kind of data with such a small VM? And if you can, will the processing time be reasonable?

Another point, the most important to me: if needed, with all the data inside the VM, how do you perform backups? Do you have to restore the VM completely?
In my case, with the data mounted outside the jail as the guideline recommends, a complete reinstall “shouldn’t” end in data loss.
Tika and Gotenberg are not “data critical”; if needed I can just point them to another host.