The original post: /r/selfhosted by /u/hand_in_every_pot on 2024-07-01 14:46:20.


Recently got into paperless-ngx and already built a script to extract the amount from the OCR content. It works pretty well (around 95% success), but my attempt at extracting the company name does not, since general receipts have no reliable indicator for it.

I have tried openchat-cpu on the data, but its results via the API are inconsistent and it eats my CPU for far too long. Is there a better approach to this that I can self-host?

I was thinking of getting a list of all company names and checking against that, but there has to be SOMETHING out there that is straightforward and readily available (self-hosted!)?
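
For reference, the list-matching idea could look roughly like the sketch below. It is only an illustration using Python's standard library: `KNOWN_COMPANIES`, `find_company`, the 0.8 cutoff, and the "only look at the first 10 lines" rule are all assumptions of mine, not something paperless-ngx provides. It tries an exact whole-word match first, then falls back to fuzzy matching so that OCR slips like "WALMAR7" can still hit.

```python
import difflib
import re

# Hypothetical list of names; in practice this would be loaded from somewhere.
KNOWN_COMPANIES = ["Walmart", "Home Depot", "Costco Wholesale", "Shell"]


def normalize(text):
    """Lowercase and collapse anything that is not a letter or digit into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_company(ocr_text, companies=KNOWN_COMPANIES, cutoff=0.8, max_lines=10):
    """Return the best matching company name, or None if nothing scores >= cutoff.

    Only the first `max_lines` non-empty lines are checked, on the assumption
    that the merchant name is usually near the top of a receipt.
    """
    lines = [normalize(line) for line in ocr_text.splitlines()]
    lines = [line for line in lines if line][:max_lines]
    normalized = {normalize(name): name for name in companies}

    # Pass 1: exact match on whole words (padding avoids matching inside a word).
    for line in lines:
        for norm, name in normalized.items():
            if f" {norm} " in f" {line} ":
                return name

    # Pass 2: fuzzy match. Compare each company name against every window of
    # the same number of words, so one misread character does not ruin the score.
    best_score, best_name = 0.0, None
    for line in lines:
        words = line.split()
        for norm, name in normalized.items():
            size = len(norm.split())
            for start in range(len(words) - size + 1):
                candidate = " ".join(words[start:start + size])
                score = difflib.SequenceMatcher(None, norm, candidate).ratio()
                if score > best_score:
                    best_score, best_name = score, name
    return best_name if best_score >= cutoff else None


if __name__ == "__main__":
    print(find_company("WALMAR7 Supercenter\n123 Main St\nTOTAL 45.10"))
```

This is cheap on CPU compared to an LLM, but it can only ever return names that are already in the list, which is exactly the limitation I would like to avoid.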

To reiterate, I already have the text data (do NOT need to OCR it). I just want to send a bunch of text and pull out a company name (or more details, if possible) to return to paperless-ngx.