What is the current status of PDF search, and where can I find it in the documentation? I’m German and don’t understand Portuguese.
Hi ralfh, how are you?
PDF search works, and it works well with PDFs created with OCR too.
Here is the wiki page for that functionality:
https://tainacan.github.io/tainacan-wiki/#/indexing-pdf
Mostly, what you need to do is define a constant and, if you have already imported the PDFs, run the WP-CLI command to reindex them.
In short:
wp-config.php:
define('TAINACAN_INDEX_PDF_CONTENT', true);
WP CLI command:
wp tainacan index-content --collection=all
or
wp tainacan index-content --collection=<collection_id>
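If you only want to reindex a few collections rather than all of them, the second form can be wrapped in a small loop. This is just a sketch: the function name `reindex_collections` and the `WP_CMD` override are my own helpers, not part of Tainacan or WP-CLI, and the IDs in the usage comment are placeholders for your own collection IDs.

```shell
#!/bin/sh
# Sketch: run the index-content command once per collection ID.
# WP_CMD is a hypothetical override so the commands can be previewed
# (e.g. WP_CMD="echo wp") before running them for real.
WP_CMD="${WP_CMD:-wp}"

reindex_collections() {
    for collection_id in "$@"; do
        # Same command as above, with the collection ID filled in.
        # Stop at the first failure so errors are not silently skipped.
        $WP_CMD tainacan index-content --collection="$collection_id" || return 1
    done
}

# Usage (12 and 34 are placeholder IDs):
#   reindex_collections 12 34
```

Setting `WP_CMD="echo wp"` first only prints the commands, which is a cheap way to check the IDs before touching the index.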
Hope it helps
Cheers,
Fred Marvila
To add a bit:
The feature works like this:
When you upload a PDF file as the main document of an item, a PHP library called `Smalot\PdfParser` tries to read its content and extract it into an internal metadata field. You won’t see that field in the interface, but it is searched when you use the textual search (the simple search input) or the advanced search with “Document content” selected.
The constant that @marvila mentioned, `TAINACAN_INDEX_PDF_CONTENT`, was the way to go in versions before 1.0.0. While it still works, you can now enable the indexing (if it is not enabled yet) via “Tainacan” → “Settings” → “PDF Content”. The WP-CLI command mentioned is only necessary if you had already uploaded files before enabling this.
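For setups managed entirely from the command line, the constant route plus the reindex could be scripted roughly like this. It is a sketch under assumptions: it relies on WP-CLI's `wp config set` with `--raw` to write the constant as a boolean into wp-config.php, and `enable_and_reindex` and `WP_CMD` are hypothetical helper names of mine, not something Tainacan provides.

```shell
#!/bin/sh
# Sketch: define the constant via WP-CLI, then reindex existing PDFs.
# WP_CMD is a hypothetical override for previewing (e.g. WP_CMD="echo wp").
WP_CMD="${WP_CMD:-wp}"

enable_and_reindex() {
    # --raw writes the value as PHP `true` rather than the string 'true'.
    $WP_CMD config set TAINACAN_INDEX_PDF_CONTENT true --raw || return 1
    # Index the PDFs that were uploaded before indexing was enabled.
    $WP_CMD tainacan index-content --collection=all
}

# Usage:
#   enable_and_reindex
```

If you enable the feature from the Settings screen instead, only the second command is relevant.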
Wow. Very quick and helpful response.
I will give it a try.
Do you plan to update the wiki information from 0.12 with this new process?
Yeah, we need several updates in our docs… I did add the Settings screen info there, but there is certainly more that can be done to clarify these alternatives. Hope you can use the feature from now on!