Cannot search inside PDFs with MediaWiki
I have a MediaWiki server set up on WSL 2 with CirrusSearch and its dependencies. Everything is working fine, except searching inside PDFs. I can search for filenames, but not PDF content. The PDFs will display, so I'm pretty sure PdfHandler is working properly. MediaWiki version: 1.41.1
I have tried:
- Running all the setup scripts for Elasticsearch and CirrusSearch
- Deleting and re-uploading files to make sure they get indexed
1 answer
I ran `which gs convert pdfinfo pdftotext` and noticed that the output was `/usr/bin/gs`, etc. The Ubuntu defaults for PdfHandler have the path set as `gs`, not `/usr/bin/gs`. I changed the paths for `gs`, `pdfinfo`, and `pdftotext` to their `/usr/bin` equivalents and also disabled the shell memory limit by setting `$wgMaxShellMemory = 0` in LocalSettings.php.
I re-uploaded the files, and now they have proper thumbnails and previews as well as being searchable. It took a few minutes for the server to process the changes, but after that everything worked smoothly.
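For reference, the changes described above might look roughly like this in LocalSettings.php. This is a sketch, not a copy of my actual file: it assumes the standard PdfHandler setting names (`$wgPdfProcessor`, `$wgPdfInfo`, `$wgPdftoText`) and that your binaries live in `/usr/bin`, so check the output of `which` on your own system first.

```php
// LocalSettings.php (fragment; place after the line that loads PdfHandler)

// Absolute paths to the PDF tools, as reported by `which gs pdfinfo pdftotext`.
// Assumption: these are the standard PdfHandler setting names and the
// binaries are in /usr/bin; adjust the paths if yours differ.
$wgPdfProcessor = '/usr/bin/gs';        // Ghostscript, used to render pages
$wgPdfInfo      = '/usr/bin/pdfinfo';   // reads PDF metadata
$wgPdftoText    = '/usr/bin/pdftotext'; // extracts the text that gets indexed

// Disable the memory limit for shell commands run by MediaWiki.
$wgMaxShellMemory = 0;
```

Files uploaded before the change need to be re-uploaded, as noted above, before their content becomes searchable.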