This project started a bit out of frustration with the existing tools.
On Linux one can search from the command line with grep or find, but these have no GUI and are too complex for my family. Searching in photos is also difficult: only the few words in the EXIF comment are interesting, so you have to design the command carefully to search the right type of documents.
There are several GUI search tools, but some try to read all kinds of documents (HTML, PDF, OpenOffice, Word, Excel, MP3 and so on). Supporting all those formats makes the search tools large. Some have settings to restrict the search, but that makes them more complex again. Some use KDE/Qt. While those are fantastic frameworks, they are also large.
My system uses Openbox and several other lightweight tools. I started many years ago with GIMP because it's the most powerful photo editor available on Linux. Meanwhile I've learned how it works and am reluctant to migrate to something else. I use Geeqie for photo file management because it's very fast. This means my system has GTK3, and I try to avoid installing Qt.
A summary:
No suitable tool is ready in the Arch Linux repositories. I used DocFetcher for a while: it comes very close to my needs, but I have to download and compile it, and it's based on Java.
Why is there no tool based on C or Python readily available in the repositories? Maybe such a tool does not exist yet?
So if nothing meets my requirements, I may have to build my own program. But a good search program is a lot of work, hence I started looking for tools:
Since I know Python, the obvious first thing to try is python-whoosh:
Whoosh is a library of classes and functions for indexing text and then searching the index. It allows you to develop custom search engines for your content. It uses only pure Python.
By the way: speed is not an issue. Generating an index of my 40,000 photos takes a few minutes, but that's only done once. Updating is much faster.
Source here: https://github.com/mchaput/whoosh
Documentation: https://whoosh.readthedocs.io/en/latest/
The documentation provides guidance on updating the index.
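The reason updating is fast is that only new or changed files need to be read again. The sketch below shows that idea with the standard library only: it compares the modification times on disk with the ones remembered from the previous run. The function name plan_update and the dictionary of remembered times are assumptions for illustration, not part of Whoosh.

```python
import os


def plan_update(root, indexed_mtimes):
    """Decide what an index update has to do.

    indexed_mtimes maps file path -> modification time recorded at the
    previous indexing run. Returns (to_index, to_remove): files that are
    new or changed, and files that were indexed but no longer exist.
    """
    current = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            current[path] = os.path.getmtime(path)

    # Any difference in modification time counts as a change.
    to_index = sorted(p for p, m in current.items() if indexed_mtimes.get(p) != m)
    to_remove = sorted(p for p in indexed_mtimes if p not in current)
    return to_index, to_remove
```

Unchanged files show up in neither list, so after the first full run an update only touches the few photos that were added, edited or deleted.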
After just a few experiments I was already getting useful results.
Henk Speksnijder 20210521