Ligatures in xpdf search: Fail
Mike Spencer
2023-06-12 06:04:45 UTC
Back in March, I complained that the version of xpdf distributed with
Slackware 15 was hard-coded to use CUPS and would only "print to file"
under lprng, a problem for which I still have no fix.

Now I've discovered further brain damage.

The search facility is, depending on how you look at it, either too
stupid or too smart. Reading an article on complexity, searching for
the name of Stuart Kauffman by last name failed. Nope, sorry, no
mention of "Kauffman" in this document. Paging down to the footnotes,
there was Kauffman's name. But even with the text of his name
displayed on the screen, search for it failed.

It's because the authors (or their software) used a code point for the
"ff" ligature and xpdf insists that you search for that datum,
unwilling to accommodate the fact that no one types "ff" ligature into
a search pane. If I use the mouse to copy and paste the "ff" from
Kauffman into the search pane, xpdf finds it fine.

How many of the other commonly used "fi", "fl", "ffi"
and "ffl" ligatures are going to impede searching? And there are
others less commonly seen such as "st".
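For reference, the Latin ligatures in question occupy a small, fixed range of Unicode (U+FB00 to U+FB06), and each one has a compatibility mapping back to plain letters. The sketch below is an illustration using Python's standard unicodedata module, not anything xpdf itself does; the names LATIN_LIGATURES and ligature_table are made up for the example.

```python
import unicodedata

# U+FB00..U+FB06 are the Latin ligatures in the
# "Alphabetic Presentation Forms" block: ff, fi, fl, ffi, ffl, long-s-t, st.
LATIN_LIGATURES = [chr(cp) for cp in range(0xFB00, 0xFB07)]


def ligature_table():
    """Map each ligature code point to its plain-letter expansion (NFKC)."""
    return {lig: unicodedata.normalize("NFKC", lig) for lig in LATIN_LIGATURES}


if __name__ == "__main__":
    for lig, plain in ligature_table().items():
        print(f"U+{ord(lig):04X}  {unicodedata.name(lig)}  ->  {plain!r}")
```

So the list of ligatures that can get in the way is short: the five "f" ligatures plus the two "st" forms.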

Yes, I see that there's stuff in the man pages about text encoding.
Is it worth hours of my time to figure out a lot of stuff about
unicode mapping? I don't see anything about how that would affect
search.

I think I have to find some other way to deal with PDF files.
--
Mike Spencer Nova Scotia, Canada
Henrik Carlqvist
2023-06-13 05:40:29 UTC
Post by Mike Spencer
It's because the authors (or their software) used a code point for the
"ff" ligature and xpdf insists that you search for that datum, unwilling
to accommodate the fact that no one types "ff" ligature into a search
pane.
Yes, I see that there's stuff in the man pages about text encoding. Is
it worth hours of my time to figure out a lot of stuff about unicode
mapping? I don't see anything about how that would affect search.
I think I have to find some other way to deal with PDF files.
Unicode is a mess in so many ways... But is there really any pdf reader
out there capable of successfully doing a search like that? Would Acrobat
Reader do it better?

regards Henrik
Lew Pitcher
2023-06-13 15:59:40 UTC
Post by Mike Spencer
Back in March, I complained that the version of xpdf distributed with
Slackware 15 was hard-coded to use CUPS
Apparently, that behaviour is due to the developer's use of the Qt toolkit;
the Qt print dialog only supports CUPS. Note that the developer doesn't
actually support all the features of the Qt print dialog either, so, even
with CUPS, some of the dialog options do not work.
(https://forum.xpdfreader.com/viewtopic.php?t=41828)
Post by Mike Spencer
and would only "print to file"
under lprng, a problem for which I still have no fix.
I haven't tried this myself, but you /might/ be able to circumvent
the toolkit print dialog and print directly to lpr by setting the
"psFile" configuration option, or the "-ps" command-line argument.
See xpdf(1) and xpdfrc(5) for details.
Post by Mike Spencer
Now I've discovered further brain damage.
The search facility is, depending on how you look at it, either too
stupid or too smart. Reading an article on complexity, searching for
the name of Stuart Kauffman by last name failed. Nope, sorry, no
mention of "Kauffman" in this document. Paging down to the footnotes,
there was Kauffman's name. But even with the text of his name
displayed on the screen, search for it failed.
It's because the authors (or their software) used a code point for the
"ff" ligature and xpdf insists that you search for that datum,
unwilling to accommodate the fact that no one types "ff" ligature into
a search pane. If I use the mouse to copy and paste the "ff" from
Kauffman into the search pane, xpdf finds it fine.
Apparently, the developer has "addressed" (but not fixed) this issue in
the xpdfreader version of his software. If you are interested in the
details, see https://forum.xpdfreader.com/viewtopic.php?t=42051
It doesn't look like the developer has yet implemented the fix that
would permit correct searching with ligatures.
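As a rough idea of what "correct searching with ligatures" involves, here is a minimal sketch, assuming the page text is already available as a Unicode string. It is not the xpdf developer's approach and says nothing about xpdf's internals; fold_with_offsets and find_folded are hypothetical names. Both the text and the query are folded with NFKC and case folding, and an offset list maps a hit back to the original characters so it could still be highlighted.

```python
import unicodedata


def fold_with_offsets(text):
    """Fold text one character at a time (NFKC + casefold).

    Returns the folded string and, for every folded character,
    the index of the original character it came from.
    Folding per character is a simplification of full NFKC.
    """
    folded, offsets = [], []
    for i, ch in enumerate(text):
        for out in unicodedata.normalize("NFKC", ch).casefold():
            folded.append(out)
            offsets.append(i)
    return "".join(folded), offsets


def find_folded(text, query):
    """Return (start, end) indices into the ORIGINAL text, or None."""
    needle = unicodedata.normalize("NFKC", query).casefold()
    if not needle:
        return None
    folded, offsets = fold_with_offsets(text)
    pos = folded.find(needle)
    if pos < 0:
        return None
    # Map the first and last folded characters back to source positions.
    return offsets[pos], offsets[pos + len(needle) - 1] + 1


if __name__ == "__main__":
    page = "Stuart Kau\ufb00man, 1993"   # "ff" stored as U+FB00
    print(page.find("Kauffman"))         # naive search misses it
    print(find_folded(page, "Kauffman")) # folded search finds it
```

Typing plain "Kauffman" then matches text stored with the ligature, and pasting the ligature into the query matches text stored with plain letters, since both sides go through the same folding.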
Post by Mike Spencer
I think I have to find some other way to deal with PDF files.
Probably.


HTH
--
Lew Pitcher
"In Skills We Trust"
Jim Diamond
2023-06-16 20:23:06 UTC
Post by Mike Spencer
Back in March, I complained that the version of xpdf distributed with
Slackware 15 was hard-coded to use CUPS and would only "print to file"
under lprng, a problem for which I still have no fix.
Now I've discovered further brain damage.
The search facility is, depending on how you look at it, either too
stupid or too smart. Reading an article on complexity, searching for
the name of Stuart Kauffman by last name failed. Nope, sorry, no
mention of "Kauffman" in this document. Paging down to the footnotes,
there was Kauffman's name. But even with the text of his name
displayed on the screen, search for it failed.
It's because the authors (or their software) used a code point for the
"ff" ligature and xpdf insists that you search for that datum,
unwilling to accommodate the fact that no one types "ff" ligature into
a search pane. If I use the mouse to copy and paste the "ff" from
Kauffman into the search pane, xpdf finds it fine.
How many of the other commonly used "fi", "fl", "ffi"
and "ffl" ligatures are going to impede searching? And there are
others less commonly seen such as "st".
Yes, I see that there's stuff in the man pages about text encoding.
Is it worth hours of my time to figure out a lot of stuff about
unicode mapping? I don't see anything about how that would affect
search.
I think I have to find some other way to deal with PDF files.
Mike,

xpdf is pretty rudimentary in a lot of ways. Have you considered
installing evince (there is a SlackBuild for it)?

Or... Notwithstanding the fact that the final version of acroread has some
security bugs, I use acroread when I have no reason to be suspicious of the
PDF. To do this on Slackware64 you need to install Alien Bob's
compatibility stuff, but with that it works fine for me.

To evince's credit, it found words with ffl, fl and ff ligatures in a
document I created with TeX. Acroread did not find those (even when I
copied and pasted the word into the search box). Other ligatures are left
as exercises to the diligent student.

But at least acroread will print to file for me (I use cups, not sure what
would happen if I was a lprng guy).

If you want to try installing acroread and need any help, feel free to
reply here or directly, as you prefer.

Jim
Henrik Carlqvist
2023-06-17 08:51:41 UTC
Post by Jim Diamond
But at least acroread will print to file for me (I use cups, not sure
what would happen if I was a lprng guy).
I use an old Adobe Acrobat Reader 8.1.7 from 2009, which
successfully prints to a printer with lprng. When I want to print
something from the Firefox browser I need to print to file from Firefox
(which nowadays only supports cups) to get a pdf file which I can then
print with acroread.

regards Henrik
Jim Diamond
2023-06-17 14:55:48 UTC
Post by Henrik Carlqvist
Post by Jim Diamond
But at least acroread will print to file for me (I use cups, not sure
what would happen if I was a lprng guy).
I use an old Adobe Acrobat Reader 8.1.7 from 2009, which
successfully prints to a printer with lprng. When I want to print
something from the Firefox browser I need to print to file from Firefox
(which nowadays only supports cups) to get a pdf file which I can then
print with acroread.
Acroread 9.5.5 for Linux is (was, but still can be found) available.

Just out of curiosity, are you using 8.1.7 because it has some feature
lacking in 9.5.5, or did you never feel the need to upgrade?

Cheers.
Jim
Henrik Carlqvist
2023-06-17 17:03:42 UTC
Post by Jim Diamond
Just out of curiosity, are you using 8.1.7 because it has some feature
lacking in 9.5.5, or did you never feel the need to upgrade?
It was probably the latter, but I made that decision so long ago that
I can't say for sure.

The most advanced thing I have used acrobat reader for, which I couldn't
do with other programs like xpdf or okular, was to view pdf files with
embedded 3D models that could be rotated, zoomed and panned in
acroread. I used latex to create such documents, starting with something
like an .obj file which had to be converted to a .u3d file.

Unfortunately the meshlab functionality to convert to u3d format was
broken so I contributed a patch at
https://sourceforge.net/p/meshlab/patches/7/ which never made it upstream.

regards Henrik
Mike Spencer
2023-06-20 21:11:32 UTC
Thanks all for the discussion. Don't stop. :-)

I've been using Slackware for over 20 years but never beat up
SlackBuild, just compiled a source tarball when needed.

So I'm looking into evince & slackbuild, will try to hunt up other
suggested alternatives.

For the foreseeable future, I'm clinging to the trailing edge of
technology with a 32 bit system and other components that I already
understand or have become accustomed to. I'll report back if/when I
get a PDF handler that suits me.

The "Life-long Learning" slogan is supposed to be about learning *new*
stuff, pursuing fresh enlightenment for the aging brain, not about
learning the same stuff over and over as old stuff gets wrapped in new
packaging or exfoliates a huge but unwanted superstructure. Don't
need to learn how to drive and maintain a Winnebago to go to the
corner store for milk.
--
Mike Spencer Nova Scotia, Canada