[geeks] PDF: How to tell if a PDF file is text or image only

Mike Meredith very at zonky.org
Wed Dec 13 01:19:20 CST 2006


On Tue, 12 Dec 2006 17:50:02 -0600, Bill Bradford wrote:
> On Tue, Dec 12, 2006 at 04:12:45PM -0500, Charles Shannon Hendrix
> wrote:
> > I need to be able to tell in a shell script if a PDF file is text
> > based or image based.
> > What I mean is that some documents are actually PDF text documents,
> > while others are image scans only with no text.
> 
> And some are BOTH.

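tmp=/var/tmp/pdftest.$$
pdftotext "$file" "$tmp"
# pdftotext writes a form feed per page even when there is no text,
# so look for any non-whitespace character rather than an empty file
if ! grep -q '[^[:space:]]' "$tmp"
then
    echo No text in pdf
fi
rm -f "$tmp"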

or something like that (the sigmonster has come up with an appropriate
comment). pdftotext is part of xpdf, I believe.
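
Bill's point about documents that are BOTH is worth handling, since a
single whole-file check reports a mixed document as text. Below is a
rough per-page sketch, assuming the pdfinfo and pdftotext tools from
xpdf (or poppler-utils) are on the PATH. The function names has_text,
classify_pages and classify_pdf are made up for illustration, and a
page with no extractable text is only presumed to be an image scan:

```shell
# Sketch only: assumes pdfinfo and pdftotext (xpdf or poppler-utils).
# has_text, classify_pages and classify_pdf are hypothetical helper names.

# Succeed if stdin holds at least one non-whitespace character.
# pdftotext emits a form feed per page, and [:space:] matches it,
# so a page with no text layer yields only whitespace.
has_text() {
    grep -q '[^[:space:]]'
}

# Print "N text" or "N image" for every page of the PDF named in $1.
classify_pages() {
    pages=$(pdfinfo "$1" 2>/dev/null | awk '/^Pages:/ {print $2}')
    [ -n "$pages" ] || { echo "cannot read $1" >&2; return 2; }
    page=1
    while [ "$page" -le "$pages" ]; do
        # "-" as the output file sends the page's text to stdout
        if pdftotext -f "$page" -l "$page" "$1" - 2>/dev/null | has_text
        then
            echo "$page text"
        else
            echo "$page image"
        fi
        page=$((page + 1))
    done
}

# Print "text", "image" or "mixed" for the whole document.
classify_pdf() {
    result=$(classify_pages "$1") || return $?
    ntext=$(printf '%s\n' "$result" | grep -c ' text$')
    nimage=$(printf '%s\n' "$result" | grep -c ' image$')
    if [ "$nimage" -eq 0 ]; then
        echo text
    elif [ "$ntext" -eq 0 ]; then
        echo image
    else
        echo mixed
    fi
}
```

After sourcing that, something like `classify_pdf scan.pdf` should print
text, image or mixed. A blank page also counts as "image", and a scan
that has been OCRed carries an invisible text layer, so it will come out
as text.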


-- 
Mike Meredith (http://zonky.org/)
  One test is worth a thousand opinions.


