[Sunhelp] file conversion

Doug McLaren dougmc at frenzy.com
Tue Jun 13 01:25:51 CDT 2000


On Mon, Jun 12, 2000 at 09:07:09PM -0700, Gregory Leblanc wrote:

| > is there a software/package for solaris that converts .pdf 
| > files to .doc?
| 
| ghostscript?  .doc could be lots of things, if you're talking about
| Microsoft Word format, probably not.

You're not likely to like the answer.

PDF is very similar to PostScript.  It can also be encrypted, which
makes it an even bigger pain in the ass.

Ghostscript can read unencrypted PDF files.  If a file is encrypted,
you can use acroread to decrypt it into a PostScript file, which
Ghostscript can then read.
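
For the unencrypted case, here is a minimal Python sketch of driving
Ghostscript to turn a PDF into PostScript.  It assumes a `gs` binary on
your PATH.  The `ps2write` device name is what current Ghostscript
releases ship, and older releases used `pswrite`, so treat the default
as an assumption and check `gs -h` on your box.  The function names are
made up for illustration.

```python
import shutil
import subprocess
from typing import List


def gs_pdf_to_ps_command(pdf_path: str, ps_path: str,
                         device: str = "ps2write") -> List[str]:
    """Build the Ghostscript command line; nothing is executed here."""
    return [
        "gs",
        "-q",                        # suppress the startup banner
        "-dNOPAUSE",                 # don't prompt between pages
        "-dBATCH",                   # exit when the input is done
        "-dSAFER",                   # restrict file access from the input
        f"-sDEVICE={device}",        # assumption: older gs wants "pswrite"
        f"-sOutputFile={ps_path}",
        pdf_path,
    ]


def pdf_to_ps(pdf_path: str, ps_path: str) -> None:
    """Run the conversion, failing early if Ghostscript is not installed."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("gs (Ghostscript) not found on PATH")
    subprocess.run(gs_pdf_to_ps_command(pdf_path, ps_path), check=True)
```

Calling `pdf_to_ps("manual.pdf", "manual.ps")` would write `manual.ps`
next to the input.  An encrypted file will most likely just make `gs`
exit with an error, which `check=True` turns into an exception.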

Once it's PostScript, if the file was just a picture, all you can do
is convert it to a picture.  Manuals are quite often scanned as
pictures, and if all you have is a picture of each page, you can't
extract the text from it.  On top of that, the PDF and ps files tend
to be huge.

If the PDF file includes text that is saved as text (and not as a
picture), then maybe you can get it out of the PostScript.  The text
you'll get is ugly, and some may be missing, but it may be better than
retyping stuff.
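
To take some of the ugliness out, a small cleanup pass can help.  This
is only a sketch of one way to do it: the function name is hypothetical,
and the hyphen rule is a guess that will also glue together genuinely
hyphenated words that happen to break at a line end.

```python
import re


def tidy_extracted_text(raw: str) -> str:
    """Light cleanup for text pulled out of a PostScript or PDF file."""
    # Rejoin words split across a line break ("conver-\nsion").
    # Caveat: this also joins real compounds that broke at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim leading and trailing spaces on every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Squeeze three or more newlines down to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

It can't bring back text that is missing, it only makes what did come
out easier to read and edit.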

One program that can pull the text out of a PostScript file is called
pstotext.  It needs Ghostscript to work.
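
A rough Python sketch of calling it follows.  It assumes `pstotext` and
`gs` are both on your PATH, that `pstotext FILE` writes the text to
standard output, and that the output can be read as Latin-1 (an
assumption on my part; adjust if your build differs).  The empty-output
check is a hypothetical heuristic for spotting the scanned-picture case,
and the 20-character threshold is arbitrary.

```python
import shutil
import subprocess
from typing import List


def pstotext_command(ps_path: str) -> List[str]:
    """Build the pstotext command line; nothing is executed here."""
    return ["pstotext", ps_path]


def ps_to_text(ps_path: str) -> str:
    """Run pstotext on a PostScript file and return what it printed."""
    # pstotext drives Ghostscript, so both tools have to be installed.
    for tool in ("pstotext", "gs"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    result = subprocess.run(
        pstotext_command(ps_path),
        check=True,
        capture_output=True,
        encoding="latin-1",  # assumption: never fails to decode
    )
    return result.stdout


def looks_like_scanned_pages(text: str, min_chars: int = 20) -> bool:
    """Guess that the pages were pictures if almost no text came out."""
    non_blank = "".join(text.split())  # drop all whitespace
    return len(non_blank) < min_chars
```

If `looks_like_scanned_pages(ps_to_text("manual.ps"))` comes back true,
the file is probably just page images and there is no text to pull out.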

Relevant URLs include:

   http://research.compaq.com/SRC/virtualpaper/cgi-bin/nph-download.tcl/pstotext.tar.Z?object=pstotext
   ftp://prep.ai.mit.edu/pub/gnu/ghostscript
   http://www.adobe.com

Programs I have not used that may be useful include:

  pdf2html:  ftp://atrey.karlin.mff.cuni.cz/pub/local/clock/pdf2html/
  pdftohtml: http://www.ra.informatik.uni-stuttgart.de/~gosho/pdftohtml/
  xpdf:      http://www.foolabs.com/xpdf/

-- 
Doug McLaren, dougmc at frenzy.com
Which is worse:  Ignorance or Apathy?  Who knows?  Who cares?




