PDF Parser Command Line
xpdf file.pdf
xpdf file.pdf :18
OCR Commands
PDF pages to Image
magick -density 150 file.pdf[0-6] -quality 90 -resize 75% pdfPage-%d.png
Note: page indexes start at 0. To convert only the nth page, use index n-1 (for example, file.pdf[4] selects page 5):
magick -density 150 file.pdf[n-1] -quality 90 -resize 75% pdfPage-%d.png
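The off-by-one between page numbers and ImageMagick's 0-based index is easy to get wrong, so here is a minimal Python sketch that builds the same command line from 1-based page numbers. The function name magick_command and its parameters are made up for illustration; it only assembles the argument list and assumes magick is on the PATH if you go on to run it.

```python
def magick_command(pdf, first_page, last_page=None,
                   density=150, quality=90, resize="75%",
                   out_pattern="pdfPage-%d.png"):
    """Build a magick argument list from 1-based page numbers (hypothetical helper)."""
    if last_page is None:
        last_page = first_page
    if first_page < 1 or last_page < first_page:
        raise ValueError("pages are 1-based and last_page must be >= first_page")
    # ImageMagick counts pages from 0, so page n becomes index n-1.
    start, end = first_page - 1, last_page - 1
    selector = str(start) if start == end else f"{start}-{end}"
    return ["magick", "-density", str(density), f"{pdf}[{selector}]",
            "-quality", str(quality), "-resize", resize, out_pattern]
```

Pages 1 to 7 give the file.pdf[0-6] selector used above. The returned list can be handed to subprocess.run(cmd, check=True) to run the conversion.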
Image to Text
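rem tesseract appends .txt to the output name, so pass the base name without an extension.
rem tesseract pdfPage-2.png pdfText/pdfPage-2.txt
rem tesseract pdfPage-[3-6].png pdfText/pdfPage-%d.md
rem At the command prompt, loop over pages 3 to 6 (single %):
FOR /L %y IN (3, 1, 6) do tesseract "pdfPage-%y.png" pdfText/pdfPage-%y
rem Inside a .bat file, double the percent signs:
rem FOR /L %%y IN (3, 1, 6) do tesseract "pdfPage-%%y.png" pdfText/pdfPage-%%y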
rem -l mal selects the Malayalam language data, which must be installed.
tesseract pdfPage-6.png pdfText/pdfPage-6 -l mal
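The FOR /L loop and the -l option can be combined in a small script. The Python sketch below is one possible way to do it, not part of the original commands. The function name tesseract_commands is hypothetical, the file name patterns mirror the ones used above, and the pdfText directory is assumed to exist already.

```python
def tesseract_commands(first, last, lang=None,
                       image_pattern="pdfPage-{n}.png",
                       out_pattern="pdfText/pdfPage-{n}"):
    """Build one tesseract argument list per page image (hypothetical helper)."""
    commands = []
    # Like FOR /L %y IN (first, 1, last), the range includes both ends.
    for n in range(first, last + 1):
        cmd = ["tesseract", image_pattern.format(n=n), out_pattern.format(n=n)]
        if lang:
            # Optional language code, e.g. "mal" as in the command above.
            cmd += ["-l", lang]
        commands.append(cmd)
    return commands
```

Each list can be passed to subprocess.run(cmd, check=True). For example, tesseract_commands(3, 6, lang="mal") covers the same images as the loop above, with the language option added.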