How to Convert a PDF File to a Text Document on Linux

Unlike a text file, you cannot edit a PDF directly. There are several ways to create PDF files from text. But what if you want to go the other way around and convert PDFs to text files?

Fortunately, on Linux you can easily convert these files from the terminal. This article shows how to convert a PDF file to a text document on Linux.

Convert PDF to text from the terminal

Poppler is a software library for rendering PDF files. Its command-line tools, packaged as poppler-utils on most distributions, include a utility called pdftotext, which generates text files from PDFs. Since poppler-utils is not installed by default on every distribution, you may have to install it manually using a package manager.

On Ubuntu and Debian:

sudo apt install poppler-utils

To install Poppler on Arch Linux:

sudo pacman -S poppler

To install the poppler-utils package on Fedora, CentOS, and other RHEL-based distributions, use DNF or, on older releases, Yum:

sudo dnf install poppler-utils
sudo yum install poppler-utils
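
Before moving on, it can help to confirm that pdftotext is actually on your PATH. The following is a minimal sketch rather than an official check; the helper name have_cmd is made up for this illustration and is not part of Poppler.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: it succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdftotext; then
    # -v prints version information; stderr is merged in because the
    # version text may be written there rather than to stdout.
    pdftotext -v 2>&1 | head -n 1
else
    echo "pdftotext not found; install poppler-utils (poppler on Arch) first"
fi
```

If the second message appears, run the install command for your distribution again and check for errors.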

Convert an entire PDF file to text

The basic syntax of the pdftotext command is:

pdftotext [options] pdffile textfile

...where pdffile is the absolute or relative path to the PDF file and textfile is the name of the output file.

For example, to convert lorem-ipsum.pdf to a text file:

pdftotext lorem-ipsum.pdf text.txt
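
If you have a whole directory of PDFs, the same command can be wrapped in a loop. This is a sketch under stated assumptions, not part of the original walkthrough: the function names txt_name and convert_all are invented for this example, and every PDF is assumed to end in a lowercase .pdf extension.

```shell
#!/bin/sh
# txt_name (hypothetical helper): map "dir/report.pdf" to "dir/report.txt".
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# convert_all (hypothetical helper): convert every *.pdf in a directory,
# defaulting to the current directory when no argument is given.
convert_all() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDFs.
        [ -e "$pdf" ] || continue
        pdftotext "$pdf" "$(txt_name "$pdf")" || echo "failed: $pdf" >&2
    done
}

# Usage: convert_all ~/Documents/pdfs
```

For a single file, pdftotext picks the same name on its own: when the output file is omitted, lorem-ipsum.pdf is written to lorem-ipsum.txt, and passing - as the output file sends the text to standard output instead.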

If the file being converted contains watermarks or other diagonal text, you can discard it from the output using the -nodiag flag.

pdftotext -nodiag lorem-ipsum.pdf random.text

Process pages within a certain range

Use the -f (first page) and -l (last page) flags if you want to convert pages that fall within a certain range. For example, to convert pages one through five of lorem-ipsum.pdf to text:

pdftotext -f 1 -l 5 lorem-ipsum.pdf output.txt

To convert only the first page of the PDF file:

pdftotext -f 1 -l 1 lorem-ipsum.pdf output.txt
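
The same two flags can be used to write each page to its own text file. The sketch below assumes pdfinfo, which ships alongside pdftotext in poppler-utils, is installed; the helper names parse_pages and split_pages and the page-N.txt naming scheme are assumptions made for this example.

```shell
#!/bin/sh
# parse_pages (hypothetical helper): read pdfinfo output on stdin and
# print the number from its "Pages:" line.
parse_pages() {
    awk '/^Pages:/ { print $2 }'
}

# split_pages (hypothetical helper): write page-1.txt, page-2.txt, ...
# into the current directory, one file per page of the given PDF.
split_pages() {
    pdf=$1
    total=$(pdfinfo "$pdf" | parse_pages)
    # Stop early if the page count could not be determined.
    [ -n "$total" ] || return 1
    page=1
    while [ "$page" -le "$total" ]; do
        pdftotext -f "$page" -l "$page" "$pdf" "page-$page.txt"
        page=$((page + 1))
    done
}

# Usage: split_pages lorem-ipsum.pdf
```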

Convert password-protected PDF files to text

pdftotext can even convert password-protected PDFs into text files. The -upw and -opw flags, which stand for user password and owner password, take care of authentication as you convert the PDF files.

pdftotext -upw password lorem-ipsum.pdf output.txt
pdftotext -opw password lorem-ipsum.pdf output.txt

Make sure you replace password with the password of the PDF file.

You can also combine multiple flags to get the output you want. For example, to convert pages one to three of a password-protected PDF file to text:

pdftotext -f 1 -l 3 -upw password lorem-ipsum.pdf output.txt
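
Typing the password directly on the command line leaves it in your shell history. One way around that is to prompt for it instead; the following is only a sketch, and the helper names read_password and convert_protected are invented for this example. Note that the password is still passed to pdftotext as an argument, so it can be visible in the process list while the command runs.

```shell
#!/bin/sh
# read_password (hypothetical helper): prompt on stderr, read one line
# from stdin, and print it. Echo is turned off only on a real terminal.
read_password() {
    printf 'PDF password: ' >&2
    if [ -t 0 ]; then stty -echo; fi
    IFS= read -r pw
    if [ -t 0 ]; then stty echo; printf '\n' >&2; fi
    printf '%s\n' "$pw"
}

# convert_protected (hypothetical helper): convert a PDF using the
# user password typed at the prompt.
convert_protected() {
    pdftotext -upw "$(read_password)" "$1" "$2"
}

# Usage: convert_protected lorem-ipsum.pdf output.txt
```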

Convert a PDF to a text file graphically

If working at the command line isn't your thing, you can convert PDFs to text files using graphical software like Calibre. It is an e-book management application that allows you to view, organize, and convert PDF files on your system.

Calibre is available in the official repositories of most Linux distributions and can be installed with a package manager.

To install Calibre on Ubuntu and Debian:

sudo apt install calibre

On Arch Linux:

sudo pacman -S calibre

On RHEL-based distributions like CentOS and Fedora, you can install Calibre using either DNF or Yum.

sudo dnf install calibre
sudo yum install calibre

How to use Calibre to convert PDF files

Once it is installed on your system, launch Calibre from the applications menu. Alternatively, you can start Calibre from the terminal by typing:

calibre

To generate a text file from a PDF using Calibre:

  1. Click the Add books option in the menu.

  2. Find and select the PDF file you want to convert.

  3. Select the PDF file in the middle area and choose Convert books from the menu.

  4. From the Output format dropdown, choose TXT.

  5. Click OK to continue.

Calibre will now start converting the specified PDF file into a text document. You can check the status of the operation by clicking the Jobs option located in the lower right corner of the window.
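
Calibre also installs a command-line converter named ebook-convert, so the steps above can be scripted as well. The following is a minimal sketch; the wrapper name pdf_to_txt_calibre is hypothetical, and the input file is assumed to end in .pdf.

```shell
#!/bin/sh
# pdf_to_txt_calibre (hypothetical helper): convert a PDF with Calibre's
# ebook-convert tool. The output format is inferred from the .txt extension.
pdf_to_txt_calibre() {
    if ! command -v ebook-convert >/dev/null 2>&1; then
        echo "ebook-convert not found; is Calibre installed?" >&2
        return 1
    fi
    ebook-convert "$1" "${1%.pdf}.txt"
}

# Usage: pdf_to_txt_calibre lorem-ipsum.pdf
```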

Working with PDF files on Linux

If you want to share a document with someone, the most efficient way to do this is to convert it to PDF before sharing. In the past, users had to install a dedicated PDF viewer on their system to view PDF files, but now almost every browser has a built-in PDF viewer.

You can find several applications that allow a user to easily view and edit PDF files. Many Linux installations come with LibreOffice, an office software suite that can be used as a PDF editor.

About the author

Deepesh Sharma

Deepesh is Junior Editor for Linux at MUO. He has been writing informational content on the Internet for over 3 years. In his spare time he enjoys writing, listening to music and playing the guitar.