    import PyPDF2

    # Legacy PyPDF2 1.x API; 'sample.pdf' is a placeholder path
    with open('sample.pdf', 'rb') as pdf_file:
        read_pdf = PyPDF2.PdfFileReader(pdf_file)
        number_of_pages = read_pdf.getNumPages()
        page = read_pdf.getPage(0)
        page_content = page.extractText()
    print(page_content)

The extractText method documents its own limitations:

    def extractText(self):
        """
        Locate all text drawing commands, in the order they are provided in
        the content stream, and extract the text.

        This works well for some PDF files, but poorly for others, depending
        on the generator used. Do not rely on the order of text coming out of
        this function, as it will change if this function is made more
        sophisticated.
        """

I was looking for a simple solution to use with Python 3.x on Windows.
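On Python 3, the PyPDF2 snippet above can be sketched with pypdf, the maintained successor of PyPDF2, where PdfFileReader, getPage and extractText become PdfReader, reader.pages and extract_text. This is a minimal sketch, assuming pypdf is installed; the helper names join_page_texts and extract_pdf_text are hypothetical. Unlike the snippet above, it reads every page rather than only page 0, and the per-page extraction call is injectable so the joining logic can be checked without a real PDF.

```python
from typing import Any, Callable, Iterable


def join_page_texts(
    pages: Iterable[Any],
    get_text: Callable[[Any], str] = lambda page: page.extract_text(),
    separator: str = "\n",
) -> str:
    """Extract text from each page, in page order, and join the results.

    Pages are visited in document order, but the order of text *within*
    a page still depends on how the PDF was generated.
    """
    # `or ""` treats an empty/None extraction result as an empty string.
    return separator.join(get_text(page) or "" for page in pages)


def extract_pdf_text(path: str) -> str:
    """Return the text of every page of the PDF at `path` (needs pypdf)."""
    # Imported lazily so join_page_texts works even without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # reader.pages replaces getNumPages()/getPage(i); len(reader.pages) is the page count.
    return join_page_texts(reader.pages)
```

For example, extract_pdf_text('sample.pdf') returns one string with the pages separated by newlines; the order of text inside each page can still vary with the tool that produced the PDF.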

There doesn't seem to be support for this in textract, which is unfortunate, but if you are looking for a simple solution for Windows/Python 3, check out the tika package; it is really straightforward for reading PDFs.

    import os
    import subprocess

    SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
    # '-' as the output file makes pdftotext write the text to stdout
    args = ["/usr/local/bin/pdftotext", '-enc', 'UTF-8',
            "{}/my-pdf.pdf".format(SCRIPT_DIR), '-']
    res = subprocess.run(args, stdout=subprocess.PIPE)
    output = res.stdout.decode('utf-8')

There is pdftotext, which does basically the same thing, but it assumes pdftotext is in /usr/local/bin, whereas I am using this in AWS Lambda and wanted to run it from the current directory.

You can use TIFF PDF Counter for Windows and Mac to count the number of pages in multipage TIFF and PDF documents.
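The pdftotext call above can be wrapped so that the path to the binary is a parameter, which covers running a pdftotext executable bundled next to the script (as in the AWS Lambda case) instead of one in /usr/local/bin. A minimal sketch, assuming the Poppler/Xpdf pdftotext tool exists at the chosen path; pdftotext_args and run_pdftotext are hypothetical names, and check=True is an added choice so that a failed conversion raises an error instead of silently returning empty text.

```python
import os
import subprocess
from typing import List, Optional


def pdftotext_args(pdf_path: str, binary: str = "pdftotext") -> List[str]:
    """Build the pdftotext command; '-' as the output file sends text to stdout."""
    return [binary, "-enc", "UTF-8", pdf_path, "-"]


def run_pdftotext(pdf_path: str, binary: Optional[str] = None) -> str:
    """Run pdftotext on `pdf_path` and return the extracted text.

    If `binary` is None, a pdftotext executable in the same directory as this
    script is used (e.g. one bundled into an AWS Lambda package).
    """
    if binary is None:
        script_dir = os.path.dirname(os.path.abspath(__file__))
        binary = os.path.join(script_dir, "pdftotext")
    result = subprocess.run(
        pdftotext_args(pdf_path, binary),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if pdftotext exits non-zero
    )
    return result.stdout.decode("utf-8")
```

Passing binary='/usr/local/bin/pdftotext' reproduces the hardcoded call from the snippet above, while leaving it as None picks up a copy shipped alongside the script.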

