Have you ever wanted to extract text from an image automatically? Maybe you have a scanned document, a screenshot, or even a photograph containing important text that you need to process. I will show you how to use Python and Tesseract OCR to extract text from images efficiently.
We will start with a simple script and then enhance it with additional functionality, including GUI-based file selection, pre-processing for better accuracy, and multi-language support. By the end of this tutorial, you will have a fully functional OCR tool that saves the extracted text in multiple formats.
Explanation of the Basic Code
Import Required Libraries
We need two main libraries:
from PIL import Image
import pytesseract
- PIL (Pillow): Used for image processing, such as opening and manipulating image files.
- pytesseract: A Python wrapper for Google’s Tesseract OCR engine, which extracts text from images.
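Because pytesseract is only a wrapper, the Tesseract engine itself has to be installed separately and be reachable from your system PATH. The following is a minimal sketch of a pre-flight check using only the standard library; the helper name `find_tesseract` is made up for this example and is not part of pytesseract.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    # find_tesseract is a hypothetical helper for this tutorial, not a pytesseract function
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract was not found on PATH; install it before running the OCR script.")
    else:
        print(f"Tesseract found at: {path}")
```

If Tesseract is installed in a location that is not on PATH, pytesseract lets you point to the binary by setting `pytesseract.pytesseract.tesseract_cmd` to its full path.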
Define the Image Path
image_path = "image.png" # Replace with your image file
This is the file path to the image that contains the text you want to extract.
Open and Process the Image
image = Image.open(image_path)
This line opens the image using PIL so that we can process it.
Extract Text from the Image
extracted_text = pytesseract.image_to_string(image)
This function processes the image using Optical Character Recognition (OCR) and extracts any readable text.
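Raw OCR output is rarely tidy: it tends to contain trailing spaces, runs of blank lines and, depending on the Tesseract and pytesseract versions, a form-feed character at the end of the page. Here is a small clean-up sketch you could apply to the returned string; `clean_ocr_text` is a hypothetical helper written for this tutorial, and the rules it applies are just one reasonable choice.

```python
def clean_ocr_text(raw_text):
    """Tidy raw OCR output: drop form feeds, trim line ends, collapse blank runs."""
    # clean_ocr_text is a hypothetical helper, not part of pytesseract
    lines = [line.rstrip() for line in raw_text.replace("\x0c", "").splitlines()]
    cleaned = []
    for line in lines:
        # Keep a line if it has text, or if it is the first blank after a text line
        if line or (cleaned and cleaned[-1]):
            cleaned.append(line)
    return "\n".join(cleaned).strip()


if __name__ == "__main__":
    sample = "Hello  \n\n\n\nWorld \n\x0c"
    print(repr(clean_ocr_text(sample)))  # 'Hello\n\nWorld'
```

In the basic script you would call it right after the OCR step, for example `extracted_text = clean_ocr_text(extracted_text)`.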
Print the Extracted Text
print("Extracted Text from Image:\n")
print(extracted_text)
The extracted text will be displayed in the console.
Save the Extracted Text to a File
with open("extracted_text.txt", "w", encoding="utf-8") as text_file:
    text_file.write(extracted_text)
print("Text extracted and saved to 'extracted_text.txt'")
This saves the extracted text into a text file (extracted_text.txt).
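A fixed output name means every run overwrites the previous result. As a sketch of one alternative, the output file can be named after the image and written next to it. The helper name `save_text_next_to_image` and the `_extracted.txt` suffix are choices made for this example.

```python
from pathlib import Path


def save_text_next_to_image(text, image_path):
    """Write text to '<image stem>_extracted.txt' beside the image; return that path."""
    # save_text_next_to_image is a hypothetical helper for this tutorial
    image_path = Path(image_path)
    output_path = image_path.with_name(image_path.stem + "_extracted.txt")
    output_path.write_text(text, encoding="utf-8")
    return output_path
```

For example, `save_text_next_to_image(extracted_text, "image.png")` would create `image_extracted.txt` in the same folder as `image.png`.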
Enhancing the Script with More Practical Functionality
While the basic script works well, we can improve it in several ways:
- GUI-based file selection (so the user can pick an image without modifying the script).
- Image pre-processing (grayscale conversion and thresholding for better OCR accuracy).
- Language selection (support for multiple languages in OCR).
- Saving the text in multiple formats (.txt and .csv).
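To make the thresholding idea from the list above concrete, here is a toy, pure-Python version that works on a nested list of grayscale values. It only illustrates the rule (every pixel brighter than a cutoff becomes white, everything else becomes black); the real script relies on OpenCV for this. The function name `binary_threshold` is invented for the example, and the cutoff of 150 matches the value used in the enhanced script.

```python
def binary_threshold(pixels, cutoff=150, max_value=255):
    """Map each grayscale value to max_value if it exceeds cutoff, else to 0."""
    # binary_threshold is an illustrative stand-in, not an OpenCV function
    return [[max_value if value > cutoff else 0 for value in row] for row in pixels]


if __name__ == "__main__":
    # A tiny 2x3 "image" of grayscale intensities (0 = black, 255 = white)
    sample = [
        [12, 149, 150],
        [151, 200, 255],
    ]
    print(binary_threshold(sample))  # [[0, 0, 0], [255, 255, 255]]
```

Turning the image into pure black and white like this removes shading and background noise, which usually makes characters easier for the OCR engine to separate from the page.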
Enhanced Python Code
import csv
import os
import tkinter as tk
from tkinter import filedialog

import cv2
import pytesseract
from PIL import Image


# Function to enhance and preprocess image
def preprocess_image(image_path):
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)  # Load as grayscale
    if image is None:  # cv2.imread returns None instead of raising on failure
        raise ValueError(f"Could not read image: {image_path}")
    # Pixels brighter than 150 become white (255); all others become black (0)
    _, processed_img = cv2.threshold(image, 150, 255, cv2.THRESH_BINARY)
    return processed_img


# Function to extract text
def extract_text(image_path, lang="eng"):
    processed_img = preprocess_image(image_path)
    # Wrap the NumPy array in a PIL image so no temporary file is needed
    return pytesseract.image_to_string(Image.fromarray(processed_img), lang=lang)


# Function to select an image and extract text
def main():
    root = tk.Tk()
    root.withdraw()  # Hide the main window
    file_path = filedialog.askopenfilename(
        title="Select an Image",
        filetypes=[("Image Files", "*.png *.jpg *.jpeg")],  # Patterns are space-separated
    )
    root.destroy()  # Release the hidden window once the dialog is closed
    if not file_path:
        print("No file selected.")
        return
    lang_choice = input("Enter language code (default 'eng' for English, 'spa' for Spanish, 'fra' for French, etc.): ").strip() or "eng"
    extracted_text = extract_text(file_path, lang_choice)
    if extracted_text.strip():
        print("\nExtracted Text:\n")
        print(extracted_text)
        # Save as .txt file
        text_filename = os.path.splitext(file_path)[0] + "_extracted.txt"
        with open(text_filename, "w", encoding="utf-8") as text_file:
            text_file.write(extracted_text)
        print(f"\nText extracted and saved to '{text_filename}'")
        # Save as .csv file (csv.writer quotes commas and quotation marks in the text)
        csv_filename = os.path.splitext(file_path)[0] + "_extracted.csv"
        with open(csv_filename, "w", encoding="utf-8", newline="") as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow(["Extracted Text"])
            writer.writerow([extracted_text.replace("\n", " ")])  # Store in a single row
        print(f"CSV file saved as '{csv_filename}'")
    else:
        print("No text found in the image.")


if __name__ == "__main__":
    main()
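The script passes the language code straight to Tesseract. Tesseract also accepts several languages at once when their codes are joined with a plus sign, such as `eng+fra`. Below is a sketch of a helper that normalizes whatever the user types into that form; `build_lang_argument` is a hypothetical name, and the accepted separators (commas, spaces, plus signs) are an assumption of this example.

```python
import re


def build_lang_argument(user_input, default="eng"):
    """Turn input such as 'eng, fra' or 'eng fra' into Tesseract's 'eng+fra' form."""
    # build_lang_argument is a hypothetical helper, not part of pytesseract
    codes = [code for code in re.split(r"[,\s+]+", user_input.strip().lower()) if code]
    # Drop duplicates while keeping the order the user typed
    unique_codes = list(dict.fromkeys(codes))
    return "+".join(unique_codes) if unique_codes else default


if __name__ == "__main__":
    print(build_lang_argument("eng, fra"))  # eng+fra
    print(build_lang_argument(""))          # eng
```

The result can be passed as the `lang` argument of `extract_text`. Keep in mind that each language only works if its trained data file is installed alongside Tesseract; recent pytesseract versions provide `pytesseract.get_languages()` to list what is available.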
Final Thoughts
I hope this guide helps you build your own OCR tool using Python. With the ability to extract text from images, this project has a variety of real-world applications, such as digitizing documents, automating data entry, and processing text from scanned receipts.
You can further improve this project by integrating it into a web app using Flask or Django, adding AI models for handwritten text recognition, or automating bulk image processing.
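As a starting point for bulk processing, here is a rough sketch that walks a folder and applies an OCR function to every image in it. The name `ocr_folder`, the set of file extensions, and the idea of passing the OCR function in as a parameter are all choices made for this example, so treat it as an outline and not a finished tool.

```python
from pathlib import Path

# File extensions treated as images (an assumption for this sketch)
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def ocr_folder(folder, ocr_func):
    """Run ocr_func on every image in folder; return {file name: extracted text}."""
    # ocr_folder is a hypothetical helper; ocr_func is any callable taking a file path
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = ocr_func(str(path))
    return results
```

With the enhanced script above, a call such as `ocr_folder("scans", extract_text)` would process every image in a `scans` folder, where the folder name is only a placeholder.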