Excel files, particularly those with the .xlsx extension, can be read and manipulated in Python using libraries such as pandas and openpyxl. Here’s a guide on how to read Excel files and the key concepts associated with Excel documents.
• Workbook: An Excel file containing one or more sheets, saved with a .xlsx extension.
• Sheet (Worksheet): A single page within a workbook, containing data organized in rows and columns.
• Cell: The intersection of a row and a column, where individual data values are stored.
• Active Sheet: The currently viewed sheet or the last sheet viewed before closing Excel.
openpyxl is a library for reading and writing .xlsx files. It provides detailed control over the Excel file's contents.
If you haven't installed it yet:
pip install openpyxl
Here’s a step-by-step guide to reading an Excel file with openpyxl:
from openpyxl import load_workbook
# Load the workbook
wb = load_workbook('example.xlsx')
# Get the active sheet
sheet = wb.active
# Accessing cell values
print("Cell A1:", sheet['A1'].value)
# Iterating over rows
for row in sheet.iter_rows(values_only=True):
print(row)
• Loading Workbook: load_workbook('example.xlsx') loads the workbook from the file.
• Accessing Active Sheet: wb.active retrieves the currently active sheet.
• Accessing Cell Value: sheet['A1'].value gets the value from cell A1.
• Iterating Over Rows: sheet.iter_rows(values_only=True) iterates through rows, returning cell values as tuples.
from openpyxl import load_workbook
# Load the workbook
wb = load_workbook('example.xlsx')
# Get a specific sheet by name
sheet = wb['Sheet1']
# Get cell value
print("Cell A1:", sheet['A1'].value)
# Iterate over rows
for row in sheet.iter_rows(values_only=True):
print(row)
For older .xls files, you can use the xlrd library. Note that xlrd no longer supports .xlsx files as of version 2.0.1, so it's only suitable for .xls files.
If you need it for .xls files:
pip install xlrd
import xlrd
# Open the workbook
workbook = xlrd.open_workbook('example.xls')
# Get the first sheet
sheet = workbook.sheet_by_index(0)
# Accessing cell value
print("Cell A1:", sheet.cell_value(0, 0))
# Iterating over rows
for row_idx in range(sheet.nrows):
row = sheet.row_values(row_idx)
print(row)
• Opening Workbook: xlrd.open_workbook('example.xls') opens the .xls file.
• Getting Sheet: workbook.sheet_by_index(0) gets the first sheet (0-based index).
• Accessing Cell Value: sheet.cell_value(0, 0) gets the value from the cell at row 0, column 0 (A1).
• Iterating Over Rows: sheet.row_values(row_idx) gets the values of a specific row.