Data Storage in Python with Google Sheets + Pandas
September 22, 2021
The Problem
You need somewhere centralized and human-readable to store data that you’re processing or collecting with a Python script.
The Solution
Use Google Sheets as free (or very cheap) storage for even large quantites of data, and Pandas DataFrames to make the data easy to handle on the scripting side.
Yeah, that sounds like it would be a huge lift. But, I’ve distilled the setup and utilization down enough to make it quite easy to use.
Requirements
To install gspread and pandas on my setup, I used pip.
pip3 install gspread
pip3 install pandas
Step by Step
- Access the
Service Accounts
page.
- Create a new service account
- Create and download a key for the service account.
- Copy the key to your Python project directory.
mv ~/Downloads/innate-mix-326900-6e64c607c722.json python-and-google-sheets/service_account.json
- Back on the Google Cloud dashboard, enable the Google Sheets API.
- Open
service_account.json
and copy the client email.
- Open any Google Sheet that you want to use for this project, and share it with that email.
- While you’re in the Sheet, copy the key. The key is the middle part of the share link between
/d/
and/edit
.
- Create variables in your Python script to store this key and the ID of the worksheet.
SHEET_KEY = '1xsb5NmB9OfqthMsoXGN_6bIE2CCRoN0eTQ_QvWu_-vk'
SHEET_ID = 0
The ID of a worksheet is it’s numerical order within the larger sheet, starting with 0.
- Download google_sheets.py and move it into the same directory as your Python script.
- From your script, import the methods defined in
google_sheets.py
.
from google_sheets import get_sheet, set_sheet, clear_sheet, append_row_to_sheet
- Now you can use these methods, passing
SHEET_KEY
,SHEET_ID
, and then a DataFrame or series to add to a DataFrame. Here’s an example implementation:
import pandas as pd
from google_sheets import get_sheet, set_sheet, clear_sheet, append_row_to_sheet
SHEET_KEY = '1xsb5NmB9OfqthMsoXGN_6bIE2CCRoN0eTQ_QvWu_-vk'
SHEET_ID = 0
data = {'Name':['Tom', 'nick', 'krish', 'jack'],
'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
set_sheet(SHEET_KEY, SHEET_ID, df)
Run this script, and you can see the data appear on the Sheet.
python3 example.py
Conclusion
With these basic methods, built on the CRUD model, you can work through almost any data storge need you might have. Plus, you have the added benefits of Google Sheets being cloud-based and human-readable by default. This easy setup and light Python module (29 lines of code total) together make up the most straightforward solution I could come up. I hope it can help you in your projects as well!