Isaiah's Blog

Data Storage in Python with Google Sheets + Pandas

September 22, 2021

The Problem

You need somewhere centralized and human-readable to store data that you’re processing or collecting with a Python script.

The Solution

Use Google Sheets as free (or very cheap) storage for even large quantites of data, and Pandas DataFrames to make the data easy to handle on the scripting side.

Yeah, that sounds like it would be a huge lift. But, I’ve distilled the setup and utilization down enough to make it quite easy to use.

Requirements

To install gspread and pandas on my setup, I used pip.

pip3 install gspread
pip3 install pandas

Step by Step

  1. Create a new project on Google Cloud.

00 create project

  1. Access the Service Accounts page.

01 service accounts

  1. Create a new service account

02 create account

03 create account

  1. Create and download a key for the service account.

07 service accounts keys

08 make a key

09 make a key

10 make a key

  1. Copy the key to your Python project directory.
mv ~/Downloads/innate-mix-326900-6e64c607c722.json python-and-google-sheets/service_account.json
  1. Back on the Google Cloud dashboard, enable the Google Sheets API.

04 marketplace

05 marketplace search

06 google sheets

  1. Open service_account.json and copy the client email.

11 copy client email

  1. Open any Google Sheet that you want to use for this project, and share it with that email.

12 share sheet with client

  1. While you’re in the Sheet, copy the key. The key is the middle part of the share link between /d/ and /edit.

13 copy sheet id

  1. Create variables in your Python script to store this key and the ID of the worksheet.
SHEET_KEY = '1xsb5NmB9OfqthMsoXGN_6bIE2CCRoN0eTQ_QvWu_-vk'
SHEET_ID = 0

The ID of a worksheet is it’s numerical order within the larger sheet, starting with 0.

  1. Download google_sheets.py and move it into the same directory as your Python script.
  2. From your script, import the methods defined in google_sheets.py.
from google_sheets import get_sheet, set_sheet, clear_sheet, append_row_to_sheet
  1. Now you can use these methods, passing SHEET_KEY, SHEET_ID, and then a DataFrame or series to add to a DataFrame. Here’s an example implementation:
import pandas as pd
from google_sheets import get_sheet, set_sheet, clear_sheet, append_row_to_sheet

SHEET_KEY = '1xsb5NmB9OfqthMsoXGN_6bIE2CCRoN0eTQ_QvWu_-vk'
SHEET_ID = 0

data = {'Name':['Tom', 'nick', 'krish', 'jack'],
        'Age':[20, 21, 19, 18]}
 
df = pd.DataFrame(data)

set_sheet(SHEET_KEY, SHEET_ID, df)

Run this script, and you can see the data appear on the Sheet.

python3 example.py

14 updated data

Conclusion

With these basic methods, built on the CRUD model, you can work through almost any data storge need you might have. Plus, you have the added benefits of Google Sheets being cloud-based and human-readable by default. This easy setup and light Python module (29 lines of code total) together make up the most straightforward solution I could come up. I hope it can help you in your projects as well!


Isaiah Nixon

Written by Isaiah Nixon.
TwitterGitHubDrupal AnswersDrupal.orgLinkedIn