Type Safety with Database Queries ¶
When interacting with databases in Rust, I use the SQLx library. Among many other features, SQLx type checks your queries at compile time against your database and allows you to build and return custom values as custom types:
// Rust
struct Person {age: i32, name: String}
let people = sqlx::query_as!(Person, "SELECT name, age FROM people").fetch_one(&mut conn)?;
With Python, we can approximate these ergonomics and the type safety (kind of) using Pydantic models and the built in SQLite3 module.
First, we can create an in memory SQLite3 database to work with:
import sqlite3
data = [
{"age": 19, "name": "jim"},
{"age": 28, "name": "sarah"},
{"age": 22, "name": "ash"},
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE IF NOT EXISTS people(age INTEGER, name TEXT)")
con.commit()
con.executemany("INSERT INTO people(name, age) VALUES(:name, :age)", data)
con.commit()
We are going to be working with the row_factory attribute on a connection (or cursor). This attribute determines how values are returned to python from the database. The default setting is None
, which returns a tuple (or a list of tuples) as the result of your query.
The SQlite3 module provides a Row
class which provides some nice ergonomics. For example, you can return the data as a list of dictionaries:
con.row_factory = sqlite3.Row
res = [dict(i) for i in con.execute("SELECT name, age FROM people").fetchall()]
print(res)
# [{'name': 'jim', 'age': 19}, {'name': 'sarah', 'age': 28}, {'name': 'ash', 'age': 22}]
The sqlite3 module docs have instructions to create your own row_factory
, which is any python callable that takes a cursor object and a row (tuple). So we can create a pydantic model and modify the example from the docs to return our model:
from pydantic import BaseModel
class Person(BaseModel):
age: int
name: str
def people_factory(cursor, row):
fields = [column[0] for column in cursor.description]
return Person(**{k: v for k,v in zip(fields, row)})
con.row_factory = people_factory
res = con.execute("SELECT name, age FROM people").fetchall()
print(res)
# [Person(age=19, name='jim'), Person(age=28, name='sarah'), Person(age=22, name='ash')]
This gives us Pydantic’s runtime safety with values from our queries. If we try to return something that isn’t named in the model or is the wrong type, we’ll get a runtime error from pydantic:
res = con.execute("SELECT 1 as name, 'jim' as age").fetchall()
Traceback (most recent call last):
File "<stdin>", line 12, in <module>
File "<stdin>", line 9, in people_factory
File "/Users/nick/dev/venv/lib/python3.11/site-packages/pydantic/main.py", line 165, in __init__
__pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 2 validation errors for Person
age
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='jim', input_type=str]
For further information visit https://errors.pydantic.dev/2.3/v/int_parsing
name
Input should be a valid string [type=string_type, input_value=1, input_type=int]
For further information visit https://errors.pydantic.dev/2.3/v/string_type
Note I’ve set this custom row_factory
at the connection level, so this is the only model we can use for this session unless we re-set it to None or another worker function. For me, this is fine because I will often just open a new connection within the function when dealing with SQLite, but if you have one connection and many cursors, you can also set cursor.row_factory
.
Using Python’s Dataclasses ¶
If you want to stick to the standard library, you can use a similar pattern with python’s dataclasses
.
from dataclasses import dataclass
@dataclass
class PersonDataclass:
age: int
name: str
def dataclass_factory(cursor, row):
fields = [column[0] for column in cursor.description]
return PersonDataclass(**{k: v for k, v in zip(fields, row)})
con.row_factory = dataclass_factory
res = con.execute("SELECT name,age FROM people").fetchall()
print(res)
# [PersonDataclass(age=19, name='jim'), PersonDataclass(age=28, name='sarah'), PersonDataclass(age=22, name='ash')]
However, you won’t get Pydantic’s runtime type safety:
res = con.execute("SELECT 1 as name, 'jim' as age").fetchall()
print(res)
# [PersonDataclass(age='jim', name=1)] # uh oh
Adapting Python ¶
While it isn’t quite the same as the fantastic compile time query type checks in Rust’s SQLx library, the ergonomics of using Pydantic and row_factory
make it much easier to return types without using an Object Relational Mapper and build more robust code with python.