Database generated columns⁽¹⁾: Django & SQLite

An introduction to database generated columns, using SQLite and the new GeneratedField added in Django 5.0.

© 2023 Paolo Melchiorre “View of clouds over the Labrador Peninsula (Canada) taken from a commercial flight.”
Database generated columns (3 part series)
  1. Database generated columns⁽¹⁾: Django & SQLite
  2. Database generated columns⁽²⁾: Django & PostgreSQL
  3. Database generated columns⁽³⁾: GeoDjango & PostGIS

TL;DR

GeneratedField is a new Django 5.0 feature: a model field whose value is always computed by the database from other fields of the same row.

Here is a model example with a GeneratedField to use with SQLite as the database backend:

from django.db import models
from django.db.models import F


class Rectangle(models.Model):
    base = models.FloatField()
    height = models.FloatField()
    area = models.GeneratedField(
        expression=F("base") * F("height"),
        output_field=models.FloatField(),
        db_persist=True,
    )

🚀 Introduction

I’ve been thinking for a long time that database generated columns would be a killer feature for Django and now they will finally be available in Django 5.0 with the new GeneratedField.

Given my great interest in this functionality, I proposed it on the Django developer mailing list and helped review the Pull Request on the Django code. Above all, I tested the feature as much as possible after the merge, trying to surface any bugs before the stable release of Django 5.0.

In this article, I want to share the work I have done, so that you can experiment with GeneratedField using common model field types and SQLite as the database backend.

🤖 Generated Field

A GeneratedField is a full-fledged field that can be used in queries, displayed, and even indexed. Its value cannot be set or modified, because the database itself calculates it automatically whenever the other fields of the same row change.

For this reason, generated fields are very useful for keeping values derived from other fields of the same model immediately available, as long as those fields are in the same database table.
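The behaviour described above can be observed without Django, using Python's built-in sqlite3 module. This is only a minimal sketch, assuming the bundled SQLite is version 3.31 or later (the first with generated columns); the table name rectangle is made up for the illustration.

```python
import sqlite3

# In-memory database, so nothing touches the disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rectangle (
        id INTEGER PRIMARY KEY,
        base REAL NOT NULL,
        height REAL NOT NULL,
        area REAL GENERATED ALWAYS AS (base * height) STORED
    )
    """
)
conn.execute("INSERT INTO rectangle (base, height) VALUES (6, 7)")
area_after_insert = conn.execute("SELECT area FROM rectangle").fetchone()[0]

# Changing a source column makes the database recompute the value.
conn.execute("UPDATE rectangle SET base = 10")
area_after_update = conn.execute("SELECT area FROM rectangle").fetchone()[0]

# Writing to the generated column directly is rejected by the database.
try:
    conn.execute("UPDATE rectangle SET area = 1")
    write_rejected = False
except sqlite3.Error:
    write_rejected = True
```

The area is never written by the application: it follows base and height on every insert and update, and a direct write fails.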

The definition of the GeneratedField class requires specifying the expression, output_field, and db_persist attributes.

class GeneratedField(
    expression, output_field, db_persist=None, **kwargs
):
    ...

🕶️ Attributes

db_persist

In this article, we’re going to use SQLite as a database backend with persisted columns, specified with db_persist=True: the generated column occupies storage as if it were a real column.
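At the SQL level, a persisted column uses the STORED keyword, while a non-persisted one uses VIRTUAL and is computed when read. The following sketch, written against plain sqlite3 rather than Django, puts both kinds side by side; the table and column names are invented for the example, and it assumes SQLite 3.31 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE square (
        side REAL NOT NULL,
        area_stored REAL GENERATED ALWAYS AS (side * side) STORED,
        area_virtual REAL GENERATED ALWAYS AS (side * side) VIRTUAL
    )
    """
)
conn.execute("INSERT INTO square (side) VALUES (3)")
values = conn.execute(
    "SELECT area_stored, area_virtual FROM square"
).fetchone()

# PRAGMA table_xinfo lists generated columns too; its last field ("hidden")
# is 2 for a virtual generated column and 3 for a stored one.
hidden_flags = {
    row[1]: row[6] for row in conn.execute("PRAGMA table_xinfo(square)")
}
```

Both columns return the same value when queried; the difference is only whether the result is kept on disk or recomputed on each read.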

output_field

We have to specify an explicit model field instance to define the field’s data type and its attributes.

expression

The expression attribute is the one we’re going to play with the most, using various examples of Expression that the database will use to automatically set the field value each time the row is changed.

↔️ Migrations

The SQL code generated from the migrations produced in our examples will all use the GENERATED ALWAYS AS (...) STORED syntax, varying only in the internal logic of the expression.

💻 Set Up

The requirements to experiment with Django GeneratedField are:

🐍 Python

A stable and supported version of Python 3 (tested with Python 3.11-3.12):

$ python3 --version
Python 3.12.0

⚗️ Virtual environment

A Python virtual environment:

$ python3 -m venv ~/.generatedfields
$ source ~/.generatedfields/bin/activate

🦄 Django

The latest version of Django (tested with Django 5.0):

$ python3 -m pip install django==5.0

🗃️ The generatedfields project

To create the generatedfields project I’ll switch to my projects directory:

$ cd ~/projects

and then use the startproject Django command:

$ python3 -m django startproject generatedfields

The basic files of our project will be created in the generatedfields directory:

$ tree --noreport generatedfields/
generatedfields/
├── manage.py
└── generatedfields
    ├── asgi.py
    ├── __init__.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py

🗂️ The samples app

After switching to the generatedfields directory:

$ cd generatedfields

We can create our samples app with the Django startapp command:

$ python3 -m django startapp samples

Again, all the necessary files will be created for us in the samples directory:

$ tree --noreport samples/
samples/
├── admin.py
├── apps.py
├── __init__.py
├── migrations
│   └── __init__.py
├── models.py
├── tests.py
└── views.py

✅ Activating the samples app

Now, we have to activate our samples application by inserting its name in the list of the INSTALLED_APPS in the generatedfields settings file.

generatedfields/generatedfields/settings.py

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "samples",
]

📚 Sample models

The example models I have experimented with use expressions whose complexity increases progressively from example to example.

We start by combining two fields of the model to create the value of the generated one. We then use database functions, combine them, use them with conditions, and even generate multiple fields from the same starting field.

These experiments made it possible to identify some bugs and fix them during the Django 5.0 testing period, but some edge cases remain to be taken into account, which we’ll mention at the end of the article.

🔧 Useful commands

In the following examples, we will use some commands that we show here to avoid repetition.

makemigrations

Generates a new database migration to apply it to our database:

$ python3 -m manage makemigrations --no-header

sqlmigrate

See the SQL code that our migration will apply to the database:

$ python3 -m manage sqlmigrate samples <migration-id>

migrate

Apply migration to our database with:

$ python3 -m manage migrate

shell

Start an interactive Django shell:

$ python3 -m manage shell

✖️ A calculated product field

The first generated field experiment combines model fields directly. The Rectangle model has two float fields, base and height, and a generated field, area, which the database calculates automatically by multiplying the other two fields together.

generatedfields/samples/models.py

from django.db import models
from django.db.models import F


class Rectangle(models.Model):
    base = models.FloatField()
    height = models.FloatField()
    area = models.GeneratedField(
        expression=F("base") * F("height"),
        output_field=models.FloatField(),
        db_persist=True,
    )

    def __str__(self):
        return (
            f"{self.base}×{self.height}="
            f"{self.area}"
        )

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Rectangle
--
CREATE TABLE "samples_rectangle" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
  "base" real NOT NULL,
  "height" real NOT NULL,
  "area" real GENERATED ALWAYS AS (
    ("base" * "height")
  ) STORED
);
COMMIT;

In the generated SQL code you see the notation for the generated columns GENERATED ALWAYS AS, the specific formula "base" * "height" and the notation specifying that the column is of type STORED.

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import Rectangle
>>> Rectangle.objects.create(base=6, height=7)
<Rectangle: 6×7=42.0>

✳️ A calculated exponentiation field

The second example model is Square: the value of the generated field area is calculated from its single side field using the Power database function, made available by Django’s ORM.

generatedfields/samples/models.py

from django.db import models
from django.db.models.functions import (
    Power,
)


class Square(models.Model):
    side = models.FloatField()
    area = models.GeneratedField(
        expression=Power("side", 2),
        output_field=models.FloatField(),
        db_persist=True,
    )

    def __str__(self):
        return f"{self.side}²={self.area}"

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Square
--
CREATE TABLE "samples_square" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
  "side" real NOT NULL,
  "area" real GENERATED ALWAYS AS (
    POWER("side", 2)
  ) STORED
);
COMMIT;

The SQL code generated for the SQLite database uses the built-in POWER mathematical function to calculate the value of the calculated column.

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import Square
>>> Square.objects.create(side=3)
<Square: 3²=9.0>

⭕ A calculated geometric formula field

The Circle model is very similar to Square, but the expression for the generated field area squares the radius with the Power database function, multiplies the result by the database function Pi, and rounds the product to two decimal places with the Round database function.

generatedfields/samples/models.py

from django.db import models
from django.db.models.functions import (
    Pi, Power, Round
)


class Circle(models.Model):
    radius = models.FloatField()
    area = models.GeneratedField(
        expression=Round(
            Power("radius", 2) * Pi(),
            precision=2,
        ),
        output_field=models.FloatField(),
        db_persist=True,
    )

    def __str__(self):
        return (
            f"{self.radius}²×π="
            f"{self.area}"
        )

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Circle
--
CREATE TABLE "samples_circle" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
  "radius" real NOT NULL,
  "area" real GENERATED ALWAYS AS (
    ROUND((POWER("radius", 2) * PI()), 2)
  ) STORED
);
COMMIT;

The SQL code of the calculated column is more complex than that of the previous example, in line with the increased complexity of the model expression, but the functions used have a one-to-one correspondence with those made available by the Django ORM.

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import Circle
>>> Circle.objects.create(radius=3.1415)
<Circle: 3.1415²×π=31.0>

📐 A calculated trigonometric functions field

The RightTriangle model is the last geometric example and in this one a fairly complex expression is used to calculate the generated field area, starting from the hypotenuse and angle fields, using the trigonometric functions Sin and Cos made available by the Django ORM in addition to those used in the previous examples.

generatedfields/samples/models.py

from django.db import models
from django.db.models.functions import (
    Cos, Power, Radians, Round, Sin
)


class RightTriangle(models.Model):
    hypotenuse = models.FloatField()
    angle = models.FloatField()
    area = models.GeneratedField(
        expression=Round(
            (
                Power("hypotenuse", 2)
                * Sin(Radians("angle"))
                * Cos(Radians("angle"))
            )
            / 2,
            precision=2,
        ),
        output_field=models.FloatField(),
        db_persist=True,
    )

    def __str__(self):
        return (
            f"{self.hypotenuse}²×"
            f"sin({self.angle}°)×"
            f"cos({self.angle}°)÷"
            f"2={self.area}"
        )

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model RightTriangle
--
CREATE TABLE "samples_righttriangle" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
  "hypotenuse" real NOT NULL,
  "angle" real NOT NULL,
  "area" real GENERATED ALWAYS AS (
    ROUND(
      (
        (
          (
            POWER("hypotenuse", 2)
            * SIN(RADIANS("angle"))
          ) * COS(RADIANS("angle"))
        ) / 2
      ),
      2
    )
  ) STORED
);
COMMIT;

Despite the increased complexity of the expression, the SQL code of the database generated column expression, for SQLite, is clear, demonstrating the power of SQL for these types of queries.
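The formula behind the expression can be cross-checked in plain Python. The sketch below derives the two legs of the triangle from the hypotenuse and the angle, then halves their product; the sample values (a hypotenuse of 5 and an angle of 45 degrees) are arbitrary.

```python
import math

hypotenuse = 5
angle = 45  # degrees, as stored in the model

# The two legs of a right triangle, from the hypotenuse and one acute angle.
leg_a = hypotenuse * math.sin(math.radians(angle))
leg_b = hypotenuse * math.cos(math.radians(angle))

# Area = (leg_a * leg_b) / 2, which expands to
# hypotenuse² × sin(angle) × cos(angle) / 2, the expression used in the model.
area = round(leg_a * leg_b / 2, 2)
```

With these inputs the area is 6.25, since sin(45°) × cos(45°) is one half.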

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import RightTriangle
>>> RightTriangle.objects.create(angle=45, hypotenuse=5)
<RightTriangle: 5²×sin(45°)×cos(45°)÷2=6.25>

🧾 A calculated total price field

In the Item model, the expression of the generated field total_price is the product of the two fields price and quantity. The integer quantity field has a default value of 1, declared via the new db_default parameter, also introduced in Django 5.0, which sets a database-computed default value.

generatedfields/samples/models.py

from django.db import models
from django.db.models import F, Value


class Item(models.Model):
    price = models.DecimalField(
        max_digits=6, decimal_places=2
    )
    quantity = models.PositiveSmallIntegerField(
        db_default=Value(1)
    )
    total_price = models.GeneratedField(
        expression=F("price") * F("quantity"),
        output_field=models.DecimalField(
            max_digits=11, decimal_places=2
        ),
        db_persist=True,
    )

    def __str__(self):
        return (
            f"{self.price}×"
            f"{self.quantity}="
            f"{self.total_price}"
        )

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Item
--
CREATE TABLE "samples_item" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
  "price" decimal NOT NULL,
  "quantity" smallint unsigned DEFAULT 1 NOT NULL CHECK (
    "quantity" >= 0
  ),
  "total_price" decimal GENERATED ALWAYS AS (
    CAST(("price" * "quantity") AS NUMERIC)
  ) STORED
);
COMMIT;

The SQL code generated for SQLite contains an expression similar to a previous example, but the presence of the DEFAULT in the quantity column means that the calculation logic of the generated column total_price is entirely contained in the database.
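Because both the default and the calculation live in the schema, any client that writes to the table gets the same result, even one that knows nothing about Django. The following is a rough sketch with plain sqlite3 and a simplified, hypothetical item table (the CHECK constraint and the CAST are left out for brevity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        price DECIMAL NOT NULL,
        quantity INTEGER NOT NULL DEFAULT 1,
        total_price DECIMAL GENERATED ALWAYS AS (price * quantity) STORED
    )
    """
)
# No quantity given: the database fills in the default, then derives the total.
conn.execute("INSERT INTO item (price) VALUES (9.99)")
conn.execute("INSERT INTO item (price, quantity) VALUES (4.99, 2)")
rows = conn.execute(
    "SELECT price, quantity, total_price FROM item ORDER BY id"
).fetchall()
```

The first row ends up with a quantity of 1 and a total equal to the price, without the client supplying either value.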

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import Item
>>> Item.objects.create(price=9.99)
<Item: 9.99×1=9.99>
>>> Item.objects.create(price=4.99, quantity=2)
<Item: 4.99×2=9.98>

🚦 A calculated status field

In the Order model we see an example of using a conditional expression in the generated status field. The presence or absence of the payment value selects the correct status value directly in the SQLite database.

generatedfields/samples/models.py

from django.db import models
from django.db.models import (
    Case, Value, When
)


class Order(models.Model):
    creation = models.DateTimeField()
    payment = models.DateTimeField(null=True)
    status = models.GeneratedField(
        expression=Case(
            When(
                payment__isnull=False,
                then=Value("paid"),
            ),
            default=Value("created"),
        ),
        output_field=models.TextField(),
        db_persist=True,
    )

    def __str__(self):
        return (
            f"[{self.status}] "
            f"{self.payment or self.creation}"
        )

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Order
--
CREATE TABLE "samples_order" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
  "creation" datetime NOT NULL,
  "payment" datetime NULL,
  "status" text GENERATED ALWAYS AS (
    CASE
      WHEN "payment" IS NOT NULL THEN 'paid'
      ELSE 'created'
    END
  ) STORED
);
COMMIT;

The SQL generated for SQLite uses the CASE expression to populate the value of the generated column status based on whether or not there is a datetime value in the payment column.

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import Order
>>> Order.objects.create(creation="2023-01-01 12:00Z")
<Order: [created] 2023-01-01 12:00Z>
>>> Order.objects.create(
...     creation="2023-01-02 00:00Z",
...     payment="2023-01-03 06:30Z"
... )
<Order: [paid] 2023-01-03 06:30Z>

🗓️ Calculated date and duration fields

The Event model has three generated fields derived from only two basic DateTimeField fields: the start field is required and the end field is optional. The generated fields start_date and end_date have an expression that truncates the corresponding field to its date. The expression of the third generated field, duration, subtracts the value of the start field from that of the end field. If a basic field is empty, the generated fields that depend on it are null.
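The subtraction can be previewed in plain Python with the datetime module. On SQLite, Django keeps a DurationField as an integer number of microseconds, so the sketch also converts the interval to that unit; the two sample datetimes are arbitrary.

```python
from datetime import datetime, timedelta, timezone

start = datetime(2023, 1, 1, 11, 45, tzinfo=timezone.utc)
end = datetime(2023, 1, 9, 0, 0, tzinfo=timezone.utc)

# Subtracting two datetimes gives a timedelta, the Python type behind
# a DurationField.
duration = end - start

# Whole microseconds in the interval: the kind of integer a bigint column holds.
microseconds = duration // timedelta(microseconds=1)
```

Here the interval is 7 days, 12 hours and 15 minutes, which corresponds to 648,900,000,000 microseconds.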

generatedfields/samples/models.py

from django.db import models
from django.db.models import F
from django.db.models.functions import (
    TruncDate,
)


class Event(models.Model):
    start = models.DateTimeField()
    start_date = models.GeneratedField(
        expression=TruncDate("start"),
        output_field=models.DateField(),
        db_persist=True,
    )
    end = models.DateTimeField(null=True)
    end_date = models.GeneratedField(
        expression=TruncDate("end"),
        output_field=models.DateField(),
        db_persist=True,
    )
    duration = models.GeneratedField(
        expression=F("end") - F("start"),
        output_field=models.DurationField(),
        db_persist=True,
    )

    def __str__(self):
        return (
            f"[{self.duration or '∞'}] "
            f"{self.start_date}…"
            f"{self.end_date or ''}"
        )

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Event
--
CREATE TABLE "samples_event" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
  "start" datetime NOT NULL,
  "start_date" date GENERATED ALWAYS AS (
    django_datetime_cast_date(
      "start", 'UTC', 'UTC'
    )
  ) STORED,
  "end" datetime NULL,
  "end_date" date GENERATED ALWAYS AS (
    django_datetime_cast_date(
      "end", 'UTC', 'UTC'
    )
  ) STORED,
  "duration" bigint GENERATED ALWAYS AS (
    django_timestamp_diff(
      "end", "start"
    )
  ) STORED
);
COMMIT;

In the generated SQL code the expressions do not have complex logic, but we see the presence of the deterministic functions django_datetime_cast_date and django_timestamp_diff defined in the Django database backend for SQLite.
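The word deterministic matters here: SQLite only accepts functions in a generated column if they are declared to always return the same output for the same input. The sketch below registers a user-defined function through the standard sqlite3 module to show the idea. The function sketch_cast_date is a made-up, much simplified stand-in and not Django's implementation; the example assumes SQLite 3.31 or later.

```python
import sqlite3


def cast_date(value):
    """Return the date part of an ISO 8601 datetime string, or None."""
    return None if value is None else value[:10]


conn = sqlite3.connect(":memory:")
# deterministic=True tells SQLite the function always returns the same result
# for the same input, which generated columns require.
conn.create_function("sketch_cast_date", 1, cast_date, deterministic=True)
conn.execute(
    """
    CREATE TABLE event (
        start TEXT NOT NULL,
        start_date TEXT GENERATED ALWAYS AS (sketch_cast_date(start)) STORED
    )
    """
)
conn.execute("INSERT INTO event (start) VALUES ('2023-01-01 12:00:00')")
start_date = conn.execute("SELECT start_date FROM event").fetchone()[0]

# The same function registered without the flag is expected to be refused.
other = sqlite3.connect(":memory:")
other.create_function("sketch_cast_date", 1, cast_date)
try:
    other.execute(
        "CREATE TABLE event (start TEXT, start_date TEXT "
        "GENERATED ALWAYS AS (sketch_cast_date(start)) STORED)"
    )
    other.execute("INSERT INTO event (start) VALUES ('2023-01-01 12:00:00')")
    refused = False
except sqlite3.Error:
    refused = True
```

A side effect of this design is that the table depends on the function being registered on every connection that uses it, which Django's SQLite backend takes care of for its own functions.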

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import Event
>>> Event.objects.create(start="2023-1-1 12:00Z")
<Event: [∞] 2023-01-01…>
>>> Event.objects.create(
...     start="2023-1-1 11:45Z",
...     end="2023-1-9 00:00Z"
... )
<Event: [7 days, 12:15:00] 2023-01-01…2023-01-09>

🔑 A calculated JSON key field

In the Package model there is the slug field and the data JSONField, in which we want to store the JSON payload of the Python package corresponding to the slug value, as returned by PyPI via its endpoint https://pypi.org/pypi/. The generated field version will contain the latest version of the package, extracted from the data JSONField.

generatedfields/samples/models.py

from django.db import models
from django.db.models import F


class Package(models.Model):
    slug = models.SlugField()
    data = models.JSONField()
    version = models.GeneratedField(
        expression=F("data__info__version"),
        output_field=models.TextField(),
        db_persist=True,
    )

    def __str__(self):
        return f"{self.slug} {self.version}"

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Package
--
CREATE TABLE "samples_package" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
  "slug" varchar(50) NOT NULL,
  "data" text NOT NULL CHECK (
    (
      JSON_VALID("data")
      OR "data" IS NULL
    )
  ),
  "version" text GENERATED ALWAYS AS (
    (
      CASE
        WHEN JSON_TYPE(
          "data", '$."info"."version"'
        ) IN (
          'false', 'true', 'null'
        ) THEN JSON_TYPE(
          "data", '$."info"."version"'
        )
        ELSE JSON_EXTRACT(
          "data", '$."info"."version"'
        )
      END
    )
  ) STORED
);
CREATE INDEX "samples_package_slug_idx"
ON "samples_package" ("slug");
COMMIT;

The SQL code generated for SQLite is a little more complex and verbose than what we might have using another database with advanced JSON type support (for example PostgreSQL).

After running migrate, open a shell and check the model behaves as intended:

>>> import json
>>> from urllib.request import urlopen
>>> from samples.models import Package
>>> slug = "django"
>>> URL = f"https://pypi.org/pypi/{slug}/json"
>>> data = json.loads(urlopen(URL).read())
>>> Package.objects.create(slug=slug, data=data)
<Package: django 4.2.7>

🔔 Note

At first glance, the case presented in this example might seem not very useful because it is limited to a simple extraction of the value corresponding to a specific key from a JSON, which in many databases is rather quick to do at run time.

In reality, if we assume a large number of Package instances, a query that finds the version value already present in its own field is much lighter to execute: the logic is simpler, and so are any indexes on that value.

Furthermore, if the version were the only interesting field to extract from the JSON in the data field, we could defer the data field in our queries, speeding them up and drastically reducing the data transferred from the database (e.g. in the change list pages of the Django admin).
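The idea can be tried out with plain sqlite3, as a rough sketch: the generated column gets an ordinary index, and the query never touches the JSON text. The table name, the index name and the tiny hand-written payload are all assumptions made for the illustration, and the example relies on SQLite's JSON functions being available in the local build.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE package (
        slug TEXT NOT NULL,
        data TEXT NOT NULL,
        version TEXT GENERATED ALWAYS AS (
            JSON_EXTRACT(data, '$.info.version')
        ) STORED
    )
    """
)
# An ordinary index on the generated column; no JSON parsing at query time.
conn.execute("CREATE INDEX package_version_idx ON package (version)")

# Hand-written stand-in for the PyPI payload; only the key we need is present.
payload = {"info": {"name": "Django", "version": "4.2.7"}}
conn.execute(
    "INSERT INTO package (slug, data) VALUES (?, ?)",
    ("django", json.dumps(payload)),
)

# The query selects and filters on plain columns and never reads "data".
matches = conn.execute(
    "SELECT slug, version FROM package WHERE version = ?", ("4.2.7",)
).fetchall()
```

Leaving data out of the SELECT list is the raw SQL counterpart of deferring the field in a Django query.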

🖇️ A calculated concatenated field

The last model, User, partially follows the one already present in Django itself, but has only the first_name and last_name fields. The generated field full_name is calculated by concatenating the two with a space in between, using the Concat function.

generatedfields/samples/models.py

from django.db import models
from django.db.models import Value
from django.db.models.functions import (
    Concat,
)


class User(models.Model):
    first_name = models.CharField(max_length=150)
    last_name = models.CharField(max_length=150)
    full_name = models.GeneratedField(
        expression=Concat(
            "first_name", Value(" "), "last_name",
        ),
        output_field=models.TextField(),
        db_persist=True,
    )

    def __str__(self):
        return self.full_name

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model User
--
CREATE TABLE "samples_user" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
  "first_name" varchar(150) NOT NULL,
  "last_name" varchar(150) NOT NULL,
  "full_name" text GENERATED ALWAYS AS (
    COALESCE("first_name", '') ||
    COALESCE(COALESCE(' ', '') ||
    COALESCE("last_name", ''), '')
  ) STORED
);
COMMIT;

The SQL code generated for SQLite uses the concatenation operator || to join column values together with space and the COALESCE function in case one of the values is empty.
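The reason for the COALESCE calls is that, in SQL, concatenating any value with NULL gives NULL. A short check with plain sqlite3 shows the difference; in this particular model both name columns are NOT NULL, so the guard is defensive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# In SQL, concatenating anything with NULL yields NULL.
plain = conn.execute("SELECT NULL || ' ' || 'Doe'").fetchone()[0]

# Wrapping each operand in COALESCE turns NULL into an empty string first.
guarded = conn.execute(
    "SELECT COALESCE(NULL, '') || COALESCE(' ', '') || COALESCE('Doe', '')"
).fetchone()[0]
```

Without the guard the whole result is lost (None in Python); with it, the missing part simply becomes an empty string.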

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import User
>>> User.objects.create(first_name="Jane", last_name="Doe")
<User: Jane Doe>

🔔 Note

This concatenation example is one of the most commonly used to present the potential of generated fields. I wanted to present it only at the end because, contrary to what it may seem at first glance, it contains pitfalls when used with other databases. I will delve into this in the next article of this series on database generated columns.

🙋 Get involved

Thank you for getting this far with reading this article.

I hope you found it interesting to read the examples I proposed, as it was for me to formulate them and report them here. The motivation behind this work and this article is to collaborate with all Django developers to churn out features that are as solid and stable as possible.

As I have repeatedly invited you to do in my articles and talks, I invite you to get involved.

Try Django 5.0 and experiment with its new features, but above all report any strange behavior or bugs so that they end up being better for everyone.

🐙 GitHub

I have published the code presented in this article in this GitHub repository of mine to allow you to carry out experiments.

🥠 Conclusion

As we have seen, the generated fields of Django 5.0 are a very powerful tool that lends itself to solving a wide variety of problems.

I think they also have the merit of bringing a lot of logic back into the database, which in most projects is the lowest layer of the project stack.

Together with the database-computed default values they also make it possible to simplify the collaboration of different applications on the same database.

In future articles, I’ll try to show other examples of using Django’s generated fields with other Django-supported database backends (e.g. PostgreSQL) and with other Django modules (e.g. GeoDjango).

Stay tuned.

— Paolo


⚠️ Disclaimer

This code is for demonstration purposes only and should not be used in production as is. The code is released without any guarantee from the author and no liability can be attributed. Use at your own risk.
