Django 3.2: Compressed fixtures, fixtures compression

I contributed new features related to compressed fixtures and fixtures compression to the recently released Django 3.2. In this article I explore the topic and present some sample benchmarks.

© 2019 Paolo Melchiorre “Photo of a renaissance tower helical staircase in Palazzo Ducale - Urbino, Marche, Italy”

Management Commands

As reported in the documentation, the changes affect two management commands: loaddata and dumpdata.

loaddata

The loaddata command searches for and loads the contents of the named fixture into the database.

Compressed fixtures

Django 3.2 added support for xz archives (.xz) and lzma archives (.lzma).

Fixtures may be compressed in zip, gz, bz2, lzma, or xz format.

For example, $ django-admin loaddata mydata.json would look for any of mydata.json, mydata.json.zip, mydata.json.gz, mydata.json.bz2, mydata.json.lzma, or mydata.json.xz.

The first file contained within a compressed archive is used.
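The lookup described above can be sketched in a few lines. This is only an illustration of the naming rule, not Django's own code: the helper candidate_fixture_names and the constant COMPRESSION_SUFFIXES are hypothetical names introduced here.

```python
# Hypothetical helper: lists the filenames that could match a fixture name.
# The suffixes mirror the formats named in the text; this is not Django's code.
COMPRESSION_SUFFIXES = ("zip", "gz", "bz2", "lzma", "xz")


def candidate_fixture_names(fixture_name):
    """Return the plain filename followed by each compressed variant."""
    return [fixture_name] + [
        f"{fixture_name}.{suffix}" for suffix in COMPRESSION_SUFFIXES
    ]


for name in candidate_fixture_names("mydata.json"):
    print(name)
```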

dumpdata

The dumpdata command outputs all data in the database associated with some or all installed applications. The output of dumpdata can be used as input for loaddata.

Fixtures compression

Django 3.2 added support for dumping data directly to a compressed file.

The output file can be compressed with one of the bz2, gz, lzma, or xz formats by ending the filename with the corresponding extension.

For example, to output the data as a compressed JSON file: $ django-admin dumpdata -o mydata.json.gz
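To make the idea of choosing the compression from the filename extension concrete, here is a minimal sketch that uses only the Python standard library. The OPENERS mapping and the write_fixture and read_fixture helpers are assumptions made for this example and should not be read as Django's actual implementation.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Illustrative mapping from extension to (opener, extra keyword arguments).
# An assumption for this sketch, not Django's actual code.
OPENERS = {
    ".gz": (gzip.open, {}),
    ".bz2": (bz2.open, {}),
    ".lzma": (lzma.open, {"format": lzma.FORMAT_ALONE}),
    ".xz": (lzma.open, {}),
}


def write_fixture(path, text):
    """Write text to path, compressing it when the extension is recognised."""
    extension = os.path.splitext(path)[1]
    opener, kwargs = OPENERS.get(extension, (open, {}))
    with opener(path, "wt", encoding="utf-8", **kwargs) as stream:
        stream.write(text)


def read_fixture(path):
    """Read text back; lzma auto-detects the .xz and .lzma containers."""
    extension = os.path.splitext(path)[1]
    opener = OPENERS.get(extension, (open, {}))[0]
    with opener(path, "rt", encoding="utf-8") as stream:
        return stream.read()


# Round-trip a tiny made-up fixture through every supported extension.
data = '[{"model": "app.item", "pk": 1, "fields": {"name": "example"}}]'
names = ("mydata.json", "mydata.json.gz", "mydata.json.bz2",
         "mydata.json.lzma", "mydata.json.xz")
with tempfile.TemporaryDirectory() as folder:
    for name in names:
        path = os.path.join(folder, name)
        write_fixture(path, data)
        print(name, os.path.getsize(path), read_fixture(path) == data)
```

A filename without a recognised extension falls back to the plain built-in open, so the same function covers both the uncompressed and the compressed cases.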

Benchmarks

After developing the new fixtures compression feature, I ran benchmarks for all supported file formats on different databases, from small projects to larger ones.

The benchmarks were run on my PC and are only examples of the relationship between the time, file size, memory, and CPU usage needed to export data directly into different types of compressed files.
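As a rough illustration of this kind of comparison, the following sketch times the standard library compressors on an in-memory payload. It is not the procedure used for the benchmarks in this article: the payload is synthetic, and only compression time and output size are measured, not a full dumpdata export.

```python
import bz2
import gzip
import lzma
import time

# Synthetic, repetitive JSON-like payload: an assumption standing in
# for real fixture data.
payload = b'{"model": "app.item", "fields": {"name": "example"}}\n' * 10_000

compressors = {
    "gz": gzip.compress,
    "bz2": bz2.compress,
    "xz": lzma.compress,  # lzma.compress writes the .xz container by default
}

results = {}
print(f"txt\t-\t{len(payload)}B")
for name, compress in compressors.items():
    start = time.perf_counter()
    compressed = compress(payload)
    elapsed = time.perf_counter() - start
    results[name] = (elapsed, len(compressed))
    print(f"{name}\t{elapsed:.4f}s\t{len(compressed)}B")
```

The absolute numbers depend on the machine and on how repetitive the data is, so they are only meaningful relative to each other.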

System info

import os
import platform

print(
    f"Architecture:\t{platform.architecture()[0]}\n"
    f"Machine type:\t{platform.machine()}\n"
    f"System glibc:\t{platform.libc_ver()[1]}\n"
    f"System memory:\t"
    f"{os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')}\n"
    f"System release:\t{platform.release()}\n"
    f"System type:\t{platform.system()}\n"
    f"Python impl.:\t{platform.python_implementation()}\n"
    f"Python version:\t{platform.python_version()}\n"
    f"OS name:\t{platform.freedesktop_os_release()['NAME']}\n"
    f"OS version:\t{platform.freedesktop_os_release()['VERSION']}"
)

Architecture:   64bit
Machine type:   x86_64
System glibc:   2.32
System memory:  33402449920
System release: 5.8.0-48-generic
System type:    Linux
Python impl.:   CPython
Python version: 3.8.6
OS name:        Ubuntu
OS version:     20.10 (Groovy Gorilla)

Benchmark 01

type  time   memory  cpu  size
txt   0.75s  70kB    99%  826B
gz    0.66s  71kB    99%  312B
bz2   0.69s  70kB    99%  351B
xz    0.67s  87kB    99%  336B

Figure: Benchmark 01 graphics

Benchmark 02

type  time   memory  cpu  size
txt   0.67s  70kB    99%  1.2kB
gz    0.66s  71kB    99%  501B
bz2   0.66s  71kB    99%  538B
xz    0.68s  87kB    99%  532B

Figure: Benchmark 02 graphics

Benchmark 03

type  time  memory  cpu  size
txt   1s    72kB    98%  870kB
gz    1.1s  73kB    99%  30kB
bz2   1.2s  79kB    99%  21kB
xz    1.1s  97kB    99%  23kB

Figure: Benchmark 03 graphics

Benchmark 04

type  time  memory  cpu  size
txt   1.5s  71kB    98%  2.1MB
gz    1.6s  72kB    98%  258kB
bz2   1.7s  78kB    98%  198kB
xz    2.4s  107kB   99%  164kB

Figure: Benchmark 04 graphics

Benchmark 05

type  time  memory  cpu  size
txt   2.1s  74.3kB  98%  5.2MB
gz    2.2s  74.2kB  98%  406kB
bz2   2.7s  81kB    98%  334kB
xz    3.2s  137kB   99%  238kB

Figure: Benchmark 05 graphics

Benchmark 06

type  time  memory  cpu  size
txt   55s   87kB    73%  12MB
gz    72s   87.2kB  71%  845kB
bz2   54s   93kB    74%  689kB
xz    73s   181kB   73%  769kB

Figure: Benchmark 06 graphics

Benchmark 07

type  time  memory  cpu  size
txt   119s  86kB    74%  36MB
gz    183s  87kB    71%  3.9MB
bz2   159s  93kB    73%  2.7MB
xz    221s  182kB   73%  2.6MB

Figure: Benchmark 07 graphics

Benchmark 08

type  time   memory  cpu  size
txt   533s   89kB    79%  395MB
gz    712s   90kB    77%  95MB
bz2   673s   96kB    78%  74MB
xz    1217s  185kB   79%  65MB

Figure: Benchmark 08 graphics

Conclusions

The benchmarks, run on a variety of starting datasets, show that exporting fixtures directly to compressed files strongly reduces the space occupied, at the cost of a small increase in the time and resources required to create them.
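To put numbers on this trade-off, the figures from Benchmark 08 (the largest dataset) can be turned into a size saving and a time ratio relative to the uncompressed export. The values below are copied from that table; the variable names are just illustrative.

```python
# Values copied from Benchmark 08: (time in seconds, size in MB).
benchmark_08 = {
    "txt": (533, 395),
    "gz": (712, 95),
    "bz2": (673, 74),
    "xz": (1217, 65),
}

base_time, base_size = benchmark_08["txt"]
summary = {}
for kind, (seconds, megabytes) in benchmark_08.items():
    if kind == "txt":
        continue  # the uncompressed export is the baseline
    size_saving = 100 * (1 - megabytes / base_size)
    time_ratio = seconds / base_time
    summary[kind] = (size_saving, time_ratio)
    print(f"{kind}: {size_saving:.1f}% smaller, {time_ratio:.2f}x the export time")
```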

In addition, users can choose the best file type for their use case, opting for maximum compression (xz) or for greater portability (gz).

PR #12871: Added tests for loaddata with gzip/bzip2 compressed fixtures.
Ticket #31552: Loading lzma compressed fixtures.
PR #12879: Fixed #31552 -- Added support for LZMA and XZ fixtures to loaddata.
Ticket #32291: Add support for fixtures compression in dumpdata.
PR #13797: Fixed #32291 -- Added fixtures compression support to dumpdata.