Felix Crux

Technology & Miscellanea

Transcribe: Generate static sites with Django and YAML

Transcribe is a static site generator that integrates the conveniences of Django, allowing you to easily use templates, generate RSS feeds, and create chronological archives, tag pages, and index pages.

By using Transcribe you can run a site without any database dependencies, without worrying about vulnerabilities in a complex stack, and with super fast-loading plain old HTML pages, while still keeping (some of) the conveniences you associate with a stack like Django.


There is documentation in the form of a sample project. You can build it by running make sample in the project's root directory and then examining the contents of sample/out. This page goes over some of those files in more detail.

The centrepiece of Transcribe is the combination of YAML-defined content files with Django HTML templates. Templates can be specified on a per-page basis; where no page-specific template exists, a generic one is used instead. Directories full of content can also have paginated “index” pages and chronological archives associated with them, as well as RSS feeds for updates.
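Paginating an index page amounts to splitting an already-ordered list of items into fixed-size chunks. The sketch below is illustrative only: paginate and page_filename are hypothetical helpers, and the index.html, index2.html, ... naming scheme is an assumption rather than what Transcribe actually emits.

```python
def paginate(items, per_page):
    """Split an already-ordered list of items into consecutive pages."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return [items[i:i + per_page] for i in range(0, len(items), per_page)]


def page_filename(number):
    """Hypothetical naming scheme: index.html for page 1, then index2.html, ..."""
    return "index.html" if number == 1 else f"index{number}.html"
```

Each chunk would then be rendered with the directory's list template, e.g. by looping over enumerate(paginate(posts, 10), start=1) and writing each page to page_filename(number).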

Certain attributes within the YAML content can also be defined as “linkable” indexes, which results in a page being created for each observed value of the attribute, with each page listing all the items that have that attribute value. This makes it possible to implement the familiar “tags” system for blog posts.
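Generating those per-value pages boils down to grouping items by every value of the attribute. Here is a minimal sketch, assuming content items have already been loaded from YAML into dicts. group_by_attribute is a hypothetical name, and each resulting key would become a page such as tags/tag1.html.

```python
from collections import defaultdict


def group_by_attribute(items, attribute):
    """Map each observed value of `attribute` to the items carrying it.

    Handles both single values and lists such as `tags: [tag1, tag2]`;
    items without the attribute are skipped.
    """
    pages = defaultdict(list)
    for item in items:
        value = item.get(attribute)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            pages[v].append(item)
    return dict(pages)
```

Items keep their original relative order within each group, so the list can be sorted once up front (by date, for instance) and every tag page inherits that ordering.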

All of the system's properties are customizable through a simple config file (which is itself a Python file, so you can automate away a lot of the tedious aspects of file and directory listing).
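Because the config is plain Python, lists of content files can be computed instead of typed out by hand. The sketch below shows the idea only: every setting name in it (CONTENT_DIR, SORT_FIELD, LINKABLE_ATTRIBUTES, CONTENT_FILES and so on) is a hypothetical placeholder, not one of Transcribe's documented config keys.

```python
# config.py: a hypothetical sketch. The setting names below are
# illustrative assumptions, not Transcribe's documented config keys.
from pathlib import Path

CONTENT_DIR = Path("content")
TEMPLATE_DIR = Path("templates")
OUTPUT_DIR = Path("out")

# Field used to order listings, archives and feeds (assumed name).
SORT_FIELD = "pub_date"

# Attributes that get one generated page per observed value (assumed name).
LINKABLE_ATTRIBUTES = ["tags"]


def content_files(directory):
    """Return every YAML file under `directory`, sorted, so that new
    posts never need to be listed by hand."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(directory.rglob("*.yaml"))


# Because this is ordinary Python, the file list is computed, not typed.
CONTENT_FILES = content_files(CONTENT_DIR)
```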

As an example of how to use the system, let's consider the case of a simple blog. Let's say you want to produce a set of output files that looks something like this:

|-- about.html
`-- blog/
    |-- index.html
    |-- an-article.html
    |-- more-content.html     <-- Special unique design
    |-- quick-update.html
    |-- rss.xml               <-- RSS feed
    |-- archives
    |   |-- index.html
    |   `-- 2012
    |       |-- index.html
    |       `-- dec
    |           `-- index.html
    `-- tags
        |-- tag1.html
        `-- tag2.html
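The rss.xml entry in this tree is an ordinary RSS 2.0 feed. To illustrate what generating one involves, here is a minimal sketch using only Python's standard library. build_rss, its parameters, the per-post link field and the ten-item limit are assumptions made for this example, not Transcribe's implementation.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, time, timezone
from email.utils import format_datetime


def build_rss(site_url, directory, posts, limit=10):
    """Return an RSS 2.0 document (as a string) for the newest posts.

    `posts` are dicts with `title`, `link` (a hypothetical field holding
    the output filename) and `pub_date` (a datetime.date).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = directory
    ET.SubElement(channel, "link").text = f"{site_url}/{directory}/"
    ET.SubElement(channel, "description").text = f"Updates from /{directory}/"

    newest = sorted(posts, key=lambda p: p["pub_date"], reverse=True)[:limit]
    for post in newest:
        item = ET.SubElement(channel, "item")
        url = f"{site_url}/{directory}/{post['link']}"
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "guid").text = url
        # RSS expects RFC 822 dates; treat a bare date as midnight UTC.
        published = datetime.combine(post["pub_date"], time(), tzinfo=timezone.utc)
        ET.SubElement(item, "pubDate").text = format_datetime(published, usegmt=True)
    return ET.tostring(rss, encoding="unicode")
```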

Let's assume that you have configured your webserver in such a way that requests for the top-level index.html page actually return the blog/index.html page.

Let's also assume that more-content.html requires a special, unique HTML layout, because it is a special project you're showing off.

The first thing to do would be to create directory structures for your content and templates:

|-- content/
|   |-- about.yaml
|   `-- blog/
|       |-- an-article.yaml
|       |-- more-content.yaml
|       `-- quick-update.yaml
`-- templates/
    |-- about.html
    `-- blog/
        |-- archive.html
        |-- item.html
        |-- list.html
        `-- more-content.html
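Because the templates tree mirrors the content tree, choosing a template for each content file can be a simple path lookup: use a template whose name matches the content file if one exists, otherwise fall back to the directory's generic item.html. choose_template is a hypothetical helper written for illustration, not Transcribe's actual code.

```python
from pathlib import Path


def choose_template(content_file, content_root, template_root):
    """Return the template to render one content file with.

    A template mirroring the content file's path and name wins (e.g.
    templates/blog/more-content.html); otherwise the generic item.html in
    the matching template directory is used.
    """
    relative = Path(content_file).relative_to(content_root)
    specific = Path(template_root) / relative.with_suffix(".html")
    if specific.is_file():
        return specific
    return Path(template_root) / relative.parent / "item.html"
```

With the layout above, content/blog/more-content.yaml resolves to templates/blog/more-content.html, while an-article.yaml and quick-update.yaml both fall back to templates/blog/item.html.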

You'd also want to include a config file in the root directory.

At this point, you can populate your content files with YAML-defined fields, like this:

title: "An Article"
tags: [tag1, tag2]
pub_date: !!timestamp 2013-06-02
content: |
  This is an article.

You'll have to define a couple of templates. item.html is used when there is no template whose filename matches the content file, as there is in the case of more-content.{yaml,html}. list.html is used for ordered listings (sorted by the field defined in your config file). All of these templates have access to the YAML keywords defined above, e.g. {{ title }}. In list templates, the list is named after the directory; for example: {% for post in blog %}.... The root keyword contains the name of the directory the content lives in, like “blog”, which is useful for generating links.
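Put differently, each template receives a context dictionary. A rough sketch of how those contexts might be assembled, assuming items are dicts loaded from the YAML files: list_context and item_context are hypothetical helpers, and sorting newest-first on the configured field is an assumption.

```python
def item_context(directory, item):
    """Context for item.html or a page-specific template: the item's own
    YAML fields (so {{ title }} works) plus `root`, the directory name."""
    return {**item, "root": directory}


def list_context(directory, items, sort_field, newest_first=True):
    """Context for list.html: the ordered items stored under the
    directory's name (so {% for post in blog %} works) plus `root`."""
    ordered = sorted(items, key=lambda item: item[sort_field], reverse=newest_first)
    return {directory: ordered, "root": directory}
```

Keying the list by the directory name is what lets one list.html per directory use a natural variable name such as blog, at the cost of each directory's template having to know its own name.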

The only slightly tricky template file is the one used for archives. The data structure is once again named after the directory it originates from, but it now contains years. These are not simply numbers, but structures with a year field (which holds the numeric value) and a months list, each entry of which similarly has a month field and a posts list. To make all that abstract verbiage more concrete, here's an example snippet of a reasonable archive.html:

{% for year in blog|dictsortreversed:"year" %}
  <a href="/{{ root }}/archives/{{ year.year }}/">{{ year.year }}</a>
  {% for month in year.months|dictsortreversed:"month" %}
    <a href="/{{ root }}/archives/{{ year.year }}/{{ month.month|date:"b" }}/">{{ month.month|date:"F" }}</a>
    {% for post in month.posts|dictsortreversed:"pub_date" %}
      <a href="/{{ post.root }}/{{ post.link }}">{{ post.title }}</a>
    {% endfor %}
  {% endfor %}
{% endfor %}
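The structure that archive.html walks can be derived from the posts' pub_date values. This sketch assumes each post is a dict holding a datetime.date under pub_date (PyYAML, for instance, turns a bare !!timestamp date into one), and it represents each month by its first day so that Django's date filter can format it. It also lists the archive pages that the template's links point to. build_archive, archive_paths and MONTH_SLUGS are hypothetical names, not Transcribe's code.

```python
from collections import defaultdict
from datetime import date
from pathlib import PurePosixPath

# Lowercase three-letter month names, matching Django's date:"b" output in
# an English locale; spelled out so the result doesn't depend on the
# Python process locale.
MONTH_SLUGS = ("jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec")


def build_archive(posts, date_field="pub_date"):
    """Group posts into the years -> months -> posts shape archive.html walks.

    Ordering is left to the template's dictsortreversed filters.
    """
    by_month = defaultdict(list)
    for post in posts:
        published = post[date_field]
        # Represent each month by its first day so the template can apply
        # date:"b" and date:"F" to it.
        by_month[date(published.year, published.month, 1)].append(post)

    by_year = defaultdict(list)
    for month_start, month_posts in by_month.items():
        by_year[month_start.year].append({"month": month_start, "posts": month_posts})

    return [{"year": year, "months": months} for year, months in by_year.items()]


def archive_paths(root, archive):
    """List the archive index files that the template's links point to."""
    base = PurePosixPath(root) / "archives"
    paths = [base / "index.html"]
    for year in archive:
        paths.append(base / str(year["year"]) / "index.html")
        for month in year["months"]:
            slug = MONTH_SLUGS[month["month"].month - 1]
            paths.append(base / str(year["year"]) / slug / "index.html")
    return [str(path) for path in paths]
```

The archive template would then be rendered with a context like {"blog": build_archive(posts), "root": "blog"}; for two December 2012 posts, archive_paths yields blog/archives/index.html, blog/archives/2012/index.html and blog/archives/2012/dec/index.html, matching the output tree at the top of this page.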

At this point, you've got all the complex initial setup out of the way. Adding new content is as simple as defining a new YAML file and running transcribe.py.