Running a server

git-pages is an application that serves static websites from either the filesystem or an S3-compatible object store, and updates them when directed by the site author through an HTTP request. The server scales linearly from a personal instance running on a Raspberry Pi to a highly available, geographically distributed cluster powering Grebedoc. It is written in Go and does not depend on any other services, although most installations will use a reverse proxy like Caddy or Nginx to serve sites over the https:// protocol.

This document explains how to configure and operate a git-pages server.

Installation

You can install git-pages from a binary build or source code, or use a Docker image.

... from a binary build

git-pages provides binary builds for multiple architectures and operating systems on the release index. Download the right binary for your platform and place it in a directory on your PATH (or anywhere else convenient). Builds for the following platforms are currently available:

Operating system	Hardware	Binary name
Linux	64-bit Intel	`git-pages.linux-amd64`
Linux	64-bit ARM	`git-pages.linux-arm64`
Windows	64-bit Intel	`git-pages.windows-amd64.exe`
macOS	M1 and later	`git-pages.darwin-arm64`

If you'd like us to provide binary builds for a platform not listed above, file an issue.

Warning

In addition to the binaries for release versions, binaries built from the latest commit to the main branch should be available in a pre-release called latest; however, this pre-release usually fails to materialize due to a bug in Forgejo.

... from source code

git-pages requires Go 1.25 or newer. Once you install Go and Git, install the application as follows:

$ go install codeberg.org/git-pages/git-pages@latest

This command will install the latest released version; you can pick a different version by replacing latest with the desired version number, e.g. v0.3.0.

... via Docker

git-pages provides an OCI-compliant (Docker) container image built for each release as well as from the latest commit to the main branch. To quickly set up a server on port 3000 storing data in ./data, run:

$ docker run --rm -v ./data:/app/data -p 3000:3000 codeberg.org/git-pages/git-pages:latest

Configuration

git-pages has no required configuration options and can be launched as simply as:

$ git-pages -no-config
time=2025-12-07T05:21:42.045Z level=INFO msg="memlimit: was 8.0 EB now 31.3 GB"
time=2025-12-07T05:21:42.046Z level=INFO msg="fs: has atomic CAS"
time=2025-12-07T05:21:42.046Z level=INFO msg="serve: ready"

However, most installations will use configuration options. By default, the configuration is read from a TOML file named config.toml in the current directory; alternatively, the location of the configuration file may be specified using the -config /path/to/config.toml command line option.

If the environment variable CREDENTIALS_DIRECTORY is set, then a file $CREDENTIALS_DIRECTORY/secrets.toml is read afterwards, and any options specified in it take priority over the options set in config.toml; this mechanism works with systemd's LoadCredential= to allow settings and secrets to be stored in separate files. Alternatively, the location of the secrets file may be specified using the -secrets /path/to/secrets.toml command line option, in which case the environment variable is disregarded.
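The rule for locating the secrets file can be summarized as a small function. This is a sketch of the behavior described above, not git-pages' own code; secretsPath and its signature are hypothetical, and the environment lookup is passed in so the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// secretsPath returns the secrets file that would be read, or "" if none.
// Hypothetical helper modeling the documented rules:
//   - an explicit -secrets flag wins and the environment is ignored;
//   - otherwise, if CREDENTIALS_DIRECTORY is set, use $CREDENTIALS_DIRECTORY/secrets.toml.
func secretsPath(secretsFlag string, lookupEnv func(string) (string, bool)) string {
	if secretsFlag != "" {
		return secretsFlag
	}
	if dir, ok := lookupEnv("CREDENTIALS_DIRECTORY"); ok {
		return filepath.Join(dir, "secrets.toml")
	}
	return ""
}

func main() {
	fmt.Printf("secrets file: %q\n", secretsPath("", os.LookupEnv))
}
```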

Environment variables

In addition to the config.toml and potentially secrets.toml files, git-pages may be configured via environment variables. Every TOML configuration option has a corresponding environment variable name. For example, the following two configurations are equivalent:

config.toml

[storage]
type = "s3"

environment

PAGES_STORAGE_TYPE=s3
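Judging from the examples in this document, an environment variable name appears to be formed from the dotted TOML path of an option by replacing dots and hyphens with underscores, upper-casing the result, and adding a PAGES_ prefix (for example, storage.s3.access-key-id becomes PAGES_STORAGE_S3_ACCESS_KEY_ID). The sketch below encodes that inferred rule; envVarName is a hypothetical helper, and the output of -print-config-env-vars remains the authoritative list.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarName converts a dotted TOML option path into an environment variable
// name. The rule is inferred from documented examples, not from the source code.
func envVarName(tomlPath string) string {
	r := strings.NewReplacer(".", "_", "-", "_")
	return "PAGES_" + strings.ToUpper(r.Replace(tomlPath))
}

func main() {
	for _, p := range []string{"storage.type", "storage.s3.access-key-id", "log-format"} {
		fmt.Println(p, "=>", envVarName(p))
	}
}
```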

The -print-config-env-vars command line option lists every accepted environment variable and its default value:

$ git-pages -print-config-env-vars
PAGES_INSECURE bool = "false"
PAGES_FEATURES []string = "[]"
PAGES_LOG_FORMAT string = "text"
PAGES_LOG_LEVEL string = "info"
PAGES_SERVER_PAGES string = "tcp/:3000"
...

Whenever both the TOML file(s) and an environment variable specify a value for some configuration option, the value of the environment variable is used. The -print-config command line option displays the final configuration (after taking into account the default values, the configuration file, and the environment variables):

$ git-pages -print-config
features = []
log-format = 'text'
log-level = 'info'
wildcard = []

[server]
pages = 'tcp/localhost:3000'
caddy = 'tcp/localhost:3001'
metrics = 'tcp/localhost:3002'

...
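Conceptually, the final configuration is the result of layering sources from lowest to highest priority: built-in defaults, config.toml, secrets.toml, and finally environment variables. The sketch below models each source as a flat map keyed by environment variable name; that representation and the mergeLayers function are illustrative assumptions, not how git-pages stores its configuration.

```go
package main

import "fmt"

// mergeLayers returns the effective value of every option. Layers are given
// from lowest to highest priority, so later layers override earlier ones.
func mergeLayers(layers ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

func main() {
	defaults := map[string]string{"PAGES_LOG_FORMAT": "text", "PAGES_STORAGE_TYPE": "fs"}
	configToml := map[string]string{"PAGES_STORAGE_TYPE": "s3"}
	secretsToml := map[string]string{"PAGES_STORAGE_S3_SECRET_ACCESS_KEY": "example-secret"}
	environment := map[string]string{"PAGES_LOG_FORMAT": "json"}

	final := mergeLayers(defaults, configToml, secretsToml, environment)
	fmt.Println(final["PAGES_LOG_FORMAT"], final["PAGES_STORAGE_TYPE"])
}
```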

Storage backends

git-pages needs a place to store the contents of the sites it manages, which is called a backend. Currently, two backends are provided: local filesystem and S3 object store.

Filesystem

By default, git-pages uses the filesystem backend and stores the site contents in the data subdirectory of the current directory. The only configuration option for this backend is the path where the data will be stored:

config.toml / environment

[storage]
type = "fs"

[storage.fs]
root = "./data"

PAGES_STORAGE_TYPE=fs
PAGES_STORAGE_FS_ROOT=./data

S3 object store

Storing site contents in an AWS S3-compatible object store is useful when git-pages is deployed in a cluster, or simply to reduce storage costs. At a minimum, it is necessary to configure an endpoint, a bucket name, a region, an access key ID, and a secret access key:

config.toml / environment

[storage]
type = "s3"

[storage.s3]
endpoint = "play.min.io"
access-key-id = "Q3AM3UQ867SPQQA43P2F"
secret-access-key = "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG"
region = "us-east-1"
bucket = "git-pages-demo"

PAGES_STORAGE_TYPE=s3
PAGES_STORAGE_S3_ENDPOINT=play.min.io
PAGES_STORAGE_S3_ACCESS_KEY_ID=Q3AM3UQ867SPQQA43P2F
PAGES_STORAGE_S3_SECRET_ACCESS_KEY=zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG
PAGES_STORAGE_S3_REGION=us-east-1
PAGES_STORAGE_S3_BUCKET=git-pages-demo

Refer to the documentation of your S3 object store service provider for how to obtain these credentials. The credentials above are valid for the public demo service provided by MinIO, to be used for evaluation only.

The S3 object store backend in git-pages is conservative with the requests it makes, and may be used with virtually any S3-compatible service or application. It has been verified to work with AWS S3, Wasabi, Tigris, MinIO, and Garage. We do not recommend any specific service or application, and you should pick the option that fits your needs the best.

Note

For partial update operations (PATCH requests), git-pages will attempt to use the conditional write feature of the S3 object store (specifically, PutObject requests are issued with both If-Match: and If-Unmodified-Since: headers). Object stores that provide this feature include AWS S3, Tigris, and MinIO. If conditional writes are not implemented natively by the object store, git-pages will emulate them using HeadObject, which leaves a small window during which a race condition may cause an update operation to be lost.

In most cases, the presence or lack of conditional write support should not affect your choice of an S3 object store. You may ignore this note unless you expect a very high frequency of partial updates.
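To see why emulated conditional writes leave room for a lost update, consider the simplified in-memory model below. It is not S3 client code: Store, PutIfMatch, and EmulatedPutIfMatch are hypothetical names, and an integer version stands in for the ETag. A native conditional write checks and writes atomically, while the emulation issues a separate HEAD-like check before writing, so another writer can slip in between.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a toy single-object store; version stands in for the object's ETag.
type Store struct {
	mu      sync.Mutex
	value   string
	version int
}

// Head returns the current version, like a HeadObject request.
func (s *Store) Head() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.version
}

// Put overwrites the value unconditionally, like a plain PutObject request.
func (s *Store) Put(v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.value = v
	s.version++
}

// PutIfMatch models a native conditional write: the version check and the
// write happen atomically, so a writer holding a stale version is rejected.
func (s *Store) PutIfMatch(v string, expected int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version != expected {
		return false
	}
	s.value = v
	s.version++
	return true
}

// EmulatedPutIfMatch checks with Head, then writes with Put. The gap hook runs
// between the two requests to simulate a concurrent writer.
func (s *Store) EmulatedPutIfMatch(v string, expected int, gap func()) bool {
	if s.Head() != expected {
		return false
	}
	if gap != nil {
		gap()
	}
	s.Put(v)
	return true
}

func main() {
	s := &Store{value: "base"}
	ok := s.EmulatedPutIfMatch("update A", s.Head(), func() { s.Put("update B") })
	fmt.Println(ok, s.value) // "update B" was overwritten without being detected
}
```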

To improve page load times and reduce the number of requests to the S3 object store, git-pages places in-memory caches (a blob cache and a site cache) in front of it. Their maximum sizes may be adjusted according to the amount of available memory; the defaults are conservative and should be acceptable for most deployments. The maximum age and stale period of the site cache determine how quickly updates to the site contents reach visitors; these values have the same meaning as in the HTTP Cache-Control header.

config.toml / environment

[storage.s3.blob-cache]
max-size = "256MB"

[storage.s3.site-cache]
max-size = "16MB"
max-age = "60s"
max-stale = "1h"

PAGES_STORAGE_S3_BLOB_CACHE_MAX_SIZE=256MB
PAGES_STORAGE_S3_SITE_CACHE_MAX_SIZE=16MB
PAGES_STORAGE_S3_SITE_CACHE_MAX_AGE=60s
PAGES_STORAGE_S3_SITE_CACHE_MAX_STALE=1h

Systemd unit file

Typically, git-pages will run as a system service. The following systemd unit file can be used as a starting point for running it on a Linux system:

git-pages.service

[Unit]
Description=git-pages static site server
After=network-online.target
Requires=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
ExecStart=git-pages -config /etc/git-pages/config.toml
LoadCredential=secrets.toml:/etc/git-pages/secrets.toml
StateDirectory=git-pages
DynamicUser=true
PrivateTmp=true
Restart=on-failure
RestartSec=5s

Configuration may be provided via Environment= keys in addition to, or instead of, the TOML configuration files:

git-pages.service

[Service]
ExecStart=git-pages -no-config
...
Environment="PAGES_STORAGE_FS_ROOT=/data/git-pages"
...

TLS termination

Today, every website is expected to be served with TLS encryption from an https:// URL. git-pages itself does not implement TLS and requires a reverse proxy server to terminate TLS (decrypt incoming traffic, pass it to git-pages, then encrypt the response and send it back).

The recommended reverse proxy server is Caddy; other options include Nginx and HAProxy. The advantage of using Caddy is that it is easy to configure and it fully automates acquisition of TLS certificates from Let's Encrypt for an open set of domains (where uploading a site to git-pages enables Caddy to acquire a TLS certificate on demand). If you plan to use git-pages with only a few domains that are known in advance, you may use any reverse proxy server you like.

... with Caddy

The minimal Caddyfile below is suitable for a deployment of git-pages on a limited set of domains:

https://mydomain.tld, https://otherdomain.tld {
    reverse_proxy http://localhost:3000
}

The following Caddyfile shows how to deploy git-pages on an open set of domains using On-Demand TLS:

{
    on_demand_tls {
        permission http http://localhost:3001
    }
}

https:// {
    tls {
        on_demand
    }

    reverse_proxy http://localhost:3000
}

These Caddy configurations use the default ports for the pages and caddy endpoints. The http://localhost:3000 (pages) endpoint is used to serve site contents and process updates, while the http://localhost:3001 (caddy) endpoint is used by Caddy to check whether it should acquire a TLS certificate for a given domain. Both of these endpoints are configurable:

config.toml / environment

[server]
pages = "tcp/localhost:3000"
caddy = "tcp/localhost:3001"

PAGES_SERVER_PAGES=tcp/localhost:3000
PAGES_SERVER_CADDY=tcp/localhost:3001

To use a Unix domain socket instead of TCP, configure an endpoint as unix//path/to/endpoint.sock in both git-pages and Caddy.
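Endpoint strings such as tcp/localhost:3000 and unix//path/to/endpoint.sock read as a network name, a slash, and an address. The sketch below splits them at the first slash and passes the parts to net.Listen; parseEndpoint is a hypothetical helper illustrating the notation, not git-pages' parser. The demo listens on port 0 so the OS picks a free port.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseEndpoint splits "tcp/localhost:3000" into ("tcp", "localhost:3000") and
// "unix//run/pages.sock" into ("unix", "/run/pages.sock") at the first slash.
func parseEndpoint(endpoint string) (string, string, error) {
	network, address, ok := strings.Cut(endpoint, "/")
	if !ok || network == "" || address == "" {
		return "", "", fmt.Errorf("invalid endpoint %q", endpoint)
	}
	return network, address, nil
}

func main() {
	network, address, err := parseEndpoint("tcp/localhost:0")
	if err != nil {
		panic(err)
	}
	ln, err := net.Listen(network, address)
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr())
}
```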

If you are deploying git-pages on a single machine, the configuration above is sufficient. In a cluster deployment, it is convenient to configure Caddy to store key material in the same S3 object store as the site contents. This can be done by building Caddy with the certmagic-s3 plugin and configuring it to use the same environment variables as git-pages itself uses for S3 credentials:

{
    storage s3 {
        host "{env.PAGES_STORAGE_S3_ENDPOINT}"
        access_id "{env.PAGES_STORAGE_S3_ACCESS_KEY_ID}"
        secret_key "{env.PAGES_STORAGE_S3_SECRET_ACCESS_KEY}"
        bucket "{env.PAGES_STORAGE_S3_BUCKET}"
        prefix "ssl"
    }
}

Success

Congratulations! You can now publish a site to your git-pages server. While you can stop here, the sections below explain how to adjust your configuration further.

Wildcard domains

git-pages can be used to publish personal sites for each user of a git forge like Forgejo. To do this, it is necessary to set up a wildcard DNS record for a domain used to publish the sites, and configure a mapping between its subdomains and git repositories hosted on the forge.

Note

While wildcard domains can be configured via environment variables, the resulting values are difficult to read and maintain; the examples will use TOML only.

Let's consider a configuration for a forge https://mygit.forge where the sites are published under https://mygit.page:

[[wildcard]]
domain = "mygit.page"
clone-url = "https://mygit.forge/<user>/<project>.git"
index-repo = "<user>.mygit.page"
index-repo-branch = "main"

The default is index-repo-branch = "pages", and we recommend keeping it.

Warning

For security reasons, you must not use the second-level domain of your forge (mygit.forge in this example) to host user-authored sites; use a different second-level domain (mygit.page in this case) to isolate your forge and safeguard sensitive credentials.

This configuration creates two distinct mappings:

Every site URL of the form https://USER.mygit.page/ corresponds to a repository https://mygit.forge/USER/USER.mygit.page.git and branch main;
Every site URL of the form https://USER.mygit.page/PROJECT/ corresponds to a repository https://mygit.forge/USER/PROJECT.git and branch pages.
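The two mappings above can be expressed as a small function. This sketch assumes the <user> and <project> placeholders are substituted literally, that the index repository name also expands <user>, and that single-label subdomains only are mapped; the Wildcard struct mirrors the TOML keys, but its field names and the resolve method are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Wildcard mirrors the [[wildcard]] keys from the example configuration.
type Wildcard struct {
	Domain          string
	CloneURL        string
	IndexRepo       string
	IndexRepoBranch string
}

// resolve maps a request host and first path segment to a repository URL and
// branch. project is "" for the index site (https://USER.mygit.page/).
func (w Wildcard) resolve(host, project string) (repoURL, branch string, ok bool) {
	user, found := strings.CutSuffix(host, "."+w.Domain)
	// Assumption: only single-label subdomains such as alice.mygit.page are mapped.
	if !found || user == "" || strings.Contains(user, ".") {
		return "", "", false
	}
	repo, br := project, "pages" // project sites use the pages branch
	if project == "" {
		repo = strings.ReplaceAll(w.IndexRepo, "<user>", user)
		br = w.IndexRepoBranch
	}
	url := strings.ReplaceAll(w.CloneURL, "<user>", user)
	url = strings.ReplaceAll(url, "<project>", repo)
	return url, br, true
}

func main() {
	w := Wildcard{
		Domain:          "mygit.page",
		CloneURL:        "https://mygit.forge/<user>/<project>.git",
		IndexRepo:       "<user>.mygit.page",
		IndexRepoBranch: "main",
	}
	fmt.Println(w.resolve("alice.mygit.page", ""))
	fmt.Println(w.resolve("alice.mygit.page", "blog"))
}
```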

Sites using wildcard domains may be updated in any of the following three ways:

A webhook POST request delivered by a forge (Forgejo, Gitea, Gogs, or GitHub) causes git-pages to clone the repository and publish the contents of the corresponding branch;
A PUT request containing a git repository URL causes git-pages to clone the repository and publish the contents of the corresponding branch;
A PUT or PATCH request containing an archive causes git-pages to publish the contents of the archive.

In all three cases, only someone with write access to the corresponding repository is able to change site contents. Uploading an archive with the output of a static site generator (typically from a CI workflow) is more efficient and avoids unnecessary git repository updates; it requires an authorization token, obtained from the forge, with write access to the repository. This method is supported for Forgejo, Gitea, and Gogs, and must be enabled as follows:

config.toml (Forgejo) / config.toml (Gitea) / config.toml (Gogs)

[[wildcard]]
...
authorization = "forgejo"

[[wildcard]]
...
authorization = "gitea"

[[wildcard]]
...
authorization = "gogs"

Multiple wildcard domains may be configured by repeating the [[wildcard]] section in the configuration file.

Resource limits

git-pages enforces configurable limits on its operation so that it can run robustly in the adversarial environment of the open internet. The default values are conservative and suitable for most deployments, but may need to be changed when publishing larger sites.

Go heap size limit

The Go language uses garbage collection, meaning that memory the application no longer uses is not released to the OS right away, but only when the need arises. Almost all operations performed by git-pages require only a small amount of RAM; the exception is site updates, where the entire site may have to be loaded into memory at once. This may cause the machine to run out of memory, in which case the OS may kill the git-pages process and cause an outage.

The maximum heap size ratio sets a soft limit for heap size past which the Go runtime will aggressively attempt to free memory even at the cost of reduced performance. It is configured as a ratio of total available memory (excluding swap); the default value of 0.5 means that Go will only use more than 50% of system memory if absolutely necessary.

config.toml / environment

[limits]
max-heap-size-ratio = 0.5

PAGES_LIMITS_MAX_HEAP_SIZE_RATIO=0.5

This configuration option is most useful for resource-constrained machines; git-pages will run with less than 64 MB of RAM, but it will not be able to publish large sites in such a configuration.
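As an illustration of what the ratio means, a soft limit can be derived from total memory and handed to the Go runtime with runtime/debug.SetMemoryLimit. The startup log line "memlimit: was 8.0 EB now 31.3 GB" suggests git-pages does something similar (8 EiB is the runtime's default, effectively unlimited), but this sketch is not its implementation: totalMemory is a placeholder for however RAM is detected, and heapLimit is a hypothetical helper.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// heapLimit computes a soft memory limit as a fraction of total memory (excluding swap).
func heapLimit(totalMemory uint64, ratio float64) int64 {
	return int64(float64(totalMemory) * ratio)
}

func main() {
	const totalMemory = 8 << 30 // placeholder: a machine with 8 GiB of RAM
	limit := heapLimit(totalMemory, 0.5)
	// SetMemoryLimit returns the previous limit; by default it is math.MaxInt64.
	previous := debug.SetMemoryLimit(limit)
	fmt.Printf("memory limit: was %d bytes, now %d bytes\n", previous, limit)
}
```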

Site size limits

git-pages offers fine-grained limits for user-authored content through four closely related configuration options.

The update timeout limits the amount of time a site update may take. It applies equally to updates from a git repository and updates from an uploaded archive. This limit may need to be raised when dealing with slow connections, slow git repository hosts, or very large sites.

config.toml / environment

[limits]
update-timeout = "60s"

PAGES_LIMITS_UPDATE_TIMEOUT=60s

The maximum site size option limits how much storage in total a single site may consume, and is applied throughout the processing of the site; for example, when a ZIP archive is uploaded, the limit first applies to the size of the compressed request body, and then to the total size of every archive member. This limit is applied before compression and deduplication to ensure its transparency and fairness.

config.toml / environment

[limits]
max-site-size = "128M"

PAGES_LIMITS_MAX_SITE_SIZE=128M
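To illustrate the "before compression" rule for a ZIP upload, the sketch below first checks the compressed size, then sums the uncompressed sizes of the archive members and compares the total with the limit, using only the standard archive/zip package. It is a simplified model: it trusts the sizes declared in the archive directory, whereas a robust server would also enforce the limit while actually reading data. checkSiteSize and makeZip are hypothetical names.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// checkSiteSize returns an error if the compressed archive, or the total
// uncompressed size of its members, exceeds maxSiteSize. It relies on sizes
// declared in the ZIP central directory, which a hostile archive could misreport.
func checkSiteSize(archive []byte, maxSiteSize uint64) error {
	if uint64(len(archive)) > maxSiteSize {
		return fmt.Errorf("compressed archive is %d bytes, limit is %d", len(archive), maxSiteSize)
	}
	r, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return err
	}
	var total uint64
	for _, f := range r.File {
		total += f.UncompressedSize64
		if total > maxSiteSize {
			return fmt.Errorf("uncompressed contents exceed %d bytes", maxSiteSize)
		}
	}
	return nil
}

// makeZip builds an in-memory archive containing one file of the given size.
func makeZip(size int) []byte {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	f, err := w.Create("index.html")
	if err != nil {
		panic(err)
	}
	f.Write(bytes.Repeat([]byte("a"), size))
	if err := w.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	archive := makeZip(1 << 20) // 1 MiB of highly compressible data
	fmt.Println(len(archive), checkSiteSize(archive, 512<<10))
}
```

The archive in the demo is only a few kilobytes on the wire, yet it is rejected because its contents exceed the limit once decompressed.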

The maximum manifest size option limits how large an individual site manifest can get. The site manifest needs to be loaded in full every time the site is accessed, so its size should be quite small, typically no more than a few MB. This option indirectly limits how many files there can be in a single site; as a rough approximation, 1 MB of manifest fits about 5000 files.

config.toml / environment

[limits]
max-manifest-size = "1M"

PAGES_LIMITS_MAX_MANIFEST_SIZE=1M
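The rule of thumb above works out to roughly 200 bytes of manifest per file (1,000,000 bytes divided by 5000 files). The helper below makes that arithmetic explicit; it treats "1M" as 10^6 bytes, which is an assumption, and real manifests vary with filename lengths and inlined file contents. approxMaxFiles is a hypothetical name.

```go
package main

import "fmt"

// bytesPerFile is the rough manifest cost per file implied by "1 MB fits about 5000 files".
const bytesPerFile = 1_000_000 / 5000 // 200 bytes

// approxMaxFiles estimates how many files fit in a manifest of the given size.
func approxMaxFiles(maxManifestBytes int) int {
	return maxManifestBytes / bytesPerFile
}

func main() {
	fmt.Println(approxMaxFiles(1_000_000)) // max-manifest-size = "1M"
	fmt.Println(approxMaxFiles(4_000_000)) // max-manifest-size = "4M"
}
```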

Inline file size limit

The maximum inline file size limit determines how large a file may be before it is stored in a separate blob rather than directly in the manifest. This limit should not be less than the size of a blob reference, which is 71 bytes. Increasing this limit enlarges manifests and reduces the number of very small blobs. This limit should only be changed if storing blobs below a certain size is expensive, to reduce latency for sites with many small files, or to address concerns about data retention within detached audit records.

config.toml / environment

[limits]
max-inline-file-size = "256B"

PAGES_LIMITS_MAX_INLINE_FILE_SIZE=256B

Functional limits

git-pages also offers configurable restrictions on its functionality whose purpose is to prevent accidental or intentional misuse.

The custom header allowlist enumerates the headers that may be set via the _headers file. The following headers are considered so critical that they cannot be included in this allowlist: Accept-Ranges, Age, Allow, Alt-Svc, Connection, Content-Encoding, Content-Length, Content-Range, Date, Location, Server, Trailer, Transfer-Encoding, Upgrade. The allowlist applies both when a site is updated (where a header not in the allowlist produces a diagnostic and is then discarded) and when a page is served (where a header not in the allowlist causes the page load to fail with a 500 Internal Server Error).

config.toml / environment

[limits]
allowed-custom-headers = ["X-Clacks-Overhead", "X-Frame-Options"]

PAGES_LIMITS_ALLOWED_CUSTOM_HEADERS=X-Clacks-Overhead,X-Frame-Options
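Validating the configured allowlist against the critical headers, and then checking individual headers against it, might look like the sketch below. Header names are compared in the canonical form produced by net/http's CanonicalHeaderKey; that case-insensitive comparison and the buildAllowlist name are assumptions of the sketch, not documented git-pages behavior.

```go
package main

import (
	"fmt"
	"net/http"
)

// criticalHeaders can never appear in the allowlist (list taken from the text above).
var criticalHeaders = map[string]bool{}

func init() {
	for _, h := range []string{"Accept-Ranges", "Age", "Allow", "Alt-Svc", "Connection",
		"Content-Encoding", "Content-Length", "Content-Range", "Date", "Location",
		"Server", "Trailer", "Transfer-Encoding", "Upgrade"} {
		criticalHeaders[h] = true
	}
}

// buildAllowlist rejects critical headers and returns the allowlist as a set
// of canonical header names.
func buildAllowlist(configured []string) (map[string]bool, error) {
	allow := map[string]bool{}
	for _, h := range configured {
		h = http.CanonicalHeaderKey(h)
		if criticalHeaders[h] {
			return nil, fmt.Errorf("header %s cannot be allowlisted", h)
		}
		allow[h] = true
	}
	return allow, nil
}

func main() {
	allow, err := buildAllowlist([]string{"X-Clacks-Overhead", "X-Frame-Options"})
	if err != nil {
		panic(err)
	}
	fmt.Println(allow[http.CanonicalHeaderKey("x-frame-options")], allow["Content-Security-Policy"])
}
```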

The basic authentication option allows sites to be partially or fully password-protected via the Basic-Auth pseudo-header in the _headers file. This is not a security mechanism: the value of this header specifies the password in cleartext, and anybody who can download a site manifest will see not only all filenames and some file contents, but also all passwords. This mechanism is intended for low-stakes situations, like preventing search engines from indexing documentation for unpublished versions of software, or restricting access to the notes of your tabletop RPG group.

config.toml / environment

[limits]
allow-basic-auth = false

PAGES_LIMITS_ALLOW_BASIC_AUTH=false

The repository URL prefix allowlist restricts the set of repositories from which sites may be uploaded. Since it is not possible to determine what repository (if any) an archive was built from, archive uploads are prohibited, except for wildcard domains with forge authorization and domains with forge DNS allowlist authorization configured.

config.toml / environment

[limits]
allowed-repository-url-prefixes = ["https://mygit.forge/"]

PAGES_LIMITS_ALLOWED_REPOSITORY_URL_PREFIXES=https://mygit.forge/
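Assuming the check is a literal string prefix match, as the option name suggests, the trailing slash in the example matters: without it, https://mygit.forge would also match an unrelated host such as https://mygit.forge.example. The sketch below (repoAllowed is a hypothetical name) demonstrates both cases.

```go
package main

import (
	"fmt"
	"strings"
)

// repoAllowed reports whether repoURL starts with any of the allowed prefixes.
func repoAllowed(repoURL string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(repoURL, p) {
			return true
		}
	}
	return false
}

func main() {
	strict := []string{"https://mygit.forge/"}
	loose := []string{"https://mygit.forge"} // missing trailing slash
	fmt.Println(repoAllowed("https://mygit.forge/alice/blog.git", strict))
	fmt.Println(repoAllowed("https://mygit.forge.example/mallory/site.git", strict))
	fmt.Println(repoAllowed("https://mygit.forge.example/mallory/site.git", loose))
}
```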

The forbidden domain list defines domains to which a site may not be uploaded under any circumstances. This includes subdomains; the configuration below prohibits uploading sites to both internal.mygit.page and metrics.internal.mygit.page. Sites uploaded before the option was set will remain available.

config.toml / environment

[limits]
forbidden-domains = ["internal.mygit.page", "status.mygit.page"]

PAGES_LIMITS_FORBIDDEN_DOMAINS=internal.mygit.page,status.mygit.page
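Matching a domain together with its subdomains requires comparing whole DNS labels, so that forbidding internal.mygit.page does not also forbid notinternal.mygit.page. A minimal sketch of that rule follows; isForbidden is a hypothetical name, and the case folding and trailing-dot handling are added assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// isForbidden reports whether domain equals a forbidden domain or is a subdomain of one.
func isForbidden(domain string, forbidden []string) bool {
	domain = strings.ToLower(strings.TrimSuffix(domain, "."))
	for _, f := range forbidden {
		f = strings.ToLower(f)
		// The "." prefix ensures the match ends on a label boundary.
		if domain == f || strings.HasSuffix(domain, "."+f) {
			return true
		}
	}
	return false
}

func main() {
	forbidden := []string{"internal.mygit.page", "status.mygit.page"}
	for _, d := range []string{"internal.mygit.page", "metrics.internal.mygit.page", "notinternal.mygit.page"} {
		fmt.Println(d, isForbidden(d, forbidden))
	}
}
```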

The maximum symlink depth option limits how many symbolic links will be followed while resolving a single path. The primary purpose of this limit is to avoid infinite loops caused by a symlink that refers to itself (directly or indirectly). The default value should be suitable for virtually all deployments.

config.toml / environment

[limits]
max-symlink-depth = 16

PAGES_LIMITS_MAX_SYMLINK_DEPTH=16

The maximum concurrent uploads option limits how many concurrent requests (counted in total per git-pages instance) will be used to upload the contents of very large sites. The default value should be suitable for virtually all deployments.

config.toml / environment

[limits]
concurrent-uploads = 1024

PAGES_LIMITS_CONCURRENT_UPLOADS=1024

Observability

git-pages provides four modes of insight into its runtime operation: logs, counters, errors, and traces. It offers multiple ways of exposing this information.

As a policy, IP addresses are never included in this data; they are only included verbatim in audit logs if explicitly configured for collection.

Console logs

git-pages will send application logs to the standard error stream by default. The possible log formats are none (no output), text (default; human-readable text), and json (one JSON object per log line).

config.toml / environment

log-format = "text"

PAGES_LOG_FORMAT=text

Syslog

git-pages will send application logs to a syslog daemon like syslog-ng or VictoriaLogs when the SYSLOG_ADDR environment variable is set to a valid destination. Note that this environment variable does not have an equivalent TOML configuration option.

Local destinations are specified using unixgram//path/to/endpoint.sock or tcp/localhost:port.
Network destinations are specified using tcp+tls/host:port or tcp/host:port.

Prometheus

git-pages contains many statistical counters that can be used to monitor application health, analyze resource usage, build pretty Grafana dashboards and more. The values of these counters are available via the http://localhost:3002 (metrics) endpoint in the Prometheus format (both text and binary). This endpoint is configurable:

config.toml / environment

[server]
metrics = "tcp/localhost:3002"

PAGES_SERVER_METRICS=tcp/localhost:3002

The purpose of each counter is documented as a part of the Prometheus format; it should be displayed in your observability software, or you can explore the output of curl http://localhost:3002.

Sentry

Note

Sentry reporting used to be available, but is no longer part of git-pages due to extensive LLM use in the tracing library. We would like to integrate other traceability options, but none are available at this time.

Operation

git-pages provides a rich set of administrative commands, the most important of which are described below. All of these commands expect options to be provided as a TOML configuration and/or environment variables. Whenever a git-pages command appears in this section, you should ensure that it receives the appropriate configuration for your environment.

Audit log

git-pages includes an audit function that records every substantial event affecting the backend data and keeps accurate historical snapshots of every site; deduplication of storage makes these snapshots inexpensive. Each git-pages process that uses the same backend configuration must be assigned a unique node ID (a number from 0 to 63) to prevent identifier collisions, after which the collection of audit records may be enabled:

config.toml / environment

[audit]
node-id = 0
collect = true

PAGES_AUDIT_NODE_ID=0
PAGES_AUDIT_COLLECT=true

Consider leaving node-id out of your TOML configuration and using the PAGES_AUDIT_NODE_ID environment variable instead, since every process sharing the backend must be assigned a different node ID.

By default, IP addresses are not collected. To enable collection of IP addresses, pick the source of this information: RemoteAddr to use the peer address of the socket, X-Forwarded-For to use the rightmost entry in the corresponding header. Use the X-Forwarded-For source when using git-pages with a reverse proxy server:

config.toml / environment

[audit]
include-ip = "X-Forwarded-For"

PAGES_AUDIT_INCLUDE_IP=X-Forwarded-For

Whenever forge authorization is used to alter a site, the username (mutable) and ID (immutable) of the forge user are recorded in audit logs. If the provided forge authorization token has insufficient privileges to record this information, the request is denied.

The substantial events that trigger the creation of an audit record are currently:

creating or updating a site (CommitManifest event);
deleting a site (DeleteManifest event);
freezing a domain (FreezeDomain event);
unfreezing a domain (UnfreezeDomain event).

Note

Audit records are created before the corresponding substantial event, and if the creation of an audit record fails, the operation does not complete; that is, if the event has occurred, then a record of it will have been created. The converse is not true: if creation of an audit record succeeds but the operation that caused it fails afterwards, the audit record will remain in the log.

The audit log is best considered a record of intent: if an audit record exists, it means that someone attempted to perform the recorded action. To determine whether the attempt succeeded, also examine the application logs at the relevant timestamp.

Retrieval

To retrieve audit logs, use the -audit-log command line option:

$ git-pages -audit-log
0000019da0073000 2025-12-06T00:28:37Z 2001:db8::f00d grebedoc.dev/.index CommitManifest
0000019e44566000 2025-12-06T00:39:50Z 2001:db8::f00d grebedoc.dev/.index CommitManifest
000001ede967b000 2025-12-06T23:51:43Z 198.51.100.11 git-pages.org/.index CommitManifest
000001f5a2c58000 2025-12-07T02:06:43Z <cli-admin> problematic.site/.index DeleteManifest
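For scripting against this output, each line can be split into whitespace-separated fields: record ID, timestamp, principal, site, event name, and, for detached records, a trailing (detached) marker. The sketch assumes exactly the layout seen in the examples in this document (in particular, that no field contains spaces); AuditLine and parseAuditLine are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// AuditLine holds the fields of one line of -audit-log output.
type AuditLine struct {
	ID        string
	Time      time.Time
	Principal string // IP address or a marker such as <cli-admin>
	Site      string // domain/project
	Event     string
	Detached  bool
}

func parseAuditLine(line string) (AuditLine, error) {
	f := strings.Fields(line)
	if len(f) < 5 || len(f) > 6 {
		return AuditLine{}, fmt.Errorf("unexpected field count in %q", line)
	}
	t, err := time.Parse(time.RFC3339, f[1])
	if err != nil {
		return AuditLine{}, err
	}
	a := AuditLine{ID: f[0], Time: t, Principal: f[2], Site: f[3], Event: f[4]}
	if len(f) == 6 {
		if f[5] != "(detached)" {
			return AuditLine{}, fmt.Errorf("unexpected trailing field %q", f[5])
		}
		a.Detached = true
	}
	return a, nil
}

func main() {
	a, err := parseAuditLine("000001f5a2c58000 2025-12-07T02:06:43Z <cli-admin> problematic.site/.index DeleteManifest")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", a)
}
```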

To drill down into a specific event, use the -audit-read command line option:

$ git-pages -audit-read 000001ede967b000
$ ls
000001ede967b000-archive.tar
000001ede967b000-event.json
000001ede967b000-manifest.json
$ cat 000001ede967b000-event.json
{
  "id": "2121334763520",
  "timestamp": "2025-12-06T23:51:43.995096981Z",
  "event": "CommitManifest",
  "principal": {
    "ipAddress": "198.51.100.11"
  },
  "domain": "git-pages.org",
  "project": ".index"
}

The information extracted from this audit event includes the event description (*-event.json file), site manifest (*-manifest.json file), and archive of site contents (*-archive.tar file); the latter two files will only be present for CommitManifest events. This information may aid in recovering lost data, or help you make a decision on an abuse report concerning data that has been overwritten or deleted since.
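The id field in the event JSON is the same record ID written in decimal: 0x000001ede967b000 equals 2121334763520. When correlating the two forms in scripts, a conversion with strconv is enough; the helper names below are hypothetical, and the sketch relies only on that observed correspondence.

```go
package main

import (
	"fmt"
	"strconv"
)

// hexToDecimalID converts an audit record ID as printed by -audit-log
// into the decimal form used in the "id" field of *-event.json.
func hexToDecimalID(hexID string) (string, error) {
	n, err := strconv.ParseUint(hexID, 16, 64)
	if err != nil {
		return "", err
	}
	return strconv.FormatUint(n, 10), nil
}

// decimalToHexID performs the reverse conversion, zero-padded to 16 hex digits.
func decimalToHexID(decID string) (string, error) {
	n, err := strconv.ParseUint(decID, 10, 64)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%016x", n), nil
}

func main() {
	d, _ := hexToDecimalID("000001ede967b000")
	h, _ := decimalToHexID("2121334763520")
	fmt.Println(d, h)
}
```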

Expiration

git-pages will not delete audit records without being explicitly instructed to.

If you are using an S3 object store, it may offer configurable object lifecycle rules. Consider setting up a 30-day retention policy for the audit/ prefix.

If you are using the filesystem for storage, or if your S3 object store does not offer suitable lifecycle rules, configure a daily scheduled job using the -audit-expire command. For example, if you use cron, the following line in crontab will retain 7 days of audit logs:

0 0 * * * git-pages -config /etc/git-pages/config.toml -audit-expire 7

Detachment

Audit records snapshot the state of a site before the corresponding operation occurs. As a result, even if content is removed from a site, it will be retained on the storage backend while the audit record exists. This can be undesirable if the content in question is illegal or abusive, and so must be quickly removed from storage.

Removing the audit records and then running a garbage collection cycle would solve that, but it would also erase the evidence of site updates, which can be undesirable as well. To resolve this conflict, git-pages provides a way to update a CommitManifest audit record such that the embedded manifest is disregarded for all operations. This operation is called detaching.

To sever the link between every audit record for a specific site and the blobs it references, use:

$ git-pages -audit-detach problematic.site/.index

To do the same for every site on a specific domain, use * for the project name:

$ git-pages -audit-detach problematic.site/*

The audit log will then show these records with a (detached) marker:

$ git-pages -audit-log
...
00002f442694d000 2026-04-26T20:25:27Z 2001:db8::f00d problematic.site/.index CommitManifest (detached)
...

Warning

The audit log is treated as append-only: once created, the audit records are never modified. This is true even when detaching audit records. Instead of modifying the record, this operation creates a marker instructing git-pages to disregard the manifest within. However, the manifest is not physically erased.

The manifest only contains filenames and contents of very small files under the inline file size limit. If retaining even this data presents an issue, the audit record can be manually removed, and/or the inline file size limit can be set to zero.

Storage management

git-pages can store a large amount of data provided by third parties, some of which may have an outsized impact on the operation of a service. Statistically, it is likely that only a few users will be responsible for most of the data stored, and it is important to be able to see who they are.

The -size-histogram stored command outputs a human-readable overview of the amount of storage occupied by the manifest and every referenced blob, aggregated per domain:

|                                        | bob-baker.name 348 B
|                                        | charmed.id.au 1 KB
|****                                    | star-trek-fans.co.uk 7.6 MB
|****************************************| big-buck.org 101.4 MB

The length of the bar (indicated by the * symbols) has a linear relationship with the total site size (manifest size and the combined size of every unique blob referenced in the manifest) for the corresponding domain, normalized such that the domain with the largest total size has the largest bar of the histogram. The histogram is sorted from the smallest to the biggest value.

Note

Although this command provides a concise and readable report, its values have significant caveats and should be interpreted with care. The command takes compression and deduplication within a single site into account, but it treats every site as if it were the only site stored: if two sites each have a stored size of 1 MB, their total is reported as 2 MB even if they contain exactly the same data and actually use only 1 MB of storage. This is the case even for sites on the same domain.

Treat the values reported by this command as the upper bound on how much storage the site is using rather than the precise value. (It is not possible to precisely attribute storage use to individual sites in presence of deduplication across sites and audit logs.)

The -size-histogram original command is also available, displaying the original size (the amount of data that would be transferred if someone downloaded every file in the site manifest once without compression). This value has little correlation with storage use or data transfer, and is mainly useful to demonstrate the efficiency of compression and deduplication.

The blobs referenced by audit logs will be stored even if no site manifest refers to them. Make sure to configure audit log expiration to prevent these blobs from lingering for too long.

Content scanning

git-pages can be configured to scan site contents in the background for known threats as the updates are published. This is done by enabling audit logs, configuring an audit notify endpoint, and running an audit server. The audit server receives notifications whenever an audit record gets created, in response to which it runs an arbitrary executable to make a decision.

First, configure the notify endpoint. Note that the pages server will only perform GET requests to it, and the audit server ignores everything but the query string.

In config.toml:

[audit]
notify-url = "http://localhost:3004/"

Or, using an environment variable:

PAGES_AUDIT_NOTIFY_URL=http://localhost:3004/

Then, start the audit server. In this example, it runs on the same machine as the pages server, so the endpoint is tcp/localhost:3004. It is also possible to use a Unix domain socket, e.g. unix//path/to/audit.sock.

$ git-pages -audit-server tcp/localhost:3004 ./autoscan.sh

Whenever a notification arrives, the audit server runs the deciding executable ./autoscan.sh with any additional command line arguments provided (none in this case), followed by the audit record ID in hexadecimal and the event name, for example ./autoscan.sh 000001ede967b000 CommitManifest. The executable runs in a newly created temporary directory whose contents are the same as if -audit-read had been executed in it beforehand. The deciding executable may run for as long as necessary and may exit with any status; if the exit status is not successful (i.e. non-zero), it will be restarted after a short delay.
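A deciding executable therefore only needs to look at its last two arguments and exit successfully for events it does not care about. The skeleton below is a sketch that assumes no additional command line arguments were configured, so the record ID and event name arrive as `$1` and `$2`; `scan_site` is a hypothetical function you would provide.

```shell
#!/bin/bash
# Sketch of a deciding executable skeleton. Assumes no additional command line
# arguments were given to -audit-server, so the audit record ID (hex) and the
# event name arrive as $1 and $2. scan_site is a hypothetical function.
decide() {
    local record_id=$1 event=$2
    case "$event" in
        CommitManifest)
            # The current directory holds the -audit-read output for this record,
            # such as "$record_id-archive.tar" and "$record_id-event.json".
            scan_site "$record_id"
            ;;
        *)
            # Succeed for events this script does not handle; a non-zero exit
            # status would cause the executable to be run again after a delay.
            return 0
            ;;
    esac
}

# The audit server always passes at least the record ID and the event name.
if (( $# >= 2 )); then
    decide "$@"
fi
```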

The deciding executable will asynchronously examine the event and take any necessary action. For example, the following shell script (which depends on jq) will scan site contents with ClamAV whenever an update is published, and take enforcement action if the scan comes back positive:

#!/bin/bash -e
config=/etc/git-pages/config.toml # (1)
export PAGES_AUDIT_NODE_ID=63 # (3)
if [[ "$2" = "CommitManifest" ]]; then
    # clamdscan exits with 0 if no threat is found, 1 if one is found, 2 on error
    status=0
    clamdscan "$1-archive.tar" || status=$?
    if [[ $status -eq 1 ]]; then
        domain=$(jq -r .domain <"$1-event.json")
        project=$(jq -r .project <"$1-event.json")
        echo '<h1>Threat Automatically Removed</h1>' >index.html # (2)
        echo '/* /index.html 410!' >_redirects
        tar cf site.tar index.html _redirects
        git-pages -config "$config" -update-site "$domain/$project" site.tar
        git-pages -config "$config" -audit-detach "$domain/$project" # (4)
        git-pages -config "$config" -freeze-domain "$domain"
    elif [[ $status -ne 0 ]]; then
        exit "$status" # scan failed: exit non-zero so the scan is attempted again
    fi
fi

(1) Since the audit server changes the current directory, the location of the configuration file must be specified explicitly. Environment variables may also be used instead.
(2) Remember that every automatic scanner has false positives! The placeholder page must include a clear explanation of the reason for removal, as well as contact information. This example is not appropriate for production use.
(3) All deciding executables will use the same node ID (63), which will prevent audit record ID collisions with pages server processes.
(4) Refer to the section on detaching audit records for details.

Warning

Keep in mind that every deployment is unique, and it is impossible to provide a one-size-fits-all automated scanning solution. This section explains how to build a custom solution tailored to your needs; the script above demonstrates the principles but should not be used as-is.

Note

The audit server does not implement queueing, retries, or exponential backoff; it runs the deciding executable once per GET request and responds with the status and captured standard output/error. The pages server, however, will keep resubmitting the notification (with exponential backoff and added random jitter) until either the request succeeds or the pages server is restarted.
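The resubmission strategy can be sketched in shell as a retry loop with exponential backoff and full jitter (a random delay between zero and the current backoff). The attempt limit, base delay, and cap are supplied by the caller and are hypothetical; they are not the values the pages server uses.

```shell
# Sketch: run a command until it succeeds, waiting longer after each failure.
# max_attempts, base and cap (in whole seconds) are hypothetical values chosen
# by the caller; they are not the parameters git-pages itself uses.
retry_with_backoff() {
    local max_attempts=$1 base=$2 cap=$3
    shift 3
    local attempt=1 delay
    until "$@"; do
        if (( attempt >= max_attempts )); then
            return 1                                # give up after the last attempt
        fi
        delay=$(( base * (1 << (attempt - 1)) ))    # base, 2*base, 4*base, ...
        if (( delay > cap )); then
            delay=$cap                              # never wait longer than cap
        fi
        sleep $(( RANDOM % (delay + 1) ))           # full jitter: 0..delay seconds
        attempt=$(( attempt + 1 ))
    done
}
```

A deciding executable could wrap a transient network call in it, e.g. `retry_with_backoff 5 1 30 curl -fsS "$url"` with a placeholder `$url`. Avoid wrapping commands whose non-zero exit codes carry meaning, such as clamdscan reporting a detected threat.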

This means that the audit server may be restarted at any time without loss of audit notifications, but restarting the pages server will cause all pending audit notifications to be lost. If you need stronger guarantees than these, you should implement them as a part of your deciding executable.
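One guarantee worth adding is idempotency: because a notification may be delivered again after a failure or a lost response, the executable can record which audit records it has already handled. The sketch below assumes a hypothetical persistent directory named by `STATE_DIR` and hypothetical helper names; it marks a record only after the work succeeds, so a failed run is simply repeated. It does not guard against two deliveries of the same record running concurrently.

```shell
# Sketch: make a deciding executable safe to run more than once per audit record.
# STATE_DIR is a hypothetical persistent directory; it must not be the temporary
# working directory, which is created fresh for every run.
STATE_DIR=${STATE_DIR:-/var/lib/autoscan}

already_processed() {
    [[ -e "$STATE_DIR/$1.done" ]]
}

mark_processed() {
    mkdir -p "$STATE_DIR"
    : >"$STATE_DIR/$1.done"
}

# Usage: process_once RECORD_ID COMMAND [ARGS...]
process_once() {
    local record_id=$1
    shift
    if already_processed "$record_id"; then
        return 0                    # handled by an earlier run; nothing to do
    fi
    "$@" || return                  # propagate failure so the record is retried
    mark_processed "$record_id"     # record success only after the work is done
}
```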

Responding to abuse

If you discover that a site managed by git-pages contains abusive material (spam, phishing, illegal downloads, etc.), the recommended course of action is to replace every page of the site with a placeholder returning the 410 Gone status and then freeze the domain, preventing any further updates of any site on that domain. This can be done from the command line as follows:

cd "$(mktemp -d)"
echo '<h1>Gone</h1>' >index.html # (1)
echo '/* /index.html 410!' >_redirects # (2)
tar cf placeholder.tar index.html _redirects
git-pages -update-site problematic.site/.index placeholder.tar
git-pages -audit-detach problematic.site/.index # (3)
git-pages -freeze-domain problematic.site

(1) Customize this HTML template to include the reason for removal, your contact information, and anything else relevant.
(2) The 410 Gone status ensures that the site is delisted from search engine results and removed from the cache of malware scanners.
(3) Refer to the section on detaching audit records for details.
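The placeholder steps above can be wrapped in a small helper that fills in the reason for removal and contact details. The function name, its arguments, and the page wording are hypothetical; the values are inserted into the HTML verbatim, so escape anything that does not come from you.

```shell
# Sketch: build the placeholder archive with a reason and contact details.
# make_placeholder and its wording are illustrative; adapt them to your policy.
# $reason and $contact are inserted verbatim, without HTML escaping.
make_placeholder() {
    local reason=$1 contact=$2 out=${3:-placeholder.tar}
    cat >index.html <<EOF
<!DOCTYPE html>
<title>Gone</title>
<h1>Gone</h1>
<p>This site has been removed: $reason</p>
<p>Contact: $contact</p>
EOF
    echo '/* /index.html 410!' >_redirects   # serve the page for every path with 410 Gone
    tar cf "$out" index.html _redirects
}
```

Run it in the temporary directory, e.g. `make_placeholder "phishing" "abuse@example.org"`, then continue with -update-site, -audit-detach, and -freeze-domain as shown above.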

If you later decide that the domain should no longer be restricted, you can unfreeze it to restore normal functionality:

git-pages -unfreeze-domain problematic.site

The freeze and unfreeze operations will append records to the audit log (if it is enabled).

Note

The freeze operation is done per-domain and not per-site since the publisher of the abusive material must have access to DNS records for the domain, and therefore complete control over it, to publish a site in the first place.

The freeze operation will prevent you from making administrative updates to sites on that domain as well. Remember to examine other sites on the same domain before freezing it.
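The audit log is one starting point for finding the other sites on a domain. This sketch assumes the whitespace-separated -audit-log output shown in the section on rolling back a site (record ID, time, address, site, event) and lists each distinct site on the given domain. Sites whose audit records have expired, or that were published before audit logs were enabled, will not appear; the function name is hypothetical.

```shell
# Sketch: list distinct sites on a domain from `git-pages -audit-log` output.
# Assumes whitespace-separated fields: record ID, time, address, site, event.
sites_on_domain() {
    local domain=$1
    # A site belongs to the domain if its name starts with "<domain>/".
    awk -v domain="$domain" 'index($4, domain "/") == 1 && !seen[$4]++ { print $4 }'
}
```

For example: `git-pages -audit-log | sites_on_domain problematic.site`.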

Rolling back a site

The git-pages audit log can be used to revert site contents to an earlier point in time. If a site was compromised or has suffered data loss, you can restore it by rolling back to a CommitManifest audit record:

$ git-pages -audit-log
...
0000019da0073000 2025-12-06T00:28:37Z 2001:db8::f00d grebedoc.dev/.index CommitManifest
0000019e44566000 2025-12-06T00:39:50Z 2001:db8::f00d grebedoc.dev/.index CommitManifest
...
$ git-pages -audit-rollback 0000019da0073000

The audit record ID determines both the site that will be affected and its contents after the operation completes. The -audit-rollback command above restores the site https://grebedoc.dev/ to its contents immediately after the audit record 0000019da0073000 was created.
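When the goal is "the site as it was at time T", a short awk filter can pick the record instead of reading the log by eye. This sketch assumes the -audit-log output format shown above, with timestamps in the same fixed UTC form so that they compare correctly as strings. It prints the ID of the latest CommitManifest record for the site at or before the given time and fails if there is none; the function name is hypothetical.

```shell
# Sketch: print the ID of the latest CommitManifest record for a site created at
# or before a given UTC time, reading `git-pages -audit-log` output on stdin.
# Fields assumed: record ID, time, address, site, event.
rollback_target() {
    local site=$1 before=$2
    awk -v site="$site" -v before="$before" '
        # Timestamps in this fixed ISO 8601 UTC form compare correctly as strings.
        $4 == site && $5 == "CommitManifest" && $2 <= before && $2 >= best {
            best = $2; id = $1
        }
        END { if (id != "") print id; else exit 1 }'
}
```

For example, `id=$(git-pages -audit-log | rollback_target grebedoc.dev/.index 2025-12-06T00:30:00Z) && git-pages -audit-rollback "$id"` would select 0000019da0073000 from the log above.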

Performing an audit rollback operation will, itself, create a CommitManifest audit record.