author Kim Alvefur <zash@zash.se>
date Fri, 03 Mar 2023 21:14:19 +0100

---
labels:
- 'Stage-Beta'
- 'Type-Storage'
- ArchiveStorage
summary: XML file based archive storage
---

Introduction
============

This module implements stanza archives using files, similar to the
default "internal" storage. Unlike "internal", it saves messages in two
files per day (and per user), one containing metadata and one containing
the actual messages in XML format (hence the name).

Splitting data per day improves performance for larger archives, since a
time-based query only has to read the files for the days it covers and
can skip data from other days.

Configuration
=============

To use this with [mod\_mam] add this to your config:

``` lua
storage = {
    archive = "xmlarchive"
}
```

To use it with [mod\_mam\_muc] or [mod\_http\_muc\_log]:

``` lua
storage = {
    muc_log = "xmlarchive"
}
```

Refer to [Prosody's data storage documentation][doc:storage] for more
information.

Note that this module does not implement the "keyval" storage method and
can't be used by anything other than archives.

Compatibility
=============

  ------ ---------------
  trunk  Should work
  0.11   Works
  0.10   Should work
  0.9    Does not work
  ------ ---------------

Conversion to or from internal storage
--------------------------------------

This module stores data in a way that overlaps with the more recent
archive support in `mod_storage_internal`, meaning e.g. [mod_migrate]
will not be able to cleanly convert to or from the `xmlarchive` format.

To mitigate this, a migration command has been added to
`mod_storage_xmlarchive`:

``` bash
prosodyctl mod_storage_xmlarchive convert $DIR internal $STORE $JID
```

Where `$DIR` is `to` or `from`, `$STORE` is e.g. `archive` or `archive2`
for MAM and `muc_log` for MUC logs. Finally, `$JID` is the JID of the
user or MUC room to be migrated; this argument can be repeated to
migrate several JIDs in one run.
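
As a minimal sketch of how the placeholders fit together, the snippet
below assembles a conversion command and prints it for review rather
than running it. The store name and JIDs are hypothetical examples, and
`to internal` is read here as "convert to internal storage", following
the description above.

``` bash
# Hypothetical values for illustration; substitute your own.
direction="to"        # "to" or "from"
store="archive"       # e.g. "archive" for MAM, "muc_log" for MUC logs
jids="romeo@example.com juliet@example.com"

# Assemble the command and print it (dry run); nothing is converted here.
cmd="prosodyctl mod_storage_xmlarchive convert $direction internal $store $jids"
echo "$cmd"
```

Printing the command first leaves room to double-check the direction
and the list of JIDs before anything is changed on disk.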

::: {.alert .alert-danger}
Since this is a destructive command, don't forget to back up your data
first.

Prosody should *not* be running while converting data.
:::


Data structure
==============

Data is split into three kinds of files, and messages are grouped by
day. Prosody's `util.datamanager` is used, so all special characters in
these filenames are escaped, and the files reside under `hostname/store`
in Prosody's data directory, commonly `/var/lib/prosody`.

`username.list`
:   A list of dates in `YYYY-MM-DD` format.

`username@YYYY-MM-DD.list`
:   Index containing metadata for messages stored on that day.

`username@YYYY-MM-DD.xml`
:   Messages in textual XML format, separated by newlines.

This makes it fairly simple and fast to find messages by timestamp.
Queries that are not time-based but limited to a specific contact may
be expensive, as potentially the entire archive will be read.
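
To illustrate why time-based lookups are cheap, here is a rough sketch
of narrowing a query to the relevant days. It assumes the dates from
`username.list` have already been extracted into plain text, one per
line; the dates and the query range are made up, and this is not the
module's actual code.

``` bash
# Hypothetical dates, assumed to be already extracted from `username.list`.
dates="2023-02-27
2023-03-01
2023-03-03
2023-03-07"

# Hypothetical query range (inclusive).
start="2023-03-01"
end="2023-03-05"

# YYYY-MM-DD dates sort lexicographically, so string comparison is enough
# to pick the days whose files would need to be read.
selected=$(printf '%s\n' "$dates" | awk -v s="$start" -v e="$end" '$0 >= s && $0 <= e')
echo "$selected"
```

Only the per-day files for the selected dates would then have to be
opened; a query filtered by contact alone has no such shortcut.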

Each archive ID is of the form `YYYY-MM-DD-random`, making lookups by
archive ID just as simple as time-based queries.
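
A small sketch of that lookup: the date prefix of an archive ID points
directly at the per-day files. The ID and username are hypothetical, and
the file names are shown before any escaping applied by
`util.datamanager`.

``` bash
# Hypothetical archive ID and username.
archive_id="2023-03-03-abc123"
username="romeo"

# The first 10 characters of the ID are the day (YYYY-MM-DD).
day=$(printf '%s' "$archive_id" | cut -c1-10)

# Per-day index and message files, named as in the list above (unescaped).
index_file="${username}@${day}.list"
data_file="${username}@${day}.xml"
echo "$index_file"
echo "$data_file"
```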

Limitations
-----------

-   Only XML stanzas can be stored.
-   The deletion method only supports removing entire days at a time.