view mod_storage_xmlarchive/README.markdown @ 2968:569b98d6fca1

mod_http_logging: Be robust against missing connection object
author Kim Alvefur <>
date Fri, 30 Mar 2018 13:37:39 +0200

- 'Stage-Beta'
- 'Type-Storage'
- ArchiveStorage
summary: XML file based archive storage


This module implements stanza archives using files, similar to the
default "internal" storage.


To use this with [mod\_mam] add this to your config:

``` lua
storage = {
    archive2 = "xmlarchive"
}
```

To use it with [mod\_mam\_muc] or [mod\_http\_muc\_log]:

``` lua
storage = {
    muc_log = "xmlarchive"
}
```

Refer to [Prosody's data storage documentation][doc:storage] for more
information.

Note that this module does not implement the "keyval" storage method and
can't be used by anything other than archives.


  ------ ---------------
  0.10   Works
  0.9    Should work
  0.8    Does not work
  ------ ---------------

Conversion to or from internal storage

This module stores data in a way that overlaps with the more recent
archive support in `mod_storage_internal`, meaning e.g. [mod_migrate]
will not be able to cleanly convert to or from the `xmlarchive` format.

To mitigate this, a migration command has been added to
`mod_storage_xmlarchive`:

``` bash
prosodyctl mod_storage_xmlarchive convert $DIR internal $STORE $JID
```

Where `$DIR` is `to` or `from`, and `$STORE` is e.g. `archive` or
`archive2` for MAM and `muc_log` for MUC logs. Finally, `$JID` is the JID
of the user or MUC room to be migrated; this argument can be repeated.
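As an illustration, converting the archives of several users in one call could be wrapped as in the sketch below. The JIDs and the `archive2` store are placeholders, and the `PROSODYCTL` variable is a hypothetical override (not part of Prosody) that lets the resulting command line be previewed with `echo` before it is run for real.

``` bash
# Sketch only: wrap the convert command so several JIDs can be passed at once.
# PROSODYCTL is a hypothetical override, e.g. PROSODYCTL=echo for a preview.
PROSODYCTL="${PROSODYCTL:-prosodyctl}"

convert_archives() {
    local dir="$1" store="$2"
    shift 2
    # All remaining arguments are JIDs (users or MUC rooms).
    "$PROSODYCTL" mod_storage_xmlarchive convert "$dir" internal "$store" "$@"
}

# Example call with placeholder JIDs:
# convert_archives to archive2 alice@example.com bob@example.com
```

Setting `PROSODYCTL=echo` before calling the function prints the arguments instead of executing them, which is a cheap way to check the direction and store name first.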

Data structure

Data is split into three kinds of files and messages are grouped by day.
Prosody's `util.datamanager` is used, so all special characters in these
filenames are escaped, and the files reside under `hostname/store` in
Prosody's data directory, commonly `/var/lib/prosody`.

:   A list of dates in `YYYY-MM-DD` format.

:   Index containing metadata for messages stored on that day.

:   Messages in textual XML format, separated by newlines.

This makes it fairly simple and fast to find messages by timestamp.
Queries that are not time-based but limited to a specific contact may
be expensive, as potentially the entire archive will be read.

Each archive ID is of the form `YYYY-MM-DD-random`, making lookups by
archive ID just as simple as time-based queries.
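Because the ID starts with the date, the day on which a message was stored can be recovered from the ID alone. The helper below is a minimal sketch of that idea; the function name and the example ID are made up for illustration and are not part of the module.

``` bash
# Sketch only: extract the YYYY-MM-DD prefix from an archive ID.
archive_id_to_date() {
    local id="$1"
    # The date occupies the first 10 characters; the random part follows.
    printf '%s\n' "$id" | cut -c1-10
}

# Example with a made-up ID:
# archive_id_to_date 2018-03-30-abc123
```

The resulting date identifies which day's files would need to be read for that message.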