# HG changeset patch
# User Kim Alvefur
# Date 1514496656 -3600
# Node ID 687b19cad4f51b7d0c041a1c04339df950eb4880
# Parent a844d1535c4dde5aa2d4236da95c347e546df301
mod_storage_xmlarchive/README: Add description of how data is stored

diff -r a844d1535c4d -r 687b19cad4f5 mod_storage_xmlarchive/README.markdown
--- a/mod_storage_xmlarchive/README.markdown	Fri Dec 08 21:14:10 2017 +0100
+++ b/mod_storage_xmlarchive/README.markdown	Thu Dec 28 22:30:56 2017 +0100
@@ -63,3 +63,27 @@
 Where `$DIR` is `to` or `from`, `$STORE` is e.g. `archive` or `archive2`
 for MAM and `muc_log` for MUC logs. Finally, `$JID` is the JID of the
 user or MUC room to me migrated, which can be repeated.
+
+Data structure
+==============
+
+Data is split into three kinds of files, and messages are grouped by day.
+Prosody's `util.datamanager` is used, so all special characters in these
+filenames are escaped, and the files reside under `hostname/store` in
+Prosody's data directory, commonly `/var/lib/prosody`.
+
+`username.list`
+: A list of dates in `YYYY-MM-DD` format.
+
+`username@YYYY-MM-DD.list`
+: Index containing metadata for messages stored on that day.
+
+`username@YYYY-MM-DD.xml`
+: Messages in textual XML format, separated by newlines.
+
+This makes it fairly simple and fast to find messages by timestamp.
+Queries that are not time-based but limited to a specific contact may
+be expensive, as potentially the entire archive will be read.
+
+Each archive ID is of the form `YYYY-MM-DD-random`, making lookups by
+archive ID just as simple as time-based queries.
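
The file layout this patch documents can be illustrated with a rough sketch. It is not part of the module (which is written in Lua), and it rests on assumptions: the helper names `escape`, `archive_paths` and `day_from_archive_id` are hypothetical, and the escaping rule (percent-encoding every non-alphanumeric byte as lowercase hex) is a guess at what `util.datamanager` does, which the README does not spell out. The sketch shows how a user and a day map to the three files, and why an archive ID leads directly to the right day.

```python
import re
from pathlib import PurePosixPath


def escape(name: str) -> str:
    """Percent-encode every byte that is not an ASCII letter or digit.

    ASSUMPTION: this approximates the escaping done by util.datamanager;
    the README only says that special characters are escaped.
    """
    out = []
    for ch in name:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.extend(f"%{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out)


def archive_paths(data_dir: str, host: str, store: str,
                  username: str, day: str) -> dict:
    """Return the three files described in the README for one user and day."""
    base = PurePosixPath(data_dir) / escape(host) / store
    per_day = escape(f"{username}@{day}")
    return {
        "dates": base / f"{escape(username)}.list",   # list of YYYY-MM-DD dates
        "index": base / f"{per_day}.list",            # metadata for that day
        "messages": base / f"{per_day}.xml",          # newline-separated XML
    }


# Archive IDs have the form YYYY-MM-DD-random, so the day is the prefix.
ARCHIVE_ID = re.compile(r"^(\d{4}-\d{2}-\d{2})-(.+)$")


def day_from_archive_id(archive_id: str) -> str:
    """Extract the YYYY-MM-DD day that an archive ID belongs to."""
    match = ARCHIVE_ID.match(archive_id)
    if match is None:
        raise ValueError(f"not a YYYY-MM-DD-random archive ID: {archive_id!r}")
    return match.group(1)


if __name__ == "__main__":
    day = day_from_archive_id("2017-12-28-abc123")  # hypothetical ID
    for kind, path in archive_paths("/var/lib/prosody", "example.com",
                                    "archive", "alice", day).items():
        print(kind, path)
```

Because the day is embedded in the ID, a lookup by archive ID only needs to open the index and message files for that one day, whereas a query filtered by contact has no such shortcut and may have to walk every date listed in `username.list`.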