comparison mod_storage_xmlarchive/README.markdown @ 2854:687b19cad4f5

mod_storage_xmlarchive/README: Add description of how data is stored
author Kim Alvefur <zash@zash.se>
date Thu, 28 Dec 2017 22:30:56 +0100
parents 88474dd1af48
children 7713cd4fff8f
2853:a844d1535c4d 2854:687b19cad4f5
61 ``` 61 ```
62 62
63 Where `$DIR` is `to` or `from`, `$STORE` is e.g. `archive` or `archive2` 63 Where `$DIR` is `to` or `from`, `$STORE` is e.g. `archive` or `archive2`
64 for MAM and `muc_log` for MUC logs. Finally, `$JID` is the JID of the 64 for MAM and `muc_log` for MUC logs. Finally, `$JID` is the JID of the
65 user or MUC room to be migrated, which can be repeated. 65 user or MUC room to be migrated, which can be repeated.
66
67 Data structure
68 ==============
69
70 Data is split into three kinds of files, and messages are grouped by day.
71 Prosody's `util.datamanager` is used, so all special characters in these
72 filenames are escaped. The files reside under `hostname/store` in Prosody's
73 data directory, commonly `/var/lib/prosody`.
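The path layout can be sketched in Python. This is a minimal sketch, not the module's code. It assumes that `util.datamanager` escapes every byte that is not an ASCII letter or digit as `%xx` with lowercase hex digits, and that the host and file name are escaped while the store name is used as-is. `dm_encode` and `archive_path` are hypothetical helpers.

```python
import posixpath


def dm_encode(s: str) -> str:
    """Escape a name the way util.datamanager is assumed to: every UTF-8
    byte that is not an ASCII letter or digit becomes %xx (lowercase hex)."""
    out = []
    for b in s.encode("utf-8"):
        ch = chr(b)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("%{:02x}".format(b))
    return "".join(out)


def archive_path(data_dir: str, host: str, store: str, name: str, ext: str) -> str:
    """Path of one archive file (hypothetical helper).

    Assumption: host and file name are escaped, the store name is not.
    `name` is "username" or "username@YYYY-MM-DD"; `ext` is "list" or "xml".
    """
    return posixpath.join(data_dir, dm_encode(host), store,
                          dm_encode(name) + "." + ext)


print(archive_path("/var/lib/prosody", "example.com", "archive2",
                   "juliet@2017-12-28", "xml"))
# /var/lib/prosody/example%2ecom/archive2/juliet%402017%2d12%2d28.xml
```

Under these assumptions, the `@` and `-` in a per-day file name are escaped, so files on disk do not look exactly like the names listed below.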
74
75 `username.list`
76 : A list of dates in `YYYY-MM-DD` format.
77
78 `username@YYYY-MM-DD.list`
79 : Index containing metadata for messages stored on that day.
80
81 `username@YYYY-MM-DD.xml`
82 : Messages in textual XML format, separated by newlines.
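The following hypothetical in-memory model shows how messages might be grouped by day into the three kinds of files. The index fields `id`, `when` and `with` are assumptions, since the README only says the index holds metadata. Days are taken in UTC, and each message is assumed to serialize to a single line. File names are shown unescaped.

```python
from collections import defaultdict
from datetime import datetime, timezone


def build_archive(username, messages):
    """Group messages by UTC day into the three kinds of files.

    `messages` holds (archive_id, unix_time, with_jid, xml) tuples.
    Returns {file name: content}; the index fields are hypothetical.
    """
    days = defaultdict(list)
    for msg in messages:
        day = datetime.fromtimestamp(msg[1], tz=timezone.utc).strftime("%Y-%m-%d")
        days[day].append(msg)

    files = {f"{username}.list": sorted(days)}  # username.list: the dates
    for day, msgs in days.items():
        # username@YYYY-MM-DD.list: per-message metadata
        files[f"{username}@{day}.list"] = [
            {"id": m[0], "when": m[1], "with": m[2]} for m in msgs
        ]
        # username@YYYY-MM-DD.xml: serialized messages separated by newlines
        files[f"{username}@{day}.xml"] = "".join(m[3] + "\n" for m in msgs)
    return files


files = build_archive("juliet", [
    ("2017-12-28-a1", 1514419200 + 3600, "romeo@example.net", "<message id='1'/>"),
    ("2017-12-29-b2", 1514505600 + 60, "nurse@example.net", "<message id='2'/>"),
])
print(files["juliet.list"])  # ['2017-12-28', '2017-12-29']
```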
83
84 This makes it fairly simple and fast to find messages by timestamp.
85 Queries that are not time-based but limited to a specific contact may
86 be expensive, since potentially the entire archive has to be read.
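The cost difference between the two kinds of query can be shown with a sketch of query planning. `YYYY-MM-DD` strings sort chronologically, so a time range only needs a slice of `username.list`. A contact filter on its own gives no such hint, so every day has to be opened. `load_index` is a hypothetical callable that returns a day's index entries, and the `with` field is an assumption.

```python
def days_to_read(dates, start=None, end=None):
    """Days whose files a query must open. YYYY-MM-DD strings sort
    chronologically, so a time range simply narrows username.list."""
    return [d for d in dates
            if (start is None or d >= start) and (end is None or d <= end)]


def query(dates, load_index, start=None, end=None, with_jid=None):
    """Return (matching archive IDs, number of day indexes opened).
    `load_index` is a hypothetical callable: day -> list of index entries."""
    hits, opened = [], 0
    for day in days_to_read(dates, start, end):
        opened += 1
        for entry in load_index(day):
            if with_jid is None or entry["with"] == with_jid:
                hits.append(entry["id"])
    return hits, opened


index = {
    "2017-12-26": [{"id": "2017-12-26-x", "with": "romeo@example.net"}],
    "2017-12-27": [{"id": "2017-12-27-y", "with": "nurse@example.net"}],
    "2017-12-28": [{"id": "2017-12-28-z", "with": "romeo@example.net"}],
}
dates = sorted(index)
print(query(dates, index.__getitem__, start="2017-12-28"))
# (['2017-12-28-z'], 1)
print(query(dates, index.__getitem__, with_jid="romeo@example.net"))
# (['2017-12-26-x', '2017-12-28-z'], 3)
```

The time-bounded query touches one day, but the contact query has to open all three.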
87
88 Each archive ID is of the form `YYYY-MM-DD-random`, so a lookup by
89 archive ID only needs that day's files, just like a time-based query.
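Because the ID starts with the date, its first ten characters identify which day's files to read. The following is a minimal sketch. The length and alphabet of the random suffix are assumptions, and `new_archive_id` and `files_for_id` are hypothetical names.

```python
import secrets
from datetime import date


def new_archive_id(day: date) -> str:
    # Hypothetical: date prefix plus 8 random hex characters.
    return f"{day.isoformat()}-{secrets.token_hex(4)}"


def files_for_id(username: str, archive_id: str) -> tuple:
    """Map an archive ID to the index and XML files of its day."""
    day, sep = archive_id[:10], archive_id[10:11]
    if sep != "-":
        raise ValueError(f"not an archive ID: {archive_id!r}")
    date.fromisoformat(day)  # raises ValueError if the prefix is not a date
    return f"{username}@{day}.list", f"{username}@{day}.xml"


aid = new_archive_id(date(2017, 12, 28))
print(files_for_id("juliet", aid))
# ('juliet@2017-12-28.list', 'juliet@2017-12-28.xml')
```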