comparison doc/libervia-cli/pubsub_cache.rst @ 3715:b9718216a1c0 0.9

merge bookmark 0.9
author Goffi <goffi@goffi.org>
date Wed, 01 Dec 2021 16:13:31 +0100
parents d0b66efc6c0e
3714:af09b5aaa5d7 3715:b9718216a1c0
1 .. _libervia-cli_pubsub_cache:
2
3 =====================================
4 pubsub/cache: PubSub Cache Management
5 =====================================
6
7 Libervia transparently runs a cache for pubsub. This means that, according to internal
8 criteria, some pubsub items are stored locally.
9
10 The ``cache`` subcommands let the user inspect and manipulate the internal cache.
11
12 get
13 ===
14
15 Retrieve items from internal cache only. Most end-users won't need to use this command, as
16 the usual ``pubsub get`` command will use cache transparently. However, it may be useful
17 to inspect local cache, notably for debugging.
18
19 The parameters are basically the same as for :ref:`li_pubsub_get`.
20
21 example
22 -------
23
24 Retrieve the last 2 cached items for personal blog::
25
26 $ li pubsub cache get -n urn:xmpp:microblog:0 -M 2
27
28 .. _li_pubsub_cache_sync:
29
30 sync
31 ====
32
33 Synchronise or resynchronise a pubsub node. If the node is already in cache, it will be
34 deleted then re-cached. Node will be put in cache even if internal policy doesn't request
35 a synchronisation for this kind of node. Node will be (re-)subscribed to keep cache
36 synchronised.
37
38 All items of the node (up to the internal limit, which is high) will be retrieved and put
39 in cache, even if a previous version of those items has been deleted by the
40 :ref:`li_pubsub_cache_purge` command.
41
42
43 example
44 -------
45
46 Resynchronise personal blog::
47
48 $ li pubsub cache sync -n urn:xmpp:microblog:0
49
50 .. _li_pubsub_cache_purge:
51
52 purge
53 =====
54
55 Remove items from cache. This may be desirable to save resources, notably disk space.
56
57 Note that once a pubsub node is cached, the cache is the source of truth. That means that
58 if cache is not explicitly bypassed when retrieving items of a pubsub node (notably with
59 the ``-C, --no-cache`` option of :ref:`li_pubsub_get`), only items found in cache will be
60 returned, thus purged items won't be used or returned anymore even if they still exist on
61 the original pubsub service.
62
63 If you have purged items by mistake, it is possible to retrieve them either node by node
64 using :ref:`li_pubsub_cache_sync`, or by resetting the whole pubsub cache with
65 :ref:`li_pubsub_cache_reset`.
66
67 If you have a node or a profile (e.g. a component) caching a lot of items frequently, you
68 may run this command from a scheduler like cron_.
69
70 .. _cron: https://en.wikipedia.org/wiki/Cron
71
72 examples
73 --------
74
75 Remove all blog and event items from cache if they haven't been updated in the last 6 months::
76
77 $ li pubsub cache purge -t blog -t event -b "6 months ago"
78
79 Remove items from profile ``ap_gateway`` if they have been created more than 2 months
80 ago::
81
82 $ li pubsub cache purge -p ap_gateway --created-before "2 months ago"
83
84 .. _li_pubsub_cache_reset:
85
86 reset
87 =====
88
89 Reset the whole pubsub cache. This means that all nodes and all their items will be removed
90 from cache. After this command, cache will be re-filled progressively as if it were a new
91 one.
92
93 .. note::
94
95 Use this command with caution: even if cache will be re-constructed with time, that
96 means that items will have to be retrieved again, which may be resource intensive both
97 for your machine and for the pubsub services which will be used. That also means that
98 searching items will return fewer results until all desired items are cached again.
99
100 Also note that all items of cached nodes are retrieved: even items that you have
101 previously purged will be retrieved again.
102
103 example
104 -------
105
106 Reset the whole pubsub cache::
107
108 $ li pubsub cache reset
109
110 search
111 ======
112
113 Search items in the pubsub cache. The search is done on the whole cache; it's not restricted
114 to a single node/profile (even if it may be if suitable filters are specified). Full-Text
115 Search can be done with ``-f FTS, --fts FTS`` argument, as well as filtering on parsed
116 data (with ``-F PATH OPERATOR VALUE, --field PATH OPERATOR VALUE``, see below).
117
118 By default, parsed data are returned, with the 3 additional keys ``pubsub_service``,
119 ``pubsub_node`` (the search being done on the whole cache, those data are here to get the
120 full location of each item) and ``node_profile``.
121
122 "Parsed data" are the result of the parsing of the items XML payload by feature-aware
123 plugins. Those data are usually more readable and easier to work with. Parsed data are
124 only stored when a parser is registered for a specific feature, that means that a Pubsub
125 item in cache may not have parsed data at all, in which case an empty dict will be used
126 instead (and ``-P, --payload`` argument should be used to get content of the item).
127
128 The dates are normally stored as `Unix time`_ in database, but the default output converts
129 the ``updated``, ``created`` and ``published`` fields to human readable local time. Use
130 ``--output simple`` if you want to keep the float (or int) value.
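
As an illustration of the conversion described above, here is a minimal Python sketch turning a stored Unix time into a human readable date. The ``to_human`` helper is hypothetical and is not part of Libervia; the exact format printed by ``li`` may differ.

```python
from datetime import datetime, timezone


def to_human(timestamp, tz=None):
    """Convert a Unix timestamp (int or float) to a readable string.

    With tz=None, the local time zone of the machine is used.
    This helper is only an illustration, it is not Libervia code.
    """
    moment = datetime.fromtimestamp(timestamp, tz=tz)
    return moment.isoformat(sep=" ", timespec="seconds")


# the same instant, first in UTC, then in local time
print(to_human(1638371611, tz=timezone.utc))
print(to_human(1638371611))
```

With ``--output simple``, no such conversion is done and the raw float (or int) value is kept.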
131
132 XML item payload is not returned by default, but it can be added to the ``item_payload``
133 field if ``-P, --payload`` argument is set. You can also use the ``--output xml`` (or
134 ``xml_raw`` if you don't want prettifying) to output directly the highlighted XML
135 (without the parsed data), to have an output similar to the one of ``li pubsub get``.
136
137 If you are interested only in specific data (e.g. item id and title), the ``-k KEY,
138 --key KEY`` argument can be used.
139
140 You'll probably want to limit result size by using ``-l LIMIT, --limit LIMIT``, and do
141 pagination using ``-i INDEX, --index INDEX``.
142
143 .. _Unix time: https://en.wikipedia.org/wiki/Unix_time
144
145 Filters
146 -------
147
148 By default, search returns all items in cache; you have to use filters to specify what you
149 are looking for. We can split filters into 3 categories: nodes/items metadata,
150 Full-Text Search query and parsed metadata.
151
152 Nodes/items metadata are the generic information you have on a node: which profile it
153 belongs to, which pubsub service it's coming from, what's the name or type of the node,
154 etc.
155
156 Arguments there should be self-explanatory. Type (set with ``-t TYPE, --type TYPE``) and
157 subtype (set with ``-S SUBTYPE, --subtype SUBTYPE``) are values dependent on the
158 plugin/feature associated with the node, so we can't list them in an exhaustive way here.
159 The most common type is probably ``blog``, for which a subtype can be ``comment``. An
160 empty string can be used to find items with (sub)type not set.
161
162 It's usually a good idea to specify a profile with ``-p PROFILE, --profile PROFILE``,
163 otherwise you may get duplicated results.
164
165 Full-Text Search
166 ----------------
167
168 You can specify a Full-Text Search query with the ``-f FTS_QUERY, --fts FTS_QUERY``
169 argument. The engine is currently SQLite FTS5, and you can check its `query syntax`_.
170 FTS is done on the whole raw XML payload, that means that all data there can be matched
171 (including XML tags and attributes).
172
173 FTS queries are indexed, that means that they are fast and efficient.
174
175 .. note::
176
177 Future versions of Libervia will probably include other FTS engines (support for
178 PostgreSQL and MySQL/MariaDB is planned). Thus the syntax may vary depending on the
179 engine, or a common syntax may be implemented for all engines in the future. Keep that
180 in mind if you plan to use FTS capabilities in long-term queries, e.g. in scripts.
181
182 .. _query syntax: https://sqlite.org/fts5.html#full_text_query_syntax
183
184 Parsed Metadata Filters
185 -----------------------
186
187 It is possible to filter on any field of parsed data. This is done with the ``-F PATH
188 OPERATOR VALUE, --field PATH OPERATOR VALUE`` (be careful that the short option is an
189 uppercase ``F``, the lower case one being used for Full-Text Search).
190
191 .. note::
192
193 Parsed Metadata Filters are not indexed, that means that using them is less efficient
194 than using e.g. Full-Text Search. If you want to filter on a text field, it's often a
195 good idea to pre-filter using Full-Text Search to have more efficient queries.
196
197 ``PATH`` and ``VALUE`` can be either specified as string, or using JSON syntax (if the
198 value can't be decoded as JSON, it is used as plain text).
199
200 ``PATH`` is the name of the field to use. If you must go beyond root level fields, you can
201 use a JSON array to specify each element of the path. If a string is used, it's an object
202 key, if a number is used it's an array index. Thus you can use ``title`` to access the
203 root title key, or ``'"title"'`` (JSON string escaped for shell) or ``'["title"]'`` (JSON
204 array with the "title" string, escaped for shell).
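
The decoding rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: ``decode_arg`` and ``resolve_path`` are hypothetical helpers, not Libervia functions, and the sample item is made up.

```python
import json


def decode_arg(raw):
    """Decode an argument as JSON, falling back to plain text."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def resolve_path(data, raw_path):
    """Follow a PATH through parsed data.

    A decoded string is a single root key. A decoded array is followed
    element by element: strings are object keys, numbers are array indexes.
    """
    path = decode_arg(raw_path)
    if not isinstance(path, list):
        path = [path]
    current = data
    for element in path:
        current = current[element]
    return current


# made-up parsed data for one item
item = {"title": "Trip to Slovakia", "tags": ["travel", "XMPP"]}

print(resolve_path(item, "title"))        # plain text
print(resolve_path(item, '"title"'))      # JSON string
print(resolve_path(item, '["title"]'))    # JSON array with one key
print(resolve_path(item, '["tags", 1]'))  # key, then array index
```

The first three calls reach the same root ``title`` key, and the last one goes one level deeper to the second tag.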
205
206 .. note::
207
208 The extra fields ``pubsub_service``, ``pubsub_node`` and  ``node_profile`` are added to
209 the result after the query, thus they can't be used as fields for filtering (use the
210 direct arguments for that).
211
212 ``OPERATOR`` indicates how to use the value to make a filter. The currently supported
213 operators are:
214
215 ``==`` or ``eq``
216 Equality operator, true if field value is the same as given value.
217
218 ``!=`` or ``ne``
219 Inequality operator, true if the field value is different from given value.
220
221 ``>`` or ``gt``
222 Greater than, true if the field value is higher than given value. For string, this is
223 according to alphabetical order.
224
225 Time Pattern can be used here, see below.
226
227 ``<`` or ``lt``
228 Less than, true if the field value is lower than given value. For string, this is
229 according to alphabetical order.
230
231 Time Pattern can be used here, see below.
232
233 ``between``
234 Given value must be an array with 2 elements. The condition is true if field value is
235 between the 2 elements (for string, this is according to alphabetical order).
236
237 Time Pattern can be used here, see below.
238
239 ``in``
240 Given value must be an array of elements. Field value must be one of them to make the
241 condition true.
242
243 ``not_in``
244 Given value must be an array of elements. Field value must not be any of them to make
245 the condition true.
246
247 ``overlap``
248 This can be used only on array fields.
249
250 If given value is not already an array, it is put in an array. Condition is true if any
251 element of field value matches any element of given value. Notably useful to filter on
252 tags.
253
254 ``ioverlap``
255 Same as ``overlap`` but done in a case insensitive way.
256
257 ``disjoint``
258 This can be used only on array fields.
259
260 If given value is not already an array, it is put in an array. Condition is true if no
261 element of field value matches any element of given value. Notably useful to filter out
262 tags.
263
264 ``idisjoint``
265 Same as ``disjoint`` but done in a case insensitive way.
266
267 ``like``
268 Does pattern matching on a string. ``%`` can be used to match zero or more characters
269 and ``_`` can be used to match any single character.
270
271 If you're not looking for a specific field, it's better to use Full-Text Search when
272 possible.
273
274 ``ilike``
275 Like ``like`` but done in a case insensitive way.
276
277
278 ``not_like``
279 Same as ``like`` except that condition is true when pattern is **not** matching.
280
281 ``not_ilike``
282 Same as ``not_like`` but done in a case insensitive way.
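
To make the array and pattern operators above more concrete, here is a rough Python sketch of their semantics as described in this section. It is not the way Libervia implements them (the real filtering is done by the database), and all the function names are hypothetical.

```python
import re


def as_list(value):
    """Put a single value in a list, as described for overlap/disjoint."""
    return value if isinstance(value, list) else [value]


def overlap(field, value, ignore_case=False):
    """True if any element of field matches any element of value.

    Elements are assumed to be strings (e.g. tags).
    """
    norm = str.lower if ignore_case else (lambda text: text)
    return bool({norm(f) for f in field} & {norm(v) for v in as_list(value)})


def disjoint(field, value, ignore_case=False):
    """True if no element of field matches any element of value."""
    return not overlap(field, value, ignore_case)


def like(field, pattern, ignore_case=False):
    """``%`` matches zero or more characters, ``_`` any single character."""
    regex = "".join(
        ".*" if char == "%" else "." if char == "_" else re.escape(char)
        for char in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, field, flags) is not None


tags = ["XMPP", "travel"]
print(overlap(tags, "xmpp"))                     # overlap
print(overlap(tags, "xmpp", ignore_case=True))   # ioverlap
print(disjoint(tags, ["activitypub", "blog"]))   # disjoint
print(like("Trip to Slovakia", "%Slovakia"))     # like
```

In this sketch, ``ioverlap``, ``idisjoint`` and ``ilike`` correspond to the ``ignore_case=True`` variants, and ``not_like`` is simply the negation of ``like``.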
283
284
285 For ``gt``/``>``, ``lt``/``<`` and ``between``, you can use :ref:`time_pattern` by using
286 the syntax ``TP(<time pattern>)`` (see examples below).
287
288 Ordering
289 --------
290
291 Result ordering can be done by a well-known order, or using a parsed data field. Ordering
292 defaults to ``created`` (see below), but this may be changed with ``-o ORDER [FIELD]
293 [DIRECTION], --order-by ORDER [FIELD] [DIRECTION]``.
294
295 ``ORDER`` can be one of the following:
296
297 ``creation``
298 Order by item creation date. Note that this is the date of creation of the item in cache
299 (which most of the time should correspond to the order of creation of the item in the source
300 pubsub service), and this may differ from the date of publication as specified with some
301 features (like blog). This is important when old items are imported, e.g. when they're
302 coming from another blog engine.
303
304 ``modification``
305 Order by the date when item has last been modified. Modification date is the same as
306 creation date if the item has never been modified since it is in cache. The same warning
307 as for ``creation`` applies: this is the date of last modification in cache, not the one
308 advertised in parsed data.
309
310 ``item_id``
311 Order by XMPP id of the item. Notably useful when user-friendly IDs are used (as is
312 often the case with blogs).
313
314 ``rank``
315 Order items by Full-Text Search rank. This one can only be used when Full-Text Search is
316 used (via ``-f FTS_QUERY, --fts FTS_QUERY``). Rank is a value indicating how well an
317 item matches the query. This usually needs to be used with ``desc`` direction, so you get
318 the most relevant items first.
319
320 ``field``
321 This special order indicates that the ordering must be done on a parsed data field. The
322 following argument is then the path of the field to use (which can be a plain text name
323 of a root field, or a JSON encoded array). An optional direction can be specified as a
324 third argument. See examples below.
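
As a rough illustration of what ordering on a parsed data field means, the following Python sketch sorts made-up parsed data on a root field. The ``order_by_field`` helper is hypothetical, and using ``asc`` as the direction when none is given is an assumption of this sketch, not a documented Libervia default.

```python
def order_by_field(items, field, direction="asc"):
    """Sort parsed data on a root field.

    The "asc" default is an assumption made for this sketch only.
    """
    return sorted(items, key=lambda item: item[field], reverse=(direction == "desc"))


# made-up parsed data, ``published`` being a Unix time
items = [
    {"id": "a", "published": 1638371611},
    {"id": "b", "published": 1609459200},
    {"id": "c", "published": 1625097600},
]

# rough equivalent of ``-o field published desc``
for item in order_by_field(items, "published", "desc"):
    print(item["id"], item["published"])
```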
325
326 examples
327 --------
328
329 Search for blog items cached for the profile ``louise`` which contain the word
330 ``Slovakia``::
331
332 $ li pubsub cache search -t blog -p louise -f Slovakia
333
334 Show title, publication date and id of blog articles (excluding comments) which have been
335 published on Louise's blog during the last 6 months, order them by item id. Here we use an
336 empty string as a subtype to exclude comments (for which subtype is ``comment``)::
337
338 $ li pubsub cache search -t blog -S "" -p louise -s louise@example.net -n urn:xmpp:microblog:0 -F published gt 'TP(6 months ago)' -k id -k published -k title -o item_id
339
340 Show all blog items from anywhere which are tagged as XMPP or ActivityPub (case
341 insensitive) and which have been published in the last month (according to advertised
342 publishing date, not cache creation date).
343
344 We want to order them by descending publication date (again the advertised publication
345 date, not cache creation), and we don't want more than 50 results.
346
347 We do a FTS query there even if it's not mandatory, because it will do an efficient
348 pre-filtering::
349
350 $ li pubsub cache search -f "xmpp OR activitypub" -F tags ioverlap '["xmpp", "activitypub"]' -F published gt 'TP(1 month ago)' -o field published desc -l 50