author | Goffi <goffi@goffi.org> |
date | Wed, 01 Dec 2021 16:13:31 +0100 |
parents | d0b66efc6c0e |

.. _libervia-cli_pubsub_cache:

=====================================
pubsub/cache: PubSub Cache Management
=====================================

Libervia transparently runs a cache for pubsub. This means that, according to internal
criteria, some pubsub items are stored locally.

The ``cache`` subcommands let the user inspect and manipulate the internal cache.

get
===

Retrieve items from the internal cache only. Most end users won't need this command, as
the usual ``pubsub get`` command uses the cache transparently. However, it may be useful
to inspect the local cache, notably for debugging.

The parameters are basically the same as for :ref:`li_pubsub_get`.

example
-------

Retrieve the last 2 cached items of your personal blog::

   $ li pubsub cache get -n urn:xmpp:microblog:0 -M 2

.. _li_pubsub_cache_sync:

sync
====

Synchronise or resynchronise a pubsub node. If the node is already in cache, it is
deleted and then re-cached. The node is put in cache even if the internal policy doesn't
request synchronisation for this kind of node. The node is also (re-)subscribed to, in
order to keep the cache synchronised.

All items of the node (up to an internal limit, which is high) are retrieved and put in
cache, even if a previous version of those items has been deleted by the
:ref:`li_pubsub_cache_purge` command.

example
-------

Resynchronise your personal blog::

   $ li pubsub cache sync -n urn:xmpp:microblog:0

.. _li_pubsub_cache_purge:

purge
=====

Remove items from the cache. This may be desirable to save resources, notably disk space.

Note that once a pubsub node is cached, the cache is the source of truth. This means that
unless the cache is explicitly bypassed when retrieving items of a pubsub node (notably
with the ``-C, --no-cache`` option of :ref:`li_pubsub_get`), only items found in the cache
are returned: purged items won't be used or returned anymore, even if they still exist on
the original pubsub service.
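
The lookup rule described above can be modelled in a few lines of Python. This is a
simplified sketch, not Libervia's code: ``get_items`` and the dicts standing for the
cache and the remote pubsub service are hypothetical, and ``no_cache`` plays the role of
the ``-C, --no-cache`` option.

.. code-block:: python

   def get_items(node, cache, service, no_cache=False):
       """Return the item ids of ``node`` (hypothetical model of the cache rule)."""
       if node in cache and not no_cache:
           # The node is cached: the cache is authoritative, the service is not queried.
           return list(cache[node])
       # Node not cached, or cache explicitly bypassed: ask the pubsub service.
       return list(service[node])


   blog = "urn:xmpp:microblog:0"
   service = {blog: ["a", "b", "c"]}
   cache = {blog: ["a", "b", "c"]}

   # Purge item "b" from the cache only: it still exists on the pubsub service.
   cache[blog].remove("b")

   print(get_items(blog, cache, service))                 # ['a', 'c']
   print(get_items(blog, cache, service, no_cache=True))  # ['a', 'b', 'c']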

If you have purged items by mistake, you can retrieve them again, either node by node
using :ref:`li_pubsub_cache_sync`, or by resetting the whole pubsub cache with
:ref:`li_pubsub_cache_reset`.

If you have a node or a profile (e.g. a component) frequently caching a lot of items, you
may want to run this command periodically with a scheduler like cron_.

.. _cron: https://en.wikipedia.org/wiki/Cron

examples
--------

Remove all blog and event items from the cache if they haven't been updated in the last 6
months::

   $ li pubsub cache purge -t blog -t event -b "6 months ago"

Remove items of profile ``ap_gateway`` if they were created more than 2 months ago::

   $ li pubsub cache purge -p ap_gateway --created-before "2 months ago"

.. _li_pubsub_cache_reset:

reset
=====

Reset the whole pubsub cache. This means that all nodes and all their items are removed
from the cache. After this command, the cache is refilled progressively, as if it were a
new one.

.. note::

   Use this command with caution: even if the cache is rebuilt over time, items will have
   to be retrieved again, which may be resource-intensive both for your machine and for
   the pubsub services involved. It also means that searches will return fewer results
   until all desired items are cached again.

   Also note that all items of cached nodes are retrieved: even items that you have
   previously purged will be retrieved again.

example
-------

Reset the whole pubsub cache::

   $ li pubsub cache reset

search
======

Search items in the pubsub cache. The search is done on the whole cache and is not
restricted to a single node or profile (though it can be, if suitable filters are
specified). Full-Text Search can be done with the ``-f FTS, --fts FTS`` argument, and
parsed data can be filtered with ``-F PATH OPERATOR VALUE, --field PATH OPERATOR VALUE``
(see below).

By default, parsed data are returned, with 3 additional keys: ``pubsub_service`` and
``pubsub_node`` (as the search is done on the whole cache, these give the full location of
each item), and ``node_profile``.

"Parsed data" are the result of the parsing of the item's XML payload by feature-aware
plugins. These data are usually more readable and easier to work with. Parsed data are
only stored when a parser is registered for a specific feature, which means that a pubsub
item in cache may have no parsed data at all, in which case an empty dict is used instead
(and the ``-P, --payload`` argument should be used to get the content of the item).

Dates are normally stored as `Unix time`_ in the database, but the default output converts
the ``updated``, ``created`` and ``published`` fields to human-readable local time. Use
``--output simple`` if you want to keep the float (or int) value.
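
For illustration, converting a stored Unix timestamp into a readable date can be done with
Python's standard library as below. The timestamp is an arbitrary example value, and the
exact format of Libervia's default output may differ.

.. code-block:: python

   from datetime import datetime, timezone

   # Arbitrary example of a stored date: seconds since 1970-01-01 00:00:00 UTC.
   updated = 1638371611.0

   as_utc = datetime.fromtimestamp(updated, tz=timezone.utc)
   print(as_utc.isoformat())  # 2021-12-01T15:13:31+00:00

   # astimezone() without argument converts to the local time zone of the machine.
   print(as_utc.astimezone().strftime("%Y-%m-%d %H:%M:%S %Z"))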

The XML item payload is not returned by default, but it can be added in the
``item_payload`` field if the ``-P, --payload`` argument is set. You can also use
``--output xml`` (or ``xml_raw`` if you don't want prettifying) to directly output the
highlighted XML, without the parsed data, to get an output similar to that of
``li pubsub get``.
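
To picture the difference between the raw and prettified forms, here is a sketch using
Python's ``xml.dom.minidom``. The payload is a made-up example, and Libervia's own
formatting (including syntax highlighting) is not reproduced here.

.. code-block:: python

   from xml.dom import minidom

   # Made-up raw payload, on a single line as it could be stored.
   raw = '<entry xmlns="http://www.w3.org/2005/Atom"><title>Hello</title></entry>'

   # Indented, multi-line version of the same XML.
   pretty = minidom.parseString(raw).toprettyxml(indent="  ")

   print(raw)
   print(pretty)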

If you are only interested in specific data (e.g. item id and title), the
``-k KEY, --key KEY`` argument can be used.

You'll probably want to limit the result size with ``-l LIMIT, --limit LIMIT``, and to
paginate with ``-i INDEX, --index INDEX``.

.. _Unix time: https://en.wikipedia.org/wiki/Unix_time

Filters
-------

By default, search returns all items in the cache; use filters to specify what you are
looking for. Filters fall into 3 categories: node/item metadata, Full-Text Search queries,
and parsed metadata.

Node/item metadata is the generic information you have about a node: which profile it
belongs to, which pubsub service it comes from, what the name or type of the node is, etc.

The arguments for these filters should be self-explanatory. Type (set with
``-t TYPE, --type TYPE``) and subtype (set with ``-S SUBTYPE, --subtype SUBTYPE``) are
values that depend on the plugin/feature associated with the node, so they can't be
listed exhaustively here. The most common type is probably ``blog``, for which a subtype
can be ``comment``. An empty string can be used to find items whose (sub)type is not set.

It's usually a good idea to specify a profile with ``-p PROFILE, --profile PROFILE``,
otherwise you may get duplicated results.

Full-Text Search
----------------

You can specify a Full-Text Search query with the ``-f FTS_QUERY, --fts FTS_QUERY``
argument. The engine is currently SQLite FTS5, and you can check its `query syntax`_.
FTS is done on the whole raw XML payload, which means that all data there can be matched
(including XML tags and attributes).

FTS queries are indexed, which makes them fast and efficient.

.. note::

   Future versions of Libervia will probably include other FTS engines (support for
   PostgreSQL and MySQL/MariaDB is planned). Thus the syntax may vary depending on the
   engine, or a common syntax may be implemented for all engines in the future. Keep that
   in mind if you plan to use FTS capabilities in long-term queries, e.g. in scripts.

.. _query syntax: https://sqlite.org/fts5.html#full_text_query_syntax
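
To get a feel for the FTS5 query syntax, and for the fact that tag and attribute names are
indexed along with the text, here is a small self-contained experiment with Python's
``sqlite3`` module. The table layout is an assumption made for illustration and is not
Libervia's actual schema; the sketch also assumes that the SQLite library used by Python
was built with FTS5 enabled.

.. code-block:: python

   import sqlite3

   conn = sqlite3.connect(":memory:")
   # Hypothetical table: one row per cached item, indexing its raw XML payload.
   conn.execute("CREATE VIRTUAL TABLE items_fts USING fts5(item_id UNINDEXED, payload)")
   conn.executemany(
       "INSERT INTO items_fts (item_id, payload) VALUES (?, ?)",
       [
           ("trip", "<entry><title>Trip to Slovakia</title></entry>"),
           ("talk", "<entry><title>XMPP and ActivityPub</title></entry>"),
           ("misc", "<entry><title>Cooking notes</title><category term='food'/></entry>"),
       ],
   )


   def fts(query):
       """Return the ids of items matching an FTS5 query, sorted by id."""
       rows = conn.execute(
           "SELECT item_id FROM items_fts WHERE items_fts MATCH ? ORDER BY item_id",
           (query,),
       )
       return [row[0] for row in rows]


   print(fts("Slovakia"))             # ['trip'] (matching is case-insensitive)
   print(fts("xmpp OR activitypub"))  # ['talk']
   print(fts("term"))                 # ['misc'] (attribute names are indexed too)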

Parsed Metadata Filters
-----------------------

It is possible to filter on any field of parsed data. This is done with the
``-F PATH OPERATOR VALUE, --field PATH OPERATOR VALUE`` argument (be careful: the short
option is an uppercase ``F``, the lowercase one being used for Full-Text Search).

.. note::

   Parsed Metadata Filters are not indexed, so using them is less efficient than using
   e.g. Full-Text Search. If you want to filter on a text field, it's often a good idea
   to pre-filter with Full-Text Search to get more efficient queries.

``PATH`` and ``VALUE`` can be specified either as plain strings or using JSON syntax (if
the value can't be decoded as JSON, it is used as plain text).

``PATH`` is the name of the field to use. If you need to go beyond root-level fields, you
can use a JSON array to specify each element of the path: a string element is an object
key, and a number is an array index. Thus, to access the root ``title`` key, you can use
``title``, ``'"title"'`` (a JSON string, quoted for the shell) or ``'["title"]'`` (a JSON
array containing the "title" string, quoted for the shell).
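
The decoding rule and path lookup described above can be sketched in Python as follows.
``parse_arg`` and ``get_field`` are hypothetical helpers written for this illustration,
not functions of Libervia; the strings passed to ``parse_arg`` are what the program
receives once the shell has removed the outer quotes.

.. code-block:: python

   import json


   def parse_arg(arg):
       """Decode ``arg`` as JSON, falling back to the plain string (hypothetical)."""
       try:
           return json.loads(arg)
       except json.JSONDecodeError:
           return arg


   def get_field(data, path):
       """Follow ``path``: string elements are object keys, numbers are array indexes."""
       if not isinstance(path, list):
           path = [path]
       for element in path:
           data = data[element]
       return data


   parsed = {"title": "Hello", "tags": ["xmpp", "activitypub"]}

   # The three spellings of the root "title" field are equivalent.
   for arg in ("title", '"title"', '["title"]'):
       print(get_field(parsed, parse_arg(arg)))  # Hello

   # A JSON array reaches nested elements, here the second tag.
   print(get_field(parsed, parse_arg('["tags", 1]')))  # activitypub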

.. note::

   The extra fields ``pubsub_service``, ``pubsub_node`` and ``node_profile`` are added to
   the result after the query, thus they can't be used as fields for filtering (use the
   direct arguments for that).

``OPERATOR`` indicates how the value is used to build the filter. The currently supported
operators are:

``==`` or ``eq``
   Equality operator: true if the field value is the same as the given value.

``!=`` or ``ne``
   Inequality operator: true if the field value is different from the given value.

``>`` or ``gt``
   Greater than: true if the field value is higher than the given value. For strings,
   this is according to alphabetical order.

   Time Pattern can be used here, see below.

``<`` or ``lt``
   Less than: true if the field value is lower than the given value. For strings, this is
   according to alphabetical order.

   Time Pattern can be used here, see below.

``between``
   The given value must be an array with 2 elements. The condition is true if the field
   value is between the 2 elements (for strings, this is according to alphabetical
   order).

   Time Pattern can be used here, see below.

``in``
   The given value must be an array of elements. The field value must be one of them for
   the condition to be true.

``not_in``
   The given value must be an array of elements. The field value must not be any of them
   for the condition to be true.

``overlap``
   This can only be used on array fields.

   If the given value is not already an array, it is put in an array. The condition is
   true if any element of the field value matches any element of the given value.
   Notably useful to filter on tags.

``ioverlap``
   Same as ``overlap`` but case-insensitive.

``disjoint``
   This can only be used on array fields.

   If the given value is not already an array, it is put in an array. The condition is
   true if no element of the field value matches any element of the given value. Notably
   useful to filter out tags.

``idisjoint``
   Same as ``disjoint`` but case-insensitive.

``like``
   Does pattern matching on a string: ``%`` matches zero or more characters and ``_``
   matches any single character.

   If you're not targeting a specific field, it's better to use Full-Text Search when
   possible.

``ilike``
   Same as ``like`` but case-insensitive.

``not_like``
   Same as ``like`` except that the condition is true when the pattern does **not** match.

``not_ilike``
   Same as ``not_like`` but case-insensitive.
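
The semantics listed above can be summarised with a small Python model. This is only a
sketch of the documented behaviour under stated assumptions, not Libervia's
implementation (which applies the filters in the database): all names are hypothetical,
``between`` is assumed to include its bounds as SQL ``BETWEEN`` does, and Time Patterns
are not handled.

.. code-block:: python

   import re

   # Hypothetical model of the documented operator semantics (not Libervia's code).
   ALIASES = {"==": "eq", "!=": "ne", ">": "gt", "<": "lt"}


   def like_to_regex(pattern):
       """Translate a LIKE pattern: ``%`` is any run of characters, ``_`` one character."""
       wildcards = {"%": ".*", "_": "."}
       return "".join(wildcards.get(char, re.escape(char)) for char in pattern)


   def like(field, pattern, ignore_case=False):
       flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
       return re.fullmatch(like_to_regex(pattern), field, flags) is not None


   def as_set(value, ignore_case=False):
       """Wrap a non-array value in an array, then build a set (lowercased if asked)."""
       values = value if isinstance(value, list) else [value]
       return {v.lower() for v in values} if ignore_case else set(values)


   OPERATORS = {
       "eq": lambda field, value: field == value,
       "ne": lambda field, value: field != value,
       "gt": lambda field, value: field > value,
       "lt": lambda field, value: field < value,
       # Assumption: bounds are inclusive, as with SQL BETWEEN.
       "between": lambda field, value: value[0] <= field <= value[1],
       "in": lambda field, value: field in value,
       "not_in": lambda field, value: field not in value,
       "overlap": lambda field, value: bool(as_set(field) & as_set(value)),
       "ioverlap": lambda field, value: bool(as_set(field, True) & as_set(value, True)),
       "disjoint": lambda field, value: not as_set(field) & as_set(value),
       "idisjoint": lambda field, value: not as_set(field, True) & as_set(value, True),
       "like": lambda field, value: like(field, value),
       "ilike": lambda field, value: like(field, value, ignore_case=True),
       "not_like": lambda field, value: not like(field, value),
       "not_ilike": lambda field, value: not like(field, value, ignore_case=True),
   }


   def matches(field, operator, value):
       """Return True if ``field`` satisfies ``operator`` against ``value``."""
       return OPERATORS[ALIASES.get(operator, operator)](field, value)


   tags = ["XMPP", "Python"]
   print(matches(tags, "overlap", "xmpp"))                    # False (case-sensitive)
   print(matches(tags, "ioverlap", ["xmpp", "activitypub"]))  # True
   print(matches("Trip to Slovakia", "ilike", "%slovak_a"))   # True
   print(matches(5, "between", [1, 10]))                      # True

Note how a bare string such as ``"xmpp"`` is wrapped in an array for ``overlap``, which is
what makes ``-F tags ioverlap xmpp`` and ``-F tags ioverlap '["xmpp"]'`` equivalent in
this model.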

For ``gt``/``>``, ``lt``/``<`` and ``between``, you can use a :ref:`time_pattern` with the
syntax ``TP(<time pattern>)`` (see examples below).

Ordering
--------

Results can be ordered by a predefined order, or by a parsed data field. Ordering
defaults to ``creation`` (see below), but this may be changed with
``-o ORDER [FIELD] [DIRECTION], --order-by ORDER [FIELD] [DIRECTION]``.

``ORDER`` can be one of the following:

``creation``
   Order by item creation date. Note that this is the date of creation of the item in the
   cache (which most of the time should correspond to the order of creation of the item
   in the source pubsub service), and that it may differ from the publication date as
   specified by some features (like blog). This is important when old items are
   imported, e.g. when they come from another blog engine.

``modification``
   Order by the date when the item was last modified. The modification date is the same
   as the creation date if the item has not been modified since it was cached. The same
   warning as for ``creation`` applies: this is the date of last modification in the
   cache, not the one advertised in parsed data.

``item_id``
   Order by the XMPP id of the item. Notably useful when user-friendly IDs are used (as
   is often the case with blogs).

``rank``
   Order items by Full-Text Search rank. This can only be used when Full-Text Search is
   used (via ``-f FTS_QUERY, --fts FTS_QUERY``). Rank is a value indicating how well an
   item matches the query. This usually needs to be used with the ``desc`` direction, so
   you get the most relevant items first.

``field``
   This special order indicates that the ordering must be done on a parsed data field.
   The following argument is then the path of the field to use (which can be the plain
   text name of a root field, or a JSON encoded array). An optional direction can be
   specified as a third argument. See examples below.
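
Ordering on a parsed data field, as done by the ``field`` order, can be pictured with the
Python sketch below. The items and the ``order_by_field`` helper are hypothetical; items
missing the field would raise a ``KeyError`` here, and how Libervia orders such items is
not covered by this sketch.

.. code-block:: python

   # Hypothetical parsed data of three cached items (dates as Unix time).
   items = [
       {"id": "b", "published": 1638371611, "title": "Second"},
       {"id": "a", "published": 1609459200, "title": "First"},
       {"id": "c", "published": 1640995200, "title": "Third"},
   ]


   def order_by_field(items, path, direction="asc"):
       """Sort parsed items on the field at ``path`` (a key, or a list of keys/indexes)."""
       if not isinstance(path, list):
           path = [path]

       def key(item):
           value = item
           for element in path:
               value = value[element]
           return value

       return sorted(items, key=key, reverse=(direction == "desc"))


   # Similar in spirit to ``-o field published desc``.
   print([item["id"] for item in order_by_field(items, "published", "desc")])  # ['c', 'b', 'a']
   # A JSON array path such as '["title"]' gives the same result as a plain root key.
   print([item["id"] for item in order_by_field(items, ["title"])])  # ['a', 'b', 'c']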

examples
--------

Search for blog items cached for the profile ``louise`` which contain the word
``Slovakia``::

   $ li pubsub cache search -t blog -p louise -f Slovakia

Show the title, publication date and id of blog articles (excluding comments) published
on Louise's blog during the last 6 months, ordered by item id. Here we use an empty string
as subtype to exclude comments (whose subtype is ``comment``)::

   $ li pubsub cache search -t blog -S "" -p louise -s louise@example.net -n urn:xmpp:microblog:0 -F published gt 'TP(6 months ago)' -k id -k published -k title -o item_id

Show all blog items from anywhere which are tagged as XMPP or ActivityPub (case
insensitive) and which have been published in the last month (according to the advertised
publication date, not the cache creation date).

We want to order them by descending publication date (again the advertised publication
date, not cache creation), and we don't want more than 50 results.

We do an FTS query here even though it's not mandatory, because it provides efficient
pre-filtering::

   $ li pubsub cache search -f "xmpp OR activitypub" -F tags ioverlap '["xmpp", "activitypub"]' -F published gt 'TP(1 month ago)' -o field published desc -l 50