.. doc/libervia-cli/pubsub_cache.rst @ 4326:5fd6a4dc2122 (libervia-backend)
   Author: Goffi <goffi@goffi.org>
   Date: Wed, 20 Nov 2024 11:38:44 +0100

.. _libervia-cli_pubsub_cache:

=====================================
pubsub/cache: PubSub Cache Management
=====================================

Libervia transparently runs a cache for pubsub: according to internal criteria, some pubsub items are stored locally. The ``cache`` subcommands let the user inspect and manipulate this internal cache.

get
===

Retrieve items from the internal cache only.

Most end-users won't need this command, as the usual ``pubsub get`` command uses the cache transparently. However, it may be useful to inspect the local cache, notably for debugging.

The parameters are basically the same as for :ref:`li_pubsub_get`.

example
-------

Retrieve the last 2 cached items of the personal blog::

    $ li pubsub cache get -n urn:xmpp:microblog:0 -M 2

.. _li_pubsub_cache_sync:

sync
====

Synchronise or resynchronise a pubsub node.

If the node is already in cache, it will be deleted then re-cached. The node will be put in cache even if internal policy doesn't request synchronisation for this kind of node. The node will be (re-)subscribed to keep the cache synchronised. All items of the node (up to the internal limit, which is high) will be retrieved and put in cache, even if a previous version of those items has been deleted by the :ref:`li_pubsub_cache_purge` command.

example
-------

Resynchronise the personal blog::

    $ li pubsub cache sync -n urn:xmpp:microblog:0

.. _li_pubsub_cache_purge:

purge
=====

Remove items from cache. This may be desirable to save resources, notably disk space.

Note that once a pubsub node is cached, the cache is the source of truth. That means that unless the cache is explicitly bypassed when retrieving items of a pubsub node (notably with the ``-C, --no-cache`` option of :ref:`li_pubsub_get`), only items found in cache will be returned: purged items won't be used or returned anymore, even if they still exist on the original pubsub service.

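The "source of truth" rule above can be pictured with the following minimal Python sketch. ``NodeModel`` and its methods are hypothetical names used for illustration only: they are not part of Libervia, whose real cache lives in a database rather than in dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeModel:
    """Hypothetical model of a pubsub node (not Libervia's actual classes)."""

    remote_items: Dict[str, str]  # items held by the pubsub service
    cached_items: Optional[Dict[str, str]] = None  # None: node not cached

    def get_items(self, no_cache: bool = False) -> Dict[str, str]:
        # Bypassing the cache (like ``-C, --no-cache``), or a node which is
        # not cached: ask the pubsub service directly.
        if no_cache or self.cached_items is None:
            return dict(self.remote_items)
        # Once cached, the cache wins, even when the service still holds
        # items which have been purged locally.
        return dict(self.cached_items)

    def purge(self, item_id: str) -> None:
        # Only the local copy is removed, the pubsub service is untouched.
        if self.cached_items is not None:
            self.cached_items.pop(item_id, None)
```

In this model, purging ``b`` from a cached node makes ``get_items()`` stop returning it, while ``get_items(no_cache=True)`` still sees it on the service.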
If you have purged items by mistake, you can retrieve them either node by node using :ref:`li_pubsub_cache_sync`, or by resetting the whole pubsub cache with :ref:`li_pubsub_cache_reset`.

If you have a node or a profile (e.g. a component) frequently caching a lot of items, you may want to run this command periodically with a scheduler like cron_.

.. _cron: https://en.wikipedia.org/wiki/Cron

examples
--------

Remove all blog and event items from cache if they haven't been updated for 6 months::

    $ li pubsub cache purge -t blog -t event -b "6 months ago"

Remove items of profile ``ap_gateway`` if they were created more than 2 months ago::

    $ li pubsub cache purge -p ap_gateway --created-before "2 months ago"

.. _li_pubsub_cache_reset:

reset
=====

Reset the whole pubsub cache. This means that all nodes and all their items will be removed from cache. After this command, the cache will be refilled progressively as if it were a new one.

.. note::

    Use this command with caution: even if the cache is rebuilt over time, items will have to be retrieved again, which may be resource-intensive both for your machine and for the pubsub services involved. That also means that searching items will return fewer results until all desired items are cached again.

    Also note that all items of cached nodes are retrieved: even items you have previously purged will be retrieved again.

example
-------

Reset the whole pubsub cache::

    $ li pubsub cache reset

search
======

Search items in the pubsub cache. The search is done on the whole cache and is not restricted to a single node/profile (though it can be, if suitable filters are specified).

Full-Text Search can be done with the ``-f FTS, --fts FTS`` argument, and filtering on parsed data with ``-F PATH OPERATOR VALUE, --field PATH OPERATOR VALUE`` (see below).

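As a rough illustration of the search scope described above, the following sketch walks every cached node regardless of its service or profile. The ``search_whole_cache`` function and the dictionary layout are assumptions made for this example, not Libervia's storage code. It also shows why each result carries location keys (``pubsub_service``, ``pubsub_node``, ``node_profile``), and why the same item may appear once for each profile caching it.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

# Hypothetical layout: (service, node, profile) -> {item id: parsed data},
# where None stands for an item without parsed data (no parser registered).
Cache = Dict[Tuple[str, str, str], Dict[str, Optional[Dict[str, Any]]]]


def search_whole_cache(cache: Cache) -> Iterator[Dict[str, Any]]:
    """Yield every cached item, whatever its service, node or profile."""
    for (service, node, profile), items in cache.items():
        for parsed in items.values():
            result = dict(parsed or {})  # empty dict if nothing was parsed
            # The search is not bound to a single node, so each result gets
            # the location of its item.
            result.update(
                pubsub_service=service, pubsub_node=node, node_profile=profile
            )
            yield result
```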
By default, parsed data are returned, with 3 additional keys: ``pubsub_service``, ``pubsub_node`` and ``node_profile``. As the search is done on the whole cache, these keys give the full location of each item.

"Parsed data" are the result of the parsing of the item's XML payload by feature-aware plugins. They are usually more readable and easier to work with. Parsed data are only stored when a parser is registered for a specific feature, which means that a pubsub item in cache may have no parsed data at all; in this case an empty dict is used instead (and the ``-P, --payload`` argument should be used to get the content of the item).

Dates are normally stored as `Unix time`_ in the database, but the default output converts the ``updated``, ``created`` and ``published`` fields to human-readable local time. Use ``--output simple`` if you want to keep the float (or int) value.

The XML item payload is not returned by default, but it can be added to the ``item_payload`` field if the ``-P, --payload`` argument is set. You can also use ``--output xml`` (or ``xml_raw`` if you don't want prettifying) to directly output the highlighted XML, without the parsed data, for an output similar to that of ``li pubsub get``.

If you are only interested in specific data (e.g. item id and title), use the ``-k KEY, --key KEY`` argument, which can be repeated. You'll probably also want to limit the result size with ``-l LIMIT, --limit LIMIT``, and to paginate with ``-i INDEX, --index INDEX``.

.. _Unix time: https://en.wikipedia.org/wiki/Unix_time

Filters
-------

By default, search returns all items in cache: use filters to specify what you are looking for. Filters fall into 3 categories: nodes/items metadata, Full-Text Search query, and parsed metadata.

Nodes/items metadata are the generic information you have about a node: which profile it belongs to, which pubsub service it comes from, the name or type of the node, etc. The corresponding arguments should be self-explanatory.

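Metadata filters essentially behave like equality checks, with a special case for the empty string when filtering on (sub)type, as explained in the next paragraph. The following sketch assumes a hypothetical ``NodeMetadata`` record and ``matches`` helper, which are not Libervia's API; a criterion left to ``None`` is simply not applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeMetadata:
    """Hypothetical record of a cached node's metadata (illustrative names)."""

    profile: str
    service: str
    node: str
    type: Optional[str] = None
    subtype: Optional[str] = None


def matches(meta: NodeMetadata, **criteria: Optional[str]) -> bool:
    """Return True if ``meta`` satisfies every given criterion.

    ``None`` means "don't filter on this field"; an empty string only
    matches a field which is not set (like ``-S ""``).
    """
    for name, wanted in criteria.items():
        if wanted is None:
            continue
        actual = getattr(meta, name)
        if wanted == "":
            if actual:  # a value is set: excluded by the empty string
                return False
        elif actual != wanted:
            return False
    return True
```

With this sketch, ``matches(meta, type="blog", subtype="")`` keeps blog articles and drops their comments, similar to the ``-t blog -S ""`` example at the end of this page.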
Type (set with ``-t TYPE, --type TYPE``) and subtype (set with ``-S SUBTYPE, --subtype SUBTYPE``) are values that depend on the plugin/feature associated with the node, so they can't be listed exhaustively here. The most common type is probably ``blog``, for which a subtype can be ``comment``. An empty string can be used to find items whose (sub)type is not set.

It's usually a good idea to specify a profile with ``-p PROFILE, --profile PROFILE``, otherwise you may get duplicated results, as the same node may be cached for several profiles.

Full-Text Search
----------------

You can specify a Full-Text Search query with the ``-f FTS_QUERY, --fts FTS_QUERY`` argument. The engine is currently SQLite FTS5; see its `query syntax`_. FTS is done on the whole raw XML payload, which means that any data there can be matched (including XML tags and attributes). FTS queries are indexed, so they are fast and efficient.

.. note::

    Future versions of Libervia will probably include other FTS engines (support for PostgreSQL and MySQL/MariaDB is planned). The syntax may then vary depending on the engine, or a common syntax may be implemented for all engines. Keep that in mind if you plan to use FTS capabilities in long-term queries, e.g. in scripts.

.. _query syntax: https://sqlite.org/fts5.html#full_text_query_syntax

Parsed Metadata Filters
-----------------------

It is possible to filter on any field of parsed data. This is done with the ``-F PATH OPERATOR VALUE, --field PATH OPERATOR VALUE`` argument (be careful: the short option is an uppercase ``F``, the lowercase one being used for Full-Text Search).

.. note::

    Parsed Metadata Filters are not indexed, which means that they are less efficient than e.g. Full-Text Search. If you want to filter on a text field, it's often a good idea to pre-filter using Full-Text Search to get more efficient queries.

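The advice above (an indexed Full-Text Search pre-filter, then a non-indexed filter on parsed data) can be sketched with Python's standard ``sqlite3`` module and an FTS5 table. The single-table schema and the ``add_item``/``search`` helpers are hypothetical and do not reflect Libervia's actual database layout; the sketch also assumes that your SQLite library was built with FTS5 support.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, List, Optional

# Hypothetical single-table schema: only the raw XML payload is indexed for
# Full-Text Search, the item id and the parsed data (as JSON) are just stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE items_fts "
    "USING fts5(item_id UNINDEXED, payload, parsed UNINDEXED)"
)


def add_item(item_id: str, payload: str, parsed: Dict[str, Any]) -> None:
    conn.execute(
        "INSERT INTO items_fts (item_id, payload, parsed) VALUES (?, ?, ?)",
        (item_id, payload, json.dumps(parsed)),
    )


def search(
    fts_query: str,
    field_filter: Optional[Callable[[Dict[str, Any]], bool]] = None,
) -> List[str]:
    # Step 1: indexed FTS5 query on the raw XML (tags and attributes included).
    rows = conn.execute(
        "SELECT item_id, parsed FROM items_fts WHERE items_fts MATCH ?",
        (fts_query,),
    ).fetchall()
    # Step 2: non-indexed check on parsed data, run only on the FTS hits.
    return [
        item_id
        for item_id, parsed in rows
        if field_filter is None or field_filter(json.loads(parsed))
    ]
```

Here ``search("xmpp OR activitypub", lambda data: bool(data["tags"]))`` first lets FTS5 select the items whose raw XML mentions either word, then keeps only those having at least one tag. Since the whole payload is indexed, a query such as ``title`` also matches the ``<title>`` element names.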
``PATH`` and ``VALUE`` can be specified either as a string or using JSON syntax (if the value can't be decoded as JSON, it is used as plain text).

``PATH`` is the name of the field to use. If you need to go beyond root-level fields, you can use a JSON array to specify each element of the path: a string is an object key, and a number is an array index. Thus you can use ``title`` to access the root ``title`` key, or ``'"title"'`` (JSON string, quoted for the shell) or ``'["title"]'`` (JSON array with the "title" string, quoted for the shell).

.. note::

    The extra fields ``pubsub_service``, ``pubsub_node`` and ``node_profile`` are added to the result after the query, thus they can't be used as fields for filtering (use the dedicated arguments for that).

``OPERATOR`` indicates how the value is used to filter. The currently supported operators are:

``==`` or ``eq``
    Equality operator: true if the field value is the same as the given value.

``!=`` or ``ne``
    Inequality operator: true if the field value is different from the given value.

``>`` or ``gt``
    Greater than: true if the field value is higher than the given value. For strings, this is according to alphabetical order. Time Pattern can be used here, see below.

``<`` or ``lt``
    Less than: true if the field value is lower than the given value. For strings, this is according to alphabetical order. Time Pattern can be used here, see below.

``between``
    The given value must be an array with 2 elements. The condition is true if the field value is between the 2 elements (for strings, this is according to alphabetical order). Time Pattern can be used here, see below.

``in``
    The given value must be an array of elements. The condition is true if the field value is one of them.

``not_in``
    The given value must be an array of elements. The condition is true if the field value is none of them.

``overlap``
    This can only be used on array fields. If the given value is not already an array, it is put in an array. The condition is true if any element of the field value matches any element of the given value. Notably useful to filter on tags.

``ioverlap``
    Same as ``overlap``, but case-insensitive.

``disjoint``
    This can only be used on array fields. If the given value is not already an array, it is put in an array. The condition is true if no element of the field value matches any element of the given value. Notably useful to filter out tags.

``idisjoint``
    Same as ``disjoint``, but case-insensitive.

``like``
    Does pattern matching on a string: ``%`` matches zero or more characters and ``_`` matches any single character. If you're not targeting a specific field, it's better to use Full-Text Search when possible.

``ilike``
    Same as ``like``, but case-insensitive.

``not_like``
    Same as ``like``, except that the condition is true when the pattern does **not** match.

``not_ilike``
    Same as ``not_like``, but case-insensitive.

For ``gt``/``>``, ``lt``/``<`` and ``between``, you can use :ref:`time_pattern` with the syntax ``TP(<time pattern>)`` (see examples below).

Ordering
--------

Results can be ordered using a well-known order or a parsed data field. Ordering defaults to ``creation`` (see below), but this may be changed with ``-o ORDER [FIELD] [DIRECTION], --order-by ORDER [FIELD] [DIRECTION]``.

``ORDER`` can be one of the following:

``creation``
    Order by item creation date. Note that this is the date of creation of the item in cache (which most of the time should correspond to the order of creation of the item in the source pubsub service), and it may differ from the publication date specified by some features (like blog). This is important when old items are imported, e.g. when they're coming from another blog engine.

``modification``
    Order by the date when the item was last modified. The modification date is the same as the creation date if the item has never been modified since it has been in cache. The same warning as for ``creation`` applies: this is the date of last modification in cache, not the one advertised in parsed data.

``item_id``
    Order by XMPP id of the item. Notably useful when user-friendly IDs are used (as is often the case with blogs).

``rank``
    Order items by Full-Text Search rank. This can only be used when Full-Text Search is used (via ``-f FTS_QUERY, --fts FTS_QUERY``). Rank is a value indicating how well an item matches the query. This usually needs to be used with the ``desc`` direction, so you get the most relevant items first.

``field``
    This special order indicates that the ordering must be done on a parsed data field. The following argument is then the path of the field to use (which can be a plain text name of a root field, or a JSON-encoded array). An optional direction can be specified as a third argument. See examples below.

examples
--------

Search for blog items cached for the profile ``louise`` which contain the word ``Slovakia``::

    $ li pubsub cache search -t blog -p louise -f Slovakia

Show title, publication date and id of blog articles (excluding comments) which have been published on Louise's blog during the last 6 months, and order them by item id. Here we use an empty string as subtype to exclude comments (for which the subtype is ``comment``)::

    $ li pubsub cache search -t blog -S "" -p louise -s louise@example.net -n urn:xmpp:microblog:0 -F published gt 'TP(6 months ago)' -k id -k published -k title -o item_id

Show all blog items from anywhere which are tagged as XMPP or ActivityPub (case-insensitive) and which have been published in the last month (according to the advertised publication date, not the cache creation date). We want to order them by descending publication date (again the advertised publication date, not cache creation), and we don't want more than 50 results. We use a FTS query even though it's not mandatory, because it does an efficient pre-filtering::

    $ li pubsub cache search -f "xmpp OR activitypub" -F tags ioverlap '["xmpp", "activitypub"]' -F published gt 'TP(1 month ago)' -o field published desc -l 50

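To make the ``PATH``/``VALUE`` decoding and the operator semantics described in this section more concrete, here is a pure-Python sketch applied to parsed data held in memory. It is an illustration only: Libervia evaluates these filters in its database, the ``field_filter`` helper is hypothetical, Time Patterns (``TP(...)``) are not handled, ``between`` is assumed to include its bounds, and a missing field is assumed to never match.

```python
import json
import re
from typing import Any, Callable, Dict, List

# Hypothetical re-implementation of ``-F PATH OPERATOR VALUE``, for
# illustration only (not Libervia's code, Time Patterns not supported).


def decode(raw: str) -> Any:
    """Decode ``raw`` as JSON, falling back to plain text."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def resolve(data: Any, raw_path: str) -> Any:
    """Follow PATH: strings are object keys, numbers are array indexes."""
    path = decode(raw_path)
    for element in path if isinstance(path, list) else [path]:
        data = data[element]
    return data


def as_list(value: Any) -> List[Any]:
    return value if isinstance(value, list) else [value]


def folded(values: Any) -> set:
    return {v.casefold() if isinstance(v, str) else v for v in as_list(values)}


def like(value: str, pattern: str, ignore_case: bool = False) -> bool:
    # ``%`` matches any run of characters, ``_`` exactly one character.
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, value, flags) is not None


OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda f, v: f == v,
    "ne": lambda f, v: f != v,
    "gt": lambda f, v: f > v,  # strings: code point order in this sketch
    "lt": lambda f, v: f < v,
    "between": lambda f, v: v[0] <= f <= v[1],
    "in": lambda f, v: f in v,
    "not_in": lambda f, v: f not in v,
    "overlap": lambda f, v: bool(set(f) & set(as_list(v))),
    "ioverlap": lambda f, v: bool(folded(f) & folded(v)),
    "disjoint": lambda f, v: not set(f) & set(as_list(v)),
    "idisjoint": lambda f, v: not folded(f) & folded(v),
    "like": lambda f, v: like(f, v),
    "ilike": lambda f, v: like(f, v, ignore_case=True),
    "not_like": lambda f, v: not like(f, v),
    "not_ilike": lambda f, v: not like(f, v, ignore_case=True),
}
ALIASES = {"==": "eq", "!=": "ne", ">": "gt", "<": "lt"}


def field_filter(data: Any, raw_path: str, operator: str, raw_value: str) -> bool:
    check = OPERATORS[ALIASES.get(operator, operator)]
    try:
        field_value = resolve(data, raw_path)
    except (KeyError, IndexError, TypeError):
        return False  # missing field: never matches (assumption)
    return check(field_value, decode(raw_value))
```

For instance, keeping the items for which ``field_filter(data, "tags", "ioverlap", '["xmpp", "activitypub"]')`` is true, then sorting them by ``published`` in descending order and keeping the first 50, roughly mimics the last example above (without FTS pre-filtering or Time Pattern).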