Mercurial > libervia-backend
changeset 3670:d0b66efc6c0e
doc (cli/pubsub_cache): `search` command documentation:
rel 361
author    Goffi <goffi@goffi.org>
date      Wed, 08 Sep 2021 17:58:48 +0200 (2021-09-08)
parents   23be54db81f1
children  9c50d2f812c1
files     doc/libervia-cli/common_arguments.rst doc/libervia-cli/pubsub_cache.rst
diffstat  2 files changed, 244 insertions(+), 0 deletions(-)
--- a/doc/libervia-cli/common_arguments.rst	Wed Sep 08 17:58:48 2021 +0200
+++ b/doc/libervia-cli/common_arguments.rst	Wed Sep 08 17:58:48 2021 +0200
@@ -218,6 +218,8 @@
 
     $ li blog get -O template --oo browser --oo template=/tmp/my_template.html
 
+.. _time_pattern:
+
 Time Pattern
 ============
 
--- a/doc/libervia-cli/pubsub_cache.rst	Wed Sep 08 17:58:48 2021 +0200
+++ b/doc/libervia-cli/pubsub_cache.rst	Wed Sep 08 17:58:48 2021 +0200
@@ -106,3 +106,245 @@
 Reset the whole pubsub cache::
 
     $ li pubsub cache reset
+
+search
+======
+
+Search items in the pubsub cache. The search is done on the whole cache; it's not restricted
+to a single node/profile (although it may be, if suitable filters are specified). Full-Text
+Search can be done with the ``-f FTS, --fts FTS`` argument, and parsed data can be filtered
+with ``-F PATH OPERATOR VALUE, --field PATH OPERATOR VALUE`` (see below).
+
+By default, parsed data are returned, with 3 additional keys: ``pubsub_service``,
+``pubsub_node`` (since the search is done on the whole cache, these give the full
+location of each item) and ``node_profile``.
+
+"Parsed data" are the result of the parsing of item XML payloads by feature-aware
+plugins. These data are usually more readable and easier to work with. Parsed data are
+only stored when a parser is registered for a specific feature, which means that a Pubsub
+item in cache may not have parsed data at all. In that case, an empty dict is used
+instead (and the ``-P, --payload`` argument should be used to get the content of the item).
+
+Dates are normally stored as `Unix time`_ in the database, but the default output converts
+the ``updated``, ``created`` and ``published`` fields to human-readable local time. Use
+``--output simple`` if you want to keep the float (or int) value.
+
+The XML item payload is not returned by default, but it can be added to the ``item_payload``
+field if the ``-P, --payload`` argument is set. You can also use ``--output xml`` (or
+``xml_raw`` if you don't want prettifying) to directly output the highlighted XML
+(without the parsed data), giving an output similar to that of ``li pubsub get``.
+
+If you are only interested in specific data (e.g. item id and title), the ``-k KEY,
+--key KEY`` argument can be used.
+
+You'll probably want to limit the result size using ``-l LIMIT, --limit LIMIT``, and do
+pagination using ``-i INDEX, --index INDEX``.
+
+.. _Unix time: https://en.wikipedia.org/wiki/Unix_time
+
+Filters
+-------
+
+By default, search returns all items in cache; you have to use filters to specify what you
+are looking for. Filters can be split into 3 categories: nodes/items metadata,
+Full-Text Search query and parsed metadata.
+
+Nodes/items metadata are the generic information you have on a node: which profile it
+belongs to, which pubsub service it's coming from, what the name or type of the node is,
+etc.
+
+These arguments should be self-explanatory. Type (set with ``-t TYPE, --type TYPE``) and
+subtype (set with ``-S SUBTYPE, --subtype SUBTYPE``) are values that depend on the
+plugin/feature associated with the node, so we can't list them exhaustively here.
+The most common type is probably ``blog``, for which a subtype can be ``comment``. An
+empty string can be used to find items whose (sub)type is not set.
+
+It's usually a good idea to specify a profile with ``-p PROFILE, --profile PROFILE``;
+otherwise you may get duplicated results.
+
+Full-Text Search
+----------------
+
+You can specify a Full-Text Search query with the ``-f FTS_QUERY, --fts FTS_QUERY``
+argument. The engine is currently SQLite FTS5; see its `query syntax`_ for details.
+FTS is done on the whole raw XML payload, which means that all data there can be matched
+(including XML tags and attributes).
+
+FTS queries are indexed, which means that they are fast and efficient.
+
+.. note::
+
+    Future versions of Libervia will probably include other FTS engines (support for
+    PostgreSQL and MySQL/MariaDB is planned). Thus the syntax may vary depending on the
+    engine, or a common syntax may be implemented for all engines in the future. Keep that
+    in mind if you plan to use FTS capabilities in long-term queries, e.g. in scripts.
+
+.. _query syntax: https://sqlite.org/fts5.html#full_text_query_syntax
+
+Parsed Metadata Filters
+-----------------------
+
+It is possible to filter on any field of parsed data. This is done with the ``-F PATH
+OPERATOR VALUE, --field PATH OPERATOR VALUE`` argument (note that the short option is an
+uppercase ``F``; the lowercase ``-f`` is used for Full-Text Search).
+
+.. note::
+
+    Parsed Metadata Filters are not indexed, which means that using them is less efficient
+    than using e.g. Full-Text Search. If you want to filter on a text field, it's often a
+    good idea to pre-filter using Full-Text Search to get more efficient queries.
+
+``PATH`` and ``VALUE`` can be specified either as plain strings or using JSON syntax (if
+the value can't be decoded as JSON, it is used as plain text).
+
+``PATH`` is the name of the field to use. If you need to go beyond root-level fields, you
+can use a JSON array to specify each element of the path: a string is an object
+key, and a number is an array index. Thus you can use ``title`` to access the
+root title key, or ``'"title"'`` (JSON string escaped for shell) or ``'["title"]'`` (JSON
+array with the "title" string, escaped for shell).
+
+.. note::
+
+    The extra fields ``pubsub_service``, ``pubsub_node`` and ``node_profile`` are added to
+    the result after the query, so they can't be used as fields for filtering (use the
+    direct arguments for that).
+
+``OPERATOR`` indicates how the value is used to make a filter. The currently supported
+operators are:
+
+``==`` or ``eq``
+    Equality operator, true if the field value is the same as the given value.
+
+``!=`` or ``ne``
+    Inequality operator, true if the field value is different from the given value.
+
+``>`` or ``gt``
+    Greater than, true if the field value is higher than the given value. For strings,
+    this is according to alphabetical order.
+
+    Time Pattern can be used here, see below.
+
+``<`` or ``lt``
+    Less than, true if the field value is lower than the given value. For strings, this
+    is according to alphabetical order.
+
+    Time Pattern can be used here, see below.
+
+``between``
+    The given value must be an array with 2 elements. The condition is true if the field
+    value is between the 2 elements (for strings, this is according to alphabetical order).
+
+    Time Pattern can be used here, see below.
+
+``in``
+    The given value must be an array of elements. The field value must be one of them to
+    make the condition true.
+
+``not_in``
+    The given value must be an array of elements. The field value must not be any of them
+    to make the condition true.
+
+``overlap``
+    This can be used only on array fields.
+
+    If the given value is not already an array, it is put in an array. The condition is
+    true if any element of the field value matches any element of the given value.
+    Notably useful to filter on tags.
+
+``ioverlap``
+    Same as ``overlap`` but done in a case-insensitive way.
+
+``disjoint``
+    This can be used only on array fields.
+
+    If the given value is not already an array, it is put in an array. The condition is
+    true if no element of the field value matches any element of the given value.
+    Notably useful to filter out tags.
+
+``idisjoint``
+    Same as ``disjoint`` but done in a case-insensitive way.
+
+``like``
+    Does pattern matching on a string. ``%`` can be used to match zero or more characters
+    and ``_`` can be used to match any single character.
+
+    If you're not looking for a specific field, it's better to use Full-Text Search when
+    possible.
+
+``ilike``
+    Same as ``like`` but done in a case-insensitive way.
+
+
+``not_like``
+    Same as ``like`` except that the condition is true when the pattern does **not** match.
+
+``not_ilike``
+    Same as ``not_like`` but done in a case-insensitive way.
+
+
+For ``gt``/``>``, ``lt``/``<`` and ``between``, you can use :ref:`time_pattern` by using
+the syntax ``TP(<time pattern>)`` (see examples below).
+
+Ordering
+--------
+
+Result ordering can be done with a well-known order, or using a parsed data field. Ordering
+defaults to ``creation`` (see below), but this may be changed with ``-o ORDER [FIELD]
+[DIRECTION], --order-by ORDER [FIELD] [DIRECTION]``.
+
+``ORDER`` can be one of the following:
+
+``creation``
+    Order by item creation date. Note that this is the date of creation of the item in
+    cache (which most of the time should correspond to the order of creation of the item in
+    the source pubsub service), and it may differ from the date of publication as specified
+    by some features (like blog). This is important when old items are imported, e.g. when
+    they're coming from another blog engine.
+
+``modification``
+    Order by the date when the item was last modified. The modification date is the same as
+    the creation date if the item has not been modified since it was cached. The same
+    warning as for ``creation`` applies: this is the date of last modification in cache,
+    not the one advertised in parsed data.
+
+``item_id``
+    Order by the XMPP id of the item. Notably useful when user-friendly IDs are used (as is
+    often the case with blogs).
+
+``rank``
+    Order items by Full-Text Search rank. This one can only be used when Full-Text Search
+    is used (via ``-f FTS_QUERY, --fts FTS_QUERY``). Rank is a value indicating how well an
+    item matches the query. This usually needs to be used with the ``desc`` direction, so
+    you get the most relevant items first.
+
+``field``
+    This special order indicates that the ordering must be done on a parsed data field. The
+    following argument is then the path of the field to use (which can be a plain text name
+    of a root field, or a JSON-encoded array). An optional direction can be specified as a
+    third argument. See examples below.
+
+Examples
+--------
+
+Search for blog items cached for the profile ``louise`` which contain the word
+``Slovakia``::
+
+    $ li pubsub cache search -t blog -p louise -f Slovakia
+
+Show the title, publication date and id of blog articles (excluding comments) which have
+been published on Louise's blog during the last 6 months, ordered by item id. Here we use an
+empty string as subtype to exclude comments (whose subtype is ``comment``)::
+
+    $ li pubsub cache search -t blog -S "" -p louise -s louise@example.net -n urn:xmpp:microblog:0 -F published gt 'TP(6 months ago)' -k id -k published -k title -o item_id
+
+Show all blog items from anywhere which are tagged as XMPP or ActivityPub (case
+insensitive) and which have been published in the last month (according to the advertised
+publication date, not the cache creation date).
+
+We want to order them by descending publication date (again the advertised publication
+date, not cache creation), and we don't want more than 50 results.
+
+We also use an FTS query even though it's not mandatory, because it does efficient
+pre-filtering::
+
+    $ li pubsub cache search -f "xmpp OR activitypub" -F tags ioverlap '["xmpp", "activitypub"]' -F published gt 'TP(1 month ago)' -o field published desc -l 50
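To make the ``-F PATH OPERATOR VALUE`` rules documented in this changeset easier to reason about, here is a minimal pure-Python sketch of how one filter could be evaluated against a list of parsed-data dicts. It is not Libervia's implementation (the real search runs as a database query on the cache); the helpers ``decode_arg``, ``get_field``, ``like_to_regex``, ``matches`` and ``field_filter`` are hypothetical. It also assumes that ``between`` is inclusive (like SQL ``BETWEEN``) and that items lacking the field never match, and it does not handle Time Patterns (``TP(...)``).

```python
import json
import re
from typing import Any, Iterable

# Symbolic operators documented as aliases of the named ones.
ALIASES = {"==": "eq", "!=": "ne", ">": "gt", "<": "lt"}


def decode_arg(raw: str) -> Any:
    """Decode a PATH or VALUE argument: JSON if possible, else plain text."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def get_field(data: Any, raw_path: str) -> Any:
    """Follow PATH: strings are object keys, integers are array indexes."""
    path = decode_arg(raw_path)
    if not isinstance(path, list):
        path = [path]
    for step in path:
        data = data[step]  # raises KeyError/IndexError/TypeError if absent
    return data


def like_to_regex(pattern: str) -> str:
    """Translate a LIKE pattern: % -> any run of chars, _ -> any single char."""
    return "".join(
        ".*" if char == "%" else "." if char == "_" else re.escape(char)
        for char in pattern
    )


def matches(field_value: Any, operator: str, value: Any) -> bool:
    """Evaluate one documented operator (hypothetical pure-Python version)."""
    op = ALIASES.get(operator, operator)
    if op == "eq":
        return field_value == value
    if op == "ne":
        return field_value != value
    if op == "gt":
        return field_value > value
    if op == "lt":
        return field_value < value
    if op == "between":
        low, high = value  # must be a 2-element array
        return low <= field_value <= high  # inclusive bounds (assumption)
    if op == "in":
        return field_value in value
    if op == "not_in":
        return field_value not in value
    if op in ("overlap", "ioverlap", "disjoint", "idisjoint"):
        given = value if isinstance(value, list) else [value]
        current = list(field_value)
        if op.startswith("i"):  # case-insensitive variants
            given = [str(v).casefold() for v in given]
            current = [str(v).casefold() for v in current]
        common = set(current) & set(given)
        return bool(common) if op.endswith("overlap") else not common
    if op in ("like", "ilike", "not_like", "not_ilike"):
        flags = re.DOTALL | (re.IGNORECASE if "ilike" in op else 0)
        found = re.fullmatch(like_to_regex(value), field_value, flags) is not None
        return not found if op.startswith("not_") else found
    raise ValueError(f"unsupported operator: {operator!r}")


def field_filter(items: Iterable[dict], raw_path: str, operator: str,
                 raw_value: str) -> list[dict]:
    """Keep the items whose field at PATH satisfies OPERATOR VALUE."""
    value = decode_arg(raw_value)
    kept = []
    for item in items:
        try:
            field_value = get_field(item, raw_path)
        except (KeyError, IndexError, TypeError):
            continue  # assumption: items without the field never match
        if matches(field_value, operator, value):
            kept.append(item)
    return kept


if __name__ == "__main__":
    items = [
        {"id": "a", "title": "XMPP news", "tags": ["XMPP", "Python"]},
        {"id": "b", "title": "Hiking", "tags": ["Slovakia"]},
    ]
    hits = field_filter(items, "tags", "ioverlap", '["xmpp", "activitypub"]')
    print([item["id"] for item in hits])  # prints ['a']
```

One consequence of the JSON-first decoding shows up here: a VALUE of ``123`` is decoded as a number, so ``'"123"'`` would be needed to compare against the string ``"123"``. Also note that this sketch treats ``like`` as case-sensitive, as the documentation implies; SQLite's own ``LIKE`` is case-insensitive for ASCII characters by default, so the real engine may behave differently.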
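When these searches are scripted (a use case the FTS note in the changeset anticipates), building the argument vector as a list sidesteps the shell quoting seen in the examples (``'TP(...)'``, JSON arrays). The sketch below only assembles arguments from options documented in this changeset; ``build_search_argv`` is a hypothetical helper, not part of Libervia, and it makes no assumption about the command's output format. ``shlex.join`` is used only to display the equivalent shell command line.

```python
import json
import shlex
from typing import Optional, Sequence


def build_search_argv(
    profile: Optional[str] = None,
    fts: Optional[str] = None,
    fields: Sequence[tuple[str, str, str]] = (),
    keys: Sequence[str] = (),
    order_by: Sequence[str] = (),
    limit: Optional[int] = None,
    index: Optional[int] = None,
) -> list[str]:
    """Assemble `li pubsub cache search` arguments (hypothetical helper)."""
    argv = ["li", "pubsub", "cache", "search"]
    if profile is not None:
        argv += ["-p", profile]
    if fts is not None:
        argv += ["-f", fts]  # lowercase -f: Full-Text Search query
    for path, operator, value in fields:
        argv += ["-F", path, operator, value]  # uppercase -F: parsed data filter
    for key in keys:
        argv += ["-k", key]  # -k may be repeated to select several keys
    if order_by:
        argv += ["-o", *order_by]  # ORDER [FIELD] [DIRECTION]
    if limit is not None:
        argv += ["-l", str(limit)]
    if index is not None:
        argv += ["-i", str(index)]  # passed through unchanged
    return argv


if __name__ == "__main__":
    # Same query as the last documented example.
    argv = build_search_argv(
        fts="xmpp OR activitypub",
        fields=[
            ("tags", "ioverlap", json.dumps(["xmpp", "activitypub"])),
            ("published", "gt", "TP(1 month ago)"),
        ],
        order_by=["field", "published", "desc"],
        limit=50,
    )
    print(shlex.join(argv))
```

Run as a script, this prints the same command line as the last documented example. The list itself can be handed to ``subprocess.run(argv)`` without ``shell=True``, so no quoting is needed. Whether ``-i`` counts pages or items is not specified in the documentation above, which is why the helper forwards the value as-is.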