annotate libervia/backend/plugins/plugin_misc_text_syntaxes.py @ 4351:6a0a081485b8

plugin autocrypt: Autocrypt protocol implementation: Implementation of Autocrypt: the `autocrypt` header is checked, and if it is present and no public key is known for the peer, the key is imported. An `autocrypt` header is also added to outgoing messages (only if an email gateway is detected). For the moment, the JID is used as the identifier, but the real email address used by the gateway should be used in the future. rel 456
author Goffi <goffi@goffi.org>
date Fri, 28 Feb 2025 09:23:35 +0100
parents 111dce64dcb5
children
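The import rule described in the changeset message above can be sketched roughly as follows. This is not the plugin's actual code: `parse_autocrypt`, `import_key_if_unknown` and the `known_keys` dictionary are hypothetical names, and the header layout (`addr`, `prefer-encrypt`, `keydata` attributes separated by `;`) is assumed from the Autocrypt Level 1 header format.

```python
import base64
from typing import Dict, Optional


def parse_autocrypt(value: str) -> Dict[str, str]:
    """Split an Autocrypt header value into its attributes (hypothetical helper)."""
    attrs = {}
    for part in value.split(";"):
        # Split on the first "=" only: base64 key data may end with "=" padding.
        name, sep, data = part.strip().partition("=")
        if sep:
            attrs[name.strip().lower()] = data.strip()
    return attrs


def import_key_if_unknown(
    known_keys: Dict[str, bytes], peer: str, header: Optional[str]
) -> bool:
    """Store the peer's key only if a header is present and no key is known yet.

    ``peer`` stands for whatever identifier is used for the peer
    (the JID for now, according to the changeset message).
    """
    if header is None or peer in known_keys:
        return False
    keydata = parse_autocrypt(header).get("keydata")
    if not keydata:
        return False
    # Key data may be folded over several lines, so drop embedded whitespace.
    known_keys[peer] = base64.b64decode("".join(keydata.split()))
    return True
```

A key already stored for a peer is never overwritten in this sketch, which mirrors the "no public key is known for the peer" condition in the message.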
rev   line source
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
1 #!/usr/bin/env python3
3137
559a625a236b fixed shebangs
Goffi <goffi@goffi.org>
parents: 3136
diff changeset
2
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
3
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
4 # SAT plugin for managing various text syntaxes
3479
be6d91572633 date update
Goffi <goffi@goffi.org>
parents: 3137
diff changeset
5 # Copyright (C) 2009-2021 Jérôme Poisson (goffi@goffi.org)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
6
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
7 # This program is free software: you can redistribute it and/or modify
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
8 # it under the terms of the GNU Affero General Public License as published by
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
9 # the Free Software Foundation, either version 3 of the License, or
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
10 # (at your option) any later version.
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
11
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
12 # This program is distributed in the hope that it will be useful,
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
13 # but WITHOUT ANY WARRANTY; without even the implied warranty of
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
14 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
15 # GNU Affero General Public License for more details.
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
16
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
17 # You should have received a copy of the GNU Affero General Public License
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
18 # along with this program. If not, see <http://www.gnu.org/licenses/>.
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
19
3693
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
20 from functools import partial
3075
501a1a3c8594 plugin text syntaxes: don't use anymore deprecated cgi.escape
Goffi <goffi@goffi.org>
parents: 3040
diff changeset
21 from html import escape
3693
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
22 import re
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
23 from typing import Set
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
24
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
25 from twisted.internet import defer
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
26 from twisted.internet.threads import deferToThread
3693
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
27
4071
4b842c1fb686 refactoring: renamed `sat` package to `libervia.backend`
Goffi <goffi@goffi.org>
parents: 4059
diff changeset
28 from libervia.backend.core import exceptions
4b842c1fb686 refactoring: renamed `sat` package to `libervia.backend`
Goffi <goffi@goffi.org>
parents: 4059
diff changeset
29 from libervia.backend.core.constants import Const as C
4b842c1fb686 refactoring: renamed `sat` package to `libervia.backend`
Goffi <goffi@goffi.org>
parents: 4059
diff changeset
30 from libervia.backend.core.i18n import D_, _
4b842c1fb686 refactoring: renamed `sat` package to `libervia.backend`
Goffi <goffi@goffi.org>
parents: 4059
diff changeset
31 from libervia.backend.core.log import getLogger
4b842c1fb686 refactoring: renamed `sat` package to `libervia.backend`
Goffi <goffi@goffi.org>
parents: 4059
diff changeset
32 from libervia.backend.tools import xml_tools
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
33
1542
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
34 try:
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
35 from lxml import html
2786
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
36 from lxml import etree
1542
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
37 except ImportError:
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
38 raise exceptions.MissingModule(
4325
45662662a432 plugin text syntaxes: update exception message to be more explicit when `lxml_html_clean` is missing.
Goffi <goffi@goffi.org>
parents: 4270
diff changeset
39 'Missing module "lxml", please download/install it from http://lxml.de/'
45662662a432 plugin text syntaxes: update exception message to be more explicit when `lxml_html_clean` is missing.
Goffi <goffi@goffi.org>
parents: 4270
diff changeset
40 )
45662662a432 plugin text syntaxes: update exception message to be more explicit when `lxml_html_clean` is missing.
Goffi <goffi@goffi.org>
parents: 4270
diff changeset
41 try:
45662662a432 plugin text syntaxes: update exception message to be more explicit when `lxml_html_clean` is missing.
Goffi <goffi@goffi.org>
parents: 4270
diff changeset
42 from lxml.html import clean
45662662a432 plugin text syntaxes: update exception message to be more explicit when `lxml_html_clean` is missing.
Goffi <goffi@goffi.org>
parents: 4270
diff changeset
43 except ImportError:
45662662a432 plugin text syntaxes: update exception message to be more explicit when `lxml_html_clean` is missing.
Goffi <goffi@goffi.org>
parents: 4270
diff changeset
44 raise exceptions.MissingModule(
45662662a432 plugin text syntaxes: update exception message to be more explicit when `lxml_html_clean` is missing.
Goffi <goffi@goffi.org>
parents: 4270
diff changeset
45 'Missing module "lxml_html_clean". Please download and install it from '
4334
111dce64dcb5 plugins XEP-0300, XEP-0446, XEP-0447, XEP0448 and others: Refactoring to use Pydantic:
Goffi <goffi@goffi.org>
parents: 4325
diff changeset
46 "http://lxml.de/. Note that this is a separate package to install in addition "
4325
45662662a432 plugin text syntaxes: update exception message to be more explicit when `lxml_html_clean` is missing.
Goffi <goffi@goffi.org>
parents: 4270
diff changeset
47 'to "lxml".'
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
48 )
674
fb0b1100c908 plugin text_syntaxes: fixed clean_xhml (it now return XHTML instead of HTML)
Goffi <goffi@goffi.org>
parents: 665
diff changeset
49
2873
e1207b8ad97c plugin text syntaxes: disable raw HTML parsing in mardown by default
Goffi <goffi@goffi.org>
parents: 2869
diff changeset
50 log = getLogger(__name__)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
51
771
bfabeedbf32e core: i18n refactoring:
Goffi <goffi@goffi.org>
parents: 744
diff changeset
52 CATEGORY = D_("Composition")
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
53 NAME = "Syntax"
2869
148d30147890 plugin text syntaxes: fixed default syntax
Goffi <goffi@goffi.org>
parents: 2786
diff changeset
54 _SYNTAX_XHTML = "xhtml" # must be lower case
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
55 _SYNTAX_CURRENT = "@CURRENT@"
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
56
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
57 # TODO: check/adapt following list
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
58 # list initially based on the feedparser list (http://pythonhosted.org/feedparser/html-sanitization.html)
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
59 STYLES_WHITELIST = (
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
60 "azimuth",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
61 "background-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
62 "border-bottom-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
63 "border-collapse",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
64 "border-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
65 "border-left-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
66 "border-right-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
67 "border-top-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
68 "clear",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
69 "color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
70 "cursor",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
71 "direction",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
72 "display",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
73 "elevation",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
74 "float",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
75 "font",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
76 "font-family",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
77 "font-size",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
78 "font-style",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
79 "font-variant",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
80 "font-weight",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
81 "height",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
82 "letter-spacing",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
83 "line-height",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
84 "overflow",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
85 "pause",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
86 "pause-after",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
87 "pause-before",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
88 "pitch",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
89 "pitch-range",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
90 "richness",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
91 "speak",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
92 "speak-header",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
93 "speak-numeral",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
94 "speak-punctuation",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
95 "speech-rate",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
96 "stress",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
97 "text-align",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
98 "text-decoration",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
99 "text-indent",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
100 "unicode-bidi",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
101 "vertical-align",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
102 "voice-family",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
103 "volume",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
104 "white-space",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
105 "width",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
106 )
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
107
2786
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
108 # cf. https://www.w3.org/TR/html/syntax.html#void-elements
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
109 VOID_ELEMENTS = (
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
110 "area",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
111 "base",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
112 "br",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
113 "col",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
114 "embed",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
115 "hr",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
116 "img",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
117 "input",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
118 "keygen",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
119 "link",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
120 "menuitem",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
121 "meta",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
122 "param",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
123 "source",
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
124 "track",
4270
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
125 "wbr",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
126 )
2786
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
127
3693
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
128 SAFE_ATTRS = html.defs.safe_attrs.union({"style", "poster", "controls"}) - {"id"}
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
129 SAFE_CLASSES = {
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
130 # those classes are used for code highlighting
4270
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
131 "bp",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
132 "c",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
133 "ch",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
134 "cm",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
135 "cp",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
136 "cpf",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
137 "cs",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
138 "dl",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
139 "err",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
140 "fm",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
141 "gd",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
142 "ge",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
143 "get",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
144 "gh",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
145 "gi",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
146 "go",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
147 "gp",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
148 "gr",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
149 "gs",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
150 "gt",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
151 "gu",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
152 "highlight",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
153 "hll",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
154 "il",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
155 "k",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
156 "kc",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
157 "kd",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
158 "kn",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
159 "kp",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
160 "kr",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
161 "kt",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
162 "m",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
163 "mb",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
164 "mf",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
165 "mh",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
166 "mi",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
167 "mo",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
168 "na",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
169 "nb",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
170 "nc",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
171 "nd",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
172 "ne",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
173 "nf",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
174 "ni",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
175 "nl",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
176 "nn",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
177 "no",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
178 "nt",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
179 "nv",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
180 "o",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
181 "ow",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
182 "s",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
183 "sa",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
184 "sb",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
185 "sc",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
186 "sd",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
187 "se",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
188 "sh",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
189 "si",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
190 "sr",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
191 "ss",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
192 "sx",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
193 "vc",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
194 "vg",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
195 "vi",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
196 "vm",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
197 "w",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
198 "write",
3693
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
199 }
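As an illustration of how a class whitelist such as `SAFE_CLASSES` might be applied, here is a minimal sketch. The helper `filter_classes` is hypothetical, the set below is only a small subset of the real one, and the plugin's own cleaning code may work differently.

```python
# Illustrative subset of the whitelist; the real set is much longer.
SAFE_CLASSES = {"highlight", "k", "nf", "s"}


def filter_classes(class_attr: str, allowed=SAFE_CLASSES) -> str:
    """Return the class attribute with every non-whitelisted class removed."""
    return " ".join(name for name in class_attr.split() if name in allowed)
```

An empty result would suggest that the `class` attribute can be dropped from the element altogether.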
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
200 STYLES_VALUES_REGEX = (
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
201 r"^("
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
202 + "|".join(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
203 [
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
204 "([a-z-]+)", # alphabetical names
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
205 "(#[0-9a-f]+)", # hex value
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
206 r"(\d+(\.\d+)? *(|%|em|ex|px|in|cm|mm|pt|pc))", # values with units (or not)
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
207 r"rgb\( *((\d+(\.\d+)?), *){2}(\d+(\.\d+)?) *\)", # rgb function
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
208 r"rgba\( *((\d+(\.\d+)?), *){3}(\d+(\.\d+)?) *\)", # rgba function
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
209 ]
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
210 )
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
211 + ") *(!important)?$"
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
212 ) # we accept "!important" at the end
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
213 STYLES_ACCEPTED_VALUE = re.compile(STYLES_VALUES_REGEX)
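A property whitelist and a value pattern like the ones above could be combined to sanitise a `style` attribute along these lines. This is a sketch under assumptions: `sanitize_style` is a hypothetical helper, the whitelist is a small subset, and the value pattern is a simplified stand-in that omits the `rgb()`/`rgba()` alternatives.

```python
import re

# Illustrative subset of the property whitelist.
STYLES_WHITELIST = ("color", "font-size", "text-align", "width")

# Simplified stand-in for the value pattern: keywords, hex colours,
# and numbers with an optional unit, optionally followed by "!important".
STYLES_ACCEPTED_VALUE = re.compile(
    r"^(([a-z-]+)|(#[0-9a-f]+)|(\d+(\.\d+)? *(%|em|ex|px|in|cm|mm|pt|pc)?))"
    r" *(!important)?$"
)


def sanitize_style(style: str) -> str:
    """Keep only whitelisted properties whose value matches the accepted pattern."""
    kept = []
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip().lower()
        if not sep or name not in STYLES_WHITELIST:
            continue  # malformed declaration or property not whitelisted
        if STYLES_ACCEPTED_VALUE.match(value) is None:
            continue  # value contains something outside the accepted forms
        kept.append(f"{name}: {value}")
    return "; ".join(kept)
```

Checking both the property name and the value means that a whitelisted property carrying a function call or URL is still rejected.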
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
214
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
215 PLUGIN_INFO = {
2145
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
216 C.PI_NAME: "Text syntaxes",
2780
85d3240a400f plugin text syntaxes: changed import name to TEXT_SYNTAX (better with underscore for autocompletion)
Goffi <goffi@goffi.org>
parents: 2771
diff changeset
217 C.PI_IMPORT_NAME: "TEXT_SYNTAXES",
2145
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
218 C.PI_TYPE: "MISC",
3726
33d75cd3c371 plugin XEP-0060, XEP-0163, XEP-0277, text syntaxes: make those plugins usable with components
Goffi <goffi@goffi.org>
parents: 3709
diff changeset
219 C.PI_MODES: C.PLUG_MODE_BOTH,
2145
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
220 C.PI_PROTOCOLS: [],
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
221 C.PI_DEPENDENCIES: [],
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
222 C.PI_MAIN: "TextSyntaxes",
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
223 C.PI_HANDLER: "no",
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
224 C.PI_DESCRIPTION: _(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
225 """Management of various text syntaxes (XHTML-IM, Markdown, etc)"""
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
226 ),
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
227 }
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
228
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
229
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
230 class TextSyntaxes(object):
4270
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
231 """Text conversion class
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
232 XHTML utf-8 is used as intermediate language for conversions
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
233 """
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
234
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
235 OPT_DEFAULT = "DEFAULT"
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
236 OPT_HIDDEN = "HIDDEN"
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
237 OPT_NO_THREAD = "NO_THREAD"
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
238 SYNTAX_XHTML = _SYNTAX_XHTML
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
239 SYNTAX_MARKDOWN = "markdown"
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
240 SYNTAX_TEXT = "text"
2869
148d30147890 plugin text syntaxes: fixed default syntax
Goffi <goffi@goffi.org>
parents: 2786
diff changeset
241 # default_syntax must be lower case
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
242 default_syntax = SYNTAX_XHTML
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
243
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
244 def __init__(self, host):
993
301b342c697a core: use of the new core.log module:
Goffi <goffi@goffi.org>
parents: 968
diff changeset
245 log.info(_("Text syntaxes plugin initialization"))
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
246 self.host = host
3620
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
247 self.syntaxes = {}
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
248
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
249 self.params = """
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
250 <params>
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
251 <individual>
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
252 <category name="%(category_name)s" label="%(category_label)s">
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
253 <param name="%(name)s" label="%(label)s" type="list" security="0">
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
254 %(options)s
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
255 </param>
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
256 </category>
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
257 </individual>
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
258 </params>
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
259 """
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
260
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
261 self.params_data = {
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
262 "category_name": CATEGORY,
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
263 "category_label": _(CATEGORY),
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
264 "name": NAME,
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
265 "label": _(NAME),
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
266 "syntaxes": self.syntaxes,
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
267 }
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
268
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
269 self.add_syntax(
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
270 self.SYNTAX_XHTML,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
271 lambda xhtml: defer.succeed(xhtml),
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
272 lambda xhtml: defer.succeed(xhtml),
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
273 TextSyntaxes.OPT_NO_THREAD,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
274 )
1826
d80ccf4bf201 plugin blog import dotclear: this plugin import Dotclear 2 backups
Goffi <goffi@goffi.org>
parents: 1811
diff changeset
275 # TODO: text => XHTML should wrap URLs in <a/> elements, as the frontends do
4074
26b7ed2817da refactoring: rename `sat_frontends` to `libervia.frontends`
Goffi <goffi@goffi.org>
parents: 4071
diff changeset
276 # it's probably best to move libervia.frontends.tools.strings to sat.tools.common or similar
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
277 self.add_syntax(
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
278 self.SYNTAX_TEXT,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
279 lambda text: escape(text),
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
280 lambda xhtml: self._remove_markups(xhtml),
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
281 [TextSyntaxes.OPT_HIDDEN],
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
282 )
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
283 try:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
284 import markdown, html2text
2873
e1207b8ad97c plugin text syntaxes: disable raw HTML parsing in mardown by default
Goffi <goffi@goffi.org>
parents: 2869
diff changeset
285 from markdown.extensions import Extension
e1207b8ad97c plugin text syntaxes: disable raw HTML parsing in mardown by default
Goffi <goffi@goffi.org>
parents: 2869
diff changeset
286
e1207b8ad97c plugin text syntaxes: disable raw HTML parsing in mardown by default
Goffi <goffi@goffi.org>
parents: 2869
diff changeset
287 # XXX: we disable raw HTML parsing by default, to avoid parsing errors
e1207b8ad97c plugin text syntaxes: disable raw HTML parsing in mardown by default
Goffi <goffi@goffi.org>
parents: 2869
diff changeset
288 # when the user is not familiar with Markdown and HTML
e1207b8ad97c plugin text syntaxes: disable raw HTML parsing in mardown by default
Goffi <goffi@goffi.org>
parents: 2869
diff changeset
289 class EscapeHTML(Extension):
4059
00dbc3370d35 plugin text syntaxes: fix `EscapeHTML` following massive snake_case renaming
Goffi <goffi@goffi.org>
parents: 4037
diff changeset
290 def extendMarkdown(self, md):
4270
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
291 md.preprocessors.deregister("html_block")
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
292 md.inlinePatterns.deregister("html")
841
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
293
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
294 def _html2text(html, baseurl=""):
841
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
295 h = html2text.HTML2Text(baseurl=baseurl)
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
296 h.body_width = 0 # do not wrap lines: wrapping breaks long URLs
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
297 return h.handle(html)
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
298
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
299 self.add_syntax(
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
300 self.SYNTAX_MARKDOWN,
4270
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
301 partial(
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
302 markdown.markdown,
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
303 extensions=[
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
304 EscapeHTML(),
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
305 "nl2br",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
306 "codehilite",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
307 "fenced_code",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
308 "sane_lists",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
309 "tables",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
310 ],
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
311 extension_configs={
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
312 "codehilite": {
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
313 "css_class": "highlight",
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
314 }
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
315 },
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
316 ),
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
317 _html2text,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
318 [TextSyntaxes.OPT_DEFAULT],
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
319 )
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
320 except ImportError:
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
321 log.warning("markdown or html2text not found, can't use Markdown syntax")
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
322 log.info(
3693
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
323 "You can download/install them from https://pythonhosted.org/Markdown/ "
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
324 "and https://github.com/Alir3z4/html2text/"
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
325 )
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
326 host.bridge.add_method(
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
327 "syntax_convert",
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
328 ".plugin",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
329 in_sign="sssbs",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
330 out_sign="s",
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
331 async_=True,
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
332 method=self.convert,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
333 )
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
334 host.bridge.add_method(
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
335 "syntax_get", ".plugin", in_sign="s", out_sign="s", method=self.get_syntax
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
336 )
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
337 if xml_tools.clean_xhtml is None:
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
338 log.debug("Installing cleaning method")
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
339 xml_tools.clean_xhtml = self.clean_xhtml
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
340
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
341 def _update_param_options(self):
3620
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
342 data_synt = self.syntaxes
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
343 default_synt = TextSyntaxes.default_syntax
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
344 syntaxes = []
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
345
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
346 for syntax in list(data_synt.keys()):
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
347 flags = data_synt[syntax]["flags"]
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
348 if TextSyntaxes.OPT_HIDDEN not in flags:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
349 syntaxes.append(syntax)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
350
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
351 syntaxes.sort(key=lambda synt: synt.lower())
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
352 options = []
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
353
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
354 for syntax in syntaxes:
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
355 selected = 'selected="true"' if syntax == default_synt else ""
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
356 options.append('<option value="%s" %s/>' % (syntax, selected))
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
357
3620
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
358 self.params_data["options"] = "\n".join(options)
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
359 self.host.memory.update_params(self.params % self.params_data)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
360
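The option-building logic of `_update_param_options` above can be tried in isolation. The following is a minimal sketch, not the plugin's code: `build_options` and the demo registry (including its syntax names and flag strings) are hypothetical stand-ins, and the call to `host.memory.update_params` is left out.

```python
def build_options(syntaxes: dict, default: str) -> str:
    """Build the <option/> lines for every syntax not flagged as hidden."""
    # keep only syntaxes that should be shown to the user
    visible = [
        name for name, data in syntaxes.items() if "HIDDEN" not in data["flags"]
    ]
    # case-insensitive sort, as in the method above
    visible.sort(key=str.lower)
    options = []
    for name in visible:
        selected = 'selected="true"' if name == default else ""
        options.append('<option value="%s" %s/>' % (name, selected))
    return "\n".join(options)


# Hypothetical registry, shaped like the mapping used in the method above
demo_syntaxes = {
    "XHTML": {"flags": ["NO_THREAD"]},
    "text": {"flags": ["HIDDEN"]},
    "markdown": {"flags": ["DEFAULT"]},
}
print(build_options(demo_syntaxes, "XHTML"))
```

The hidden syntax is filtered out before sorting, so it never reaches the parameter XML.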
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
361 def get_current_syntax(self, profile):
4270
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
362 """Return the selected syntax for the given profile
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
363
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
364 @param profile: %(doc_profile)s
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
365 @return: profile selected syntax
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
366 """
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
367 return self.host.memory.param_get_a(NAME, CATEGORY, profile_key=profile)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
368
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
369 def _log_error(self, failure, action="converting syntax"):
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
370 log.error(
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
371 "Error while {action}: {failure}".format(action=action, failure=failure)
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
372 )
2106
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
373 return failure
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
374
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
375 def clean_style(self, styles_raw: str) -> str:
4270
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
376 """Clean unsafe CSS styles
2781
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
377
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
378 Remove styles not in the whitelist, or where the value doesn't match the regex
3693
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
379 @param styles_raw: CSS styles
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
380 @return: cleaned styles
2781
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
381 """
3693
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
382 styles: List[str] = styles_raw.split(";")
2781
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
383 cleaned_styles = []
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
384 for style in styles:
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
385 try:
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
386 key, value = style.split(":")
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
387 except ValueError:
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
388 continue
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
389 key = key.lower().strip()
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
390 if key not in STYLES_WHITELIST:
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
391 continue
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
392 value = value.lower().strip()
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
393 if not STYLES_ACCEPTED_VALUE.match(value):
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
394 continue
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
395 if value == "none":
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
396 continue
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
397 cleaned_styles.append((key, value))
4270
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
398 return "; ".join(["%s: %s" % (key_, value_) for key_, value_ in cleaned_styles])
2781
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
399
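To illustrate the filtering done by `clean_style`, here is a self-contained sketch under stated assumptions: `DEMO_WHITELIST` and `DEMO_ACCEPTED_VALUE` are hypothetical and much narrower than the module's real `STYLES_WHITELIST` and `STYLES_ACCEPTED_VALUE`, which are defined earlier in the file.

```python
import re

# Hypothetical stand-ins for STYLES_WHITELIST / STYLES_ACCEPTED_VALUE
DEMO_WHITELIST = {"color", "font-weight", "text-decoration"}
DEMO_ACCEPTED_VALUE = re.compile(r"^([a-z-]+|#[0-9a-f]{3,6}) *(!important)?$")


def clean_style_demo(styles_raw: str) -> str:
    """Keep only whitelisted properties whose value matches the demo regex."""
    cleaned = []
    for style in styles_raw.split(";"):
        try:
            key, value = style.split(":")
        except ValueError:
            # empty chunk, or a value that itself contains ":" (e.g. a URL)
            continue
        key = key.lower().strip()
        value = value.lower().strip()
        if key not in DEMO_WHITELIST:
            continue
        if not DEMO_ACCEPTED_VALUE.match(value) or value == "none":
            continue
        cleaned.append("%s: %s" % (key, value))
    return "; ".join(cleaned)


print(clean_style_demo("color: RED; position: fixed; background: url(http://x)"))
```

Note that a declaration whose value contains a colon is discarded by the two-way unpacking, which errs on the side of safety.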
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
400 def clean_classes(self, classes_raw: str) -> str:
3693
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
401 """Remove any non-whitelisted class
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
402
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
403 @param classes_raw: classes set on an element
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
404 @return: remaining classes (can be empty string)
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
405 """
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
406 return " ".join(SAFE_CLASSES.intersection(classes_raw.split()))
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
407
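A small sketch of the set-intersection approach used by `clean_classes`, assuming `SAFE_CLASSES` is a set (it is defined earlier in the module; `DEMO_SAFE_CLASSES` below is a hypothetical replacement):

```python
# Hypothetical whitelist; the real SAFE_CLASSES is defined earlier in the module
DEMO_SAFE_CLASSES = {"highlight", "codehilite"}


def clean_classes_demo(classes_raw: str) -> str:
    """Return the whitelisted classes found in a raw ``class`` attribute."""
    return " ".join(DEMO_SAFE_CLASSES.intersection(classes_raw.split()))


result = clean_classes_demo("highlight evil-class codehilite")
# set iteration order is not guaranteed, so compare as sorted lists
print(sorted(result.split()))
```

Because the result comes from a set, the original order of the classes is not preserved and duplicates are collapsed; this is harmless for CSS class matching.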
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
408 def clean_xhtml(self, xhtml):
2781
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
409 """Clean XHTML text by removing potentially dangerous/malicious parts
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
410
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
411 @param xhtml(str, lxml.html.HtmlElement): raw HTML/XHTML text to clean
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
412 @return (str): cleaned XHTML
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
413 """
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
414
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
415 if isinstance(xhtml, str):
2786
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
416 try:
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
417 xhtml_elt = html.fromstring(xhtml)
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
418 except etree.ParserError as e:
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
419 if not xhtml.strip():
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
420 return ""
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
421 log.error("Can't clean XHTML: {xhtml}".format(xhtml=xhtml))
2786
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
422 raise e
2781
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
423 elif isinstance(xhtml, html.HtmlElement):
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
424 xhtml_elt = xhtml
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
425 else:
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
426 log.error("Only strings and HtmlElements can be cleaned")
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
427 raise exceptions.DataError
4270
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
428 cleaner = clean.Cleaner(style=False, add_nofollow=False, safe_attrs=SAFE_ATTRS)
2781
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
429 xhtml_elt = cleaner.clean_html(xhtml_elt)
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
430 for elt in xhtml_elt.xpath("//*[@style]"):
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
431 elt.set("style", self.clean_style(elt.get("style")))
3693
0bbdc50aa405 plugin text syntaxes: remove `id` attributes and whitelist allowed classes:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
432 for elt in xhtml_elt.xpath("//*[@class]"):
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
433 elt.set("class", self.clean_classes(elt.get("class")))
2786
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
434 # we avoid self-closing tags for non-void elements
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
435 for element in xhtml_elt.iter(tag=etree.Element):
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
436 if not element.text:
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
437 if element.tag in VOID_ELEMENTS:
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
438 element.text = None
be8405795e09 plugin text syntaxes: handle empty content in cleanXHTML + don't use self-closing tags for non-void elements.
Goffi <goffi@goffi.org>
parents: 2782
diff changeset
439 else:
4270
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
440 element.text = ""
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
441 return html.tostring(xhtml_elt, encoding=str, method="xml")
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
442
4270
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
443 def convert(
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
444 self, text, syntax_from, syntax_to=_SYNTAX_XHTML, safe=True, profile=None
0d7bb4df2343 Reformatted code base using black.
Goffi <goffi@goffi.org>
parents: 4258
diff changeset
445 ):
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
446 """Convert a text between two syntaxes
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
447
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
448 @param text: text to convert
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
449 @param syntax_from: source syntax (e.g. "markdown")
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
450 @param syntax_to: dest syntax (e.g. "XHTML")
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
451 @param safe: clean resulting XHTML to avoid malicious code if True
2781
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
452 @param profile: needed only when syntax_from or syntax_to is set to
816be0a23877 plugin text syntaxes: cleanStyle is an independent method, cleanXHTML is now blocking (no need to launch thread for that)
Goffi <goffi@goffi.org>
parents: 2780
diff changeset
453 _SYNTAX_CURRENT
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
454 @return (Deferred[str]): Deferred firing with the converted text
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
455 """
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
456 # FIXME: convert should be able to handle domish.Element directly
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
457 # when dealing with XHTML
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
458 # TODO: a way for parser to return parsing errors/warnings
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
459
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
460 if syntax_from == _SYNTAX_CURRENT:
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
461 syntax_from = self.get_current_syntax(profile)
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
462 else:
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
463 syntax_from = syntax_from.lower().strip()
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
464 if syntax_to == _SYNTAX_CURRENT:
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
465 syntax_to = self.get_current_syntax(profile)
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
466 else:
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
467 syntax_to = syntax_to.lower().strip()
3620
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
468 syntaxes = self.syntaxes
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
469 if syntax_from not in syntaxes:
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
470 raise exceptions.NotFound(syntax_from)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
471 if syntax_to not in syntaxes:
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
472 raise exceptions.NotFound(syntax_to)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
473 d = None
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
474
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
475 if TextSyntaxes.OPT_NO_THREAD in syntaxes[syntax_from]["flags"]:
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
476 d = defer.maybeDeferred(syntaxes[syntax_from]["to"], text)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
477 else:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
478 d = deferToThread(syntaxes[syntax_from]["to"], text)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
479
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
480 # TODO: keep only body element and change it to a div here ?
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
481
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
482 if safe:
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
483 d.addCallback(self.clean_xhtml)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
484
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
485 if TextSyntaxes.OPT_NO_THREAD in syntaxes[syntax_to]["flags"]:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
486 d.addCallback(syntaxes[syntax_to]["from"])
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
487 else:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
488 d.addCallback(lambda xhtml: deferToThread(syntaxes[syntax_to]["from"], xhtml))
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
489
836
2cc0201b4613 plugin text_syntaxes: rstrip the conversion result to avoid new lines systematically added by converters (e.g. html2text do this)
souliane <souliane@mailoo.org>
parents: 832
diff changeset
490 # converters can add new lines that disturb the microblog change detection
2cc0201b4613 plugin text_syntaxes: rstrip the conversion result to avoid new lines systematically added by converters (e.g. html2text do this)
souliane <souliane@mailoo.org>
parents: 832
diff changeset
491 d.addCallback(lambda text: text.rstrip())
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
492 return d
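The flow of `convert` above can be sketched without Twisted: normalise both syntax names, check that they are registered, convert the source text to XHTML, optionally clean it, convert from XHTML to the target syntax, then strip trailing whitespace. This is only a synchronous illustration under stated assumptions: `convert_sync`, `SYNTAXES`, the toy `plain` syntax and the `cleaner` parameter are hypothetical names, `KeyError` stands in for `exceptions.NotFound`, and the real method returns a Deferred and may run the callbacks in a thread.

```python
from html import escape

# Hypothetical registry: each entry maps a syntax key to a pair of callbacks,
# mirroring the "to"/"from" layout used by the plugin.
SYNTAXES = {
    "xhtml": {"to": lambda text: text, "from": lambda xhtml: xhtml},
    "plain": {
        # toy converter: wrap escaped text in a paragraph, with a trailing newline
        "to": lambda text: "<p>{}</p>\n".format(escape(text.strip())),
        # toy reverse converter (identity), not exercised below
        "from": lambda xhtml: xhtml,
    },
}


def convert_sync(text, syntax_from, syntax_to="xhtml", safe=True, cleaner=None):
    """Synchronous stand-in for the Deferred-based conversion chain."""
    # syntax names are matched case-insensitively, ignoring surrounding spaces
    syntax_from = syntax_from.lower().strip()
    syntax_to = syntax_to.lower().strip()
    for key in (syntax_from, syntax_to):
        if key not in SYNTAXES:
            raise KeyError(key)  # the plugin raises exceptions.NotFound here

    # step 1: source syntax -> XHTML (XHTML is the pivot format)
    xhtml = SYNTAXES[syntax_from]["to"](text)
    # step 2: optional cleaning of the intermediate XHTML
    if safe and cleaner is not None:
        xhtml = cleaner(xhtml)
    # step 3: XHTML -> destination syntax
    result = SYNTAXES[syntax_to]["from"](xhtml)
    # step 4: drop trailing newlines that converters may append
    return result.rstrip()


print(convert_sync("a < b", " Plain "))
```

Using XHTML as the pivot means each syntax only needs two callbacks, one in each direction, rather than one converter per pair of syntaxes.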
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
493
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
494 def add_syntax(self, name, to_xhtml_cb, from_xhtml_cb, flags=None):
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
495 """Add a new syntax to the manager
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
496
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
497 @param name: unique name of the syntax
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
498 @param to_xhtml_cb: callback to convert from syntax to XHTML
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
499 @param from_xhtml_cb: callback to convert from XHTML to syntax
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
500 @param flags: set of optional flags, can be:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
501 TextSyntaxes.OPT_DEFAULT: use as the default syntax (replace former one)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
502 TextSyntaxes.OPT_HIDDEN: do not show in parameters
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
503 TextSyntaxes.OPT_NO_THREAD: do not defer to thread when converting (the callback may then return a deferred)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
504 """
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
505 flags = flags if flags is not None else []
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
506 if TextSyntaxes.OPT_HIDDEN in flags and TextSyntaxes.OPT_DEFAULT in flags:
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
507 raise ValueError(
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
508 "{} and {} are mutually exclusive".format(
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
509 TextSyntaxes.OPT_HIDDEN, TextSyntaxes.OPT_DEFAULT
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
510 )
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
511 )
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
512
3620
f568f304c982 plugin text syntaxes: remove side effect on init:
Goffi <goffi@goffi.org>
parents: 3479
diff changeset
513 syntaxes = self.syntaxes
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
514 key = name.lower().strip()
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
515 if key in syntaxes:
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
516 raise exceptions.ConflictError(
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
517 "This syntax key already exists: {}".format(key)
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
518 )
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
519 syntaxes[key] = {
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
520 "name": name,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
521 "to": to_xhtml_cb,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
522 "from": from_xhtml_cb,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
523 "flags": flags,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
524 }
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
525 if TextSyntaxes.OPT_DEFAULT in flags:
2869
148d30147890 plugin text syntaxes: fixed default syntax
Goffi <goffi@goffi.org>
parents: 2786
diff changeset
526 TextSyntaxes.default_syntax = key
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
527
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
528 self._update_param_options()
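The registration rules in `add_syntax` can be illustrated with a small stand-alone registry. This is a sketch, not the plugin's code: `SyntaxRegistry` is a hypothetical class, the flag values are invented for the example, `KeyError` stands in for `exceptions.ConflictError`, the default is stored on the instance rather than on the class, and the call to `_update_param_options()` is left out.

```python
# Hypothetical flag values; the real constants live on TextSyntaxes.
OPT_DEFAULT = "DEFAULT"
OPT_HIDDEN = "HIDDEN"


class SyntaxRegistry:
    """Minimal registry keyed by the normalised syntax name."""

    def __init__(self):
        self.syntaxes = {}
        self.default_syntax = None

    def add_syntax(self, name, to_xhtml_cb, from_xhtml_cb, flags=None):
        flags = flags if flags is not None else []
        # a hidden syntax can't be the default one shown in parameters
        if OPT_HIDDEN in flags and OPT_DEFAULT in flags:
            raise ValueError(
                "{} and {} are mutually exclusive".format(OPT_HIDDEN, OPT_DEFAULT)
            )
        # keys are lower-cased and stripped, the display name is kept as given
        key = name.lower().strip()
        if key in self.syntaxes:
            raise KeyError("This syntax key already exists: {}".format(key))
        self.syntaxes[key] = {
            "name": name,
            "to": to_xhtml_cb,
            "from": from_xhtml_cb,
            "flags": flags,
        }
        if OPT_DEFAULT in flags:
            # a later default registration replaces the former one
            self.default_syntax = key


registry = SyntaxRegistry()
registry.add_syntax("Markdown", str, str, [OPT_DEFAULT])
print(registry.default_syntax)
```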
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
529
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
530 def get_syntax(self, name):
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
531 """Get the syntax key corresponding to a name
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
532
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
533 @raise exceptions.NotFound: syntax doesn't exist
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
534 """
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
535 key = name.lower().strip()
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
536 if key in self.syntaxes:
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
537 return key
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
538 raise exceptions.NotFound
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
539
4037
524856bd7b19 massive refactoring to switch from camelCase to snake_case:
Goffi <goffi@goffi.org>
parents: 3726
diff changeset
540 def _remove_markups(self, xhtml):
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
541 """Remove XHTML markups from the given string.
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
542
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
543 @param xhtml: the XHTML string to be cleaned
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
544 @return: the cleaned string
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
545 """
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
546 cleaner = clean.Cleaner(kill_tags=["style"])
4258
ba28ca268f4a plugin text syntaxes: fix empty string handling in `_remove_markups`.
Goffi <goffi@goffi.org>
parents: 4074
diff changeset
547 if not xhtml:
ba28ca268f4a plugin text syntaxes: fix empty string handling in `_remove_markups`.
Goffi <goffi@goffi.org>
parents: 4074
diff changeset
548 return ""
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
549 cleaned = cleaner.clean_html(html.fromstring(xhtml))
3028
ab2696e34d29 Python 3 port:
Goffi <goffi@goffi.org>
parents: 2878
diff changeset
550 return html.tostring(cleaned, encoding=str, method="text")
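The idea behind `_remove_markups` (drop tags and `<style>` content, keep the text, return an empty string for empty input) can be approximated with the standard library alone. The sketch below is an assumption-laden stand-in: it uses `html.parser.HTMLParser` instead of lxml's `Cleaner`, `remove_markups` and `_TextExtractor` are hypothetical names, and it does not reproduce the other sanitising that lxml's cleaner performs.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <style> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._style_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._style_depth += 1

    def handle_endtag(self, tag):
        if tag == "style" and self._style_depth:
            self._style_depth -= 1

    def handle_data(self, data):
        # text inside <style> is CSS, not content, so it is ignored
        if not self._style_depth:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def remove_markups(xhtml):
    """Return the text content of an (X)HTML string, without markup."""
    if not xhtml:
        # same shortcut as the plugin: nothing to parse for empty input
        return ""
    parser = _TextExtractor()
    parser.feed(xhtml)
    parser.close()
    return parser.text()


print(remove_markups("<p>Hello <b>world</b></p><style>p{color:red}</style>"))
```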