annotate sat/plugins/plugin_misc_text_syntaxes.py @ 2766:93a421de0e3d

tools (common/data_objects): metadata parsing in BlogItems: metadata are parsed to deserialise some well known values like rsm index and count, and some properties have been added to have easier access. A "complete" property (or item in metadata) is set to True if we are on the last page, False if we are not, or None if we don't have enough data to know if we are on the last page or not.
author Goffi <goffi@goffi.org>
date Fri, 11 Jan 2019 16:35:13 +0100
parents 56f94936df1e
children 003b8b4b56a7
rev   line source
1934
2daf7b4c6756 use of /usr/bin/env instead of /usr/bin/python in shebang
Goffi <goffi@goffi.org>
parents: 1867
diff changeset
1 #!/usr/bin/env python2
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
2 # -*- coding: utf-8 -*-
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
3
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
4 # SAT plugin for managing various text syntaxes
2483
0046283a285d dates update
Goffi <goffi@goffi.org>
parents: 2414
diff changeset
5 # Copyright (C) 2009-2018 Jérôme Poisson (goffi@goffi.org)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
6
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
7 # This program is free software: you can redistribute it and/or modify
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
8 # it under the terms of the GNU Affero General Public License as published by
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
9 # the Free Software Foundation, either version 3 of the License, or
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
10 # (at your option) any later version.
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
11
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
12 # This program is distributed in the hope that it will be useful,
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
13 # but WITHOUT ANY WARRANTY; without even the implied warranty of
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
14 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
15 # GNU Affero General Public License for more details.
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
16
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
17 # You should have received a copy of the GNU Affero General Public License
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
18 # along with this program. If not, see <http://www.gnu.org/licenses/>.
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
19
771
bfabeedbf32e core: i18n refactoring:
Goffi <goffi@goffi.org>
parents: 744
diff changeset
20 from sat.core.i18n import _, D_
2145
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
21 from sat.core.constants import Const as C
993
301b342c697a core: use of the new core.log module:
Goffi <goffi@goffi.org>
parents: 968
diff changeset
22 from sat.core.log import getLogger
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
23
993
301b342c697a core: use of the new core.log module:
Goffi <goffi@goffi.org>
parents: 968
diff changeset
24 log = getLogger(__name__)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
25
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
26 from twisted.internet import defer
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
27 from twisted.internet.threads import deferToThread
705
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
28 from sat.core import exceptions
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
29
1542
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
30 try:
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
31 from lxml import html
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
32 from lxml.html import clean
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
33 except ImportError:
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
34 raise exceptions.MissingModule(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
35 u"Missing module lxml, please download/install it from http://lxml.de/"
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
36 )
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
37 from cgi import escape
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
38 import re
674
fb0b1100c908 plugin text_syntaxes: fixed clean_xhml (it now return XHTML instead of HTML)
Goffi <goffi@goffi.org>
parents: 665
diff changeset
39
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
40
771
bfabeedbf32e core: i18n refactoring:
Goffi <goffi@goffi.org>
parents: 744
diff changeset
41 CATEGORY = D_("Composition")
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
42 NAME = "Syntax"
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
43 _SYNTAX_XHTML = "XHTML"
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
44 _SYNTAX_CURRENT = "@CURRENT@"
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
45
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
46 # TODO: check/adapt following list
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
47 # list initially based on feedparser list (http://pythonhosted.org/feedparser/html-sanitization.html)
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
48 STYLES_WHITELIST = (
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
49 "azimuth",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
50 "background-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
51 "border-bottom-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
52 "border-collapse",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
53 "border-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
54 "border-left-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
55 "border-right-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
56 "border-top-color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
57 "clear",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
58 "color",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
59 "cursor",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
60 "direction",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
61 "display",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
62 "elevation",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
63 "float",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
64 "font",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
65 "font-family",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
66 "font-size",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
67 "font-style",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
68 "font-variant",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
69 "font-weight",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
70 "height",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
71 "letter-spacing",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
72 "line-height",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
73 "overflow",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
74 "pause",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
75 "pause-after",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
76 "pause-before",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
77 "pitch",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
78 "pitch-range",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
79 "richness",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
80 "speak",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
81 "speak-header",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
82 "speak-numeral",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
83 "speak-punctuation",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
84 "speech-rate",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
85 "stress",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
86 "text-align",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
87 "text-decoration",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
88 "text-indent",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
89 "unicode-bidi",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
90 "vertical-align",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
91 "voice-family",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
92 "volume",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
93 "white-space",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
94 "width",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
95 )
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
96
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
97 SAFE_ATTRS = html.defs.safe_attrs.union(("style", "poster", "controls"))
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
98 STYLES_VALUES_REGEX = (
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
99 r"^("
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
100 + "|".join(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
101 [
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
102 "([a-z-]+)", # alphabetical names
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
103 "(#[0-9a-f]+)", # hex value
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
104 "(\d+(\.\d+)? *(|%|em|ex|px|in|cm|mm|pt|pc))", # values with units (or not)
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
105 "rgb\( *((\d+(\.\d+)?), *){2}(\d+(\.\d+)?) *\)", # rgb function
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
106 "rgba\( *((\d+(\.\d+)?), *){3}(\d+(\.\d+)?) *\)", # rgba function
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
107 ]
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
108 )
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
109 + ") *(!important)?$"
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
110 ) # we accept "!important" at the end
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
111 STYLES_ACCEPTED_VALUE = re.compile(STYLES_VALUES_REGEX)
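The style value filter defined above can be tried in isolation. The following is a minimal standalone sketch, not part of the plugin: it rebuilds the same alternatives with raw strings and with the decimal point written as an escaped `\.`, so that only a literal dot is accepted between digits. The helper name `is_accepted` is hypothetical, and the sketch targets Python 3 whereas the annotated file is Python 2.

```python
import re

# Same alternatives as in the listing, written as raw strings.
# The decimal separator is escaped ("\.") so it matches only a literal dot.
STYLES_VALUES_REGEX = (
    r"^("
    + "|".join(
        [
            r"([a-z-]+)",  # alphabetical names
            r"(#[0-9a-f]+)",  # hex value
            r"(\d+(\.\d+)? *(|%|em|ex|px|in|cm|mm|pt|pc))",  # values with units (or not)
            r"rgb\( *((\d+(\.\d+)?), *){2}(\d+(\.\d+)?) *\)",  # rgb function
            r"rgba\( *((\d+(\.\d+)?), *){3}(\d+(\.\d+)?) *\)",  # rgba function
        ]
    )
    + r") *(!important)?$"  # "!important" is accepted at the end
)
STYLES_ACCEPTED_VALUE = re.compile(STYLES_VALUES_REGEX)


def is_accepted(value):
    """Return True if a CSS value passes the whitelist pattern (hypothetical helper)."""
    return STYLES_ACCEPTED_VALUE.match(value) is not None
```

Under these assumptions, values such as `red`, `#ff0000`, `1.5em` or `rgba(0, 0, 0, 0.5)` pass, while anything containing a function call other than `rgb`/`rgba` (for example `url(...)`) is rejected because the whole value must match one alternative.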
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
112
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
113 PLUGIN_INFO = {
2145
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
114 C.PI_NAME: "Text syntaxes",
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
115 C.PI_IMPORT_NAME: "TEXT-SYNTAXES",
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
116 C.PI_TYPE: "MISC",
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
117 C.PI_PROTOCOLS: [],
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
118 C.PI_DEPENDENCIES: [],
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
119 C.PI_MAIN: "TextSyntaxes",
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
120 C.PI_HANDLER: "no",
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
121 C.PI_DESCRIPTION: _(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
122 """Management of various text syntaxes (XHTML-IM, Markdown, etc)"""
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
123 ),
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
124 }
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
125
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
126
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
127 class TextSyntaxes(object):
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
128 """ Text conversion class
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
129 XHTML utf-8 is used as intermediate language for conversions
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
130 """
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
131
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
132 OPT_DEFAULT = "DEFAULT"
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
133 OPT_HIDDEN = "HIDDEN"
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
134 OPT_NO_THREAD = "NO_THREAD"
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
135 SYNTAX_XHTML = _SYNTAX_XHTML
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
136 SYNTAX_MARKDOWN = "markdown"
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
137 SYNTAX_TEXT = "text"
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
138 syntaxes = {}
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
139 default_syntax = SYNTAX_XHTML
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
140
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
141 params = """
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
142 <params>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
143 <individual>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
144 <category name="%(category_name)s" label="%(category_label)s">
968
75f3b3b430ff tools, frontends, memory: param definition and XMLUI handle multi-selection for list widgets:
souliane <souliane@mailoo.org>
parents: 852
diff changeset
145 <param name="%(name)s" label="%(label)s" type="list" security="0">
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
146 %(options)s
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
147 </param>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
148 </category>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
149 </individual>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
150 </params>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
151 """
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
152
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
153 params_data = {
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
154 "category_name": CATEGORY,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
155 "category_label": _(CATEGORY),
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
156 "name": NAME,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
157 "label": _(NAME),
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
158 "syntaxes": syntaxes,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
159 }
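To illustrate how a `%(key)s` template such as `params` is filled from a dictionary like `params_data`, here is a small standalone sketch in Python 3. It is not the plugin's code: the function name `build_params_xml`, the literal category and label values, and in particular the shape of the `<option/>` element are assumptions made for the example.

```python
from xml.sax.saxutils import quoteattr

# Same structure as the "params" template in the listing.
PARAMS_TEMPLATE = """
<params>
<individual>
<category name="%(category_name)s" label="%(category_label)s">
    <param name="%(name)s" label="%(label)s" type="list" security="0">
    %(options)s
    </param>
</category>
</individual>
</params>
"""


def build_params_xml(syntax_names, default_syntax):
    """Fill the template with one option per syntax (hypothetical helper)."""
    options = []
    # Case-insensitive sort, so "markdown" comes before "XHTML".
    for syntax in sorted(syntax_names, key=lambda s: s.lower()):
        selected = ' selected="true"' if syntax == default_syntax else ""
        # ASSUMPTION: the real option element may look different.
        options.append("<option value=%s%s/>" % (quoteattr(syntax), selected))
    data = {
        "category_name": "Composition",
        "category_label": "Composition",  # untranslated here; the plugin uses _()
        "name": "Syntax",
        "label": "Syntax",
        "options": "\n    ".join(options),
    }
    # Dictionary-based "%" formatting replaces each %(key)s placeholder.
    return PARAMS_TEMPLATE % data
```

Calling `build_params_xml(["XHTML", "markdown"], "markdown")` yields a `<params>` document whose option list is sorted and whose default syntax carries `selected="true"`.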
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
160
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
161 def __init__(self, host):
993
301b342c697a core: use of the new core.log module:
Goffi <goffi@goffi.org>
parents: 968
diff changeset
162 log.info(_("Text syntaxes plugin initialization"))
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
163 self.host = host
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
164 self.addSyntax(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
165 self.SYNTAX_XHTML,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
166 lambda xhtml: defer.succeed(xhtml),
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
167 lambda xhtml: defer.succeed(xhtml),
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
168 TextSyntaxes.OPT_NO_THREAD,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
169 )
1826
d80ccf4bf201 plugin blog import dotclear: this plugin import Dotclear 2 backups
Goffi <goffi@goffi.org>
parents: 1811
diff changeset
170 # TODO: text => XHTML should add <a/> to url like in frontends
d80ccf4bf201 plugin blog import dotclear: this plugin import Dotclear 2 backups
Goffi <goffi@goffi.org>
parents: 1811
diff changeset
171 # it's probably best to move sat_frontends.tools.strings to sat.tools.common or similar
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
172 self.addSyntax(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
173 self.SYNTAX_TEXT,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
174 lambda text: escape(text),
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
175 lambda xhtml: self._removeMarkups(xhtml),
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
176 [TextSyntaxes.OPT_HIDDEN],
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
177 )
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
178 try:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
179 import markdown, html2text
841
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
180
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
181 def _html2text(html, baseurl=""):
841
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
182 h = html2text.HTML2Text(baseurl=baseurl)
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
183 h.body_width = 0 # do not truncate the lines, it breaks the long URLs
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
184 return h.handle(html)
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
185
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
186 self.addSyntax(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
187 self.SYNTAX_MARKDOWN,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
188 markdown.markdown,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
189 _html2text,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
190 [TextSyntaxes.OPT_DEFAULT],
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
191 )
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
192 except ImportError:
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
193 log.warning(u"markdown or html2text not found, can't use Markdown syntax")
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
194 log.info(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
195 u"You can download/install them from https://pythonhosted.org/Markdown/ and https://github.com/Alir3z4/html2text/"
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
196 )
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
197 host.bridge.addMethod(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
198 "syntaxConvert",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
199 ".plugin",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
200 in_sign="sssbs",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
201 out_sign="s",
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
202 async=True,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
203 method=self.convert,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
204 )
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
205 host.bridge.addMethod(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
206 "syntaxGet", ".plugin", in_sign="s", out_sign="s", method=self.getSyntax
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
207 )
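The registrations in `__init__` follow a pivot pattern: each syntax provides one converter to XHTML and one from XHTML, so any pair of syntaxes can be converted through XHTML. The sketch below shows that idea in plain synchronous Python 3, without Twisted deferreds or threads. The function names `add_syntax`, `convert` and `remove_markups`, the `"to"`/`"from"` dictionary keys, and the naive tag stripper are assumptions for illustration; only the `"flags"` key and the option constants appear in the listing.

```python
import re
from html import escape, unescape

OPT_DEFAULT = "DEFAULT"
OPT_HIDDEN = "HIDDEN"
SYNTAX_XHTML = "XHTML"
SYNTAX_TEXT = "text"

# name -> {"to": callable, "from": callable, "flags": list}
# ASSUMPTION: the "to"/"from" key names are made up for this sketch.
syntaxes = {}


def add_syntax(name, to_xhtml_cb, from_xhtml_cb, flags=None):
    """Register a syntax with its two converters (hypothetical helper)."""
    syntaxes[name] = {"to": to_xhtml_cb, "from": from_xhtml_cb, "flags": flags or []}


def remove_markups(xhtml):
    """Naive tag stripper, for illustration only (not a sanitizer)."""
    return unescape(re.sub(r"<[^>]+>", "", xhtml))


def convert(text, syntax_from, syntax_to):
    """Convert between two registered syntaxes, using XHTML as the pivot."""
    xhtml = syntaxes[syntax_from]["to"](text)
    return syntaxes[syntax_to]["from"](xhtml)


# XHTML is the pivot itself, so both converters are the identity.
add_syntax(SYNTAX_XHTML, lambda xhtml: xhtml, lambda xhtml: xhtml)
# Plain text is escaped on the way in and stripped of tags on the way out.
add_syntax(SYNTAX_TEXT, escape, remove_markups, [OPT_HIDDEN])
```

With this layout, adding a new syntax needs only two converters rather than one per existing syntax, which is presumably why the class docstring names XHTML as the intermediate language.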
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
208
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
209 def _updateParamOptions(self):
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
210 data_synt = TextSyntaxes.syntaxes
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
211 default_synt = TextSyntaxes.default_syntax
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
212 syntaxes = []
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
213
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
214 for syntax in data_synt.keys():
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
215 flags = data_synt[syntax]["flags"]
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
216 if TextSyntaxes.OPT_HIDDEN not in flags:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
217 syntaxes.append(syntax)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
218
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
219 syntaxes.sort(key=lambda synt: synt.lower())
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
220 options = []
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
221
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
222 for syntax in syntaxes:
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
223 selected = 'selected="true"' if syntax == default_synt else ""
968
75f3b3b430ff tools, frontends, memory: param definition and XMLUI handle multi-selection for list widgets:
souliane <souliane@mailoo.org>
parents: 852
diff changeset
224 options.append(u'<option value="%s" %s/>' % (syntax, selected))
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
225
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
226 TextSyntaxes.params_data["options"] = u"\n".join(options)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
227 self.host.memory.updateParams(TextSyntaxes.params % TextSyntaxes.params_data)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
228
702
a25db3fe3959 plugin XEP-0071: rich messages management for sendMessage
Goffi <goffi@goffi.org>
parents: 699
diff changeset
229 def getCurrentSyntax(self, profile):
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
230 """ Return the selected syntax for the given profile
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
231
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
232 @param profile: %(doc_profile)s
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
233 @return: profile selected syntax
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
234 """
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
235 return self.host.memory.getParamA(NAME, CATEGORY, profile_key=profile)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
236
2106
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
237 def _logError(self, failure, action=u"converting syntax"):
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
238 log.error(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
239 u"Error while {action}: {failure}".format(action=action, failure=failure)
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
240 )
2106
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
241 return failure
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
242
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
243 def cleanXHTML(self, xhtml):
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
244 """ Clean XHTML text by removing potentially dangerous/malicious parts
705
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
245 @param xhtml: raw xhtml text to clean (or lxml's HtmlElement)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
246 """
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
247
674
fb0b1100c908 plugin text_syntaxes: fixed clean_xhml (it now return XHTML instead of HTML)
Goffi <goffi@goffi.org>
parents: 665
diff changeset
248 def blocking_cleaning(xhtml):
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
249 """ Clean XHTML and style attributes """
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
250
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
251 def clean_style(styles_raw):
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
252 """ Remove styles not in the whitelist,
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
253 or where the value doesn't match the regex """
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
254 styles = styles_raw.split(";")
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
255 cleaned_styles = []
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
256 for style in styles:
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
257 try:
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
258 key, value = style.split(":")
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
259 except ValueError:
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
260 continue
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
261 key = key.lower().strip()
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
262 if key not in STYLES_WHITELIST:
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
263 continue
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
264 value = value.lower().strip()
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
265 if not STYLES_ACCEPTED_VALUE.match(value):
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
266 continue
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
267 if value == "none":
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
268 continue
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
269 cleaned_styles.append((key, value))
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
270 return "; ".join(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
271 ["%s: %s" % (key_, value_) for key_, value_ in cleaned_styles]
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
272 )
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
273
705
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
274 if isinstance(xhtml, basestring):
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
275 xhtml_elt = html.fromstring(xhtml)
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
276 elif isinstance(xhtml, html.HtmlElement):
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
277 xhtml_elt = xhtml
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
278 else:
993
301b342c697a core: use of the new core.log module:
Goffi <goffi@goffi.org>
parents: 968
diff changeset
279 log.error("Only strings and HtmlElements can be cleaned")
705
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
280 raise exceptions.DataError
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
281 cleaner = clean.Cleaner(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
282 style=False, add_nofollow=False, safe_attrs=SAFE_ATTRS
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
283 )
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
284 xhtml_elt = cleaner.clean_html(xhtml_elt)
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
285 for elt in xhtml_elt.xpath("//*[@style]"):
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
286 elt.set("style", clean_style(elt.get("style")))
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
287 return html.tostring(xhtml_elt, encoding=unicode, method="xml")
674
fb0b1100c908 plugin text_syntaxes: fixed clean_xhml (it now return XHTML instead of HTML)
Goffi <goffi@goffi.org>
parents: 665
diff changeset
288
2106
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
289 d = deferToThread(blocking_cleaning, xhtml)
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
290 d.addErrback(self._logError, action=u"cleaning syntax")
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
291 return d
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
292
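The style sanitisation done by clean_style above can be tried in isolation. The sketch below is a minimal standalone version written for Python 3; STYLES_WHITELIST and STYLES_ACCEPTED_VALUE are defined elsewhere in the plugin and are not shown in this part of the file, so the values used here are assumptions for illustration only.

```python
import re

# Assumed stand-ins: the real constants live elsewhere in the plugin.
STYLES_WHITELIST = {"color", "font-size", "font-weight", "text-align"}
STYLES_ACCEPTED_VALUE = re.compile(r"^[a-z0-9#%. -]+$")


def clean_style(styles_raw):
    """Keep only whitelisted declarations whose value matches the regex."""
    cleaned = []
    for style in styles_raw.split(";"):
        try:
            # a declaration with no ":" or more than one ":" is dropped
            key, value = style.split(":")
        except ValueError:
            continue
        key = key.lower().strip()
        value = value.lower().strip()
        if key not in STYLES_WHITELIST:
            continue
        if not STYLES_ACCEPTED_VALUE.match(value):
            continue
        if value == "none":
            continue
        cleaned.append("%s: %s" % (key, value))
    return "; ".join(cleaned)


raw = "Color: RED; position: absolute; background: url(javascript:alert(1))"
print(clean_style(raw))
```

Splitting on ":" and requiring exactly two parts is a blunt filter: any value that itself contains a colon (such as a URL) is rejected outright, which errs on the side of safety.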
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
293 def convert(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
294 self, text, syntax_from, syntax_to=_SYNTAX_XHTML, safe=True, profile=None
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
295 ):
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
296 """Convert a text between two syntaxes
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
297
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
298 @param text: text to convert
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
299 @param syntax_from: source syntax (e.g. "markdown")
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
300 @param syntax_to: destination syntax (e.g. "XHTML")
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
301 @param safe: clean resulting XHTML to avoid malicious code if True
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
302 @param profile: needed only when syntax_from or syntax_to is set to _SYNTAX_CURRENT
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
303 @return (D(unicode)): Deferred firing with the converted text
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
304 """
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
305 # FIXME: convert should be able to handle domish.Element directly
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
306 # when dealing with XHTML
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
307 # TODO: a way for parser to return parsing errors/warnings
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
308
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
309 if syntax_from == _SYNTAX_CURRENT:
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
310 syntax_from = self.getCurrentSyntax(profile)
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
311 else:
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
312 syntax_from = syntax_from.lower().strip()
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
313 if syntax_to == _SYNTAX_CURRENT:
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
314 syntax_to = self.getCurrentSyntax(profile)
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
315 else:
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
316 syntax_to = syntax_to.lower().strip()
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
317 syntaxes = TextSyntaxes.syntaxes
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
318 if syntax_from not in syntaxes:
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
319 raise exceptions.NotFound(syntax_from)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
320 if syntax_to not in syntaxes:
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
321 raise exceptions.NotFound(syntax_to)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
322 d = None
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
323
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
324 if TextSyntaxes.OPT_NO_THREAD in syntaxes[syntax_from]["flags"]:
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
325 d = defer.maybeDeferred(syntaxes[syntax_from]["to"], text)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
326 else:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
327 d = deferToThread(syntaxes[syntax_from]["to"], text)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
328
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
329 # TODO: keep only body element and change it to a div here ?
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
330
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
331 if safe:
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
332 d.addCallback(self.cleanXHTML)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
333
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
334 if TextSyntaxes.OPT_NO_THREAD in syntaxes[syntax_to]["flags"]:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
335 d.addCallback(syntaxes[syntax_to]["from"])
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
336 else:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
337 d.addCallback(lambda xhtml: deferToThread(syntaxes[syntax_to]["from"], xhtml))
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
338
836
2cc0201b4613 plugin text_syntaxes: rstrip the conversion result to avoid new lines systematically added by converters (e.g. html2text do this)
souliane <souliane@mailoo.org>
parents: 832
diff changeset
339 # converters can add new lines that disturb the microblog change detection
2cc0201b4613 plugin text_syntaxes: rstrip the conversion result to avoid new lines systematically added by converters (e.g. html2text do this)
souliane <souliane@mailoo.org>
parents: 832
diff changeset
340 d.addCallback(lambda text: text.rstrip())
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
341 return d
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
342
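convert always goes through XHTML as a pivot format: the source syntax's "to" callback produces XHTML, the result is optionally cleaned, then the destination syntax's "from" callback is applied and trailing whitespace is stripped. The sketch below shows that flow synchronously in Python 3, without Twisted Deferreds or threads. The SYNTAXES table, the toy "text" syntax and the cleaner argument are hypothetical names introduced for the example, not part of the plugin.

```python
import html
import re


def text_to_xhtml(text):
    # toy converter: escape the text and wrap it in a div
    return "<div>%s</div>" % html.escape(text)


def xhtml_to_text(xhtml):
    # toy converter: drop tags, then unescape entities
    return html.unescape(re.sub(r"<[^>]+>", "", xhtml))


# Hypothetical registry with the same "to"/"from" layout as in the plugin.
SYNTAXES = {
    "xhtml": {"to": lambda text: text, "from": lambda xhtml: xhtml},
    "text": {"to": text_to_xhtml, "from": xhtml_to_text},
}


def convert(text, syntax_from, syntax_to="xhtml", cleaner=None):
    """Convert text between two syntaxes, using XHTML as the pivot."""
    syntax_from = syntax_from.lower().strip()
    syntax_to = syntax_to.lower().strip()
    for key in (syntax_from, syntax_to):
        if key not in SYNTAXES:
            raise KeyError(key)
    xhtml = SYNTAXES[syntax_from]["to"](text)
    if cleaner is not None:
        xhtml = cleaner(xhtml)
    # converters may append newlines, so strip the end of the result
    return SYNTAXES[syntax_to]["from"](xhtml).rstrip()


print(convert("<p>hi</p>\n\n", "xhtml", "text"))
```

With a pivot format, each new syntax only needs two callbacks (to and from XHTML) instead of one converter per pair of syntaxes.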
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
343 def addSyntax(self, name, to_xhtml_cb, from_xhtml_cb, flags=None):
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
344 """Add a new syntax to the manager
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
345
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
346 @param name: unique name of the syntax
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
347 @param to_xhtml_cb: callback to convert from syntax to XHTML
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
348 @param from_xhtml_cb: callback to convert from XHTML to syntax
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
349 @param flags: set of optional flags, can be:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
350 TextSyntaxes.OPT_DEFAULT: use as the default syntax (replace former one)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
351 TextSyntaxes.OPT_HIDDEN: do not show in parameters
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
352 TextSyntaxes.OPT_NO_THREAD: do not defer to thread when converting (the callback may then return a deferred)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
353 """
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
354 flags = flags if flags is not None else []
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
355 if TextSyntaxes.OPT_HIDDEN in flags and TextSyntaxes.OPT_DEFAULT in flags:
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
356 raise ValueError(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
357 u"{} and {} are mutually exclusive".format(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
358 TextSyntaxes.OPT_HIDDEN, TextSyntaxes.OPT_DEFAULT
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
359 )
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
360 )
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
361
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
362 syntaxes = TextSyntaxes.syntaxes
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
363 key = name.lower().strip()
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
364 if key in syntaxes:
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
365 raise exceptions.ConflictError(
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
366 u"This syntax key already exists: {}".format(key)
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
367 )
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
368 syntaxes[key] = {
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
369 "name": name,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
370 "to": to_xhtml_cb,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
371 "from": from_xhtml_cb,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
372 "flags": flags,
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
373 }
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
374 if TextSyntaxes.OPT_DEFAULT in flags:
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
375 TextSyntaxes.default_syntax = key
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
376
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
377 self._updateParamOptions()
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
378
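The registration rules in addSyntax (normalised key, no duplicate keys, hidden and default flags mutually exclusive, default replaced by the latest default registration) can be illustrated with a small standalone registry. This is only a sketch in Python 3: the SyntaxRegistry class, its method names and the flag values are hypothetical, and KeyError stands in for the plugin's exceptions.ConflictError.

```python
OPT_DEFAULT = "DEFAULT"  # assumed flag values, for illustration only
OPT_HIDDEN = "HIDDEN"


class SyntaxRegistry:
    """Toy registry mirroring the checks done when adding a syntax."""

    def __init__(self):
        self.syntaxes = {}
        self.default_syntax = None

    def add_syntax(self, name, to_xhtml_cb, from_xhtml_cb, flags=None):
        flags = flags if flags is not None else []
        if OPT_HIDDEN in flags and OPT_DEFAULT in flags:
            raise ValueError("hidden and default flags are mutually exclusive")
        key = name.lower().strip()
        if key in self.syntaxes:
            raise KeyError("This syntax key already exists: {}".format(key))
        self.syntaxes[key] = {
            "name": name,
            "to": to_xhtml_cb,
            "from": from_xhtml_cb,
            "flags": flags,
        }
        if OPT_DEFAULT in flags:
            self.default_syntax = key
        return key

    def visible_syntaxes(self):
        # hidden syntaxes are not offered in the parameters
        return sorted(
            key for key, data in self.syntaxes.items()
            if OPT_HIDDEN not in data["flags"]
        )


registry = SyntaxRegistry()
registry.add_syntax("Markdown", str, str, [OPT_DEFAULT])
registry.add_syntax("Internal", str, str, [OPT_HIDDEN])
print(registry.visible_syntaxes(), registry.default_syntax)
```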
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
379 def getSyntax(self, name):
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
380 """Get the syntax key corresponding to a name
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
381
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
382 @raise exceptions.NotFound: syntax doesn't exist
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
383 """
2324
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
384 key = name.lower().strip()
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
385 if key in self.syntaxes:
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
386 return key
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
387 raise exceptions.NotFound(name)
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
388
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
389 def _removeMarkups(self, xhtml):
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
390 """Remove XHTML markups from the given string.
fe922e6fabd4 plugin text syntaxes: various improvments:
Goffi <goffi@goffi.org>
parents: 2145
diff changeset
391
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
392 @param xhtml: the XHTML string to be cleaned
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
393 @return: the cleaned string
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
394 """
2624
56f94936df1e code style reformatting using black
Goffi <goffi@goffi.org>
parents: 2562
diff changeset
395 cleaner = clean.Cleaner(kill_tags=["style"])
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
396 cleaned = cleaner.clean_html(html.fromstring(xhtml))
852
4cc55e05266d plugin text syntaxes: fixed cleaners encoding
Goffi <goffi@goffi.org>
parents: 841
diff changeset
397 return html.tostring(cleaned, encoding=unicode, method="text")