annotate src/plugins/plugin_misc_text_syntaxes.py @ 2182:087eec4c6c07

memory (persistent, sqlite): better private values handling + new LazyPersistentBinaryDict: - merged private values method handling in sqlite, and added a keys arguments to get only some keys - LazyPersistentBinaryDict allow to get values in database only when they are needed, saving memory for big data
author Goffi <goffi@goffi.org>
date Sun, 12 Mar 2017 19:33:17 +0100
parents 33c8c4973743
children fe922e6fabd4
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
1934
2daf7b4c6756 use of /usr/bin/env instead of /usr/bin/python in shebang
Goffi <goffi@goffi.org>
parents: 1867
diff changeset
1 #!/usr/bin/env python2
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
2 # -*- coding: utf-8 -*-
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
3
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
4 # SAT plugin for managing various text syntaxes
1766
d17772b0fe22 copyright update
Goffi <goffi@goffi.org>
parents: 1542
diff changeset
5 # Copyright (C) 2009-2016 Jérôme Poisson (goffi@goffi.org)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
6
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
7 # This program is free software: you can redistribute it and/or modify
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
8 # it under the terms of the GNU Affero General Public License as published by
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
9 # the Free Software Foundation, either version 3 of the License, or
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
10 # (at your option) any later version.
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
11
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
12 # This program is distributed in the hope that it will be useful,
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
13 # but WITHOUT ANY WARRANTY; without even the implied warranty of
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
14 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
15 # GNU Affero General Public License for more details.
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
16
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
17 # You should have received a copy of the GNU Affero General Public License
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
18 # along with this program. If not, see <http://www.gnu.org/licenses/>.
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
19
771
bfabeedbf32e core: i18n refactoring:
Goffi <goffi@goffi.org>
parents: 744
diff changeset
20 from sat.core.i18n import _, D_
2145
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
21 from sat.core.constants import Const as C
993
301b342c697a core: use of the new core.log module:
Goffi <goffi@goffi.org>
parents: 968
diff changeset
22 from sat.core.log import getLogger
301b342c697a core: use of the new core.log module:
Goffi <goffi@goffi.org>
parents: 968
diff changeset
23 log = getLogger(__name__)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
24
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
25 from twisted.internet import defer
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
26 from twisted.internet.threads import deferToThread
705
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
27 from sat.core import exceptions
1542
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
28 try:
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
29 from lxml import html
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
30 from lxml.html import clean
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
31 except ImportError:
94901070478e plugins: added new MissingModule exceptions to plugins using third party modules
Goffi <goffi@goffi.org>
parents: 1458
diff changeset
32 raise exceptions.MissingModule(u"Missing module lxml, please download/install it from http://lxml.de/")
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
33 from cgi import escape
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
34 import re
674
fb0b1100c908 plugin text_syntaxes: fixed clean_xhml (it now return XHTML instead of HTML)
Goffi <goffi@goffi.org>
parents: 665
diff changeset
35
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
36
771
bfabeedbf32e core: i18n refactoring:
Goffi <goffi@goffi.org>
parents: 744
diff changeset
37 CATEGORY = D_("Composition")
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
38 NAME = "Syntax"
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
39 _SYNTAX_XHTML = "XHTML"
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
40 _SYNTAX_CURRENT = "@CURRENT@"
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
41
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
42 # TODO: check/adapt following list
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
43 # list initialy based on feedparser list (http://pythonhosted.org/feedparser/html-sanitization.html)
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
44 STYLES_WHITELIST = ("azimuth", "background-color", "border-bottom-color", "border-collapse", "border-color", "border-left-color", "border-right-color", "border-top-color", "clear", "color", "cursor", "direction", "display", "elevation", "float", "font", "font-family", "font-size", "font-style", "font-variant", "font-weight", "height", "letter-spacing", "line-height", "overflow", "pause", "pause-after", "pause-before", "pitch", "pitch-range", "richness", "speak", "speak-header", "speak-numeral", "speak-punctuation", "speech-rate", "stress", "text-align", "text-decoration", "text-indent", "unicode-bidi", "vertical-align", "voice-family", "volume", "white-space", "width")
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
45
1867
47108a4f3a70 plugin text syntaxes: added 'poster' and 'controls' in allowed attributes for cleanXHTML
Goffi <goffi@goffi.org>
parents: 1847
diff changeset
46 SAFE_ATTRS = html.defs.safe_attrs.union(('style', 'poster', 'controls'))
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
47 STYLES_VALUES_REGEX = r'^(' + '|'.join(['([a-z-]+)', # alphabetical names
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
48 '(#[0-9a-f]+)', # hex value
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
49 '(\d+(.\d+)? *(|%|em|ex|px|in|cm|mm|pt|pc))', # values with units (or not)
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
50 'rgb\( *((\d+(.\d+)?), *){2}(\d+(.\d+)?) *\)', # rgb function
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
51 'rgba\( *((\d+(.\d+)?), *){3}(\d+(.\d+)?) *\)', # rgba function
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
52 ]) + ') *(!important)?$' # we accept "!important" at the end
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
53 STYLES_ACCEPTED_VALUE = re.compile(STYLES_VALUES_REGEX)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
54
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
55 PLUGIN_INFO = {
2145
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
56 C.PI_NAME: "Text syntaxes",
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
57 C.PI_IMPORT_NAME: "TEXT-SYNTAXES",
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
58 C.PI_TYPE: "MISC",
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
59 C.PI_PROTOCOLS: [],
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
60 C.PI_DEPENDENCIES: [],
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
61 C.PI_MAIN: "TextSyntaxes",
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
62 C.PI_HANDLER: "no",
33c8c4973743 core (plugins): added missing contants + use of new constants in PLUGIN_INFO
Goffi <goffi@goffi.org>
parents: 2106
diff changeset
63 C.PI_DESCRIPTION: _("""Management of various text syntaxes (XHTML-IM, Markdown, etc)""")
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
64 }
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
65
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
66 class UnknownSyntax(Exception):
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
67 pass
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
68
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
69 class TextSyntaxes(object):
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
70 """ Text conversion class
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
71 XHTML utf-8 is used as intermediate language for conversions
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
72 """
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
73
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
74 OPT_DEFAULT = "DEFAULT"
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
75 OPT_HIDDEN = "HIDDEN"
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
76 OPT_NO_THREAD = "NO_THREAD"
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
77 SYNTAX_XHTML = _SYNTAX_XHTML
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
78 SYNTAX_MARKDOWN = "markdown"
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
79 SYNTAX_TEXT = "text"
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
80
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
81 params = """
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
82 <params>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
83 <individual>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
84 <category name="%(category_name)s" label="%(category_label)s">
968
75f3b3b430ff tools, frontends, memory: param definition and XMLUI handle multi-selection for list widgets:
souliane <souliane@mailoo.org>
parents: 852
diff changeset
85 <param name="%(name)s" label="%(label)s" type="list" security="0">
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
86 %(options)s
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
87 </param>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
88 </category>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
89 </individual>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
90 </params>
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
91 """
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
92
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
93 params_data = {
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
94 'category_name': CATEGORY,
771
bfabeedbf32e core: i18n refactoring:
Goffi <goffi@goffi.org>
parents: 744
diff changeset
95 'category_label': _(CATEGORY),
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
96 'name': NAME,
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
97 'label': _(NAME),
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
98 'syntaxes': {},
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
99 }
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
100
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
101 def __init__(self, host):
993
301b342c697a core: use of the new core.log module:
Goffi <goffi@goffi.org>
parents: 968
diff changeset
102 log.info(_("Text syntaxes plugin initialization"))
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
103 self.host = host
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
104 self.syntaxes = {}
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
105 self.addSyntax(self.SYNTAX_XHTML, lambda xhtml: defer.succeed(xhtml), lambda xhtml: defer.succeed(xhtml),
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
106 TextSyntaxes.OPT_NO_THREAD)
1826
d80ccf4bf201 plugin blog import dotclear: this plugin import Dotclear 2 backups
Goffi <goffi@goffi.org>
parents: 1811
diff changeset
107 # TODO: text => XHTML should add <a/> to url like in frontends
d80ccf4bf201 plugin blog import dotclear: this plugin import Dotclear 2 backups
Goffi <goffi@goffi.org>
parents: 1811
diff changeset
108 # it's probably best to move sat_frontends.tools.strings to sat.tools.common or similar
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
109 self.addSyntax(self.SYNTAX_TEXT, lambda text: escape(text), lambda xhtml: self._removeMarkups(xhtml), [TextSyntaxes.OPT_HIDDEN])
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
110 try:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
111 import markdown, html2text
841
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
112
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
113 def _html2text(html, baseurl=''):
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
114 h = html2text.HTML2Text(baseurl=baseurl)
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
115 h.body_width = 0 # do not truncate the lines, it breaks the long URLs
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
116 return h.handle(html)
831f208b4ea3 plugin text_syntaxes: html2text was breaking the long URLs
souliane <souliane@mailoo.org>
parents: 836
diff changeset
117 self.addSyntax(self.SYNTAX_MARKDOWN, markdown.markdown, _html2text, [TextSyntaxes.OPT_DEFAULT])
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
118 except ImportError:
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
119 log.warning(u"markdown or html2text not found, can't use Markdown syntax")
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
120 log.info(u"You can download/install them from https://pythonhosted.org/Markdown/ and https://github.com/Alir3z4/html2text/")
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
121 host.bridge.addMethod("syntaxConvert", ".plugin", in_sign='sssbs', out_sign='s',
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
122 async=True, method=self.convert)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
123
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
124 def _updateParamOptions(self):
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
125 data_synt = TextSyntaxes.params_data['syntaxes']
1847
d8c6c55aafd7 plugin text syntaxes: fixed defaut syntax handling
Goffi <goffi@goffi.org>
parents: 1826
diff changeset
126 default_synt = TextSyntaxes.params_data.get('default', _SYNTAX_XHTML)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
127 syntaxes = []
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
128
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
129 for syntax in data_synt.keys():
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
130 flags = data_synt[syntax]["flags"]
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
131 if TextSyntaxes.OPT_HIDDEN not in flags:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
132 syntaxes.append(syntax)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
133
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
134 syntaxes.sort(key=lambda synt: synt.lower())
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
135 options = []
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
136
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
137 for syntax in syntaxes:
1847
d8c6c55aafd7 plugin text syntaxes: fixed defaut syntax handling
Goffi <goffi@goffi.org>
parents: 1826
diff changeset
138 selected = 'selected="true"' if syntax == default_synt else ''
968
75f3b3b430ff tools, frontends, memory: param definition and XMLUI handle multi-selection for list widgets:
souliane <souliane@mailoo.org>
parents: 852
diff changeset
139 options.append(u'<option value="%s" %s/>' % (syntax, selected))
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
140
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
141 TextSyntaxes.params_data["options"] = u'\n'.join(options)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
142 self.host.memory.updateParams(TextSyntaxes.params % TextSyntaxes.params_data)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
143
702
a25db3fe3959 plugin XEP-0071: rich messages management for sendMessage
Goffi <goffi@goffi.org>
parents: 699
diff changeset
144 def getCurrentSyntax(self, profile):
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
145 """ Return the selected syntax for the given profile
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
146
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
147 @param profile: %(doc_profile)s
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
148 @return: profile selected syntax
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
149 """
699
e819630c6737 plugin text syntaxes: fixed bad parameter order for getParamA
Goffi <goffi@goffi.org>
parents: 695
diff changeset
150 return self.host.memory.getParamA(NAME, CATEGORY , profile_key=profile)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
151
2106
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
152 def _logError(self, failure, action=u"converting syntax"):
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
153 log.error(u"Error while {action}: {failure}".format(action=action, failure=failure))
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
154 return failure
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
155
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
156 def cleanXHTML(self, xhtml):
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
157 """ Clean XHTML text by removing potentially dangerous/malicious parts
705
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
158 @param xhtml: raw xhtml text to clean (or lxml's HtmlElement)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
159 """
674
fb0b1100c908 plugin text_syntaxes: fixed clean_xhml (it now return XHTML instead of HTML)
Goffi <goffi@goffi.org>
parents: 665
diff changeset
160 def blocking_cleaning(xhtml):
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
161 """ Clean XHTML and style attributes """
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
162
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
163 def clean_style(styles_raw):
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
164 """" Remove styles not in the whitelist,
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
165 or where the value doesn't match the regex """
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
166 styles = styles_raw.split(";")
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
167 cleaned_styles = []
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
168 for style in styles:
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
169 try:
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
170 key, value = style.split(':')
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
171 except ValueError:
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
172 continue
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
173 key = key.lower().strip()
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
174 if key not in STYLES_WHITELIST:
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
175 continue
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
176 value = value.lower().strip()
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
177 if not STYLES_ACCEPTED_VALUE.match(value):
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
178 continue
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
179 if value == "none":
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
180 continue
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
181 cleaned_styles.append((key, value))
1458
832846fefe85 plugin text-syntaxes: minor variable renaming
Goffi <goffi@goffi.org>
parents: 1396
diff changeset
182 return "; ".join(["%s: %s" % (key_, value_) for key_, value_ in cleaned_styles])
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
183
705
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
184 if isinstance(xhtml, basestring):
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
185 xhtml_elt = html.fromstring(xhtml)
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
186 elif isinstance(xhtml, html.HtmlElement):
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
187 xhtml_elt = xhtml
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
188 else:
993
301b342c697a core: use of the new core.log module:
Goffi <goffi@goffi.org>
parents: 968
diff changeset
189 log.error("Only strings and HtmlElements can be cleaned")
705
6c8a119dcc94 plugin text syntaxes: clean_xhtml now accept lxml's HtmlElement to avoid parsing two times the same xml
Goffi <goffi@goffi.org>
parents: 702
diff changeset
190 raise exceptions.DataError
674
fb0b1100c908 plugin text_syntaxes: fixed clean_xhml (it now return XHTML instead of HTML)
Goffi <goffi@goffi.org>
parents: 665
diff changeset
191 cleaner = clean.Cleaner(style=False,
fb0b1100c908 plugin text_syntaxes: fixed clean_xhml (it now return XHTML instead of HTML)
Goffi <goffi@goffi.org>
parents: 665
diff changeset
192 add_nofollow=False,
692
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
193 safe_attrs=SAFE_ATTRS)
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
194 xhtml_elt = cleaner.clean_html(xhtml_elt)
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
195 for elt in xhtml_elt.xpath("//*[@style]"):
e98db42cd78c plugin text syntaxes: styles sanitisation
Goffi <goffi@goffi.org>
parents: 674
diff changeset
196 elt.set("style", clean_style(elt.get('style')))
852
4cc55e05266d plugin text syntaxes: fixed cleaners encoding
Goffi <goffi@goffi.org>
parents: 841
diff changeset
197 return html.tostring(xhtml_elt, encoding=unicode, method='xml')
674
fb0b1100c908 plugin text_syntaxes: fixed clean_xhml (it now return XHTML instead of HTML)
Goffi <goffi@goffi.org>
parents: 665
diff changeset
198
2106
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
199 d = deferToThread(blocking_cleaning, xhtml)
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
200 d.addErrback(self._logError, action=u"cleaning syntax")
5874da3811b7 plugin text syntaxes: log error on cleanXHTML failure
Goffi <goffi@goffi.org>
parents: 1934
diff changeset
201 return d
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
202
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
203 def convert(self, text, syntax_from, syntax_to=_SYNTAX_XHTML, safe=True, profile=None):
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
204 """Convert a text between two syntaxes
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
205
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
206 @param text: text to convert
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
207 @param syntax_from: source syntax (e.g. "markdown")
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
208 @param syntax_to: dest syntax (e.g.: "XHTML")
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
209 @param safe: clean resulting XHTML to avoid malicious code if True
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
210 @param profile: needed only when syntax_from or syntax_to is set to _SYNTAX_CURRENT
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
211 @return(unicode): converted text
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
212 """
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
213 # FIXME: convert should be abled to handle domish.Element directly
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
214 # when dealing with XHTML
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
215 # TODO: a way for parser to return parsing errors/warnings
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
216
744
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
217 if syntax_from == _SYNTAX_CURRENT:
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
218 syntax_from = self.getCurrentSyntax(profile)
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
219 if syntax_to == _SYNTAX_CURRENT:
312a2842b2b8 plugins text-syntaxes: added a default value to use the current user syntax in convert
souliane <souliane@mailoo.org>
parents: 705
diff changeset
220 syntax_to = self.getCurrentSyntax(profile)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
221 syntaxes = TextSyntaxes.params_data['syntaxes']
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
222 if syntax_from not in syntaxes:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
223 raise UnknownSyntax(syntax_from)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
224 if syntax_to not in syntaxes:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
225 raise UnknownSyntax(syntax_to)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
226 d = None
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
227
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
228 if TextSyntaxes.OPT_NO_THREAD in syntaxes[syntax_from]["flags"]:
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
229 d = defer.maybeDeferred(syntaxes[syntax_from]["to"], text)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
230 else:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
231 d = deferToThread(syntaxes[syntax_from]["to"], text)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
232
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
233 #TODO: keep only body element and change it to a div here ?
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
234
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
235 if safe:
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
236 d.addCallback(self.cleanXHTML)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
237
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
238 if TextSyntaxes.OPT_NO_THREAD in syntaxes[syntax_to]["flags"]:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
239 d.addCallback(syntaxes[syntax_to]["from"])
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
240 else:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
241 d.addCallback(lambda xhtml: deferToThread(syntaxes[syntax_to]["from"], xhtml))
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
242
836
2cc0201b4613 plugin text_syntaxes: rstrip the conversion result to avoid new lines systematically added by converters (e.g. html2text do this)
souliane <souliane@mailoo.org>
parents: 832
diff changeset
243 # converters can add new lines that disturb the microblog change detection
2cc0201b4613 plugin text_syntaxes: rstrip the conversion result to avoid new lines systematically added by converters (e.g. html2text do this)
souliane <souliane@mailoo.org>
parents: 832
diff changeset
244 d.addCallback(lambda text: text.rstrip())
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
245 return d
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
246
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
247 def addSyntax(self, name, to_xhtml_cb, from_xhtml_cb, flags = None):
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
248 """Add a new syntax to the manager
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
249
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
250 @param name: unique name of the syntax
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
251 @param to_xhtml_cb: callback to convert from syntax to XHTML
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
252 @param from_xhtml_cb: callback to convert from XHTML to syntax
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
253 @param flags: set of optional flags, can be:
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
254 TextSyntaxes.OPT_DEFAULT: use as the default syntax (replace former one)
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
255 TextSyntaxes.OPT_HIDDEN: do not show in parameters
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
256 TextSyntaxes.OPT_NO_THREAD: do not defer to thread when converting (the callback may then return a deferred)
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
257 """
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
258 flags = flags if flags is not None else []
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
259 if TextSyntaxes.OPT_HIDDEN in flags and TextSyntaxes.OPT_DEFAULT in flags:
1803
14a97a5fe1c0 plugin text syntaxes: a non blocking syntax callback can now return a unicode directly instead of a Deferred
Goffi <goffi@goffi.org>
parents: 1766
diff changeset
260 raise ValueError(u"{} and {} are mutually exclusive".format(TextSyntaxes.OPT_HIDDEN, TextSyntaxes.OPT_DEFAULT))
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
261
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
262 syntaxes = TextSyntaxes.params_data['syntaxes']
1805
3c40fa0dcd7a pluging text syntaxes: various minor improvments:
Goffi <goffi@goffi.org>
parents: 1803
diff changeset
263 if name in syntaxes:
1811
90ae9a2462c2 plugin text syntaxes: fixes typo
souliane <souliane@mailoo.org>
parents: 1805
diff changeset
264 raise exceptions.ConflictError(u"This syntax name already exists: {}".format(name))
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
265 syntaxes[name] = {"to": to_xhtml_cb, "from": from_xhtml_cb, "flags": flags}
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
266 if TextSyntaxes.OPT_DEFAULT in flags:
1847
d8c6c55aafd7 plugin text syntaxes: fixed defaut syntax handling
Goffi <goffi@goffi.org>
parents: 1826
diff changeset
267 TextSyntaxes.params_data['default'] = name
665
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
268
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
269 self._updateParamOptions()
6a64e0a759e6 plugin text syntaxes: this plugin manage rich text syntaxes conversions and cleaning.
Goffi <goffi@goffi.org>
parents:
diff changeset
270
832
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
271 def _removeMarkups(self, xhtml):
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
272 """
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
273 Remove XHTML markups from the given string.
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
274 @param xhtml: the XHTML string to be cleaned
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
275 @return: the cleaned string
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
276 """
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
277 cleaner = clean.Cleaner(kill_tags=['style'])
c4b22aedb7d7 plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:
souliane <souliane@mailoo.org>
parents: 811
diff changeset
278 cleaned = cleaner.clean_html(html.fromstring(xhtml))
852
4cc55e05266d plugin text syntaxes: fixed cleaners encoding
Goffi <goffi@goffi.org>
parents: 841
diff changeset
279 return html.tostring(cleaned, encoding=unicode, method="text")