diff src/plugins/plugin_misc_text_syntaxes.py @ 832:c4b22aedb7d7

plugin groupblog, XEP-0071, XEP-0277, text_syntaxes: manage raw/rich/xhtml data for content/title:

Implementation should follow the following formal specification:

"title" and "content" data can be passed in raw, xhtml or rich format.
When we receive from a frontend a new/updated microblog item:
    - keys "title" or "content" have to be escaped (disable HTML tags)
    - keys "title_rich" or "content_rich" have to be converted from the current syntax to XHTML
    - keys "title_xhtml" or "content_xhtml" have to be cleaned from unwanted XHTML content

Rules to deal with concurrent keys:
    - existence of both "*_xhtml" and "*_rich" keys must raise an exception
    - existence of both raw and ("*_xhtml" or "*_rich") is OK

As the storage always need raw data, if it is not given by the user it can be extracted from the "*_rich" or "*_xhtml" data (remove the XHTML tags).
When a frontend wants to edit a blog post that contains XHTML title or content, the conversion is made from XHTML to the current user-defined syntax.

- plugin text_syntaxes: added "text" syntax (using lxml)
author souliane <souliane@mailoo.org>
date Wed, 05 Feb 2014 16:36:51 +0100
parents 1fe00f0c9a91
children 2cc0201b4613
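
The key-handling rules in the changeset description can be sketched roughly as follows. This is an illustrative Python 3 outline only, not the groupblog plugin's actual code: the function name `normalize_item` and the three converter callables (`rich_to_xhtml`, `clean_xhtml`, `xhtml_to_text`) are hypothetical, and the real plugin works with Twisted deferreds rather than plain return values.

```python
from html import escape


def normalize_item(data, rich_to_xhtml, clean_xhtml, xhtml_to_text):
    """Apply the described rules to the "title" and "content" keys.

    All three converters are hypothetical callables supplied by the caller;
    they stand in for the syntax conversion, the XHTML cleaning and the
    tag removal mentioned in the changeset description.
    """
    result = {}
    for key in ("title", "content"):
        rich_key, xhtml_key = key + "_rich", key + "_xhtml"

        # Concurrent "*_rich" and "*_xhtml" keys are an error.
        if rich_key in data and xhtml_key in data:
            raise ValueError(
                "%s and %s must not both be set" % (rich_key, xhtml_key))

        if rich_key in data:
            # Rich text is converted from the current syntax to XHTML.
            result[xhtml_key] = rich_to_xhtml(data[rich_key])
        elif xhtml_key in data:
            # XHTML is cleaned from unwanted content.
            result[xhtml_key] = clean_xhtml(data[xhtml_key])

        if key in data:
            # Raw data is escaped so that HTML tags are disabled.
            result[key] = escape(data[key], quote=False)
        elif xhtml_key in result:
            # Storage needs raw data: derive it by removing the XHTML tags.
            result[key] = xhtml_to_text(result[xhtml_key])
    return result
```

A raw key next to a `*_rich` or `*_xhtml` key is accepted and simply escaped, while the raw value is derived from the XHTML only when the frontend did not send one.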
--- a/src/plugins/plugin_misc_text_syntaxes.py	Wed Jan 22 17:10:28 2014 +0100
+++ b/src/plugins/plugin_misc_text_syntaxes.py	Wed Feb 05 16:36:51 2014 +0100
@@ -26,6 +26,7 @@
 from sat.core import exceptions
 from lxml import html
 from lxml.html import clean
+from cgi import escape
 import re
 
 
@@ -70,6 +71,7 @@
     OPT_NO_THREAD = "NO_THREAD"
     SYNTAX_XHTML = _SYNTAX_XHTML
     SYNTAX_MARKDOWN = "markdown"
+    SYNTAX_TEXT = "text"
 
     params = """
     <params>
@@ -99,6 +101,7 @@
         self.syntaxes = {}
         self.addSyntax(self.SYNTAX_XHTML, lambda xhtml: defer.succeed(xhtml), lambda xhtml: defer.succeed(xhtml),
                        TextSyntaxes.OPT_NO_THREAD)
+        self.addSyntax(self.SYNTAX_TEXT, lambda text: escape(text), lambda xhtml: self._removeMarkups(xhtml), [TextSyntaxes.OPT_HIDDEN])
         try:
             import markdown, html2text
             self.addSyntax(self.SYNTAX_MARKDOWN, markdown.markdown, html2text.html2text, [TextSyntaxes.OPT_DEFAULT])
@@ -238,3 +241,12 @@
 
         self._updateParamOptions()
 
+    def _removeMarkups(self, xhtml):
+        """
+        Remove XHTML markups from the given string.
+        @param xhtml: the XHTML string to be cleaned
+        @return: the cleaned string
+        """
+        cleaner = clean.Cleaner(kill_tags=['style'])
+        cleaned = cleaner.clean_html(html.fromstring(xhtml))
+        return html.tostring(cleaned, method="text")
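
For readers without lxml at hand, the two directions of the new "text" syntax can be approximated with the Python 3 standard library alone. The sketch below is a stand-in under stated assumptions, not the code from this changeset: `text_to_xhtml`, `xhtml_to_text` and `_MarkupRemover` are hypothetical names, `html.escape(..., quote=False)` takes the place of `cgi.escape` (which no longer exists in Python 3.8 and later), and a small `HTMLParser` subclass imitates dropping `<style>` content before extracting the text.

```python
from html import escape
from html.parser import HTMLParser


class _MarkupRemover(HTMLParser):
    """Collect text nodes, skipping anything inside <style> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._style_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._style_depth += 1

    def handle_endtag(self, tag):
        if tag == "style" and self._style_depth:
            self._style_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a <style> element.
        if not self._style_depth:
            self._parts.append(data)

    def text(self):
        return "".join(self._parts)


def text_to_xhtml(text):
    """Escape &, < and > so the raw text cannot introduce markup."""
    return escape(text, quote=False)


def xhtml_to_text(xhtml):
    """Return the text content of an XHTML fragment, without its tags."""
    remover = _MarkupRemover()
    remover.feed(xhtml)
    remover.close()
    return remover.text()
```

Unlike the lxml version, this stand-in does not repair malformed markup; it only drops tags and `<style>` bodies, and decodes character references in the remaining text.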