This tests the middleware by applying themes to a mostly static site.

First we'll set up the site, using URLMap:

    >>> from paste.urlmap import URLMap
    >>> from webob import Request, Response
    >>> app = URLMap()

A theme:

    >>> app['/theme.html'] = Response('''\
    ... <html>
    ...  <head>
    ...   <title>This is a theme title</title>
    ...   <link rel=Stylesheet type="text/css" href="style.css">
    ...   <style type="text/css">
    ...     @import "style2.css";
    ...   </style>
    ...  </head>
    ...  <body>
    ... 
    ...   <div id="header" class="title-bar">
    ...     <h1 id="title">This is the theme title</h1>
    ...     <div class="topnav"></div>
    ...   </div>
    ...   <div id="content-wrapper">
    ...     <a name="top"></a>
    ...     <div id="content">
    ...       This content will be replaced.
    ...     </div>
    ...     <a href="#top">Back to top</a>
    ...   </div>
    ... 
    ...   <div id="footer">
    ...     <span id="copyright">Copyright (C)</span> 2000 Some Corporation
    ...   </div>
    ... 
    ...  </body>
    ... </html>''')

The rule XML:

    >>> rule_xml = '''\
    ... <ruleset>
    ...   <match path="/blog" class="blog" />
    ...   <match path="exact:/about.html" class="breakout" />
    ...   <match request-header="X-No-Deliverate: boolean:true" abort="1" />
    ...   <match response-header="X-No-Deliverate: boolean:true" abort="1" />
    ...   <match environ="wsgi.url_scheme: https" class="via-https" />
    ...   <theme href="/theme.html" />
    ...   <match path="exact:/magic" class="magic" />
    ...   <rule class="magic">
    ...     
    ...   </rule>
    ...   <rule path="/foo">
    ...     <drop content="#badstuff" />
    ...   </rule>
    ...   <rule class="default">
    ...     <replace content="children:#footer" theme="children:#footer" nocontent="ignore" />
    ...     <replace content="children:body" theme="children:#content" nocontent="abort" />
    ...     <prepend content="elements:/html/head/meta" theme="children:/html/head" nocontent="ignore" />  
    ...   </rule>
    ...   <rule class="breakout">
    ...     <replace content="children:#footer" theme="children:#footer" nocontent="ignore" />
    ...     <replace content="children:body" theme="children:#content-wrapper" nocontent="abort" />
    ...   </rule>
    ...   <rule class="blog">
    ...     <drop theme="#copyright" if-content="#cc" />
    ...     <drop theme="tag:#copyright" notheme="ignore" />
    ...     <drop content="#cc" nocontent="ignore" />
    ...     <replace content="children:#content" theme="children:#content" nocontent="abort" />
    ...   </rule>
    ... </ruleset>'''

Rule files can be published and fetched with a subrequest:

    >>> app['/mytheme/rules.xml'] = Response(rule_xml, content_type="application/xml")

Rule files can also be read directly from the filesystem. Here's one:

    >>> import tempfile
    >>> rule_filename_pos, rule_filename = tempfile.mkstemp()
    >>> f = open(rule_filename, 'w+')
    >>> f.write(rule_xml)
    >>> f.close()

Now let's set up some pages for Deliverance to work with:

    >>> app['/blog/index.html'] = Response('''\
    ... <html><head><title>A blog post</title>
    ... <link href="rss.xml" rel="alternate" type="application/rss+xml" title="RSS Feed" />
    ... </head>
    ... <body>
    ... Some junk
    ... <div id="content">the blog post <b>with some style</b></div>
    ... some more junk
    ... <div id="footer">a footer that will be ignored</div>
    ... <div id="cc">Creative Commons License</div>
    ... </body></html>
    ... ''')
    >>> app['/about.html'] = Response('''\
    ... <html><title>About this site</title></html>
    ... <body>
    ... This is all about this site.
    ... <div id="footer">a footer that will be ignored</div>
    ... </body></html>
    ... ''')
    >>> app['/magic'] = Response('''\
    ... <html><head></head><body>A simple page</body></html>''')
    >>> app['/magic'].headers['x-no-deliverate'] = '1'
    >>> app['/magic2'] = Response('''\
    ... <html><head><meta http-equiv="x-no-deliverate" content="1" /></head><body>something</body></html>''')
    >>> app['/foo'] = Response('''\
    ... <html><body>
    ...     <div id="goodstuff">Good!</div>
    ...     <div id="badstuff">Bad.</div></body></html>''')

We'll set up one DeliveranceMiddleware using the published rules, and
another using the rule file:

    >>> from deliverance.middleware import DeliveranceMiddleware, FileRuleGetter, SubrequestRuleGetter
    >>> from deliverance.log import PrintingLogger
    >>> import logging
    >>> deliv_filename = DeliveranceMiddleware(app, FileRuleGetter(rule_filename),
    ...                                     PrintingLogger,
    ...                                     log_factory_kw=dict(print_level=logging.WARNING))
    >>> deliv_url = DeliveranceMiddleware(app, SubrequestRuleGetter('http://localhost/mytheme/rules.xml'),
    ...                               PrintingLogger,
    ...                               log_factory_kw=dict(print_level=logging.WARNING))

Now let's look at some plain content and its deliverated equivalent. Here's
a helper function to make the content easy to compare:

    >>> def compare_request(path, deliv):
    ...     # work around WebOb bug fixed here:
    ...     # http://bitbucket.org/ianb/webob/changeset/b8671bd53cf4/
    ...     app[path].body = app[path].body
    ...
    ...     raw_res = Request.blank(path).get_response(app)
    ...     result = 'Original content:\n' + raw_res.body.strip()
    ...     themed_res = Request.blank(path).get_response(deliv)
    ...     result += '\nThemed content:\n' + themed_res.body.strip()
    ...     return result

First we'll look at the blog, which is fairly simple:

    >>> print compare_request('/blog/index.html', deliv_filename) # doctest: +REPORT_UDIFF
    Original content:
    <html><head><title>A blog post</title>
    <link href="rss.xml" rel="alternate" type="application/rss+xml" title="RSS Feed" />
    </head>
    <body>
    Some junk
    <div id="content">the blog post <b>with some style</b></div>
    some more junk
    <div id="footer">a footer that will be ignored</div>
    <div id="cc">Creative Commons License</div>
    </body></html>
    Themed content:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><head><link href="rss.xml" rel="alternate" type="application/rss+xml" title="RSS Feed"><title>A blog post</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css">
        @import "http://localhost/style2.css";
      </style></head><body>
    <BLANKLINE>
      <div id="header" class="title-bar">
        <h1 id="title">This is the theme title</h1>
        <div class="topnav"></div>
      </div>
      <div id="content-wrapper">
        <a name="top"></a>
        <div id="content">the blog post <b>with some style</b></div>
        <a href="#top">Back to top</a>
      </div>
    <BLANKLINE>
      <div id="footer">
         2000 Some Corporation
      </div>
    <BLANKLINE>
     </body></html>

The result should be the same whether the rules come from the file or from
the published URL:

    >>> first = compare_request('/blog/index.html', deliv_filename)
    >>> second = compare_request('/blog/index.html', deliv_url)
    >>> first == second
    True

Now the about page, with its breakout style:

    >>> print compare_request('/about.html', deliv_url) # doctest: +REPORT_UDIFF
    Original content:
    <html><title>About this site</title></html>
    <body>
    This is all about this site.
    <div id="footer">a footer that will be ignored</div>
    </body></html>
    Themed content:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><head><title>About this site</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css">
        @import "http://localhost/style2.css";
      </style></head><body>
    <BLANKLINE>
      <div id="header" class="title-bar">
        <h1 id="title">This is the theme title</h1>
        <div class="topnav"></div>
      </div>
      <div id="content-wrapper">
    This is all about this site.
    </div>
    <BLANKLINE>
      <div id="footer">a footer that will be ignored</div>
    <BLANKLINE>
     </body></html>

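Now the magic responses, which shouldn't get themed at all. The first sets
the X-No-Deliverate response header; the second carries the same flag in a
meta http-equiv tag: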

    >>> print compare_request('/magic', deliv_filename)
    Original content:
    <html><head></head><body>A simple page</body></html>
    Themed content:
    <html><head></head><body>A simple page</body></html>
    >>> print compare_request('/magic2', deliv_url)
    Original content:
    <html><head><meta http-equiv="x-no-deliverate" content="1" /></head><body>something</body></html>
    Themed content:
    <html><head><meta http-equiv="x-no-deliverate" content="1" /></head><body>something</body></html>

Deliverance should not blow up if the content response is empty:

    >>> app['/empty'] = Response('')
    >>> print compare_request('/empty', deliv_filename)
    Original content:
    <BLANKLINE>
    Themed content:
    <BLANKLINE>

Let's also make sure Deliverance correctly preserves HTML entities:

    >>> app['/html'] = Response("One &hellip; two")
    >>> print compare_request('/html', deliv_filename) # doctest: +REPORT_UDIFF
    Original content:
    One &hellip; two
    Themed content:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><head><title>This is a theme title</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css">
        @import "http://localhost/style2.css";
      </style></head><body>
    <BLANKLINE>
      <div id="header" class="title-bar">
        <h1 id="title">This is the theme title</h1>
        <div class="topnav"></div>
      </div>
      <div id="content-wrapper">
        <a name="top"></a>
        <div id="content"><p>One &#8230; two</p></div>
        <a href="#top">Back to top</a>
      </div>
    <BLANKLINE>
      <div id="footer">
        <span id="copyright">Copyright (C)</span> 2000 Some Corporation
      </div>
    <BLANKLINE>
     </body></html>

When you are using a rule file from the filesystem, it will not be re-read
on every request by default. So if we change the rules on the filesystem
and make a request through the existing DeliveranceMiddleware, there will be
no change.
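
The caching behaviour described above can be sketched in isolation. This is
only an illustration under stated assumptions: CachedFileReader and write_text
are hypothetical names, not Deliverance's FileRuleGetter, and the only thing
borrowed from the test is the always_reload flag.

```python
import os
import tempfile


class CachedFileReader(object):
    """Hypothetical stand-in for a file-based rule getter (not Deliverance code)."""

    def __init__(self, filename, always_reload=False):
        self.filename = filename
        self.always_reload = always_reload
        self._cached = None

    def get(self):
        # Read the file on first use, or on every call when always_reload is set.
        if self._cached is None or self.always_reload:
            with open(self.filename) as f:
                self._cached = f.read()
        return self._cached


def write_text(filename, text):
    # Overwrite the file with new text.
    with open(filename, 'w') as f:
        f.write(text)


fd, path = tempfile.mkstemp()
os.close(fd)
write_text(path, '<ruleset version="1" />')

cached = CachedFileReader(path)
reloading = CachedFileReader(path, always_reload=True)
cached.get()
reloading.get()

# Change the file after both readers have been used once.
write_text(path, '<ruleset version="2" />')
second_cached = cached.get()        # still the version 1 text
second_reloading = reloading.get()  # now the version 2 text
os.remove(path)
```

Caching by default avoids a file read per request; the reloading variant trades
that for picking up edits immediately.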

Here we'll remove the theming rules for the blog:

    >>> new_rule_xml = '''\
    ... <ruleset>
    ...   <match path="exact:/about.html" class="breakout" />
    ...   <match request-header="X-No-Deliverate: boolean:true" abort="1" />
    ...   <match response-header="X-No-Deliverate: boolean:true" abort="1" />
    ...   <match environ="wsgi.url_scheme: https" class="via-https" />
    ...   <theme href="/theme.html" />
    ...   <rule path="/foo">
    ...     <drop content="#badstuff" />
    ...   </rule>
    ...   <rule class="default">
    ...     <replace content="children:#footer" theme="children:#footer" nocontent="ignore" />
    ...     <replace content="children:body" theme="children:#content" nocontent="abort" />
    ...     <prepend content="elements:/html/head/meta" theme="children:/html/head" nocontent="ignore" />  
    ...   </rule>
    ...   <rule class="breakout">
    ...     <replace content="children:#footer" theme="children:#footer" nocontent="ignore" />
    ...     <replace content="children:body" theme="children:#content-wrapper" nocontent="abort" />
    ...   </rule>
    ... </ruleset>'''
    >>> f = open(rule_filename, 'w+')
    >>> f.write(new_rule_xml)
    >>> f.close()

    >>> print compare_request('/blog/index.html', deliv_filename) # doctest: +REPORT_UDIFF
    Original content:
    <html><head><title>A blog post</title>
    <link href="rss.xml" rel="alternate" type="application/rss+xml" title="RSS Feed" />
    </head>
    <body>
    Some junk
    <div id="content">the blog post <b>with some style</b></div>
    some more junk
    <div id="footer">a footer that will be ignored</div>
    <div id="cc">Creative Commons License</div>
    </body></html>
    Themed content:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><head><link href="rss.xml" rel="alternate" type="application/rss+xml" title="RSS Feed"><title>A blog post</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css">
        @import "http://localhost/style2.css";
      </style></head><body>
    <BLANKLINE>
      <div id="header" class="title-bar">
        <h1 id="title">This is the theme title</h1>
        <div class="topnav"></div>
      </div>
      <div id="content-wrapper">
        <a name="top"></a>
        <div id="content">the blog post <b>with some style</b></div>
        <a href="#top">Back to top</a>
      </div>
    <BLANKLINE>
      <div id="footer">
         2000 Some Corporation
      </div>
    <BLANKLINE>
     </body></html>

However, if we set always_reload=True in the FileRuleGetter, the rules
will be re-read from the file on every request:

    >>> f = open(rule_filename, 'w+')
    >>> f.write(rule_xml)
    >>> f.close()
    >>> deliv_filename = DeliveranceMiddleware(app,
    ...                                     FileRuleGetter(rule_filename, always_reload=True),
    ...                                     PrintingLogger,
    ...                                     log_factory_kw=dict(print_level=logging.WARNING))

    >>> print compare_request('/blog/index.html', deliv_filename) # doctest: +REPORT_UDIFF
    Original content:
    <html><head><title>A blog post</title>
    <link href="rss.xml" rel="alternate" type="application/rss+xml" title="RSS Feed" />
    </head>
    <body>
    Some junk
    <div id="content">the blog post <b>with some style</b></div>
    some more junk
    <div id="footer">a footer that will be ignored</div>
    <div id="cc">Creative Commons License</div>
    </body></html>
    Themed content:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><head><link href="rss.xml" rel="alternate" type="application/rss+xml" title="RSS Feed"><title>A blog post</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css">
        @import "http://localhost/style2.css";
      </style></head><body>
    <BLANKLINE>
      <div id="header" class="title-bar">
        <h1 id="title">This is the theme title</h1>
        <div class="topnav"></div>
      </div>
      <div id="content-wrapper">
        <a name="top"></a>
        <div id="content">the blog post <b>with some style</b></div>
        <a href="#top">Back to top</a>
      </div>
    <BLANKLINE>
      <div id="footer">
         2000 Some Corporation
      </div>
    <BLANKLINE>
     </body></html>

    >>> f = open(rule_filename, 'w+')
    >>> f.write(new_rule_xml)
    >>> f.close()

    >>> print compare_request('/blog/index.html', deliv_filename) # doctest: +REPORT_UDIFF
    Original content:
    <html><head><title>A blog post</title>
    <link href="rss.xml" rel="alternate" type="application/rss+xml" title="RSS Feed" />
    </head>
    <body>
    Some junk
    <div id="content">the blog post <b>with some style</b></div>
    some more junk
    <div id="footer">a footer that will be ignored</div>
    <div id="cc">Creative Commons License</div>
    </body></html>
    Themed content:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><head><link href="rss.xml" rel="alternate" type="application/rss+xml" title="RSS Feed"><title>A blog post</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css">
        @import "http://localhost/style2.css";
      </style></head><body>
    <BLANKLINE>
      <div id="header" class="title-bar">
        <h1 id="title">This is the theme title</h1>
        <div class="topnav"></div>
      </div>
      <div id="content-wrapper">
        <a name="top"></a>
        <div id="content">
    Some junk
    <div id="content">the blog post <b>with some style</b></div>
    some more junk
    <div id="cc">Creative Commons License</div>
    </div>
        <a href="#top">Back to top</a>
      </div>
    <BLANKLINE>
      <div id="footer">a footer that will be ignored</div>
    <BLANKLINE>
     </body></html>

The content's DOCTYPE should be respected. So if the content's DOCTYPE is XHTML,
the merged output should preserve that DOCTYPE, and self-closing tags should be
preserved rather than being rewritten as unclosed tags. Let's see:

    >>> app['/magic'].body = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    ... "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    ... <html><head>
    ... <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
    ... <meta http-equiv="refresh" content="5" />
    ... </head><body> 
    ... <img src="foo.png" /> 
    ... A simple page</body></html>"""
    >>> del app['/magic'].headers['x-no-deliverate']

Now let's see what happens when Deliverance is done with it:

    >>> print compare_request('/magic', deliv_filename) # doctest: +REPORT_UDIFF
    Original content:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html><head>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
    <meta http-equiv="refresh" content="5" />
    </head><body> 
    <img src="foo.png" /> 
    A simple page</body></html>
    Themed content:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /><meta http-equiv="refresh" content="5" /><title>This is a theme title</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css" /><style type="text/css">
        @import "http://localhost/style2.css";
      </style></head><body>
    <BLANKLINE>
      <div id="header" class="title-bar">
        <h1 id="title">This is the theme title</h1>
        <div class="topnav"></div>
      </div>
      <div id="content-wrapper">
        <a name="top" id="top"></a>
        <div id="content"> 
    <img src="foo.png" /> 
    A simple page</div>
        <a href="#top">Back to top</a>
      </div>
    <BLANKLINE>
      <div id="footer">
        <span id="copyright">Copyright (C)</span> 2000 Some Corporation
      </div>
    <BLANKLINE>
     </body></html>

It worked. There's a new id="top" attribute on the <a name="top"> tag;
it's required for XHTML: http://www.w3.org/TR/xhtml1/#h-4.10

Test that rule matches work:

   >>> print compare_request('/foo', deliv_filename) # doctest: +REPORT_UDIFF
   Original content:
   <html><body>
       <div id="goodstuff">Good!</div>
       <div id="badstuff">Bad.</div></body></html>
   Themed content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
   <html><head><title>This is a theme title</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css">
       @import "http://localhost/style2.css";
     </style></head><body>
   <BLANKLINE>
     <div id="header" class="title-bar">
       <h1 id="title">This is the theme title</h1>
       <div class="topnav"></div>
     </div>
     <div id="content-wrapper">
       <a name="top"></a>
       <div id="content">
       <div id="goodstuff">Good!</div>
       </div>
       <a href="#top">Back to top</a>
     </div>
   <BLANKLINE>
     <div id="footer">
       <span id="copyright">Copyright (C)</span> 2000 Some Corporation
     </div>
   <BLANKLINE>
    </body></html>

Test that HTML comments inside SCRIPT and STYLE tags aren't escaped:

   >>> app['/scriptcomments'] = Response('''\
   ... <html><head>
   ...  <style type="text/css" media="all"><!-- @import url( http://localhost:8080/testplonesite/content_types.css); --></style>
   ...  </head><body>
   ...     foo
   ...  </body>
   ... </html>''')

   >>> print compare_request('/scriptcomments', deliv_filename) # doctest: +REPORT_UDIFF
   Original content:
   <html><head>
    <style type="text/css" media="all"><!-- @import url( http://localhost:8080/testplonesite/content_types.css); --></style>
    </head><body>
       foo
    </body>
   </html>
   Themed content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
   <html><head><style type="text/css" media="all"><!-- @import url( http://localhost:8080/testplonesite/content_types.css); --></style><title>This is a theme title</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css">
       @import "http://localhost/style2.css";
     </style></head><body>
   <BLANKLINE>
     <div id="header" class="title-bar">
       <h1 id="title">This is the theme title</h1>
       <div class="topnav"></div>
     </div>
     <div id="content-wrapper">
       <a name="top"></a>
       <div id="content">
       foo
    </div>
       <a href="#top">Back to top</a>
     </div>
   <BLANKLINE>
     <div id="footer">
       <span id="copyright">Copyright (C)</span> 2000 Some Corporation
     </div>
   <BLANKLINE>
    </body></html>

Works fine; what about XHTML?

   >>> app['/scriptcomments'] = Response('''\
   ... <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   ... <html><head>
   ...  <style type="text/css" media="all"><!-- @import url( http://localhost:8080/testplonesite/content_types.css); --></style>
   ...  </head><body>
   ...     foo
   ...  </body>
   ... </html>''')

   >>> print compare_request('/scriptcomments', deliv_filename) # doctest: +REPORT_UDIFF
   Original content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html><head>
    <style type="text/css" media="all"><!-- @import url( http://localhost:8080/testplonesite/content_types.css); --></style>
    </head><body>
       foo
    </body>
   </html>
   Themed content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ASCII" /><style type="text/css" media="all">&lt;!-- @import url( http://localhost:8080/testplonesite/content_types.css); --&gt;</style><title>This is a theme title</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css" /><style type="text/css">
       @import "http://localhost/style2.css";
     </style></head><body>
   <BLANKLINE>
     <div id="header" class="title-bar">
       <h1 id="title">This is the theme title</h1>
       <div class="topnav"></div>
     </div>
     <div id="content-wrapper">
       <a name="top" id="top"></a>
       <div id="content">
       foo
    </div>
       <a href="#top">Back to top</a>
     </div>
   <BLANKLINE>
     <div id="footer">
       <span id="copyright">Copyright (C)</span> 2000 Some Corporation
     </div>
   <BLANKLINE>
    </body></html>

CDATA sections in XHTML documents should be preserved. lxml tends to escape
the angle brackets that start and end a CDATA section in XHTML documents (but
not in HTML), so Deliverance munges the CDATA start and end markers before
passing the documents to lxml, and unmunges them after getting the merged
string back. Let's make sure they're properly preserved in both theme and
content documents:
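
A minimal sketch of that munge/unmunge round trip, assuming plain string
replacement. The marker strings are the ones that show up further down in this
test; the function names munge_cdata and unmunge_cdata are hypothetical, and
the real implementation may differ.

```python
# Marker strings as they appear later in this test; helper names are hypothetical.
START_MARKER = '__START_CDATA__'
END_MARKER = '__END_CDATA__'


def munge_cdata(text):
    # Hide the CDATA delimiters so a parser never sees their angle brackets.
    return text.replace('<![CDATA[', START_MARKER).replace(']]>', END_MARKER)


def unmunge_cdata(text):
    # Restore the delimiters in the serialized output.
    return text.replace(START_MARKER, '<![CDATA[').replace(END_MARKER, ']]>')


source = '<script><![CDATA[ if (a < b) {} ]]></script>'
munged = munge_cdata(source)
restored = unmunge_cdata(munged)
```

Because unmunging is a blind replacement, any text that already contains the
marker strings will also be turned into CDATA delimiters.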

   >>> app['/cdata'] = Response('''
   ... <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   ... <html xmlns="http://www.w3.org/1999/xhtml">
   ...     <head>
   ...         <title></title>
   ...     </head>
   ...     <body><script type="text/javascript">
   ...         <![CDATA[
   ...                 ... unescaped script content ...
   ...                 ]]>
   ...                 </script>
   ...                 <img src="http://foo.jpg"/>
   ...     </body>
   ... </html>
   ... ''')

   >>> app['/theme.html'] = Response('''\
   ... <html>
   ...  <head>
   ...   <title>This is a theme title</title>
   ...   <link rel=Stylesheet type="text/css" href="style.css">
   ...   <style type="text/css"><![CDATA[
   ...     @import "style2.css";
   ...     ]]>
   ...   </style>
   ...  </head>
   ...  <body>
   ... 
   ...   <div id="header" class="title-bar">
   ...     <h1 id="title">This is the theme title</h1>
   ...     <div class="topnav"></div>
   ...   </div>
   ...   <div id="content-wrapper">
   ...     <a name="top"></a>
   ...     <div id="content">
   ...       This content will be replaced.
   ...     </div>
   ...     <a href="#top">Back to top</a>
   ...   </div>
   ... 
   ...   <div id="footer">
   ...     <span id="copyright">Copyright (C)</span> 2000 Some Corporation
   ...   </div>
   ... 
   ...  </body>
   ... </html>''')

   >>> print compare_request('/cdata', deliv_filename) # doctest: +REPORT_UDIFF
   Original content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml">
       <head>
           <title></title>
       </head>
       <body><script type="text/javascript">
           <![CDATA[
                   ... unescaped script content ...
                   ]]>
                   </script>
                   <img src="http://foo.jpg"/>
       </body>
   </html>
   Themed content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ASCII" /><title></title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css" /><style type="text/css"><![CDATA[
       @import "http://localhost/style2.css";
       ]]>
     </style></head><body>
   <BLANKLINE>
     <div id="header" class="title-bar">
       <h1 id="title">This is the theme title</h1>
       <div class="topnav"></div>
     </div>
     <div id="content-wrapper">
       <a name="top" id="top"></a>
       <div id="content"><script type="text/javascript">
           <![CDATA[
                   ... unescaped script content ...
                   ]]>
                   </script><img src="http://foo.jpg" /></div>
       <a href="#top">Back to top</a>
     </div>
   <BLANKLINE>
     <div id="footer">
       <span id="copyright">Copyright (C)</span> 2000 Some Corporation
     </div>
   <BLANKLINE>
    </body></html>

We should also see what happens to CDATA sections merged in from external
content documents, when the `href` attribute is used in a rule action:

    >>> new_rule_xml = '''\
    ... <ruleset>
    ...   <match path="exact:/about.html" class="breakout" />
    ...   <match request-header="X-No-Deliverate: boolean:true" abort="1" />
    ...   <match response-header="X-No-Deliverate: boolean:true" abort="1" />
    ...   <match environ="wsgi.url_scheme: https" class="via-https" />
    ...   <theme href="/theme.html" />
    ...   <rule path="/foo">
    ...     <drop content="#badstuff" />
    ...   </rule>
    ...   <rule class="default">
    ...     <replace href="/newfooter" content="children:#footer" theme="children:#footer" nocontent="ignore" />
    ...     <replace content="children:body" theme="children:#content" nocontent="abort" />
    ...     <prepend content="elements:/html/head/meta" theme="children:/html/head" nocontent="ignore" />  
    ...   </rule>
    ...   <rule class="breakout">
    ...     <replace content="children:#footer" theme="children:#footer" nocontent="ignore" />
    ...     <replace content="children:body" theme="children:#content-wrapper" nocontent="abort" />
    ...   </rule>
    ... </ruleset>'''
    >>> f = open(rule_filename, 'w+')
    >>> f.write(new_rule_xml)
    >>> f.close()

   >>> app['/newfooter'] = Response('''\
   ... <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   ... <html>
   ...  <body>
   ...   <div id="footer">
   ...     foo
   ...     <![CDATA[
   ...             some unescaped script content in the footer
   ...             ]]>
   ...   </div>
   ...  </body>
   ... </html>''')
    
   >>> print compare_request('/cdata', deliv_filename) # doctest: +REPORT_UDIFF
   Original content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml">
       <head>
           <title></title>
       </head>
       <body><script type="text/javascript">
           <![CDATA[
                   ... unescaped script content ...
                   ]]>
                   </script>
                   <img src="http://foo.jpg"/>
       </body>
   </html>
   Themed content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ASCII" /><title></title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css" /><style type="text/css"><![CDATA[
       @import "http://localhost/style2.css";
       ]]>
     </style></head><body>
   <BLANKLINE>
     <div id="header" class="title-bar">
       <h1 id="title">This is the theme title</h1>
       <div class="topnav"></div>
     </div>
     <div id="content-wrapper">
       <a name="top" id="top"></a>
       <div id="content"><script type="text/javascript">
           <![CDATA[
                   ... unescaped script content ...
                   ]]>
                   </script><img src="http://foo.jpg" /></div>
       <a href="#top">Back to top</a>
     </div>
   <BLANKLINE>
     <div id="footer">
       foo
       <![CDATA[
               some unescaped script content in the footer
               ]]>
     </div>
   <BLANKLINE>
    </body></html>

Note that there's a small chance of false positives if a document happens to
contain the markers that we use internally for the CDATA start and end. In
that case the markers come out as real CDATA delimiters:

   >>> app['/newfooter'] = Response('''\
   ... <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   ... <html>
   ...  <body>
   ...   <div id="footer">
   ...     foo
   ...     __START_CDATA__
   ...             some unescaped script content in the footer
   ...             __END_CDATA__
   ...   </div>
   ...  </body>
   ... </html>''')
    
   >>> print compare_request('/cdata', deliv_filename) # doctest: +REPORT_UDIFF
   Original content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml">
       <head>
           <title></title>
       </head>
       <body><script type="text/javascript">
           <![CDATA[
                   ... unescaped script content ...
                   ]]>
                   </script>
                   <img src="http://foo.jpg"/>
       </body>
   </html>
   Themed content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ASCII" /><title></title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css" /><style type="text/css"><![CDATA[
       @import "http://localhost/style2.css";
       ]]>
     </style></head><body>
   <BLANKLINE>
     <div id="header" class="title-bar">
       <h1 id="title">This is the theme title</h1>
       <div class="topnav"></div>
     </div>
     <div id="content-wrapper">
       <a name="top" id="top"></a>
       <div id="content"><script type="text/javascript">
           <![CDATA[
                   ... unescaped script content ...
                   ]]>
                   </script><img src="http://foo.jpg" /></div>
       <a href="#top">Back to top</a>
     </div>
   <BLANKLINE>
     <div id="footer">
       foo
       <![CDATA[
               some unescaped script content in the footer
               ]]>
     </div>
   <BLANKLINE>
    </body></html>

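The false positive above follows directly from a placeholder approach. The
sketch below is only an illustration, not Deliverance's actual code: the marker
strings are the ones shown in the test above, while the helper names
protect_cdata and restore_cdata are hypothetical. It has no interpreter prompts,
so doctest does not run it.

```python
# Hypothetical helpers illustrating the placeholder technique; only the two
# marker strings come from the test above.
START_MARKER = "__START_CDATA__"
END_MARKER = "__END_CDATA__"

def protect_cdata(markup):
    """Swap CDATA delimiters for plain-text markers before parsing."""
    return markup.replace("<![CDATA[", START_MARKER).replace("]]>", END_MARKER)

def restore_cdata(markup):
    """Swap the markers back into CDATA delimiters after serializing."""
    return markup.replace(START_MARKER, "<![CDATA[").replace(END_MARKER, "]]>")

# A genuine CDATA section survives the round trip unchanged.
genuine = "<script><![CDATA[ if (a < b) {} ]]></script>"
assert restore_cdata(protect_cdata(genuine)) == genuine

# A document that already contains the markers as literal text comes back
# with CDATA delimiters it never had: the false positive.
literal = "<div>__START_CDATA__ just text __END_CDATA__</div>"
print(restore_cdata(protect_cdata(literal)))  # <div><![CDATA[ just text ]]></div>
```

Because the restore step cannot tell its own markers from identical text that
was already in the document, the only protection is to pick markers that are
unlikely to occur naturally.
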
lxml parses an HTML document correctly only if the meta tag carrying the charset
declaration occurs before any non-ASCII characters (per the HTML spec). To
play nicely with content that breaks that assumption, Deliverance moves
the charset declaration to the start of the head before passing the document to
lxml, so that the resulting content isn't mangled.
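
One way such a relocation could be done is sketched below. This is a
simplified, regex-based illustration under stated assumptions, not
Deliverance's actual implementation: the helper name hoist_charset_meta and
both patterns are hypothetical, and the sketch ignores edge cases such as meta
tags inside comments. It has no interpreter prompts, so doctest does not run
it.

```python
import re

# Hypothetical patterns: a meta tag that mentions "charset=" (optionally
# followed by a closing </meta>), and the opening <head> tag.
META_CHARSET = re.compile(br'<meta\b[^>]*charset=[^>]*>(?:\s*</meta>)?', re.I)
HEAD_OPEN = re.compile(br'<head\b[^>]*>', re.I)

def hoist_charset_meta(html):
    """Move the first charset-declaring meta tag to just after <head>."""
    meta = META_CHARSET.search(html)
    head = HEAD_OPEN.search(html)
    if meta is None or head is None or meta.start() < head.end():
        return html  # nothing to move
    # Cut the meta tag out, then splice it back in right after <head>.
    without_meta = html[:meta.start()] + html[meta.end():]
    return without_meta[:head.end()] + meta.group(0) + without_meta[head.end():]

page = (b'<html><head>\n <title>\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e</title>\n'
        b' <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
        b' </head><body></body></html>')
fixed = hoist_charset_meta(page)
print(fixed.index(b'<meta') < fixed.index(b'<title>'))  # True
```

The sketch works on the raw bytes, since the whole point is to fix the
document before anything has decided how to decode it.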

   >>> app['/reddot'] = Response('''\
   ... <html><head>
   ...  <title>\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e</title>
   ...  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
   ...  </head><body>
   ...     \xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e
   ...  </body>
   ... </html>''')

   >>> print compare_request('/reddot', deliv_filename) # doctest: +REPORT_UDIFF
   Original content:
   <html><head>
    <title>日本語</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    </head><body>
       日本語
    </body>
   </html>
   Themed content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
   <html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>&#26085;&#26412;&#35486;</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css"><![CDATA[
       @import "http://localhost/style2.css";
       ]]>
     </style></head><body>
   <BLANKLINE>
     <div id="header" class="title-bar">
       <h1 id="title">This is the theme title</h1>
       <div class="topnav"></div>
     </div>
     <div id="content-wrapper">
       <a name="top"></a>
       <div id="content">
       &#26085;&#26412;&#35486;
    </div>
       <a href="#top">Back to top</a>
     </div>
   <BLANKLINE>
     <div id="footer">
       foo
       <![CDATA[
               some unescaped script content in the footer
               ]]>
     </div>
   <BLANKLINE>
    </body></html>

The same holds when the meta tag is written with an explicit closing tag:

   >>> app['/reddot'] = Response('''\
   ... <html><head>
   ...  <title>\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e</title>
   ...  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></meta>
   ...  </head><body>
   ...     \xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e
   ...  </body>
   ... </html>''')

   >>> print compare_request('/reddot', deliv_filename) # doctest: +REPORT_UDIFF
   Original content:
   <html><head>
    <title>日本語</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></meta>
    </head><body>
       日本語
    </body>
   </html>
   Themed content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
   <html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>&#26085;&#26412;&#35486;</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css"><![CDATA[
       @import "http://localhost/style2.css";
       ]]>
     </style></head><body>
   <BLANKLINE>
     <div id="header" class="title-bar">
       <h1 id="title">This is the theme title</h1>
       <div class="topnav"></div>
     </div>
     <div id="content-wrapper">
       <a name="top"></a>
       <div id="content">
       &#26085;&#26412;&#35486;
    </div>
       <a href="#top">Back to top</a>
     </div>
   <BLANKLINE>
     <div id="footer">
       foo
       <![CDATA[
               some unescaped script content in the footer
               ]]>
     </div>
   <BLANKLINE>
    </body></html>

Let's test the same thing for the theme document. We'll put the Japanese title and the
misplaced charset declaration in the theme instead of the content. The resulting title
should be the correct sequence of HTML character references:

   >>> app['/reddot'] = Response('''\
   ... <html><head>
   ...  </head><body>
   ...     foo
   ...  </body>
   ... </html>''')

   >>> app['/theme.html'] = Response('''\
   ... <html>
   ...  <head>
   ...   <title>\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e</title>
   ...   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
   ...   <link rel=Stylesheet type="text/css" href="style.css">
   ...   <style type="text/css"><![CDATA[
   ...     @import "style2.css";
   ...     ]]>
   ...   </style>
   ...  </head>
   ...  <body>
   ... 
   ...   <div id="header" class="title-bar">
   ...     <h1 id="title">This is the theme title</h1>
   ...     <div class="topnav"></div>
   ...   </div>
   ...   <div id="content-wrapper">
   ...     <a name="top"></a>
   ...     <div id="content">
   ...       This content will be replaced.
   ...     </div>
   ...     <a href="#top">Back to top</a>
   ...   </div>
   ... 
   ...   <div id="footer">
   ...     <span id="copyright">Copyright (C)</span> 2000 Some Corporation
   ...   </div>
   ... 
   ...  </body>
   ... </html>''')

   >>> print compare_request('/reddot', deliv_filename) # doctest: +REPORT_UDIFF
   Original content:
   <html><head>
    </head><body>
       foo
    </body>
   </html>
   Themed content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
   <html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>&#26085;&#26412;&#35486;</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css"><![CDATA[
       @import "http://localhost/style2.css";
       ]]>
     </style></head><body>
   <BLANKLINE>
     <div id="header" class="title-bar">
       <h1 id="title">This is the theme title</h1>
       <div class="topnav"></div>
     </div>
     <div id="content-wrapper">
       <a name="top"></a>
       <div id="content">
       foo
    </div>
       <a href="#top">Back to top</a>
     </div>
   <BLANKLINE>
     <div id="footer">
       foo
       <![CDATA[
               some unescaped script content in the footer
               ]]>
     </div>
   <BLANKLINE>
    </body></html>


Some non-ASCII characters can end up mangled when they pass through lxml.html::

   >>> from lxml.html import fromstring, tostring
   >>> x = "<html><body>…</body></html>"
   >>> print tostring(fromstring(x))
   <html><body>&#226;&#128;&#166;</body></html>
   
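That should have been "&#8230;", which is the HTML character reference for the ellipsis
character. Instead, each of the three UTF-8 bytes of the ellipsis (226, 128, 166) was
treated as a separate character. The way to fix this is to decode the string to unicode
before lxml.html gets it::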

   >>> print tostring(fromstring(x.decode('utf8')))
   <html><body>&#8230;</body></html>

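The three bogus references above are what you get when each UTF-8 byte of the
ellipsis is read as one character, as happens under Latin-1. The sketch below
reproduces both results with the standard library only. The helper name
char_refs is hypothetical and lxml is not involved. It has no interpreter
prompts, so doctest does not run it.

```python
raw = b"\xe2\x80\xa6"  # the UTF-8 encoding of the ellipsis, U+2026

def char_refs(text):
    """Render every character of a decoded string as a numeric reference."""
    return "".join("&#%d;" % ord(ch) for ch in text)

# One character per byte: three unrelated characters.
misread = char_refs(raw.decode("latin-1"))
# Decoded as UTF-8: a single code point.
correct = char_refs(raw.decode("utf-8"))

print(misread)  # &#226;&#128;&#166;
print(correct)  # &#8230;
```
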
So, internally, Deliverance uses webob.Response.unicode_body, which uses the
response's charset to decide how to decode the body. Let's make sure that
these characters aren't mangled when they are themed through Deliverance::

   >>> app['/mangled'] = Response('''\
   ... <html><head>
   ...  </head><body>
   ...     …
   ...  </body>
   ... </html>''')

   >>> print compare_request('/mangled', deliv_filename) # doctest: +REPORT_UDIFF
   Original content:
   <html><head>
    </head><body>
       …
    </body>
   </html>
   Themed content:
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
   <html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>&#26085;&#26412;&#35486;</title><link rel="Stylesheet" type="text/css" href="http://localhost/style.css"><style type="text/css"><![CDATA[
       @import "http://localhost/style2.css";
       ]]>
     </style></head><body>
   <BLANKLINE>
     <div id="header" class="title-bar">
       <h1 id="title">This is the theme title</h1>
       <div class="topnav"></div>
     </div>
     <div id="content-wrapper">
       <a name="top"></a>
       <div id="content">
       &#8230;
    </div>
       <a href="#top">Back to top</a>
     </div>
   <BLANKLINE>
     <div id="footer">
       foo
       <![CDATA[
               some unescaped script content in the footer
               ]]>
     </div>
   <BLANKLINE>
    </body></html>

