Clean HTML

The Clean HTML AddOn cleans up the HTML embedded in your email and makes sure it is W3C compliant so that it won’t cause issues with your blog. It can save hours of fiddling with your posts.

Configuration

Configuration is optional. If you leave everything blank the AddOn still makes your HTML W3C compliant.

The configuration page lets you adjust the most commonly used settings.

Allowed is the comma-separated list of HTML elements you want to allow. By default everything is allowed. If you enter anything in this setting you must list every element you want to allow, which makes it very restrictive. Typically you’ll want to use Forbidden Elements instead.

Forbidden Elements is the comma-separated list of HTML elements you want to remove. By default nothing is forbidden. For example, to prevent images you would enter “img”.

Forbidden Attributes is the comma-separated list of HTML attributes you want to remove. For example, if you set Forbidden Attributes to “style,class” then all style and class attributes will be removed.

Remove Empty removes any HTML elements that have no content, such as <span></span>.

Remove Empty Remove Nbsp removes any HTML elements that have no content other than standard and non-breaking (nbsp) spaces, e.g. <span>&nbsp; </span>.

Programmer Stuff

The Clean HTML AddOn settings page will handle most people’s needs, but if you need something more, read on.

This AddOn has one filter, “postie_htmlcleaner_config”, which lets you modify the configuration.

Create a file named filterPostie.php in the wp-content directory and paste your filter code into it. For example, the following removes all style and class attributes from the incoming email:

<?php
add_filter('postie_htmlcleaner_config', 'my_htmlcleaner_config');

function my_htmlcleaner_config($config)
{
    $config->set('HTML.ForbiddenAttributes', 'class,style');
    return $config;
}

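This example allows only <b>, <u>, <div>, <p> and <a>; the <a> element may also keep its href attribute. Because it defines the same function name as the previous example, use one or the other rather than pasting both into the same file.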

<?php
add_filter('postie_htmlcleaner_config', 'my_htmlcleaner_config');

function my_htmlcleaner_config($config)
{
    $config->set('HTML.Allowed', 'a[href],b,u,div,p');
    return $config;
}

Internally this AddOn uses HTML Purifier. See http://htmlpurifier.org/live/configdoc/plain.html for all the options.
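
As a further sketch, the same filter can set other directives from that list. The example below assumes that the HTML Purifier directives HTML.ForbiddenElements, AutoFormat.RemoveEmpty and AutoFormat.RemoveEmpty.RemoveNbsp behave like the Forbidden Elements, Remove Empty and Remove Empty Remove Nbsp settings described earlier; that mapping is an assumption and has not been checked against the AddOn’s source. The function name my_extra_htmlcleaner_config is made up for illustration.

```php
<?php
// Hypothetical callback name; use any name not already defined in filterPostie.php.
add_filter('postie_htmlcleaner_config', 'my_extra_htmlcleaner_config');

function my_extra_htmlcleaner_config($config)
{
    // Drop <img> elements entirely.
    $config->set('HTML.ForbiddenElements', 'img');
    // Remove elements that have no content, e.g. <span></span>.
    $config->set('AutoFormat.RemoveEmpty', true);
    // Also treat elements containing only spaces or &nbsp; as empty.
    $config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
    return $config;
}
```

If filterPostie.php already contains other code, leave out the opening <?php line and append the rest.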


A user asked whether this AddOn cleans up the “junk” that MS Word and MS Outlook add. It does, but some of what Word adds is valid HTML (such as class and style attributes), so you need to check whether the styles your theme provides conflict with those that arrive in the email. For example, here is some markup from Outlook that was cleaned but didn’t display quite the way the user wanted:

<p class="MsoNormal" style="text-align: center;" align="center"><span style="font-size: 12pt; font-family: 'Times New Roman', serif;">BOARD OF DIRECTORS MEETING</span></p>

This is valid HTML, but if your theme defines a “MsoNormal” style things might look strange. Note also that the markup carries a lot of specific styling that might look odd in your theme, especially “font-size: 12pt; font-family: 'Times New Roman', serif”.
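
One possible way to tame markup like the Outlook example above is to strip the Word-specific class and the font declarations while leaving other styling alone. The following is a minimal sketch, not a tested recipe. The function name my_outlook_htmlcleaner_config is made up, and it assumes the HTML Purifier version bundled with the AddOn supports the Attr.ForbiddenClasses and CSS.ForbiddenProperties directives.

```php
<?php
// Hypothetical callback name; use any name not already defined in filterPostie.php.
add_filter('postie_htmlcleaner_config', 'my_outlook_htmlcleaner_config');

function my_outlook_htmlcleaner_config($config)
{
    // Strip the Word/Outlook class so a theme's own "MsoNormal" rule cannot apply.
    $config->set('Attr.ForbiddenClasses', 'MsoNormal');
    // Drop inline font declarations so the theme's fonts are used instead.
    $config->set('CSS.ForbiddenProperties', 'font-size,font-family');
    return $config;
}
```

This is gentler than forbidding the class and style attributes outright, since other inline styling such as text-align is left for HTML Purifier to handle as usual.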

Release Notes

2.0.5 released 2024-01-12

  • Updated htmlpurifier library.

2.0.4 released 2022-12-12

2.0.3 released 2022-09-07

2.0.2 released 2022-03-19