* HTML API: WP_HTML_Processor class
* Core class used to safely parse and modify an HTML document.
* The HTML Processor class properly parses and modifies HTML5 documents.
* It supports a subset of the HTML5 specification, and when it encounters
* unsupported markup, it aborts early to avoid unintentionally breaking
* the document. The HTML Processor should never break an HTML document.
* While the `WP_HTML_Tag_Processor` is a valuable tool for modifying
* attributes on individual HTML tags, the HTML Processor is more capable
* and useful for the following operations:
* - Querying based on nested HTML structure.
* Eventually the HTML Processor will also support:
* - Wrapping a tag in surrounding HTML.
* - Unwrapping a tag by removing its parent.
* - Inserting and removing nodes.
* - Reading and changing inner content.
* - Navigating up or around HTML structure.
* Use of this class requires three steps:
* 1. Call a static creator method with your input HTML document.
* 2. Find the location in the document you are looking for.
* 3. Request changes to the document at that location.
* Example:
*
*     $processor = WP_HTML_Processor::create_fragment( $html );
*     if ( $processor->next_tag( array( 'breadcrumbs' => array( 'DIV', 'FIGURE', 'IMG' ) ) ) ) {
*         $processor->add_class( 'responsive-image' );
*     }
* Breadcrumbs represent the stack of open elements from the root
* of the document or fragment down to the currently-matched node,
* if one is currently selected. Call WP_HTML_Processor::get_breadcrumbs()
* to inspect the breadcrumbs for a matched tag.
* Breadcrumbs can specify nested HTML structure and are equivalent
* to a CSS selector comprising tag names separated by the child
* combinator, such as "DIV > FIGURE > IMG".
* Since all elements find themselves inside a full HTML document
* when parsed, the return value from `get_breadcrumbs()` will always
* contain any implicit outermost elements. For example, when parsing
* with `create_fragment()` in the `BODY` context (the default), the
* breadcrumbs of every tag in the given HTML document begin with
* `array( 'HTML', 'BODY', … )`.
* Despite containing the implied outermost elements in their breadcrumbs,
* tags may be found with the shortest-matching breadcrumb query. That is,
* `array( 'IMG' )` matches all IMG elements and `array( 'P', 'IMG' )`
* matches all IMG elements directly inside a P element. To rule out
* unintended partial matches, specify the full breadcrumb path in the
* query, all the way down from the root HTML element.
* Example:
*
*     $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
*     //               ----- Matches here.
*     $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'IMG' ) ) );
*
*     $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
*     //                                  ---- Matches here.
*     $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'FIGCAPTION', 'EM' ) ) );
*
*     $html = '<div><img></div><img>';
*     //                       ----- Matches here, because IMG must be a direct child of the implicit BODY.
*     $processor->next_tag( array( 'breadcrumbs' => array( 'BODY', 'IMG' ) ) );
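* The shortest-match rule can be illustrated with a standalone sketch. `breadcrumbs_match()` is a hypothetical helper, not part of WordPress. It assumes breadcrumbs are a list of upper-case tag names, compares a query against the innermost end of that list, and treats `*` as a single-element wildcard, as described for `next_tag()`.
```php
<?php
// Hypothetical helper, not part of WordPress: reports whether a breadcrumb
// query matches the innermost end of a stack of open element names.
function breadcrumbs_match( array $stack, array $query ): bool {
	$depth = count( $stack );
	$size  = count( $query );

	// A query deeper than the stack can never match.
	if ( $size > $depth ) {
		return false;
	}

	// Walk outward from the innermost element; '*' matches any single element.
	for ( $i = 1; $i <= $size; $i++ ) {
		$wanted = strtoupper( $query[ $size - $i ] );
		if ( '*' !== $wanted && $wanted !== $stack[ $depth - $i ] ) {
			return false;
		}
	}

	return true;
}

// Breadcrumbs of the IMG in '<figure><img>' parsed in the BODY context.
$img = array( 'HTML', 'BODY', 'FIGURE', 'IMG' );

var_dump( breadcrumbs_match( $img, array( 'IMG' ) ) );              // bool(true)
var_dump( breadcrumbs_match( $img, array( 'FIGURE', 'IMG' ) ) );    // bool(true)
var_dump( breadcrumbs_match( $img, array( 'BODY', 'IMG' ) ) );      // bool(false)
var_dump( breadcrumbs_match( $img, array( 'BODY', '*', 'IMG' ) ) ); // bool(true)
```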
* This class implements a small part of the HTML5 specification.
* It is designed to operate only within the subset it supports and to
* abort early whenever it encounters circumstances it can't properly
* handle. This is the principal way in which this class remains as
* simple as possible without cutting corners or breaking compliance.
* If any unsupported element appears in the HTML input, the HTML Processor
* will abort early and stop all processing. This draconian measure ensures
* that the HTML Processor won't break any HTML it doesn't fully understand.
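* A minimal sketch of the abort-early policy, assuming a pre-tokenized list of tag names and a tiny allowlist. `Tiny_Strict_Processor` is hypothetical and far simpler than this class. It shows how a `false` return can mean either "not found" or "gave up", with the last error telling the two apart.
```php
<?php
// Hypothetical miniature processor, not the WordPress class. It walks a
// pre-tokenized list of tag names and gives up as soon as it meets a tag
// outside its allowlist, remembering why it stopped.
final class Tiny_Strict_Processor {
	const ERROR_UNSUPPORTED = 'unsupported';

	private $supported  = array( 'DIV', 'P', 'IMG', 'FIGURE', 'EM' );
	private $tags;
	private $position   = 0;
	private $last_error = null;

	public function __construct( array $tags ) {
		$this->tags = $tags;
	}

	public function next_tag( string $name ): bool {
		// Once aborted, stay aborted rather than guess about unknown markup.
		if ( null !== $this->last_error ) {
			return false;
		}

		while ( $this->position < count( $this->tags ) ) {
			$tag = $this->tags[ $this->position++ ];

			if ( ! in_array( $tag, $this->supported, true ) ) {
				$this->last_error = self::ERROR_UNSUPPORTED;
				return false;
			}

			if ( $tag === $name ) {
				return true;
			}
		}

		// Ran out of input without a match; that is not an error.
		return false;
	}

	public function get_last_error(): ?string {
		return $this->last_error;
	}
}

$missing = new Tiny_Strict_Processor( array( 'DIV', 'P' ) );
var_dump( $missing->next_tag( 'IMG' ), $missing->get_last_error() ); // false, NULL: IMG is simply absent.

$aborted = new Tiny_Strict_Processor( array( 'DIV', 'TABLE', 'IMG' ) );
var_dump( $aborted->next_tag( 'IMG' ), $aborted->get_last_error() ); // false, 'unsupported': TABLE stopped the scan.
```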
* The following list specifies the HTML tags that _are_ supported:
* - Containers: ADDRESS, BLOCKQUOTE, DETAILS, DIALOG, DIV, FOOTER, HEADER, MAIN, MENU, SPAN, SUMMARY.
* - Custom elements: All custom elements are supported. :)
* - Form elements: BUTTON, DATALIST, FIELDSET, INPUT, LABEL, LEGEND, METER, PROGRESS, SEARCH.
* - Formatting elements: B, BIG, CODE, EM, FONT, I, PRE, SMALL, STRIKE, STRONG, TT, U, WBR.
* - Heading elements: H1, H2, H3, H4, H5, H6, HGROUP.
* - Lists: DD, DL, DT, LI, OL, UL.
* - Media elements: AUDIO, CANVAS, EMBED, FIGCAPTION, FIGURE, IMG, MAP, PICTURE, SOURCE, TRACK, VIDEO.
* - Phrasing elements: ABBR, AREA, BDI, BDO, CITE, DATA, DEL, DFN, INS, MARK, OUTPUT, Q, SAMP, SUB, SUP, TIME, VAR.
* - Sectioning elements: ARTICLE, ASIDE, HR, NAV, SECTION.
* - Templating elements: SLOT.
* - Text decoration: RUBY.
* - Deprecated elements: ACRONYM, BLINK, CENTER, DIR, ISINDEX, KEYGEN, LISTING, MULTICOL, NEXTID, PARAM, SPACER.
* Some kinds of non-normative HTML involve reconstruction of formatting elements and
* re-parenting of mis-nested elements. For example, a DIV tag found inside a TABLE
* may in fact belong _before_ the table in the DOM. If the HTML Processor encounters
* such a case, it will stop processing.
* The following list specifies HTML markup that _is_ supported:
* - Markup involving only those tags listed above.
* - Fully-balanced and non-overlapping tags.
* - HTML with unexpected tag closers.
* - Some unbalanced or overlapping tags.
* - P tags after unclosed P tags.
* - BUTTON tags after unclosed BUTTON tags.
* - A tags after unclosed A tags that don't involve any active formatting elements.
* @see WP_HTML_Tag_Processor
* @see https://html.spec.whatwg.org/
class WP_HTML_Processor extends WP_HTML_Tag_Processor {
* The maximum number of bookmarks allowed to exist at any given time.
* HTML processing requires more bookmarks than basic tag processing,
* so this class constant from the Tag Processor is overridden.
const MAX_BOOKMARKS = 100;
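* Overriding a class constant only raises the limit if the parent reads it through late static binding. The hypothetical classes below assume that the Tag Processor's limit check uses `static::MAX_BOOKMARKS`. `Base_Bookmarker`, `Roomy_Bookmarker`, and `count_accepted()` are illustrative names only.
```php
<?php
// Hypothetical classes showing the pattern; the real check lives in the Tag Processor.
class Base_Bookmarker {
	const MAX_BOOKMARKS = 10;

	protected $bookmarks = array();

	public function set_bookmark( string $name ): bool {
		// static:: resolves against the runtime class, so a subclass constant wins.
		if ( ! isset( $this->bookmarks[ $name ] ) && count( $this->bookmarks ) >= static::MAX_BOOKMARKS ) {
			return false;
		}
		$this->bookmarks[ $name ] = true;
		return true;
	}
}

class Roomy_Bookmarker extends Base_Bookmarker {
	// Overrides the parent's constant, like the override above.
	const MAX_BOOKMARKS = 100;
}

// Tries to set $n distinct bookmarks and counts how many were accepted.
function count_accepted( Base_Bookmarker $bookmarker, int $n ): int {
	$accepted = 0;
	for ( $i = 1; $i <= $n; $i++ ) {
		if ( $bookmarker->set_bookmark( "bookmark-{$i}" ) ) {
			$accepted++;
		}
	}
	return $accepted;
}

echo count_accepted( new Base_Bookmarker(), 50 ), "\n";  // 10
echo count_accepted( new Roomy_Bookmarker(), 50 ), "\n"; // 50
```
* Had `set_bookmark()` used `self::MAX_BOOKMARKS`, both calls would print 10.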
* Holds the working state of the parser, including the stack of
* open elements and the stack of active formatting elements.
* Initialized in the constructor.
* @var WP_HTML_Processor_State
private $state = null;
* Used to create unique bookmark names.
* This class sets a bookmark for every tag in the HTML document that it encounters.
* The bookmark name is auto-generated and increments, starting with `1`. These are
* internal bookmarks and are automatically released when the referring WP_HTML_Token
* goes out of scope and is garbage-collected.
* @see WP_HTML_Processor::$release_internal_bookmark_on_destruct
private $bookmark_counter = 0;
* Stores an explanation for why something failed, if it did.
* @see self::get_last_error
private $last_error = null;
* Releases a bookmark when PHP garbage-collects its wrapping WP_HTML_Token instance.
* This function is created inside the class constructor so that it can be passed to
* the stack of open elements and the stack of active formatting elements without
* exposing it as a public method on the class.
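* The release-on-destruct mechanism relies on PHP destroying an object as soon as its last reference disappears. In this hypothetical sketch, `Sketch_Token` stands in for WP_HTML_Token and `Sketch_Processor` for the bookmark table. The closure plays the role of `$release_internal_bookmark_on_destruct`: it modifies a private property without the class exposing a public method.
```php
<?php
// Hypothetical stand-in for WP_HTML_Token: it calls $on_destroy with its
// bookmark name when PHP destroys it.
final class Sketch_Token {
	public $bookmark_name;
	private $on_destroy;

	public function __construct( string $bookmark_name, callable $on_destroy ) {
		$this->bookmark_name = $bookmark_name;
		$this->on_destroy    = $on_destroy;
	}

	public function __destruct() {
		( $this->on_destroy )( $this->bookmark_name );
	}
}

// Hypothetical stand-in for the processor's bookmark table.
final class Sketch_Processor {
	private $bookmarks        = array();
	private $bookmark_counter = 0;
	private $release;

	public function __construct() {
		// Bound to $this, the closure can modify the private bookmark table
		// without the class exposing a public release method.
		$this->release = function ( string $name ): void {
			unset( $this->bookmarks[ $name ] );
		};
	}

	public function make_token(): Sketch_Token {
		// Auto-generated, incrementing names starting with "1".
		$name                     = (string) ++$this->bookmark_counter;
		$this->bookmarks[ $name ] = true;
		return new Sketch_Token( $name, $this->release );
	}

	public function bookmark_names(): array {
		return array_keys( $this->bookmarks );
	}
}

$processor = new Sketch_Processor();
$first     = $processor->make_token();
$second    = $processor->make_token();
echo implode( ',', $processor->bookmark_names() ), "\n"; // 1,2

unset( $first ); // The last reference is gone, so the destructor releases "1".
echo implode( ',', $processor->bookmark_names() ), "\n"; // 2
```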
private $release_internal_bookmark_on_destruct = null;
* Stores stack events that arise while parsing the HTML document;
* these events then supply the "match" events.
* @var WP_HTML_Stack_Event[]
private $element_queue = array();
* Current stack event, if set, representing a matched token.
* Because the parser may internally point to a place further along in a document
* than the nodes which have already been processed (some "virtual" nodes may have
* appeared while scanning the HTML document), this will point at the "current" node
* being processed. It comes from the front of the element queue.
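* The push and pop handlers installed in the constructor mark each stack event as 'real' or 'virtual'. The hypothetical `Sketch_Stack` below mirrors the `$is_virtual`, `$same_node`, and `$provenance` logic from those handlers. It reduces the current token to an assumed array with 'name' and 'closer' keys.
```php
<?php
// Hypothetical sketch, not the WordPress stack classes. $current stands for the
// token being read from the input, array( 'name' => string, 'closer' => bool ),
// or null when no input token is being processed.
final class Sketch_Stack {
	public $queue  = array();
	private $stack = array();

	public function push( string $node, ?array $current ): void {
		$this->stack[] = $node;
		// A push is virtual when no token is current or the current token is a closer.
		$is_virtual = null === $current || $current['closer'];
		$this->record( 'PUSH', $node, $current, $is_virtual );
	}

	public function pop( ?array $current ): void {
		$node = array_pop( $this->stack );
		// A pop is virtual when no token is current or the current token is an opener.
		$is_virtual = null === $current || ! $current['closer'];
		$this->record( 'POP', $node, $current, $is_virtual );
	}

	private function record( string $op, string $node, ?array $current, bool $is_virtual ): void {
		$same_node     = null !== $current && $current['name'] === $node;
		$provenance    = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real';
		$this->queue[] = "{$op} {$node} ({$provenance})";
	}
}

// Reading '<p>One<p>Two': the second P opener implicitly closes the first P.
$stack  = new Sketch_Stack();
$stack->push( 'P', array( 'name' => 'P', 'closer' => false ) );
$second = array( 'name' => 'P', 'closer' => false );
$stack->pop( $second ); // Implied </p>, so it is virtual.
$stack->push( 'P', $second );

echo implode( "\n", $stack->queue ), "\n";
// PUSH P (real)
// POP P (virtual)
// PUSH P (real)
```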
* @var ?WP_HTML_Stack_Event
private $current_element = null;
* Context node if created as a fragment parser.
private $context_node = null;
* Whether the parser has yet processed the context node,
* if created as a fragment parser.
* The context node will be initially pushed onto the stack of open elements,
* but when created as a fragment parser, this context element (and the implicit
* HTML document node above it) should not be exposed as a matched token or node.
* This boolean indicates whether the processor should skip over the current
* node in its initial search for the first node created from the input HTML.
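* Skipping the context node can be sketched as a filter over the names pushed onto the stack. `visible_pushes()` is hypothetical. It compares names only for brevity, while this class keeps a reference to the context token itself in `$context_node`.
```php
<?php
// Hypothetical helper: returns the pushed element names that should be exposed,
// skipping everything up to and including the context node.
function visible_pushes( array $pushed_names, string $context_name ): array {
	$has_seen_context_node = false;
	$visible               = array();

	foreach ( $pushed_names as $name ) {
		if ( ! $has_seen_context_node ) {
			// Implied nodes (the HTML root and the context node) are never matches.
			$has_seen_context_node = ( $name === $context_name );
			continue;
		}
		$visible[] = $name;
	}

	return $visible;
}

// Names pushed while parsing the fragment '<div><p>Hi</p></div>' in the BODY context.
$pushed = array( 'HTML', 'BODY', 'DIV', 'P' );
echo implode( ', ', visible_pushes( $pushed, 'BODY' ) ), "\n"; // DIV, P
```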
private $has_seen_context_node = false;
* Public Interface Functions
* Creates an HTML processor in the fragment parsing mode.
* Use this for cases where you are processing chunks of HTML that
* will be found within a bigger HTML document, such as rendered
* block output that exists within a post, or `the_content` inside a
* larger rendered page.
* Fragment parsing occurs within a context, which is an HTML element
* that the document will eventually be placed in. It becomes important
* when special elements have different rules than others, such as inside
* a TEXTAREA or a TITLE tag where things that look like tags are text,
* or inside a SCRIPT tag where things that look like HTML syntax are JS.
* The context value should be a representation of the tag into which the
* HTML is found. For most cases this will be the body element. The HTML
* form is provided because a context element may have attributes that
* impact the parse, such as with a SCRIPT tag and its `type` attribute.
* ## Current HTML Support
* - The only supported context is `<body>`, which is the default value.
* - The only supported document encoding is `UTF-8`, which is the default value.
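* The restrictions above amount to a guard clause in the factory. `Sketch_Fragment_Parser` is a hypothetical, heavily simplified stand-in. It ignores `$html` and returns null for any other context or encoding. Otherwise it seeds its open elements with the implied HTML root and the BODY context.
```php
<?php
// Hypothetical, heavily simplified factory. It only mirrors the documented
// limits of create_fragment(); it does not parse $html at all.
final class Sketch_Fragment_Parser {
	public $open_elements = array();

	// Private so that callers must go through the factory.
	private function __construct() {}

	public static function create_fragment( string $html, string $context = '<body>', string $encoding = 'UTF-8' ): ?self {
		// Anything outside the supported subset yields null rather than a risky parser.
		if ( '<body>' !== $context || 'UTF-8' !== $encoding ) {
			return null;
		}

		$parser = new self();
		// The implied root and the context element are open before any input is read.
		$parser->open_elements[] = 'HTML';
		$parser->open_elements[] = 'BODY';
		return $parser;
	}
}

var_dump( Sketch_Fragment_Parser::create_fragment( '<p>Hi', '<textarea>' ) );           // NULL
var_dump( Sketch_Fragment_Parser::create_fragment( '<p>Hi', '<body>', 'ISO-8859-1' ) ); // NULL
echo implode( ' > ', Sketch_Fragment_Parser::create_fragment( '<p>Hi' )->open_elements ), "\n"; // HTML > BODY
```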
* @since 6.6.0 Returns `static` instead of `self` so it can create subclass instances.
* @param string $html Input HTML fragment to process.
* @param string $context  Context element for the fragment; must be the default, `<body>`.
* @param string $encoding Text encoding of the document; must be the default, 'UTF-8'.
* @return static|null The created processor if successful, otherwise null.
public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8' ) {
if ( '<body>' !== $context || 'UTF-8' !== $encoding ) {
	return null;
}
$processor = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
$processor->state->context_node = array( 'BODY', array() );
$processor->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
// @todo Create "fake" bookmarks for non-existent but implied nodes.
$processor->bookmarks['root-node'] = new WP_HTML_Span( 0, 0 );
$processor->bookmarks['context-node'] = new WP_HTML_Span( 0, 0 );
$processor->state->stack_of_open_elements->push( new WP_HTML_Token( 'root-node', 'HTML', false ) );
$context_node = new WP_HTML_Token( 'context-node', $processor->state->context_node[0], false );
$processor->state->stack_of_open_elements->push( $context_node );
$processor->context_node = $context_node;
return $processor;
}
* Do not use this method. Use the static creator methods instead.
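* The constructor guard works because only code inside the class is expected to supply the unlock value. The hypothetical `Sketch_Guarded` class makes its unlock constant private. Because `_doing_it_wrong()` is a WordPress function, the sketch only records a notice and then continues constructing, as the guard in this class also continues after warning.
```php
<?php
// Hypothetical sketch of a constructor guarded by an unlock code.
class Sketch_Guarded {
	// Private in this sketch, so outside callers cannot pass the right value.
	private const CONSTRUCTOR_UNLOCK_CODE = 'Use Sketch_Guarded::create() instead of calling the constructor directly.';

	// Stand-in for WordPress's _doing_it_wrong() notices.
	public static $notices = array();

	public function __construct( string $html, ?string $unlock_code = null ) {
		if ( self::CONSTRUCTOR_UNLOCK_CODE !== $unlock_code ) {
			self::$notices[] = 'Call Sketch_Guarded::create() to create this object.';
		}
		// Construction continues either way; the guard only warns.
	}

	public static function create( string $html ): self {
		return new self( $html, self::CONSTRUCTOR_UNLOCK_CODE );
	}
}

Sketch_Guarded::create( '<p>via the factory</p>' );
echo count( Sketch_Guarded::$notices ), "\n"; // 0

new Sketch_Guarded( '<p>direct call</p>' );
echo count( Sketch_Guarded::$notices ), "\n"; // 1
```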
* @see WP_HTML_Processor::create_fragment()
* @param string $html HTML to process.
* @param string|null $use_the_static_create_methods_instead This constructor should not be called manually.
public function __construct( $html, $use_the_static_create_methods_instead = null ) {
parent::__construct( $html );
if ( self::CONSTRUCTOR_UNLOCK_CODE !== $use_the_static_create_methods_instead ) {
	_doing_it_wrong(
		__METHOD__,
		sprintf(
			/* translators: %s: WP_HTML_Processor::create_fragment(). */
			__( 'Call %s to create an HTML Processor instead of calling the constructor directly.' ),
			'<code>WP_HTML_Processor::create_fragment()</code>'
		),
		'6.4.0'
	);
}
$this->state = new WP_HTML_Processor_State();
$this->state->stack_of_open_elements->set_push_handler(
	function ( WP_HTML_Token $token ) {
		$is_virtual = ! isset( $this->state->current_token ) || $this->is_tag_closer();
		$same_node = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name;
		$provenance = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real';
		$this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::PUSH, $provenance );
	}
);
$this->state->stack_of_open_elements->set_pop_handler(
	function ( WP_HTML_Token $token ) {
		$is_virtual = ! isset( $this->state->current_token ) || ! $this->is_tag_closer();
		$same_node = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name;
		$provenance = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real';
		$this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::POP, $provenance );
	}
);
/*
 * Create this wrapper so that it's possible to pass
 * a private method into WP_HTML_Token classes without
 * exposing it to any public API.
 */
$this->release_internal_bookmark_on_destruct = function ( $name ) {
	parent::release_bookmark( $name );
};
}
* Returns the last error, if any.
* Various situations lead to parsing failure, but this class
* returns `false` in all of those cases. To determine why something
* failed, request the last error. This helps to distinguish whether
* a given tag couldn't be found or whether content in the document
* caused the processor to give up and abort processing.
* Example:
*
*     $processor = WP_HTML_Processor::create_fragment( '<template><strong><button><em><p><em>' );
*     false === $processor->next_tag();
*     WP_HTML_Processor::ERROR_UNSUPPORTED === $processor->get_last_error();
* @see self::ERROR_UNSUPPORTED
* @see self::ERROR_EXCEEDED_MAX_BOOKMARKS
* @return string|null The last error, if one exists, otherwise null.
public function get_last_error() {
return $this->last_error;
}
* Finds the next tag matching the $query.
* @todo Support matching the class name and tag name.
* @since 6.6.0 Visits all tokens, including virtual ones.
* @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
* @param array|string|null $query {
* Optional. Which tag name to find, having which class, etc. Default is to find any tag.
* @type string|null $tag_name Which tag to find, or `null` for "any tag."
* @type string $tag_closers 'visit' to pause at tag closers, 'skip' or unset to only visit openers.
* @type int|null $match_offset Find the Nth tag matching all search criteria.
* 1 for "first" tag, 3 for "third," etc.
* @type string|null $class_name Tag must contain this whole class name to match.
* @type string[] $breadcrumbs DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`.
* May also contain the wildcard `*`, which matches a single element, e.g. `array( 'SECTION', '*' )`.
* }
* @return bool Whether a tag was matched.
public function next_tag( $query = null ) {
$visit_closers = isset( $query['tag_closers'] ) && 'visit' === $query['tag_closers'];
if ( null === $query ) {
	while ( $this->next_token() ) {
		if ( '#tag' !== $this->get_token_type() ) {
			continue;
		}
		if ( ! $this->is_tag_closer() || $visit_closers ) {
			return true;
		}
	}
	return false;
}
if ( is_string( $query ) ) {
	$query = array( 'breadcrumbs' => array( $query ) );
}
if ( ! is_array( $query ) ) {
	_doing_it_wrong(
		__METHOD__,
		__( 'Please pass a query array to this function.' ),
		'6.4.0'
	);
	return false;
}
$needs_class = ( isset( $query['class_name'] ) && is_string( $query['class_name'] ) )
	? $query['class_name']
	: null;
if ( ! ( array_key_exists( 'breadcrumbs', $query ) && is_array( $query['breadcrumbs'] ) ) ) {
	while ( $this->next_token() ) {
		if ( '#tag' !== $this->get_token_type() ) {
			continue;
		}
		if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) {
			continue;
		}
		if ( ! $this->is_tag_closer() || $visit_closers ) {
			return true;
		}
	}
	return false;
}
$breadcrumbs = $query['breadcrumbs'];
$match_offset = isset( $query['match_offset'] ) ? (int) $query['match_offset'] : 1;
while ( $match_offset > 0 && $this->next_token() ) {
	if ( '#tag' !== $this->get_token_type() || $this->is_tag_closer() ) {
		continue;
	}
	if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) {
		continue;
	}
	if ( $this->matches_breadcrumbs( $breadcrumbs ) && 0 === --$match_offset ) {
		return true;
	}
}
return false;
}
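* The branching in `next_tag()` can be modeled without a real parser. `Sketch_Tag_Cursor` is hypothetical. Its tokens are assumed arrays with `crumbs`, `closer`, and `classes` keys, and breadcrumbs match by plain suffix comparison without the `*` wildcard. The sketch follows the two query paths in `next_tag()`, and a null query falls into the first one:
* - Without breadcrumbs, it stops at the next tag that passes the class filter. It visits closers only when `tag_closers` is 'visit', and it ignores `match_offset`.
* - With breadcrumbs, it considers openers only and counts down `match_offset`.
```php
<?php
// Hypothetical model of next_tag()'s query branches over pre-parsed tokens.
// Each token is array( 'crumbs' => string[], 'closer' => bool, 'classes' => string[] ).
// Wildcards, bookmarks, and real HTML parsing are deliberately left out.
final class Sketch_Tag_Cursor {
	private $tokens;
	private $position = -1;

	public function __construct( array $tokens ) {
		$this->tokens = array_values( $tokens );
	}

	// Index of the token the cursor last stopped on, or null before the first call.
	public function position(): ?int {
		return $this->position < 0 ? null : $this->position;
	}

	private function next_token(): ?array {
		$this->position++;
		return $this->tokens[ $this->position ] ?? null;
	}

	private static function has_class( array $token, ?string $class_name ): bool {
		return null === $class_name || in_array( $class_name, $token['classes'], true );
	}

	public function next_tag( $query = null ): bool {
		$visit_closers = is_array( $query ) && 'visit' === ( $query['tag_closers'] ?? null );

		// A bare string is shorthand for a one-element breadcrumb query.
		if ( is_string( $query ) ) {
			$query = array( 'breadcrumbs' => array( $query ) );
		}

		$needs_class = is_array( $query ) && is_string( $query['class_name'] ?? null ) ? $query['class_name'] : null;

		// Without breadcrumbs: the next tag passing the class filter; match_offset is not used.
		if ( ! is_array( $query['breadcrumbs'] ?? null ) ) {
			while ( null !== ( $token = $this->next_token() ) ) {
				if ( self::has_class( $token, $needs_class ) && ( ! $token['closer'] || $visit_closers ) ) {
					return true;
				}
			}
			return false;
		}

		// With breadcrumbs: openers only, counting down match_offset on each suffix match.
		$wanted       = array_map( 'strtoupper', $query['breadcrumbs'] );
		$match_offset = isset( $query['match_offset'] ) ? (int) $query['match_offset'] : 1;
		while ( $match_offset > 0 && null !== ( $token = $this->next_token() ) ) {
			if ( $token['closer'] || ! self::has_class( $token, $needs_class ) ) {
				continue;
			}
			if ( array_slice( $token['crumbs'], -count( $wanted ) ) === $wanted && 0 === --$match_offset ) {
				return true;
			}
		}
		return false;
	}
}

// Tokens for '<div class="a"><p class="a">x</p></div>' parsed in the BODY context.
$tokens = array(
	array( 'crumbs' => array( 'HTML', 'BODY', 'DIV' ), 'closer' => false, 'classes' => array( 'a' ) ),
	array( 'crumbs' => array( 'HTML', 'BODY', 'DIV', 'P' ), 'closer' => false, 'classes' => array( 'a' ) ),
	array( 'crumbs' => array( 'HTML', 'BODY', 'DIV', 'P' ), 'closer' => true, 'classes' => array() ),
	array( 'crumbs' => array( 'HTML', 'BODY', 'DIV' ), 'closer' => true, 'classes' => array() ),
);

$cursor = new Sketch_Tag_Cursor( $tokens );
$cursor->next_tag( array( 'class_name' => 'a', 'match_offset' => 2 ) );
echo $cursor->position(), "\n"; // 0: without breadcrumbs, match_offset is ignored.

$cursor = new Sketch_Tag_Cursor( $tokens );
$cursor->next_tag( array( 'breadcrumbs' => array( 'DIV', 'P' ) ) );
echo $cursor->position(), "\n"; // 1
```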