Package Documentation

  • Readme

    Simple Html Dom Parser for PHP

    An HTML DOM parser written in PHP that lets you manipulate HTML in a very easy way! This is a fork of the PHP Simple HTML DOM Parser project, but instead of string manipulation it uses DOMDocument and modern PHP classes like "Symfony CssSelector".

    • PHP 7.0+ & 8.0 Support
    • PHP-FIG Standard
    • Composer & PSR-4 support
    • PHPUnit testing via Travis CI
    • PHP-Quality testing via SensioLabsInsight
    • UTF-8 Support (more support via "voku/portable-utf8")
    • Invalid HTML Support (partly ...)
    • Find tags on an HTML page with selectors just like jQuery
    • Extract contents from HTML in a single line

    Install via "composer require"

    composer require voku/simple_html_dom
    composer require voku/portable-utf8 # if you need e.g. UTF-8 fixed output
    

    Quick Start

    use voku\helper\HtmlDomParser;
    
    require_once 'composer/autoload.php'; // adjust to your autoloader path (Composer's default is 'vendor/autoload.php')
    
    ...
    $dom = HtmlDomParser::str_get_html($str);
    // or 
    $dom = HtmlDomParser::file_get_html($file);
    
    $element = $dom->findOne('#css-selector'); // "$element" === instance of "SimpleHtmlDomInterface"
    
    $elements = $dom->findMulti('.css-selector'); // "$elements" === instance of SimpleHtmlDomNodeInterface<int, SimpleHtmlDomInterface>
    
    $elementOrFalse = $dom->findOneOrFalse('#css-selector'); // "$elementOrFalse" === instance of "SimpleHtmlDomInterface" or false
    
    $elementsOrFalse = $dom->findMultiOrFalse('.css-selector'); // "$elementsOrFalse" === instance of SimpleHtmlDomNodeInterface<int, SimpleHtmlDomInterface> or false
    ...
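
    As a small illustration of the calls above, the following sketch parses an inline HTML string and reads text and attribute values from the matches. The sample markup and the autoloader path (Composer's default location) are assumptions made for this example; text() and getAttribute() are taken from the SimpleHtmlDom API listing.

```php
<?php
use voku\helper\HtmlDomParser;

// Assumption: Composer's default autoloader location; adjust for your project.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical sample markup, used only for illustration.
$html = '<div id="nav"><a class="link" href="/a">First</a><a class="link" href="/b">Second</a></div>';

$dom = HtmlDomParser::str_get_html($html);

// Single element by CSS selector.
$first = $dom->findOne('#nav a.link');
$firstText = $first->text();
$firstHref = $first->getAttribute('href');

// All matching elements; the result can be iterated with foreach.
$hrefs = [];
foreach ($dom->findMulti('a.link') as $link) {
    $hrefs[] = $link->getAttribute('href');
}

echo $firstText, ' ', $firstHref, PHP_EOL;
echo implode(',', $hrefs), PHP_EOL;
```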

    Examples

    github.com/voku/simple_html_dom/tree/master/example

    API

    github.com/voku/simple_html_dom/tree/master/README_API.md

    Support

    For support and donations please visit GitHub | Issues | PayPal | Patreon.

    For status updates and release announcements please visit Releases | Twitter | Patreon.

    For professional support please contact me.

    Thanks

    • Thanks to GitHub (Microsoft) for hosting the code and a good infrastructure including issue management, etc.
    • Thanks to IntelliJ as they make the best IDEs for PHP and they gave me an open source license for PhpStorm!
    • Thanks to Travis CI for being the most awesome, easiest continuous integration tool out there!
    • Thanks to StyleCI for the simple but powerful code style check.
    • Thanks to PHPStan && Psalm for really great static analysis tools and for discovering bugs in the code!

    License

    FOSSA Status

  • Readme Api

    Simple Html Dom Parser for PHP

    DomParser API

    SimpleHtmlDomNode (group of dom elements) API

    SimpleHtmlDom (single dom element) API


    DomParser methods

    find(string $selector, int|null $idx): mixed

    ↑ Find list of nodes with a CSS selector.

    Parameters:

    • string $selector
    • int|null $idx

    Return:

    • mixed

    findMulti(string $selector): mixed

    ↑ Find nodes with a CSS selector.

    Parameters:

    • string $selector

    Return:

    • mixed

    findMultiOrFalse(string $selector): mixed

    ↑ Find nodes with a CSS selector or false, if no element is found.

    Parameters:

    • string $selector

    Return:

    • mixed

    findOne(string $selector): static

    ↑ Find one node with a CSS selector.

    Parameters:

    • string $selector

    Return:

    • static

    findOneOrFalse(string $selector): mixed

    ↑ Find one node with a CSS selector or false, if no element is found.

    Parameters:

    • string $selector

    Return:

    • mixed
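
    The "OrFalse" variants make a missing match easy to detect with a strict comparison. A minimal sketch, assuming the illustrative markup and autoloader path shown here:

```php
<?php
use voku\helper\HtmlDomParser;

// Assumption: Composer's default autoloader location; adjust for your project.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical sample markup.
$dom = HtmlDomParser::str_get_html('<p id="intro">Hello</p>');

// An existing element is returned as an element object.
$found = $dom->findOneOrFalse('#intro');
$introText = $found !== false ? $found->text() : null;

// Selectors without a match yield false, as described above.
$missing = $dom->findOneOrFalse('#does-not-exist');
$missingMulti = $dom->findMultiOrFalse('.does-not-exist');

var_dump($introText, $missing, $missingMulti);
```

    Comparing with "=== false" keeps the check explicit and avoids calling methods on a value that is not an element.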

    fixHtmlOutput(string $content, bool $multiDecodeNewHtmlEntity): string

    ↑

    Parameters:

    • string $content
    • bool $multiDecodeNewHtmlEntity

    Return:

    • string

    getDocument(): DOMDocument

    ↑

    Parameters: nothing

    Return:

    • \DOMDocument

    getElementByClass(string $class): mixed

    ↑ Return elements by ".class".

    Parameters:

    • string $class

    Return:

    • mixed

    getElementById(string $id): mixed

    ↑ Return element by "#id".

    Parameters:

    • string $id

    Return:

    • mixed

    getElementByTagName(string $name): mixed

    ↑ Return element by tag name.

    Parameters:

    • string $name

    Return:

    • mixed

    getElementsById(string $id, int|null $idx): mixed

    ↑ Returns elements by "#id".

    Parameters:

    • string $id
    • int|null $idx

    Return:

    • mixed

    getElementsByTagName(string $name, int|null $idx): mixed

    ↑ Returns elements by tag name.

    Parameters:

    • string $name
    • int|null $idx

    Return:

    • mixed
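
    The getElement* methods above read like shortcuts for common id, class and tag lookups. The sketch below tries them on made-up markup; the markup and the autoloader path are assumptions, and the index argument is passed as null explicitly to request all matches.

```php
<?php
use voku\helper\HtmlDomParser;

// Assumption: Composer's default autoloader location; adjust for your project.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical sample markup.
$dom = HtmlDomParser::str_get_html(
    '<div id="main"><span class="tag">a</span><span class="tag">b</span></div>'
);

// Lookup by id returns a single element.
$main = $dom->getElementById('main');
$mainTag = $main->getTag();

// Lookup by tag name with a null index returns all matching elements.
$spans = $dom->getElementsByTagName('span', null);
$spanCount = $spans->count();

// Lookup by class name returns a group of elements as well.
$tags = $dom->getElementByClass('tag');
$tagCount = $tags->count();

echo $mainTag, ' ', $spanCount, ' ', $tagCount, PHP_EOL;
```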

    html(bool $multiDecodeNewHtmlEntity): string

    ↑ Get dom node's outer html.

    Parameters:

    • bool $multiDecodeNewHtmlEntity

    Return:

    • string

    innerHtml(bool $multiDecodeNewHtmlEntity): string

    ↑ Get dom node's inner html.

    Parameters:

    • bool $multiDecodeNewHtmlEntity

    Return:

    • string

    innerXml(bool $multiDecodeNewHtmlEntity): string

    ↑ Get dom node's inner xml.

    Parameters:

    • bool $multiDecodeNewHtmlEntity

    Return:

    • string

    loadHtml(string $html, int|null $libXMLExtraOptions): DomParserInterface

    ↑ Load HTML from string.

    Parameters:

    • string $html
    • int|null $libXMLExtraOptions

    Return:

    • \DomParserInterface

    loadHtmlFile(string $filePath, int|null $libXMLExtraOptions): DomParserInterface

    ↑ Load HTML from file.

    Parameters:

    • string $filePath
    • int|null $libXMLExtraOptions

    Return:

    • \DomParserInterface
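
    Because loadHtml() is listed as returning a DomParserInterface, loading and querying can be chained. A hedged sketch: constructing HtmlDomParser without arguments, the sample markup and the autoloader path are assumptions that are not stated in this listing.

```php
<?php
use voku\helper\HtmlDomParser;

// Assumption: Composer's default autoloader location; adjust for your project.
require_once __DIR__ . '/vendor/autoload.php';

// Assumption: the parser can be created empty and filled via loadHtml().
$dom = (new HtmlDomParser())->loadHtml('<p class="intro">Hello <b>world</b></p>', null);

$paragraph = $dom->findOne('p.intro');

// Plain text drops the markup, inner HTML keeps the nested tags.
$plain = $paragraph->text();
$inner = $paragraph->innerHtml(false);

// The underlying DOMDocument stays reachable for native DOM work.
$isDomDocument = $dom->getDocument() instanceof \DOMDocument;

echo $plain, PHP_EOL, $inner, PHP_EOL;
```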

    save(string $filepath): string

    ↑ Save the html-dom as string.

    Parameters:

    • string $filepath

    Return:

    • string

    set_callback(callable $functionName): mixed

    ↑

    Parameters:

    • callable $functionName

    Return:

    • mixed

    text(bool $multiDecodeNewHtmlEntity): string

    ↑ Get dom node's plain text.

    Parameters:

    • bool $multiDecodeNewHtmlEntity

    Return:

    • string

    xml(bool $multiDecodeNewHtmlEntity, bool $htmlToXml, bool $removeXmlHeader, int $options): string

    ↑ Get the HTML as XML or plain XML if needed.

    Parameters:

    • bool $multiDecodeNewHtmlEntity
    • bool $htmlToXml
    • bool $removeXmlHeader
    • int $options

    Return:

    • string

    SimpleHtmlDomNode methods

    count(): int

    ↑ Get the number of items in this dom node.

    Parameters: nothing

    Return:

    • int

    find(string $selector, int $idx): SimpleHtmlDomNode|\SimpleHtmlDomNode[]|null

    ↑ Find list of nodes with a CSS selector.

    Parameters:

    • string $selector
    • int $idx

    Return:

    • \SimpleHtmlDomNode|\SimpleHtmlDomNode[]|null

    findMulti(string $selector): SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    ↑ Find nodes with a CSS selector.

    Parameters:

    • string $selector

    Return:

    • \SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    findMultiOrFalse(string $selector): false|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    ↑ Find nodes with a CSS selector or false, if no element is found.

    Parameters:

    • string $selector

    Return:

    • false|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    findOne(string $selector): SimpleHtmlDomNode|null

    ↑ Find one node with a CSS selector.

    Parameters:

    • string $selector

    Return:

    • \SimpleHtmlDomNode|null

    findOneOrFalse(string $selector): false|\SimpleHtmlDomNode

    ↑ Find one node with a CSS selector or false, if no element is found.

    Parameters:

    • string $selector

    Return:

    • false|\SimpleHtmlDomNode

    innerHtml(): string[]

    ↑ Get html of elements.

    Parameters: nothing

    Return:

    • string[]

    innertext(): string[]

    ↑ Alias for "$this->innerHtml()" (added for compatibility reasons with v1.x).

    Parameters: nothing

    Return:

    • string[]

    outertext(): string[]

    ↑ Alias for "$this->innerHtml()" (added for compatibility reasons with v1.x).

    Parameters: nothing

    Return:

    • string[]

    text(): string[]

    ↑ Get plain text.

    Parameters: nothing

    Return:

    • string[]
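
    On a group of elements, text() returns one string per element rather than a single string. A minimal sketch with illustrative markup (the markup and the autoloader path are assumptions):

```php
<?php
use voku\helper\HtmlDomParser;

// Assumption: Composer's default autoloader location; adjust for your project.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical sample markup.
$dom = HtmlDomParser::str_get_html('<ul><li>One</li><li>Two</li><li>Three</li></ul>');

// findMulti() returns the group object described in this section.
$items = $dom->findMulti('li');

$count = $items->count(); // number of matched elements
$texts = $items->text();  // array with the plain text of each element

echo $count, ': ', implode(', ', $texts), PHP_EOL;
```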

    SimpleHtmlDom methods

    childNodes(int $idx): SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface|null

    ↑ Returns children of node.

    Parameters:

    • int $idx

    Return:

    • \SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface|null

    delete(): mixed

    ↑ Delete

    Parameters: nothing

    Return:

    • mixed

    find(string $selector, int|null $idx): SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    ↑ Find list of nodes with a CSS selector.

    Parameters:

    • string $selector
    • int|null $idx

    Return:

    • \SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    findMulti(string $selector): SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    ↑ Find nodes with a CSS selector.

    Parameters:

    • string $selector

    Return:

    • \SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    findMultiOrFalse(string $selector): false|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    ↑ Find nodes with a CSS selector or false, if no element is found.

    Parameters:

    • string $selector

    Return:

    • false|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    findOne(string $selector): SimpleHtmlDomInterface

    ↑ Find one node with a CSS selector.

    Parameters:

    • string $selector

    Return:

    • \SimpleHtmlDomInterface

    findOneOrFalse(string $selector): false|\SimpleHtmlDomInterface

    ↑ Find one node with a CSS selector or false, if no element is found.

    Parameters:

    • string $selector

    Return:

    • false|\SimpleHtmlDomInterface

    firstChild(): SimpleHtmlDomInterface|null

    ↑ Returns the first child of node.

    Parameters: nothing

    Return:

    • \SimpleHtmlDomInterface|null

    getAllAttributes(): string[]|null

    ↑ Returns an array of attributes.

    Parameters: nothing

    Return:

    • string[]|null

    getAttribute(string $name): string

    ↑ Return attribute value.

    Parameters:

    • string $name

    Return:

    • string

    getElementByClass(string $class): SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    ↑ Return elements by ".class".

    Parameters:

    • string $class

    Return:

    • \SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    getElementById(string $id): SimpleHtmlDomInterface

    ↑ Return element by "#id".

    Parameters:

    • string $id

    Return:

    • \SimpleHtmlDomInterface

    getElementByTagName(string $name): SimpleHtmlDomInterface

    ↑ Return element by tag name.

    Parameters:

    • string $name

    Return:

    • \SimpleHtmlDomInterface

    getElementsById(string $id, int|null $idx): SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    ↑ Returns elements by "#id".

    Parameters:

    • string $id
    • int|null $idx

    Return:

    • \SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    getElementsByTagName(string $name, int|null $idx): SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    ↑ Returns elements by tag name.

    Parameters:

    • string $name
    • int|null $idx

    Return:

    • \SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    getHtmlDomParser(): HtmlDomParser

    ↑ Create a new "HtmlDomParser"-object from the current context.

    Parameters: nothing

    Return:

    • \HtmlDomParser

    getIterator(): SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>

    ↑ Retrieve an external iterator.

    Parameters: nothing

    Return:

    • \SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface> An instance of an object implementing Iterator or Traversable

    getNode(): DOMNode

    ↑

    Parameters: nothing

    Return:

    • \DOMNode

    getTag(): string

    ↑ Return the tag of node

    Parameters: nothing

    Return:

    • string

    hasAttribute(string $name): bool

    ↑ Determine if an attribute exists on the element.

    Parameters:

    • string $name

    Return:

    • bool

    html(bool $multiDecodeNewHtmlEntity): string

    ↑ Get dom node's outer html.

    Parameters:

    • bool $multiDecodeNewHtmlEntity

    Return:

    • string

    innerHtml(bool $multiDecodeNewHtmlEntity): string

    ↑ Get dom node's inner html.

    Parameters:

    • bool $multiDecodeNewHtmlEntity

    Return:

    • string

    innerXml(bool $multiDecodeNewHtmlEntity): string

    ↑ Get dom node's inner xml.

    Parameters:

    • bool $multiDecodeNewHtmlEntity

    Return:

    • string

    isRemoved(): bool

    ↑ Check whether the node has been removed. Nodes can be partially destroyed: they are still an actual DOM node (such as \DOMElement), but almost their entire body is gone, including the nodeType attribute.

    Parameters: nothing

    Return:

    • bool true if node has been destroyed

    lastChild(): SimpleHtmlDomInterface|null

    ↑ Returns the last child of node.

    Parameters: nothing

    Return:

    • \SimpleHtmlDomInterface|null

    nextNonWhitespaceSibling(): SimpleHtmlDomInterface|null

    ↑ Returns the next sibling of node, skipping whitespace-only nodes.

    Parameters: nothing

    Return:

    • \SimpleHtmlDomInterface|null

    nextSibling(): SimpleHtmlDomInterface|null

    ↑ Returns the next sibling of node.

    Parameters: nothing

    Return:

    • \SimpleHtmlDomInterface|null

    parentNode(): SimpleHtmlDomInterface

    ↑ Returns the parent of node.

    Parameters: nothing

    Return:

    • \SimpleHtmlDomInterface

    previousNonWhitespaceSibling(): SimpleHtmlDomInterface|null

    ↑ Returns the previous sibling of node, skipping whitespace-only nodes.

    Parameters: nothing

    Return:

    • \SimpleHtmlDomInterface|null

    previousSibling(): SimpleHtmlDomInterface|null

    ↑ Returns the previous sibling of node.

    Parameters: nothing

    Return:

    • \SimpleHtmlDomInterface|null
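
    The traversal methods above can be combined to walk between neighbouring nodes. The sketch below uses made-up markup with a whitespace gap between the first two items; the markup and the autoloader path are assumptions. Several of these methods may return null, so production code should check the result before chaining.

```php
<?php
use voku\helper\HtmlDomParser;

// Assumption: Composer's default autoloader location; adjust for your project.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical sample markup; note the space between the first two <li> tags.
$dom = HtmlDomParser::str_get_html(
    '<ul id="list"><li>One</li> <li>Two</li><li>Three</li></ul>'
);

$list = $dom->findOne('#list');

$first = $list->firstChild();
$last = $list->lastChild();

// Skips the whitespace between the first and the second item.
$second = $first->nextNonWhitespaceSibling();

// Walk back up to the parent element.
$parentId = $second->parentNode()->getAttribute('id');

$firstText = $first->text();
$secondText = $second->text();
$lastText = $last->text();

echo $firstText, ' ', $secondText, ' ', $lastText, ' ', $parentId, PHP_EOL;
```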

    removeAttribute(string $name): SimpleHtmlDomInterface

    ↑ Remove attribute.

    Parameters:

    • string $name The name of the html-attribute.

    Return:

    • \SimpleHtmlDomInterface

    removeAttributes(): SimpleHtmlDomInterface

    ↑ Remove all attributes

    Parameters: nothing

    Return:

    • \SimpleHtmlDomInterface

    setAttribute(string $name, string|null $value, bool $strictEmptyValueCheck): SimpleHtmlDomInterface

    ↑ Set attribute value.

    Parameters:

    • string $name The name of the html-attribute.
    • string|null $value Set to NULL or empty string, to remove the attribute.
    • bool $strictEmptyValueCheck If true, $value must be NULL to remove the attribute, so that you can set an empty string as the attribute value, e.g. autofocus=""

    Return:

    • \SimpleHtmlDomInterface
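
    The effect of $strictEmptyValueCheck is easiest to see side by side. A hedged sketch based on the parameter descriptions above; the sample markup and the autoloader path are assumptions:

```php
<?php
use voku\helper\HtmlDomParser;

// Assumption: Composer's default autoloader location; adjust for your project.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical sample markup.
$dom = HtmlDomParser::str_get_html('<input id="q" type="text" class="old">');
$input = $dom->findOne('#q');

// Regular update of an attribute value.
$input->setAttribute('class', 'new', false);
$class = $input->getAttribute('class');

// Strict mode: an empty string is kept as the attribute value.
$input->setAttribute('autofocus', '', true);
$hasAutofocus = $input->hasAttribute('autofocus');

// Non-strict mode: an empty string removes the attribute.
$input->setAttribute('class', '', false);
$hasClass = $input->hasAttribute('class');

// NULL removes the attribute in strict mode as well.
$input->setAttribute('autofocus', null, true);
$hasAutofocusAfterNull = $input->hasAttribute('autofocus');

var_dump($class, $hasAutofocus, $hasClass, $hasAutofocusAfterNull);
```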

    text(): string

    ↑ Get dom node's plain text.

    Parameters: nothing

    Return:

    • string

    val(string|string[]|null $value): string|string[]|null

    ↑

    Parameters:

    • string|string[]|null $value null === get the current input value; text === set a new input value

    Return:

    • string|string[]|null

Namespaces
\voku\helper\data
Classes
voku\helper\AbstractDomParser
voku\helper\AbstractSimpleHtmlDom
voku\helper\AbstractSimpleHtmlDomNode
voku\helper\AbstractSimpleXmlDom
voku\helper\AbstractSimpleXmlDomNode
voku\helper\ASCII
voku\helper\Bootup
voku\helper\DomParserInterface
voku\helper\HtmlDomHelper
voku\helper\HtmlDomParser
voku\helper\SelectorConverter
voku\helper\SimpleHtmlAttributes
voku\helper\SimpleHtmlAttributesInterface
voku\helper\SimpleHtmlDom
voku\helper\SimpleHtmlDomBlank
voku\helper\SimpleHtmlDomInterface
voku\helper\SimpleHtmlDomNode
voku\helper\SimpleHtmlDomNodeBlank
voku\helper\SimpleHtmlDomNodeInterface
voku\helper\SimpleXmlDom
voku\helper\SimpleXmlDomBlank
voku\helper\SimpleXmlDomInterface
voku\helper\SimpleXmlDomNode
voku\helper\SimpleXmlDomNodeBlank
voku\helper\SimpleXmlDomNodeInterface
voku\helper\UTF8
voku\helper\XmlDomParser
© 2025 Bruce Wells