Package Documentation
:scroll: Simple Html Dom Parser for PHP
An HTML DOM parser written in PHP that lets you manipulate HTML in a very easy way! This is a fork of the PHP Simple HTML DOM Parser project, but instead of string manipulation it uses DOMDocument and modern PHP classes like "Symfony CssSelector".
- PHP 7.0+ & 8.0 Support
- PHP-FIG Standard
- Composer & PSR-4 support
- PHPUnit testing via Travis CI
- PHP-Quality testing via SensioLabsInsight
- UTF-8 Support (more support via "voku/portable-utf8")
- Invalid HTML Support (partly ...)
- Find tags on an HTML page with selectors just like jQuery
- Extract contents from HTML in a single line
Install via "composer require"
    composer require voku/simple_html_dom
    composer require voku/portable-utf8 # if you need e.g. UTF-8 fixed output
Quick Start
    use voku\helper\HtmlDomParser;

    require_once 'composer/autoload.php';

    ...
    $dom = HtmlDomParser::str_get_html($str);
    // or
    $dom = HtmlDomParser::file_get_html($file);

    $element = $dom->findOne('#css-selector'); // "$element" === instance of "SimpleHtmlDomInterface"

    $elements = $dom->findMulti('.css-selector'); // "$elements" === instance of SimpleHtmlDomNodeInterface<int, SimpleHtmlDomInterface>

    $elementOrFalse = $dom->findOneOrFalse('#css-selector'); // "$elementOrFalse" === instance of "SimpleHtmlDomInterface" or false

    $elementsOrFalse = $dom->findMultiOrFalse('.css-selector'); // "$elementsOrFalse" === instance of SimpleHtmlDomNodeInterface<int, SimpleHtmlDomInterface> or false
    ...
Examples
github.com/voku/simple_html_dom/tree/master/example
API
github.com/voku/simple_html_dom/tree/master/README_API.md
Support
For support and donations please visit Github | Issues | PayPal | Patreon.
For status updates and release announcements please visit Releases | Twitter | Patreon.
For professional support please contact me.
Thanks
- Thanks to GitHub (Microsoft) for hosting the code and a good infrastructure including Issues-Management, etc.
- Thanks to IntelliJ as they make the best IDEs for PHP and they gave me an open source license for PhpStorm!
- Thanks to Travis CI for being the most awesome, easiest continuous integration tool out there!
- Thanks to StyleCI for the simple but powerful code style check.
- Thanks to PHPStan && Psalm for really great static analysis tools and for discovering bugs in the code!
License
DomParser API
SimpleHtmlDomNode (group of dom elements) API
SimpleHtmlDom (single dom element) API
find(string $selector, int|null $idx): mixed
↑ Find list of nodes with a CSS selector.
Parameters:
string $selector
int|null $idx
Return:
mixed
findMulti(string $selector): mixed
↑ Find nodes with a CSS selector.
Parameters:
string $selector
Return:
mixed
findMultiOrFalse(string $selector): mixed
↑ Find nodes with a CSS selector or false, if no element is found.
Parameters:
string $selector
Return:
mixed
findOne(string $selector): static
↑ Find one node with a CSS selector.
Parameters:
string $selector
Return:
static
findOneOrFalse(string $selector): mixed
↑ Find one node with a CSS selector or false, if no element is found.
Parameters:
string $selector
Return:
mixed
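The difference between the plain finders and the "OrFalse" variants is easiest to see side by side. The following is a minimal sketch, not an authoritative usage guide: the HTML snippet, the selectors, and the `vendor/autoload.php` path (Composer's default location) are assumptions made for illustration.

```php
<?php

use voku\helper\HtmlDomParser;

// Assumption: Composer's default autoloader location; adjust for your project.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical input used only for this example.
$html = '<div id="content"><p class="intro">Hello</p><p class="intro">World</p></div>';
$dom = HtmlDomParser::str_get_html($html);

// findOneOrFalse() returns false when nothing matches,
// so a strict comparison is enough to detect a missing element.
$missing = $dom->findOneOrFalse('#does-not-exist');
$missingFound = ($missing !== false);

// findMulti() returns a group of elements that can be iterated.
$texts = [];
foreach ($dom->findMulti('p.intro') as $paragraph) {
    $texts[] = $paragraph->text();
}
```

Checking for `false` avoids calling methods on an element that was never found, which is the main reason to prefer the "OrFalse" variants when a selector may not match.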
fixHtmlOutput(string $content, bool $multiDecodeNewHtmlEntity): string
↑
Parameters:
string $content
bool $multiDecodeNewHtmlEntity
Return:
string
getDocument(): DOMDocument
↑
Parameters: nothing
Return:
\DOMDocument
getElementByClass(string $class): mixed
↑ Return elements by ".class".
Parameters:
string $class
Return:
mixed
getElementById(string $id): mixed
↑ Return element by #id.
Parameters:
string $id
Return:
mixed
getElementByTagName(string $name): mixed
↑ Return element by tag name.
Parameters:
string $name
Return:
mixed
getElementsById(string $id, int|null $idx): mixed
↑ Returns elements by "#id".
Parameters:
string $id
int|null $idx
Return:
mixed
getElementsByTagName(string $name, int|null $idx): mixed
↑ Returns elements by tag name.
Parameters:
string $name
int|null $idx
Return:
mixed
html(bool $multiDecodeNewHtmlEntity): string
↑ Get dom node's outer html.
Parameters:
bool $multiDecodeNewHtmlEntity
Return:
string
innerHtml(bool $multiDecodeNewHtmlEntity): string
↑ Get dom node's inner html.
Parameters:
bool $multiDecodeNewHtmlEntity
Return:
string
innerXml(bool $multiDecodeNewHtmlEntity): string
↑ Get dom node's inner xml.
Parameters:
bool $multiDecodeNewHtmlEntity
Return:
string
loadHtml(string $html, int|null $libXMLExtraOptions): DomParserInterface
↑ Load HTML from string.
Parameters:
string $html
int|null $libXMLExtraOptions
Return:
\DomParserInterface
loadHtmlFile(string $filePath, int|null $libXMLExtraOptions): DomParserInterface
↑ Load HTML from file.
Parameters:
string $filePath
int|null $libXMLExtraOptions
Return:
\DomParserInterface
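As a rough illustration of loading markup through an instance rather than the static helpers, the sketch below creates a parser, calls `loadHtml()`, and reads the result back with `text()` and `html()`. The input string and the `vendor/autoload.php` path are assumptions for the example; the exact whitespace of the returned strings is not shown because it may depend on the input.

```php
<?php

use voku\helper\HtmlDomParser;

// Assumption: Composer's default autoloader location; adjust for your project.
require_once __DIR__ . '/vendor/autoload.php';

// loadHtml() is an instance method, so create a parser object first.
$dom = (new HtmlDomParser())->loadHtml('<p>Hello <b>World</b></p>');

// text() returns the plain text without markup.
$plainText = $dom->text();

// html() returns the parsed document as an HTML string.
$markup = $dom->html();
```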
save(string $filepath): string
↑ Save the html-dom as string.
Parameters:
string $filepath
Return:
string
set_callback(callable $functionName): mixed
↑
Parameters:
callable $functionName
Return:
mixed
text(bool $multiDecodeNewHtmlEntity): string
↑ Get dom node's plain text.
Parameters:
bool $multiDecodeNewHtmlEntity
Return:
string
xml(bool $multiDecodeNewHtmlEntity, bool $htmlToXml, bool $removeXmlHeader, int $options): string
↑ Get the HTML as XML or plain XML if needed.
Parameters:
bool $multiDecodeNewHtmlEntity
bool $htmlToXml
bool $removeXmlHeader
int $options
Return:
string
SimpleHtmlDomNode (group of dom elements) API
count(): int
↑ Get the number of items in this dom node.
Parameters: nothing
Return:
int
find(string $selector, int $idx): SimpleHtmlDomNode|\SimpleHtmlDomNode[]|null
↑ Find list of nodes with a CSS selector.
Parameters:
string $selector
int $idx
Return:
\SimpleHtmlDomNode|\SimpleHtmlDomNode[]|null
findMulti(string $selector): SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
↑ Find nodes with a CSS selector.
Parameters:
string $selector
Return:
\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
findMultiOrFalse(string $selector): false|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
↑ Find nodes with a CSS selector or false, if no element is found.
Parameters:
string $selector
Return:
false|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
findOne(string $selector): SimpleHtmlDomNode|null
↑ Find one node with a CSS selector.
Parameters:
string $selector
Return:
\SimpleHtmlDomNode|null
findOneOrFalse(string $selector): false|\SimpleHtmlDomNode
↑ Find one node with a CSS selector or false, if no element is found.
Parameters:
string $selector
Return:
false|\SimpleHtmlDomNode
innerHtml(): string[]
↑ Get html of elements.
Parameters: nothing
Return:
string[]
innertext(): string[]
↑ alias for "$this->innerHtml()" (added for compatibility reasons with v1.x)
Parameters: nothing
Return:
string[]
outertext(): string[]
↑ alias for "$this->innerHtml()" (added for compatibility reasons with v1.x)
Parameters: nothing
Return:
string[]
text(): string[]
↑ Get plain text.
Parameters: nothing
Return:
string[]
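The group methods above operate on every matched element at once and return arrays. A small sketch of that behaviour, assuming a hypothetical list as input and Composer's default `vendor/autoload.php` path:

```php
<?php

use voku\helper\HtmlDomParser;

// Assumption: Composer's default autoloader location; adjust for your project.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical input used only for this example.
$dom = HtmlDomParser::str_get_html('<ul><li>one</li><li>two</li><li>three</li></ul>');

// findMulti() returns a group of elements (SimpleHtmlDomNodeInterface).
$items = $dom->findMulti('li');

// count() reports how many elements matched.
$count = $items->count();

// text() on the group collects the plain text of each element into an array.
$texts = $items->text();
```

Calling `text()` on the group saves writing an explicit loop when only the text of each match is needed.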
SimpleHtmlDom (single dom element) API
childNodes(int $idx): SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface|null
↑ Returns children of node.
Parameters:
int $idx
Return:
\SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface|null
delete(): mixed
↑ Delete
Parameters: nothing
Return:
mixed
find(string $selector, int|null $idx): SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
↑ Find list of nodes with a CSS selector.
Parameters:
string $selector
int|null $idx
Return:
\SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
findMulti(string $selector): SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
↑ Find nodes with a CSS selector.
Parameters:
string $selector
Return:
\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
findMultiOrFalse(string $selector): false|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
↑ Find nodes with a CSS selector or false, if no element is found.
Parameters:
string $selector
Return:
false|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
findOne(string $selector): SimpleHtmlDomInterface
↑ Find one node with a CSS selector.
Parameters:
string $selector
Return:
\SimpleHtmlDomInterface
findOneOrFalse(string $selector): false|\SimpleHtmlDomInterface
↑ Find one node with a CSS selector or false, if no element is found.
Parameters:
string $selector
Return:
false|\SimpleHtmlDomInterface
firstChild(): SimpleHtmlDomInterface|null
↑ Returns the first child of node.
Parameters: nothing
Return:
\SimpleHtmlDomInterface|null
getAllAttributes(): string[]|null
↑ Returns an array of attributes.
Parameters: nothing
Return:
string[]|null
getAttribute(string $name): string
↑ Return attribute value.
Parameters:
string $name
Return:
string
getElementByClass(string $class): SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
↑ Return elements by ".class".
Parameters:
string $class
Return:
\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
getElementById(string $id): SimpleHtmlDomInterface
↑ Return element by "#id".
Parameters:
string $id
Return:
\SimpleHtmlDomInterface
getElementByTagName(string $name): SimpleHtmlDomInterface
↑ Return element by tag name.
Parameters:
string $name
Return:
\SimpleHtmlDomInterface
getElementsById(string $id, int|null $idx): SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
↑ Returns elements by "#id".
Parameters:
string $id
int|null $idx
Return:
\SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
getElementsByTagName(string $name, int|null $idx): SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
↑ Returns elements by tag name.
Parameters:
string $name
int|null $idx
Return:
\SimpleHtmlDomInterface|\SimpleHtmlDomInterface[]|\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
getHtmlDomParser(): HtmlDomParser
↑ Create a new "HtmlDomParser"-object from the current context.
Parameters: nothing
Return:
\HtmlDomParser
getIterator(): SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface>
↑ Retrieve an external iterator.
Parameters: nothing
Return:
\SimpleHtmlDomNodeInterface<\SimpleHtmlDomInterface> An instance of an object implementing Iterator or Traversable
getNode(): DOMNode
↑
Parameters: nothing
Return:
\DOMNode
getTag(): string
↑ Return the tag of node
Parameters: nothing
Return:
string
hasAttribute(string $name): bool
↑ Determine if an attribute exists on the element.
Parameters:
string $name
Return:
bool
html(bool $multiDecodeNewHtmlEntity): string
↑ Get dom node's outer html.
Parameters:
bool $multiDecodeNewHtmlEntity
Return:
string
innerHtml(bool $multiDecodeNewHtmlEntity): string
↑ Get dom node's inner html.
Parameters:
bool $multiDecodeNewHtmlEntity
Return:
string
innerXml(bool $multiDecodeNewHtmlEntity): string
↑ Get dom node's inner xml.
Parameters:
bool $multiDecodeNewHtmlEntity
Return:
string
isRemoved(): bool
↑ Nodes can get partially destroyed in which they're still an actual DOM node (such as \DOMElement) but almost their entire body is gone, including the nodeType attribute.
Parameters: nothing
Return:
bool true if node has been destroyed
lastChild(): SimpleHtmlDomInterface|null
↑ Returns the last child of node.
Parameters: nothing
Return:
\SimpleHtmlDomInterface|null
nextNonWhitespaceSibling(): SimpleHtmlDomInterface|null
↑ Returns the next sibling of node, and it will ignore whitespace elements.
Parameters: nothing
Return:
\SimpleHtmlDomInterface|null
nextSibling(): SimpleHtmlDomInterface|null
↑ Returns the next sibling of node.
Parameters: nothing
Return:
\SimpleHtmlDomInterface|null
parentNode(): SimpleHtmlDomInterface
↑ Returns the parent of node.
Parameters: nothing
Return:
\SimpleHtmlDomInterface
previousNonWhitespaceSibling(): SimpleHtmlDomInterface|null
↑ Returns the previous sibling of node, and it will ignore whitespace elements.
Parameters: nothing
Return:
\SimpleHtmlDomInterface|null
previousSibling(): SimpleHtmlDomInterface|null
↑ Returns the previous sibling of node.
Parameters: nothing
Return:
\SimpleHtmlDomInterface|null
removeAttribute(string $name): SimpleHtmlDomInterface
↑ Remove attribute.
Parameters:
string $name The name of the html-attribute.
Return:
\SimpleHtmlDomInterface
removeAttributes(): SimpleHtmlDomInterface
↑ Remove all attributes
Parameters: nothing
Return:
\SimpleHtmlDomInterface
setAttribute(string $name, string|null $value, bool $strictEmptyValueCheck): SimpleHtmlDomInterface
↑ Set attribute value.
Parameters:
string $name The name of the html-attribute.
string|null $value Set to NULL or empty string, to remove the attribute.
bool $strictEmptyValueCheck $value must be NULL, to remove the attribute, so that you can set an empty string as attribute-value e.g. autofocus=""
Return:
\SimpleHtmlDomInterface
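The attribute methods can be combined as in the following sketch. The link markup, the attribute values, and the `vendor/autoload.php` path are assumptions for illustration; the removal behaviour for NULL and the strict empty-value check follow the parameter descriptions above.

```php
<?php

use voku\helper\HtmlDomParser;

// Assumption: Composer's default autoloader location; adjust for your project.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical input used only for this example.
$dom = HtmlDomParser::str_get_html('<a id="link" href="/old" title="Old title">Link</a>');
$link = $dom->findOne('#link');

// Overwrite an existing attribute.
$link->setAttribute('href', '/new');

// Passing NULL as the value removes the attribute.
$link->setAttribute('title', null);

// With $strictEmptyValueCheck = true an empty string is kept as the value,
// which allows attributes such as autofocus="".
$link->setAttribute('autofocus', '', true);

$href = $link->getAttribute('href');
$hasTitle = $link->hasAttribute('title');
$hasAutofocus = $link->hasAttribute('autofocus');
```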
text(): string
↑ Get dom node's plain text.
Parameters: nothing
Return:
string
val(string|string[]|null $value): string|string[]|null
↑
Parameters:
string|string[]|null $value null === get the current input value; text === set a new input value
Return:
string|string[]|null
Namespaces
- \voku\helper\data
Classes
- voku\helper\AbstractDomParser
- voku\helper\AbstractSimpleHtmlDom
- voku\helper\AbstractSimpleHtmlDomNode
- voku\helper\AbstractSimpleXmlDom
- voku\helper\AbstractSimpleXmlDomNode
- voku\helper\ASCII
- voku\helper\Bootup
- voku\helper\DomParserInterface
- voku\helper\HtmlDomHelper
- voku\helper\HtmlDomParser
- voku\helper\SelectorConverter
- voku\helper\SimpleHtmlAttributes
- voku\helper\SimpleHtmlAttributesInterface
- voku\helper\SimpleHtmlDom
- voku\helper\SimpleHtmlDomBlank
- voku\helper\SimpleHtmlDomInterface
- voku\helper\SimpleHtmlDomNode
- voku\helper\SimpleHtmlDomNodeBlank
- voku\helper\SimpleHtmlDomNodeInterface
- voku\helper\SimpleXmlDom
- voku\helper\SimpleXmlDomBlank
- voku\helper\SimpleXmlDomInterface
- voku\helper\SimpleXmlDomNode
- voku\helper\SimpleXmlDomNodeBlank
- voku\helper\SimpleXmlDomNodeInterface
- voku\helper\UTF8
- voku\helper\XmlDomParser