PHP Tutorial - PHP htmlentities() Function

Definition

The htmlentities() function converts all characters that have HTML character entity equivalents into those entities. This includes the characters that are special in HTML, such as &, <, and ", which become &amp;, &lt;, and &quot;, respectively.
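A minimal sketch of what "all characters that have entity equivalents" means in practice. The input is built with a \u{E9} escape so it is valid UTF-8 regardless of how the script file is saved. The flags (ENT_QUOTES) and charset ('UTF-8') are passed explicitly, which is an assumption of this example, so the result does not depend on the defaults of a particular PHP version.

```php
<?php
// Input mixes HTML-special characters (<, >, ", &) with a non-ASCII letter.
// "\u{E9}" is the letter e with acute accent, written as a UTF-8 escape.
$input = "Caf\u{E9} <b>\"R&D\"</b>";

// ENT_QUOTES converts both quote styles; 'UTF-8' tells PHP how to read $input.
$encoded = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n";
// Caf&eacute; &lt;b&gt;&quot;R&amp;D&quot;&lt;/b&gt;
```

The accented letter is converted too, because HTML 4.01 (the default document type) defines the named entity &eacute; for it.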

Syntax

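The PHP htmlentities() function has the following syntax.

string htmlentities ( string html [, int options [, string charset [, bool double_encode]]] )

Parameters

Parameter	Description
html	The string to convert.
options	A bitmask of flags.
charset	Defines the encoding used in the conversion. If omitted, the default is ISO-8859-1 before PHP 5.4.0, UTF-8 from PHP 5.4.0, and the value of the default_charset configuration option from PHP 5.6.0 onwards. You are strongly encouraged to specify the correct value for your data.
double_encode	When set to false, existing HTML entities in html are not encoded again. The default is true.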
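The following sketch shows why the charset argument matters. The byte 0xE9 is the letter é in ISO-8859-1 but an incomplete (invalid) sequence in UTF-8. ENT_QUOTES is passed without ENT_IGNORE or ENT_SUBSTITUTE so that the invalid case shows up as an empty string.

```php
<?php
// The same letter encoded in two different charsets.
$latin1Bytes = "\xE9";     // e-acute in ISO-8859-1 (one byte)
$utf8Bytes   = "\xC3\xA9"; // e-acute in UTF-8 (two bytes)

// Correct charset: the bytes are decoded as e-acute and converted to its entity.
var_dump(htmlentities($latin1Bytes, ENT_QUOTES, 'ISO-8859-1')); // string(8) "&eacute;"
var_dump(htmlentities($utf8Bytes, ENT_QUOTES, 'UTF-8'));        // string(8) "&eacute;"

// Wrong charset, case 1: UTF-8 bytes read as ISO-8859-1 become two unrelated
// characters (0xC3 and 0xA9), each converted to its own entity.
var_dump(htmlentities($utf8Bytes, ENT_QUOTES, 'ISO-8859-1'));   // string(14) "&Atilde;&copy;"

// Wrong charset, case 2: 0xE9 alone is not valid UTF-8, and without
// ENT_IGNORE or ENT_SUBSTITUTE the function returns an empty string.
var_dump(htmlentities($latin1Bytes, ENT_QUOTES, 'UTF-8'));      // string(0) ""
```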

Options

options is a bitmask of one or more of the following flags, which specify how to handle quotes, invalid code unit sequences, and the document type. The default is ENT_COMPAT | ENT_HTML401 before PHP 8.1.0, and ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 from PHP 8.1.0 onwards.

Available flags constants

Constant Name	Description
ENT_COMPAT	Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES	Will convert both double and single quotes.
ENT_NOQUOTES	Will leave both double and single quotes unconverted.
ENT_IGNORE	Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged because it may have security implications.
ENT_SUBSTITUTE	Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
ENT_DISALLOWED	Replace invalid code points for the given document type with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of leaving them as is. This may be useful, for instance, to ensure the well-formedness of XML documents with embedded external content.
ENT_HTML401	Handle code as HTML 4.01.
ENT_XML1	Handle code as XML 1.
ENT_XHTML	Handle code as XHTML.
ENT_HTML5	Handle code as HTML 5.
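A short sketch of the quote-handling and invalid-sequence flags from the table above. It leaves the document type at its HTML 4.01 default (ENT_HTML401 has the value 0), which is why the single quote becomes &#039;. The byte 0xFF used for the broken string can never appear in valid UTF-8.

```php
<?php
// A string containing both kinds of quote.
$text = "It's \"quoted\"";

// Quote-handling flags.
echo htmlentities($text, ENT_COMPAT, 'UTF-8'), "\n";   // It's &quot;quoted&quot;
echo htmlentities($text, ENT_QUOTES, 'UTF-8'), "\n";   // It&#039;s &quot;quoted&quot;
echo htmlentities($text, ENT_NOQUOTES, 'UTF-8'), "\n"; // It's "quoted"

// Invalid-sequence flags.
$broken = "ab\xFFcd";

// No flag: the whole result is an empty string.
var_dump(htmlentities($broken, ENT_QUOTES, 'UTF-8'));              // string(0) ""

// ENT_IGNORE drops the bad byte (discouraged, see the table).
var_dump(htmlentities($broken, ENT_QUOTES | ENT_IGNORE, 'UTF-8')); // string(4) "abcd"

// ENT_SUBSTITUTE replaces it with U+FFFD.
var_dump(htmlentities($broken, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') === "ab\u{FFFD}cd"); // bool(true)
```

Of the two invalid-sequence options, ENT_SUBSTITUTE keeps a visible marker where the bad data was, which is usually preferable to silently deleting it.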

Charset

The following character sets are supported:

Charset	Aliases	Description
ISO-8859-1	ISO8859-1	Western European, Latin-1.
ISO-8859-5	ISO8859-5	Little-used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15	ISO8859-15	Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8	(none)	ASCII-compatible multi-byte 8-bit Unicode.
cp866	ibm866, 866	DOS-specific Cyrillic charset.
cp1251	Windows-1251, win-1251, 1251	Windows-specific Cyrillic charset.
cp1252	Windows-1252, 1252	Windows-specific Western European charset.
KOI8-R	koi8-ru, koi8r	Russian.
BIG5	950	Traditional Chinese, mainly used in Taiwan.
GB2312	936	Simplified Chinese, national standard character set.
BIG5-HKSCS	(none)	Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS	SJIS, SJIS-win, cp932, 932	Japanese.
EUC-JP	EUCJP, eucJP-win	Japanese.
MacRoman	(none)	Charset that was used by Mac OS.
''	(none)	An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.
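As an illustration of the Latin-1 and Latin-9 rows above, here is a small sketch using the byte 0xA4. In ISO-8859-1 it is the currency sign (U+00A4), while ISO-8859-15 reuses it for the Euro sign (U+20AC). The sketch also assumes, per the Aliases column, that ISO8859-15 selects the same charset as ISO-8859-15.

```php
<?php
// Byte 0xA4 means different characters in different charsets:
//   ISO-8859-1  -> U+00A4 CURRENCY SIGN (entity &curren;)
//   ISO-8859-15 -> U+20AC EURO SIGN     (entity &euro;)
$byte = "\xA4";

echo htmlentities($byte, ENT_QUOTES, 'ISO-8859-1'), "\n";  // &curren;
echo htmlentities($byte, ENT_QUOTES, 'ISO-8859-15'), "\n"; // &euro;

// An alias from the table selects the same charset as its canonical name.
var_dump(
    htmlentities($byte, ENT_QUOTES, 'ISO8859-15') === htmlentities($byte, ENT_QUOTES, 'ISO-8859-15')
); // bool(true)
```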

Return

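The PHP htmlentities() function returns the encoded string. If html contains an invalid code unit sequence for the given charset, an empty string is returned unless ENT_IGNORE or ENT_SUBSTITUTE is set.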

Note

We can reverse this conversion using the html_entity_decode() function.
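A minimal round-trip sketch. It assumes the same flags (ENT_QUOTES) and charset ('UTF-8') are used for both htmlentities() and html_entity_decode(), so the decoded string matches the original exactly.

```php
<?php
// e-acute, HTML markup, both quote styles, and an ampersand.
$original = "Caf\u{E9} <b>\"R&D\"</b> it's";

// Encode, then decode with the same flags and charset.
$encoded = htmlentities($original, ENT_QUOTES, 'UTF-8');
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n"; // Caf&eacute; &lt;b&gt;&quot;R&amp;D&quot;&lt;/b&gt; it&#039;s
var_dump($decoded === $original); // bool(true)
```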

Example

Convert a string to an HTML-safe form


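<?php
// Encode a string so it can be safely embedded in HTML output.
$title = "java2s.com & PHP";
// Pass the flags and charset explicitly instead of relying on version-dependent defaults.
$safe = htmlentities($title, ENT_QUOTES, 'UTF-8');
echo $safe;
?>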

The code above generates the following result (a browser renders it as java2s.com & PHP):

java2s.com &amp; PHP