PHP Cross Reference of DokuWiki |
UTF8 helper functions
| Author: | Andreas Gohr |
| License: | LGPL (http://www.gnu.org/copyleft/lesser.html) |
| File Size: | 1536 lines (78 kb) |
| Included or required: | 16 times |
| Referenced: | 0 times |
| Includes or requires: | 0 files |
utf8_entity_decoder:: (9 methods):
utf8_entity_decoder()
makeutf8()
decode()
utf8_to_unicode()
unicode_to_utf8()
utf8_to_utf16be()
utf16be_to_utf8()
utf8_bad_replace()
utf8_correctIdx()
Class: utf8_entity_decoder - X-Ref
| utf8_entity_decoder() X-Ref |
| No description |
| makeutf8($c) X-Ref |
| No description |
| decode($ent) X-Ref |
| No description |
| utf8_to_unicode($str,$strict=false) X-Ref |
| Takes a UTF-8 string and returns an array of ints representing the Unicode characters. Astral planes are supported, i.e. the ints in the output can be > 0xFFFF. Occurrences of the BOM are ignored. Surrogates are not allowed. If $strict is set to true, the function returns false if the input string isn't a valid UTF-8 octet sequence and raises a PHP error at level E_USER_WARNING. Note: this function has been modified slightly in this library to trigger errors on encountering bad bytes. link: http://hsivonen.iki.fi/php-utf8/ link: http://sourceforge.net/projects/phputf8/ author: <hsivonen@iki.fi> author: Harry Fuecks <hfuecks@gmail.com> param: string UTF-8 encoded string param: boolean Check for invalid sequences? see: unicode_to_utf8 return: mixed array of unicode code points or false if UTF-8 invalid |
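The decoding described above can be sketched as follows. This is a minimal illustration, not the DokuWiki code: the function name `sketch_utf8_to_codepoints` is hypothetical, and it assumes that rejecting invalid input outright (similar in spirit to `$strict = true`, but without raising a warning) is acceptable.

```php
<?php
// Hypothetical helper (not part of DokuWiki): decode a UTF-8 string into
// an array of integer code points, returning false on invalid input.
function sketch_utf8_to_codepoints(string $str)
{
    // With the /u flag PCRE validates the subject; invalid UTF-8
    // (including encoded surrogates and overlong forms) yields false.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return false;
    }

    $out = [];
    foreach ($chars as $ch) {
        $b = array_values(unpack('C*', $ch)); // bytes of one character
        $n = count($b);
        if ($n === 1) {
            $cp = $b[0];
        } else {
            // The lead byte carries 5, 4 or 3 payload bits for 2-, 3- and
            // 4-byte sequences; each continuation byte carries 6 more.
            $cp = $b[0] & (0xFF >> ($n + 1));
            for ($i = 1; $i < $n; $i++) {
                $cp = ($cp << 6) | ($b[$i] & 0x3F);
            }
        }
        if ($cp === 0xFEFF) {
            continue; // occurrences of the BOM are ignored
        }
        $out[] = $cp;
    }
    return $out;
}
```

For example, the string "a" followed by U+00E9 and U+20AC decodes to `[0x61, 0xE9, 0x20AC]`, and a lone lead byte such as `"\xC3"` gives `false`.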
| unicode_to_utf8($arr,$strict=false) X-Ref |
| Takes an array of ints representing the Unicode characters and returns a UTF-8 string. Astral planes are supported, i.e. the ints in the input can be > 0xFFFF. Occurrences of the BOM are ignored. Surrogates are not allowed. If $strict is set to true, the function returns false if the input array contains ints that represent surrogates or are outside the Unicode range and raises a PHP error at level E_USER_WARNING. Note: this function has been modified slightly in this library to use output buffering to concatenate the UTF-8 string (faster) as well as reference the array by its keys. link: http://hsivonen.iki.fi/php-utf8/ link: http://sourceforge.net/projects/phputf8/ author: <hsivonen@iki.fi> author: Harry Fuecks <hfuecks@gmail.com> param: array of unicode code points representing a string param: boolean Check for invalid sequences? see: utf8_to_unicode return: mixed UTF-8 string or false if array contains invalid code points |
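The reverse direction can be sketched in the same spirit. The name `sketch_codepoints_to_utf8` is hypothetical, plain string concatenation stands in for the output buffering mentioned above, and the sketch always rejects surrogates and out-of-range values rather than making that depend on a `$strict` flag.

```php
<?php
// Hypothetical helper (not part of DokuWiki): encode an array of integer
// code points as a UTF-8 string, returning false on invalid code points.
function sketch_codepoints_to_utf8(array $codepoints)
{
    $out = '';
    foreach ($codepoints as $cp) {
        if ($cp === 0xFEFF) {
            continue; // occurrences of the BOM are ignored
        }
        // Reject values outside the Unicode range and UTF-16 surrogates.
        if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
            return false;
        }
        if ($cp < 0x80) {            // 1 byte: 0xxxxxxx
            $out .= chr($cp);
        } elseif ($cp < 0x800) {     // 2 bytes: 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($cp >> 6))
                  . chr(0x80 | ($cp & 0x3F));
        } elseif ($cp < 0x10000) {   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            $out .= chr(0xE0 | ($cp >> 12))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        } else {                     // 4 bytes: astral planes
            $out .= chr(0xF0 | ($cp >> 18))
                  . chr(0x80 | (($cp >> 12) & 0x3F))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $out;
}
```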
| utf8_to_utf16be(&$str, $bom = false) X-Ref |
| UTF-8 to UTF-16BE conversion. Without the mb_string extension the result may really be UCS-2, due to utf8_to_unicode limits. |
| utf16be_to_utf8(&$str) X-Ref |
| UTF-16BE to UTF-8 conversion. Without the mb_string extension the input may really be treated as UCS-2, due to utf8_to_unicode limits. |
| utf8_bad_replace($str, $replace = '') X-Ref |
| Replace bad bytes with an alternative character. An ASCII character is recommended as the replacement char. The PCRE pattern used to locate bad bytes in a UTF-8 string comes from the W3 FAQ: Multilingual Forms. Note: modified to include full ASCII range including control chars. author: Harry Fuecks <hfuecks@gmail.com> param: string to search param: string to replace bad bytes with (the docblock says it defaults to '?', but the signature above defaults to '') - use ASCII see: http://www.w3.org/International/questions/qa-forms-utf-8 return: string |
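A minimal sketch of the approach: match either one well-formed UTF-8 sequence or any single byte, and substitute the replacement for the latter. The function name `sketch_bad_replace` is hypothetical. The alternatives follow the byte ranges published in the W3C FAQ, with the ASCII branch widened to the full 0x00-0x7F range as the note above describes. The actual implementation may differ in detail.

```php
<?php
// Hypothetical helper (not part of DokuWiki): replace every byte that is
// not part of a well-formed UTF-8 sequence with $replace.
function sketch_bad_replace(string $str, string $replace = ''): string
{
    $valid = implode('|', [
        '[\x00-\x7F]',                        // ASCII, including control chars
        '[\xC2-\xDF][\x80-\xBF]',             // non-overlong 2-byte
        '\xE0[\xA0-\xBF][\x80-\xBF]',         // 3-byte, excluding overlongs
        '[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}',  // straight 3-byte
        '\xED[\x80-\x9F][\x80-\xBF]',         // 3-byte, excluding surrogates
        '\xF0[\x90-\xBF][\x80-\xBF]{2}',      // planes 1-3
        '[\xF1-\xF3][\x80-\xBF]{3}',          // planes 4-15
        '\xF4[\x80-\x8F][\x80-\xBF]{2}',      // plane 16
    ]);

    // No /u flag: the pattern works on raw bytes. Group 1 captures a valid
    // sequence; the bare "." fallback consumes exactly one bad byte.
    return preg_replace_callback(
        '/(' . $valid . ')|./s',
        function (array $m) use ($replace): string {
            return isset($m[1]) && $m[1] !== '' ? $m[1] : $replace;
        },
        $str
    );
}
```

Called with `'?'` as the replacement, a stray `"\xFF"` byte inside otherwise valid text becomes a single question mark while the valid multibyte characters around it are left untouched.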
| utf8_correctIdx(&$str,$i,$next=false) X-Ref |
| Adjust a byte index into a UTF-8 string to a UTF-8 character boundary. author: chris smith <chris@jalakai.co.uk> param: $str string utf8 character string param: $i int byte index into $str param: $next bool direction to search for boundary return: int byte index into $str now pointing to a utf8 character boundary |
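The boundary adjustment relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx. A minimal sketch, assuming that `$next = false` searches backwards and `$next = true` searches forwards (the source does not spell out the direction), with the hypothetical name `sketch_correct_idx`:

```php
<?php
// Hypothetical helper (not part of DokuWiki): move a byte index so that
// it points at the start of a UTF-8 character (or at the string end).
function sketch_correct_idx(string $str, int $i, bool $next = false): int
{
    $len = strlen($str);
    if ($i <= 0) {
        return 0;
    }
    if ($i >= $len) {
        return $len;
    }
    // Skip over continuation bytes (10xxxxxx) in the chosen direction.
    while ($i > 0 && $i < $len && (ord($str[$i]) & 0xC0) === 0x80) {
        $i += $next ? 1 : -1;
    }
    return $i;
}
```

With the five-byte string "a", U+20AC, "b", an index of 2 (inside the euro sign) moves back to 1 by default and forward to 4 when `$next` is true.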
| utf8_encodeFN($file,$safe=true) X-Ref |
| URL-Encode a filename to allow Unicode characters. Slashes are not encoded. When the second parameter is true, the string will be encoded only if non-ASCII characters are detected. This makes it safe to run it multiple times on the same string (default is true). author: Andreas Gohr <andi@splitbrain.org> see: urlencode |
| utf8_decodeFN($file) X-Ref |
| URL-Decode a filename. This is just a wrapper around urldecode. author: Andreas Gohr <andi@splitbrain.org> see: urldecode |
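The encode/decode pair can be sketched as below. The function names are hypothetical and the ASCII test is an assumption based on the description above, so the real check may be stricter. The point is that an already encoded name is pure ASCII, so a second pass in safe mode leaves it alone.

```php
<?php
// Hypothetical helpers (not part of DokuWiki).
function sketch_encode_fn(string $file, bool $safe = true): string
{
    // In safe mode, a name without any high bytes is returned untouched.
    if ($safe && !preg_match('/[\x80-\xFF]/', $file)) {
        return $file;
    }
    // urlencode() would turn "/" into "%2F"; restore the slashes.
    return str_replace('%2F', '/', urlencode($file));
}

function sketch_decode_fn(string $file): string
{
    return urldecode($file); // plain wrapper, as the description says
}
```

Encoding `wiki/café.txt` yields `wiki/caf%C3%A9.txt`, encoding that result again returns it unchanged, and decoding restores the original name.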
| utf8_isASCII($str) X-Ref |
| Checks if a string contains 7bit ASCII only author: Andreas Gohr <andi@splitbrain.org> |
| utf8_strip($str) X-Ref |
| Strips all high-byte chars. Returns a pure ASCII7 string. author: Andreas Gohr <andi@splitbrain.org> |
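Both the ASCII check and the stripping come down to looking for bytes in the range 0x80-0xFF. A minimal sketch with hypothetical function names:

```php
<?php
// Hypothetical helpers (not part of DokuWiki).

// True if the string contains only 7-bit ASCII bytes.
function sketch_is_ascii(string $str): bool
{
    return !preg_match('/[\x80-\xFF]/', $str);
}

// Remove every byte with the high bit set, leaving pure 7-bit ASCII.
function sketch_strip_highbytes(string $str): string
{
    return preg_replace('/[\x80-\xFF]/', '', $str);
}
```

Note that stripping works on bytes, so a multibyte character such as U+00E9 disappears entirely rather than being transliterated.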
| utf8_check($Str) X-Ref |
| Tries to detect if a string is UTF-8 encoded. link: http://www.php.net/manual/en/function.utf8-encode.php author: <bmorel@ssi.fr> |
| utf8_strlen($string) X-Ref |
| Unicode aware replacement for strlen(). utf8_decode() converts characters that are not in ISO-8859-1 to '?', which, for the purpose of counting, is alright. It's even faster than mb_strlen. author: <chernyshevsky at hotmail dot com> see: strlen() see: utf8_decode() |
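The trick described above works because utf8_decode() collapses each multibyte character into a single byte before strlen() counts it. Since utf8_decode() is deprecated as of PHP 8.2, the sketch below swaps in a different technique that gives the same count for valid UTF-8: counting the bytes that are not continuation bytes. The name `sketch_utf8_strlen` is hypothetical.

```php
<?php
// Hypothetical helper (not part of DokuWiki): count characters in a valid
// UTF-8 string. Every character contains exactly one byte that is not a
// continuation byte (10xxxxxx), so counting those bytes counts characters.
function sketch_utf8_strlen(string $str): int
{
    return (int) preg_match_all('/[^\x80-\xBF]/', $str);
}
```

For invalid input the result is only an approximation, which is also true of the utf8_decode() approach.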
| utf8_substr($str, $offset, $length = null) X-Ref |
| UTF-8 aware alternative to substr. Return part of a string given character offset (and optionally length). author: Harry Fuecks <hfuecks@gmail.com> author: Chris Smith <chris@jalakai.co.uk> param: string param: integer number of UTF-8 characters offset (from left) param: integer (optional) length in UTF-8 characters from offset return: mixed string or false if failure |
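One simple way to get character-based offsets, sketched here under the assumption that splitting the whole string into characters is affordable, is to slice an array of characters. This is not necessarily how the library does it, and `sketch_utf8_substr` is a hypothetical name.

```php
<?php
// Hypothetical helper (not part of DokuWiki): substr() counted in UTF-8
// characters instead of bytes. Returns false on invalid UTF-8.
function sketch_utf8_substr(string $str, int $offset, ?int $length = null)
{
    // Split into single characters; /u makes PCRE treat the subject as
    // UTF-8 and fail (false) if it is not well formed.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return false;
    }
    // array_slice() handles negative offsets and a null length much like
    // substr() does.
    return implode('', array_slice($chars, $offset, $length));
}
```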
| utf8_substr_replace($string, $replacement, $start , $length=0 ) X-Ref |
| Unicode aware replacement for substr_replace() author: Andreas Gohr <andi@splitbrain.org> see: substr_replace() |
| utf8_ltrim($str,$charlist='') X-Ref |
| Unicode aware replacement for ltrim() author: Andreas Gohr <andi@splitbrain.org> see: ltrim() return: string |
| utf8_rtrim($str,$charlist='') X-Ref |
| Unicode aware replacement for rtrim() author: Andreas Gohr <andi@splitbrain.org> see: rtrim() return: string |
| utf8_trim($str,$charlist='') X-Ref |
| Unicode aware replacement for trim() author: Andreas Gohr <andi@splitbrain.org> see: trim() return: string |
| utf8_strtolower($string) X-Ref |
| This is a Unicode aware replacement for strtolower(). Uses mb_string extension if available. author: Leo Feyer <leo@typolight.org> see: strtolower() see: utf8_strtoupper() |
| utf8_strtoupper($string) X-Ref |
| This is a Unicode aware replacement for strtoupper(). Uses mb_string extension if available. author: Leo Feyer <leo@typolight.org> see: strtoupper() see: utf8_strtolower() |
| utf8_deaccent($string,$case=0) X-Ref |
| Replace accented UTF-8 characters by unaccented ASCII-7 equivalents. Use the optional parameter to just deaccent lower ($case = -1) or upper ($case = 1) letters. Default is to deaccent both cases ($case = 0). author: Andreas Gohr <andi@splitbrain.org> |
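The `$case` switch can be illustrated with two small lookup tables and strtr(). The tables below are tiny, made-up samples for illustration only, and `sketch_deaccent` is a hypothetical name. The real function presumably uses much larger tables.

```php
<?php
// Hypothetical helper (not part of DokuWiki): replace a few accented
// characters by ASCII equivalents, optionally for one case only.
function sketch_deaccent(string $str, int $case = 0): string
{
    // Illustrative sample tables only: é à ñ and É À Ñ.
    $lower = ["\xC3\xA9" => 'e', "\xC3\xA0" => 'a', "\xC3\xB1" => 'n'];
    $upper = ["\xC3\x89" => 'E', "\xC3\x80" => 'A', "\xC3\x91" => 'N'];

    if ($case <= 0) {           // -1 or 0: handle lower-case letters
        $str = strtr($str, $lower);
    }
    if ($case >= 0) {           // 1 or 0: handle upper-case letters
        $str = strtr($str, $upper);
    }
    return $str;
}
```

The array form of strtr() replaces whole byte sequences, so it is safe to use on valid UTF-8 without splitting the string into characters first.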
| utf8_romanize($string) X-Ref |
| Romanize a non-latin string author: Andreas Gohr <andi@splitbrain.org> |
| utf8_stripspecials($string,$repl='',$additional='') X-Ref |
| Removes special characters (non-alphanumeric) from a UTF-8 string. This function adds the control chars 0x00 to 0x19 to the array of stripped chars (they are not included in $UTF8_SPECIAL_CHARS). author: Andreas Gohr <andi@splitbrain.org> param: string $string The UTF8 string to strip of special chars param: string $repl Replace special with this string param: string $additional Additional chars to strip (used in regexp char class) |
| utf8_strpos($haystack, $needle, $offset=0) X-Ref |
| This is a Unicode aware replacement for strpos(). author: Leo Feyer <leo@typolight.org> param: string param: string param: integer see: strpos() return: integer |
| utf8_tohtml($str) X-Ref |
| Encodes UTF-8 characters to HTML entities. link: http://www.php.net/manual/en/function.utf8-decode.php author: Tom N Harris <tnharris@whoopdedo.org> author: <vpribish at shopping dot com> |
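A minimal sketch of such an encoder, assuming decimal numeric entities are the desired output (the source does not say whether decimal or hexadecimal is used) and using the hypothetical name `sketch_tohtml`:

```php
<?php
// Hypothetical helper (not part of DokuWiki): turn every non-ASCII
// character of a UTF-8 string into a decimal numeric HTML entity.
// Returns null if the input is not valid UTF-8.
function sketch_tohtml(string $str)
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            $b = array_values(unpack('C*', $m[0])); // bytes of the character
            $n = count($b);
            // Payload bits of the lead byte, then 6 bits per continuation.
            $cp = $b[0] & (0xFF >> ($n + 1));
            for ($i = 1; $i < $n; $i++) {
                $cp = ($cp << 6) | ($b[$i] & 0x3F);
            }
            return '&#' . $cp . ';';
        },
        $str
    );
}
```

ASCII characters, including `&` and `<`, pass through untouched, so this is not a substitute for htmlspecialchars().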
| utf8_unhtml($str, $entities=null) X-Ref |
| Decodes HTML entities to UTF-8 characters. Converts any &#..; entity to a codepoint. The entities flag defaults to only decoding numeric entities. Pass HTML_ENTITIES and named entities, including &amp; &lt; etc., are handled as well. Avoids the problem that would occur if you had to decode "&#38;&amp;#38;": unhtmlspecialchars(utf8_unhtml($s)) -> "&&"; utf8_unhtml(unhtmlspecialchars($s)) -> "&&#38;"; what it should be -> "&&#38;". author: Tom N Harris <tnharris@whoopdedo.org> param: string $str UTF-8 encoded string param: boolean $entities Flag controlling decoding of named entities. return: UTF-8 encoded string with numeric (and named) entities replaced. |
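The default behaviour (numeric entities only) can be sketched with a single-pass regular expression replacement. Because the string is scanned once, text produced by one replacement is never decoded again, which is the kind of double-decoding problem the description alludes to. The name `sketch_unhtml_numeric` is hypothetical, and named entities are deliberately left alone here.

```php
<?php
// Hypothetical helper (not part of DokuWiki): decode decimal and
// hexadecimal numeric entities to UTF-8; named entities are untouched.
function sketch_unhtml_numeric(string $str): string
{
    return preg_replace_callback(
        '/&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));/',
        function (array $m): string {
            $cp = ($m[1] !== '') ? (int) hexdec($m[1]) : (int) $m[2];
            // Leave entities for surrogates or out-of-range values as-is.
            if ($cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
                return $m[0];
            }
            if ($cp < 0x80) {
                return chr($cp);
            }
            if ($cp < 0x800) {
                return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
            }
            if ($cp < 0x10000) {
                return chr(0xE0 | ($cp >> 12))
                     . chr(0x80 | (($cp >> 6) & 0x3F))
                     . chr(0x80 | ($cp & 0x3F));
            }
            return chr(0xF0 | ($cp >> 18))
                 . chr(0x80 | (($cp >> 12) & 0x3F))
                 . chr(0x80 | (($cp >> 6) & 0x3F))
                 . chr(0x80 | ($cp & 0x3F));
        },
        $str
    );
}
```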
| utf8_decode_numeric($ent) X-Ref |
| No description |
| Generated: Mon Sep 8 01:30:01 2008 | Cross-referenced by PHPXref 0.7 |