HTMLencoded2Text ( )

Function stats

Average user rating: 4.0
Support: FileMaker 10.0 +
Date posted: 08 January 2009
Last updated: 14 September 2011
Recursive function: Yes

Author Info
 Fabrice

54 functions

Average Rating 4.3




 

Function overview

Prototype

HTMLencoded2Text  ( _text )


Parameters

_text  


Description

Tags:  Text   HTML   Encoding  

Translates HTML encoded text into standard text

Examples

Sample input

HTMLencoded2Text ( "Smith&amp;Wesson" )
or
HTMLencoded2Text ( "Smith&#38;Wesson" )


Sample output

Smith&Wesson
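
As a cross-check outside FileMaker, the same translation can be reproduced with Python's standard library. This is only a sketch for comparing expected results, not part of the custom function; it assumes the two sample inputs are the named form (&amp;) and the decimal form (&#38;) of the ampersand.

```python
import html

# Assumed sample inputs: one named entity, one decimal character reference.
samples = ["Smith&amp;Wesson", "Smith&#38;Wesson"]

# html.unescape is Python's standard-library decoder, used here only as a reference.
decoded = [html.unescape(s) for s in samples]
print(decoded)
```

Both inputs decode to the same plain-text string, which matches the sample output listed for the custom function.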

 

Function code

/* HTMLencoded2Text ( _text )

by Fabrice Nordmann
http://www.1-more-thing.com - Twitter: 1morethingtweet

updated by HOnza Koudelka


v.1.4.1 - Sep 2011
- fixed bug causing Substitute to be skipped when using this function multiple times in the same script context (HOnza)
v.1.4 - Sep 2011
- optimized to run much faster thanks to FM Bench Detective (http://fmbench.com/detective) (HOnza)

v.1.3 - Aug 2011
- updated list of named character entities from http://alumnus.caltech.edu/~leif/namedchar.html (HOnza)
v.1.2 - Mar 2009
- handles long unicodes (Clément Hoffmann)
v.1.1.1 - Jan 2009
- added more HTML entities
v.1.1 - Jan 2009
- added the HTML entities
v.1.0 - Jan 2009


Translates HTML encoded text into standard text

example :
HTMLencoded2Text ( "Smith&amp;Wesson" ) = "Smith&Wesson"


Requires FileMaker 10 or later

Recursive function
*/
Let ( [ _text = Case ( $HTMLencoded2Text_depth ; _text ; Substitute ( _text ;
    ["&nbsp;"; " "] ;
    ["&iexcl;"; "¡"] ;
    ["&cent;"; "¢"] ;
    ["&pound;"; "£"] ;
    ["&curren;"; "¤"] ;
    ["&yen;"; "¥"] ;
    ["&brvbar;"; "¦"] ;
    ["&sect;"; "§"] ;
    ["&uml;"; "¨"] ;
    ["&copy;"; "©"] ;
    ["&ordf;"; "ª"] ;
    ["&laquo;"; "«"] ;
    ["&not;"; "¬"] ;
    ["&shy;"; " ­ "] ;
    ["&reg;"; "®"] ;
    ["&macr;"; "¯"] ;
    ["&deg;"; "°"] ;
    ["&plusmn;"; "±"] ;
    ["&sup2;"; "²"] ;
    ["&sup3;"; "³"] ;
    ["&acute;"; "´"] ;
    ["&micro;"; "µ"] ;
    ["&para;"; "\¶"] ;
    ["&middot;"; "·"] ;
    ["&cedil;"; "¸"] ;
    ["&sup1;"; "¹"] ;
    ["&ordm;"; "º"] ;
    ["&raquo;"; "»"] ;
    ["&frac14;"; "¼"] ;
    ["&frac12;"; "½"] ;
    ["&frac34;"; "¾"] ;
    ["&iquest;"; "¿"] ;
    ["&Agrave;"; "À"] ;
    ["&Aacute;"; "Á"] ;
    ["&Acirc;"; "Â"] ;
    ["&Atilde;"; "Ã"] ;
    ["&Auml;"; "Ä"] ;
    ["&Aring;"; "Å"] ;
    ["&AElig;"; "Æ"] ;
    ["&Ccedil;"; "Ç"] ;
    ["&Egrave;"; "È"] ;
    ["&Eacute;"; "É"] ;
    ["&Ecirc;"; "Ê"] ;
    ["&Euml;"; "Ë"] ;
    ["&Igrave;"; "Ì"] ;
    ["&Iacute;"; "Í"] ;
    ["&Icirc;"; "Î"] ;
    ["&Iuml;"; "Ï"] ;
    ["&ETH;"; "Ð"] ;
    ["&Ntilde;"; "Ñ"] ;
    ["&Ograve;"; "Ò"] ;
    ["&Oacute;"; "Ó"] ;
    ["&Ocirc;"; "Ô"] ;
    ["&Otilde;"; "Õ"] ;
    ["&Ouml;"; "Ö"] ;
    ["&times;"; "×"] ;
    ["&Oslash;"; "Ø"] ;
    ["&Ugrave;"; "Ù"] ;
    ["&Uacute;"; "Ú"] ;
    ["&Ucirc;"; "Û"] ;
    ["&Uuml;"; "Ü"] ;
    ["&Yacute;"; "Ý"] ;
    ["&THORN;"; "Þ"] ;
    ["&szlig;"; "ß"] ;
    ["&agrave;"; "à"] ;
    ["&aacute;"; "á"] ;
    ["&acirc;"; "â"] ;
    ["&atilde;"; "ã"] ;
    ["&auml;"; "ä"] ;
    ["&aring;"; "å"] ;
    ["&aelig;"; "æ"] ;
    ["&ccedil;"; "ç"] ;
    ["&egrave;"; "è"] ;
    ["&eacute;"; "é"] ;
    ["&ecirc;"; "ê"] ;
    ["&euml;"; "ë"] ;
    ["&igrave;"; "ì"] ;
    ["&iacute;"; "í"] ;
    ["&icirc;"; "î"] ;
    ["&iuml;"; "ï"] ;
    ["&eth;"; "ð"] ;
    ["&ntilde;"; "ñ"] ;
    ["&ograve;"; "ò"] ;
    ["&oacute;"; "ó"] ;
    ["&ocirc;"; "ô"] ;
    ["&otilde;"; "õ"] ;
    ["&ouml;"; "ö"] ;
    ["&divide;"; "÷"] ;
    ["&oslash;"; "ø"] ;
    ["&ugrave;"; "ù"] ;
    ["&uacute;"; "ú"] ;
    ["&ucirc;"; "û"] ;
    ["&uuml;"; "ü"] ;
    ["&yacute;"; "ý"] ;
    ["&thorn;"; "þ"] ;
    ["&yuml;"; "ÿ"] ;
    ["&fnof;"; "ƒ"] ;
    ["&Alpha;"; "Α"] ;
    ["&Beta;"; "Β"] ;
    ["&Gamma;"; "Γ"] ;
    ["&Delta;"; "Δ"] ;
    ["&Epsilon;"; "Ε"] ;
    ["&Zeta;"; "Ζ"] ;
    ["&Eta;"; "Η"] ;
    ["&Theta;"; "Θ"] ;
    ["&Iota;"; "Ι"] ;
    ["&Kappa;"; "Κ"] ;
    ["&Lambda;"; "Λ"] ;
    ["&Mu;"; "Μ"] ;
    ["&Nu;"; "Ν"] ;
    ["&Xi;"; "Ξ"] ;
    ["&Omicron;"; "Ο"] ;
    ["&Pi;"; "Π"] ;
    ["&Rho;"; "Ρ"] ;
    ["&Sigma;"; "Σ"] ;
    ["&Tau;"; "Τ"] ;
    ["&Upsilon;"; "Υ"] ;
    ["&Phi;"; "Φ"] ;
    ["&Chi;"; "Χ"] ;
    ["&Psi;"; "Ψ"] ;
    ["&Omega;"; "Ω"] ;
    ["&alpha;"; "α"] ;
    ["&beta;"; "β"] ;
    ["&gamma;"; "γ"] ;
    ["&delta;"; "δ"] ;
    ["&epsilon;"; "ε"] ;
    ["&zeta;"; "ζ"] ;
    ["&eta;"; "η"] ;
    ["&theta;"; "θ"] ;
    ["&iota;"; "ι"] ;
    ["&kappa;"; "κ"] ;
    ["&lambda;"; "λ"] ;
    ["&mu;"; "μ"] ;
    ["&nu;"; "ν"] ;
    ["&xi;"; "ξ"] ;
    ["&omicron;"; "ο"] ;
    ["&pi;"; "π"] ;
    ["&rho;"; "ρ"] ;
    ["&sigmaf;"; "ς"] ;
    ["&sigma;"; "σ"] ;
    ["&tau;"; "τ"] ;
    ["&upsilon;"; "υ"] ;
    ["&phi;"; "φ"] ;
    ["&chi;"; "χ"] ;
    ["&psi;"; "ψ"] ;
    ["&omega;"; "ω"] ;
    ["&thetasym;"; "ϑ"] ;
    ["&upsih;"; "ϒ"] ;
    ["&piv;"; "ϖ"] ;
    ["&bull;"; "•"] ;
    ["&hellip;"; "…"] ;
    ["&prime;"; "′"] ;
    ["&Prime;"; "″"] ;
    ["&oline;"; "‾"] ;
    ["&frasl;"; "⁄"] ;
    ["&weierp;"; "℘"] ;
    ["&image;"; "ℑ"] ;
    ["&real;"; "ℜ"] ;
    ["&trade;"; "™"] ;
    ["&alefsym;"; "ℵ"] ;
    ["&larr;"; "←"] ;
    ["&uarr;"; "↑"] ;
    ["&rarr;"; "→"] ;
    ["&darr;"; "↓"] ;
    ["&harr;"; "↔"] ;
    ["&crarr;"; "↵"] ;
    ["&lArr;"; "⇐"] ;
    ["&uArr;"; "⇑"] ;
    ["&rArr;"; "⇒"] ;
    ["&dArr;"; "⇓"] ;
    ["&hArr;"; "⇔"] ;
    ["&forall;"; "∀"] ;
    ["&part;"; "∂"] ;
    ["&exist;"; "∃"] ;
    ["&empty;"; "∅"] ;
    ["&nabla;"; "∇"] ;
    ["&isin;"; "∈"] ;
    ["&notin;"; "∉"] ;
    ["&ni;"; "∋"] ;
    ["&prod;"; "∏"] ;
    ["&sum;"; "∑"] ;
    ["&minus;"; "−"] ;
    ["&lowast;"; "∗"] ;
    ["&radic;"; "√"] ;
    ["&prop;"; "∝"] ;
    ["&infin;"; "∞"] ;
    ["&ang;"; "∠"] ;
    ["&and;"; "∧"] ;
    ["&or;"; "∨"] ;
    ["&cap;"; "∩"] ;
    ["&cup;"; "∪"] ;
    ["&int;"; "∫"] ;
    ["&there4;"; "∴"] ;
    ["&sim;"; "∼"] ;
    ["&cong;"; "≅"] ;
    ["&asymp;"; "≈"] ;
    ["&ne;"; "≠"] ;
    ["&equiv;"; "≡"] ;
    ["&le;"; "≤"] ;
    ["&ge;"; "≥"] ;
    ["&sub;"; "⊂"] ;
    ["&sup;"; "⊃"] ;
    ["&sube;"; "⊆"] ;
    ["&supe;"; "⊇"] ;
    ["&oplus;"; "⊕"] ;
    ["&otimes;"; "⊗"] ;
    ["&perp;"; "⊥"] ;
    ["&sdot;"; "⋅"] ;
    ["&lceil;"; "⌈"] ;
    ["&rceil;"; "⌉"] ;
    ["&lfloor;"; "⌊"] ;
    ["&rfloor;"; "⌋"] ;
    ["&lang;"; "⟨"] ;
    ["&rang;"; "⟩"] ;
    ["&loz;"; "◊"] ;
    ["&spades;"; "♠"] ;
    ["&clubs;"; "♣"] ;
    ["&hearts;"; "♥"] ;
    ["&diams;"; "♦"] ;
    ["&quot;"; "\""] ;
    ["&amp;"; "&"] ;
    ["&lt;"; "<"] ;
    ["&gt;"; ">"] ;
    ["&OElig;"; "Œ"] ;
    ["&oelig;"; "œ"] ;
    ["&Scaron;"; "Š"] ;
    ["&scaron;"; "š"] ;
    ["&Yuml;"; "Ÿ"] ;
    ["&circ;"; "ˆ"] ;
    ["&tilde;"; "˜"] ;
    ["&ensp;"; " "] ;
    ["&emsp;"; " "] ;
    ["&thinsp;"; " "] ;
    ["&zwnj;"; "   "] ;
    ["&zwj;"; "   "] ;
    ["&lrm;"; "   "] ;
    ["&rlm;"; "   "] ;
    ["&ndash;"; "–"] ;
    ["&mdash;"; "—"] ;
    ["&lsquo;"; "‘"] ;
    ["&rsquo;"; "’"] ;
    ["&sbquo;"; "‚"] ;
    ["&ldquo;"; "\“"] ;
    ["&rdquo;"; "\”"] ;
    ["&bdquo;"; "\„"] ;
    ["&dagger;"; "†"] ;
    ["&Dagger;"; "‡"] ;
    ["&permil;"; "‰"] ;
    ["&lsaquo;"; "‹"] ;
    ["&rsaquo;"; "›"] ;
    ["&euro;"; "€"]
)) ;
$HTMLencoded2Text_depth = $HTMLencoded2Text_depth + 1 ;
_finalresult = Case ( not Position ( _text ; "&#"; 1; 1 ) ; _text ;
    Let ([
        _pos = Position ( _text ; "&#" ; 1 ; 1 ) ;
        _pos2 = Position ( _text ; ";" ; _pos ; 1 ) - _pos ;
        _word = Middle ( _text ; _pos ; _pos2 + 1 ) ;
        _isCode = Length ( _word ) >= 4 and Length ( _word ) <= 8 and Substitute ( _word ; [ 0 ; "" ] ; [ 1 ; "" ] ; [ 2 ; "" ] ; [ 3 ; "" ] ; [ 4 ; "" ] ; [ 5 ; "" ] ; [ 6 ; "" ] ; [ 7 ; "" ] ; [ 8 ; "" ] ; [ 9 ; "" ]) = "&#;" ;
        _result = Left ( _text ; _pos - 1 ) & Case ( _isCode ; Char ( GetAsNumber ( _word )) ; "&" )
        ];
        _result & HTMLencoded2Text ( Right ( _text ; Length ( _text ) - ( _pos + Case ( _isCode ; Length ( _word ) - 1 ))))
        ));
    $HTMLencoded2Text_depth = $HTMLencoded2Text_depth - 1
];
    _finalresult
)

// ===================================
/*

    This function is published on FileMaker Custom Functions
    to check for updates and provide feedback and bug reports
    please visit http://www.fmfunctions.com/fid/178

    Prototype: HTMLencoded2Text( _text )
    Function Author: Fabrice (http://www.fmfunctions.com/mid/37)
    Last updated: 14 September 2011
    Version: 3.5

*/
// ===================================
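
For readers who want to follow the numeric branch (the part that handles "&#...;" references) without stepping through nested Let expressions, here is a minimal Python sketch of the same checks. It is a port for illustration only, not the author's code: the function name decode_numeric_entities is hypothetical, and it uses a loop where the custom function recurses.

```python
def decode_numeric_entities(text: str) -> str:
    """Decode decimal references such as '&#233;', mirroring the _isCode logic."""
    out = []
    while True:
        pos = text.find("&#")
        if pos == -1:                      # no candidate left: keep the rest untouched
            out.append(text)
            return "".join(out)
        end = text.find(";", pos)
        word = text[pos:end + 1] if end != -1 else ""
        digits = word[2:-1]
        # Same test as _isCode: 4 to 8 characters overall, only 0-9 between "&#" and ";".
        is_code = 4 <= len(word) <= 8 and all(c in "0123456789" for c in digits)
        if is_code:
            out.append(text[:pos] + chr(int(digits)))
            text = text[pos + len(word):]
        else:
            out.append(text[:pos] + "&")   # not a valid code: emit "&" and move past it
            text = text[pos + 1:]


print(decode_numeric_entities("caf&#233; &#38; bar"))
```

As in the custom function, a "&#" that is not followed by a short run of digits and a semicolon is left in the output unchanged.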

 

Comments

Agnès
04 May 2009



Hello Fabrice,

During an import, I came across these entities, which were not in your list:
; [ "&le;" ; "≤" ]
; [ "&ge;" ; "≥" ]
; [ "&para;" ; Citation ( "#|^|#¶#|^|#" ) ]
; [ "\"#|^|#" ; "" ]; [ "#|^|#\"" ; "" ]
; [ "&ldquo;" ; "\"" ]
; [ "&rdquo;" ; "\"" ]

Just in case.
Thanks a lot for this one!

Agnès
(Edited by Agnès on 04/05/09 )
  General comment
HOnza
29 August 2011



Hey, useful but still not complete, so I created a sample file that generates an updated version of this function from a database of HTML entities ;-)
Feel free to download it here: http://24usw.com/hent
(Edited by HOnza on 29/08/11 )
  General comment
Fabrice
29 August 2011



Thanks! That's why I like this site. I updated the function according to your modifications.
  General comment
HOnza
14 September 2011



I have just optimized the custom function to run about 4 times faster when processing 33 kilobytes of text and millions of times faster when processing 1.4 MB of text.

Re-download my sample file from http://24usw.com/hent to get the updated custom function.

I am also going to post a video of the optimization as soon as I get some time to cut it…
  General comment
HOnza
14 September 2011



One more update and correction.
I made a mistake in the optimization that caused the function to misbehave when used multiple times in the same context (the $HTMLencoded2Text_deep variable persisted across calls). If you have already downloaded the optimized version, please re-download the file with this bug fixed.

Correction: it's about 8 times faster on the 33 KB text and, rather than millions, only several hundred times faster on the 1.4 MB text ;-)
(Edited by HOnza on 14/09/11 )
  General comment
HOnza
27 January 2012



A further optimized version, which I demoed at Pause[x]London 2011, is now available at http://24usw.com/529b
  General comment

 

 

