/* URL encoding (Rosetta code) in Picat. http://rosettacode.org/wiki/URL_encoding """ The task is to provide a function or mechanism to convert a provided string into URL encoding representation. In URL encoding, special characters, control characters and extended characters are converted into a percent symbol followed by a two digit hexadecimal code, So a space character encodes into %20 within the string. For the purposes of this task, every character except 0-9, A-Z and a-z requires conversion, so the following characters all require conversion by default: ASCII control codes (Character ranges 00-1F hex (0-31 decimal) and 7F (127 decimal). ASCII symbols (Character ranges 32-47 decimal (20-2F hex)) ASCII symbols (Character ranges 58-64 decimal (3A-40 hex)) ASCII symbols (Character ranges 91-96 decimal (5B-60 hex)) ASCII symbols (Character ranges 123-126 decimal (7B-7E hex)) Extended characters with character codes of 128 decimal (80 hex) and above. Example The string "http://foo bar/" would be encoded as "http%3A%2F%2Ffoo%20bar%2F". Variations - Lowercase escapes are legal, as in "http%3a%2f%2ffoo%20bar%2f". - Some standards give different rules: RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, section 2.3, says that "-._~" should not be encoded. HTML 5, section 4.10.22.5 URL-encoded form data, says to preserve "-._*", and to encode space " " to "+". The options below provide for utilization of an exception string, enabling preservation (non encoding) of particular characters to meet specific standards. Options It is permissible to use an exception string (containing a set of symbols that do not need to be converted). However, this is an optional feature and is not a requirement of this task. """ Also see - http://hakank.org/picat/url_decoding.pi This Picat model was created by Hakan Kjellerstrand, hakank@gmail.com See also my Picat page: http://www.hakank.org/picat/ */ import util. main => go. go => URLS = ["http://foo bar/", "http://hakank.org/picat/", "http://picat-lang.org/", "http://www.håkan.org/", % not a real URL "This is a plain text with some non-ASCII7 chars: Håkan är snäll." ], foreach(URL in URLS) println(URL=url_encode(URL)), println(URL=url_encode2(URL)) end, nl. % It's a pity that Picat don't support regular expressions.... chars() = char_range([['0','9'], ['a','z'], ['A','Z']]). char_range(List) = [ [chr(I) : I in ord(From)..ord(To)] : [From,To] in List].flatten(). url_encode(URL) = S => Chars = chars(), S = [cond(member(C,Chars), C.to_string(), "%" ++ to_hex_string(ord(C))) : C in URL].join(''). % % recursive version % url_encode2(URL) = S => url_encode2(URL,"",S). url_encode2("",S0,S) => S = S0.flatten(). url_encode2([C|URL],S0,S), member(C, chars()) => url_encode2(URL,S0 ++ [C],S). url_encode2([C|URL],S0,S) => C2 = "%" ++ to_hex_string(ord(C)), url_encode2(URL,S0 ++ [C2],S).