Tags: encoding, php, tweeklyfm, xml
Today, apart from studying, I spent my time trying to fix a Tweekly.fm problem. It has been there from the start, but I only decided to fix it now. When I fetched data from last.fm, weird characters like “’” would occasionally show up in place of quotation marks. Wondering what caused it, I set off on an adventure into Google. It turned out to be some form of encoding problem. PHP’s XML functions return strings in UTF-8 encoding. I tried fixing it with PHP’s utf8_decode() function and even tried iconv(), but neither worked.
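To make the symptom concrete, here is a minimal sketch of what happens to a single curly apostrophe (U+2019) when its UTF-8 bytes are read as Windows-1252, and why utf8_decode() cannot rescue it. It assumes the iconv extension is available and that the underlying library knows Windows-1252 under the name CP1252. The variable names are made up for illustration and are not taken from the Tweekly.fm code.

```php
<?php
// U+2019 (right single quotation mark) spelled out as its three UTF-8 bytes.
$apostrophe = "\xE2\x80\x99";

// Pretend those bytes are Windows-1252 and convert them to UTF-8 for display.
// Each byte becomes its own character, which gives the classic garbled look.
$misread = iconv('CP1252', 'UTF-8', $apostrophe);

// utf8_decode() targets ISO-8859-1, which has no curly apostrophe, so the
// character is replaced by "?". The function is deprecated since PHP 8.2,
// hence the @ to silence the notice in this sketch.
$decoded = @utf8_decode($apostrophe);

echo bin2hex($apostrophe), "\n"; // e28099
echo $misread, "\n";             // â€™
echo $decoded, "\n";             // ?
```

So converting the string on the server only swaps one wrong character for another; the bytes were fine all along, they were just being read with the wrong encoding.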
What baffled me even more was the fact that some quotation marks displayed properly… I sifted through the XML responses trying to find some reason why it was happening. It turned out that there are different ways of writing quotation marks (or inverted commas). There is the normal straight one: " and the curly one (which you can’t type directly on a normal western QWERTY keyboard. I can’t, so I assume most keyboards can’t): ” (<– See, my browser can’t display it properly). The ISO_8859-1 encoding used for most western, Latin-alphabet languages has no code for the “other” quotation mark. If you are in an English-speaking country, chances are your computer came with “Windows-1252” as its default encoding, which is a superset of “ISO_8859-1” that does include the curly quotes.
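The difference between the two quotation marks is easy to see at the byte level. The following is a small sketch, again assuming the iconv extension and the CP1252 alias; the variable names are hypothetical.

```php
<?php
$straight = '"';            // U+0022, the single byte 0x22 in all three encodings
$curly    = "\xE2\x80\x9D"; // U+201D, right double quotation mark, as UTF-8

// Windows-1252 has a slot for the curly quote: the single byte 0x94.
$cp1252 = iconv('UTF-8', 'CP1252', $curly);

// ISO-8859-1 has no such slot, so a strict conversion cannot succeed.
// What iconv() returns here depends on the underlying library, so the
// result is dumped without assuming a particular value.
$latin1 = @iconv('UTF-8', 'ISO-8859-1', $curly);

echo bin2hex($straight), "\n"; // 22
echo bin2hex($curly), "\n";    // e2809d
echo bin2hex($cp1252), "\n";   // 94
var_dump($latin1);
```

The straight quote is plain ASCII, which is why it survived no matter which encoding the browser guessed, while the curly one takes three bytes in UTF-8 and falls apart when those bytes are read one at a time.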
This basically means that unless you tell your browser which encoding a page is in, it will fall back to your… default encoding.
So in order to get rid of the stupid “’”, I had to tell the browser to use UTF-8 encoding across the site. In the end it worked out.
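For anyone wondering what telling the browser can look like in practice: one common way in PHP, shown here only as a sketch and not necessarily what Tweekly.fm does, is to send the charset in the Content-Type header and repeat it in a meta tag. The page markup below is a made-up placeholder.

```php
<?php
// Must run before any output is sent.
// Declares that the bytes of this page are UTF-8.
header('Content-Type: text/html; charset=utf-8');
?>
<!DOCTYPE html>
<html>
<head>
    <!-- Same declaration inside the document, as a fallback. -->
    <meta charset="utf-8">
    <title>Example page</title>
</head>
<body>
    <p>It’s working</p>
</body>
</html>
```

With the charset declared, the browser no longer has to guess, and the UTF-8 data coming out of the XML functions can be printed as it is.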