.NET WebClient DownloadString screwed up my non-English Unicode characters
For the past several days, I've been trying to build a small utility to copy a Tumblr blog within the same account. Some of my posts contain Unicode characters, and instead of getting مركز ميركاتو I was getting Ù…Ø±ÙƒØ² Ù…ÙŠØ±ÙƒØ§ØªÙˆ. Probably not much of a difference to you and me, but for Arabic readers the two are very different.
Originally I had something like the following:
    string data = client.DownloadString("[some url]");
    var reader = new StringReader(data);
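Pretty straightforward, right? But the thing is, it doesn't work. I found out the hard way that client.DownloadString doesn't necessarily decode the response as UTF-8. It decodes using the client's Encoding property, which defaults to Encoding.Default (on the .NET Framework, the system's ANSI code page) rather than UTF-8.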
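To see why the text comes out garbled, here is a minimal sketch that needs no network call. It encodes a sample Arabic string as UTF-8 (standing in for the bytes a server sends) and decodes those bytes twice: once with a single-byte encoding and once as UTF-8. The `MojibakeDemo` class and its `Decode` helper are made up for illustration, and ISO-8859-1 is only an assumed stand-in for whatever non-UTF-8 default a given machine uses.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: turn raw response bytes into a string using the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        string original = "مركز ميركاتو";

        // Stand-in for the response body: the text as UTF-8 bytes.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        // Assumption: ISO-8859-1 plays the role of the non-UTF-8 default encoding.
        Encoding singleByte = Encoding.GetEncoding("ISO-8859-1");

        string wrong = Decode(utf8Bytes, singleByte);    // every byte becomes its own character
        string right = Decode(utf8Bytes, Encoding.UTF8); // two-byte sequences become Arabic letters

        Console.WriteLine(wrong == original);                          // False
        Console.WriteLine(right == original);                          // True
        Console.WriteLine(wrong.Length + " chars vs " + right.Length); // 23 chars vs 12
    }
}
```

Each Arabic letter takes two bytes in UTF-8, so a single-byte decoder produces two junk characters per letter. The downloaded bytes are fine; they just need to be decoded as UTF-8.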
To do that, I had to change the code to the following:
    // Download the raw bytes, then decode them explicitly as UTF-8.
    var data = client.DownloadData("[some url]");
    var strungData = Encoding.UTF8.GetString(data);
    var reader = new StringReader(strungData);
Here is the easy way:
    WebClient client = new WebClient();
    client.Encoding = Encoding.UTF8; // tell WebClient to decode downloaded strings as UTF-8
    string data = client.DownloadString("[some url]");