Reflections on Old Matters -- 木头云之

Why Text Read with fread Comes Out Garbled, Plus Converting Between UTF-8 and ANSI (GB2312)

Source: http://www.xx0594.com/
Date: 2017-9-5

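While scraping data from the web, I found that reading HTML files into memory with fread kept going wrong. At first I could not see why, because I had used this function for years without a problem. After half a day of tracing, I finally found that the text already looked garbled right after the fread call. After more thought, I guessed that the cause was a difference in encoding: the text files I normally handle are ANSI-encoded (GB2312 on a Simplified Chinese Windows system), whereas these HTML files were UTF-8. Strictly speaking, fread copies the bytes unchanged. The text only looks garbled because UTF-8 bytes are being displayed and processed as if they were GB2312.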
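You can check the guess about encodings in code rather than by eye, which matters once there are thousands of files. The sketch below is not from the original article, and HasUtf8Bom and IsValidUtf8 are hypothetical names. It looks for the optional UTF-8 byte-order mark and otherwise tests whether the bytes form well-formed UTF-8. This is a heuristic, not a proof. Pure ASCII is valid in both encodings, and a short GB2312 string can occasionally be valid UTF-8 by accident. Longer Chinese text in GB2312 usually fails the check.

```cpp
#include <cstddef>

// Hypothetical helper, not from the original article: true if the buffer
// starts with the UTF-8 byte-order mark EF BB BF (optional in UTF-8 files).
bool HasUtf8Bom(const unsigned char* p, std::size_t n)
{
    return n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF;
}

// Hypothetical helper: true if the whole buffer is well-formed UTF-8.
// Rejects invalid lead bytes, missing continuation bytes, truncated sequences,
// overlong forms, UTF-16 surrogates (U+D800..U+DFFF) and values above U+10FFFF.
bool IsValidUtf8(const unsigned char* p, std::size_t n)
{
    std::size_t i = 0;
    while (i < n) {
        unsigned char c = p[i];
        if (c < 0x80) { ++i; continue; }   // ASCII byte
        std::size_t len;
        unsigned long cp;
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                  // continuation byte or 0xF8..0xFF used as a lead byte
        if (i + len > n) return false;      // sequence cut off at the end of the buffer
        for (std::size_t k = 1; k < len; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return false;   // not a 10xxxxxx continuation byte
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        // Smallest code point that legitimately needs 2, 3 or 4 bytes
        static const unsigned long minForLen[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < minForLen[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```

A batch program could, for example, run IsValidUtf8 on each buffer after fread and pass only the files that succeed to Utf2Gb. Files that fail can be left as they are.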
    With that in mind, I opened one of the UTF-8 HTML files in Notepad, saved it again with ANSI encoding, and tried again. Sure enough, it worked.
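    But then another problem came up. I had tens of thousands of HTML files to process, and I could not possibly convert them one at a time like that. Collapse from that kind of drudgery and nobody will feel sorry for you (that is how programmers work themselves to death). Searching the web turned up only online converters, which were no use either. So I had to do it myself, and that is how the code below came about: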
    Usage: first read the HTML into memory with fread, then convert the buffer with Utf2Gb.
    For example:
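    size_t n = fread(sFileString, sizeof(char), fsize, fp); // sFileString must hold at least fsize + 1 bytes
    sFileString[n] = '\0';                   // Utf2Gb expects a NUL-terminated string
    char* sGbString = Utf2Gb(sFileString);   // returns a new buffer allocated with new[]
    // ... use sGbString, then release it with delete[] sGbString;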
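The usage example needs a buffer large enough for the file plus a terminating NUL. Utf2Gb passes -1 as the length to MultiByteToWideChar, so it expects a NUL-terminated string. Below is one possible way to produce such a buffer, offered as a minimal sketch. ReadWholeFile is a hypothetical helper, not part of the original code. It opens the file in binary mode ("rb") so that the C runtime does not translate line endings. It gets the size with fseek/ftell and terminates the buffer after the bytes fread actually returned.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper, not from the original article: reads a whole file with
// fread into a buffer allocated with new[] and appends a terminating NUL, so
// the result can be passed to a function that expects a C string (e.g. Utf2Gb).
// Returns nullptr on failure; on success, *outSize (if non-null) receives the
// number of bytes read. The caller frees the buffer with delete[].
char* ReadWholeFile(const char* path, std::size_t* outSize)
{
    std::FILE* fp = std::fopen(path, "rb");  // binary mode: no CRLF translation
    if (!fp) return nullptr;
    if (std::fseek(fp, 0, SEEK_END) != 0) { std::fclose(fp); return nullptr; }
    long fsize = std::ftell(fp);             // file size in bytes
    if (fsize < 0) { std::fclose(fp); return nullptr; }
    std::rewind(fp);
    char* buf = new char[static_cast<std::size_t>(fsize) + 1];  // +1 for the NUL
    std::size_t n = std::fread(buf, sizeof(char), static_cast<std::size_t>(fsize), fp);
    std::fclose(fp);
    buf[n] = '\0';                           // terminate after the bytes actually read
    if (outSize) *outSize = n;
    return buf;
}
```

For example, char* html = ReadWholeFile("page.html", nullptr) (the file name is only illustrative) yields a buffer that can go straight into Utf2Gb. Free it with delete[] once the converted copy has been made.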
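//-----------------------------------------------------
#include <windows.h>  // MultiByteToWideChar, WideCharToMultiByte, CP_UTF8, CP_ACP
#include <cstring>    // memset

// UTF-8 to GB2312 conversion.
// CP_ACP is the system ANSI code page, which is code page 936 (GBK, a superset
// of GB2312) on Simplified Chinese Windows.
// The caller must free the returned buffer with delete[].
char* Utf2Gb(const char* utf8)
{
    // Step 1: UTF-8 -> UTF-16. Passing -1 means utf8 is NUL-terminated, and the
    // returned length includes the terminating NUL.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    wchar_t* wstr = new wchar_t[len + 1];
    memset(wstr, 0, (len + 1) * sizeof(wchar_t));  // memset counts bytes, not wchar_t elements
    MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wstr, len);
    // Step 2: UTF-16 -> ANSI code page
    len = WideCharToMultiByte(CP_ACP, 0, wstr, -1, NULL, 0, NULL, NULL);
    char* str = new char[len + 1];
    memset(str, 0, len + 1);
    WideCharToMultiByte(CP_ACP, 0, wstr, -1, str, len, NULL, NULL);
    delete[] wstr;  // new[] never returns NULL, so no check is needed
    return str;
}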

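// GB2312 to UTF-8 conversion (the reverse of Utf2Gb).
// The caller must free the returned buffer with delete[].
char* Gb2Utf(const char* gb2312)
{
    // Step 1: ANSI code page -> UTF-16 (the length includes the terminating NUL)
    int len = MultiByteToWideChar(CP_ACP, 0, gb2312, -1, NULL, 0);
    wchar_t* wstr = new wchar_t[len + 1];
    memset(wstr, 0, (len + 1) * sizeof(wchar_t));  // size in bytes, not elements
    MultiByteToWideChar(CP_ACP, 0, gb2312, -1, wstr, len);
    // Step 2: UTF-16 -> UTF-8
    len = WideCharToMultiByte(CP_UTF8, 0, wstr, -1, NULL, 0, NULL, NULL);
    char* str = new char[len + 1];
    memset(str, 0, len + 1);
    WideCharToMultiByte(CP_UTF8, 0, wstr, -1, str, len, NULL, NULL);
    delete[] wstr;
    return str;
}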
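Both functions pivot through UTF-16, which is what wchar_t holds on Windows. MultiByteToWideChar decodes the source code page into UTF-16, and WideCharToMultiByte encodes UTF-16 into the target code page. The GB2312 side needs a mapping table, which Windows supplies through the system code page. The UTF-8 side, however, is pure bit manipulation. The following is a portable illustration of that UTF-8 half only, not how Windows implements it. Utf8ToUtf16 and Utf16ToUtf8 are hypothetical names, and malformed input is replaced with U+FFFD rather than reported as an error.

```cpp
#include <cstddef>
#include <string>

// Portable sketch of the UTF-8 <-> UTF-16 half of the pivot used by Utf2Gb and
// Gb2Utf. Hypothetical helpers, not the Win32 implementation. Malformed input
// (bad lead or continuation bytes, truncated sequences, values above U+10FFFF)
// becomes U+FFFD; overlong forms and encoded surrogates are not rejected.
std::u16string Utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else { out.push_back(u'\uFFFD'); ++i; continue; }  // invalid lead byte

        bool ok = i + len <= in.size();                    // false if truncated at the end
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) ok = false;           // not a continuation byte
            else cp = (cp << 6) | (cc & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out.push_back(u'\uFFFD'); ++i; continue; }
        i += len;

        if (cp >= 0x10000) {                               // outside the BMP: surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Encode UTF-16 back to UTF-8; an unpaired surrogate becomes U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine into one code point
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the UTF-8 bytes E4 B8 AD decode to the single UTF-16 code unit U+4E2D ("中"). A character outside the Basic Multilingual Plane, such as U+1F600, becomes the surrogate pair D83D DE00. The GB2312 bytes D6 D0 are not valid UTF-8, so each of them is replaced with U+FFFD.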
