http.DetectContentType do not return gbk charset by the mimesniff spec. #77

Closed
sanguohot opened this issue Sep 7, 2018 · 4 comments

sanguohot commented Sep 7, 2018

This is my code:

package main

import (
	"fmt"
	"github.com/ethereum/go-ethereum/common/hexutil"
	"log"
	"net/http"
	"golang.org/x/text/transform"
	"golang.org/x/text/encoding/simplifiedchinese"
	"strings"
	"io/ioutil"
)

// DecodeToGBK converts a UTF-8 string to GBK bytes.
// Despite its name, it encodes: it uses the GBK encoder.
func DecodeToGBK(utf8Str string) (dst string, err error) {
	reader := transform.NewReader(strings.NewReader(utf8Str), simplifiedchinese.GBK.NewEncoder())
	bytes, err := ioutil.ReadAll(reader)
	if err != nil {
		return "", err
	}
	return string(bytes), nil
}

// EncodeFromGBK converts GBK bytes (held in a string) to UTF-8.
// Despite its name, it decodes: it uses the GBK decoder.
func EncodeFromGBK(gbkStr string) (utf8Str string, err error) {
	reader := transform.NewReader(strings.NewReader(gbkStr), simplifiedchinese.GBK.NewDecoder())
	bytes, err := ioutil.ReadAll(reader)
	if err != nil {
		return "", err
	}
	return string(bytes), nil
}

func main() {
	bytes, err := hexutil.Decode("0x68656c6c6f20776f726c64210d0aced2cac7bae3dec8313131313232323232")
	if err != nil {
		log.Fatal(err)
	}
	gbkStr := string(bytes)
	fmt.Println("http.DetectContentType ===>", http.DetectContentType(bytes))
	fmt.Println("gbkStr ===>", gbkStr)
	gbkToUtf8Str, err := EncodeFromGBK(gbkStr)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("gbkToUtf8Str ===>", gbkToUtf8Str)
}

And this is the output:

http.DetectContentType ===> text/plain; charset=utf-8
gbkStr ===> hello world!
���Ǻ���111122222
gbkToUtf8Str ===> hello world!
我是恒奕111122222

As you can see, the data should be decoded as GBK, but http.DetectContentType, which implements the mimesniff spec, returns utf-8.
Sadly, I could not find any chardet library as powerful as Python's chardet or the JavaScript chardet,
so I had to wrap http.DetectContentType in an ugly way:

package util

import (
	"github.com/sanguohot/chardet"
	"net/http"
	"strings"
)

var defaultCharset = "utf-8"
var gbk = "gbk"
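
// DetectCharsetWithOnlyUtf8OrGbk returns "gbk" only when chardet lists it as
// a candidate and utf-8 is not the top guess; otherwise it returns "utf-8".
func DetectCharsetWithOnlyUtf8OrGbk(data []byte) string {
	strs := chardet.Possible(data)
	// Guard against an empty candidate list before indexing into it.
	if len(strs) == 0 || strs[0] == defaultCharset {
		return defaultCharset
	}
	for _, value := range strs {
		if value == gbk {
			return gbk
		}
	}
	return defaultCharset
}

// DetectContentType wraps http.DetectContentType and replaces the utf-8
// charset label with the charset chosen above.
func DetectContentType(data []byte) string {
	contentType := http.DetectContentType(data)
	if strings.Contains(contentType, defaultCharset) {
		contentType = strings.Replace(contentType, defaultCharset, DetectCharsetWithOnlyUtf8OrGbk(data), -1)
	}
	return contentType
}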

This code works, but it probably has bugs.
Any help would be greatly appreciated.
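
A wrapper like the one above can also be sketched with only the standard library. The version below is a minimal sketch, not a charset detector: it checks whether the data is well-formed UTF-8 and, if it is not, falls back to a single assumed charset. The names `guessCharset`, `detectContentType`, and `fallbackCharset` are hypothetical, and the choice of "gbk" as the fallback is an assumption that only makes sense when the input is known to be either UTF-8 or GBK.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"unicode/utf8"
)

// fallbackCharset is a hypothetical default for data that is not valid UTF-8.
// It is an assumption of this sketch, not something the sniffer reports.
const fallbackCharset = "gbk"

// guessCharset returns "utf-8" when data is well-formed UTF-8 and the
// assumed fallback otherwise. It validates; it does not detect encodings.
func guessCharset(data []byte) string {
	if utf8.Valid(data) {
		return "utf-8"
	}
	return fallbackCharset
}

// detectContentType wraps http.DetectContentType and rewrites only the
// charset parameter, leaving types without a charset untouched.
func detectContentType(data []byte) string {
	contentType := http.DetectContentType(data)
	mediaType, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return contentType
	}
	if _, ok := params["charset"]; !ok {
		return contentType
	}
	params["charset"] = guessCharset(data)
	return mime.FormatMediaType(mediaType, params)
}

func main() {
	// "hello world!\r\n" followed by the first GBK-encoded characters
	// from the hex string in the report above.
	gbkBytes := []byte("hello world!\r\n\xce\xd2\xca\xc7\xba\xe3\xde\xc8")
	fmt.Println(detectContentType(gbkBytes))
	fmt.Println(detectContentType([]byte("hello world!")))
}
```

Parsing the media type instead of replacing a substring keeps the rewrite limited to the charset parameter. One caveat: a buffer that was cut in the middle of a multi-byte UTF-8 sequence fails validation, so this check is best run on complete data rather than on a truncated prefix.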

Author

sanguohot commented Sep 7, 2018

I have already raised an issue: golang/go#27461

sanguohot changed the title from "http.DetectContentType do not return gbk charset" to "http.DetectContentType do not return gbk charset by the mimesniff spec." on Sep 7, 2018
Member

annevk commented Sep 7, 2018

Unfortunately, detecting encodings is a non-goal for this algorithm. @agnivade if similar requests arrive on the golang side it's probably best not to redirect them here.

We might still get limited encoding sniffing on the web platform standardized at some point (see whatwg/encoding#68 and whatwg/encoding#102), but that would only be used in a couple of places, sometime after MIME sniffing has done its thing.

annevk closed this as completed on Sep 7, 2018

agnivade commented Sep 7, 2018

> @agnivade if similar requests arrive on the golang side it's probably best not to redirect them here.

OK. Since we implement the mimesniff spec, we redirect any such requests here. Where should we redirect them instead?

Member

annevk commented Sep 7, 2018

@agnivade basically we don't really want to extend this algorithm, so I suppose it'd be best to politely decline. The one exception is if someone found an interoperability issue between golang and a browser or an actual bug of some kind.
