The short answer is that this is a very ugly problem.
HTTP headers use an 8-bit, ASCII-compatible encoding, but the HTML inside the response body can be in any encoding, including multi-byte encodings such as UTF-16, UTF-32, or Shift-JIS.
You need to treat the socket as a raw 8-bit byte stream, speak HTTP to the server to make the request, and then download the body of the response as bytes, without decoding it yet.
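A minimal Python sketch of that step follows. It assumes plain HTTP on port 80 (no TLS, no redirects), and `fetch_raw` and `split_response` are hypothetical helper names. The request says HTTP/1.0 so the server closes the connection when it is done and does not use chunked transfer encoding, which means we can simply read until EOF. Only the header block is decoded, as ISO-8859-1, which maps every byte to a character and so can never fail; the body is left as bytes.

```python
import socket


def fetch_raw(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send a minimal HTTP/1.0 GET and return the raw, undecoded response."""
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    chunks = []
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        # Read until the server closes the connection.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def split_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw response into (status line, headers, body bytes)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("end of HTTP headers not found")
    # Headers are ASCII-compatible; ISO-8859-1 decodes any byte sequence.
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # The body is returned untouched: its encoding is still unknown.
    return lines[0], headers, body
```

For example, `status, headers, body = split_response(fetch_raw("example.com"))` leaves `body` as raw bytes, ready for the checks that follow.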
Once you have the body you can:
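- check for the UTF-8, UTF-16, or UTF-32 form of the Unicode byte-order mark (BOM); if one is present, it identifies the encoding,
- look for an XML declaration with an `encoding` attribute (`<?xml version="1.0" encoding="..."?>`),
- scan the HTML `<head>` element for a charset declaration (`<meta charset="...">` or `<meta http-equiv="Content-Type" content="text/html; charset=...">`), and
- finally, guess based on byte frequencies.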
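The checks above can be sketched in Python as a single function that tries them in order. This is a rough illustration under several assumptions: `detect_html_encoding` is a hypothetical name, the 4096-byte window for the `<meta>` scan and the `windows-1252` fallback are arbitrary choices, and the regexes are simplified rather than a full HTML parse. The last step is deliberately crude: it only counts zero bytes to spot BOM-less UTF-16 and then checks whether the bytes are valid UTF-8, standing in for a real statistical detector such as the `chardet` library.

```python
import codecs
import re

# UTF-32-LE's BOM (FF FE 00 00) begins with UTF-16-LE's (FF FE), so test it first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]
# The XML declaration must sit at the very start of the document.
_XML_DECL = re.compile(rb'<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
# Matches both <meta charset=...> and the http-equiv/content form.
_META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def _guess_from_bytes(data: bytes) -> str:
    """Crude fallback: zero-byte frequency for UTF-16, then UTF-8 validity."""
    sample = data[:4096]
    even_zeros = sample[0::2].count(0)
    odd_zeros = sample[1::2].count(0)
    half = len(sample) / 2
    # Mostly-ASCII UTF-16 text has a zero in every other byte.
    if even_zeros > 0.3 * half and even_zeros > 4 * odd_zeros:
        return "utf-16-be"
    if odd_zeros > 0.3 * half and odd_zeros > 4 * even_zeros:
        return "utf-16-le"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "windows-1252"  # arbitrary single-byte fallback


def detect_html_encoding(body: bytes) -> str:
    # 1. Byte-order mark.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 2. XML declaration.
    m = _XML_DECL.match(body)
    if m:
        return m.group(1).decode("ascii").lower()
    # 3. <meta> charset near the top of the document (inside <head>).
    m = _META_CHARSET.search(body[:4096])
    if m:
        return m.group(1).decode("ascii").lower()
    # 4. Guess from the bytes themselves.
    return _guess_from_bytes(body)
```

Steps 2 and 3 match ASCII byte patterns, so they only work for ASCII-compatible encodings; that is one reason the BOM check has to come first.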