These –
=?GBK?B?1cK5scf4s8e53L7WudjT2s34wufT38fp0MXPoteo?= =?GBK?B?sai1xLTwuLQoz8K6vszBMbrFKS5kb2M=?=
– are MIME Encoded-Words. The general form is:
=?charset?encoding?encoded text?=
You're correct that the charset is GBK, but you must first undo the transfer encoding, which is either B for Base64 or Q for a variant of Quoted-Printable. Thus:
py3.5 >>> import base64
py3.5 >>> base64.b64decode("sai1xLTwuLQoz8K6vszBMbrFKS5kb2M=").decode("GBK")
'报的答复(下壕塘1号).doc'
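The two-step process (undo the transfer encoding, then decode the charset) can be sketched for a single Encoded-Word as follows. This is only an illustration: `decode_encoded_word` and `ENCODED_WORD` are hypothetical names, and the pattern ignores details such as the optional language suffix on the charset.

```python
import base64
import quopri
import re

# Matches exactly one encoded-word: =?charset?encoding?encoded text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")

def decode_encoded_word(word):
    """Hypothetical helper: decode one MIME Encoded-Word into a str."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not an encoded-word: %r" % word)
    charset, encoding, payload = match.groups()
    if encoding.upper() == "B":
        # B: the payload is plain Base64
        raw = base64.b64decode(payload)
    else:
        # Q: like Quoted-Printable, except "_" stands for a space,
        # which is what header=True enables
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    # Only now does the charset come into play
    return raw.decode(charset)
```

For instance, `decode_encoded_word("=?ISO-8859-1?Q?Andr=E9_Pirard?=")` takes the Q branch, while the GBK words above take the B branch.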
However, email.header will handle this better:
py3.5 >>> import email.header
py3.5 >>> email.header.decode_header("=?GBK?B?1cK5scf4s8e53L7WudjT2s34wufT38fp0MXPoteo?= =?GBK?B?sai1xLTwuLQoz8K6vszBMbrFKS5kb2M=?=")
[(b'\xd5\xc2\xb9\xb1\xc7\xf8\xb3\xc7\xb9\xdc\xbe\xd6\xb9\xd8\xd3\xda\xcd\xf8\xc2\xe7\xd3\xdf\xc7\xe9\xd0\xc5\xcf\xa2\xd7\xa8\xb1\xa8\xb5\xc4\xb4\xf0\xb8\xb4(\xcf\xc2\xba\xbe\xcc\xc11\xba\xc5).doc', 'gbk')]
py3.5 >>> _[0][0].decode(_[0][1])
'章贡区城管局关于网络舆情信息专报的答复(下壕塘1号).doc'
The first result is a list of (value, charset) pairs because a single header may have multiple components, e.g. different encodings or a mix of raw text and Encoded-Words. Unlike Perl's Encode, the Python module leaves it up to you to join() the results:
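import email.header

def decode_header(enc):
    # decode_header() returns (value, charset) pairs; charset is None for raw text
    dec = email.header.decode_header(enc)
    # A header with no Encoded-Words at all comes back as str, not bytes
    dec = [v.decode(cs or "us-ascii") if isinstance(v, bytes) else v for v, cs in dec]
    return "".join(dec)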
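As a possible alternative to joining by hand, the standard library's email.header.make_header() can rebuild a Header object from those (value, charset) pairs, and str() of that object gives the decoded text. The sketch below assumes that route; `decode_header_text` is a hypothetical name, and note that Header may insert a space at the boundary between raw text and an Encoded-Word, so the result is not guaranteed to be identical to a plain join() for mixed headers.

```python
import email.header

def decode_header_text(value):
    """Hypothetical helper: decode a raw header value to str via make_header()."""
    # decode_header() undoes the B/Q transfer encoding and reports each charset
    parts = email.header.decode_header(value)
    # make_header() applies the charsets; str() yields the unicode text
    return str(email.header.make_header(parts))

raw = ("=?GBK?B?1cK5scf4s8e53L7WudjT2s34wufT38fp0MXPoteo?= "
       "=?GBK?B?sai1xLTwuLQoz8K6vszBMbrFKS5kb2M=?=")
decoded = decode_header_text(raw)  # the full .doc filename as a str
```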
Speaking of Perl:
$ perl -E 'use open qw(:std :utf8); use Encode; say Encode::decode("MIME-Header", "=?GBK?B?1cK5scf4s8e53L7WudjT2s34wufT38fp0MXPoteo?= =?GBK?B?sai1xLTwuLQoz8K6vszBMbrFKS5kb2M=?=");'
章贡区城管局关于网络舆情信息专报的答复(下壕塘1号).doc
(Also, the body isn't uuencoded; it's Base64-encoded. The two use different alphabets, even though both are 3:4 encodings and uudecode is usually smart enough to detect raw Base64.)
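To make the difference concrete, here is a small sketch that encodes the same three bytes both ways using Python's base64 and binascii modules. The sample input b"Cat" is an arbitrary choice for illustration. Both encodings turn 3 bytes into 4 characters, but the characters come from different alphabets, and a uuencoded line also carries a leading length character.

```python
import base64
import binascii

data = b"Cat"  # three arbitrary input bytes

# Base64 draws its 4 output characters from A-Z, a-z, 0-9, "+" and "/"
b64 = base64.b64encode(data)

# uuencode maps each 6-bit group to a printable character by adding 32,
# and prefixes every line with a character encoding the line's byte count
uu = binascii.b2a_uu(data)

# Strip the trailing newline and the length prefix to compare payloads
uu_payload = uu.rstrip(b"\n")[1:]
```

Here `b64` and `uu_payload` are both 4 bytes long, yet share no characters for this input.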