Every now and then you've probably seen a text document or email that looks something like this:
This =3D an equal sign.
You may also see many lines that end with an = or =20.
That's "quoted-printable" encoding, as defined in RFC 2045. It's fairly easy to convert to ordinary text: an "=" at the end of a line means join that line to the following one, and any other "=" will be followed by 2 hex digits giving the value of the encoded byte, which for plain ASCII text is just the ASCII code of the character. So, =3D is an "=", =4E is an "N" and =20 is a space.
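To make those two rules concrete, here is a minimal Python sketch of a decoder. The function name decode_quoted_printable is made up for illustration, and it assumes each decoded byte can be treated as a single character (true for ASCII and Latin-1, not for multi-byte encodings such as UTF-8), so take it as a rough illustration rather than a complete RFC 2045 implementation.

```python
import re

def decode_quoted_printable(text: str) -> str:
    """Decode quoted-printable text (hypothetical helper, ASCII/Latin-1 only)."""
    # Rule 1: an "=" at the end of a line is a soft line break,
    # so drop it together with the newline to join the two lines.
    joined = re.sub(r"=[ \t]*\r?\n", "", text)

    # Rule 2: any other "=" is followed by two hex digits giving the byte value.
    def hex_to_char(match):
        return chr(int(match.group(1), 16))

    return re.sub(r"=([0-9A-Fa-f]{2})", hex_to_char, joined)

print(decode_quoted_printable("This =3D an equal sign."))
```

Running it on the sample line from the top of this article gives back "This = an equal sign."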
Aside from the legitimate uses (to encode characters that don't travel well and to send long lines through systems that don't like them), spammers use this and other encodings to disguise their work. If you are writing any sort of filtering to deal with spam, you need to take this sort of encoding (and more) into account.
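As a rough sketch of what "taking it into account" could look like, the Python below decodes a message body with the standard library's quopri module before looking for blocked phrases. The function name contains_blocked_phrase and the sample phrase list are invented for illustration, and the UTF-8 assumption is mine; a real filter would take the character set and transfer encoding from the message headers and would have to handle other encodings (base64, for one) as well.

```python
import quopri

def contains_blocked_phrase(raw_body: bytes, blocked_phrases: list[str]) -> bool:
    """Check a quoted-printable body for blocked phrases (hypothetical helper)."""
    # Undo the quoted-printable encoding first; otherwise "p=69lls" or a
    # word split by a soft line break would slip past a plain text match.
    decoded = quopri.decodestring(raw_body)
    # Assumption: the body is UTF-8; undecodable bytes are replaced, not fatal.
    text = decoded.decode("utf-8", errors="replace").lower()
    return any(phrase.lower() in text for phrase in blocked_phrases)

body = b"Buy che=\nap p=69lls now"
print(contains_blocked_phrase(body, ["cheap pills"]))
```

A plain substring search on the raw bytes would miss the phrase in this sample, because it is broken up by a soft line break and a hex escape.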
© 2009-11-07 Tony Lawrence