Let's look at a few 'tricks' with Unicode that can make a program look like it's doing something it isn't (or not doing something it is). Based on the findings in a recent publication, these are well worth being aware of, both from a security point of view and for simply being on your guard against friends who may be trying to pull a prank on you :-D.
These tricks are well suited to trojan attacks because they can be difficult to detect even in a manual code review, thanks to aspects of Unicode like bidirectional (bidi) control characters.
The talk is based on some of the possibilities described in the paper "Trojan Source: Invisible Vulnerabilities" by Nicholas Boucher and Ross Anderson of the University of Cambridge. The implications of this work with regard to Python have been outlined in PEP 672.
Examples of using/abusing Unicode include:
- Look-alike characters (homoglyphs) used to define two different functions whose names look identical, so that calls to one look like calls to the other (e.g., Cyrillic е and Latin e are too similar for us to distinguish easily).
- Bidi control characters used to make a part of the code appear to be present when it's actually part of a comment.
- The classic trick of naming files so that even an .exe file can look like a .pdf.
- Invisible characters used to make strings that look the same when they aren't.
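
As a rough illustration of how these tricks might be spotted, here is a minimal sketch using only Python's standard `unicodedata` module. The helper `suspicious_chars`, the `BIDI_CONTROLS` set, and the sample strings are hypothetical names made up for this example; they don't come from the paper or from PEP 672. The file name example assumes the "classic trick" is done with a right-to-left override character, which is one common way to do it. Flagging every non-ASCII character is deliberately crude and would be far too noisy for code that legitimately uses other scripts.

```python
import unicodedata

# Bidi control characters: embeddings, overrides, isolates and marks.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
    "\u200e", "\u200f", "\u061c",
}


def suspicious_chars(text):
    """Return (index, code point, Unicode name, reason) for each flagged character."""
    findings = []
    for index, char in enumerate(text):
        if char in BIDI_CONTROLS:
            reason = "bidi control"
        elif unicodedata.category(char) == "Cf":
            # Other format characters, such as ZERO WIDTH SPACE, are usually invisible.
            reason = "invisible format character"
        elif ord(char) > 127:
            # Crude heuristic: any non-ASCII character could be a homoglyph.
            reason = "non-ASCII"
        else:
            continue
        name = unicodedata.name(char, "<unnamed>")
        findings.append((index, f"U+{ord(char):04X}", name, reason))
    return findings


# Homoglyph: the third letter of the second name is Cyrillic, not Latin.
latin_name = "check"
cyrillic_name = "ch\u0435ck"
print(latin_name == cyrillic_name)  # False
print(suspicious_chars(cyrillic_name))

# Invisible character: a zero width space hidden inside the string.
visible = "admin"
hidden = "ad\u200bmin"
print(visible == hidden)  # False
print(suspicious_chars(hidden))

# File name: a right-to-left override makes the tail render reversed,
# so the name may be displayed as if it ended in ".pdf".
file_name = "invoice_\u202efdp.exe"
print(file_name.endswith(".exe"))  # True
print(suspicious_chars(file_name))
```

The strings are written with `\u` escapes on purpose: typing the actual characters would make the example itself as hard to review as the code it's trying to detect.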