Detecting RLO character in Python

First posted on: 2018/02/19

Last modified: 2019/06/17,1b90ad5

Categories: Python

At work, I learned about how Right-to-Left Override was being used to make actually malicious files to look harmless. For example, a .exe file was being made to appear as .doc files. We didn’t want to allow uploading such files. This meant that I nedded to detect the presence of the RLO character in the filename.

Then, I came across this post, where I learned about unicode bidirectional class and Python’s bidirectional() method.

The final solution for detection looked like this:

import unicodedata
..
filename = 'arbitrary_filename.doc'
if 'RLO' in [unicodedata.bidirectional(c) for c in unicode(filename)]:
    raise ValueError('Invalid character in one or more of the file names')
..