Does not handle badly encoded filenames transparently #2

@matthijskooijman

When a filename on disk is badly encoded, in the sense that its bytes are not valid in the character encoding Python is using for filenames, pylibacl breaks.

Since Python 3.1, there is a mechanism for supporting filenames like these, using surrogate code points (PEP 383): http://legacy.python.org/dev/peps/pep-0383/

With this mechanism, such filenames are decoded to normal Unicode strings, with "surrogate code points" standing in for the undecodable bytes. However, when such a filename is passed to pylibacl (through the file argument of the ACL constructor), an exception occurs:

    actual = posix1e.ACL(file=path)
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udce9' in position 99: surrogates not allowed
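
The failure can be reproduced without pylibacl. The following is a minimal sketch in plain Python; the filename `caf\xe9.txt` is a made-up example (a Latin-1 encoded name seen through a UTF-8 filesystem encoding), not the path from the traceback above:

```python
# Hypothetical on-disk name: 0xE9 is Latin-1 for 'é' and is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# PEP 383: each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
name = raw_name.decode("utf-8", errors="surrogateescape")

# Strict UTF-8 encoding refuses lone surrogates, which is the error in the report.
try:
    name.encode("utf-8")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc.reason

# Encoding with the same error handler restores the original bytes.
restored = name.encode("utf-8", errors="surrogateescape")
```

Here `name` contains `'\udce9'`, the same code point as in the error message, and only the `surrogateescape` handler turns it back into the byte that is actually on disk.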

Looking at the code, I suspect this error comes from the call to PyArg_ParseTupleAndKeywords, which is told to convert the argument into a string using UTF-8; that conversion then fails on the surrogate code points.

The docs for PyArg_ParseTupleAndKeywords suggest that this problem can be solved by using the O& format:

Note: This format does not accept bytes-like objects. If you want to accept filesystem paths and convert them to C character strings, it is preferable to use the O& format with PyUnicode_FSConverter() as converter.
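
As an illustration of what that conversion does, `os.fsencode()` is roughly the Python-level counterpart of `PyUnicode_FSConverter()`: it encodes `str` with the filesystem encoding and its error handler, and returns `bytes` unchanged. This is a sketch assuming a POSIX system, where that error handler is `surrogateescape`; the filename is again a made-up example:

```python
import os

# Hypothetical badly encoded name, as the kernel would report it.
raw_name = b"caf\xe9.txt"

# What os.listdir() and similar calls hand back for a str path.
name = os.fsdecode(raw_name)

# Encoding with the filesystem encoding and error handler recovers the bytes.
encoded = os.fsencode(name)

# bytes input is passed through as-is, so callers may supply either type.
passthrough = os.fsencode(raw_name)
```

Because the round trip is lossless, a C extension that receives the path through such a converter gets the exact bytes that are on disk, whatever the locale.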
