Path names, directory names, and file names may contain characters that make validation difficult and inaccurate. Furthermore, any path name component can be a symbolic link, which further obscures the actual location or identity of a file. To simplify file name validation, translate names into their canonical form first. Once names are canonical, a path, directory, or file name can be verified by direct comparison.

Because the canonical form can vary between operating systems and file systems, it is best to use operating-system-specific mechanisms for canonicalization.
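For instance, on a POSIX system two different spellings of the same file can be compared once both have been canonicalized. The following is a minimal sketch rather than a definitive implementation: the helper name same_canonical_name() is hypothetical, and the code assumes a realpath() that allocates its result when the second argument is a null pointer (the POSIX.1-2008 behavior).

```c
#define _XOPEN_SOURCE 700 /* request the POSIX.1-2008 declarations */
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if both names canonicalize to the same
 * string, 0 if they differ, and -1 if either name cannot be resolved.
 */
int same_canonical_name(const char *a, const char *b) {
  char *ca = realpath(a, NULL); /* allocated by realpath(), or NULL */
  char *cb = realpath(b, NULL);
  int result = -1;

  if (ca != NULL && cb != NULL) {
    result = (strcmp(ca, cb) == 0) ? 1 : 0;
  }
  free(ca); /* free(NULL) is a no-op */
  free(cb);
  return result;
}
```

Under these assumptions, a call such as same_canonical_name("/", "/./") is expected to return 1, because both spellings resolve to the root directory.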

To illustrate the validation problem, the following function checks whether a path name refers to a file directly inside the user's home directory on POSIX systems:

#include <pwd.h>
#include <unistd.h>
#include <string.h>

int verify_file(char *const filename) {
  /* Get /etc/passwd entry for current user */
  struct passwd *pwd = getpwuid(getuid());
  if (pwd == NULL) {
    /* Handle error */
    return 0;
  }

  const size_t len = strlen(pwd->pw_dir);
  if (strncmp(filename, pwd->pw_dir, len) != 0) {
    return 0;
  }
  /* Make sure there is only one '/', immediately after homedir */
  if (strrchr(filename, '/') == filename + len) {
    return 1;
  }
  return 0;
}

The verify_file() function requires that the file name be an absolute path name. Furthermore, it can be deceived if the file name being referenced is actually a symbolic link to a file that is not in the user's home directory.
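The purely textual nature of this check can be seen in isolation. The sketch below repeats the same comparison logic with the directory passed in as a parameter; the name is_direct_child() and the example paths are hypothetical and exist only for illustration. Because nothing is resolved against the file system, a name such as /home/alice/.. is accepted even though it refers to /home.

```c
#include <string.h>

/*
 * Hypothetical, string-only variant of the check in verify_file():
 * returns 1 if filename starts with dir and its last '/' is the one
 * immediately after dir, 0 otherwise. No file system lookup is made.
 */
int is_direct_child(const char *filename, const char *dir) {
  const size_t len = strlen(dir);

  if (strncmp(filename, dir, len) != 0) {
    return 0;
  }
  return strrchr(filename, '/') == filename + len;
}
```

With dir set to "/home/alice", the names "/home/alice/notes.txt" and "/home/alice/.." both pass, while "/home/alice/sub/notes.txt" and the relative name "notes.txt" are rejected. This is why the name should be canonicalized before a check of this kind is applied.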

Noncompliant Code Example

In this noncompliant example, argv[1] contains a file name that originates from a tainted source and is opened for writing. Before this file name is used in file operations, it should be validated to ensure that it refers to an expected and valid file. Unfortunately, the file name referenced by argv[1] may contain special characters, such as directory characters, that make validation difficult if not impossible. Furthermore, any path name component in argv[1] may be a symbolic link, resulting in the file name referring to an invalid file even though it passes validation.

If validation is not performed correctly, the call to fopen() may result in an unintended file being accessed.

/* Verify argv[1] is supplied */

if (!verify_file(argv[1])) {
  /* Handle error */
}

if (fopen(argv[1], "w") == NULL) {
  /* Handle error */
}

/* ... */

Compliant Solution (POSIX)

Canonicalizing file names is difficult and involves an understanding of the underlying file system.

The POSIX realpath() function can assist in converting path names to their canonical form. According to Standard for Information Technology—Portable Operating System Interface (POSIX®), Base Specifications, Issue 7 (IEEE Std 1003.1, 2013 Edition) [IEEE Std 1003.1:2013],

The realpath() function shall derive, from the pathname pointed to by file_name, an absolute pathname that names the same file, whose resolution does not involve '.', '..', or symbolic links.

Further verification, such as ensuring that two successive slashes or unexpected special files do not appear in the file name, must be performed. See Section 4.12, "Pathname Resolution," of IEEE Std 1003.1, 2013 Edition, for more details on how path name resolution is performed [IEEE Std 1003.1:2013].
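As a rough sketch of what such further verification might look like, the hypothetical helper below rejects names containing two successive slashes and names that do not refer to a regular file. The function name is an assumption made for this example, and the exact set of checks a program needs depends on its own requirements.

```c
#define _XOPEN_SOURCE 700 /* request the POSIX.1-2008 declarations */
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if canonical_path contains no "//"
 * and names an existing regular file, 0 otherwise. lstat() is used so
 * that a symbolic link is reported as a link, not as its target.
 */
int is_plain_regular_file(const char *canonical_path) {
  struct stat st;

  if (strstr(canonical_path, "//") != NULL) {
    return 0; /* two successive slashes */
  }
  if (lstat(canonical_path, &st) != 0) {
    return 0; /* cannot be examined */
  }
  return S_ISREG(st.st_mode) ? 1 : 0; /* rejects devices, FIFOs, etc. */
}
```

A check like this is still subject to the race conditions discussed later in this section, because the file can change between the call to lstat() and the moment it is opened.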

Many manual pages for the realpath() function come with an alarming warning, such as this one from the Linux Programmer's Manual [Linux 2008]:

Avoid using this function. It is broken by design since (unless using the non-standard resolved_path == NULL feature) it is impossible to determine a suitable size for the output buffer, resolved_path. According to POSIX a buffer of size PATH_MAX suffices, but PATH_MAX need not be a defined constant, and may have to be obtained using pathconf(3). And asking pathconf(3) does not really help, since on the one hand POSIX warns that the result of pathconf(3) may be huge and unsuitable for mallocing memory. And on the other hand pathconf(3) may return −1 to signify that PATH_MAX is not bounded.

The libc4 and libc5 implementation contains a buffer overflow (fixed in libc-5.4.13). As a result, set-user-ID programs like mount(8) need a private version.

The realpath() function was changed in POSIX.1-2008. Older versions of POSIX allow implementation-defined behavior in situations where the resolved_name is a null pointer. The current POSIX revision and many current implementations (led by glibc and Linux) allocate memory to hold the resolved name if a null pointer is used for this argument.

The following statement can be used to conditionally include code that depends on this revised form of the realpath() function:

#if _POSIX_VERSION >= 200809L || defined (linux)

Consequently, despite the alarming warnings, it is safe to call realpath() with resolved_name assigned the value NULL (on systems that support it), as shown in this compliant solution:

char *realpath_res = NULL;

/* Verify argv[1] is supplied */

realpath_res = realpath(argv[1], NULL);
if (realpath_res == NULL) {
  /* Handle error */
}

if (!verify_file(realpath_res)) {
  /* Handle error */
}

if (fopen(realpath_res, "w") == NULL) {
  /* Handle error */
}

/* ... */

free(realpath_res);
realpath_res = NULL;
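The conditional-inclusion statement and the null-pointer form of realpath() can be combined in a small wrapper. This is only a sketch: the name canonical_path() and the ENOSYS fallback are assumptions made for this example, not part of any standard interface.

```c
#define _XOPEN_SOURCE 700 /* request the POSIX.1-2008 declarations */
#include <errno.h>
#include <stdlib.h>
#include <unistd.h> /* defines _POSIX_VERSION */

/*
 * Hypothetical wrapper: returns a newly allocated canonical name that
 * the caller must free(), or NULL on failure with errno set.
 */
char *canonical_path(const char *path) {
#if _POSIX_VERSION >= 200809L || defined(linux)
  /* realpath() allocates the result when the second argument is NULL. */
  return realpath(path, NULL);
#else
  /* Assumed fallback: refuse instead of guessing a buffer size. */
  (void)path;
  errno = ENOSYS;
  return NULL;
#endif
}
```

Callers are expected to treat a NULL result as a failed validation and to free() any non-null result once they are done with it.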

It is also safe to call realpath() with a non-null resolved_path provided that PATH_MAX is defined as a constant in <limits.h>. In this case, the realpath() function expects resolved_path to refer to a character array that is large enough to hold the canonicalized path. If PATH_MAX is defined, allocate a buffer of size PATH_MAX to hold the result of realpath(), as shown in this compliant solution:

char *realpath_res = NULL;
char *canonical_filename = NULL;
size_t path_size = 0;

/* Verify argv[1] is supplied */

path_size = (size_t)PATH_MAX;

if (path_size > 0) {
  canonical_filename = malloc(path_size);

  if (canonical_filename == NULL) {
    /* Handle error */
  }

  realpath_res = realpath(argv[1], canonical_filename);
}

if (realpath_res == NULL) {
  /* Handle error */
}

if (!verify_file(realpath_res)) {
  /* Handle error */
}

if (fopen(realpath_res, "w") == NULL) {
  /* Handle error */
}

/* ... */

free(canonical_filename);
canonical_filename = NULL;

Care still must be taken to avoid creating a time-of-check, time-of-use (TOCTOU) condition by using realpath() to check a file name.
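One way to narrow that window, which is a different technique from the name-based validation shown above, is to open the file first and then examine the open descriptor instead of the name. The sketch below assumes POSIX.1-2008 and an existing file; the helper name and the particular checks (regular file, owned by the calling user) are hypothetical choices for illustration.

```c
#define _XOPEN_SOURCE 700 /* request the POSIX.1-2008 declarations */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: opens an existing file for writing and checks
 * the opened object itself. Returns a descriptor, or -1 on failure.
 */
int open_own_regular_file(const char *canonical_path) {
  struct stat st;
  /* O_NOFOLLOW: fail if the final component is a symbolic link.
     O_NONBLOCK: do not block if the name refers to a FIFO. */
  int fd = open(canonical_path, O_WRONLY | O_NOFOLLOW | O_NONBLOCK);

  if (fd == -1) {
    return -1;
  }
  /* fstat() describes the file that was actually opened, so the
     result cannot be invalidated by a later rename or relink. */
  if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) ||
      st.st_uid != getuid()) {
    close(fd);
    return -1;
  }
  return fd;
}
```

If a FILE pointer is needed, the descriptor can be passed to fdopen(). Note that O_NOFOLLOW only covers the final path component, so this complements canonicalization and does not replace it.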

Noncompliant Code Example (POSIX)

Calling the realpath() function with a non-null resolved_path when PATH_MAX is not defined as a constant is not safe. IEEE Std 1003.1, 2013 Edition [IEEE Std 1003.1:2013], effectively forbids such uses of realpath():

If resolved_name is not a null pointer and {PATH_MAX} is not defined as a constant in the <limits.h> header, the behavior is undefined.

The rationale from IEEE Std 1003.1, 2013 Edition, explains why this case is unsafe:

Since realpath( ) has no length argument, if {PATH_MAX} is not defined as a constant in <limits.h>, applications have no way of determining how large a buffer they need to allocate for it to be safe to pass to realpath( ). A {PATH_MAX} value obtained from a prior pathconf( ) call is out-of-date by the time realpath( ) is called. Hence the only reliable way to use realpath( ) when {PATH_MAX} is not defined in <limits.h> is to pass a null pointer for resolved_name so that realpath( ) will allocate a buffer of the necessary size.

PATH_MAX can vary among file systems (which is the reason for obtaining it with pathconf() and not sysconf()). A PATH_MAX value obtained from a prior pathconf() call can be invalidated, for example, if a directory in the path is replaced with a symlink to a different file system or if a new file system is mounted somewhere along the path.

char *realpath_res = NULL;
char *canonical_filename = NULL;
size_t path_size = 0;
long pc_result;

/* Verify argv[1] is supplied */

errno = 0;

/* Query for PATH_MAX */
pc_result = pathconf(argv[1], _PC_PATH_MAX);

if ( (pc_result == -1) && (errno != 0) ) {
  /* Handle error */
} else if (pc_result == -1) {
  /* Handle error */
} else if (pc_result <= 0) {
  /* Handle error */
}
path_size = (size_t)pc_result;

if (path_size > 0) {
  canonical_filename = malloc(path_size);

  if (canonical_filename == NULL) {
    /* Handle error */
  }

  realpath_res = realpath(argv[1], canonical_filename);
}

if (realpath_res == NULL) {
  /* Handle error */
}

if (!verify_file(realpath_res)) {
  /* Handle error */
}

if (fopen(realpath_res, "w") == NULL) {
  /* Handle error */
}

/* ... */

free(canonical_filename);
canonical_filename = NULL;

Implementation Details (Linux)

The libc4 and libc5 implementations of realpath() contain a buffer overflow (fixed in libc-5.4.13) [VU#743092]. Consequently, programs need a private version of this function in which this issue is known to be fixed.

Compliant Solution (glibc)

The realpath() function can be difficult to use and inefficient. Another solution, available as a GNU extension, is canonicalize_file_name(). This function has the same effect as realpath(), but the result is always returned in a newly allocated buffer [Drepper 2006].

/* Verify argv[1] is supplied */

char *canonical_filename = canonicalize_file_name(argv[1]);
if (canonical_filename == NULL) {
  /* Handle error */
}

/* Verify file name */

if (fopen(canonical_filename, "w") == NULL) {
  /* Handle error */
}

/* ... */

free(canonical_filename);
canonical_filename = NULL;

Because memory is allocated by canonicalize_file_name(), the programmer must remember to free the allocated memory.

Noncompliant Code Example (Windows)

This noncompliant code example uses the Windows function GetFullPathName() for canonicalization [MSDN]:

/* ... */

enum { INITBUFSIZE = 256 };
DWORD ret = 0;
DWORD new_ret = 0;
char *canonical_filename;
char *new_file;
char *file_name;

/* ... */

file_name = (char *)malloc(strlen(argv[1])+1);
canonical_filename = (char *)malloc(INITBUFSIZE);

if ( (file_name != NULL) && (canonical_filename != NULL) ) {
  strcpy(file_name, argv[1]);
  strcpy(canonical_filename, "");
} else {
  /* Handle error */
}

ret = GetFullPathName(
  file_name,
  INITBUFSIZE,
  canonical_filename,
  NULL
);

if (ret == 0) {
  /* Handle error */
}
else if (ret > INITBUFSIZE) {
  new_file = (char *)realloc(canonical_filename, ret);
  if (new_file == NULL) {
    /* Handle error */
  }

  canonical_filename = new_file;

  new_ret = GetFullPathName(
    file_name,
    ret,
    canonical_filename,
    NULL
  );
  if (new_ret > ret) {
    /*
     * The length of the path changed between calls
     * to GetFullPathName(); handle error.
     */
  }
  else if (new_ret == 0) {
    /* Handle error */
  }
}

if (!verify_file(canonical_filename)) {
  /* Handle error */
}
/* Verify file name before using */

The GetFullPathName() function can be used to eliminate the .. and /./ components from a path name, but numerous other canonicalization issues are not addressed by GetFullPathName(), including universal naming convention (UNC) shares, short (8.3) names, long names, Unicode names, trailing dots, forward slashes, backslashes, shortcuts, and so on.

Care also must be taken to avoid creating a TOCTOU condition by using GetFullPathName() to check a file name.

Compliant Solution (Windows)

Producing canonical file names for Windows operating systems is extremely complex and beyond the scope of this standard. The best advice is to try to avoid making decisions based on a path, directory, or file name [Howard 2002]. Alternatively, use operating-system-based mechanisms, such as access control lists (ACLs) or other authorization techniques.

Risk Assessment

File-related vulnerabilities can often be exploited to cause a program with elevated privileges to access an unintended file. Canonicalizing a file path makes it easier to identify the referenced file object.

| Recommendation | Severity | Likelihood | Remediation Cost | Priority | Level |
|---|---|---|---|---|---|
| FIO02-C | Medium | Probable | Medium | P8 | L2 |

Automated Detection

| Tool | Version | Checker | Description |
|---|---|---|---|
| CodeSonar | | IO.TAINT.FNAME; BADFUNC.PATH.* | Tainted Filename; a collection of checks that report uses of library functions that require securely-specified path parameters. |
| Compass/ROSE | | | Could catch violations of this rule by enforcing that any call to open() or fopen() is preceded by a canonicalization routine, that is, a call to realpath() or canonicalize_file_name(). This call will catch some false positives, as ROSE cannot tell when canonicalization is warranted. False positives can be reduced (but not eliminated) by only reporting instances of fopen() or open() where the file name string has some other processing done to it. This reflects the fact that canonicalization is only necessary for doing verification based on the file name string. |
| Klocwork | | SV.DLLPRELOAD.NONABSOLUTE.DLL; SV.TOCTOU.FILE_ACCESS | |
| LDRA tool suite | | 85 D | Partially implemented |
| Polyspace Bug Finder | | CERT C: Rec. FIO02-C | Checks for vulnerable path manipulation (rule fully covered) |

Related Vulnerabilities

CVE-2009-1760 results from a violation of this recommendation. Until version 0.4.13, libtorrent attempts to rule out unsafe file paths by checking only against the ".." string. An attacker can exploit this to access any file on the system by using more complex relative paths [xorl 2009].
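The weakness of matching against a single string can be shown with two hypothetical checks. Neither is libtorrent's actual code; the function names and the example paths are assumptions for illustration only. The first check rejects only the exact string "..", while the second scans every path component.

```c
#include <stddef.h>
#include <string.h>

/* Naive check: treats a name as safe unless it is exactly "..". */
int naive_is_safe(const char *path) {
  return strcmp(path, "..") != 0;
}

/* Component-wise check: returns 1 if any '/'-separated component of
   path is exactly "..", 0 otherwise. */
int has_dotdot_component(const char *path) {
  const char *p = path;

  while (*p != '\0') {
    while (*p == '/') {
      p++; /* skip any run of separators */
    }
    const char *start = p;
    while (*p != '\0' && *p != '/') {
      p++; /* advance to the end of this component */
    }
    if ((size_t)(p - start) == 2 && start[0] == '.' && start[1] == '.') {
      return 1;
    }
  }
  return 0;
}
```

The naive check accepts "../../etc/passwd", whereas the component-wise check flags it. Even the second check remains purely lexical and cannot detect symbolic links, which is why this recommendation calls for canonicalization before validation.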

CVE-2014-9390 results from a violation of this recommendation. When git is used on a case-insensitive file system (e.g., NTFS under Windows, HFS+ under Mac OS X), a file named ".Git/config" in the repository would overwrite the user's local ".git/config" file. This config file can define external commands (e.g., a custom diff utility), and it can lead to arbitrary code execution. The commit that fixes this vulnerability is https://github.com/git/git/commit/77933f4449b8d6aa7529d627f3c7b55336f491db. The release notes briefly discuss other canonicalization issues, in addition to case-insensitivity, under Windows and Mac OS X.
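The case-sensitivity aspect of this problem can be sketched with two hypothetical comparisons. They are not taken from git; the function names are assumptions, and strcasecmp() only folds case according to the current locale, so it does not address the other equivalences mentioned in the release notes.

```c
#define _XOPEN_SOURCE 700 /* request the POSIX.1-2008 declarations */
#include <string.h>
#include <strings.h> /* strcasecmp() */

/* Exact comparison: misses ".Git", ".GIT", and similar spellings. */
int is_dot_git_exact(const char *component) {
  return strcmp(component, ".git") == 0;
}

/* Case-folded comparison: matches any upper/lower-case spelling, as a
   case-insensitive file system would. */
int is_dot_git_folded(const char *component) {
  return strcasecmp(component, ".git") == 0;
}
```

On a case-insensitive file system, ".Git" and ".git" name the same directory, so a filter based on is_dot_git_exact() would let ".Git/config" through, while is_dot_git_folded() would catch it.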

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

Related Guidelines

SEI CERT C++ Coding Standard: VOID FIO02-CPP. Canonicalize path names originating from untrusted sources
CERT Oracle Secure Coding Standard for Java: FIO16-J. Canonicalize path names before validating them
ISO/IEC TR 24772:2013: Path Traversal [EWR]
MITRE CWE:

CWE-22, Path traversal
CWE-23, Relative Path Traversal
CWE-28, Path Traversal: '..\filedir'
CWE-40, Path Traversal: '\\UNC\share\name\' (Windows UNC Share)
CWE-41, Failure to resolve path equivalence
CWE-59, Failure to resolve links before file access (aka "link following")
CWE-73, External control of file name or path

Bibliography

[Drepper 2006] Section 2.1.2, "Implicit Memory Allocation"
[Howard 2002] Chapter 11, "Canonical Representation Issues"
[Linux 2008] realpath(3); pathconf(3)
[MSDN] "GetFullPathName Function"
[IEEE Std 1003.1:2013] Section 4.12, "Pathname Resolution"; System Interfaces: realpath
[Seacord 2013] Chapter 8, "File I/O"
[VU#743092]
[xorl 2009] CVE-2009-1760: libtorrent Arbitrary File Overwrite