PHP input data filtering in a nutshell

An application must filter every form of input data, whether it is provided by the user or obtained from the environment, database systems, web services or any other external system.

All filtering must be done (or repeated) on the server side. Client-side filtering can NEVER be trusted, as a malicious user can easily bypass it. Do not assume that a form’s hidden fields, checkboxes, radio buttons or select boxes are safe.
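For example, a select box only offers certain values in the browser, but the request itself can carry any string (or even an array). A minimal server-side re-check might look like the sketch below; the field name `color`, the `ALLOWED_COLORS` list and the `validate_color()` helper are hypothetical.

```php
<?php
// Hypothetical options offered by <select name="color"> in the form.
const ALLOWED_COLORS = ['red', 'green', 'blue'];

/**
 * Re-check a submitted select box value on the server.
 * Returns the value if it is one of the offered options, null otherwise.
 */
function validate_color($value): ?string
{
    // The request may contain an array or any string, not just the offered options.
    if (!is_string($value)) {
        return null;
    }
    // Strict comparison (third argument) avoids loose type juggling.
    return in_array($value, ALLOWED_COLORS, true) ? $value : null;
}

// Usage: $color = validate_color($_POST['color'] ?? null);
```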

All input data must be filtered before use. It must be shown to match the expected form and to meet the actual business rules: it must not only have the correct type, size (including length and range) and syntax, but also a proper meaning (e.g. a credit card number must be a valid number, not merely a string of digits of the right length).
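As an illustration of checking meaning as well as syntax, the sketch below first checks that a card number looks right (digits only, with an assumed length of 12 to 19) and then verifies the Luhn (mod 10) checksum that payment card numbers carry. The function name `luhn_valid()` is hypothetical, and a passing checksum still says nothing about whether the card exists.

```php
<?php
/**
 * Syntax check plus Luhn (mod 10) checksum for a card number.
 * Assumes the number has already had spaces/dashes removed.
 */
function luhn_valid(string $number): bool
{
    // Syntax: ASCII digits only, assumed length range 12-19.
    if (!preg_match('/^\d{12,19}\z/', $number)) {
        return false;
    }
    $sum = 0;
    $double = false;
    // Walk the digits from right to left, doubling every second one.
    for ($i = strlen($number) - 1; $i >= 0; $i--) {
        $digit = (int)$number[$i];
        if ($double) {
            $digit *= 2;
            if ($digit > 9) {
                $digit -= 9; // same as adding the two digits of the product
            }
        }
        $sum += $digit;
        $double = !$double;
    }
    // Valid numbers have a digit sum divisible by 10.
    return $sum % 10 === 0;
}
```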

Filtering strategies

Filtering approaches

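The safer approach is always a whitelist: accept only input that matches what you explicitly allow, instead of trying to reject what you know to be bad (a blacklist).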
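The sketch below shows why. A blacklist that strips a known-bad token can be bypassed by nesting the token inside itself, while a whitelist simply rejects anything outside the allowed character set. Both helpers, `blacklist_filter()` and `whitelist_ok()`, are hypothetical, and the allowed set (letters, digits, spaces) is only an example.

```php
<?php
// Blacklist: remove a known-bad token in a single pass.
function blacklist_filter(string $s): string
{
    return str_ireplace('<script>', '', $s);
}

// Whitelist: accept only letters, digits and spaces; reject everything else.
// \z (not $) so that a trailing newline is not accepted.
function whitelist_ok(string $s): bool
{
    return preg_match('/^[A-Za-z0-9 ]*\z/', $s) === 1;
}

$attack = '<scr<script>ipt>alert(1)</script>';

// Removing the inner "<script>" reassembles the outer one.
$afterBlacklist = blacklist_filter($attack); // '<script>alert(1)</script>'

// The whitelist rejects the input outright.
$accepted = whitelist_ok($attack);           // false
```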

Input data sources

Identify untrusted input sources

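$_REQUEST should never be used to fetch input data. It introduces problems similar to register_globals: it combines the GET and POST arrays, as well as cookies (depending on the request_order and variables_order settings), into a single array, so you cannot tell which source a value came from.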
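To see the problem, consider a rough model of how $_REQUEST is built when request_order is "GPC": later sources overwrite earlier ones that use the same key. Here plain arrays stand in for $_GET and $_COOKIE, and array_merge() only approximates PHP's internal merge (it renumbers integer keys, for instance).

```php
<?php
// Stand-ins for $_GET and $_COOKIE in a real request.
$get    = ['id' => '42'];
$cookie = ['id' => '7']; // a cookie with the same name, set by the client

// Approximation of $_REQUEST with request_order = "GPC":
// the cookie value overwrites the query string value.
$request = array_merge($get, $cookie);

// Code reading $_REQUEST['id'] silently gets the cookie value...
$fromRequest = $request['id'];

// ...while reading the array you actually expect keeps the source explicit.
$fromGet = $get['id'] ?? null;
```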

Numeric data

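Cast the value to the desired numeric type, e.g. (int)$_GET['id'] or (float)$_GET['price']. A cast forces the string to be converted to that type, but it never fails: malformed input silently becomes a number (e.g. "12abc" becomes 12 and "abc" becomes 0). Casting also does not handle hexadecimal ("0x1A" becomes 0) or octal ("012" becomes 12) notation, and before PHP 7.1 an integer cast ignored scientific notation ("1e3" became 1). is_numeric() is slower but more flexible.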
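The difference between converting and validating can be sketched with literal strings standing in for request values. filter_var() with FILTER_VALIDATE_INT is used here as a validating alternative to casting; the behaviour shown for is_numeric() assumes PHP 7 or later.

```php
<?php
// Casting never fails: malformed input silently becomes some number.
$castId  = (int)'12abc'; // 12, the trailing "abc" is dropped
$castBad = (int)'abc';   // 0

// filter_var() validates instead, returning false for a non-integer string.
$validId  = filter_var('42', FILTER_VALIDATE_INT);    // 42
$rejected = filter_var('12abc', FILTER_VALIDATE_INT); // false

// Hexadecimal input is accepted only when explicitly allowed.
$hex = filter_var('0x1A', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX); // 26

// is_numeric() accepts scientific notation but not hex strings (PHP 7+).
$sci   = is_numeric('1e3');  // true
$noHex = is_numeric('0x1A'); // false
```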

Range issues:
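A value can be a well-formed integer and still be out of range for the business rule, or even too large for PHP's integer type. A minimal sketch, assuming a hypothetical rule that a quantity must be between 1 and 100, lets filter_var() enforce both the syntax and the range in one call.

```php
<?php
/**
 * Hypothetical rule: a quantity must be an integer from 1 to 100.
 * Returns the quantity, or null if the input is invalid or out of range.
 */
function validate_quantity(string $raw): ?int
{
    $qty = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 100],
    ]);
    // filter_var() returns false for non-integers, overflow and out-of-range values.
    return $qty === false ? null : $qty;
}

// Usage: $qty = validate_quantity((string)($_POST['qty'] ?? ''));
```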

Strings

Filtering strings is usually more complex and involves a number of different techniques and functions, depending on what kind of data the variable is supposed to represent (URI, phone number, credit card number, username, date, time, etc.).
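Two common cases are sketched below. The username rule (3 to 16 lowercase letters, digits or underscores) is a hypothetical example, as are the helper names. Dates are checked with DateTime::createFromFormat() followed by a round trip, because out-of-range values such as 2010-02-30 are rolled over rather than rejected.

```php
<?php
// Hypothetical rule: 3-16 characters from a-z, 0-9 and underscore.
function valid_username(string $s): bool
{
    // \z (not $) so that a trailing newline is not accepted.
    return preg_match('/^[a-z0-9_]{3,16}\z/', $s) === 1;
}

// A date in Y-m-d format that must also exist in the calendar.
function valid_date(string $s): bool
{
    // "!" resets fields not in the format; false means the syntax did not match.
    $d = DateTime::createFromFormat('!Y-m-d', $s);
    // 2010-02-30 would become 2010-03-02, so compare the formatted result.
    return $d !== false && $d->format('Y-m-d') === $s;
}
```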

File uploads

Turn off file uploads if you don’t use them (file_uploads = off).

Limit the maximum size of an uploaded file using the upload_max_filesize directive. Be aware that post_max_size and memory_limit may also affect this setting.

Change the directory where uploads are temporarily stored (upload_tmp_dir = "path/to/safe/dir"), as by default PHP uses the system's temporary directory, which is typically world-readable. Any user on a shared host, or simply the Apache process, usually has permission to read and write the files in this directory. Note that with Suhosin, cookie and session data can be encrypted.

$_FILES contains data for each uploaded file. It should be handled with care: in particular, the name and type entries are supplied by the client and must not be trusted.

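Before filtering anything, use is_uploaded_file() to check whether the file was actually uploaded via HTTP POST, so that a tampered tmp_name cannot trick the script into working on a file such as /etc/passwd.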

All uploaded files should be moved to the destination directory only with move_uploaded_file(). It may be necessary to call it first, before any other operation on the file, to avoid open_basedir and safe_mode restrictions (safe_mode was removed in PHP 5.4).
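The upload rules above can be combined into one handler. This is a sketch under several assumptions: PHP 7 or later (for random_bytes()), the fileinfo extension being available, and a hypothetical whitelist of PNG/JPEG files up to 1 MB. The file type is detected from the content, and the client-supplied name is never reused.

```php
<?php
/**
 * Sketch of handling one $_FILES entry.
 * Returns the stored path, or null if the upload is rejected.
 */
function handle_upload(array $file, string $destDir, int $maxBytes = 1048576): ?string
{
    // PHP reports transfer problems (size limits, partial uploads, ...) in 'error'.
    if (!isset($file['error'], $file['tmp_name'], $file['size'])
        || $file['error'] !== UPLOAD_ERR_OK) {
        return null;
    }
    // Make sure tmp_name really is a file uploaded via HTTP POST.
    if (!is_uploaded_file($file['tmp_name'])) {
        return null;
    }
    if ($file['size'] > $maxBytes) {
        return null;
    }
    // Ignore $file['type'] (sent by the client); inspect the content instead.
    $allowed = ['image/png' => 'png', 'image/jpeg' => 'jpg'];
    $mime = (new finfo(FILEINFO_MIME_TYPE))->file($file['tmp_name']);
    if ($mime === false || !isset($allowed[$mime])) {
        return null;
    }
    // Generate our own file name instead of using $file['name'].
    $target = rtrim($destDir, '/') . '/' . bin2hex(random_bytes(16)) . '.' . $allowed[$mime];
    return move_uploaded_file($file['tmp_name'], $target) ? $target : null;
}

// Usage: $path = handle_upload($_FILES['avatar'] ?? [], '/path/to/uploads');
```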

Serialized data

Use a checksum (generated by hash_hmac()) to validate serialized data, and verify it before unserializing. The secret key used for hashing should be random and at least 10 characters long.
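A minimal sketch of this check, assuming PHP 7 or later: the HMAC-SHA256 is prepended as 64 hex characters, compared with hash_equals() (a constant-time comparison) before anything is unserialized, and unserialize() is additionally told not to instantiate objects. The helper names and the payload layout are hypothetical; in practice the key would come from configuration rather than code.

```php
<?php
// Prepend an HMAC-SHA256 (64 hex characters) to the serialized payload.
function sign_serialized($data, string $key): string
{
    $payload = serialize($data);
    return hash_hmac('sha256', $payload, $key) . $payload;
}

// Returns the unserialized data, or null if the checksum does not match.
function verify_and_unserialize(string $signed, string $key)
{
    if (strlen($signed) < 64) {
        return null;
    }
    $mac = substr($signed, 0, 64);
    $payload = (string)substr($signed, 64);
    // Verify BEFORE unserializing; hash_equals() avoids timing leaks.
    if (!hash_equals(hash_hmac('sha256', $payload, $key), $mac)) {
        return null;
    }
    // Defence in depth: never create objects from the payload.
    return unserialize($payload, ['allowed_classes' => false]);
}
```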

Output

Output is anything the application sends out: HTML, JavaScript, JSON, database queries, XML, feeds, shell commands, etc.

Escape all data before sending it out, in the way appropriate for its destination; for HTML, this means encoding special characters as the corresponding HTML entities.
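Each output context needs its own escaping. The sketch below covers three of the contexts listed above: HTML (htmlspecialchars() with both quote types and an explicit charset), data embedded in a script block (json_encode() with the hex-escaping flags, PHP 7.3+ for JSON_THROW_ON_ERROR) and shell arguments (escapeshellarg(), shown with Unix quoting). The helper names are hypothetical.

```php
<?php
// HTML text and quoted attribute values: encode <, >, &, " and '.
function escape_html(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Values embedded in a <script> block: JSON with <, >, &, ' and " hex-escaped.
function js_value($value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Shell commands: quote each argument separately.
$cmd = 'ls -l ' . escapeshellarg("it's");
```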

Integrity checks

Integrity checks ensure that the data has not been tampered with or corrupted and is the same as before [OWASP05]. Depending on the required security level, integrity controls include checksums, HMACs, encryption and digital signatures. The preferred integrity control should be at least an HMAC using SHA-256, or preferably a digital signature or encryption using PGP [OWASP05].

References / Further Reading

by sobstel • May 2010