Cleaning User Input

Suppose your web site implements a "whois" CGI page. To implement the program, you take a domain name from a text field on a form and run this command in your code:

print "Content-Type: text/plain\n\n";
system ("/usr/bin/whois $domain");

This looks like a nice, easy program. But what if a user typed in a domain name of:

;mail < /etc/passwd

Instead of just running a whois lookup, your CGI program would run "whois", followed by a command to mail the /etc/passwd file to the attacker. This file could then be run through a brute-force or dictionary password-guessing program to find weak passwords. Any time user input is passed to a subshell in a Perl program, that input needs to be scrubbed clean to prevent this sort of problem. In general, removing the following shell metacharacters closes this type of hole:

!$&;*?()'`<>[]{}|~

Your program can check user input for these characters and then either remove them, replace them with something harmless, or abort with an error.

# disallow the following chars:
# !$&;*?()'`<>[]{}|~
# when dealing with paths, "." can be dangerous too
# ("$" is escaped so that Perl does not interpolate $& into the pattern)
if ( $parm =~ m/[!\$&;*?()'`<>\[\]{}|\n\r~]/ ) {
       &return_error (403, "Bad Input", "Invalid input from user");
}
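The three choices described above (remove, replace, or abort) could be sketched as follows. The sub names strip_meta, replace_meta, and has_meta are made up for illustration, and the character class is an assumption based on the list above plus newline and carriage return.

```perl
use strict;
use warnings;

# Shell metacharacters from the list above, plus newline and carriage return.
# The "$" is escaped so Perl does not interpolate $& into the pattern.
my $META = qr/[!\$&;*?()'`<>\[\]{}|~\n\r]/;

# Option 1: remove the characters entirely.
sub strip_meta {
    my ($input) = @_;
    (my $clean = $input) =~ s/$META//g;
    return $clean;
}

# Option 2: replace each one with something harmless (an underscore here).
sub replace_meta {
    my ($input) = @_;
    (my $clean = $input) =~ s/$META/_/g;
    return $clean;
}

# Option 3: detect them so the caller can abort with an error.
sub has_meta {
    my ($input) = @_;
    return $input =~ $META ? 1 : 0;
}
```

Aborting is usually the easiest of the three to reason about, since the program never acts on input it had to alter.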

One drawback of this match is that you have to remember every character that is not considered safe. An alternative is to reject any input that contains characters other than word characters and a few other known-safe characters. This way you are less likely to forget a metacharacter or two.

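# reject anything other than word characters, "-", "@", and "."
# (\z, unlike $, does not allow a trailing newline)
if ($addr !~ /^[-\@\w.]+\z/) {
   &return_error (403, "Bad Input", "Invalid input from user");
}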
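Applied to the whois example, an allow-list check can be combined with the list form of system, which hands its arguments straight to the program instead of to a shell. This is only a sketch: is_safe_domain and whois_command are hypothetical helper names, and the pattern is an assumption about which characters a domain name may contain.

```perl
use strict;
use warnings;

# Accept only letters, digits, hyphens, and dots. A leading letter or digit
# is required so the value cannot be read as a command-line option, and \z
# rejects a trailing newline.
sub is_safe_domain {
    my ($domain) = @_;
    return 0 unless defined $domain;
    return $domain =~ /\A[A-Za-z0-9][A-Za-z0-9.-]*\z/ ? 1 : 0;
}

# Build the argument list for system(); returns an empty list on bad input.
sub whois_command {
    my ($domain) = @_;
    return is_safe_domain($domain) ? ('/usr/bin/whois', $domain) : ();
}

# Usage: with more than one argument, system() does not invoke the shell.
# my @cmd = whois_command($domain);
# system(@cmd) if @cmd;
```

Even if a metacharacter slipped past the check, the list form would pass it to whois as literal text, so the two measures back each other up.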

While this method of cleaning up user data can be good, it does have drawbacks. Sometimes the data you want from the user legitimately needs to contain these characters. In addition, if you are accepting something from the user that gets turned into a file name or path name, accepting periods (.) can be dangerous. For example, if you had a CGI that let a user fetch files by path name, you would not want a user to specify "../../../etc/passwd" instead of "phonelist.html".
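For the file-fetching case, one approach is to accept only a bare file name and refuse anything that could climb out of the intended directory. The sketch below assumes all files live in a single directory; safe_filename is a hypothetical helper, and the set of allowed characters is an assumption you would adjust for your own site.

```perl
use strict;
use warnings;

# Return the name if it is a plain file name, otherwise undef.
sub safe_filename {
    my ($name) = @_;
    return undef unless defined $name;
    # Only word characters, dots, and hyphens: no slashes, so no directories.
    return undef unless $name =~ /\A([\w.-]+)\z/;
    my $clean = $1;
    # Reject names that start with a dot or contain "..".
    return undef if $clean =~ /\A\./ || $clean =~ /\.\./;
    return $clean;
}
```

With this check, "phonelist.html" is returned unchanged, while "../../../etc/passwd" yields undef because it contains slashes.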

What you really need to do is use the standard meta characters as a starting point, and evaluate each use of user input to see what your particular concerns are.


One feature of Perl that can help with this process is the "taint" feature. It automatically tracks which data has come from a source outside your program, and makes sure you clean that data up before using it.

Taint checking is turned on with the "-T" switch on the #! line at the beginning of your script, for example:

#!/usr/bin/perl -T

When Perl runs with taint checking, data that arrives from outside the program (command-line arguments, environment variables, file and standard input, and the output of backtick commands) is marked as tainted, and so is any value derived from it.

If Perl detects that you are trying to use a piece of data in an unsafe operation without removing its taint, the script terminates with an "Insecure dependency" error before the operation runs. Here are some examples of how tainting works:

$arg = shift;               # $arg is tainted
$hid = "$arg, 'bar'";       # $hid is also tainted
$line = <>;                 # Tainted
$path = $ENV{PATH};         # Tainted, but see below
$mine = 'abc';              # Not tainted
$shout = `echo abc`;        # Tainted
$shout = `echo $shout`;     # Insecure

system "echo $arg";         # Insecure (uses sh)
system "/bin/echo", $arg;   # OK (doesn't use sh)
system "echo $mine";        # Insecure until PATH set
system "echo $hid";         # Insecure two ways

$path = $ENV{PATH};         # $path tainted

$ENV{PATH} = '/bin:/usr/bin'; 
$ENV{IFS} = "" if $ENV{IFS} ne "";

$path = $ENV{PATH};         # $path now NOT tainted
system "echo $mine";        # OK, is secure now!
system "echo $hid";         # Insecure via $hid still

open(OOF, "< $arg");        # OK (read-only file)
open(OOF, "> $arg");        # Insecure (trying to write)

open(OOF, "echo $arg|");    # Insecure via $arg, but...
open(OOF, "-|")
    or exec 'echo', $arg;   # Considered OK

$shout = `echo $arg`;       # Insecure via $arg

unlink $mine, $arg;         # Insecure via $arg
umask $arg;                 # Insecure via $arg

exec "echo $arg";           # Single arg to exec or system is insecure
exec "echo", $arg;          # Considered OK (doesn't use the shell)
exec "sh", '-c', $arg;      # Considered OK, but isn't really
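While experimenting with these rules, it can help to ask Perl directly whether a value is tainted. The sketch below uses tainted() from the core Scalar::Util module and the ${^TAINT} variable; taint_status is a hypothetical helper name.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Report whether taint mode is on and, if so, whether a value is tainted.
sub taint_status {
    my ($value) = @_;
    return 'taint mode off' unless ${^TAINT};
    return tainted($value) ? 'tainted' : 'clean';
}

# Usage (run the script with "perl -T" to see the difference):
# print taint_status($ARGV[0]), "\n";   # a command-line argument
# print taint_status('abc'), "\n";      # a literal in the program
```

Without -T the sub reports that taint mode is off for every value, which is a quick way to confirm that the switch actually took effect.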

To clean the data under taint checking, you need to assign the variable a value from a clean source, such as a substring captured by a regular expression match. You could modify your cleaning statements to look like this:

if ($addr =~ /^([-\@\w.]+)$/) {
    $addr = $1;                     # $addr now untainted
} else {
    &return_error (403, "Bad Input", "Invalid input from user");
}
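Wrapped in a reusable sub, the same pattern might look like the sketch below. untaint_addr is a hypothetical name; it returns the captured value, which Perl treats as untainted, or undef when the input contains anything outside the allowed set.

```perl
use strict;
use warnings;

# Return an untainted copy of $addr, or undef if it does not match.
# Only the captured $1 is clean; the caller's original value stays tainted.
sub untaint_addr {
    my ($addr) = @_;
    return undef unless defined $addr;
    return $addr =~ /\A([-\@\w.]+)\z/ ? $1 : undef;
}

# Usage:
# my $clean = untaint_addr($addr);
# defined $clean or die "Invalid input from user";
```

Returning undef leaves it to the caller to decide how to report the error, for instance with an error routine like the one used above.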


Copyright 2001 - Andy Welter. Taint examples taken from Programming Perl, by Wall, Christiansen, and Schwartz (O'Reilly and Assoc.).