
UNIX shell script reference


Back in the man pages the next section is called USAGE and goes on to talk about pipelines and lists. Most of what it says here can be understood by any UNIX user so I will skip this for now, but there will be some examples later showing various implementations of these definitions. What I want to deal with next are the simple, complex and special commands. This is nowhere near as bad as it sounds.

Simple Commands

Simple commands are just straight UNIX commands that exist regardless of the surrounding shell environment. Like our old favourites ls -l or df -al or lpr -Pprinter filename. There are large numbers of commands that fall into this category but the following list is a selection of the more useful when scripting.

  • sort Sorts lines in ascending, descending and unique order
  • grep Searches for regular expressions in strings or files
  • basename Strips the path from a path string to leave just the filename
  • dirname Removes the file from a path string to leave just the pathname
  • cut Chops up a text string by characters or fields
  • wc Count the characters, words, or lines
  • [ (test) ] Predicate or conditional processor
  • tr 'a' 'b' Transform characters
  • expr Simple arithmetic processor
  • bc Basic Calculator
  • eval Evaluate variables
  • echo Output strings
  • date Create date strings
  • nawk Manipulate text strings
  • head | tail Access lines in files

Some of the above commands can be very complex indeed, especially when assembled into pipelines and lists. However, these are still referred to as simple commands - presumably because they stand alone. Take a close look at the man pages for all of the above commands; you will find them invaluable during your scripting sojourn.
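To give a feel for how a few of these fit together, here is a small sketch. The path is a made-up example and is only ever treated as a string, so no such file needs to exist on your system.

```shell
# Hypothetical path, used purely as a string; the file need not exist.
path="/var/log/app/server.log"

file=`basename "$path"`                   # Strip the directory part
dir=`dirname "$path"`                     # Strip the file part
ext=`echo "$file" | cut -d. -f2`          # Second dot-separated field
upper=`echo "$file" | tr 'a-z' 'A-Z'`     # Upper-case the filename
words=`echo "one two three" | wc -w`      # Count words (may be blank padded)
words=`expr $words + 0`                   # Arithmetic also strips the padding

echo "dir=[$dir] file=[$file] ext=[$ext] upper=[$upper] words=[$words]"
```

Each command does one small job; the back-quotes capture its output so the result can be stored in a variable and used later.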

Complex Commands

Complex commands are just the shell's internal commands which are used to group simple commands into controlled sets based on your requirements. These include the loop constructs and conditional test structures. These cannot stand alone. An if requires a then and a fi at the very least. Let's take a look at the man pages again at this point.

The for structure:

My system's man page gives for name [ in word ... ] do list done as the syntax description of the for command construct. Well, it is correct but does not really show the layout of the command at all. Look at the example below and you can see straight away what is supposed to happen.

Example for syntax

alphabet="a b c d e"                      # Initialise a string
count=0                                   # Initialise a counter
for letter in $alphabet                   # Set up a loop control
do                                        # Begin the loop
    count=`expr $count + 1`               # Increment the counter
    echo "Letter $count is [$letter]"     # Display the result
done                                      # End of loop

So in plain English, for each letter found in alphabet loop between do and done and process the list of commands found. Let's take this one line at a time from the top. The first two lines show the way sh likes to have its variables set. There is no leading word as in the csh (set); just start with the variable name. There are also no blanks either side of the equal sign. Indeed, if you put a blank in, the shell will give you an error message for your trouble. This also gives rise to the difference between the top two lines in this example. Because I want to include spaces in my string for alphabet, I must enclose the whole string in double quotes. On the next line this is not required as there are no embedded blanks in the value of count. When setting variables, no blanks are allowed. Everywhere else, sh loves blanks.

In line 3 the for statement creates a loop construct by selecting the next letter from alphabet each time through the loop and executing the list found between the do and the done for each letter. This process also strips away any blanks (before and after) each letter found in alphabet. The do and done statements are not executed as such; they simply mark the beginning and end of the loop list. They are however a matched pair: leave one out and the shell will complain.

Inside the loop are two simple commands (apparently!). The first one just increments the loop counter by adding one to its current value. Note the use of the back-quote here to force the execution of the expr command before setting the new value of count. There will be more about this later.

The next line is something we have seen before, just a display command showing the values of the variables. Note the use of the $ symbol to request the value of the variables.
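The blank-stripping behaviour of the for loop is easy to check for yourself. The sketch below is my own variation on the example, with extra blanks deliberately added to alphabet and a hypothetical result variable that collects what the loop sees.

```shell
alphabet="  a    b   c  "                 # Extra blanks before, between and after
count=0                                   # Initialise a counter
result=""                                 # Collects each letter in brackets
for letter in $alphabet                   # Unquoted, so the shell splits on blanks
do
    count=`expr $count + 1`               # Increment the counter
    result="${result}[$letter]"           # Append the letter exactly as received
done
echo "Found $count letters: $result"      # Found 3 letters: [a][b][c]
```

Had the loop been written as for letter in "$alphabet", the quotes would have kept the string as one single word and the loop would have run only once.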

The while structure:

There is another similarly structured command in the sh called while. Its syntax structure is listed as while list do list done which you should now be able to translate yourself into something that looks like the example below.

Example while syntax

alphabet="a b c d e"                                          # Initialise a string
count=0                                                       # Initialise a counter
while [ $count -lt 5 ]                                        # Set up a loop control
do                                                            # Begin the loop
    count=`expr $count + 1`                                   # Increment the counter
    position=`echo "$count + $count - 1" | bc`                # Position of next letter
    letter=`echo "$alphabet" | cut -c$position-$position`     # Get next letter
    echo "Letter $count is [$letter]"                         # Display the result
done                                                          # End of loop

Most of this is the same construct; I have just replaced the for loop set-up with its equivalent while syntax. Instead of stepping through the letters in alphabet, the loop control now monitors the size of count with [ $count -lt 5 ]. The -lt flag here represents less-than and is part of the UNIX test command, which is implied by the square brackets. Any other command or list could be put here, because while looks only at the exit status of the last command in that list. A zero exit status (success) keeps the loop running; anything else exits the loop. From this you can work out that test returns 0 for true and non-zero for false, which is the reverse of what a C programmer might expect. Have a look at the man pages for test at this point; you will find it a very useful command with great flexibility.
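The exit status that test hands back can be inspected directly through the $? variable. A minimal sketch (the variable names true_status and false_status are mine, chosen for illustration):

```shell
[ 3 -lt 5 ]                               # A comparison that is true
true_status=$?                            # Exit status of the test: 0
[ 7 -lt 5 ]                               # A comparison that is false
false_status=$?                           # Exit status of the test: 1
echo "true gives $true_status, false gives $false_status"

count=0                                   # Initialise a counter
while [ $count -lt 5 ]                    # Loop while the test exits with 0
do
    count=`expr $count + 1`               # Increment the counter
done
echo "Loop ended with count=$count"       # Loop ended with count=5
```

So a while loop keeps going for as long as its control list exits with zero, and stops the first time it exits with anything else.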

The if structure:

Next in complexity is if list then list [ elif list then list ] ... [ else list ] fi, or the if construct. What does that lot mean? Well, usually if statements in any language are associated with predication and so, as you would expect, there is some more implied use of the UNIX test command. Let's generate an example to see the structure in a more usual form. The square brackets in the echo statements have no relevance other than to clarify the output when executed (See - Debugging). However, the square brackets in the if and elif lines are mandatory to the structure.

Example simple if syntax

if [ -f $dirname/$filename ]
then
    echo "This filename [$filename] exists"
elif [ -d $dirname ]
then
    echo "This dirname [$dirname] exists"
else
    echo "Neither [$dirname] or [$filename] exist"
fi

You can see here more examples of what test can do. The -f flag tests for existence of a plain file, while -d tests for existence of a directory. There is no limit (that I can discover) to the number of elif clauses you can use in one if statement. You can also stack up the tests into a list using a double pipe or double ampersand as in Example complex if syntax below. Here the double pipe (||) is the syntax for a logical or, whereas the double ampersand (&&) is the logical and.

Example complex if syntax

if [ -f $dir/$file ] || [ -f $dir/$newfile ]
then
    echo "Either this filename [$file] exists"
    echo "Or this filename [$newfile] exists"
elif [ -d $dir ]
then
    echo "This dirname [$dir] exists"
else
    echo "Neither [$dir] or [$file or $newfile] exist"
fi

In the sh if construct it is important to start the then word on a new line (or to put a semi-colon in front of it, as in if [ ... ]; then), otherwise sh will complain. Also important is the blank inside each end of the test. Without this the test will generate a syntax error - usually "test expected!" which is a bit meaningless.
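For completeness, here is a sketch of the logical and (&&) in the same layout, together with the ! operator of test which negates a condition. The directory /tmp is assumed to exist, and the filename is a made-up one that is assumed not to exist.

```shell
dir="/tmp"                                # Assumed to exist on most UNIX systems
file="no_such_file.$$"                    # Hypothetical name, assumed absent

if [ -d "$dir" ] && [ ! -f "$dir/$file" ] # Directory there AND file not there
then
    result="dir only"
elif [ -f "$dir/$file" ]                  # The file itself is there
then
    result="file"
else                                      # Nothing found at all
    result="neither"
fi
echo "Result is [$result]"
```

Note the blanks just inside each square bracket and the then on its own line after every test.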

The case structure:

Next is the case word in [ pattern [ | pattern ] ... ) list ;; ] ... esac which is probably the most complicated construct to decode from the simple syntax listed above. It is a bit like a multi-line if statement linked with logical or symbols (||). It is commonly used to process a list of parameters passed into a script as arguments when the actual parameters could be in any order or of any value. The layout is shown below, in a section taken from a print script.

Example case syntax

size=0                                              # Default Char Point Size (!)
page=660                                            # Default Page Point Size
while [ "$1" != "" ]                                # When there are arguments...
do                                                  # Process the next one
    case $1 in                                      # Look at $1
    -l) lines=47;                                   # If it's a "-l", set lines
        page=470;                                   # Set the Landscape Page Point
        options="$options -L -l";                   # Set the Landscape Options
        shift;;                                     # Shift one argument along
    -p) lines=66;                                   # If it's a "-p", set lines
        options="$options -l";                      # Set the Portrait Options
        shift;;                                     # Shift one argument along
    -s) size=$2;                                    # If it's a "-s", set size
        shift 2;;                                   # Shift two arguments along
    *)  echo "Option [$1] not one of [p, l, s]";    # Error (!)
        exit;;                                      # Abort Script Now
    esac                                            # End of case
done                                                # End of loop
if [ $size = 0 ]                                    # If size still un-set...
then
    size=`echo "$page / $lines" | bc`               # Set from pages over lines
else                                                # or
    lines=`echo "$page / $size" | bc`               # Set lines
fi
options="$options$lines -s$size"                    # Build complete option list
lp -P$PRINTER $options $filename                    # Output print file to printer

Here we see a while loop, exiting when no more parameters are found on the input line, enclosing a case statement. The case statement repeatedly tests $1 against a list of possible matches, each ended by a right parenthesis. The star (*) at the end is the default case and will match anything left over. When a match is found, the list of commands following the right parenthesis is executed up to the double semi-colon. In each of the option lists there is a shift statement which shifts the input parameters one place left (so $2 becomes $1 etc.), allowing the next parameter to be tested on the next pass through the loop. In the case of the "-s" parameter, an extra following argument is expected, the size value, which is why the shift instruction contains the additional argument 2 (shifting the parameters 2 places left). This effectively allows the processing of all the passed arguments in any order and includes an exit for an invalid parameter condition via the star match. The if statement at the end checks whether the size parameter has been set, then uses the bc command to set either size or lines accordingly. When complete, the final options are created and passed to the lp command to print the file.
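A cut-down sketch of the same idea can be run without a printer. It uses set -- to fake the arguments a script would receive, and it also shows the vertical bar inside a pattern list, which lets one branch match several spellings. The variable orient and the upper-case option spellings are my own additions for illustration; they are not part of the print script.

```shell
set -- -L -s 12                           # Pretend the script was called with: -L -s 12
orient="portrait"                         # Default orientation (hypothetical variable)
size=0                                    # Default size
while [ "$1" != "" ]                      # When there are arguments...
do
    case $1 in                            # Look at $1
    -l|-L) orient="landscape"             # Either spelling selects landscape
           shift;;                        # Shift one argument along
    -p|-P) orient="portrait"              # Either spelling selects portrait
           shift;;                        # Shift one argument along
    -s)    size=$2                        # The size follows as the next argument
           shift 2;;                      # Shift two arguments along
    *)     echo "Option [$1] not one of [p, l, s]"
           break;;                        # Leave the while loop
    esac
done
echo "orient=[$orient] size=[$size]"      # orient=[landscape] size=[12]
```

I have used break rather than exit in the default branch here so that the sketch can be pasted into an interactive shell without logging you out.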

The parent and sub-shell structure:

Then there are two easy ones: the ( list ) and { list; } constructs, which simply execute the whole list of commands either in a separate sub-shell ( ) or in the parent shell { }. Note that the blanks inside the { } are mandatory, as is the semi-colon (or a newline) before the closing brace.
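The practical difference between the two shows up as soon as the list changes a variable. A minimal sketch, using a made-up variable called where:

```shell
where="parent"                            # Set in the parent shell
( where="sub-shell" )                     # Changed only inside the sub-shell
after_parens=$where                       # Still "parent": the change was lost
{ where="braces"; }                       # Changed in the parent shell itself
after_braces=$where                       # Now "braces": the change persists
echo "after ( ): [$after_parens], after { }: [$after_braces]"
```

The same applies to cd: a directory change made inside ( ) is forgotten when the sub-shell exits, which is handy when you want to visit a directory briefly and come straight back.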

The function structure:

Lastly in the complex command section we come to what is probably the most underused but most useful construct for serious scripters. The function definition. The syntax is deceptively simple which I guess is what leads most users to assume it's not worth learning about. How wrong they are. Just take a look at the example below to see what I mean.

Example function syntax

i_upper_case()
{
echo $1 | tr 'abcdefghijklmnopqrstuvwxyz' \
             'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
}

This is a very simple function called i_upper_case; you can probably guess what it does. The backslash at the end of the echo line is a UNIX feature that allows a command line to be continued on the next line. It tells the shell to ignore the next character - in this case the newline. Note that the function gets its input argument from a passed parameter ($1). So to make use of this function within a script you simply need to call it with an argument as follows:

i_upper_case "fred"

or, where name has been set to fred:

i_upper_case $name

And you will get back FRED in either case. A more appropriate usage would be something like:

large_name=`i_upper_case "$small_name"`
echo "Large Name = [$large_name]"

Which allows the case to be changed and put into a new variable. The advantage of doing this at all is that you don't have to re-code the same thing over again when you want to use the feature several times within the script. Note the use here of the double quotes around the variables to the right of the equal signs - this is to preserve any blanks within the strings which would otherwise be treated as argument separators and hence the function would only process the first argument in the list. What this means is:

small_name="fred smith"
large_name=`i_upper_case "$small_name"` # Quoted parameter
echo "Large Name = [$large_name]"

Will display FRED SMITH, whereas:

small_name="fred smith"
large_name=`i_upper_case $small_name` # Unquoted parameter
echo "Large Name = [$large_name]"

Will display FRED only. This bug can be traced back to the function definition which only reads in the $1 parameter. Changing this to read the $@ parameter would correct the bug for this function. But beware, this type of fix would not be appropriate in all situations. Try and think generically when creating functions and make them as useful as possible in all scenarios.
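As a sketch of that fix, here is a variant of the function that reads "$@" instead of $1. I have given it a different, made-up name (i_upper_all) so it can sit alongside the original.

```shell
i_upper_all()                             # Hypothetical variant of i_upper_case
{
    echo "$@" | tr 'abcdefghijklmnopqrstuvwxyz' \
                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
}

small_name="fred smith"
quoted=`i_upper_all "$small_name"`        # One argument containing a blank
unquoted=`i_upper_all $small_name`        # Two separate arguments
echo "Quoted = [$quoted], unquoted = [$unquoted]"
```

Both calls now give FRED SMITH. The catch is that in the unquoted call the shell has already split the string into separate words, so any run of several blanks between them is reduced to a single blank before the function ever sees it.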

There are two very basic rules to remember when dealing with functions:

  1. You cannot use a function until it is defined. Thus all function definitions should appear either at the top of the script or in a start-up file such as ~/.profile.
  2. Functions can be nested to any depth, as long as the first rule is not violated.
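Both rules can be seen in a short sketch. The second function, i_greet, is a made-up example that calls i_upper_case; both are defined before the line that finally uses them.

```shell
i_upper_case()                            # Defined first...
{
    echo "$1" | tr 'abcdefghijklmnopqrstuvwxyz' \
                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
}

i_greet()                                 # ...and used inside a second function
{
    upper=`i_upper_case "$1"`             # Nested call to the first function
    echo "Hello $upper"
}

greeting=`i_greet "fred"`                 # Both definitions exist by this point
echo "$greeting"                          # Hello FRED
```

If the call to i_greet were moved above the two definitions, the shell would report that the command could not be found.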

At the end of the complex command section there is a reminder message that all of the keywords used in these complex commands are reserved words and are therefore not available as names for your own functions. This means that you can screw up any UNIX command by defining a function with the same name, but you cannot screw up a complex shell reserved word.

echo()
{
/usr/bin/user/my_echo "$@"
}

Is perfectly okay as a function definition and the sh will happily use your echo function whenever an echo command is required within the script body.

while()
{
/usr/bin/user/my_while "$@"
}

Is not okay and the function definition will fail at runtime.

Special Commands:

The following are a set of special commands which the shell provides as stand-alone statements. Input and output redirection is permitted for all these commands. The complex commands can be redirected too, but only as a whole, by placing the redirection after the closing keyword (for example done > file for a while loop); some older versions of sh run a redirected loop in a sub-shell, so variables set inside it may be lost.

  • The colon ( : ) does nothing! A zero exit code is returned. Can be used to stand in for a command but I must admit not to finding a real use for this command.
  • The dot ( . filename ) reads in commands from another file (See Startup Files & Environment for details). If the filename following the dot is not in the current working directory, then the shell searches along the PATH variable looking for a match. The first match that is found is the file that is used. The file is read into the shell and the commands found are executed within the current environment.
  • The break ( break [ n ] ) command causes an exit from inside a for or while loop. The optional n indicates the number of levels to break out from - the default is one level. Although not stated in the syntax rules, I have used this statement in an if then else fi construct to good effect in Simple Utility Functions where it causes an exit from the function but does not cause an exit from the calling script.
  • The continue ( continue [ n ] ) command resumes the next iteration of the enclosing for or while loop at the [ optional nth ] enclosing loop. Can't say I've used this one either.
  • The cd ( cd [ argument ] ) command is the change directory command for the shell. The directory is specified with argument which defaults to HOME. The environment variable CDPATH is used as a search path for directories specified by argument.
  • The echo ( echo [ argument ] ) command is the shell output statement. See the man pages for echo(1) for full details.
  • The eval ( eval [ argument ] ) command reads the arguments into the shell and then attempts to execute the resulting command. This allows pre-emptive parameter substitution of hidden parameters or commands.
  • The exec ( exec [ argument ] ) command reads in the command specified by the arguments and executes it in place of this shell without creating a new process. Input and output redirections may appear and, if no other arguments are given, will cause the shell's own input and/or output to be modified.
  • The exit ( exit [ n ] ) command causes a shell to exit with the exit status specified by the n parameter. If the n parameter is omitted, the exit status is that of the last executed command within the shell.
  • The export ( export [ variable ] ) command we have already met and is the command which makes shell variables global in scope. Without a variable, export will list currently exported variables.
  • The getopts command is provided to support command syntax standards - see getopts(1) and intro(1) man pages for details.
  • The hash ( hash [ -r ] [ name ] ) command remembers the location in the search path (PATH variable) of the command name. The option -r causes the shell to forget the location of name. With no options the command will list out details about current remembered commands. This has the effect of speeding up access to some commands.
  • The newgrp ( newgrp [ argument ] ) command is equivalent to exec newgrp argument. See newgrp(1M) for usage and description. The newgrp command logs a user into a new group by changing a user's real and effective group ID. The user remains logged in and the current directory is unchanged. The execution of newgrp always replaces the current shell with a new shell, even if the command terminates with an error (unknown group).
  • The pwd ( pwd ) command literally prints the current working directory. Usually used to set the CWD variable internally.
  • The read ( read name ) command will be seen in several examples. It allows the shell to pause and request user input for the variable name, which is then accepted as the variable's value.
  • The readonly ( readonly [ name ] ) command sets a variable as immutable. Once named in this command, variables cannot be reassigned new values.
  • The return ( return [ n ] ) command causes a function to exit with the return value n. If the n is omitted, the return value is the exit status of the last command executed within the function. Unlike exit this does not result in termination of the calling script.
  • The shift ( shift [ n ] ) command causes the positional parameters to be moved to the left ($2 becomes $1, etc.) by the value of n, which defaults to one.
  • The test command is used to evaluate conditional expressions. See the man pages for test(1) for full details and usages.
  • The times command prints the accumulated user and system times for processes run from the shell.
  • The trap ( trap [ argument ] [ n ] ) command allows conditional execution of the commands contained within argument dependent on the shell receiving numeric or symbolic signal(s) n.
  • The type ( type [ name ] ) command indicates how name would be interpreted if used as a command name.
  • The ulimit and umask commands exist in their own right as UNIX commands. See man pages.
  • The unset ( unset [ name ] ) command allows names to be unset. This removes the values from the variable or function. The names PATH, PS1, PS2, MAILCHECK, and IFS cannot be unset.
  • The wait ( wait [ n ] ) command waits for the background process n to terminate and report its termination status; where n is the process id. With no arguments, all current background processes are waited for.
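Three of the less obvious entries in the list above, the colon, break and continue, can be seen working together in one small loop. This is only an illustrative sketch; the variable kept is made up for the purpose.

```shell
kept=""                                   # Collects the values that get through
for n in 1 2 3 4 5 6
do
    if [ $n -eq 2 ]
    then
        continue                          # Skip 2 and start the next iteration
    elif [ $n -eq 5 ]
    then
        break                             # Leave the loop altogether at 5
    else
        :                                 # Do nothing; keeps the branch legal
    fi
    kept="$kept$n"                        # Only reached for 1, 3 and 4
done
echo "Kept [$kept]"                       # Kept [134]
```

This is one place where the colon earns its keep: sh does not allow an empty list after then or else, so a do-nothing command makes a convenient stand-in while a script is still being written.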

Most of these special commands get used somewhere in this book and more detailed explanations will follow at that time.
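As a taster, here is a sketch of return and eval. The function is_dir and the variables colour_fred, who and fav are all made-up names for illustration, and the missing directory is simply assumed not to exist.

```shell
is_dir()                                  # Hypothetical helper function
{
    if [ -d "$1" ]
    then
        return 0                          # Success: hand back a zero status
    fi
    return 1                              # Failure: hand back a non-zero status
}

is_dir "/"                                # The root directory always exists
status_root=$?
is_dir "/no/such/dir.$$"                  # Assumed not to exist
status_missing=$?

colour_fred="blue"
who="fred"
eval "fav=\$colour_$who"                  # Second pass sees: fav=$colour_fred
echo "root=$status_root missing=$status_missing fav=[$fav]"
```

The return leaves only the function, so the script carries on and can pick up the status from $?. The eval line shows the two-pass substitution: the first pass fills in $who, and the second pass then looks up the variable whose name was built that way.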

Comment structure:

The next thing on my system's man page is a reference to the hash (#) comment character. It states that any word beginning with # causes that word and all the following characters up to a newline to be ignored. There are no notes about the first line exceptions that I gave in The Basic Shells when we were dealing with shell indicators (the #! sequence).

Sample .profile

set -o vi


export JAVA_HOME



export PATH


alias nmon='/usr/local/bin/nmon'
alias dom1logs='cd /u01/home/mdmt/JavaCAPS6/appserver/domains/domain1/logs'
alias dom1tail='tail -f /u01/home/mdmt/JavaCAPS6/appserver/domains/domain1/logs/server.log'
alias dom1more='more /u01/home/mdmt/JavaCAPS6/appserver/domains/domain1/logs/server.log'