Reference
Back in the man pages the next section is called USAGE and goes
on to talk about pipelines and lists. Most of what
it says there can be understood by any UNIX user, so I will skip
it for now, but there will be some examples later showing various
implementations of these definitions. The issue I want to deal
with next is the difference between simple, complex and special commands. This is
nowhere near as bad as it sounds.
Simple commands are just straight UNIX commands that exist regardless
of the surrounding shell environment, such as our old favourites
ls -l, df -al or lpr -Pprinter filename.
There are large numbers of commands that fall into this category,
but the following list is a selection of the more useful ones when
scripting.
- sort Sorts lines in ascending, descending and unique order
- grep Searches for regular expressions in strings or files
- basename Strips the path from a path string to leave just the filename
- dirname Removes the file from a path string to leave just the pathname
- cut Chops up a text string by characters or fields
- wc Count the characters, words, or lines
- test, [ ] Predicate or conditional processor
- tr 'a' 'b' Translate characters (here every a becomes b)
- expr Simple arithmetic processor
- bc Basic Calculator
- eval Evaluate variables
- echo Output strings
- date Create date strings
- nawk Manipulate text strings
- head, tail Access the first or last lines of files
Some of the above commands can be very complex indeed, especially
when assembled into pipelines and lists. However, these are still
referred to as simple commands - presumably because they stand
alone. Take a close look at the man pages for all of the above
commands; you will find them invaluable during your scripting
sojourn.
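To see how a few of these combine, here is a minimal sketch, assuming a POSIX sh, that applies several of the simple commands above to a made-up path (/home/fred/reports/summary.txt is purely illustrative) and captures each result with back-quotes:

```shell
#!/bin/sh
# Several simple commands from the list above, applied to an
# illustrative (made-up) path.
fullpath=/home/fred/reports/summary.txt

file=`basename $fullpath`                 # summary.txt
dir=`dirname $fullpath`                   # /home/fred/reports
stem=`echo $file | cut -d. -f1`           # first "."-separated field: summary
upper=`echo $stem | tr 'a-z' 'A-Z'`       # translate to upper case: SUMMARY
words=`echo "c a b" | wc -w`              # count the words: 3
first=`echo "c a b" | tr ' ' '\n' | sort | head -n 1`   # one per line, sorted, first kept: a

echo "file=[$file] dir=[$dir] stem=[$stem] upper=[$upper]"
echo "words=[$words] first=[$first]"
```

The word count is best tested numerically rather than as a string, because some versions of wc pad their output with leading blanks.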
Complex commands are just the shell's internal commands, which are
used to group simple commands into controlled sets based on your
requirements. These include the loop constructs and conditional
test structures. These cannot stand alone: an if requires
a then and a fi at the very least. Let's take a look
at the man pages again at this point.
It says on my system's man page
for name [ in word ... ] do list done
as a syntax description of the for command construct. Well,
it is correct but does not really show the layout of the command
at all. Look at the example below and you can see straight away
what is supposed to happen.
alphabet="a b c d e"                    # Initialise a string
count=0                                 # Initialise a counter
for letter in $alphabet                 # Set up a loop control
do                                      # Begin the loop
    count=`expr $count + 1`             # Increment the counter
    echo "Letter $count is [$letter]"   # Display the result
done                                    # End of loop
So in plain English: for each letter found in alphabet,
loop between do and done and
process the list of commands found. Let's take this one
line at a time from the top. This is the way the sh likes
to have its variables set. There is no leading keyword as
in the csh (set); just start with the variable name.
There are also no blanks either side of the equal sign. Indeed,
if you put a blank in, the shell will give you an error message
for your trouble. This also gives rise to the difference between
the top two lines in this example. Because I want to include spaces
in my string for alphabet, I must enclose the whole string
in double quotes. On the next line this is not required
as there are no embedded blanks in the value of count.
When setting variables, no blanks are allowed. Everywhere
else, sh loves blanks.
In line 3 the for statement creates a loop construct
by selecting the next letter from alphabet each
time through the loop and executing the list found between
the do and the done for each letter. This
process also strips away any blanks before and after
each letter found in alphabet. The do and
done statements are not executed as such; they simply mark
the beginning and end of the loop list. They are, however,
a matched pair: leave one out and the shell will complain.
Inside the loop are two simple commands (apparently!).
The first one just increments the loop counter by adding
one to its current value. Note the use of the back-quote
here to force the execution of the expr command
before setting the new value of count. There will be more
about this later.
The next line is something we have seen before, just a display
command showing the values of the variables. Note the use of the
$ symbol to request the value of the variables.
There is another similarly structured command in the sh
called while. Its syntax structure is listed as while
list do list done which you should
now be able to translate yourself into something that looks like the
example below.
alphabet="a b c d e"                                    # Initialise a string
count=0                                                 # Initialise a counter
while [ $count -lt 5 ]                                  # Set up a loop control
do                                                      # Begin the loop
    count=`expr $count + 1`                             # Increment the counter
    position=`echo "$count + $count - 1" | bc`          # Position of next letter
    letter=`echo "$alphabet" | cut -c$position-$position`  # Get next letter
    echo "Letter $count is [$letter]"                   # Display the result
done                                                    # End of loop
Most of this is the same construct; I have just replaced the for
loop set-up with its equivalent while syntax. Instead of
stepping through the letters in alphabet, the loop
control now monitors the size of count with
[ $count -lt 5 ]. The -lt flag here represents
less-than and is part of the UNIX test command,
which is implied by the square brackets. Because the letters sit at
character positions 1, 3, 5, 7 and 9 of alphabet, the position of the
next letter is twice the count minus one, which cut then extracts.
Any other command or list could be put in the loop control; what
matters is the exit status it returns. A zero (success) status means
true and the loop continues to process; anything else ends the loop.
From the above you can work out that test returns 0 for true
and 1 for false, the opposite of what many other languages use.
Have a look at the man pages for test at this point; you will
find it a very useful command with great flexibility.
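Because the zero-means-true convention catches most people out, here is a short sketch, assuming a POSIX sh, that records the exit status of two tests through the special parameter $? and then lets a test drive a while loop. The variable names are made up for the illustration; each status variable starts at 0 and is overwritten only if its test fails:

```shell
#!/bin/sh
# Record the exit status of a true and a false test. Each variable
# starts at 0 and is overwritten with $? only when its test fails.
true_status=0;  [ 3 -lt 5 ] || true_status=$?
false_status=0; [ 7 -lt 5 ] || false_status=$?

count=0
while [ $count -lt 5 ]          # keep looping while test exits with 0
do
    count=`expr $count + 1`
done                            # the loop ends once the test returns 1

echo "true=$true_status false=$false_status count=$count"
# prints: true=0 false=1 count=5
```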
Next in complexity is if list then
list [ elif list then list
] ... [ else list ] fi, or the if construct.
What does that lot mean? Well, if statements in
any language are usually associated with predicates, and so as you would
expect there is some more implied use of the UNIX test
command. Let's generate an example to see the structure in a more
usual form. The square brackets in the echo statements have
no relevance other than to clarify the output when executed (see
Debugging). However, the square
brackets in the if and elif lines are required here, because
they invoke the test command.
if [ -f $dirname/$filename ]
then
    echo "This filename [$filename] exists"
elif [ -d $dirname ]
then
    echo "This dirname [$dirname] exists"
else
    echo "Neither [$dirname] or [$filename] exist"
fi
You can see here more examples of what test can do. The
-f flag tests for existence of a plain file, while
-d tests for existence of a directory. There is
no limit (that I can discover) to the number of elifs
you can use in one if statement. You can also stack up
the tests into a list using a double pipe or double
ampersand, as in the complex if syntax example
below. Here the double
pipe (||) is the syntax for a logical or, whereas
the double ampersand (&&) is the logical
and.
if [ -f $dir/$file ] || [ -f $dir/$newfile ]
then
    echo "Either this filename [$file] exists"
    echo "Or this filename [$newfile] exists"
elif [ -d $dir ]
then
    echo "This dirname [$dir] exists"
else
    echo "Neither [$dir] or [$file or $newfile] exist"
fi
In the sh if construct it is important to put the then
word on its own line, or to separate it from the test with a
semicolon (if [ -f $file ]; then), or sh will complain about an invalid
test. Also important is the blank just inside each bracket
of the test. Without it the test will generate
a syntax error, usually "test expected!", which is a
bit meaningless.
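The sketch below, assuming a POSIX sh, puts then on the same line as the test using a semicolon and stacks two tests with &&. The function name check and the scratch directory are hypothetical, created only so that all three branches can be exercised; $$ (the shell's process id) keeps the directory name unique:

```shell
#!/bin/sh
dir=${TMPDIR:-/tmp}/ifdemo.$$             # hypothetical scratch directory
mkdir "$dir"
touch "$dir/present.txt"

check()                                   # report what exists for $1/$2
{
    if [ -f "$1/$2" ]; then               # ";" lets then share the line
        echo file
    elif [ -d "$1" ] && [ -r "$1" ]; then # &&: both tests must succeed
        echo dir
    else
        echo neither
    fi
}

r1=`check "$dir" present.txt`             # file
r2=`check "$dir" absent.txt`              # dir
r3=`check "$dir/nowhere" x`               # neither
rm -rf "$dir"                             # tidy up the scratch directory
echo "$r1 $r2 $r3"
```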
Next is the case word in [ pattern [ | pattern ] ... ) list ;; ] ... esac
construct, which is probably the most complicated to decode from
the simple syntax listed above. It is a bit like a multi-line
if statement linked with logical or symbols (||), and a | between
patterns means that any one of them may match.
It is commonly used to process a list of parameters passed into
a script as arguments when the actual parameters could be in any
order or of any value. The layout is shown in 8.2.4.1 below, which
is a section from a print script.
size=0                                    # Default Char Point Size (!)
page=660                                  # Default Page Point Size
while [ "$1" != "" ]                      # When there are arguments...
do                                        # Process the next one
    case $1 in                            # Look at $1
        -l) lines=47                      # If it's a "-l", set lines
            page=470                      # Set the Landscape Page Point
            options="$options -L -l"      # Set the Landscape Options
            shift;;                       # Shift one argument along
        -p) lines=66                      # If it's a "-p", set lines
            options="$options -l"         # Set the Portrait Options
            shift;;                       # Shift one argument along
        -s) size=$2                       # If it's a "-s", set size
            shift 2;;                     # Shift two arguments along
        *)  echo "Option [$1] not one of [p, l, s]"   # Error (!)
            exit 1;;                      # Abort Script Now (non-zero status)
    esac
done
if [ $size = 0 ]                          # If size still un-set...
then
    size=`echo "$page / $lines" | bc`     # Set from pages over lines
else                                      # or
    lines=`echo "$page / $size" | bc`     # Set lines
fi
options="$options$lines -s$size"          # Build complete option list
lp -P$PRINTER $options $filename          # Output print file to printer
Here we see a while loop, exiting when no more parameters
are found on the command line, enclosing a case statement. The
case statement repeatedly tests $1 against a list
of possible matches indicated by the right parentheses. The star
(*) at the end is the default case and will match anything
left over. When a match is found, the list of commands following
the right parenthesis is executed up to the double semicolon.
In each of these lists there is a shift statement, which
shifts the input parameters one place left (so $2 becomes
$1 etc.), allowing the next parameter to be tested on the
next pass through the loop. In the case of the "-s"
parameter, an extra following argument is expected, the size
value, which is why the shift instruction contains the
additional argument 2 (shifting the parameters two places left).
This allows all the passed arguments to be processed
in any order, and includes an exit for an invalid parameter
via the star match. The if statement at the end checks
whether the size parameter has been set, then uses the bc command
to set either size or lines accordingly. When complete,
the final options are created and passed to the lp
command to print the file.
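For comparison, the getopts special command (listed further on) can do the argument scanning that the while, case and shift loop above does by hand. This is a sketch only, assuming a POSIX sh: the function name parse_opts and the variable files are hypothetical, and the function just sets the same variables as the print script rather than calling lp:

```shell
#!/bin/sh
parse_opts()
{
    size=0 page=660 lines= options=       # same defaults as the print script
    OPTIND=1                              # reset so the function can be re-run
    while getopts lps: opt                # "s:" means -s needs an argument
    do
        case $opt in
            l) lines=47; page=470; options="$options -L -l" ;;
            p) lines=66; options="$options -l" ;;
            s) size=$OPTARG ;;            # getopts puts the argument in OPTARG
            *) return 1 ;;                # "?": getopts has reported the error
        esac
    done
    shift `expr $OPTIND - 1`              # drop the options, keep file names
    files="$*"
}

parse_opts -l -s 10 report.txt
echo "lines=$lines page=$page size=$size files=$files options=[$options]"
```

getopts also copes with -s12 written as one word and with bundled options such as -lp, and it prints its own message for an unknown option before the *) branch returns a non-zero status.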
Then there are two easy ones, the ( list ) and
{ list; } constructs, which simply execute the whole
list of commands either in a separate sub-shell ( )
or in the current (parent) shell { }. Note that the blank
after the { and the semicolon (or newline) before the } are mandatory.
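A quick way to see the difference, assuming any POSIX sh: an assignment made inside ( ) disappears when the sub-shell ends, while one made inside { } survives. The variable names are illustrative:

```shell
#!/bin/sh
x=outer
( x=inside; cd / )        # sub-shell: the assignment and the cd are lost
after_paren=$x            # still "outer"
{ x=braces; }             # current shell: blank after {, ";" before }
after_brace=$x            # now "braces"
echo "$after_paren $after_brace"
```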
Lastly in the complex command section we come to what is probably
the most underused but most useful construct for serious scripters:
the function definition. The syntax is deceptively simple, which
I guess is what leads most users to assume it's not worth learning
about. How wrong they are. Just take a look at the example below
to see what I mean.
i_upper_case()
{
    echo $1 | tr 'abcdefghijklmnopqrstuvwxyz' \
                 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
}
This is a very simple function called i_upper_case;
you can probably guess what it does. The backslash at
the end of the echo line is a shell feature that allows a
command line to be continued on the next line. It tells the shell
to ignore the character that follows it, in this case the newline.
Note that it gets its input argument from a passed parameter ($1).
So to make use of this function within a script you simply need to call
it with an argument as follows:
i_upper_case "fred"
or
name="fred"
i_upper_case $name
And you will get back FRED in either case. A more appropriate
usage would be something like:
small_name="$input_argument"
large_name=`i_upper_case "$small_name"`
echo "Large Name = [$large_name]"
Which allows the case to be changed and put into a new variable.
The advantage of doing this at all is that you don't have to re-code
the same thing over again when you want to use the feature several
times within the script. Note the use here of the double quotes
around the variables to the right of the equal signs - this is
to preserve any blanks within the strings which would otherwise
be treated as argument separators and hence the function would
only process the first argument in the list. What this means is:
small_name="fred smith"
large_name=`i_upper_case "$small_name"`     # Quoted parameter
echo "Large Name = [$large_name]"
Will display FRED SMITH, whereas:
small_name="fred smith"
large_name=`i_upper_case $small_name`       # Unquoted parameter
echo "Large Name = [$large_name]"
Will display FRED only. This bug can be traced back
to the function definition which only reads in the $1
parameter. Changing this to read the $@ parameter
would correct the bug for this function. But beware: this
type of fix would not be appropriate in all situations. Try to
think generically when creating functions and make them as useful
as possible in all scenarios.
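Here is the function with that fix applied, as a sketch assuming a POSIX sh: it passes "$@" (all the arguments) to echo, which writes them out separated by single blanks:

```shell
#!/bin/sh
i_upper_case()
{
    # echo writes all of its arguments separated by single blanks, so
    # passing "$@" keeps every word instead of only the first one ($1).
    echo "$@" | tr 'abcdefghijklmnopqrstuvwxyz' \
                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
}

small_name="fred smith"
quoted=`i_upper_case "$small_name"`       # one argument:  "fred smith"
unquoted=`i_upper_case $small_name`       # two arguments: "fred" "smith"
echo "Quoted = [$quoted], Unquoted = [$unquoted]"
```

Both calls now give FRED SMITH. The unquoted call still loses something, though: several blanks in a row inside the name are squeezed to one, because the shell splits the words before the function ever sees them.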
There are two very basic rules to remember when dealing with functions:
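- You cannot use a function until it is defined. Thus all function
definitions should appear either at the top of the script or in
a separate file read in with the dot (.) command. Functions defined
in a start-up file such as ~/.profile are available in your login
shell, but not in scripts that run as separate processes.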
- Functions can be nested to any depth, as long as the first
rule is not violated.
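As an illustration of the first rule, this sketch (POSIX sh assumed) keeps a function in a separate file and reads it in with the dot command before using it. The file name and the greet function are hypothetical, and a here-document is used only to keep the example self-contained; a real script would read in a permanent file:

```shell
#!/bin/sh
lib=${TMPDIR:-/tmp}/functions.$$          # hypothetical library file
cat > "$lib" <<'EOF'
greet()
{
    echo "hello $1"
}
EOF

. "$lib"                                  # read the definitions into this shell
msg=`greet fred`                          # greet exists only after the dot command
rm -f "$lib"
echo "$msg"
```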
At the end of the complex command section there is a reminder
message that all of the keywords used in these complex commands
are reserved words and are therefore not available as function names.
This means that you can screw up any UNIX command by defining a function
with the same name, but you cannot screw up a complex shell reserved word.
echo()
{
    /usr/bin/user/my_echo "$@"
}
Is perfectly okay as a function definition and the sh will
happily use your echo function whenever an echo
command is required within the script body.
while()
{
    /usr/bin/user/my_while "$@"
}
Is not okay: while is a reserved word, so the shell rejects the definition with a syntax error.
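The sketch below shows a function taking the place of echo and then being removed again with unset -f. Inside the function, the command built-in is used to reach the ordinary echo without the function calling itself. command is a POSIX addition and may be missing from older Bourne shells, so this assumes a POSIX-conforming sh:

```shell
#!/bin/sh
echo()
{
    command echo "log: $*"     # "command" skips functions: runs the real echo
}

logged=`echo hello`            # the function is found first
unset -f echo                  # remove the function definition again
plain=`echo hello`             # back to the ordinary echo
echo "$logged / $plain"
```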
The following are a set of special commands which the shell provides
as stand-alone statements. Input and output redirection is permitted
for all of these commands. It is also permitted for the complex commands:
a redirection written after the done of a while loop applies to the
whole loop construct, not just to the commands inside the loop list.
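A common use of loop redirection is reading a file line by line. In this sketch (POSIX sh assumed, scratch file names hypothetical) the < after done feeds the whole while loop, so each read takes the next line, and the > collects everything the loop writes:

```shell
#!/bin/sh
names=${TMPDIR:-/tmp}/names.$$            # hypothetical scratch files
swapped=${TMPDIR:-/tmp}/swapped.$$
printf 'fred smith\njoe bloggs\n' > "$names"

while read first last                     # read splits each line at blanks
do
    echo "$last, $first"
done < "$names" > "$swapped"              # redirects the whole loop

result=`cat "$swapped"`
rm -f "$names" "$swapped"
echo "$result"
```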
- The colon ( : ) does nothing! A zero exit
code is returned. It can be used to stand in for a command, but I
must admit I have not found a real use for it.
- The dot ( . filename) reads in commands
from another file (See
Startup Files & Environment for details).
If the filename following the dot is not in the current
working directory, then the shell searches along the PATH
variable looking for a match. The first match that is found is
the file that is used. The file is read into the shell and the
commands found are executed within the current environment.
- The break ( break [ n ] ) command
causes an exit from inside a for or while loop.
The optional n indicates the number of levels to break
out from - the default is one level. Although not stated in the
syntax rules, I have used this statement in an if then else
fi construct to good effect in
Simple Utility Functions where it causes an exit
from the function but does not cause an exit from the calling
script.
- The continue ( continue [ n ]
) command resumes the next iteration of the enclosing for
or while loop at the [ optional nth ] enclosing
loop. Can't say I've used this one either.
- The cd ( cd [ argument ] ) command
is the change directory command for the shell. The
directory is specified with argument which defaults to
HOME. The environment variable CDPATH is used as
a search path for directories specified by argument.
- The echo ( echo [ argument ] ) command
is the shell output statement. See the man pages for echo(1) for
full details.
- The eval ( eval [ argument ] ) command
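reads its arguments as input to the shell and then attempts to
execute the resulting command. Because the arguments are substituted
once before eval sees them and again when eval runs them, this allows
pre-emptive substitution of parameters or commands hidden inside other variables.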
- The exec ( exec [ argument ] ) command
reads in the command specified by arguments and executes it
in place of this shell without creating a new process. Input and
output redirections may appear and, if no other arguments are given, will
cause the shell's own input and/or output to be modified.
- The exit ( exit [ n ] ) command causes
a shell to exit with the exit status specified by the n
parameter. If the n parameter is omitted, the exit status
is that of the last executed command within the shell.
- The export ( export [ variable
] ) command we have already met; it marks shell variables for export
into the environment of the commands the shell runs (in effect making them
global in scope). Without a variable,
export will list currently exported variables.
- The getopts command is provided to support
command syntax standards - see getopts(1) and intro(1) man pages
for details.
- The hash ( hash [ -r ] [ name
] ) command remembers the location in the search path (PATH
variable) of the command name. The option -r causes
the shell to forget all remembered locations. With no arguments
the command will list out details about the currently remembered commands.
Remembering locations speeds up access to commands, because the PATH
does not have to be searched again.
- The newgrp ( newgrp [ argument
] ) command is equivalent to exec newgrp
argument. See newgrp(1M) for usage and description.
The newgrp command logs a user into a new group by changing
a user's real and effective group ID. The user remains
logged in and the current directory is unchanged. The execution
of newgrp always replaces the current shell with a new
shell, even if the command terminates with an error (unknown
group).
- The pwd ( pwd ) command literally prints
the current working directory. Usually used to set the CWD
variable internally.
- The read ( read name ) command will
be seen in several examples. It allows the shell to pause and
request user input for the variable name, which is then
accepted as the variable's value.
- The readonly ( readonly [ name
] ) command sets a variable as immutable. Once a variable is named in this
command it cannot be reassigned a new value.
- The return ( return [ n ] ) command
causes a function to exit with the return value n. If the
n is omitted, the return value is the exit status of the
last command executed within the function. Unlike exit
this does not result in termination of the calling script.
- The shift ( shift [ n ] ) command
causes the positional parameters to be moved to the left ($2
becomes $1, etc.) by the value of n, which defaults
to one.
- The test command is used to evaluate conditional
expressions. See the man pages for test(1) for full details and
usages.
- The times command prints the accumulated
user and system times for processes run from the shell.
- The trap ( trap [ argument ] [ n
] ) command allows conditional execution of the commands contained
within argument, dependent on the shell receiving numeric
or symbolic signal(s) n.
- The type ( type [ name ] ) command
indicates how name would be interpreted if used as a command name.
- The ulimit and umask
commands exist in their own right as UNIX commands. See man pages.
- The unset ( unset [ name ] ) command
allows names to be unset. This removes the values from
the variable or function. The names PATH, PS1,
PS2, MAILCHECK, and IFS cannot be unset.
- The wait ( wait [ n ] ) command waits
for the background process n to terminate and report its
termination status; where n is the process id. With
no arguments, all current background processes are waited for.
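A few of the special commands above are easiest to see in action. This sketch, assuming a POSIX sh, uses continue and break 2 in a pair of nested for loops, then uses eval to fetch the value of a variable whose name is held in another variable; all the variable names are illustrative:

```shell
#!/bin/sh
# break and continue in nested loops
visited=
found=
for row in 1 2 3
do
    for col in 1 2 3
    do
        if [ $col -eq 2 ]
        then
            continue                    # skip column 2, carry on with column 3
        fi
        visited="$visited $row$col"
        if [ $row -eq 2 ] && [ $col -eq 3 ]
        then
            found="$row,$col"
            break 2                     # leave both loops at once
        fi
    done
done
echo "visited:$visited found: $found"  # visited: 11 13 21 23 found: 2,3

# eval: a second round of substitution
colour=red
name=colour                             # holds the NAME of another variable
eval value=\$$name                      # shell builds "value=$colour", eval runs it
echo "value: $value"                    # value: red
```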
Most of these special commands get used somewhere in this book
and more detailed explanations will follow at that time.
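Until then, here is a small sketch of trap and wait, assuming a POSIX sh. The scratch file name is hypothetical, and the trap is set inside a sub-shell only so that its exit, and therefore the clean-up, happens part-way through the example:

```shell
#!/bin/sh
scratch=${TMPDIR:-/tmp}/trapdemo.$$       # hypothetical scratch file
(
    trap 'rm -f "$scratch"' 0             # signal 0: run when this shell exits
    trap 'exit 1' 1 2 15                  # HUP, INT, TERM: exit (runs trap 0)
    echo "work in progress" > "$scratch"
)                                         # sub-shell ends: the file is removed
if [ -f "$scratch" ]; then left=yes; else left=no; fi

( sleep 1; exit 3 ) &                     # a background process
pid=$!                                    # $! is its process id
status=0
wait $pid || status=$?                    # wait returns that process's status
echo "scratch left behind: $left, background status: $status"
```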
The next thing on my system's man page is a reference to the hash
(#) comment character. It states that any
word beginning with # causes that word and all the
following characters up to a newline to be ignored. There are
no notes about the first-line exceptions that I gave in
The Basic Shells when
we were dealing with shell indicators (the
#! sequence).

Sample .profile

set -o vi
JAVA_HOME=/usr/java5_64
export JAVA_HOME
##PATH=JAVA_HOME/bin:$PATH
PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:$HOME/bin:/usr/bin/X11:/sbin:.
PATH=$PATH:/usr/java5_64/bin:/u01/home/oracle/10.1.0.2/bin
export PATH
ORACLE_HOME=/u01/home/oracle/10.1.0.2
export ORACLE_HOME
PS1='EINDEXT $PWD > '
alias nmon='/usr/local/bin/nmon'
alias dom1logs='cd /u01/home/mdmt/JavaCAPS6/appserver/domains/domain1/logs'
alias dom1tail='tail -f /u01/home/mdmt/JavaCAPS6/appserver/domains/domain1/logs/server.log'
alias dom1more='more /u01/home/mdmt/JavaCAPS6/appserver/domains/domain1/logs/server.log'