Symbol table

Symbols

That’s pretty much everything for hashes, except for one topic that is usually left out of introductory tutorials (possibly rightly!). This post will tell you a little about the innards of what you’ve been doing when you create variables. You don’t really need to know any of this to use Perl day-to-day, so do feel free to skip to the next post if this one becomes too esoteric.

perl maintains its own internal hash, called the symbol table, or %main:: (that’s ‘hash main double colon’), which you also have access to:

#!/usr/bin/perl
# use strict; # turn off strictures, for reasons we'll come to in a minute
use warnings;

$pibble = 2;
@foo    = ( 1, 4 );
%bits   = ( me => 'tired' );
sub my_sort { return ( $a cmp $b ) }

foreach ( sort keys %main:: ) {
    print "This perl program has a symbol called $_.\n";
}

This perl program has a symbol called STDIN.
This perl program has a symbol called pibble.
...

This program prints the names of the ‘symbols’ perl has defined for you, and of the symbols you have created yourself. Somewhere in the output you will find pibble, foo, bits and my_sort. You’ll also find a lot of other things, including STDIN, the name of the standard input filehandle, and a and b (as in $a and $b).
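Because the symbol table is a hash, you can also ask it about one particular name with exists, just as you would with any other hash. Here is a minimal sketch of that idea. It declares its variables with our, which is simply a way of creating a symbol-table variable that use strict will put up with, and no_such_symbol is a made-up name that is deliberately never defined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 'our' declares variables that live in the symbol table,
# in a form that 'use strict' accepts
our $pibble = 2;
our @foo    = ( 1, 4 );

# look each name up as a key of the %main:: hash
my %found;
for my $name (qw( pibble foo no_such_symbol )) {
    $found{$name} = exists $main::{$name} ? 1 : 0;
    print "$name: ",
        ( $found{$name} ? "in the symbol table" : "not there" ), "\n";
}
```

Note that the key is the bare name, without a sigil: $pibble and a hypothetical @pibble would share the single entry pibble.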

Typeglobs

The symbol table is just a hash, with the rather atypical name %main::, and that program simply printed out the keys of that hash. If you want to see the values, you’ll have to get acquainted with Perl’s final, and most esoteric, data type, the typeglob, and with another type of scoping besides my. Scalars have $, arrays have @, hashes have %, and typeglobs have *. The typeglob *foo contains the definitions of $foo, @foo, %foo and the subroutine foo (which is called &foo, as subs get & as their sigil), all rolled into one. Try this program out:

#!/usr/bin/perl
# use strict;
# use warnings; # turn warnings off too

# define some things
$pibble = 2;
@foo    = ( 1, 4 );
$foo    = 'bar';
%foo    = ( key => 'value' );
%bits   = ( me => 'tired' );
sub my_sort { return ( $a cmp $b ) }

print "This program contains...\n";

while ( my ( $key, $value ) = each %main:: ) {
    # iterate over the key/value pairs of the symbol table hash

    local *symbol = $value;
    # this assigns the value from the symbol table to a typeglob

    # the following lines look to see if the typeglob contains 
    # a $, %, @ or & definition

    if ( defined $symbol ) {
        print "a scalar called \$$key\n";
            # remember \$ is just an escaped $ ...
            # followed by the contents of variable $key
    }
    if ( *symbol{ARRAY} ) {
        # *symbol{ARRAY} is a reference to the array in the typeglob,
        # or undef if there isn't one (defined @symbol used to work
        # here, but is a fatal error since Perl 5.22)
        print "an array called \@$key\n";
    }
    if ( *symbol{HASH} ) {
        print "a hash called \%$key\n";
    }
    if ( defined &symbol ) {
        print "a subroutine called $key\n";
    }
}

a hash called %ENV
a scalar called $pibble
a scalar called $_
a hash called %UNIVERSAL::
a scalar called $foo
an array called @foo
a hash called %foo
a scalar called $$
...

The values from the symbol table hash are typeglobs, looking something like *main::foo, *main::ENV, *main::_, etc. If you create your own local typeglob, *symbol, to hold one of these values, you can check which of its sub-types (scalar, array, etc.) are filled in: defined $symbol for the scalar, *symbol{ARRAY} and *symbol{HASH} for the array and the hash, and defined &symbol for the subroutine. As the loop runs through the $key, $value pairs from the symbol table, $value will at some point contain *main::foo. So:

local *symbol = $value;

creates a typeglob *symbol containing the definitions of the symbols called main::foo, and

if ( *symbol{HASH} )

will ask ‘is there a hash in the symbol table called %main::foo?’.
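If assigning one typeglob to another still seems a bit abstract, this minimal sketch shows that *symbol really does become another name for the same variables, rather than a copy of them. The our declarations are only there so that use strict accepts the names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our @foo = ( 1, 4 );
our $foo = 'bar';
our ( @symbol, $symbol );    # declared only to keep 'use strict' happy

{
    # for the duration of this block, *symbol is an alias for *foo
    local *symbol = *foo;

    print "\$symbol is $symbol\n";    # reads $foo
    push @symbol, 9;                  # really pushes onto @foo
}

# the alias has gone, but the change it made to @foo remains
print "\@foo is now @foo\n";
```

The push goes through the alias, so it is @foo that ends up with three elements.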

The main:: bit means that we’re looking at symbols from the ‘main’ symbol table. A program can use more than one symbol table, which we’ll get onto when we talk about packages and modules later. The main package and its symbol table are simply the ones that perl assumes your program is using if you don’t set a package explicitly.
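As a taster of what is to come, here is a rough sketch of a second symbol table. The package Sandwich and everything in it are invented purely for illustration: the point is only that its symbols end up in %Sandwich:: rather than in %main::.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# a made-up package, purely for illustration
package Sandwich;
our $filling = 'cheese';
sub eat { return "eating $filling" }

package main;

# every package has its own symbol table hash, named after the package
my @sandwich_symbols = sort keys %Sandwich::;
print "Sandwich has a symbol called $_\n" for @sandwich_symbols;

# symbols from Sandwich are not in main's symbol table...
print "main has no 'filling'\n" unless exists $main::{filling};

# ...although the Sandwich symbol table itself is, under the key 'Sandwich::'
print "main has 'Sandwich::'\n" if exists $main::{'Sandwich::'};
```

That last line is also why %UNIVERSAL:: turns up when you list the hashes in main: it is the symbol table of another package.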

local variables

There is one final complication. Try sticking a my on any of the variables you’ve defined, like $foo, and run the program again. You’ll find they suddenly disappear from the symbol table. What on earth is happening? Well, the dirty secret is that perl actually has two completely independent sets of variables: one set introduced with Perl 5, and a legacy set that harks back to the days of Perl 4. Those that you create without a my are Perl’s old-style global or package variables, which live in the symbol table and are extractable with typeglobs. Ordinary named subroutines always live there too, as a plain sub definition can’t take a my (lexical subroutines, written my sub, only became a stable feature in Perl 5.26).

Package variables are global, and any program using your code can access them. Even if they’re defined somewhere other than main, e.g. in a different package like File::Find, all you need to know to mess with them is the package they belong to (here File::Find) and the name of the variable ($dir):

$File::Find::dir = "plopsy";

to probably fatal effect. Package variables were supplemented with my variables in Perl 5 because there was no way to make a package variable truly private to a subroutine. There was no my in Perl 4, and you had to use a thing called local, which you’ve seen above with a typeglob, to create temporary, dynamically scoped variables (as opposed to lexically scoped my variables):

#!/usr/bin/perl
use strict;
use warnings;

# 'our' declares a package variable in a way that 'use strict' accepts
our $variable = "hello";
print "\$variable is $variable in the body.\n";
temporary();
print "\$variable is still $variable in the body.\n";

sub temporary {
    local $variable = "goodbye";
    print "\$variable is $variable in the temporary sub.\n";
}

$variable is hello in the body.
$variable is goodbye in the temporary sub.
$variable is still hello in the body.

This looks to have exactly the same effect as my would, but in fact we’re still talking about the same $variable. It just so happens that perl stashes away the original value when it hits the local, and puts it back when the temporary sub finishes and control returns to the body of the program. The symbol table entry is temporarily changed to its new value. In contrast, my creates a completely separate, fresh and unsullied variable with no relationship whatsoever to variables of the same name elsewhere in the program. To see the difference, call another subroutine from within temporary(): inside it, $variable is still set to its temporary value of ‘goodbye’:

#!/usr/bin/perl
use strict;
use warnings;

our $variable = "hello";    # a package variable, as before
print "\$variable is $variable in the body.\n";
temporary();
print "\$variable is still $variable in the body.\n";

sub temporary {
    local $variable = "goodbye";
    print "\$variable is $variable in the temporary sub.\n";
    inner();
}

sub inner {
    print "\$variable is $variable in the inner sub.\n";
}

$variable is hello in the body.
$variable is goodbye in the temporary sub.
$variable is goodbye in the inner sub.
$variable is still hello in the body.

In contrast, ‘lexically scoped’ my variables live in only one particular part (scope) of the program, and are completely inaccessible outside it. Each new my $variable is a completely different $variable, and none of them appears in any symbol table. If you were to put my instead of local:

#!/usr/bin/perl
use strict;
use warnings;

our $variable = "hello";    # a package variable, as before
print "\$variable is $variable in the body.\n";
temporary();
print "\$variable is still $variable in the body.\n";

sub temporary {
    my $variable = "goodbye";
    print "\$variable is $variable in the temporary sub.\n";
    inner();
}

sub inner {
    print "\$variable is $variable in the inner sub.\n";
}

$variable is hello in the inner sub.
$variable is goodbye in the temporary sub.
$variable is hello in the inner sub.
$variable is still hello in the body.

You’ll see that the $variable in temporary() is now a completely different variable, isolated from the rest of the program, unrelated to the $variable in the body of the program, and certainly not accessible from inner() any more. inner() prints out the only $variable visible in its scope, which is the one in the body of the program.
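You can check the claim that my variables never make it into the symbol table by going back to the trick of looking names up in %main::. This is only a small sketch, and both variable names are invented for the occasion.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $package_var = "I live in the symbol table";    # package variable
my  $lexical_var = "I do not";                      # lexical variable

for my $name (qw( package_var lexical_var )) {
    my $where = exists $main::{$name} ? "is" : "is not";
    print "$name $where in the symbol table\n";
}

# a package variable can be reached by its fully qualified name...
print "$main::package_var\n";

# ...but the lexical has no fully qualified name at all:
# it exists only within the scope that declared it
```

Only package_var is reported as being in the symbol table, which is exactly why the typeglob program loses sight of anything you declare with my.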

You may well never have to use typeglobs, or the symbol table, or local in anger, but it’s nice to know how stuff works, rather than merely how to use stuff, hence this digression. Normal service will now be resumed.

Next up…files and directories.
