Lab 3: File I/O and Lists

The purpose of this lab is to learn how to examine text documents and to extract and display information using Tcl/Tk. We'll do more analysis of the log file we used in the previous two labs, but now we'll do all the work within a Tcl/Tk application. This will reduce the time to examine a file from minutes to seconds.

These game labs cover similar material to what we discussed in class.

CS146GameLab-7
CS146GameLab-8
CS146GameLab-10

You can look at CS146GameLab-9 as well, but we won't be using any of the sounds in the class.

  1. A list is an ordered collection of elements. Written as a string, the elements are separated by whitespace, and any element that contains whitespace or special characters must be grouped with curly braces or escaped with backslashes.

    You can create lists in several ways. This example shows similar lists being created using quotes, curly braces, the list command and the split command.

    
      set list1 "This list is grouped with quotes"
      set list2 {This list is grouped with curlies}
      set list3 [list This list is created with list]
      set list4 [split "this,list,is,created,by,splitting,on,commas" ,]
    
      label .l_string -text "String has [llength $list1] elements"
      label .l_curly -text "Braces has [llength $list2] elements"
      label .l_brace -text "list command has [llength $list3] elements"
      label .l_split -text "split command has [llength $list4] elements"
      
      grid .l_string -column 1 -row 1 
      grid .l_curly  -column 1 -row 2
      grid .l_brace  -column 1 -row 3 
      grid .l_split  -column 1 -row 4 
    

    If you type in this code and run it, it should look like this:

    Rule of Thumb: For simple lists that you control, create the list with braces. For lists that your program loads from an external source, use the split or list command to convert the string to a list.
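
    A minimal sketch of why the rule of thumb matters, assuming a hypothetical input string (the variable raw) that contains an unbalanced brace. Treating the string directly as a list fails, while split still produces a valid list:

```tcl
# Hypothetical external input containing an unbalanced open brace.
set raw "user typed \{ here"

# Treating the raw string as a list raises an error, which catch traps.
set failed [catch {llength $raw} message]
puts "llength on the raw string failed: $failed ($message)"

# split returns a well-formed list whatever the input holds.
set words [split $raw]
puts "split produced [llength $words] elements"
```

    The catch command returns 1 when the script it runs raises an error, so the program can report the problem instead of stopping.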

    The lindex command will return the element of a list from a given location. The first element of a list is the 0'th element.

    The command lindex {a b c} 1 would return b.
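
    As a small illustration of the indexing rule, the sketch below uses a made-up list named letters. It also shows the end index and what lindex returns for a position past the end of the list:

```tcl
# A made-up list used only for this illustration.
set letters {a b c d e f}

set first   [lindex $letters 0]     ;# positions start at 0
set second  [lindex $letters 1]
set last    [lindex $letters end]   ;# "end" names the final element
set missing [lindex $letters 10]    ;# out of range gives an empty string

puts "first=$first second=$second last=$last missing='$missing'"
```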

    Add 4 more labels to the code above to show the sixth element (index 5) of each list. The new result should look like this:

    solution

  2. Buttons let a user tell your program to do something.

    The button command supports several options. The two most important are:

    -text Text to display in the button
    -command The command to evaluate when the button is clicked.

    Here is a simple button, list and label program:

    
      set lst {}
      label .showLst -textvar lst
      button .addToLst -text "add a word" -command "lappend lst element"
      grid .showLst -row 1 -column 1
      grid .addToLst -row 2 -column 1
    

    Type this in and click the button a few times.

    Modify the program to put a number in the list and to change the number each time the button is pressed.

    You'll need to use two commands in the button's -command option. You can do that with the semicolon.
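
    The sketch below shows the semicolon separating two commands in one script. The variable names clicks and status are invented for this illustration, and eval stands in for the button click so the code can run in plain tclsh:

```tcl
# A script like the one a button's -command option would hold.
# Braces delay substitution until the script is evaluated.
set clicks 0
set script {incr clicks; set status "clicked $clicks times"}

# Tk evaluates the -command script on each click; eval mimics two clicks.
eval $script
eval $script

puts $status
```

    If the script were grouped with quotes instead of braces, $clicks would be substituted once, when the script is created, rather than on each click.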

    The results should look like this after you've clicked the button several times:

    solution

  3. The split command will convert a string of data into a list. By default, it splits the string into a list wherever there is whitespace (space, newline or tab):

    
      set lst [split "this is a string"]
      puts [lindex $lst 1]
    is
    

    You can tell the split command to split on a given character instead of a whitespace character.

    This example splits the string on commas instead of spaces:

    
      set lst [split "aa,bb,cc,dd" ,]
      puts [lindex $lst 1]
    bb
    

    We can split on the newline character by splitting on \n. The \n is a special backslash sequence that Tcl knows means a newline marker.

    In this example, each entire line is a list element.

    
      set txt {this is line 1
    this is line 2
    this is line 3
    this is line 4
    this is line 5
    }
      set lst [split $txt \n]
      puts [lindex $lst 1]
    this is line 2
    

    One common paradigm for processing files is to open the file, read the data and then use split to convert it to a list.

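    The code sample below asks the user for a file, reads it, and displays the number of lines in the file when the button is clicked.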

    
    # Request a file name.  Assume success.
    set fileName [tk_getOpenFile]
    set if [open $fileName r]
    set data [read $if]
    close $if
    
    # Initialize the variables to empty strings.
    set lineCount {}
    set wordCount {}
    set charCount {}
    
    # Create a label to describe the value on this line,
    # And then a label to hold the value.
    label .lLine -text "Line Count"
    label .lLineVal -textvariable lineCount
    
    # Add a button to calculate a value to display
    button .bCountLines -text "Lines" -command {set lineCount [llength [split $data \n]]}
    
    # Grid the three widgets on the top line.
    grid .lLine -row 1 -column 1
    grid .lLineVal -row 1 -column 2
    grid .bCountLines -row 1 -column 3
    
    # Add new code here
    
    

    The split command can split a set of data into individual elements on any character or set of characters. The example above splits the data into a list where each list element is a line in the file.

    If split is told to split on an empty string ( {} ) it makes a list in which each character is a separate list element.
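
    A minimal sketch of both kinds of split, using a short made-up string in place of file data. Note that data ending in a newline produces one extra, empty element when split into lines:

```tcl
# Made-up data: two lines, each ending with a newline.
set sample "one two\nthree\n"

set lines [split $sample \n]   ;# one element per line
set chars [split $sample {}]   ;# one element per character

puts "line elements: [llength $lines]"
puts "characters:    [llength $chars]"
puts "last line element is empty: [expr {[lindex $lines end] eq ""}]"
```

    Here split reports three line elements for two lines of text, because the text after the final newline (an empty string) counts as an element.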

    Extend the code above so that it will generate a GUI that looks like this when you select the messages.1 file.

    solution

  4. The if command can check mathematical comparisons (if {$number > 10} {#do something}) or string comparisons (if {$string eq "yes"} {#do something}).
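
    The two kinds of comparison can be sketched as follows. The variable names and values are invented for the illustration:

```tcl
# Invented values for the illustration.
set number 42
set answer "yes"
set name ""

# Numeric comparison with >.
if {$number > 10} {
    set size "big"
} else {
    set size "small"
}

# String comparison with eq (equal) and ne (not equal).
if {$answer eq "yes"} {
    set confirmed 1
} else {
    set confirmed 0
}

# An empty string is compared the same way as any other string.
if {$name eq ""} {
    set nameGiven 0
} else {
    set nameGiven 1
}

puts "size=$size confirmed=$confirmed nameGiven=$nameGiven"
```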

    Whenever a user provides input, the program should check to see if it is a legitimate value.

    The example above assumes that the user will select a valid file.

    When someone clicks the Cancel button on the file selector, tk_getOpenFile returns an empty string as the file name.

    Add lines to the previous code to check whether tk_getOpenFile returns an empty string, and exit when that happens.

  5. The lassign command is useful to extract several pieces of data at a time from a list. The lassign command takes a list and then variable names to assign elements from the list to. This is similar to selecting list elements one at a time with the lindex command.

    The code below compares using lindex and lassign to take the 3 elements from the list a b c and assign them to the variables aVar, bVar and cVar.

    
    set lst {a b c}
    
    # Move values from list to individual variables with lindex:
    set aVar [lindex $lst 0]
    set bVar [lindex $lst 1]
    set cVar [lindex $lst 2]
    
    # Move values from list to individual variables with lassign
    lassign $lst aVar bVar cVar
    

    We can use the lassign command to extract values from a line in a log file with this code:

    
    set if [open messages.1 r]
    set data [read $if]
    close $if
    
    foreach line [split $data \n] {
      lassign $line mon day time sys cmd
      if {$day ne ""} {
        puts "On the $day'th day of $mon at $time a $cmd message was received"
      }
    }
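
    The test for an empty day in the loop above relies on how lassign handles lists that are shorter than the list of variable names. A short sketch, using made-up lists and variable names:

```tcl
# More variables than elements: the extra variables get empty strings.
lassign {Jan 12} mon day time
puts "mon=$mon day=$day time='$time'"

# A blank line is an empty list, so every variable is set to "".
lassign "" blankMon blankDay
puts "blank line gives day='$blankDay'"

# Fewer variables than elements: lassign returns the leftover elements.
set rest [lassign {a b c d e} first second]
puts "first=$first second=$second rest=$rest"
```

    Splitting file data on newlines usually leaves an empty final element, so skipping lines with an empty day avoids printing a message for it.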
    

    The lappend command appends a list element to a list. If you give it a variable name that hasn't been defined, it creates a new list with a single element (the value you are appending to the list).
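
    A minimal sketch of that behavior, using an invented variable named seen:

```tcl
# Make sure the variable does not exist before the first lappend.
unset -nocomplain seen

lappend seen sshd          ;# creates the list with one element
lappend seen cron          ;# appends a second element
lappend seen "two words"   ;# a value with a space stays one element

puts "[llength $seen] elements: $seen"
```

    lappend adds braces around the last value when the list is displayed as a string, so it still counts as a single element.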

    A list can be a list of numbers - for instance a list that's 31 elements long could be the number of messages received on each day of the month.

    We can fill such a list with code like this:

    
    # Make a list of zero counts indexed by day number (1 to 31).
    # Index 0 is unused, so the list holds 32 elements.
    for {set i 0} {$i <= 31} {incr i} {
      lappend countList 0
    }
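
    One possible way to update a count in such a list is sketched below. It uses the lset command, which this lab does not cover (an assumption on my part that it is acceptable here; lreplace would also work), and a made-up list of day numbers in place of values read from the log file:

```tcl
# Build the list of zero counts, indexed by day number.
set countList {}
for {set i 0} {$i <= 31} {incr i} {
    lappend countList 0
}

# Made-up day numbers standing in for the day field of each log line.
foreach day {3 3 17 3} {
    set old [lindex $countList $day]
    lset countList $day [expr {$old + 1}]   ;# replace the element in place
}

puts "day 3: [lindex $countList 3], day 17: [lindex $countList 17]"
```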
    

    Use these code fragments and some labels to show the number of messages received on each day. The results should resemble this:

  6. When you receive data from an outside source (like a log file), you should validate the data before you try to use it.

    We can split a timestamp into hours, minutes and seconds by using the split command to convert hh:mm:ss into a list of numbers like this:

    
    foreach line [split $data \n] {
      lassign $line mon day time sys cmd
      lassign [split $time :] hr min sec
    }
    

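    Like several other languages, Tcl (through the 8.x releases) treats an integer with a leading 0 as an octal (rather than decimal) value. Octal digits run from 0 to 7, so among two-digit values with a leading zero only 00, 01, 02, 03, 04, 05, 06 and 07 are valid.

    The hour values include 08 and 09, which are not valid octal numbers, so using them in arithmetic raises an error.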

    The scan command can be used to examine textual data and convert it to valid integer data.

    
    foreach v {01 02 03 04 05 06 07 08 09} {
      scan $v %d v2
      puts "Read: $v scanned: $v2"
    }
    

    We can use the same variable to receive the new cleaned data.

    
    foreach line [split $data \n] {
      lassign $line mon day time sys cmd
      lassign [split $time :] hr min sec
      scan $hr %d hr
    }
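
    The steps above can be tried on a single line without opening a file. The log line below is invented for the illustration (the real lines in messages.1 may differ), and secondsIntoDay is a made-up variable showing that the cleaned values are safe to use in arithmetic:

```tcl
# An invented log line in the general shape the loop above expects.
set line {Jan 12 08:15:42 myhost sshd[12345]: Accepted password for user}

lassign $line mon day time sys cmd
lassign [split $time :] hr min sec

# Convert each field to a plain decimal integer, dropping leading zeros.
scan $hr  %d hr
scan $min %d min
scan $sec %d sec

set secondsIntoDay [expr {$hr * 3600 + $min * 60 + $sec}]
puts "hour=$hr minute=$min second=$sec seconds into day=$secondsIntoDay"
```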
    

  7. Modify the previous code to look for the number of messages in each hour instead of each day. The results should look like this:

    solution

  8. The previous exercise shows the total number of messages in each hour.

    A more interesting piece of information is how many distinct messages of a certain type there are.

    We can use the lsearch command on each line in the file to see which ones are sshd messages.

    Just as we can split the time on a colon (:), we can split a command name like sshd[12345] on the left square bracket to convert it into a list of name and PID value. Notice that the square bracket needs to be escaped. Try the line without the backslash to see the error message.

    
        lassign [split $cmd \[] name pid   
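
    A sketch of how these pieces might fit together, using invented command fields rather than real log data. The string trim call, which strips the trailing "]:" from the PID, and the list name seenPids are additions for this illustration:

```tcl
# lsearch matches glob patterns by default, so sshd* finds the sshd field.
set line {Jan 12 08:15:42 myhost sshd[12345]: Accepted password for user}
set where [lsearch $line sshd*]
puts "sshd field is element $where"

# Invented command fields standing in for the cmd value of each log line.
set seenPids {}
foreach cmd {sshd[12345]: sshd[12345]: sshd[777]: cron[42]:} {
    lassign [split $cmd \[] name pid
    set pid [string trim $pid "\]:"]   ;# "12345]:" becomes "12345"

    # Remember each sshd PID only the first time it appears.
    if {$name eq "sshd" && [lsearch -exact $seenPids $pid] < 0} {
        lappend seenPids $pid
    }
}

puts "distinct sshd PIDs: $seenPids"
```

    lsearch returns -1 when the value is not in the list, which is what the test for a result below zero checks.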
    

    Modify the previous code to show the number of distinct sshd messages at given times.

    solution

  9. A number is useful, but it's faster to see patterns if they are displayed graphically. The histogram is a good method for displaying quantities.

    There are lots of ways to create graphs and histograms in Tcl/Tk, but a simple one is to use a label with an option we didn't discuss in the lecture.

    The -bg option lets you set the background color of a label. You can use this to quickly show good/bad status in a GUI by setting the background to green or red.

    Syntax: label .windowName ?-option value?

    -bg color Set the background for the label

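    This code creates a string of periods to use as the text in a label, and then creates the label with a black background. The dots are drawn in the default text color (normally black), so each label appears as a solid black bar whose length grows with the count.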

    
      set winNum 0
      foreach count {1 1 2 3 5 8 13 21 34 55 89} {
        set string ""
        for {set i 0} {$i < $count} {incr i} {
          append string .
        }
        label .lcount_$winNum -text $count
        label .lhist_$winNum -text $string -bg black
        grid .lcount_$winNum -row $winNum -column 1 -sticky w
        grid .lhist_$winNum  -row $winNum -column 2 -sticky w
        incr winNum
      }
    

    The image it creates looks like this:

    Use a similar technique to modify the previous code to create a display like this:

    Note that at 1:00 PM there are almost 2000 hits. If you put a period for every message, it will flow off your screen. The expr command is your friend for something like this.
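
    One way to keep the bars on the screen is to scale each count before building the string of periods. The sketch below assumes a scale of one period per 25 messages (an arbitrary choice) and uses made-up counts. It rounds up so that any hour with at least one message still shows a dot, and it builds the string with string repeat instead of a for loop:

```tcl
set scale 25          ;# assumed scale: one period per 25 messages
set dotCounts {}

# Made-up message counts for three hours.
foreach count {3 120 1987} {
    # Integer division, rounded up.
    set dots [expr {($count + $scale - 1) / $scale}]
    lappend dotCounts $dots

    set bar [string repeat . $dots]
    puts [format "%5d %s" $count $bar]
}
```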

    solution

  10. Using nested loops, lsearch and split to determine what types of messages and what dates are in the message file, we can generate a report like this:

    This is the same information that we collected in the previous lab. Using the exec command, generating this report took about a minute, even when it was optimized with temporary files.

    A pure Tcl version using the list commands creates a display in about 2 seconds.

    No solution for this one. You've got all the commands you need to write this application on your own.

    Copyright Clif Flynt 2009