TIDES Data Geo-Temporal-Located


world time zones


word count of TIDES text, plotted by world time zones and date

Process

Visualization process in italics.

Technical process in normal text.

The major challenges with the TIDES data were that the data was entirely text in emails, and that the customer wanted some kind of visualization of time and location information, gleaned from bylines.

I did this with almost no new coding. (I finished a small feature in the mkRingsScript program so I could use it.) Mostly I just hand-massaged data files and used existing tools.

  1. I picked a ten-day period to work with, a sample large enough to show trends and expose the problems, yet still small enough to process by hand. To simplify the display I wanted to reduce the location data from two dimensions (latitude and longitude) to one, and then further reduce it to a small number of categories. For this reason I chose to represent spatial location by time zone. Nominally there are 24 time zones worldwide, each spanning 15 degrees of the 360-degree globe. Here is a table of all of the cities of origin of the messages I used:

    cities of origin

    
           Country      City                       TZ
           -----------  -------------------------  ------
           USA          Austin, TX                 -06:00
           Canada       Toronto, Ontario           -05:00
           USA          Boston, MA                 -05:00
           UK           London                     +00:00
           Nigeria      Lagos                      +01:00
           Egypt        Cairo                      +02:00
           Israel       Jerusalem                  +02:00
           Israel       Palestine                  +02:00
           Israel       Tel-Aviv                   +02:00
           Jordan       Amman                      +02:00
           Lebanon      Beirut                     +02:00
           Russia       Moscow                     +03:00
           Uganda       Kampala                    +03:00
           Yemen        Sana'a                     +03:00
           Iran         Tehran                     +03:30
           Afghanistan  Kabul                      +04:30
           Pakistan     Islamabad                  +05:00
           Pakistan     Karachi                    +05:00
           Pakistan     Lahore                     +05:00
           Pakistan     Peshawar                   +05:00
           Uzbekistan   Tashkent                   +05:00
           India        Mumbai                     +05:30
           Indonesia    Jakarta                    +07:00
           China        Hong Kong                  +08:00
           Malaysia     Kuala Lumpur               +08:00
           Morocco      Rabat                      +08:00
           Philippines  Manila                     +08:00
           Japan        Tokyo                      +09:00
           Australia    Sydney                     +10:00
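
The 15-degree arithmetic behind the nominal zones can be sketched as follows. This is only an illustration: the function name `nominal_zone` and the approximate longitudes are assumptions, and the table above uses civil offsets, which follow political boundaries and can differ from the nominal figure.

```python
import math

def nominal_zone(longitude_deg: float) -> int:
    """Nominal whole-hour offset for a longitude in degrees east (west is negative)."""
    # Each nominal zone is 15 degrees wide and centered on a multiple of 15,
    # so it covers 7.5 degrees on either side of its central meridian.
    return math.floor((longitude_deg + 7.5) / 15.0)

# Approximate longitudes, for illustration only.
for city, lon in [("London", -0.1), ("Boston", -71.1), ("Tokyo", 139.7)]:
    print(f"{city:8s} {nominal_zone(lon):+03d}")
```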
           

    I started with TIDES email numbers 81 through 88, dated 24 Feb to 4 March 2002. By hand I cut and pasted the text into files by date and time zone of publication. A typical filename would be: 2002.02.24_+00.txt, for text dated 24 Feb 2002 and from time zone Zulu+0, which is Greenwich Mean Time (GMT) and the time zone of London, UK. The filename 2002.02.26_+04.txt is for text dated 26 Feb 2002 and from time zone Zulu+4, which includes Afghanistan. (Actually Afghanistan is at Zulu+4:30, but I have ignored the half-hour adjustments for simplicity.)
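
The naming convention is regular enough to express in a few lines. The sketch below is not part of the original workflow (the files were named by hand); `bucket_filename` is a hypothetical helper that assumes offsets written as in the table, for example `+04:30`, and drops the minutes.

```python
import datetime

def bucket_filename(day: datetime.date, tz_offset: str) -> str:
    """Build a name like 2002.02.24_+00.txt from a date and an offset such as '+04:30'."""
    sign, hours = tz_offset[0], int(tz_offset[1:3])
    if sign not in "+-":
        raise ValueError(f"offset must start with + or -: {tz_offset!r}")
    # Minutes are dropped on purpose, mirroring the simplification in the text.
    return f"{day:%Y.%m.%d}_{sign}{hours:02d}.txt"

print(bucket_filename(datetime.date(2002, 2, 24), "+00:00"))
print(bucket_filename(datetime.date(2002, 2, 26), "+04:30"))
```

With this rule Kabul (+04:30) lands in the +04 bucket, matching the Afghanistan example above.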

  2. Before I did anything more elaborate, I wanted to plot a simple, single variable against date and time zone, to see if it made any visual sense. I chose total number of words, since it was easy to determine quickly and would probably be meaningful.

    Given the files I'd created, it was a simple matter to use the UNIX utility wc (word count) to count the words in each file, and then manually paste the resulting numbers into a 2-dimensional AVS field file. Here is the output from the command
    wc -w 2*.txt:
    
        993 2002.02.23_+02.txt
        504 2002.02.23_+03.txt
       2620 2002.02.23_+05.txt
       5507 2002.02.23_+08.txt
       1662 2002.02.24_+01.txt
       3433 2002.02.24_+02.txt
       1222 2002.02.24_+03.txt
       4084 2002.02.24_+05.txt
       5297 2002.02.24_+08.txt
        694 2002.02.24_+09.txt
       1753 2002.02.25_+00.txt
       3912 2002.02.25_+02.txt
       6322 2002.02.25_+03.txt
       1097 2002.02.25_+05.txt
        667 2002.02.25_+07.txt
       1460 2002.02.25_+08.txt
       2163 2002.02.25_-06.txt
        685 2002.02.26_+00.txt
       1111 2002.02.26_+02.txt
       2037 2002.02.26_+03.txt
       2134 2002.02.26_+05.txt
       9339 2002.02.26_+08.txt
       2184 2002.02.26_-05.txt
        668 2002.02.26_-06.txt
        466 2002.02.27_+00.txt
       2870 2002.02.27_+02.txt
       5226 2002.02.27_+03.txt
       3745 2002.02.27_+05.txt
       2497 2002.02.27_+08.txt
        552 2002.02.27_+10.txt
        735 2002.02.28_+02.txt
       3918 2002.02.28_+03.txt
       4321 2002.02.28_+05.txt
       1281 2002.02.28_+08.txt
        630 2002.02.28_-05.txt
       1031 2002.02.28_-06.txt
        938 2002.03.01_+02.txt
       1069 2002.03.01_+03.txt
       6698 2002.03.01_+05.txt
       4986 2002.03.01_+08.txt
       1672 2002.03.02_+02.txt
        471 2002.03.02_+03.txt
       1113 2002.03.02_+04.txt
       1793 2002.03.02_+05.txt
        550 2002.03.02_+07.txt
       3163 2002.03.02_+08.txt
       2469 2002.03.02_+10.txt
       1993 2002.03.02_-05.txt
       7539 2002.03.03_+02.txt
        742 2002.03.03_+04.txt
       3562 2002.03.03_+05.txt
        873 2002.03.03_+08.txt
        682 2002.03.04_+00.txt
        956 2002.03.04_+02.txt
       1405 2002.03.04_+03.txt
       8712 2002.03.04_+05.txt
        522 2002.03.04_+07.txt
       3783 2002.03.04_+08.txt
       1799 2002.03.04_+09.txt
       6945 2002.03.04_-05.txt
     153255 total
           

    ...and here is the resulting AVS field data (in two files):
    tz_freqs.fld
    # AVS field file 
    #
    ndim = 2
    dim1 = 19
    dim2 = 10
    nspace = 2
    veclen = 1
    data = float
    field = uniform
    
    variable 1 file=tz_freqs.txt filetype=ascii skip=2 stride=1 

    tz_freqs.txt
      -8   -7   -6   -5   -4   -3   -2   -1    0    1    2    3    4    5    6    7    8    9   10
    ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
       0    0    0 6945    0    0    0    0  682    0  956 1405    0 8712    0  522 3783 1799    0
       0    0    0    0    0    0    0    0    0    0 7539    0  742 3562    0    0  873    0    0
       0    0    0 1993    0    0    0    0    0    0 1672  471 1113 1793    0  550 3163    0 2469
       0    0    0    0    0    0    0    0    0    0  938 1069    0 6698    0    0 4986    0    0
       0    0 1031  630    0    0    0    0    0    0  735 3918    0 4321    0    0 1281    0    0
       0    0    0    0    0    0    0    0  466    0 2870 5226    0 3745    0    0 2497    0  552
       0    0  668 2184    0    0    0    0  685    0 1111 2037    0 2134    0    0 9339    0    0
       0    0    0    0    0    0    0    0 1753    0 3912 6322    0 1097    0  667 1460    0    0
       0    0 2163    0    0    0    0    0    0 1662 3433 1222    0 4084    0    0 5297  694    0
       0    0    0    0    0    0    0    0    0    0  993  504    0 2620    0    0 5507    0    0
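
The manual paste from the wc output into the grid could also be scripted. The following is a minimal sketch, not what was actually done: the names `grid_from_wc` and `format_rows` are hypothetical, the zone range -8 to +10 and the latest-date-first row order are taken from the listing above, and a date with no files at all would simply get no row.

```python
import re

WC_LINE = re.compile(r"^\s*(\d+)\s+(\d{4}\.\d{2}\.\d{2})_([+-]\d{2})\.txt\s*$")

def grid_from_wc(lines, zones=range(-8, 11)):
    """Turn `wc -w` output lines into (dates, rows); rows run latest date first."""
    counts = {}
    for line in lines:
        m = WC_LINE.match(line)
        if m is None:  # skips the trailing "total" line and blank lines
            continue
        counts[(m.group(2), int(m.group(3)))] = int(m.group(1))
    # Dates in YYYY.MM.DD form sort correctly as plain strings.
    dates = sorted({date for date, _ in counts}, reverse=True)
    rows = [[counts.get((d, z), 0) for z in zones] for d in dates]
    return dates, rows

def format_rows(rows):
    """Right-align each count in a 4-character column, as in tz_freqs.txt."""
    return [" ".join(f"{n:4d}" for n in row) for row in rows]

sample = [
    "   6945 2002.03.04_-05.txt",
    "    993 2002.02.23_+02.txt",
    " 153255 total",
]
dates, rows = grid_from_wc(sample)
print(dates)
print("\n".join(format_rows(rows)))
```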
           

  3. For the first visualization I plotted word count as both vertical height and color. Here it is obvious that the highest word count came from time zone Zulu+8 on 26 Feb 2002.

    I produced the animation above in AVS. The field_to_mesh module let me easily control the height and color, experimenting until I liked the result. The most difficult thing was getting the labels of date and time zone correctly positioned, which I did mostly by trial and error, editing this label file:
    zones.label
    colors dropshadow left 
    5  29
    Z-08 0.02   0.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z-07 0.02   1.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z-06 0.02   2.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z-05 0.02   3.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z-04 0.02   4.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z-03 0.02   5.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z-02 0.02   6.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z-01 0.02   7.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z+00 0.02   8.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z+01 0.02   9.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z+02 0.02  10.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z+03 0.02  11.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z+04 0.02  12.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z+05 0.02  13.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z+06 0.02  14.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z+07 0.02  15.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z+08 0.02  16.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z+09 0.02  17.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    Z+10 0.02  18.0  9.0  0.0   0.00  0.00  0.00 0.9 0.9 0.9
    04-Mar-2002  0.03   0.0  9.0  0.0   -0.20  0.00  0.00 0.9 0.9 0.9
    03-Mar-2002  0.03   0.0  8.0  0.0   -0.20  0.00  0.00 0.9 0.9 0.9
    02-Mar-2002  0.03   0.0  7.0  0.0   -0.20  0.00  0.00 0.9 0.9 0.9
    01-Mar-2002  0.03   0.0  6.0  0.0   -0.20  0.00  0.00 0.9 0.9 0.9
    28-Feb-2002  0.03   0.0  5.0  0.0   -0.20  0.00  0.00 0.9 0.9 0.9
    27-Feb-2002  0.03   0.0  4.0  0.0   -0.20  0.00  0.00 0.9 0.9 0.9
    26-Feb-2002  0.03   0.0  3.0  0.0   -0.20  0.00  0.00 0.9 0.9 0.9
    25-Feb-2002  0.03   0.0  2.0  0.0   -0.20  0.00  0.00 0.9 0.9 0.9
    24-Feb-2002  0.03   0.0  1.0  0.0   -0.20  0.00  0.00 0.9 0.9 0.9
    23-Feb-2002  0.03   0.0  0.0  0.0   -0.20  0.00  0.00 0.9 0.9 0.9
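
Since the label lines follow a regular pattern, they could be generated rather than typed. The sketch below only reproduces the column pattern visible in the listing above; it makes no claim about what each column means to AVS, it leaves out the two header lines, and the function names are hypothetical.

```python
import datetime

def zone_labels(zones=range(-8, 11), y=9.0):
    """One label line per time-zone column, spaced one unit apart along x."""
    return [
        f"Z{z:+03d} 0.02 {x:5.1f} {y:4.1f}  0.0   0.00  0.00  0.00 0.9 0.9 0.9"
        for x, z in enumerate(zones)
    ]

def date_labels(last_day=datetime.date(2002, 3, 4), n_days=10):
    """One label line per date row; the latest date gets the largest y."""
    lines = []
    for i in range(n_days):
        day = last_day - datetime.timedelta(days=i)
        y = float(n_days - 1 - i)
        # %b yields English month abbreviations under the default C locale.
        lines.append(
            f"{day:%d-%b-%Y}  0.03   0.0 {y:4.1f}  0.0   -0.20  0.00  0.00 0.9 0.9 0.9"
        )
    return lines

print("\n".join(zone_labels() + date_labels()))
```

Changing a single argument such as `y` then moves a whole row of labels at once, which shortens each round of trial and error.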
           

  4. Having gotten the word count display looking good, I next wanted to superimpose word frequency information in the form of colored rings. Initially I counted the 5 words:
    • America
    • Arab
    • Iran
    • Iraq
    • Laden
    because I knew from previous experiments they were all present in the TIDES text. The counts are represented by the sizes of the vertical rings, left to right. The colors range from red to green over the range of words found, zero to forty. The horizontal ring linking each set of verticals represents their average.
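
As a rough illustration of the ring encoding, the sketch below maps a count to a color and computes the average for the linking ring. A linear red-to-green ramp is an assumption (the text gives only the endpoints and the zero-to-forty range), and `ring_color` and `ring_average` are hypothetical names, not functions of mkRingsScript.

```python
def ring_color(count, lo=0, hi=40):
    """Linear red-to-green ramp: `lo` maps to pure red, `hi` to pure green."""
    # Clamp to the range, then normalize to 0..1.
    t = (min(max(count, lo), hi) - lo) / (hi - lo)
    return (1.0 - t, t, 0.0)  # (red, green, blue)

def ring_average(counts):
    """Mean of one set of counts, as shown by the horizontal linking ring."""
    return sum(counts) / len(counts)

# Counts reported in this step for 2002.02.26_+08.txt.
counts = [30, 2, 0, 0, 0]
for c in counts:
    print(c, ring_color(c))
print("average:", ring_average(counts))
```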

    I chose for this test to only show the frequencies of the two highest peaks. I also just "impaled" each set of rings on the associated peak.

    I created shell scripts that use the UNIX utilities grep (global regular expression print) and wc -l (line count) to count the occurrences of specific words in the files. This yielded these word counts:
    2002.02.26_+08.txt
         30
          2
          0
          0
          0
    2002.03.04_+05.txt
          9
         11
         21
         12
          0
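
The shell scripts themselves are not shown here. Assuming they piped grep into a line count, the equivalent logic looks like the sketch below, where `count_matching_lines` is a hypothetical name. Note that this approach counts lines containing the word, so a line that mentions the word twice is counted once, and a plain substring test also matches longer words such as "Iranian" for "Iran".

```python
def count_matching_lines(text: str, word: str) -> int:
    """Count lines containing `word`, like piping grep output into a line count."""
    # Plain substring test, case-sensitive, as grep is by default.
    return sum(1 for line in text.splitlines() if word in line)

sample = "Iraq and Iran\nIraq, Iraq\nnothing here\n"
for word in ["America", "Arab", "Iran", "Iraq", "Laden"]:
    print(word, count_matching_lines(sample, word))

# Counting every occurrence instead gives a different figure for "Iraq".
print("Iraq occurrences:", sample.count("Iraq"))
```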

    I pasted the resulting counts into a text file formatted so my C program mkRingsScript could read it. This program makes a shell script which then runs other programs to create data files. After running everything I ended up with rings superimposed over the total word counts, representing word frequency information.

  5. Lastly I wanted to see how densely I could display the data. I raised the base rings up on vertical lines, away from the peaks, and put a set of linked rings on every peak, in other words, above every space-time coordinate that had more than zero words.

    I also changed the words I was counting, to this list of 6 words:

    • war
    • peace
    • military
    • civil
    • India
    • Israel
    based on further experiments with the word counts in this batch of messages.

    This was easy to do with the pre-existing tools; I just modified my search scripts to pass more data in the input file to mkRingsScript, and selected a higher Z offset than zero.
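
Picking out "every space-time coordinate that had more than zero words" amounts to a scan over the grid. The sketch below illustrates only that selection; `ring_sites` is a hypothetical name, the unit of the Z offset is not specified here, and the input format that mkRingsScript actually reads is not reproduced.

```python
def ring_sites(rows, z_offset=0.0):
    """Yield (x, y, z_offset, count) for every grid cell with a non-zero word count."""
    for y, row in enumerate(rows):
        for x, count in enumerate(row):
            if count > 0:
                yield (x, y, z_offset, count)

# Tiny stand-in grid: two dates by two zones.
grid = [[0, 5], [7, 0]]
for site in ring_sites(grid, z_offset=2.0):
    print(site)
```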

Last update 28-Mar-2002 by ABS.