The Data Spec(ification)

Ryan Shriver and the other people named in the contributor list below were the principal architects of the DeadLists Data Spec. This document lays out the form in which setlists will be recorded for the DeadLists Project. Comments, additions, questions, praise, and corrections should be posted to the DeadLists mailing list (with a cc to Ryan). Please note that this document is being put into action through the scripts and checks used by Ben Brewer's pages and through Tim Buller's database. Rather than relying solely on what you see here, visit Ben's pages first to see where this data spec comes into play. Where this spec and Ben's pages contradict one another, please treat Ben's pages as definitive: Ben's codification of these specifications tracks ongoing work and dovetails closely with the database efforts of Tim and Kevin Weil.

Version 1.3.2
Last update was on April 25, 1997

Introduction
This data specification guide defines the format that setlist data
used for the DeadLists project shall adhere to. The specification
was created to aid both programmers and setlist contributors in their
efforts to create a searchable interface for Grateful Dead setlists.
Setlist contributors should read this document to understand the
format setlist data should take.

Contributors to this document
   * Steve Zimmerman (saz@well.com) created the first version of this
     document.
   * Nathan Wolfson (nathan@well.com) updated Steve's version 1.0 using
     input from the DeadLists Mailing List.
   * Allen Baum (baum@apple.com) updated Nathan's version 1.1  in order
     to get his two cents in & make the spec a bit more rigorous.
   * Ryan Shriver has tried to put a final polish
     on this document based on further discussions on the mailing list.

Changes from version 1.3.1 to 1.3.2
        - Dates must be an 8-character string of the form MM/DD/YY (e.g. 02/25/77).
        - Changed syntax of Song Modifiers. Added ' Tuning'. Removed '-continued'.
        - Removed the : song delimiter (because it's also used for timings). All others are ok.
        - Modified timing maps so they must appear in the COMMENTS field and not in the setlist fields.
        - Set timings appear at the beginning of a set (enclosed in brackets) and use the ; delimiter.
        - Removed literal comments. There are only reference comments, which are numbered 1 - 99 and
          enclosed in parentheses in the setlist fields.
        - The order of reference comments and song timings doesn't matter.
        - Simplified the COMMENTS spec to allow the use of common sense :-)
        - Simplified the CONTRIBUTORS field to adjust for the common working consensus.
        - Updated example of the record of a show.
        - Did not attempt to modify the BNF notation of data spec.

Changes from version 1.21 to 1.3.1:
        - Cleaned up the appearance of the document while adding some general information.
        - Revised the format for data.
        - Listed the US and Canadian abbreviations.
        - Record separator is now a blank line. See the example of a record below.
        - Initial field names (BAND, DATE, etc.) are mandatory for every record.
        - Tabs must separate field names from field data. Tabs are not to be used anywhere
          else in the record.

Changes from version 1.2 to version 1.21:
        - An asterisk is used to separate sub-subfields, e.g. comment items in a comment
          (not new), and timing items in a timing-map.
        - Many minor readjustments of the technical definition for clarity.
        - Timings and timing maps are allowed inside the extended-silence separator ';;'
        - Shows (records) are separated by <newpage>, which is generally a control-L character.
        - New definition of <contrib-field> to eliminate ambiguities that would have
          made parsing rather difficult: email address now in angle brackets.



Format
For the purpose of this document, each Grateful Dead concert from 1965 to
1995 shall be called a "show". For each show, there is certain historical data
that should be captured in an organized manner. A "record" is the structure that
will hold the historical data for each show. For every show, there is
one and only one record.

A record consists of 13 lines. The first line is a blank line,
identified solely by a carriage return (or NEWLINE character).
The next 12 lines consist of field names and their corresponding data.
The field names are (in order):

BAND
VENUE
CITY
STATE
DATE
SET1
SET2
SET3
ENCORE
COMMENTS
RECORDING
CONTRIBUTORS

Each of the above field names is described in its own section below.

Field names and field data will be separated by one TAB character.
Each field shall be separated by a NEWLINE (carriage return) character,
which is placed at the end of the field data. The format is:

FIELD_NAME<tab>Field Data<carriage_return>
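
The layout above can be read with a few lines of Python. This is only a sketch under stated assumptions: the names FIELD_NAMES and parse_record are made up for illustration, and the code assumes one record has already been isolated from its neighbors. It is not the script used on Ben's pages or in Tim's database.

```python
# Field names in the order the spec lists them.
FIELD_NAMES = [
    "BAND", "VENUE", "CITY", "STATE", "DATE", "SET1", "SET2", "SET3",
    "ENCORE", "COMMENTS", "RECORDING", "CONTRIBUTORS",
]


def parse_record(text):
    """Parse one show record into a dict keyed by field name.

    Raises ValueError if a field is missing or out of order.
    """
    # Drop the leading blank line (the record separator) and any other
    # empty lines; a field with no data still has its name on the line.
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(lines)}"
        )
    record = {}
    for expected, line in zip(FIELD_NAMES, lines):
        # The first TAB separates the field name from the field data.
        name, _, data = line.partition("\t")
        if name != expected:
            raise ValueError(f"expected field {expected!r}, got {name!r}")
        record[expected] = data.strip()
    return record
```

An empty field such as SET3 comes back as an empty string, so callers can test for it directly.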


Field Names
Field names are used to define what data should be included in each
field. Please pay close attention to the format of the field data.


BAND
Grateful Dead


VENUE
The venue where the concert took place. When concerts took place on college
campuses, it is important to list the entire name of the school. For example,
UCLA should be University of California, Los Angeles. When the venue is
unknown, a question mark should be entered in the venue field.


CITY
The city where the concert took place.


STATE
The two-letter abbreviation (listed below) of the state (or Canadian province)
where the concert took place. When the concert took place outside the
United States, use the full name of the country where it took place.

United States Abbreviations

AK   Alaska
AL   Alabama
AR   Arkansas
AZ   Arizona
CA   California
CO   Colorado
CT   Connecticut
DC   District of Columbia
DE   Delaware
FL   Florida
GA   Georgia
HI   Hawaii
IA   Iowa
ID   Idaho
IL   Illinois
IN   Indiana
KS   Kansas
KY   Kentucky
LA   Louisiana
ME   Maine
MA   Massachusetts
MD   Maryland
MI   Michigan
MN   Minnesota
MO   Missouri
MS   Mississippi
MT   Montana
NC   North Carolina
ND   North Dakota
NE   Nebraska
NH   New Hampshire
NJ   New Jersey
NM   New Mexico
NV   Nevada
NY   New York
OH   Ohio
OK   Oklahoma
OR   Oregon
PA   Pennsylvania
PR   Puerto Rico
RI   Rhode Island
SC   South Carolina
SD   South Dakota
TN   Tennessee
TX   Texas
UT   Utah
VA   Virginia
VT   Vermont
WA   Washington
WI   Wisconsin
WV   West Virginia
WY   Wyoming

Canadian Abbreviations

AB   Alberta
BC   British Columbia
LB   Labrador
MB   Manitoba
NB   New Brunswick
NF   Newfoundland
NS   Nova Scotia
NT   Northwest Territories
ON   Ontario
PE   Prince Edward Island
PQ   Quebec
SK   Saskatchewan
YT   Yukon Territory


DATE
A date is an 8-character string of the form mm/dd/yy (e.g. "02/04/88").

In the event of uncertain dates, unknown elements will be replaced by question marks
(e.g.
      "Some day in Feb. of 1984" shall be "02/??/84" and
      "some day in 1969"         shall be "??/??/69" and
      "no clue whatsoever"       shall be "??/??/??" ).

Following the date are flags to indicate other information about a
given show: "a" = Early Show, "b" = Late Show, "@" = Opening Sets in which
band member(s) participated, "+" = Sound Check.

For example, an early show opener on February 4, 1992 shall be
designated as 02/04/92a@.

In the event that a precise date or show is stated, but its accuracy
is questioned, a trailing question mark will be added after the date and
flags (e.g. 04/01/68a?).
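
As a rough illustration of the rules above, the following Python sketch splits a DATE field into its parts. The names DATE_RE, FLAG_MEANINGS, and parse_date are assumptions made for this example. The pattern accepts the flags in any order and does not check that the month and day are in range.

```python
import re

# mm/dd/yy with "??" allowed for each part, then flags, then an
# optional trailing "?" that marks the whole entry as doubtful.
DATE_RE = re.compile(
    r"^(?P<month>\d\d|\?\?)/(?P<day>\d\d|\?\?)/(?P<year>\d\d|\?\?)"
    r"(?P<flags>[ab@+]*)(?P<doubtful>\??)$"
)

FLAG_MEANINGS = {
    "a": "early show",
    "b": "late show",
    "@": "opening set",
    "+": "sound check",
}


def parse_date(field):
    """Return the parts of a DATE field as a dict.

    Unknown elements ("??") are returned as None.
    """
    match = DATE_RE.match(field.strip())
    if match is None:
        raise ValueError(f"not a valid DATE field: {field!r}")

    def number(text):
        return None if text == "??" else int(text)

    return {
        "month": number(match.group("month")),
        "day": number(match.group("day")),
        "year": number(match.group("year")),
        "flags": [FLAG_MEANINGS[c] for c in match.group("flags")],
        "doubtful": match.group("doubtful") == "?",
    }
```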

Note: Using the supplemental flags above allows each show record to use only
      three set list fields and avoids a number of problems associated with
      having upward of seven sets per show record. Otherwise, there would be
      problems such as:

      * Simple queries (e.g. finding all times a particular song was
      played in a particular set) will not work, because there will be no way to
      determine which of the "Sets" is the first GD set vs. an opening set.
      * When data is imported into a database application for manipulation,
      the file size would be significantly larger than if the flagging method
      were used, because sufficient room would have to be allocated in each of
      the (perhaps seven) set list fields to handle a full set list, when in
      reality relatively few of the fields above SET3 would ever be used.


SET1, SET2, SET3, ENCORE
Songs shall be named according to the mutually agreed upon list at
http://www.deadlists.com/ -> Resources -> Song Names.
New song names, jam names, tuning names, and so on, will be added as needed.

Song Modifiers
There will be standard song modifiers to note special occurrences
that we want to be able to filter or sort differently from other regular
song names.
These are:

      "(song-name) Jam" specifies a recognizable insertion of a known song
                        that doesn't actually become that song.
      "(song-name) Reprise" specifies the return to a song's conclusion,
                            whether or not begun during that show.
      "(song-name) Tease" specifies the hint of a song that then becomes
                          something else.
      "(song-name) Tuning" specifies that the initial notes of a song are
                           played as a tuning warm-up.
      The use of the song name 'Jam' refers to an unspecified instrumental
      tune. If and when it gets classified, it will become "(song-name) Jam".
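
One way a script could separate a modifier from its song name is sketched below in Python. The names MODIFIERS and split_modifier are hypothetical. The sketch assumes the modifier is always the last word of the name, so an agreed song title that happens to end in one of these words would need special handling against the Song Names list.

```python
# The four modifiers defined above.
MODIFIERS = ("Jam", "Reprise", "Tease", "Tuning")


def split_modifier(name):
    """Return (base_song, modifier).

    The modifier is None for a plain song name, including the bare
    name 'Jam' for an unspecified instrumental tune.
    """
    name = name.strip()
    for modifier in MODIFIERS:
        suffix = " " + modifier
        if name.endswith(suffix):
            return name[:-len(suffix)], modifier
    return name, None
```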

Song Delimiters
Note: All song delimiters should be preceded and followed by one blank space!!

      ';'       Songs not continuing into each other are separated by a
                semicolon (e.g. (song1) ; (song2)).
      '>'       Songs that segue into each other by means of a defined jam or
                contiguous transition -- a guzinta -- are separated by the
                character '>' (e.g. (song1) > (song2)).
      '~'       Songs that segue into each other, not by means of a defined
                transition, but through an intentional pause are separated by a tilde
                (e.g. (song1) ~ (song2)).
      ';;'      In the event that two songs are separated by a pause of greater
                than 60 seconds, a separator of two consecutive semicolons is used
                (e.g. (song1) ;; (song2)).
                It's possible to insert the length of the pause between the semicolons
                (e.g. (song1) ; [3:20] ; (song2)) using the standard timing format.
      '%'       Where a tape splice, flip or pause prevents examination of the
                space between songs, a percent sign is used (e.g. (song1) % (song2)).
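
Because every delimiter is surrounded by single spaces, a set field can be split with one regular expression. The Python sketch below is an illustration only; DELIMITER_RE and split_set are made-up names. It does not treat a pause length placed between two semicolons specially (the timing simply shows up as its own item), and it leaves set timings, song timings, and reference marks attached to the items.

```python
import re

# A delimiter is ';;', ';', '>', '~' or '%' with one space on each side.
# ';;' is listed first so it is not read as two separate semicolons.
DELIMITER_RE = re.compile(r" (;;|[;>~%]) ")


def split_set(field):
    """Split a set field into (items, delimiters).

    items[i] and items[i + 1] are joined by delimiters[i].
    """
    # With a capturing group, re.split keeps the delimiters in the
    # result, alternating with the text between them.
    parts = DELIMITER_RE.split(field.strip())
    return parts[0::2], parts[1::2]
```

Keeping the delimiters alongside the items lets a later step tell a segue ('>') from a clean break (';').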

Song and Set Timings
Song timings, where known, are enclosed in square brackets directly
following the item being timed (e.g. (song1) [2:33] ; (song2)).

Set timing, where known, comes at the beginning of the setlist field and
is separated from the first song by the ; character
(e.g. SET1    [58:38] ; (song1) ; (song2)).

Timing maps are collections of timings describing the substructure of a song.
They are placed in the COMMENTS field and enclosed in curly brackets {}. There
should be a reference mark next to the timing-map song in the setlist field
(e.g. COMMENTS    (1) {intro: [1:20] verses1&2 [3:20] instrumental chorus
[1:55] verse 3 & fadeout [3:22] var of theme [2:43] jazzy jam [5:58]}).
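
The bracketed timings are easy to turn into seconds, which makes them sortable and summable. The following Python sketch shows one possible approach. TIMING_RE, timing_to_seconds, and parse_timing_map are illustrative names, and the sketch assumes each timing in a timing map follows the phrase it describes.

```python
import re

# [m:ss] or [h:mm:ss]; the hours part is optional.
TIMING_RE = re.compile(r"\[(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\]")


def _seconds(match):
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


def timing_to_seconds(text):
    """Return the first timing found in text as seconds, or None."""
    match = TIMING_RE.search(text)
    return None if match is None else _seconds(match)


def parse_timing_map(text):
    """Return (description, seconds) pairs from a {...} timing map."""
    body = text[text.index("{") + 1:text.index("}")]
    phases = []
    position = 0
    for match in TIMING_RE.finditer(body):
        # The description is whatever sits between the previous timing
        # and this one, minus spaces and '*' separators.
        description = body[position:match.start()].strip(" *")
        phases.append((description, _seconds(match)))
        position = match.end()
    return phases
```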


COMMENTS
Comments are usually notes such as guest artists on songs, acoustic
versions, and any other information that may be useful but not
discernible from the setlist information alone. 

Reference marks are numbered references to comments contained in the
COMMENTS field. The numbers (from 1 through 99) are placed in parentheses
following the item being commented on. For each numbered reference mark,
there must be an associated reference mark in the COMMENTS field, followed
by the corresponding comment. For example:

SET1    Little Red Rooster (1) [4:45] ; Bertha ;
COMMENTS        (1) with Carlos Santana.

An item may carry more than one reference mark, and more than one item
may share the same reference mark. Thus, a song might have two reference
marks, one referring to guests and one to an acoustic performance, or two
songs might refer to the same guest artist comment.
If a timing is present, the order of the timing and the reference mark
does not matter. For example, all of the following are valid:

SET1    Little Red Rooster (1) [4:45] ; Bertha (1) ;
SET1    Little Red Rooster [4:45] (1) ;
SET1    Little Red Rooster [4:45] (1) (2) ;
SET1    Little Red Rooster (1) (2) [4:45] ;
SET1    Little Red Rooster (1) [4:45] (2) ;

COMMENTS        (1) with Carlos Santana. (2) Bob on acoustic.

Every description in the COMMENTS field except the first must be labeled
with a reference mark. The first description in the COMMENTS field, if
unmarked, refers to the entire show and/or set, depending on context.

If part of the show was acoustic, then an appropriate entry would be:

COMMENTS        First set was acoustic, second set was electric.
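
The rule that every reference mark in a setlist field needs a matching mark in the COMMENTS field can be checked mechanically. The Python sketch below is one hedged way to do it; REF_MARK_RE, reference_marks, and unmatched_marks are names invented for this example.

```python
import re

# A reference mark is a number from 1 through 99 in parentheses.
REF_MARK_RE = re.compile(r"\(([1-9]\d?)\)")


def reference_marks(text):
    """Return the set of reference mark numbers found in text."""
    return {int(number) for number in REF_MARK_RE.findall(text)}


def unmatched_marks(set_fields, comments):
    """Compare marks used in the set fields with those in COMMENTS.

    Returns (missing_comments, unused_comments) as sorted lists:
    marks used in a set but absent from COMMENTS, and marks present
    in COMMENTS that no set refers to.
    """
    used = set()
    for field in set_fields:
        used |= reference_marks(field)
    defined = reference_marks(comments)
    return sorted(used - defined), sorted(defined - used)
```

An empty pair of lists means the marks in the record are consistent.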


RECORDING
This category will be used to track the kinds of circulating recordings of
each performance using standard source abbreviations. An example of the
kind of information this might include (focused on two early years) can be
found at http://www.winternet.com/~edoherty.
For later years, specifications such as the various kinds of audience
tapes available for a particular show and what kind of soundboard tapes
exist will be included (though we might want to include some default
assumptions regarding the availability of at least an analog audience tape
from the taper's pit for every show since the section was created in '84).


Proposed abbreviations:

   * format:
      o C=cassette
      o R=open reel
      o P=PCM
      o D=DAT
      o B=HiFiBeta
      o V=HiFiVHS
     Cassettes and open reel can also list the equalization or compression
     used, the tape type, and the speed, such as:
      o equalization or compression: A=Dolby A, B=Dolby B, C=Dolby C, D=DBx
      o tape type: MO=metal oxide, CH=chromium
      o speed: 75=7 1/2 ips, 35=3 3/4 ips
   * source:
      o SB=SBD patch
      o MS=matrix SBD recording
      o FM=FM broadcast
      o PF=pre-broadcast FM recording
      o A?=audience (unknown location)
      o AF=audience, front of SBD
      o TS=audience taper's section
   * generation:
      o M=master
      o 1-9 =  generation number (master=0 gen)


CONTRIBUTORS
The source of each set list will be noted with the names and email addresses of the
people who submitted it. Some dates will have many contributors. These will
be deleted with time as the consensus emerges that "this list is correct".
Comments about how the contributor received the setlist should be noted in
the COMMENTS section. An example entry would be:

CONTRIBUTORS    Gordon Sharpless (paleo550@philly.infi.net), Jeff Tiedrich (jeff@tiedrich.com)



Example of a record of a show (Data is totally bogus):

BAND    Grateful Dead
VENUE   Oakland Coliseum
CITY    Oakland
STATE   CA
DATE    10/17/91a
SET1    [59:26] ; Touch Of Gray [5:04] ~ Little Red Rooster (1) ; Lazy River Road ; When I Paint My Masterpiece (1) ; Childhood's End [5:32] ; Cumberland Blues > Promised Land (2)
SET2    [45:23] ; Shakedown Street [10:34] ; Samson And Delilah ; So Many Roads ; Playing In The Band > Corinna > Drums > Space [30:01] (3) > Playing In The Band Reprise % Sugar Magnolia [6:56]
SET3
ENCORE  Black Muddy River ; Box Of Rain
COMMENTS        Dave Matthews Band opened. (1) Bobby on acoustic. (2) with Carlos Santana. (3) {noodling [5:26] *teasing [9:32]* real jam [5:03] }
RECORDING       180 SBD
CONTRIBUTORS    Gordon Sharpless (paleo550@philly.infi.net), Jeff Tiedrich (jeff@tiedrich.com)
<blank line>


          DeadLists Formatting Specification Proposal Formal Description

Version 1.21
Last update was 5/29/96.

Notation
An item inside angle brackets ( "<..>" ) is a type of an object.
      The type is either a literal string, or constructed of other objects.

Items within curly brackets  ( "{..}" ) are items that must occur together.
      Normally this is to delimit a range, or set of repeated items.
Items inside square brackets ( "[..]" ) are optional.
Items separated by a dash ( "-" ) indicate a selection range.
      The first item indicates the lowest value of the range, and the second
      item indicates the highest.
Items separated by a vertical bar ( "|" ) indicate selection items;
      exactly one of the items is to be selected. If inside square brackets
      then none of the items is also valid.
Items followed by an  asterisk ( "*" )   can be repeated zero or more times.
      To force something to be repeated one or more times, use <item> {<item>}*
Items inside single quotes ( "'..'" ) are literals. These can be used to
      insert characters that would otherwise be interpreted as formatting
      characters, such as brackets. To insert a single quote, use two
      consecutive single quotes.
Newlines and whitespace inside item definitions are for formatting only,
      and are NOT part of the field definition; use <crlf> if a newline is
      required, <ws> if spaces are required, and <ws?> if whitespace is
      permitted. The amount of whitespace is irrelevant.

Several objects are predefined literals:
      <crlf> is a newline (either a control-M or control-J
               or the character pair control-M,control-J).
      <newpage> is control-L.
      <tab> is a tab character, control-I.
      <space> is a space character
      <ws>  is  <space> [ <space> ]*
      <ws?> is          [ <space> ]*
---------------------------------------------------------
A <show> is    <header> <band-field> <date-field>   <venue-field>  <city-field>
           <state-field> <set-field>  <set-field>     <set-field> <encore-field>
           <comments-field>     <recording-field> <contrib-field>

A <header> is   BAND <tab> DATE <tab> VENUE <tab> CITY   <tab> STATE<tab>
              SET1 <tab> SET2 <tab> SET3  <tab> ENCORE <tab> COMMENTS<tab>
              RECORDING<tab>CONTRIBUTORS<crlf>

A <band-field> is (generally): "Grateful Dead"<crlf>

A <date-field> is <month>'/'<day>'/'<year>['?'][<show>]<crlf>
  <month> is   { 01-12 } | {1-12 } | '??'
  <day>   is   { 01-31 } | {1-31 } | '??'
  <year>  is { [ 19-20 ]   {00-99}}| '??'
  <show>  is   [ 'a' | 'b' ] [ '@' | '+'] [ '?' ]
      Much of this is just a complicated way to say that numbers below 10
      can have an optional leading zero for day and month.
      Note that years without leading 19 or 20 assume 19.
      For <show>, 'a' indicates early show, 'b' indicates late show,
      '@' indicates opening act with band members, '+' indicates soundcheck.
      ****there is some ambiguity here, having to do with '?', especially
      if there is no show characters, but there is a show '?', etc.****

A <venue-field> is   <name of venue> <crlf>

A <city-field>  is   <name of city > <crlf>

A <state-field> is { <name of state> | <name of country> } <crlf>
      Using the two-letter postal abbreviation for state, or two-letter
      internet domain name for non-US countries, is preferred.

A <set-field> is                        [ <comment-info> ]
                        [ <song-name>[<ws><comment-info> ]
           [ <song-sep> [ <song-name>[<ws><comment-info> ]] ]* ]
          The initial <comment-info> refers to the set; subsequent
          <comment-info>s have exactly the same format, and refer to the songs
          they immediately follow. An acoustic set, for example, might be
          prefixed with the comment keyword "Acoustic:". Alternately, it might
          have a comment-reference "(1)" which would refer to a comment
          "(1) Acoustic: ". Note that <set-field> is used for SET1, SET2, and
          ENCORE, and can be entirely blank.

      A <comment-info> is [<timing>] [<timingmap>]
                          [ <ref-mark> | { '(' <lit-ref> ')' } ]*
          A <timing> is '[' [[<hr> ':' ]<minsec>] ':' <minsec> ']' <ws?>
              A <hr>     is { 0-24 | 00-24}
              A <minsec> is {0-59 | 00-59}
          A <timingmap> is '{' <phrase-timing> [ '*' <phrase-timing>]* '}'
              A <phrase-timing> is <descrip><timing><ws?>
          A <ref-mark> is '(' <1-99> ')' <ws?>
          A <lit-ref>  is  [<keyword>':'] <descrip>
                     [ '*' [<keyword>':'] <descrip> ]*
                  Note that <descrip> is arbitrary text that doesn't start
                  with a digit, or contain '(', ')', '{', '}', '[', or ']'.
                  Valid <keyword>s include "Acoustic:", "Guest:", "Opener:".
                  A complete list can be found in   ***insert URL here***.
      A <song-name> is <name of a song>['-'<song-mod>]
          This list of valid <song-name>s can be found in
               ***insert URL here***
          A <song-mod> is '-jam' | '-continued' | '-reprise' | '-tease'
      A <song-sep> is <ws> { ':' | ';'[[<timing>][<timingmap>]';']
                           | '>' | '~'  | '\' } <tab>
          Note that <song-name>s better not contain the characters ":;>~\".

      This definition implies:
      -Anything inside curly  brackets is a timing map.
      -Anything inside square brackets is a timing (even inside curly brackets.)
      -Anything inside parens is a literal comment or reference mark.
      Thus, we can find <song-name> by filtering out everything inside one of
      these kinds of brackets.
      The end of a <song-name> is a <song-sep> (or <crlf>).

A <comments-field> is      [ <comment-subfield> <tab>]
                 [<ref-mark> <comment-subfield>]
           [<tab> <ref-mark> <comment-subfield>]*
    A <comment-subfield> is [<timing>] [<timingmap>] [<lit-ref>]
      The initial <comment-subfield> refers to the show.
        The rest must have <ref-mark>s that match <ref-mark>s in <set-field>s.
      The <comment-subfield> definition is very similar to <comment-info>,
      except: <lit-ref>s are not in parens and the <ref-mark>s are separated.
      The field definition is similar to a <set-field>, with <tab> instead of
      <song-sep> and <ref-mark> instead of <song-name>, so:
      -Anything inside curly  brackets is a timing map.
      -Anything inside square brackets is a timing (even inside curly brackets.)
      -Anything inside parens is a reference mark.
      Thus, we can find a comment by filtering out everything inside one of
      these kinds of brackets. Multiple items in a comment are separated by
      an asterisk, as usual.

A <recording-field> is  [<fmt> [<eq>]] [<src>] [<gen>]
      A <fmt> is {'C'<tape>} | {'R'<speed><tape>} | 'P' | 'D'  | 'B' | 'V'
              (C=cassette, R=open reel, P=PCM,  D=DAT, B=BetaHiFi, V=VHSHiFi)
          A <tape> is 'MO' | 'CH' | ********list others here!*******
              (MO is metal oxide, CH is chromium, etc....)
          A <speed> is [ '7.50' | '3.75' ]
      A <eq> is 'A' | 'B' | 'C' | 'D'
              (A=dolby A, B=Dolby B, C=Dolby C, D= Dbx)
      A <src> is 'SB' | 'MS' | 'FM' | 'PF' | 'A?' | 'AF' | 'TS'
              (SB=SBD patch, MS=matrix SBD recording, FM=FM broadcast,
               PF=pre-broadcast FM recording, A?=audience unknown location,
               AF=Front of Sbd Audience,      TS=taper's section)
      A <gen> is 'M' | ('1'-'9')

A <contrib-field> is
    [[ '<' <email-addr> '>'] ['('<name>')']  <verif-code><ws?><lit-ref><tab>]*
      A <verif-code> is  'V:' | 'A:' | 'O:'
           (V=verified against tape, A=assumed, O=other, described in <descrip>)
      This definition implies:
      -Anything inside angle  brackets is an email address.
      -Anything inside parentheses     is a name.