1. Introduction
1.1. About MNX
MNX is a proposed music notation markup standard. Its aim is to improve MusicXML in fundamental ways, while retaining many of its key concepts, terms and features.
Rather than attempting to create a normative specification for MNX from the start, this document tries to describe key design goals. Each goal is accompanied by examples of proposed MNX markup that directly illustrate it. In some cases, alternative examples are included.
The focus is on areas which substantially differ from MusicXML, and explanations refer to MusicXML features to show how MNX differs.
Note that this is not intended to be an exhaustive list of the changes that MNX will include; it is merely an attempt to portray some of the most important ones. Many other changes and improvements are expected to be included.
Having said that, where not explicitly called out, assume that MusicXML features may carry over to MNX unchanged, or in an obvious and analogous fashion.
MNX stands for "Music Notation X", where "X" suggests "XML", "eXtended", and potentially other X-related things as may come to mind.
1.2. Goals and Tradeoffs
MNX seeks to provide a high degree of interoperability and exchange between different applications working with music notation. This emphasis on interoperability is a differentiator between MNX and other notation encoding approaches, and takes it in a different direction from its predecessors.
MNX is designed with a few core beliefs in mind:
- Limits are needed on either semantic richness or universality. An encoding that seeks to represent an established notational system will want rich and specific semantics, tailored to represent that system’s ideals and concepts, which are by definition not universal in scope. On the other hand, a universal system that can encompass literally any graphical and sonic expression of music is most interoperable when the semantics of notation are set aside.
- There are no culturally privileged notational systems. Consequently it must be possible to extend MNX to accommodate multiple such systems.
- Within any given notational system, some limits on expressiveness are necessary in order to make the implementation effort manageable.
Note: we use the term semantics here to refer to concepts with an understood meaning in some musical culture, as distinct from their graphical or sonic instantiations. For example, in conventional Western music, we consider the shared idea of "quarter note" to be a semantic one; a quarter note can be instantiated as many different shapes, or as many different sounds.
1.3. MNX Score Types
MNX can support multiple types of score encoding, which can be bundled together through the mechanism of §3.1 The container document into a single composite document.
The present proposal focuses on two distinct types of MNX score, which represent opposite poles in the tradeoff described above between semantic richness and universality. The expectation is that others will be created over time, particularly those which target specific notational systems.
The first score type is CWMNX, which encodes Conventional Western Music Notation (CWMN) in a semantically rich fashion. It inherits many ideas and concepts from MusicXML.
The second is GMNX, where "G" is for General. It serves as a kind of universal encoding for scores having arbitrary graphical and audio content. In consequence, it is relatively free of semantics.
Note: Our working definition for CWMN is "notation in which the requirements do not significantly extend beyond those of music of the 19th and early 20th centuries." [Gerald Warfield, Writings on Contemporary Music Notation (Ann Arbor: Music Library Association, 1976), ii.]
1.4. Comparisons with other notation standards
CWMNX is a lineal descendant of MusicXML, and employs many of the same concepts. However it sacrifices some features and flexibility in favor of tighter interoperability, and simplifies the element structure considerably. It also moves all non-semantic information into CSS properties. The features in GMNX have no analogue in MusicXML.
MEI is a very general and expressive medium for encoding arbitrary musical documents, with particular attention to the needs of scholars. Due to its extreme plasticity, MEI is perhaps better described as a powerful framework for building customized documents and applications, than as a single encoding method. As such, interoperability has not been a main goal of MEI to date. However there are efforts underway to define a clean MEI subset as an interoperable medium for encoding CWMN (sometimes known as "MEI Go").
IEEE 1599 is a specification that has paid unique attention to the relationships between different layers of musical information. Its Logic layer is similar in content to CWMNX, while its Notational, Performance and Audio layers answer some of the same concerns as GMNX. GMNX takes a different approach to connecting these layers, and does not attempt to fully unify semantic information with visual and performance data. It relies to a greater degree on SVG, and to a lesser degree on MIDI.
1.5. Compatibility with MusicXML
MNX does not attempt to be backward-compatible with MusicXML, nor is it a superset of MusicXML. However, a large proportion of MusicXML markup is expected to be preserved. In these examples, MusicXML constructs are used freely throughout as a way to show how proposed new concepts dovetail with existing ones.
Backward compatibility aside, it is a goal to be able to machine-translate MusicXML into MNX. This is essential for migration purposes.
1.6. Use case concordance
A companion document details a set of known use cases for music notation. The use cases have links back to relevant sections of this document, where support can be demonstrated.
2. A brief example
To satisfy immediate curiosity up front, here is an MNX encoding of the timeless song Hot Cross Buns, in a simple grand-staff piano arrangement. This encoding is purely semantic, and includes no information on appearance or interpretation.
<?xml version="1.0" encoding="UTF-8"?>
<mnx>
  <head>
    <identification>
      <title>Hot Cross Buns</title>
    </identification>
  </head>
  <score content="cwmn">
    <system>
      <measure>
        <attributes>
          <tempo bpm="120" value="4"/>
          <time signature="4/4"/>
        </attributes>
        <direction placement="above">
          <words>With heavy irony</words>
        </direction>
      </measure>
      <measure/>
      <measure/>
      <measure/>
    </system>
    <part>
      <part-name>Piano</part-name>
      <measure>
        <attributes>
          <staff>
            <clef sign="G" line="2"/>
          </staff>
          <staff>
            <clef sign="F" line="4"/>
          </staff>
          <instrument-sound>keyboard.piano</instrument-sound>
        </attributes>
        <sequence staff="1">
          <direction>
            <dynamics><f/></dynamics>
          </direction>
          <event value="4"><note pitch="E4"/></event>
          <event value="4"><note pitch="D4"/></event>
          <event value="2"><note pitch="C4"/></event>
        </sequence>
        <sequence staff="2">
          <event value="2*"><rest/></event>
          <direction>
            <dynamics><p/></dynamics>
          </direction>
          <event value="4">
            <note pitch="C3"/>
            <note pitch="E3"/>
            <note pitch="G3"/>
          </event>
        </sequence>
      </measure>
      <measure>
        <sequence staff="1">
          <event value="4"><note pitch="E4"/></event>
          <event value="4"><note pitch="D4"/></event>
          <event value="2"><note pitch="C4"/></event>
        </sequence>
        <sequence staff="2">
          <event value="2*"><rest/></event>
          <event value="4">
            <note pitch="C3"/>
            <note pitch="E3"/>
            <note pitch="G3"/>
          </event>
        </sequence>
      </measure>
      <measure>
        <sequence staff="1">
          <event value="8"><note pitch="C4"/></event>
          <event value="8"><note pitch="C4"/></event>
          <event value="8"><note pitch="C4"/></event>
          <event value="8"><note pitch="C4"/></event>
          <event value="8"><note pitch="D4"/></event>
          <event value="8"><note pitch="D4"/></event>
          <event value="8"><note pitch="D4"/></event>
          <event value="8"><note pitch="D4"/></event>
        </sequence>
        <sequence staff="2">
          <event value="4"><rest/></event>
          <event value="4">
            <note pitch="C3"/>
            <note pitch="E3"/>
            <note pitch="G3"/>
          </event>
          <event value="4"><rest/></event>
          <event value="4">
            <note pitch="G3"/>
            <note pitch="B3"/>
          </event>
        </sequence>
      </measure>
      <measure>
        <sequence staff="1">
          <event value="4"><note pitch="E4"/></event>
          <event value="4"><note pitch="D4"/></event>
          <event value="2"><note pitch="C4"/></event>
        </sequence>
        <sequence staff="2">
          <event value="2*"><rest/></event>
          <event value="4">
            <note pitch="C3"/>
            <note pitch="E3"/>
            <note pitch="G3"/>
          </event>
        </sequence>
      </measure>
    </part>
  </score>
</mnx>
3. Document organization
Note: This section applies to all content types, CWMN or otherwise.
3.1. The container document
MNX documents act as general-purpose containers, which may be arbitrarily subdivided into a hierarchy of components which collectively make up the document as a whole.
3.1.1. A simple CWMN score
Here’s an example of the simplest structure, where an MNX document contains a single CWMN score. The head element includes descriptive information, while the score element contains the score contents. A significant MNX-specific element occurs here: the style element, which includes information relevant to the document’s appearance and interpretation:
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <identification>
      <title>My Favorite Work</title>
      <creator type="composer">Alan Smithee</creator>
    </identification>
    <style>
      @import url(mystyles.css);
    </style>
  </head>
  <score content="cwmn">
    // CWMNX score contents here...
  </score>
</mnx>
3.1.2. A simple general score
The score element can be qualified by a content attribute that describes the encoding of the content. It defaults to content="cwmn" and may include values from a registry of MNX musical content types. This example shows a score using the GMNX content type, general:
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <identification>
      <title>My Favorite Work</title>
      <creator type="composer">Alan Smithee</creator>
    </identification>
  </head>
  <score content="general">
    // GMNX score contents here...
  </score>
</mnx>
The MNX specification will maintain a registry of recognized values for content.
3.1.3. Compound MNX documents
It’s also possible to combine different representations of music in the same MNX document by using the collection element to combine multiple chunks of music into a single chunk. Each chunk may possess a distinct encoding. collection elements can be nested, allowing a subordinate collection to be embedded in a higher-level one.
Metadata elements such as identification or formatting elements like style may be included at any level of the resulting structure, causing them to apply only to those parts of the document.
Here’s an example that includes a hierarchy of collections and scores. (Note that some of these could employ non-CWMN encodings as well.)
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    //
  </head>
  <collection>
    <score type="section" content="cwmn">
      <identification>
        <title>Section 1</title>
      </identification>
      // CWMN markup...
    </score>
    <collection type="section">
      <identification>
        <title>Section 2</title>
      </identification>
      <score type="movement" content="cwmn">
        <identification>
          <title>Section 2, Movement 1 (for Solo Flute)</title>
        </identification>
        // CWMN markup...
      </score>
      <score type="movement" content="cwmn">
        <identification>
          <title>Section 2, Movement 2 (for String Orchestra)</title>
        </identification>
        // CWMN markup...
      </score>
    </collection>
  </collection>
</mnx>
3.1.4. Inclusion by reference
The HTML link element may be used to include a score by reference.
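For instance, a container document might pull in a score stored in a separate file. The placement and attribute usage below are illustrative assumptions only; rel and href are used as in HTML:
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <identification>
      <title>My Favorite Work</title>
    </identification>
  </head>
  // hypothetical: includes an externally stored score by reference
  <link rel="score" href="movement1.xml"/>
</mnx>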
3.2. Profiles
A score element may employ the profile attribute to indicate that it conforms to a particular set of expectations regarding its contents. A registry of valid profile names for each allowed score encoding is maintained as part of the specification.
The intent of profiles is to allow programmatic validation of documents, and to permit different levels of validation to be enforced where appropriate. MNX parsers may also be constructed to specifically support certain profiles, significantly decreasing programming effort.
For the purposes of this document, the most important profile is <score content="cwmn" profile="standard">. This profile indicates that the score in question conforms to a set of standard assumptions regarding "well-formed" CWMN. Examples of such assumptions may include:
- All parts contain the same number of measures
- The metrical content of a measure does not exceed its duration
- The metrical content of a tuplet does not exceed its duration
- All displayed accidentals are encoded in the document using accidental
- All parts of a score agree with respect to form, time and key signatures and barring.
This profile mechanism replaces the supports feature of MusicXML.
3.3. Metadata and Attribution
The identification element in MNX is used to supply descriptive and bibliographic information, as it is in MusicXML. Its contents are similar to MusicXML, but it can be included in a variety of parent elements for greater flexibility:
- The head element of the document
- Any score or collection element at any level (work, section, movement...)
- Any notational element (part, measure, notes...)
Other parent elements will no doubt make sense as the specification is developed.
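As one illustrative (non-normative) sketch, attribution might be scoped to a single part, crediting an arranger for one instrument only:
<part>
  <part-name>Guitar</part-name>
  <identification>
    <creator type="arranger">Alan Smithee</creator>
  </identification>
  // measures for the part...
</part>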
4. CWMNX: Encoding Conventional Western Music
CWMNX is the MNX score type denoted by a content attribute of cwmn. It is intended specifically for encoding conventional Western music notation. It is a vehicle for representing the semantics of such musical scores, with the ability to weave appearance and interpretation data into these semantics by means of CSS properties.
4.1. CWMNX Goals
CWMNX is intended to support a wide range of applications which must have a semantic description of music notation and which may make use of appearance and performance information alongside this.
Some examples of applications suited to using CWMNX include:
- CWMN-based notation editors
- CWMN readers which dynamically render or play music
- tools for analysis or transformation of CWMN music
- OMR applications that produce encodings of CWMN
- educational applications requiring a knowledge of underlying CWMN structure
4.2. CWMNX Layers
CWMNX makes a very clear distinction between the following layers of musical information:
- Semantics, the core stratum of notational data in a CWMN work that must inform any potential rendering of that work. CWMNX encodes all semantic information as XML markup like measure, note and so on. There is no concept of "selective encoding" in this layer: without the required core of semantic elements, the document is not valid. See §5 CWMNX Semantic markup.
- Appearance, a layer of visual attributes and formatting that describes how a work appears to the reader, independent of the other two layers. Where this layer is absent, implementations are expected to supply a default appearance based on the semantic layer. CWMNX encodes all appearance information using CSS styles and properties. See §6 CWMNX Styling for more information.
- Interpretation, a layer of performance information that describes how a work sounds to the listener, independent of the other two layers. Where this layer is absent, implementations are expected to supply a default interpretation based on the semantic layer. Where this layer is present, CWMNX encodes all interpretation information in the form of CSS styles also. See §7 CWMNX Interpretation for more information.
Since only the semantic layer is encoded in XML markup and the rest is CSS, these distinctions are made very concrete. Furthermore, ambiguities arising from combining the layers in MusicXML (such as the near-duplication of tie and tied) can disappear.
5. CWMNX Semantic markup
This section describes how CWMNX treats the semantic layer of a CWMN document in the standard profile.
Note: In most of the examples that follow, the MNX container elements are omitted for brevity.
5.1. Score structure
For a CWMN score, the score element is roughly equivalent to the score-partwise element in MusicXML. MusicXML’s score-timewise element is not used in CWMNX.
5.1.1. Unifying score-part and part
In CWMNX, the contents of the score-part element are simply included below part. This leaves the part element as the single place in the document that provides part-related information.
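A rough sketch of the resulting shape, assuming that MusicXML’s part-name and part-abbreviation children of score-part carry over unchanged:
<part>
  // descriptive information formerly carried by score-part...
  <part-name>Violoncello</part-name>
  <part-abbreviation>Vc.</part-abbreviation>
  // measures for the part...
  <measure>...</measure>
</part>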
5.1.2. System notations
A number of notational concepts can be scoped to the entire system in a semantic sense, not only to a single part. Examples include:
- Key signatures
- Time signatures
- Tempo indications
- Rehearsal marks
- Barlines
- System and page breaks (see §6.8 System and page flow)
- Musical form indications
In some cases (time signatures, for example) multiple instances of a notation are encoded separately in each MusicXML part, yet they are expected to semantically agree across all parts. While such agreement is not strictly required, it is nearly always present. It seems desirable to allow this agreement to be expressed in a single instance, somewhere in the document.
In other cases (e.g. tempo indications) the concept typically is encoded in the first visible MusicXML part. Yet, if the document were rendered showing only some other part in the score, there would still be an expectation that the tempo indications encoded in the topmost part would be shown. MusicXML documents produced by different engravers vary widely in this respect.
MNX includes a new system element analogous to part, which precedes all part elements in the score. This contains measures whose contents are understood to apply to all parts. Parts need only include such elements to the extent that they are overridden. No performance events like notes or rests may be included in the measures within system.
An example CWMNX score skeleton could thus look like this:
<mnx xmlns="http://www.w3.org/mnx">
  <head>...</head>
  <score content="cwmn">
    <system>
      // measures describing system-wide features...
    </system>
    <part>
      // descriptive info for part 1...
      // measures describing part 1...
    </part>
    <part>
      // descriptive info for part 2...
      // measures describing part 2...
    </part>
    // additional parts...
  </score>
</mnx>
5.1.3. Concert and transposed pitch
TBD pending resolution of issue.
5.2. Element IDs
Any element whatsoever in MNX markup may possess a regular id
attribute as defined for the XML namespace. References to elements form a
backbone principle of key aspects of CWMNX, for example §5.4 Spanning notations.
These attributes are of type ID
, not IDREF
and thus
fully conform to the XML standard. Existing uses of id
in
MusicXML that are in conflict with this usage, and which carry over to MNX,
will be renamed.
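For example, an ID might be attached to an event so that other markup can refer to it (the ID value here is arbitrary):
<event id="ev42" value="4">
  <note pitch="G4"/>
</event>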
5.3. Musical timelines
CWMNX makes use of a different element structure than MusicXML to represent timelines of musical notations and events, always in chronological sequence. The elements of these timelines are essentially the same as those found in MusicXML, though. The differences are driven by the following design goals:
- Use parent elements as containers to organize child elements that are part of a whole (e.g. notes in a chord).
- Give parent elements responsibility for expressing concepts that are logically shared by the children (e.g. the stem shared by notes in a chord).
- Wherever possible, make it impossible to encode invalid constructs.
- Preserve a common-sense mapping between encodings and a "naïve engraving" of the same music.
- Eliminate the need for complex book-keeping and post-processing when parsing measures of music.
- Make it easy to alter the content of a CWMNX document using simple DOM (Document Object Model) operations such as node insertion and deletion (e.g. add a note to a chord by simply inserting a note element in the appropriate parent).
5.3.1. Specialized encodings
For compactness and readability, CWMNX can represent concepts such as note values, timespans and pitches as simple strings, usually encoded as XML attributes. Compactness and readability are desirable because encodings are documented, read and talked about, not only parsed and generated.
These will be covered first, before timelines are described, since they help make sense of the succeeding examples.
5.3.2. Note values
There are a variety of situations in which the note value of a musical event needs to be supplied for an element. CWMNX uses a text encoding to represent such durations; in general, this replaces some arbitrary combination of MusicXML’s type and dot elements, with no difference in semantics at all. The values are always implicitly subject to any tuplet time modification that may be in effect.
The note value encoding scheme consists of a unit, expressed as a power-of-two division of a whole note, similar to the denominator of a time signature, with textual exceptions for large antique units. This may optionally be followed by zero or more dots expressed as occurrences of the asterisk character * (to avoid confusion with a decimal point). Examples follow:
1 - a whole note
4 - a quarter note
8* - a dotted eighth note
8** - a double-dotted eighth note
breve - a breve (double whole note)
breve* - a dotted breve
In general, the attribute name value is used to supply this information for various CWMNX elements.
While it would be possible to use full-blown rational numbers with arbitrary denominators, this would permit the specification of arbitrary non-CMN values and complicate validation and parsing.
Retaining the more verbose MusicXML approach of separate type and dot elements, on the other hand, makes it harder to embed note values in other strings such as time or position offsets.
Finally, a rational number does not map cleanly onto what one sees in notated music. This encoding attempts an obvious mapping between the semantic layer and a naïvely notated score. Rational numbers that combine the notion of note values, dots and tuplets into a single number would take us further from this ideal.
5.3.3. Metrical timespans
There are also situations in which a metrical timespan needs to be supplied, as an exact multiple of a note value. There is no exact corresponding construct in MusicXML.
A metrical timespan is very similar to a time signature, but more general. It is encoded as an integer, followed by / and a note value encoding. This specifies a timespan equal to the given number of note value units. Metrical timespans are used in CWMNX to represent time intervals which must be constrained to exact note value multiples, such as note positions relative to the origin of a containing sequence.
Examples:
3/4 - three quarter notes
3/8* - three dotted eighth notes (this form can be useful for specification of §5.3.13 Tuplets)
9/16 - the same timespan as above, only expressed in sixteenth note units
Like an individual note value, a metrical timespan is always subject to any contextual time modification due to tuplets. For example, inside an eighth-note triplet, the timespan 1/8 refers to a triplet eighth note.
5.3.4. General timespans
MNX has a less constrained notion of a general timespan which is used in situations where no constraint to an exact note value multiple is desired.
General timespans are encoded as a real number followed by one of the following suffixes.
/note-value - displacement based on contextual time within the containing measure or tuplet.
//note-value - displacement based on measure-level time, disregarding any containing tuplets.
t - MIDI ticks (1/960 of a measure-level quarter note)
Arbitrary denominators may also be used in general timespans.
Examples:
1/8 - a contextual eighth note
1//8 - a measure-level eighth note
480t - also a measure-level eighth note
490t - 10 ticks longer than a measure-level eighth note
2.42//4 - 2.42 measure-level quarter notes
1/12 - a triplet 8th note (1/3 of a quarter note)
Note: This concept replaces the MusicXML notion of divisions (and is capable of expressing arbitrary divisions if so desired).
5.3.5. Positions and timespans
Any timespan can also express a position within a sequence or tuplet. In these cases, the timespan has the meaning of "timespan from some origin". Just as there are metrical and general timespans, so are there metrical and general positions.
Positions expressed as timespans using contextual note values (e.g. 3/8) are always relative to the start of the containing sequence or tuplet. Positions expressed in terms of measure-level note values (3//8) are always relative to the start of the containing measure.
5.3.6. Sequences
The sequence element represents an independent sequential timeline within a measure. A measure can have any number of timelines within it.
Each sequence child of the measure orders its child elements in chronological sequence, starting from the beginning of the measure. These may include such elements as rests, notes, chords, performance directions, barlines, clefs, and so forth. They also may include tuplets, which effectively act as nested sequences.
Its children may be of these kinds:
- event children are notes or rests which possess a specific note value and occupy the corresponding timespan within the sequence. Events may not overlap, and they must occur at metrical positions within their containing elements.
- direction children are elements that occupy a zero timespan within the sequence (although they have visual extent). Directions are allowed to temporally overlap any other elements, and may occur at general positions.
- tuplet children are sub-sequences that apply a time modification ratio to their children, which in turn may be nested events, directions or tuplets. Like events, they must occur at metrical positions.
Here’s an example measure that shows several key aspects of sequences. In the first sequence there are two direction elements, separated by a half-note gap in metrical time. There are also two independent melodic voices represented by sequence elements. The second voice includes an 8th-note triplet.
<measure>
  <sequence>
    <direction>...</direction>
    <direction position="2/4">...</direction>
  </sequence>
  <sequence>
    <event value="2">...</event>
    <event value="4">...</event>
    <direction>...</direction>
    <event value="4">...</event>
  </sequence>
  <sequence>
    <event value="2*">...</event>
    <tuplet actual="3/8" normal="1/4">
      <event value="8">...</event>
      <event value="8">...</event>
      <event value="8">...</event>
    </tuplet>
  </sequence>
</measure>
The optional orientation attribute may assume the values up or down, affecting the default placement of stems, articulations, ornaments and other voice-specific objects within a sequence.
The optional staff attribute supplies a default staff assignment for all events within the voice of a given sequence. This may be selectively overridden within the sequence, of course. For example, consider these 4 sequences set up for an SATB-style grand staff:
<measure>
  <sequence orientation="up" staff="1">...</sequence>
  <sequence orientation="down" staff="1">...</sequence>
  <sequence orientation="up" staff="2">...</sequence>
  <sequence orientation="down" staff="2">...</sequence>
</measure>
Unlike the MusicXML voice element, the CWMNX sequence element need not possess a number or any other label. It can, however, be assigned a label via the optional name attribute, which declares continuity between identically named sequences in successive measures of the same part.
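For example, a sequence might carry the same (arbitrarily chosen) name in consecutive measures to declare that it represents one continuing voice; the name alto below is purely illustrative:
<measure>
  <sequence name="alto" staff="1" orientation="down">...</sequence>
</measure>
<measure>
  <sequence name="alto" staff="1" orientation="down">...</sequence>
</measure>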
5.3.7. The sequence cursor
Elements of a sequence can be assigned an explicit position, but need not be. In the absence of an explicit position attribute on an event, direction, or tuplet, the current value of the sequence cursor is used.
In the standard profile, the start tag of an event or tuplet element assigns the temporal start point of that element to the sequence cursor; the end tag of the element assigns the element’s temporal end point to the cursor. Direction elements have no effect on the sequence cursor.
There are two motivations for the cursor:
- to permit a terse and straightforward encoding of music wherever events and directions assume standard metrical positions.
- to afford encoding of music which uses standard note values, but not any standard interpretation of event positions.
Significant constraints distinguish CWMNX’s sequence cursor from MusicXML:
- The cursor must progress in a forward direction
- The cursor only progresses in units that are expressible in CMN (rather than any number of divisions)
- Events within the cursor’s timeline always belong to the same polyphonic voice or "layer"
- Events cannot temporally overlap (and all notes within an event are simultaneous)
5.3.8. Explicit positions
It’s also possible to use a position attribute for any element of a sequence.
This is most useful for directions. Directions may specify a general position since their positions are not constrained by CMN rules.
It is also possible for events to take an explicit metrical position.
MNX prohibits the use of out-of-order elements within a sequence or tuplet element. While cursor-based positioning can’t violate this constraint, explicit positions can, so care must be taken to sort elements properly when encoding.
Here is the prior example from §5.3.6 Sequences, recast with the use of a position attribute relative to the containing measure or tuplet.
<measure>
  <sequence>
    <direction>...</direction>
    <direction position="100t">...</direction>
  </sequence>
  <sequence>
    <event value="2">...</event>
    <direction position="2/4">...</direction>
    <event position="2/4" value="4">...</event>
    <event position="3/4" value="4">...</event>
  </sequence>
  <sequence>
    <event value="2*">...</event>
    <tuplet position="3/4" actual="3/8" normal="1/4">
      <event value="8">...</event>
      <event position="1/8" value="8">...</event>
      <event position="2/8" value="8">...</event>
    </tuplet>
  </sequence>
</measure>
5.3.9. Spaces
A space is a way of specifying a metrical time interval that does not contain any notation. CWMNX uses space elements to explicitly represent such gaps. (In MusicXML, these gaps were often created using the forward and backup elements or with hidden rests, and could only be determined after complete parsing of a measure’s contents.)
Because of the use of the sequence element to organize notations in chronological sequence, and because non-notated gaps within a sequence can be explicitly represented, CWMNX does not need MusicXML’s forward and backup elements. Instead, space serves the purpose of forward, while picking up a few additional useful characteristics. The contents of a sequence can always be expressed as a sequence of contiguous elements starting at the beginning of the measure and proceeding forward in metrical time.
Accordingly, the concept of divisions also becomes unnecessary in CWMNX. To the extent that arbitrary durations are needed, they can always be expressed as multiples of some normal musical time unit. (See §5.3.13 Tuplets for more information on how spaces work to take up space in tuplets -- divisions aren’t needed here either.)
A space takes a length attribute to specify its metrical duration (see §5.3.3 Metrical timespans):
<space length="4*"/>
The following space is equivalent to 5 eighth notes:
<space length="5/8"/>
And so is this one:
<space length="2.5/4"/>
Spaces are useful for causing a non-metrical notation like text to appear at a certain metrically anchored place within a measure. For example, the following causes text to be anchored at a point one quarter note into the given measure:
<measure>
  <sequence>
    <space length="4"/>
    <direction placement="above">
      <direction-type>
        <words>And then...</words>
      </direction-type>
    </direction>
  </sequence>
</measure>
5.3.10. Pitch encoding
CWMNX introduces a text encoding of pitches, which represents a combination of MusicXML’s step, octave and alter elements. This is done to address issues of readability and compactness in MusicXML and makes no semantic difference.
The format consists of a MusicXML step, followed optionally by 0..2 occurrences of # or b representing an integer alteration, followed by a MusicXML octave. An additional, non-integer alteration may be added to the preceding integral amount by including the suffix + or -, followed by a real-valued number of semitones.
As with MusicXML alter values, any occurrences of # or b do not imply rendering of accidentals on some associated element. These remain specified by an accidental value, if provided.
Examples:
C4 - Middle C
C#4 - The pitch one semitone above middle C
Db4 - The pitch one semitone above middle C (identical to the above)
C4+0.5 - The pitch one quarter-tone above middle C
B3+1.5 - The pitch one quarter-tone above middle C (identical to the above)
C#4-0.5 - The pitch one quarter-tone above middle C (identical to the above)
The pitch attribute is used to supply pitch encodings where appropriate, typically for note elements.
Pitch encodings are used also in §7 CWMNX Interpretation.
5.3.11. Events
The new CWMNX element event represents the related concepts of rest, note or chord, depending on its contents. It supplies information that is common to all three:
- event duration, expressed in terms of note value and dot count
- stem orientation, length, etc. if applicable
- flag and beam description
- articulations or directions that apply to all contained notes
- styling data (e.g. horizontal displacement)
- grace/cue markers
- lyrics
- slurs and other event-oriented §5.4 Spanning notations
Here’s a middle C as a dotted half note.
<event value="2*"> <stem>up</stem> <note pitch="C4"/> </event>
Here’s a C major triad as an eighth note:
<event value="8"> <stem>up</stem> <note pitch="C4"/> <note pitch="E4"/> <note pitch="G4"/> </event>
Here’s a whole note rest:
<event value="1"> <rest/> </event>
And a grace note chord; the grace attribute belongs to the event.
<event value="8" grace="true"> <stem>up</stem> <note pitch="C4"/> <note pitch="E4"/> <note pitch="G4"/> </event>
Note: grace notes do not advance the sequence cursor.
As one special case, the following event encodes a whole-measure rest. Note that the type attribute is used to indicate this, rather than value:
<event type="measure"> <rest/> </event>
5.3.12. Directions
The CWMNX direction element carries over the features of the MusicXML direction element.
This proposal does not yet attempt to examine how CWMNX directions will work in detail or how they differ from MusicXML directions.
CWMNX is, however, expected to include strong semantic types for different musical purposes of text. Dynamics, tempo markings, lyrics, playing instructions/techniques, etc. will all have proper types and may be freely included in sequence and event elements.
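As a purely speculative sketch based on the brief example in §2, a dynamic marking and a textual playing instruction might appear within a sequence as typed directions:
<sequence>
  <direction>
    <dynamics><p/></dynamics>
  </direction>
  <direction placement="above">
    <words>sul tasto</words>
  </direction>
  <event value="2"><note pitch="A4"/></event>
</sequence>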
5.3.13. Tuplets
In CWMNX, the tuplet element applies a time modification to all of its child elements, which are the same as those allowed in sequence. In fact, tuplet is very much like a nested sequence element, with an implicit time signature representing the notated duration inside the tuplet. It functions just like an event element, in that it takes up a known duration within its parent, and is not allowed to overlap with other event or tuplet elements.
A tuplet uses the two attributes actual and normal, making use of §5.3.3 Metrical timespans to represent the tuplet ratio. It thus combines the function of MusicXML’s tuplet and time-modification elements into a single construct: the contained elements are automatically subject to time modification, and CSS styling can control the appearance of the tuplet.
Numerals and brackets may or may not be shown in conjunction with a tuplet, by using the relevant MusicXML attributes which carry over into CWMNX.
Here’s an example of an eighth-note triplet:
<sequence>
  // preceding events in sequence...
  <tuplet actual="3/8" normal="1/4">
    <event value="8">
      <note>...</note>
    </event>
    <event value="8">
      <note>...</note>
    </event>
    <event value="8">
      <note>...</note>
    </event>
  </tuplet>
  // remaining events in sequence...
</sequence>
(The tag <tuplet actual="3/8" normal="2/8">...</tuplet> would work exactly the same. There is no semantic difference.)
One can include directions in tuplets as well. And, naturally, tuplets may be included within tuplets.
Other examples of tuplets:
<tuplet actual="5/8" normal="11/16"> // a tuplet that displays 5 eighth notes in the space of 11 sixteenths... </tuplet>
<tuplet actual="7/8*" normal="4/4*"> // a tuplet that displays 7 dotted eighth notes in the space of 4 dotted quarters... </tuplet>
5.4. Spanning notations
Many notations occupy an arbitrary span of time in a score, across some number of measures in a given part. Some examples are:
- slurs
- dynamic wedges (hairpins)
- 8va lines
- performance directions like cresc. poco a poco
Essentially all the elements described by a combination of MusicXML’s number="N" and type="start|stop" fall into this category. (Some of these are considered notations in MusicXML and others are considered direction-types. We won’t deal with this classification problem yet.)
In CWMNX, the preferred way to describe such spanning notations is in terms of a start element and a stop element. These elements may be any element that occurs inside a sequence.
The start element of a spanning notation is specified by including the span directly in the start element as a child. The stop element is specified via an end-ref attribute on the span that supplies the stop element’s ID. Thus, spanning notations "point to" the notation where they will end. This is different from MusicXML, in which the stop point refers back to the spanning notation’s number.
Thus there is no use of MusicXML’s number="N" or type="start" and type="stop" attributes.
Here’s an example, showing a slur that connects two non-adjacent chords in the same measure (the situation is very similar for events in different measures):
<measure>
  <sequence>
    <event value="4">
      <note pitch="C5"/>
      <note pitch="Eb5"/>
      <slur end-ref="a1"/>
    </event>
    <event value="4">
      <note pitch="D5"/>
    </event>
    <event id="a1" value="4">
      <note pitch="Db5"/>
    </event>
  </sequence>
</measure>
An alternate means of supplying spanning notations is also allowed, in which the endpoints of the span are specified positionally. This technique does not depend on any start or stop element: the spanning notation is simply included in a sequence, and its length attribute supplies its overall length in the score.
<measure>
  <sequence>
    <slur length="2/4"/>
    <event value="4">
      <note pitch="C5"/>
      <note pitch="Eb5"/>
    </event>
    <event value="4">
      <note pitch="D5"/>
    </event>
    <event value="4">
      <note pitch="Db5"/>
    </event>
  </sequence>
</measure>
5.5. Pagination, credits and page-level text
(TBD)
We need to study the CSS specification as it is currently evolving to represent pagination to meet the needs of digital publishing overall, not only music publishing.
MusicXML’s page-level elements such as credit-words and others need to be reconsidered in light of this evolution. There is currently a tension in the existing corpus of MusicXML documents between the use of explicitly placed credit elements and the metadata found in the identification element.
5.6. Semantic layout overrides
The CWMN canon has many examples of notation in which notes do not add up metrically to yield the duration of a measure or tuplet, or in which notes in one voice align with notes in another despite the dissimilar points in time implied by their notated durations.
CWMNX addresses these problems by allowing any event to optionally specify either or both of the following:
- render this event using the visual conventions for an arbitrary duration (e.g. render notehead, stem and flag/beam as for a sixteenth-note, although the note occupies an eighth-note duration within its timeline). An appearance attribute may be used to supply the visually rendered note or rest value for any given event. For example, a note that is positioned and played as an eighth note might include appearance="8*" to force the appearance of a dot.
- visually align this event with some other arbitrary event in the same measure (e.g. line up a regular sixteenth note with the last note in some other voice’s sixteenth-note triplet). An offset-ref attribute specifies an IDREF for the target event with which to align.
For example, consider the following Chopin-esque example of a two-voice passage. The upper voice contains a dotted-8th and a sixteenth note; the lower voice contains quintuplet 16th notes. However, the two final notes of both voices are to be displayed in alignment.
One way to encode this is to consider the upper voice to actually be denoting a quintuplet-based rhythm, but make it look like a non-tuplet rhythm:
<measure>
  <sequence staff="1">
    <tuplet actual="5/16" normal="1/4" bracket="no" show-number="no">
      <event value="4" appearance="8*">...</event>
      <event value="16">...</event>
    </tuplet>
  </sequence>
  <sequence staff="1">
    <tuplet actual="5/16" normal="1/4">
      <event value="16">...</event>
      <event value="16">...</event>
      <event value="16">...</event>
      <event value="16">...</event>
      <event value="16">...</event>
    </tuplet>
  </sequence>
</measure>
Independent of such editorial decisions, it is also possible to prescribe any desired performance rhythm using the §7 CWMNX Interpretation features of CWMNX.
6. CWMNX Styling
CWMNX employs CSS style properties to describe the appearance layer of a score. The property vocabulary used by CWMNX is not the same as for HTML or SVG, although it has some overlap. A large part of the work to specify CWMNX will involve figuring out the new set of properties that we need to drive music rendering and appearance.
As a result of this shift in approach, a great deal of MusicXML markup will shift into CSS. This allows it to be controlled by style sheets and media queries, yielding a great deal of expressive power over score appearance in multiple contexts.
The expectation is that a conformant implementation can always infer a default appearance from the semantic layer, using built-in default style properties as required. This replaces the notion of "selective encoding" in MusicXML: appearance styles may be supplied optionally, but they are never necessary.
Here’s a quick example to set the stage for more discussion, showing the application of color and positioning details to various CWMNX elements. In this example, the color property is self-explanatory, and the relative-y property is the same as found in MusicXML. The value 1sl means 1 staff line (analogous to typical CSS units such as 1px for 1 pixel).
<measure>
  <sequence>
    <event style="color: red;">
      <note>...</note>
    </event>
    <direction style="relative-y: 1sl;">
      <direction-type><words style="color: #666">...</words></direction-type>
    </direction>
    <event>
      <note style="color: red;">...</note>
    </event>
  </sequence>
</measure>
Note: It is possible that additional profiles might require the encoder to supply some degree of basic style information (e.g. measure widths). The position of this proposal is that such requirements are optional and ride on top of the standard profile, rather than constituting standard behavior.
6.1. CSS and style properties
As in HTML or SVG, CSS properties can be specified in several distinct ways:
- They can be specified directly on the pertinent element, inline, using the style attribute in which properties are explicitly given. This is the approach used in the preceding example.
- They can be specified indirectly on the pertinent element by using the class attribute, causing the applicable properties to be looked up in one or more associated stylesheets using the element’s class name(s). For instance, class="alternate" might be used in a given document to specify a specific look for notes in an alternate melody line, allowing that appearance to be specified exactly once in the stylesheet selector for .alternate.
- They can be inferred implicitly from the element’s characteristics, according to the selectors in the CSS specification. For example, a stylesheet might include a note selector specifying the default appearance of all note elements. An .alternate selector in the stylesheet would specify the appearance of all elements marked as belonging to the alternate style class.
This flexibility and generality is the primary motivator for using CSS as part of MNX. It allows a spectrum of expression for style properties that ranges from the very general (a house style) to the very specific (an exception for just one element in the document).
6.2. Stylesheets
Stylesheets allow visual style properties to be declared using selectors, which provide both implicit and explicit ways of applying styles to elements in the markup. These work exactly as in CSS for HTML or SVG, except where extended in MNX-specific ways.
In MNX, stylesheets are referenced by including a style element below the appropriately scoped chunk of the document. This element may contain CSS, and may further import external documents using the @import statement.
The following example illustrates the power of stylesheets and selectors; see the CSS Selectors Level 4 draft specification for more details.
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <style>
      score {
        font-family: Bravura;
      }
      .alternate {
        color: #666; /* gray */
      }
      direction {
        font-family: Times Roman;
        font-size: 2sl;
      }
      words.aside {
        font-weight: bold;
      }
    </style>
  </head>
  <score>
    <part>
      <measure>
        <sequence>
          <event class="alternate">
            <note>...</note>
          </event>
          <direction style="relative-y: 1sl;">
            <direction-type><words class="aside">...</words></direction-type>
          </direction>
          <event>
            <note class="alternate">...</note>
          </event>
        </sequence>
      </measure>
    </part>
  </score>
</mnx>
6.3. External stylesheets
It’s often useful to keep stylesheet definitions in a separate .css file. The @import directive serves this purpose:
<mnx xmlns="http://www.w3.org/mnx"> <head> <style> @import url(house-style.css) </style> </head> <score> // </score> </mnx>
6.4. Units
Properties representing a visual displacement or distance must use a number followed by an explicit unit:
sl - the space between two standard staff lines
st - staff tenths (same unit as in MusicXML)
px - display pixels (same unit as in SVG or HTML)
Also, §5.3.4 General timespans may be used as style units.
Examples:
1sl - One staff line
5st - 0.5 staff lines
1/8 - a contextual eighth note
1//8 - a measure-level eighth note
481t - one tick longer than a measure-level eighth note
1:3//4 - a triplet 8th note (1/3 of a quarter note)
6.5. CSS property usage
6.5.1. Standard CSS properties
Certain CSS standard properties will replace the corresponding MusicXML properties, to avoid conflict with other web standards. Notably, the CSS display and visibility properties replace MusicXML’s print-object attribute. All CSS properties relating to fonts and typography (font-family, font-weight, font-style and so forth) will also replace corresponding features in MusicXML. color has roughly the same meaning in both.
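A hypothetical stylesheet fragment sketching this kind of substitution (the class name editorial-only is invented for illustration):
/* formerly print-object="no" in MusicXML */
.editorial-only {
  display: none;
}
direction {
  font-family: "Times New Roman";
  font-style: italic;
}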
6.5.2. Document properties
Virtually all of the values in the MusicXML defaults element can also be transferred to CSS properties, permitting stylesheet control of these important settings as well.
6.5.3. Notation properties
The expectation is that almost all other MusicXML attributes and elements governing visual rendition will simply migrate to CSS properties with no change except for the units used. These include positioning tweaks as well as system and page breaks.
The result is that CWMNX markup will generally look very simple and semantic, with occasional style and class attributes supplying exceptions to the general house style supplied by a score-wide stylesheet.
Examples of migrated MusicXML properties from notation markup include:
- relative-x and relative-y
- default-x and default-y
- offset (note that this is purely a visual offset, unlike MusicXML)
- new-system and new-page
In keeping with CSS conventions, MusicXML’s values of yes and no will generally be replaced with true and false.
6.5.4. New CWMNX properties
CWMNX will also define new properties for useful concepts not previously addressed in MusicXML.
As one example, a part-display: all property could distinguish elements that are intended to appear in both full-score and part views.
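A sketch of how such a property might be used in a stylesheet; the class name is invented, and only the value all is suggested by the sentence above:
.tutti-cue {
  part-display: all; /* keep these elements visible in both the full score and extracted parts */
}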
6.6. Media queries and parts
CSS defines a notion of media queries which permit stylesheet definitions to apply in only certain contexts. For web pages, this is used to vary the appearance according to whether a page is viewed on, say, a mobile device, or a laptop screen, or on printed paper.
@media screen {
  .alternate {
    color: blue; /* show alternate readings in blue on the screen */
  }
}
@media print {
  .alternate {
    note-size: -0.2sl; /* show alternate readings smaller when printed */
  }
}
In CWMNX, media queries can be used in exactly the same way, but they can also be used to govern how parts are displayed. This permits high quality parts to be generated from the same document as the full score, formatted in a specialized way.
To take advantage of this, one labels a part with a child media-name element whose text can be used as an identifier in a media query:
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <style>
      score {
        note-size: 10px;
      }
      @media (part: vln) {
        score {
          note-size: 13px; /* make the violin part bigger */
        }
      }
    </style>
  </head>
  <score>
    <part>
      <part-name>Violin</part-name>
      <media-name>vln</media-name>
      <measure>...</measure>
      <measure>...</measure>
      <measure>...</measure>
    </part>
  </score>
</mnx>
More than one part can use the same media name, allowing multiple parts to be formatted according to the same rules.
6.7. Controlling visibility
As previously mentioned, the standard CSS visibility and display properties control whether a given element is seen or not.
Setting display: none on an element prevents it from being seen and causes the layout to act as though it does not exist in any way; the space occupied by the element will disappear from the layout’s point of view.
Setting visibility: hidden on an element causes it not to be painted. However it still takes up space in the layout, and it may be sensitive to pointer events.
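A brief illustrative sketch of the difference, using inline styles within a sequence:
<sequence>
  // not painted, but still occupies space in the layout
  <event value="4" style="visibility: hidden;"><note pitch="D4"/></event>
  // treated as though absent from the layout
  <event value="4" style="display: none;"><note pitch="E4"/></event>
  <event value="2"><note pitch="F4"/></event>
</sequence>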
6.8. System and page flow
The CWMNX style properties new-page and new-system (as well as others) control page and system flow. Such styles are specified at the measure or event level.
These properties act across the system, but may be defined differently in full-score formats as opposed to single-part formats; it is very uncommon for such breaks to be identical across all views of a score. Thus they will participate in whatever solution is arrived at for §5.1.2 System notations.
Media queries can further be used to specialize system and page flow for viewing and printing different formats on different devices.
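For instance, a forced system break might be requested on a measure as sketched below, using the true/false convention described earlier:
<measure style="new-system: true;">
  ...
</measure>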
6.9. Styling grace and cue notes
The CSS Selectors Level 4 draft specification includes the specification of a new :has() pseudo-selector that can test for the presence of a child element. This provides a useful way to style grace and cue notes, for example:
note:has(> grace) { note-size: 80%; }
note:has(> cue) { note-size: 80%; }
note:has(> grace):has(> cue) { note-size: 50%; /* combination of grace and cue */ }
7. CWMNX Interpretation
The performance interpretation of semantic markup in CWMNX is also carried out using CSS properties, rather than through the content of MusicXML’s sound element. The various attributes and child elements of sound therefore become new sound-oriented CSS properties in CWMNX. These are driven from stylesheets and inline specifications in much the same way as visual properties, thus separating interpretation cleanly from both semantic and appearance layers.
As with appearance, the expectation is that a functioning implementation can always infer a default interpretation from the semantic layer, using built-in default interpretation properties as required. This replaces the notion of "selective encoding" in MusicXML.
7.1. Inline interpretation
Any performable element may possess a perform attribute, analogous to style. This attribute specifies a set of CSS properties affecting playback. For example:
<event value="4"> <note pitch="C4" perform="dynamics: 40;"/> </event>
Naturally, one might wish to use CSS classes to specify this:
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <style>
      .forte {
        dynamics: 100;
      }
    </style>
  </head>
  <score>
    //
    <event value="4">
      <note pitch="C4" class="forte"/>
    </event>
  </score>
</mnx>
7.2. Interpretation properties
Here are some significant interpretation properties, but not an exhaustive list:
offset - A playback offset expressed as a general timespan relative to the default offset. For example, offset: -1/16 would play the given note one sixteenth note earlier. offset: 10t would play it 10 MIDI ticks later.
pitch - A pitch to be used for playback instead of the notated pitch, expressed as a pitch encoding (e.g. pitch: C#4+0.1).
duration - A note duration to be used for playback, expressed as a general timespan. For example, duration: 1/16 would cause the note to be played as a sixteenth note regardless.
dynamics - Volume expressed as a percentage of the forte dynamic (same as MusicXML)
tempo - Tempo expressed as a metrical timespan, e.g. tempo: 120/4 (same as MusicXML, but doesn’t require quarter note units)
playback - Controls whether interpretation occurs at all. playback: none will suppress the interpretation of all events with this property.
Note: There is no tied property. It is not clear that one is needed, since the sounded duration of a tied note should take the semantic tie information into account.
7.3. Interpretation flow
In the same way that indenting text flow with certain CSS properties has follow-on effects that continue down the page, altering interpretation with CSS has follow-on effects too.
For example, tempo and dynamics clearly make sense to flow forward into successor events. On the other hand, pitch probably does not.
The flow behavior of interpretation properties needs to be carefully characterized.
7.4. Performance events and event streams
It is useful to combine the values found in the offset, pitch, duration and dynamics properties into a single styling construct called a performance event, using the simple syntaxes note(offset, pitch, duration) or note(offset, pitch, duration, dynamics). Think of these as syntactically similar to rgb() or rgba() colors. Some examples:
note(0, C4, 1/4) - A middle C sounding at the event’s default offset for a duration of one quarter note.
note(0, C4, 1/4, 127) - The same note, but at a MIDI velocity of 127.
note(0, C4, 1//4) - The same note with a quarter note duration, disregarding any effective time modification
This permits the useful interpretation property play, which overrides the entire default interpretation of whatever CWMNX element it is applied to. Here’s a quarter note C3 that is supposed to actually be played as a 16th note C4:
<event value="4"> <note pitch="C3" perform="play: note(0, C4, 16);"/> </event>
The real point of this feature is not syntactic sugar, but to make it practical to specify more complex performance events. One can do this by separating notes with spaces to create streams of simple events:
note(0, C4, 1/4) note(0, E4, 1/4) note(0, G4, 1/4) - A C major triad with the duration of a quarter note, sounding at the event’s default offset.
note(0, C4, 0.33/4) note(0.33/4, E4, 0.33/4) note(0.67/4, G4, 0.33/4) - A C major arpeggio with the duration of a quarter note, sounding at the event’s default offset.
In this example, a regularly notated C-G dyad is actually played as an arpeggiated C major triad, as part of an exercise in aural triad identification (the prompt might be, "what’s the middle note?"). Note that the styling is applied to the entire event, replacing the interpretation of all contained elements:
<event value="4" perform="play: note(0, C4, 4) note(1/32, E4, 7/32) note(2/32, G4, 6/32);">
  <note pitch="C4"/>
  <note pitch="G4"/>
</event>
One can style measure or sequence elements with the play property, replacing the entire default interpretation of measures or sequences if so desired.
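As a sketch, the interpretation of an entire measure could likewise be suppressed with the playback property described in §7.2:
<measure perform="playback: none;">
  <sequence>...</sequence>
</measure>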
Note: There are two reasons that this event representation is not directly tied to MIDI concepts. First, it’s expected that many applications will make use of their own internal sound engines to render performances, and we cannot assume that these engines make use of MIDI constructs internally. Second, the parameters governing the interpretation layer are very conveniently expressed in terms of the same units as the semantic layer.
7.5. Rhythmic template selectors
Special CSS selectors can be defined for CWMNX to make it easier to achieve useful interpretation results with styling.
Consider these two useful selectors:
- :occurs-within(4, 1/8) will match any event that occurs one eighth-note into any quarter-note subdivision.
- :value(8) will match any event whose note value is 8.
Now we can have a stylesheet definition that specifies a swing percentage of 10%:
:occurs-within(4, 0) :value(8) { duration: 110%; }
:occurs-within(4, 1/8) :value(8) { offset: 10%; duration: 90%; }
It’s also possible to specify the interpretation of a French-style dotted baroque rhythm using exactly the same mechanism. In this example, all notated dotted eighths are performed as double-dotted:
:occurs-within(4, 0) :value(8*) { duration: 7/32; }
:occurs-within(4, 8*) :value(16) { offset: 1/32; duration: 1/32; }
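As a further illustration, and assuming the percentage units behave as in the 10% swing example above, a heavier triplet-feel swing might be approximated by pushing off-beat eighths roughly a third of the way along (the figures below are illustrative, not normative):

:occurs-within(4, 0) :value(8) { duration: 133%; }
:occurs-within(4, 1/8) :value(8) { offset: 33%; duration: 67%; }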
8. GMNX: Encoding Scores with General Content
GMNX is the MNX score type denoted by a content attribute of general. It is intended as a general medium for representing scores in terms of abstract visual, performance and audio information. There is no attempt to represent semantics directly in GMNX, although GMNX may cross-reference elements in a separate, semantic document for traceability.
This entire section is a rough sketch of a possible approach and needs considerable work.
8.1. GMNX Goals
GMNX is intended to support applications which must be able to faithfully execute a visual and/or audible rendition of a score, with an awareness of the relationship between what is seen and what is heard.
There are no dependencies on any particular notational schema for music. Because of the lack of semantics, applications must not rely on information about the notational content of the score beyond the purposely limited vocabulary of GMNX. As a corollary, GMNX applications need not be concerned with transforming semantic data into musically acceptable graphics or audio, which can simplify their development.
The only constraints on the nature of the score are:
- The visual content of the score must be encoded in SVG.
- The audible content of the score must be encoded either in audio media, or in a GMNX performance-events element.
Some applications well-suited for GMNX usage include:
- Lightweight music viewers/players that employ standard libraries for rendering graphics and audio media, instead of doing custom rendering of semantic data.
- Music viewers/players that wish to handle any kind of musical content, not just CWMN.
- Music viewers/players which display a score exactly as it was depicted in a specific edition or manuscript.
- Highly individual or unusual notation editors that want to export music in a format that can be consumed by other applications for viewing and playback.
8.2. GMNX Structure
A GMNX document contains two kinds of data: visual content and performance content.
8.2.1. Visual Content
The visual content piece of a GMNX document is a set of svg elements representing the score. Each element constitutes a single page of the score, where "page" is defined as a two-dimensional canvas of arbitrary dimension whose contents can be viewed independently of other pages.
A GMNX document is required to contain visual content, since by definition it represents music notation of some kind.
SVG elements are labeled with IDs, allowing them to be referenced within the performance content.
There are no hard requirements whatsoever on the hierarchical structure of visual content. Notation need not be organized into anything except pages; if pages are not meaningful, the entire document may consist of a single page.
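A minimal sketch of visual content, assuming (as in the full example of §8.4 A GMNX Example) that each page is an svg child of the score element:

<score content="general">
  <svg id="page1"> ... </svg>   // first page of the score
  <svg id="page2"> ... </svg>   // second page; a document may equally consist of a single page
</score>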
GMNX permits special elements to be embedded within SVG to describe regions in the visual content having a directional flow which can be mapped to time in a musical performance. This optional feature is described below under performance flows.
Please refer to the SVG specification for information on what SVG is able to represent. For all practical purposes, any combination of vector and raster graphics is encodable in SVG.
8.2.2. Performance Content
A GMNX document can contain any number of performances, which are taken to be musical embodiments of the visual content. Each performance is represented either through separate, conventionally encoded audio media files or through a performance-events element that is similar in spirit to a MIDI file, but is more compatible with CWMNX sequences and events.
No matter how it is encoded, a "performance" takes place in the time dimension. GMNX allows ranges and points in a performance to be mapped onto various elements in the visual SVG content. These mappings define a two-way correspondence between the visual score and the content of each performance.
This correspondence is general in nature and not dependent on any particular approach to notation.
Performances are optional in GMNX. Any number of performances may be provided, since there is no concept of a canonical performance in GMNX.
8.2.2.1. Audio Media
A single performance represented by audio media can be described with a performance element that includes any number of media file references.
Here’s a simple example:
<performance>
  <performance-name>Piano performance</performance-name>
  <performance-media>
    <media-file src="hot-cross-buns.mp4"/>
  </performance-media>
</performance>
The performance media may be further described in terms of tracks. This is important in §8.3 Mapping Between Position and Time. An example follows:
<performance>
  <performance-name>Piano performance</performance-name>
  <performance-media>
    <media-file src="hot-cross-buns.mp4">
      <track index="0">
        <name>Piano LH</name>
      </track>
      <track index="1">
        <name>Piano RH</name>
      </track>
    </media-file>
  </performance-media>
</performance>
Multiple media files may also be referenced, indicating that they together supply a multitrack performance when played in synchronization. Here a stereo piano recording is synchronized with a mono recording of a trumpet track:
<performance>
  <performance-name>Piano performance</performance-name>
  <performance-media>
    <media-file src="hot-cross-buns-piano.mp4">
      <track index="0">
        <name>Piano LH</name>
      </track>
      <track index="1">
        <name>Piano RH</name>
      </track>
    </media-file>
    <media-file src="hot-cross-buns-trumpet.mp4">
      <track index="0">
        <name>Trumpet</name>
      </track>
    </media-file>
  </performance-media>
</performance>
8.2.2.2. Event Sequences
Alternatively, a performance may be represented by a sequence of discrete events having these properties:
- onset (in seconds from start of performance)
- duration (in seconds)
- instrument (described by MusicXML sound ID)
- pitch (as in CWMNX)
- dynamics (as in CWMNX)
- technique
There is no tempo information in an event list. Adjustment of playback speed by an implementation may be accomplished by proportionally adjusting onset and duration times.
This information can be used for synthesis, analysis, or any other purpose by the application. However it is specifically not intended to represent semantic notational content. For example, a CWMN staccato quarter note might be represented as an event with the effective duration of a 16th note.
The performance-events element supplies this information:
<performance>
  <performance-name>Event sequence</performance-name>
  <performance-events>
    <track>
      <instrument-sound>keyboard.piano</instrument-sound>
      <sequence>
        <event graphic="#note1" start="0" duration="0.25">
          <note pitch="E4" dynamics="100"/>
        </event>
        <event graphic="#note2" start="0.25" duration="0.25">
          <note pitch="D4"/>
        </event>
        // ...and so on
      </sequence>
    </track>
  </performance-events>
</performance>
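As a concrete illustration of the speed-adjustment rule above, an application rendering this sequence at half speed would presumably just double every start and duration value. The first two events are shown below as if re-encoded, purely to make the arithmetic visible:

<event graphic="#note1" start="0" duration="0.5">
  <note pitch="E4" dynamics="100"/>
</event>
<event graphic="#note2" start="0.5" duration="0.5">
  <note pitch="D4"/>
</event>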
8.3. Mapping Between Position and Time
What makes GMNX more powerful than just a combination of images and audio is its ability to describe correspondences between the two. These correspondences are not necessary to use GMNX, but can be provided wherever they are known.
All of these mappings are optional, and applications cannot count on their presence. While they make sense for most CWMN, there are many kinds of music notation that cannot supply them.
8.3.1. Performance Regions
A performance region consists of two pieces of information:
- The ID of a particular SVG element corresponding to the region.
- A time range within some performance content.
Where provided, this information allows applications to do the following:
- modify the appearance of the element (e.g. by highlighting it) in conjunction with performance of the given time range
- interpret user interaction with the element as referring to the performance time range
Performance regions are identified by listing them in a performance’s performance-regions element. The graphic attribute refers to the SVG element for the region, while the time attribute supplies a time range within the performance, given in seconds since its start.
<performance-regions> <region graphic="#m1" time="0 1.8"/> <region graphic="#m2" time="1.8 3.61"/> <region graphic="#m3" time="3.61 5.38"/> <region graphic="#m4" time="5.38 7.23"/> </performance-regions>
Region shapes can thus be any closed shape definable in SVG. Examples include rectangles, polygons, ellipses, polylines, polysplines, shapes with hollow cutouts... literally, anything.
Region mappings are many-to-many in character: a given graphical region may be mapped to any number of performance time ranges, and vice versa. This is necessary to accommodate the varied possibilities of music notation, which include among others:
- disjoint and non-aligned regions that describe the same range of music (e.g. medieval partbooks)
- a region that is repeated multiple times in performance
- a region that is repeated with temporal overlap, or at varying speeds (e.g. rounds)
- a region that is read in more than one direction or orientation (e.g. puzzle canons)
- a region that may or may not be performed
Thus, one can have region lists like this:
<performance-regions> <region graphic="#m1part1" time="0 1.8"/> <region graphic="#m1part2" time="0 1.8"/> // two regions exist for this time range <region graphic="#m1part1" time="1.8 3.61"/> // same region is repeated in performance </performance-regions>
A performance-regions element can also include one or more optional track-ref elements that identify a set of specific tracks in the performance. Such regions are taken as specific to the musical material in these tracks, rather than to the material in the performance as a whole.
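A hedged sketch of such a track-specific region list follows; the track-ref attributes shown (a media file reference and a track index, echoing the track declarations in §8.2.2.1) are assumptions, since the proposal does not yet define them, and the graphic IDs are hypothetical:

<performance-regions>
  <track-ref src="hot-cross-buns.mp4" index="1"/>   // regions below describe only the Piano RH track
  <region graphic="#rh-m1" time="0 1.8"/>
  <region graphic="#rh-m2" time="1.8 3.61"/>
</performance-regions>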
8.3.2. Performance Flows
A performance flow is a special kind of performance region, in which a smooth, continuous two-way mapping exists between each spatial point in the region and a corresponding point in performance time. Each flow includes the following information:
- The ID of a particular SVG element corresponding to the region.
- A geometric path that is followed smoothly by an imaginary cursor line within the SVG element, as defined by a flow child element.
- A time range within some performance content.
Flows are more powerful than regions, because every point in the region can be mapped to a specific performance time within the range, and every performance time can be mapped to a specific cursor position.
Flows thus allow applications to:
- position a cursor over the score in conjunction with an exact performance time
- interpret user interaction with the region as indicating an exact performance time
Within an SVG region, the flow and flow-path elements define the extent and path of this cursor. The pos attributes define "position" ranges for the cursor, expressed in arbitrary coordinates that are later mapped to performance time.
<flow cursor="0 40"> // cursor is a (0,40) vector in local coordinates <flow-path d="M 0 0 h 15" pos="0 1"/> // Move (0,0), horizontal(+15) over 0..1 position range <flow-path d="h 15" pos="1 2"/> <flow-path d="h 35" pos="2 4"/> </flow>
With the SVG path approach shown above, cursors may progress in any desired fashion through an arbitrarily shaped region. The cursor implicitly rotates as the tangent vector of the path changes.
Within a performance, the region-flow element identifies a region with a flow. It is just like region, but it includes a position range; as the performance progresses from the region’s start time to its end time, the position of the flow cursor is advanced from its starting value to its ending value:
<performance-regions>
  <region-flow graphic="#m1" time="0 1.8" pos="0 4"/>
  <region-flow graphic="#m2" time="1.8 3.61" pos="0 4"/>
  <region-flow graphic="#m3" time="3.61 5.38" pos="0 4"/>
  <region-flow graphic="#m4" time="5.38 7.23" pos="0 4"/>
</performance-regions>
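Assuming the mapping between the pos range and the time range of a region-flow is linear (the proposal does not yet say otherwise), the first entry above places the cursor at position p = 4 × t / 1.8 for a performance time t between 0 and 1.8 seconds; conversely, a user click at position 2 within measure 1 would be interpreted as the performance time 2 × 1.8 / 4 = 0.9 seconds.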
8.3.3. Performance Events
Performance events are the final type of mapping supplied by GMNX; their purpose is to describe discrete musical events. Each event mapping includes the following information:
- The ID of a particular SVG element corresponding to the event.
- A description of the event within a performance-events element, supplying its onset, duration and performance attributes such as pitch, dynamics, etc.
Performance event mappings can thus connect an arbitrary graphic representation in the visual content to a description of the sound that it makes. Obviously, the richness of attributes like pitch, velocity and performance techniques must be limited for reasons of interoperability. Consequently, audio media remain the ultimate general vehicle for describing sounds in their own native terms, in which case events refer to a time range within the performance.
Performance event mappings allow applications to:
- identify specific graphics within the score in conjunction with their occurrence in a performance
- interpret user interaction with these graphics as indicating specific musical events in the performance, such as notes or time ranges.
Although event mappings make sense for CWMN, as well as for a great deal of contemporary Western music, they are not universal and so are not required by GMNX. Many simple notational ideas do not lend themselves to event descriptions, for example textual or graphical instructions to an improvising performer, or color gradients as opposed to discrete symbols.
Event mappings are effected by supplying a graphic attribute on an event or note element in an event sequence:
<event graphic="#note1" start="0" duration="0.25"> <note pitch="E4" dynamics="100"/> </event>
8.4. A GMNX Example
Here the earlier CWMNX example is restructured as a GMNX document and reduced to a single staff for clarity. It illustrates two kinds of mapping: performance flows and performance events.
Note: There is absolutely no requirement that music be structured as systems, or staves, or use a rectangle-based layout, or have a left-to-right orientation. This example merely uses a CWMN score because it’s familiar material.
<?xml version="1.0" encoding="UTF-8"?> <mnx> <head> <identification> <title>Hot Cross Buns</title> </identification> </head> <score content="general"> <svg id="page1"> <g> <g> // staff prefix: clefs, time signatures, etc... </g> <g id="m1" transform="translate(20,0)"> // Describe a time flow in terms of an arbitrary notation position, mapped to local coordinates. // The flow is an imaginary cursor that moves smoothly along a path described // using the same commands as the SVG <path> element. The <code data-opaque>pos</code> ranges // are in notation position units (which here are quarter notes), not time or pixels. <mnx:flow cursor="0 40"> <flow-path d="M 0 0 h 15" pos="0 1"/> <flow-path d="h 15" pos="1 2"/> <flow-path d="h 35" pos="2 4"/> </mnx:flow> <path id="note1" d="..."/> <path id="note2" d="..."/> <path id="note3" d="..."/> </g> <g id="m2" transform="translate(100,0)"> <mnx:flow> // similar to above </mnx:flow> <path id="note4" d="..."/> <path id="note5" d="..."/> <path id="note6" d="..."/> </g> <g id="m3" transform="translate(200,0)"> // measure 3... </g> <g id="m4" transform="translate(300,0)"> // measure 4... </g> </g> </svg> <performance> <performance-name>Audio recording</performance-name> <performance-media> <media-file src="hot-cross-buns.mp4"/> </performance-media> <performance-regions> <region-flow graphic="#m1" time="0 1.8" pos="0 4"/> <region-flow graphic="#m2" time="1.8 3.61" pos="0 4"/> <region-flow graphic="#m3" time="3.61 5.38" pos="0 4"/> <region-flow graphic="#m4" time="5.38 7.23" pos="0 4"/> </performance-regions> </performance> <performance> <performance-name>Event sequence</performance-name> <performance-events> <part> <instrument-sound>keyboard.piano</instrument-sound> // Note that all units here are in seconds: this is a mechanical description of sound. <sequence> <event graphic="#note1" start="0" duration="0.25"> <note pitch="E4" dynamics="100"/> </event> <event graphic="#note2" start="0.25" duration="0.25"> <note pitch="D4"/> </event> <event graphic="#note3" start="0.5" duration="0.5"> <note pitch="C4"/> </event> <event graphic="#note4" start="1" duration="0.25"> <note pitch="E4"/> </event> <event graphic="#note5" start="1.25" duration="0.25"> <note pitch="D4"/> </event> <event graphic="#note6" start="1.5" duration="0.5"> <note pitch="C4"/> </event> // remaining events in song... </sequence> </part> </performance-events> <performance-regions> <region-flow graphic="#m1" time="0 1" pos="0 4"/> <region-flow graphic="#m2" time="1 2" pos="0 4"/> <region-flow graphic="#m3" time="2 3" pos="0 4"/> <region-flow graphic="#m4" time="3 4" pos="0 4"/> </performance-regions> </performance> </score> </mnx>
8.5. Compilation to GMNX
Because of the generality of GMNX, semantic flavors of MNX (like CWMNX) can be fairly easily "compiled" to GMNX, by any application able to render them into graphics and either audio or performance events.
This has value as a bridge between applications that have a semantic understanding of some type of music, and applications that do not (but which can still profit from the packaging and mappings provided by a GMNX document).
Where compilation occurs, semantic traceability is recommended as a best practice, by including an mnx:semantic attribute in each SVG element that points to a corresponding semantic element in the original source.
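A hedged sketch of such traceability, reusing a graphic from §8.4 A GMNX Example; the reference syntax and the target ID are illustrative only, since the proposal does not define how the semantic source is addressed:

<path id="note1" d="..." mnx:semantic="hot-cross-buns-cwmnx.xml#note1"/>   // points back to the originating CWMNX note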