1. Introduction
1.1. About MNX
MNX is a proposed music notation markup standard. Its aim is to improve MusicXML in fundamental ways, while retaining many of its key concepts, terms and features.
Rather than attempting to create a normative specification for MNX from the start, this document tries to describe key design goals. Each goal is accompanied by examples of proposed MNX markup that directly illustrate it. In some cases, alternative examples are included.
The focus is on areas which substantially differ from MusicXML, and explanations refer to MusicXML features to show how MNX differs.
Note that this is not intended to be an exhaustive list of the changes that MNX will include; it is merely an attempt to portray some of the most important ones. Many other changes and improvements are expected to be included.
Having said that, where not explicitly called out, assume that MusicXML features may carry over to MNX unchanged, or in an obvious and analogous fashion.
MNX stands for "Music Notation X", where "X" suggests "XML", "eXtended", and potentially other X-related things as may come to mind.
1.2. Goals and Tradeoffs
MNX seeks to provide a high degree of interoperability and exchange between different applications working with music notation. This emphasis on interoperability is a differentiator between MNX and other notation encoding approaches, and takes it in a different direction from its predecessors.
MNX is designed with a few core beliefs in mind:
- Limits are needed on either semantic richness or universality. An encoding that seeks to represent an established notational system will want rich and specific semantics, tailored to represent that system’s ideals and concepts, which are by definition not universal in scope. On the other hand, a universal system that can encompass literally any graphical and sonic expression of music is most interoperable when the semantics of notation are set aside.
- There are no culturally privileged notational systems. Consequently it must be possible to extend MNX to accommodate multiple such systems.
- Within any given notational system, some limits on expressiveness are necessary in order to make the implementation effort manageable.
Note: we use the term semantics here to refer to concepts with an understood meaning in some musical culture, as distinct from their graphical or sonic instantiations. For example, in conventional Western music, we consider the shared idea of "quarter note" to be a semantic one; a quarter note can be instantiated as many different shapes, or as many different sounds.
1.3. MNX Score Types
MNX can support multiple types of score encoding, which can be bundled together through the mechanism of §3.1 The container document into a single composite document.
The present proposal focuses on two distinct types of MNX score, which represent opposite poles in the tradeoff described above between semantic richness and universality. The expectation is that others will be created over time, particularly those which target specific notational systems.
The first score type is CWMNX, which encodes Conventional Western Music Notation (CWMN) in a semantically rich fashion. It inherits many ideas and concepts from MusicXML.
The second is GMNX, where "G" is for General. It serves as a kind of universal encoding for scores having arbitrary graphical and audio content. In consequence, it is relatively free of semantics.
Note: Our working definition for CWMN is "notation in which the requirements do not significantly extend beyond those of music of the 19th and early 20th centuries." [Gerald Warfield, Writings on Contemporary Music Notation (Ann Arbor: Music Library Association, 1976), ii.]
1.4. Comparisons with other notation standards
CWMNX is a lineal descendant of MusicXML, and employs many of the same concepts. However it sacrifices some features and flexibility in favor of tighter interoperability, and simplifies the element structure considerably. It also moves all non-semantic information into CSS properties. The features in GMNX have no analogue in MusicXML.
MEI is a very general and expressive medium for encoding arbitrary musical documents, with particular attention to the needs of scholars. Due to its extreme plasticity, MEI is perhaps better described as a powerful framework for building customized documents and applications, than as a single encoding method. As such, interoperability has not been a main goal of MEI to date. However there are efforts underway to define a clean MEI subset as an interoperable medium for encoding CWMN (sometimes known as "MEI Go").
IEEE 1599 is a specification that has paid unique attention to the relationships between different layers of musical information. Its Logic layer is similar in content to CWMNX, while its Notational, Performance and Audio layers answer some of the same concerns as GMNX. GMNX takes a different approach to connecting these layers, and does not attempt to fully unify semantic information with visual and performance data. It relies to a greater degree on SVG, and to a lesser degree on MIDI.
1.5. Compatibility with MusicXML
MNX does not attempt to be backward-compatible with MusicXML, nor is it a superset of MusicXML. However, a large proportion of MusicXML markup is expected to be preserved. In these examples, MusicXML constructs are used freely throughout as a way to show how proposed new concepts dovetail with existing ones.
Backward compatibility aside, it is a goal to be able to machine-translate MusicXML into MNX. This is essential for migration purposes.
1.6. Use case concordance
A companion document details a set of known use cases for music notation. The use cases have links back to relevant sections of this document, where support can be demonstrated.
2. A brief example
To satisfy immediate curiosity up front, here is an MNX encoding of the timeless song Hot Cross Buns, in a simple grand-staff piano arrangement. This encoding is purely semantic, and includes no information on appearance or interpretation.
<?xml version="1.0" encoding="UTF-8"?>
<mnx>
  <head>
    <identification>
      <title>Hot Cross Buns</title>
    </identification>
  </head>
  <score content="cwmn">
    <system>
      <measure>
        <attributes>
          <tempo bpm="120" value="4"/>
          <time signature="4/4"/>
        </attributes>
        <direction placement="above">
          <words>With heavy irony</words>
        </direction>
      </measure>
      <measure/>
      <measure/>
      <measure/>
    </system>
    <part>
      <part-name>Piano</part-name>
      <measure>
        <attributes>
          <staff>
            <clef sign="G" line="2"/>
          </staff>
          <staff>
            <clef sign="F" line="4"/>
          </staff>
          <instrument-sound>keyboard.piano</instrument-sound>
        </attributes>
        <sequence staff="1">
          <direction>
            <dynamics><f/></dynamics>
          </direction>
          <event value="4"><note pitch="E4"/></event>
          <event value="4"><note pitch="D4"/></event>
          <event value="2"><note pitch="C4"/></event>
        </sequence>
        <sequence staff="2">
          <event value="2*"><rest/></event>
          <direction>
            <dynamics><p/></dynamics>
          </direction>
          <event value="4">
            <note pitch="C3"/>
            <note pitch="E3"/>
            <note pitch="G3"/>
          </event>
        </sequence>
      </measure>
      <measure>
        <sequence staff="1">
          <event value="4"><note pitch="E4"/></event>
          <event value="4"><note pitch="D4"/></event>
          <event value="2"><note pitch="C4"/></event>
        </sequence>
        <sequence staff="2">
          <event value="2*"><rest/></event>
          <event value="4">
            <note pitch="C3"/>
            <note pitch="E3"/>
            <note pitch="G3"/>
          </event>
        </sequence>
      </measure>
      <measure>
        <sequence staff="1">
          <event value="8"><note pitch="C4"/></event>
          <event value="8"><note pitch="C4"/></event>
          <event value="8"><note pitch="C4"/></event>
          <event value="8"><note pitch="C4"/></event>
          <event value="8"><note pitch="D4"/></event>
          <event value="8"><note pitch="D4"/></event>
          <event value="8"><note pitch="D4"/></event>
          <event value="8"><note pitch="D4"/></event>
        </sequence>
        <sequence staff="2">
          <event value="4"><rest/></event>
          <event value="4">
            <note pitch="C3"/>
            <note pitch="E3"/>
            <note pitch="G3"/>
          </event>
          <event value="4"><rest/></event>
          <event value="4">
            <note pitch="G3"/>
            <note pitch="B3"/>
          </event>
        </sequence>
      </measure>
      <measure>
        <sequence staff="1">
          <event value="4"><note pitch="E4"/></event>
          <event value="4"><note pitch="D4"/></event>
          <event value="2"><note pitch="C4"/></event>
        </sequence>
        <sequence staff="2">
          <event value="2*"><rest/></event>
          <event value="4">
            <note pitch="C3"/>
            <note pitch="E3"/>
            <note pitch="G3"/>
          </event>
        </sequence>
      </measure>
    </part>
  </score>
</mnx>
3. Document organization
Note: This section applies to all content types, CWMN or otherwise.
3.1. The container document
MNX documents act as general-purpose containers, which may be arbitrarily subdivided into a hierarchy of components which collectively make up the document as a whole.
3.1.1. A simple CWMN score
Here’s an example of the simplest structure, where an MNX document contains a single CWMN score. The head element includes descriptive information, while the score element contains the score contents. A significant MNX-specific element occurs here: the style element, which includes information relevant to the document’s appearance and interpretation:
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <identification>
      <title>My Favorite Work</title>
      <creator type="composer">Alan Smithee</creator>
    </identification>
    <style>
      @import url(mystyles.css);
    </style>
  </head>
  <score content="cwmn">
    // CWMNX score contents here...
  </score>
</mnx>
3.1.2. A simple general score
The score element can be qualified by a content attribute that describes the encoding of the content. It defaults to content="cwmn" and may include values from a registry of MNX musical content types. This example shows a score using the GMNX content type, general:
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <identification>
      <title>My Favorite Work</title>
      <creator type="composer">Alan Smithee</creator>
    </identification>
  </head>
  <score content="general">
    // GMNX score contents here...
  </score>
</mnx>
The MNX specification will maintain a registry of recognized values for content.
3.1.3. Compound MNX documents
It’s also possible to combine different representations of music in the same MNX document by using the collection element to combine multiple chunks of music into a single chunk. Each chunk may possess a distinct encoding. collection elements can be nested, allowing a subordinate collection to be embedded in a higher-level one.
Metadata elements such as identification or formatting elements like style may be included at any level of the resulting structure, causing them to apply only to those parts of the document.
Here’s an example that includes a hierarchy of collections and scores. (Note that some of these could employ non-CWMN encodings as well.)
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    //
  </head>
  <collection>
    <score type="section" content="cwmn">
      <identification>
        <title>Section 1</title>
      </identification>
      // CWMN markup...
    </score>
    <collection type="section">
      <identification>
        <title>Section 2</title>
      </identification>
      <score type="movement" content="cwmn">
        <identification>
          <title>Section 2, Movement 1 (for Solo Flute)</title>
        </identification>
        // CWMN markup...
      </score>
      <score type="movement" content="cwmn">
        <identification>
          <title>Section 2, Movement 2 (for String Orchestra)</title>
        </identification>
        // CWMN markup...
      </score>
    </collection>
  </collection>
</mnx>
3.1.4. Inclusion by reference
The HTML link element may be used to include a score by reference.
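For instance, a container document might pull in a score stored in a separate file. The placement and attribute usage below are illustrative assumptions only; rel and href are used as in HTML:
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <identification>
      <title>My Favorite Work</title>
    </identification>
  </head>
  // hypothetical: includes an externally stored score by reference
  <link rel="score" href="movement1.xml"/>
</mnx>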
3.2. Profiles
A score element may employ the profile attribute to indicate that it conforms to a particular set of expectations regarding its contents. A registry of valid profile names for each allowed score encoding is maintained as part of the specification.
The intent of profiles is to allow programmatic validation of documents, and to permit different levels of validation to be enforced where appropriate. MNX parsers may also be constructed to specifically support certain profiles, significantly decreasing programming effort.
For the purposes of this document, the most important profile is <score content="cwmn" profile="standard">. This profile indicates that the score in question conforms to a set of standard assumptions regarding "well-formed" CWMN. Examples of such assumptions may include:
- All parts contain the same number of measures
- The metrical content of a measure does not exceed its duration
- The metrical content of a tuplet does not exceed its duration
- All displayed accidentals are encoded in the document using accidental
- All parts of a score agree with respect to form, time and key signatures and barring.
This profile mechanism replaces the supports feature of MusicXML.
3.3. Metadata and Attribution
The identification element in MNX is used to supply descriptive and bibliographic information, as it is in MusicXML. Its contents are similar to MusicXML, but it can be included in a variety of parent elements for greater flexibility:
- The head element of the document
- Any score or collection element at any level (work, section, movement...)
- Any notational element (part, measure, notes...)
Other parent elements will no doubt make sense as the specification is developed.
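As one illustrative (non-normative) sketch, attribution might be scoped to a single part, crediting an arranger for one instrument only:
<part>
  <part-name>Guitar</part-name>
  <identification>
    <creator type="arranger">Alan Smithee</creator>
  </identification>
  // measures for the part...
</part>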
4. CWMNX: Encoding Conventional Western Music
CWMNX is the MNX score type denoted by a content attribute of cwmn. It is intended specifically for encoding conventional Western music notation. It is a vehicle for representing the semantics of such musical scores, with the ability to weave appearance and interpretation data into these semantics by means of CSS properties.
4.1. CWMNX Goals
CWMNX is intended to support a wide range of applications which must have a semantic description of music notation and which may make use of appearance and performance information alongside this.
Some examples of applications suited to using CWMNX include:
- CWMN-based notation editors
- CWMN readers which dynamically render or play music
- tools for analysis or transformation of CWMN music
- OMR applications that produce encodings of CWMN
- educational applications requiring a knowledge of underlying CWMN structure
4.2. CWMNX Layers
CWMNX makes a very clear distinction between the following layers of musical information:
- Semantics, the core stratum of notational data in a CWMN work that must inform any potential rendering of that work. CWMNX encodes all semantic information as XML markup like measure, note and so on. There is no concept of "selective encoding" in this layer: without the required core of semantic elements, the document is not valid. See §5 CWMNX Semantic markup.
- Appearance, a layer of visual attributes and formatting that describes how a work appears to the reader, independent of the other two layers. Where this layer is absent, implementations are expected to supply a default appearance based on the semantic layer. CWMNX encodes all appearance information using CSS styles and properties. See §6 CWMNX Styling for more information.
- Interpretation, a layer of performance information that describes how a work sounds to the listener, independent of the other two layers. Where this layer is absent, implementations are expected to supply a default interpretation based on the semantic layer. Where this layer is present, CWMNX encodes all interpretation information in the form of CSS styles also. See §7 CWMNX Interpretation for more information.
Since only the semantic layer is encoded in XML markup and the rest is CSS, these distinctions are made very concrete. Furthermore, ambiguities arising from combining the layers in MusicXML (such as the near-duplication of tie and tied) can disappear.
5. CWMNX Semantic markup
This section describes how CWMNX treats the semantic layer of a CWMN document in the standard profile.
Note: In most of the examples that follow, the MNX container elements are omitted for brevity.
5.1. Score structure
For a CWMN score, the score element is roughly equivalent to the score-partwise element in MusicXML. MusicXML’s score-timewise element is not used in CWMNX.
5.1.1. Unifying score-part and part
In CWMNX, the contents of the score-part element are simply included below part. This leaves the part element as the single place in the document that provides part-related information.
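A rough sketch of the resulting shape, assuming that MusicXML’s part-name and part-abbreviation children of score-part carry over unchanged:
<part>
  // descriptive information formerly carried by score-part...
  <part-name>Violoncello</part-name>
  <part-abbreviation>Vc.</part-abbreviation>
  // measures for the part...
  <measure>...</measure>
</part>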
5.1.2. System notations
A number of notational concepts can be scoped to the entire system in a semantic sense, not only to a single part. Examples include:
- Key signatures
- Time signatures
- Tempo indications
- Rehearsal marks
- Barlines
- System and page breaks (see §6.8 System and page flow)
- Musical form indications
In some cases (time signatures, for example) multiple instances of a notation are encoded separately in each MusicXML part, yet they are expected to semantically agree across all parts. While such agreement is not strictly required, it is nearly always present. It seems desirable to allow this agreement to be expressed in a single instance, somewhere in the document.
In other cases (e.g. tempo indications) the concept typically is encoded in the first visible MusicXML part. Yet, if the document were rendered showing only some other part in the score, there would still be an expectation that the tempo indications encoded in the topmost part would be shown. MusicXML documents produced by different engravers vary widely in this respect.
MNX includes a new system element analogous to part, which precedes all part elements in the score. This contains measures whose contents are understood to apply to all parts. Parts need only include such elements to the extent that they are overridden. No performance events like notes or rests may be included in the measures within system.
An example CWMNX score skeleton could thus look like this:
<mnx xmlns="http://www.w3.org/mnx">
  <head>...</head>
  <score content="cwmn">
    <system>
      // measures describing system-wide features...
    </system>
    <part>
      // descriptive info for part 1...
      // measures describing part 1...
    </part>
    <part>
      // descriptive info for part 2...
      // measures describing part 2...
    </part>
    // additional parts...
  </score>
</mnx>
5.1.3. Concert and transposed pitch
TBD pending resolution of issue.
5.2. Element IDs
Any element whatsoever in MNX markup may possess a regular id
attribute as defined for the XML namespace. References to elements form a
backbone principle of key aspects of CWMNX, for example §5.4 Spanning notations.
These attributes are of type ID
, not IDREF
and thus
fully conform to the XML standard. Existing uses of id
in
MusicXML that are in conflict with this usage, and which carry over to MNX,
will be renamed.
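For example, an ID might be attached to an event so that other markup can refer to it (the ID value here is arbitrary):
<event id="ev42" value="4">
  <note pitch="G4"/>
</event>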
5.3. Musical timelines
CWMNX makes use of a different element structure than MusicXML to represent timelines of musical notations and events, always in chronological sequence. The elements of these timelines are essentially the same as those found in MusicXML, though. The differences are driven by the following design goals:
- Use parent elements as containers to organize child elements that are part of a whole (e.g. notes in a chord).
- Give parent elements responsibility for expressing concepts that are logically shared by the children (e.g. the stem shared by notes in a chord).
- Wherever possible, make it impossible to encode invalid constructs.
- Preserve a common-sense mapping between encodings and a "naïve engraving" of the same music.
- Eliminate the need for complex book-keeping and post-processing when parsing measures of music.
- Make it easy to alter the content of a CWMNX document using simple DOM (Document Object Model) operations such as node insertion and deletion (e.g. add a note to a chord by simply inserting a note element in the appropriate parent).
5.3.1. Specialized encodings
For compactness and readability, CWMNX can represent concepts such as note values, timespans and pitches as simple strings, usually encoded as XML attributes. Compactness and readability are desirable because encodings are documented, read and talked about, not only parsed and generated.
These will be covered first, before timelines are described, since they help make sense of the succeeding examples.
5.3.2. Note values
There are a variety of situations in which the note value of a musical event needs to be supplied for an element. CWMNX uses a text encoding to represent such durations; in general, this replaces some arbitrary combination of MusicXML’s type and dot elements, with no difference in semantics at all. The values are always implicitly subject to any tuplet time modification that may be in effect.
The note value encoding scheme consists of a unit, expressed as a power-of-two division of a whole note, similar to the denominator of a time signature, with textual exceptions for large antique units. This may optionally be followed by zero or more dots expressed as occurrences of the asterisk character * (to avoid confusion with a decimal point). Examples follow:
1 - a whole note
4 - a quarter note
8* - a dotted eighth note
8** - a double-dotted eighth note
breve - a breve (double whole note)
breve* - a dotted breve
In general, the attribute name value is used to supply this information for various CWMNX elements.
While it would be possible to use full-blown rational numbers with arbitrary denominators, this would permit the specification of arbitrary non-CMN values and complicate validation and parsing.
Retaining the more verbose MusicXML approach of separate type and dot elements, on the other hand, makes it harder to embed note values in other strings such as time or position offsets.
Finally, a rational number does not map cleanly onto what one sees in notated music. This encoding attempts an obvious mapping between the semantic layer and a naïvely notated score. Rational numbers that combine the notion of note values, dots and tuplets into a single number would take us further from this ideal.
5.3.3. Metrical timespans
There are also situations in which a metrical timespan needs to be supplied, as an exact multiple of a note value. There is no exact corresponding construct in MusicXML.
A metrical timespan is very similar to a time signature, but more general. It is encoded as an integer, followed by / and a note value encoding. This specifies a timespan equal to the given number of note value units. Metrical timespans are used in CWMNX to represent time intervals which must be constrained to exact note value multiples, such as note positions relative to the origin of a containing sequence.
Examples:
3/4 - three quarter notes
3/8* - three dotted eighth notes (this form can be useful for specification of §5.3.13 Tuplets)
9/16 - the same timespan as above, only expressed in sixteenth note units
Like an individual note value, a metrical timespan is always subject to any contextual time modification due to tuplets. For example, inside an eighth-note triplet, the timespan 1/8 refers to a triplet eighth note.
5.3.4. General timespans
MNX has a less constrained notion of a general timespan which is used in situations where no constraint to an exact note value multiple is desired.
General timespans are encoded as a real number followed by one of the following suffixes.
/note-value - displacement based on contextual time within the containing measure or tuplet.
//note-value - displacement based on measure-level time, disregarding any containing tuplets.
t - MIDI ticks (1/960 of a measure-level quarter note)
Arbitrary denominators may also be used in general timespans.
Examples:
1/8 - a contextual eighth note
1//8 - a measure-level eighth note
480t - also a measure-level eighth note
490t - 10 ticks longer than a measure-level eighth note
2.42//4 - 2.42 measure-level quarter notes
1/12 - a triplet 8th note (1/3 of a quarter note)
Note: This concept replaces the MusicXML notion of divisions (and is capable of expressing arbitrary divisions if so desired).
5.3.5. Positions and timespans
Any timespan can also express a position within a sequence or tuplet. In these cases, the timespan has the meaning of "timespan from some origin". Just as there are metrical and general timespans, so are there metrical and general positions.
Positions expressed as timespans using contextual note values (e.g. 3/8) are always relative to the start of the containing sequence or tuplet. Positions expressed in terms of measure-level note values (3//8) are always relative to the start of the containing measure.
5.3.6. Sequences
The sequence element represents an independent sequential timeline within a measure. A measure can have any number of timelines within it.
Each sequence child of the measure orders its child elements in chronological sequence, starting from the beginning of the measure. These may include such elements as rests, notes, chords, performance directions, barlines, clefs, and so forth. They also may include tuplets, which effectively act as nested sequences.
Its children may be of these kinds:
- event children are notes or rests which possess a specific note value and occupy the corresponding timespan within the sequence. Events may not overlap, and they must occur at metrical positions within their containing elements.
- direction children are elements that occupy a zero timespan within the sequence (although they have visual extent). Directions are allowed to temporally overlap any other elements, and may occur at general positions.
- tuplet children are sub-sequences that apply a time modification ratio to their children, which in turn may be nested events, directions or tuplets. Like events, they must occur at metrical positions.
Here’s an example measure that shows several key aspects of sequences. In the first sequence there are two direction elements, separated by a half-note gap in metrical time. There are also two independent melodic voices represented by sequence elements. The second voice includes an 8th-note triplet.
<measure>
  <sequence>
    <direction>...</direction>
    <direction position="2/4">...</direction>
  </sequence>
  <sequence>
    <event value="2">...</event>
    <event value="4">...</event>
    <direction>...</direction>
    <event value="4">...</event>
  </sequence>
  <sequence>
    <event value="2*">...</event>
    <tuplet actual="3/8" normal="1/4">
      <event value="8">...</event>
      <event value="8">...</event>
      <event value="8">...</event>
    </tuplet>
  </sequence>
</measure>
The optional orientation attribute may assume the values up or down, affecting the default placement of stems, articulations, ornaments and other voice-specific objects within a sequence.
The optional staff attribute supplies a default staff assignment for all events within the voice of a given sequence. This may be selectively overridden within the sequence, of course. For example, consider these 4 sequences set up for an SATB-style grand staff:
<measure>
  <sequence orientation="up" staff="1">...</sequence>
  <sequence orientation="down" staff="1">...</sequence>
  <sequence orientation="up" staff="2">...</sequence>
  <sequence orientation="down" staff="2">...</sequence>
</measure>
Unlike the MusicXML voice element, the CWMNX sequence element need not possess a number or any other label. It can, however, be assigned a label via the optional name attribute, which declares continuity between identically named sequences in successive measures of the same part.
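For example, a sequence might carry the same (arbitrarily chosen) name in consecutive measures to declare that it represents one continuing voice; the name alto below is purely illustrative:
<measure>
  <sequence name="alto" staff="1" orientation="down">...</sequence>
</measure>
<measure>
  <sequence name="alto" staff="1" orientation="down">...</sequence>
</measure>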
5.3.7. The sequence cursor
Elements of a sequence can be assigned an explicit position, but need not be. In the absence of an explicit position attribute on an event, direction, or tuplet, the current value of the sequence cursor is used.
In the standard profile, the start tag of an event or tuplet element assigns the temporal start point of that element to the sequence cursor; the end tag of the element assigns the element’s temporal end point to the cursor. Direction elements have no effect on the sequence cursor.
There are two motivations for the cursor:
- to permit a terse and straightforward encoding of music wherever events and directions assume standard metrical positions.
- to afford encoding of music which uses standard note values, but not any standard interpretation of event positions.
Significant constraints distinguish CWMNX’s sequence cursor from MusicXML:
- The cursor must progress in a forward direction
- The cursor only progresses in units that are expressible in CMN (rather than any number of divisions)
- Events within the cursor’s timeline always belong to the same polyphonic voice or "layer"
- Events cannot temporally overlap (and all notes within an event are simultaneous)
5.3.8. Explicit positions
It’s also possible to use a position attribute for any element of a sequence.
This is most useful for directions. Directions may specify a general position since their positions are not constrained by CMN rules.
It is also possible for events to take an explicit metrical position.
MNX prohibits the use of out-of-order elements within a sequence or tuplet element. While cursor-based positioning can’t violate this constraint, explicit positions can, so care must be taken to sort elements properly when encoding.
Here is the prior example from §5.3.6 Sequences, recast with the use of a position attribute relative to the containing measure or tuplet.
<measure>
  <sequence>
    <direction>...</direction>
    <direction position="100t">...</direction>
  </sequence>
  <sequence>
    <event value="2">...</event>
    <direction position="2/4">...</direction>
    <event position="2/4" value="4">...</event>
    <event position="3/4" value="4">...</event>
  </sequence>
  <sequence>
    <event value="2*">...</event>
    <tuplet position="3/4" actual="3/8" normal="1/4">
      <event value="8">...</event>
      <event position="1/8" value="8">...</event>
      <event position="2/8" value="8">...</event>
    </tuplet>
  </sequence>
</measure>
5.3.9. Spaces
A space is a way of specifying a metrical time interval that does not contain any notation. CWMNX uses space elements to explicitly represent such gaps. (In MusicXML, these gaps were often created using the forward and backup elements or with hidden rests, and could only be determined after complete parsing of a measure’s contents.)
Because of the use of the sequence element to organize notations in chronological sequence, and because non-notated gaps within a sequence can be explicitly represented, CWMNX does not need MusicXML’s forward and backup elements. Instead, space serves the purpose of forward, while picking up a few additional useful characteristics. The contents of a sequence can always be expressed as a sequence of contiguous elements starting at the beginning of the measure and proceeding forward in metrical time.
Accordingly, the concept of divisions also becomes unnecessary in CWMNX. To the extent that arbitrary durations are needed, they can always be expressed as multiples of some normal musical time unit. (See §5.3.13 Tuplets for more information on how spaces work to take up space in tuplets -- divisions aren’t needed here either.)
A space takes a length attribute to specify its metrical duration (see §5.3.3 Metrical timespans):
<space length="4*"/>
The following space is equivalent to 5 eighth notes:
<space length="5/8"/>
And so is this one:
<space length="2.5/4"/>
Spaces are useful for causing a non-metrical notation like text to appear at a certain metrically anchored place within a measure. For example, the following causes text to be anchored at a point one quarter note into the given measure:
<measure>
  <sequence>
    <space length="4"/>
    <direction placement="above">
      <direction-type>
        <words>And then...</words>
      </direction-type>
    </direction>
  </sequence>
</measure>
5.3.10. Pitch encoding
CWMNX introduces a text encoding of pitches, which represents a combination of MusicXML’s step, octave and alter elements. This is done to address issues of readability and compactness in MusicXML and makes no semantic difference.
The format consists of a MusicXML step, followed optionally by 0..2 occurrences of # or b representing an integer alteration, followed by a MusicXML octave. An additional, non-integer alteration may be added to the preceding integral amount by including the suffix + or -, followed by a real-valued number of semitones.
As with MusicXML alter values, any occurrences of # or b do not imply rendering of accidentals on some associated element. These remain specified by an accidental value, if provided.
Examples:
C4 - Middle C
C#4 - The pitch one semitone above middle C
Db4 - The pitch one semitone above middle C (identical to the above)
C4+0.5 - The pitch one quarter-tone above middle C
B3+1.5 - The pitch one quarter-tone above middle C (identical to the above)
C#4-0.5 - The pitch one quarter-tone above middle C (identical to the above)
The pitch attribute is used to supply pitch encodings where appropriate, typically for note elements.
Pitch encodings are used also in §7 CWMNX Interpretation.
5.3.11. Events
The new CWMNX element event represents the related concepts of rest, note or chord, depending on its contents. It supplies information that is common to all three:
- event duration, expressed in terms of note value and dot count
- stem orientation, length, etc. if applicable
- flag and beam description
- articulations or directions that apply to all contained notes
- styling data (e.g. horizontal displacement)
- grace/cue markers
- lyrics
- slurs and other event-oriented §5.4 Spanning notations
Here’s a middle C as a dotted half note.
<event value="2*"> <stem>up</stem> <note pitch="C4"/> </event>
Here’s a C major triad as an eighth note:
<event value="8"> <stem>up</stem> <note pitch="C4"/> <note pitch="E4"/> <note pitch="G4"/> </event>
Here’s a whole note rest:
<event value="1"> <rest/> </event>
And a grace note chord; the grace attribute belongs to the event.
<event value="8" grace="true"> <stem>up</stem> <note pitch="C4"/> <note pitch="E4"/> <note pitch="G4"/> </event>
Note: grace notes do not advance the sequence cursor.
As one special case, the following event encodes a whole-measure rest. Note that the type attribute is used to indicate this, rather than value:
<event type="measure"> <rest/> </event>
5.3.12. Directions
The CWMNX direction element carries over the features of the MusicXML direction element.
This proposal does not yet attempt to examine how CWMNX directions will work in detail or how they differ from MusicXML directions.
CWMNX is, however, expected to include strong semantic types for different musical purposes of text. Dynamics, tempo markings, lyrics, playing instructions/techniques, etc. will all have proper types and may be freely included in sequence and event elements.
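As a purely speculative sketch based on the brief example in §2, a dynamic marking and a textual playing instruction might appear within a sequence as typed directions:
<sequence>
  <direction>
    <dynamics><p/></dynamics>
  </direction>
  <direction placement="above">
    <words>sul tasto</words>
  </direction>
  <event value="2"><note pitch="A4"/></event>
</sequence>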
5.3.13. Tuplets
In CWMNX, the tuplet element applies a time modification to all of its child elements, which are the same as those allowed in sequence. In fact, tuplet is very much like a nested sequence element, with an implicit time signature representing the notated duration inside the tuplet. It functions just like an event element, in that it takes up a known duration within its parent, and is not allowed to overlap with other event or tuplet elements.
A tuplet uses the two attributes actual and normal, making use of §5.3.3 Metrical timespans to represent the tuplet ratio. It thus combines the function of MusicXML’s tuplet and time-modification elements into a single construct: the contained elements are automatically subject to time modification, and CSS styling can control the appearance of the tuplet.
Numerals and brackets may or may not be shown in conjunction with a tuplet, by using the relevant MusicXML attributes which carry over into CWMNX.
Here’s an example of an eighth-note triplet:
<sequence>
  // preceding events in sequence...
  <tuplet actual="3/8" normal="1/4">
    <event value="8">
      <note>...</note>
    </event>
    <event value="8">
      <note>...</note>
    </event>
    <event value="8">
      <note>...</note>
    </event>
  </tuplet>
  // remaining events in sequence...
</sequence>
(The tag <tuplet actual="3/8" normal="2/8">...</tuplet> would work exactly the same. There is no semantic difference.)
One can include directions in tuplets as well. And, naturally, tuplets may be included within tuplets.
Other examples of tuplets:
<tuplet actual="5/8" normal="11/16"> // a tuplet that displays 5 eighth notes in the space of 11 sixteenths... </tuplet>
<tuplet actual="7/8*" normal="4/4*"> // a tuplet that displays 7 dotted eighth notes in the space of 4 dotted quarters... </tuplet>
5.4. Spanning notations
Many notations occupy an arbitrary span of time in a score, across some number of measures in a given part. Some examples are:
- slurs
- dynamic wedges (hairpins)
- 8va lines
- performance directions like cresc. poco a poco
Essentially all the elements described by a combination of MusicXML’s number="N" and type="start|stop" fall into this category. (Some of these are considered notations in MusicXML and others are considered direction-types. We won’t deal with this classification problem yet.)
In CWMNX, the preferred way to describe such spanning notations is in terms of a start element and a stop element. These elements may be any element that occurs inside a sequence.
The start element of a spanning notation is specified by including the span directly in the start element as a child. The stop element is specified via an end-ref attribute on the span that supplies the stop element’s ID. Thus, spanning notations "point to" the notation where they will end. This is different from MusicXML, in which the stop point refers back to the spanning notation’s number.
Thus there is no use of MusicXML’s number="N" or type="start" and type="stop" attributes.
Here’s an example, showing a slur that connects two non-adjacent chords in the same measure (the situation is very similar for events in different measures):
<measure>
  <sequence>
    <event value="4">
      <note pitch="C5"/>
      <note pitch="Eb5"/>
      <slur end-ref="a1"/>
    </event>
    <event value="4">
      <note pitch="D5"/>
    </event>
    <event id="a1" value="4">
      <note pitch="Db5"/>
    </event>
  </sequence>
</measure>
An alternate means of supplying spanning notations is also allowed, in which the endpoints of the span are specified positionally. This technique does not depend on any start or stop element: the spanning notation is simply included in a sequence, and its length attribute supplies its overall length in the score.
<measure>
  <sequence>
    <slur length="2/4"/>
    <event value="4">
      <note pitch="C5"/>
      <note pitch="Eb5"/>
    </event>
    <event value="4">
      <note pitch="D5"/>
    </event>
    <event value="4">
      <note pitch="Db5"/>
    </event>
  </sequence>
</measure>
5.5. Pagination, credits and page-level text
(TBD)
We need to study the CSS specification as it is currently evolving to represent pagination to meet the needs of digital publishing overall, not only music publishing.
MusicXML’s page-level elements such as credit-words and others need to be reconsidered in light of this evolution. There is currently a tension in the existing corpus of MusicXML documents between the use of explicitly placed credit elements and the metadata found in the identification element.
5.6. Semantic layout overrides
The CWMN canon has many examples of notation in which notes do not add up metrically to yield the duration of a measure or tuplet, or in which notes in one voice align with notes in another despite the dissimilar points in time implied by their notated durations.
CWMNX addresses these problems by allowing any event to optionally specify either or both of the following:
- render this event using the visual conventions for an arbitrary duration (e.g. render notehead, stem and flag/beam as for a sixteenth-note, although the note occupies an eighth-note duration within its timeline). An appearance attribute may be used to supply the visually rendered note or rest value for any given event. For example, a note that is positioned and played as an eighth note might include appearance="8*" to force the appearance of a dot.
- visually align this event with some other arbitrary event in the same measure (e.g. line up a regular sixteenth note with the last note in some other voice’s sixteenth-note triplet). An offset-ref attribute specifies an IDREF for the target event with which to align.
For example, consider the following Chopin-esque example of a two-voice passage. The upper voice contains a dotted-8th and a sixteenth note; the lower voice contains quintuplet 16th notes. However, the two final notes of both voices are to be displayed in alignment.
One way to encode this is to consider the upper voice to actually be denoting a quintuplet-based rhythm, but make it look like a non-tuplet rhythm:
<measure>
  <sequence staff="1">
    <tuplet actual="5/16" normal="1/4" bracket="no" show-number="no">
      <event value="4" appearance="8*">...</event>
      <event value="16">...</event>
    </tuplet>
  </sequence>
  <sequence staff="1">
    <tuplet actual="5/16" normal="1/4">
      <event value="16">...</event>
      <event value="16">...</event>
      <event value="16">...</event>
      <event value="16">...</event>
      <event value="16">...</event>
    </tuplet>
  </sequence>
</measure>
Independent of such editorial decisions, it is also possible to prescribe any desired performance rhythm using the §7 CWMNX Interpretation features of CWMNX.
6. CWMNX Styling
CWMNX employs CSS style properties to describe the appearance layer of a score. The property vocabulary used by CWMNX is not the same as for HTML or SVG, although it has some overlap. A large part of the work to specify CWMNX will involve figuring out the new set of properties that we need to drive music rendering and appearance.
As a result of this shift in approach, a great deal of MusicXML markup will shift into CSS. This allows it to be controlled by style sheets and media queries, yielding a great deal of expressive power over score appearance in multiple contexts.
The expectation is that a conformant implementation can always infer a default appearance from the semantic layer, using built-in default style properties as required. This replaces the notion of "selective encoding" in MusicXML: appearance styles may be supplied optionally, but they are never necessary.
Here’s a quick example to set the stage for more discussion, showing the application of color and positioning details to various CWMNX elements. In this example, the color property is self-explanatory, and the relative-y property is the same as found in MusicXML. The value 1sl means 1 staff line (analogous to typical CSS units such as 1px for 1 pixel).
<measure>
  <sequence>
    <event style="color: red;">
      <note>...</note>
    </event>
    <direction style="relative-y: 1sl;">
      <direction-type><words style="color: #666">...</words></direction-type>
    </direction>
    <event>
      <note style="color: red;">...</note>
    </event>
  </sequence>
</measure>
Note: It is possible that additional profiles might require the encoder to supply some degree of basic style information (e.g. measure widths). The position of this proposal is that such requirements are optional and ride on top of the standard profile, rather than constituting standard behavior.
6.1. CSS and style properties
As in HTML or SVG, CSS properties can be specified in several distinct ways:
- They can be specified directly on the pertinent element, inline, using the style attribute in which properties are explicitly given. This is the approach used in the preceding example.
- They can be specified indirectly on the pertinent element by using the class attribute, causing the applicable properties to be looked up in one or more associated stylesheets using the element’s class name(s). For instance, class="alternate" might be used in a given document to specify a specific look for notes in an alternate melody line, allowing that appearance to be specified exactly once in the stylesheet selector for .alternate.
- They can be inferred implicitly from the element’s characteristics, according to the selectors in the CSS specification. For example, a stylesheet might include a note selector specifying the default appearance of all note elements. An .alternate selector in the stylesheet would specify the appearance of all elements marked as belonging to the alternate style class.
This flexibility and generality is the primary motivator for using CSS as part of MNX. It allows a spectrum of expression for style properties that ranges from the very general (a house style) to the very specific (an exception for just one element in the document).
6.2. Stylesheets
Stylesheets allow visual style properties to be declared using selectors, which provide both implicit and explicit ways of applying styles to elements in the markup. These work exactly as in CSS for HTML or SVG, except where extended in MNX-specific ways.
In MNX, stylesheets are referenced by including a style element below the appropriately scoped chunk of the document. This element may contain CSS, and may further import external documents using the @import statement.
The following example illustrates the power of stylesheets and selectors; see the CSS Selectors Level 4 draft specification for more details.
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <style>
      score {
        font-family: Bravura;
      }
      .alternate {
        color: #666; /* gray */
      }
      direction {
        font-family: Times Roman;
        font-size: 2sl;
      }
      words.aside {
        font-weight: bold;
      }
    </style>
  </head>
  <score>
    <part>
      <measure>
        <sequence>
          <event class="alternate">
            <note>...</note>
          </event>
          <direction style="relative-y: 1sl;">
            <direction-type><words class="aside">...</words></direction-type>
          </direction>
          <event>
            <note class="alternate">...</note>
          </event>
        </sequence>
      </measure>
    </part>
  </score>
</mnx>
6.3. External stylesheets
It’s often useful to keep stylesheet definitions in a separate .css file. The @import directive serves this purpose:
<mnx xmlns="http://www.w3.org/mnx"> <head> <style> @import url(house-style.css) </style> </head> <score> // </score> </mnx>
6.4. Units
Properties representing a visual displacement or distance must use a number followed by an explicit unit:
sl - the space between two standard staff lines
st - staff tenths (same unit as in MusicXML)
px - display pixels (same unit as in SVG or HTML)
Also, §5.3.4 General timespans may be used as style units.
Examples:
1sl - One staff line
5st - 0.5 staff lines
1/8 - a contextual eighth note
1//8 - a measure-level eighth note
481t - one tick longer than a measure-level eighth note
1:3//4 - a triplet 8th note (1/3 of a quarter note)
6.5. CSS property usage
6.5.1. Standard CSS properties
Certain CSS standard properties will replace the corresponding MusicXML properties, to avoid conflict with other web standards. Notably, the CSS display and visibility properties replace MusicXML’s print-object attribute. All CSS properties relating to fonts and typography (font-family, font-weight, font-style and so forth) will also replace corresponding features in MusicXML. color has roughly the same meaning in both.
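A hypothetical stylesheet fragment sketching this kind of substitution (the class name editorial-only is invented for illustration):
/* formerly print-object="no" in MusicXML */
.editorial-only {
  display: none;
}
direction {
  font-family: "Times New Roman";
  font-style: italic;
}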
6.5.2. Document properties
Virtually all of the values in the MusicXML defaults element can also be transferred to CSS properties, permitting stylesheet control of these important settings as well.
6.5.3. Notation properties
The expectation is that almost all other MusicXML attributes and elements governing visual rendition will simply migrate to CSS properties with no change except for the units used. These include positioning tweaks as well as system and page breaks.
The result is that CWMNX markup will generally look very simple and semantic, with occasional style and class attributes supplying exceptions to the general house style supplied by a score-wide stylesheet.
Examples of migrated MusicXML properties from notation markup include:
- relative-x and relative-y
- default-x and default-y
- offset (note that this is purely a visual offset, unlike MusicXML)
- new-system and new-page
In keeping with CSS conventions, MusicXML’s values of yes and no will generally be replaced with true and false.
6.5.4. New CWMNX properties
CWMNX will also define new properties for useful concepts not previously addressed in MusicXML.
As one example, a part-display: all property could distinguish elements that are intended to appear in both full-score and part views.
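A sketch of how such a property might be used in a stylesheet; the class name is invented, and only the value all is suggested by the sentence above:
.tutti-cue {
  part-display: all; /* keep these elements visible in both the full score and extracted parts */
}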
6.6. Media queries and parts
CSS defines a notion of media queries which permit stylesheet definitions to apply in only certain contexts. For web pages, this is used to vary the appearance according to whether a page is viewed on, say, a mobile device, or a laptop screen, or on printed paper.
@media screen {
  .alternate {
    color: blue; /* show alternate readings in blue on the screen */
  }
}
@media print {
  .alternate {
    note-size: -0.2sl; /* show alternate readings smaller when printed */
  }
}
In CWMNX, media queries can be used in exactly the same way, but they can also be used to govern how parts are displayed. This permits high quality parts to be generated from the same document as the full score, formatted in a specialized way.
To take advantage of this, one labels a part with a child media-name element whose text can be used as an identifier in a media query:
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <style>
      score {
        note-size: 10px;
      }
      @media (part: vln) {
        score {
          note-size: 13px; /* make the violin part bigger */
        }
      }
    </style>
  </head>
  <score>
    <part>
      <part-name>Violin</part-name>
      <media-name>vln</media-name>
      <measure>...</measure>
      <measure>...</measure>
      <measure>...</measure>
    </part>
  </score>
</mnx>
More than one part can use the same media name, allowing multiple parts to be formatted according to the same rules.
6.7. Controlling visibility
As previously mentioned, the standard CSS visibility and display properties control whether a given element is seen or not.
Setting display: none on an element prevents it from being seen and causes the layout to act as though it does not exist in any way; the space occupied by the element will disappear from the layout’s point of view.
Setting visibility: hidden on an element causes it not to be painted. However it still takes up space in the layout, and it may be sensitive to pointer events.
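A brief illustrative sketch of the difference, using inline styles within a sequence:
<sequence>
  // not painted, but still occupies space in the layout
  <event value="4" style="visibility: hidden;"><note pitch="D4"/></event>
  // treated as though absent from the layout
  <event value="4" style="display: none;"><note pitch="E4"/></event>
  <event value="2"><note pitch="F4"/></event>
</sequence>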
6.8. System and page flow
The CWMNX style properties new-page and new-system (as well as others) control page and system flow. Such styles are specified at the measure or event level.
These properties act across the system, but may be defined differently in full-score formats as opposed to single-part formats; it is very uncommon for such breaks to be identical across all views of a score. Thus they will participate in whatever solution is arrived at for §5.1.2 System notations.
Media queries can further be used to specialize system and page flow for viewing and printing different formats on different devices.
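For instance, a forced system break might be requested on a measure as sketched below, using the true/false convention described earlier:
<measure style="new-system: true;">
  ...
</measure>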
6.9. Styling grace and cue notes
The CSS Selectors Level 4 draft specification includes the specification of a new :has() pseudo-selector that can test for the presence of a child element. This provides a useful way to style grace and cue notes, for example:
note:has(> grace) { note-size: 80%; }
note:has(> cue) { note-size: 80%; }
note:has(> grace):has(> cue) { note-size: 50%; /* combination of grace and cue */ }
7. CWMNX Interpretation
The performance interpretation of semantic markup in CWMNX is also carried out using CSS properties, rather than through the content of MusicXML’s sound element. The various attributes and child elements of sound therefore become new sound-oriented CSS properties in CWMNX. These are driven from stylesheets and inline specifications in much the same way as visual properties, thus separating interpretation cleanly from both semantic and appearance layers.
As with appearance, the expectation is that a functioning implementation can always infer a default interpretation from the semantic layer, using built-in default interpretation properties as required. This replaces the notion of "selective encoding" in MusicXML.
7.1. Inline interpretation
Any performable element may possess a perform attribute, analogous to style. This attribute specifies a set of CSS properties affecting playback. For example:
<event value="4"> <note pitch="C4" perform="dynamics: 40;"/> </event>
Naturally, one might wish to use CSS classes to specify this:
<mnx xmlns="http://www.w3.org/mnx">
  <head>
    <style>
      .forte {
        dynamics: 100;
      }
    </style>
  </head>
  <score>
    //
    <event value="4">
      <note pitch="C4" class="forte"/>
    </event>
  </score>
</mnx>
7.2. Interpretation properties
Here are some significant interpretation properties, but not an exhaustive list:
offset - A playback offset expressed as a general timespan relative to the default offset. For example, offset: -1/16 would play the given note one sixteenth note earlier. offset: 10t would play it 10 MIDI ticks later.
pitch - A pitch to be used for playback instead of the notated pitch, expressed as a pitch encoding (e.g. pitch: C#4+0.1).
duration - A note duration to be used for playback, expressed as a general timespan. For example, duration: 1/16 would cause the note to be played as a sixteenth note regardless.
dynamics - Volume expressed as a percentage of the forte dynamic (same as MusicXML)
tempo - Tempo expressed as a metrical timespan, e.g. tempo: 120/4 (same as MusicXML, but doesn’t require quarter note units)
playback - Controls whether interpretation occurs at all. playback: none will suppress the interpretation of all events with this property.
Note: There is no tied property. It is not clear that one is needed, since the sounded duration of a tied note should take the semantic tie information into account.
7.3. Interpretation flow
In the same way that indenting text flow with certain CSS properties has follow-on effects that continue down the page, altering interpretation with CSS has follow-on effects too.
For example, tempo and dynamics clearly make sense to flow forward into successor events. On the other hand, pitch probably does not.
The flow behavior of interpretation properties needs to be carefully characterized.
7.4. Performance events and event streams
It is useful to combine the values found in the offset, pitch, duration and dynamics properties into a single styling construct called a performance event, using the simple syntaxes note(offset, pitch, duration) or note(offset, pitch, duration, dynamics). Think of these as syntactically similar to rgb() or rgba() colors. Some examples:
note(0, C4, 1/4) - A middle C sounding at the event’s default offset for a duration of one quarter note.
note(0, C4, 1/4, 127) - The same note, but at a MIDI velocity of 127.
note(0, C4, 1//4) - The same note with a quarter note duration, disregarding any effective time modification
This permits the useful interpretation property play, which overrides the entire default interpretation of whatever CWMNX element it is applied to. Here’s a quarter note C3 that is supposed to actually be played as a 16th note C4:
<event value="4"> <note pitch="C3" perform="play: note(0, C4, 16);"/> </event>
The real point of this feature is not syntactic sugar, but to make it practical to specify more complex performance events. One can do this by separating notes with spaces to create streams of simple events:
note(0, C4, 1/4) note(0, E4, 1/4) note(0, G4, 1/4) - A C major triad with the duration of a quarter note, sounding at the event’s default offset.
note(0, C4, 0.33/4) note(0.33/4, E4, 0.33/4) note(0.67/4, G4, 0.33/4) - A C major arpeggio with the duration of a quarter note, sounding at the event’s default offset.
In this example, a regularly notated C-G dyad is actually played as an arpeggiated C major triad, as part of an exercise in aural triad identification (the prompt might be, "what’s the middle note?"). Note that the styling is applied to the entire event, replacing the interpretation of all contained elements:
<event value="4" perform="play: note(0, C4, 4) note(1/32, E4, 7/32) note(2/32, G4, 6/32);">
  <note pitch="C4"/>
  <note pitch="G4"/>
</event>
One can style measure or sequence elements with the play property, replacing the entire default interpretation of measures or sequences if so desired.
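As a sketch, the interpretation of an entire measure could likewise be suppressed with the playback property described in §7.2:
<measure perform="playback: none;">
  <sequence>...</sequence>
</measure>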
Note: There are two reasons that this event representation is not directly tied to MIDI concepts. First, it’s expected that many applications will make use of their own internal sound engines to render performances, and we cannot assume that these engines make use of MIDI constructs internally. Second, the parameters governing the interpretation layer are very conveniently expressed in terms of the same units as the semantic layer.
7.5. Rhythmic template selectors
Special CSS selectors can be defined for CWMNX to make it easier to achieve useful interpretation results with styling.
Consider these two useful selectors:
- :occurs-within(4, 1/8) will match any event that occurs one eighth-note into any quarter-note subdivision.
- :value(8) will match any event whose note value is 8.
Now we can have a stylesheet definition that specifies a swing percentage of 10%:
:occurs-within(4, 0) :value(8) { duration: 110%; }
:occurs-within(4, 1/8) :value(8) { offset: 10%; duration: 90%; }
It’s also possible to specify the interpretation of a French-style dotted baroque rhythm using exactly the same mechanism. In this example, all notated dotted eighths are performed as double-dotted:
:occurs-within(4, 0) :value(8*) { duration: 7/32; }
:occurs-within(4, 8*) :value(16) { offset: 1/32; duration: 1/32; }
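As a further illustration, and assuming the percentage units behave as in the 10% swing example above, a heavier triplet-feel swing might be approximated by pushing off-beat eighths roughly a third of the way along (the figures below are illustrative, not normative):

:occurs-within(4, 0) :value(8) { duration: 133%; }
:occurs-within(4, 1/8) :value(8) { offset: 33%; duration: 67%; }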
8. GMNX: Encoding Scores with General Content
GMNX is the MNX score type denoted by a content attribute of general. It is intended as a general medium for representing scores in terms of abstract visual, performance and audio information. There is no attempt to represent semantics directly in GMNX, although GMNX may cross-reference elements in a separate, semantic document for traceability.
This entire section is a rough sketch of a possible approach and needs considerable work.
8.1. GMNX Goals
GMNX is intended to support applications which must be able to faithfully execute a visual and/or audible rendition of a score, with an awareness of the relationship between what is seen and what is heard.
There are no dependencies on any particular notational schema for music. Because of the lack of semantics, applications must not rely on information about the notational content of the score beyond the purposely limited vocabulary of GMNX. As a corollary, GMNX applications need not be concerned with transforming semantic data into musically acceptable graphics or audio, which can simplify their development.
The only constraints on the nature of the score are:
- The visual content of the score must be encoded in SVG.
- The audible content of the score must be encoded either in audio media, or in a GMNX performance-events element.
Some applications well-suited for GMNX usage include:
- Lightweight music viewers/players that employ standard libraries for rendering graphics and audio media, instead of doing custom rendering of semantic data.
- Music viewers/players that wish to handle any kind of musical content, not just CWMN.
- Music viewers/players which display a score exactly as it was depicted in a specific edition or manuscript.
- Highly individual or unusual notation editors that want to export music in a format that can be consumed by other applications for viewing and playback.
8.2. GMNX Structure
A GMNX document contains two kinds of data: visual content and performance content.
8.2.1. Visual Content
The visual content piece of a GMNX document is a set of svg elements representing the score. Each element constitutes a single page of the score, where "page" is defined as a two-dimensional canvas of arbitrary dimension whose contents can be viewed independently of other pages.
A GMNX document is required to contain visual content, since by definition it represents music notation of some kind.
SVG elements are labeled with IDs, allowing them to be referenced within the performance content.
There are no hard requirements whatsoever on the hierarchical structure of visual content. Notation need not be organized into anything except pages; if pages are not meaningful, the entire document may consist of a single page.
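A minimal sketch of visual content, assuming (as in the full example of §8.4 A GMNX Example) that each page is an svg child of the score element:

<score content="general">
  <svg id="page1"> ... </svg>   // first page of the score
  <svg id="page2"> ... </svg>   // second page; a document may equally consist of a single page
</score>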
GMNX permits special elements to be embedded within SVG to describe regions in the visual content having a directional flow which can be mapped to time in a musical performance. This optional feature is described below under performance flows.
Please refer to the SVG specification for information on what SVG is able to represent. For all practical purposes, any combination of vector and raster graphics is encodable in SVG.
8.2.2. Performance Content
A GMNX document can contain any number of performances, which are taken to be musical embodiments of the visual content. Each performance is represented either through separate, conventionally encoded audio media files or through a performance-events element that is similar in spirit to a MIDI file, but is more compatible with CWMNX sequences and events.
No matter how it is encoded, a "performance" takes place in the time dimension. GMNX allows ranges and points in a performance to be mapped onto various elements in the visual SVG content. These mappings define a two-way correspondence between the visual score and the content of each performance.
This correspondence is general in nature and not dependent on any particular approach to notation.
Performances are optional in GMNX. Any number of performances may be provided, since there is no concept of a canonical performance in GMNX.
8.2.2.1. Audio Media
A single performance represented by audio media can be described with a performance element that includes any number of media file references.
Here’s a simple example:
<performance>
  <performance-name>Piano performance</performance-name>
  <performance-media>
    <media-file src="hot-cross-buns.mp4"/>
  </performance-media>
</performance>
The performance media may be further described in terms of tracks. This is important in §8.3 Mapping Between Position and Time. An example follows:
<performance>
  <performance-name>Piano performance</performance-name>
  <performance-media>
    <media-file src="hot-cross-buns.mp4">
      <track index="0">
        <name>Piano LH</name>
      </track>
      <track index="1">
        <name>Piano RH</name>
      </track>
    </media-file>
  </performance-media>
</performance>
Multiple media files may also be referenced, indicating that they together supply a multitrack performance when played in synchronization. Here a stereo piano recording is synchronized with a mono recording of a trumpet track:
<performance>
  <performance-name>Piano performance</performance-name>
  <performance-media>
    <media-file src="hot-cross-buns-piano.mp4">
      <track index="0">
        <name>Piano LH</name>
      </track>
      <track index="1">
        <name>Piano RH</name>
      </track>
    </media-file>
    <media-file src="hot-cross-buns-trumpet.mp4">
      <track index="0">
        <name>Trumpet</name>
      </track>
    </media-file>
  </performance-media>
</performance>
8.2.2.2. Event Sequences
Alternatively, a performance may be represented by a sequence of discrete events having these properties:
- onset (in seconds from start of performance)
- duration (in seconds)
- instrument (described by MusicXML sound ID)
- pitch (as in CWMNX)
- dynamics (as in CWMNX)
- technique
There is no tempo information in an event list. Adjustment of playback speed by an implementation may be accomplished by proportionally adjusting onset and duration times.
This information can be used for synthesis, analysis, or any other purpose by the application. However it is specifically not intended to represent semantic notational content. For example, a CWMN staccato quarter note might be represented as an event with the effective duration of a 16th note.
The performance-events element supplies this information:
<performance>
  <performance-name>Event sequence</performance-name>
  <performance-events>
    <track>
      <instrument-sound>keyboard.piano</instrument-sound>
      <sequence>
        <event graphic="#note1" start="0" duration="0.25">
          <note pitch="E4" dynamics="100"/>
        </event>
        <event graphic="#note2" start="0.25" duration="0.25">
          <note pitch="D4"/>
        </event>
        // ...and so on
      </sequence>
    </track>
  </performance-events>
</performance>
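As a concrete illustration of the speed-adjustment rule above, an application rendering this sequence at half speed would presumably just double every start and duration value. The first two events are shown below as if re-encoded, purely to make the arithmetic visible:

<event graphic="#note1" start="0" duration="0.5">
  <note pitch="E4" dynamics="100"/>
</event>
<event graphic="#note2" start="0.5" duration="0.5">
  <note pitch="D4"/>
</event>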
8.3. Mapping Between Position and Time
What makes GMNX more powerful than just a combination of images and audio is its ability to describe correspondences between the two. These correspondences are not necessary to use GMNX, but can be provided wherever they are known.
All of these mappings are optional, and applications cannot count on their presence. While they make sense for most CWMN, there are many kinds of music notation that cannot supply them.
8.3.1. Performance Regions
A performance region consists of two pieces of information:
- The ID of a particular SVG element corresponding to the region.
- A time range within some performance content.
Where provided, this information allows applications to do the following:
- modify the appearance of the element (e.g. by highlighting it) in conjunction with performance of the given time range
- interpret user interaction with the element as referring to the performance time range
Performance regions are identified by listing them in a performance’s performance-regions element. The graphic attribute refers to the SVG element for the region, while the time attribute supplies a time range within the performance, given in seconds since its start.
<performance-regions> <region graphic="#m1" time="0 1.8"/> <region graphic="#m2" time="1.8 3.61"/> <region graphic="#m3" time="3.61 5.38"/> <region graphic="#m4" time="5.38 7.23"/> </performance-regions>
Region shapes can thus be any closed shape definable in SVG. Examples include rectangles, polygons, ellipses, polylines, polysplines, shapes with hollow cutouts... literally, anything.
Region mappings are many-to-many in character: a given graphical region may be mapped to any number of performance time ranges, and vice versa. This is necessary to accommodate the varied possibilities of music notation, which include among others:
- disjoint and non-aligned regions that describe the same range of music (e.g. medieval partbooks)
- a region that is repeated multiple times in performance
- a region that is repeated with temporal overlap, or at varying speeds (e.g. rounds)
- a region that is read in more than one direction or orientation (e.g. puzzle canons)
- a region that may or may not be performed
Thus, one can have region lists like this:
<performance-regions> <region graphic="#m1part1" time="0 1.8"/> <region graphic="#m1part2" time="0 1.8"/> // two regions exist for this time range <region graphic="#m1part1" time="1.8 3.61"/> // same region is repeated in performance </performance-regions>
A performance-regions element can also include one or more optional track-ref elements that identify a set of specific tracks in the performance. Such regions are taken as specific to the musical material in these tracks, rather than to the material in the performance as a whole.
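A hedged sketch of such a track-specific region list follows; the track-ref attributes shown (a media file reference and a track index, echoing the track declarations in §8.2.2.1) are assumptions, since the proposal does not yet define them, and the graphic IDs are hypothetical:

<performance-regions>
  <track-ref src="hot-cross-buns.mp4" index="1"/>   // regions below describe only the Piano RH track
  <region graphic="#rh-m1" time="0 1.8"/>
  <region graphic="#rh-m2" time="1.8 3.61"/>
</performance-regions>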
8.3.2. Performance Flows
A performance flow is a special kind of performance region, in which a smooth, continuous two-way mapping exists between each spatial point in the region and a corresponding point in performance time. Each flow includes the following information:
- The ID of a particular SVG element corresponding to the region.
- A geometric path that is followed smoothly by an imaginary cursor line within the SVG element, as defined by a flow child element.
- A time range within some performance content.
Flows are more powerful than regions, because every point in the region can be mapped to a specific performance time within the range, and every performance time can be mapped to a specific cursor position.
Flows thus allow applications to:
- position a cursor over the score in conjunction with an exact performance time
- interpret user interaction with the region as indicating an exact performance time
Within an SVG region, the flow and flow-path elements define the extent and path of this cursor. The pos attributes define "position" ranges for the cursor, expressed in arbitrary coordinates that are later mapped to performance time.
<flow cursor="0 40"> // cursor is a (0,40) vector in local coordinates <flow-path d="M 0 0 h 15" pos="0 1"/> // Move (0,0), horizontal(+15) over 0..1 position range <flow-path d="h 15" pos="1 2"/> <flow-path d="h 35" pos="2 4"/> </flow>
With the SVG path approach shown above, cursors may progress in any desired fashion through an arbitrarily shaped region. The cursor implicitly rotates as the tangent vector of the path changes.
Within a performance, the region-flow element identifies a region with a flow. It is just like region, but it includes a position range; as the performance progresses from the region’s start time to its end time, the position of the flow cursor is advanced from its starting value to its ending value:
<performance-regions>
  <region-flow graphic="#m1" time="0 1.8" pos="0 4"/>
  <region-flow graphic="#m2" time="1.8 3.61" pos="0 4"/>
  <region-flow graphic="#m3" time="3.61 5.38" pos="0 4"/>
  <region-flow graphic="#m4" time="5.38 7.23" pos="0 4"/>
</performance-regions>
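Assuming the mapping between the pos range and the time range of a region-flow is linear (the proposal does not yet say otherwise), the first entry above places the cursor at position p = 4 × t / 1.8 for a performance time t between 0 and 1.8 seconds; conversely, a user click at position 2 within measure 1 would be interpreted as the performance time 2 × 1.8 / 4 = 0.9 seconds.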
8.3.3. Performance Events
Performance events are the final type of mapping supplied by GMNX; their purpose is to describe discrete musical events. Each event mapping includes the following information:
- The ID of a particular SVG element corresponding to the event.
- A description of the event within a performance-events element, supplying its onset, duration and performance attributes such as pitch, dynamics, etc.
Performance event mappings can thus connect an arbitrary graphic representation in the visual content to a description of the sound that it makes. Obviously, the richness of attributes like pitch, velocity and performance techniques must be limited for reasons of interoperability. Consequently, audio media remain the ultimate general vehicle for describing sounds in their own native terms, in which case events refer to a time range within the performance.
Performance event mappings allow applications to:
- identify specific graphics within the score in conjunction with their occurrence in a performance
- interpret user interaction with these graphics as indicating specific musical events in the performance, such as notes or time ranges.
Although event mappings make sense for CWMN, as well as for a great deal of contemporary Western music, they are not universal and so are not required by GMNX. Many simple notational ideas do not lend themselves to event descriptions, for example textual or graphical instructions to an improvising performer, or color gradients as opposed to discrete symbols.
Event mappings are effected by supplying a graphic attribute on an event or note element in an event sequence:
<event graphic="#note1" start="0" duration="0.25"> <note pitch="E4" dynamics="100"/> </event>
8.4. A GMNX Example
Here the earlier CWMNX example is restructured as a GMNX document and reduced to a single staff for clarity. It illustrates two kinds of mapping: performance flows and performance events.
Note: There is absolutely no requirement that music be structured as systems, or staves, or use a rectangle-based layout, or have a left-to-right orientation. This example merely uses a CWMN score because it’s familiar material.
<?xml version="1.0" encoding="UTF-8"?> <mnx> <head> <identification> <title>Hot Cross Buns</title> </identification> </head> <score content="general"> <svg id="page1"> <g> <g> // staff prefix: clefs, time signatures, etc... </g> <g id="m1" transform="translate(20,0)"> // Describe a time flow in terms of an arbitrary notation position, mapped to local coordinates. // The flow is an imaginary cursor that moves smoothly along a path described // using the same commands as the SVG <path> element. The <code data-opaque>pos</code> ranges // are in notation position units (which here are quarter notes), not time or pixels. <mnx:flow cursor="0 40"> <flow-path d="M 0 0 h 15" pos="0 1"/> <flow-path d="h 15" pos="1 2"/> <flow-path d="h 35" pos="2 4"/> </mnx:flow> <path id="note1" d="..."/> <path id="note2" d="..."/> <path id="note3" d="..."/> </g> <g id="m2" transform="translate(100,0)"> <mnx:flow> // similar to above </mnx:flow> <path id="note4" d="..."/> <path id="note5" d="..."/> <path id="note6" d="..."/> </g> <g id="m3" transform="translate(200,0)"> // measure 3... </g> <g id="m4" transform="translate(300,0)"> // measure 4... </g> </g> </svg> <performance> <performance-name>Audio recording</performance-name> <performance-media> <media-file src="hot-cross-buns.mp4"/> </performance-media> <performance-regions> <region-flow graphic="#m1" time="0 1.8" pos="0 4"/> <region-flow graphic="#m2" time="1.8 3.61" pos="0 4"/> <region-flow graphic="#m3" time="3.61 5.38" pos="0 4"/> <region-flow graphic="#m4" time="5.38 7.23" pos="0 4"/> </performance-regions> </performance> <performance> <performance-name>Event sequence</performance-name> <performance-events> <part> <instrument-sound>keyboard.piano</instrument-sound> // Note that all units here are in seconds: this is a mechanical description of sound. <sequence> <event graphic="#note1" start="0" duration="0.25"> <note pitch="E4" dynamics="100"/> </event> <event graphic="#note2" start="0.25" duration="0.25"> <note pitch="D4"/> </event> <event graphic="#note3" start="0.5" duration="0.5"> <note pitch="C4"/> </event> <event graphic="#note4" start="1" duration="0.25"> <note pitch="E4"/> </event> <event graphic="#note5" start="1.25" duration="0.25"> <note pitch="D4"/> </event> <event graphic="#note6" start="1.5" duration="0.5"> <note pitch="C4"/> </event> // remaining events in song... </sequence> </part> </performance-events> <performance-regions> <region-flow graphic="#m1" time="0 1" pos="0 4"/> <region-flow graphic="#m2" time="1 2" pos="0 4"/> <region-flow graphic="#m3" time="2 3" pos="0 4"/> <region-flow graphic="#m4" time="3 4" pos="0 4"/> </performance-regions> </performance> </score> </mnx>
8.5. Compilation to GMNX
Because of the generality of GMNX, semantic flavors of MNX (like CWMNX) can be fairly easily "compiled" to GMNX, by any application able to render them into graphics and either audio or performance events.
This has value as a bridge between applications that have a semantic understanding of some type of music, and applications that do not (but which can still profit from the packaging and mappings provided by a GMNX document).
Where compilation occurs, semantic traceability is recommended as a best practice, by including an mnx:semantic attribute in each SVG element that points to a corresponding semantic element in the original source.
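A hedged sketch of such traceability, reusing a graphic from §8.4 A GMNX Example; the reference syntax and the target ID are illustrative only, since the proposal does not define how the semantic source is addressed:

<path id="note1" d="..." mnx:semantic="hot-cross-buns-cwmnx.xml#note1"/>   // points back to the originating CWMNX note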