Scholarly HTML

Authors
Tzviya Siegman (Wiley) & Robin Berjon, @robinberjon (science.ai)
Bugs & Feedback
Issues and PRs welcome!
Discussion Group
The Scholarly HTML Community Group at W3C (email archives)
License
CC-BY

Abstract

Scholarly HTML is a domain-specific rich document format built entirely on open standards that enables the interoperable exchange of scholarly articles in a manner that is compatible with off-the-shelf browsers. This document describes how Scholarly HTML works and how it is encoded.

Introduction

Scholarly articles are still primarily published as unstructured data in which most of the information created by the research and the practice of authoring is lost. Document technology has reached a level of maturity and universality that makes this situation no longer tenable. Information cannot be disseminated if it is destroyed before even having left its creator’s laptop.

According to the New York Times, adding structured information to their recipes improved their discoverability to the point of producing an immediate rise of 52 percent in traffic (NYT). At this point in time, cupcake recipes are reaping greater benefits from modern data format practices than the whole scientific endeavour.

This places a great burden on tool developers and service providers as well. Anyone who has explored the world of extracting data from inert publications has built their own complex toolset, offering no interoperability, no opportunity for cooperative improvements, and little or no growth in discoverability or meta-analysis in this area.

To address these issues, we have followed an approach rooted in established best practices for the reuse of open, standard formats. We propose an "HTML Vernacular", a set of guidelines for the creation of domain-specific data formats that make use of HTML’s inherent extensibility (Science.AI, 2015). Using the vernacular foundation overlaid with schema.org metadata and proposed extensions to it, we have produced a format for the authoring and interchange of scholarly articles built on open standards, ready for all to use. We hope that this format will be usable by rogue scientists who choose to publish their articles on their own.

Our high-level goals are to:

Where semantic modeling is concerned, our approach is to stick as much as possible to schema.org. Beyond the obvious advantages of reusing a vocabulary that is supported by all the major search engines and is actively being developed towards enabling a shared understanding of many useful concepts, it also provides a protection against ontological drift whereby a new vocabulary is defined by a small group with insufficient input from a broader community of practice. A language that solely a single participant understands is of limited value.

In a limited number of cases we have had to depart from schema.org, using the vocabulary at https://ns.science.ai/, prefixed with sa:. Our goal is to work with schema.org in order to extend their vocabulary, and we will align our usage with the outcome of these discussions.

Structure

A Scholarly HTML document is a valid HTML document that follows some additional rules to specialize its meaning and make it predictable to processors wishing to produce or consume scholarly articles. These rules are outlined in the following sections. While valuable on its own, the content structure defined here is simply a stepping stone to enable semantic enrichment, detailed in Semantics Overlay. If you would like to write a validation tool, please join us on GitHub.

The root and head

The document must be encoded in UTF-8 and transmitted with a media type of text/html.

The head must contain a <meta charset="utf-8"> element and a title element.
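As an illustration, a document shell satisfying the requirements above might look like the following sketch. The lang attribute and the title text are illustrative assumptions rather than requirements stated in this section.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Required: UTF-8 declaration and a title -->
    <meta charset="utf-8">
    <title>Is Cryptopaleozoology Hopeless?</title>
  </head>
  <body>
    <!-- The article goes here -->
  </body>
</html>
```

Sketch of a minimal root and head. The document would additionally need to be served as text/html.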

The article

The first child of article must be header. The header should contain an h1 with the title of the document. The following element must be a div with the role of contentinfo containing author and affiliation information. See The contentinfo Region Semantics for information about the semantic decoration of this element.

Any number of section elements may be listed within the article at arbitrary depths, but each section must begin with an hx element, indicating a numbered section in the article. If the sections require headings that exceed h6, aria-level must be included to indicate depth.


    <section>
      <h6>Granular Details about Zoology</h6>
      <p>…</p>
      <section>
        <h6 aria-level="7">Even More Information!</h6>
        <p>…</p>
      </section>
    </section>
          
Example of a level 7 heading, using aria-level

Each section may contain zero or more Hunk Elements and section elements.
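Taken together, the rules in this section suggest an article skeleton along the following lines. This is a sketch only: the heading text is invented, and starting the top-level sections at h2 is an assumption based on h1 being reserved for the title.

```html
<article>
  <!-- First child: the header carrying the title -->
  <header>
    <h1>Is Cryptopaleozoology Hopeless?</h1>
  </header>
  <!-- Next: author and affiliation information -->
  <div role="contentinfo">
    <!-- see The contentinfo Region Semantics -->
  </div>
  <!-- Sections nest to arbitrary depth; each begins with a heading -->
  <section>
    <h2>Introduction</h2>
    <p>…</p>
    <section>
      <h3>Background</h3>
      <p>…</p>
    </section>
  </section>
</article>
```

Sketch of the basic article structure, without any semantic decoration.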

Hunk Elements

Hunk elements are the meaningful blocks from which sections are built. They contain text and inline elements. There are several types of hunk elements. All content, ranging from long paragraphs to note references and footnotes, can be captured using this specified set of elements. The method for distinguishing one from another in a machine-readable manner is specified in Semantics Overlay.

The most common hunk element is p.

The blockquote, ul, ol, and dl elements can be used as they typically would and require no special treatment.

The aside hunk element is used to capture portions of content that stand apart from the main flow of content. These can be separated from the article without affecting the reader’s understanding of the article. A common use is text boxes in print. If the aside contains a heading element, that heading must be the first element child and its numeric part must reflect its depth, making use of aria-level according to the same rules that apply for section. The other children of aside must all be hunk elements. For example, if an aside follows a section with a level 3 heading, the top-level heading in the aside should be h4.
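The heading-depth rule for aside can be pictured with the following sketch. The heading text is invented for illustration, and placing the aside inside the section it follows is an assumption about how the rule is meant to be read.

```html
<section>
  <h3>Field Methods</h3>
  <p>…</p>
  <aside>
    <!-- The heading comes first and sits one level below the section's h3 -->
    <h4>A Note on Equipment</h4>
    <!-- All other children are hunk elements -->
    <p>…</p>
  </aside>
</section>
```

Sketch of an aside with a heading following a level 3 section heading.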

Figures

The figure element is a general container for self-contained content units that are embedded inside the main body of the text. It can come in several flavors that are dictated by its typeof attribute. Common uses for figure are as a container for images, tables, equations, and computer code.

If figure is typeof="sa:image", it is an image container. It must contain an img child element and should contain a figcaption labeling that image.

If figure is typeof="sa:table", it is a table container. It must contain a table element. If there is a table caption, it should be included using the caption child element of the table, and not the figcaption child of the figure. Table notes may also be included as ol with li elements with the role of doc-footnote.

If figure is typeof="sa:formula", it is a formula container. It must contain a math element and, optionally, a figcaption describing the formula. The math element must be valid MathML 3. Additionally, given the dismal state of support for MathML in Web browsers, the math element must contain an annotation descendant with the TeX equivalent of the formula.

If figure is typeof="schema:SoftwareSourceCode", it is a code container. It must contain a pre element and, optionally, a figcaption. The pre element must contain a code element as its only child.
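The four flavors described above might be marked up as in the following sketch. The file name, captions, table values, and code content are all invented placeholders, and the use of encoding="application/x-tex" on the annotation is an assumption about how the TeX equivalent would be labeled.

```html
<!-- Image container: img required, figcaption recommended -->
<figure typeof="sa:image">
  <img src="skull.png" alt="Lateral view of a skull">
  <figcaption>Lateral view of the specimen.</figcaption>
</figure>

<!-- Table container: the caption lives on the table, not the figure -->
<figure typeof="sa:table">
  <table>
    <caption>Specimen measurements</caption>
    <tr><th>Specimen</th><th>Length (mm)</th></tr>
    <tr><td>A</td><td>212</td></tr>
  </table>
</figure>

<!-- Formula container: MathML plus a TeX annotation -->
<figure typeof="sa:formula">
  <math>
    <semantics>
      <mrow>
        <mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>
      </mrow>
      <annotation encoding="application/x-tex">E = mc^2</annotation>
    </semantics>
  </math>
  <figcaption>Mass-energy equivalence.</figcaption>
</figure>

<!-- Code container: pre with code as its only child -->
<figure typeof="schema:SoftwareSourceCode">
  <pre><code>print("hello")</code></pre>
  <figcaption>A one-line program.</figcaption>
</figure>
```

Sketch of the image, table, formula, and code flavors of figure.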

Inline Elements

Inline elements decorate, describe, and enrich text. Inline elements can be used inside of hunk elements, heading elements, and captioning elements. Where applicable, they can nest within one another. Inline images and inline math can be included as well. This can be accomplished using img for images or math for formulas. Equations can be displayed inline or as blocks within a paragraph.


<p>
  If we should weep when clowns put on their show,
  if we should stumble when musicians play,
  <math display="block">
    <semantics>
      <mrow>
        <mi>Δ</mi><msup><mi>E</mi><mn>2</mn></msup>
        <mspace width="0.222em"></mspace>
        <mo>=</mo>
        <mspace width="0.222em"></mspace>
        <msub><mi>q</mi><mi>i</mi></msub>
        <mspace width="0.222em"></mspace>
        <mo>×</mo>
        <mspace width="0.222em"></mspace>
        <mo stretchy="false" form="prefix">(</mo>
        <msub>
          <msup><mi>F</mi><mn>2</mn></msup>
          <mrow>
            <mo stretchy="false" form="prefix">(</mo>
            <mi>i</mi><mo>,</mo>
            <mspace width="0.222em"></mspace>
            <mi>j</mi>
            <mo stretchy="false" form="postfix">)</mo>
          </mrow>
        </msub>
        <mspace width="0.222em"></mspace>
        <mo>/</mo>
        <mspace width="0.222em"></mspace>
        <msub><mi>ε</mi><mi>j</mi></msub>
        <mspace width="0.222em"></mspace><mspace width="0.222em"></mspace>
        <msub><mi>ε</mi><mi>i</mi></msub>
        <mo stretchy="false" form="postfix">)</mo>
      </mrow>
    </semantics>
  </math>
  time can say nothing but I told you so.
</p>
          

References

The References section requires specific semantic overlays (see Citations & References) as well as strict content structure. Apart from a (required) hx element, it must contain only one ol or dl element.

If using a dl element, the contents must be alternating dt and dd elements. The dd must contain the citation.

If using ol, the only contents are li that include citations.
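The two permitted shapes could look like the following sketch, shown without the semantic overlays that a complete References section also requires. The heading level and the bracketed labels in the dt elements are illustrative assumptions.

```html
<!-- Variant 1: an ordered list of citations -->
<section>
  <h2>References</h2>
  <ol>
    <li>…</li>
    <li>…</li>
  </ol>
</section>

<!-- Variant 2: alternating dt and dd, with the citation in the dd -->
<section>
  <h2>References</h2>
  <dl>
    <dt>[1]</dt>
    <dd>…</dd>
    <dt>[2]</dt>
    <dd>…</dd>
  </dl>
</section>
```

Sketch of the content structure of a References section, in its ol and dl forms.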

Interactive Elements

information about iframes to come

Let’s discuss details of iframes with the CG

HTML Roles

It is possible to provide information about an HTML element by decorating it with the role attribute. The ARIA vocabulary and its extensions provide convenient terms that are relevant to document structure. The following roles from ARIA and DPUB-ARIA should be applied where appropriate:

Should we require ARIA’s table, grid, rowheader, and rowgroup?

I did not include doc-credit bc of extensive citation markup in JSON-LD

doc-endnote, doc-endnotes are not in the current published draft of DPUB-ARIA. See March DPUB-ARIA draft

Validation

The only validation requirement for Scholarly HTML at this point is that the HTML is valid. We are considering building a validation tool in RelaxNG or JavaScript to check compliance with this set of rules.

Articles should be in the following basic structure:

It must feature a DOCTYPE as its preamble.

Semantics Overlay

HTML provides an excellent backbone with which to capture the structure of a given text but is evidently limited when it comes to capturing more domain-specific concepts such as people, spaceships, Humean causation, or sthenurines. That is where semantic overlays with the ability to refine the meaning and relations of HTML elements come into play. Scholarly HTML makes use of two standard mechanisms that overlay additional semantics atop the HTML DOM: role-based semantics as defined by WAI-ARIA and DPUB-ARIA, and semantics rooted in structured data as captured by RDFa.

Using technologies related to the semantic web can at times feel daunting and unrelated to everyday web development. In order to suppress this disconnect, Scholarly HTML follows a few simple guiding principles:

The properties that Scholarly HTML uses are naturally document-related (authorship, keywords, license, citations, as well as specific structure types such as acknowledgements, introduction, or funding), which additionally requires the ability to describe people and organizations. There are numerous vocabularies that address this domain and which could be used with RDFa; however, for reasons detailed in Web-First Data Citations, Scholarly HTML relies almost exclusively on schema.org, complemented by a small number of additions from the Scholarly Article Vocabulary.

Persons & Organizations

Marking up persons and organizations can make use of any applicable properties in schema:Person and schema:Organization, respectively, but it is worth pointing out some good practices for how these are to be used.

If the entity in question has a URL then it is best to use that as its identifier (using the resource attribute) and additionally to provide it as a link using the a element (see the person example for an instance of this).

If you happen to have information providing both the schema:givenName/schema:familyName/schema:additionalName triple and the schema:name (which can be considered to contain the name as the person wishes it to be displayed) for a person then it is (perhaps counterintuitively) best to include all of them and then use CSS (typically sibling selectors) to hide the extraneous ones (alternatively, they can be captured using the meta element). The reason for this is that it exposes more information to machine consumers without having a negative impact on human readers.


<span typeof="schema:Person" resource="http://orcid.org/0000-0003-1279-3709">
  <meta property="schema:givenName" content="Bruce">
  <meta property="schema:familyName" content="Banner">
  <a href="http://orcid.org/0000-0003-1279-3709">
    <span property="schema:name">Dr. Bruce Banner</span>
  </a>
</span>
          
Example of a Person. This demonstrates the use of extraneous name information using meta elements. The container is a span but it could be any other container element. This also shows how schema:name can be used for the person’s preferred display name, as distinct from the more specific structured name information.

Here is also an organization:


<span typeof="schema:Organization" resource="https://www.w3.org/">
  <a href="https://www.w3.org/">
    <span property="schema:name">W3C</span>
  </a>
  (<span property="schema:location" typeof="schema:Place">
    <span property="schema:address" typeof="schema:PostalAddress">
      <span property="schema:addressLocality">Cambridge</span>,
      <span property="schema:addressRegion">MA</span>,
      <span property="schema:addressCountry">USA</span>
    </span>
  </span>)
</span>
          
Example of an Organization. This also features the organization’s address.

How should we represent name transliterations? Are there language tags for transliterated text? Or should ruby+rdf:HTML be suggested instead? If the latter we can no longer use meta (which is acceptable).

Typing Sections

XXX

Schema Roles

It is worth taking a step back to understand the importance of role modeling. Its application is clearly exemplified in the Authors & Contributors section wherein a sa:ContributorRole type is used as a wrapper and not schema:Person or schema:Organization directly.

Roles are an indirection that provides additional information about a property or relationship. A simple overview is provided in the schema.org blog post Introducing ’Role’.

Let’s look at the example of authorship information. Some properties of the agent who authored the document (person or organization), such as their name, are considered to be true outside the limited context of the document. These properties will be set directly on the agent.

On the other hand, other properties are considered to be specific to the agent in their role as author of the document. To give an example, were I to be writing the document you are currently reading as a freelancer for the Illuminati, my affiliation to them would be solely in my role as author and I should not be considered eternally indentured to them.

When a role is used to enrich a property, the convention is to have it as the value of that property, and then to repeat the property on the role to point to the object. At first glance it sounds contrived, but it is a simple and powerful construct. To stay with the authoring example, the indirection would look like:

schema:ScholarlyArticle → schema:author → sa:ContributorRole → schema:author → schema:Person

To demonstrate how properties can attach differently to the role and to the agent, we can unfold the authorship example further:

schema:ScholarlyArticle
└─schema:author
  └─sa:ContributorRole
    ├─schema:author
    │ └─schema:Person
    │   ├─schema:name = Bruce Banner
    │   └─schema:url = http://berjon.com/
    └─sa:roleAffiliation
      └─schema:Organization
        ├─schema:name = Illuminati
        └─schema:address = Bavaria
          
Example of a role being used to model authorship. This effectively states that my affiliation as a contributor is to the Illuminati and that my name as a person is Bruce Banner.

Actions

Actions are a global schema.org mechanism to convey facts about things that can be or have been done. There is an overview document for actions but it dives deep very fast and may be more confusing than helpful. This section intends to convey all that one needs to know about actions in order to understand their usage in Scholarly HTML (keeping in mind that they can do much more).

Note that actions can do much more than what Scholarly HTML uses them for. For instance, if you use an email client that supports actions (such as GMail) you may have noticed that some emails allow for direct interactions: those are implemented using actions, and without scripting.

Actions have a type (e.g. ReadAction, DrinkAction), a status (completed, in progress…), an agent being whoever carries it out, and an object which is what they are being done to. They can also have start and end times (as well as several other properties which we won’t go into here). Scholarly articles typically feature indications about things that people have done, which is a good fit for modeling with actions. A few examples should help clarify the notion.

When referencing an online work, it is customary to indicate the access date for it (since it may have changed in the meantime). This can be modeled as a schema:ReadAction, with its schema:actionStatus set to CompletedActionStatus, and a schema:endTime being the access date. In JSON-LD it would look like this:


{
  "@type":        "ReadAction",
  "actionStatus": "CompletedActionStatus",
  "endTime":      "1977-03-15"
}
          
Example of a schema:ReadAction used to model the access date of an online citation. Both the object and the agent are implicit in the context in which it is used.

Authors often acknowledge the contributions of others or have to disclose potential conflicts of interest that may stem from their interactions outside of the article. The former can be conveyed as an sa:AcknowledgeAction in which the schema:name of the action is the verb part of the acknowledgement and the schema:recipient is the person (or entity of any kind) being acknowledged. The agent is typically implicitly specified as the object to which the action is attached.


{
  "@type": "AcknowledgeAction",
  "actionStatus": "CompletedActionStatus",
  "name": "is thankful for the pioneering contribution of",
  "recipient": {
    "@type": "Person",
    "name": "Vannevar Bush"
  }
}
            
Example of an sa:AcknowledgeAction in which the author (not shown) acknowledges contributions from Vannevar Bush.

Article and Title Semantics

The article element that roots the content of the Scholarly HTML document needs further refinement to capture the specific type of article that it encodes. The typeof attribute should always contain schema:ScholarlyArticle as its first item, but it can be further refined with additional article types.

Should we recommend a specific taxonomy for article (sub)typing? There are so many: Fabio, MeSH, NPG…

In order for arbitrary parts of the document to be able to attach metadata to the article, it also needs to have its resource attribute set to a URL that can be referenced (it is typically sufficient to just use # for that purpose).

While the h1 in the document’s header is sufficient to convey the fact that it is the document’s title, some services use extraneous information in order to assign an unambiguous title to the document. As such, it needs to have its property attribute set to schema:name. Similarly, if a subtitle is present in the header it needs to be decorated with both a role of doc-subtitle (to expose its DPUB-ARIA semantics) and a property of schema:alternateName.

If appropriate, the beginning of the article is also a good place in which to capture the accessibility properties of the document, using the relevant parts of schema.org (schema:accessibilityFeature, schema:accessibilityHazard, schema:accessibilityAPI, and schema:accessibilityControl, as detailed in the WebSchemas Accessibility wiki page).


<article resource="#" typeof="schema:ScholarlyArticle">
  <header>
    <h1 property="schema:name">Is Cryptopaleozoology Hopeless?</h1>
    <p role="doc-subtitle" property="schema:alternateName">
      The Future of the Scientific Method
    </p>
  </header>
  <meta property="schema:accessibilityFeature" content="alternativeText">
  <meta property="schema:accessibilityFeature" content="MathML">
  <meta property="schema:accessibilityHazard" content="noFlashingHazard">
</article>
          
Example of article and title markup.

The contentinfo Region Semantics

As described in the Structure section, the contentinfo region serves as a container for the metadata of the article. It is itself nothing more than a div with a role of contentinfo, but its content has rich structure.

It contains a list of section elements, each of which is identified with a specific typeof attribute.

Authors & Contributors

If the document has authors or contributors, they are listed in a section with typeof sa:AuthorsList. The content of that section is an h2 title appropriate for it, followed by either a ul or ol (depending on whether the authors are considered ordered, which is highly dependent on the discipline’s culture).

Each li in that list must feature a typeof of sa:ContributorRole and a property of either schema:author or schema:contributor depending on which is applicable. Modeling with schema.org roles is explained in the Schema Roles section.

The sa:ContributorRole element is structured as follows:

  • Exactly one span with a property of either schema:author or schema:contributor (matching the one that points to the role) and typeof either schema:Person (if the author is a sentient entity) or schema:Organization (if it is a collective thereof). How to capture persons and organizations is detailed in the creatively-named Persons & Organizations section.
  • Zero or more a elements with a property of sa:roleAffiliation, one for each affiliation of the author in producing the article. Each of those elements needs further to have a resource attribute matching the one on the affiliation it is pointing to and an href attribute linking to the element on which that affiliation is defined. The a element may contain arbitrary text (typically a number, letter, or symbol matching that used by the target in its own list). These should not occur if the agent is an organization.
  • Zero or more a elements with a property of sa:roleAction, one for each comment describing the author’s specific contribution to the work (e.g. "Authors contributed equally" or "Designed the study"). Each of those elements needs further to have a resource attribute matching the one on the note it is pointing to and an href attribute linking to the element which contains that note. The a element may contain arbitrary text (typically a number, letter, or symbol matching that used by the target in its own list).
  • Zero or one ul elements. Each of its li children has a property of schema:roleContactPoint and a typeof set to schema:ContactPoint. The content of each li can be anything that describes a manner of contacting the author in question, but it will typically involve properties such as schema:email, schema:telephone, schema:address, schema:description (for arbitrary descriptions of the contact method), or, for journals publishing to the Web of the early 1980s, schema:faxNumber.

Here is an example of a complete kitchen sink authors’ section. Note that in most cases the markup will be much simpler — this exercises far more of the features than there is information for in a typical case.


<section typeof="sa:AuthorsList">
  <h2>Authors</h2>
  <ul>
    <li typeof="sa:ContributorRole" property="schema:author">
      <span property="schema:author" typeof="schema:Person"
            resource="https://en.wikipedia.org/wiki/John_Henry_Holland">
        <meta property="schema:givenName" content="John">
        <meta property="schema:additionalName" content="Henry">
        <meta property="schema:familyName" content="Holland">
        <span property="schema:name">John H. Holland</span>
      </span>
      <a href="#sf" property="sa:roleAffiliation" resource="http://www.santafe.edu/">a</a>,
      <a href="#umich" property="sa:roleAffiliation" resource="http://umich.edu/">b</a>,
      <a href="#note1" property="sa:roleAction" resource="#note1" rel="footnote">1</a>
      <ul>
        <li property="schema:roleContactPoint" typeof="schema:ContactPoint">
          <a href="mailto:jholland@umich.edu"
             property="schema:email">jholland@umich.edu</a>
        </li>
        <li property="schema:roleContactPoint" typeof="schema:ContactPoint">
          <a href="fax:+4815162342" property="schema:faxNumber">+4815162342</a>
        </li>
      </ul>
    </li>
  </ul>
</section>
            
Example of an author (schema:contributor could also have been used). The links to #sf, #umich, and #note1 are expected to point to items in the Affiliations section for the first two, and in the Notes section for the last one.

Affiliations

If the authors and contributors of the document are affiliated with organizations, they are listed in a section with typeof sa:Affiliations. The content of that section is an h2 title appropriate for it, followed by a ul or ol (but the order is less commonly relevant than it is for authors).

Note that articles that feature an organization as an author should have that organization listed in the Authors & Contributors section, and not here.

Each li in the list is one affiliation (though multiple people can reference it). The li needs to have an id matching that used in the reference. Inside the li is a span with typeof set to schema:Organization and its resource also matching the one used in the reference. (The belt and suspenders approach is unfortunately needed to produce both usable HTML and a viable data model.)

The content of the schema:Organization can contain any applicable property. An example of an affiliations section, with some extra structure for the organization is given below.


<section typeof="sa:Affiliations">
  <h2>Affiliations</h2>
  <ul>
    <li id="sa">
      <span typeof="schema:Organization" resource="https://science.ai/">
        <span property="schema:name">science.ai</span>,
        <span property="schema:parentOrganization" typeof="schema:Organization">
          <span property="schema:name">Standard Analytics</span>
          —
          <span property="schema:location" typeof="schema:Place">
            <span property="schema:address" typeof="schema:PostalAddress">
              <span property="schema:addressLocality">NYC</span>,
              <span property="schema:addressRegion">NY</span>,
              <span property="schema:addressCountry">USA</span>
            </span>
          </span>
        </span>
      </span>
    </li>
  </ul>
</section>
            
Example of an organization section.
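To make the pairing between an author’s affiliation link and its target concrete, the following sketch shows both ends side by side. It reuses the #sf link from the authors example; the organization name given for that URL and the reduced list around it are illustrative assumptions.

```html
<!-- In the authors list: href targets the li, resource names the organization -->
<a href="#sf" property="sa:roleAffiliation"
   resource="http://www.santafe.edu/">a</a>

<!-- In the affiliations list: the id matches the href, the resource matches the resource -->
<section typeof="sa:Affiliations">
  <h2>Affiliations</h2>
  <ul>
    <li id="sf">
      <span typeof="schema:Organization" resource="http://www.santafe.edu/">
        <span property="schema:name">Santa Fe Institute</span>
      </span>
    </li>
  </ul>
</section>
```

Sketch of an affiliation reference and its target. The id serves HTML linking, while the matching resource values tie the two together in the data model.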

License, Copyright, Keywords, and Abstract

The copyright line of the document can be included in any element, but it must list two properties. schema:copyrightYear has a numeric value, and schema:copyrightHolder has a value of schema:Person or schema:Organization.

A link to the license for the article should be provided. The link should have the property of schema:license and typeof="schema:CreativeWork".

Keywords should be listed in a section element with an appropriate h2. The list of terms should be a ul with the property of schema:keywords on every li.

The abstract should be included in a section element with a typeof attribute containing sa:Abstract. The abstract should have the role of doc-abstract.
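The four items described in this section might be marked up as in the following sketch. The year, holder name, keywords, and choice of license are placeholders, the fragments are assumed to sit inside the article so that the properties attach to it, and the schema: prefix on the license type is an assumption made for consistency with the other examples.

```html
<!-- Copyright line: any element, carrying both properties -->
<p>
  © <span property="schema:copyrightYear">2016</span>
  <span property="schema:copyrightHolder" typeof="schema:Organization">
    <span property="schema:name">Example Press</span>
  </span>
</p>

<!-- License link -->
<p>
  Licensed under
  <a property="schema:license" typeof="schema:CreativeWork"
     href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>.
</p>

<!-- Keywords: one property per li -->
<section>
  <h2>Keywords</h2>
  <ul>
    <li property="schema:keywords">cryptopaleozoology</li>
    <li property="schema:keywords">scientific method</li>
  </ul>
</section>

<!-- Abstract: typed section with the DPUB-ARIA role -->
<section typeof="sa:Abstract" role="doc-abstract">
  <h2>Abstract</h2>
  <p>…</p>
</section>
```

Sketch of copyright, license, keywords, and abstract markup.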

Notes

Notes that add information about the Authors and Contributors section should be hunk elements labeled as doc-footnote.

Citations & References

Citations in scholarly articles provide a way to reference the work of others upon which one builds. In the pre-Web era, they essentially served as links by carrying sufficient information for one to find the reference in question, in a relatively compact manner.

In a Web world, it can seem tempting to simply replace citations with links, but there is value in keeping the limited amount of metadata about the cited object that they provide inlined in the document. Links rot and disappear; when that happens the rest of the information can prove crucial in finding the referenced object at some other location. Unique identifiers with indirect resolution, such as DOIs, might seem like a solution to this problem but, because they are opaque, humans routinely get them wrong. (DOIs additionally suffer from a single point of failure for resolution.) All things considered, including a link for convenience and human-readable metadata about the referenced object is likely the most resilient way to cite another document.

In the print universe, reducing the number of pages one needs to use can be a noticeable cost-saver. Given that scholarly articles can easily feature dozens if not hundreds of cited references, making use of compact reference conventions (as well as smaller font sizes) made sense. Over time, however, what was a sensible idea degenerated into a territorial maze of gratuitously heterogeneous conventions to the point where there now exist over 8000 citation styles.

There is no value in Scholarly HTML so much as attempting to support all citation styles. The Web does not need the compactness. Citations and references should be both data-rich and human-accessible, something on which the traditional formats fail, in some cases quite spectacularly.

For accessibility purposes, Scholarly HTML recommends that references be formatted in such a manner that they read naturally in the article’s natural language, with articulations between the metadata parts, as below:


<li typeof="schema:WebPage" role="doc-biblioentry"
    resource="http://semver.org/"
    property="schema:citation" id="some-id">
 <cite property="schema:name">
   <a href="http://semver.org/">Semantic Versioning 2.0.0</a>
 </cite>,
  by <span property="schema:author" typeof="schema:Person">
   <span property="schema:givenName">Tom</span>
   <span property="schema:familyName">Preston-Werner</span>
 </span>; published in
 <time property="schema:datePublished" datatype="xsd:gYear"
       datetime="2014">2014</time>
 <span property="schema:potentialAction" typeof="schema:ReadAction">
   <meta property="schema:actionStatus" content="CompletedActionStatus">
    (accessed on
   <time property="schema:endTime" datatype="xsd:date"
         datetime="2016-02-01">01 Feb 2016</time>)
 </span>.
</li>
          

The above code renders as:

  1. Semantic Versioning 2.0.0, by Tom Preston-Werner; published in 2014 (accessed on 01 Feb 2016).
Example of a readable citation.

The references section is simply a section with an appropriate heading, containing an ol. Each li in the list follows a regular structure: it has a role of doc-biblioentry, a resource being the URL identifying the cited object, a property of schema:citation, an id to make it linkable, and a typeof capturing the kind of object that is being referenced (typically schema:ScholarlyArticle, schema:Book, or schema:WebPage but there is really no limit as to what may be cited).

The content of the li can be any RDFa that matches the typeof, but some good practices should be observed.

The title or name of the cited object should be in a cite element. If a link is available, then the title should be linked. Date and time values (such as publication or access date) should make use of the time element (further noting that the datatype attribute can be used to express the granularity of the date as in the example above).

While arbitrary metadata may be used, it is highly recommended to stick to schema.org and the Scholarly Article vocabularies. The reason for that is that, should one wish to convert from Scholarly HTML citations into a specific print format then it will be desirable to be able to reliably extract information from the citations. This could be used for instance to produce CSL variables (as exemplified in the CSL documentation) and then use a CSL implementation in order to produce the output.

Should we be more constraining and define more precisely the constructs that are more likely to interoperate?

Providing a mapping to CSL would be extremely useful.

Footnotes & Endnotes

If the document has notes, they are listed in a section with the role of doc-endnotes. The content of that section is an h2 title appropriate for it, followed by either a ul or ol. Each li should be labeled with the role of doc-endnote.
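A notes section following this description might look like the sketch below. The heading text, the id values, and the choice of ol over ul are illustrative assumptions.

```html
<section role="doc-endnotes">
  <h2>Notes</h2>
  <ol>
    <!-- Each note carries the doc-endnote role; the id makes it linkable -->
    <li id="en1" role="doc-endnote">…</li>
    <li id="en2" role="doc-endnote">…</li>
  </ol>
</section>
```

Sketch of an endnotes section.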

Funding Information

Funding information is provided using a complex triples structure which can be summarized as follows:

This can be enhanced with information such as the award name and Role information. Here is a detailed example:


XXX this example has issues
<section id="funding" typeof="sa:Funding">
  <h2 role="doc-title">Funding</h2>
  <ol property="schema:sponsor">
    <li resource="http://funding.example.org/" typeof="sa:SponsorRole">
      <span typeof="schema:Person" resource="http://example.name/">
        <span property="schema:name">Xiong Ding</span>
      </span>
      <span property="schema:name">acknowledges support from</span>
      <ul>
        <li>
          <span typeof="schema:Organization" resource="http://www.nuc.edu.cn/">
            <a href="http://www.nuc.edu.cn/" property="schema:name">North University of China</a>
          </span>
        </li>
      </ul>
      <span property="schema:roleOffer" typeof="schema:FundingSource">
        <span property="schema:name">The 11th Graduate Science and Technology Projects</span>
        (<span property="schema:alternateName">Natural Science Project</span>)
        <span property="schema:serialNumber">20141130</span>
      </span>
    </li>
  </ol>
</section>
          

Disclosures

Disclosure information is a list of disclosure actions described in a simple triples structure.

Acknowledgements

XXX

Scholarly Article Vocabulary

A limited number of classes and properties are currently not available from schema.org. In most if not all cases it would be desirable to make them available there, but while work is progressing it is simpler to define them ourselves.

The current URL for the Scholarly Article vocabulary is http://ns.science.ai/. It may be desirable (should the vocabulary persist) to use a different URL. But this issue might go away if schema.org steps up.

You can read the definitions for the SA vocabulary.

Hypermedia Controls

Processing Model

Acknowledgements

Scholarly HTML would like to thank Scholarly HTML (you read that right) for blazing the trail perhaps a few years too soon. In particular, the following people were especially kind and helpful: Peter Sefton, Richard Smith-Unna, and Peter Murray-Rust.

PLOS has a short history of Scholarly HTML that is worth reading (and would be worth updating).

Dan Brickley was kind enough to drop by the office to chat about our usage of schema.org even though he was tired and hungry. As always, examples involving fish tanks are the most helpful. Dave Cramer shared ideas that we happily stole.

Patrick Johnston’s input has been crucial, notably in modeling authoring. We can only hope that getting those details exactly right has not caused him to lose too much sleep.

We also received very useful feedback and pointers from: Kjetil Kjernsmo (DAHUT!), Silvio Peroni, Justin Johansson, Alf Eaton, Raniere Silvia, Kaveh Bazargan and Mike Smith. We are very much indebted to the help provided us by Ivan Herman.

If we somehow forgot you in this list and you are too gracious to complain, we love you all the same.