Modern Tooling

Abstract

This document is meant to capture the state of the discussion about modernising the tooling that supports the making of W3C standards.

Discussion is open to all. Pull requests are welcome, and the discussion takes place in the issues.

4. Tools & Supporting Setup

This section is a categorised but unprioritised list of tools and services that W3C should deploy, as well as ideas on how to manage integration with third-party systems.

4.1 Specification Development

The heart of what we produce is specifications, as a manifestation of consensus. These already have some tooling support, but it can be both more deeply integrated and better documented.

4.1.1 Automatic Publishing

Great progress has already been made thanks to the Echidna system that can publish Working Drafts in an automated manner, as often as needed. This can effectively make TR documents and Editor’s Drafts indistinguishable.

There are, however, still a number of exceptions to the documents that Echidna can publish, and a number of rough edges to the system itself and its documentation. It needs iterative improvements.

It will also need to expose a stream of notifications (as described further below) for the documents it publishes.

4.1.2 De Facto GitHub

While the door needs to remain open for alternatives that our users may wish to avail themselves of, the expected default location for specifications (and any other W3C-related repository) should be GitHub, preferably under the w3c organisation. This has several implications.

We need to document the proper usage of GitHub for such work. Some of our users who are not from the Web development community can at times struggle with the site or with common conventions. Team resources need to be dedicated to supporting users of GitHub. A specific collaboration with GitHub may be considered if needed. NOTE: a repository has been created precisely to work on this aspect.

Better even than documentation, which is often ignored, a tool that enables the Team and the Chairs to set up new repositories correctly, within conventions, and given a few specific options would be most useful.

Groups regularly need to add new contributors to their repositories (or, on occasion, to remove them), which requires administrative powers. Sadly, the granularity of GitHub’s ACL system does not allow us to grant that power to non-Team participants, which in turn means that the Team needs to be called upon for everyday business. An interface to grant our users the ability to maintain a repository’s collaborators without involving the Team would be a time-saver. The ability to activate Travis without admin powers would also come in handy and save the Team some time.

Naturally, GitHub could disappear, go bankrupt, be overtaken by an evil gang of mole rats, or simply fall behind. In such an event, we must have all the useful data at hand and, if possible, be able to exploit it elsewhere. Git certainly makes it easy to keep repository data: it only needs to be cloned.

But GitHub also has a lot of useful information that isn’t in git: issues, discussions, etc. We also need to keep that around. For that, we need a GitHub backup tool. This would use git’s naturally distributed nature in conjunction with the ability to receive organisation-wide hooks (and use the API to grab everything) in order to store all the useful information. Ideally all work would happen under the w3c organisation but the tool ought to be flexible enough to account for there being work elsewhere.
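As a rough illustration of what such a backup tool would need to fetch, the sketch below plans the work for one repository without touching the network: a mirror clone for the git data, plus the GitHub REST API listings for the data that lives outside git. The names `BackupPlan` and `plan_backup` and the destination layout are assumptions made for this example, not part of any existing tool.

```python
from dataclasses import dataclass

API_ROOT = "https://api.github.com"


@dataclass(frozen=True)
class BackupPlan:
    """What to fetch for one repository: git data plus API-only data."""
    clone_command: list[str]
    api_urls: list[str]


def plan_backup(owner: str, repo: str, dest_dir: str) -> BackupPlan:
    """Describe how to back up one repository (hypothetical helper)."""
    # A mirror clone keeps every branch and tag, not only the default branch.
    clone = [
        "git", "clone", "--mirror",
        f"https://github.com/{owner}/{repo}.git",
        f"{dest_dir}/{owner}/{repo}.git",
    ]
    # Issues, comments and pull requests are not stored in git, so they have
    # to come from the API. state=all is needed to include closed items.
    api = [
        f"{API_ROOT}/repos/{owner}/{repo}/issues?state=all&per_page=100",
        f"{API_ROOT}/repos/{owner}/{repo}/issues/comments?per_page=100",
        f"{API_ROOT}/repos/{owner}/{repo}/pulls?state=all&per_page=100",
    ]
    return BackupPlan(clone, api)
```

A real tool would also follow pagination, authenticate, and re-run the relevant part of the plan whenever an organisation-wide hook reports a change.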

The hook system should also be used so as to produce a stream of notifications that can later be integrated into the dashboard.

This also entails that both dev.w3 and dvcs.w3 should be phased out. No new repository should be created there and no new user granted access. Projects that are still active there should be moved (and redirected). Obviously the content already published that no one is touching will likely need to stay there, but it should essentially be a static, frozen archive of old work.

4.1.3 Specification Styles as a Community Project

We regularly hear complaints about the usability of our specification styles. Several projects were started to radically improve them, but petered out.

At the same time, they cannot be changed radically overnight. Older specifications cannot be broken through style changes and we shouldn’t throw away the brand recognition that comes with the current style.

The solution is to open up the management of these styles and drive the project through largely small, incremental updates to the stylesheets. A repository for the styles should be created and we should start accepting pull requests. At first a clean-up of the current code ought to be carried out. Then, regular releases ought to be made based on contributions.

4.1.4 Interactivity in Specifications

The Web is no longer a static medium that is essentially fluid print brought to the screen. Documents today can work with the user interactively. So far we have only very sporadically made use of these capabilities, when in fact they could greatly enhance the usage of our primary products.

A common library should be developed to provide basic functionality across all specifications. This would include:

Simple, streamlined bug reports from within the document, with anchor and selected text included.
Easy linking to sections.
Finding for each definition where it is used.
Page-width and boilerplate visibility configurability.
Integration with testing & coverage results.
Integration with existing issues lists.
Tabs offering different views of content (e.g. several serialisations of a given data model).
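
To make the first item in the list above more concrete, here is a minimal sketch of how a bug-report link could be assembled from an anchor and the selected text. It relies on the `title` and `body` query parameters that GitHub's new-issue form accepts. The function name, the issue wording and the example repository are assumptions, and the library itself would of course run in the browser rather than in Python.

```python
from urllib.parse import urlencode


def bug_report_url(repo: str, spec_url: str, anchor: str, selected_text: str) -> str:
    """Build a pre-filled "new issue" link for a passage of a specification."""
    # Quote every line of the selection so it renders as a Markdown blockquote.
    quoted = "\n".join("> " + line for line in selected_text.splitlines())
    title = f"Comment on #{anchor}"
    body = f"Section: {spec_url}#{anchor}\n\n{quoted}\n\nDescribe the problem here."
    # urlencode percent-escapes the values so the link survives copy and paste.
    return f"https://github.com/{repo}/issues/new?" + urlencode({"title": title, "body": body})
```

For instance, `bug_report_url("w3c/example", "https://www.w3.org/TR/example/", "dfn-widget", "A widget is a thing.")` yields a link that opens the issue form with the section URL and the quoted passage already filled in.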

Like the styling project, this can easily start small and humble, and progressively grow into a highly useful system thanks to contributions.

4.1.5 Document Structure Beyond PubRules

The new PubRules system (Specberus) is a welcome improvement in the toolbox. It still needs some UI fixes and should be moved to a location not intended to make it impossible to find, but overall it should soon be able to replace the previous instantiation.

We should, however, think beyond specification validation. As things stand today, it is extremely hard to get information out of our documents. Even when they are correct, they are all different. The RDF export is a very partial view and borderline unusable. The data that is extracted into W3C’s systems cannot be obtained, and is partial as well. Yet so much could be done with this information.

We need to progressively refine the components that make up a specification and to rethink them using modern HTML constructs for document semantics and metadata. Step by step we need to start unifying these constructs across the board (having tools produce them and enforcing them through PubRules). Document structure and metadata need to become eventually regular enough that TR can be used as an API.

4.1.6 Specification Production Tools

We have quality specification production tools and most specifications today make use of them. It is worth thinking about how, as a common ecosystem, they can be improved.

One aspect in which they can be improved is to bring their source formats in line wherever it makes sense (and isn’t disruptive). The end goal here is that contributors to specifications who move between groups should be able to switch between tools as easily as possible.

This, naturally, should not be done at the cost of innovation in those tools, but it surely can be done for some of the common, well-understood aspects.

4.1.7 Cross-References Support

The Web platform is increasingly described by a large mesh of specifications that reference one another. Currently, this is done relatively poorly overall. There are several methods in use:

Handwave about doing something the way another specification says to do it, with a reference to the whole document.
Link to a definition in another specification, which can break (as has happened several times for people referencing HTML), and sometimes forget to include a formal reference.
List imported definitions from other specifications, with links and references, and then use internal references to those. This is likely the best option available today, but it is cumbersome work.

Ideally, all definitions from all specifications should be globally available. This would make it possible to simply reference them in a specification tool source format and get the correct link and reference handled. It would enable the generation of glossaries. (We urgently need to phase out the current one.) We could expose a searchable interface that would make it possible for people to find which specification defines what, and which make use of what concept or construct (something that would prove invaluable in coordination).
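
A minimal sketch of such a global index, assuming that definitions are marked up as `<dfn>` elements carrying an `id` (a common convention, but not one that every specification follows): each document is scanned for its definitions, and the results are merged into one table from term to defining URLs. `DfnCollector` and `build_index` are names invented for this example.

```python
from html.parser import HTMLParser


class DfnCollector(HTMLParser):
    """Collect term -> id pairs from the <dfn> elements of one document."""

    def __init__(self):
        super().__init__()
        self.definitions = {}
        self._current_id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "dfn":
            # A <dfn> without an id cannot be linked to, so it is skipped.
            self._current_id = dict(attrs).get("id")
            self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "dfn" and self._current_id is not None:
            # Collapse whitespace and lowercase so lookups are forgiving.
            term = " ".join("".join(self._text).split()).lower()
            self.definitions[term] = self._current_id
            self._current_id = None


def build_index(specs: dict[str, str]) -> dict[str, list[str]]:
    """Map each defined term to every URL (with fragment) that defines it."""
    index: dict[str, list[str]] = {}
    for url, html in specs.items():
        collector = DfnCollector()
        collector.feed(html)
        for term, anchor in collector.definitions.items():
            index.setdefault(term, []).append(f"{url}#{anchor}")
    return index
```

A term that maps to more than one URL is exactly the kind of duplicate definition that a searchable interface could surface for coordination.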

4.1.8 Live Examples

It is extremely common today to replace examples in documentation with live ones that can be edited and rendered using one of the many services that provide such functionality.

While it is probably not acceptable to inject code from a third-party service into our specifications, there exist reusable components that would enable examples in our specifications to become hackable. This would have tremendous value for developers trying to learn and understand the technology.

4.1.9 Specref

Specref is the database of bibliographical references that specifications rely upon. It has been through several iterations and is generally considered good, but it can be improved.

Currently managed by Tobie, it could be brought under W3C management.

It could use a front-end for adding resources and a few other niceties to integrate it better with the W3C tool suite (e.g., improving Marcos’ [bibliography generator](http://marcoscaceres.github.io/bib_entry_maker/)).

4.1.10 Specification Quality Checker (CI)

Some specification issues (that are not about the actual prose content) can sometimes exist for embarrassingly long periods of time before anyone notices. Sometimes they are noticed after a document is finalised. Or they are raised at the worst moment, all at once when the editor is busy doing something else.

Developers use continuous integration to make sure that there is a constant pressure towards quality rather than a last-minute rush — so should editors. There have been experiments around this (using Travis’ integration with GitHub); they should be expanded and systematised.

Ideally, an editor should be able to, at the click of a button, have the proper Travis integration (and dependencies) installed into the specification’s repository and deployed for immediate usage. This could even be made the default at repository creation if it is known that it will contain a specification.
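
As an example of the kind of non-prose check that continuous integration could run on every commit, the sketch below reports in-document links whose target `id` does not exist. It is only an illustration of one possible check; `FragmentAudit` and `dangling_fragments` are hypothetical names and do not correspond to an existing W3C tool.

```python
from html.parser import HTMLParser


class FragmentAudit(HTMLParser):
    """Record every id in a document and every same-document link target."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        href = attributes.get("href") or ""
        # Only links of the form "#something" point inside this document.
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.targets.append(href[1:])


def dangling_fragments(html: str) -> list[str]:
    """Return the fragment identifiers that are linked to but never defined."""
    audit = FragmentAudit()
    audit.feed(html)
    return sorted(set(audit.targets) - audit.ids)
```

A CI job would run this over the built specification and fail the build when the returned list is not empty, so that the problem is reported at the commit that introduced it.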

Several different tools can be called upon for quality checking; since they can be used on their own they are listed separately below and in the «Developer Tools» section.

4.1.11 WebIDL Checker

We have an existing WebIDL Checker. It needs to be integrated into the quality workflow (arguably into Specberus as well as into the linting options of specification tools), to come with guarantees of maintenance and documentation, and to be made easy to find.

4.1.12 References Checker

We have existing tools (that almost no one knows about) that can read a specification and validate that its references (at least to W3C or IETF documents) are up to date with the latest versions of those documents: the references checker and the IETF references checker.

While specification production tools, by building atop Specref, have largely removed the need for these, some specifications are produced by hand and some have local bibliographies. They should be part of automated checking.

4.1.13 Spell Checker

W3C actually exposes a Spell checker tool (that can work with HTML). It is unclear whether this could be integrated into automated checking (given the number of justifiable exceptions) but it could perhaps be made more prominent (or, if found not to be useful, retired).

4.1.14 Specification Diffs

W3C currently offers an HTML Diff service which is sometimes used to see changes to documents.

It is a common request to find out what has changed between two versions of a specification. In fact, producing a diff seems slated to be part of the new errata management approach that the AB is investigating.

This should be made automatic, and possibly specialised. Right now the HTML diff tool will produce as output a complete document, which can make the differences hard to find (e.g. a paragraph change in the whole HTML specification). It can also include a lot of spurious differences. (For instance, if a section was added at the beginning, all the section numbers come out as changes.)

This tool could be specialised for specifications in such a way that ignorable differences would be ignored. Also, an affordance could be exposed in specifications offering to show a diff between the version being looked at and any other version of that document.
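
A minimal sketch of one such specialisation, assuming the two versions are available as plain-text renderings: leading section numbers are stripped before comparison, so that inserting a section near the top does not mark every later heading as changed. The regular expression is deliberately crude (a line of prose starting with something like "3.5 million" would also be trimmed) and the function names are made up for the example.

```python
import difflib
import re

# Leading section numbers such as "4.1.14 " or "2. " at the start of a line.
SECTION_NUMBER = re.compile(r"^\s*(?:\d+\.)+\d*\s+")


def normalise(line: str) -> str:
    """Drop the section number and surrounding whitespace from one line."""
    return SECTION_NUMBER.sub("", line).strip()


def spec_diff(old: str, new: str) -> list[str]:
    """List added ("+") and removed ("-") lines, blind to renumbering."""
    old_lines = [normalise(line) for line in old.splitlines()]
    new_lines = [normalise(line) for line in new.splitlines()]
    lines = list(difflib.unified_diff(old_lines, new_lines, lineterm="", n=0))
    # The first two lines are the "---" and "+++" file headers, and lines
    # starting with "@@" are hunk markers: neither is a change to the text.
    return [line for line in lines[2:] if not line.startswith("@@")]
```

With a new first section inserted, only the new section and genuinely edited paragraphs are reported. The renumbered headings of the untouched sections are not.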

4.1.15 Dependency Tracker

Specs depend on one another, but very little is done to ensure these dependencies are kept in sync. They are expected to be reviewed at transitions, and sometimes the link checker will spot a dependency gone wrong, but a more systematic approach would allow for better coordination across groups and reduce the risk of «monkey patching».

A dependency tracker that would make it possible to visualise and report the dependencies between specs, e.g. in terms of WebIDL interfaces but also in terms of algorithms, would be terrific. Ideally, it would lead to much more granular dependencies: a spec usually depends on a few items of a given spec, not the whole thing. Assessing dependencies at the right granularity would be helpful for streamlining transitions on the Recommendation track.
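
The granular view could be computed along the following lines, assuming that for each spec we already know which terms (interfaces, algorithms) it defines and which it uses. How those two tables are extracted is left open here, and the function name is hypothetical.

```python
from collections import defaultdict


def reverse_dependencies(uses: dict[str, set[str]],
                         defines: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """For each spec, list which other specs rely on which of its terms."""
    # Find the defining spec of every term.
    owner = {term: spec for spec, terms in defines.items() for term in terms}
    report = defaultdict(lambda: defaultdict(set))
    for spec, terms in uses.items():
        for term in terms:
            source = owner.get(term)
            # Unknown terms and self-references are not cross-spec dependencies.
            if source is not None and source != spec:
                report[source][spec].add(term)
    return {source: dict(users) for source, users in report.items()}
```

The result answers the question an editor faces at a transition: which other documents depend on this one, and on which items exactly.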

4.1.16 WYSIWYG Specifications

Supporting WYSIWYG editing for specifications could help improve collaboration and contributions from people who might know HTML but are not necessarily versed in the specific formalism of a given specification tool.

While such a tool would be interesting and valuable, it is a long and complex project to put together, and likely not a priority.

4.2 Dashboard

Perhaps the most common complaint about the W3C’s tooling setup, which is only made worse by spreading work across a broad set of external services, is that there are too many places at which to track things, often in very different ways.

In addition to tracking information, taking action within the system also involves remembering where different parts are. There are several pages listing services (e.g. the Member page, the AC page, the Editors’ page) but they can prove overwhelming, especially since they often have content that is out of date. They are also very generic: most users don’t need to know everything that they contain.

This leads to three high-level requirements for a modern system, from its users’ point of view:

Unity: there needs to be a single URL at which to find everything. This should include information about what is going on in parts of the organisation that one is interested in, but also the ability to use the tools that one needs to carry out one’s work within W3C.
Customisability: since no two participants have the same needs, they need to be able to fully customise their interface.
Consistency: learning one tool is difficult, learning many all the more so if they all look different. As far as possible we should endeavour to expose a consistent user experience to our systems so as to reduce the learning curve for newcomers.

This translates to a specific design for the various moving parts of the system.

4.2.1 Unified Feeds

There are many sources of data that can become involved in the dashboard. In order to keep them manageable, we need a specialised component that can provide a consistent interface to them.

This service is a relatively simple shell around various data streams. It can be configured to poll a resource at regular intervals (e.g. retrieve and process an RSS feed), or on the contrary to expose a hook for pushed information (e.g. we can set up a GitHub hook calling it for all events in the w3c organisation there). Ideally we could also collect data from mailing lists, for instance to produce a feed of browser vendors’ intents to implement/deprecate.

Each of its plugins knows how to obtain data and store it in a unified manner. They can easily be configured (e.g. to add RSS feeds, to receive GitHub notifications from other repositories).

This mass of data can then be returned to queries that essentially involve two filters: which feeds the user is interested in, and a date-time cut-off so that one can ask only for what has happened since one last visited.
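
The query described above is small enough to sketch directly. The `Event` shape and the feed naming scheme (`github:w3c/foo`, `rss:blog`) are assumptions made for the example. The only points taken from the design are the two filters: the feeds of interest and a date-time cut-off.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """One entry in the unified store, whatever plugin produced it."""
    feed: str
    time: datetime
    summary: str


def query(events, feeds, since):
    """Return events from the chosen feeds newer than `since`, oldest first."""
    wanted = set(feeds)
    hits = [e for e in events if e.feed in wanted and e.time > since]
    return sorted(hits, key=lambda e: e.time)


# Example: what happened on one repository since the user's last visit.
last_visit = datetime(2015, 3, 1, 12, 0, tzinfo=timezone.utc)
store = [
    Event("github:w3c/foo", datetime(2015, 3, 2, tzinfo=timezone.utc), "issue opened"),
    Event("rss:blog", datetime(2015, 3, 3, tzinfo=timezone.utc), "new post"),
]
recent = query(store, ["github:w3c/foo"], last_visit)
```

Keeping timestamps timezone-aware throughout avoids comparing events recorded by plugins running with different local clocks.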

Additionally, a WebSockets interface can be made available so that dashboard users can get notifications in real time in the UI.

4.2.2 Dashboard

The dashboard itself is nothing but a container of smaller widgets which can be configured, added, or removed by each user. In order to be fully generic, we need to ensure that it remains extremely simple both in its own functionality and in the interface it exposes to widgets.

Widgets can be used as part of an overview grid, or can take over the whole screen when they need more real estate.

It is very important that it be possible for independent contributors to develop widgets separately, that can then become available through the dashboard (after W3C approval, we can’t automatically deploy third-party code). Widgets can expect to run in insulated iframes and to communicate with the dashboard to access whatever information they need through messaging.

W3C’s own applications are expected to transition to being exposed as widgets. The general principle at work is that services should be made available as APIs, and exposed to users through widgets. With this in mind widgets can use a common stylesheet and common scripting libraries to help support greater coherence and fluidity in interaction.

4.2.3 Notifiers

Another tool that can plug into the unified feed is a notifier. Users can opt into being notified of various events by email (through, you guessed it, a dashboard widget). Likewise, IRC channels can be notified of specific events by a bot. The idea is that this is extensible: the unified feed provides the data in a usage-agnostic manner; tools then piggy-back on it.

The notifier simply receives the unified data feed, and for each event finds if there are people who want to be notified of it. If so, they get an email.

The notifier needs to be able to filter events based on labels. For instance, the I18N WG uses «I18N» labels on bugs in other groups’ bug trackers in order to track horizontal reviews. Similar conventions could be supported by other groups.
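
A minimal sketch of the matching step, under the assumption that a subscription names the feeds it cares about and, optionally, a set of labels (stored in lower case). `Subscription` and `recipients` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """One user's request to be emailed about events on some feeds."""
    email: str
    feeds: frozenset
    labels: frozenset = frozenset()  # empty means "any label"


def recipients(event_feed, event_labels, subscriptions):
    """Return the addresses that should be notified of one event."""
    # Labels are compared case-insensitively, so "I18N" matches "i18n".
    labels = {label.lower() for label in event_labels}
    matched = set()
    for sub in subscriptions:
        if event_feed not in sub.feeds:
            continue
        if sub.labels and not (sub.labels & labels):
            continue
        matched.add(sub.email)
    return sorted(matched)
```

A horizontal review group would then subscribe once, with its label, to the feeds of every repository it reviews, and hear only about the issues flagged for it.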

Dom has built a tool (github-notify-ml) that can notify mailing lists when certain events happen on GitHub. It could be enhanced to make use of this system.

It would prove particularly useful if the notifier were able to filter events to match changes to specific sections of a specification (notably to help horizontal reviewers).

4.3 Groups & Community

This section covers tools that are in common use for coordination and communication inside groups and the broader community.

4.3.1 IRC & Chatting

IRC is relatively operational and it is highly programmable. Because of this, it would be hard to replace with other more modern solutions such as Slack or gitter. It is, however, often unknown to the younger generation or in fact more generally to people not strongly steeped in the open source culture of the past twenty years. As such, an improved Web interface would be desirable. (The current one is serviceable, but could do with some freshening.) One potential option here would be Shout. It is a client-server combo with a rich interface, and is entirely implemented in JS. Amongst its advantages are that it can remain connected to IRC even when you close your browser, it supports connecting as the same user through multiple channels at once, and it has a responsive layout that is well-suited to usage on a smartphone.

IRC logging could, however, be improved. RRSAgent has its value for the specific case of capturing minutes in a manner that is easily reused by other tools, but as a general-purpose logger we can do better. Too many channels are simply not logged at all — even though much work takes place over IRC! And when they are logged, RRSAgent logs are then extremely hard to find; lost in dated-space never to be retrieved again.

A general-purpose channel logger bot need not be complicated. In my experience, people are satisfied with Krijn Hoetmer’s setup. Its limitations are largely that it requires getting agreement from Krijn in order to get something logged, and it wouldn’t scale to the many channels we would want to log. The RICG also uses a simple Drupal-based PHP setup to reference discussion, to retrieve GitHub issue information, and for minute-taking.

A single service at which all chat logs can be located would therefore be needed. It does not require bells and whistles, but it needs to support continuous logging, finding channels, and search.

4.3.2 Mailing List Interfaces

While the mailing list service itself is largely beyond reproach, the interface to our archives is now one of the more decried parts of our offering. While a number of the complaints may be exaggerated, it is certainly true that it could do with improvements in both style and usability.

A project that is commonly cited as a huge improvement over what existed previously is https://esdiscuss.org/. It is not perfect, and the source can probably not be reused as is. Its fundamental architecture is sound however: convert email to JSON and expose those for other services to build upon. Given the size of the W3C archive and its daily volume, this would not be a minor undertaking — it is likely better to test-drive the idea on a small set of lists.
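
The conversion step at the core of that architecture can be sketched with the standard library alone. The field names in the JSON record are an assumption (they are not taken from esdiscuss.org), and the sketch only handles plain single-part messages. Real archives would also need MIME decoding, character-set handling and threading.

```python
import json
from email import message_from_string
from email.utils import parsedate_to_datetime


def email_to_json(raw: str) -> str:
    """Convert one raw RFC 5322 message into a small JSON record."""
    msg = message_from_string(raw)
    # Multipart bodies are left out of this sketch.
    body = "" if msg.is_multipart() else msg.get_payload()
    date = msg.get("Date")
    record = {
        "id": msg.get("Message-ID"),
        "in_reply_to": msg.get("In-Reply-To"),
        "from": msg.get("From"),
        "subject": msg.get("Subject"),
        # ISO 8601 dates sort and compare more easily than RFC 5322 ones.
        "date": parsedate_to_datetime(date).isoformat() if date else None,
        "body": body.strip(),
    }
    return json.dumps(record, indent=2)
```

Once messages are available in this form, an archive interface, a search index or a feed plugin can each be built on top without parsing email again.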

People have also complained about the user experience of signing up to our lists. A user interface allowing people who have an account (either with us, or with a third-party service that we can authenticate against) to use that to subscribe directly (without the email ping-pong) and to immediately accept archival would help our community as well.

4.3.3 Forum

Forums have overtaken mailing lists in terms of popularity for online discussion. Groups would certainly benefit from being able to rely on those for at least some of their work.

This aspect is already covered by our Discourse installation, which is being moved to WebPlatform.

4.3.4 EtherPads

Etherpads are commonly used to take shared notes, which could be useful for minutes in some groups, and are also particularly useful during brainstorming sessions when a group of people wishes to write collectively.

Because W3C does not provide its own pads, people routinely use those available from Mozilla or the MIT. This results in information being scattered far and wide.

There are numerous pad implementations, so deploying this is largely a case of picking one that we like. A good pad should support:

History: you can replay the edits.
Archivability: you can mark a pad as «done» to freeze it.
Search: the pads should be indexed so that one can find content. This could be supported externally, but does require the pad system to offer the means of listing all the existing pads.

Bonus points if there is a way for the pad to tie into our user system such that people who join a pad to edit automatically have their names set up.

4.3.5 Collaborative Todos

This section is more of an open question than anything else. Some teams use systems like Trello to organise their work, basically through collaborative todo lists. I am not aware of groups doing the same, but I would be interested in hearing about usage or interest.

4.3.6 Issue Trackers

Three major bug trackers are in use in W3C groups: Bugzilla, Tracker, and GitHub Issues.

Bugzilla is the most powerful and flexible of the three, but it is extremely slow, its interface is clunky, and automating it through external tools generally requires scraping its HTML output. (There does exist a way of enabling XML-RPC for it... but, well, it’s XML-RPC.) It also likes to send email; a lot.

Tracker is the only option with IRC integration (through trackbot) and it also supports action items in addition to issues. It knows about our users, IRC nicks, and working groups, which can prove useful. However its interface is also clunky and setting up a new instance for a given project requires intervention.

GitHub issues are by far the easiest to use. They have nice conversational capabilities. However they’re external to W3C and they don’t know about groups and the like.

I am making the following perhaps radical recommendations:

Phase out Bugzilla. It is horribly slow and painful to use. Its code base is hopeless. I know of no group that uses its more advanced features; and groups that use some of them tend to use them wrong.
Phase out Tracker. Issues should be linked to a project more than to a group. Action items can be issues assigned to someone, and eventually tied to a milestone. (The dates on AIs are almost never heeded anyway.)
Move all issue tracking to GitHub. Having issues tied to projects makes up for tying them to a group. There is no requirement for administrative intervention. The hardest point is IRC integration. An IRC bot will need to be developed that can integrate with GitHub issues, can be configured directly in a channel, and can proxy action on behalf of people on the channel. This is not a minor project, but the upside is that if done well it would be of interest to a community beyond W3C and would therefore likely receive contributions.
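
To give an idea of the translation such a bot would perform, here is a minimal sketch that turns an action line typed in a channel into the JSON body for GitHub's "create an issue" endpoint (`POST /repos/{owner}/{repo}/issues`). The line syntax, the «action» label and the nick-to-login table are all assumptions made for the example. Sending the request and authenticating on behalf of users are left out.

```python
import re

# Assumed syntax: "ACTION: <nick> to <task>".
ACTION = re.compile(r"^ACTION:\s*(?P<nick>\S+)\s+to\s+(?P<task>.+)$", re.IGNORECASE)


def action_to_issue(line: str, nick_to_login: dict[str, str]):
    """Return an issue payload for an action line, or None for other lines."""
    match = ACTION.match(line.strip())
    if match is None:
        return None
    payload = {"title": match["task"].strip(), "labels": ["action"]}
    # Only assign the issue when the IRC nick maps to a known GitHub login.
    login = nick_to_login.get(match["nick"].lower())
    if login:
        payload["assignees"] = [login]
    return payload
```

The mapping from IRC nicks to accounts is the part that Tracker knows about today, and that a replacement bot would need to obtain from somewhere.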

If we cannot unify our issues handling, then at the very least the various services will need to be plugged into the Unified Feeds system so as to be visible in the dashboard.

Horizontal review groups have their own needs in terms of issue tracking. Richard gives a good description of them in issue #8. Ideally the tracking system ought to enable the automation of many of the complex steps they need to go through.

4.3.7 Group Pages

W3C Working Groups should maintain a page about what they do, what their documents are with what status, what their upcoming events of note are, how they can be interacted with, etc.

Currently that is usually done by the team contact and/or chair hand-editing a bunch of HTML documents in CVS. The information provided is completely different from group to group, and the layouts are all over the place. If you know how to find information with one group, you will be lost with another. What’s more, the editing process is painful enough that it is rarely done, and can rarely be automated. Stale information abounds (it is very common to find completely outdated charters for instance, or lists of minutes that stop two years ago even though the group is active). Some groups have moved a fair part of their information elsewhere, sometimes their own site, sometimes a wiki. This is further compounded by the fact that most (but not all!) group pages are in dated space somewhere. There is simply no easy way to find them; at times even search engines get confused.

Group content should be entirely moved to a single, easily found location. For instance, https://groups.w3.org/$GROUP_NAME.

Group pages should be managed using simple but effective tools. CMSs are unlikely to provide a good match in terms of flexibility and automation. A static generator system such as Jekyll ought to be used. A common layout should be deployed across the board. Similar information should be presented in a similar fashion and at similar locations (both on screen and in URL space).

In addition to the functionality already provided by such tools (pages and blogs for news being typical), it ought to be relatively easy to write plugins to publish calendaring information as well.

4.3.8 The Fate of Wikis

Wikis are annoying. Idiosyncratic syntax, little to no notification, practically nothing in the way of actual collaboration (people might work on the same document but don’t really talk, certainly not in practice). W3C makes the problem worse by having uncountable different wikis spread out all over the place. It is already hard to find information in one wiki; when one has to remember which wiki it was in, things go awry.

That said, wikis do have their die-hard lovers and they will likely remain in use. I recommend:

Avoiding the creation of new wikis as much as possible.
Making the existing ones read-only as often as possible.
If people need to produce joint notes, they can use a pad. If they need to produce joint documents they can use HTML in a repository, with all the usual collaboration this enables (issues, pull requests, etc.).
For wikis that remain in use, they should be integrated into the Unified Feeds system so that at least changes made there can show up in the dashboard. A single entry point to search across all the (public and active) wikis would be a big plus.

4.3.9 Scribe Management

Chairs often need to maintain a scribe rotation. It is not the end of the world, but it is a little bit of extra manual work that can easily go wrong.

A scribe manager could possibly be built atop the W3C Data API that the systeam is working on (assuming it could list group participants). You could mark people as "never scribing", click someone’s name when they’ve scribed, have new participants be sent to the bottom of the list.
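
The rotation logic itself is tiny, as the sketch below suggests. It covers the three behaviours listed above (exemptions, moving someone down once they have scribed, and new participants joining at the bottom) and deliberately leaves out where the participant list comes from. `ScribeRotation` is a hypothetical name.

```python
class ScribeRotation:
    """Keep a queue of participants. The first non-exempt one scribes next."""

    def __init__(self, participants):
        self.queue = list(participants)
        self.exempt = set()

    def add(self, name):
        # New participants are sent to the bottom of the list.
        if name not in self.queue:
            self.queue.append(name)

    def never_scribes(self, name):
        self.exempt.add(name)

    def next_scribe(self):
        for name in self.queue:
            if name not in self.exempt:
                return name
        return None  # everyone is exempt, or the group is empty

    def mark_scribed(self, name):
        # Whoever has just scribed goes back to the end of the queue.
        self.queue.remove(name)
        self.queue.append(name)
```

Because the state is a single ordered list plus a set, it would be easy to persist and to expose as a dashboard widget.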

4.3.10 Documentation & Templates

Even with a freshened up and more organised setup, making use of all our tools and the external tools they rely on can be challenging.

It is absolutely necessary that there exist clear and well-identified guidelines for operating with these tools. Ideally, short enough that people actually read and memorise them.

Where possible, we should provide template configurations (e.g. the typical GitHub repository for a specification) that can easily be reused.

Additionally, it would likely be useful to create training material for the tools. The Team would naturally be subjected to it, but all newcomers (and in fact anyone interested) would be able to use this as a resource to get up to speed quickly.

4.3.11 WebIDL Map of the World

Having an automatically maintained map of all the WebIDL interfaces that are developed across the Web platform would make it easier to navigate the organic structure of Working Groups / Community Groups / etc., and would let groups more easily discover ways that other interfaces deal with specific patterns.

Both PLH and Dom have early versions of such a tool.

4.3.12 Translators

We have a community of dedicated people who translate specifications so that they can be read in a language more convenient to some of our users (see for instance Veebi sisu juurdepääsetavussuunised (WCAG) 2.0).

Translation can be difficult to organise well, and it can be hard to collaborate on a translation, especially for non-technical users who can nevertheless contribute usefully to this effort.

It would be interesting to have tools that support easier translation of our specifications, such as side-by-side views and document views updated live as the code is changed (to make things easier on those who have limited HTML experience).

4.4 Developer Services: W3C developer tools

Developers use W3C technology a lot, and some of our services. Yet we make little use of this interaction in order to build a relationship that we badly need. There is currently a lost opportunity to turn these useful (and, in some cases, used) tools into goodwill and good feedback. We are probably not getting as many donations as we could from these.

4.4.1 Developer Landing Page

We don’t offer an integrated view on the tools that we make available to developers. There is ongoing work in exposing this through a simple landing page. The W3C developer page aims to provide developers with a summary of all our open source tools and links to helpful resources. It is currently in development, and you can contribute to it in its GitHub repository.

You are very welcome to join the discussion about features, design and more in the developers repository’s issues.

4.4.2 Link Checker

The Link Checker is one of the more useful resources that W3C exposes, but it is woefully antiquated.

While not a top priority, it would be useful to rebuild it with modern technology and a modern interface. It should be relatively straightforward to make it a fair bit faster, if only by adding some caching.
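
To make the caching idea concrete, here is a minimal sketch of a time-limited cache placed in front of whatever function actually fetches a link's status. The class name, the `fetch_status` callable and the one-hour default lifetime are all assumptions made for illustration; none of them reflect the current Link Checker.

```python
import time


class CachedLinkChecker:
    """Remember the status of each URL for a limited time (hypothetical API)."""

    def __init__(self, fetch_status, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetch_status = fetch_status  # callable: url -> status code
        self._ttl = ttl_seconds            # how long a cached result stays valid
        self._clock = clock                # injectable so tests can control time
        self._cache = {}                   # url -> (expiry time, status)

    def status(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and cached[0] > now:
            # Still fresh: answer without touching the network.
            return cached[1]
        result = self._fetch_status(url)
        self._cache[url] = (now + self._ttl, result)
        return result
```

Links that appear on many pages of the same site (navigation, stylesheets, footers) would then be fetched once per lifetime rather than once per occurrence.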

4.4.3 CSS & HTML Validators

These are solid tools and do not need much work. We should stop calling the new HTML validator «experimental» and we should retire anything to do with the old one. We can sprinkle a little bit of nice design on the output, and we could document some of the APIs better, but nothing radical beyond some cleanup. It would be nice to ensure that the validators are available as libraries (outside Java), for instance as Gulp/Grunt packages that can easily integrate with common build systems for Web applications.

It has been suggested that it would be valuable to rewrite these two validators in a language more likely to be reusable by Web developers, namely JavaScript. While this may be true, it can only be a long-term, low-priority objective.

Note: the source for the Markup validator is Validator.nu.

4.4.4 I18N Checker

As with the validators in the previous section, this is good and operational. It could benefit from more promotion.

4.4.5 Accessibility Checker

We currently do not provide A11Y checking as a service, and arguments have been made that this would be a bad idea as there are aspects that cannot be tested. However, there are projects appearing that are beginning to offer this service (e.g. Tenon). We should take the time to think about our strategy here.

4.4.6 MobileOK

The MobileOK checker is a potentially valuable tool, but it needs an update. Validating XHTML Basic 1.1 or ensuring that a page is under 20K might not work that well with many modern uses of mobile. (Testing a site that I know works well on today’s devices scored it 29%.)

W3C is currently working on a new, open source version of the Mobile Checker. It is expected to be released soon.

4.4.7 Other Tools

These tools are currently not top priorities for the developers landing page but will be considered as soon as possible.

4.5 Core Site

While not strictly a tool in itself, the core w3.org site is a big part of what our users interact with, and improving it would help with many aspects of their work, notably by not driving them to third parties when there is no need.

4.5.1 Moving away from CVS

CVS makes using the core site inordinately painful. I for one avoid publishing anything there unless I really have to; in several cases I have set up an .htaccess proxy from a subdirectory just so I would never need to touch it again. I know others have similar experiences.

The massive size of the site also causes issues of its own. For instance, the recent discussion on upgrading to HTTPS across the board revealed just how hard it is to evolve it.

The site cannot be moved to a more modern system wholesale, but it likely does not need to be.

Old content that is no longer touched can be left dormant where it is.
The focus of the core site should be as much as possible restricted to its public-facing components; basically explaining what W3C is, providing some news, etc.
Active content (group pages, services, etc.) should all be moved to their own sub-sites (e.g. using subdomains) that have their own Web space and can clearly be identified as their own thing. Not only does this make them easier to find, it also makes them manageable as smaller, modular units. These could all get their own individual repositories for instance.
Things that have historically been maintained somewhere in the bowels of this vast beast should be moved one by one to another location of their own.

4.5.2 Idiosyncrasies

W3C invented conventions before conventions existed. In some cases, those became conventions only within W3C.

Overview.html is one of those. The norm everywhere else is index.html. It should become the default on all new Web setups at W3C, and where possible Overview.html ought to be retired. It may seem like a small thing but it is a regular annoyance.

Dated space, of course, needs to die. There are exactly seven people in the world who find it usable, six of whom are on the Team (the other one being DanC). It is maddening, distracting, confusing, and in violation of every single URL design best practice in use anywhere outside the Consortium’s walls.

Again, dormant content in dated space can stay there. But content that is in any way, manner, or form still in active use should never be in dated space. A non-exhaustive list of content that should never be allowed to reside in dated space includes:

Public-facing content;
Groups;
Charters;
Tools.

4.6 Testing

Our testing infrastructure has become increasingly important. In general it is much more recent than the rest of our tools and so less in need of modernising. But the setup can nevertheless be improved.

4.6.1 Critic

The Web Platform Tests (WPT) system relies on a code-review engine known as Critic. It ties into GitHub pull requests and is much more powerful than GitHub’s built-in code review.

Most groups likely do not need Critic, but for more complex code such as that which is found in tests it is a very welcome improvement.

The current Critic instance is run by James Graham on his personal server. It would likely be better to host it on W3C resources.

It may be worth investigating the use of Reviewable as a potential hosted replacement.

4.6.2 Automatic Browser Testing

We have an existing system that allows people to run (the automated parts of) the test suite continuously, without human intervention. Right now it has only been adopted by a couple of browser vendors.

We should take this to the next level. We can integrate with Sauce Labs (with whom we have a free account) and systematically run the entire test suite against the full set of browsers that they expose. This should allow us to gather very detailed statistics about which parts of the platform work where. It has the added advantage that it can help detect some broken tests.

4.6.3 Coverage & Test/Spec Linking

Finding out how much of a given specification is covered by the test suite is a hard, manual, and error-prone process. It cannot be perfectly automated, but it can be improved.

This also involves figuring out how to identify that a given test is related to part of a specification.

These tasks are not trivial and are highly heuristic, but there has been previous work. It needs to be assessed and integrated.

One ideal use case for this is to be able to obtain a list of tests that need to be reviewed when the specific section of a specification that they map to is updated.

4.6.4 Publishing Results

Once we have testing results for many implementations, we should publish them in a manner that is useful to developers. Such information is invaluable. The interface does not require much in terms of complexity.

4.6.5 Test Creation Helper

Creating new tests can in some cases be painful and slow, which in turn causes people to make mistakes as they cut and paste. This is particularly true for reference tests, and for groups that (still) require a lot of test metadata.

Part of that can be automated so that the basics of a test’s required parts can be filled out based on templates. Richard has a more extensive discussion of this proposal.
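
As one possible shape for such a helper (a sketch only, not Richard's proposal), the snippet below fills in a testharness.js skeleton from a title and a specification link. The template contents and the `render_test` function are assumptions; the two script paths follow the usual Web Platform Tests layout.

```python
import json
from html import escape
from string import Template

# Hypothetical skeleton for a testharness.js test.
TEST_TEMPLATE = Template("""<!DOCTYPE html>
<meta charset="utf-8">
<title>$title</title>
<link rel="help" href="$spec_url">
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
test(() => {
  // TODO: replace with real assertions.
  assert_true(true);
}, $js_title);
</script>
""")


def render_test(title, spec_url):
    """Fill in the skeleton, escaping the values for HTML and for JavaScript."""
    return TEST_TEMPLATE.substitute(
        title=escape(title),
        spec_url=escape(spec_url, quote=True),
        # json.dumps yields a quoted string literal that is also valid
        # JavaScript; it does not guard against "</script>" in the title.
        js_title=json.dumps(title),
    )


if __name__ == "__main__":
    print(render_test("Node & friends", "https://example.org/spec/#nodes"))
```

Groups that require additional metadata could extend the template with further placeholders instead of copying it by hand.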

4.6.6 Collaborators Access

The testing project involves close collaboration between parties that are not all on the Team, yet develop the major parts of the system. While the service should be run on W3C resources, it would not be efficient to restrict deployment to Team members only.

It must be possible to enable SSH access for a select group of contributors so that they can help curate the testing services.

4.7 Core Services

These are services that can be reused by other parts of the ecosystem.

4.7.1 Single Sign-On

Many tools and services will expose features that are user-dependent. It is not appropriate for those to implement their own user database duplicating W3C’s. It would also be a shame if they had to make use of third-party sign-on infrastructure (which was the case with Discourse).

Which exact solution is selected here is of relatively little importance, but it does need to support typical modern Web application authentication workflows. GitHub’s OAuth implementation may provide a good example.

It should also be possible to retrieve useful information about a user, such as their GitHub ID.

4.7.2 Search

Many of the tools to be developed will need search functionality and, given the spread of W3C’s content, that need tends to arise quickly. For instance, an Etherpad service might request indexing for its pads (using its own ACL knowledge), a chat logging service might index the logs, etc.

One potential candidate, which can double as a JSON document store and therefore be usable for application data storage, is Elasticsearch.