This document is meant to capture the state of the discussion about modernising the tooling that supports the making of W3C standards.
Discussion is open to all. Pull requests are welcome, and the discussion takes place in the issues.
This is just informal work by an informal task force.
W3C is a heavily tooled organisation. We did Web tools long before they became ubiquitous and our tooling has earned us two decades of praise — which is probably more than can be said for any other part of the organisation.
But the Web has changed a fair bit since then. Today, tools abound. The Web is an application platform. Our users are happy using many third-party tools, and it is only natural that they will compare what they can find elsewhere to our offering, sometimes preferring other options.
This causes problems:
We need an energetic rethink of the W3C’s tooling in order to stay relevant to our users. That is what this document endeavours to provide.
I expect to proceed as follows:
While precise details about how our various tools could be improved were deliberately kept out of this document, there are nevertheless many things that people would like to see changed, improved, or added. As a result, the full list in the following sections is long, and it is easy to take one’s eyes off the ball.
This section provides information on the projects that have been deemed high priority. These are described at a high level, and resource estimates are provided for them. «First priority» projects are intended for immediate development, «Second priority» projects are expected to come in a second wave. Everything in this document not listed as part of those two categories is considered to be kept for later. Note that «later» is not a fancy word for «never», and people interested in jump-starting those should of course jump right to it. It simply means that they are not immediately planned for with the resources available for the execution of this plan.
Everything includes documentation and testing. All projects are expected to use continuous deployment, which is to say that whenever something hits the `master` branch it is deployed into production.
The projects in this category include:
This is required for development and beta-testing purposes.
Estimate: Personal server will be used for testing, done.
The goal of this project is to develop a service that can integrate all of the information being produced as we work (as it happens) and expose it for querying. It is the foundational brick for the dashboard, the notifiers, and possibly other services.
Estimate: 1 month FTE.
A configurable dashboard exposing the information that a given user wants to have. This is necessary in order for people not to be lost anymore. This is intended to evolve towards a full-fledged control panel for our users, but in the first pass the primary goal is to make it work as a configurable information-access system.
Estimate: 1 month FTE.
This system plugs into the unified feed and sends emails to specific targets for a filtered subset of events.
Estimate: At most a couple days’ work.
Same as the email notifier, but for IRC channels. Note that this could be demoted to lower priority.
Estimate: 2 weeks FTE
As more of our work transitions to GitHub, we need to maintain a backup of the information we have there. The unified feed already effectively backs up information from issues and the like, but the actual git content needs to be backed up as it changes.
Estimate: 1 week
Echidna needs to send the information that it processes to the unified feed.
Estimate: At most a few days’ work.
This project is already under way; as such it is not specifically part of the Modern Tooling effort. It is, however, important to have it available in time for some of the projects currently listed in «Priority Two».
Estimate: 0 (already accounted for).
We already have an experimental installation of Discourse, but we want to move it to WebPlatform. This is essentially: reinstalling, exporting the data, redirecting.
Estimate: 0 (turns out it will already be done)
The manner in which groups manage their own pages today is appalling. The goal of this project is to provide common infrastructure for them to be able to start reporting on their work effectively. This has the added benefit that it can expose information from groups straight to the unified feed (and then in turn to the dashboard).
Estimate: 1 month FTE.
This is already in progress. It essentially requires some tweaking and then deployment at a decent URL. It should not have to block on things like donations; as soon as the content is good we can ship it, and then iterate as more information and functionality become available.
Estimate: 10 days FTE.
A general guide for the usage of GitHub, as well as guides for the tools listed in this category (including Echidna) need to be written.
Estimate: 2-3 weeks FTE.
The projects in this category include:
The goal of this project is to make the user interface for mailing lists more usable.
Estimate: 2-3 months FTE.
Our specifications could stand to make use of modern capabilities standardised across the board (rather than added in an ad hoc fashion here and there).
Estimate: 3 weeks FTE.
The ability to have live examples that can be experimented with directly in the specification would help make things much easier to understand for readers.
Estimate: 2 weeks FTE.
The current WebIDL checker works, but it should be integrated into Specberus.
Estimate: 1 week FTE.
A number of GitHub maintenance tasks are pretty repetitive and could be partially automated. Also, templates to set up repositories would help people get it right and not have to learn tricks from the lore.
Estimate: 3 weeks FTE.
Many people want to improve our specification styles, but «Big Bang» attempts to change them entirely have so far failed. The goal of this project is to make the styles open source in order to make it possible for people to contribute to gradual improvements.
Estimate: 2 weeks FTE.
Etherpads are very useful for collaboration, yet W3C does not offer one. As a result, people use pads all over the place. This project simply installs an instance of our own. The first step is to figure out which one is the best option.
Estimate: @@systeam (will be estimated later, ongoing discussion).
We regularly need to provide some degree of access to some trusted collaborators. The primary use case is the Web Platform Tests system, where we have external developers. This could be as simple as putting `grumple` outside the firewall and granting access to two or three people. This also needs a clear policy.
Estimate: @@ted (the most urgent item already done, may not require more).
There are multiple stakeholders with varying needs when it comes to tooling.
AC Reps represent W3C members within W3C, and, in a sense, W3C activities within their own companies.
Their requirements include tools that provide a good overview of their organization's activity in different sectors, their administrative duties, and legal aspects.
…
…
…
…
…
…
…
This section is a categorised but unprioritised list of tools and services that W3C should deploy, as well as ideas on how to manage integration with third-party systems.
The heart of what we produce is specifications, as a manifestation of consensus. These already have some tooling support, but it could be further and better integrated and documented.
Great progress has already been made thanks to the Echidna system that can publish Working Drafts in an automated manner, as often as needed. This can effectively make TR documents and Editor’s Drafts indistinguishable.
There are, however, still a number of exceptions to the documents that Echidna can publish, and a number of rough edges to the system itself and its documentation. It needs iterative improvements.
It will also need to expose a stream of notifications (as described further below) for the documents it publishes.
While the door needs to remain open for alternatives that our users may wish to avail themselves of, the expected default location for specifications (and any other W3C-related repository) should be GitHub, preferably under the w3c organisation. This has several implications.
We need to document the proper usage of GitHub for such work. Some of our users who are not from the Web development community can at times struggle with the site or with common conventions. Team resources need to be dedicated to supporting users of GitHub. A specific collaboration with GitHub may be considered if needed. NOTE: a repository has been created precisely to work on this aspect.
Even better than documentation, which is often ignored, would be a tool that enables the Team and the Chairs to set up new repositories correctly, within conventions, and with a few specific options.
Groups regularly need to add new contributors to their repositories (or, on occasion, to remove them); which requires administrative powers. Sadly, the granularity of the ACL system there does not allow us to grant that power to non-Team participants, which in turn means that the Team needs to be called upon for everyday business. An interface to grant our users the ability to maintain a repository’s collaborators without involving the Team would be a time-saver. The ability to activate Travis without admin powers would also come in handy and save the Team some time.
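To make the idea concrete, such an interface would mostly be a thin service wrapping the existing GitHub API call for adding a collaborator behind a narrower, W3C-controlled capability. The sketch below assumes the service holds a Team-owned admin token; the `w3c-repo-manager` user agent string is made up:

```ts
// Sketch of the thin service that could let chairs manage collaborators without
// Team involvement: the service holds an admin token and exposes a narrower capability.
// (PUT /repos/{owner}/{repo}/collaborators/{username} is the GitHub API call involved.)
async function addCollaborator(repo: string, username: string, adminToken: string): Promise<void> {
  const res = await fetch(`https://api.github.com/repos/w3c/${repo}/collaborators/${username}`, {
    method: "PUT",
    headers: { authorization: `token ${adminToken}`, "user-agent": "w3c-repo-manager" },
  });
  if (!res.ok) throw new Error(`GitHub said ${res.status}`);
}
```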
Naturally, GitHub could disappear, go bankrupt, be overtaken by an evil gang of mole rats, or simply fall behind. In such an event, we must have all the useful data at hand and, if possible, be able to exploit it elsewhere. Git certainly makes it easy to keep repository data — it only needs to be cloned.
But GitHub also has a lot of useful information that isn’t in git: issues, discussions, etc. We also need to keep that around. For that, we need a GitHub backup tool. This would use git’s naturally distributed nature in conjunction with the ability to receive organisation-wide hooks (and use the API to grab everything) in order to store all the useful information. Ideally all work would happen under the w3c organisation but the tool ought to be flexible enough to account for there being work elsewhere.
The hook system should also be used so as to produce a stream of notifications that can later be integrated into the dashboard.
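A rough sketch of how the backup tool and the notification stream could hang together, assuming an organisation-wide webhook, a local mirror directory, and a hypothetical unified feed endpoint (none of these names are settled):

```ts
// Minimal sketch of an organisation-wide GitHub hook receiver that keeps bare mirrors
// up to date and forwards events to a (hypothetical) unified feed endpoint.
import { createServer } from "node:http";
import { execFile } from "node:child_process";
import { existsSync } from "node:fs";

const MIRROR_ROOT = "/var/backups/github";               // assumption: local mirror location
const FEED_ENDPOINT = "http://feed.example.org/events";  // hypothetical unified feed URL

function mirror(fullName: string, cloneUrl: string): void {
  const dir = `${MIRROR_ROOT}/${fullName}.git`;
  const args = existsSync(dir)
    ? ["--git-dir", dir, "fetch", "--prune", "origin"]   // update an existing mirror
    : ["clone", "--mirror", cloneUrl, dir];              // first time we see this repository
  execFile("git", args, (err) => err && console.error(err));
}

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const event = JSON.parse(body);
    if (event.repository) {
      mirror(event.repository.full_name, event.repository.clone_url);
    }
    // Re-emit the raw event so the dashboard and notifiers can consume it.
    fetch(FEED_ENDPOINT, {
      method: "POST",
      body,
      headers: { "content-type": "application/json" },
    }).catch(console.error);
    res.end("ok");
  });
}).listen(8080);
```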
This also entails that both `dev.w3` and `dvcs.w3` should be phased out. No new repository should be created there and no new user granted access. Projects that are still active there should be moved (and redirected). Obviously the content already published that no one is touching will likely need to stay there, but it should essentially be a static, frozen archive of old work.
We regularly hear complaints about the usability of our specification styles. Several projects were started to radically improve them, but petered out.
At the same time, they cannot be changed radically overnight. Older specifications cannot be broken through style changes and we shouldn’t throw away the brand recognition that comes with the current style.
The solution is to open up the management of these styles and drive the project through largely small, incremental updates to the stylesheets. A repository for the styles should be created and we should start accepting pull requests. At first a clean-up of the current code ought to be carried out. Then, regular releases ought to be made based on contributions.
The Web is no longer a static medium that is essentially fluid print brought to the screen. Documents today can work with the user interactively. So far we have only very sporadically made use of these capabilities, when in fact they could greatly enhance the usage of our primary products.
A common library should be developed to provide basic functionality across all specifications. This would include:
Like the styling project, this can easily start small and humble, and progressively grow into a highly useful system thanks to contributions.
The new PubRules system (Specberus) is a welcome improvement in the toolbox. It still needs some UI fixes and should be moved to a location not intended to make it impossible to find, but overall it should soon be able to replace the previous instantiation.
We should, however, think beyond specification validation. As things stand today, it is extremely hard to get information out of our documents. Even when they are correct, they are all different. The RDF export is a very partial view and borderline unusable. The data that is extracted into W3C’s systems cannot be obtained, and is partial as well. Yet so much could be done with this information.
We need to progressively refine the components that make up a specification and to rethink them using modern HTML constructs for document semantics and metadata. Step by step we need to start unifying these constructs across the board (having tools produce them and enforcing them through PubRules). Document structure and metadata need to become eventually regular enough that TR can be used as an API.
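To illustrate where this could lead, here is the kind of record a hypothetical «TR as an API» endpoint might return once metadata becomes regular enough to extract reliably. The URL and field names are assumptions, not an existing service:

```ts
// Sketch of the machine-readable record a "TR as an API" could expose per specification.
interface SpecRecord {
  shortname: string;           // e.g. "html"
  title: string;
  status: "WD" | "CR" | "PR" | "REC" | "NOTE";
  thisVersion: string;         // TR URL of this publication
  editorsDraft?: string;
  deliverers: string[];        // responsible working group(s)
  published: string;           // ISO 8601 date
}

// Hypothetical endpoint shape; the point is that the data is queryable, not scraped.
async function latest(shortname: string): Promise<SpecRecord> {
  const res = await fetch(`https://www.w3.org/api/tr/${shortname}/latest`); // assumed URL
  if (!res.ok) throw new Error(`No record for ${shortname}`);
  return res.json() as Promise<SpecRecord>;
}
```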
We have quality specification production tools and most specifications today make use of them. It is worth thinking about how, as a common ecosystem, they can be improved.
One aspect in which they can be improved is to bring their source formats in line wherever it makes sense (and isn’t disruptive). The end goal is that contributors to specifications who move between groups should be able to switch between tools as easily as possible.
This, naturally, should not be done at the cost of innovation in those tools, but it surely can be done for some of the common, well-understood aspects.
The Web platform is increasingly described by a large mesh of specifications that reference one another. Currently, this is done relatively poorly overall. There are several methods in use:
Ideally, all definitions from all specifications should be globally available. This would make it possible to simply reference them in a specification tool source format and get the correct link and reference handled. It would enable the generation of glossaries. (We urgently need to phase out the current one.) We could expose a searchable interface that would make it possible for people to find which specification defines what, and which make use of what concept or construct (something that would prove invaluable in coordination).
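As a sketch of the data involved, a global definitions index might boil down to something like the following (field names are illustrative):

```ts
// Rough shape of a global definitions index: each term maps to the spec that defines it
// and to the specs that reference it.
interface Definition {
  term: string;                 // e.g. "event loop"
  definedIn: string;            // shortname of the defining spec
  url: string;                  // direct link to the <dfn>
  referencedBy: string[];       // shortnames of specs that link to it
}

// Which specification(s) define a given term (useful for a searchable glossary).
function findDefiners(index: Definition[], term: string): Definition[] {
  return index.filter((d) => d.term.toLowerCase() === term.toLowerCase());
}

// Everything a given spec relies on: useful for coordination and transition reviews.
function usagesOf(index: Definition[], shortname: string): Definition[] {
  return index.filter((d) => d.referencedBy.includes(shortname));
}
```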
It is extremely common today to replace examples in documentation with live ones that can be edited and rendered using one of the many services that provide such functionality.
While it is probably not acceptable to inject code from a third-party service into our specifications, there exist reusable components that would enable examples in our specifications to become hackable. This would have tremendous value for developers trying to learn and understand the technology.
Specref is the database of bibliographical references that specifications rely upon. It has been through several iterations and is generally considered good, but it can be improved.
Currently managed by Tobie, it could be brought under W3C management.
It could use a front-end for adding resources and a few other niceties to integrate it better with the W3C tool suite (e.g., improving Marcos’ [bibliography generator](http://marcoscaceres.github.io/bib_entry_maker/)).
Some specification issues (that are not about the actual prose content) can sometimes exist for embarrassingly long periods of time before anyone notices. Sometimes they are noticed after a document is finalised. Or they are raised at the worst moment, all at once when the editor is busy doing something else.
Developers use continuous integration to make sure that there is a constant pressure towards quality rather than a last-minute rush — so should editors. There have been experiments around this (using Travis’ integration with GitHub); they should be expanded and systematised.
Ideally, an editor should be able to, at the click of a button, have the proper Travis integration (and dependencies) installed into the specification’s repository and deployed for immediate usage. This could even be made the default at repository creation if it is known that it will contain a specification.
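The shape of such a CI entry point could be as simple as the sketch below: a list of checks run against the built specification, failing the build when any of them reports problems. The individual checks are placeholders to be wired to the real tools (WebIDL checker, references checkers, link checker, and so on):

```ts
// Sketch of a CI entry point for a specification repository: run a set of checks
// against the built document and exit non-zero if any of them report problems.
import { readFile } from "node:fs/promises";

type Check = (specHtml: string) => Promise<string[]>;   // returns a list of problems

const checks: Record<string, Check> = {
  "webidl": async () => [],            // placeholder: validate inline WebIDL blocks
  "stale-references": async () => [],  // placeholder: compare references against SpecRef
  "broken-links": async () => [],      // placeholder: check that internal anchors resolve
};

async function main(path: string): Promise<void> {
  const html = await readFile(path, "utf8");
  let failed = false;
  for (const [name, check] of Object.entries(checks)) {
    const problems = await check(html);
    problems.forEach((p) => console.error(`[${name}] ${p}`));
    if (problems.length > 0) failed = true;
  }
  process.exit(failed ? 1 : 0);
}

main(process.argv[2] ?? "index.html");
```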
Several different tools can be called upon for quality checking; since they can be used on their own they are listed separately below and in the «Developer Tools» section.
We have an existing WebIDL Checker. It needs to be integrated into the quality workflow (arguably into Specberus as well as into the linting options of specification tools), maintained and documented with some guarantee of continuity, and made easy to find.
We have existing tools (that almost no one knows about) that can read a specification and validate that its references (at least to W3C or IETF documents) are up to date with the latest versions of those documents: the references checker and the IETF references checker.
While specification production tools, by building atop SpecRef, have largely removed the need for these, some specifications are produced by hand and some have local bibliographies. They should be part of automated checking.
W3C actually exposes a Spell checker tool (that can work with HTML). It is unclear that this could be integrated into automated checking (given the number of justifiable exceptions) but it could perhaps be made more prominent (or, if found not to be useful, retired).
W3C currently offers an HTML Diff service which is sometimes used to see changes to documents.
It is a common request to find out what has changed between two versions of a specification. In fact, producing a diff seems slated to be part of the new errata management approach that the AB is investigating.
This should be made automatic, and possibly specialised. Right now the HTML diff tool will produce as output a complete document, which can make the differences hard to find (e.g. a paragraph change in the whole HTML specification). It can also include a lot of spurious differences. (For instance, if a section was added at the beginning, all the section numbers come out as changes.)
This tool could be specialised for specifications in such a way that ignorable differences would be ignored. Also, an affordance could be exposed in specifications offering to show a diff between the version being looked at and any other version of that document.
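One plausible approach, sketched below, is to normalise both versions before diffing so that known-ignorable constructs disappear; the patterns shown are illustrative assumptions about common spec markup:

```ts
// Sketch of pre-processing a specification before diffing so that "ignorable" differences
// (renumbered sections, regenerated dates, whitespace churn) do not drown out real changes.
function normalizeForDiff(html: string): string {
  return html
    .replace(/<span class="secno">[\s\S]*?<\/span>/g, "") // drop generated section numbers
    .replace(/\d{1,2} [A-Z][a-z]+ \d{4}/g, "DATE")        // neutralise publication dates
    .replace(/\s+/g, " ");                                // ignore whitespace-only changes
}
```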
Specs depend on one another, but very little is done to ensure these dependencies are kept in sync; they are expected to be reviewed at transitions, and sometimes the link checker will spot a dependency gone wrong, but a more systematic approach would allow for better coordination across groups and reduce the risk of «monkey patching».
A dependency tracker that would make it possible to visualise and report the dependencies between specs, e.g. in terms of WebIDL interfaces but also in terms of algorithms, would be terrific. Ideally, it would capture much more granular dependencies — a spec usually depends on a few items of a given spec, not the whole thing; assessing dependencies at the right granularity would help streamline transitions on the Recommendation track.
Supporting WYSIWYG editing for specifications could help improve collaboration and contributions from people who might know HTML but are not necessarily versed in the specific formalism of a given specification tool.
While such a tool would be interesting and valuable, it is a long and complex project to put together, and likely not a priority.
Perhaps the most common complaint about the W3C’s tooling setup, which is only made worse by spreading work across a broad set of external services, is that there are too many places at which to track things, often in very different ways.
In addition to tracking information, taking action within the system also involves remembering where different parts are. There are several pages listing services (e.g. the Member page, the AC page, the Editors’ page) but they can prove overwhelming, especially since they often have content that is out of date. They are also very generic: most users don’t need to know everything that they contain.
This leads to three high-level requirements for a modern system, from its users’ point of view:
This translates to a specific design for the various moving parts of the system.
There are many sources of data that can become involved in the dashboard. In order to keep them manageable, we need a specialised component that can provide a consistent interface to them.
This service is a relatively simple shell around various data streams. It can be configured to poll a resource at regular intervals (e.g. retrieve and process an RSS feed), or on the contrary to expose a hook for pushed information (e.g. we can set up a GitHub hook calling it for all events in the `w3c` organisation there).
Ideally we could also collect data from mailing lists, for instance to produce a feed of browser vendors’ intents to implement/deprecate.
Each of its plugins knows how to obtain data and store it in a unified manner. They can easily be configured (e.g. to add RSS feeds, to receive GitHub notifications from other repositories).
This mass of data can then be returned to queries that essentially involve two filters: which feeds the user is interested in, and a date-time cut-off so that one can ask only for what has happened since one last visited.
Additionally, a WebSockets interface can be made available so that dashboard users can get notifications in real time in the UI.
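Concretely, the feed’s data model and query surface could be as minimal as the following sketch (event and field names are illustrative, not a settled schema):

```ts
// Sketch of the unified feed's data model and query surface: plugins normalise events
// into one shape, and consumers filter by feed and by a date-time cut-off.
interface FeedEvent {
  id: string;
  source: string;              // e.g. "github", "echidna", "mail"
  feed: string;                // e.g. "w3c/webrtc-pc" or "public-webapps"
  type: string;                // e.g. "issue.opened", "draft.published"
  occurred: string;            // ISO 8601 timestamp
  title: string;
  url?: string;
  payload?: unknown;           // source-specific detail, kept verbatim
}

interface FeedQuery {
  feeds: string[];             // which feeds the user is interested in
  since?: string;              // only events after this instant (what happened since last visit)
}

function query(events: FeedEvent[], q: FeedQuery): FeedEvent[] {
  return events.filter(
    (e) => q.feeds.includes(e.feed) && (!q.since || e.occurred > q.since)
  );
}
```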
The dashboard itself is nothing but a container of smaller widgets which can be configured, added, or removed by each user. In order to be fully generic, we need to ensure that it remains extremely simple both in its own functionality and in the interface it exposes to widgets.
Widgets can be used as part of an overview grid, or can take over the whole screen when they need more real estate.
It is very important that it be possible for independent contributors to develop widgets separately, which can then become available through the dashboard (after W3C approval; we can’t automatically deploy third-party code). Widgets can expect to run in isolated `iframe`s and to communicate with the dashboard through messaging to access whatever information they need.
W3C’s own applications are expected to transition to being exposed as widgets. The general principle at work is that services should be made available as APIs, and exposed to users through widgets. With this in mind widgets can use a common stylesheet and common scripting libraries to help support greater coherence and fluidity in interaction.
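The widget/dashboard contract could be as small as a pair of messages, as in this sketch (message names and the `/api/feed` endpoint are assumptions; a real deployment would also check message origins rather than using `"*"`):

```ts
// Sketch of the widget/dashboard messaging contract: widgets run in sandboxed iframes
// and ask the dashboard shell for data instead of talking to services directly.

// In the widget iframe: request the user's selected feeds since the last visit.
window.parent.postMessage(
  { kind: "feed.query", feeds: ["w3c/tr-pages"], since: "2015-06-01T00:00:00Z" },
  "*"
);
window.addEventListener("message", (e) => {
  if (e.data?.kind === "feed.result") render(e.data.events);
});

// In the dashboard shell: proxy widget requests to the unified feed API.
window.addEventListener("message", async (e) => {
  if (e.data?.kind !== "feed.query") return;
  const res = await fetch("/api/feed?feeds=" + e.data.feeds.join(",")); // hypothetical endpoint
  (e.source as Window)?.postMessage({ kind: "feed.result", events: await res.json() }, "*");
});

declare function render(events: unknown[]): void; // widget-specific rendering, out of scope here
```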
Another tool that can plug into the unified feed is a notifier. Users can opt into being notified of various events by email (through, you guessed it, a dashboard widget). Likewise, IRC channels can be notified of specific events by a bot. The idea is that this is extensible: the unified feed provides the data in a usage-agnostic manner; tools then piggy-back on it.
The notifier simply receives the unified data feed, and for each event finds if there are people who want to be notified of it. If so, they get an email.
The notifier needs to be able to filter events based on labels. For instance, the I18N WG uses «I18N» labels on bugs in other groups’ bug trackers in order to track horizontal reviews. Similar conventions could be supported by other groups.
Dom has built a tool (github-notify-ml) that can notify mailing lists when certain events happen on GitHub. It could be enhanced to make use of this system.
It would prove particularly useful if the notifier were able to filter events to match changes to specific sections of a specification (notably to help horizontal reviewers).
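Matching events to subscribers then becomes a simple filtering step over the unified feed, along these lines (field names are illustrative):

```ts
// Sketch of matching notifier subscriptions against unified feed events, including the
// label-based filtering horizontal groups rely on (e.g. an "i18n" label).
interface Subscription {
  email: string;
  feeds?: string[];            // restrict to particular feeds/repositories
  types?: string[];            // e.g. ["issue.opened", "issue.labeled"]
  labels?: string[];           // e.g. ["i18n"], matched against labels on the event
}

function recipients(
  event: { feed: string; type: string; labels: string[] },
  subs: Subscription[]
): string[] {
  return subs
    .filter((s) => !s.feeds || s.feeds.includes(event.feed))
    .filter((s) => !s.types || s.types.includes(event.type))
    .filter((s) => !s.labels || s.labels.some((l) => event.labels.includes(l)))
    .map((s) => s.email);
}
```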
This section covers tools that are in common use for coordination and communication inside groups and the broader community.
IRC is relatively operational and it is highly programmable. Because of this, it would be hard to replace with other more modern solutions such as Slack or Gitter. It is, however, often unknown to the younger generation or, more generally, to people not strongly steeped in the open source culture of the past twenty years. As such, an improved Web interface would be desirable. (The current one is serviceable, but could do with some freshening.) One potential option here would be Shout. It is a client-server combo with a rich interface, and is entirely implemented in JS. Amongst its advantages are that it can remain connected to IRC even when you close your browser, it supports connecting as the same user from multiple devices at once, and it has a responsive layout that is well suited to usage on a smartphone.
IRC logging could, however, be improved. `RRSAgent` has its value for the specific case of capturing minutes in a manner that is easily reused by other tools, but as a general-purpose logger we can do better. Too many channels are simply not logged at all — even though much work takes place over IRC! And when they are logged, `RRSAgent` logs are then extremely hard to find, lost in dated space never to be retrieved again.
A general-purpose channel logger bot need not be complicated. In my experience, people are satisfied with Krijn Hoetmer’s setup. Its limitations are largely that it requires getting agreement from Krijn in order to get something logged, and that it wouldn’t scale to the many channels we would want to log. The RICG also uses a simple Drupal-based PHP setup to reference discussions, retrieve GitHub issue information, and take minutes.
A single service at which all chat logs can be located would therefore be needed. It does not require bells and whistles, but it needs to support continuous logging, finding channels, and search.
While the mailing list service itself is largely beyond reproach, the interface to our archives is now one of the more decried parts of our offering. While a number of the complaints may be exaggerated, it is certainly true that it could do with improvements in both style and usability.
A project that is commonly cited as a huge improvement over what existed previously is https://esdiscuss.org/. It is not perfect, and the source can probably not be reused as is. Its fundamental architecture is sound however: convert email to JSON and expose those for other services to build upon. Given the size of the W3C archive and its daily volume, this would not be a minor undertaking — it is likely better to test-drive the idea on a small set of lists.
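The core of that architecture is just a regular JSON shape per message that front ends and other services can build upon; a minimal sketch, with assumed field names, might look like this:

```ts
// Sketch of the "email as JSON" shape an esdiscuss-style archive front end could build on.
interface ArchivedMessage {
  list: string;                // e.g. "public-webapps"
  messageId: string;
  inReplyTo?: string;          // threading
  from: string;
  subject: string;
  date: string;                // ISO 8601
  bodyHtml: string;            // sanitised rendering of the message body
}

// Collect the direct replies to a given message, for building threaded views.
function repliesTo(messages: ArchivedMessage[], root: string): ArchivedMessage[] {
  return messages.filter((m) => m.inReplyTo === root);
}
```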
People have also complained about the user experience of signing up to our lists. A user interface allowing people who have an account (either with us, or with a third-party service that we can authenticate against) to use that to subscribe directly (without the email ping-pong) and to immediately accept archival would help our community as well.
Forums have overtaken mailing lists in terms of popularity for online discussion. Groups would certainly benefit from being able to rely on those for at least some of their work.
This aspect is already covered by our Discourse installation, which is being moved to WebPlatform.
Etherpads are commonly used to take shared notes. This could be useful for minutes in some groups, and is particularly useful during brainstorming sessions when a group of people wishes to write something down collectively.
Because W3C does not provide its own pads, people routinely use those available from Mozilla or the MIT. This results in information being scattered far and wide.
There are numerous pad implementations, so deploying this is largely a case of picking one that we like. A good pad should support:
Bonus points if there is a way for the pad to tie into our user system such that people who join a pad to edit automatically have their names set up.
This section is more of an open question than anything else. Some teams use systems like Trello to organise their work, basically through collaborative todo lists. I am not aware of groups doing the same, but I would be interested in hearing about usage or interest.
Three major bug trackers are in use in W3C groups: Bugzilla, Tracker, and GitHub Issues.
Bugzilla is the most powerful and flexible of the three, but it is extremely slow, its interface is clunky, and automating it through external tools generally requires scraping its HTML output. (There does exist a way of enabling XML-RPC for it... but, well, it’s XML-RPC.) It also likes to send email; a lot.
Tracker is the only option with IRC integration (through `trackbot`) and it also supports action items in addition to issues. It knows about our users, IRC nicks, and working groups, which can prove useful. However its interface is also clunky, and setting up a new instance for a given project requires intervention.
GitHub issues are by far the easiest to use. They have nice conversational capabilities. However they are external to W3C and they don’t know about groups and the like.
I am making the following perhaps radical recommendations:
If we cannot unify our issues handling, then at the very least the various services will need to be plugged into the Unified Feeds system so as to be visible in the dashboard.
Horizontal review groups have their own needs in terms of issue tracking. Richard makes a good description of it in issue #8. Ideally the tracking system ought to enable the automation of many of the complex steps they need to go through.
W3C Working Groups should maintain a page about what they do, what their documents are with what status, what their upcoming events of note are, how they can be interacted with, etc.
Currently that is usually done by the team contact and/or chair hand-editing a bunch of HTML documents in CVS. The information provided is completely different from group to group, and the layouts are all over the place. If you know how to find information for one group, you will be lost with another. What’s more, the editing process is painful enough that it is rarely done, and can rarely be automated. Stale information abounds (it is very common to find completely outdated charters, for instance, or lists of minutes that stop two years ago even though the group is active). Some groups have moved a fair part of their information elsewhere, sometimes to their own site, sometimes to a wiki. This is further compounded by the fact that most — but not all! — group pages are in dated space somewhere. There is simply no easy way to find them; at times even search engines get confused.
Group content should be entirely moved to a single, easily found location. For instance, `https://groups.w3.org/$GROUP_NAME`.
Group pages should be managed using simple but effective tools. CMSs are unlikely to provide a good match in terms of flexibility and automation. A static generator system such as Jekyll ought to be used. A common layout should be deployed across the board. Similar information should be presented in a similar fashion and at similar locations (both on screen and in URL space).
In addition to the functionality already provided by such tools (pages and blogs for news being typical), it ought to be relatively easy to write plugins to publish calendaring information as well.
Wikis are annoying. Idiosyncratic syntax, little to no notification, practically nothing in the way of actual collaboration (people might work on the same document but don’t really talk, certainly not in practice). W3C makes the problem worse by having uncountable different wikis spread out all over the place. It is already hard to find information in one wiki; when one has to remember which wiki it was in things go awry.
That said, wikis do have their die-hard lovers and they will likely remain in use. I recommend:
Chairs often need to maintain a scribe rotation. It is not the end of the world, but it is a little bit of extra manual work that can easily go wrong.
A scribe manager could possibly be built atop the W3C Data API that the systeam is working on (assuming it could list group participants). You could mark people as "never scribing", click someone’s name when they’ve scribed, have new participants be sent to the bottom of the list.
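The logic itself is tiny; a sketch, assuming the participant list can be obtained from the Data API or elsewhere:

```ts
// Sketch of the scribe rotation described above.
interface Participant {
  name: string;
  neverScribes: boolean;       // marked as "never scribing"
  lastScribed?: string;        // ISO date; undefined for new participants
}

function nextScribe(participants: Participant[]): Participant | undefined {
  const eligible = participants.filter((p) => !p.neverScribes);
  const veterans = eligible
    .filter((p) => p.lastScribed)
    .sort((a, b) => a.lastScribed!.localeCompare(b.lastScribed!)); // longest-ago scribe first
  const newcomers = eligible.filter((p) => !p.lastScribed);        // new participants go to the bottom
  return [...veterans, ...newcomers][0];
}

// Clicking someone's name after they have scribed simply stamps them with today's date.
function markScribed(p: Participant): void {
  p.lastScribed = new Date().toISOString();
}
```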
Even with a freshened up and more organised setup, making use of all our tools and the external tools they rely on can be challenging.
It is absolutely necessary that there exist clear and well-identified guidelines for operating with these tools. Ideally, short enough that people actually read and memorise them.
Where possible, we should provide template configurations (e.g. the typical GitHub repository for a specification) that can easily be reused.
Additionally, it would likely be useful to create training material for the tools. The Team would naturally be subjected to it, but all newcomers (and in fact anyone interested) would be able to use this as a resource to get up to speed quickly.
Having an automatically maintained map of all the WebIDL interfaces that are developed across the Web platform would make it easier to navigate the organic structure of Working Groups / Community Groups / etc., and would let groups more easily discover the ways in which other interfaces deal with specific patterns.
Both PLH and Dom have early versions of such a tool.
We have a community of dedicated people who translate specifications so that they can be read in a language more convenient to some of our users (see for instance Veebi sisu juurdepääsetavussuunised (WCAG) 2.0).
Translation can be difficult to organise well, and it can be hard to collaborate on a translation, especially for non-technical users who can nevertheless contribute usefully to this effort.
It would be interesting to have tools that support easier translation of our specifications, such as side-by-side views and document views updated live as the code is changed (to make things easier on those who have limited HTML experience).
Developers use W3C technology a lot, and some of our services. Yet we make little use of this interaction in order to build a relationship that we badly need. There is currently a lost opportunity to turn these useful (and, in some cases, used) tools into goodwill and good feedback. We are probably not getting as many donations as we could from these.
We don’t offer an integrated view on the tools that we make available to developers. There is ongoing work to expose this through a simple landing page. The W3C developer page aims to provide developers with a summary of all our open source tools and links to helpful resources. It is currently in development, and you can contribute to it in its GitHub repository.
You are very welcome to join the discussion about features, design and more in the developers repository’s issues.
The Link Checker is one of the more useful resources that W3C exposes, but it is woefully antiquated.
While not a top priority, it would be useful to rebuild it with modern technology and a modern interface. It should be relatively straightforward to make it a fair bit faster, if only by adding some caching.
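As an illustration of the caching point, even an in-memory cache wrapped around the actual checks avoids re-fetching the same URL within a run (a real service would persist and expire entries):

```ts
// Minimal sketch of a link check with a response cache; the cache is in-memory here
// purely for illustration. Some servers reject HEAD, so a fallback to GET may be needed.
const cache = new Map<string, number>();   // URL -> HTTP status

async function checkLink(url: string): Promise<number> {
  const hit = cache.get(url);
  if (hit !== undefined) return hit;
  const res = await fetch(url, { method: "HEAD", redirect: "manual" });
  cache.set(url, res.status);
  return res.status;
}

async function checkAll(urls: string[]): Promise<Map<string, number>> {
  const results = new Map<string, number>();
  await Promise.all(urls.map(async (u) => results.set(u, await checkLink(u))));
  return results;
}
```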
These are solid tools and do not need much work. We should stop calling the new HTML validator «experimental» and we should retire anything to do with the old one. We can sprinkle a little bit of nice design on the output, and we could document some of the APIs better, but nothing radical beyond some cleanup. It would be nice to ensure that the validators are available as libraries (outside Java), for instance as Gulp/Grunt packages that can easily integrate with common build systems for Web applications.
It has been suggested that it would be valuable to rewrite these two validators in a language more likely to be reusable by Web developers, namely JavaScript. While this may be true, it can only be a long-term, low-priority objective.
Note: the source for the Markup validator is Validator.nu.
As in the previous section, this is good and operational. It could use being promoted more.
We currently do not provide A11Y checking as a service, and arguments have been made that this would be a bad idea as there are aspects that cannot be tested. However, there are projects appearing that are beginning to offer this service (e.g. Tenon). We should take the time to think about our strategy here.
The MobileOK checker is a potentially valuable tool, but it needs an update. Validating XHTML Basic 1.1 or ensuring that a page is under 20K might not work that well with many modern uses of mobile. (Testing a site that I know works well on today’s devices scored it 29%.)
W3C is currently working on a new, open source version of the Mobile Checker. It is expected to be released soon.
These tools are currently not top priorities for the developers landing page but will be considered as soon as possible.
While not strictly a tool in itself, the core `w3.org` site is a big part of what our users interact with, and improving it would help with many aspects of their work, notably by not driving them to third parties when there is no need to.
CVS makes using the core site inordinately painful. I for one avoid publishing anything there unless I really have to; in several cases I have set up an `.htaccess` proxy from a subdirectory just so I would never need to touch it again. I know others have similar experiences.
The massive size of the site also causes issues of its own. For instance, the recent discussion on upgrading to HTTPS across the board revealed just how hard it is to evolve it.
The site cannot be moved to a more modern system wholesale, but likely it does not need to.
W3C invented conventions before conventions existed. In some cases, those became conventions only within W3C.
`Overview.html` is one of those. The norm is `index.html`. It should become the norm on all new Web setups at W3C, and where possible `Overview.html` ought to be retired. It may seem like a small thing, but it is a regular annoyance.
Dated space, of course, needs to die. There are exactly seven people in the world who find it usable, six of whom are on the Team (the other one being DanC). It is maddening, distracting, confusing, and in violation of every single URL design best practice in use anywhere outside the Consortium’s walls.
Again, dormant content in dated space can stay there. But content that is in any way, manner, or form still in active use should never be in dated space. A non-exhaustive list of content that should never be allowed to reside in dated space includes:
Our testing infrastructure has become increasingly important. In general it is much more recent than the rest of our tools and so less in need of modernising. But the setup can nevertheless be improved.
The Web Platform Tests (WPT) system relies on a code-review engine known as Critic. It ties into GitHub pull requests and is much more powerful than GitHub’s built-in code review.
Most groups likely do not need Critic, but for more complex code such as that which is found in tests it is a very welcome improvement.
The current Critic instance is run by James Graham on his personal server. It may be much better to host it on W3C.
It may be worth investigating the use of Reviewable as a potential hosted replacement.
We have an existing system that allows people to run (the automated parts of) the test suite continuously, without human intervention. Right now it has only been adopted by a couple of browser vendors.
We should take this to the next level. We can integrate with SauceLabs (with whom we have a free account) and systematically run the entire test suite against the full set of browsers that they expose. This should allow us to gather very detailed statistics about which parts of the platform work where. It has the added advantage that it can help detect some broken tests.
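The aggregation side of this need not be complicated; here is a sketch of the statistics we would want, assuming results are normalised into a simple per-test, per-browser record (the runner and SauceLabs integration itself is left out):

```ts
// Sketch of aggregating per-browser results from automated runs into
// "which parts of the platform work where" statistics.
interface TestResult {
  test: string;                // e.g. "/dom/nodes/Node-appendChild.html"
  browser: string;             // e.g. "firefox-38", "chrome-43"
  pass: boolean;
}

function passRates(results: TestResult[]): Map<string, { pass: number; total: number }> {
  const byBrowser = new Map<string, { pass: number; total: number }>();
  for (const r of results) {
    const entry = byBrowser.get(r.browser) ?? { pass: 0, total: 0 };
    entry.total += 1;
    if (r.pass) entry.pass += 1;
    byBrowser.set(r.browser, entry);
  }
  return byBrowser;
}

// A test that fails in every browser is a candidate for being broken rather than unimplemented.
function suspectTests(results: TestResult[]): string[] {
  const byTest = new Map<string, boolean>(); // test -> passed anywhere?
  for (const r of results) byTest.set(r.test, (byTest.get(r.test) ?? false) || r.pass);
  return Array.from(byTest).filter(([, passed]) => !passed).map(([t]) => t);
}
```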
Finding out how much of a given specification is covered by the test suite is a hard, manual, and error-prone process. It cannot be perfectly automated, but it can be improved.
This also involves figuring out how to identify that a given test is related to part of a specification.
These tasks are not trivial and highly heuristic, but there has been previous work. It needs to be assessed and integrated.
One ideal use case for this is to be able to obtain a list of tests that need to be reviewed when the specific section of a specification that they map is updated.
Once we have testing results for many implementations, we should publish them in a manner that is useful to developers. Such information is invaluable. The interface does not require much in terms of complexity.
Creating new tests can in some cases be painful and slow, which in turn causes people to make mistakes as they cut and paste. This is particularly true for reference tests, and for groups that (still) require a lot of test metadata.
Part of that can be automated so that the basics of a test’s required parts can be filled out based on templates. Richard has a more extensive discussion of this proposal.
The testing project involves close collaborations between parties that are not all on the Team, yet develop the major parts of the system. While the service should be run on W3C resources, it would not be efficient to gate deployment to Team members only.
It must be possible to enable SSH access to a select group of contributors so that they could help curate the testing services.
These are services that can be reused by other parts of the ecosystem.
Many tools and services will expose features that are user-dependent. It is not appropriate for those to implement their own user database duplicating W3C’s. It would also be a shame if they had to make use of third-party sign-on infrastructure (which was the case with Discourse).
Which exact solution is selected here is of relatively little importance, but it does need to support typical modern Web application authentication workflows. GitHub’s OAuth implementation may provide a good example.
It should also be possible to retrieve useful information about a user, such as their GitHub ID.
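From a tool author’s perspective, the useful surface is small: an authenticated «who am I» call returning enough profile data to act on. A sketch, with a hypothetical endpoint and field names:

```ts
// Sketch of the user information a central login service could expose to tools once a
// user has authenticated (OAuth-style). Endpoint and fields are assumptions.
interface W3CUser {
  id: string;
  name: string;
  affiliation?: string;
  groups: string[];            // groups the user participates in
  githubId?: string;           // useful for tools that bridge to GitHub
}

async function whoAmI(accessToken: string): Promise<W3CUser> {
  const res = await fetch("https://auth.w3.org/api/me", {   // hypothetical endpoint
    headers: { authorization: `Bearer ${accessToken}` },
  });
  if (!res.ok) throw new Error("not signed in");
  return res.json() as Promise<W3CUser>;
}
```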
The various tools to be developed may commonly require search functionality. Given the spread of W3C’s content, this can quickly become required. For instance, an Etherpad service might request indexing for its pads (using its own ACL knowledge), a chat logging service might index the logs, etc.
One potential candidate, that can double as a JSON document store and therefore be usable for application data storage, is Elastic Search.
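As an illustration, any service could index and query its documents through Elasticsearch’s plain REST API; the host, index name, and document shape below are illustrative:

```ts
// Sketch of indexing a chat-log line into a shared Elasticsearch instance through its
// REST API; pads, group pages, etc. could index their own documents the same way.
const ES = "http://search.example.org:9200";     // hypothetical shared search host

async function indexLogLine(channel: string, when: string, nick: string, text: string): Promise<void> {
  await fetch(`${ES}/irc-logs/_doc`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ channel, when, nick, text }),
  });
}

async function search(query: string): Promise<unknown[]> {
  const res = await fetch(`${ES}/irc-logs/_search?q=${encodeURIComponent(query)}`);
  return (await res.json()).hits.hits;
}
```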
It is this plan’s considered opinion that users should be encouraged to experiment with external tools if and when they find shortcomings in ours. Having said that, for an external tool to be considered acceptable it needs to fulfil the following basic needs:
Additionally, the W3C has requirements in terms of persistence. We cannot require of an external service that it make a promise of persistence it may be unable to keep, but we need the means to obtain our data easily so as to persist it ourselves. This has the nice side property of requiring full data export and ownership.
The following people have provided useful input into this document.