Abstract

Credibility signals are observations, made by humans or machines, which are used in deciding how much to trust some information. This document specifies some types of these observations which seem particularly useful in online credibility assessments, especially when assisted by machine processing and a network of people and systems making related observations.

In this release, we include only the handful of signals which have been reviewed and approved as “promising” by the W3C Credible Web Community Group, following a process started in late January 2020. The contents of this document are expected to be expanded and revised over time. Feedback and participation are encouraged.

§ 1. Introduction

This document is a small step toward a world where software “credibility tools” let people know which online information is trustworthy. In this small step, we present a handful of observations a human or machine can make about websites and web content which seem likely to be useful in deciding how trustworthy the information might be. These observations, which we call “credibility signals”, are selected for having several desirable features:

Verifiable: When given a report of one of these observations, it is feasible to confirm the observation yourself without doing a prohibitive amount of work in comparison to the likely value of a correct credibility assessment.
Interoperable: If multiple independent parties make these observations, guided just by this document, they will generally produce data that is effectively the same. This enables an efficient market of data-producing systems and data-consuming systems working together automatically (“interoperating”).
Hard to game: While it seems likely that every signal could be artificially synthesized as part of a disinformation attack, for these signals the cost or difficulty in doing so seems sufficiently high to make use of the signal a net benefit.
Available: Signal data is available now, or is likely to be soon, for a large number of websites. When desired, it is generally possible for individuals or small organizations to make these observations and produce this data themselves.
Scalable: Machines can make this observation now or could be programmed or trained to do so in the near future, with no need for a technical breakthrough.
Promising: In short, while we have insufficient data to say with certainty that these signals will be useful in making accurate credibility assessments, based on the information we have, we expect their use will often be worthwhile.

We expect many more potential credibility signals also meet these criteria. We are publishing this small set to get community feedback early in the process. If the feedback is sufficiently positive, we may proceed to document additional signals. In earlier work we gathered a crowdsourced list of potential credibility signals and documented technologies for making credibility assessments.

The focus of this document is on the meaning of signals, not the data protocols and formats for exchanging data. Those are expected to be the subject of a related document. For now, to give a flavor of how the data might be exchanged, we include illustrative data alongside the signal examples. For more details on the design issues in using Semantic Web technologies for this data interchange, see Options for RDF Expression of Credibility Data.

§ 2. Signals

These signals have been reviewed and approved as "promising" by the W3C Credible Web Community Group, following a process started in January 2020. This initial release contains only the signals the group documented and approved during the first few weeks of that process.

These signals were not necessarily selected for being the most promising or the best in any other way. Later signals may turn out to be superior. These were reviewed first largely because they seemed likely to help clarify the review process. In particular, we intend no endorsement of the Pulitzer Prize as being necessarily more credible than any other prize.

§ 2.1. Date Website First Archived

Assessed by CredWeb CG to be: Reasonably verifiable, Interoperable, Promising.

Definition	There was a website operational at URL [ ] as early as isodate [ ], as shown in the archive page at URL [ ]
Review Date	2020-01-28
Other Label	Website Existed As Of Date; Website Age; First Known Website Archive; Website Release Date
Similar to	Domain registration date

§ 2.1.1. Examples

§ 2.1.1.1. Example

Statement	There was a website operational at URL https://news.mit.edu as early as isodate 2015-09-02 as shown in the archive page at URL https://web.archive.org/web/20150902023223/http://news.mit.edu/
CSV	site,operationalAsEarlyAs,evidence https://news.mit.edu,2015-09-02,https://web.archive.org/web/20150902023223/http://news.mit.edu/
Turtle	_:a cred:site <https://news.mit.edu> . _:a cred:operationalAsEarlyAs "2015-09-02"^^xs:date . _:a cred:evidence <https://web.archive.org/web/20150902023223/http://news.mit.edu/> . _:a rdf:type cred:ObservationOfDateWebsiteFirstArchived .
JSON-LD	{ "@context": "https://jsonld.example", "@id": "_:a", "@type": "http://www.w3.org/ns/credweb#ObservationOfDateWebsiteFirstArchived", "evidence": "https://web.archive.org/web/20150902023223/http://news.mit.edu/", "operationalAsEarlyAs": "2015-09-02", "site": "https://news.mit.edu" }

§ 2.1.1.2. Example

Statement	There was a website operational at URL https://nytimes.com as early as isodate 1996-11-12 as shown in the archive page at URL https://web.archive.org/web/19961112181513/http://www.nytimes.com:80/
CSV	site,operationalAsEarlyAs,evidence https://nytimes.com,1996-11-12,https://web.archive.org/web/19961112181513/http://www.nytimes.com:80/
Turtle	_:a cred:site <https://nytimes.com> . _:a cred:operationalAsEarlyAs "1996-11-12"^^xs:date . _:a cred:evidence <https://web.archive.org/web/19961112181513/http://www.nytimes.com:80/> . _:a rdf:type cred:ObservationOfDateWebsiteFirstArchived .
JSON-LD	{ "@context": "https://jsonld.example", "@id": "_:a", "@type": "http://www.w3.org/ns/credweb#ObservationOfDateWebsiteFirstArchived", "evidence": "https://web.archive.org/web/19961112181513/http://www.nytimes.com:80/", "operationalAsEarlyAs": "1996-11-12", "site": "https://nytimes.com" }

§ 2.1.1.3. Example

Here we see one website with two different observations, each with a different date.

Statement	There was a website operational at URL [https://news.example] as early as isodate [2010-01-01] as shown in the archive page at URL [https://archive.example/1234]. There was a website operational at URL [https://news.example] as early as isodate [2015-01-01] as shown in the archive page at URL [https://archive.example/abcd].
CSV	site,operationalAsEarlyAs,evidence https://news.example,2010-01-01,https://archive.example/1234 https://news.example,2015-01-01,https://archive.example/abcd
Turtle	_:a cred:site <https://news.example> . _:a cred:operationalAsEarlyAs "2010-01-01"^^xs:date . _:a cred:evidence <https://archive.example/1234> . _:a rdf:type cred:ObservationOfDateWebsiteFirstArchived . _:b cred:site <https://news.example> . _:b cred:operationalAsEarlyAs "2015-01-01"^^xs:date . _:b cred:evidence <https://archive.example/abcd> . _:b rdf:type cred:ObservationOfDateWebsiteFirstArchived .
JSON-LD	{ "@context": "https://jsonld.example", "@graph": [ { "@id": "_:a", "@type": "http://www.w3.org/ns/credweb#ObservationOfDateWebsiteFirstArchived", "evidence": "https://archive.example/1234", "operationalAsEarlyAs": "2010-01-01", "site": "https://news.example" }, { "@id": "_:b", "@type": "http://www.w3.org/ns/credweb#ObservationOfDateWebsiteFirstArchived", "evidence": "https://archive.example/abcd", "operationalAsEarlyAs": "2015-01-01", "site": "https://news.example" } ] }
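When several observations exist for one site, a consumer has to decide which one to use. The following is a minimal sketch, assuming the consumer keeps the earliest date, since each observation only claims the site existed "as early as" that date. This rule is an assumption for illustration, not something the group has specified, and the function name `earliest_per_site` is hypothetical.

```python
import csv
import io

# CSV data copied from the example above (header plus two observations).
CSV_TEXT = """site,operationalAsEarlyAs,evidence
https://news.example,2010-01-01,https://archive.example/1234
https://news.example,2015-01-01,https://archive.example/abcd
"""


def earliest_per_site(csv_text):
    """Return {site: row}, keeping the row with the earliest date per site."""
    earliest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        site = row["site"]
        date = row["operationalAsEarlyAs"]
        # ISO 8601 calendar dates (YYYY-MM-DD) sort correctly as plain strings.
        if site not in earliest or date < earliest[site]["operationalAsEarlyAs"]:
            earliest[site] = row
    return earliest


result = earliest_per_site(CSV_TEXT)
print(result["https://news.example"]["operationalAsEarlyAs"])  # 2010-01-01
```

A consumer that distrusts a particular archive could filter rows by the `evidence` column before applying this rule.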

§ 2.1.2. Motivation

Why is this signal hard to game?

New attackers will have to acquire a pre-existing website with a site identity suited to their attack. Since there are not likely to be many such sites, they will likely be expensive. Attackers who plan further ahead can set up "sleeper" sites and wait for the time to attack, but this is also expensive, and it leaves more time for the sites to be discovered and unmasked. It is likely this signal will decrease in value if adopted, as more sleeper sites are created.

Connection to quality

Sites that have been around a long time have probably had more of a chance to develop good internal quality controls.
Sites that have been around a long time, if they are problematic, are more likely to have been flagged as such, all other things being equal.
Sites that have been around a long time have generated enough content/information upon which an assessment of quality can be rendered.

motivationReason

Over the years, these aspects of a site will tend to be exposed, even if efforts are made to hide them.

Potential side effects

Newer sites will have an even harder time reaching an audience
Reduced innovation on the web, as ranking on this signal presents an additional barrier-to-entry

§ 2.1.3. Availability

Methods for obtaining

Enter the URL of the site into the Wayback Machine at archive.org and look for the earliest snapshot that looks like a real website, not a placeholder "coming soon" site.
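Once a human has picked a snapshot, the date for the observation can be read off the snapshot URL itself. The sketch below assumes the layout seen in the examples in this document, `https://web.archive.org/web/<14-digit timestamp>/<original URL>`. The helper name `parse_snapshot_url` is hypothetical, and the code does not judge whether the snapshot is a placeholder page.

```python
import re
from datetime import datetime

# Assumed snapshot URL layout, taken from the examples in this document:
#   https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
SNAPSHOT_RE = re.compile(r"^https://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(snapshot_url):
    """Return (isodate, original_url) extracted from a snapshot URL."""
    match = SNAPSHOT_RE.match(snapshot_url)
    if match is None:
        raise ValueError(f"not a recognized snapshot URL: {snapshot_url}")
    # strptime also rejects impossible timestamps such as month 13.
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured.date().isoformat(), match.group(2)


isodate, original = parse_snapshot_url(
    "https://web.archive.org/web/20150902023223/http://news.mit.edu/"
)
print(isodate, original)  # 2015-09-02 http://news.mit.edu/
```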

Issues with availability

It is limited by the practices of available and trustworthy archiving services. If none of them archived your site, you're out of luck. If one of them is compromised, that will devalue use of this signal from that archive.

Currently requires humans because

It requires some human judgement to exclude initial placeholder websites

Expected to become cheap because

Machines can probably be trained to recognize and exclude placeholder websites

§ 2.2. Corrections Policy

Assessed by CredWeb CG to be: Promising, Reasonably verifiable, Interoperable.

Definition	The news website with its main page at URL [ ] provides a corrections policy at URL [ ] and evidence of the policy being implemented is visible at URL [ ]
Review Date	2020-02-05
Other Label	Corrections Policy Claimed; Corrections Policy Present

§ 2.2.1. Examples

§ 2.2.1.1. Example

Statement	The news website with its main page at URL http://thestar.com provides a corrections policy at URL https://www.thestar.com/about/statementofprinciples.html and evidence of the policy being implemented is visible at URL https://www.thestar.com/opinion/corrections.html
CSV	site,correctionsPolicy,evidence http://thestar.com,https://www.thestar.com/about/statementofprinciples.html,https://www.thestar.com/opinion/corrections.html
Turtle	_:a cred:site <http://thestar.com> . _:a cred:correctionsPolicy <https://www.thestar.com/about/statementofprinciples.html> . _:a cred:evidence <https://www.thestar.com/opinion/corrections.html> . _:a rdf:type cred:ObservationOfCorrectionsPolicy .
JSON-LD	{ "@context": "https://jsonld.example", "@id": "_:a", "@type": "http://www.w3.org/ns/credweb#ObservationOfCorrectionsPolicy", "correctionsPolicy": "https://www.thestar.com/about/statementofprinciples.html", "evidence": "https://www.thestar.com/opinion/corrections.html", "site": "http://thestar.com" }
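The statement and JSON-LD forms above carry the same three values, so both can be produced from one record. The following sketch illustrates that, assuming a hypothetical `CorrectionsPolicyObservation` class. The `@context` and `@id` values are copied from the example rather than taken from any published vocabulary.

```python
import json
from dataclasses import dataclass


@dataclass
class CorrectionsPolicyObservation:
    """Hypothetical record holding the three slots of the definition."""
    site: str
    corrections_policy: str
    evidence: str

    def statement(self):
        # Fills the blanks of the definition template in section 2.2.
        return (
            f"The news website with its main page at URL {self.site} "
            f"provides a corrections policy at URL {self.corrections_policy} "
            f"and evidence of the policy being implemented is visible at URL {self.evidence}"
        )

    def to_jsonld(self):
        # Key names mirror the JSON-LD example above.
        return {
            "@context": "https://jsonld.example",
            "@id": "_:a",
            "@type": "http://www.w3.org/ns/credweb#ObservationOfCorrectionsPolicy",
            "correctionsPolicy": self.corrections_policy,
            "evidence": self.evidence,
            "site": self.site,
        }


obs = CorrectionsPolicyObservation(
    site="http://thestar.com",
    corrections_policy="https://www.thestar.com/about/statementofprinciples.html",
    evidence="https://www.thestar.com/opinion/corrections.html",
)
print(obs.statement())
print(json.dumps(obs.to_jsonld(), indent=2))
```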

§ 2.2.2. Motivation

Why is this signal hard to game?

setting up a plausible corrections policy and showing evidence of implementation requires work that legitimate news organizations will need to do anyway but will be extra work for attackers

Connection to quality

sites that provide a corrections policy and implement it are indicating an openness to scrutiny around their mistakes, which seems likely to lead to greater accuracy
sites that show a record of corrections over a period of time indicate commitment to principles of accuracy and good journalistic practice.

Potential side effects

sites might be downranked for keeping their corrections policies private, possibly for legitimate reasons
sites that are mistakenly identified as news outlets, such as organizations which issue press releases about themselves, might be downranked for not following this journalistic practice
a bias towards larger, professional news organizations that have the resources to maintain a visible corrections practice
a bias against organizations that have not provided online visibility to some of their practices
a bias against sites that are exceedingly careful to never need corrections, and thus have little or no evidence of implementing their corrections policy

§ 2.2.3. Availability

Methods for obtaining

manually browsing a site, looking for any corrections policy and evidence of it being used
NewsQ plans to offer a feed

Issues with availability

some sites mention their corrections policy non-obviously in one paragraph on a general ethics/journalism policies page, or a public engagement page. This may lead observers to mistakenly think there is no available corrections policy.
it may be hard to find evidence of corrections being done, even when they are, especially if the site does not offer a corrections "feed" page.

Expected to become cheap because

machines can probably be programmed/trained to observe this signal most of the time
sites may start to announce this information in a machine-readable form (which raises some risk of gaming)

§ 2.3. Any Award (pending)

Assessed by CredWeb CG to be: Interoperable, Reasonably verifiable.

Definition	The website with main page URL [ ] was honored as part of an awards process for the year [ ] for the prize with main page URL [ ]
Review Date	2020-02-19

§ 2.3.1. Examples

§ 2.3.1.1. Example

This shows a Pulitzer Prize signal being conveyed as this more general "Any Award" signal attached to the Pulitzer website.

Statement	The website with main page URL https://www.propublica.org/ was honored as part of an awards process for the year 2017 for the prize with main page URL https://www.pulitzer.org/
CSV	site,awardsYear,prizeSite https://www.propublica.org/,2017,https://www.pulitzer.org/
Turtle	_:a cred:site <https://www.propublica.org/> . _:a cred:awardsYear "2017"^^xs:gYear . _:a cred:prizeSite <https://www.pulitzer.org/> .
JSON-LD	{ "@context": "https://jsonld.example", "@id": "_:a", "awardsYear": "2017", "prizeSite": "https://www.pulitzer.org/", "site": "https://www.propublica.org/" }

§ 2.3.2. Motivation

Why is this signal hard to game?

The effort in establishing an award process/system takes time and resources
Establishing a reputable award requires socialization, which is likely to involve a variety of trust evaluations and uncover most attackers.

Connection to quality

Outlets that have been awarded journalism awards, especially many over time, probably have internal quality controls and an organizational structure that supports good journalism.

Potential side effects

A bias towards older outlets that have had more time to accumulate awards
Increased pressure to subvert an existing awards process.
A surge in fake awards.
A need for determining which prize-granting sites are trustworthy
A bias in favor of outlets that campaign for awards or otherwise shift effort toward winning awards.

§ 2.3.3. Availability

Methods for obtaining

Determining authoritative websites or publications with awards lists, and then looking up the winners
Taking data for specific awards and converting them to this form
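As an illustration of the second method, the sketch below converts a record in the Pulitzer Prize Recognition form (section 2.4) into the more general form used here. It assumes the field names from the CSV examples in this document. The function name `pulitzer_to_any_award` is hypothetical, and the input row is derived from the ProPublica statement in section 2.4.

```python
# Main page of the prize, as used in the "Any Award" example.
PULITZER_SITE = "https://www.pulitzer.org/"


def pulitzer_to_any_award(pulitzer_row):
    """Map a Pulitzer-specific record onto the general award fields.

    The category, name and evidence fields have no slot in the general
    form, so they are dropped.
    """
    return {
        "site": pulitzer_row["site"],
        "awardsYear": pulitzer_row["awardsYear"],
        "prizeSite": PULITZER_SITE,
    }


pulitzer_row = {
    "site": "https://www.propublica.org/",
    "awardsYear": "2017",
    "pulitzerCategory": "Public Service",
    "nameAccordingToPulitzer": "ProPublica",
    "evidence": "https://www.pulitzer.org/winners/new-york-daily-news-and-propublica",
}
any_award = pulitzer_to_any_award(pulitzer_row)
print(",".join(any_award.values()))
# https://www.propublica.org/,2017,https://www.pulitzer.org/
```

The conversion loses information, so a producer with access to the specific record may prefer to publish both forms.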

Issues with availability

There are many awards across many languages and cultures, and no central location to review all of them
Not all high-quality outlets win awards

Currently requires humans because

It requires judgement to know which awards are legitimate
Most awards do not provide data feeds or have available scrapers

Expected to become cheap because

Machines can probably scrape the information should some way of centralizing awards locations become possible
Awards-granting organizations might start to publish details of their awards in a machine-readable format

§ 2.4. Pulitzer Prize Recognition (pending)

Assessed by CredWeb CG to be: Interoperable, Reasonably verifiable.

Definition	The website with main page URL [ ] was honored as part of the Pulitzer Prize awards process for the year [ ] in the Pulitzer category [ ] as the website of a prize finalist or winner under the name [ ] as shown at URL [ ]
Review Date	2020-02-19

§ 2.4.1. Examples

§ 2.4.1.1. Example

This shows two organizations sharing the prize.

Statement	The website with main page URL https://www.nydailynews.com/ was honored as part of the Pulitzer Prize awards process for the year 2017 in the Pulitzer category Public Service as the website of a prize finalist or winner under the name New York Daily News as shown at URL [https://www.pulitzer.org/winners/new-york-daily-news-and-propublica]. The website with main page URL https://www.propublica.org/ was honored as part of the Pulitzer Prize awards process for the year 2017 in the Pulitzer category Public Service as the website of a prize finalist or winner under the name ProPublica as shown at URL [https://www.pulitzer.org/winners/new-york-daily-news-and-propublica].
CSV	site,awardsYear,pulitzerCategory,nameAccordingToPulitzer,evidence https://www.nydailynews.com/,2017,Public Service,New York Daily News,https://www.pulitzer.org/winners/new-york-daily-news-and-propublica
Turtle	_:a cred:site <https://www.nydailynews.com/> . _:a cred:awardsYear "2017"^^xs:gYear . _:a cred:pulitzerCategory "Public Service" . _:a cred:nameAccordingToPulitzer "New York Daily News" . _:a cred:evidence <https://www.pulitzer.org/winners/new-york-daily-news-and-propublica> .
JSON-LD	{ "@context": "https://jsonld.example", "@id": "_:a", "awardsYear": "2017", "evidence": "https://www.pulitzer.org/winners/new-york-daily-news-and-propublica", "http://www.w3.org/ns/credweb#nameAccordingToPulitzer": "New York Daily News", "http://www.w3.org/ns/credweb#pulitzerCategory": "Public Service", "site": "https://www.nydailynews.com/" }
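The statement above covers both organizations. One way to carry a shared prize in CSV is one row per recipient, all pointing at the same evidence URL. This is a sketch under that assumption, not a layout the group has settled on. The helper name `shared_prize_rows` is hypothetical, and the values come from the statement above.

```python
import csv
import io

# Column names follow the CSV example above.
FIELDS = ["site", "awardsYear", "pulitzerCategory", "nameAccordingToPulitzer", "evidence"]


def shared_prize_rows(year, category, evidence, recipients):
    """Build one observation row per (site, name) recipient of a shared prize."""
    return [
        {
            "site": site,
            "awardsYear": year,
            "pulitzerCategory": category,
            "nameAccordingToPulitzer": name,
            "evidence": evidence,
        }
        for site, name in recipients
    ]


rows = shared_prize_rows(
    "2017",
    "Public Service",
    "https://www.pulitzer.org/winners/new-york-daily-news-and-propublica",
    [
        ("https://www.nydailynews.com/", "New York Daily News"),
        ("https://www.propublica.org/", "ProPublica"),
    ],
)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```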

§ 2.4.2. Motivation

Why is this signal hard to game?

Winning or being a finalist for a Pulitzer Prize is extremely difficult, even for experienced and well-funded professionals

Connection to quality

The prize process reflects a mature and well-resourced effort to recognize quality.
Outlets that have been awarded journalism awards, especially many over time, probably have internal quality controls and an organizational structure that supports good journalism.

Potential side effects

A bias towards more traditional outlets
A bias towards older outlets that have had more time to accumulate awards
A bias matching whatever biases are brought in by the Pulitzer Prize process and individual prize jury members

§ 2.4.3. Availability

Methods for obtaining

Visit pulitzer.org and browse for relevant entries
NewsQ plans to offer a feed
Use pulitzer.org's underlying JSON query API

Issues with availability

It doesn't cover very much of the news ecosystem, with many high quality sources never winning a Pulitzer Prize

Expected to become cheap because

The data is already available in JSON

Reviewed Credibility Signals

W3C Internal Draft Community Group Report, 19 February 2020
