This document specifies various types of information, called credibility signals, which are considered potentially useful in assessing credibility of online information.
Status of This Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
Comments are welcome and are especially useful if they offer specific improvements which can be incorporated into future versions. Please comment either by raising a GitHub issue or making inline comments on the Google Doc (easily reached using the pencil 🖉 link in the right margin). If neither of those options works for you, please email your comments to firstname.lastname@example.org (archive, subscribe).
GitHub Issues are preferred for discussion of this specification.
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document is intended to support an ecosystem of interoperable credibility tools. These software tools, which may be components of familiar existing systems, will gather, process, and use relevant data to help people more accurately decide what information they can trust online and protect themselves from being misled. We expect that an open data-sharing architecture will facilitate efficient research and development, as well as an overall system which is more visibly trustworthy.
The document has three primary audiences:
Software developers and computer science researchers wanting to build systems which work with credibility data. For them, the document aims to be a precise technical specification, stating what they need for their software to interoperate with any other software which conforms to this specification.
People who work in journalism and want to review and contribute to this technology sphere, to help make sure it is beneficial and practical.
Non-computer-science researchers, interested in helping develop and improve the science behind this work.
In general, we intend for this document to be:
Welcoming for implementers of systems using credibility data
Easy for non-tech folks to understand the proposed signals & contribute
Practical to maintain by the editors
Practical to contribute to, for a wide audience
A source of accurate guidance about signal quality and adoption
1.2 Credibility Data
The document builds on concepts and terminology explained in Technological Approaches to Improving Credibility Assessment on the Web. Our basic model is that an entity (human and/or machine) is attempting to make a credibility assessment — to predict whether something will mislead them or others — by carefully examining many different observable features of that thing and things connected with it, as well as information provided by various related or trusted sources.
To simplify and unify this complex situation, with its many different roles, we model the situation as a set of observers, each using imperfect instruments to learn about the situation and then recording their observations using simple declarative statements agreed upon in advance. Because those statements are inputs to a credibility assessment process, we call them credibility signals. (The term credibility indicators is sometimes also used.)
This document, then, is a guide to these signals. It states what each observer might say and exactly how to say it, along with other relevant information to help people choose among the possible signals and understand what it means when they are used.
Because this is a new and constantly-changing field, we do not simply state which signals should be used. Instead, we list possible signals that one might reasonably consider using, along with information we expect to be helpful in making the decision.
Assessing credibility of https://news.example/article-1
Looking at title
I consider it to be clickbait
It's clickbait because it's a cliffhanger
Looking at article
It cites scientific research
Looking at provider
Established in 1974
Owned domain since 2006
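As a rough illustration of the observer model, the outline above can be read as a list of observations, each recording who said what about which subject. The sketch below is only one possible in-memory representation; the class name, field names, and the identifiers used for the title and the provider are assumptions made for this example and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One recorded observation: who said what about which subject."""
    observer: str   # hypothetical identifier for the observer
    subject: str    # identifier of the thing observed
    statement: str  # plain-language declarative statement


# Hypothetical identifiers for the three subjects in the outline.
ARTICLE = "https://news.example/article-1"
TITLE = ARTICLE + "#title"
PROVIDER = "https://news.example/"

observations = [
    Observation("observer-1", TITLE, "I consider it to be clickbait"),
    Observation("observer-1", TITLE, "It's clickbait because it's a cliffhanger"),
    Observation("observer-1", ARTICLE, "It cites scientific research"),
    Observation("observer-1", PROVIDER, "Established in 1974"),
    Observation("observer-1", PROVIDER, "Owned domain since 2006"),
]


def statements_about(subject, recorded):
    """Return the statements recorded about one subject, in order."""
    return [o.statement for o in recorded if o.subject == subject]
```

Calling statements_about(PROVIDER, observations) returns the two provider statements from the outline.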
1.4 Factors in Selecting Signals
When building systems which use credibility signals and trying to decide which signals to use, there are different factors to weigh. This section is aspirational; we hope this document will in time provide guidance on all these factors.
1.4.1 Measurement Challenges
Some factors concern how difficult it is to get an accurate value for the signal:
Do people independently observing it get approximately the same value?
Do observations vary with the culture, location, language, age, beliefs, etc, of the people doing the observation?
Would the same people make the same observation in future months or years?
How much time and effort does it take people to make the observation?
Do people need to be trained to make this specific observation?
What kind of general training do people need (eg a journalism degree) to do it?
How do machines compare to humans in making this observation, in terms of cost, quality, types of errors, and susceptibility to being tricked?
Many of these factors can be measured using inter-rater reliability (IRR) techniques. When studies have made such measurements, our intent is to include that data in this document.
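This document does not prescribe a particular IRR statistic. As one illustration, Cohen's kappa is a commonly used measure of agreement between two raters that corrects for agreement expected by chance. The sketch below is a minimal implementation under that assumption; the function name and the example labels are invented for this example.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Two hypothetical raters judging whether four titles are clickbait.
rater_1 = ["yes", "yes", "no", "no"]
rater_2 = ["yes", "no", "no", "no"]
kappa = cohens_kappa(rater_1, rater_2)  # 0.5 for these labels
```

Values near 1 indicate strong agreement, and values near 0 indicate agreement no better than chance. Statistics for more than two raters, or for ordinal scales, would need a different measure.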
Here is a table of the data we have. Excerpts are listed with the relevant signals.
Quotes from Outside Experts
Citation of Organizations
Citation of Studies
Single Study Article
Confidence - Extent Claims Justified
Confidence - Acknowledge Uncertainty
Logical Fallacies - Straw Man
Logical Fallacies - False Dilemma
Logical Fallacies - Slippery Slope
Logical Fallacies - Appeal to Fear
Logical Fallacies - Naturalistic
Tone - Emotionally Charged
Tone - Exaggerated Claims
Inference - Type of Claims
Inference - Convincing Evidence
1.4.2 Value in Credibility Assessment
Another important set of factors relates to how useful the measurement is in assessing credibility, assuming the observation itself is accurate.
Does the signal have a strong correlation to content accuracy, itself determined by consensus among experts?
Is it particularly indicative of credibility when used in combination with other signals? (For example, as part of computing the value of a latent variable.)
Is it conceptually easy for people to understand?
Do professionals in the field think it's likely to be a useful signal?
How dependent are these characteristics on the culture or time period being considered?
How dependent are these characteristics on the subject matter of the information being assessed for credibility?
1.4.3 Feedback Risks (“Gameability”)
One should also consider how the overall ecosystem of content producers and consumers might be changed by credibility tools adopting the signal. Once attackers see it’s being used, a signal that works well today might stop working, or even be used to make things worse. See Feedback Risks.
Is it disproportionately useful for attackers (eg viral call to action)? If so, making this a negative credibility signal should generally be beneficial.
Is it disproportionately expensive for attackers (eg journalistic language)? If so, making this a positive credibility signal should generally be beneficial.
Who might get impacted by “friendly fire”? Even if adopting a signal might — on average — harm attackers more than everyone else, certain individuals or communities who have done nothing wrong might be penalized. Tradeoffs must be carefully made, ideally in a consensus process with the impacted people.
The value of sharing signal data depends on how that signal is used by other systems.
Are others producing data using this signal?
Are there useful data sets available?
Are others consuming data, paying attention to reported observations of this signal?
Are there tools which work with it, eg running statistics?
Is the definition clear and unambiguous, so people using it mean the same thing?
Are there clear examples?
Is there an open history of commentary, with questions and answers, and issues being addressed by various implementers?
Is documentation available in multiple languages?
If the definition is under development, how can one participate?
If the definition could possibly change, who might change it, and under what circumstances?
Is there a test suite / validation system for helping confirm that an implementation is working properly?
Are there implementation reports, confirming that tools are functioning properly, according to the testing system? (For an example, see ActivityPub).
1.5 Publishing Credibility Data
TBD, basically follow schema.org technique using JSON-LD.
1.6 Consuming Credibility Data
TBD, point to some tools and the relevant specs. Basically JSON-LD.
1.7 Organization of this document
Section 1 (“Introduction”) provides instructions for how to use and help maintain this document, along with general background information.
The rest of this document, after the introduction, is a list of signals and information about them, as discussed in the introduction. The signals are organized into related groups, in hierarchical sections. At the lower levels of the hierarchy are the signals themselves, while the higher levels provide grouping of the signals, to help people understand them.
One important level of the hierarchy identifies the subject type of the signal. This is the conceptual entity being examined, considered, or inspected, when one makes the observation being recorded in the signal data. This could be imagined in different ways: when you are observing a claim made in the 3rd paragraph of an article published in some newspaper, are you observing the claim, the paragraph, the article, the newspaper, or even the author of the article? In general, we aim for the smallest granularity that makes sense, which in this case would probably be the claim.
At times, it may not be obvious to which subject type a signal belongs, or it could sensibly belong with several different ones. In this case, it might be moved to a different section in the document as people come to understand it better. When it’s not clear, there should be links from the places a signal could reasonably be to the place it actually is.
This may require discussion, and might remain open for debate. When a signal or group of signals makes sense in two places, consider linking it from the places it isn’t, to help people find it.
In many cases, a signal could be seen as a set of similar signals which are not strictly identical. This can be handled by adding additional signal headings with the finer distinction, when necessary. In this case, template statements might appear under more than one signal.
Note that sections may be moved and renumbered. Do not rely on section numbers remaining the same. For linking to a part of the document, consider using the gdocs h.xxxxxx fragment ids, provided by the Table of Contents; those should remain stable. Also, whenever changing a heading, especially a signal heading, if someone might be referring to it by name, please move the old text into a paragraph starting “Also called:”.
1.8 Template Statements
The most important thing about a signal definition is to be clear what observation the signal data is recording. If the signal heading is “Article length”, does that mean length in words or bytes or characters or some other metric? Does it include the title? For each signal, we want an easy way to communicate its definition that is short but clear, while being as detailed as necessary.
The technique we use here is to express the semantics of the signal using plain and simple sentences in natural language which convey the same knowledge as the signal data. If you imagine people using credibility software exchanging these statements (perhaps in text messages or on Twitter), you should get the right semantics. You can assume metadata, like who sent it and when it was sent, is available, so the statements can include terms like “I” and “now”.
For machine-to-machine data interoperability, these template sentences and the signal heading are turned into a data schema, after which the JSON-LD/schema.org/semantic web/linked data technology stack can be used.
The statements we use are templates because they abstract over a variety of similar sentences which differ in specific limited ways. For example, these statements:
I have examined the article at https://example.com/alice and find it highly credible
I have examined the article at https://example.com/brian and find it highly credible
I have examined the article at https://example.com/casey and find it highly credible
are all the same, except in the URL. We convey this using a template statement, which has a variable portion in square brackets, like:
I have examined the article at [subject] and find it highly credible
If we (automatically or manually) map this template to a property with the pname :iHaveExaminedHighlyCredible, then the sentence number 2 above would be encoded in turtle as
Alternatively, we could make it a class, but boolean-valued properties may be better, so that all signals remain properties.
The bracketed template expression “[subject]” is required in every template, to indicate what entity is being observed. Additional bracket expressions can be used when there are other elements of the statement to make variable. In particular, [string] (for text in quotes) and [number].
(For now, try to just use those three. Software and documentation is being developed to allow more features. If you find this too restrictive, go ahead and write something else inside the square brackets and we'll deal with it later, but include a question mark so it's clear you knew you were making it up.)
An example needing multiple variables:
https://example.com/alice took 4.75 seconds to load, just now.
https://example.com/brian took 5.9 seconds to load, just now.
could be matched by:
[subject] took [number] seconds to load, just now.
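To make the matching step concrete, here is a minimal sketch of how a template with [subject], [number], and [string] slots could be turned into a regular expression and applied to a sentence. This is not the software mentioned above; the function names and the slot patterns are assumptions for illustration. The final line shows one plausible Turtle encoding for the :iHaveExaminedHighlyCredible property, assuming the boolean-valued form discussed earlier.

```python
import re

# Assumed patterns for the three slot types named in this section.
SLOT_PATTERNS = {
    "subject": r"(?P<subject>\S+)",
    "number": r"(?P<number>-?\d+(?:\.\d+)?)",
    "string": r'"(?P<string>[^"]*)"',
}


def template_to_regex(template):
    """Compile a template statement into a regular expression.

    Limitation of this sketch: each slot type may appear at most once.
    """
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\[(subject|number|string)\]", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)      # literal text
        else:
            pattern += SLOT_PATTERNS[part]  # slot name
    return re.compile(pattern)


def match_template(template, sentence):
    """Return the slot values if the sentence fits the template, else None."""
    match = template_to_regex(template).fullmatch(sentence)
    return match.groupdict() if match else None


load_time = match_template(
    "[subject] took [number] seconds to load, just now.",
    "https://example.com/alice took 4.75 seconds to load, just now.",
)

examined = match_template(
    "I have examined the article at [subject] and find it highly credible",
    "I have examined the article at https://example.com/brian and find it highly credible",
)
# One plausible Turtle line for the boolean-valued property (an assumption).
turtle_line = f"<{examined['subject']}> :iHaveExaminedHighlyCredible true ."
```

Slot values are returned as strings; converting [number] values to a numeric type is left to the caller.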
1.9 Instructions for editing this document
As an experiment, this document is currently set so everyone can edit it, like Wikipedia. It is the Google docs version that is editable. We suggest you change the “Editing Mode” to “Suggesting” (using the pencil icon in the upper-right) until you are quite familiar with this document. You may also comment using the usual Google Docs commenting features.
The subsections below give some advice for how to make edits which are helpful.
1.9.1 Expand discussion
Each section should begin with a short introduction written with a neutral point of view, reflecting consensus about why the signal might be useful and what the risks might be. To enable consensus among a broad community, the intent is for this text to be developed iteratively, with each contributor adding their perspective while respecting what is already present.
Questions and minor concerns should generally be added as annotations using the “Add a Comment” function, without editing the document. If they become issues requiring back-and-forth discussion, they should be turned into GitHub issues and linked from the most relevant place in this document with a paragraph starting “Issue:”.
These discussion sections are intended to be nonnormative. That is, they do not say how software using the signal is required to behave for interoperability. The normative content of this specification is the template statements and the mapping of the statements to RDF.
1.9.2 Add new template statements
If you are confident you understand what a signal is intended to measure, and think you can provide a template statement which expresses it more clearly and simply, with little ambiguity, please add a new row to the bottom of the “Proposed template statements” table and add your entry. Please also put the next higher number in the Key field for reference, and your name in the By field. This “by” field is optional; it is intended to help simplify discussion, telling people who to talk to, and to give some credit. Listing the name of a large group in this field is not particularly useful.
After adding an entry, for a short time (perhaps a few hours, guided by any comments on it) it’s okay to edit it if you change your mind. After that, please leave it, and just add a new row for the new version. You can put new versions in the middle of the table and use keys like 1a.
1.9.3 Add new signals
Once you’re familiar with the structure of this document and all the signals in your area of interest, you may add new signal sections (with a title starting “Signal:”) or even new group sections. (For heading numbering, you can use the “Table of contents” add-on from LumApps to number the headers. Or just leave the numbering for someone else using the add-on.)
When you add a new signal, please copy this table to the new section, and then fill in at least one row to clarify what the signal data conveys.
Proposed Template Statement
Folks who add content to this document are encouraged to add themselves in this section, potentially with some affiliation & credential information. This also allows the “By” column to stay short, as people can use short forms of names (eg only first or last name, if unique in this doc).
Entries marked as by “Credibility Coalition” are prior work by members of the Credibility Coalition. At the time, individual authorship information was not maintained. Moving forward, specific authorship detail is welcome.
(add yourself here...)
This document is assembled from multiple data sources. They provide both the overall structure of this document and the details about each signal, including definitions, example data, and implementation status. These sources are fetched starting with a source list, which appears as the first entry below. In general, text in this document links back to its source with a link-out icon.
A claim is “an assertion that is open to disagreement; equivalently, a meaningful declarative sentence which is logically either true or false (to some degree); equivalently, a proposition in propositional logic.” [credweb report]
Claims can be stated (with various degrees of clarity) in some content or implied by the content (even non-textual content, like a photograph).
Claims are usually the smallest practical granularity. Credibility data about claims is largely focussed on what other sources have said about that claim, as in fact checking, but could also involve relationships between claims and textual analysis of claim text.
2.1 Claim Review
The “ClaimReview” model developed at schema.org grows out of the tradition of independent, external fact-checking, as in PolitiFact. With this model, a fact-checker reviews a claim, typically made by a public figure, and then publishes a review of that claim, a “claim review”. Within schema.org, this parallels other reviews, like restaurant reviews.
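For readers unfamiliar with the markup, the following sketch builds a small ClaimReview document as JSON-LD. The property names (claimReviewed, reviewRating, alternateName, author, url) are schema.org terms; the helper function and every value passed to it are hypothetical and serve only as an example.

```python
import json


def claim_review(claim_text, review_url, reviewer_name, rating_label):
    """Build a minimal schema.org ClaimReview as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": review_url,              # where the review is published
        "claimReviewed": claim_text,    # the claim being checked
        "author": {"@type": "Organization", "name": reviewer_name},
        "reviewRating": {
            "@type": "Rating",
            "alternateName": rating_label,  # textual verdict, eg "false"
        },
    }


# Hypothetical values, for illustration only.
review = claim_review(
    claim_text="The town's population doubled last year.",
    review_url="https://factcheck.example/reviews/123",
    reviewer_name="Example Fact Checkers",
    rating_label="false",
)
review_json = json.dumps(review, indent=2)
```

Real claim reviews usually carry more detail, such as a publication date and information about who made the claim.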
[ Can we fit claimreview neatly into this observer/signal model? It’s a bit of a stretch. TBD. ]
2.1.1 Signal: Fact-check status of claim
From Section 7.7.1, Signal: Article has a central claim. According to the Credibility Coalition WebConf2018 study and more recent studies, fact-check results for claims in articles took the following values at the time of the study: false, true, unclear, mixed. Not finding a fact-check is equivalent to an empty statement.
[ClaimA] is a claim that equals another claim, [ClaimB].
Developed for CredCo Political Indicators Study 2018-19. Original example question: “Are there claims that contain phrases, words, or coded language that have taken on a special loaded meaning, in the understanding of the speaker and audience?”, with an example of "go to work," used as code for killing during the Rwandan genocide.
[Organization]’s commitments to accuracy and other professional standards are unknown.
2.3 Explicitly Unverified Claims
From CredCo Political Indicators Study 2018-2019: in some cases, articles may reference claims or pieces of information that do not contain citations or references. In some cases, within an article, an author can make explicit reference to a claim that has not been verified, using language that specifies that the claim has not been validated or proven to be true. This includes language in an article explicitly referencing that a claim has not yet been verified to date - but the claim is being mentioned in the article nonetheless.
[Claim] is explicitly unverified, containing language such as “charges have not been proven true.”
3. Subject type: Text
Includes: phrase, sentence, paragraph, document, document fragment
A text, in this sense, is a sequence of words, with the usual punctuation, and sometimes embedded multimedia content or meaningful layout, like tables. That is, it’s a document or portion of a document. As examples, a phrase, sentence, paragraph, document section, book chapter, book, and complete book series would typically each count as a text.
Signals here concern properties of the text, itself, separate from how it might be published (eg on a Web Page, on a billboard, spoken at a rally) or where it might be published (in some Venue). The text should be considered immutable: a text (in this sense) doesn't change. If you take a text and change it, you are making a new text, which needs to be reexamined, to see which observations (and thus which signal data) applies to this other, new text.
Issue: (tech) How to represent texts in RDF? Options include annotation URL with secure hash, annotation object URL with secure hash, data: URI, etc.
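As background for this issue, the sketch below shows two of the building blocks mentioned: a secure hash of the text and a data: URI carrying the text itself. It does not settle how either would be used in RDF; the function names are invented for this example, and the choice of SHA-256 and UTF-8 is an assumption.

```python
import hashlib
from urllib.parse import quote


def text_sha256(text):
    """Hex SHA-256 digest of the UTF-8 bytes of an immutable text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def text_data_uri(text):
    """A data: URI that embeds the whole text, percent-encoded."""
    return "data:text/plain;charset=utf-8," + quote(text, safe="")


digest = text_sha256("Local man dies")
uri = text_data_uri("Local man dies")
```

Because a text is immutable, any change to it yields a different digest and a different data: URI, which matches the rule that a changed text is a new text. A data: URI is practical only for short texts, while a hash stays the same length regardless of text size.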
Texts adopt a tone to appeal to their audience and/or attempt to convey how the text should be used. For instance, an academic study is written in formal, verbose and grammatically correct language, while a listicle is short, informal and often humorous. The academic study uses these characteristics to convey authority, while the listicle is intentionally unauthoritative.
Text of [subject article] has consecutive question points.
Example sentence: "If you're a Friends fan, you probably know that Ross and Rachel's relationship was...kind of a disaster 95% of the time." (Source)
3.2 References or citations
3.2.1 Signal: Uses standardized references or citations
These standards are required and enforced by professions that demand accuracy, and are typically found in highly researched, and therefore more authoritative, texts. Examples: Legal, academic, or scientific citations, e.g., MLA, APA.
Text of [subject article] uses standardized references or citations.
Example sentence: "Changes in body temperature have long been used as an indicator of injury, inflammation or infection in veterinary medicine (George et al., 2014), however, the use of temperature devices such as rectal thermometers and thermal microchips can be both invasive and time consuming (Johnson et al., 2011)." (Source)
3.2.2 Signal: Uses formal but not standardized references or citations
Examples: Journalism, nonfiction or explanatory material
Some texts use references extensively, even if they are not written according to a rigid structure. These texts tend to be authoritative but not as authoritative as the texts using the rigidly structured citations. The content of the references is also extremely influential.
Text of [subject article] has few or no references or citations.
One exception is a first-hand account, which can become a primary document for later research. These personal accounts, however, should be vetted and cross-referenced with other sources to evaluate their accuracy.
Example sentence: "The shrine is the work of SUNY Purchase sophomore Phillip Hosang, who, like a lot of students at the school, had long heard rumors about a secret room in a men's bathroom somewhere in the visual arts building." (Source)
3.3.1 Signal: Many or multiple instances of the pronouns "I" or "you"
Texts that use the pronouns "I" or "you" are typically opinion pieces, correspondence, or personal accounts. These texts are usually not trying to be authoritative or explanatory. However, they sometimes form a primary document that is used in secondary research.
Text of [subject article] has few or no instances of the words "I" or "you."
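A machine observer could approximate this signal by counting standalone occurrences of the two pronouns. The sketch below is one simple way to do that; the function name is an assumption, and this document does not define how many instances count as "few" or "many", so the sketch returns only the raw count.

```python
import re

# "I" is matched case-sensitively; "you" and "You" are both matched.
PRONOUN_PATTERN = re.compile(r"\bI\b|\b[Yy]ou\b")


def count_i_and_you(text):
    """Count standalone occurrences of the words "I" and "you"."""
    return len(PRONOUN_PATTERN.findall(text))


# Illustrative sentence, invented for this example.
sample_count = count_i_and_you("I think you should read this before you decide.")
```

Contractions such as "I'm" and "you're" are counted, while "your" is not. A capital "I" used as a Roman numeral on its own would be a false positive in this simple approach.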
"President Trump said he would not overrule his acting attorney general, Matthew G. Whitaker, if he decides to curtail the special counsel probe being led by Robert S. Mueller III into Russian interference in the 2016 election campaign." (Source)
3.4 Signal: Vocabulary or reading level
A wide and varied vocabulary, which may include jargon or uncommon words, is an indicator of formal tone.
Text of [subject article] contains a verbalized threat to democracy, such as a proposal to overthrow democratic government by force or by undemocratic means (e.g. “Obama is a Muslim Agent with Brotherhood Ties. American people must take him down.”)
Source: Oz, M., Zheng, P., Chen, G. M., & Park, R. H. (2018). Twitter versus Facebook: Comparing incivility, impoliteness, and deliberative attributes. New Media & Society, 20(9), 3400–3419. http://doi.org/10.1177/1461444817749516
Text of [subject article] contains words in all capital letters (e.g. “Who flew the planes into the towers on 9/11? ILLEGAL IMMIGRANTS!”)
Source: Oz, M., Zheng, P., Chen, G. M., & Park, R. H. (2018). Twitter versus Facebook: Comparing incivility, impoliteness, and deliberative attributes. New Media & Society, 20(9), 3400–3419. http://doi.org/10.1177/1461444817749516
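The all-capitals observation is straightforward to automate. The following sketch lists words written entirely in capital letters; the function name and the min_length parameter, which skips single letters, are assumptions for this example and do not come from the cited study.

```python
import re


def all_caps_words(text, min_length=2):
    """Return words made up only of capital ASCII letters."""
    candidates = re.findall(r"\b[A-Z]+\b", text)
    return [word for word in candidates if len(word) >= min_length]


shouted = all_caps_words(
    "Who flew the planes into the towers on 9/11? ILLEGAL IMMIGRANTS!"
)
```

For the example statement this yields the two capitalized words. Acronyms are also matched, so a practical observer would probably need an allow-list or a minimum count before recording the signal.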
3.6 Text type
Editor note: This should probably be abstracted to all different types of contents.
When pictures of people are used, there are often choices about which image to use, and how to manipulate it, to make the person look better/worse or associate them with some positive or negative concept. Some people have pointed out how media gets to choose, when someone is arrested, whether to use flattering photos provided by supporters or a mug shot provided by the police.
4.1.1 Signal: Flattering image
4.1.2 Signal: Unflattering image
4.2 Originality of Photo Used in an Article
These signals are designed with the assumption that the image is used in the broader context of a journalistic article.
[Video] still photography does not include attribution.
6.9.4 Other graphics
Signals for rhetoric, valence, what else?
Signals for valence and drama/exaggeration?
7. Subject type: Article
Includes: News Story, News Article, Scientific Paper, Blog Post
An article is a collection of information intended to convey some information, usually factual, usually created by one or more identified people, and usually released at a specific point in time in some venue. It consists of elements like a body, a title, a publication date, and an author list. Unlike Texts, where any change makes it a different Text, an Article may be revised over time and still be considered the same Article (albeit a different version). Usually only minor changes are socially appropriate, however. Consumers of credibility data may need to be cautious of which version an observation applies to.
If an article appears on a web page, or in a portion of a web page, we can use its URL to identify the article.
Differentiation between Article and Text. Consider whether the signal data would be the same if the text were moved to a different article, perhaps published in a different venue, with a different title, at a different time, and with other text before or after it in some article. If the observation would be the same, then the signal is a property of the text, not the article. In that case it should be in 3. Subject type: Text, not here.
[Text] in [Content-Object] characterizes a group or groups of people along lines that explicitly differentiate them from others.
Developed for CredCo Political Indicators Study 2018-19. This can apply to situations in which the author is associated with the defined group or defining an external group. Can be used in combination with 7.8. Claims in Articles and other “Content-Objects.”
[Text] is an address that exhorts, or urges someone to do something.
7.2.7 Signal: Call to Violence
This signal is meant to capture a call to violence. Perhaps also expressed as part of ‘Dangerous Speech’: “any form of expression (speech, text, or images) that can increase the risk that its audience will condone or participate in violence against members of another group” (see https://dangerousspeech.org/about-dangerous-speech/).
[Text] contains language that can be understood as a call to violence or seems harmful.
Developed for CredCo Political Indicators Study 2018-19.
7.2.8 Signal: Call to Action (Political)
This signal is meant to capture a textual call to action, not to be confused with a marketing call to action https://en.wikipedia.org/wiki/Call_to_action_(marketing). Sometimes, these calls to action are also associated with requests for enacting/executing an action as an expression of one’s loyalty, identity, or affiliation.
[Text] contains language that can be understood as a political call to action, which requests readers to follow-through with a particular task, or tells readers what to do such as: signing online petitions, joining a mailing list, giving donations, voting, protesting, boycotting.
Developed for CredCo Political Indicators Study 2018-19.
cciv: This article properly characterizes the methods and conclusions of the cited or quoted source. In addition to a Likert measure, two other options are possible: (A) Unable to find source, (B) Source is behind a paywall
What is the impact factor of the journal or conference cited? *** From Wikipedia: The impact factor (IF) or journal impact factor (JIF) of an academic journal is a measure reflecting the yearly average number of citations to recent articles published in that journal. It is frequently used as a proxy for the relative importance of a journal within its field; journals with higher impact factors are often deemed to be more important than those with lower ones. https://en.wikipedia.org/wiki/Impact_factor
A “dek” is a subhed in journalism that appears below the headline of an article, usually in a smaller font (but in a larger font than the main body of the article). It typically summarizes the article or highlights a main point from the article.
7.6 Claims in Articles
Although there is a separate section for Claims [2. Subject type: Claim], this section deals with the case when the analysis of one or more claims within an article is made to signify something about the article itself. [Probably could use an introductory paragraph on different levels/objects once those are clarified, since this translation is taking place for a number of projects, consider articles to domains/publishers.]
In the following signals, an assumption is made on the existence of a central claim of the article that is recognizable.
A Title is an immutable association of an Article and some short Text which is typically presented first. When a Title text changes, that's a different Title. An article may have many different titles at different points in time and in different contexts, although social practice is usually against this.
The same title text may be used for many different articles, but those are considered different Titles here. For example, many of the signals about a Title with the text, “Local man dies”, will depend on which Article it's associated with.
Title understates claims from the primary content of the article body.
8.3 Misleading about the world
8.4 Non-misleading consumer manipulation
8.5 Title Characteristics
8.5.1 (Section with no title?)
9. Subject type: Web Page
Issue: Should this be a Heading1 like Title? Probably no, because the statements naturally get phrased with the subject of the statements being a web page. Most people wouldn't conceptualize the page layout as its own entity.
Does an ads.txt file exist on the domain? *** ads.txt (Authorized Digital Sellers) is an Interactive Advertising Bureau initiative. It specifies a text file that companies can host on their web servers, listing the other companies authorized to sell their products or services. This is designed to allow online buyers to check the validity of the sellers from whom they buy, for the purposes of internet fraud prevention. (from https://en.wikipedia.org/wiki/Ads.txt)
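A machine observer for this signal needs to know where to look and how to read what it finds. The sketch below derives the ads.txt location from a page URL and parses the file body into records. The function names are assumptions for this example, the parser handles only the basic comma-separated record format, and the network fetch itself is left out.

```python
from urllib.parse import urlsplit


def ads_txt_url(page_url):
    """Location of ads.txt on the same host as the given page."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/ads.txt"


def parse_ads_txt(body):
    """Parse ads.txt content into lists of comma-separated fields.

    Comments (after '#'), blank lines, and variable lines such as
    'contact=...' are skipped in this sketch.
    """
    records = []
    for raw_line in body.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if "=" in line.split(",", 1)[0]:
            continue  # variable declaration, not a seller record
        fields = [field.strip() for field in line.split(",")]
        if len(fields) >= 3:
            records.append(fields)
    return records


# Hypothetical file content, for illustration only.
sample_body = """# ads.txt for news.example
adexchange.example, 12345, DIRECT
contact=ads@news.example
"""
location = ads_txt_url("https://news.example/article-1")
sellers = parse_ads_txt(sample_body)
```

The signal itself is only whether the file exists, which a fetch of the derived location would answer. Parsing the records is an optional extra step. The ads.txt specification also has rules about which host to query, and this sketch simply uses the page's own host.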
How strongly do you agree or disagree that the page of the article has spammy or clickbaity advertisements? *** The page of the article has spammy or clickbaity advertisements. This is limited to a subjective assessment at this time.
How many ads appear on the article page? *** There are multiple types of ads to look for. (1) Display ads. These are boxes that are clearly advertisements, typically in the form of a graphic image or, in the case of Google AdWords, a box with text. (2) Content recommendation engines, specifically, Taboola, Outbrain, Tivo, RevContent. A box of content recommendations on a page counts as one. (3) Sponsored content. This is content recommended on the site with a clear label: “Sponsored.” (4) Call for social sharing. (5) Call to subscribe to a mailing list.
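The counting rule above could be recorded as follows. This is only a sketch: the `AdType` names and the `count_ads` function are hypothetical, and the sample page is invented for illustration. Each observed unit is one list entry, so a whole content recommendation box is entered once regardless of how many links it holds.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Tuple


class AdType(Enum):
    """The five ad types listed for this signal (names are hypothetical)."""
    DISPLAY = "display ad"
    CONTENT_RECOMMENDATION = "content recommendation box"
    SPONSORED = "sponsored content"
    SOCIAL_SHARE_CALL = "call for social sharing"
    MAILING_LIST_CALL = "call to subscribe to a mailing list"


def count_ads(observed: Iterable[AdType]) -> Tuple[int, Counter]:
    """Return the total ad count and a per-type breakdown."""
    by_type = Counter(observed)
    return sum(by_type.values()), by_type


# Invented example: two display ads, one recommendation box, one signup call.
page = [
    AdType.DISPLAY,
    AdType.DISPLAY,
    AdType.CONTENT_RECOMMENDATION,
    AdType.MAILING_LIST_CALL,
]
total, breakdown = count_ads(page)
print(total, breakdown[AdType.DISPLAY])
```

Keeping the per-type breakdown alongside the total lets a consumer of the signal decide later whether calls for sharing or subscription should be weighted the same as display ads.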
How strongly do you agree or disagree that the page of the article has aggressive advertisements, including calls to join a mailing list? *** The page of the article has aggressive advertisements. This is limited to a subjective assessment at this time.
The aggregator displays an indicator of the provider’s bias.
12. Subject type: Venue
A Venue is a branded content channel, which might be separable from the provider of that channel. For instance a particular newspaper’s “Lifestyle” and “Sports” sections would typically be considered distinct Venues with distinct reputations. In this case, they would be sub-Venues of the newspaper itself.
The distinction between Venue and Provider is not always clear in people’s mind; it is essentially the distinction between a brand and the brand’s owner. We try to make the distinction in order to be able to understand the impacts on reputation when, for example, one company sells a content brand to another company.
13. Subject type: Provider
14. Subject type: Creator
Also called: author, writer, reporter, byline
A Creator (in this context) is a Person who creates content, such as by writing articles. All the signals which apply to a Person also apply to a Creator. Signals are listed here if they only really make sense for content creators. Even if every Person were a Creator, this grouping could still be convenient.
15. Subject type: Person
15.1 Good Faith
[subject] acts in good faith in their online interactions
[subject] accurately characterizes their confidence in what they post online
Use of “ethos”
15.3 Domain Expertise
Statements from [subject] about [topic] are true [percent] of the time.
This person uses logos
This person uses pathos
This person generally uses ethos in argument
Is a recognized expert in the field of [...]
16. Subject type: Organization
17. Subject type: Testing
Using this section to help migrate over signals from other spaces.
18. To Be Categorized
Comparative indicators (as per BitPress): “[article1] and [article2] describe the same event in a significantly different way”. Something like that. For example:
[article1] includes an important claim [claim] that [article2] omits when describing the same topic
But that could be done as “[article1] describing event [event] makes claims [claim]”
If the content of the article is not original, was attribution given and if so, was the attribution accurate? *** (A) Attribution was not given
(B) Attribution was given but was inaccurate
(C) Attribution was given and was accurate
(D) Unclear which is the original
Has the text of this article appeared in exactly the same words or very similar words in another publication? *** (A) Most likely original
(B) Appears to be a copy of one or more articles, with some portions different or remixed
(C) Extensive quoting from another source, with some original content
(D) A wholesale duplicate of another article
When does the article claim it was published? *** From An: "I think technically this states both the date and location of the article’s publication, in which case it needs two forms of operationalization."
cciv: Articles 1:M authorship. Note that not all articles have bylines, even in traditional news sources. Bylines also don't always start with 'by' (https://medium.com/@rchang/advice-for-new-and-junior-data-scientists-2ab02396cf5b)
How accessible/responsive is this author to the public? *** How accessible is this author to the public? Do they have a website? Does this author have a publicly available email address? Are they on Twitter or Facebook? Do they often respond to readers?
What degree of attention does this author command from other individuals? *** How many people follow this author (on social media). How many other journalists follow this author? What speaking engagements does this author command?
What evidence is given for the primary claim? Select all that apply. *** Types of evidence for a claim:
Cause precedes effect
The correlation appears across multiple independent contexts
A plausible mechanism is proposed
An experimental study was conducted (natural experiments OK)
Experts are cited
Other kind of evidence
No evidence given
What is the name of the magazine, newspaper, journal, book that the article appeared within? *** Publications are parents of articles.
Publishers may have 1:M Publications. Publications include newsline magazines, series, newspapers, etc.
cciv: (needs to go under Website) Dwell Time: Time a user spends on a page? Objectively captured by web logs. Sum of times / number of users. Note, this cannot identify the time spent actually reading the article.
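The "sum of times / number of users" calculation might look like the sketch below. It assumes that web logs have already been reduced to (user, seconds on page) pairs and that "number of users" means distinct users; both are assumptions, as are the function name and the sample data.

```python
from typing import Iterable, Optional, Tuple


def mean_dwell_time(visits: Iterable[Tuple[str, float]]) -> Optional[float]:
    """Sum of dwell times divided by the number of distinct users.

    `visits` holds (user_id, seconds_on_page) pairs, a hypothetical shape for
    data extracted from web logs. Returns None when there are no users.
    """
    total = 0.0
    users = set()
    for user_id, seconds in visits:
        total += seconds
        users.add(user_id)
    return total / len(users) if users else None


# Invented sample: user u1 visits twice, user u2 once.
visits = [("u1", 30.0), ("u2", 90.0), ("u1", 60.0)]
print(mean_dwell_time(visits))  # 180 seconds over 2 users -> 90.0
```

As the note above says, the result measures time on the page, not time spent actually reading the article.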
How strongly do you agree or disagree that the page of the article has aggressive advertisements? *** The page of the article has aggressive advertisements. This is limited to a subjective assessment at this time.
How strongly do you agree or disagree that the page of the article has aggressive social shares? *** The page of the article has aggressive social shares, which may include calls to share the article within the text. This is limited to a subjective assessment at this time.
How is the title unrepresentative of the content of the article? (Select all that apply). *** Types of title representativeness
(A) Title is on a different topic than the body
(B) Title emphasizes different information than the body
(C) Title carries little information about the body
(D) Title takes a different stance than the body
(E) Title overstates claims or conclusions in the body
(F) Title understates claims or conclusions in the body
What clickbait techniques does this headline employ (select all that apply)? *** A typology of clickbait headlines:
Listicle (“6 Tips on …”)
Cliffhanger to a story (“You Won’t Believe What Happens Next”, “Man Divorces His Wife After Overhearing This Conversation”)
Provoking emotions, such as shock or surprise (“...Shocking Result”, “...Leave You in Tears”)
Hidden secret or trick (“Fitness Companies Hate Him...”, “Experts are Dying to Know Their Secret”)
Challenges to the ego (“Only People with IQ Above 160 Can Solve This”)
Defying convention (“Think Orange Juice is Good for you? Think Again!”, “Here are 5 Foods You Never Thought Would Kill You”)
Inducing fear (“Is Your Boyfriend Cheating on You?”)
Is the description an extreme exaggeration, an extreme minimization, or proportional to the event or situation described? *** The extent to which language in the text is proportional to the situation, or exaggerates or minimizes events. Exaggeration is defined in Webster 1913 as "...the act of doing or representing in an excessive manner;..." and minimization is defined in the Oxford Living Dictionary (accessed June 2018) as "1.1 Represent or estimate at less than the true value or importance".