Can you tell, when you're looking at a random web page, whether you should trust it? When scanning a page of reviews or search results, do you know which matches come from legitimate sources and which are scams? When reading a news feed, can you tell which items ought to be believed and which are slanted, manipulative, propaganda, or outright lies? Perhaps most importantly, what happens when you inevitably guess wrong while making some of these credibility assessments, and you unknowingly share falsehoods with your community, helping spread some online pathogen? What if you are misled into making bad decisions for yourself and the people you care about, with potentially disastrous consequences?
The Credible Web Community Group formed at W3C, the organization which develops technical standards for the web, to look for technological solutions for this “credibility assessment” problem. It's not that we think tech can solve every problem, especially ones as deeply human and complex as this one, but it seems likely that some tech is making matters worse and that other designs could probably serve people better. For some of us, creating better approaches to credibility assessment seems like a good way to help.
This document is a work in progress and edits are welcome. The document collects ideas, mostly following discussion at meetings of the group. For simplicity and to keep the focus on improving the ideas, provenance/attribution information is generally omitted (that is, we don’t attach ideas to people). In some cases such details are available in meeting minutes and in other documents.
Web-centric technical solutions which require standardization
The web isn't everything, but it's arguably the knowledge backbone of the world today. Everything can, and usually does, tie back to the web to some degree, so if the web is a reliable source of accurate information, that should help quite a bit.
Data interchange vocabularies are the likely main area to standardize. See Potential New Web Standards, below.
Browser functionality is another possibility, but the bar is much higher, and none of the current proposals require that to function, although it would certainly help with deployment.
The regulation of content (called "censorship" in some contexts) is controversial. Nearly every country in the world considers publication of certain kinds of material immoral and illegal (such as child pornography). At the same time, many of the same countries express the value of free speech and a free press, sometimes in very broad language. To humans, the difference may be laughably clear, but in general a computer system designed to allow one kind of content to be blocked can also be used to block all the other kinds.[i]
Additionally, the people and computing systems involved in a potentially-problematic communication may be spread out over time and place, each subject to different cultural views and applicable legal authorities.
[Explain that the Web currently handles this by not blocking anything, but by making it hard to publish anonymously; folks who want to publish anonymously have to trust someone else to protect their identity, e.g. a social media platform. We think this architecture makes sense to continue.]
Perhaps the best path forward is to aim for systems which allow users[j] to publish and view any available content they choose, but also to enable those individuals to be held accountable if they break the laws which apply to them. Legitimate needs for anonymity could be handled within such a scheme by using trusted proxies, such as journalists[k].[l]
Some proposals require or facilitate centralization, such as the idea of establishing and enforcing uniform standards[n][o] for journalism. These approaches have some appeal and might be effective, but centralization is antithetical to Web Architecture and would likely introduce problems with scaling, censorship, and political bias. We have been considering centralized approaches out-of-scope, and aim for designs which give the end-user final control over their computing platform[p].
Some proposals are likely to increase the degree to which consumers see only the content they want to see or their community wants them to see, regardless of its accuracy.
One theory on addressing this is to make clear to users when they are inside their bubble (their personal walled garden) and to allow them to easily step outside of it whenever they want. The theory is that people won’t always choose to be sheltered, but will instead sometimes go looking for adventure, or something, “outside the wall”. [r]
It has also been suggested that prediction markets incentivize accurate broad knowledge, leading people outside their bubbles, and that commerce/trade in general motivates people to connect between “tribes” and help people get along regardless of “tribe”.
[Some solutions are likely to trade off against privacy to some degree. E.g., a service which checks every page you visit necessarily gets to see every page you visit, potentially including employer-confidential or medical (HIPAA-protected) material.]
E.g., the Facebook result that warning labels on false news increased engagement; interventions should therefore be tested thoroughly before being deployed at scale. That's very hard to do in decentralized/open systems.
? Is it possible to run real experiments? A/B testing? See https://civilservant.io/about_us.html
People are notoriously bad at seeing the downsides of work they support. As such, an adversarial approval process -- where work must be reviewed by disinterested and potentially hostile parties before it can proceed -- is often useful to vet work that needs to be of high quality.
It may be wise to set up a Review Board[s][t] to oversee potential interventions around credibility[u]. Models for this include IRBs, approval processes for medical treatments, and W3C's Horizontal Review process.
Such a review board might be useful for vendor-specific interventions as well as standards-based ones, and could bring considerable credibility to the process if staffed with people respected by diverse stakeholders.
TBD: broad & deep review, many different perspectives to spot flaws & relevant expertise
People necessarily make credibility assessments about content items and about the people who provided the content. It seems likely they make these assessments across a wide range of granularities.
For example, Harvard University would be a large-granularity source, comprising many people over several centuries, but still having some degree of aggregate credibility. In contrast, in considering the credibility of Harvard Professor Jonathan Zittrain, one would be working at a smaller granularity. An even smaller granularity might be Professor Zittrain during a given conversation or on a given topic. For example, he might be much more credible when speaking about issues concerning the Internet and Society than about 12th century French poetry.[w][x][y]
Reasoning about the credibility of content[z][aa] similarly requires working with a range of granularities.[ab][ac][ad] For example, we might consider the credibility of a snapshot of Wikipedia, or of a particular Wikipedia page, or the text of a paragraph, or a specific claim.
In general, when working with metadata about web content, it often works well to design for web pages, then extend the granularity up to origins (loosely, domain names) and down to fragments (loosely, portions of a page or portions of media shown on the page). Note that the Web Annotation framework specifies a way to make essentially any portion of any version of a page its own addressable fragment, although that is not yet widely implemented.
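For illustration, here is how a Web Annotation can target an exact quotation within a page as its own addressable fragment, using a TextQuoteSelector. The URL and text below are invented; the structure follows the W3C Web Annotation Data Model:

```json
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "type": "Annotation",
  "motivation": "assessing",
  "body": {
    "type": "TextualBody",
    "value": "This claim is disputed."
  },
  "target": {
    "source": "https://example.org/article",
    "selector": {
      "type": "TextQuoteSelector",
      "prefix": "researchers found that ",
      "exact": "the treatment cured 90% of patients",
      "suffix": " in a small trial"
    }
  }
}
```

The prefix/suffix context lets a consuming tool re-locate the quoted span even after minor edits elsewhere on the page, which is one reason this model suits credibility metadata at fine granularity.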
Sites which aggregate content from other providers, including social media sites, will need special consideration, as will sites which analyze or review other content, sometimes including false content as an example or evidence.
Credibility ratings for a web page may at first seem to be just about whether the content is trustworthy, but in practice websites can change their content or even secretly show different content to different observers, so the trustworthiness[ae] of the site maintainer is a factor as well. For assessments to securely apply to just the intended content of a page and not also include its provider, some mechanism is needed to securely indicate the exact content being considered[af].
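One possible mechanism, sketched here in Python with hypothetical field names, is to bind each assessment to a cryptographic digest of the exact text it covers, so software can detect when a page no longer shows what was assessed. This is a minimal sketch, not an endorsed design:

```python
import hashlib


def content_digest(text: str) -> str:
    """Return a hex SHA-256 digest of normalized content, so an
    assessment can be bound to the exact text it was made about."""
    normalized = " ".join(text.split())  # ignore whitespace-only differences
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def assessment_applies(assessment: dict, current_text: str) -> bool:
    """Check whether a stored assessment still matches what the page shows."""
    return assessment["contentDigest"] == content_digest(current_text)


reviewed = "Drinking bleach does not cure anything."
assessment = {"rating": "accurate", "contentDigest": content_digest(reviewed)}

assert assessment_applies(assessment, reviewed)
# Whitespace/reflow changes don't break the binding...
assert assessment_applies(assessment, "Drinking  bleach does\nnot cure anything.")
# ...but a substantive edit does.
assert not assessment_applies(assessment, "Drinking bleach cures everything.")
```

A real mechanism would also need to handle markup, images, and deliberate near-duplicate evasion, but the principle is the same: the assessment names the bytes it applies to, not just the URL.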
[Move this section to the Intro, maybe under Terminology?]
Techniques for assessing credibility can be grouped roughly into the following four types of strategies. These groupings were originally based on particular projects, each of which focused on only one of the areas. Each of these strategies could be applied with a wide range of computer assistance, and some technologies will likely combine elements of several strategies.
Users might experience credibility assessment tools in a variety of ways, including:
We can divide the UX into three main modes of interaction based on the user’s state of mind around credibility assessment.[ah][ai]
These only help when the user consciously wonders whether to trust the content.
Most of these are perhaps best done at the claim level, where the user indicates a specific questionable claim in the content, rather than a whole page or site.
These risk triggering a backfire effect, as users may cling to false beliefs more strongly when corrected. Proper presentation, such as attributing the correction to a user-selected trusted source, might help with this.
Some of these could impinge on the user's autonomy[am] or confuse the user.
Many of these actions have strong privacy and data ownership considerations. If these were built into the browser, where would the data end up shared? If these are built into anything else, how does that not centralize the feature and create a new silo?
(This section needs some actual threat modeling, still)
Attackers cover a wide range; some characteristics are listed below, but note that attackers sometimes mask their motives.
See also https://medium.com/1st-draft/fake-news-its-complicated-d0f773766c79
See landscape spreadsheet for now
Maybe indicate active liaisons, and describe relationship, and which strategies they connect to.
[Maybe flesh these out in terms of the specific technical solutions ]
[ Talk about funding. Maybe there should be grant and investment and bootcamp programs around these interventions. ]
These are grouped by Assessment Strategy for clarity.
[Maybe rephrase in terms of a particular software products that could be built, and what the desired impact would be. Hopefully fill in, in time, with products that do that. Maybe useful for a hackathon, incubator, social investment fund.]
People in step 1 might be motivated by:
Machines programmed/trained to do step 1 will probably be faster and cheaper, and might in time become more accurate and trustworthy. They will likely also have unpredictable failure modes (bugs, unexpected training artifacts) which might pose a security vulnerability to be considered.
Gameability: signals which are financially prohibitive to fake
- attacker can't use best tools (reduce income)
- attacker needs expensive people/tools to create (increase cost)
See issues and discussion at https://github.com/w3c/cred-claims
Some of these rely on emerging technologies:
Zoom out to many items across time, and you get a sort of automated reputation: this provider's fact-check history. This can lead into the public reputation, which is vital to the business model of some providers.
-- white lists
-- what do you say about others
-- what do others say about some content
- Allow individuals and organizations to express their view on the credibility of others
- Agree on certain levels, eg "good faith participant"
- Have certain trusted agencies which establish reliable identity of
- Have certain trusted agencies vet self-disclosure statements
* Software working on your behalf can try to ascertain the reputation of each website you visit based on what has been said about them by entities you select, and by intermediate parties in a social graph.
* Software can aggregate what others have said about a particular web page that bears on its credibility
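The aggregation described in the bullets above can be sketched minimally in Python. The source names are invented and the trust model is deliberately naive (a real system would propagate trust through the social graph and resist manipulation):

```python
from collections import defaultdict


def aggregate_reputation(statements, trust):
    """Trust-weighted average of what selected entities say about each site.

    statements: iterable of (source, site, score) tuples, score in [0, 1]
    trust:      mapping of source -> weight chosen by the user (or derived
                from intermediaries in a social graph)
    Statements from sources with no assigned trust are ignored.
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for source, site, score in statements:
        w = trust.get(source, 0.0)
        totals[site] += w * score
        weights[site] += w
    return {site: totals[site] / weights[site]
            for site in totals if weights[site] > 0}


statements = [
    ("factchecker.example", "news.example", 0.9),
    ("watchdog.example", "news.example", 0.5),
    ("unknown.example", "news.example", 0.0),  # untrusted source, ignored
]
trust = {"factchecker.example": 1.0, "watchdog.example": 0.5}

rep = aggregate_reputation(statements, trust)
# news.example: (1.0*0.9 + 0.5*0.5) / 1.5, roughly 0.767
```

Even this toy version shows the key design property: the end user, not a central authority, chooses whose statements count and how much.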
-- what can one say about oneself to be more visibly trustworthy[be]
- Label article types (eg opinion)
- Label account types (eg parody)
* Software can highlight sites which offer transparency disclosures
* Software can highlight aspects of the transparency disclosures which might be especially relevant to you, based on your values or stated
* Software can particularly highlight supporting and refuting claims from known 3rd parties about self-descriptive claims. (Combines with reputation)
Each of the below potential standards is intended to allow systems to interoperate by sharing data with common syntax and semantics[bf]. The expectation is that data sharing will be done in a manner compatible with schema.org, using W3C standard data technologies like JSON-LD. These build on the underlying general graph model of RDF and support related technologies like the SPARQL query language, but can usually be used as ordinary JSON embedded in web pages. In each case, the purpose is to allow various independent systems to provide data feeds which can be easily and unambiguously understood by other systems across the Web.
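As one existing instance of this pattern, schema.org's ClaimReview markup is already published as JSON-LD by fact-checkers and consumed by search engines. The names and values below are invented for illustration; the properties are the standard schema.org ones:

```json
{
  "@context": "https://schema.org",
  "@type": "ClaimReview",
  "url": "https://factchecker.example/reviews/123",
  "claimReviewed": "The moon is made of cheese",
  "author": {
    "@type": "Organization",
    "name": "Example Fact Checker"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": 1,
    "bestRating": 5,
    "alternateName": "False"
  },
  "itemReviewed": {
    "@type": "Claim",
    "appearance": {
      "@type": "CreativeWork",
      "url": "https://news.example/story"
    }
  }
}
```

Because this is ordinary JSON-LD embedded in a page, any independent system can harvest such feeds without a bilateral agreement with the publisher, which is exactly the interoperability property the proposed standards aim for.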
Standard for expressing features of web pages and their content which bear on credibility and which are relatively expensive to fake. When humans are recognizing the features, this would allow their work to be shared. When software is recognizing the features, it would allow a more standard API between modules. Standardization would also allow people to be trained to work with these features in creating content and in mentally assessing credibility.
Standards for reputation of providers:
Standards for reputation of content:
Some work already done with the Trust Project and schema.org. Unclear what else might be valuable.
Define a way for one provider to embed content from another without endorsement. This would do for credibility a bit what nofollow did for SEO. An iframe would be the natural web way to do this, but 3rd-party content on social media doesn’t currently work that way. There is also the question of whether the embedding site can be entirely blameless for harmful 3rd-party content; perhaps it can embed it with indicators about what editorial/review process it followed, if any.
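For illustration only: the first snippet below uses mechanisms that exist today (the nofollow link type and the iframe sandbox attribute); the data-embed-relation attribute in the second is invented here to show what a machine-readable non-endorsement marker might look like:

```html
<!-- Today: link without passing endorsement for ranking purposes, and
     embed with sandbox to limit what the content can do. Nothing,
     however, marks the embed itself as non-endorsed. -->
<a href="https://other.example/story" rel="nofollow">disputed story</a>
<iframe src="https://other.example/story" sandbox></iframe>

<!-- Hypothetical future marker (not a real standard attribute):
     the embedder quotes the content as evidence without vouching for it. -->
<iframe src="https://other.example/story" sandbox
        data-embed-relation="quoted-not-endorsed"></iframe>
```

A standardized vocabulary along these lines would let crawlers and credibility tools distinguish "this site spreads X" from "this site discusses or debunks X", which the plain link graph cannot express.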
[Please add more, as long as you’ve read it and honestly recommend it. Also, link to external lists.]
See: March 2018 Review of Scientific Literature (Tucker et al)
Design Solutions for Fake News is a sprawling (200+ page) crowd-sourced Google-Doc, initiated by Eli Pariser after the 2016 US Election. It approaches many of the same ideas as this document, from some different angles.
Information Disorder: Toward an interdisciplinary framework for research and policy making by Claire Wardle and Hossein Derakhshan of First Draft includes specific recommendations for technology companies.
The Partisan Brain: Why People Are Attracted To Fake News And What To Do About It
Claire Wardle’s Misinformation Reading List
Web Literacy for Student Fact-Checkers
The Righteous Mind, by Jonathan Haidt
Factfulness: Ten Reasons We’re Wrong About the World -- and Why Things Are Better Than You Think, by Hans Rosling
We are grateful for review comments from these individuals. These people do not necessarily endorse this document. Please add your name/affiliation here if submitting review comments.
[a]Reminds me of my own model for systems of truth: https://docs.google.com/document/d/1rCjxzKakxtqGxunNJTvLl3wHQ9a1xKbOJs-1qlCJn_c/edit
The way I break it down, there's the phenomenon itself, observers, publishers, and platforms.
[b]maybe worth noting that these roles aren't mutually exclusive -- also speaks a little to why the situation is so complicated
[c]i also wonder if the role of "commenter" -- more than a consumer, has gone online publicly in some way around the content -- is also worth isolating apart from 'provider'. this is a bit related to a sharer/friends, who sometimes contextualizes the share with commentary. in other words i think the creator/author/publisher role is perhaps significantly different from the commenter/sharer one?
[d]Making that chain of provenance explicit and visible would be one important kind of credibility indicator.
[e]and sorts/ranks. current definition seems a bit more passive for the kind of role this is now, what do you think?
[f]This makes it appear that you are choosing to drive the trolley or not. Where really your choice is what to run over.
[g]How is that the case for W3C and the 99.9% of tech professionals who aren't doing anything professional about disinformation? For them it's a question of whether to get involved in this problem space or not. And we don't want them to, if they're just going to make things worse, right?
[h]I agree that interventions (redirecting the trolley) can definitely be worse than not intervening.
But regardless, if you are involved in tech standards or work in an organization that impacts public discourse, you are driving the trolley (with varying levels of control and potential casualties). You're just not able to see those potential casualties because you don't know where to look.
It's definitely possible that not intervening could have catastrophic consequences.
Perhaps this section would make more sense to me as "tradeoffs of intervention", where you list hazards of intervention and hazards of non-intervention.
[i]trying to think of a different way of saying this cuz i think i know what you are saying but it's taken me a while to unpack. maybe need another sentence in between what seems straightforward to people and the agnostic ability for a system to sift thru content?
[k]I would suggest using "publishers" here instead of "journalists" because it is publications that make decisions about allowing anonymity.
[l]This whole paragraph makes a number of statements which may be too prescriptive/broad. Might be better to frame as "here is a set of perspectives and their pros/cons" as opposed to "here is what is best".
[m]I think it may be worth acknowledging that "pseudo-centralization" may be needed given that this is partly an adversarial and spam problem. E.g. the way email is essentially centralized now in practice despite being a generic protocol.
[n]This is a false choice. Standardization does NOT require "centralization". To the contrary, standardization enables rational decentralization. Witness the relatively simple standards enabling the Web. In my view, the next step that is clearly required to mature the Web is to specify "declarative" standards (e.g., XSDs) for each and every type of "content" that is deemed worthy of caring about its credibility. If no one cares enough to specify its semantics and structure, why should we?
[o]I don't quite understand. Obviously I agree many standards do enable decentralization. But I think standardizing on one entity to decide what constitutes legitimate content would be highly centralizing, and I think that's a common first approach to "fake news" as we see in some countries. This section is trying to say Dont Do That.
[p]..."which still may require behind-the-scenes structuring via vocabularies to enable those choices" (checking if am i understanding this correctly or no)
[q]It's worth highlighting the literature that notes that polarization is not necessarily a bad thing https://twitter.com/rasmus_kleis/status/1013822378697265152
Cleaner summary: https://twitter.com/rasmus_kleis/status/1014071175415246848
[r]http://graphics.wsj.com/blue-feed-red-feed/ is one effort to enable people to "step outside"
[s]Press councils can be another model: http://www.aipce.net/whatIsAPressCouncil.html
[v]A whole lot of what Mike Caulfield teaches has to do with looking upstream to find the origin of something that's traveled through intermediaries. Ways to make this hard work easier, if not automatic, would really help.
[w]I think it would be helpful for others reading this to describe this in abstract/scalable terms first, then provide the examples as shown.
From what I'm reading, it sounds like you're arguing that credibility of an entity is dependent on the topic at hand (seemingly dependent on their experience with the topic). And furthermore, entities can be individuals or collectives, the latter's credibility often being some aggregate of its members.
[x]I think it'd also be useful to break down the other things that could affect one's established credibility. For example, the thoroughness of one's journalism practices could be a better indicator of credibility than experience in some cases.
[y]That's an interesting point. Credibility from authority needs to be based on the subject, but journalists move from subject to subject frequently as they change assignments. Indicators of authority need to be bounded by subject matter, but they also need to be agile enough that they can keep up with a journalist.
[z]Is multimedia content (images/videos) and its authenticity and use also in the scope of this work? If yes, it would make sense to include it in the examples that follow.
[aa]Yes, good idea
[ab]This brings up a question in my head: at what point of granularity does one stop considering "credibility" and start considering "validity"?
To me at least, it seems like credibility deals more with the how people choose to believe (or not believe) the entities you mentioned above.
[ac]On the other hand, content and claims in a vacuum (without an owner) can either be said to be true or not.
...I'm thinking out loud here, though. I could see an argument for such claims to be "credible" by themselves based on certain attributes of their delivery or how well they align with the observer's worldview.
Just something to think about.
[ad]Yeah, i think probably we need a definition/comment about "credibility", and how it connects to truth, validity, and my favorite term, "accuracy".
[ae]trustworthiness or clear practice of publication protocol? also thinking that probably some bloggers and small outfits are trustworthy, but they take advantage of the editorial freedom that online publishing gives them, and dink with texts now and then. (which is why access timedate stamp is part of some citation processes)
[af]And its change history! The Wayback helps in a coarsely granular way. https://twitter.com/nyt_diff does a much more focused thing. But there's lots of space along the continuum between those extremes.
[ag]Presumably this can be tool-supported, when we're able to train recognizers with our initially-human-assigned indicators.
[ah]These modes are interesting, I presume they come from general observations. Or has there been some research into this?
[ai]I'm hoping someone brings in relevant research but for now, yeah, it's from thinking about this problem
[aj]Like comments sections, this could become an invitation for partisans to troll and express their doubts about viewpoints that differ from theirs, regardless of any evidence presented.
[ak]the three modes have overlap. for instance, users could choose to turn on filtering/blocking or tweak their personal algorithm to demote certain content. I think there is an assumption that all of these are platform operated but they need not be.
[al]I suspect this has been the dominant approach among those sites that are concerned about credibility. Display of trustmarks, esp. for health information sites, and awards are examples of passive guide posts. I would suggest that most users are not likely to be consciously focused on credibility issues unless they come across something that jars their sense of knowledge. Yet, a UX that includes credibility support mechanisms and that provides the least resistance to the user's exploration and inquiries will be more trusted.
[am]It may be possible to "nudge" folks without unduly impinging upon their autonomy. As Thaler & Sunstein have suggested, the key is to ensure that it is easy for them to make another choice if they wish.
[an]Is this an "attack" then? Should it be included here? Or, should section 7 be renamed "Influence Models" and then broken down into subsections that reflect increasing degree of threat/attack?
[ao]thought the example of zignal labs and stock market price/reputation attack correlation was interesting
[ap]This name feels vague and covers some of the other categories. I think it could be split up too - doing it for the lulz is different than inciting harassment out of anger.
[aq]I was trying to put "doing it for the lulz" into #5, but ... it'd be nice if there was an existing enumeration of this stuff. Agreed this name is lousy.
[ar]maybe we can lean on FD typology of motivations: https://medium.com/1st-draft/fake-news-its-complicated-d0f773766c79
[as]D&S Report: https://datasociety.net/pubs/oh/DataAndSociety_MediaManipulationAndDisinformationOnline.pdf
see language on: "The preservation of ambiguity (Poe’s Law: “Without a clear indication of the author’s intent, it is difficult or impossible to tell the difference between an expression of sincere extremism and a parody of extremism.”16)"
[at]Even engaging in violent behaviour against specific groups of people.
[au]making certain people's lives miserable
[av]Hmmmm. What's the intended impact there? Like, an intermediate impact is to make them miserable, maybe it's seen as punishing them, but why? Or is that in practice not knowable. In the case of eg sexism it seems to be about shaping the social order, or something like that.
[ax]We human beings can and should make it easier for intelligence augmentation tools to help us understand the relevance, value, and validity of what we see and hear. What would be required of us is to exercise more discipline in the structure and semantics of the information that we create and share. Helping us do that is the purpose of XML schemas (XSDs). At some point, information that cannot be readily validated will be properly evaluated and discounted as virtually worthless.
[ay]This section looks good!
[az]This shifts the trust target from the document to the tool, the signals it extracts, and the user ability to interpret them. We might want to clarify these aspects as well.
[ba]Do we want to discuss, here, the platform vs browser dynamic w/respect to how fact-checks are (or can be) found and displayed relative to content?
[bb]today i have been thinking about the efforts to fact-check the fact checkers (observe the observers), which puts things into a bit of a loop -- PolitiFact as making a claim itself, which gets checked by organizations like this reported conservative one https://www.newsbusters.org/fact-checkers (the idea that fact checking is liberal: https://www.washingtontimes.com/news/2018/mar/27/conservative-project-seeks-fact-check-fact-checker/) so thinking a bit about how this looks under both Inspection and Corroboration
[bc]aren't all these related to Inspection/credibility indicators?
[bd]There seems to be a practical distinction between the credibility signals that are intrinsic to the work and those which are added explicitly in the name of disclosure/transparency. It's true the label "parody" could be done under either mode. In practice, I think the machine-readable "parody" label goes here while a human-readable one, which might in fact be misunderstood or unclear, would be under Inspection.
[bg]Lots of issues with this though in current form.
[bh]Should we perhaps link to https://github.com/w3c/cred-claims and https://github.com/schemaorg/schemaorg/issues/1969 to make the point there is more to do?
[bi]What about Expressing the "credibility history" of a provider (e.g. past verified or debunked posts)? It is somewhat related to bias, but more explicitly linked with a credibility prior.
[bj]That strikes me as more Corroboration -- claims which are verified or debunked -- just with a broader time horizon. That should definitely be added here somewhere. And yeah, as a user, I might expect that to be more about reputation than corroboration. Hrrrrrrrm.