//credweb.org/minutes/20180502

Minutes: Credible Web CG (02 May 2018)

Agenda
Attendees: sam boyer, Vagner Diniz, Scott Yates, Amy Zhang, smyles, An Xiao Mina, michel weksler, Davide Ceolin, Reto Gmür, John Connuck, Tim Weninger, Sandro Hawke, Newton Calegari, Cheryl Langdon-Orr
Scribe(s): sam boyer

Minutes

# [Sandro Hawke] Today's agenda: https://credweb.org/agenda/20180502.html

Approval of last week's minutes

# Sandro Hawke says: he'd like a vote on approving the preceding two meetings' minutes, modulo attendees list

# [Vagner Diniz] +1 more time

# Sandro Hawke says: a few folks would like more time to review, so we'll defer approval of minutes to next week

# Sandro Hawke says: today's meeting is about the solution landscape part of chartering.

Calendar

# [Scott Yates] https://comocontentmoderationatscal2018.sched.com/

# Scott Yates says: he is going to be at the Content Moderation at Scale conference, which is potentially related to our field of interest

# [Scott Yates] And I'll be happy to share notes after

# An Xiao Mina says: RightsCon in Toronto will be happening soon, which is also close to our interests. @Amy Zhang will be there on a panel on AI

# [An Xiao Mina] Doodle poll for in-person meeting https://doodle.com/poll/qudimieecq8kapdn

# Sandro Hawke says: he has reviewed the scheduling Doodle. Pretty good availability at the end of July, though some overlap with Decentralized Web Summit. If you haven't filled out the poll, please do so!

Introductions

# Sandro Hawke says: let's try to do fairly quick intros, as we do have a couple of new people. Alphabetical order by first name

# [Scott Yates] My name is Scott Yates, Entrepreneur In Residence for CableLabs, the research arm of the cable/broadband industry. I'm interested broadly in any solution that increases trust in the internet, so much of which is transported over broadband wires from cable providers.

My background is in journalism (NYU and several print publications) and three startups, including one named for a dog!

# [An Xiao Mina] I'm An. Co-chair with Sandro and completely new to the web standards world but have long been looking at misinformation, trust and social media. Director of product at Meedan, where we build tools for collaborative verification, and affiliated with the Berkman Klein Center, have been looking at networked journalism and networked movements in global contexts. Co-founded the Credibility Coalition, which helped formulate this group. Our focus is on statistical validation of indicators of content credibility that journalists and media researchers are considering, through an annotation workflow that we tested with 40 articles. Just presented this at the Web Conference in Lyon, shortly after presenting it three weeks ago at the International Journalism Festival in Perugia, Italy. Here's the paper, which @Amy Zhang led: https://credibilitycoalition.org/results/

# [Jon Udell] Jon Udell, director of integration at Hypothes.is. I'm looking for ways to improve the quality of the ClaimReview metadata that's being produced by leading fact-checking orgs, and used (in a limited way) by Google and Bing.

# [michel weksler] I'm Michel Weksler, an engineer on the payments team at Airbnb. I am also Airbnb's representative to the W3C. Airbnb is generally interested in credibility, and I am too, though this is not related to my day-to-day work.

# [Amy Zhang] I am Amy Zhang, based out of Cambridge, MA. I am a PhD student at MIT CSAIL, and my research is on developing tools to enrich and improve online discussion. As a result, issues affecting online discourse such as misinformation and its social propagation are very interesting to me. I have experience as a software engineer (primarily web, front and backend) as well as UX design and HCI research. Recently, I have been working with the Credibility Coalition (https://credibilitycoalition.org) to facilitate research around article credibility indicators and their definition and annotation. You can find me on Twitter at @amyxzh.

# [Amy Zhang] Farnaz: Also a PhD student at MIT CSAIL. The research project I'm working on is a collaborative news reading platform where users can rely on the entities they trust and their perceived credibility to filter their news for them.

# [John Connuck] I'm @John Connuck representing Facebook. I'm an engineer on our news team focused on distinguishing high-quality, credible news

# [Reto Gmür] I'm with the Swiss startup factsmission, which wants to promote more fact-oriented discourse by putting linked data technologies into action. An open source linked data tool to write claim-reviews for tweets is https://twee.fi

# [Tim Weninger] I am Tim Weninger, professor of computer science and engineering at the University of Notre Dame. I study social media manipulation, rating effects, and social influence biases in online social systems.

# [smyles] I am Stuart Myles, Director of Information Management at The Associated Press in NYC. I direct metadata strategy throughout AP's global news operations. I am also Chairman of the Board of the IPTC, the news technology standards body, where a particular area of interest is trust in the media. https://iptc.org/

# [michel weksler] I need to step out for ~20-30 minutes

# [Davide Ceolin] I'm Davide Ceolin, researcher at CWI, Amsterdam. I'm interested in information quality assessment online. Other interests include social media analysis, trust management, semantic web standards.

# [Ed Summers] I'm Ed Summers, a researcher at MITH at the University of Maryland. I'm interested in the use of web archives (e.g. archive.org) in establishing credibility on the web.

# [Vagner Diniz] I am the head of the Web Technologies Study Center at NIC.br in Brazil. NIC.br has been a W3C member since 2006 and has hosted the W3C Brazil office since 2008. NIC.br's core activity is Internet domain name registration for the country code .BR. I am an electronic engineer with a master's in Public Administration. Our main concern is how to make the Web more trustworthy, considering our local agenda with general elections coming next October.

Chartering

# Sandro Hawke says: chartering: he tried to write down a document that he called "landscape" instead of "charter", because he wants to be clear we're not actually talking about the charter yet - need more time for that to coalesce.

# [An Xiao Mina] landscape doc https://w3c.github.io/credweb/landscape.html

# [Sandro Hawke] https://credweb.org/landscape.html

# Sandro Hawke says: the document, as written, has three categories of approaches. Self-reporting - systems for self-reporting on credibility bits. Claim-based - assessing the details of the meaning of content. Reputation - systems focused on networks of trustworthiness for information producers.

Out of Scope

# Sandro Hawke says: maybe the interesting thing to do initially is to focus on the things that are out of scope.

# Sandro Hawke says: he thinks the IEEE effort started with this kind of attempt

# John Connuck says: please elaborate on what these mean

# [smyles] http://sites.ieee.org/sagroups-7011/

# Sandro Hawke says: some people think the way to solve the fake news problem is to give a stamp of approval from a "real" news organization.

# [sam boyer] this makes sense for certification, but maybe not quality metrics

# Sandro Hawke says: maybe we should split out quality metrics

# Sandro Hawke says: we might name some things that increase legitimacy, but not actually saying "this is a complete list"

# David Karger says: we could have different entities putting up the standards for quality in their own formats

# Owen Ambur says: he agrees - ideally they should be published in a machine-readable language, like StratML v2

# David Karger says: exactly, that's what he had in mind

# Sandro Hawke says: item 2 - assessment of credibility of content itself. we're not going to do this beyond giving examples.

# [Jon Udell] Out of scope: "Specifying quality metrics or certification standards for journalistic practice or other content providers"

But in scope (@karger): Mechanism by which providers can self-assert their use of such metrics and standards.

# Sandro Hawke says: third - developing algorithms or ML training sets to allow machines to assess credibility. Lots of people in this meeting will be doing these things as part of other projects, but not necessarily in scope here

# Reto Gmür says: he is concerned that cutting out algorithms may be difficult, as it is hard to describe what a reputational trust network is without defining an algorithm

# Sandro Hawke says: maybe just say "not a sophisticated algorithm"

# David Karger says: he thinks we can get pretty far on reputation networks without having to define any specific algorithms

# [smyles] q+

# [Ed Summers] aside: isn't 3 redundant given 2?

# Reto Gmür says: he's not so sure

# David Karger says: this is why he suggested it would be helpful to limit to e.g., algorithms that produce boolean output, vs. scalar output - it sufficiently constrains, perhaps, the algorithms we might use. And also that we have to figure out if we consider trustworthiness to be boolean or not

# Stuart Myles says: he thinks maybe we could outline the algorithm, without implementing it

# Sandro Hawke says: again, reasonable for people who are in the group to do this work, but not necessarily the focus of the group

# [Cheryl Langdon-Orr] + Late addition to roll call, joined meeting around 30 mins after call start, apologies

# [An Xiao Mina] could 2 and 3 be mashed up? "Assessment of credibility content or sources, beyond examples and test data used for development and education, including developing algorithms or constructing machine-learning training sets to enable machines to assess credibility of content or sources"

# Sandro Hawke says: number 4 - any technology that disempowers users or is likely to lead to increased censorship

# Amy Zhang says: how can we be sure of that?

# Sandro Hawke says: we can't. it's a hammer to settle arguments ahead of time

# John Connuck says: he's uncomfortable with having something that's as loose as this; it could readily shut down conversation about reasonable approaches

# Amy Zhang says: she thinks it's not feasible to include this, because almost anything could lead to it

# Sandro Hawke says: the mission statement for the CG indicates that we're trying to be anti-censorship; this may be sufficient, doesn't need to be in non-goals

# Vagner Diniz says: he thinks this principle is very interesting; when we come across something that seems to have implications in this area, it's probably central to what we're doing. When something like this comes up, we should talk about it!

# [Ed Summers] I agree re: it not being feasible.

# Reto Gmür says: it should be in the out of scope list, but it should be reworded so that it's clear it does not rely on censorship to function

# Sandro Hawke says: that makes sense to me as a way to put it

# Sandro Hawke says: number 5, avoid new systems if possible; reuse existing systems

# [smyles] q+

# Reto Gmür says: could this include terms? are we saying we can't make new ones?

# Sandro Hawke says: we'll definitely be making new terms, so not sure what you're saying

# [smyles] i wanted to ask the same question as @Reto Gmür - we're not saying "nothing new" we're just saying we want to build on existing technologies where possible

# sam boyer says: this is a hard problem, we should be free to make new things if we need them

# Sandro Hawke says: it's about a default - start by trying to reuse existing things. escape hatch is making new things. very strong preference for reusing existing standards

# Sandro Hawke says: number 6 - assessing hate speech or conversational health. he thinks this isn't in scope for us, but this was a late addition. there's enough to do without us focusing on that

# Sandro Hawke says: @John Connuck noted the other day that from far away, credibility and hate speech look different, but when you look up close they separate

# An Xiao Mina says: it's conceptually difficult to draw a boundary between hate speech and credibility. e.g., if you're spreading misinformation about an ethnic group, what you're doing is spreading hate speech. we might be able to draw a boundary, but misinformation about a group can often end up creating the conditions for hate speech.

# An Xiao Mina says: so, in principle this is right, but in practice this might be fuzzy

# Sandro Hawke says: the in-scope/out-of-scope decisions are up to the chair; this doesn't need to stand up in a court of law, just to ourselves

# Amy Zhang says: maybe the line could be information vs. behavior. behavior as thinking about actions of a user, their intent

# Sandro Hawke says: the intent to spread misinformation is in our territory

# Amy Zhang says: are we focused on the intent to spread misinformation, or the misinformation itself?

# John Connuck says: he'd like us to expand the scope beyond, or at least not focus overly on, the specific spreading of misinformation, and toward distinguishing high-, medium-, and low-quality publishers

# David Karger says: I'm not sure I understand this distinction

# [Vagner Diniz] q+ to say our focus is web credibility. We should not define it by the one negative aspect of web credibility

# John Connuck says: we could spend all of this group's time on eliminating misinformation. but it's a sub-question of the issue of what makes web publishers credible in general

# Vagner Diniz says: just remember that we should not put ... we should focus on web credibility, therefore what elements make the web more credible. misinformation is one negative aspect of credibility. he would put more effort into defining criteria, indicators, and aspects that may make the web worse

# [Reto Gmür] I think our focus is not the intent but the information (if somebody intends to spread misinformation but by their mistake spreads true information we wouldn't care about the wrong intention)

# Vagner Diniz says: for example, i might try to pay more attention to transparency on the web, transparency of news on the web, and try to find some indicators that identify clearly the sources of the news - all those characteristics that could make the information more transparent without any kind of judgments about the content itself

# Reto Gmür says: but then that's not something for the out of scope list?

# Vagner Diniz says: he was trying to respond to the question of what our focus is

# [Cheryl Langdon-Orr] I like the approach of a focus on transparency so that should be in scope

# Reto Gmür says: if this is supposed to be part of a consensus, then we should drop things that don't have consensus from the out-of-scope list

# Sandro Hawke says: ok sure, i'm fine with that. i'm still viewing this as we don't need to be making decisions about this yet, there's still a lot of time to chew it through

# [Cheryl Langdon-Orr] is it not a tad too early for a consensus call? I prefer at least a second read on things before deleting matters

# David Karger says: he thinks it will be helpful to not devote a huge amount of time to the development of algorithms, though we need to keep them in mind when we think about representations so that what we create are representations that algorithms can actually use

# [Cheryl Langdon-Orr] We may find a Nexus exists with algorithms as opposed to focus on standardization or creation of them per se... definition here is important for it to be listed as Out Of Scope

# Reto Gmür says: he's ok with the formulation that David made

# [Jon Udell] I like @sam boyer's formulation: "Ok for algos to be out-of-scope so long as we stipulate that representations will be algo-friendly (per @karger)."

# [Ed Summers] apologies, I have to leave a bit early today.

Claim-Based Approaches (Corroboration)

# Sandro Hawke says: Okay a meta point: his sense is that most people in the group are not necessarily interested in all of these approaches

# Jon Udell says: he's interested in all the ways in which we can improve indicators and representations, in general

# Sandro Hawke says: one of the reasons of doing this division is so that we can subdivide into smaller groups, where people can focus on the approaches that they're most interested in

# Sandro Hawke says: the size we have right now is reasonable, but if we grow then we'll definitely need to split down into smaller groups

# [Jon Udell] ^ in particular, improve workflows that produce and disseminate indicators and other metadata.

# Sandro Hawke says: for example, he knows there's a significant group of people in the International Fact Checking Network circle who are not on the call, but are interested in working on this. if we bring them in, we'd definitely want to do that in a subgroup

# Sandro Hawke says: let's talk about claim review, second item

# Sandro Hawke says: is Facebook using ClaimReview?

# John Connuck says: he's not sure

# Sandro Hawke says: ClaimReview is supervised by IFCN, consumed by Google and Bing, but only by explicit member sites of the IFCN (he thinks)

# [smyles] https://twitter.com/factchecknet?lang=en

# [Cheryl Langdon-Orr] I see topic based streams or work tracks are a useful methodology to deploy but we will still (I gather) regularly return work back to the full group for interaction and discussion... Am I correct in this assumption?

# Sandro Hawke says: professional fact-checkers are trying to figure out what to fact-check. Facebook provides a feed; there are a couple efforts to try to provide a feed

# Jon Udell says: the Duke Reporters' Lab is creating a feed, sending out a set of claims that would be good to review. Picked by machine, primarily from interviews

# Sandro Hawke says: right now, the IFCN has ~150 factchecking orgs, often working in their local market, and then publishing the results back to the platforms and on the open web

# Sandro Hawke says: a related project is CrossCheck, for the French elections; he thinks it didn't use anything more sophisticated than Slack, but some set of factcheckers were sharing things that ought to be checked, and sending them out through the same platform. That had credibility because it was reputable newspapers validating the factchecks

# An Xiao Mina says: Meedan helped with CrossCheck, and provided the mechanism for the news orgs to validate the factchecks

# Sandro Hawke says: it worked pretty well, no?

# [Jon Udell] The Duke project, https://reporterslab.org/tag/claimbuster/, finds claims that are candidates for checking, and emails them to fact-checking orgs who may choose to investigate them.

# An Xiao Mina says: it worked pretty well, in particular because these were competing organizations, but they came together in agreement on the checks, which had a significant impact on the credibility of the checks

# [Vagner Diniz] Reputable newspapers also publish fake news.

# Sandro Hawke says: ClaimReview has no way of linking related claims, or representing counterclaims

# [Cheryl Langdon-Orr] the ecosystem around claims (claim review and rating etc.) could still be "gamed", could it not?

# Jon Udell says: origin of the claim is a compound address - URI plus a location within a doc

# [Cheryl Langdon-Orr] It would just require a degree of collaboration (think organised propaganda at one end of the spectrum)

# Sandro Hawke says: the way ClaimReview is formulated right now, it's the plain text of the claim

# Jon Udell says: yes, and it can, though it usually does not, include the original address for the thing
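
For readers unfamiliar with the markup under discussion, here is a minimal sketch of a ClaimReview record, built as a Python dictionary and serialized as JSON-LD. The property names (`claimReviewed`, `itemReviewed`, `reviewRating`, `author`, `url`) are from the schema.org vocabulary. Everything else is an assumption made for illustration: the helper name `make_claim_review`, the example.org addresses, the reviewer name, and the choice to carry the origin address in `itemReviewed` as a `CreativeWork`. It is not a description of what any fact-checking organization actually publishes.

```python
import json


def make_claim_review(claim_text, review_url, rating_label, origin_url=None):
    """Build a minimal ClaimReview record as a JSON-LD style dict.

    Hypothetical helper: the claim is carried as plain text, and the
    address where the claim originally appeared is optional.
    """
    review = {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": review_url,                # address of the fact-check itself
        "claimReviewed": claim_text,      # plain text of the claim
        "reviewRating": {"@type": "Rating", "alternateName": rating_label},
        "author": {"@type": "Organization", "name": "Example Fact Checker"},
    }
    if origin_url is not None:
        # Assumption: one way a publisher might attach the origin address.
        review["itemReviewed"] = {"@type": "CreativeWork", "url": origin_url}
    return review


# Usage with placeholder values only.
record = make_claim_review(
    claim_text="Example claim text",
    review_url="https://example.org/factchecks/1",
    rating_label="False",
    origin_url="https://example.org/speech",
)
print(json.dumps(record, indent=2))
```

When `origin_url` is omitted, the record carries only the plain text of the claim, which is the common case described above.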

# [smyles] not all claims are on web pages... (tv, radio etc.) and what about memes? :thinking_face:

# ??? says: even if there was a statement made by a politician that doesn't have a URI, these could eventually be unified later on within the system

# [smyles] but even for claims on webpages, webpages can be updated

# Jon Udell says: his question is really whether it is a goal of the group to establish a canonical web address for a claim?

# Jon Udell says: in a lot of cases, there actually is an origin

# michel weksler says: are we only talking about text? maybe we're talking about video?

# [Cheryl Langdon-Orr] Human factor (from an end user/consumer perspective) is that trust is often "permanently" associated with a Trusted Source and once established is not (or can be very difficult to) get them to review or reassess the continued validity of the "trust" so primary source is very important

# Sandro Hawke says: identity, appearance, and review of the claim are all separate

# sam boyer says: we can separate claim, appearances, and reviews of claims

# [Sandro Hawke] :-)

# [sam boyer] ;)
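
The three-way separation mentioned above (the identity of a claim, its appearances, and reviews of it) can be sketched as a small data model. This is only an illustration under stated assumptions: the class and field names (`Claim`, `Appearance`, `Review`, `ClaimStore`, `locator`) are hypothetical and were not proposed in the meeting. An appearance follows the "URI plus a location within a doc" idea, and its `uri` may be absent for claims made on TV or radio.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Claim:
    """Identity of a claim, independent of where it was made."""
    claim_id: str
    text: str


@dataclass(frozen=True)
class Appearance:
    """One place the claim showed up; uri is None for offline media."""
    claim_id: str
    uri: Optional[str] = None
    locator: Optional[str] = None  # hypothetical: e.g. the quoted passage


@dataclass(frozen=True)
class Review:
    """One reviewer's assessment of the claim, not of a single appearance."""
    claim_id: str
    reviewer: str
    verdict: str


@dataclass
class ClaimStore:
    """Keeps the three kinds of record apart, joined only by claim_id."""
    claims: Dict[str, Claim] = field(default_factory=dict)
    appearances: List[Appearance] = field(default_factory=list)
    reviews: List[Review] = field(default_factory=list)

    def appearances_of(self, claim_id: str) -> List[Appearance]:
        return [a for a in self.appearances if a.claim_id == claim_id]

    def reviews_of(self, claim_id: str) -> List[Review]:
        return [r for r in self.reviews if r.claim_id == claim_id]


# Usage with placeholder data: one claim, two appearances, one review.
store = ClaimStore()
store.claims["c1"] = Claim("c1", "Example claim text")
store.appearances.append(
    Appearance("c1", uri="https://example.org/article", locator="Example claim text")
)
store.appearances.append(Appearance("c1"))  # e.g. a radio interview, no URI
store.reviews.append(Review("c1", "Example Fact Checker", "false"))
print(len(store.appearances_of("c1")), len(store.reviews_of("c1")))
```

Keeping reviews attached to the claim rather than to an appearance means a single review covers every place the claim appears, including appearances recorded later.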

# Sandro Hawke says: looking a little more at the corroboration section...

# Sandro Hawke says: if something isn't properly contextualized, it can become false over time

# Jon Udell says: was meeting with factcheck.org people, and they had an instance of a claim that was retracted, but echoes of it still existed across the web

# Jon Udell says: remember that wayback machine exists - we still have an anchor for it, even if it's now a 404
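
As a rough illustration of the Wayback Machine point, the sketch below builds an archive address for a page as of a given time, using the Internet Archive's URL form `https://web.archive.org/web/<timestamp>/<original URL>` with a 14-digit `YYYYMMDDhhmmss` timestamp. The function name `wayback_url` and the example.org address are hypothetical. The code only formats a string: it makes no network request and does not check that a capture exists.

```python
from datetime import datetime


def wayback_url(original_url: str, when: datetime) -> str:
    """Format a Wayback Machine address for original_url near time `when`.

    Hypothetical helper: builds the string only, with no lookup.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit timestamp
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


# Usage with a placeholder address and the date of this meeting.
print(wayback_url("http://example.org/retracted-claim", datetime(2018, 5, 2, 12, 0, 0)))
```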

# Reto Gmür says: these context issues could maybe be an item on the list - when claims are evaluated, they get incorporated into the (missed it) - so that their context can be fully appreciated later

# Sandro Hawke says: but how do we do that?

# John Connuck says: seems like we're getting into the weeds, not sure how this improves ClaimReview

# Sandro Hawke says: right - his goal was to end with charter text that says what we're going to do about this, so he wanted to spend some minutes just deciding what sort of issues we might work on

# Sandro Hawke says: meta: was expecting to do the same type of thing on reputation and inspection, but no time for it today

# Sandro Hawke says: has this been useful in terms of understanding what the charter is?

# Sandro Hawke says: should we be splitting into subgroups?

# [Owen Ambur] Comment not made by me but two of the use cases for the StratML standard include political party platforms and candidate "issue statements" or, more specifically, what they plan to do (i.e., their goals and objectives). http://stratml.us/carmel/iso/UC4SwStyle.xml

# [Cheryl Langdon-Orr] Too early to decide on splits at this stage

# David Karger says: he thinks this call did a good job of clarifying what the group will not be doing, but fuzzy on what it will be doing - well, on what work will be done, but not on what will be delivered

Survey

# [Sandro Hawke] If you had to pick one of these 3 to focus on right now, which would it be? Like, you're at a conference, and there are three parallel breakout tracks

# [Scott Yates] I could pick one: Inspection

# [John Connuck] Inspection

# Sandro Hawke says: please type which of the three groups you would pick, if you had to pick just one

# [Reto Gmür] I couldn't decide between 2 and 3

# [Cheryl Langdon-Orr] I am usually in 1 room and remote participating in 2 others ;-)

# [Owen Ambur] Re: "David Karger says he thinks this call did a good job of clarifying what the group will not be doing, but fuzzy on what it will be doing - well, on what work will be done, but not on what will be delivered" -- As the plan comes together, I will render it in StratML format for inclusion in our collection at http://stratml.us/drybridge/index.htm#W3C

# [Jon Udell] Corroboration

# [Vagner Diniz] reputation or claim-based

# [sam boyer] corroboration, on the assumption that it would need to incorporate public social processes for that

# [Cheryl Langdon-Orr] But I suspect I would be sitting in the Reputation room

# [smyles] reputation, although claims and features would be close 2nd and 3rd

# [Cheryl Langdon-Orr] Bye all...

# [Vagner Diniz] thankyou. Bye

# [Cheryl Langdon-Orr] 0430 here so Morning !