//credweb.org/minutes/20180523

Minutes: Credible Web CG (23 May 2018)

Agenda
Attendees: Sandro Hawke, Newton Calegari, Giovanni Luca Ciampaglia, Scott Yates, Amy Zhang, sam boyer, Jon Udell, An Xiao Mina, Annette Greiner, Cheryl Langdon-Orr, Tzviya Siegman, Farnaz Jahanbakhsh, Julian Moreno Schneider, Ed Summers, Vagner Diniz, Dan Brickley, Ed Bice
Scribe(s): Amy Zhang and An Xiao Mina

Topics

Resolutions

Resolved: approve minutes of 9 May https://credweb.org/minutes/20180509.html

Minutes

# [Sandro Hawke] Agenda https://credweb.org/agenda/20180523.html

# [Sandro Hawke] (looking for good emoji to ack regrets, not seeing it....)

# [sam boyer] (but i will need to be out for a few minutes)

# [Jon Udell] https://credweb.org/agenda/20180523.html

# [Sandro Hawke] Proposed: approve minutes of 9 May https://credweb.org/minutes/20180509.html

# [Sandro Hawke] +1

# [Newton Calegari] +1

# [Annette Greiner] :)

# [Sandro Hawke] Resolved: approve minutes of 9 May https://credweb.org/minutes/20180509.html

# [Cheryl Langdon-Orr] +1

# [Ed Bice] +1

# [Amy Zhang] +1

# [Cheryl Langdon-Orr] Umm I am logged in I believe

# [Cheryl Langdon-Orr] But ...

# Sandro Hawke says: are there any events? Looks like nothing is coming up

# [Scott Yates] If you have events, please do send them to scott@certifiedcontentcoalition.org

# Sandro Hawke says: please fill out the poll for in person meet-up

# [Reto Gmür] link to poll?

# Sandro Hawke says: he created a google calendar which is linked from credweb.org

# [Farnaz Jahanbakhsh] https://doodle.com/poll/qudimieecq8kapdn

events report

# Sandro Hawke says: he was at the W3C meeting. presented slides he mentioned two weeks ago. and did demos

# [Cheryl Langdon-Orr] I will only be remote participant if available regardless of date choice made

# .. confidential meeting but comments were generally positive

# .. added transparency as a 4th category for the draft charter

# An Xiao Mina says: she was at RightsCon with @Amy Zhang and @Ed Bice last week. there was a misinformation track

# .. many folks speaking about digital rights and the misinfo space. 2 panels relevant to our group, with Data & Society and First Draft

# .. will be sending out invites and info about the panel.

# .. 2nd panel with Indian Supreme Court, foreign ministry, State Dept. people interested from the policy side. what are our asks there?

# .. there could be a world in which we structure something for upcoming misinfocon

# Ed Bice says: the interest from that community is indicative of the issue as an anti-censorship issue

# ..verification is a form of anti-censorship. this is a profound change that should guide our work. Zeynep Tufekci has asserted this on occasion.

# Sandro Hawke says: he sent an email about the journalism trust initiative. @Scott Yates is there now.

# [Ed Summers] I'm not sure if this is relevant, but it came out from some folks at RightsCon: The Toronto Declaration: Protecting the rights to equality and non-discrimination in machine learning systems. https://www.accessnow.org/cms/assets/uploads/2018/05/Toronto-Declaration-D0V2.pdf

# Scott Yates says: this is his first standards body meeting. Reporters Without Borders (RSF) has been working on standards.

# .. reporters saying there should be standards for publishers

# .. 60 people in the room. 14-18 months before we have standards. will have 3 at the end of the process.

# .. topics like identity verification, process controls, the Trust Project came up

# .. question of making new European standards vs using Trust Project standards?

# .. overall consensus was to move forward with new standards because of differences in process and details of standards

# Sandro Hawke says: what about machine readability

# Scott Yates says: these will be completely voluntary standards and certification from another body (not part of this effort). there won't be a machine readable signal.

# Ed Bice says: that we have talking points related to the Trust Project. Our work is looking broadly at credibility indicators. We are particularly interested in 3rd party indicators (human or machine), while Trust Project is about 1st person, self-identified indicators.

# [Dan Brickley] aside: having gone through the process of adapting many of the Trust Project results into machine readable (schema.org) markup, that kind of conversion is not a trivial thing to do. Not everything makes sense as markup (or ontology).

# .. response to Trust Project should be - bring these to the processes. otherwise, could be bad for the community. see how we are contributing.

# Sandro Hawke says: everyone knows that google and FB are the elephants in the room here. there's not going to be a lot of space for competing standards. folks will be pushed to be aligned.

# Scott Yates says: there wasn't a rejection of the Trust Project. Just that European standards have a very robust process that they have to go through. The Trust Project will be part of this process going forward.

# .. we should try to maintain communication with different groups.

Area 2 Corroboration

# Sandro Hawke says: this area is a very successful one so far.

# .. are there things here that people are up for collaborating on? where can we help?

# .. question is - what is fact-checking as a profession?

# .. asks Meedan folks (Or Mevan?)

# Mevan says: the process in the UK with FullFact might seem like nitpicking in a country with mass censorship. there is a moving definition.

# .. in short term: strip judgment and allow people to make up their own mind. long term: correct record when they are wrong, like in parliament

# .. long term: take outputs of factchecking and try to understand what is happening at scale. how info is transmitted and received and how to correct this in the long term. looking at release of data.

# .. involves calling a person, finding best available data, talking to people who generated the data to begin with.

# Sandro Hawke says: how much work goes into 1 fact check?

# Mevan says: it can take hours to weeks. some of the best takes months, which becomes investigative journalism.

# ..on average takes about 12 hours. different orgs have different time scales. we have a double review process. checking is often the bottleneck.

# Sandro Hawke says: what is IFCN?

# [An Xiao Mina] IFCN Code of Principles: https://www.poynter.org/international-fact-checking-network-fact-checkers-code-principles

# Mevan says: it is run out of Poynter. International Fact-Checking Network. 10 principles that try to standardize what a good fact-checker looks like. things like transparency of sources, transparency of funding. you must apply and be reviewed yearly to meet standards.

# Sandro Hawke says: how does IFCN manage political bias?

# Mevan says: this is why the code was developed. anyone can join but everybody has to go through that approval process. and their website and content is looked through by someone in that country who understands the political bias.

# Sandro Hawke says: factchecking is a new thing separate from reporting

# Mevan says: they've been going since 2010. factchecking has been systematically stripped out of some news orgs. because of difficulty of raising money for journalism in general.

# .. they've been around for a while.

# Sandro Hawke says: but aren't there 2 meanings to the term? Traditional journalists are supposed to check their facts and sometimes there are separate people that do that. but it's for the article.

# Mevan says: you could argue that it is what journalism is to begin with. FullFact exists because there were fewer factcheckers in the UK and there was an increase of Parliament press releases with incorrect information leading to changes in the law.

# Sandro Hawke says: what can we do to help? this is valuable to society. input side: how to help factcheckers determine what to check. also tools to help factcheckers do the work. and then how to spread the word once they're done.

# ..are there aspects of this that can be scaled outside factcheckers?

# Mevan says: they wrote an essay about crowdsourced factchecking.

# [Ed Bice] link please @Mevan

# [Scott Yates] I'd love to read that essay! post the link here.

# ..the activity of factchecking - I don't think it can be crowdsourced. Attempts in the past have failed.

# .. even the ones that have gotten off the ground are quoting factcheckers instead of doing primary research.

# .. this is because it is boring work. no one wants to do it.

# Sandro Hawke says: what about aggregating news sources and finding what is checkable?

# Mevan says: we read top news and rely on social media as well. have editorial meeting and find a balanced set of things to go on to factcheck. it's not the best system and we're not that happy with it.

# .. monitoring is a big part of that. we scrape newspapers, we collect TV headlines.

# .. we also look at Parliament and what claims come up there. there is this huge corpus of stuff being said and we just built a model that detects claims. we are coming up with a taxonomy of different types of claims.

# .. trying to cluster different types of claims. what are the biggest claims being made?

# .. as of yesterday we submitted a paper to publish the taxonomy

# Sandro Hawke says: are there others that are doing the same thing?

# Mevan says: the main reason why we're doing this is because ClaimBuster is one of the first that did claim detection. they work with Politifact.

# [Ed Bice] would be good to hear from Scott Hale about his work - early work, but closely aligned.

# .. they ask people if this is an important claim. this is one way in which the model works.

# .. importance is subjective though. in the UK, the EU wouldn't have been important 5 years ago. but now it is. importance can change rapidly depending on context.

# .. so our taxonomy tries to decouple importance. so we can be more consistent across lots of countries or organizations.

# Sandro Hawke says: everyone who is identifying checkable claims should also say how important they think they are.

# Mevan says: the output of claim detection could be standardized. but it's far too early. only 3 approaches I know of in the world.

# ..trying different ways to detect claims. no one best way of doing it yet.

# [Ed Summers] aside: It would be interesting to hear a bit more what that social media interaction with FullFact looks like.

# Scott Hale says: how much work goes into ingesting news sources. no comprehensive set of news sources. common corpora of data?

# Mevan says: great segue into technical standards of factchecking discussed at Tech and Check. everyone is scraping and using CrowdTangle and Twitter APIs. trying to find a lightweight standard for the outputs of monitoring so it could be shared across factchecking orgs.

# .. we're keeping it practical. will be useful for factchecking community to share data and resources.

# .. early days. started 4 weeks ago.

# Sandro Hawke says: ways of marking up data to get metadata separate - don't know what kinds of aggregators are out there.

# Dan Brickley says: schema.org didn't have that much news stuff until recently. for 3 years it wasn't getting used for anything. I've seen annotation around subtitles.

# .. class of tools that takes input from anywhere and looks for mentions of entities and topics.

# Mevan says: we are experimenting with that right now.

# Dan Brickley says: how deeply in the science universe are you wading? it shows up in TV news when it comes to science topics. do you try to model scholarly debate?

# Mevan says: I think there are outlets that are welcomed into factchecking community. try to correct when sci papers are incorrectly discussed in media. Science Center in the UK.

# Dan Brickley says: when we did ClaimReview a few years ago - what's the smallest thing we can do in this direction?

# ..Even just figuring out claims in multiple contexts is slippery as hell. dialogue with arxiv.org. field of modeling scientific dialogue. we don't want to get drawn into that without being clear what we're achieving.

# Mevan says: there are heated debates about what is a claim in our office. What I've realized is claim is different in different domains. In FullFact world, there's a particular idea of claim and what can or can't be checked. different from ppl in our communications team or our tech team or our director.

# .. not going to be 1 definition of what a claim is. going to have to be more sophisticated. why I wanted to go down taxonomy side. there are parameters about claims that change over time. importance, geography, data released in the world.

# .. as the checkability of a claim shifts, the claim itself is shifting

# .. it gets far more complex when you think about is this an exact match? what about paraphrases?

# sam boyer says: this is helpful info

# Sandro Hawke says: can you give illustration about difference in interpretation of claims?

# [Scott Yates] +1 for @Mevan 's use of "aneurysm"

# Mevan says: difficulty is whether things are same claim.

# .. even the same claim coming from different orgs may have different meaning because of the context.

# .. lots of ideas that haven't been crystallized here.

# sam boyer says: is this the difference between concept and utterance of that concept?

# Mevan says: when does a claim start being a different claim?

# sam boyer says: with different types of claims, to what extent can you say type of claim is an outgrowth of a methodology, e.g., IFCN Code of Principles? Can we relate method to claim?

# sam boyer says: Is the type of the claim an outgrowth of the method used to produce the claim? Method vs. claim

# Mevan says: not always; "I have 20 years of experience as a social worker" can be checked, but probably not worth checking unless of some unusual national importance

# [Owen Ambur] The Trust Project's plan is now documented in StratML format at http://stratml.us/drybridge/index.htm#TTP

# Sandro Hawke says: claims vs utterances differences could complicate relationships @Dan Brickley

# ...Markup "claims" will allow for including both the utterance and the explicated claim that is actually being checked.
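
The point about carrying both the utterance and the explicated claim can be sketched as follows. This is only an illustration, assuming schema.org's `ClaimReview` and `Claim` types as currently published (the `Claim` type and its `appearance` property may postdate this meeting). The function name `build_claim_review` and every value passed to it are hypothetical, not anything a participant proposed.

```python
import json


def build_claim_review(utterance, explicated_claim, sighting_url, reviewer, verdict):
    """Build a ClaimReview dict that keeps the utterance and the checked claim separate.

    Only the type and property names come from schema.org; the helper itself
    and all example values below are hypothetical.
    """
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        # The explicated claim: what the fact checker actually checked.
        "claimReviewed": explicated_claim,
        "itemReviewed": {
            "@type": "Claim",
            # The utterance: where and how the claim was actually said.
            "appearance": {
                "@type": "CreativeWork",
                "url": sighting_url,
                "text": utterance,
            },
        },
        "reviewRating": {"@type": "Rating", "alternateName": verdict},
        "author": {"@type": "Organization", "name": reviewer},
    }


review = build_claim_review(
    utterance="They have doubled the budget, everybody knows it.",
    explicated_claim="The department's budget doubled between two given years.",
    sighting_url="https://example.org/transcripts/123",
    reviewer="Example Fact Checkers",
    verdict="Unsupported",
)
print(json.dumps(review, indent=2))
```

Keeping the utterance under `itemReviewed` and the explicated claim in `claimReviewed` is one possible split; other arrangements are equally plausible.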

# Mevan says: what a factchecker believes are the same claim and what the public believe are the same claim may differ. Next project is to investigate what (average) people and fact checkers believe to be same claim. It is complicated. We need to be careful of overreaching --- if we say this is a match and should be censored/flagged, we need to be careful of free speech consequences

# Reto Gmür says: reviewing a variant of a claim (translation, different words to say same thing) should be straightforward

# Mevan says: FullFact quality bar is very high and hence still requires human verification, etc.

# Dan Brickley says: difference on use and medium --- TV subtitles vs. web content, etc. --- is important

# Mevan says: yes. Currently not focused on automated tools for public because this is different to fact checker workflows

# sam boyer says: Can decouple claim and fact check? Yes (I missed some of this--sorry)

# [Jon Udell] @Mevan says a claim is an abstraction of a citing.

# Ed Bice says: the same utterance at different time points is same if facts have not changed, but is a different claim if the data has changed

# Sandro Hawke says: location vs time dependent aspects are very much depending on language. Is there better terminology?

# Dan Brickley says: "indexicals" used by philosophers.

# Mevan says: all depends on application. There are some similarities with fact checking, but this group may have different needs/results

# Reto Gmür says: Is it the same claim if the same utterance is made by a different person?

# Mevan says: it would be the same claim, but a different sighting. There would be direct relationship between the two and they would share the same fact check

# Jon Udell says: (asks) Is there a unique id for the abstract for the claim?

# Mevan says: Yes; internal only at the moment. Lots of duplication because we don't have the time to go through them

# ....do we need expiry dates for claims? Who would do that work? All organizations are stretched in terms of resources. FullFact has expiry dates in db/schema but doesn't have resources to populate them at present (but this is super important)
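
The relationships described in this exchange (one abstract claim with an internal id, many sightings sharing one fact check, and an optional expiry date) can be sketched as a small data model. This is a minimal sketch under stated assumptions: the class names, field names, and example values are all hypothetical and do not describe FullFact's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Sighting:
    """One occurrence of a claim: who said it, where, and when (hypothetical fields)."""
    speaker: str
    source_url: str
    seen_on: date


@dataclass
class Claim:
    """An abstract claim; all of its sightings share the same fact check."""
    claim_id: str
    text: str
    fact_check_url: Optional[str] = None
    expires_on: Optional[date] = None  # often left empty for lack of resources
    sightings: List[Sighting] = field(default_factory=list)

    def add_sighting(self, sighting: Sighting) -> None:
        self.sightings.append(sighting)

    def is_current(self, today: date) -> bool:
        # Assumption: a claim with no expiry date is treated as still current.
        return self.expires_on is None or today <= self.expires_on


claim = Claim(
    claim_id="claim-0001",
    text="Unemployment is at its lowest level in a decade.",
    fact_check_url="https://example.org/factchecks/claim-0001",
    expires_on=date(2018, 6, 30),
)
# The same utterance by two different people: same claim, two sightings.
claim.add_sighting(Sighting("Speaker A", "https://example.org/a", date(2018, 5, 1)))
claim.add_sighting(Sighting("Speaker B", "https://example.org/b", date(2018, 5, 20)))

print(len(claim.sightings), claim.is_current(date(2018, 5, 23)))
```

Under this sketch, a repeat of the utterance after `expires_on` would call for a new claim record, since the underlying data may have changed.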

# [sam boyer] i can picture a "Department of Claim Relations," and millennia of philosophers squirming with delight from their graves

# Dan Brickley says: (asks) How long does it take to bring a new person up to speed on fact checking (i.e., new fact checker)?

# Mevan says: there is no good answer. Mostly time and resources. 3-6 month training period at FullFact; 2.5-3 years to become a senior fact checker (has the ability to sign off on a fact check); depends a lot on person

# [Cheryl Langdon-Orr] I thought that @Ed Bice was making the point with the example of 2 identical textual claims being made from a source, say, 24 hrs apart: if, after the 1st and before the 2nd instance, a highly respected (or otherwise) third party reports the data, say a percentage-of-unemployment report, then the 2nd utterance would possibly have a higher level of credibility or trust than the 1st, as a verified current fact... If so then I certainly see that as an important variable

# Dan Brickley says: (asks) Could fact checking be a professional qualification in the future? @Mevan

# Sandro Hawke says: he was hoping to get more towards what can be built and how to get more interoperability, but clearly there are lots of bigger questions that need to be tackled.

# Jon Udell says: (asks) Is part of the charter identifying what is a claim? I.e., FullFact has a claim id that has multiple utterances

# [Sandro Hawke] charter issue: surface the URLs / claim identifiers

# Mevan says: releasing claims data is difficult given some claims have expired, etc.; they want to ensure quality

# [Scott Hale] ?

# Sandro Hawke says: discussing an identifier could be problematic if underlying text changes

# [Dan Brickley] anyone in touch with https://debategraph.org/Stream.aspx?nid=61932&vt=ngraph&dc=focus lately?

# Jon Udell says: (asks if) Releasing data on sightings would be less problematic than releasing claims? Less chance of staleness

# Mevan says: Need to be neutral and fair; need equal recall (currently high precision)

# ....want to get to sharing everything, but need resources to do it properly

# ....want to do good
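
The precision/recall remark can be made concrete with a small calculation. This is a sketch with made-up counts, assuming the measure is applied to matching sightings against known claims; none of the numbers come from FullFact.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a set of flagged matches.

    precision: of everything flagged as a match, how much really was one.
    recall:    of everything that really was a match, how much got flagged.
    """
    flagged = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 100 sightings flagged, 90 of them correct,
# while 210 genuine sightings were never flagged.
precision, recall = precision_recall(
    true_positives=90, false_positives=10, false_negatives=210
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A system tuned this way rarely flags a wrong match but misses most genuine ones. On one reading of the neutrality concern, released data could then look uneven if recall differs between speakers or topics.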

# Reto Gmür says: (asks) if the right to be forgotten in EU is a barrier?

# [Dan Brickley] (when we say 'sightings', they can also be multi-level, e.g. some public figure claims the moon is green cheese, this ends up on the TV news and newspapers, which themselves show up on different websites, media formats, etc.)

# [Cheryl Langdon-Orr] I appreciate your time in helping us better understand this approach @Mevan ... Thanks

# [Annette Greiner] "sighting" or "citing"?

# Ed Bice says: there is no end to the depth of theory/passion on how claims might be standardized, but we need to be practical and have direct value

# Mevan says: extracting metadata (geography, time, etc.) is useful for automated approaches.

# Dan Brickley says: in future, we should discuss the role of Wikipedia

# [Cheryl Langdon-Orr] Bye

# [Vagner Diniz] bye

# [Scott Hale] Thanks all. FYI I've edited my posts above to add "says" after @ mentions as I am told this is needed for the script to extract the data correctly. I've also changed "citings" to "sightings". This is why the comments all say "edited" now. Apologies for the confusion / error in not including "says" in the first instance.

# [Sandro Hawke] Excellent, thank you so much

# [An Xiao Mina] thanks so much for scribing scott :fountain_pen: