
ISWS 2025 - Human-centric AI Evaluation
Irene Celino
1
Human-centric AI Evaluation
Irene Celino - Cefriel - irene.celino@cefriel.com
International Semantic Web Research Summer School - Bertinoro (Italy) - June 10th 2025
2
Multiple Choice
Ice-breaker quiz: Which of the following Knowledge Graphs is well-known and widely used?
DBpasta
GeoMemes
Wikidata
OpenPHACToids
3
Why Knowledge Graphs still matter in the age of AI
“Scientific-technical” reasons
(Other) AI for KG: use of AI to automate or augment KG construction
KG for (other) AI: use of KG to train AI and to ground AI answers/predictions
KG and (other) AI are complementary – KG ground AI, AI helps scale KG
“Business” reasons
KG (and other AI) have a role in any knowledge-intensive tasks
KG (and other AI) can support knowledge workers
4
Multiple Choice
Quiz: Who is a Knowledge Worker?
A person selling knowledge
A person changed by the information they process
A person applying knowledge to manual work
A job title for a know-it-all
5
Definitions of knowledge worker
The knowledge worker puts to work what he has learned in systematic education, that is, concepts, ideas and theories, rather than the man who puts to work manual skill or muscle
Peter Drucker, Management: Tasks, Responsibilities and Practices, 1974
The defining characteristic of knowledge workers is that they are themselves changed by the information they process. (To some extent, this is true of any human being. What distinguishes knowledge workers is that this is their primary motivation and the job they are paid to do.)
Allison Kidd, The Marks are on the Knowledge Worker, CHI 1994
Knowledge workers are individuals whose primary job involves working with information, developing knowledge, and making decisions that drive productivity and innovation. […] Knowledge workers are the most valuable assets of the modern economy, contributing to the growth and competitiveness of organizations across various industries
Peter Drucker, Management challenges for the 21st century, 1999
6
Multiple Choice
Quiz: A recent study by the Max Planck Institute used KGs and LLMs to generate…
A sentient AI that now wants tenure
Paper extraction pipelines
Better grant proposals
Novel research ideas
7
Are the generated research ideas actually interesting and relevant?
Humans (110 research group leaders) rated their level of interest in the generated ideas
These manual annotations were then used to predict which generated ideas are relevant
Results: (1) some AI-generated ideas were genuinely compelling, (2) human feedback is crucial for aligning AI outputs with human expectations
X. Gu and M. Krenn: Interesting Scientific Idea Generation using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders, 2024
8
Why Evaluation Matters: From Performance to Human-Centeredness
Traditional metrics like accuracy and F1 score are not enough. We need to evaluate how AI affects humans: trust, understanding, and decision-making
Where to find suitable metrics to evaluate AI adopting a human-centered perspective?
XAI literature: e.g. human decision accuracy, fairness, trust, understanding
Social sciences!!
Subjective evaluation: e.g. perceived quality of results, perceived usefulness, etc.
J. Ma et al.: OpenHEXAI: An Open-Source Framework for Human-Centered Evaluation of Explainable Machine Learning, 2024
T. Miller, P. Howe, L. Sonenberg: Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences, 2017
T. Miller: Explanation in artificial intelligence: Insights from the social sciences, 2018
9
Multiple Choice
Quiz: Assessing LLM extraction from text for procedural KG creation, some human evaluators said that…
The LLM results were flawless
The steps were “creatively” ordered
Procedures were sound, but emotionally confusing
They would have done better than the LLM
10
Expected (OR unexpected) results from human evaluation of AI
“Fitness for use” (perceived usefulness) is often more important than perceived quality of AI output
V. Carriero, I. Baroni, M. Scrocca, A. Azzini and I. Celino: Human Evaluation of Procedural Knowledge Graph Extraction from Text with Large Language Models, EKAW, 2024
I. Baroni, G. Re Calegari, D. Scandolari, I. Celino: AI-TAM: a model to investigate user acceptance and collaborative intention in human-in-the-loop AI applications, HCJ, 2022
C. Longoni, A. Bonezzi, C. Morewedge: Resistance To Medical Artificial Intelligence, JCR, 2019
Pretty high perceived quality of AI output, but still “prejudice” that humans could do better
“Human touch” may still be preferred even when potentially risky
11
Multiple Choice
Quiz: Who do you blame more for a mistake, humans or machines?
It depends on the context
Humans, imperfect by design
Machines, always with bugs
No one, mistakes build character (and datasets)
12
A/B testing on how humans judge machines
Context-dependent preference for “humans” or for “machines” to execute the same task in the same scenario
Who was perceived to be more at fault for injuring the pedestrian?
The human driver
The driverless car
C. Hidalgo et al.: How Humans Judge Machines, 2021
Who was perceived to be more at fault for misusing the national flag?
The human cleaner
The cleaning robot
13
When Do We Believe the “Machine”? e.g. ChatGPT
Trust (which is a multidimensional construct) in ChatGPT was influenced by pragmatic factors (usefulness, speed) and hedonic factors (entertainment, novelty)
M. Huschens et al.: Do You Trust ChatGPT? - Perceived Credibility of Human and AI-Generated Content, 2023
J. Buchanan, W. Hickman: Do people trust humans more than ChatGPT?, JBE, 2024
Y. Jung et al.: Do We Trust ChatGPT as much as Google Search and Wikipedia?, CHI 2024
Perceived credibility of ChatGPT answers did NOT correlate with actual correctness
Users may trust ChatGPT more for factual or technical tasks, but less for emotional or ethical judgments
14
Multiple Choice
Quiz: In which cases does human reliance increase when explanations are added to LLM answers?
Only on correct LLM answers
Only on incorrect LLM answers
On both correct and incorrect LLM answers
Never, explanations just confuse people more
15
AI (LLM) design choices to shape users’ trust
Inconsistencies in explanations also reduce reliance, suggesting that users are sensitive to logical coherence
S. Kim et al.: Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies, CHI 2025
Another relevant study with similar results:
M. Sadeghi et al.: Explaining the Unexplainable: The Impact of Misleading Explanations on Trust in Unreliable Predictions for Hardly Assessable Tasks, UMAP 2024
Explanations increase reliance - even on incorrect answers! This shows that explanations can be persuasive, regardless of accuracy
Sources reduce overreliance on incorrect answers, helping users calibrate their trust
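One way to quantify the reliance effects above is to split participants' decisions by whether the LLM answer was correct. The following is a minimal Python sketch, not the analysis used by Kim et al.; the `Trial` record, its field names, and the toy data are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    ai_correct: bool   # was the LLM answer correct? (hypothetical field)
    followed_ai: bool  # did the participant adopt the LLM answer? (hypothetical field)


def _rate(flags):
    """Share of True values; NaN when there are no trials of that kind."""
    return sum(flags) / len(flags) if flags else float("nan")


def reliance_rates(trials):
    """Return (reliance on correct answers, overreliance on incorrect answers)."""
    on_correct = [t.followed_ai for t in trials if t.ai_correct]
    on_incorrect = [t.followed_ai for t in trials if not t.ai_correct]
    return _rate(on_correct), _rate(on_incorrect)


# Toy data, invented for illustration only
trials = [
    Trial(True, True), Trial(True, True), Trial(True, False), Trial(True, True),
    Trial(False, True), Trial(False, False), Trial(False, False), Trial(False, True),
]
appropriate, over = reliance_rates(trials)
print(f"reliance on correct answers: {appropriate:.2f}")
print(f"overreliance on incorrect answers: {over:.2f}")
```

Comparing the two rates across conditions (with or without explanations, with or without sources) would show whether a design choice raises reliance in general or only where it is warranted.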
16
What’s in an explanation, after all?
Knowledge Graphs (and Semantic Web technologies at large) can and should have a big role in explanations!
B. Mittelstadt, C. Russell, S. Wachter: Explaining explanations in AI, 2019
S. Chari et al.: Explanation ontology: A model of explanations for user-centered AI, ISWC 2020
F. Lecue: On the role of knowledge graphs in explainable AI, SWJ, 2019
I. Celino: Who is this Explanation for? Human Intelligence and Knowledge Graphs for eXplainable AI, 2020
Explanation as “what the human wants to know” opposed to “scientific explanation of the AI model internal processing”
Human explanations are usually: contrastive (why P and not Q?), selective (not all possible causes, only "relevant" ones), social (dialogue, interaction, iteration)
17
Multiple Choice
Quiz: In human-AI collaboration, what is the relationship between human self-confidence and AI confidence?
Human confidence is higher than AI confidence
Human confidence aligns with AI confidence
Human confidence is lower than AI confidence
Human and AI confidences are not correlated
18
Calibrating confidence, avoiding both over-trust and under-trust
Users’ self-confidence tends to align with the AI’s expressed confidence, leading to miscalibrated self-confidence, especially if the AI is overconfident or underconfident --> AI confidence should be carefully calibrated, especially in high-stakes or collaborative contexts
J. Li et al.: As Confidence Aligns: Exploring the Effect of AI Confidence on Human Self-confidence in Human-AI Decision Making, CHI 2025
Designers should consider how AI behavior shapes human behavior, not just task outcomes --> greater attention to the cognitive alignment between humans and AI - not just functional alignment
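Whether an AI's expressed confidence is calibrated can be checked with a standard measure such as expected calibration error (ECE). ECE is not named in the slides or attributed here to Li et al.; it is used only as an illustration, and the data below is invented. A minimal sketch in plain Python:

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; 1.0 falls in the last bin
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(mean_conf - accuracy)
    return ece


# Toy data, invented for illustration: a system that is overconfident at 0.9
confidences = [0.9, 0.9, 0.9, 0.9, 0.5, 0.5]
correct = [True, True, False, False, True, False]
ece = expected_calibration_error(confidences, correct)
print(f"ECE = {ece:.3f}")
```

A value near zero means stated confidence matches observed accuracy; a large value signals over- or under-confidence that users may absorb into their own self-confidence.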
19
Theory of Mind and Social Intelligence
Theory of Mind is the ability to attribute mental states - beliefs, intents, desires, emotions, knowledge - to oneself and others and to understand that others have beliefs, desires, and intentions that are different from one's own
D. Premack, G. Woodruff: Does the chimpanzee have a theory of mind?, Behavioral and Brain Sciences, 1978
Social intelligence is the ability to reason about others’ beliefs, intentions, and actions
Social intelligence is a critical dimension of human intelligence
Does AI have social intelligence?
20
Multiple Choice
Quiz: Does AI demonstrate a Theory of Mind (ToM) comparable to humans?
No, AI lacks any social reasoning
Yes, but only at a basic level
Yes, AI shows high-order ToM
Only when prompted with emotional emojis
21
AI Social Intelligence and “Reasoning”
Evaluation tasks:
Inverse Reasoning (IR): Inferring the beliefs or goals of others based on their actions
Inverse Inverse Planning (IIP): A more complex task involving recursive reasoning about others’ reasoning
Results:
Humans consistently outperformed GPT models across all tasks.
GPT models showed only basic (order-0) social reasoning, while humans demonstrated higher-order (≥2) reasoning
LLMs often relied on pattern recognition shortcuts rather than genuine social inference
J. Wang et al.: Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities, 2024
another interesting work:
G. Riva et al.: Psychomatics - A Multidisciplinary Framework for Understanding Artificial Minds, 2024
22
Multiple Choice
Quiz: When do Knowledge Workers increase their Critical Thinking?
When the AI makes obvious mistakes
When they trust the AI
When they trust their own judgment
After their third coffee and a motivational quote
23
AI and Critical Thinking
GenAI shifts critical thinking toward information verification and response integration
H. Lee et al.: The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers, CHI 2025
Higher confidence in GenAI is associated with less critical thinking
Higher self-confidence is associated with more critical thinking
24
Multiple Choice
Quiz: In a recent questionnaire, after testing AI tools in their daily job, industry workers declared that...
They have a high trust in AI
They are not confident they will learn to use AI
They find the cognitive load to use AI very low
They are scared their job will be replaced by AI
25
AI, human factors and ethical principles
Ethical principles, guidelines and regulations (e.g. AI Act, HLEG Ethics Guidelines for Trustworthy AI) call for a careful assessment of human factors in relation to the adoption of AI solutions!
A. Azzini, I. Baroni, I. Celino: Assessing human factors in AI adoption by employees: a composite questionnaire for subjective user evaluation, TCAI@HHAI 2025
Fear of job replacement by AI, low trust in AI, and increased cognitive load are real issues to take into account when designing AI tools
The wide popularity and low entry barrier of tools like ChatGPT lead industry employees to perceive AI as easy to learn
26
Human-AI Collaboration!!!
H. Li et al.: Why is AI not a Panacea for Data Workers? An Interview Study on Human-AI Collaboration in Data Storytelling, TVCG 2025
27
Take-home messages
Always ask yourself who will use your AI system
(and design it with the target users in mind)
Always perform human evaluation of AI systems!
(even if only at a qualitative, small-scale level)
Whenever possible, design human-in-the-loop AI systems
(the future is in human-AI collaboration, getting the best of both)
Bonus point: Always challenge the way you are using AI! (video: How Stanford Teaches AI-Powered Creativity https://www.youtube.com/watch?v=wv779vmyPVY)
28
Multiple Choice
Final quiz: In preparing this tutorial, Irene did NOT rely on AI for…
Coming up with wrong quiz answers
Picking the images
Polishing her phrasing
Crafting the tutorial’s storyline
29
Irene Celino - irene.celino@cefriel.com
Cefriel - viale Sarca 226, 20126 Milano - Italy
(images from Unsplash)
Thank you for your participation!