ISWS 2025 - Human-centric AI Evaluation

Created by Irene Celino

1

Human-centric AI Evaluation

Irene Celino - Cefriel - irene.celino@cefriel.com

International Semantic Web Research Summer School - Bertinoro (Italy) - June 10th 2025


2

Multiple Choice

Ice-breaker quiz: Which of the following Knowledge Graphs is well-known and widely used?

  1. DBpasta
  2. GeoMemes
  3. Wikidata
  4. OpenPHACToids

3

Why Knowledge Graphs still matter in the age of AI

“Scientific-technical” reasons

  • (Other) AI for KG: use of AI to automate or augment KG construction

  • KG for (other) AI: use of KG to train AI and to ground AI answers/predictions

  • KG and (other) AI are complementary – KG ground AI, AI helps scale KG


“Business” reasons

  • KG (and other AI) have a role in any knowledge-intensive task

  • KG (and other AI) can support knowledge workers


4

Multiple Choice

Quiz: Who is a Knowledge Worker?

  1. A person selling knowledge
  2. A person changed by the information they process
  3. A person applying knowledge to manual work
  4. A job title for a know-it-all

5

Definitions of knowledge worker

The knowledge worker puts to work what he has learned in systematic education, that is, concepts, ideas and theories, rather than the man who puts to work manual skill or muscle

Peter Drucker, Management: Tasks, Responsibilities and Practices, 1974


The defining characteristic of knowledge workers is that they are themselves changed by the information they process. (To some extent, this is true of any human being. What distinguishes knowledge workers is that this is their primary motivation and the job they are paid to do)

Allison Kidd, The Marks are on the Knowledge Worker, CHI 1994


Knowledge workers are individuals whose primary job involves working with information, developing knowledge, and making decisions that drive productivity and innovation. […] Knowledge workers are the most valuable assets of the modern economy, contributing to the growth and competitiveness of organizations across various industries

Peter Drucker, Management challenges for the 21st century, 1999


6

Multiple Choice

Quiz: A recent work by the Max-Planck Institute used KGs and LLMs to generate…

  1. A sentient AI that now wants tenure
  2. Paper extraction pipelines
  3. Better grant proposals
  4. Novel research ideas

7

Are the generated research ideas actually interesting and relevant?

  • Humans (110 research group leaders) expressed interest level on generated ideas

  • The manual annotations were then used to predict which generated ideas would be relevant

  • Results: (1) some AI-generated ideas were genuinely compelling, (2) human feedback is crucial for aligning AI outputs with human expectations

X. Gu and M. Krenn: Interesting Scientific Idea Generation using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders, 2024


8

Why Evaluation Matters: From Performance to Human-Centeredness

  • Traditional metrics like accuracy and F1 score are not enough. We need to evaluate how AI affects humans: trust, understanding, and decision-making

  • Where can we find suitable metrics to evaluate AI from a human-centered perspective?

    • XAI literature: e.g. human decision accuracy, fairness, trust, understanding

    • Social sciences!!

      • Subjective evaluation: e.g. perceived quality of results, perceived usefulness, etc.

J. Ma et al.: "OpenHEXAI: An Open-Source Framework for Human-Centered Evaluation of Explainable Machine Learning", 2024

T. Miller, P. Howe, L. Sonenberg: "Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences", 2017

T. Miller: "Explanation in artificial intelligence: Insights from the social sciences", 2018


9

Multiple Choice

Quiz: Assessing LLM extraction from text for procedural KG creation, some human evaluators said that…

  1. The LLM results were flawless
  2. The steps were “creatively” ordered
  3. Procedures were sound, but emotionally confusing
  4. They would have done better than the LLM

10

Expected (OR unexpected) results from human evaluation of AI

  • “Fitness for use” (perceived usefulness) is often more important than perceived quality of AI output

  • Pretty high perceived quality of AI output, but still a “prejudice” that humans could do better

  • The “human touch” may still be preferred even when potentially risky

V. Carriero, I. Baroni, M. Scrocca, A. Azzini and I. Celino: Human Evaluation of Procedural Knowledge Graph Extraction from Text with Large Language Models, EKAW, 2024

I. Baroni, G. Re Calegari, D. Scandolari, I. Celino: AI-TAM: a model to investigate user acceptance and collaborative intention in human-in-the-loop AI applications, HCJ, 2022

C. Longoni, A. Bonezzi, C. Morewedge: Resistance To Medical Artificial Intelligence, JCR, 2019

11

Multiple Choice

Quiz: Who do you blame more for a mistake, humans or machines?

  1. It depends on the context
  2. Humans, imperfect by design
  3. Machines, always with bugs
  4. No one, mistakes build character (and datasets)

12

A/B testing on how humans judge machines

Context-dependent preference for “humans” or for “machines”
to execute the same task in the same scenario

Who was perceived to be more at fault for injuring the pedestrian?

  • The human driver

  • The driverless car

C. Hidalgo et al.: How Humans Judge Machines, 2021


Who was perceived to be more at fault for misusing the national flag?

  • The human cleaner

  • The cleaning robot

13

When Do We Believe the “Machine”? e.g. ChatGPT

  • Trust (which is a multidimensional construct) in ChatGPT was influenced by pragmatic factors (usefulness, speed) and hedonic factors (entertainment, novelty)

  • Perceived credibility of ChatGPT answers did NOT correlate with actual correctness

  • Users may trust ChatGPT more for factual or technical tasks, but less for emotional or ethical judgments

M. Huschens et al.: Do You Trust ChatGPT? - Perceived Credibility of Human and AI-Generated Content, 2023

J. Buchanan, W. Hickman: Do people trust humans more than ChatGPT?, JBE, 2024

Y. Jung et al.: Do We Trust ChatGPT as much as Google Search and Wikipedia?, CHI 2024

14

Multiple Choice

Quiz: In which cases does human reliance increase when explanations are added to LLM answers?

  1. Only on correct LLM answers
  2. Only on incorrect LLM answers
  3. On both correct and incorrect LLM answers
  4. Never, explanations just confuse people more

15

AI (LLM) design choices to shape users’ trust

  • Explanations increase reliance - even on incorrect answers! This shows that explanations can be persuasive, regardless of accuracy

  • Sources reduce overreliance on incorrect answers, helping users calibrate their trust

  • Inconsistencies in explanations also reduce reliance, suggesting that users are sensitive to logical coherence

S. Kim et al.: Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies, CHI 2025

another relevant study with similar results:
M. Sadeghi et al.: Explaining the Unexplainable: The Impact of Misleading Explanations on Trust in Unreliable Predictions for Hardly Assessable Tasks, UMAP 2024

16

What’s in an explanation, after all?

  • Explanation as “what the human wants to know”, as opposed to “scientific explanation of the AI model’s internal processing”

  • Human explanations are usually: contrastive (why P and not Q?), selective (not all possible causes, only "relevant" ones), social (dialogue, interaction, iteration)

  • Knowledge Graphs (and Semantic Web technologies at large) can and should have a big role in explanations!

B. Mittelstadt, C. Russell, S. Wachter: Explaining explanations in AI, 2019

S. Chari et al.: Explanation ontology: A model of explanations for user-centered AI, ISWC 2020

F. Lecue: On the role of knowledge graphs in explainable AI, SWJ, 2019

I. Celino: Who is this Explanation for? Human Intelligence and Knowledge Graphs for eXplainable AI, 2020

17

Multiple Choice

Quiz: In human-AI collaboration, what is the relationship between human self-confidence and AI confidence?

  1. Human confidence is higher than AI confidence
  2. Human confidence aligns with AI confidence
  3. Human confidence is lower than AI confidence
  4. Human and AI confidences are not correlated

18

Calibrating confidence, avoiding both over-trust and under-trust

  • Users’ self-confidence tends to align with the AI’s expressed confidence, leading to miscalibrated self-confidence, especially if the AI is overconfident or underconfident --> AI confidence should be carefully calibrated, especially in high-stakes or collaborative contexts

J. Li et al.: As Confidence Aligns: Exploring the Effect of AI Confidence on Human Self-confidence in Human-AI Decision Making, CHI 2025

  • Designers should consider how AI behavior shapes human behavior, not just task outcomes --> greater attention to the cognitive alignment between humans and AI - not just functional alignment

19

Theory of Mind and Social Intelligence

  • Theory of Mind is the ability to attribute mental states - beliefs, intents, desires, emotions, knowledge - to oneself and others and to understand that others have beliefs, desires, and intentions that are different from one's own

D. Premack, G. Woodruff: Does the chimpanzee have a theory of mind?, Behavioral and Brain Sciences, 1978


  • Social intelligence is the ability to reason about others’ beliefs, intentions, and actions

    • Social intelligence is a critical dimension of human intelligence

    • Does AI have social intelligence?

20

Multiple Choice

Quiz: Does AI demonstrate a Theory of Mind (ToM) comparable to humans?

  1. No, AI lacks any social reasoning
  2. Yes, but only at a basic level
  3. Yes, AI shows high-order ToM
  4. Only when prompted with emotional emojis

21

AI Social Intelligence and “Reasoning”

  • Evaluation tasks:

    • Inverse Reasoning (IR): Inferring the beliefs or goals of others based on their actions

    • Inverse Inverse Planning (IIP): A more complex task involving recursive reasoning about others’ reasoning

  • Results:

    • Humans consistently outperformed GPT models across all tasks.

    • GPT models showed only basic (order-0) social reasoning, while humans demonstrated higher-order (≥2) reasoning

    • LLMs often relied on pattern recognition shortcuts rather than genuine social inference

J. Wang et al.: Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities, 2024

another interesting work:
G. Riva et al.: Psychomatics - A Multidisciplinary Framework for Understanding Artificial Minds, 2024


22

Multiple Choice

Quiz: When do Knowledge Workers increase their Critical Thinking?

  1. When the AI makes obvious mistakes
  2. When they trust the AI
  3. When they trust their own judgment
  4. After their third coffee and a motivational quote

23

AI and Critical Thinking

  • GenAI shifts critical thinking toward information verification and response integration

  • Higher confidence in GenAI is associated with less critical thinking

  • Higher self-confidence is associated with more critical thinking

H. Lee et al.: The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers, CHI 2025

24

Multiple Choice

Quiz: In a recent questionnaire, after testing AI tools in their daily job, industry workers declared that...

  1. They have a high trust in AI
  2. They are not confident they will learn to use AI
  3. They find the cognitive load to use AI very low
  4. They are scared their job will be replaced by AI

25

AI, human factors and ethical principles

  • Ethical principles, guidelines and regulations (e.g. the AI Act, the HLEG Ethics Guidelines for Trustworthy AI) call for a careful assessment of human factors in the adoption of AI solutions!

  • Fear of job replacement by AI, low trust in AI, and increased cognitive load are real issues to take into account when designing AI tools

  • The large popularity and low entry barrier of tools like ChatGPT give industry employees the perception that AI is easy to learn

A. Azzini, I. Baroni, I. Celino: Assessing human factors in AI adoption by employees: a composite questionnaire for subjective user evaluation, TCAI@HHAI 2025

26

Human-AI Collaboration!!!

H. Li et al.: Why is AI not a Panacea for Data Workers? An Interview Study on Human-AI Collaboration in Data Storytelling, TVCG 2025


27

Take-home messages

  1. Always ask yourself who will use your AI system
    (and design it with the target users in mind)

  2. Always perform human evaluation of AI systems!
    (even only at qualitative, small-scale level)

  3. Whenever possible, design human-in-the-loop AI systems
    (the future is in human-AI collaboration, getting the best of both)

  Bonus point: Always challenge the way you are using AI!
    (video: How Stanford Teaches AI-Powered Creativity, https://www.youtube.com/watch?v=wv779vmyPVY)

28

Multiple Choice

Final quiz: In preparing this tutorial, Irene did NOT rely on AI for…

  1. Coming up with wrong quiz answers
  2. Picking the images
  3. Polishing her phrasing
  4. Crafting the tutorial’s storyline

29

Irene Celino - irene.celino@cefriel.com
Cefriel - viale Sarca 226, 20126 Milano - Italy

(images from Unsplash)

Thank you for your participation!
