LA 2025 Workshop

Actualizing Ethical Principles for Curating Large-Scale Datasets in the Era of Generative AI

September 15th-16th

Workshop Proposal

Overview

AI technologies have become ubiquitous, influencing domains from healthcare to finance and permeating our daily lives. Concerns about the values underlying the creation and use of datasets to develop AI technologies are growing. Current dataset practices often disregard critical ethical issues, even though data represents and impacts real people. While progress has been made in establishing best practices for curating datasets more ethically, the unprecedented scale of training data and the complexity of current and potential AI use cases present unique challenges that AI researchers and practitioners must now face. For example, large foundation models such as those from OpenAI are being used in a wide array of generative AI tools that are increasingly central to daily life and work.

This two-day, in-person workshop aims to unite interdisciplinary researchers and practitioners in an effort to identify the challenges unique to curating ethical datasets for AI models, and then to begin developing best practices for tackling those challenges. Grounded in interdisciplinary exchange, our aim is to cultivate a diverse community of researchers and practitioners interested in defining the future of ethical responsibility in the composition, process, and release of datasets for AI model training.

Composition: The makeup of the dataset itself, including the data schema, data instances, and annotations.

Process: The process that goes into curating the dataset, such as collecting data and annotating it.

Release: The release of the dataset to be used by others for evaluation and modeling purposes.

We plan to disseminate the outcomes of this workshop to the AI community and beyond by developing a conceptual framework of both the challenges and potential solutions associated specifically with curating datasets for foundation models. We will invite interested workshop participants to contribute to a future publication on these insights.

This will be the first in a three-workshop series, with the second taking place at CSCW on October 18th.

Key Information

Workshop date: September 15-16, 2025
Workshop location: ASU California Center Broadway (1111 S Broadway Suite 100, Los Angeles, CA 90015)
Lodging Information: LA Hotels with ASU preferred rates
Contact: Shawn Walker and Katie Pine

Call for Participation

We invite researchers and practitioners from diverse backgrounds to apply to participate in this workshop. We are particularly interested in individuals who have an interest in or experience with dataset curation, AI model training, or ethical considerations in AI.

We expect 20-25 participants, all interested in exploring design's role in transforming complex challenges into constructive opportunities, to take part in our two-day, in-person workshop. The first day will consist of panels and discussions, while the second day will focus on hands-on activities and collaborative design exercises.

As part of the application process, we ask that attendees submit a short statement of interest outlining their background, relevant experience, and what they hope to gain from the workshop. Those who would like to give an ignite talk are welcome to do so.

We have collected a set of Responsible Data resources to help inform the workshop discussions. We encourage you to review these resources before applying to participate in the workshop. We welcome any contributions you have to the resource list. View Resources.

Apply to Attend the Workshop

Workshop Agenda

We are excited to share the agenda for our workshop. The first day will consist of panels and discussions, while the second day will focus on hands-on activities and collaborative design exercises.

Thanks to generous sponsorship from ASU's New College of Interdisciplinary Arts & Sciences, we will provide lunch and coffee breaks on both days. We will also host an optional dinner on the first day for those who would like to continue the discussions in a more informal setting.

We will be updating the agenda as we finalize the details, so please check back for the latest information.

We look forward to seeing you at the workshop!

Note: The agenda is subject to change.

September 15, 2025

Time	Activity and Description
10:00-10:30am	Welcome and Opening Remarks
10:30-12:00pm	Panel #1 - Data Ethics in Health AI: Julie Liss, Arizona State University; Beza Merid, Arizona State University; Angel Hsing-Chi Hwang, USC Annenberg; Yan Liu, USC Machine Learning Center co-Director
12:00-1:30pm	Lunch Break
1:30-2:45pm	Lightning Talks
2:45-3:00pm	Coffee Break
3:00-4:30pm	Panel #2 - Data Ethics in AI: Michael Simeone, Arizona State University; Nicholas Proferes, Arizona State University; Dora Zhao, Stanford University
4:30-5:00pm	Day 1 Closing Remarks
5:30pm	(Optional) Post-Workshop Dinner: Pine & Crane - Downtown LA, 1120 S. Grand Ave., Unit 101, Los Angeles, CA 90015

September 16, 2025

Time	Activity and Description
9:00-9:30am	Welcome and Opening Remarks
9:30-10:30am	Group Session #1: Ethical Principles for Dataset Curation
10:30-11:00am	Coffee Break
11:00-12:00pm	Group Session #2: Challenges in Dataset Curation
12:00-1:30pm	Lunch Break
1:30-2:30pm	Group Session #3: Challenges to Ethical Dataset Curation
2:30-2:45pm	Coffee Break
2:45-3:45pm	Group Session #4: Framework Writing
3:45-4:00pm	Coffee Break
4:00-4:45pm	Group Share/Discussion
4:45-5:00pm	Closing Remarks

Organizers

Morgan Klaus Scheuerman

Dora Zhao

Kathleen H. Pine

Shawn Walker

Alice Xiang