LA 2025 Workshop

Actualizing Ethical Principles for Curating Large-Scale Datasets in the Era of Generative AI

September 15th-16th

Overview

AI technologies have become ubiquitous, influencing domains from healthcare to finance and permeating our daily lives. Concerns about the values underlying the creation and use of datasets to develop AI technologies are growing. Current dataset practices often disregard critical ethical issues, even though data represents and impacts real people. While progress has been made in establishing best practices for curating datasets more ethically, the unprecedented scale of training data and the complexity of current and potential AI use cases present unique challenges that AI researchers and practitioners must now face. For example, large foundation models, such as those developed by OpenAI, power a wide array of generative AI tools that are increasingly central to daily life and work.

This two-day, in-person workshop aims to unite interdisciplinary researchers and practitioners to identify the challenges unique to curating ethical datasets for AI models, and then to begin ideating best practices for tackling those challenges. Grounded in interdisciplinary exchange, our aim is to cultivate a diverse community of researchers and practitioners interested in defining the future of ethical responsibility in the composition, process, and release of datasets for AI model training.

  • Composition: The makeup of the dataset itself, including the data schema, data instances, and annotations.
  • Process: The process that goes into curating the dataset, such as collecting data and annotating it.
  • Release: The release of the dataset to be used by others for evaluation and modeling purposes.

We plan to disseminate the outcomes of this workshop to the AI community and beyond by developing a conceptual framework of both the challenges and potential solutions associated specifically with curating datasets for foundation models. We will invite interested workshop participants to contribute to a future publication on these insights.

    This will be the first of a three-workshop series, with the second taking place at CSCW on October 18th.

    Key Information

    Call for Participation

    We invite researchers and practitioners from diverse backgrounds to apply to participate in this workshop. We are particularly interested in individuals who have an interest in or experience with dataset curation, AI model training, or ethical considerations in AI.

    We expect 20-25 participants to join our two-day, in-person workshop, each interested in exploring design's role in transforming complex challenges into constructive opportunities. The first day will consist of panels and discussions, while the second day will focus on hands-on activities and collaborative design exercises.

    As part of the application process, we ask applicants to submit a short statement of interest outlining their background, relevant experience, and what they hope to gain from the workshop. We also welcome applicants who would like to give an ignite talk.

    We have collected a set of Responsible Data resources to help inform the workshop discussions. We encourage you to review these resources before applying to participate in the workshop. We welcome any contributions you have to the resource list.

    Workshop Agenda

    We are excited to share the agenda for our workshop. The first day will consist of panels and discussions, while the second day will focus on hands-on activities and collaborative design exercises.

    Thanks to generous sponsorship from ASU's New College of Interdisciplinary Arts & Sciences, we will provide lunch and coffee breaks on both days. We will also host an optional dinner on the first day for those who would like to continue the discussions in a more informal setting.

    We will be updating the agenda as we finalize the details, so please check back for the latest information.

    We look forward to seeing you at the workshop!

    Note: The agenda is subject to change.

    September 15, 2025

    Time Activity and Description
    10:00-10:30am Welcome and Opening Remarks
    10:30am-12:00pm Panel #1 - Data Ethics in Health AI

    Julie Liss, Arizona State University
    Beza Merid, Arizona State University
    Angel Hsing-Chi Hwang, USC Annenberg
    Yan Liu, USC Machine Learning Center co-Director

    12:00-1:30pm Lunch Break
    1:30-2:45pm Lightning Talks
    2:45-3:00pm Coffee Break
    3:00-4:30pm Panel #2 - Data Ethics in AI

    Michael Simeone, Arizona State University
    Nicholas Proferes, Arizona State University
    Dora Zhao, Stanford University

    4:30-5:00pm Day 1 Closing Remarks
    5:30pm (Optional) Post-Workshop Dinner

    Pine & Crane - Downtown LA
    1120 S. Grand Ave. Unit 101
    Los Angeles, CA 90015




    September 16, 2025

    Time Activity and Description
    9:00-9:30am Welcome and Opening Remarks
    9:30-10:30am Group Session #1: Ethical Principles for Dataset Curation
    10:30-11:00am Coffee Break
    11:00am-12:00pm Group Session #2: Challenges in Dataset Curation
    12:00-1:30pm Lunch Break
    1:30-2:30pm Group Session #3: Challenges to Ethical Dataset Curation
    2:30-2:45pm Coffee Break
    2:45-3:45pm Group Session #4: Framework Writing
    3:45-4:00pm Coffee Break
    4:00-4:45pm Group Share/Discussion
    4:45-5:00pm Closing Remarks

    Organizers

    Dora Zhao

    Stanford University

    Kathleen H. Pine

    Arizona State University

    Shawn Walker

    Arizona State University

    Alice Xiang

    Sony AI