IEEE VIS 2024 Content: Guided Statistical Workflows with Interactive Explanations and Assumption Checking

Guided Statistical Workflows with Interactive Explanations and Assumption Checking

Yuqi Zhang - New York University, New York, United States

Adam Perer - Carnegie Mellon University, Pittsburgh, United States

Will Epperson - Carnegie Mellon University, Pittsburgh, United States


Room: Bayshore VI

2024-10-16T18:12:00Z
Figure: GuidedStats assists users with statistical analyses through guided workflows. It automatically verifies assumptions and provides actionable suggestions. At the current step, the user is checking assumptions, with the explanation offering more details about the relevant statistical concepts.
Keywords

Data science tools, computational notebooks, analytical guidance

Abstract

Statistical practices such as building regression models or running hypothesis tests produce valid results only when analysts follow a rigorous sequence of steps and verify assumptions about the data. However, common statistical tools do not verify users’ analysis decisions; they provide low-level statistical functions without guidance on the analysis process as a whole. As a result, users can easily misuse analysis methods, potentially reducing the validity of their results. To address this problem, we introduce GuidedStats, an interactive interface within computational notebooks that encapsulates guidance, models, visualizations, and exportable results into interactive workflows. It breaks down typical analysis processes, such as linear regression and two-sample t-tests, into interactive steps supplemented with automatic visualizations and explanations for step-wise evaluation. Users can iterate on input choices to refine their models, while recommended actions and exports let them continue their analysis in code. Case studies show how GuidedStats offers useful guidance for conducting fluid statistical analyses while detecting possible assumption violations in the underlying data, supporting flexible and accurate statistical analyses.
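
To make the idea of a guided step with assumption checking concrete, here is a minimal sketch in plain Python. It is not GuidedStats' code or API: the names `StepResult` and `guided_two_sample_t` are hypothetical, and the variance-ratio threshold of 4 is a rule-of-thumb assumption chosen for illustration. The sketch checks the equal-variance assumption of a two-sample t-test, attaches a suggestion, and computes the statistic for the variant that matches the check.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class StepResult:
    """Outcome of one guided step (hypothetical structure, not GuidedStats' API)."""
    assumption_ok: bool
    suggestion: str
    t_statistic: float
    degrees_of_freedom: float


def guided_two_sample_t(a, b, max_variance_ratio=4.0):
    """Check the equal-variance assumption, then use the matching t-test variant.

    Assumes each sample has at least two values and that the samples are not
    both constant (otherwise the standard error would be zero).
    """
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances

    # Assumption check: rule-of-thumb ratio of the larger to the smaller variance.
    # The default threshold is an illustrative assumption, not a formal test.
    smaller = min(var_a, var_b)
    ratio = max(var_a, var_b) / smaller if smaller > 0 else math.inf
    equal_var = ratio <= max_variance_ratio

    if equal_var:
        # Student's t-test with a pooled variance estimate.
        pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
        se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
        df = n_a + n_b - 2
        suggestion = "Variances look similar; Student's t-test is reasonable."
    else:
        # Welch's t-test with Welch-Satterthwaite degrees of freedom.
        se_a, se_b = var_a / n_a, var_b / n_b
        se = math.sqrt(se_a + se_b)
        df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
        suggestion = "Variances differ; consider Welch's t-test instead."

    return StepResult(equal_var, suggestion, (mean_a - mean_b) / se, df)
```

Calling `guided_two_sample_t(control, treatment)` on two lists of numbers returns both the check outcome and the statistic, so a caller can show the suggestion before reporting the result. A fuller workflow would presumably also check normality and compute a p-value, which this sketch leaves out to stay within the standard library.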