Guided Statistical Workflows with Interactive Explanations and Assumption Checking
Yuqi Zhang - New York University, New York, United States
Adam Perer - Carnegie Mellon University, Pittsburgh, United States
Will Epperson - Carnegie Mellon University, Pittsburgh, United States
Room: Bayshore VI
2024-10-16T18:12:00Z
Keywords
Data science tools, computational notebooks, analytical guidance
Abstract
Statistical practices such as building regression models or running hypothesis tests rely on following rigorous multi-step procedures and verifying assumptions about the data to produce valid results. However, common statistical tools do not check users’ analysis decisions; they provide low-level statistical functions without guidance on the overall analysis process. As a result, users can easily misuse analysis methods, potentially decreasing the validity of their results. To address this problem, we introduce GuidedStats, an interactive interface within computational notebooks that encapsulates guidance, models, visualization, and exportable results into interactive workflows. It breaks down typical analysis processes, such as linear regression and two-sample t-tests, into interactive steps supplemented with automatic visualizations and explanations for step-wise evaluation. Users can iterate on input choices to refine their models, while recommended actions and exports allow them to continue their analysis in code. Case studies show how GuidedStats offers valuable guidance for conducting fluid statistical analyses while surfacing possible assumption violations in the underlying data, supporting flexible and accurate statistical analysis.
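To make the idea of an assumption-checked workflow step concrete, here is a minimal sketch of a two-sample t-test that first checks the equal-variance assumption and then picks the test variant accordingly. This is not the GuidedStats API: the names `StepResult`, `check_equal_variance`, `two_sample_t`, and `guided_t_test` are hypothetical, and the variance-ratio threshold of 4 is an assumed rule of thumb standing in for a formal test such as Levene's. The sketch uses only the Python standard library and assumes both samples have nonzero variance.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, variance


@dataclass
class StepResult:
    """Outcome of one workflow step (hypothetical structure)."""
    name: str
    passed: bool
    message: str


def check_equal_variance(a, b, max_ratio=4.0):
    """Heuristic check: flag a violation when the larger sample variance
    exceeds the smaller by more than max_ratio (assumed rule of thumb).
    Assumes both samples have nonzero variance."""
    va, vb = variance(a), variance(b)
    ratio = max(va, vb) / min(va, vb)
    passed = ratio <= max_ratio
    message = f"variance ratio = {ratio:.2f} (threshold {max_ratio})"
    return StepResult("equal variance", passed, message)


def two_sample_t(a, b, equal_var):
    """Return (t statistic, degrees of freedom) for Student's test
    (pooled variance) or Welch's test (unpooled variance)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    diff = mean(a) - mean(b)
    if equal_var:
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = sqrt(pooled * (1 / na + 1 / nb))
        df = na + nb - 2
    else:
        se = sqrt(va / na + vb / nb)
        # Welch-Satterthwaite approximation for the degrees of freedom
        df = (va / na + vb / nb) ** 2 / (
            (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
        )
    return diff / se, df


def guided_t_test(a, b):
    """Run the assumption check, then use the variant it recommends."""
    check = check_equal_variance(a, b)
    t, df = two_sample_t(a, b, equal_var=check.passed)
    method = "Student" if check.passed else "Welch"
    return {"check": check, "method": method, "t": t, "df": df}
```

Calling `guided_t_test(sample_a, sample_b)` returns the check outcome alongside the chosen method, so the assumption result stays visible next to the statistic rather than being silently skipped. A fuller version would add a normality check and compute a p-value from the t distribution.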