## Summarizing data

One-column and two-column summaries; charts and inferential statistics

The Summary view lets you view column summaries, create new columns, and perform simple tests

Performing an ANOVA test of a numeric column’s mean across values of another column
An analysis of variance (ANOVA) test indicates whether a numeric column’s mean appears to differ across the values of another column

Performing an ANOVA test of three or more numeric columns’ means
An analysis of variance (ANOVA) test indicates whether the means of three or more numeric columns appear to differ

Viewing the best-fit line on a scatterplot
A best-fit line minimizes the sum of the squared distances between the line and the points on a scatterplot
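
As a rough illustration of the least-squares idea, the Python sketch below computes a best-fit line for a few made-up points. It assumes distances are measured vertically (ordinary least squares), which this page does not state. The function name `best_fit_line` and the sample data are hypothetical and are not part of Wizard.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line.

    Assumption: vertical distances are minimized (ordinary least squares).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = best_fit_line(xs, ys)
print(slope, intercept)
```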

Viewing box plots of a numeric column broken down by values of another column
Box plots provide a visual representation of a numeric column’s summary statistics

Performing a chi-square test of a category column’s independence from another category column
A chi-square test indicates whether the values of two category columns appear to be related

Performing a chi-square test of two or more category columns’ distributions
A chi-square test indicates whether the distribution of values appears to differ across two or more category columns

Performing a Cochran’s Q test for the marginal homogeneity of three or more binary columns that are matched or related
Cochran’s Q test indicates whether the distribution of binary values appears to differ across three or more columns whose rows are related

Copying another column’s missing values
The missing value definitions from one column can be copied to other columns

Copying another column’s category labels
The category labels from one column can be copied to other columns

Copying another column’s category order
The ordering of a column’s categories can be copied to other columns

Viewing the correlation coefficient between two numeric columns
The correlation coefficient describes the strength and direction of the linear relationship between two columns

Partitioning a non-category covariate into discrete groups in the Summary view
View partitioning options with the Configure button below the covariate view

Exporting a table’s correlation matrix
The matrix of column correlations can be exported from the Table menu

Exporting a table’s summary statistics
A table’s summary statistics can be exported from the Table menu

Performing a Friedman test of the relative values of three or more numeric columns
A Friedman test is a non-parametric test for related observations

Viewing a histogram of a numeric column
Histograms provide a visual summary of a numeric column’s distribution

Viewing histograms of a numeric column broken down by values of another column
Histograms provide a visual summary of a numeric column’s distribution

Testing the statistical independence of two category columns
An independence test indicates whether the values of two category columns appear to be related

Performing a Kolmogorov-Smirnov test of uniformity on a numeric column
A Kolmogorov-Smirnov test can indicate whether a numeric column appears to be uniformly distributed

Performing a Kolmogorov-Smirnov test of a numeric column’s distribution across values of another column
A Kolmogorov-Smirnov test indicates whether a numeric column’s value distribution tends to differ across groups

Performing a Kolmogorov-Smirnov test of two or more numeric columns’ distributions
A Kolmogorov-Smirnov test indicates whether the distribution of values tends to differ across columns

Performing a Kruskal-Wallis test of a numeric column’s median across values of another column
A Kruskal-Wallis test indicates whether a numeric column’s values tend to differ in size across groups

Performing a Kruskal-Wallis test of three or more numeric columns’ medians
A Kruskal-Wallis test indicates whether the values of numeric columns tend to differ in size

Assigning colors to a table’s labels
Label colors let you customize the appearance of charts

Performing a one-sample log-rank test for constant hazards on a date or numeric column
A one-sample log-rank test can indicate whether survival times are characterized by a constant hazard rate

Performing a log-rank test for the equality of hazard rates across values of another column
A log-rank test can indicate whether hazard rates vary across two or more groups

Performing a log-rank test for the equality of hazard rates of two or more columns
A log-rank test can indicate whether hazard rates vary across two or more survival-times columns

Performing a Mann-Whitney test of a numeric column’s median across two categories
A Mann-Whitney test indicates whether a numeric column’s values tend to be larger in one of two categories

Performing a Mann-Whitney test of two numeric columns’ medians
A Mann-Whitney test indicates whether values tend to be larger in one of two columns

Performing a McNemar test for the marginal homogeneity of two category columns that are matched or related
A McNemar test indicates whether the distribution of values appears to differ across two columns whose rows are related

Treating data values as missing
Define missing values to exclude them from calculations

Viewing the multinomial confidence intervals of a category column
Multinomial confidence intervals indicate the amount of uncertainty associated with estimates of each category’s proportion in the population

Adjusting a p-value to account for multiple comparisons
The Šidák correction can control the Type I error rate when performing two or more independent tests
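
For reference, the standard Šidák adjustment for m independent tests replaces a p-value p with 1 - (1 - p)^m. The Python sketch below shows that arithmetic. The function name `sidak_adjust` and the example numbers are hypothetical, and whether Wizard computes the adjustment in exactly this form is an assumption.

```python
def sidak_adjust(p_value, num_tests):
    """Return the Sidak-adjusted p-value for num_tests independent tests.

    Assumption: the textbook formula 1 - (1 - p)^m is used.
    """
    return 1.0 - (1.0 - p_value) ** num_tests


# Hypothetical example: an unadjusted p-value of 0.02 from one of 3 tests
adjusted = sidak_adjust(0.02, 3)
print(adjusted)
```

The adjusted value is never smaller than the original p-value, so a result that looks significant on its own may no longer be significant once the number of tests is taken into account.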

Performing a paired t-test of two numeric columns’ means
A paired t-test indicates whether the mean difference between two numeric columns is non-zero

Viewing a pie chart of a category column
Pie charts provide a visual summary of the relative proportions of category values appearing in the data

Viewing the estimated population of a category column broken down by values of another column
Proportion bars provide a quick visual summary of a category’s distribution and the associated statistical uncertainty

Viewing a Q-Q plot of a numeric column against a normal or uniform distribution
A Q-Q plot provides a visual indicator of whether a column appears to be distributed according to a particular mathematical distribution

Viewing the coefficient of determination between two numeric columns
The coefficient of determination describes how well one column can be predicted by another
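
As an illustrative sketch, for two numeric columns the coefficient of determination of a straight-line fit equals the square of the Pearson correlation coefficient. The Python code below computes both for made-up data. The function name `pearson_r` and the sample values are hypothetical, and the assumption that Wizard reports the Pearson coefficient is not confirmed by this page.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for two numeric columns
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
r = pearson_r(xs, ys)
r_squared = r ** 2  # share of variance in one column explained by a line in the other
print(r, r_squared)
```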

Performing a repeated measures ANOVA test of three or more numeric columns’ means
A repeated measures ANOVA test indicates whether the means of three or more numeric columns appear to differ when the observations in each row are related

Viewing a scatterplot of two numeric columns
A scatterplot plots one numeric column against another

Performing a Shapiro-Wilk test of normality on a numeric column
A Shapiro-Wilk test can indicate whether a numeric column appears to be normally distributed

Performing a Shapiro-Wilk test on the normality of differences between two columns
A Shapiro-Wilk test can indicate whether the difference between two columns appears to be normally distributed

Viewing a column’s summary statistics
Summary statistics can be viewed by control-clicking the header of the summary table

Viewing a survival curve of a numeric or date column
Survival curves provide a visual summary of the longevity of observations in the data

Viewing survival curves of a date or numeric column broken down by values of another column
Survival curves provide a visual summary of survival times

Performing a t-test of a numeric column’s mean across two categories
A two-sample t-test indicates whether a numeric column’s mean appears to differ across two categories

Performing a two-way ANOVA test of a numeric column’s mean across values of two other columns
A two-way analysis of variance (ANOVA) test indicates whether a numeric column’s mean is affected by the interaction of two other columns’ values

Performing an unpaired t-test of two numeric columns’ means
A two-sample t-test indicates whether the means of two numeric columns appear to differ

Assigning labels to the values of a category column
Value labels provide a human-friendly description of numeric or text values

Performing a Wilcoxon signed-rank test of the median difference between two numeric columns
A Wilcoxon signed-rank test is a non-parametric test for paired observations
