Z-Score or Standard Score in statistics is the signed number of standard deviations by which the value of an observation or data point is above the mean value of what is being observed or measured. Observed values above the mean have positive standard scores, while values below the mean have negative standard scores. The standard …

# Category: GLOSSARY

Data science glossary.

## What is Unsupervised Learning?

Unsupervised Learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labelled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or grouping in data. The clusters are modelled using a measure of similarity …

## What is Type II Error?

Type II Error in statistical hypothesis testing is incorrectly retaining a false null hypothesis (a “false negative”). A type II error (or error of the second kind) is the failure to reject a false null hypothesis. Examples of type II errors would be a blood test failing to detect the disease it was designed to …

## What is Type I Error?

Type I Error in statistical hypothesis testing is the incorrect rejection of a true null hypothesis (a false positive). More simply stated, a type I error is detecting an effect that is not present. A type I error (or error of the first kind) is the incorrect rejection of a true null hypothesis. Usually, a …

## What is True Positive Rate (Sensitivity)?

True Positive Rate (Sensitivity) is a statistical measure which measures the proportion of positives that are correctly identified as such (for example, the percentage of sick people who are correctly identified as having the condition). Another way to understand it, with examples in the context of medical tests is that sensitivity is the extent to …

## What is True Negative Rate (Specificity)?

True Negative Rate (Specificity) is a statistical measure which measures the proportion of negatives that are correctly identified as such (for example, the percentage of healthy people who are correctly identified as not having the condition). Specificity is the extent to which positives really represent the condition of interest and not some other condition being …

## What is Three Sigma Rule?

Three Sigma Rule in the empirical sciences express a conventional heuristic that “nearly all” values are taken to lie within three standard deviations of the mean, i.e. that it is empirically useful to treat 99.7% probability as “near certainty”.The rule states that even for non-normally distributed variables, at least 88.8% of cases should fall within …

## What is Support Vector Machines (SVM)?

Support Vector Machines (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be …

## What is Supervised Learning?

Supervised Learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and the desired output value (also called the supervisory signal). A supervised learning algorithm …

## What is Statistical Significance?

Statistical Significance in statistical hypothesis testing is attained whenever the observed p-value of a test statistic is less than the significance level defined for the study. The p-value is the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true. The significance level, α, is the probability …