Hadoop Hive is a runtime Hadoop support structure that allows anyone who is already fluent with SQL (which is commonplace for relational data-base developers) to leverage the Hadoop platform right out of the gate. Hive allows SQL developers to write Hive Query Language (HQL) statements that are similar to standard SQL statements. HQL is limited …

## What is Hadoop Pig?

Hadoop Pig was initially developed at Yahoo to allow people using Hadoop to focus more on analyzing large datasets and spend less time writing mappers and reduce programs. This would allow people to do what they want to do instead of thinking about mapper and reducer tasks. Name Pig was given to the programming language …

## What is Z-Score or Standard Score?

Z-Score or Standard Score in statistics is the signed number of standard deviations by which the value of an observation or data point is above the mean value of what is being observed or measured. Observed values above the mean have positive standard scores, while values below the mean have negative standard scores. The standard …

## What is Unsupervised Learning?

Unsupervised Learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labelled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or grouping in data. The clusters are modelled using a measure of similarity …

## What is Type II Error?

Type II Error in statistical hypothesis testing is incorrectly retaining a false null hypothesis (a “false negative”). A type II error (or error of the second kind) is the failure to reject a false null hypothesis. Examples of type II errors would be a blood test failing to detect the disease it was designed to …

## What is Type I Error?

Type I Error in statistical hypothesis testing is the incorrect rejection of a true null hypothesis (a false positive). More simply stated, a type I error is detecting an effect that is not present. A type I error (or error of the first kind) is the incorrect rejection of a true null hypothesis. Usually, a …

## What is True Positive Rate (Sensitivity)?

True Positive Rate (Sensitivity) is a statistical measure which measures the proportion of positives that are correctly identified as such (for example, the percentage of sick people who are correctly identified as having the condition). Another way to understand it, with examples in the context of medical tests is that sensitivity is the extent to …

## What is True Negative Rate (Specificity)?

True Negative Rate (Specificity) is a statistical measure which measures the proportion of negatives that are correctly identified as such (for example, the percentage of healthy people who are correctly identified as not having the condition). Specificity is the extent to which positives really represent the condition of interest and not some other condition being …

## What is Three Sigma Rule?

Three Sigma Rule in the empirical sciences express a conventional heuristic that “nearly all” values are taken to lie within three standard deviations of the mean, i.e. that it is empirically useful to treat 99.7% probability as “near certainty”.The rule states that even for non-normally distributed variables, at least 88.8% of cases should fall within …

## What is Support Vector Machines (SVM)?

Support Vector Machines (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be …