Terminology
What is Machine Learning
  • Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
  • What is the difference among Artificial Intelligence (AI), Machine Learning (ML), Data Mining and Pattern Recognition?
  • Artificial Intelligence: human-like intelligence displayed by software and/or machines. AI is the broader concept of machines being able to carry out tasks in a way that we would consider “smart”; it concentrates on mimicking human decision-making processes and carrying out tasks in ever more human ways.

  • Machine learning: algorithms that can learn from data to make predictions. ML focuses on the development of computer programs that can access data and use it to learn for themselves.

  • Data mining: the process of finding anomalies, patterns and correlations within large data sets to predict outcomes, i.e. digging through data to discover hidden connections and predict future trends. It is closely associated with knowledge discovery in databases (KDD).

  • Pattern recognition: the automated recognition of patterns and regularities in data.
  • Supervised Machine Learning
  • Starting from the analysis of a known training dataset in which the "right answers" (labels) are given, the trained model is able to predict the output values for new inputs.
  • Unsupervised Machine Learning
  • Find structure in data (e.g. group or label it) without being given "right answers"
  • Semi-supervised Machine Learning
  • Use both labeled and unlabeled data for training
  • Reinforcement Machine Learning
  • A learning method in which an agent interacts with its environment by producing actions and discovering errors or rewards
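The reinforcement-learning idea above can be sketched with its simplest setting, a multi-armed bandit: the agent repeatedly picks an action, the environment returns a noisy reward, and the agent updates its estimate of each action's value. This is a minimal epsilon-greedy sketch, assuming a hypothetical environment whose hidden mean rewards (`true_means`) are chosen for illustration; the function name and parameters are not from any library.

```python
import random

def epsilon_greedy_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from the rewards the environment returns."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    counts = [0] * n_actions        # how often each action was tried
    estimates = [0.0] * n_actions   # running average reward per action
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best current estimate
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])
        # Hypothetical environment: a noisy reward around the action's hidden mean
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the running mean for the chosen action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.0, 2.0, 1.0])
best = max(range(len(counts)), key=lambda a: counts[a])
print("most-chosen action:", best)
```

No "right answers" are ever given; the agent only sees rewards, and the epsilon parameter trades off exploring untried actions against exploiting the best one found so far.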
  • Classification and Regression (supervised learning)
  • Classification is about predicting a discrete label (e.g. spam / not spam)
  • Regression is about predicting a continuous quantity (e.g. a house price)
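To make the label-versus-quantity distinction concrete, the sketch below trains both kinds of supervised model on the same hypothetical toy data: a least-squares line (regression) and a nearest-centroid rule (classification). The data, the variable names and the choice of nearest-centroid as the classifier are illustrative assumptions, not a prescribed method.

```python
import numpy as np

# Hypothetical toy data: hours studied for six students
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
score = np.array([52.0, 57.0, 61.0, 68.0, 71.0, 77.0])  # a quantity -> regression target
passed = np.array([0, 0, 0, 1, 1, 1])                   # a label    -> classification target

# Regression: least-squares line score ~ w * hours + b, predicts a real number
w, b = np.polyfit(hours, score, deg=1)

def predict_score(h):
    return w * h + b

# Classification: nearest-centroid rule, predicts one of the discrete labels {0, 1}
centroids = {label: hours[passed == label].mean() for label in (0, 1)}

def predict_label(h):
    return min(centroids, key=lambda label: abs(h - centroids[label]))

print(predict_score(3.5))                      # a continuous value
print(predict_label(2.5), predict_label(5.5))  # → 0 1
```

Both models learn from the same "right answers"; only the type of output differs.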
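  • Clustering and Association (unsupervised learning)
  • Clustering: grouping data points so that points in the same group are more similar to each other than to points in other groups
  • Association analysis: discovering rules that describe relationships among the variables in the data (e.g. items that are frequently bought together)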
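Both unsupervised tasks can be sketched in a few lines. The first part is plain k-means clustering (one common clustering algorithm, chosen here for illustration); the second computes support and confidence, the two basic measures behind association rules such as "bread -> milk". The points, the shopping baskets and all function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: group the rows of X into k clusters around k centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # start from k distinct data points
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centers = kmeans(X, k=2)
print(labels)  # first three points share one cluster id, last three the other

# Association analysis on hypothetical shopping baskets
transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))       # → 0.5
print(confidence({"bread"}, {"milk"}))  # about 0.667
```

Neither part uses labels: k-means discovers the two groups from distances alone, and the rule measures are computed purely from co-occurrence counts.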
  • Convex Function
  • In mathematics, a real-valued function f defined on an n-dimensional interval (more generally, on a convex subset of a vector space) is called convex (or convex downward, or concave upward) if the line segment between any two points on its graph lies above or on the graph, i.e. f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for all x, y in the domain and all λ in [0, 1].
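The defining inequality can be tested numerically by sampling random x, y and λ. This is a rough sketch under the assumption that random sampling is good enough for illustration: a single violation proves a function is not convex on the interval, but finding none does not prove that it is. The function name and tolerance are hypothetical choices.

```python
import numpy as np

def violates_convexity(f, lo, hi, n_pairs=10_000, seed=0):
    """Return True if some sampled x, y in [lo, hi] and lam in [0, 1] break
    f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, n_pairs)
    y = rng.uniform(lo, hi, n_pairs)
    lam = rng.uniform(0.0, 1.0, n_pairs)
    lhs = f(lam * x + (1 - lam) * y)        # value of f on the segment
    rhs = lam * f(x) + (1 - lam) * f(y)     # height of the chord
    return bool(np.any(lhs > rhs + 1e-9))   # small tolerance for floating-point error

print(violates_convexity(np.square, -5, 5))    # → False (x**2 is convex)
print(violates_convexity(np.sin, 0, 2 * np.pi))  # → True (sin is concave on [0, pi])
```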
  • Underfitting and Overfitting
  • Bias refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, by a much simpler model (e.g. assuming the data is linear when it is actually quadratic)
  • Variance is due to the model's excessive sensitivity to small variations in the training data; it refers to the amount by which the learned model would change if we estimated it using a different training data set
  • Irreducible error is due to the noisiness of the data itself. The only way to reduce this part of the error is to clean up the data
  • Higher degrees of freedom (a more complex model) may cause overfitting: low bias, high variance
  • Lower degrees of freedom (a simpler model) may cause underfitting: high bias, low variance
  • Avoid Underfitting and Overfitting
  • Cross Validation
  • Learning Curves
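Both techniques above can be sketched with NumPy polynomial fits. `kfold_cv_error` averages the validation error over k train/validation splits, which can be used to pick a model complexity; `learning_curve` records training and validation error as the training set grows (a large, persistent gap suggests overfitting; two high, close curves suggest underfitting). The data, function names and candidate degrees are hypothetical, and libraries such as scikit-learn provide ready-made versions of both tools.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def kfold_cv_error(x, y, degree, k=5, seed=0):
    """Average validation MSE of a polynomial fit over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        val = folds[i]                                                 # held-out fold
        train = np.concatenate([folds[j] for j in range(k) if j != i])  # remaining folds
        coefs = np.polyfit(x[train], y[train], degree)
        errors.append(mse(y[val], np.polyval(coefs, x[val])))
    return float(np.mean(errors))

def learning_curve(x_train, y_train, x_val, y_val, degree, sizes):
    """(size, training MSE, validation MSE) as the training set grows."""
    curve = []
    for m in sizes:
        coefs = np.polyfit(x_train[:m], y_train[:m], degree)
        curve.append((m,
                      mse(y_train[:m], np.polyval(coefs, x_train[:m])),
                      mse(y_val, np.polyval(coefs, x_val))))
    return curve

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 100)  # hypothetical noisy data

# Cross validation: pick the degree with the lowest average validation error
cv = {d: kfold_cv_error(x, y, d) for d in (1, 3, 5, 9)}
print("best degree:", min(cv, key=cv.get))

# Learning curve for the cubic model: first 80 points for training, last 20 for validation
curve = learning_curve(x[:80], y[:80], x[80:], y[80:], 3, [10, 20, 40, 80])
for m, train_err, val_err in curve:
    print(m, round(train_err, 3), round(val_err, 3))
```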