
    Data Science 101


    Introduction to Machine Learning and Predictive Analytics

    Basics of Classification Models


    Classification is a type of supervised learning where the outcome (target variable) is categorical. It involves training a model to predict or categorize the class labels of the target variable based on the input features. Some of the common applications of classification models include email spam detection, customer churn prediction, and disease diagnosis.

    Logistic Regression

    Logistic Regression is a classification algorithm used when the response variable is categorical, most commonly binary. Unlike linear regression, which fits a straight line to predict a continuous value, logistic regression passes a linear combination of the input features through the logistic function to model the probability of a certain class or event.

    The logistic function, also known as the sigmoid function, can take any real-valued number and map it into a value between 0 and 1. This makes it suitable for modeling the probability of a binary outcome.
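
A minimal sketch of the idea in Python, using only the standard library. The feature names, weights, and bias are made-up values chosen for illustration, not coefficients fitted to real data, and the 0.5 cut-off is a common default rather than a requirement.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: squashes any real number toward the range 0..1."""
    # Two branches keep math.exp() from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

def predict_proba(features, weights, bias):
    """Probability of the positive class for a single sample."""
    # Linear combination of the inputs, as in linear regression...
    z = bias + sum(w * x for w, x in zip(weights, features))
    # ...passed through the sigmoid so it can be read as a probability.
    return sigmoid(z)

# Hypothetical spam example: features = [number of links, number of "!"]
weights = [0.8, 0.5]   # assumed coefficients, for illustration only
bias = -2.0            # assumed intercept

p_spam = predict_proba([3, 2], weights, bias)
label = "spam" if p_spam >= 0.5 else "not spam"
print(f"P(spam) = {p_spam:.3f} -> {label}")
```

In practice the weights and bias are learned from labelled training data rather than written by hand.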

    Decision Trees

    A Decision Tree is a flowchart-like structure in which each internal node represents a test on a feature (or attribute), each branch represents an outcome of that test (a decision rule), and each leaf node represents an outcome, which for classification is a class label. The topmost node in a decision tree is known as the root node.

    Decision trees are simple to understand and interpret, and they can handle both categorical and numerical data. However, they can easily overfit the data if not properly pruned.
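
The structure can be sketched as a small hand-written tree in Python. The churn scenario, the feature names `contract` and `support_calls`, and the threshold of 3 are all hypothetical; a real tree would be learned from data by a training algorithm, which this sketch does not show.

```python
# Internal nodes test one feature; leaves hold the predicted class label.
tree = {
    "feature": "contract",                # root node: categorical test
    "branches": {
        "monthly": {
            "feature": "support_calls",   # internal node: numerical test
            "threshold": 3,
            "left": {"leaf": "stay"},     # support_calls <= 3
            "right": {"leaf": "churn"},   # support_calls > 3
        },
        "yearly": {"leaf": "stay"},
    },
}

def predict(node, sample):
    """Walk from the root to a leaf, following the branch each test selects."""
    while "leaf" not in node:
        value = sample[node["feature"]]
        if "threshold" in node:
            # Numerical feature: compare against the threshold.
            node = node["left"] if value <= node["threshold"] else node["right"]
        else:
            # Categorical feature: follow the branch named after the value.
            node = node["branches"][value]
    return node["leaf"]

customer = {"contract": "monthly", "support_calls": 5}
print(predict(tree, customer))
```

Each prediction can be explained by the path taken from the root to the leaf, which is why trees are easy to interpret.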

    Understanding the Confusion Matrix

    A confusion matrix is a table used to describe the performance of a classification model on a set of data for which the true values are known. It compares the actual classes with the classes the model predicted.

    For a binary (YES/NO) problem, the confusion matrix has four cells:

    • True Positives (TP): The cases in which we predicted YES and the actual output was also YES.
    • True Negatives (TN): The cases in which we predicted NO and the actual output was NO.
    • False Positives (FP): The cases in which we predicted YES and the actual output was NO.
    • False Negatives (FN): The cases in which we predicted NO and the actual output was YES.
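
The four counts can be tallied directly from paired lists of actual and predicted labels. This is a minimal sketch for the binary case; the function name `confusion_counts` and the sample labels are invented for illustration.

```python
def confusion_counts(actual, predicted, positive="YES"):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted YES, actually YES
        elif p != positive and a != positive:
            tn += 1   # predicted NO, actually NO
        elif p == positive:
            fp += 1   # predicted YES, actually NO
        else:
            fn += 1   # predicted NO, actually YES
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Made-up labels for eight samples.
actual    = ["YES", "NO", "YES", "YES", "NO",  "NO", "YES", "NO"]
predicted = ["YES", "NO", "NO",  "YES", "YES", "NO", "YES", "NO"]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```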

    Evaluation Metrics for Classification Models

    There are several metrics used to evaluate the performance of classification models, including:

    • Accuracy: The ratio of correct predictions to the total number of input samples, (TP + TN) / (TP + TN + FP + FN).
    • Precision: The ratio of true positives to the sum of true positives and false positives, TP / (TP + FP). It shows how many of the samples predicted as positive are actually positive.
    • Recall (Sensitivity): The ratio of true positives to the sum of true positives and false negatives, TP / (TP + FN). It shows how many of the actual positives the model is able to capture.
    • F1-Score: The harmonic mean of Precision and Recall, 2 × Precision × Recall / (Precision + Recall). It balances the two and is high only when both precision and recall are high.
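
As a worked example, the metrics above can be computed from the four confusion-matrix counts. The counts used here are invented, and returning 0.0 when a denominator is zero is one possible convention, assumed for this sketch.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up counts for 100 samples.
metrics = classification_metrics(tp=40, tn=45, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.85, precision is 0.80, recall is about 0.889, and the F1-score is about 0.842.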

    By understanding these basics of classification models, you can start to apply these techniques to your own data and begin to see the power of machine learning in action.
