
    Data Science 101


    Introduction to Data Science

    Basic Statistics and Probability for Data Science

    Study of the collection, analysis, interpretation, and presentation of data.

    In this unit, we will delve into the fundamental concepts of statistics and probability, which form the backbone of data science. Understanding these concepts is crucial for data analysis and predictive modeling.

    Introduction to Statistics

    Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In data science, we primarily focus on two types of statistics:

    1. Descriptive Statistics: This involves methods of organizing, displaying, and summarizing data. It provides simple summaries of a sample and its measurements. These summaries may be either quantitative (e.g., mean, median, mode) or visual (e.g., graphs and charts).

    2. Inferential Statistics: This involves methods of using information from a sample to draw conclusions (inferences) about the population. It allows us to make predictions or generalizations about a population from a sample of data.
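
    The difference between the two can be sketched with Python's standard library. The height values below are made up for illustration, and the interval uses a rough normal approximation (z = 1.96) rather than a t-based interval, which would be somewhat wider for a sample this small.

```python
import math
import statistics

# Hypothetical sample: heights (cm) of 10 people drawn from a larger population.
sample = [162, 170, 158, 175, 168, 181, 165, 172, 169, 160]

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Inferential statistics: use the sample to say something about the population.
standard_error = sample_sd / math.sqrt(len(sample))
# Rough 95% interval for the population mean (normal approximation, z = 1.96).
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"Sample mean: {sample_mean:.1f}")
print(f"Approximate 95% interval for the population mean: {ci_low:.1f} to {ci_high:.1f}")
```

    The mean describes only these ten values. The interval is an inference: a statement about the population the sample came from, with its uncertainty made explicit.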

    Understanding Different Types of Data

    Data can be classified into four types:

    1. Nominal: This is a categorical variable with no order or priority (e.g., Gender, Marital Status).
    2. Ordinal: This is a categorical variable with an order (e.g., Ratings on a scale of 1-5).
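    3. Interval: Numeric scale with equal spacing between values but no true zero point, so differences are meaningful but ratios are not (e.g., Temperature in Celsius).
    4. Ratio: Numeric scale with a true zero point, so both differences and ratios are meaningful (e.g., Age, Salary).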
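
    The data type determines which summaries make sense. The short sketch below uses made-up values to illustrate this. Using Kelvin as the ratio-scale counterpart of Celsius is an example added here, not part of the original unit.

```python
import statistics

# Nominal: categories have no order, so only counting (e.g., the mode) makes sense.
marital_status = ["single", "married", "married", "divorced"]
most_common_status = statistics.mode(marital_status)

# Ordinal: the order is meaningful, so the median is a sensible summary.
ratings = [1, 3, 4, 4, 5]
median_rating = statistics.median(ratings)

# Interval: differences are meaningful, ratios are not.
celsius_a, celsius_b = 10.0, 20.0
difference = celsius_b - celsius_a   # 10 degrees warmer: meaningful
naive_ratio = celsius_b / celsius_a  # 2.0, but 20 C is not "twice as hot" as 10 C

# Ratio: Kelvin has a true zero, so the ratio is meaningful.
kelvin_a = celsius_a + 273.15
kelvin_b = celsius_b + 273.15
true_ratio = kelvin_b / kelvin_a     # roughly 1.035

print(most_common_status, median_rating, difference, naive_ratio, round(true_ratio, 3))
```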

    Measures of Central Tendency

    These are statistical measures that identify a single value as representative of an entire distribution. The three most common measures of central tendency are:

    1. Mean: The average of all data points.
    2. Median: The middle value in a sorted data set (the average of the two middle values when the number of data points is even).
    3. Mode: The most frequently occurring value in a data set.
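
    As a minimal sketch, all three measures can be computed with Python's built-in statistics module. The salary figures are hypothetical and include one deliberately extreme value to show how differently the measures react to it.

```python
import statistics

# Hypothetical salaries in thousands; the last value is an outlier.
salaries = [30, 32, 35, 35, 38, 40, 200]

mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # middle value of the sorted data
mode_salary = statistics.mode(salaries)      # most frequent value

print(f"Mean:   {mean_salary:.2f}")
print(f"Median: {median_salary}")
print(f"Mode:   {mode_salary}")
```

    Here the single outlier lifts the mean well above most of the values, while the median and mode stay at 35. This is why the median is often preferred for skewed data.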

    Measures of Dispersion

    These are statistical measures that describe the variability or spread in a data set. The most common measures of dispersion include:

    1. Range: The difference between the highest and lowest values in a data set.
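    2. Variance: The average of the squared differences from the mean. When the variance is estimated from a sample, the sum of squared differences is divided by n - 1 rather than n.
    3. Standard Deviation: The square root of the variance, expressed in the same units as the data. It indicates how far data points typically lie from the mean.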
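
    These measures can be sketched with the statistics module as well. The data set is an arbitrary example. The code computes both the population versions (dividing by n) and the sample version of the variance (dividing by n - 1), since the two are easy to confuse.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary example values; the mean is 5

data_range = max(data) - min(data)

# Population versions: treat the data as the entire population (divide by n).
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Sample version: treat the data as a sample (divide by n - 1).
sample_variance = statistics.variance(data)

print(f"Range: {data_range}")
print(f"Population variance: {population_variance}")
print(f"Population standard deviation: {population_sd}")
print(f"Sample variance: {sample_variance:.3f}")
```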

    Introduction to Probability

    Probability is a mathematical framework for quantifying uncertainty. It provides a way of summarizing the uncertainty that comes from our laziness and ignorance. Because it is an essential tool for predicting what will happen next, it underlies many machine learning models.

    Probability Distributions

    A probability distribution describes how the values of a random variable are distributed. It tells us which outcomes are likely, which are less likely, and how likely they are. Three distributions that appear often in data science are:

    1. Normal Distribution: Also known as the Gaussian distribution, this is a continuous probability distribution for a real-valued random variable. Its graph is a symmetric bell-shaped curve centered on the mean.

    2. Binomial Distribution: A discrete probability distribution of the number of successes in a sequence of n independent experiments, each with the same probability of success.

    3. Poisson Distribution: A discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, assuming the events occur independently at a constant average rate.
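
    The three distributions can be explored with a few lines of standard-library Python. The helper functions binomial_pmf and poisson_pmf are written here for illustration, and the parameters (10 coin flips, an average of 3 events per interval) are assumed values, not taken from the unit.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average number per interval is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Normal: probability of falling within one standard deviation of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

# Binomial: probability of exactly 5 heads in 10 fair coin flips.
p_five_heads = binomial_pmf(5, 10, 0.5)

# Poisson: probability of exactly 2 events when 3 are expected on average.
p_two_events = poisson_pmf(2, 3.0)

print(f"Normal, within one standard deviation: {within_one_sd:.4f}")
print(f"Binomial, 5 heads in 10 flips: {p_five_heads:.4f}")
print(f"Poisson, 2 events at rate 3: {p_two_events:.4f}")
```

    In practice, libraries such as SciPy provide these distributions ready-made. Writing the formulas out by hand is only meant to show what each one computes.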

    Understanding these concepts will provide a solid foundation for the more advanced data science techniques to come.
