Descriptive Statistics Using the Pandas and SciPy Libraries
Data science is widely used to generate predictions and to explore data through visualization. Python is one of the most popular languages for analyzing data.
Here, we discuss descriptive statistics in detail, with the help of the Pandas and SciPy libraries in Python.
First, you will learn about the two libraries used in this Python project. Afterwards, we will discuss what descriptive statistics is.
Here is a short note on each library.
Pandas is a Python library used for data analysis.
SciPy is a library of scientific computing routines, including linear algebra and statistics functions, that builds on NumPy.
Now, we will learn about descriptive statistics.
What is descriptive statistics?
Descriptive statistics is used to compute statistical measures of one or more samples. In other words, it describes and summarizes data in a meaningful way.
Descriptive statistics is divided into two parts:
1- It describes the values of the observations in a variable.
Examples: sum, median, mean, max, etc.
2- It also describes the spread of a variable.
Examples: standard deviation, variance, counts, quartiles, etc.
We can use descriptive statistics to analyze data. For this, we can use Python in a Jupyter notebook.
You can follow the steps below when working in a Jupyter notebook.
Step 1- Import libraries such as Pandas and SciPy.
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from scipy import stats
Step 2- Create an Excel file and save it as a .csv file. Here, I have created a file named Fruits.csv and kept it inside the folder YoursTechnicalTeacher.
Important points to remember-
1- The read_csv() method reads the data from a CSV file into a DataFrame.
2- The head() method shows the first five rows of the data by default.
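As a minimal sketch of these two points, the example below loads CSV data and previews it. The contents of Fruits.csv are not shown in this article, so the columns and values here (Fruit, Price, Quantity) are made up for illustration, and the data is read from an in-memory string instead of the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for Fruits.csv; the real file's columns are not
# shown in the article, so these names and values are assumptions.
csv_text = """Fruit,Price,Quantity
Apple,120,10
Banana,40,24
Mango,150,8
Orange,60,15
Grapes,90,12
Papaya,50,5
"""

# read_csv accepts a file path or a file-like object. With the real file
# you would pass its path instead (the exact path depends on where the
# YoursTechnicalTeacher folder lives on your machine).
fruits = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default; head(3) would return three.
first_rows = fruits.head()
print(first_rows)
```

The sample has six rows, so head() leaves the last one out.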
3- Now we will start with the values of the observations in a variable.
i- Sum: we find the sum by adding all the values.
Suppose we have the values a, b, c, d, e, f.
Then the sum is a+b+c+d+e+f.
When we need the sum of the data row by row, we can pass axis=1.
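The difference between the default column-wise sum and the row-wise sum can be sketched as follows. The small DataFrame is made up for illustration and is not the article's Fruits.csv data.

```python
import pandas as pd

# Made-up numbers for illustration only
df = pd.DataFrame(
    {"Price": [120, 40, 150], "Quantity": [10, 24, 8]},
    index=["Apple", "Banana", "Mango"],
)

column_sums = df.sum()        # axis=0 (default): one sum per column
row_sums = df.sum(axis=1)     # axis=1: one sum per row

print(column_sums)
print(row_sums)
```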
ii- Median: it gives you the middle value of a dataset once the values are sorted.
If the number of values (n) is odd, then the median is the ((n+1)/2)th term.
If the number of values (n) is even, then the median is the average of the (n/2)th and ((n+2)/2)th terms.
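To make the two rules concrete, here is a small sketch that applies them by hand and compares the result with Pandas' median() method. The helper name median_by_position and the sample values are hypothetical, chosen only for this example.

```python
import pandas as pd

def median_by_position(values):
    """Median via the positional rule: sort, then pick the middle term(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th term, which is zero-based index (n - 1) // 2
        return ordered[(n - 1) // 2]
    # Even n: average of the (n / 2)th and ((n + 2) / 2)th terms
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_values = [7, 3, 9, 1, 5]         # sorted: 1, 3, 5, 7, 9
even_values = [7, 3, 9, 1, 5, 11]    # sorted: 1, 3, 5, 7, 9, 11

print(median_by_position(odd_values), pd.Series(odd_values).median())
print(median_by_position(even_values), pd.Series(even_values).median())
```

Note that the "th term" in the rules is counted from 1, while Python lists are indexed from 0, which is why the indices in the code are shifted by one.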
iii- Mean: it gives you the average value of a dataset.
Suppose we have the values a, b, c, d, e, f.
Then the mean is (a+b+c+d+e+f)/6.
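The same calculation can be sketched with six made-up values, once by hand and once with the Pandas mean() method.

```python
import pandas as pd

values = pd.Series([4, 8, 15, 16, 23, 42])   # six made-up values

manual_mean = values.sum() / len(values)     # (a+b+c+d+e+f)/6
pandas_mean = values.mean()                  # same result from Pandas

print(manual_mean, pandas_mean)
```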
iv- Max: it gives you the maximum value of a dataset.
The idxmax() method gives you the index label of the row that contains the maximum value.
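A short sketch of the two methods side by side, using a made-up Series of prices indexed by fruit name:

```python
import pandas as pd

# Made-up prices for illustration only
prices = pd.Series(
    [120, 40, 150, 60],
    index=["Apple", "Banana", "Mango", "Orange"],
)

highest_price = prices.max()        # the maximum value itself
highest_label = prices.idxmax()     # the index label where that maximum occurs

print(highest_price, highest_label)
```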
4- Now, we will work on the spread of a variable.
- Variance: it gives the average of the squared differences between each value and the mean.
- Standard deviation: it gives you the square root of the variance.
- Count: it gives the number of occurrences of each item in a dataset, which also shows the unique values.
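The three measures above can be sketched as follows, with made-up data. One detail worth knowing: Pandas' var() and std() divide by n-1 by default (the sample variance), so ddof=0 is passed here to match the "average of squared differences" definition given above. Using value_counts() for the per-item counts is an assumption about which method is meant.

```python
import pandas as pd

values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])   # made-up values, mean is 5

# ddof=0 divides by n (population variance); the Pandas default is ddof=1
variance = values.var(ddof=0)
std_dev = values.std(ddof=0)        # square root of the variance above
sample_variance = values.var()      # default: divides by n - 1

# Made-up categorical data for counting
fruits = pd.Series(["Apple", "Banana", "Apple", "Mango", "Apple", "Banana"])
occurrences = fruits.value_counts()   # occurrences of each unique value
total_entries = fruits.count()        # number of non-missing entries

print(variance, std_dev, sample_variance)
print(occurrences)
print(total_entries)
```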
Note: One important method can be used to compute all the descriptive statistics for every variable in a dataset at once.
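The note does not name the method. In Pandas, describe() produces this kind of all-at-once summary, so the sketch below assumes that is the one intended. It also shows scipy.stats.describe, which returns a similar summary for a single array. The data is made up for illustration.

```python
import pandas as pd
from scipy import stats

# Made-up data for illustration only
df = pd.DataFrame(
    {"Price": [120, 40, 150, 60, 90], "Quantity": [10, 24, 8, 15, 12]}
)

# Pandas: count, mean, std, min, quartiles, and max for each numeric column
summary = df.describe()
print(summary)

# SciPy: number of observations, (min, max), mean, variance, skewness,
# and kurtosis for one array of values
result = stats.describe(df["Price"].to_numpy())
print(result.nobs, result.minmax, result.mean, result.variance)
```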
I hope you now understand each of the steps in descriptive statistics. These are basic data analysis methods, and they will help you analyze data in your data science work.
Thank you and happy coding!!