CS3362 DATA SCIENCE LABORATORY L T P C 0 0 4 2
COURSE OBJECTIVES:
· To understand the Python libraries for data science.
· To understand the basic statistical and probability measures for data science.
· To learn descriptive analytics on the benchmark data sets.
· To apply correlation and regression analytics on standard data sets.
· To present and interpret data using visualization packages in Python.
LIST OF EXPERIMENTS:
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web, and exploring various commands for doing descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and the Pima Indians Diabetes data set for performing the following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
7. Visualizing Geographic Data with Basemap
LIST OF EQUIPMENT: (30 Students per Batch)
Tools: Python, NumPy, SciPy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh
Note: Example data sets such as UCI, Iris, Pima Indians Diabetes, etc.
TOTAL: 60 PERIODS
COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: Make use of the Python libraries for data science.
CO2: Make use of the basic statistical and probability measures for data science.
CO3: Perform descriptive analytics on the benchmark data sets.
CO4: Perform correlation and regression analytics on standard data sets.
CO5: Present and interpret data using visualization packages in Python.
Ex 4c. Exploring various commands for doing descriptive analytics on the Iris data set.
Aim
To explore various commands for doing descriptive analytics on the Iris data set.
Procedure
Understand the idea behind descriptive statistics.
Load the required packages and the `iris` dataset. `load_iris()` returns an object containing the iris dataset, which is stored here in `iris_obj`.
Compute the basic statistics:
Count: the number of non-missing values in each column, obtained via `count()`. The Iris data has no missing values, so this equals the number of rows in the dataset.
Mean: computed for every numeric column.
Median: computed for every numeric column.
Variance: a measure of dispersion, roughly the “average” squared distance of a data point from the mean.
Standard deviation: the square root of the variance, interpreted as the “average” distance of a data point from the mean.
Maximum and minimum: the largest and smallest values in each column.
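The definitions of variance and standard deviation above can be checked by hand. The following is a minimal sketch, assuming a small set of hypothetical toy values (not the Iris data). It illustrates that pandas `var()` and `std()` default to the sample form, which divides by n - 1 (`ddof=1`), while passing `ddof=0` gives the population form, which divides by n.

```python
import math
import pandas as pd

# Hypothetical toy values (an assumption for illustration, not Iris data),
# chosen so that the arithmetic is easy to follow.
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

n = len(values)
mean = values.sum() / n                      # arithmetic mean
squared_dev = ((values - mean) ** 2).sum()   # sum of squared distances from the mean

population_var = squared_dev / n             # divide by n
sample_var = squared_dev / (n - 1)           # divide by n - 1

# pandas uses the sample form by default (ddof=1); ddof=0 gives the population form.
assert math.isclose(values.var(), sample_var)
assert math.isclose(values.var(ddof=0), population_var)

# The standard deviation is the square root of the variance.
assert math.isclose(values.std(), math.sqrt(sample_var))
assert math.isclose(values.std(ddof=0), math.sqrt(population_var))
```

For a data set as large as Iris (150 rows) the two forms differ only slightly, but the distinction matters when comparing pandas output with a hand calculation or with a library that defaults to the population form.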
Program Code
import pandas as pd
# sklearn.datasets includes common example datasets
from sklearn.datasets import load_iris
# load_iris() is a function that loads the iris dataset
iris_obj = load_iris()
# Dataset preview (raw feature array)
iris_obj.data
# Build a data frame of the four features and join the numeric species codes
iris = pd.DataFrame(iris_obj.data, columns=iris_obj.feature_names).join(
    pd.DataFrame(iris_obj.target, columns=["species"]))
iris # prints iris data
Commands
iris_obj.feature_names
iris.count()
iris.mean()
iris.median()
iris.var()
iris.std()
iris.max()
iris.min()
iris.describe()
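As a rough illustration of how the commands above relate to one another, the sketch below uses a hypothetical toy data frame (the column names and values are made up, and one value is deliberately missing). It shows that `count()` reports the non-missing values per column, which can differ from the number of rows, and that `describe()` gathers several of the individual statistics into one table.

```python
import math
import numpy as np
import pandas as pd

# Hypothetical toy frame (an assumption for illustration, not the Iris data);
# one value in "length" is deliberately missing.
toy = pd.DataFrame({
    "length": [5.0, 4.5, np.nan, 6.0],
    "width": [3.0, 3.5, 3.2, 2.9],
})

n_rows = len(toy)        # number of rows, missing values included
non_null = toy.count()   # number of non-missing values in each column

# describe() returns count, mean, std, min, quartiles and max for numeric columns.
summary = toy.describe()

# Each row of the summary matches the corresponding single command.
assert summary.loc["count", "length"] == non_null["length"]
assert math.isclose(summary.loc["mean", "width"], toy["width"].mean())
assert math.isclose(summary.loc["std", "width"], toy["width"].std())
assert math.isclose(summary.loc["50%", "width"], toy["width"].median())
assert summary.loc["max", "length"] == toy["length"].max()
```

Because the statistics skip missing values by default, a column whose count is lower than the number of rows signals that its mean, median and spread were computed on fewer observations.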
Result
The various commands for doing descriptive analytics on the Iris data set were explored and executed successfully.