Wednesday, 7 September 2022

CS3361 Exploring various commands for doing descriptive analytics on the Iris data set.

CS3362 DATA SCIENCE LABORATORY L T P C 0 0 4 2

COURSE OBJECTIVES:

· To understand the Python libraries for data science.

· To understand the basic Statistical and Probability measures for data science. 

· To learn descriptive analytics on the benchmark data sets. 

· To apply correlation and regression analytics on standard data sets. 

· To present and interpret data using visualization packages in Python. 

LIST OF EXPERIMENTS: 

1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages. 

2. Working with Numpy arrays 

3. Working with Pandas data frames 

4. Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics on the Iris data set. 

5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:

a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.

 b. Bivariate analysis: Linear and logistic regression modeling

c. Multiple Regression analysis

d. Also compare the results of the above analysis for the two data sets. 

6. Apply and explore various plotting functions on UCI data sets.

a. Normal curves

b. Density and contour plots

c. Correlation and scatter plots

d. Histograms

e. Three dimensional plotting 

7. Visualizing Geographic Data with Basemap

List of Equipment: (30 Students per Batch)

Tools: Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Seaborn, Plotly, Bokeh

Note: Example data sets like: UCI, Iris, Pima Indians Diabetes etc. 

TOTAL: 60 PERIODS 

COURSE OUTCOMES:

At the end of this course, the students will be able to:

CO1: Make use of the Python libraries for data science.

CO2: Make use of the basic Statistical and Probability measures for data science.

CO3: Perform descriptive analytics on the benchmark data sets.

CO4: Perform correlation and regression analytics on standard data sets.

CO5: Present and interpret data using visualization packages in Python.


  

Ex 4c. Exploring various commands for doing descriptive analytics on the Iris data set. 

Aim

To explore various commands for doing descriptive analytics on the Iris data set. 

Procedure

Understand the idea behind descriptive statistics.

Load the packages we will need and also the `iris` dataset.

`load_iris()` returns an object containing the iris dataset, which is stored here in `iris_obj`.

Basic statistics.

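The count is the number of non-missing values in each column and can be obtained via `count()`. The iris data has no missing values, so the count equals the number of rows in the dataset.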

Mean for every numeric column

Median for every numeric column

The variance is a measure of dispersion, roughly the “average” squared distance of a data point from the mean.

The standard deviation is the square root of the variance and is interpreted as the “average” distance of a data point from the mean.

The maximum and minimum values.
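The definitions of variance and standard deviation above can be checked by hand. The following is a minimal sketch, assuming a small sample of the first five sepal-length values from the iris data typed in directly (the names `values`, `variance` and `std_dev` are illustrative). It uses the sample form with an n - 1 divisor, which is also the default in pandas' `var()` and `std()`.

```python
import statistics

# First five sepal-length values (cm) from the iris data, typed by hand
values = [5.1, 4.9, 4.7, 4.6, 5.0]

n = len(values)
mean = sum(values) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in values) / (n - 1)

# Standard deviation: square root of the variance
std_dev = variance ** 0.5

print(mean, variance, std_dev)

# The standard library gives the same sample variance and standard deviation
print(statistics.variance(values), statistics.stdev(values))
```

Both printed lines should agree on the variance and standard deviation, up to floating-point rounding.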


Program Code

import pandas as pd

from sklearn.datasets import load_iris

# sklearn.datasets includes common example datasets;
# load_iris() is a function that loads the iris dataset

iris_obj = load_iris()

# Preview the raw feature array

iris_obj.data

# Build a DataFrame of the four features, then add the numeric class label as "species"

iris = pd.DataFrame(iris_obj.data, columns=iris_obj.feature_names)

iris["species"] = iris_obj.target

iris # prints iris data
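The `species` column built above holds numeric class codes (0, 1, 2) rather than names. As an alternative, here is a minimal sketch that lets scikit-learn build the DataFrame and attaches readable species names. It assumes a scikit-learn version that supports `as_frame=True`; the names `bunch`, `iris_df` and `species_name` are illustrative and not part of the original program.

```python
import pandas as pd
from sklearn.datasets import load_iris

# as_frame=True returns the data as pandas objects instead of NumPy arrays
bunch = load_iris(as_frame=True)

# bunch.frame holds the four feature columns plus a numeric "target" column
iris_df = bunch.frame.rename(columns={"target": "species"})

# Translate the numeric codes into the species names stored in target_names
iris_df["species_name"] = pd.Categorical.from_codes(
    iris_df["species"], bunch.target_names
)

print(iris_df.head())
```

With names attached, the later summaries are easier to read, and the numeric `species` column can be left out of statistics such as `mean()` where averaging class codes has no meaning.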

 

Commands

iris_obj.feature_names 

iris.count()

iris.mean()

iris.median() 

iris.var()

iris.std()

iris.max()

iris.min()

iris.describe()
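The individual commands above can also be collected into a single table. The following is a minimal sketch, assuming a small hand-typed frame holding the first five rows of the iris data (the names `sample` and `summary` are illustrative). With the full data set, the same call would be made on `iris`.

```python
import pandas as pd

# First five rows of the iris data, typed by hand so the sketch runs on its own
sample = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal width (cm)": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal length (cm)": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal width (cm)": [0.2, 0.2, 0.2, 0.2, 0.2],
})

# One row per statistic, one column per feature
summary = sample.agg(["count", "mean", "median", "var", "std", "min", "max"])

print(summary)
```

`describe()` already reports the count, mean, standard deviation, minimum, maximum and quartiles. Passing a list of names to `agg()` is one way to add the variance to the same table.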

 

Result

The various commands for descriptive analytics on the Iris data set were explored and executed successfully.
