Notes for Everyone

Monday, 3 October 2022

6 c. Apply and explore histograms and three dimensional plotting functions on UCI data sets

Aim

To apply and explore histograms and three dimensional plotting functions on UCI data sets

Procedure

· Download the CSV file and load it for exploration.

· A histogram represents data that has been grouped into ranges.

· To create a histogram, first divide the whole range of values into a series of intervals (bins), then count the values that fall into each interval.

· Bins are consecutive, non-overlapping intervals of a variable. The matplotlib.pyplot.hist() function computes and draws the histogram of x.

· The matplotlib.pyplot import in the program is the standard import for plotting with matplotlib; the same import is used for 2D plotting.

· The Axes3D import from mpl_toolkits.mplot3d enables 3D projections and is not used anywhere else in the program. Matplotlib 3.2 and later register the '3d' projection without it, so the import is only required on older versions.
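The binning step described above can be sketched with NumPy alone. The readings below are made-up illustrative values, not rows from diabetes.csv, and the choice of four bins over the range 80 to 160 is an assumption for the example. matplotlib.pyplot.hist() accepts the same bins and range arguments and draws the bars.

```python
import numpy as np

# Hypothetical glucose-like readings (illustrative only, not from diabetes.csv)
values = np.array([85, 89, 90, 101, 105, 110, 118, 121, 137, 148])

# Split the range 80..160 into four consecutive, non-overlapping bins
# and count how many values fall into each one
counts, edges = np.histogram(values, bins=4, range=(80, 160))

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {n} value(s)")
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.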

Program

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt # To visualize

from mpl_toolkits.mplot3d import Axes3D

data = pd.read_csv('d:\\diabetes.csv')

data

data['Glucose'].plot(kind='hist')

Output

fig = plt.figure(figsize=(4,4))

ax = fig.add_subplot(111, projection='3d')

Output

fig = plt.figure()

ax = fig.add_subplot(111, projection='3d')

x = data['Age'].values

y = data['Glucose'].values

z = data['Outcome'].values

ax.set_xlabel("Age (Year)")

ax.set_ylabel("Glucose (Reading)")

ax.set_zlabel("Outcome (0 or 1)")

ax.scatter(x, y, z, c='r', marker='o')

plt.show()

Output

Result

The histogram and three-dimensional plotting functions were applied to the UCI data set and executed successfully.
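Because Outcome takes only the values 0 and 1, the 3D scatter plot can be easier to read when each class gets its own colour. The following is a minimal sketch under that assumption; the arrays are hypothetical stand-ins for the Age, Glucose and Outcome columns, and the colour names are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in values for the Age, Glucose and Outcome columns
age = np.array([21, 25, 33, 41, 50, 58])
glucose = np.array([85, 90, 140, 110, 160, 175])
outcome = np.array([0, 0, 1, 0, 1, 1])

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")

# One scatter call per class so each class gets its own colour and legend entry
for label, colour in [(0, "tab:blue"), (1, "tab:red")]:
    mask = outcome == label
    ax.scatter(age[mask], glucose[mask], outcome[mask],
               c=colour, marker="o", label=f"Outcome {label}")

ax.set_xlabel("Age (Year)")
ax.set_ylabel("Glucose (Reading)")
ax.set_zlabel("Outcome (0 or 1)")
ax.legend()
# In a notebook the figure renders at the end of the cell;
# in a script, call plt.show() to display it.
```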

Saturday, 1 October 2022

5 c. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following: Multiple Regression

Aim

To perform multiple regression on the diabetes data set. Multiple regression is like linear regression but with more than one independent variable, meaning that we predict a value from two or more variables.

Procedure

The Pandas module allows us to read csv files and return a DataFrame object.

Then make a list of the independent values and call this variable X.

Put the dependent values in a variable called y.

From the sklearn module we will use the LinearRegression() class to create a linear regression object.

This object has a method called fit() that takes the independent and dependent values as parameters and fills the regression object with data that describes the relationship.

We now have a regression object that is ready to predict Age from a person's Glucose and BloodPressure values.
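The steps above can be checked on data where the answer is known in advance. This sketch builds a small synthetic table from the rule y = 2*x1 + 3*x2 + 5 (the column names x1, x2 and y are hypothetical) and confirms that fit() recovers the coefficients.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data built from a known rule: y = 2*x1 + 3*x2 + 5
df = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
})
df["y"] = 2 * df["x1"] + 3 * df["x2"] + 5

X = df[["x1", "x2"]]   # independent variables
y = df["y"]            # dependent variable

regr = LinearRegression()
regr.fit(X, y)

print(regr.coef_)       # close to [2. 3.]
print(regr.intercept_)  # close to 5.0

# Predict with a DataFrame that carries the same column names used in fit()
new_row = pd.DataFrame({"x1": [10], "x2": [20]})
prediction = regr.predict(new_row)
print(prediction)       # close to [85.], since 2*10 + 3*20 + 5 = 85
```

Passing a DataFrame with matching column names to predict() avoids the feature-name warning that recent scikit-learn versions raise when a model fitted on a DataFrame is given a plain list.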

Program

import pandas as pd

from sklearn import linear_model

df = pd.read_csv(r'd:\diabetes.csv')  # raw string, so a single backslash is enough

print (df)

X = df[['Glucose', 'BloodPressure']]

y = df['Age']

regr = linear_model.LinearRegression()

regr.fit(X, y)

predictedage = regr.predict([[150, 13]])

print(predictedage)

Output

[28.77214401]

5 b. Linear Regression and Logistic Regression with the Diabetes Dataset Using Python Machine Learning

Aim

To implement linear regression and logistic regression on the diabetes dataset that ships with sklearn.

Procedure

Load sklearn Libraries.

Load Data

Load the diabetes dataset

Split Dataset

Creating Model Linear Regression and Logistic Regression

Make predictions using the testing set

Finding Coefficient And Mean Square Error
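The last step reports the mean squared error and the coefficient of determination. As a worked illustration of what those two numbers mean, the sketch below computes both by hand with NumPy on four made-up values; sklearn's mean_squared_error and r2_score use the same definitions.

```python
import numpy as np

# Hypothetical true values and predictions (illustrative only)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

# Mean squared error: the average of the squared residuals
residuals = y_true - y_pred
mse = np.mean(residuals ** 2)

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse:.3f}")  # MSE: 0.375
print(f"R^2: {r2:.3f}")   # R^2: 0.925
```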

Program

import matplotlib.pyplot as plt

import pandas as pd

import numpy as np

from sklearn import datasets, linear_model

from sklearn.metrics import mean_squared_error, r2_score

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

#To calculate accuracy measures and confusion matrix

from sklearn import metrics

diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

diabetes_X = diabetes_X[:, np.newaxis, 2]

# Split the data into training/testing sets

diabetes_X_train = diabetes_X[:-20]

diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets

diabetes_y_train = diabetes_y[:-20]

diabetes_y_test = diabetes_y[-20:]

# Create linear regression object

regr = linear_model.LinearRegression()

# Train the model using the training sets

regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set

diabetes_y_pred = regr.predict(diabetes_X_test)

# Create logistic regression object
# Note: diabetes_y is a continuous measure, so this fit only demonstrates the fit/predict API

Logistic_model = LogisticRegression()

Logistic_model.fit(diabetes_X_train, diabetes_y_train)

# The coefficients

print('Coefficients: \n', regr.coef_)

# The mean squared error

print('Mean squared error: %.2f'

% mean_squared_error(diabetes_y_test, diabetes_y_pred))

# The coefficient of determination: 1 is perfect prediction

print('Coefficient of determination: %.2f'

% r2_score(diabetes_y_test, diabetes_y_pred))

y_predict = Logistic_model.predict(diabetes_X_train)

#print("Y predict/hat ", y_predict)

y_predict

Output

Coefficients:

[938.23786125]

Mean squared error: 2548.07

Coefficient of determination: 0.47
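Logistic regression is a classifier, so it is usually applied to a target with a small number of classes. The following is a sketch only, under assumptions that are not part of the program above: the continuous target is split into two classes at its median, all ten features are used, and the split uses train_test_split with a fixed random_state.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Assumption for illustration: 1 if the target is above its median, else 0
y_class = (y > np.median(y)).astype(int)

# Hold out 20% of the rows for testing, keeping the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y_class, test_size=0.2, random_state=0, stratify=y_class
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion matrix:\n", cm)
```

The confusion matrix has true classes in rows and predicted classes in columns, so the diagonal counts the correct predictions.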

5 d. Compare the results of the above analysis for the two data sets.


Aim

To compare the values of two different data sets.

Procedure

Step 1: Prepare the datasets to be compared

Suppose the first set of data is stored in a CSV file called car1.csv, and the second set in a CSV file called car2.csv.

Step 2: Create the two DataFrames

Read each CSV file into its own DataFrame.

Step 3: Compare the values between the two Pandas DataFrames

This step uses the NumPy package: np.where() flags whether the amounts match and computes the difference where they do not.
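The comparison can also be sketched without CSV files. The two small frames below are hypothetical stand-ins for car1.csv and car2.csv. Unlike the np.where() approach, this variant aligns rows on the Model column with merge(), so it does not depend on both files listing the cars in the same order.

```python
import pandas as pd

# Hypothetical stand-ins for car1.csv and car2.csv
df1 = pd.DataFrame({"Model": ["A", "B", "C"],
                    "amount": [600000, 800000, 1100000]})
df2 = pd.DataFrame({"Model": ["A", "B", "C"],
                    "amount1": [600000, 850000, 1100000]})

# Align rows on the shared key instead of relying on row position
merged = df1.merge(df2, on="Model")

# A plain comparison yields real booleans rather than the strings 'True'/'False'
merged["prices_match"] = merged["amount"] == merged["amount1"]

# The difference is already 0 where the amounts match
merged["price_diff"] = merged["amount"] - merged["amount1"]

print(merged)
```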

Program

import pandas as pd

import numpy as np

data_1 = pd.read_csv(r'd:\car1.csv')

df1 = pd.DataFrame(data_1)

data_2 = pd.read_csv(r'd:\car2.csv')

df2 = pd.DataFrame(data_2)

df1['amount1'] = df2['amount1']

df1['prices_match'] = np.where(df1['amount'] == df2['amount1'], 'True', 'False')

df1['price_diff'] = np.where(df1['amount'] == df2['amount1'], 0, df1['amount'] - df2['amount1'])

print(df1)

Output

    Model     City  Year   amount  amount1 prices_match  price_diff
0  Maruti  Chennai  2022   600000   600000         True           0
1  Hyndai  Chennai  2022   700000   700000         True           0
2    Ford  Chennai  2022   800000   850000        False      -50000
3     Kia  Chennai  2022   900000   900000         True           0
4     XL6  Chennai  2022  1000000  1000000         True           0
5    Tata  Chennai  2022  1100000  1150000        False      -50000
6    Audi  Chennai  2022  1200000  1200000         True           0
7  Ertiga  Chennai  2022  1300000  1300000         True           0


Dataset 1: car1.csv

Dataset 2: car2.csv

Wednesday, 21 September 2022

CS3361 DATA SCIENCE LABORATORY L T P C 0 0 4 lab manual

5 a. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:
Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.


CS3361 DATA SCIENCE LABORATORY L T P C 0 0 4 2

COURSE OBJECTIVES:

· To understand the python libraries for data science

· To understand the basic Statistical and Probability measures for data science.

· To learn descriptive analytics on the benchmark data sets.

· To apply correlation and regression analytics on standard data sets.

· To present and interpret data using visualization packages in Python.

LIST OF EXPERIMENTS:

1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.

2. Working with Numpy arrays

3. Working with Pandas data frames

4. Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics on the Iris data set.

5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:

a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.

b. Bivariate analysis: Linear and logistic regression modeling

c. Multiple Regression analysis

d. Also compare the results of the above analysis for the two data sets.

6. Apply and explore various plotting functions on UCI data sets.

a. Normal curves

b. Density and contour plots

c. Correlation and scatter plots

d. Histograms

e. Three dimensional plotting

7. Visualizing Geographic Data with Basemap

List of Equipments: (30 Students per Batch)

Tools: Python, Numpy, Scipy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh

Note: Example data sets like: UCI, Iris, Pima Indians Diabetes etc.

TOTAL: 60 PERIODS

COURSE OUTCOMES: At the end of this course, the students will be able to:

CO1: Make use of the python libraries for data science

CO2: Make use of the basic Statistical and Probability measures for data science.

CO3: Perform descriptive analytics on the benchmark data sets.

CO4: Perform correlation and regression analytics on standard data sets

CO5: Present and interpret data using visualization packages in Python

Thursday, 8 September 2022

CS3351 DIGITAL PRINCIPLES AND COMPUTER ORGANIZATION Study Material

CS3351 DIGITAL PRINCIPLES AND COMPUTER ORGANIZATION L T P C 3 0 2 4

COURSE OBJECTIVES:

· To analyze and design combinational circuits.

· To analyze and design sequential circuits

· To understand the basic structure and operation of a digital computer.

· To study the design of data path unit, control unit for processor and to familiarize with the hazards.

· To understand the concept of various memories and I/O interfacing.

UNIT I COMBINATIONAL LOGIC

Combinational Circuits – Karnaugh Map - Analysis and Design Procedures – Binary Adder – Subtractor – Decimal Adder - Magnitude Comparator – Decoder – Encoder – Multiplexers - Demultiplexers

UNIT II SYNCHRONOUS SEQUENTIAL LOGIC

Introduction to Sequential Circuits – Flip-Flops – operation and excitation tables, Triggering of FF, Analysis and design of clocked sequential circuits – Design – Moore/Mealy models, state minimization, state assignment, circuit implementation - Registers – Counters.

UNIT III COMPUTER FUNDAMENTALS

Functional Units of a Digital Computer: Von Neumann Architecture – Operation and Operands of Computer Hardware Instruction – Instruction Set Architecture (ISA): Memory Location, Address and Operation – Instruction and Instruction Sequencing – Addressing Modes, Encoding of Machine Instruction – Interaction between Assembly and High Level Language.

UNIT IV PROCESSOR 9

Instruction Execution – Building a Data Path – Designing a Control Unit – Hardwired Control, Microprogrammed Control – Pipelining – Data Hazard – Control Hazards.

UNIT V MEMORY AND I/O 9

Memory Concepts and Hierarchy – Memory Management – Cache Memories: Mapping and Replacement Techniques – Virtual Memory – DMA – I/O – Accessing I/O: Parallel and Serial Interface – Interrupt I/O – Interconnection Standards: USB, SATA

45 PERIODS

PRACTICAL EXERCISES: 30 PERIODS

1. Verification of Boolean theorems using logic gates.

2. Design and implementation of combinational circuits using gates for arbitrary functions.

3. Implementation of 4-bit binary adder/subtractor circuits.

4. Implementation of code converters.

5. Implementation of BCD adder, encoder and decoder circuits

6. Implementation of functions using Multiplexers.

7. Implementation of the synchronous counters

8. Implementation of a Universal Shift register.

9. Simulator based study of Computer Architecture

COURSE OUTCOMES: At the end of this course, the students will be able to:

CO1 : Design various combinational digital circuits using logic gates

CO2 : Design sequential circuits and analyze the design procedures

CO3 : State the fundamentals of computer systems and analyze the execution of an instruction

CO4 : Analyze different types of control design and identify hazards

CO5 : Identify the characteristics of various memory systems and I/O communication

TOTAL: 75 PERIODS

TEXT BOOKS

1. M. Morris Mano, Michael D. Ciletti, “Digital Design : With an Introduction to the Verilog HDL, VHDL, and System Verilog”, Sixth Edition, Pearson Education, 2018.

2. David A. Patterson, John L. Hennessy, “Computer Organization and Design, The Hardware/Software Interface”, Sixth Edition, Morgan Kaufmann/Elsevier, 2020.

REFERENCES

1. Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Naraig Manjikian, “Computer Organization and Embedded Systems”, Sixth Edition, Tata McGraw-Hill, 2012.

2. William Stallings, “Computer Organization and Architecture – Designing for Performance”, Tenth Edition, Pearson Education, 2016.

3. M. Morris Mano, “Digital Logic and Computer Design”, Pearson Education, 2016

Wednesday, 7 September 2022

CS3361 Exploring various commands for doing descriptive analytics on the Iris data set.


Ex 4c. Exploring various commands for doing descriptive analytics on the Iris data set.

Aim

To explore various commands for doing descriptive analytics on the Iris data set.

Procedure

Understand the idea behind descriptive statistics.

Load the packages we will need and the `iris` dataset.

load_iris() returns an object containing the iris dataset, stored here in `iris_obj`.

Compute the basic statistics:

The number of rows in the dataset, which can be obtained via `count()`.

The mean of every numeric column.

The median of every numeric column.

The variance, a measure of dispersion: roughly the “average” squared distance of a data point from the mean.

The standard deviation, the square root of the variance, interpreted as the “average” distance of a data point from the mean.

The maximum and minimum values.
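The word "roughly" in the variance description matters in practice: pandas divides by n - 1 (the sample variance) by default, not by n. This short worked example on eight made-up numbers shows both versions.

```python
import pandas as pd

# Illustrative values with mean 5 (not taken from the Iris data set)
s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

mean = s.mean()
squared_dev = (s - mean) ** 2          # squared distance of each point from the mean

pop_var = squared_dev.sum() / len(s)            # divide by n      -> 4.0
sample_var = squared_dev.sum() / (len(s) - 1)   # divide by n - 1  -> 32/7

print(pop_var, s.var(ddof=0))      # both 4.0
print(sample_var, s.var())         # both about 4.571 (pandas default is ddof=1)
print(s.std(ddof=0))               # 2.0, the square root of the population variance
```

The same ddof argument applies to iris.var() and iris.std().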

Program Code

import pandas as pd

from pandas import DataFrame

from sklearn.datasets import load_iris

# sklearn.datasets includes common example datasets

# A function to load in the iris dataset

iris_obj = load_iris()

# Dataset preview

iris_obj.data

# Build one DataFrame: the four feature columns joined with the species codes
iris = DataFrame(iris_obj.data, columns=iris_obj.feature_names).join(
    DataFrame(iris_obj.target, columns=["species"])
)

iris # prints iris data

Commands

iris_obj.feature_names

iris.count()

iris.mean()

iris.median()

iris.var()

iris.std()

iris.max()

iris.min()

iris.describe()

Result

The various commands for doing descriptive analytics on the Iris data set were explored and executed successfully.
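As an alternative to assembling the DataFrame by hand, scikit-learn 0.23 and later can return the data as a pandas DataFrame directly through the as_frame argument. This is a sketch of that route; the species column and the per-species summary are additions for illustration and are not part of the program above.

```python
from sklearn.datasets import load_iris

# as_frame=True makes the returned object carry a ready-made DataFrame
iris_bunch = load_iris(as_frame=True)
iris = iris_bunch.frame   # four feature columns plus an integer "target" column

# Map the integer codes 0, 1, 2 to the species names
code_to_name = {i: str(name) for i, name in enumerate(iris_bunch.target_names)}
iris["species"] = iris["target"].map(code_to_name)

# Mean of every feature, computed separately for each species
summary = iris.drop(columns="target").groupby("species").mean()
print(summary)
```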