Notes for Everyone

Saturday, 1 October 2022

5 b. Linear Regression and Logistic Regression with the Diabetes Dataset Using Python Machine Learning

Aim

In this experiment we use the diabetes dataset from sklearn and fit Linear Regression and Logistic Regression models to it.

Procedure

Load the sklearn libraries.

Load the diabetes dataset.

Split the dataset into training and testing sets.

Create the Linear Regression and Logistic Regression models.

Make predictions using the testing set.

Find the coefficient, the mean squared error and the coefficient of determination.
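The last step of the procedure can be made concrete with a small sketch. It is not part of the lab program: it fits a one-feature least-squares line and computes the mean squared error and the coefficient of determination by hand with NumPy. The toy numbers are made up for illustration (they are not the diabetes data), and the helper names fit_line, mse and r2 are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares slope and intercept for a single feature."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def mse(y_true, y_pred):
    """Mean of the squared prediction errors."""
    return np.mean((y_true - y_pred) ** 2)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual / total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Made-up toy data, used only to show the arithmetic
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

slope, intercept = fit_line(x, y)
y_pred = slope * x + intercept

print(f"Coefficient (slope): {slope:.2f}")
print(f"Mean squared error: {mse(y, y_pred):.2f}")
print(f"Coefficient of determination: {r2(y, y_pred):.2f}")
```

These are the same quantities that regr.coef_, mean_squared_error and r2_score report in the program, there computed on the held-out test rows.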

Program

import matplotlib.pyplot as plt

import pandas as pd

import numpy as np

from sklearn import datasets, linear_model

from sklearn.metrics import mean_squared_error, r2_score

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

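# To calculate accuracy measures and confusion matrix

from sklearn import metrics

# Load the diabetes dataset as a feature matrix and a target vector

diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

# Use only one feature (column index 2, BMI), kept as a 2-D column

diabetes_X = diabetes_X[:, np.newaxis, 2]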

# Split the data into training/testing sets

diabetes_X_train = diabetes_X[:-20]

diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets

diabetes_y_train = diabetes_y[:-20]

diabetes_y_test = diabetes_y[-20:]

# Create linear regression object

regr = linear_model.LinearRegression()

# Train the model using the training sets

regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set

diabetes_y_pred = regr.predict(diabetes_X_test)

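# Create Logistic regression object

# Logistic regression is a classifier, so the continuous target is first
# converted to two classes (assumption: 1 if above the training median, else 0)

diabetes_y_train_class = (diabetes_y_train > np.median(diabetes_y_train)).astype(int)

Logistic_model = LogisticRegression()

Logistic_model.fit(diabetes_X_train, diabetes_y_train_class)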

# The coefficients

print('Coefficients: \n', regr.coef_)

# The mean squared error

print('Mean squared error: %.2f' % mean_squared_error(diabetes_y_test, diabetes_y_pred))

# The coefficient of determination: 1 is perfect prediction

print('Coefficient of determination: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))

y_predict = Logistic_model.predict(diabetes_X_train)

#print("Y predict/hat ", y_predict)

y_predict

Output

Coefficients:

[938.23786125]

Mean squared error: 2548.07

Coefficient of determination: 0.47
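The program imports train_test_split and metrics but never uses them. A minimal sketch of how they might be used for the logistic regression part is given here. It assumes the continuous target is turned into two classes at its median (a choice made for illustration, not part of the original experiment), and the names X_bmi, y_class and clf are hypothetical.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the data and keep only the BMI feature (column index 2) as a 2-D array
X, y = datasets.load_diabetes(return_X_y=True)
X_bmi = X[:, [2]]

# Assumption: label a patient 1 if the target is above the median, else 0
y_class = (y > np.median(y)).astype(int)

# Random 80/20 split instead of slicing off the last 20 rows
X_train, X_test, y_train, y_test = train_test_split(
    X_bmi, y_class, test_size=0.2, random_state=0
)

clf = LogisticRegression()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy and confusion matrix on the held-out rows
accuracy = metrics.accuracy_score(y_test, y_pred)
cm = metrics.confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion matrix:\n", cm)
```

Rows of the confusion matrix are the true classes and columns are the predicted classes, so the diagonal counts the correct predictions.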

5 d. Compare the results of the above analysis for the two data sets.


Aim

In this program, we compare the values of two different data sets using Pandas DataFrames.

Procedure

Step 1: Prepare the datasets to be compared

Let’s say that you have the first set of data stored in a CSV file called car1.csv, and the second set stored in a CSV file called car2.csv.

Step 2: Create the two DataFrames

Based on the above data, you can then create the two DataFrames by reading each CSV file with Pandas.

Step 3: Compare the values between the two Pandas DataFrames

In this step, you’ll need to import the NumPy package, whose where() function is used to compare the two price columns row by row.

Program

import pandas as pd

import numpy as np

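# Read the two CSV files; read_csv already returns a DataFrame

df1 = pd.read_csv(r'd:\car1.csv')

df2 = pd.read_csv(r'd:\car2.csv')

# Copy the second file's prices next to the first file's prices (rows are matched by index)

df1['amount1'] = df2['amount1']

# Flag the rows where the two prices agree

df1['prices_match'] = np.where(df1['amount'] == df2['amount1'], 'True', 'False')

# Price difference (0 where the prices match)

df1['price_diff'] = np.where(df1['amount'] == df2['amount1'], 0, df1['amount'] - df2['amount1'])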

print(df1)

Output

    Model     City  Year   amount  amount1 prices_match  price_diff
0  Maruti  Chennai  2022   600000   600000         True           0
1  Hyndai  Chennai  2022   700000   700000         True           0
2    Ford  Chennai  2022   800000   850000        False      -50000
3     Kia  Chennai  2022   900000   900000         True           0
4     XL6  Chennai  2022  1000000  1000000         True           0
5    Tata  Chennai  2022  1100000  1150000        False      -50000
6    Audi  Chennai  2022  1200000  1200000         True           0
7  Ertiga  Chennai  2022  1300000  1300000         True           0
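The program matches rows by their index, so it relies on both files listing the cars in the same order. A hedged alternative is sketched here: it builds two small DataFrames in memory (a few rows taken from the output above, so no CSV files are needed) and joins them on the Model column before comparing. That car2.csv also carries a Model column is an assumption, and the name merged is hypothetical.

```python
import pandas as pd

# A few rows from the output above, built in memory instead of read from CSV
df1 = pd.DataFrame({
    "Model": ["Maruti", "Hyndai", "Ford", "Kia"],
    "amount": [600000, 700000, 800000, 900000],
})

# Assumption: the second data set also has a Model column.
# The rows are deliberately in a different order.
df2 = pd.DataFrame({
    "Model": ["Kia", "Ford", "Hyndai", "Maruti"],
    "amount1": [900000, 850000, 700000, 600000],
})

# Align the two data sets on Model, keeping the row order of df1
merged = df1.merge(df2, on="Model", how="left")

# Boolean match flag and signed price difference
merged["prices_match"] = merged["amount"] == merged["amount1"]
merged["price_diff"] = merged["amount"] - merged["amount1"]

print(merged)
```

Joining on a key column gives the same comparison even when the two files are sorted differently, and the match flag is a real boolean rather than the strings 'True' and 'False'.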

Please click here to download the Dataset

Dataset 1: car1.csv

Dataset 2: car2.csv

Wednesday, 21 September 2022

CS3361 DATA SCIENCE LABORATORY L T P C 0 0 4 lab manual

5 a. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:
Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.
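As a rough illustration of the measures listed above, the sketch below computes each of them with pandas on a single column. The values are made up for illustration (they are not taken from either diabetes data set), and the names values and summary are hypothetical.

```python
import pandas as pd

# Made-up values standing in for one numeric column of a data set
values = pd.Series([1, 2, 2, 3, 3, 3, 4], name="value")

summary = {
    "frequency": values.value_counts().sort_index().to_dict(),
    "mean": values.mean(),
    "median": values.median(),
    "mode": values.mode().tolist(),
    "variance": values.var(),   # sample variance (divides by n - 1)
    "std": values.std(),        # sample standard deviation
    "skewness": values.skew(),
    "kurtosis": values.kurt(),
}

for name, result in summary.items():
    print(f"{name}: {result}")
```

On a real data set the same calls can be applied to each numeric column of the DataFrame in turn.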


CS3361 DATA SCIENCE LABORATORY L T P C 0 0 4 2

COURSE OBJECTIVES:

· To understand the python libraries for data science

· To understand the basic Statistical and Probability measures for data science.

· To learn descriptive analytics on the benchmark data sets.

· To apply correlation and regression analytics on standard data sets.

· To present and interpret data using visualization packages in Python.

LIST OF EXPERIMENTS:

1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.

2. Working with Numpy arrays

3. Working with Pandas data frames

4. Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics on the Iris data set.

5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:

a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.

b. Bivariate analysis: Linear and logistic regression modeling

c. Multiple Regression analysis

d. Also compare the results of the above analysis for the two data sets.

6. Apply and explore various plotting functions on UCI data sets.

a. Normal curves

b. Density and contour plots

c. Correlation and scatter plots

d. Histograms

e. Three dimensional plotting

7. Visualizing Geographic Data with Basemap

List of Equipments: (30 Students per Batch)

Tools: Python, Numpy, Scipy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh

Note: Example data sets like: UCI, Iris, Pima Indians Diabetes etc.

TOTAL: 60 PERIODS

COURSE OUTCOMES: At the end of this course, the students will be able to:

CO1 : Make use of the python libraries for data science

CO2 : Make use of the basic Statistical and Probability measures for data science.

CO3 : Perform descriptive analytics on the benchmark data sets.

CO4 : Perform correlation and regression analytics on standard data sets

CO5 : Present and interpret data using visualization packages in Python

Thursday, 8 September 2022

CS3351 DIGITAL PRINCIPLES AND COMPUTER ORGANIZATION Study Material


CS3351 DIGITAL PRINCIPLES AND COMPUTER ORGANIZATION L T P C 3 0 2 4

COURSE OBJECTIVES:

· To analyze and design combinational circuits.

· To analyze and design sequential circuits

· To understand the basic structure and operation of a digital computer.

· To study the design of data path unit, control unit for processor and to familiarize with the hazards.

· To understand the concept of various memories and I/O interfacing.

UNIT I COMBINATIONAL LOGIC

Combinational Circuits – Karnaugh Map - Analysis and Design Procedures – Binary Adder – Subtractor – Decimal Adder - Magnitude Comparator – Decoder – Encoder – Multiplexers - Demultiplexers

UNIT II SYNCHRONOUS SEQUENTIAL LOGIC

Introduction to Sequential Circuits – Flip-Flops – operation and excitation tables, Triggering of FF, Analysis and design of clocked sequential circuits – Design – Moore/Mealy models, state minimization, state assignment, circuit implementation - Registers – Counters.

UNIT III COMPUTER FUNDAMENTALS

Functional Units of a Digital Computer: Von Neumann Architecture – Operation and Operands of Computer Hardware Instruction – Instruction Set Architecture (ISA): Memory Location, Address and Operation – Instruction and Instruction Sequencing – Addressing Modes, Encoding of Machine Instruction – Interaction between Assembly and High Level Language.

UNIT IV PROCESSOR 9

Instruction Execution – Building a Data Path – Designing a Control Unit – Hardwired Control, Microprogrammed Control – Pipelining – Data Hazard – Control Hazards.

UNIT V MEMORY AND I/O 9

Memory Concepts and Hierarchy – Memory Management – Cache Memories: Mapping and Replacement Techniques – Virtual Memory – DMA – I/O – Accessing I/O: Parallel and Serial Interface – Interrupt I/O – Interconnection Standards: USB, SATA

45 PERIODS

PRACTICAL EXERCISES: 30 PERIODS

1. Verification of Boolean theorems using logic gates.

2. Design and implementation of combinational circuits using gates for arbitrary functions.

3. Implementation of 4-bit binary adder/subtractor circuits.

4. Implementation of code converters.

5. Implementation of BCD adder, encoder and decoder circuits

6. Implementation of functions using Multiplexers.

7. Implementation of the synchronous counters

8. Implementation of a Universal Shift register.

9. Simulator based study of Computer Architecture

COURSE OUTCOMES: At the end of this course, the students will be able to:

CO1 : Design various combinational digital circuits using logic gates

CO2 : Design sequential circuits and analyze the design procedures

CO3 : State the fundamentals of computer systems and analyze the execution of an instruction

CO4 : Analyze different types of control design and identify hazards

CO5 : Identify the characteristics of various memory systems and I/O communication

TOTAL:75 PERIODS

TEXT BOOKS

1. M. Morris Mano, Michael D. Ciletti, “Digital Design : With an Introduction to the Verilog HDL, VHDL, and System Verilog”, Sixth Edition, Pearson Education, 2018.

2. David A. Patterson, John L. Hennessy, “Computer Organization and Design, The Hardware/Software Interface”, Sixth Edition, Morgan Kaufmann/Elsevier, 2020.

REFERENCES

1. Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Naraig Manjikian, “Computer Organization and Embedded Systems”, Sixth Edition, Tata McGraw-Hill, 2012.

2. William Stallings, “Computer Organization and Architecture – Designing for Performance”, Tenth Edition, Pearson Education, 2016.

3. M. Morris Mano, “Digital Logic and Computer Design”, Pearson Education, 2016

Wednesday, 7 September 2022

CS3361 Exploring various commands for doing descriptive analytics on the Iris data set.

CS3361 DATA SCIENCE LABORATORY L T P C 0 0 4 2

COURSE OBJECTIVES:

· To understand the python libraries for data science

· To understand the basic Statistical and Probability measures for data science.

· To learn descriptive analytics on the benchmark data sets.

· To apply correlation and regression analytics on standard data sets.

· To present and interpret data using visualization packages in Python.

LIST OF EXPERIMENTS:

1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.

2. Working with Numpy arrays

3. Working with Pandas data frames

4. Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics on the Iris data set.

5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:

a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.

b. Bivariate analysis: Linear and logistic regression modeling

c. Multiple Regression analysis

d. Also compare the results of the above analysis for the two data sets.

6. Apply and explore various plotting functions on UCI data sets.

a. Normal curves

b. Density and contour plots

c. Correlation and scatter plots

d. Histograms

e. Three dimensional plotting

7. Visualizing Geographic Data with Basemap

List of Equipments: (30 Students per Batch)

Tools: Python, Numpy, Scipy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh

Note: Example data sets like: UCI, Iris, Pima Indians Diabetes etc.

TOTAL: 60 PERIODS

Ex 4c. Exploring various commands for doing descriptive analytics on the Iris data set.

Aim

To explore various commands for doing descriptive analytics on the Iris data set.

Procedure

Understand the idea behind descriptive statistics.

Load the packages we will need and also the `iris` dataset.

load_iris() returns an object containing the iris dataset, which is stored in `iris_obj`.

Compute the basic statistics:

Count: the number of rows in the dataset, which can be obtained via `count()`.

Mean for every numeric column, via `mean()`.

Median for every numeric column, via `median()`.

Variance, via `var()`: a measure of dispersion, roughly the “average” squared distance of a data point from the mean.

Standard deviation, via `std()`: the square root of the variance, interpreted as the “average” distance of a data point from the mean.

The maximum and minimum values, via `max()` and `min()`.
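The word “average” in the variance and standard deviation descriptions hides one detail: pandas divides by n - 1 by default (the sample variance), not by n. The sketch below works this out by hand on a small made-up series, not the Iris data; the names s, sample_var and population_var are hypothetical.

```python
import pandas as pd

# Made-up numbers chosen so that the arithmetic is easy to follow
s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

mean = s.mean()                    # 5.0
squared_dev = (s - mean) ** 2      # squared distance of each point from the mean

# Sample variance divides by n - 1; population variance divides by n
sample_var = squared_dev.sum() / (len(s) - 1)
population_var = squared_dev.sum() / len(s)

print("by hand, sample variance:    ", sample_var)
print("pandas var() (ddof=1):       ", s.var())
print("by hand, population variance:", population_var)
print("pandas var(ddof=0):          ", s.var(ddof=0))
print("pandas std(ddof=0):          ", s.std(ddof=0))
```

The same default applies to `iris.var()` and `iris.std()`; passing ddof=0 gives the population versions.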

Program Code

import pandas as pd

from pandas import DataFrame

from sklearn.datasets import load_iris

# sklearn.datasets includes common example datasets

# A function to load in the iris dataset

iris_obj = load_iris()

# Dataset preview

iris_obj.data

# Build a DataFrame of the four measurements and add the species label as a column

iris = DataFrame(iris_obj.data, columns=iris_obj.feature_names)

iris["species"] = iris_obj.target

iris # prints iris data

Commands

iris_obj.feature_names

iris.count()

iris.mean()

iris.median()

iris.var()

iris.std()

iris.max()

iris.min()

iris.describe()
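The commands above summarise the whole table at once. As a possible next step (not part of the original exercise), the same statistics can be computed separately for each species. The sketch below assumes the species codes are replaced by their names from `iris_obj.target_names`; the name per_species_mean is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_obj = load_iris()

# Four measurement columns plus a readable species name for every row
iris = pd.DataFrame(iris_obj.data, columns=iris_obj.feature_names)
iris["species"] = iris_obj.target_names[iris_obj.target]

# Mean of every measurement, computed separately for each species
per_species_mean = iris.groupby("species").mean()

print(per_species_mean)
```

Replacing mean() with median(), var(), std() or describe() gives the other per-species summaries.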

Result

The various commands for doing descriptive analytics on the Iris data set were explored and executed successfully.

Working with Numpy arrays CS3361 DATA SCIENCE LABORATORY L T P C 0 0 4 lab manual


LIST OF EXPERIMENTS:

2. Working with Numpy arrays

CS3361 DATA SCIENCE LABORATORY
