Notes for Everyone

Wednesday, 7 September 2022

CS3361 Exploring various commands for doing descriptive analytics on the Iris data set.

CS3362 DATA SCIENCE LABORATORY L T P C 0 0 4 2

COURSE OBJECTIVES:

· To understand the python libraries for data science

· To understand the basic Statistical and Probability measures for data science.

· To learn descriptive analytics on the benchmark data sets.

· To apply correlation and regression analytics on standard data sets.

· To present and interpret data using visualization packages in Python.

LIST OF EXPERIMENTS:

1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.

2. Working with Numpy arrays

3. Working with Pandas data frames

4. Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics on the Iris data set.

5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:

a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.

b. Bivariate analysis: Linear and logistic regression modeling

c. Multiple Regression analysis

d. Also compare the results of the above analysis for the two data sets.

6. Apply and explore various plotting functions on UCI data sets.

a. Normal curves

b. Density and contour plots

c. Correlation and scatter plots

d. Histograms

e. Three dimensional plotting

7. Visualizing Geographic Data with Basemap

List of Equipment: (30 Students per Batch) Tools: Python, Numpy, Scipy, Matplotlib, Pandas, Statsmodels, seaborn, plotly, bokeh

Note: Example data sets like: UCI, Iris, Pima Indians Diabetes etc.

TOTAL: 60 PERIODS

COURSE OUTCOMES: At the end of this course, the students will be able to:

CO1: Make use of the python libraries for data science

CO2: Make use of the basic Statistical and Probability measures for data science.

CO3: Perform descriptive analytics on the benchmark data sets.

CO4: Perform correlation and regression analytics on standard data sets

CO5: Present and interpret data using visualization packages in Python

Ex 4c. Exploring various commands for doing descriptive analytics on the Iris data set.

Aim

To explore various commands for doing descriptive analytics on the Iris data set.

Procedure

Understand the idea behind descriptive statistics.

Load the packages we will need and also the `iris` dataset.

`load_iris()` returns an object containing the iris dataset, which is stored in `iris_obj`.

Basic statistics.

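The count is the number of non-missing values in each column and can be obtained via `count()`. The Iris data set has no missing values, so the count of every column equals the number of rows.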

Mean for every numeric column

Median for every numeric column

Variance is a measure of dispersion: roughly the “average” squared distance of a data point from the mean.

The standard deviation is the square root of the variance and is interpreted as the “average” distance of a data point from the mean.

The maximum and minimum values.
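The definitions of variance and standard deviation above can be checked by hand. The following is a minimal sketch, assuming a small sample made of the first five sepal lengths of the Iris data set; the variable names are illustrative only. It also shows that pandas divides by n - 1 by default (`ddof=1`), which is why the word “average” is only approximate.

```python
import math
import pandas as pd

# First five sepal lengths (cm) from the Iris data set, used as a small sample
sepal_length = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0])

n = len(sepal_length)
mean = sepal_length.sum() / n

# Sum of squared distances of each data point from the mean
squared_dev = ((sepal_length - mean) ** 2).sum()

sample_var = squared_dev / (n - 1)   # divisor n - 1, the pandas default (ddof=1)
population_var = squared_dev / n     # divisor n (ddof=0)
sample_std = math.sqrt(sample_var)   # standard deviation = square root of variance

# Each pair printed below should agree
print(sample_var, sepal_length.var())
print(population_var, sepal_length.var(ddof=0))
print(sample_std, sepal_length.std())
```

Passing `ddof=0` to `var()` or `std()` gives the population versions, which match the literal “average squared distance” reading.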

Program Code

import pandas as pd

from sklearn.datasets import load_iris

# sklearn.datasets includes common example datasets

# load_iris() is a function that loads the iris dataset

iris_obj = load_iris()

# Dataset preview

iris_obj.data

# Build a DataFrame from the feature matrix and add the class codes as "species"

iris = pd.DataFrame(iris_obj.data, columns=iris_obj.feature_names)

iris["species"] = iris_obj.target

iris # prints iris data
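As an alternative to building the DataFrame by hand, recent versions of scikit-learn can return pandas objects directly. This is a sketch, assuming scikit-learn 0.23 or later is installed; the names `iris_bunch` and `iris_alt` are illustrative and not part of the original program.

```python
import pandas as pd
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn (0.23+) to return pandas objects
iris_bunch = load_iris(as_frame=True)

# .frame holds the four feature columns plus a "target" column of class codes;
# rename it to "species" to match the column name used in this exercise
iris_alt = iris_bunch.frame.rename(columns={"target": "species"})

print(iris_alt.shape)
print(iris_alt.head())
```

The resulting table has the same columns as `iris`, so the commands in the next section work on it unchanged.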

Commands

iris_obj.feature_names

iris.count()

iris.mean()

iris.median()

iris.var()

iris.std()

iris.max()

iris.min()

iris.describe()
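The difference between `count()` and the number of rows only shows up when values are missing. The sketch below assumes a hypothetical four-row sample with one sepal width deliberately set to missing; it is not the Iris data itself, which has no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row sample with one missing sepal width
sample = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 4.7, 4.6],
    "sepal width (cm)": [3.5, np.nan, 3.2, 3.1],
})

n_rows = len(sample)          # number of rows, whether or not values are missing
non_null = sample.count()     # number of non-missing values in each column
summary = sample.describe()   # count, mean, std, min, quartiles and max per column

print(n_rows)
print(non_null)
print(summary)
```

`describe()` gathers most of the individual commands above (count, mean, standard deviation, minimum, maximum) into one table, together with the quartiles.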

Result

Thus, various commands for doing descriptive analytics on the Iris data set were explored and executed successfully.

Working with Numpy arrays CS3361 DATA SCIENCE LABORATORY L T P C 0 0 4 lab manual

Wednesday, 24 August 2022

CS3362 DATA SCIENCE LABORATORY L T P C 0 0 4 lab manual

CS3362 DATA SCIENCE LABORATORY L T P C 0 0 4 2

COURSE OBJECTIVES:

· To understand the python libraries for data science

· To understand the basic Statistical and Probability measures for data science.

· To learn descriptive analytics on the benchmark data sets.

· To apply correlation and regression analytics on standard data sets.

· To present and interpret data using visualization packages in Python.

LIST OF EXPERIMENTS:

1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.

2. Working with Numpy arrays

3. Working with Pandas data frames

4. Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics on the Iris data set.

5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:

a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.

b. Bivariate analysis: Linear and logistic regression modeling

c. Multiple Regression analysis

d. Also compare the results of the above analysis for the two data sets.

6. Apply and explore various plotting functions on UCI data sets.

a. Normal curves

b. Density and contour plots

c. Correlation and scatter plots

d. Histograms

e. Three dimensional plotting

7. Visualizing Geographic Data with Basemap

List of Equipment: (30 Students per Batch) Tools: Python, Numpy, Scipy, Matplotlib, Pandas, Statsmodels, seaborn, plotly, bokeh

Note: Example data sets like: UCI, Iris, Pima Indians Diabetes etc.

TOTAL: 60 PERIODS

Friday, 19 August 2022

CS3391 OBJECT ORIENTED PROGRAMMING L T P C 3 0 0 3

CS3391 OOP Syllabus R21

Anna University Regulation 2021 CSE CS3391 OOP notes: Object Oriented Programming handwritten lecture notes for all 5 units, for CSE 3rd semester students.

COURSE OBJECTIVES:

· To understand Object Oriented Programming concepts and basics of Java programming language

· To know the principles of packages, inheritance and interfaces

· To develop a java application with threads and generics classes

· To define exceptions and use I/O streams

· To design and build Graphical User Interface Application using JAVAFX

UNIT I INTRODUCTION TO OOP AND JAVA 9

Overview of OOP – Object oriented programming paradigms – Features of Object Oriented Programming – Java Buzzwords – Overview of Java – Data Types, Variables and Arrays – Operators – Control Statements – Programming Structures in Java – Defining classes in Java – Constructors – Methods – Access specifiers – Static members – JavaDoc comments

UNIT II INHERITANCE, PACKAGES AND INTERFACES 9

Overloading Methods – Objects as Parameters – Returning Objects – Static, Nested and Inner Classes. Inheritance: Basics – Types of Inheritance – Super keyword – Method Overriding – Dynamic Method Dispatch – Abstract Classes – final with Inheritance. Packages and Interfaces: Packages – Packages and Member Access – Importing Packages – Interfaces.

UNIT III EXCEPTION HANDLING AND MULTITHREADING 9

Exception Handling basics – Multiple catch Clauses – Nested try Statements – Java’s Built-in Exceptions – User defined Exception. Multithreaded Programming: Java Thread Model – Creating a Thread and Multiple Threads – Priorities – Synchronization – Inter Thread Communication – Suspending, Resuming, and Stopping Threads – Multithreading. Wrappers – Auto boxing.

UNIT IV I/O, GENERICS, STRING HANDLING 9

I/O Basics – Reading and Writing Console I/O – Reading and Writing Files. Generics: Generic Programming – Generic classes – Generic Methods – Bounded Types – Restrictions and Limitations. Strings: Basic String class, methods and String Buffer Class.

UNIT V JAVAFX EVENT HANDLING, CONTROLS AND COMPONENTS 9

JAVAFX Events and Controls: Event Basics – Handling Key and Mouse Events. Controls: Checkbox, ToggleButton – RadioButtons – ListView – ComboBox – ChoiceBox – Text Controls – ScrollPane. Layouts – FlowPane – HBox and VBox – BorderPane – StackPane – GridPane. Menus – Basics – Menu – Menu bars – MenuItem.

COURSE OUTCOMES: On completion of this course, the students will be able to

CO1:Apply the concepts of classes and objects to solve simple problems

CO2:Develop programs using inheritance, packages and interfaces

CO3:Make use of exception handling mechanisms and multithreaded model to solve real world problems

CO4:Build Java applications with I/O packages, string classes, Collections and generics concepts

CO5:Integrate the concepts of event handling and JavaFX components and controls for developing GUI based applications

TOTAL:45 PERIODS

TEXT BOOKS:

1. Herbert Schildt, “Java: The Complete Reference”, 11th Edition, McGraw Hill Education, New Delhi, 2019

2. Herbert Schildt, “Introducing JavaFX 8 Programming”, 1st Edition, McGraw Hill Education, New Delhi, 2015

REFERENCE:

1. Cay S. Horstmann, “Core Java Fundamentals”, Volume 1, 11th Edition, Prentice Hall, 2018.

CD3291 DATA STRUCTURES AND ALGORITHMS L T P C 3 0 0 3

COURSE OBJECTIVES:

· To understand the concepts of ADTs

· To design linear data structures – lists, stacks, and queues

· To understand sorting, searching, and hashing algorithms

· To apply Tree and Graph structures

UNIT I ABSTRACT DATA TYPES 9

Abstract Data Types (ADTs) – ADTs and classes – introduction to OOP – classes in Python – inheritance – namespaces – shallow and deep copying – Introduction to analysis of algorithms – asymptotic notations – divide & conquer – recursion – analyzing recursive algorithms

UNIT II LINEAR STRUCTURES 9

List ADT – array-based implementations – linked list implementations – singly linked lists – circularly linked lists – doubly linked lists – Stack ADT – Queue ADT – double ended queues – applications

UNIT III SORTING AND SEARCHING 9

Bubble sort – selection sort – insertion sort – merge sort – quick sort – analysis of sorting algorithms – linear search – binary search – hashing – hash functions – collision handling – load factors, rehashing, and efficiency

UNIT IV TREE STRUCTURES 9

Tree ADT – Binary Tree ADT – tree traversals – binary search trees – AVL trees – heaps – multiway search trees

UNIT V GRAPH STRUCTURES 9

Graph ADT – representations of graph – graph traversals – DAG – topological ordering – greedy algorithms – dynamic programming – shortest paths – minimum spanning trees – introduction to complexity classes and intractability

TOTAL: 45 PERIODS

COURSE OUTCOMES: At the end of the course, the student should be able to:

CO1:Explain abstract data types

CO2:Design, implement, and analyze linear data structures, such as lists, queues, and stacks, according to the needs of different applications

CO3:Design, implement, and analyze efficient tree structures to meet requirements such as searching, indexing, and sorting

CO4:Model problems as graph problems and implement efficient graph algorithms to solve them

TEXT BOOK:

1. Michael T. Goodrich, Roberto Tamassia, and Michael H. Goldwasser, “Data Structures & Algorithms in Python”, An Indian Adaptation, John Wiley & Sons Inc., 2021

REFERENCES:

1. Lee, Kent D., Hubbard, Steve, “Data Structures and Algorithms with Python” Springer Edition 2015

2. Rance D. Necaise, “Data Structures and Algorithms Using Python”, John Wiley & Sons, 2011

3. Aho, Hopcroft, and Ullman, “Data Structures and Algorithms”, Pearson Education, 1983.

4. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, “Introduction to Algorithms”, Second Edition, McGraw Hill, 2002.

5. Mark Allen Weiss, “Data Structures and Algorithm Analysis in C++”, Fourth Edition, Pearson Education, 2014

CS3353 FOUNDATIONS OF DATA SCIENCE L T P C 3 0 0 3

COURSE OBJECTIVES:

· To understand the data science fundamentals and process.

· To learn to describe the data for the data science process.

· To learn to describe the relationship between data.

· To utilize the Python libraries for Data Wrangling.

· To present and interpret data using visualization libraries in Python

UNIT I INTRODUCTION 9

Data Science: Benefits and uses – facets of data - Data Science Process: Overview – Defining research goals – Retrieving data – Data preparation - Exploratory Data analysis – build the model– presenting findings and building applications - Data Mining - Data Warehousing – Basic Statistical descriptions of Data

UNIT II DESCRIBING DATA 9

Types of Data – Types of Variables – Describing Data with Tables and Graphs – Describing Data with Averages – Describing Variability – Normal Distributions and Standard (z) Scores

UNIT III DESCRIBING RELATIONSHIPS 9

Correlation – Scatter plots – correlation coefficient for quantitative data – computational formula for correlation coefficient – Regression – regression line – least squares regression line – Standard error of estimate – interpretation of r² – multiple regression equations – regression towards the mean

UNIT IV PYTHON LIBRARIES FOR DATA WRANGLING 9

Basics of Numpy arrays – aggregations – computations on arrays – comparisons, masks, boolean logic – fancy indexing – structured arrays – Data manipulation with Pandas – data indexing and selection – operating on data – missing data – Hierarchical indexing – combining datasets – aggregation and grouping – pivot tables

UNIT V DATA VISUALIZATION 9

Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density and contour plots – Histograms – legends – colors – subplots – text and annotation – customization – three dimensional plotting - Geographic Data with Basemap - Visualization with Seaborn.

COURSE OUTCOMES: At the end of this course, the students will be able to:

CO1: Define the data science process

CO2: Understand different types of data description for data science process

CO3: Gain knowledge on relationships between data

CO4: Use the Python Libraries for Data Wrangling

CO5: Apply visualization Libraries in Python to interpret and explore data

TOTAL:45 PERIODS

TEXTBOOKS:

1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning Publications, 2016. (Unit I)

2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017. (Units II and III)

3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016. (Units IV and V)

REFERENCE:

1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea Press, 2014