KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY (DEEMED TO BE UNIVERSITY)
SCHOOL OF COMPUTER ENGINEERING
BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE & ENGINEERING

Sub. Code: IT 3031
Sub. Name: DATA MINING AND DATA WAREHOUSING
Year / Sem: III / V
Batch: 2020-2024
Academic Year: 2022-2023
Faculty Name: Dr. Hrudaya Kumar Tripathy, Dr. Ajay Kumar Jena, Ms. Santwana Sagnika, Dr. Satarupa Mohanty, Dr. Satyaranjan Jena, Dr. Minakhi Rout (CC)

Course Objectives: The objective of the course is to learn, understand, and practice Data Mining and Data Warehousing.

Course Outcomes: The student learning outcomes specify what students will be able to do after completing the course:
CO1: understand the basic principles, concepts, and applications of data mining, and become familiar with the mathematical foundations of data mining tools.
CO2: understand the fundamental concepts, benefits, and problem areas of data warehousing, along with the various architectures and main components of a data warehouse.
CO3: characterize the kinds of patterns that association rule mining algorithms can discover.
CO4: understand various classification and prediction algorithms and apply them to real problems.
CO5: understand various clustering algorithms and apply them to real problems.
CO6: develop the ability to design algorithms based on data mining tools for web, spatial, temporal, text, and multimedia data.

TEXT BOOK:
1. J. Han and M. Kamber, "Data Mining: Concepts and Techniques", 4th Edition, Morgan Kaufmann, 2015.

REFERENCES:
1. M. H. Dunham, "Data Mining: Introductory and Advanced Topics", Pearson Education, 2006.
2. I. H. Witten and E. Frank, "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, 2000.
3. D. Hand, H. Mannila, and P. Smyth, "Principles of Data Mining", Prentice-Hall, 2001.

LESSON PLAN
(Unit / Topic / Books for Reference / No. of Hours Required / Teaching Methodology)

UNIT I: INTRODUCTION TO DATA MINING (5 Hrs.)
1.1 Introduction to Data Mining
    What Is Data Mining? A Multi-Dimensional View of Data Mining. What Kinds of Data Can Be Mined? What Kinds of Patterns Can Be Mined? Which Technologies Are Used? Which Kinds of Applications Are Targeted? A Brief History of Data Mining and Data Mining Society.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
1.2 Major Issues in Data Mining
    Mining Methodology. User Interaction. Efficiency and Scalability. Diversity of Database Types. Data Mining and Society.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
1.3 Data Mining Metrics
    Getting to Know Your Data. Data Objects and Attribute Types (About Attributes, Nominal Attributes, Binary Attributes, Ordinal Attributes, Numeric Attributes, Discrete versus Continuous Attributes). Data Mining from a Database Perspective.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
1.4 Basic Statistical Descriptions of Data
    Measuring the Central Tendency: Mean, Median, and Mode. Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
1.5 A Statistical Perspective on Data Mining
    Graphic Displays of Basic Statistical Descriptions of Data.
    Hours: 1 | Methodology: Online, PPT, Handouts

UNIT II: DATA WAREHOUSING AND PREPROCESSING (8 Hrs.)
2.1 Data Warehousing, Data Warehousing Architecture
    What Is a Data Warehouse? Differences between Operational Database Systems and Data Warehouses. Data Warehousing: A Multitiered Architecture.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
2.2 Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse
    Extraction, Transformation, and Loading.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
2.3 OLTP, OLAP
    Data Cube: A Multidimensional Data Model. Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
2.4 Typical OLAP Operations
    From Online Analytical Processing to Multidimensional Data Mining.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
2.5 Preprocessing Techniques
    A Statistical Perspective on Data Mining. Data Preprocessing (Data Quality: Why Preprocess the Data? Major Tasks in Data Preprocessing). Data Cleaning (Missing Values, Noisy Data).
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
2.6 Data Integration
    Entity Identification Problem, Redundancy and Correlation Analysis, Tuple Duplication, Data Value Conflict Detection and Resolution.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
2.7 Similarity Measures
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
2.8 Data Sampling
    Probability Sampling. Non-Probability Sampling.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
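The sketch below computes the central-tendency and dispersion measures named in topic 1.4 using Python's standard `statistics` module (Python 3.8 or later). It is an illustration, not prescribed course material. The helper `describe` and the sample values are hypothetical. Quartiles use the module's "inclusive" interpolation method, and other conventions, including the hand methods many textbooks use, can give different Q1, Q3, and IQR values. Variance and standard deviation are the population versions.

```python
import statistics


def describe(values):
    """Hypothetical helper: central tendency and dispersion of a list of numbers."""
    data = sorted(values)
    # "inclusive" interpolation; other quartile conventions give other values.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        # Central tendency
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # list: data may be multimodal
        # Dispersion
        "range": data[-1] - data[0],
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "variance": statistics.pvariance(data),  # population variance
        "std_dev": statistics.pstdev(data),      # population standard deviation
    }


if __name__ == "__main__":
    # Illustrative values only (e.g. salaries in thousands), not course data.
    sample = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
    for name, value in describe(sample).items():
        print(f"{name:>8}: {value}")
```

For this sample, the mean is 58, the median is 54, and the data are bimodal (52 and 70). With the inclusive method, Q1 = 49.25, Q3 = 64.75, and the IQR is 15.5. Switching the quartile method changes only those three values.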
UNIT III: ASSOCIATION RULES (5 Hrs.)
3.1 Basic Algorithms for Association Rules
    Market Basket Analysis. Frequent Itemsets, Closed Itemsets, and Association Rules.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
3.2 Incremental Association Rules
    Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
3.3 Generating Association Rules from Frequent Itemsets
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
3.4 Measuring the Quality of Rules
    Which Patterns Are Interesting? Pattern Evaluation Methods. Improving the Efficiency of Apriori.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts
3.5 Advanced Association Rules
    Associations and Correlation Methods.
    Books: Text Book 1 | Methodology: Online, PPT, Handouts

UNIT IV: CLASSIFICATION (9 Hrs.)
4.1 Issues Regarding Classification and Prediction
    Other Classification Methods.
    Books: Sections 8.1.1, 8.1.2 | Hours: 1 | Methodology: Online, PPT, Handouts
4.2 Statistical-Based Algorithms
    Regression: Fundamental Ideas, with Some Examples of Regression Models.
    Hours: 1 | Methodology: Online, PPT, Handouts
4.3 Bayesian Classification
    Books: Sections 8.3.1, 8.3.2 | Hours: 1 | Methodology: Online, PPT, Handouts
4.4 Distance-Based Algorithms
    K-Nearest Neighbour (KNN).
    Books: Section 9.5.1 | Hours: 1 | Methodology: Online, PPT, Handouts
4.5 Decision Tree-Based Algorithms
    Decision Tree. Issues Faced by DT Algorithms.
    Books: Section 8.2.1 | Hours: 1 | Methodology: Online, PPT, Handouts
4.6 ID3 Algorithm
    Entropy, Pruning.
    Books: Sections 8.2.2 (only entropy and information gain), 8.2.3 | Hours: 1 | Methodology: Online, PPT, Handouts
4.7 Neural Network
    NN Propagation and Error. Supervised Learning in NN.
    Books: Section 9.2.1 | Hours: 1 | Methodology: Online, PPT, Handouts
4.8 Perceptrons
    MLP (Multilayer Perceptron).
    Books: Sections 9.2.2, 9.2.3 | Hours: 1 | Methodology: Online, PPT, Handouts
4.9 Advanced Classification Methods (Genetic, Rough Set, Fuzzy Set)
    Books: Sections 9.6.1, 9.6.2, 9.6.3 (approach only) | Hours: 1 | Methodology: Online, PPT, Handouts

UNIT V: CLUSTERING (5 Hrs.)
5.1 Hierarchical Algorithms
    Agglomerative Hierarchical Clustering Algorithm (AGNES). Dendrogram.
    Books: Sections 10.3, 10.3.1 | Hours: 1 | Methodology: Online, PPT, Handouts
5.2 Divisive Hierarchical Clustering Algorithm (DIANA)
    Example.
    Books: Section 10.3 | Hours: 1 | Methodology: Online, PPT, Handouts
5.3 Partitional Algorithms
    k-means.
    Books: Sections 10.1.1, 10.1.2, 10.2.1 | Hours: 1 | Methodology: Online, PPT, Handouts
5.4 Clustering Large Databases
    Books: Section 10.4.1 | Hours: 1 | Methodology: Online, PPT, Handouts
5.5 Clustering with Categorical Attributes
    Hours: 1 | Methodology: Online, PPT, Handouts

UNIT VI: ADVANCED TECHNIQUES (4 Hrs.)
6.1 Web Mining
    Books: From reference/web contents/research articles, etc. | Hours: 1 | Methodology: Online, PPT, Handouts
6.2 Spatial Mining
    Books: From reference/web contents/research articles, etc. | Hours: 1 | Methodology: Online, PPT, Handouts
6.3 Temporal Mining, Text Mining
    Books: From reference/web contents/research articles, etc. | Hours: 1 | Methodology: Online, PPT, Handouts
6.4 Multimedia Mining
    Books: From reference/web contents/research articles, etc. | Hours: 1 | Methodology: Online, PPT, Handouts

DMDW Activity Chart

1. Activity-Based Teaching and Learning: Based on the circulated guidelines and discussion with the faculty members, the following component-wise description of each activity is proposed.

Activity List
The component-wise distribution of the activities is:
i) Problem Solving: 15 Marks
ii) Quiz: 10 Marks
iii) Critical Thinking: 05 Marks

i) Problem Solving (15 Marks): Activity/Assignment
Students solve assignments in groups or individually. The assignments listed below are for reference only; faculty members are free to set their own. The respective assigned subject teacher evaluates them and decides the number of groups and the number of students in each group. Students write their solutions on a writing pad and submit a soft copy to the subject teacher.
Assignment-1 (Introduction)
Assignment-2 (Data Warehousing and Preprocessing)
Assignment-3 (Association Rules)
Assignment-4 (Classification)