Nội dung text 1. DMBI Winter 23.pdf
1 GUJARAT TECHNOLOGICAL UNIVERSITY BE – SEMESTER - VI EXAMINATION – WINTER 2023 Subject Code : 3163209 Date : 15/12/2023 Subject Name : Data Mining and Business Intelligence Total Marks : 70 Q : 1 Q.1 (a) Define: Data, Information and Knowledge. [03] Data : Data are any facts, numbers or text that can be processed by computer. Organizations are accumulating vast and growing data in different formats. example: o Operational data : sales, cost, inventory, payroll o No operational data : forecast data, macro economic data o Meta data : data about data itself Information : This is what you get when you process and organize data. It provides context and meaning to the raw data. The pattern, associations or relationship among all this data can provide information. examples : o Analysis of retail point of sales transaction data can yield information, about which products are selling. Knowledge : It's what you get when you analyze information and apply your experience and understanding. Information can be converted into knowledge about historical patterns and full trends. example : o summary info about retail supermarket sales can be converted by retailer to determine which items are most susceptible for promotional efforts. Q.1 (b) List out and explain the major steps involved in the ETL process. [04] ETL stands for Extract, Transform, Load and it is a process used in data warehousing to extract data from various sources, transform it into a format suitable for loading into a data warehouse, and then load it into the warehouse. It is a process in which an ETL tool extracts the data from various data source systems, transforms it in the staging area, and then finally, loads it into the Data Warehouse system. Let us understand each step of the ETL process in-depth:
2 1. Extraction : The first step of the ETL process is extraction. In this step, data from various source systems is extracted which can be in various formats like relational databases, No SQL, XML, and flat files into the staging area. It is important to extract the data from various source systems and store it into the staging area first and not directly into the data warehouse because the extracted data is in various formats and can be corrupted also. Hence loading it directly into the data warehouse may damage it and rollback will be much more difficult. Therefore, this is one of the most important steps of ETL process. 2. Transformation : The second step of the ETL process is transformation. In this step, a set of rules or functions are applied on the extracted data to convert it into a single standard format. It may involve following processes/tasks: o Filtering - loading only certain attributes into the data warehouse. o Cleaning - filling up the NULL values with some default values, mapping U.S.A, United States, and America into USA, etc. o Joining - joining multiple attributes into one. o Splitting - splitting a single attribute into multiple attributes. o Sorting - sorting tuples on the basis of some attribute (generally key-attribute). 3. Loading : The third and final step of the ETL process is loading. In this step, the transformed data is finally loaded into the data warehouse. Sometimes the data is updated by loading into the data warehouse very frequently and sometimes it is done after longer but regular intervals. The rate and period of loading solely depends on the requirements and varies from system to system. Overall, ETL process is an essential process in data warehousing that helps to ensure that the data in the data warehouse is accurate, complete, and up-to-date. However, it also comes with its own set of challenges and limitations, and organizations need to carefully consider the costs and benefits before implementing them.
3 Q.1 (c) Explain three tier data warehouse Architecture in details. [07] 1. Bottom-Tier : o Bottom tier is a warehouse database server. o It’s always DBMS relation. o Handling query and materialization. o That is data partitioning. o It is a back-end tools. 2. Middle-Tier : o It is a relation in OLAP server (ROLAP). o Extended relational DBMS operation on multidimensional. o Hybrid OLAP (HOLAP) : Its user flexibility. o Multidimensional OLAP (MOLAP) : Special purpose. o OLAP servers is support snowflake. 3. Top-Tier : o It is a front-end client layer. o Query reporting. o Data mining tool. o Client query. o Interface between middle tier. Advantages of 3-tier Architecture : Focuses on achieving logical separation. Performance is very high. Develop business logical layer. It can solve more complex problem. ----------------------------------------------------------------------------------------------------------------------------------------------------------
4 Q : 2 Q.2 (a) Differentiate ROLAP server and MOLAP server. [03] S.NO ROLAP MOLAP 1. ROLAP stands for Relational Online Analytical Processing. While MOLAP stands for Multidimensional Online Analytical Processing. 2. ROLAP is used for large data volumes. While it is used for limited data volumes. 3. The access of ROLAP is slow. While the access of MOLAP is fast. 4. In ROLAP, Data is stored in relation tables. While in MOLAP, Data is stored in multidimensional array. 5. In ROLAP, Data is fetched from data-warehouse. While in MOLAP, Data is fetched from MDDBs database. 6. In ROLAP, Complicated sql queries are used. While in MOLAP, Sparse matrix is used. 7. In ROLAP, Static multidimensional view of data is created. While in MOLAP, Dynamic multidimensional view of data is created. Q.2 (b) What is Cuboid? Explain any three OLAP Operations on Data Cube with example. [04] Cuboid : A cuboid is a sub-section of an OLAP cube. It's like a smaller cube carved out of the larger one, focusing on a specific combination of dimensions. The data cube is used by the users of the decision support system to see their data. In cube, each dimension represents some attribute in database and cell represents some measure of interest. Users can apply queries on cubes to receive data. OLAP Operations : 1) Roll-up : The roll-up operation (also known as drill-up or aggregation operation) performs aggregation on a data cube, by climbing down concept hierarchies, i.e., dimension reduction. Roll-up is like zooming-out on the data cubes. Figure shows the result of roll-up operations performed on the dimension location. The hierarchy for the location is defined as the Order Street, city, province, or state, country. The roll-up operation aggregates the data by ascending the location hierarchy from the level of the city to the level of the country.