SUMMARY CHAPTER 4


DATABASES AND DATA WAREHOUSES

1. Structured, Unstructured, and Semi-Structured Information

·       Structured information. Facts and data that are reasonably ordered, or that can be broken down into component parts and organized into hierarchies.

·       Unstructured information. Information that has no inherent structure or order, and whose parts can’t be easily linked together.

·       Semi-structured information. An information category that falls between structured and unstructured information. It includes facts and data that show at least some structure, such as web pages and documents, which bear creation dates, titles, and authors.

Metadata: Data about data that clarifies the nature of the information.

The Quality of Information. Not all information is of high quality, as anyone who surfs the net knows. Here are the most important characteristics that affect quality:

·       Accuracy. Mistakes in birth dates, spelling, or price reduce the quality of the information.

·       Precision. Rounding to the nearest mile might not reduce quality much when you estimate the drive to the mall. However, for property surveys, “about 2 miles” is unacceptable.

·       Completeness. Omitting the zip code on the customer’s address record might not be a problem because the zip code can be determined from the address. But leaving off the house number would delay the order.

·       Consistency. Reports that show “total sales by region” may conflict because the people generating the reports are using slightly different definitions. When results are inconsistent, the quality of both reports is in question.

·       Timeliness. Outdated information has less value than up-to-date information and thus is lower quality unless you are looking for historical trends. The actual definition for what is up-to-date varies. In stock trading, timeliness is measured in fractions of a second.

·       Bias. Biased information lacks objectivity, and that reduces its value and quality. 

·       Duplication. Information can be redundant, resulting in misleading and exaggerated summaries. In customer records, people can easily appear more than once if their address changes.
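
Two of the characteristics above, completeness and duplication, can be checked mechanically. The following is a minimal sketch, assuming a hypothetical list of customer records; the field names and the rule that a repeated name signals a possible duplicate are illustrative assumptions, not part of the chapter.

```python
from collections import Counter

# Hypothetical customer records; the field names are assumptions for illustration.
customers = [
    {"name": "Ana Lim", "house_number": "12", "street": "Oak St", "zip": "90210"},
    {"name": "Ana Lim", "house_number": "48", "street": "Pine Ave", "zip": "90301"},
    {"name": "Budi Tan", "house_number": "", "street": "Elm Rd", "zip": ""},
]

# The zip code is left out on purpose: it can be derived from the address.
REQUIRED_FIELDS = ("name", "house_number", "street")


def missing_required(record):
    """Return the required fields that are empty in this record (completeness)."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def possible_duplicates(records):
    """Return names that appear more than once (a rough duplication signal)."""
    counts = Counter(record["name"] for record in records)
    return sorted(name for name, count in counts.items() if count > 1)


incomplete = {r["name"]: missing_required(r) for r in customers if missing_required(r)}
duplicates = possible_duplicates(customers)
print(incomplete)  # {'Budi Tan': ['house_number']}
print(duplicates)  # ['Ana Lim']
```

A repeated name is only a hint. A real check would compare more fields before deciding that two records describe the same person.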

2. FILE ORGANIZATION TERMS AND CONCEPTS

A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases.

·       Table. A group of records for the same entity, such as employees. Each row is one record, and the fields of each record are arranged in the table’s columns.

·       Record. A means to represent an entity, which might be a person, a product, a purchase order, an event, a building, a vendor, a book, a video, or some other “thing” that has meaning to people. The record is made up of attributes of that thing.

·       Field. An attribute of an entity. A field can contain numeric data or text, or a combination of the two. Each field should have a data definition, which specifies the characteristics of the field, such as the type of data it will hold or the maximum number of characters it can contain.
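
The terms above map directly onto a relational table. The sketch below uses Python's built-in sqlite3 module with a hypothetical employee table; the field names and the 30-character limit are assumptions chosen for illustration. The CREATE TABLE statement plays the role of the data definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: each field gets a type and, where useful, a length limit.
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        last_name   TEXT NOT NULL CHECK (length(last_name) <= 30),
        hire_date   TEXT NOT NULL,
        salary      REAL
    )
    """
)

# One record (row) made up of fields (columns).
conn.execute(
    "INSERT INTO employee (last_name, hire_date, salary) VALUES (?, ?, ?)",
    ("Santoso", "2020-03-01", 5500.0),
)

# A value that violates the data definition is rejected.
try:
    conn.execute(
        "INSERT INTO employee (last_name, hire_date, salary) VALUES (?, ?, ?)",
        ("X" * 31, "2021-01-15", 4000.0),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT employee_id, last_name FROM employee").fetchall()
print(rows)      # [(1, 'Santoso')]
print(rejected)  # True
```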

3. PROBLEMS WITH THE TRADITIONAL FILE ENVIRONMENT

·       Data Redundancy and Inconsistency. Data redundancy is the presence of duplicate data in multiple data files so that the same data are stored in more than one place or location. Data redundancy occurs when different groups in an organization independently collect the same piece of data and store it independently of each other. Data redundancy wastes storage resources and also leads to data inconsistency, where the same attribute may have different values.

To be most useful, a database must handle three types of relationships with a minimum of redundancy: The one-to-one relationship is relatively easy to accommodate, and even file processing systems can handle it. The one-to-many relationship between records is somewhat more challenging. The many-to-many relationship is also more complicated to support.

·       Program-Data Dependence. Program-data dependence refers to the coupling of data stored in files and the specific programs required to update and maintain those files such that changes in programs require changes to the data. Every traditional computer program has to describe the location and nature of the data with which it works.

·       Lack of Flexibility. A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion. The information required by ad hoc requests is somewhere in the system but may be too expensive to retrieve.

·       Poor Security. Because there is little control or management of data, access to and dissemination of information may be out of control. Management may have no way of knowing who is accessing or even making changes to the organization’s data.

·       Lack of Data Sharing and Availability. Because pieces of information in different files and different parts of the organization cannot be related to one another, it is virtually impossible for information to be shared or accessed in a timely manner.
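
The first problem in this list, redundancy leading to inconsistency, is easy to show in miniature. The following is a small sketch, assuming two hypothetical departmental files that each keep their own copy of the same customers; the file names, IDs, and fields are invented for illustration.

```python
# Two departments keep their own copy of the same customer data (hypothetical).
sales_file = {
    "C001": {"name": "Ana Lim", "city": "Jakarta"},
    "C002": {"name": "Budi Tan", "city": "Bandung"},
}
billing_file = {
    "C001": {"name": "Ana Lim", "city": "Surabaya"},  # address updated here only
    "C002": {"name": "Budi Tan", "city": "Bandung"},
}


def find_inconsistencies(file_a, file_b):
    """List (customer_id, field, value_a, value_b) where shared records disagree."""
    conflicts = []
    for customer_id in sorted(file_a.keys() & file_b.keys()):
        for field, value_a in file_a[customer_id].items():
            value_b = file_b[customer_id].get(field)
            if value_a != value_b:
                conflicts.append((customer_id, field, value_a, value_b))
    return conflicts


conflicts = find_inconsistencies(sales_file, billing_file)
print(conflicts)  # [('C001', 'city', 'Jakarta', 'Surabaya')]
```

With a single shared table there would be only one copy of the city to update, so this kind of conflict could not arise.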

4. DATABASE MANAGEMENT SYSTEMS

Relational DBMS. The most popular type of DBMS today for PCs as well as for larger computers and mainframes is the relational DBMS. Relational databases represent data as two-dimensional tables (called relations). Tables may be referred to as files. Each table contains data on an entity and its attributes. The steps for developing and managing a relational database are as follows:

·       Entities and Attributes. Each of the entities in the model will become a table, named with a noun that describes the data contained in the entity. It will have attributes, or fields, that describe the entity.

·       Primary Key and Uniqueness. Each record in a table must have one primary key, which is a field, or a group of fields, that makes the record unique in that table. A common approach is to let the DBMS assign the key automatically as an autonumber. This approach ensures that each record has a unique primary key and that no one accidentally gives the same ID number to two different people. Because the autonumber has no other meaning, there would be no reason to ever change it.

·       Normalizing the Data Model. Normalization is a multistep process that minimizes duplication of information in the tables, a condition that can cause many kinds of problems that diminish the database’s integrity. It also helps avoid inconsistencies that can occur when users try to insert, edit, or delete data.

·       Relationship and Foreign Keys. The relational model’s elegance really shines when the entities are connected to one another in meaningful ways, relying on foreign keys.
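
The steps above can be sketched with Python's built-in sqlite3 module. The department and employee tables, their field names, and the sample rows are assumptions made for illustration. Storing the department name once and linking to it by key is a simple instance of a normalized, one-to-many design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript(
    """
    CREATE TABLE department (
        department_id INTEGER PRIMARY KEY,   -- autonumbered primary key
        name          TEXT NOT NULL UNIQUE   -- stored once, not repeated per employee
    );
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        last_name     TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES department(department_id)  -- foreign key
    );
    """
)

# One department, many employees (a one-to-many relationship).
dept_id = conn.execute("INSERT INTO department (name) VALUES ('Finance')").lastrowid
for last_name in ("Santoso", "Wijaya"):
    conn.execute(
        "INSERT INTO employee (last_name, department_id) VALUES (?, ?)",
        (last_name, dept_id),
    )

# An employee pointing at a department that does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO employee (last_name, department_id) VALUES (?, ?)",
        ("Halim", 99),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets the two tables be joined back together.
staff = conn.execute(
    """
    SELECT e.last_name, d.name
    FROM employee AS e
    JOIN department AS d ON d.department_id = e.department_id
    ORDER BY e.last_name
    """
).fetchall()
print(staff)            # [('Santoso', 'Finance'), ('Wijaya', 'Finance')]
print(orphan_rejected)  # True
```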



5. DATA WAREHOUSES

A data warehouse is a database that stores current and historical data of potential interest to decision makers throughout the company. The data originate in many core operational transaction systems, such as systems for sales, customer accounts, and manufacturing, and may include data from Web site transactions. The data warehouse consolidates and standardizes information from different operational databases so that the information can be used across the enterprise for management analysis and decision making.

Companies often build enterprise-wide data warehouses, where a central data warehouse serves the entire organization, or they create smaller, decentralized warehouses called data marts. A data mart is a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users.
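
To make the consolidation and standardization step concrete, here is a minimal sketch, assuming two hypothetical operational sources (store sales and web sales) that use different field names, date formats, and currency units. All names and figures are invented. The final filter stands in for a data mart, a focused subset for one group of users.

```python
from datetime import datetime

# Hypothetical extracts from two operational systems with different conventions.
store_sales = [
    {"sale_date": "2024-01-05", "region": "west", "amount_usd": 120.0},
    {"sale_date": "2024-01-06", "region": "east", "amount_usd": 80.0},
]
web_sales = [
    {"date": "05/01/2024", "Region": "WEST", "total_cents": 4550},  # day/month/year
]


def standardize_store(row):
    """Map a store record onto the warehouse's common layout."""
    return {
        "date": row["sale_date"],
        "region": row["region"].lower(),
        "amount": row["amount_usd"],
        "source": "store",
    }


def standardize_web(row):
    """Map a web record onto the same layout: ISO date, lowercase region, dollars."""
    iso_date = datetime.strptime(row["date"], "%d/%m/%Y").date().isoformat()
    return {
        "date": iso_date,
        "region": row["Region"].lower(),
        "amount": row["total_cents"] / 100,
        "source": "web",
    }


# The warehouse holds the consolidated, standardized records.
warehouse = [standardize_store(r) for r in store_sales] + [
    standardize_web(r) for r in web_sales
]

# A data mart: a focused subset for the west-region sales team.
west_mart = [row for row in warehouse if row["region"] == "west"]
west_total = sum(row["amount"] for row in west_mart)
print(len(warehouse))  # 3
print(west_total)      # 165.5
```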

6. MULTIDIMENSIONAL DATA ANALYSIS AND DATA MINING

·       Online Analytical Processing (OLAP). OLAP supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. Each aspect of information—product, pricing, cost, region, or time period—represents a different dimension.

·       Data Mining. Data mining is more discovery-driven than OLAP. It provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior. The patterns and rules are used to guide decision making and forecast the effect of those decisions. The types of information obtainable from data mining include associations, sequences, classifications, clusters, and forecasts.

·       Text Mining and Web Mining. Text mining tools are now available to help businesses analyze unstructured text data. These tools are able to extract key elements from large unstructured data sets, discover patterns and relationships, and summarize the information. The Web is another rich source of valuable information, some of which can now be mined for patterns, trends, and insights into customer behavior. The discovery and analysis of useful patterns and information from the World Wide Web is called Web mining.
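
The multidimensional view described under OLAP can be illustrated with plain Python. This is only a sketch, assuming a hypothetical list of sales facts with three dimensions (product, region, quarter). The roll_up helper is an invented name, and real OLAP tools precompute and index such summaries rather than scanning a list.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions: product, region, quarter.
sales = [
    {"product": "nuts", "region": "East", "quarter": "Q1", "units": 50},
    {"product": "nuts", "region": "West", "quarter": "Q1", "units": 30},
    {"product": "bolts", "region": "East", "quarter": "Q1", "units": 20},
    {"product": "nuts", "region": "East", "quarter": "Q2", "units": 70},
    {"product": "bolts", "region": "West", "quarter": "Q2", "units": 40},
]


def roll_up(facts, *dimensions):
    """Total the units sold for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact["units"]
    return dict(totals)


# The same data viewed along different dimensions.
by_region = roll_up(sales, "region")
by_product_quarter = roll_up(sales, "product", "quarter")
print(by_region)  # {('East',): 140, ('West',): 70}
print(by_product_quarter[("nuts", "Q1")])  # 80
```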

The Human Element and Ownership Issues Affect Information Management

·       Ownership Issues. Although a company may set the policy that all information resources are company-owned, in practice people often view these resources more protectively, even when compliance and security don’t demand tight access controls. Norms about how records are used emerge over time, and though many are unwritten, they can certainly affect employees’ behavior.

·       Databases Without Boundaries. Databases without boundaries are also part of emergency disaster relief. Online databases can help victims find missing family members, organize volunteers, or link people who can provide shelter to those who need it.

·       Balancing Stakeholders and Information Needs. Meeting the information needs of all stakeholders is a balancing act that requires leadership, compromise, negotiation, and well-designed databases. As a shared information resource, the database fulfills its role exceptionally well to provide a solid backbone for the whole organization and all its stakeholders.



Source: Management Information Systems: Managing the Digital Firm, Twelfth Edition, by Kenneth C. Laudon and Jane P. Laudon


Name: Anastasya Syanne Titahena
NIM: 01082180022
Major: Informatics
