Summary Chapter 4: DATABASES AND DATA WAREHOUSES
1. Structured, Unstructured, and Semi-Structured Information
· Structured information. Facts and data that are reasonably ordered, or that can be broken down into component parts and organized into hierarchies.
· Unstructured information. Information that has no inherent structure or order, and whose parts can’t be easily linked together.
· Semi-structured information. A category that falls between structured and unstructured information. It includes facts and data that show at least some structure, such as web pages and documents, which bear creation dates, titles, and authors.
Metadata: Data about data that clarifies the nature of the information.
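As a rough illustration, a semi-structured document can be pictured as free text paired with a few metadata fields. The sketch below uses a plain Python dictionary; the field names and values are invented for this example and do not come from the textbook.

```python
# Hypothetical document record: the body is unstructured text,
# while the metadata fields give it partial structure.
document = {
    "metadata": {
        "title": "Quarterly Sales Notes",  # illustrative values only
        "author": "J. Doe",
        "created": "2019-03-01",
    },
    "body": "Sales were strong in the east, although shipping delays ...",
}


def describe(doc):
    """Build a one-line description from the metadata alone."""
    meta = doc["metadata"]
    return f'{meta["title"]} by {meta["author"]} ({meta["created"]})'


summary = describe(document)
print(summary)
```

The metadata can be searched and sorted like structured data even though the body itself has no fixed layout.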
The Quality of Information
Not all information has high quality, as anyone who surfs the net knows. Here are the most important characteristics that affect quality:
· Accuracy. Mistakes in birth dates, spelling, or price reduce the quality of the information.
· Precision. Rounding to the nearest mile might not reduce quality much when you estimate the drive to the mall. However, for property surveys, “about 2 miles” is unacceptable.
· Completeness. Omitting the zip code on the customer’s address record might not be a problem because the zip can be determined by the address. But leaving off the house number would delay the order.
· Consistency. Reports that show “total sales by region” may conflict because the people generating the reports are using slightly different definitions. When results are inconsistent, the quality of both reports is in question.
· Timeliness. Outdated information has less value than up-to-date information and thus is lower quality unless you are looking for historical trends. The actual definition for what is up-to-date varies. In stock trading, timeliness is measured in fractions of a second.
· Bias. Biased information lacks objectivity, and that reduces its value and quality.
· Duplication. Information can be redundant, resulting in misleading and exaggerated summaries. In customer records, people can easily appear more than once if their address changes.
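The duplication problem can be sketched in a few lines of Python. The customer records below are hypothetical, and the rule that a later row supersedes an earlier one is an assumption made only for this example.

```python
# Hypothetical customer records; customer 101 appears twice because the
# address changed and a new row was added instead of updating the old one.
customers = [
    {"customer_id": 101, "name": "Ana Lee", "address": "12 Oak St"},
    {"customer_id": 102, "name": "Raj Patel", "address": "9 Elm Ave"},
    {"customer_id": 101, "name": "Ana Lee", "address": "48 Pine Rd"},
]

# Counting rows overstates the number of customers.
naive_count = len(customers)

# Keep only the last record seen for each customer_id
# (assumes later rows are newer).
latest = {}
for row in customers:
    latest[row["customer_id"]] = row

deduplicated_count = len(latest)
print(naive_count, deduplicated_count)
```

A summary built on the raw rows would report three customers, while only two distinct customers exist.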
2. FILE ORGANIZATION TERMS AND CONCEPTS
A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases.
· Table. A group of records for the same entity, such as employees. Each row is one record, and the fields of each record are arranged in the table’s columns.
· Record. A means to represent an entity, which might be a person, a product, a purchase order, an event, a building, a vendor, a book, a video, or some other “thing” that has meaning to people. The record is made up of attributes of that thing.
· Field. An attribute of an entity. A field can contain numeric data, text, or a combination of the two. Each field should have a data definition, which specifies the characteristics of the field, such as the type of data it will hold or the maximum number of characters it can contain.
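To make the table, record, and field terms concrete, here is a minimal sketch using Python's built-in sqlite3 module. The employees table, its column names, and the 30-character limit are assumptions chosen for illustration. SQLite does not enforce declared text lengths by itself, so the limit is written as a CHECK constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: each field gets a type, and last_name gets a maximum length.
conn.execute(
    """
    CREATE TABLE employees (
        employee_id INTEGER,
        last_name   TEXT CHECK (length(last_name) <= 30),
        hire_date   TEXT
    )
    """
)

# Each inserted row is one record describing one employee entity.
conn.execute("INSERT INTO employees VALUES (1, 'Santos', '2018-07-15')")
conn.execute("INSERT INTO employees VALUES (2, 'Nguyen', '2019-01-02')")

# A value that violates the data definition is rejected.
try:
    conn.execute(
        "INSERT INTO employees VALUES (3, ?, '2020-05-05')", ("X" * 31,)
    )
    too_long_rejected = False
except sqlite3.IntegrityError:
    too_long_rejected = True

records = conn.execute("SELECT * FROM employees ORDER BY employee_id").fetchall()
# PRAGMA table_info returns one row per column; index 1 holds the column name.
field_names = [col[1] for col in conn.execute("PRAGMA table_info(employees)")]
print(field_names, records, too_long_rejected)
```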
3. PROBLEMS WITH THE TRADITIONAL FILE ENVIRONMENT
· Data Redundancy and Inconsistency. Data redundancy is the presence of duplicate data in multiple data files, so that the same data are stored in more than one place or location. Data redundancy occurs when different groups in an organization independently collect the same piece of data and store it independently of each other. Data redundancy wastes storage resources and also leads to data inconsistency, where the same attribute may have different values.
To be most useful, a database must handle three types of relationships with a minimum of redundancy. The one-to-one relationship is relatively easy to accommodate, and even file processing systems can handle it. The one-to-many relationship between records is somewhat more challenging. The many-to-many relationship is also more complicated to support.
· Program-Data Dependence. Program-data dependence refers to the coupling of data stored in files and the specific programs required to update and maintain those files such that changes in programs require changes to the data. Every traditional computer program has to describe the location and nature of the data with which it works.
· Lack of Flexibility. A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion. The information required by ad hoc requests is somewhere in the system but may be too expensive to retrieve.
· Poor Security. Because there is little control or management of data, access to and dissemination of information may be out of control. Management may have no way of knowing who is accessing or even making changes to the organization’s data.
· Lack of Data Sharing and Availability. Because pieces of information in different files and different parts of the organization cannot be related to one another, it is virtually impossible for information to be shared or accessed in a timely manner.
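The redundancy and inconsistency problem described in this section can be sketched as follows. The two department files, the customer ID, and the phone numbers are hypothetical, and find_inconsistencies is a helper written only for this illustration.

```python
# Hypothetical files kept independently by two departments.
sales_file = {"C-17": {"name": "Maria Gomez", "phone": "555-0101"}}
billing_file = {"C-17": {"name": "Maria Gomez", "phone": "555-0199"}}


def find_inconsistencies(file_a, file_b):
    """Return (customer_id, field, value_a, value_b) for each conflicting field."""
    conflicts = []
    # Only customers and fields present in both files can conflict.
    for customer_id in sorted(file_a.keys() & file_b.keys()):
        shared_fields = file_a[customer_id].keys() & file_b[customer_id].keys()
        for field in sorted(shared_fields):
            a = file_a[customer_id][field]
            b = file_b[customer_id][field]
            if a != b:
                conflicts.append((customer_id, field, a, b))
    return conflicts


conflicts = find_inconsistencies(sales_file, billing_file)
print(conflicts)
```

Because the same customer is stored twice, nothing forces the two copies to agree, and neither file can say which phone number is correct.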
4. DATABASE MANAGEMENT SYSTEMS
Relational DBMS. The most popular type of DBMS today for PCs as well as for larger computers and mainframes is the relational DBMS. Relational databases represent data as two-dimensional tables (called relations). Tables may be referred to as files. Each table contains data on an entity and its attributes. The steps to develop and manage a relational database are:
· Entities and Attributes. Each of the entities in the model will become a table, named with a noun that describes the data contained in the entity. It will have attributes, or fields, that describe the entity.
· Primary Key and Uniqueness. Each record in a table must have one primary key, which is a field, or a group of fields, that makes the record unique in that table. A common approach is to let the DBMS assign an autonumber as the primary key. This ensures that each record has a unique primary key and that no one accidentally gives the same ID number to two different people. Because the autonumber has no other meaning, there would be no reason to ever change it.
· Normalizing the Data Model. Normalization is a multistep process that minimizes duplication of information in the tables, a condition that can cause many kinds of problems that diminish the database’s integrity. It also helps avoid inconsistencies that can occur when users try to insert, edit, or delete data.
· Relationships and Foreign Keys. The relational model’s elegance really shines when the entities are connected to one another in meaningful ways, relying on foreign keys.
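The steps above can be sketched with Python's built-in sqlite3 module. The customers and orders tables and all of their columns are assumptions made for this example. The sketch shows an automatically assigned primary key, a foreign key that links the "many" side of a one-to-many relationship back to the "one" side, and a join that reconnects the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    );
    """
)

# The DBMS assigns the primary key, so no ID can be handed out twice by mistake.
cur = conn.execute("INSERT INTO customers (name) VALUES (?)", ("Ana Lee",))
ana_id = cur.lastrowid

# One customer, many orders: each order stores only the customer's key,
# so the customer's name lives in exactly one place (a normalized design).
conn.execute("INSERT INTO orders (customer_id, item) VALUES (?, ?)", (ana_id, "laptop"))
conn.execute("INSERT INTO orders (customer_id, item) VALUES (?, ?)", (ana_id, "mouse"))

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders (customer_id, item) VALUES (?, ?)", (999, "ghost"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute(
    """
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
    """
).fetchall()
print(ana_id, rejected, rows)
```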
5. DATA WAREHOUSES
A data warehouse is a database that stores current and historical data of potential interest to decision makers throughout the company. The data originate in many core operational transaction systems, such as systems for sales, customer accounts, and manufacturing, and may include data from Web site transactions. The data warehouse consolidates and standardizes information from different operational databases so that the information can be used across the enterprise for management analysis and decision making.
Companies often build enterprise-wide data warehouses, where a central data warehouse serves the entire organization, or they create smaller, decentralized warehouses called data marts. A data mart is a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users.
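A minimal sketch of the consolidate-and-standardize idea follows, written in plain Python rather than a real warehouse product. The two source systems, their field names, and the choice of an east-region data mart are all hypothetical.

```python
# Hypothetical extracts from two operational systems with different conventions.
store_sales = [
    {"date": "2019-03-01", "region": "EAST", "amount_usd": 120.0},
    {"date": "2019-03-02", "region": "WEST", "amount_usd": 80.0},
]
web_sales = [
    {"day": "2019-03-01", "area": "east", "cents": 4550},
]


def standardize_store(row):
    """Map a store-system row onto the shared warehouse layout."""
    return {
        "date": row["date"],
        "region": row["region"].lower(),
        "amount": row["amount_usd"],
        "channel": "store",
    }


def standardize_web(row):
    """Map a web-system row onto the shared layout, converting cents to dollars."""
    return {
        "date": row["day"],
        "region": row["area"].lower(),
        "amount": row["cents"] / 100,
        "channel": "web",
    }


# The warehouse holds rows from both sources in one consistent format.
warehouse = [standardize_store(r) for r in store_sales]
warehouse += [standardize_web(r) for r in web_sales]

# Data mart: a focused subset for one group of users (here, the east region team).
east_mart = [row for row in warehouse if row["region"] == "east"]
east_total = sum(row["amount"] for row in east_mart)
print(len(warehouse), len(east_mart), east_total)
```

Once both sources share one layout, a question such as "total east-region sales across all channels" becomes a single filter and sum.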
6. MULTIDIMENSIONAL DATA ANALYSIS AND DATA MINING
· Online Analytical Processing (OLAP). OLAP supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. Each aspect of information (product, pricing, cost, region, or time period) represents a different dimension.
· Data Mining. Data mining is more discovery-driven than OLAP. It provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior. The patterns and rules are used to guide decision making and forecast the effect of those decisions. The types of information obtainable from data mining include associations, sequences, classifications, clusters, and forecasts.
· Text Mining and Web Mining. Text mining tools are now available to help businesses analyze unstructured text data. These tools are able to extract key elements from large unstructured data sets, discover patterns and relationships, and summarize the information. The Web is another rich source of valuable information, some of which can now be mined for patterns, trends, and insights into customer behavior. The discovery and analysis of useful patterns and information from the World Wide Web is called Web mining.
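The OLAP idea of viewing the same data along different dimensions can be sketched in plain Python. The sales rows and the roll_up helper are invented for this illustration; a real OLAP tool would offer far more than this simple aggregation.

```python
from collections import defaultdict

# Hypothetical sales facts; each row carries several dimensions plus one measure.
sales = [
    {"product": "nuts", "region": "east", "quarter": "Q1", "units": 50},
    {"product": "nuts", "region": "west", "quarter": "Q1", "units": 30},
    {"product": "bolts", "region": "east", "quarter": "Q1", "units": 20},
    {"product": "bolts", "region": "east", "quarter": "Q2", "units": 40},
]


def roll_up(facts, *dimensions):
    """Total the 'units' measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["units"]
    return dict(totals)


# The same facts, viewed two different ways.
by_product = roll_up(sales, "product")
by_region_quarter = roll_up(sales, "region", "quarter")
print(by_product)
print(by_region_quarter)
```

Choosing different dimension names in the call changes the view without touching the underlying data.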
The Human Element and Ownership Issues Affect Information Management
· Ownership Issues. Although a company may set the policy that all information resources are company-owned, in practice people often view these resources more protectively, even when compliance and security don’t demand tight access controls. Norms about how records are used emerge over time, and though many are unwritten, they can certainly affect employees’ behavior.
· Databases Without Boundaries. Databases without boundaries are also part of emergency disaster relief. Online databases can help victims find missing family members, organize volunteers, or link people who can provide shelter to those who need it.
· Balancing Stakeholders and Information Needs. Meeting the information needs of all stakeholders is a balancing act that requires leadership, compromise, negotiation, and well-designed databases. As a shared information resource, a well-designed database provides a solid backbone for the whole organization and all its stakeholders.
Source: Management Information Systems: Managing the Digital Firm, Twelfth Edition, by Kenneth C. Laudon and Jane P. Laudon
Name: Anastasya Syanne Titahena
NIM: 01082180022
Major: Informatics