Foundations of Business Intelligence: Database and Information Management
After reading this chapter, you will be able to:
Describe basic file organization concepts and the problems of managing data resources in a traditional file environment.
Describe the principles of a database management system and the features of a relational database.
Apply important database design principles.
Evaluate tools and technologies for providing information from databases to improve business performance and decision making.
Assess the role of information policy, data administration, and data quality assurance in the management of organizational data resources.
File Organization Concepts
Problems with the Traditional File Environment
6.2 THE DATABASE APPROACH TO DATA MANAGEMENT
Database Management Systems
Capabilities of Database Management Systems
6.3 USING DATABASES TO IMPROVE BUSINESS PERFORMANCE AND DECISION MAKING
Business Intelligence, Multidimensional Data Analysis, and Data Mining
Databases and the Web
6.4 MANAGING DATA RESOURCES
Establishing an Information Policy
Ensuring Data Quality
Improving Decision Making by Redesigning the Customer Database: Dirt Bikes USA
Achieving Operational Excellence: Building a Relational Database for Inventory Management
Improving Decision Making: Searching Online Databases for Overseas Business Resources
LEARNING TRACK MODULE
Database Design, Normalization, and Entity-Relationship Diagramming
Introduction to SQL
Hierarchical and Network Data Models
DNA Databases: Crime Fighting Weapon or Threat to Privacy?
What Can Be Done About Data Quality
OPENING CASE: NASCAR RACES TO MANAGE ITS DATA
NASCAR’s experience illustrates the importance of data management and database systems for business. NASCAR has experienced phenomenal growth but its future growth and business performance depend on what it can or can’t do with its customer data. How businesses store, organize, and manage their data has a tremendous impact on organizational effectiveness.
The chapter-opening diagram calls attention to important points raised by this case and this chapter. Data about NASCAR fans and potential customers had been stored in a number of different databases where they could not be easily retrieved and analyzed. Management decided that NASCAR’s business strategy needed to focus on creating customer intimacy, which necessitated integrating the data from all these disparate sources into a single comprehensive fan database. In addition to using appropriate technology, NASCAR had to correct and reorganize the data into a standard format and establish rules with its business partners for accessing information in the new database.
Organizing Data in a traditional
Information is becoming as important a business resource as money, material, and people. Even though a company compiles millions of pieces of data doesn’t mean it can produce information that its employees, suppliers, and customers can use. Businesses are realizing the competitive advantage they can gain by compiling useful information, not just data.
Why should you learn about organizing data? Because it’s almost inevitable that someday you’ll be establishing or at least working with a database of some kind. As with anything else, understanding the lingo is the first step to understanding the whole concept of managing and maintaining information. It all comes down to turning data into useful information, not just a bunch of bits and bytes.
The first few terms, field, record, file, database, are depicted in Figure 6–1, which shows the relationship between them.
An entity is basically the person, place, thing, or event on which we maintain information. Each characteristic or quality describing an entity is called an attribute.
Many problems such as data redundancy, data inconsistency, program-data dependence, inflexibility, poor data security, and of data sharing and availability among applications have occurred with traditional file environments.
Data Redundancy and Inconsistency
We’ve spoken about “islands of information” before. Building and maintaining databases is where this situation is most evident and most troublesome. Usually it begins in all innocence, but it can quickly grow to monstrous proportions.
For instance, after you move and change addresses, you notify everyone of your new address including your bank. Everything is going smoothly with your monthly statements. All of a sudden, at the end of the year, the bank sends a Christmas card to your old address. Why? Because your new address was changed in one database, but the bank maintains a separate database for its Christmas card list and your address was never changed in it.
If you received two Christmas cards, you’re probably a victim of data redundancy. That is, your information is now in two separate databases with duplicate records. In this instance, each database file has different data on the same record. That can be a nightmare on Main Street!
It is easy to see that the problem with data redundancy is that it wastes storage resources. Another problem with data redundancy is that it will also lead to data inconsistency. In the Christmas card example illustrated above, the address field in the two databases has two different values.
Even more troublesome is when several departments or individuals decide to set up their own islands of information. This usually happens because they find the main system inflexible or it just doesn’t fit their needs. So they set up their own fields and records and files and use them in their own programs to manipulate data according to their needs. Now each department is spending dollars and time to establish and maintain islands of information because of program-data dependence.
Taking this problem even further, the fields and records for marketing probably don’t have the same structure and meaning as the fields and records for accounting, or those for production. Each record describes basically the same entity (customers or products), but it is very possible that each database file will have different information, or attributes, in records concerning the same entity.
All of this may happen with the best of intentions. All departments began with the goal of making their part of the organization more efficient. Eventually these good intentions can cost big dollars to bring the islands together, resolve data conflicts, and retrain people to understand the new database structures.
Lack of Flexibility
A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion. To send Christmas cards to every person in the database would be programmed and easily executed. However, lets assume that you only want to send Christmas cards out to individuals who are 55 years old and live in Texas. This ad hoc request will not be easy to retrieve.
Because there is little control or management of data, access to and dissemination of information may be out of control. Management may have no way of knowing who is accessing or even making changes to the organization’s data.
Lack of Data Sharing and Availability
Pieces of information in different files and different parts of the organization cannot be related to one another. This situation makes it virtually impossible for information to be shared or accessed in a timely manner. For example, assume that the marketing department has a promotional program whereby all individuals placing an order last month of $1,000 or more are issued a $50 gift card on their next purchase. After placing their order, the customer later returns the product and the accounting department issues them a full credit. If the marketing department maintains their own customer sales database, then they would have no way of knowing that the customer returned the product. Thus, even customers who returned their orders would still be issued the $50 gift card. In this example you can see the necessity for data sharing and making it available to the different parts of the organization that need to access it.
The Database Approach to Data
The key to establishing an effective, efficient database is to involve the entire organization as much as possible, even if everyone will not immediately be connected to it or use it. Perhaps they won’t be a part of it in the beginning, but they very well could be later on.
You’ve heard the old saying, “Don’t put all your eggs in one basket.” When it comes to data, just the opposite is true. You want to put all your corporate data in one system that will serve the organization as a whole. Doing so makes it easier, cheaper and more efficient to use the data across the entire organization. It makes it easier to use in applications and makes it available through many different delivery methods.
A database management system (DMBS) is software that permits an organization to centralize data, manage them efficiently, and provide access to the stored data by application programs. The DBMS acts as an interface between application programs and the physical data files.
Physical views of items are often different from the logical views of the same items when they are actually being used. For instance, assume you store tablets of paper in your lower-right desk drawer. You store your pencils in the upper-left drawer. When it comes time to write your request for a pay raise, you pull out the paper and pencil and put them together on your desktop. It isn’t important to the task at hand where the items were stored physically; you are concerned with the logical idea of the two items coming together to help you accomplish the task.
The physical view of data focuses on where the data are actually stored in the record or in a file. The physical view is important to programmers who must manipulate the data as they are physically stored in the database.
Does it really matter to the user that the customer address is physically stored on the disk before the customer name? Probably not. However, when users create a report of customers located in Indiana, they generally will list the customer name first and then the address. So it’s more important to the end user to bring the data from its physical location on the storage device to a logical view in the output device, whether screen or paper.
How a DBMS Solves the Problems of the Traditional File Environment
A DBMS reduces data redundancy and inconsistency by minimizing isolated files in which the same data are repeated. A DBMS eliminates data inconsistency because the DBMS can help the organization ensure that every occurrence of redundant data has the same values. The DBMS enables the organization to centrally manage data, their use, and security.
Every tool has its job. You wouldn’t use a screwdriver to pound a nail in the wall, nor would you use a hammer to turn a bolt. Each type of database that we discuss in this section has its own advantages and disadvantages, so you should choose the right type of database for the job you want to do.
The most popular type of DBMS today for PCs as well as for larger computers and mainframes is the relational DBMS. A relational DBMS uses tables in which data are stored to extract and combine data in whatever form or format the user needs. The tables are sometimes called files, although that is actually a misnomer, since you can have multiple tables in one file. (Make sure you review the description of fields and records in the text.)
Suppose you decide to create a database for your newspaper delivery business. In order to succeed, you need to keep accurate, useful information for each of your customers. You set up a database to maintain the information. The actual information about a single customer resides in a table called a row. Rows are commonly referred to as records, or in a very technical term, a tuple. Thus, for each customer, you create a record. Within each record you have the following fields: customer name, address, ID, date last paid. Smith, Jones, and Brooks are the records within a file you decide to call Paper Delivery. The entities then are Smith, Jones, and Brooks, the people about whom you are maintaining information. The attributes are customer name, address, ID, and date last paid.
Each record requires a key field, or unique identifier. The key field in this file is the ID number; perhaps you’ll use phone number because it will be unique for each record. This is a very simplistic example of a database, but it should help you understand the terminology.
In a relational database, each table contains a primary key, a unique identifier for each record. To make sure the tables relate to each other, the primary key from one table is stored in a related table as a secondary key. For instance, in the customer table the primary key is the unique customer ID. That primary key is then stored in the order table as the secondary key so that the two tables have a direct relationship.
Number of Items Ordered
Relational database tables can be combined to deliver data required by users, provided that any two tables share a common data element.
Use these three basic operations to develop relational databases:
Select: Create a subset of records meeting the stated criteria.
Join: Combine related tables to provide more information than individual tables.
Project: Create a new table from subsets of previous tables.
The biggest problem with these databases is the misconception that every data element should be stored in the same table. In fact, each data element should be analyzed in relation to other data elements, with the goal of making the tables as small in size as possible. The ideal relational database will have many small tables, not one big one. On the surface that may seem like extra work and effort, but by keeping the tables small, they can serve a wider audience because they are more flexible. This setup is especially helpful in reducing redundancy and increasing the usefulness of data.
The hierarchical DBMS presents data to users in a treelike structure. Think of a mother and her children. A child only has one mother and inherits some of her characteristics, such as eye color or hair color. A mother might have one or more children to whom she passes some of her characteristics but usually not exact ones. The child then goes on to develop her own characteristics separate from the mother.
In a hierarchical database, characteristics from the parent are passed to the child by a pointer, just as a human mother will have a genetic connection to each human child. You can demonstrate this concept to students by showing them how this database pointer works by illustrating the simple hierarchy illustrated below.
A network data model is a variation of the hierarchical model. Take the same scenario with one parent and many children and add a father and perhaps a couple of stepparents. Now the parents aren’t restricted to only one (the mother), but to many parents. That is, a parent can have many children and a child can have many parents. The parents pass on certain characteristics to the children, but the children also have their own distinct characteristics.
As with hierarchical structures, each relationship in a network database must have a pointer from all the parents to all the children and back, as this figure demonstrates.
These two types of databases, the hierarchical and the network, work well together because they can easily pass data back and forth. But these structures are not easily manipulated and require extensive technical programming to meet changing requirements. Because they are difficult to build in the first place, some businesses are hesitant to replace them with newer relational data models. They are referred to as legacy systems — systems that continue to be used because of the high cost of replacing them.
Many companies are moving away from strictly text-based database systems. Data as objects can be pictures, groups of text, voice, and audio. Object-oriented databases bring the various objects from many different sources and get them working together. If you combine the capabilities of a relational DBMS and an object-oriented database, you create an object-relational DBMS.
The next time you go to your dentist’s office, you might see a good example of an object-oriented database management system. Many sophisticated dental database programs include a traditional text-based record of your treatment history, and will also include objects such as computer-stored X-ray films, and maybe a digital photograph of the inside of your mouth. All these objects are maintained as a database record. When you visit your dentist, she can retrieve your record on the computer terminal, update your treatment history, and take new X-rays and a new digital photo, all on the computer. On the screen, she can compare last year’s X-rays with this year’s. She may even use a graphic tooth chart to mark which teeth need attention.
A Database Management System (DBMS) is basically another software program like Word or Excel or e-mail. This type of software is more complicated; it permits an organization to centralize data, manage them efficiently, and provide access to the stored data by application programs. A DBMS has three components, all of them important for the long-term success of the system.
Data definition language. Marketing looks at customer addresses differently from Shipping, so you must make sure that all database users are speaking the same language. Think of it this way: marketing is speaking French, production is speaking German, and human resources is speaking Japanese. They are all saying the same thing, but it’s very difficult for them to understand each other. Defining the data definition language itself sometimes gets shortchanged. The programmers who are creating the language sometimes say “Hey, an address is an address, so what.” That’s when it becomes critical to involve users in the development of the data definition language.
Querying and Reporting
Data manipulation language. This is a formal language programmers use to manipulate the data in the database and make sure they are formulated into useful information. The goal of this language should be to make it easy for users. The basic idea is to establish a single data element that can serve multiple users in different departments, depending on the situation. Otherwise, you’ll be employing programmers to get information from the database that users should be able to get on their own.
Data manipulation languages are getting easier to use and more prevalent. SQL (Structured Query Language) is the most prominent language and is now embedded in desktop applications such as Microsoft Access.
Because SQL is becoming a popular, easy method of extracting data, let’s look at a couple of the commands it uses.
Select Statement: Used to query data for specific information
Conditional Selection: Used to specify which rows of a table are displayed, based on criteria contained in the WHERE clause
Joining Two Tables: Used to combine data from two or more tables and display the results
SQL commands can be embedded in application programs written in many different languages. The manipulative characteristics of SQL have led to its popularity.
Don’t start pounding on the keyboard just yet! That’s a common mistake that may cause you many headaches later on. You have a lot of work to do before you touch the computer.
First, you should think long and hard about how you use information in your current situation. Think of how it is organized, stored, and used. Now imagine how this information could be organized better and used more easily throughout the organization. What part of the current system would you be willing to get rid of and what would you add? Involve as many end users in this planning stage as possible. They are the ones who will prosper or suffer because of the decisions you make at this point.
Normalization and Entity-Relationship Diagrams
Determine the relationships between each data element that you currently have (entity-relationship diagram). The data don’t necessarily have to be in a computer for you to consider the impact. Determine which data elements work best together and how you will organize them in tables. Break your groups of data into as small a unit as possible (normalization). Even when you say it’s as small as it can get, go back through again. Avoid redundancy between tables. Decide what the key identifier will be for each record. See, you’ve done all this and you haven’t even touched the computer yet!
Give it your best shot in the beginning: it costs a lot of time, money, and frustration to go back and make changes or corrections or to live with a poorly-designed database.
A distributed database, which is stored in more than one physical location, is usually found in very large corporations that require immediate, fast access to data at multiple sites. There are two ways to structure distributed databases:
Partition a central database so that each remote processor has the necessary data to serve its local area.
Replicate the central database at all remote locations.
As the book points out, there are lots of disadvantages so you should be careful to determine if this is the right way for you to run your business.
Using Databases to Improve Business
Performance and Decision Making
Corporations and businesses go to great lengths to collect and store information on their suppliers and customers. What they haven’t done a good job of in the past is fully using the data to take advantage of new products or markets. They’re trying, though, as we see in this section.
As organizations want and need more information about their company, their products, and their customers, the concept of data warehousing has become very popular. Remember those islands of information we keep talking about? Unfortunately, too many of them have proliferated over the years and now companies are trying to rein them in using data warehousing.
No, data warehouses are not great big buildings with shelves and shelves of bits and bytes stored on them. They are huge computer files that store old and new data about anything and everything that a company wants to maintain information on.
What Is a Data Warehouse?
A data warehouse is a database that stores current and historical data of potential interest to decision makers throughout the company. The data warehouse consolidates and standardizes information from different operational databases so that the information can be used across the enterprise for management analysis and decision making.
As Figure 6–13 shows, the data come from a variety of sources, both internal and external to the organization. They are then stored together in a data warehouse from which they can be accessed and analyzed to fit the user’s needs.
Since the data warehouse can be cumbersome, a company can break the information into smaller groups called data marts. It’s easier and cheaper to sort through smaller groups of data. It’s still useful to have a huge data warehouse, though, so that information is available to everyone who wants or needs it. You can let the user determine how the data will be manipulated and used. Using a data warehouse correctly can give management a tremendous amount of information that can be used to trim costs, reduce inventory, put products in the right stores at the right time, and attract new customers.
As technology improves, so does our ability to manipulate information maintained in databases. Have you ever played with a Rubik Cube — one of those little multicolored puzzle boxes you can twist around and around to come up with various color combinations? That’s a close analogy to how multidimensional data analysis or online analytical processing (OLAP) works. In theory, it’s easy to change data around to fit your needs.
By using datamining, a digital firm can get more information than ever before from its data. One danger in datamining is the problem of getting information that on the surface may seem meaningful, but when put into context of the organization’s needs, simply doesn’t provide any useful information.
For instance, datamining can tell you that on a hot summer day in the middle of Texas, more bottled water is sold in convenience stores than in grocery stores. That’s information managers can use to make sure more stock is targeted to convenience stores. Datamining could also reveal that when customers purchase white socks, they also purchase bottled water 62 percent of the time. We seriously doubt there is any correlation between the two purchases. The point is that you need to beware of using datamining as a sole source of decision making and make sure your requests are as focused as possible.
Many companies collect lots of data about their business and customers. The most difficult part has been to turn that data into useful information. With improved database technology, organizations are creating new opportunities for connecting with their customers by extracting information easier and more precisely from their data warehouses. Firms are using better datamining techniques to target customers and suppliers with just the right information at the right time.
The types of information obtainable from data mining include:
Associations — occurrences linked to a single event.
Sequences — events are linked over time.
Classification — recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules.
Clustering — works in a manner similar to classification when no groups have yet been defined.
Forecasting — uses a series of existing values to forecast what other values will be.
Predictive analysis uses data mining techniques, historical data, and assumptions about future conditions to predict outcomes of events, such as the probability a customer will respond to an offer or purchase a specific product.
MIS IN ACTION
Explore the Web site for the Combined DNA Index System (CODIS) and answer the following questions.
How does CODIS work? How is it designed?
What information does CODIS maintain?
Who is allowed to use CODIS?
How does CODIS aid criminal investigations?
Even though Web browsers have only been around for a few years, they are far easier to use than most of the query languages associated with the other programs on mainframe computer systems. Many companies are finding out that it’s easier to provide their “road warriors” with Web-like browsers attached to the computer at the main office. Employees anywhere can have up-to-the-minute access to any information they need. It’s also proving cheaper to create “front-end” browser applications that can more easily link information from disparate systems than to try to combine all the systems on the “back-end”.
One of the easiest ways to make databases available to any user is by linking the internal databases to the Web through software programs that provide a connection to the database without major reconfigurations. A database server, which is a special dedicated computer, maintains the DBMS. A software program, called an application server, processes the transactions and offers data access. A user making an inquiry through the Web server can connect to the organization’s database and receive information in the form of a Web page.
Figure 6–16 shows how an application server provides the interface between the database and the Web.
The benefits of using a Web browser to access a database are as follows:
Less training for users
No changes to the internal database
Cheaper than building a new system
Creating new efficiencies and opportunities
Provide employees with integrated firmwide views of information
Managing Data Resources
Setting up a database is only a start. In order to make sure that the data for your business remain accurate, reliable, and readily available to those who need it, your business will need special policies and procedures for data management.
ESTABLISHING AN INFORMATION POLICY
Every organization needs to have rules on how the data are to be organized and maintained, and who is allowed to view the data or change them.
Information policy — No one part of the organization should feel that it owns information to the exclusivity of other departments or people in the organization. A certain department may have the primary responsibility for updating and maintaining the information, but that department still has to share it across the whole company. Well-written information policies can outline the rules for using this important resource, including how it will be shared, disseminating, acquiring, standardizing, classifying, and inventorying information.
Data administration is responsible for the specific policies and procedures through which data can be managed as an organizational resource. This function can help define and structure the information requirements for the entire organization to ensure it receives the attention it deserves. Data administration is responsible for the following:
Developing information policies
Planning for data
Overseeing logical database design
Data dictionary development
Monitoring the usage of data by techies and non-techies
Data governance deals with the policies and processes for managing the availability, usability, integrity, and security of the data employed in an enterprise, with special emphasis on promoting privacy, security, data quality, and compliance with government regulations.
As we discussed in Chapter 4, poor data quality can have far-reaching implications that can make a company legally liable. Although all employees may share in the responsibility for maintaining good quality data, managers obviously have the greater share. And, there is a lot more to a useful database than simply listing a bunch of data elements and hoping people use them as intended.
Data quality audits can help companies identify the accuracy and completeness of data. Data cleansing also known as data scrubbing can detect and correct data and enforce consistency among different sets of data. This last point is important if an organization has combined several databases from different sources since chances are great that there are erroneous or mismatched data.
INTERACTIVE SESSION: MANAGEMENT
WHAT CAN BE DONE ABOUT DATA QUALITY?
A well-designed database and information policy will go a long way toward ensuring that the business has the information it needs. However, additional steps must be taken to ensure that the data in organizational databases are accurate and remain reliable. The Interactive Session on Management describes some companies that experienced data quality problems and how they tried to solve them. These problems were multi-faceted.
MIS IN ACTION
Visit the Web site of a data quality software vendor such as Dataflux, Trillium Software, or Group 1. Explore the site and answer the following questions:
What are the capabilities of the software for ensuring data completeness, accuracy, and consistency?
What steps would your business need to take to use this software?
Software skills: Database design, querying and reporting
Business skills: Customer profiling
Dirt Bikes sells primarily through its distributors. It maintains a small customer database with the following data: customer’s name, address, telephone number, model purchased, date of purchase, and distributor. These data are collected by its distributors when they make a sale and forwarded to Dirt Bikes. Dirt Bikes would like to be able to market more aggressively to its customers. The Marketing Department would like to be able to send customers e-mail notices of special racing events and of sales on parts. It would also like to learn more about customers’ interests and tastes: their ages, years of schooling, another sport in which they are interested, and whether they attend dirt bike racing events. Additionally Dirt Bikes would like to know whether customers own more than one motorcycle. (Some Dirt Bikes customers own two or three motorcycles from Dirt Bikes U.S.A. or other manufacturers.) If a motorcycle was purchased from Dirt Bikes, the company would like to know the date of purchase, model purchased, and distributor. If the customer owns a non-Dirt Bikes motorcycle, the company would like to know the manufacturer and model of the other motorcycle (or motorcycles), and the distributor from whom the customer purchased that motorcycle.
Redesign Dirt Bikes’s customer database so that it can store and provide the information needed for marketing. You will need to develop a design for the new customer database and then implement that design using database software. Consider using multiple tables in your new design. Populate each new table with ten records.
Develop several reports that would be of great interest to Dirt Bikes’s marketing and sales department (for example, lists of repeat Dirt Bikes customers, Dirt Bike customers who attend racing events, or the average age and years of schooling of Dirt Bikes customers) and print them.
Software skills: Database design, querying and reporting
Business skills: Inventory management
This exercise requires that students know how to create queries and reports using information from multiple tables. The solutions provided here were created using the query wizard and report wizard capabilities of Access. Students can, of course, create more sophisticated reports if they wish.
The database would need some modification to answer other important questions about the business. The owners might want to know, for example, which are the fastest-selling bicycles. The existing database shows products in inventory and their suppliers. The owners might want to add an additional table (or tables) in the database to house information about product sales, such as the product identification number, date placed in inventory, date of sale, purchase price, and customer name, address, and telephone number. Management could use this enhanced database to create reports on best selling bikes over a specific period, the number of bicycles sold during a specific period, total volume of sales over a specific period, or best customers. Students should be encouraged to think creatively about what other pieces of information should be captured on the database that would help the owners manage the business.
Prepare a report that identifies the five most expensive bicycles. The report should list the bicycles in descending order from most expensive to lease expensive, the quantity on hand for each, and the markup percentage for each.
Prepare a report that lists each supplier, its products, their quantities on hand, and associated reorder levels. The report should be sorted alphabetically by supplier. Within each supplier category, the products should be sorted alphabetically.
Prepare a report listing only the bicycles that are low in stock and need to be reordered. The report should provide supplier information for the items identified.
Write a brief description of how the database could be enhanced to further improve management of the business. What tables or fields should be added? What additional reports would be useful?
Software skills: Online database
Business Skills: Researching services for overseas operations
The Internet is a valuable source of databases where users can search for services and products in areas or countries that are far from their own locations. Your company is located in Greensboro, North Carolina and manufactures office furniture of various types. You have recently acquired several new customers in Australia, and a study you commissioned indicates that with a presence there, you could greatly increase your sales. Moreover, your study indicates that you could do even better if you actually manufactured many of your products locally (in Australia). First you need to set up an office in Melbourne to establish a presence, and then you need to begin importing from the United States. You then can plan to start producing locally.
You will soon be traveling to the area to make plans to actually set up an office, and you want to meet with organizations that can help you with your operation. You will need to engage people or organizations that offer many services necessary for you to open your office, including lawyers, accountants, import-export experts, telecommunications equipment and support, and even trainers who can help you to prepare your future employees to work for you.
1. List the companies you would contact to interview on your trip to determine whether they can help you with functions you think are vital to establishing your office.
Start by searching for U.S. Department of Commerce advice on doing business in Australia. Then try the following online databases to locate companies that you would like to meet with during your coming trip: Australian Business Register, Australia Trade Now, and the Nationwide Business Directory of Australia. If necessary, you should also try search engines such as Yahoo! (www.yahoo.com) and Google (www.google.com).
2. Rate the databases you used for accuracy of name, completeness, ease-of-use, and general helpfulness.
3. What does this exercise tell you about the design of databases?
The U.S. Department of Commerce Web site contains a fair amount of economic information. However, it may be simpler to direct your students to go to http://www.aol.com. The Web site for the Nationwide Business Directory of Australia is http://www.nationwide.com.au
Database Design, Normalization, and Entity-Relationship Diagramming
Introduction to SQL
Hierarchical and Network Data Models
Students will find Learning Track modules on these topics at the Laudon Web site for this chapter.
1. Describe basic file organization concepts and the problems of managing data resources in a traditional file environment.
A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases. Traditional file management techniques make it difficult for organizations to keep track of all of the pieces of data they use in a systematic way and to organize these data so that they can be easily accessed. Different functional areas and groups were allowed to develop their own files independently. Over time, this traditional file management environment creates problems such as data redundancy and inconsistency, program-data dependence, inflexibility, poor security, and lack of data sharing and availability.
2. Describe the principles of a database management system and the features of a relational database.
A database management system (DBMS) consists of software that permits centralization of data and data management so that businesses have a single consistent source for all their data needs. A single database services multiple applications. The most important feature of the DBMS is its ability to separate the logical and physical views of data. The user works with a logical view of data. The DBMS retrieves information so that the user does not have to be concerned with its physical location.
The principal capabilities of a DBMS include a data definition capability, a data dictionary capability, and a data manipulation language. The data definition language specifies the structure and content of the database. The data dictionary is an automated or manual file that stores information about the data in the database, including names, definitions, formats, and descriptions of data elements. The data manipulation language, such as SQL, is a specialized language for accessing and manipulating the data in the database.
The relational database is the primary method for organizing and maintaining data today in information systems. It organizes data in two-dimensional tables with rows and columns called relations. Each table contains data about an entity and its attributes. Each row represents a record and each column represents an attribute or field. Each table also contains a key field to uniquely identify each record for retrieval or manipulation.
3. Apply important database design principles.
Designing a database requires both a logical design and a physical design. The logical design models the database from a business perspective. The organization’s data model should reflect its key business processes and decision-making requirements. The process of creating small, stable, flexible, and adaptive data structures from complex groups of data when designing a relational database is termed normalization. A well-designed relational database will not have many-to-many relationships, and all attributes for a specific entity will only apply to that entity. An entity-relationship diagram graphically depicts the relationship between entities (tables) in a relational database. Database design also considers whether a complete database or portions of the database can be distributed to more than one location to increase responsiveness and reduce vulnerability and costs. There are two major types of distributed databases: replicated databases and partitioned databases.
4. Evaluate tools and technologies for providing information from databases to improve business performance and decision making.
Powerful tools are available to analyze and access the information in databases. A data warehouse consolidates current and historical data from many different operational systems in a central database for reporting and analysis. Data warehouses support multidimensional data analysis, also known as online analytical processing (OLAP). OLAP represents relationships among data as a multidimensional structure, which can be visualized as cubes of data and cubes within cubes of data, enabling more sophisticated data analysis. Data mining analyzes large pools of data, including the contents of data warehouses, to find patterns and rules that can be used to predict further behavior and guide decision making. Conventional databases can be linked via middleware to the Web or a Web interface to facilitate user access to an organization’s internal data.
5. Assess the role of information policy, data administration, and data quality assurance in the management of organizational data resources.
Developing a database environment requires policies and procedures for managing organizational data as well as a good data model and database technology. A formal information policy governs the maintenance, distribution, and use of information in the organization. In large corporations, a formal data administration function is responsible for information policy, as well as for data planning, data dictionary development, and monitoring data usage in the firm.
Data that are inaccurate, incomplete, or inconsistent create serious operational and financial problems for businesses because they may create inaccuracies in product pricing, customer accounts, and inventory data, and lead to inaccurate decisions about the actions that should be taken by the firm. Firms must take special steps to make sure they have a high level of data quality. These include using enterprise-wide data standards, databases designed to minimize inconsistent and redundant data, data quality audits, and data cleansing software.
The following alphabetical list identifies the key terms discussed in this chapter.
Attributes — a piece of information describing a particular entity.
Business Intelligence — Applications and technologies to help users make better business decisions.
Data administration — a special organizational function for managing the organization’s data resources that is concerned with information policy, data planning, maintenance of data dictionaries, and data quality standards.
Data cleansing — activities for detecting and correcting data in a database or file that are incorrect, incomplete, improperly formatted, or redundant. Also known as data scrubbing.
Data definition language — the component of a database management system that defines each data element as it appears in the database.
Data dictionary — an automated or manual tool for storing and organizing information about the data maintained in a database.
Data governance — deals with the policies and processes for managing the availability, usability, integrity, and security of the data employed in an enterprise, with special emphasis on promoting privacy, security, data quality, and compliance with government regulations.
Data inconsistency — the presence of different values for the same attribute when the same data are stored in multiple locations.
Data manipulation language — a language associated with a database management system that end users and programmers use to manipulate data in the database.
Data mart — a small data warehouse containing only a portion of the organization’s data for a specified function or population of users.
Data mining — analysis of large pools of data to find patterns and rules that can be used to guide decision making and predict future behavior.
Data quality audit — a survey and/or sample of files to determine accuracy and completeness of data in an information system.
Data redundancy — the presence of duplicate data in multiple data files.
Data warehouse — a database with reporting and query tools that stores current and historical data extracted from various operational systems and consolidated for management reporting and analysis.
Database — a group of related files.
Database (rigorous definition) — a collection of data organized to service many applications at the same time by storing and managing data so that they appear to be in one location.
Database administration — refers to the more technical and operational aspects of managing data, including physical database design and maintenance.
Database management system (DBMS) — special software to create and maintain a database and enable individual business applications to extract the data they need without having to create separate files or data definitions in their computer programs.
Database server — a computer in a client/server environment that is responsible for running a database management system (DBMS) to process structured query language (SQL) statements and perform database management tasks.
Distributed database — a database that is stored in more than one physical location. Parts or copies if the database are physically stored in one location, and other parts or copies are stored and maintained in other locations.
Entity — a person, place, thing, or event about which information must be kept.
Entity-relationship diagram — a methodology for documenting databases illustrating the relationship between various entities in the database.
Field — a group of characters into a word, a group of words, or a complete number, such as a person’s name or age.
File — A group of records of the same type.
Foreign key — field in a database table that enables users to find related information in another database table.
Information policy — formal rules governing the maintenance, distribution, and use of information in an organization.
Key field — a field in a record that uniquely identifies instances of that record so that it can be retrieved, updated, or sorted.
Normalization — the process of creating small stable data structures for complex groups of data when designing a relational database.
Object-oriented DBMS — an approach to data management that stores both data and the procedures acting on the data as objects that can be automatically retrieved and shared; the objects can contain multimedia.
Object-relational DBMS — a database management system that combines the capabilities of a relational database management system (DBMS) for storing traditional information and the capabilities of an object-oriented DBMS for storing graphics and multimedia.
Online analytical processing (OLAP) — capability for manipulating and analyzing large volumes of data from multiple perspectives.
Predictive analysis — Use of datamining techniques,historical data, and assumptions about future conditions to predict outcomes of events.
Primary key — unique identifier for all the information in any row of a database table.
Program-data dependence — the close relationships between data stored in files and the software programs that update and maintain those files. Any change in data organization or format requires a change in all the programs associated with those files.
Record — a group of related fields.
Relational DBMS — a type of logical database model that treats data as if they were stored in two-dimensional tables. It can related data stored in one table to data in another as long as the two tables share a common data element.
Structured query language (SQL) — the standard data manipulation language for relational database management systems.
Tuple — a row or record in a relational database.