Magazine: Information Systems Management, Winter 1999
Database development methodologies are a subset of full life cycle methodologies that can be used to build databases from the requirements phase through to implementation. Database development methodologies that are iterative, with extensive user involvement and that are packaged with reusable templates or deliverables offer a good opportunity for success. When selecting one for an organization, find one that is simple to learn, simple to use, and electronically deployed for quick access by project teams.
DATABASE DEVELOPMENT IS ONE of the fundamental objectives of the data management function and certainly one of the end products of the process. In recent years, several trends have impacted the way that databases are built and the role they play in the overall organization. Some of these trends include, data warehousing, object-oriented technology, E-commerce, and the emergence of very large databases (VLDBs). Other changes to the landscape include the popularity of complex data types (e.g., BLOBs, video), universal databases, and object databases. Despite this, the basis of many online transaction processing applications (OLTP) that run the business is still the relational database and the flat files. This fact is not going to change dramatically over the next few years. If anything, the relational database has proven its value as an enterprise enabler and, like the IBM mainframe, is here to stay for the foreseeable future.
This column defines a database development methodology and approach that has proven successful on a variety of projects, such as $100,000 to $15,000,000 budgets, mainframe, client/server, three-tier with OO, and package implementations. This approach promotes viewing methodologies as flexible frameworks that are customized for every specific instance. This allows data-oriented teams to use their personal insight and experience alongside the best practices embedded in the methodology. This column also defines organizational roles for a data-oriented environment.
BENEFITS
The complexity that is inherent in constructing relational database solutions can be reduced by using proven database development methodologies on projects. Methodologies are an excellent type of best practices and project lessons. Use of methodologies therefore reduces risk on development projects. Methodologies define activities and deliverables that are constructed in projects that were successful. Following these successful lessons can reduce project development time while increasing product quality. Furthermore, the use of methodologies simplifies thc process of tracking project progress because there are clear benchmarks that can be reviewed by the project manager. Methodologies that offer templates/deliverables also allow a quickstart to the development process.
SELECTING A DATABASE DEVELOPMENT METHODOLOGY
Development methodologies with well-defined database development phases are commonly available in the marketplace. Some are freely available with modeling or project management tools, and others are found on the World Wide Web. Many of the larger consulting firms have developed proprietary methodologies based on their corporate project experiences and proven best practices. These can be purchased separately, or they can be bundled with consulting/mentoring services retained from the firm. The following list identifies some of the features that should be included in any database development methodology that is being considered for deployment in an organization:
· Linkage to a full life cycle development methodology: A full life cycle methodology supports more than database development. The database development methodology chosen either should be a component of a larger full life cycle methodology, or it should link seamlessly with one. Failure to do this could result in mismatched techniques or the development of deliverables that are not used.
· Techniques: Many popular development methodologies support a combination of techniques to streamline development of deliverables. The traditional waterfall approach involves producing deliverables in a sequential fashion. Deliverable B is not started until deliverable A is completed and signed off. This approach, however, historically has proven to be slow on many projects of all sizes. As a result of this experience, a rapid application development (PAD) approach has gained popularity in the past ten years. RAD produces deliverables in a much smaller timeframe than the older waterfall approach. Iteration and prototyping are cornerstones of most RAD approaches, as are teams that combine technical resources and users during the analysis and design phases of the project lifecycle. RAD has proven to be successful on smaller projects, but has been problematic on the larger ones due to the complexity of the business requirements. A relatively new approach combines the best elements of both the waterfall and RAD approaches and has proven valuable on larger development projects.
· Support: A development methodology (or a database development methodology) is a product, whether the organization has paid for it or not. As such, it is important for the methodology to be supported by the vendor into the future. An unsupported methodology becomes obsolete in short order. Some questions to ask the vendor include "How much research is being conducted to improve the methodology? .... Is there a hotline for technical support?" and "When is the next release of the methodology being released?"
· Price: The price of the methodology should be considered in whole and in parts and assessed against the value that is received. Consider the one-time cost, the training costs, upgrade costs, yearly license fees, costs per user, customization costs, hardware/software support costs, and costs for future releases.
· Vendor: Consider the stability and market share of the vendor providing the methodology. The vendor's references also should be checked to ascertain their support for clients. Vendors who are more stable and who have more marketshare are more likely to improve their methodology with new techniques in the future;
· Proven Success: One of the surest ways of selecting a suitable methodology is to check the references of similar organizations that have used it successfully on development projects.
· Electronic Availability: The methodology should be available electronically through Lotus Notes, the Internet, or CD-ROM. It also should be available on paper. This makes the methodology widely available to those using it across the organization.
· Templates/Deliverables: Reusable templates and deliverables are a good source of best practices that provide the means for quickstarting development projects. Many methodologies are demonstrated with these, but the templates/deliverables are not provided to customers. In such cases, it is valuable to try to negotiate the inclusion of templates/deliverables as part of the transaction. If the templates/deliverables still are not offered by the vendor, but the rest of the methodology is acceptable, a pilot project should be used to create reusable templates and deliverables for future projects to use. Although this may slow the pilot project down in the short term, subsequent projects will run more efficiently. It is also desirable to select a methodology architecture that allows additional templates and deliverables to be added to the database on an ongoing basis.
· Linkages to newer architectures: The methodology also should support linkages with modules that support data warehousing, object technology, E-commerce, and Web architectures. Flexibility in expanding the methodology directly or through deliverable linkages is desirable.
· Ease of Learning and Use: Methodologies that are easy to learn and use are more likely to be used on projects. Some methodologies are packaged with training courses from the vendor or other third parties.
It is not unusual to add to this list of features or to assign more weight to a handful of them because of their importance to a specific organization. Experience has shown that complicating the selection process does not improve necessarily the quality of the final selection. In fact, this can lead to wasted time and intense team debates or arguments that end in worthless stalemates. It is preferrable to quickly build a short list of candidate methodologies by disqualifying candidates that are weak on one or two key features (e.g., not available electronically or purchase price is greater than $100,000). The short list then can be compared to maybe five or six of the features that are of key importance to the organization. It also is useful to conduct a limited number of pilot projects that test the value of a methodology before making a final selection. It is also not unusual to pilot two different methodologies in a conference room pilot (CRP) in order to make a final determination. This process can take between six weeks and six months.
HIGH LEVEL DATABASE DEVELOPMENT METHODOLOGY
This section defines a high level methodology for database development. This methodology provides a good start for small- to medium-sized projects; however, a formal third-party methodology should be considered for projects that require more than six months of development effort. The activities discussed in this section are mapped to the standard project development framework, which consists of the following main phases: requirements, architecture, design, development, testing, implementation, and postimplementation. These phases can be conducted in parallel or sequentially depending on the exact nature of the methodology and are restricted to database-specific activities.
The subprocesses that are described in this section fit into a larger full life cycle methodology that would address such activities as corporate sponsorship for the project, project plan definition, organization building, team building, user interface development, application design, technology selection, acceptance testing, and deployment. It is assumed that these activities are completed outside the database development methodology phases.
Define Business Requirements
Business requirements are captured for any system development effort. The requirements also should be used to build the logical data model. They will feed such things as the number of entities, attribute names, and types of data stored in each attribute. These often are categorized by subject area.
Borrow or Create the Data Model
With a solid understanding of the business requirements, it is a good idea to search the market for a data model that can be purchased from a third party. This subsequently can be customized for the organization.
Build Logical Data Model
The logical data model is built iteratively. The first view usually is done at a high level, beginning with a subject area or conceptual data model. Subsequent levels contain more detail. The process of normalization also is applied at this stage. There are many good books on normalization; normal forms will not be covered here. Foreign key fields and potential indexes also will not be considered here. It is not necessary to build the logical data model for performance at this time, and physical considerations are left until a later process.
Verify the Data Model
The logical data model is validated iteratively with users, the fields of the user interface, and process models. It is not unusual to make changes to the data model during this verification process. New requirements which need to be fitted into the data model also may be identified
Build Data Architecture
The data architecture is defined in the context of the physical data environment. Considerations, such as the database server, distribution, components, and partitioning are considered in this step.
Build the Physical Data Model
The logical data model is converted to a physical data model based on the specific database that is used. The physical data model will vary with the choice of database products and tools. The physical data model also contains such objects as indexes, foreign keys, triggers, views, and user-defined datatypes. The physical data model is optimized for performance and usually is denormalized for this reason. Denormalization can result in redundancy, but can improve system performance. Building the physical data model is not a one-stop process. Do not expect to build a final version of the physical data model on the first attempt.
Refine the Data Model
The physical data model continuously is refined as more information becomes available and the results of stress testing and benchmarking become available to the database development team. The logical data model also should be maintained as the physical data model is refined.
Complete Transaction Analysis
Transaction analysis is used to review system transactions so that the physical data model can be refined for optimum system performance. Transaction analysis results are only meaningful after the business requirements and systems design are fairly solid. Transaction analysis produces statistics showing the access frequency for the tables in the database, time estimates, and data volumes.
Populate the Data
After the database structure is established and the database is created it is necessary to populate the database. This can be done through data scripts, applications, or data conversions. This can be an extensive set of activities that require substantial data mapping, testing, and parallel activities. It is expected that the details of this are included in the full life cycle methodology.
Complete Testing
Testing a database usually is done in the context of applications and is covered in the full life cycle methodology. Some specific types of testing, such as stress testing, bench marking, and regression testing can be used to refine the performance of the physical data model. These require high volumes of data, testing tools, and distribution tools.
DELIVERABLES
Some of the important deliverables that are created from inception to the creation of a physical database are discussed in this section. It is useful to build a reference database that contains samples of each of these deliverables so that project teams know in advance what they are attempting to build:
· Requirements Document: This is the statement of the business requirements for the application being developed. This deliverable can contain narrative and any number of models or prototypes to capture and represent the business requirements.
· Conceptual Model/Subject Areas: This is a high-level view of the business subject areas that are within the scope of the data model (e.g., accounting, administration, billing, engineering).
· Logical Data Model: This contains entities, attributes, and business rules within the subject areas. The model also shows relationships between the entities. Key fields and foreign keys also can be identified in this model.
· Transaction Analysis: This is a list of transactions supported by the system, the entities (and possibly the fields) that are accessed by the transactions, and the frequency with which they are accessed. A CRUD (Create, Read, Update, and Delete) matrix is a useful input for helping with this analysis.
· Physical Data Model: A denormalized version of the logical data model that is optimized for performance under a specific technical environment and refined through the transaction analysis results. The physical data model usually is refined throughout a development cycle and is not finished until implementation. The physical data model contains physical objects such as tables, fields, indexes, foreign keys, primary keys, views, user-defined data types, and rules.
· Object Model: An object model supports the logical data model. This often serves as an intermediate layer between an object-based user interface and a relational back-end database.
· Validation Models: This is a cross-reference of models, such as process models, to the logical data model to prove its validity. This often includes a mapping between the logical data model with a user interface and reports to identify gaps.
· Conversion Strategy: This is a statement of the strategy used to convert data into a new application. The level of detail can vary significantly. This could be anything from high-level principles to detailed conversion scripts.
TOOLS
Modeling tools are critical for the database development process. There are a number of tools with various add-ons that can be used in this process. Modeling tools should offer support for both data models and process models. It is also becoming more useful for modeling tools to support object models or link to other tools that do. Tools that support reverse reengineering from physical databases to generate logical data models or scripts are useful for organizations that require extensive changes to data structures (possibly following a corporate merger).
There are many other tools that are useful in the database development process. Some of these include CASE tools, conversion tools, testing tools, and database server tools.
ORGANIZATION
When staffing a project that involves a data initiative, it is necessary to fill specific roles. The roles defined in this section are generally specific to the data initiative. These roles often are complimented by other roles in full implementation projects. Projects that have high object-oriented content skew the organization towards object modeling skillsets.
· Project Sponsor: Projects should not be initiated or conducted without a senior project sponsor who is positioned to remove obstacles and ensure that the project team has the full support they require to be successful.
· Project Manager: The project manager is in charge of the entire project, including the data initiative.
· Business User: The business user provides the business rules for the application, which are used to derive the entities and attributes necessary to save the data.
· Business Analyst: The business analyst provides a critical link between the business user and the data architect by understanding the business requirements and translating them into technical words.
· Data Architect: The data architect has the responsibility for defining the data architecture. This could be distributed, central, stand-alone, or integrated with a sophisticated overall architecture.
· Data Analyst: The data analyst works with the business analyst to build a consistent view of each element of the data. This person understands the linkage between the business and the individual items of data.
· Data Modeler: The data modeler works with the data architect to build a logical relational data model and also may get involved in transforming the logical data model into a physical data model.
· Object Modeler: The object modeler becomes involved in projects to build an object model, including messages and methods. This person also may be responsible for mapping the object model to the corporate data model.
· Database Administrator: The database administrator implements the physical database, maintains and optimizes the physical environment, restricts access to the database by controlling privilege levels for users, offers advice to the development team for converting the logical data model to the physical data model, and holds the overall responsibility for running the database environment on a data-to-day basis.
· Network Administrator: The network administrator maintains the physical network, has the responsibility for maintaining the integrity of the physical environment that supports the data environment, and operates at the operating system level and the hardware level. For example, this person would add more physical disk to support larger databases.
· Developer: The developer uses the database(s) for application development.
PITFALLS
Misuse or misinterpretation of how methodologies should be executed can result in significantly negative impacts to project timelines. It is not unusual for organizations to use methodologies as process charts or recipes without streamlining any of the activities. This can result in a considerable amount of wasted time as deliverables or activities are produced without an understanding of how they are leading toward a solution. Methodologies should be adjusted for specific projects. Activities or deliverables that are not necessary should be dropped from the project plan.
Methodologies that are too complicated or difficult to learn and use frequently are avoided by project teams. There are some methodologies that may contain information for thousands of project contingencies. However, they require thousands of megabytes of storage or dozens of manuals to store. During tight project timeframes, such methodologies quickly are sidelined.
It is important to update methodologies over time. New project experiences and best practices should be included in the methodology at specific intervals.
CONCLUSION
Database development methodologies are a subset of full life cycle methodologies. Project teams can access a third-party database development methodology or follow the high-level framework described in this column for database development. Database development methodologies also should support parallel development, iteration, high-user involvement and be accompanied by a database of reusable templates or sample deliverables.
Recommended Reading
Deloitte & Touche Consulting Group Framework for Computing Solutions.
Maguire, Steve, Writing Solid Code (Redmond, WA: Microsoft Press, 1993).
Purba, Sanjiv, Developing Client/Server Systems Using Sybase SQL Server System 11 (New York: John Wiley & Sons, 1995).
Smith, Patrick N. Client/Server Computing. 2nd Ed. (Indianapolis, IN: Sams Publishing, 1994).
Willian, Perry, Effective Methods for Software Testing (New York, NY: John Wiley & Sons, 1995).
~~~~~~~~
By Sanjiv Purba
SANJIV PURBA is a Senior Manager with the Deloitte (5 Touche Consulting Group, Toronto, Canada.
Copyright of Information Systems Management is the property of Auerbach Publications Inc. and its content may not be copied without the copyright holder's express written permission except for the print or download capabilities of the retrieval software used for access. This content is intended solely for the use of the individual user.
Source: Information Systems Management, Winter99, Vol. 16 Issue 1, p72, 6p.
Item Number: 1394174