|
Boces
builds a Virtual Data Warehouse with CCA Analytics
The
Case for a Data Warehouse
Imagine
all your data about a single topic coming together seamlessly
from a number of disparate systems throughout your enterprise
for decision support and analysis. Furthermore, the data is
cleansed and possibly even transformed or aggregated for analytical
use. Finally, you provide a variety of query, reporting, data
mining, and analytical tools to end-users. They can explore
the data freely and answer their own questions without excessive
involvement from the IT department. That is the concept of
a data warehouse in a nutshell.
However,
building a data warehouse isnt as simple and straightforward
as the concept. Consuming a tremendous amount of programmer
and financial resources, a full-blown data warehouse architecture
requires the precise integration of numerous new and existing
software components. As such, building a data warehouse is
not a project that is lightly undertaken.
The
Virtues of a Virtual Data Warehouse
Now suppose
you could quickly build a virtual data warehouse as a working
prototype of a data warehouse. And while you worked on the
design and implementation of the data warehouse, the virtual
data warehouse is up and running, fulfilling the needs of
end users.
Unlike
a data warehouse, the virtual data warehouse does not store
data separately from the operational store. It applies appropriate
business intelligence tools against views of operational data,
providing the end-user with some decision support capabilities
quickly
and easily.
And because users can satisfy most of their own information
requests, the IT department particularly of a small
or mid-size company can devote more time to the design
and implementation of the data warehouse.
As one
Model 204 user has discovered, the recently-released CCA Analytics
product is the ideal business intelligence tool for building
a virtual data warehouse.
Meet
BOCES
The Board
of Cooperative Education Services (BOCES) in Syracuse, New
York, provides student-oriented research data to approximately
2,500 school principals, administrators and researchers in
almost every school district in the state. Every request
differs;
each involves vast information about any number of students
and requires that results be displayed in diverse formats.
Recently, despite their best efforts to keep up with the ever-growing
demand, IT began to fall behind with reports. They knew they
needed a data warehouse.
The
Challenge at BOCES
After
making the decision to build a data warehouse for their internal
clients, the modestly-sized IT department at BOCES had a real
challenge before them.
- BOCES
has 10 years of research data amounting to 30 million records
of 3,000 fields, utilizing 161 different record types.
- All
the data is stored in 58 Model 204 databases on the organizations
IBM S390 MVS enterprise server.
- It
will take some time just to figure out what data the users
would need to access in the data warehouse, let alone integrate
the other components such as data cleansing, transformation,
and analytical tools.
- Programming
staff estimates it will take between 8 and 18 months to
fully construct a data warehouse.
- Most
troubling is this Catch-22: they still must continue to
serve their large end-user base, which leaves very little
time to build the data warehouse that they hope will eventually
reduce their workload.
Choosing
a Business Intelligence Tool
Realizing
that this created a no-win situation, BOCES considered an
interim, virtual data warehouse solution, hoping to free up
IT resources to devote to the data warehouse project. The
first task was to choose a business intelligence tool that
was up to the
challenge
of turning BOCES operational data into a virtual data
warehouse. Among other products, BOCES considered a new product
from Computer Corporation of America called CCA Analytics
Analyzing
CCA Analytics
CCA Analytics
is a new business intelligence tool for analyzing even the
largest enterprise databases. It relies on Model 204s
trademark indexing technology to deliver maximum performance
for analytical applications. For decades, CCAs database
products have used bit mapped indices, which are recognized
for their superior performance in query and analytical operations.
Any number of fields in the database can be indexed, resulting
in the creation of an index table for each one. The practical
result is that most analytical operations are executed against
the key tables, never accessing the actual data records. Functions
such as summaries and value distributions are completed almost
instantly, even with exceedingly large databases.
Consider
a database that contains name and address information for
50 million U.S. citizens. Summarizing the number of people
by state takes less than a second, regardless of the number
of data records, since all work is done at the index table
level. The State field index table is very small, containing
only 50 unique values.
For more
complex queries and analyses, CCA Analytics combines index
table information through the use of relational and logical
operators, producing similarly fast results. In a sense, the
index tables ARE the data warehouse, with the actual data
records being used only when a full drill-down is required.
This, of course, is completely transparent to the user, who
doesnt need to worry about whether their operations
are being completed at the index table or detail record level.
Back
at BOCES: Choosing CCA Analytics
After
a brief review of popular business intelligence tools, BOCES
chose CCA Analytics.
"CCA
Analytics was attractive to us for several reasons," explains
Larry Dismore, director of the Central New York Regional Information
Center at BOCES. "It provided all the basic query, reporting,
and analytical functions that we felt our users need. It was
from
our database vendor, so we knew there would be no integration
problems and that the learning curve for producing custom
functionality would be minimal. The performance was pretty
much unbeatable. We were not naive about the risks of throwing
large numbers of researchers at operational data, and we were
very concerned about the performance of both the analytical
and the operational applications. The indexing technology
inherent to CCA Analytics made this a non-issue."
Designing
the Virtual Warehouse
Like
building a data warehouse, constructing a virtual data warehouse
requires some understanding of the users informational
needs. But because a virtual data warehouse consists mainly
of index tables and logical views of the data, rather than
physical copies of operational data, BOCES was not concerned
with getting the views right the first time.
If they
built a view that users didnt need remove it.
If they omitted a view, which later surfaced as a requirement
build it on the fly. The use of the warehouse was undisturbed.
BOCES
constructed numerous views of all current databases, providing
the users maximum range to analyze the data. This helped their
users understand what data was available to them, which translated
into more accurate user requirements for the data
warehouse.
In two weeks, the IT department completed and delivered the
initial implementation of the virtual data warehouse.
Accessing
the Virtual Data Warehouse
Working
in an intuitive, Windows-based environment, pioneer users
find it relatively easy to access and analyze the data, given
the appropriate database view. Here are some tasks they can
accomplish:
- Query
and Segment Data - Users can easily create simple or
highly complex queries in a point-and-click environment.
The values associated with each field are provided to the
user upon request, helping the user find the exact data
needed. Users can search the data based on the result of
a macro or formula, versus a specific value. Because users
dont always know what theyre looking for when
they first begin, they can iteratively refine their queries,
eventually narrowing down the data in a step-by-step fashion.
When the user is satisfied with the selection set of records
and fields, the user can name and save the selection criteria
for easy reuse later.
- Summarize
the Data - Summaries of the data such as counts, averages,
and maximum and minimum values are easily gotten. Summaries
can be grouped by any number of different fields, or even
by computed values. Summaries generally appear instantaneously,
regardless of database size, because computations are executed
against the index table structures, rather than against
the actual database records.
- Display
and Manipulate the Data - The virtual data warehouse
comes with numerous tools for data display, including reports,
charts, and graphs. These output formats are also interactive;
the user can manipulate them to form new queries, drill
down to underlying data, or view the data from different
perspectives. When you require a more specialized view of
the data, you can easily export the data into other analytical
packages such as spreadsheets or OLAP tools.
Is
a Virtual Data Warehouse in Your Future?
Not every
application or enterprise is a good candidate for a virtual
data warehouse. BOCES data and application has several
characteristics that contributed to their swift success, including:
- Clean,
Usable Data - BOCES operational data did not have
serious integrity problems that prevented it from being
effective for business intelligence purposes prior to cleansing.
Their data did not need to be substantially transformed.
Aggregations, which will likely be performed for the data
warehouse, were not essential due to the ease and speed
of dynamic summaries through the index tables.
- Appropriate
Hardware Platform - Because BOCES operational
databases are stored on their IBM OS/390™ enterprise
server, they can handle the additional user load. On a smaller
server, the number of users accessing the virtual data warehouse
would most likely be limited perhaps not meeting
the original requirement.
- Appropriate
Database Platform - The BOCES operational databases
are stored in Model 204. If a significantly less robust
DBMS was in use, the number of users accessing the warehouse
would most likely be limited.
- Network
Support - The infrastructure was already in place to
connect the PCs to the mainframe, thus allowing users fast
access to the new virtual data warehouse.
- Homogeneous
Data - Although Analytics, with Model 204 structures,
can analyze any data source, the BOCES project was particularly
efficient since all data was in Model 204.
How
did you score?
If several
of the previous factors were not positive, the virtual data
warehouse solution might have looked more and more like a
full-blown data warehouse implementation, and possibly not
worth the added effort.
A
Virtual Data Warehouse for the Long Term
A virtual
data warehouse not only quickly solves a business intelligence
need, it also contributes significantly to the effective design
of the data warehouse a benefit that saves both time
and money. When the data warehouse is up and running, the
virtual data
warehouse
continues as the testing ground. Even a well-designed data
warehouse needs adjustments over time. These areas can be
researched by the virtual data warehouse.
And from
another viewpoint, the strengths of the data warehouse will
undoubtedly generate more interest in its use by other end-users,
who in turn require enhancements to the data warehouse to
satisfy their needs. These new demands can be immediately
satisfied
by the ever-flexible virtual data warehouse, always one step
ahead.
|