Computer Corporation of America
|
Feedback
Search CCA:
   
USA CCA
Rocket
Customer Support
CCA Company
CCAPRINT: A Newsletter for Model 204® and System 1032® Users
February 26, 2010
     
Model 204: ANALYZE This! Efficiency in the Ordered Index - Part 1 Printer-friendly version
System 1032: Data and Dataset Security Printer-friendly version

 

Model 204
USE OF AND ACCESS TO PRODUCTS AND FEATURES ARE IN ACCORDANCE WITH THE TERMS AND CONDITIONS OF THE USER’S SOFTWARE LICENSE. THE PRESENTATION OF MATERIAL HEREIN DOES NOT, IN ANY MANNER, MODIFY SUCH TERMS AND CONDITIONS.

ANALYZE This! Efficiency in the Ordered Index - Part 1
By James Damon



The ANALYZE command provides a way for a file manager to get a clearer picture of the way the Ordered Index is structured for a particular field in a file. It reports on each Table D page involved in storing the index information for the field in question and produces a table summarizing space utilization and pointer information. You can use that information to confirm your knowledge of the data, provide additional data consistency checks, and to assist you in tuning the Ordered Index. However, before we can discuss the ANALYZE command, we must examine the structure of the Ordered Index.

 

Understanding the Ordered Index
The Ordered Index is a tree structure (btree), built entirely in Table D. Table D pages that are part of the Ordered Index are called nodes and are classified into three types:

  1.

The root is the highest level node of the btree. Btrees are characterized by the number of levels they contain – depth. The root page is at the highest level designated as 1 – this level always contains exactly one node. The set of nodes it points to are at the next highest level, level 2. The set of nodes the level 2 nodes point to are at level 3, etc. The leaf nodes are always at the lowest level.

Searches for leaf node data start at the root node. Therefore, an N-level btree requires N logical page reads to arrive at the correct leaf node.

  2. Intermediate nodes are any nodes between the root node and the leaf nodes. There may be 0, 1, or many intermediate levels. The entries on intermediate nodes are ordered based on fieldname values. All entries on a given intermediate node page point to either lower intermediate nodes at the next lower level or to leaf nodes. Only the lowest level intermediate nodes contain pointers to leaf nodes.
  3. Leaf nodes are found only at the lowest level of the btree. They contain lists of pointers to Table B records or to record lists or bit patterns. A leaf node contains only values for one field, sorted in ascending order by value.

The Figure 1 schematic of the Ordered Index helps illustrate what the ANALYZE command actually does.

Let’s define two fields CITY and ZIP as ordered:

CITY  (ORDERED CHARACTER)
ZIP (ORDERED NUMERIC)

After the file is populated with data, the Ordered Index might look something like Figure 1.

Figure 1. A schematic of Table D pages for the CITY and ZIP fields

The Anatomy of the Tree
The intermediate nodes contain what are called separator strings. These strings identify what separates values on one leaf page from another. In Figure 1, the separator strings tell us that CITY values less than DENVER are found on leaf page D21 and CITY values less than FRESNO are found on leaf page D5. ZIP values less than 01820 are found on leaf page D41, and so on.

Each leaf node points to its right and left neighbor. This speeds up searches that process values in ascending or descending order by avoiding a trip back to the root node and other intermediate nodes to find the next leaf page.

The Ordered Index pictured in Figure 1 is a tree of depth 3. This means that there is a root node, some number of pages at the intermediate node level and some at the leaf level. You can display this information by issuing the following VIEW command.

> VIEW OIDEPTH, OINODES, OILEAVES
OIDEPTH 3 NO. LEVELS OF ORDERED INDEX NODES
OINODES 3027 NO. ORDERED INDEX NODES: ALL LVLS
OILEAVES 3019 NO. ORDERED INDEX NODES: LEAVES

The number of intermediate nodes is OINODES – OILEAVES – 1. Whenever a btree node fills to capacity, it splits. When the root node splits another level is added to the tree, increasing its depth by one. This in turn, increases by one, the number of logical, and sometimes physical I/Os necessary to process every Ordered Index FIND.

Studying the Tree Leaves
If we examined the entries on each Table D leaf page we would see that:

  1)

All values pertain to only one field

  2) All values are sorted in ascending, EBCDIC sequence
  3) Each value contains pointers -- immediate, record lists, or bit patterns -- to the records that contain that value. For example:
    Immediate: On leaf page 21: CITY=AKRON, RN=3,6,85,95,345,7162
    Record list: On leaf page 5: CITY=EUGENE, RNs=TABLED page 45, Record list #8
    Bit pattern: On leaf page 41: ZIP=00128, RNs=TABLED page 115 - Bit Pattern
    Note: RNs=record numbers


The ultimate objective of a search in the Ordered Index is to locate the leaf page containing the first value sought for the field in question. In this Ordered Index, the following FIND statement:

C1: FIND ALL RECORDS FOR WHICH CITY IS LIKE D*

when evaluated requires the following I/Os to Table D:

Root node

Page 9 at the intermediate level
Page 21, the first page with D values for CITY
Page 5, the second and last page with D values of CITY

From the information found in the entries for values of CITY that start with D, a bit pattern is constructed and that bit pattern is the found set labeled C1.

ANALYZE command
Detailed Ordered index Information for each field in a file can be produced with the ANALYZE command. Figure 2 shows the statistics generated for the root, intermediate, and leaf nodes for the CITY field when the command ANALYZE CITY is issued.

ROOT NODE VERSION NUMBER = 550265                             
***  M204.0005: ANALYZE FIELDNAME = CITY                
                AVG.    OFFSET  COMP.   KEY     PAGE    AVG.  
        PAGES   ENTRY   AREA    SIZE    AREA    USAGE%  UNUSED
ROOT    1       2       24              14              6106  
I-NODE  1       469     958             4865    94      321   
LEAF    235     340     700     3       3795    73      1645  
                                                              
MRIB:      IMMEDIATE    LIST         BITMAP       TOTAL       
  ENTRIES  932          73685                     74617       
  RECORDS  932          418436                    419368      
    PAGES               2524                      2524        
SRIB:      6647                                               
                                                              
***  M204.0003: ANALYZE DONE                                  
        

Figure 2. Typical output from an ANALYZE command.

In Summary
This article illustrates how Model 204 locates values for a particular field and then locates records containing that value when you make a query. If you routinely conduct searches that employ the Ordered Index, you are tapping the efficiency and speed of Model 204. Be sure that your frequently queried fields are defined as ORDERED CHARACTER or ORDERED NUMERIC to take advantage of this unparalleled, high performance feature of Model 204.

Coming Attractions
Looking again at Figure 2, if CITY is used extensively in Ordered Index searches and accounts for a significant percentage of all Ordered Index activity, then tuning the Ordered Index for this field could yield substantial performance improvements. The information required to perform that tuning is available from the ANALYZE command.

Most sites process thousands, if not millions, of Ordered Index searches on a typical production day. The performance of these searches is critical to achieving the operational or business objectives of the organization. That performance is determined, in large part, by the efficiency of the Ordered Index and how well it has been tuned to provide for maximum data compression. In turn, data compression is directly related to minimizing I/O and CPU consumption and maximizing Ordered Index performance. Knowing how to tune the Ordered Index requires that you understand the output provided by the ANALYZE command. I’ll discuss that output in the next issue of CCAPRINT.

System 1032
USE OF AND ACCESS TO PRODUCTS AND FEATURES ARE IN ACCORDANCE WITH THE TERMS AND CONDITIONS OF THE USER’S SOFTWARE LICENSE. THE PRESENTATION OF MATERIAL HEREIN DOES NOT, IN ANY MANNER, MODIFY SUCH TERMS AND CONDITIONS.


Data and Dataset Security
By Tym Stegner

Tym

Securing My Data
When looking to implement security for your data and datasets at your site, there are many options from which to implement an overall plan: physical security of the data center, native VMS file permissions, application of System 1032 dataset-level security so that only designated applications or users have update authority, and so on.

This multi-part article begins the process of reviewing the System 1032 capabilities for securing datasets in your VMS environment. We focus first on the VMS security capability, using a System 1032 library catalog as an example.

Layers of Security
The VMS operating system is already a very secure computing environment. Users are not granted privileges by default, nor do they usually have free reign of either the file system or the system applications. Also, the file system itself is not usually wide open, although conditions do differ from site to site. And, individual directories or devices have their own protections, above and beyond what may be applied to individual files.

The overt security layers are as follows:
1.

User Login and Program Access

2. Disk, Directory, and File Access

Access Modes and Permissions
VMS users operate within four general modes of file and application access.

System access is reserved to system administrators and suitably privileged users.
Group access is determined at your login by the system administrator. Groups or classes of user accounts may be designated for group access permissions. The group designation may be literal or numeric.
User access is generally for files and directories your process has created.
World access is designated for files and directories to be generally available.

For each of these modes of access there exist five concurrent permission levels: Control, Delete, Execute, Read, and Write. The following example illustrates the permissions for owner [WORK, TSTEG] (some code was omitted):

$ DIRECTORY/FULL LOGIN.COM

LOGIN.COM;145                 File ID:  (22188,10,0)
Size:            7/9          Owner:    [WORK,TSTEG]
Created:   20-JAN-2010 14:51:58.56
Revised:   20-JAN-2010 14:51:58.65 (2)
…
File protection:   System:RWED, Owner:RWED, Group:RE, World:
Access Cntrl List: None

Control, which is rarely seen, provides the capability to modify a file’s access level.
Delete provides the ability for the user to deactivate a file.
Execute provides the ability to run a program (an executable image).
Read provides the ability to examine, but not change, the contents of a file.
Write provides the ability to update or add to a file.

Thus, the LOGIN.COM file is fully accessible for System and Owner users, has limited read-only access for members of the same group, and no access at all specified for general (World) users.

User Login and Program Access
The first layer of security is access to the system itself. This might be physical access via a hard-wired terminal, but more likely these days will be the Telnet or S/Telnet programs for access via TCPIP. A prerequisite is that the system administrator has set up a user account for you. The user account describes your working environment as well as establishing permissions and access times and/or modes.

Without a login to the system, you have no opportunity to do anything. While the system may have guest accounts, if these exist, their access and permissions are usually severely curtailed, to the extent of a captive process only able to perform limited commands and execute designated programs. Guest or other captive accounts may not even allow command line access.

Once you have successfully connected and logged in, you are now in your pre-determined user environment. Within this environment, you can access files and run system and user applications. The VMS command SHOW PROCESS gives you information about what your user environments capabilities may be. (Uninformative code was omitted in the following example.)

$ show process/all

21-JAN-2010 13:52:53.89   User: TSTEG        Process ID:   20401C43
                          Node: APE          Process name: "TSTEG"

Terminal:           TNA87: (Host: pcnnn.cca-int.com Port: 1250 )
User Identifier:    [WORK,TSTEG]
Base priority:      4
Default file spec:  DISK$2:[TSTEG]

Devices allocated:  APE$TNA87:

Process Quotas: …

Accounting information: …

Authorized privileges:
 GROUP   GRPNAM  NETMBX  PRMCEB  PRMMBX  SETPRV  SYSPRV  TMPMBX

Process privileges:
 GROUP                may affect other processes in same group
 GRPNAM               may insert in group logical name table
 NETMBX               may create network device
 TMPMBX               may create temporary mailbox

Process rights:
 INTERACTIVE
 REMOTE

System rights:
 SYS$NODE_APE

There is 1 process in this job:
  TSTEG (*)

    

In the previous display, several significant elements are highlighted in bold red. These areas concern process rights and capabilities, and are explained in more depth, as follows.

User Identifier Code (UIC) is the primary identifier for a user process. It can be expressed as [keyword,keyword], or [number, number]. In the previous display, the UIC expression shows the username TSTEG is a member of the group WORK. By default all users on a VMS system are members of a group that denotes the type of account they receive.

Authorized privileges is the list of process privileges available to this particular process. These privileges are not activated by default, but you can enable them as needed. Thirty-nine privileges are defined on a VMS system; the average user will generally only have TMPMBX and PRMMBX, but the rules at your site may vary.

Process privileges may duplicate the Authorized privileges or not, and represent those privileges enabled by default for this process. A user may turn off privileges as required.

Process rights is a list of rights identifiers. The rights identifiers can be optionally applied by the system administrator to operating system objects, such as executables, directories, or files, and are used to grant access to individual users or entire groups. The two process rights listed in our example, INTERACTIVE and REMOTE, designate the ability to access the system interactively and remotely (network access).

More information about qualities of the user process can be found in the VMS manual describing the AUTHORIZE command. Note that System 1032’s internal security arrangements make use of some elements of VMS security, notably UICs and rights identifiers, but also add additional qualities not found in VMS.

File Access Example
At the next layer in, we have the concept of protecting access to programs, disks, directories and files. While most of the VMS commands are available to anyone on the system, you can prevent access to applications by limiting program access previously described or by applying rights identifiers to the image. In the latter case a user process must hold that rights identifier to be allowed to run the program.

In a similar fashion, file protections and rights identifiers can be applied to all the elements on each subsequent layer, including the disk where the programs and files reside, the directory structures within which the applications and files are organized, and down to the files themselves.

In the following example, we look at a file that is not generally available:

$ DIRECTORY S1032_TOOLS:S1032_DBA.DML

Directory SYS$SYSDEVICE:[S1032.V9811.TOOLS]

S1032_DBA.DML;4

Total of 1 file.

      

The DBA Tools library is available in some distributions of System 1032. However, this library contains utility procedures that are used only at the instruction of System 1032 Customer Support, so the file is protected from being accessed by non-system users. This is demonstrated when we try to get more information about the library file:

$ DIRECTORY/SECURITY S1032_TOOLS:S1032_DBA.DML

Directory SYS$SYSDEVICE:[S1032.V9811.TOOLS]

S1032_DBA.DML;4    insufficient privilege or object protection 
violation

Total of 1 file.
      

We can see the file exists, but that’s about it. To see additional information, we must use either a system account (probably), or elevate our process privileges:

$ set process/privilege=READALL
$ directory/security   s1032_tools:s1032_dba.dml

Directory SYS$SYSDEVICE:[S1032.V9811.TOOLS]

S1032_DBA.DML;4      [S1032]          (RWED,RWED,,)

Total of 1 file.
$ set process/privilege=NOREADALL
$

      

We can now see that the file is owned by a different UIC than ours ([WORK,TSTEG]), and has file protections to prevent even seeing the file by GROUP or WORLD users.

In Summary for Part I
In this multi-part article, we started examining the concepts of the security of System 1032 datasets, which begins with VMS access security. Part 1 has outlined the layers of security access protections available by default in the VMS operating system for access to System 1032 files.

Part 2 will look at the security layers available within datasets and other catalogs.

Copyright © 2010 Computer Corporation of America.
All right reserved. Published in the United States of America.


Contact CCA Webmaster
Copyright 2010