Model 204 USE OF AND ACCESS TO PRODUCTS AND FEATURES ARE IN ACCORDANCE WITH THE TERMS AND CONDITIONS OF THE USERS SOFTWARE LICENSE. THE PRESENTATION OF MATERIAL HEREIN DOES NOT, IN ANY MANNER, MODIFY SUCH TERMS AND CONDITIONS.
Retrieving, Updating and Deleting Large Objects in V6R1.0, Part III By James Damon
Last month I reviewed the facilities in File Management and User Language for defining and storing large objects (see CCAPRINT May 2008). In this article I am reviewing the User Language statements that support the retrieving, updating, and deleting of large objects.
Retrieving a Large Object As I explained in Part II, large objects can be stored in a Model 204 file only through the universal buffer. The same applies to retrieval. You can retrieve a large object (or a piece of a large object) by moving it from Table E to your universal buffer with an assignment statement using the following syntax:
BUFFER,pos,length = lobfieldname,position,length
Last month, we stored the plot summary of Moby Dick in the file LOBFILE, as an example of storing a large object. The following User Language program retrieves that plot summary from LOBFILE and fills the universal buffer. It also issues a PAI for all fields in the record.
RESET OUTCCC=51IN LOBFILE BEGINFR WHERE AUTHOR=MELVILLE %LEN = $LOBLEN(SUMMARY) BUFFER,1,%LEN = SUMMARY,1,%LEN PAIEND FOREND
Note that I reset OUTCCC=51 to output 50 bytes of the large object on each line to count column positions more easily. Column 51 is reserved. The program produced the following output:
SUMMARY = ?? ?z??{ ? ?Aiming to join a whaling crew, Ishmael heads for Nantucket, the older of the two U.S. centers of the whaling industry. Time problems force him to stop for the night in the new, more powerful whalingcenter of New Bedford, Massachusetts. Lacking money, he lodges at the Souter Inn. The innkeeper, Peter Coffin, puts him in a room with the mysteriouss tattooed cannibal, Queequeg, a harpooner. Despite Ishmael's initial reservations, the two becomefriends. ...AUTHOR = MELVILLESOURCE = WIKIPEDIA
The first line of output is reserved for the hexadecimal, 27-byte, large object descriptor for the fieldin our example, SUMMARY.
Updating a Large Object Now that the large object is in the universal buffer, we can easily update it there, and then replace it in Table E by issuing a CHANGE statement. Suppose we want to change New Bedford, which begins in position 211 of the large object, to Boston. (Note that counting bytes to position 211 excludes counting the first line, the large object descriptor.) The next program will accomplish that update:
BEGINIMAGE I NAME IS STRING LEN 11 INITIAL ' Boston'END IMAGEPREPARE IMAGE IFR WHERE AUTHOR = MELVILLE %LEN = $LOBLEN(SUMMARY) BUFFER,1,%LEN = SUMMARY,1,%LEN WRITE IMAGE I ON BUFFER POSITION=211 MAXLEN=11 CHANGE SUMMARY,1,%LEN TO BUFFER,1,%LEN PAIEND FOREND
The CHANGE statement uses the contents of the universal buffer to update (essentially, replace) the large object, SUMMARY, in Table E as shown in the output from the PAI statement:
SUMMARY = ?? ?z??{ ? ?Aiming to join a whaling crew, Ishmael heads for Nantucket, the older of the two U.S. centers of the whaling industry. Time problems force him to stop for the night in the new, more powerful whalingcenter of Boston, Massachusetts. Lacking money, he lodges at the Souter Inn. The innkeeper, Peter Coffin, puts him in a room with the mysteriouss tattooed cannibal, Queequeg, a harpooner. Despite Ishmael's initial reservations, the two becomefriends. ...AUTHOR = MELVILLESOURCE = WIKIPEDIA
When the CHANGE statement is deployed, the large object is not shifted left or right to compress blanks or to achieve any kind of alignment. The update occurs in place, updating only the columns specified, or the entire object, depending on the use of position and length in the CHANGE statement.
A more common use of CHANGE with large objects is to receive pieces of a large object, say from MQGET calls or $SOCKET READ calls, into the universal buffer and build a large object from those individual pieces. You may have a five, 10 or many megabytes large object that is too large, due to network considerations, to transmit in a single statement. Instead, you bring pieces of the large object into the universal buffer with multiple MQGET or $SOCKET READ statements and keep adding those pieces to the existing large object in Table E using the CHANGE statement. The following example illustrates the use of CHANGE in this context.
Assume that a 5-megabyte object has been placed in an MQ queue in a series of 100,000 byte messages.
BEGIN%LEN = 100000MQGET BUFFER FROM mq.queue.name /? GET FIRST 100000 BYTES OF ?/ /? LOB ?/S1: STORE RECORD /? AND USE TO START LARGE ?/ /? OBJECT ?/ Large.object.fieldname = BUFFER,1,%LEN RESERVE 5000000 END STORE%POS = 1 + %LENFRN S1 REPEAT WHILE $STATUS = 0 /? MQGET UNTIL NO MORE MSGS ?/ MQGET BUFFER FROM mq.queue.name CHANGE large.object.fieldname,%POS,%LEN TO BUFFER,1,%LEN %POS = %POS + %LEN END REPEATEND FOREND
Deleting a Large Object For binary large objects such as .jpg, .doc, .pdf objects and many others, the object will typically be modified by an external application like Photoshop, Word or Adobe. That modified version will then be used to replace the existing version in Table E with:
DELETE large.object.fieldname
ADD large.object.fieldname = BUFFER,pos,length
The same will be true for most character large objects. Since the number of updates to a CLOB will generally be numerous, the more efficient method will be to delete the old object with:
DELETE character.large.object
ADD character.large.object = BUFFER,pos,length
Reclaiming Space When a large object is deleted, the Table B record large object descriptor is deleted from the Table B record, as with any other DELETE FIELD operation and the Table E page or pages are released and returned to the Table E free page pool.
Summary A large object by definition is large, and application processing by passing it through small variables is likely not an efficient mechanism. The universal buffer is a solution for managing large object data so that a minimum amount of data movement occurs upon each object reference. This results in CPU savings for applications that store and retrieve large objects.
Coming Attractions In the next and concluding article in this series, I will introduce the User Language statements that are available to unload a file containing large objects and discuss how to reorganize such files.
System 1032 USE OF AND ACCESS TO PRODUCTS AND FEATURES ARE IN ACCORDANCE WITH THE TERMS AND CONDITIONS OF THE USERS SOFTWARE LICENSE. THE PRESENTATION OF MATERIAL HEREIN DOES NOT, IN ANY MANNER, MODIFY SUCH TERMS AND CONDITIONS.
Redefining Attribute Values By Tym Stegner In our previous article, Updating Attributes, a user wanted to change an attribute from a simple text field into a complex text field. Inherent in that request was the need to rearrange the original data to correctly populate the revised attributes. Note that the basic data type was unchanged, although the Text attribute became a Group of three Text attributes.
What if you wanted to update an attribute to an entirely different data type? Furthermore, you wanted to edit the data values? How might you go about doing that data conversion? This article examines just such a request from another customer.
Converting Data to Another Data Type Sometimes modifying an attribute definition is easier than translating the underlying data to be changed. For example, a customer wrote:
Question: "Can you offer me some suggestions on to how to modify a field from data type Integer to data type Text without having to delete all the leading zeros that may result? I know that I can eliminate the leading zeros by using an intermediate procedure, but I want to avoid doing that, because we have a few hundred million records."
The following snippet of code illustrates how the customers data is stored after loading. When the integer data is dumped out, System 1032 adds leading zeros. Text fields accept leading zeros as text characters and store them accordingly during LOAD, so all the new text attributes converted from integer have leading zeros.
PN CC ------------------ ------------------ TN1 000000000111111111 TN2 000000000222222222
Examining Record Descriptor Attribute Conversions The System 1032 Record Descriptor (RD) feature has many options that can convert data read from data files to store data in the form the database designer requires. Usually, a database is designed to use like-for-like data conversions, such as Integer/Integer, or Real/Real, or Decimal/Decimal. However, less well-known is that System 1032 can convert text and numeric data between data types, for inclusion into differently typed attributes from the RD fields.
Table 1 displays supported data type conversions. Y indicates the conversion is allowed. The left column is the Internal attribute data type, and the top row is the external text/numeric string data type.
Table 1: System 1032 Data Type Conversion Table
Date-Time
Revisiting the Customer Query The data type conversions presented in Table 1 are easily accomplished. However, back to our customers question. There is no option for an RD that lets you dump numeric data without leading zeros. Even if you dump the numeric data as binary data, then load it directly into the text field, the leading zeros are still present in the text field. We offered the customer the following two methods of dropping the leading zeros from the data.
1. Using WRITE or PRINT Processing to Suppress Leading Zeros
You can create an intermediate data file using either WRITE processing or PRINT processing. The PRINT command can give you a fixed-column format or the WRITE command can create a delimited-format file.
While the PRINT command is easier to set up on the creation side, as the columnar nature of PRINT output lends itself easily to the creation of fixed-format data files, it is easier to create the RD for a delimited data file. The March, 2008 CCAPrint article, "Extracting System 1032 Data to Delimited Format" has an example of how a program can create a delimited data file from a dataset as a starting point for this processing.
This same article also describes a method to obtain right-justified numeric output. The code assigns a space character to the $FMT_CPROT system variable, then utilizes the check-protect format specifier in numeric formats, such as 'I*8' for a right-adjusted eight digit integer output. The check-protect specifier normally would output leading asterisks (*) on the numeric, according to the default value of $FMT_CPROT.
2. Using the TPU Editor to Drop Leading Zeros You can use an RD to dump the data you want to convert and use another mechanism to eliminate the leading zeros, then reload the data into a text attribute. You can use the TPU editor to strip the leading zeros from the numeric fields before loading the data back into the revised datasets.
The following TPU code is one approach to stripping leading zeros:
Figure 1. TPU code that removes leading zeros
$! FIXZEROES.COM$! Usage:$! $ @FIXZEROES <Data-File>$ If P1 .Eqs. "" Then Exit 44$!$ On Error Then Exit$ Editx/TPU/Nodisplay/Nosection/Command=Sys$input 'P1'!PROCEDURE S32_TrimZeros( My_Buf ); POSITION( BEGINNING_OF( My_Buf ) ); My_Pat1 := ( " " + SPAN("0") + ("," | ";" | ANY("123456789")) ); LOOP Src_Range := SEARCH_QUIETLY( My_Pat1, FORWARD, EXACT ); EXITIF Src_Range = 0; POSITION( Src_Range ); MOVE_HORIZONTAL( +1 ); LOOP ERASE_CHARACTER( +1 ); EXITIF ( CURRENT_CHARACTER <> "0" ); ENDLOOP; COPY_TEXT( "0" ); ENDLOOP;ENDPROCEDURE;! Start of executable code.My_Input := GET_INFO(COMMAND_LINE, "FILE_NAME" );My_Main := CREATE_BUFFER("Main", My_Input );POSITION( BEGINNING_OF( My_Main ) );S32_TrimZeros( My_Main );EXIT;$!$ Exit
The basic operation of this code is:
In Summary In this article, we reviewed a customer question to convert data from integer to text, and postulated solutions to remove the leading zeros that DUMP processing had automatically prepended to the numeric values. These solutions offer a choice of outputting the data using WRITE or PRINT, which do not prepend zeroes, or to process the output file with the TPU editor, to remove the leading zeros prior to the LOAD operation.
Because the customer making the query has millions of records over several datasets, we could not promote one method of stripping leading zeros over the other, as each method has its own resource costs. While the WRITE/PRINT approach requires fewer passes through the data records, the necessary commands must be written to produce the output, and processing the command requires additional processor time.
For more information on data conversion during data input operations, see the documentation at System 1032 User's Guide, Module 3, Chapter 2, "Importing and Exporting Data".
Copyright © 2008 Computer Corporation of America. All right reserved. Published in the United States of America.