MOGLiCC Tutorial Part E
Following tutorial parts A to D, we now have a nice generation mechanism that converts data stored in the MOGLiCC model file into SQL scripts. For a small amount of data, this is completely sufficient. But imagine a requirements engineer coming again and again with an increasing amount of data that is needed in the database (e.g. for test purposes): this generation mechanism becomes ineffective, because the data must be transferred into the model file manually. For mass data, this costs a great deal of time and is error-prone. Hence, we need to automate this step. We can do so by using the ExcelStandardModelProvider. We reach an agreement with the requirements engineer: the data is maintained in an Excel sheet, which is comfortable for the requirements engineer and suitable for importing into a MOGLiCC model. Let's see how this works.
1 - First, we need an Excel file that contains the data currently kept in the model file of the StandardModelProvider. You find such an Excel file named TutorialData.xlsx here. Press the 'Raw' button to download it and save it in your ./input/ExcelStandardModelProvider folder. Then, we have to tell the ExcelStandardModelProvider that this is our data file and where exactly in this file our data is located. We do so by replacing the content of input/ExcelStandardModelProvider/excel.properties with the following lines:
SQLInserData.filename=TutoriaData.xlsx
SQLInsertData.matrix.REMITTER=1#B:3
SQLInsertData.matrix.ORDER=1#F:3
This defines our data source: the file, the sheet and the cells. For each class (or database table), a data location within the Excel file must be provided. The property keys consist of the model name, which refers to the model file of the StandardModelProvider, and a keyword 'filename' or 'matrix'. A matrix always refers to a class, which must form the last part of the property key. The value "1#B:3" represents cell B3 on sheet 1. This is the upper left cell of a data matrix, where attribute data is supposed to be found for the class REMITTER.
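To make the location format concrete, here is a minimal Java sketch that parses the `<sheet>#<column>:<row>` notation. The class `MatrixLocation` is a hypothetical illustration of the notation only, not MOGLiCC's actual code:

```java
// Hypothetical parser for the matrix location format "<sheet>#<column>:<row>"
// used in excel.properties, e.g. "1#B:3" -> sheet 1, upper left cell B3.
// Illustration of the notation only, not part of MOGLiCC.
public class MatrixLocation {
    public final int sheet;
    public final String column;
    public final int row;

    public MatrixLocation(String value) {
        String[] sheetAndCell = value.split("#");            // "1" and "B:3"
        String[] columnAndRow = sheetAndCell[1].split(":");  // "B" and "3"
        this.sheet = Integer.parseInt(sheetAndCell[0]);
        this.column = columnAndRow[0];
        this.row = Integer.parseInt(columnAndRow[1]);
    }

    public static void main(String[] args) {
        MatrixLocation loc = new MatrixLocation("1#B:3");
        System.out.println("sheet=" + loc.sheet + " cell=" + loc.column + loc.row); // sheet=1 cell=B3
    }
}
```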
In addition, we have to tell the StandardModelProvider to use its Excel extension. We achieve this by adding the line metainfo useExtensionPlugin ExcelStandardModelProvider
to the model in the file ./input/StandardModelProvider/SQLModelFile.txt. Now execute MOGLiCC.
2 - Oh no, there is an error message. The error report says "No ExcelFilename defined in '_excel.properties'". But why? Ah - we misspelled the model name in the first line of the excel.properties file: the 't' of 'SQLInsertData' is missing. We add it, restart and - ARRRGGH - another error. This time the defined Excel file was not found. Oh man, a misspelling again: the 'l' is missing in the Excel file name. Ok, we correct and restart, and what the f*** is that: The error report says "No matrix data found for matrix <1#B:3>", and this is true. B3 is valid for the second sheet, but we use the first sheet of the file, and there the upper left cell of the remitter matrix is D2, and D9 for the order matrix. Well, we correct the cells and restart again, and SUCCESS! But why, actually? When comparing the Excel model data source and the standard model data, we see that all attributes are defined in both files, although each attribute must be unique in its class. The reason: if a class or attribute is already defined in the standard model data, the Excel data is simply added to it. Because all table values and table types are present in both the standard model file and the Excel file, we delete all attributes and their metadata from the standard model file SQLModelFile.txt, which afterwards looks like this:
model SQLInsertData
metainfo resultFileName insert.sql
metainfo resultDirectory <applicationRootDir>/InsertSQLScripts
metainfo useExtensionPlugin ExcelStandardModelProvider
class REMITTER
# metainfo skip true # ATTENTION: setting skip to false won't do, instead remove this metainfo or comment it out
class ORDER
3 - After clearing the standard model file, we restart again and have a look at the result. The last line says "Done with 0 statements for 2 data tables". What??? This looks very strange. You may spot the problem with a sharp eye. If not, debugging is necessary. To debug template code, we simply run MOGLiCC but define additional output for intermediary results and internal variables. For that purpose, we open the template createClassSpecificInsertStatements.tpl and simply write $attributeDescriptor in line 7. Save, start and view the output file to see outputs like BuildUpAttributeDescriptor [name=ID, metaInfoList=fieldType, fieldValue1, fieldValue2]. This looks good, so we have to search further. Add $metaInfoList in line 12, save, start and view. Now you see [] for all metainfos. This observation leads us to the problem: the template code searches for metainfos named tableValue, but in the Excel data they are called fieldValue. Well, we remove our "debug output lines" and adapt our template code by replacing $attributeDescriptor.getMetaInfosWithNameStartingWith("tableValue") with $attributeDescriptor.getMetaInfosWithNameStartingWith("fieldValue"), both in the LineInserter's main template and in createClassSpecificInsertStatements.tpl. Save, restart and view the output. Now it's fine.
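The call getMetaInfosWithNameStartingWith behaves like a prefix filter over an attribute's metainfo names. A minimal Java sketch of that idea (a simplified stand-in, not MOGLiCC's real API) explains why searching for "tableValue" returned the empty []:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of getMetaInfosWithNameStartingWith: filter an
// attribute's metainfos by a name prefix. Simplified stand-in only.
public class MetaInfoFilter {
    public static List<String> withNameStartingWith(List<String> metaInfoNames, String prefix) {
        List<String> result = new ArrayList<>();
        for (String name : metaInfoNames) {
            if (name.startsWith(prefix)) result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of("fieldType", "fieldValue1", "fieldValue2", "fieldValue3");
        // searching for "tableValue" finds nothing -> explains the empty [] we saw
        System.out.println(withNameStartingWith(names, "tableValue")); // []
        System.out.println(withNameStartingWith(names, "fieldValue")); // [fieldValue1, fieldValue2, fieldValue3]
    }
}
```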
4 - Now let's add a new dataset to our Excel data, e.g. in table ORDER by adding fieldValue4 with the values ORDERNUMBER=123456 and REMITTERID=123. Save, restart and view the output. Great, it worked as expected. Now let's consider the processing of mass data. A column must be added for each new dataset, which would lead to intensive horizontal scrolling. This is not a good idea. If the matrix had a structure in which a row must be added for a new dataset, vertical scrolling would be much more convenient. For that purpose, look at sheet 2. The matrices there are rotated, i.e. columns and rows are exchanged. The ExcelStandardModelProvider can read this structure as well, but we have to configure it. Thus, we add the following new line to ./input/ExcelStandardModelProvider/excel.properties:
SQLInsertData.ROTATION.MODE=Attribute Names In First Row # Default is: Attribute Names In First Column
In addition, we have to adapt the matrix definitions, because the upper left cells of the two matrices on sheet 2 differ.
SQLInsertData.matrix.REMITTER=2#B:3
SQLInsertData.matrix.ORDER=2#F:3
Save, restart, view and see that the generation is ok. The rotated matrix has been processed correctly.
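The rotated layout on sheet 2 is simply the transposed matrix: what was a column per dataset becomes a row per dataset. A minimal sketch of such a transposition (illustration only, assuming string cell values):

```java
public class MatrixRotation {
    // Exchanging rows and columns ("Attribute Names In First Row" vs
    // "Attribute Names In First Column") is a plain matrix transposition.
    public static String[][] transpose(String[][] matrix) {
        String[][] result = new String[matrix[0].length][matrix.length];
        for (int row = 0; row < matrix.length; row++)
            for (int col = 0; col < matrix[0].length; col++)
                result[col][row] = matrix[row][col];
        return result;
    }

    public static void main(String[] args) {
        String[][] m = {
            { "ID",   "fieldValue1", "fieldValue2" },
            { "NAME", "Smith",       "Jones"       },
        };
        String[][] t = transpose(m);
        // datasets are now rows instead of columns
        System.out.println(t.length + "x" + t[0].length); // 3x2
    }
}
```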
5 - Now let's imagine that the requirements engineer produces Excel sheets that from time to time contain the attribute name type instead of fieldType. What will happen with such a modification? Well, try it and write 'type' into cell F4 on sheet 2 of the Excel file. Save, restart, view the output and see messages like "Unkown columnType (!MetaInfo FOR 'fieldType' NOT FOUND!)". Instead of correcting this value manually in the Excel sheet every time the problem occurs, we can define a mapping that (in case of need) does the replacement automatically. We add a line to ./input/ExcelStandardModelProvider/excel.properties, resulting in:
SQLInsertData.filename=TutorialData.xlsx
# Default ROTATION.MODE is: Attribute Names In First Column
SQLInsertData.ROTATION.MODE=Attribute Names In First Row
SQLInsertData.rename.AttributeMetaInfo.Occurrence.type.1=fieldType
SQLInsertData.matrix.REMITTER=2#B:3
SQLInsertData.matrix.ORDER=2#F:3
This instructs the ExcelStandardModelProvider to replace the first occurrence of 'type' with 'fieldType'. Of course this is possible not only for 'AttributeMetaInfo' but also for 'AttributeName'. The renaming feature is not related to the rotation mode! Note that renaming is not possible for metainfos starting with 'fieldValue', because the exact names of these metainfos vary. Save, restart, view and see that the generation result is ok.
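Conceptually, the rename property applies a "replace the n-th occurrence" rule to the metainfo names read from the sheet. A hypothetical sketch of that rule (not MOGLiCC's implementation):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the first-occurrence rename rule configured by
// SQLInsertData.rename.AttributeMetaInfo.Occurrence.type.1=fieldType.
// Simplified stand-in, not MOGLiCC's implementation.
public class MetaInfoRenamer {
    public static void renameOccurrence(List<String> names, String from, String to, int occurrence) {
        int seen = 0;
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).equals(from) && ++seen == occurrence) {
                names.set(i, to);
                return;
            }
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("type", "fieldValue1", "fieldValue2");
        renameOccurrence(names, "type", "fieldType", 1);
        System.out.println(names); // [fieldType, fieldValue1, fieldValue2]
    }
}
```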
6 - Now let's add a new column to the ORDER table. We name it DATE_OF_RECEIPT and enter the following values: type=date, fieldValue1=20.11.2015, fieldValue2=01.02.2016 and fieldValue3=15.13.2014. Make sure that these cells are formatted simply as text. Save, restart, view and see that the generated file contains "Unkown columnType (date) used for attribute DATE_OF_RECEIPT of class ORDER". So we have to support a new field type when parsing the model data. This is done in the template fillArraysOfCommaSeparatedLists.tpl. We add a new elseif branch:
#elseif( $columnType.equals("date") )
# do nothing
Save, restart, view and see that the generated result is basically ok. Now we handle the date validation and the exact date format. Date handling often depends on a specific context and is often implemented in a static Java utility class. MOGLiCC allows such code to be called from the template file. For that purpose, the jar file MyDateUtils.jar, which contains the class com.iksgmbh.utils.DateUtil, is placed in the directory ./lib/dropins. This class uses the Joda-Time DateTime class, so we also have to put joda-time-1.jar into the dropins folder. That way, the DateUtil class can be loaded by MOGLiCC via reflection. Now we can modify the last change to
#elseif( $columnType.equals("date") )
#set( $tableValue = $dateUtil.validateAndFormatDateStringToSQL($tableValue, "dd.MM.yyyy") )
This way, the date value read from the Excel file is validated and transformed by an external date utility. Please note that only static methods like validateAndFormatDateStringToSQL can be executed on a class loaded from a dropin jar file! Save, restart, view and see that the generation result contains an invalid validation result for one date value: Invalid date or date format '15.13.2014', expected was 'dd.MM.yyyy'. The month value is invalid. After correcting the date, the generation result looks good.
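Since the source of MyDateUtils.jar is not shown here, the following is only a guess at what validateAndFormatDateStringToSQL might do, sketched with java.time instead of Joda-Time. The SQL output form '2015-11-20' is an assumption; only the error message on failure is taken from the output quoted above:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class DateUtilSketch {
    // Hypothetical re-implementation; the real DateUtil in MyDateUtils.jar
    // uses Joda-Time and its exact behavior is unknown. On failure we return
    // the error message the tutorial quotes, so the template's startsWith
    // check ("Invalid date or date format") keeps working.
    public static String validateAndFormatDateStringToSQL(String dateString, String pattern) {
        try {
            LocalDate date = LocalDate.parse(dateString, DateTimeFormatter.ofPattern(pattern));
            return "'" + date.format(DateTimeFormatter.ofPattern("yyyy-MM-dd")) + "'"; // assumed SQL form
        } catch (DateTimeParseException e) {
            return "Invalid date or date format '" + dateString + "', expected was '" + pattern + "'";
        }
    }

    public static void main(String[] args) {
        System.out.println(validateAndFormatDateStringToSQL("20.11.2015", "dd.MM.yyyy")); // '2015-11-20'
        System.out.println(validateAndFormatDateStringToSQL("15.13.2014", "dd.MM.yyyy")); // error message
    }
}
```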
7 - Two improvements are still necessary for handling mass data. First, a new utility instance is created for each date value that is validated and formatted. To save resources and use only one instance, it looks good at first glance to create the instance in the two main templates. Alternatively, you could create the instance in createClassSpecificInsertStatements.tpl. That way we code the instantiation only once, although at runtime an instance is still created for each class. We take the second solution and move the lines
#set( $stringInstance = "" )
#set( $dateUtil = $stringInstance.getClass().forName("com.iksgmbh.utils.DateUtil").getConstructor().newInstance() )
to createClassSpecificInsertStatements.tpl. Save, restart, view and see that the generated result is fine.
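The two #set lines are plain Java reflection expressed in Velocity: fetch a Class object by its fully qualified name and invoke its no-arg constructor. The same steps in Java, using java.util.ArrayList as a stand-in class, since com.iksgmbh.utils.DateUtil only exists inside the dropins jar:

```java
public class DropinLoading {
    public static void main(String[] args) throws Exception {
        // Java equivalent of the two Velocity #set lines: obtain a Class
        // object by name and instantiate it via its no-arg constructor.
        // java.util.ArrayList is a stand-in for com.iksgmbh.utils.DateUtil.
        Object instance = Class.forName("java.util.ArrayList")
                               .getConstructor()
                               .newInstance();
        System.out.println(instance.getClass().getName()); // java.util.ArrayList
    }
}
```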
8 - The second improvement is an error report which is created in case one or more of the date validations fail. Important for this mechanism is that a validation error does not terminate the data processing. The full generation must take place, and in the end all validation errors must be available in the error report. How do we achieve that? First, we modify the validation call in fillArraysOfCommaSeparatedLists.tpl:
#elseif( $columnType.equals("date") )
#set( $tableValue = $dateUtil.validateAndFormatDateStringToSQL($tableValue, "dd.MM.yyyy") )
#if( $tableValue.startsWith("Invalid date or date format") )
#set( $forgetThisValue = $arrayOfDateValidationFailures.add($tableValue) ) ## tricky: assigning the result to the Velocity variable forgetThisValue avoids an unwanted output!
#end
Here, the variable arrayOfDateValidationFailures is used. We declare this variable in the body of ./input/VelocityModelBasedLineInserter/SQLInsertScript/Main.tpl:
#set( $classDescriptorList = $model.getClassDescriptorList() )
#set( $isFirstClass = true )
#set( $classCounter = 0 )
#set( $statementCounter = 0 )
#set( $arrayOfDateValidationFailures = [])
#foreach( $classDescriptor in $classDescriptorList )
...
#end
'
'
PRINT Done with $statementCounter statements for $classCounter data tables.
#if($arrayOfDateValidationFailures.size() > 0)
#set( $newLine="
")
#set( $errorMessage = "Processing the model data revealed the following date validation failures:" + $newLine )
#foreach ($dateValidationFailure in $arrayOfDateValidationFailures)
#set( $errorMessage = $errorMessage + $dateValidationFailure + $newLine)
#end
$TemplateJavaUtility.throwMOGLiCCException($errorMessage)
#end
At the end of the main template, the occurrence of date validation failures is checked. Save, restart, view and see that the expected error report has been created. Introduce another invalid date into the Excel data and verify that the list of validation errors appears in the error report.
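The pattern used here, collect every failure during generation and raise a single error afterwards, can be sketched in plain Java (the message text mirrors the template above; throwMOGLiCCException is replaced by a simple print):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "collect now, fail at the end" pattern the main template uses:
// validation errors are gathered during generation and reported in one go.
public class DeferredErrorReport {
    public static String buildErrorMessage(List<String> failures) {
        StringBuilder message = new StringBuilder(
            "Processing the model data revealed the following date validation failures:\n");
        for (String failure : failures) {
            message.append(failure).append('\n');
        }
        return message.toString();
    }

    public static void main(String[] args) {
        List<String> failures = new ArrayList<>();
        failures.add("Invalid date or date format '15.13.2014', expected was 'dd.MM.yyyy'");
        if (!failures.isEmpty()) {
            // in the template, $TemplateJavaUtility.throwMOGLiCCException is called here
            System.out.println(buildErrorMessage(failures));
        }
    }
}
```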
Well, we have it. We are now able to easily pump a large amount of data maintained in an Excel file into a database, by using MOGLiCC to transform the data read from Excel into SQL scripts. Hopefully, you have learned to apply MOGLiCC, to cope with problems when working with it, and to consider MOGLiCC a valuable tool in your everyday work. Good luck!