Tuesday, January 14, 2014

SQLXML Bulk import

Import a single table

Get the XSD schema file and add the following markups to it:
  1. Mark the parent  elements which do not have a corresponded database table as sql:is-constant="1"
  2. Mark the iterated element which each instance should go to a record as sql:relation="DB_table_name"
  3. Mark the elements which do not have a corresponded database field as sql:mapped="false"
  4. Assign the field name to the elements which database field name is different with sql:field="DB_field_name"
  5. Make sure the field length are roomy enough to fit all the data
  6. Automatic convertion of date fields does not work properly, so such DB field should be type of string.

Setup a relationship between tables

  1. Add the sql:relationship element to xsd:annotation/xsd:appinfo
  2. Add attributes which declare the name="Relationship_annotation_name" (random identifier string) and the related DB tables and PK-FK fields as described here. 
  3. To the same element where you declared the child DB table name with sql:relation="DB_table_name" add the attribute which references to the declared relationship: sql:relationship="Relationship_annotation_name"
See more examples here.

Execute the prepared bulk import setup

Download and install the SQLXMLBulkload object and execute a VbScript like the following:

set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0")
objBL.ConnectionString = "Connection String to the database"
objBL.ErrorLogFile = "error.log"
objBL.Execute "annotated_schema.xsd", "input_data.xml"

If the object returns an error, check this site, may be it will give you a clue. The log file sometimes contains a bit more information than in the thrown error description.

Object properties to configure

If the tables already exist the .SchemaGen property of the SQLXMLBulkload object should remain FALSE.
If it is set to TRUE, the tables will be created automatically.
String XSD type is declared as [nvarchar](1000).

.KeepIdentity could be useful when a normalized table structure is created, in other words when an attempt to insert a reference record fails when the record already was created (with the primary key). If the property is set to True the records are inserted one by one which is a lot slower than the default bulk inserting. But the reference records duplicates will not be created.

.KeepNulls is useful to left the fields NULL when there is no actual value in the XML.

Here's the list of all properties of the SQLXMLBulkload object.