Question How to import excel xlsx file to sql with changing header location and name

yazomoto

New member
Joined
Aug 11, 2013
Messages
2
Programming Experience
1-3
How to import data from excel xlsx files to sql database if in some files the row and content of the headers are changing (even if they change they point the same data, see example)? I work in Visual basic environment with VS 2012
Here is the example:

file-example.jpg DB-Test.png

The problem is number of headers are changing from file to file. I planned to use an "Aliases" table in SQL (above on the right I showed a sample of Aliases table) which will store the alises used for the same header name for e.g. in first table "education level" is written as column header in second file "edulvl" but they all point to the same data "education level" so they all should be recorded in DB as alises of "education_level" fieldname.


In short: Files can be either xls or xlsx columns are changing position and name (alias) from file to file Place of data (together with columns) change position from file to file Number of columns change

Simple Algorithm of the code:

  1. user inputs through fileDialog fulladdress of file then clicks
  2. import program checks whether it is xls or xlsx file program
  3. opens the file
  4. finds location of data
  5. reads column names
  6. searches them in Aliases DB table
  7. if nothing matches,
  8. asks for user input to decide if it is an alias of a field or a new field
  9. if mathches with an earlier input in aliases table in DB finds out the field name in "Project Table" in DB using Aliases table.
  10. When reading is completed New fields are created in "project table" and in "aliases" table new entries are made
  11. using the information gathered from xl file the data is positioned and mapped so the data import process is started

Thanks in advance!
 
When you open and read the Excel file, you read it through ODBC and a query right? If so, just change your column names through aliases in your select query. You can do it all in just one query to your SQL database, through the OPENDATASOURCE() SQL function. In order for this to work you need to enable ad-hoc distributed queries on your SQL instance. The header will still cause problems, personnally I would just reformat it, but you can also go around this with a WHERE clause... The code below will still import the column headers as a record, but it will skip the header (so long as there is nothing in the second column in the header section...)

OPENDATASOURCE (Transact-SQL)

INSERT INTO dbo.SomeSQLTable 
SELECT [col1] AS MyFirstField,
       [col2] AS MySecondField
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0','Data Source=C:\somepath\xlfile.xls;Extended Properties=Excel 8.0')...[sheet1$]
WHERE NOT ISNULL([col2]);
 
Thank you for your answer.

Yes ODBC is a good option to read excel.

Reformatting the excel file is also ok but automating this needs a little work since position, order and name of the fields can change. Using a code I need to find the position of the fields and then rename them, delete unneeded top rows and 'save as' reformatted file into a new file.

In order to find and rename the headers I think the easiest way seems like moving in the excel file cell by cell in the beginning (for this one it is needed to have appropriate libraries referenced).

Parsing the sql query inside the code depending on opened xl file seems like a good way to adapt to possible changes.

VB.NET:
[COLOR=#333333][FONT=Consolas]SELECT 
         [col1] AS MyFirstField,[/FONT][/COLOR]
[COLOR=#333333][FONT=Consolas]         [col2] AS MySecondField
[/FONT][/COLOR]

I don't know how to go around reformatting the fields by a "where" clause by the way... Any ideas?
 
Back
Top