 

Bulk Import Unicode with SQL Server 2016

Since we migrated to SQL Server 2016, we are trying to import Unicode characters into a table via BULK INSERT using non-XML format files and UTF-8 encoded data files (with Unix newlines (LF)). The format files specify the host file data length but not the terminator. The host file data type is SQLCHAR. My BULK INSERT statement looks like:

SET @cmd = N'Bulk Insert  myTable from ''D:\DATA\datafile'' with (DATAFILETYPE =''widechar'', KEEPNULLS, FORMATFILE = ''D:\DATA\fmt\formatfile.ftm'' ,MAXERRORS = 0, codepage=65001, FIRSTROW = 1)'

EXECUTE (@cmd)

Format file:

    9.0
    7
    1       SQLCHAR       0       8       ""      1     column_1     ""
    2       SQLCHAR       0       8       ""      2     column_2     ""
    3       SQLCHAR       0       12      ""      3     column_3     ""
    4       SQLCHAR       0       8       ""      4     column_4     ""
    5       SQLCHAR       0       12      ""      5     column_5     ""
    6       SQLCHAR       0       40      ""      6     column_6     ""
    7       SQLCHAR       0       8       ";\n"   7     column_7     ""

Table definition:

CREATE TABLE #myTable 
(  [column_1] [nvarchar](8) NOT NULL,
    [column_2] [nvarchar](8) NULL,
    [column_3] [nvarchar](12) NULL,
    [column_4] [nvarchar](8) NOT NULL,
    [column_5] [nvarchar](12) NULL,
    [column_6] [nvarchar](40) NULL,
    [column_7] [datetime] NULL)

The BULK INSERT imports ordinary characters (like A, B, C, ...) without any trouble. But when it comes to characters like Ä or Ü, BULK INSERT seems to split them in two, so that the field's data length increases by one (if the string contains just one character like Ä). The last character of the string then gets inserted as the first character of the next table column. Any idea how to solve this issue? Since the data files come from a third party, I cannot influence how they are created or manipulated.
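The shifting described above is exactly what happens when fixed *byte* widths are applied to UTF-8 data, where a character outside ASCII occupies more than one byte. A minimal Python sketch (the field contents are made up for illustration) reproduces the effect:

```python
# In UTF-8, "Ä" encodes to two bytes, so byte length != character length.
text = "Ä"
utf8 = text.encode("utf-8")      # b'\xc3\x84'
print(len(text), len(utf8))      # 1 2

# Slicing a row into fixed byte counts, as a length-based format file does,
# shifts every later field by one byte per non-ASCII character:
row = "ÄBCDEFGH" + "12345678"    # two logical 8-character fields
raw = row.encode("utf-8")        # 17 bytes, not 16
field_1, field_2 = raw[:8], raw[8:16]
print(field_1.decode("utf-8"))   # 'ÄBCDEFG'  -- last char cut off
print(field_2.decode("utf-8"))   # 'H1234567' -- leaked into the next field
```

This is why the last character of one column spills into the next: the format file counts bytes, while the data is written in characters.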

P.S.: BTW, the BCP tool behaves the same way.

Any help is appreciated!

Thank you!

Sergej Scheibel asked Sep 05 '25 17:09

1 Answer

I didn't convert my UTF-8 encoded .txt file at all. I just used this:

--bulk insert
BULK INSERT dbo.tablename
FROM 'C:\Users\user\Desktop\myfile.txt'
WITH (
    FIRSTROW = 2,           -- if skipping a header row
    FIELDTERMINATOR = '|',
    ROWTERMINATOR   = '\n',
    CODEPAGE = '65001',
    DATAFILETYPE = 'char'
);
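Switching from byte-length fields to terminator-based fields (FIELDTERMINATOR/ROWTERMINATOR) sidesteps the byte-counting problem entirely. If you want to sanity-check such a delimited UTF-8 file before importing it, a small sketch like the following can flag malformed rows; the delimiter, field count, and sample data here are placeholders for your own file:

```python
import io

EXPECTED_FIELDS = 7   # assumption: seven columns, as in the question
DELIMITER = "|"       # assumption: pipe-delimited, as in the answer

def check_rows(lines, expected=EXPECTED_FIELDS, sep=DELIMITER):
    """Return (line_number, field_count) for each row that does not match."""
    bad = []
    for i, line in enumerate(lines, start=1):
        n = len(line.rstrip("\n").split(sep))
        if n != expected:
            bad.append((i, n))
    return bad

# In-memory sample standing in for the real data file:
sample = io.StringIO("a|b|c|d|e|f|g\nÄ|Ü|c|d|e|f|g\nbroken|row\n")
print(check_rows(sample))   # [(3, 2)]
```

Because the file is opened as text (UTF-8 by default on most setups), multi-byte characters like Ä count as one character and do not distort the field count.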
Hrvoje answered Sep 08 '25 12:09