Encode CSV from Visual Basic as UTF-8 without BOM
New here? Learn about Bountify and follow @bountify to get notified of new bounties! x

Having a time trying to export a TXT/CSV from Visual Basic 2015 as UTF-8 without the BOM. The CSV is then being uploaded nightly to a web server over FTP and interpreted in PHP. Using the code below, the file looks clean in Command Prompt using 'type file.txt', but the Linux server is seeing the file as UTF-16 with the Byte-Order Mark and Preview in Mac are showing 0xFE (ÿþ) at the start of the file. Any ideas?

    Dim UTFStream As Object
    UTFStream = CreateObject("adodb.stream")
    UTFStream.Charset = "UTF-8"
    UTFStream.Open
    UTFStream.loadfromfile("\\drive\folder\Upload\" & obFileName & ".csv")

    UTFStream.Position = 3 'skip BOM

    Dim BinaryStream As Object
    BinaryStream = CreateObject("adodb.stream")
    BinaryStream.Open
    'BinaryStream.Charset = "UTF-8"
    'Strips BOM (first 3 bytes)
    UTFStream.CopyTo(BinaryStream)

    UTFStream.Flush
    UTFStream.Close

    BinaryStream.SaveToFile("c:\test\testb1.txt")
    BinaryStream.Flush
    BinaryStream.Close
1)You have a file with BOM 2)You export without BOM 3)Have you checked if the new saved file is without BOM? 4)How do you upload to FTP server? 5)Does the uploaded file have the BOM? 6)Then you interpret with PHP, I can't understand iF this is the problem 7)After you have elaborated the file with PHP, does the file have BOM?
Stefano Balzarotti 29 days ago
The file being created by the above VB code is UTF-16LE according to: http://validator.w3.org/i18n-checker/check#validate-by-upload+
Mike 29 days ago
Sorry I can't understand the problem, to make UTF-8 you just need to uncomment 'BinaryStream.Charset = "UTF-8", because for default it's saved as UTF16 Little Endian
Stefano Balzarotti 29 days ago
Sorry for the confusion. With BinaryStream.Charset = "UTF-8" uncommented, we get a UTF-8 with BOM file.
Mike 29 days ago
Yes, sorry i was a bit confused, I had to test the code to understand the problem :), but if I understood correctly I posted a solution
Stefano Balzarotti 29 days ago

Crowdsource coding tasks.

1 Solution


Sorry now I understand the problem, UTFStream should be set as UTF-8 charset, but BinaryStream should be set binary to skip the BOM.

BinaryStream.Type = adTypeBinary

Here I show you a working sample:

Const adTypeText = 2
Const adModeReadWrite = 3
 Const adTypeBinary = 1
 Const adLF = 10
Const adSaveCreateOverWrite = 2
Const adWriteLine = 1
Dim UTFStream As Object
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
UTFStream.LineSeparator = adLF
UTFStream.Open
UTFStream.LoadFromFile ("S:\aaa.csv")

UTFStream.Position = 3 'skip BOM

Dim BinaryStream As Object
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Open
BinaryStream.Type = adTypeBinary
'Strips BOM (first 3 bytes)
UTFStream.CopyTo BinaryStream

UTFStream.Flush
UTFStream.Close

BinaryStream.SaveToFile ("S:\testb1.txt")
BinaryStream.Flush
BinaryStream.Close

the missing part in your code is BinaryStream.Type = adTypeBinary with adTypeBinary that is const with value 1

Only thing missing might be the final 'adSaveCreateOverWrite' - but the code works. Thank you so much for your help!
Mike 29 days ago
you can use adSaveCreateOverWrite if you want overwrite the file when you save it: BinaryStream.SaveToFile "S:\testb1.txt", adSaveCreateOverWrite
Stefano Balzarotti 29 days ago