gotqn February 2016

SQL CLR function based on .net ComputeHash is not working with Cyrrilic

I have written the following SQL CLR function in order to hash string values larger then 8000 bytes (the limit of input value of the T-SQL built-it HASHBYTES function):

[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
public static SqlBinary HashBytes(SqlString algorithm, SqlString value)
{
    HashAlgorithm algorithmType = HashAlgorithm.Create(algorithm.Value);

    if (algorithmType == null || value.IsNull)
    {
        return new SqlBinary();
    }
    else
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value.Value);
        return new SqlBinary(algorithmType.ComputeHash(bytes));
    }
}

It is working fine for Latin strings. For example, the following hashes are the same:

SELECT dbo.fn_Utils_GetHashBytes ('MD5', 'test'); -- 0x098F6BCD4621D373CADE4E832627B4F6
SELECT HASHBYTES ('MD5', 'test');                 -- 0x098F6BCD4621D373CADE4E832627B4F6

The issue is it is not working with Cyrillic strings. For example:

SELECT dbo.fn_Utils_GetHashBytes ('MD5 ', N'даровете на влъхвите') -- NULL
SELECT HashBytes ('MD5 ',N'даровете на влъхвите') -- 0x838B1B625A6074B2BE55CDB7FCEA2832

SELECT dbo.fn_Utils_GetHashBytes ('SHA256', N'даровете на влъхвите') -- 0xA1D65374A0B954F8291E00BC3DD9DF655D8A4A6BF127CFB15BBE794D2A098844
SELECT HashBytes ('SHA2_256',N'даровете на влъхвите') -- 0x375F6993E0ECE1864336E565C8E14848F2A4BAFCF60BC0C8F5636101DD15B25A 

I am getting NULL for MD5, although the code returns value if it is executed as console application. Could anyone tell what I am doing wrong?


Also, I've got the function from here and one of the comments says that:

Careful with CLR SP parameters being silently truncated to 8000

Answers


Remus Rusanu February 2016

 Encoding.UTF8.GetBytes(...)

SQL Server has no concept of UTF-8. Use UCS-2 (UTF-16) or ASCII. The encoding used must match what you'd pass to HASHBYTES. You can easily see that HASHBYTES will hash differently VARCHAR vs. NVARCHAR:

select HASHBYTES('MD5', 'Foo')  -- 0x1356C67D7AD1638D816BFB822DD2C25D
select HASHBYTES('MD5', N'Foo') -- 0xB25FF0AD90D09D395090E8A29FF4C63C

Best would be to change the SQLCLR function to accept the bytes, not a string, and deal with the cast to VARBINARY in the caller.

 SELECT dbo.fn_Utils_GetHashBytes ('MD5', CAST(N'даровете на влъхвите' AS VARBINARY(MAX));

FYI SQL Server 2016 has lifted the 8000 bytes restriction on HASHBYTES:

For SQL Server 2014 and earlier, allowed input values are limited to 8000 bytes.


srutzky February 2016

For a detailed walk-through that explains why you are seeing the differences, please see my answer to the following Question:

TSQL md5 hash different to C# .NET md5

And for anyone who does not wish to compile and deploy this themselves, this function is available in the Free version of the SQL# library of SQLCLR functions, stored procedures, etc (which I am the creator of, but Util_Hash and Util_HashBinary, among many others, are free). There is one difference between what is shown in the Question the two Util_Hash* functions in SQL#: the function shown in the Question takes a NVARCHAR / SqlString input parameter whereas the SQL# functions take VARBINARY / SqlBinary input. The differences are:

  • Accepting VARBINARY input also works for binary source data (files, images, encrypted values, etc)
  • While accepting VARBINARY input does require an extra step of doing a CONVERT(VARBINARY(MAX), source_string) in the call to the function, doing so preserves whatever Code Page is being used for VARCHAR data. While not used that often, this can be handy when working with non-Unicode data.

Regarding the warning from the other post of:

Careful with CLR SP parameters being silently truncated to 8000 bytes - I had to tag the parameter with [SqlFacet(MaxSize = -1)] otherwise bytes after the 8000th would simply be ignored!

and yet you not experiencing the same thing: this due to changes in how SSDT generates the T-SQL wrapper objects for SQLCLR objects. In earlier versions (especially those that came with Visual Studio prior to VS 2013), the default behavior was to use

Post Status

Asked in February 2016
Viewed 1,896 times
Voted 9
Answered 2 times

Search




Leave an answer