SQL Server - Collation - Scalability & Portability
Most people (including Developers) if asked about database "collation", will likely look at as if you had two heads. There is a good reason for this - For the most part, no one 'worries' about it, until of course they have a need to migrate data from one server to another, then suddenly it becomes 'one of the most important things in the world' !
Collation governs the set of rules a computer applies when comparing and sorting data. In databases, this means for example when you return a list of items in alphabetical order, what does 'Alphabetical really mean?'
For example is "Donut", "DONUT", and "donut" sorted the same way ? - If so you want a case-insensitive collation.
How about "éclair" vs "eclair" -If so you want an accent-insensitive collation.
Hmm, I suddenly feel a big hungry for a sweet now for some reason ?
For the most part we do not have to worry about these things because collation is a user preference, similar to you clock formatting. You might like to see your clock date as 2020-02-29, where another user might prefer 02/29/2020 or 29/02/2020. Regardless of what your preference for your clock setting should not cause an application to crash or otherwise work improperly. It is the same with collation.
So what's the problem?
When you have database using one collation and your 'server' has a different collation and your stored procedures are using tempdb, you migrate your database over and suddenly you seen an error that looks like this.
Msg 468, Level 16, State 9, Line 15
Cannot resolve the collation conflict between "Latin1_General_100_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation.
SQL Server will not automatically cast collations between databases that are different. This is not a problem so long as your database does not attempt to join with other tables in other databases on columns of text of a different collation BUT if your procedure/function acesses tempdb, the tempdb collation will be different and you will see the error above.
To make your stored procedures and functions 'collation' agnostic you need to ensure that when making SQL statements that link tables OR use tempdb that the collation is specified accordingly.
For example, suppose you have a stored procedure that contains the creates the following tmp table:
CREATE TABLE #tmpEmployees ( EmployeeNumber varchar(20), EmployeeFirstName varchar(50) EmployeeLastName nvarchar(50) EmployeeStatus bit
)
By Default, the columns of this table will have the default collation of whatever the tempdb collation is (which by default is whatever the server collation is) (eg: SQL_Latin1_General_CP1_CI_AS)
If this collation differs from the specific database collation, you will likely receive an error when joining this temporary table to an existing table in your database
ie:
SELECT * From #tmpEmployees INNER JOIN EmployeeRoles ON #tmpEmployees.EmployeeNumber = EmployeeRoles.EmployeeId
Note: In order for SQL to throw an error one or more of the compare operators need to be comparing a TEXT based field, If the comparision is not text based (ie: if EmployeeID were of type 'int', in the above, NO ERROR IS CREATED.
Also, if the temp table was created with a 'SELECT INTO #tmpTable' the temp table will automatically INHERIT the database default and therefore no error will occur!!
When you see the 'cannot resolve the conflict' it means your comparision is on TEXT that is different.
How to solve ?
The easy way to solve this is whenever you create temp tables be sure to add "COLLATE DATABASE_DEFAULT" at the end of field
ie:
CREATE TABLE #tmpEmployees
(
EmployeeNumber varchar(50) COLLATE DATABASE_DEFAULT,
EmployeeFirstName varchar(50) COLLATE DATABASE_DEFAULT,
EmployeeLastName nvarchar(50) COLLATE DATABASE_DEFAULT,
EmployeeStatus bit
)
This will ensure that the temp table gets created with the SAME COLLATION your source database and no error will occur.
In addition if you are doing text based comparisons between two tables of differing collation, you can explicitly cast to the required collation like this
SELECT * From Employees INNER JOIN EmployeeRoles ON Employees.EmployeeNumber COLLATE Latin1_General_CS_AS = EmployeeRoles.EmployeeId COLLATE Latin1_General_CS_AS
This can be useful when you cannot control the table (ie: it is not a temp table, or it is a join within code) to ensure your application remains agnostic to the user's choice of server collation.
Comments
Post a Comment