资讯

历史

科技

环境与自然

成长

游戏

财经

文学与艺术

美食

健康

家居

文化

情感

汽车

三农

军事

旅行

运动

教育

生活

星座命理

SQL中的NULL值处理：特性与解决方案

创作时间:

作者:

@小白创作中心

SQL中的NULL值处理：特性与解决方案

引用

CSDN

https://m.blog.csdn.net/2401_86652632/article/details/145063329

在SQL的世界里，NULL值的处理方式常常让人感到困惑和意外。本文将深入探讨SQL中NULL值的独特性质、处理方式及其解决方案，帮助读者更好地理解和应对NULL值带来的问题。

一、SQL NULL值的独特性质

（一）与常规值的比较结果

我们先来建立一个基础的认知。在SQL中，使用逻辑等于（“=”）运算符比较不同的值时，结果可能会超出你的预期。例如，空字符串与空字符串比较（select '' = '';）会返回1（表示true），因为它们是相等的；数字1与1比较（select 1 = 1;）也返回1，这符合我们的常规理解。然而，当比较两个NULL值时（select null = null;），返回的结果却是NULL。这是因为NULL本质上是一个占位符，表示未知的值。两个未知值不能简单地被认定为相等，所以其比较结果既不是true也不是false，而是NULL，这确实让人感觉很奇特。

相比之下，使用IS运算符来检查NULL值的一致性时，select null is null;会返回1（true），因为IS运算符专门用于检查两个值是否都为NULL类型。通过下面这个更详细的示例，我们可以更清楚地看到NULL值与其他值在比较时的差异：

drop table if exists sample;
CREATE TABLE if not exists sample (
     id INTEGER PRIMARY KEY, -- 自动递增
     name TEXT -- UNIQUE（如果取消注释此行）
);
INSERT INTO sample (name) VALUES (NULL), (NULL), ('test'); --, ('test');
SELECT
    a.id as id1,
    b.id as id2,
    coalesce(a.name, 'null') || ', ' || coalesce(b.name, 'null') as names,
    a.name = b.name as equal_comparison,
    a.name IS b.name as is_comparison
FROM sample a
    CROSS JOIN sample b -- 在表的所有记录上创建一个 n by m 循环
WHERE a.id < b.id;

在这个示例中，当比较两个NULL值时，equal_comparison列的值为NULL，而is_comparison列的值为1，这清晰地展示了两种比较方式的不同结果。

（二）在唯一性约束中的表现

在唯一性约束方面，NULL值的行为也令人费解。通常我们认为，具有UNIQUE约束的列应该保证所有值都是唯一的，但对于NULL值却并非如此。例如，我们创建一个如下的表结构：

drop table if exists sample;
create table if not exists sample (
    id TEXT primary key,
    email TEXT,
    deleted_at TEXT,
    UNIQUE(email, deleted_at)
) strict;

然后尝试插入两条看起来似乎违反唯一性约束的记录：

insert into sample (id, email, deleted_at) values ('1', 'ray@mail.com', null);
insert into sample (id, email, deleted_at) values ('2', 'ray@mail.com', null);

令人惊讶的是，这两条记录都能成功插入表中。这是因为在SQL中，每个NULL值都被视为与其他NULL值不同，所以这两行数据在数据库看来是满足唯一性约束的。

二、这种处理方式的原因探究

根据SQLite的文档，SQLite以及其他遵循SQL标准的数据库之所以这样处理NULL值，是为了与其他数据库的实现方式保持一致。然而，有趣的是，实际上没有一个数据库完全遵循SQL标准规范中关于NULL值的处理方式。SQL标准文档似乎建议NULL值在所有情况下都应该是不同的，但在实际的SELECT DISTINCT语句或UNION操作中，目前测试过的SQL引擎都没有将NULL值视为不同的值。

以我们之前创建的表结构为例，UNIQUE(email, deleted_at)约束确保没有两行具有相同的email和deleted_at组合，但允许存在相同email的多行记录，只要deleted_at不同。这种处理方式虽然在一定程度上保持了数据库之间的兼容性，但也给开发者带来了不少困惑。

三、确保唯一性的方法

（一）使用生成列（Generated Column）

为了解决NULL值不确定性带来的问题，我们可以创建一个具有确定性值的新字段。一种方法是使用生成列，它在插入和更新数据时会自动生成值。例如：

CREATE TABLE sample (
    id TEXT PRIMARY KEY,
    email TEXT,
    deleted_at TEXT, -- 可为空
    _deleted_at_coalesced TEXT GENERATED ALWAYS 
        AS (COALESCE(deleted_at, '1970-01-01')) STORED, -- 不可为空
    UNIQUE(email, _deleted_at_coalesced)
) STRICT;

在这个示例中，当deleted_at为NULL时，_deleted_at_coalesced字段会被设置为'1970-01-01'。这样就为每行数据提供了一个确定的值，从而确保了唯一性约束的有效性。

我们可以通过以下完整示例来测试这种方法：

drop table if exists sample;
CREATE TABLE sample (
    id TEXT PRIMARY KEY,
    email TEXT,
    deleted_at TEXT, 
    _deleted_at_coalesced TEXT GENERATED ALWAYS AS (COALESCE(deleted_at, '1970-01-01')) STORED, 
    UNIQUE(email, _deleted_at_coalesced)
) STRICT;
insert into sample (id, email, deleted_at) values ('1', 'ray@mail.com', null); -- 由于 email 和生成列的约束，此行插入将失败，取消注释可测试
-- insert into sample (id, email, deleted_at) values ('2', 'ray@mail.com', null);
insert into sample (id, email, deleted_at) values ('3', 'ray@mail.com', '2024-11-12T00:00:00.000Z');
insert into sample (id, email, deleted_at) values ('4', 'ray@mail.com', '2024-11-11T01:00:00.000Z');
insert into sample (id, email, deleted_at) values ('6', 'different@mail.com', null);
update sample set deleted_at = '2024-11-11T02:00:00.000Z' where deleted_at is null;
insert into sample (id, email, deleted_at) values ('7', 'different@mail.com', null);
update sample set deleted_at = '2024-11-11T03:00:00.000Z' where deleted_at is null;
insert into sample (id, email, deleted_at) values ('8', 'different@mail.com', null);
select * from sample;

需要注意的是，这种方法虽然有效，但也存在一个缺点。当尝试删除相同的记录两次（即当记录元组已经存在时），可能会出现问题。例如：

drop table if exists sample;
CREATE TABLE sample (
    id TEXT PRIMARY KEY,
    email TEXT,
    deleted_at TEXT, 
    _deleted_at_coalesced TEXT GENERATED ALWAYS AS (COALESCE(deleted_at, '1970-01-01')) STORED, 
    UNIQUE(email, _deleted_at_coalesced)
) STRICT;
insert into sample (id, email, deleted_at) values ('1', 'ray@mail.com', null);
update sample set deleted_at = '2024-11-11T03:00:00.000Z' where id is 1;
insert into sample (id, email, deleted_at) values ('2', 'ray@mail.com', null);
select * from sample; -- 由于 email 和生成列元组已经存在，此行更新将失败，取消注释可测试
-- update sample set deleted_at = '2024-11-11T03:00:00.000Z' where id is 2;

（二）使用部分索引（Partial Index）

另一种更好的解决方法是使用部分索引。我们可以在email字段上创建一个部分索引，条件是deleted_at字段为NULL。例如：

CREATE UNIQUE INDEX if not exists idx_sample_email_deleted_at
    ON sample(email) WHERE deleted_at IS NULL;

通过以下测试示例可以看到这种方法的效果：

drop table if exists sample;
create table if not exists sample (
    id TEXT primary key,
    email TEXT,
    deleted_at TEXT
) strict;
CREATE UNIQUE INDEX if not exists idx_sample_email_deleted_at
    ON sample(email) WHERE deleted_at IS NULL;
insert into sample (id, email, deleted_at) values ('1', 'ray@mail.com', null); -- 由于 idx_sample_email_deleted_at 索引，此行插入将失败，取消注释可测试
-- insert into sample (id, email, deleted_at) values ('2', 'ray@mail.com', null);
insert into sample (id, email, deleted_at) values ('3', 'ray@mail.com', '2024-11-12T00:00:00.000Z');
insert into sample (id, email, deleted_at) values ('4', 'ray@mail.com', '2024-11-11T01:00:00.000Z');
insert into sample (id, email, deleted_at) values ('6', 'different@mail.com', null);
update sample set deleted_at = '2024-11-11T02:00:00.000Z' where deleted_at is null;
insert into sample (id, email, deleted_at) values ('7', 'different@mail.com', null);
update sample set deleted_at = '2024-11-11T03:00:00.000Z' where deleted_at is null;
insert into sample (id, email, deleted_at) values ('8', 'different@mail.com', null);
select * from sample;

使用部分索引不仅可以确保唯一性约束的有效性，而且不会使表变得更宽，无需管理额外的字段，占用空间更少，并且在重复删除相同记录对时也不容易出错。