| This page is user contributed documentation. See the bottom of the page for information about the author. | |
Referential integrity is the feature of a database ensuring implied relationships in the database are enforced. It is a feature of most database systems, and protects users from accidentally (or intentionally!) creating discrepencies in their database.
In this chapter, we'll work with a sample set of data involving people, tasks they are responsible for, and appointments you have scheduled with them. As you fire employees, you delete them from your person table, but want to be certain you can't leave tasks no one is in charge of, or leave appointments where the persons name cannot be located.
If you're not familiar with the concepts of Primary Keys and Foreign Keys, it might be helpful to review a basic database primer, as this tutorial assumes basic knowledge of these concepts.
First, because we'll be playing with the system tables, it's best to do this in a practice database :
# CREATE DATABASE RI_test;
# \c ri_test
Now, let's create practice tables. We'll have one parent table, 'pers', and two child tables, 'tasks' and 'appts'.
The parent table is straightforward :
# CREATE TABLE pers (pid INT NOT NULL PRIMARY KEY,
pname TEXT NOT NULL);
The first child table will hold tasks for which a person is responsible :
# CREATE TABLE tasks (taskid SERIAL NOT NULL PRIMARY KEY,
pid INT NOT NULL
CONSTRAINT tasks__ref_p
REFERENCES pers,
task TEXT NOT NULL);
A few notes :
So, let's add some sample data :
INSERT INTO pers VALUES (1, 'Jeff Brown');
INSERT INTO pers VALUES (2, 'Maria Lane');
INSERT INTO tasks (pid, task) VALUES (1, 'Write contract');
INSERT INTO tasks (pid, task) VALUES (1, 'Upgrade database');
So far so good.
If we try to insert a child that has no parent, i.e. :
INSERT INTO tasks (pid, task) VALUES (3, 'Install Linux');
we'll get a referential integrity error. This error will refer to our constraint name (if any), and will block the entering of this data into 'tasks'. In addition, if this is part of a transaction, it will rollback the entire transaction, as always happens in PostgreSQL.
Also, if we try to change a parent that has children :
DELETE FROM pers WHERE pname = 'Jeff Brown';
We'll get the same problem, as we cannot delete Jeff while he has tasks assigned.
This will work, as there are presently no child tasks for Maria :
DELETE FROM pers WHERE pname = 'Maria Lane';
If you did the above step, you'll need to add Maria back for the examples further on :
INSERT INTO pers VALUES (2, 'Maria Lane');
Default (NO ACTION) or RESTRICT - which mean the same thing to PostgreSQL - ON UPDATE and ON DELETE rules make sense for many situations like this. You wouldn't want to be able to delete a staff person from your database if they had certain resposibilities, otherwise you would never know who was in charge of different tasks.
A More Permissive Child Table : Appointments
Let's create a second child, 'appts' (short for Appointments) :
# CREATE TABLE appts (apptid SERIAL NOT NULL PRIMARY KEY,
pid INT NOT NULL
CONSTRAINT appt__ref_pers
REFERENCES pers
ON UPDATE CASCADE
ON DELETE CASCADE
DEFERRABLE,
apptsubj TEXT NOT NULL,
apptdate DATE NOT NULL);
and insert some data into it :
INSERT INTO appts (pid, apptsubj, apptdate)
VALUES (1, 'Discuss raise', '2001-01-15');
INSERT INTO appts (pid, apptsubj, apptdate)
VALUES (2, 'Plan project', '2001-01-20');
This works just fine. Also, just like before :
INSERT INTO appts (pid, apptsubj, apptdate)
VALUES (3, 'Termination', '2001-01-20');
does not work, as there is no pid = 3 in the 'pers' table.
However, unlike the 'tasks' table, 'appts' is set to CASCADE deletes and updates. This means if we update a person's pid, or delete a person entirely, PostgreSQL will allow and assist this by first deleting or updating any tasks which exist for the person.
So :
UPDATE pers SET pid=14 WHERE pid=2;
This will change Maria's pid to 14, first changing the connecting pid in the 'appts' table.
DELETE FROM pers WHERE pid=14;
This will delete Maria's appointment, then delete Maria.
Other Relationships
Notice that CASCADE in this class only refers to the CASCADE between the 'pers' and 'appt' tables. Trying to DELETE Jeff would still fail because even though the pers-appt relationship would CASCADE, the pers-tasks relationship would fail, and PostgreSQL would report the DELETE attempt as a violation of this referential integrity.
SET NULL and SET DEFAULT can be useful options, especially for ON DELETE. For example, we might have a table, 'offices', keeping track of which office a worker uses. It might contain fields for officeid, officelocation, and persid. If we wanted to delete a person, we shouldn't be stopped just because the person has a related office, but similarly, we don't want to delete an office only because this person is being deleted. In some cases, the best option may be to set the persid field for 'offices' to NULL or DEFAULT, leaving the office in place, but making it clear this office is now unused.
Note that ON UPDATE and ON DELETE can have different rules. It's very common for instance to ON UPDATE CASCADE but ON DELETE RESTRICT - allowing people to change their ID's, but not allowing deletion of tasks if a related person exists.
It's also possible to change the actions for a relationship, and this requires a little hacking in the system catalog tables. See Hacking Referential Integrity, below.
Deferring transactions
By default, referential integrity is checked for every single relationship, for every single insert, delete or update which could affect this relationship.
This means :
INSERT INTO tasks (pid, task) VALUES (5, 'Open sales office');
INSERT INTO pers (pid, name) VALUES (5, 'Helen Kim');
would fail, because at the time of the first attempted insert, there is no person with pid=5. Much of the time, this is the most intuitive setting.
However, sometimes you may not be able to predict the exact order data arrives. Perhaps you are receiving data loaded from a text file, or from across the web. It's possible the data may not arrive in the order above (task, then associated person). When this happens, you can choose to defer the transaction checking.
In order to defer a transaction, three things must happen :
For the table 'appts' relationship to 'pers', we have declared this as DEFERRED. So, if we do :
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
INSERT INTO appts (pid, apptsubj, apptdate)
VALUES (5, 'Negotiation', '2001-01-20');
INSERT INTO pers VALUES (5, 'Helen Kim');
COMMIT;
This works just fine.
Notice having the relationship declared as DEFERRABLE is not enough - we must also use SET to explicity set CONSTRAINTS to deferred. In this example, we have set all relationships to DEFERRED; instead we could set only a single constraint to deferred, as in :
SET CONSTRAINTS appts__ref_pers DEFERRABLE;
It may be convenient to have a relationship already set, rather than having to set this for every transaction. To do this, add "INITIALLY DEFERRED" to the CONSTRAINT ... REFERENCES declaration for the table.
i.e. :
# CREATE TABLE appts (apptid SERIAL NOT NULL PRIMARY KEY,
pid INT NOT NULL
CONSTRAINT appt__ref_pers
REFERENCES pers
ON UPDATE CASCADE
ON DELETE CASCADE
DEFERRABLE
INITIALLY DEFERRED,
apptsubj TEXT NOT NULL,
apptdate DATE NOT NULL);
Now, we can simply :
BEGIN;
INSERT INTO appts (pid, apptsubj, apptdate)
VALUES (5, 'Negotiation', '2001-01-20');
INSERT INTO pers VALUES (5, 'Helen Kim');
COMMIT;
With no explicit SET command. This
is especially convenient if you
work in a programming setting which abstracts SQL
commands and makes it difficult to execute an arbitrary,
nonstandard SQL command such as "SET...".
Referential integrity works great in PostgreSQL. However, PostgreSQL does not yet have SQL-synax commands to change actions, turn on/off referential integrity, etc.
These actions can be performed by editing the information stored in the system catalog tables directly.
WARNING
To edit the system catalog tables, you must be a superuser in PostgreSQL. In addition, you should be VERY CAREFUL when editing these tables, and make sure you have a backup first (via pg_dump). An accidental table-wide UPDATE or DELETE could delete all of your tables, ruin your indexes, corrupt your database, etc. Practice this by working in a test database, preferrably on a machine without any other critical databases.
There are several system catalog tables of interest to us :
pg_class
all "classes" including tables, views, sequences, etc.
pg_trigger
all triggers. PostgreSQL handles referential
integrity using behind-the-scenes triggers, so this
is where all of your referential integrity controls
are stored.
pg_proc
all PostgreSQL procedures. We won't need
to make any changes to this, but we will use it to show what the referential integrity procedures used actually are.
To see (most) of the system tables, you can use the command \dS in psql.
For example, let's collect information from pg_class about our test tables :
# SELECT oid, relname FROM pg_class WHERE relname IN ('pers','appts','tasks');
PostgreSQL usually lowercases unquoted system identifiers such as tables for us automatically (i.e. I can "CREATE TABLE foo" and "SELECT * FROM FOO" still works.) However, when examining pg_class, you must work in a case-sensitive manner, or use case-insensitive operators.
This query returns :
| oid | relname |
| 9100 | pers |
| 9110 | tasks |
| 9120 | appts |
(3 rows)
Your OIDs will be different. Don't worry, but do take notice of what they are.
Now, if we look in pg_trigger, we can find the triggers used by our tables.
# SELECT * from pg_trigger WHERE tgrelid in (9100,
9110, 9120);
Note, the output table from this command was too wide to fit in this HTML page. You can find it here.
The columns in pg_trigger are :
tgrelid
The table the trigger is on. The table
which is inserted/updated/deleted and calls the trigger.
tgname
Name of the trigger. Trigger names are
generated by the referential integrity feature of
PostgreSQL and are unimportant, except they are unique.
tgfoid
OID of the function to be is called. This
is an important column; by changing the function called,
we can change which action (CASCADE, etc.) is performed.
See below.
tgtype
What kind of trigger is this? UPDATE, DELETE, etc.
tgenabled
Is this trigger enabled?
tgisconstraint
Is this trigger part of a constraint? Non-Referential
integrity triggers user-declared by CREATE TRIGGER
may be FALSE for this, but referential integrity triggers
will always be true.
tgconstrname
The name of the constraint which calls this trigger. If you named your
constraints (as we did, i.e. pers__ref_tasks), this will be the constraint name,
otherwise it will be unnamed.
tgconstrrelid
OID of table which had the constraint set.
tgdeferrable
Can this constraint be deferred? Equivalent
to DEFERRABLE. See above.
tginitdeferred
Is this constraint initially deferred? Equivalent
to INITIALLY DEFERRED. See above.
tgnargs
Number of arguments for the referential integrity
function. As of PostgreSQL 7.1, this always
seems to be 6, and should not be edited.
tgattr
Unsure
tgargs
The actual arguments to the referential integrity
function.
To help make this more understandable and usable, a view is helpful :
CREATE VIEW dev_ri
AS
SELECT t.oid as trigoid,
c.relname as trig_tbl,
t.tgfoid,
f.proname as trigfunc,
t.tgenabled,
t.tgconstrname,
c2.relname as const_tbl,
t.tgdeferrable,
t.tginitdeferred
FROM pg_trigger t,
pg_class c,
pg_class c2,
pg_proc f
WHERE t.tgrelid=c.oid
AND t.tgconstrrelid=c2.oid
AND tgfoid=f.oid
AND tgname ~ '^RI_'
ORDER BY t.oid;
This view requires PostgreSQL 7.1 because of the ORDER BY statement. For PostgreSQL versions earlier than 7.1, you should remove the ORDER BY statement.
# select * from dev_ri;
Note, the output table from this command was too wide to fit in this HTML page. You can find it here.
Now, it's much easier to understand what's happening. For example, from trigger oid=263753, we can see for inserts to the 'tasks' table, RI_FKey_check_ins is called, which checks the 'pers' table.
From here, we can :
UPDATE pg_trigger SET tgenabled=FALSE WHERE oid=xxx;UPDATE pg_trigger SET tgdeferrable=[
TRUE | FALSE ] WHERE oid=xxx;UPDATE pg_trigger SET tginitdeferred=[
TRUE | FALSE ] WHERE oid=xxx;SELECT oid, proname FROM pg_proc where proname ~ '^RI_';| oid | proname |
| 1646 | RI_FKey_cascade_del |
| 1647 | RI_FKey_cascade_upd |
| 1644 | RI_FKey_check_ins |
| 1645 | RI_FKey_check_upd |
| 1654 | RI_FKey_noaction_del |
| 1655 | RI_FKey_noaction_upd |
| 1648 | RI_FKey_restrict_del |
| 1649 | RI_FKey_restrict_upd |
| 1652 | RI_FKey_setdefault_del |
| 1653 | RI_FKey_setdefault_upd |
| 1650 | RI_FKey_setnull_del |
| 1651 | RI_FKey_setnull_upd |
In our example, to set updates on pers(pid) to CASCADE, rather than NO ACTION on pers-tasks :
UPDATE pg_trigger SET tgfoid=1647 WHERE oid=xxx;
where xxx is our current noaction_upd trigger for pers-tasks
Note these changes often require a new
backend. Quit and
restart psql, or reset your client connection, and
you should be able
to test out your new settings.