Java Persistence/Identity and Sequencing
From Wikibooks, the open-content textbooks collection
Identity
An object id (OID) is something that uniquely identifies an object. Within a VM this is typically the object's pointer. In a relational database, a row is uniquely identified in its table by its primary key. When persisting objects to a database you need a unique identifier for each object; this allows you to query the object, define relationships to it, and update and delete it. In JPA the object id is defined through the @Id annotation or <id> element and should correspond to the primary key of the object's table.
Example id annotation
    ...
    @Entity
    public class Employee {
        @Id
        private long id;
        ...
    }
Example id XML
    <entity name="Employee" class="org.acme.Employee" access="FIELD">
        <attributes>
            <id name="id"/>
        </attributes>
    </entity>
Common Problems
Strange behavior, unique constraint violation.
- You must never change the id of an object. Doing so will cause errors or strange behavior, depending on your JPA provider. Also do not create two objects with the same id, or try persisting an object with the same id as an existing object. If you have an object that may already exist, use the EntityManager merge() API; do not use persist() for an existing object, and avoid relating an unmanaged existing object to other managed objects.
No primary key.
- See No Primary Key.
Sequencing
An object id can either be a natural id or a generated id. A natural id is one that occurs in the object and has some meaning in the application. Examples of natural ids include user ids, email addresses, phone numbers, and social insurance numbers. A generated id is one that is generated by the system. A sequence number in JPA is a sequential id generated by the JPA implementation and automatically assigned to new objects. The benefits of using sequence numbers are that they are guaranteed to be unique, allow all other data of the object to change, are efficient values for querying and indexes, and can be efficiently assigned. The main issue with natural ids is that everything always changes at some point; even a person's social insurance number can change. Natural ids can also make querying, foreign keys and indexing less efficient in the database.
In JPA an @Id can be easily assigned a generated sequence number through the @GeneratedValue annotation, or <generated-value> element.
Example generated id annotation
    ...
    @Entity
    public class Employee {
        @Id
        @GeneratedValue
        private long id;
        ...
    }
Example generated id XML
    <entity name="Employee" class="org.acme.Employee" access="FIELD">
        <attributes>
            <id name="id">
                <generated-value/>
            </id>
        </attributes>
    </entity>
Sequence Strategies
There are several strategies for generating unique ids. Some strategies are database agnostic and others make use of built-in database support.
JPA supports several id generation strategies, defined through the GenerationType enum: TABLE, SEQUENCE and IDENTITY. The enum also defines AUTO, which leaves the choice of strategy to the JPA provider.
The choice of sequence strategy is important, as it affects performance, concurrency and portability.
Table sequencing
Table sequencing uses a table in the database to generate unique ids. The table has two columns: one stores the name of the sequence, the other stores the last id value that was assigned. There is a row in the sequence table for each sequence object. Each time a new id is required, the row for that sequence is incremented and the new id value is passed back to the application to be assigned to an object. This is just one example of a sequence table schema; for other table sequencing schemas see Customizing.
Table sequencing is the most portable solution because it just uses a regular database table, so, unlike sequence objects and identity columns, it can be used on any database. Table sequencing also provides good performance because it allows for sequence pre-allocation, which is extremely important to insert performance, but it can have concurrency issues.
In JPA the @TableGenerator annotation or <table-generator> element is used to define a sequence table. The TableGenerator defines a pkColumnName for the column used to store the name of the sequence, a valueColumnName for the column used to store the last id allocated, and a pkColumnValue for the value to store in the name column (normally the sequence name).
Example sequence table
SEQUENCE_TABLE
| SEQ_NAME | SEQ_COUNT |
| EMP_SEQ | 123 |
| PROJ_SEQ | 550 |
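The pre-allocation idea can be sketched in plain Java, with an in-memory map standing in for SEQUENCE_TABLE. This is only an illustration under stated assumptions: the class names SequenceTable and PreallocatingSequence are hypothetical, not part of JPA, and a real provider would issue SQL and handle transactions and locking differently.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for SEQUENCE_TABLE: SEQ_NAME -> SEQ_COUNT (hypothetical helper, not a JPA API). */
final class SequenceTable {
    private final Map<String, Long> rows = new HashMap<>();
    private int updates = 0; // counts simulated UPDATE statements

    synchronized void insertRow(String seqName, long seqCount) {
        rows.put(seqName, seqCount);
    }

    /** Simulates UPDATE ... SET SEQ_COUNT = SEQ_COUNT + size and returns the new SEQ_COUNT. */
    synchronized long increment(String seqName, int size) {
        Long current = rows.get(seqName);
        if (current == null) {
            throw new IllegalStateException("sequence not found: " + seqName);
        }
        long updated = current + size;
        rows.put(seqName, updated);
        updates++;
        return updated;
    }

    synchronized int updateCount() {
        return updates;
    }
}

/** Hands out ids from a pre-allocated block and touches the table only when the block is used up. */
final class PreallocatingSequence {
    private final SequenceTable table;
    private final String seqName;
    private final int allocationSize;
    private long next = 1; // next id to hand out
    private long last = 0; // last id of the current block; next > last means the block is empty

    PreallocatingSequence(SequenceTable table, String seqName, int allocationSize) {
        this.table = table;
        this.seqName = seqName;
        this.allocationSize = allocationSize;
    }

    synchronized long nextId() {
        if (next > last) {
            // Reserve a whole block with a single table access.
            last = table.increment(seqName, allocationSize);
            next = last - allocationSize + 1;
        }
        return next++;
    }
}
```

Starting from the EMP_SEQ row above (SEQ_COUNT of 123) with a block size of 50, the first 50 ids (124 to 173) cost a single table update; only the 51st id touches the table again.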
Example table generator annotation
    ...
    @Entity
    public class Employee {
        @Id
        @GeneratedValue(strategy=GenerationType.TABLE, generator="EMP_SEQ")
        @TableGenerator(name="EMP_SEQ", table="SEQUENCE_TABLE", pkColumnName="SEQ_NAME",
            valueColumnName="SEQ_COUNT", pkColumnValue="EMP_SEQ")
        private long id;
        ...
    }
Example table generator XML
    <entity name="Employee" class="org.acme.Employee" access="FIELD">
        <attributes>
            <id name="id">
                <generated-value strategy="TABLE" generator="EMP_SEQ"/>
                <table-generator name="EMP_SEQ" table="SEQUENCE_TABLE" pk-column-name="SEQ_NAME"
                    value-column-name="SEQ_COUNT" pk-column-value="EMP_SEQ"/>
            </id>
        </attributes>
    </entity>
Common Problems
Error when allocating a sequence number.
- Errors such as "table not found" or "invalid column" can occur if you do not have a sequence table defined in your database, or if its schema does not match what you configured or what your JPA provider expects by default. Ensure you create the sequence table correctly, or configure your @TableGenerator to match the table that you created, or let your JPA provider create your tables for you (most JPA providers support schema creation). You may also get an error such as "sequence not found"; this means you did not create a row in the table for your sequence. You must insert an initial row in the sequence table for your sequence with the initial id (i.e. INSERT INTO SEQUENCE_TABLE (SEQ_NAME, SEQ_COUNT) VALUES ('EMP_SEQ', 0)), or let your JPA provider create your schema for you.
Deadlock or poor concurrency in the sequence table.
- See concurrency issues.
Sequence objects
Sequence objects use special database objects to generate ids. Sequence objects are only supported in some databases, such as Oracle and Postgres. In Oracle a SEQUENCE object has a name, an INCREMENT, and other database object settings. Each time <sequence>.NEXTVAL is selected, the sequence is incremented by the INCREMENT.
Sequence objects provide the optimal sequencing option, as they are the most efficient and have the best concurrency; however, they are the least portable, as most databases do not support them. Sequence objects support sequence preallocation by setting the INCREMENT on the database sequence object to the sequence preallocation size.
In JPA the @SequenceGenerator annotation or <sequence-generator> element is used to define a sequence object. The SequenceGenerator defines a sequenceName for the name of the database sequence object, and an allocationSize for the sequence preallocation size or sequence object INCREMENT.
Example sequence generator annotation
    ...
    @Entity
    public class Employee {
        @Id
        @GeneratedValue(strategy=GenerationType.SEQUENCE, generator="EMP_SEQ")
        @SequenceGenerator(name="EMP_SEQ", sequenceName="EMP_SEQ", allocationSize=100)
        private long id;
        ...
    }
Example sequence generator XML
    <entity name="Employee" class="org.acme.Employee" access="FIELD">
        <attributes>
            <id name="id">
                <generated-value strategy="SEQUENCE" generator="EMP_SEQ"/>
                <sequence-generator name="EMP_SEQ" sequence-name="EMP_SEQ" allocation-size="100"/>
            </id>
        </attributes>
    </entity>
Common Problems
Error when allocating a sequence number.
- Errors such as "sequence not found" can occur if you do not have a SEQUENCE object defined in your database. Ensure you create the sequence object, or let your JPA provider create your schema for you (most JPA providers support schema creation). When creating your sequence object, ensure the sequence's INCREMENT matches your SequenceGenerator's allocationSize. The DDL to create a sequence object depends on the database; for Oracle it is CREATE SEQUENCE EMP_SEQ INCREMENT BY 100 START WITH 100.
Invalid, duplicate or negative sequence numbers.
- This can occur if your sequence object's INCREMENT does not match your allocationSize. The JPA provider then assumes it got back more sequence values than it really did, and ends up with duplicate or negative ids. This can also occur on some JPA providers if your sequence object's START WITH is 0 instead of a value equal to or greater than the allocationSize.
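The effect of a mismatch can be reproduced with a small simulation. It assumes a provider that treats each NEXTVAL as the last id of a block of allocationSize ids; that is one possible convention, and providers differ. DbSequence and SequenceBlocks are hypothetical helper names, not JPA or database APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulated database sequence object (hypothetical helper). */
final class DbSequence {
    private long value;
    private final long increment;

    DbSequence(long startWith, long increment) {
        // Position the sequence so that the first nextVal() returns startWith.
        this.value = startWith - increment;
        this.increment = increment;
    }

    long nextVal() {
        value += increment;
        return value;
    }
}

final class SequenceBlocks {
    /**
     * Ids derived from one NEXTVAL, assuming the provider treats NEXTVAL
     * as the last id of a block of allocationSize ids.
     */
    static List<Long> block(long nextVal, int allocationSize) {
        List<Long> ids = new ArrayList<>();
        for (long id = nextVal - allocationSize + 1; id <= nextVal; id++) {
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Matching settings: START WITH 100, INCREMENT BY 100, allocationSize 100.
        DbSequence good = new DbSequence(100, 100);
        System.out.println(block(good.nextVal(), 100).get(0)); // first id of block 1
        System.out.println(block(good.nextVal(), 100).get(0)); // first id of block 2

        // Mismatch: INCREMENT BY 1 but allocationSize 100.
        DbSequence bad = new DbSequence(1, 1);
        System.out.println(block(bad.nextVal(), 100).get(0)); // negative id
    }
}
```

With matching settings the blocks are 1 to 100 and 101 to 200. With INCREMENT BY 1 and an allocationSize of 100, the first block runs from -98 to 1 and the second from -97 to 2, so the blocks overlap almost entirely and contain negative values.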
Identity sequencing
Identity sequencing uses special IDENTITY columns in the database to allow the database to automatically assign an id to the object when its row is inserted. Identity columns are supported in many databases, such as MySQL, DB2, SQL Server, Sybase and Postgres. Oracle versions before 12c do not support IDENTITY columns, but they can be simulated using sequence objects and triggers.
Although identity sequencing seems like the easiest method to assign an id, it has several issues. One is that, since the id is not assigned by the database until the row is inserted, the id cannot be obtained in the object until after a commit or a flush call. Identity sequencing also does not allow for sequence preallocation, so it can require a select for each object that is inserted, potentially causing a major performance problem. In general it is therefore not recommended.
In JPA there is no annotation or element for identity sequencing as there is no additional information to specify. Only the GeneratedValue's strategy needs to be set to IDENTITY.
Example identity annotation
    ...
    @Entity
    public class Employee {
        @Id
        @GeneratedValue(strategy=GenerationType.IDENTITY)
        private long id;
        ...
    }
Example identity XML
    <entity name="Employee" class="org.acme.Employee" access="FIELD">
        <attributes>
            <id name="id">
                <generated-value strategy="IDENTITY"/>
            </id>
        </attributes>
    </entity>
Common Problems
null is inserted into the database, or error on insert.
- This typically occurs because the @Id was not configured to use @GeneratedValue(strategy=IDENTITY). Ensure it is configured correctly. It could also be that your JPA provider does not support identity sequencing on the database platform that you are using, or that you have not configured your database platform. Most providers require that you set the database platform through a persistence.xml property; most providers also allow you to customize your own platform if it is not directly supported. It may also be that you did not set the primary key column in your table to be an identity type.
Object's id is not assigned after persist.
- Identity sequencing requires the insert to occur before the id can be assigned, so the id is not assigned on persist as it is with other types of sequencing. You must either call commit() on the current transaction, or call flush() on the EntityManager. It may also be that you did not set the primary key column in your table to be an identity type.
Poor insert performance.
- Identity sequencing does not support sequence preallocation, so it requires a select after each insert, in some cases doubling the insert cost. Consider using a sequence table or sequence object to allow sequence preallocation.
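A rough cost model makes the difference concrete. It is a simplification under stated assumptions: every insert and every id lookup counts as one database round trip, and InsertCost is a hypothetical helper. Actual costs depend on the provider and driver; some JDBC drivers can return generated keys together with the insert.

```java
/** Simplified round-trip counts for inserting objects (illustrative model only). */
final class InsertCost {
    /** Identity: one INSERT plus one SELECT of the generated id per object (assumed). */
    static long identityRoundTrips(long objects) {
        return 2 * objects;
    }

    /** Pre-allocation: one INSERT per object plus one sequence access per block of ids. */
    static long preallocatedRoundTrips(long objects, int allocationSize) {
        long blocks = (objects + allocationSize - 1) / allocationSize; // ceiling division
        return objects + blocks;
    }

    public static void main(String[] args) {
        System.out.println(identityRoundTrips(1000));          // 2000
        System.out.println(preallocatedRoundTrips(1000, 100)); // 1010
    }
}
```

Under this model, inserting 1,000 objects costs 2,000 round trips with identity sequencing and 1,010 with a pre-allocation size of 100.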
Advanced
Composite Primary Keys
A composite primary key is one that is made up of several columns in the table. A composite primary key can be used if no single column in the table is unique. It is normally more efficient and much simpler to have a singleton primary key, such as a generated sequence number, but sometimes a composite primary key is desirable or unavoidable.
Composite primary keys can be common in legacy database schemas, where cascaded keys are sometimes used. This is where you have a model where dependent objects include their parent's primary key, i.e. COMPANY's primary key is COMPANY_ID, DEPARTMENT's primary key is composed of COMPANY_ID and DEP_ID, EMPLOYEE's primary key is composed of COMPANY_ID, DEP_ID, and EMP_ID, and so on. Some OO Java designers may find this type of model disgusting, but some DBAs actually think it is the correct model. Issues with the model include the obvious fact that employees cannot switch departments, but also that foreign key relationships become more complex and all primary key queries, updates, deletes and caching become less efficient. On the plus side, each department has control over its own ids, and if you need to partition the EMPLOYEE table, you can easily do so based on the COMPANY_ID or DEP_ID, as these are included in every query.
Other common usages of composite primary keys include many-to-many relationships where the join table has additional columns, so the table is mapped to an object whose primary key consists of both foreign key columns. Another is dependent or aggregate one-to-many relationships, where the child object's primary key consists of its parent's primary key and a locally unique field.
There are two methods of declaring a composite primary key in JPA, IdClass and EmbeddedId.
Id Class
An IdClass defines a separate Java class to represent the primary key. It is defined through the @IdClass annotation or <id-class> XML element. The IdClass must define an attribute (field/property) that mirrors each Id attribute in the entity. It must have the same attribute name and type. When using an IdClass you are still required to mark each Id attribute in the entity with @Id.
The main purpose of the IdClass is to be used as the structure passed to the EntityManager find() and getReference() API. Some JPA products also use the IdClass as a cache key to track an object's identity. Because of this, it is required (depending on JPA product) to implement an equals() and hashCode() method on the IdClass. Ensure that the equals() method checks each part of the primary key, and correctly uses equals for objects and == for primitives. Ensure that the hashCode() method will return the same value for two equal objects.
- TopLink / EclipseLink : Do not require the implementation of equals() or hashCode() in the id class.
Example id class annotation
    ...
    @Entity
    @IdClass(EmployeePK.class)
    public class Employee {
        @Id
        private long employeeId;
        @Id
        private long companyId;
        @Id
        private long departmentId;
        ...
    }
Example id class XML
    <entity name="Employee" class="org.acme.Employee" access="FIELD">
        <id-class class="org.acme.EmployeePK"/>
        <attributes>
            <id name="employeeId"/>
            <id name="companyId"/>
            <id name="departmentId"/>
        </attributes>
    </entity>
Example id class
    ...
    public class EmployeePK implements java.io.Serializable {
        private long employeeId;
        private long companyId;
        private long departmentId;

        // The JPA specification requires a public no-argument constructor on id classes.
        public EmployeePK() {
        }

        public EmployeePK(long employeeId, long companyId, long departmentId) {
            this.employeeId = employeeId;
            this.companyId = companyId;
            this.departmentId = departmentId;
        }

        public boolean equals(Object object) {
            if (object instanceof EmployeePK) {
                EmployeePK pk = (EmployeePK) object;
                return employeeId == pk.employeeId
                    && companyId == pk.companyId
                    && departmentId == pk.departmentId;
            } else {
                return false;
            }
        }

        public int hashCode() {
            // The sum is a long, so it must be narrowed to an int explicitly.
            return (int) (employeeId + companyId + departmentId);
        }
    }
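Why equals() and hashCode() matter can be shown without any JPA dependency, using a HashMap as a stand-in for a provider's identity cache. The PhoneKey class below is a hypothetical key with one primitive and one object part, so it exercises both comparison rules; it is a sketch of the idea, not how any particular provider caches objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical composite key with a primitive part and an object part. */
final class PhoneKey {
    private final long ownerId; // primitive part: compared with ==
    private final String type;  // object part: compared with equals()

    PhoneKey(long ownerId, String type) {
        this.ownerId = ownerId;
        this.type = type;
    }

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (!(object instanceof PhoneKey)) {
            return false;
        }
        PhoneKey other = (PhoneKey) object;
        return ownerId == other.ownerId && Objects.equals(type, other.type);
    }

    @Override
    public int hashCode() {
        // Built from the same fields as equals(), so equal keys hash alike.
        return Objects.hash(ownerId, type);
    }
}

final class IdentityCacheDemo {
    /** Stores a value under one key instance and looks it up with a different, equal instance. */
    static String lookup() {
        Map<PhoneKey, String> cache = new HashMap<>();
        cache.put(new PhoneKey(7L, "home"), "613-555-0100");
        return cache.get(new PhoneKey(7L, new String("home")));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // 613-555-0100
    }
}
```

If PhoneKey did not override equals() and hashCode(), the lookup would return null, because the second key is a different object instance.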
Embedded Id
An EmbeddedId defines a separate Embeddable Java class to contain the entity's primary key. It is defined through the @EmbeddedId annotation or <embedded-id> XML element. The EmbeddedId's Embeddable class must define each id attribute for the entity using Basic mappings. All attributes in the EmbeddedId's Embeddable are assumed to be part of the primary key.
The EmbeddedId is also used as the structure passed to the EntityManager find() and getReference() API. Some JPA products also use the EmbeddedId as a cache key to track an object's identity. Because of this, it is required (depending on JPA product) to implement an equals() and hashCode() method on the EmbeddedId. Ensure that the equals() method checks each part of the primary key, and correctly uses equals for objects and == for primitives. Ensure that the hashCode() method will return the same value for two equal objects.
- TopLink / EclipseLink : Do not require the implementation of equals() or hashCode() in the id class.
Example embedded id annotation
    ...
    @Entity
    public class Employee {
        @EmbeddedId
        private EmployeePK id;
        ...
    }
Example embedded id XML
    <entity name="Employee" class="org.acme.Employee" access="FIELD">
        <attributes>
            <embedded-id name="id"/>
        </attributes>
    </entity>
    <embeddable class="org.acme.EmployeePK" access="FIELD">
        <attributes>
            <basic name="employeeId"/>
            <basic name="companyId"/>
            <basic name="departmentId"/>
        </attributes>
    </embeddable>
Example embedded id class
    ...
    @Embeddable
    public class EmployeePK implements java.io.Serializable {
        @Basic
        private long employeeId;
        @Basic
        private long companyId;
        @Basic
        private long departmentId;

        // The JPA specification requires a public no-argument constructor.
        public EmployeePK() {
        }

        public EmployeePK(long employeeId, long companyId, long departmentId) {
            this.employeeId = employeeId;
            this.companyId = companyId;
            this.departmentId = departmentId;
        }

        public boolean equals(Object object) {
            if (object instanceof EmployeePK) {
                EmployeePK pk = (EmployeePK) object;
                return employeeId == pk.employeeId
                    && companyId == pk.companyId
                    && departmentId == pk.departmentId;
            } else {
                return false;
            }
        }

        public int hashCode() {
            // The sum is a long, so it must be narrowed to an int explicitly.
            return (int) (employeeId + companyId + departmentId);
        }
    }
Primary Keys through OneToOne Relationships
A common model is to have a dependent object share the primary key of its parent. In the case of a OneToOne the child's primary key is the same as the parent's, and in the case of a ManyToOne the child's primary key is composed of the parent's primary key and another locally unique field.
Unfortunately JPA does not handle this model well, and things become complicated, so to make your life a little easier you may consider defining a generated unique id for the child. It would be simple if JPA allowed the @Id annotation on a OneToOne or ManyToOne mapping, but it does not. JPA requires that all @Id mappings be Basic mappings, so if your Id comes from a foreign key column through a OneToOne or ManyToOne mapping, you must also define a Basic @Id mapping for the foreign key column. The reason for this is in part that the Id must be a simple object for identity and caching purposes, and for use in the IdClass or the EntityManager find() API.
Because you now have two mappings for the same foreign key column, you must define which one will be written to the database (it must be the Basic one), so the OneToOne or ManyToOne foreign key must be defined to be read-only. This is done by setting the JoinColumn attributes insertable and updatable to false, or by using @PrimaryKeyJoinColumn instead of @JoinColumn.
A side effect of having two mappings for the same column is that you now have to keep the two in sync. This is typically done by having the set method for the OneToOne attribute also set the Basic attribute value to the target object's id. This can become very complicated if the target object's primary key is a GeneratedValue; in this case you must ensure that the target object's id has been assigned before relating the two objects.
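The synchronization, together with a guard for an id that has not been generated yet, can be sketched in plain Java. The classes Owner and OwnedAddress are hypothetical stand-ins without JPA annotations, and treating 0 as "not yet assigned" is an assumption of this sketch.

```java
/** Hypothetical parent whose id is generated later; 0 means "not yet assigned" in this sketch. */
final class Owner {
    private long id;

    long getId() {
        return id;
    }

    void assignId(long id) {
        this.id = id;
    }
}

/** Hypothetical child that shares its parent's primary key. */
final class OwnedAddress {
    private long ownerId; // would be the Basic @Id mapping
    private Owner owner;  // would be the read-only OneToOne mapping

    void setOwner(Owner owner) {
        if (owner.getId() == 0) {
            // The generated id must exist before the two objects are related.
            throw new IllegalStateException("owner id has not been assigned yet");
        }
        this.owner = owner;
        this.ownerId = owner.getId(); // keep both mappings in sync
    }

    long getOwnerId() {
        return ownerId;
    }

    Owner getOwner() {
        return owner;
    }
}
```

Failing fast here surfaces the ordering problem at the point where the objects are related, rather than as a confusing key of 0 at insert time.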
Sometimes I think that JPA primary keys would be much simpler if they were just defined on the entity using a collection of Columns instead of mixing them up with the attribute mapping. This would leave you free to map the primary key field in any manner you desired. A generic List could be used to pass the primary key to find() methods, and it would be the JPA provider's responsibility to hash and compare the primary key correctly, instead of the user's IdClass. But perhaps for simple singleton primary key models the JPA model is more straightforward.
- TopLink / EclipseLink : Allow the primary key to be specified as a list of columns instead of using Id mappings. This allows OneToOne and ManyToOne mapping foreign keys to be used as the primary key without requiring a duplicate mapping. It also allows the primary key to be defined through any other mapping type. This is set through using a DescriptorCustomizer and the ClassDescriptor addPrimaryKeyFieldName API.
- Hibernate : Allows the @Id annotation to be used on a OneToOne or ManyToOne mapping.
Example OneToOne id annotation
    ...
    @Entity
    public class Address {
        @Id
        @Column(name="OWNER_ID")
        private long ownerId;

        @OneToOne
        @PrimaryKeyJoinColumn(name="OWNER_ID", referencedColumnName="EMP_ID")
        private Employee owner;
        ...
        public void setOwner(Employee owner) {
            this.owner = owner;
            this.ownerId = owner.getId();
        }
        ...
    }
Example OneToOne id XML
    <entity name="Address" class="org.acme.Address" access="FIELD">
        <attributes>
            <id name="ownerId">
                <column name="OWNER_ID"/>
            </id>
            <one-to-one name="owner">
                <primary-key-join-column name="OWNER_ID" referenced-column-name="EMP_ID"/>
            </one-to-one>
        </attributes>
    </entity>
Example ManyToOne id annotation
    ...
    @Entity
    @IdClass(PhonePK.class)
    public class Phone {
        @Id
        @Column(name="OWNER_ID")
        private long ownerId;

        @Id
        private String type;

        // Read-only join column, so that only the Basic ownerId mapping is written.
        @ManyToOne
        @JoinColumn(name="OWNER_ID", referencedColumnName="EMP_ID", insertable=false, updatable=false)
        private Employee owner;
        ...
        public void setOwner(Employee owner) {
            this.owner = owner;
            this.ownerId = owner.getId();
        }
        ...
    }
Example ManyToOne id XML
    <entity name="Phone" class="org.acme.Phone" access="FIELD">
        <id-class class="org.acme.PhonePK"/>
        <attributes>
            <id name="ownerId">
                <column name="OWNER_ID"/>
            </id>
            <id name="type"/>
            <many-to-one name="owner">
                <join-column name="OWNER_ID" referenced-column-name="EMP_ID" insertable="false" updatable="false"/>
            </many-to-one>
        </attributes>
    </entity>
Advanced Sequencing
Concurrency and Deadlocks
One issue with table sequencing is that the sequence table can become a concurrency bottleneck, even causing deadlocks. If the sequence ids are allocated in the same transaction as the insert, this can cause poor concurrency, as the sequence row will be locked for the duration of the transaction, preventing any other transaction that needs to allocate a sequence id. In some cases the entire sequence table or the table page could be locked causing even transactions allocating other sequences to wait or even deadlock. If a large sequence pre-allocation size is used this becomes less of an issue, because the sequence table is rarely accessed. Some JPA providers use a separate (non-JTA) connection to allocate the sequence ids in, avoiding or limiting this issue. In this case, if you use a JTA data-source connection, it is important to also include a non-JTA data-source connection in your persistence.xml.
Guaranteeing Sequential Ids
Table sequencing also allows for truly sequential ids to be allocated. Sequence and identity sequencing are non-transactional and typically cache values on the database, leading to large gaps in the ids that are allocated. Typically this is not an issue, and the gaps are the price of good performance. However, if performance and concurrency are less of a concern and truly sequential ids are desired, then a table sequence can be used. By setting the allocationSize of the sequence to 1 and ensuring the sequence ids are allocated in the same transaction as the insert, you can guarantee sequence ids without gaps (but generally it is much better to live with the gaps and have good performance).
Running Out of Numbers
One paranoid delusional fear that programmers frequently have is running out of sequence numbers. Since most sequence strategies just keep incrementing a number, it is unavoidable that you will eventually run out. However, as long as a large enough numeric precision is used to store the sequence id, this is not an issue. For example, if you stored your id in a NUMBER(5) column, this would allow 99,999 different ids, which on most systems would eventually run out. However, if you store your id in a NUMBER(10) column, which is more typical, this would store 9,999,999,999 ids, or one id each second for about 300 years (longer than most databases exist). But perhaps your system will process a lot of data, and (hopefully) be around a very long time. If you store your id in a NUMBER(20) this would be 99,999,999,999,999,999,999 ids, or one id each millisecond for about 3,000,000,000 years, which is pretty safe.
But you also need to store this id in Java. If you store the id in a Java int, this would be a 32-bit number, which is 4,294,967,296 different values, or one id each second for about 136 years (Java's int is signed, so only about half of these values are positive). If you instead use a long, this would be a 64-bit number, which is 18,446,744,073,709,551,616 different values, or one id each millisecond for about 600,000,000 years, which is pretty safe.
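The arithmetic behind these estimates is easy to check. The sketch below assumes a 365-day year and uses a hypothetical helper class, IdCapacity; BigInteger is used because the larger ranges do not fit in a long.

```java
import java.math.BigInteger;

/** Back-of-the-envelope check of how long a range of ids lasts (assumes a 365-day year). */
final class IdCapacity {
    private static final BigInteger SECONDS_PER_YEAR = BigInteger.valueOf(365L * 24 * 60 * 60);

    /** Whole years until the given number of ids is used up at idsPerSecond allocations per second. */
    static BigInteger yearsToExhaust(BigInteger ids, long idsPerSecond) {
        return ids.divide(BigInteger.valueOf(idsPerSecond)).divide(SECONDS_PER_YEAR);
    }

    public static void main(String[] args) {
        BigInteger number10 = new BigInteger("9999999999");       // NUMBER(10) column
        BigInteger positiveInts = BigInteger.valueOf(Integer.MAX_VALUE);
        BigInteger all64Bit = BigInteger.ONE.shiftLeft(64);       // 2^64 values

        System.out.println(yearsToExhaust(number10, 1));          // one id per second
        System.out.println(yearsToExhaust(positiveInts, 1));      // one id per second
        System.out.println(yearsToExhaust(all64Bit, 1000));       // one id per millisecond
    }
}
```

For a NUMBER(10) column at one id per second this gives 317 years, and for the full 64-bit range at one id per millisecond it gives 584,942,417 years.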
Customizing
JPA supports three different strategies for generating ids; however, there are many other methods. Normally the JPA strategies are sufficient, so you would only use a different method in a legacy situation.
Sometimes the application has an application-specific strategy for generating ids, such as prefixing ids with the country code or branch number. There are several ways to integrate a custom id generation strategy; the simplest is to define the id as a normal id and have the application assign the id value when the object is created.
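As an illustration of an application-assigned id, the sketch below prefixes a counter with a branch code. The class BranchIdGenerator, the "OTT" code and the id format are all hypothetical. An in-memory counter is only unique within one JVM and is lost on restart, so a real application would need a durable source for the counter.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical generator for ids such as "OTT-0000124" (branch code, dash, zero-padded counter). */
final class BranchIdGenerator {
    private final String branchCode;
    private final AtomicLong counter;

    BranchIdGenerator(String branchCode, long lastUsed) {
        this.branchCode = branchCode;
        this.counter = new AtomicLong(lastUsed);
    }

    /** Returns the next id; safe to call from several threads. */
    String nextId() {
        return String.format(Locale.ROOT, "%s-%07d", branchCode, counter.incrementAndGet());
    }

    public static void main(String[] args) {
        BranchIdGenerator generator = new BranchIdGenerator("OTT", 123);
        System.out.println(generator.nextId()); // OTT-0000124
    }
}
```

The application would call nextId() when constructing a new object and store the result in the attribute mapped with @Id, before the object is persisted.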
Some JPA products provide additional sequencing and id generation options, and configuration hooks.
- TopLink, EclipseLink : Several additional sequencing options are provided. A UnaryTableSequence allows a single column table to be used. A QuerySequence allows for custom SQL or stored procedures to be used. An API also exists to allow a user to supply their own code for allocating ids.
- Hibernate : A GUID id generation option is provided through the @GenericGenerator annotation.
Primary Keys through Triggers
A database table can be defined to have a trigger that automatically assigns its primary key. This is normally not a good idea (although some DBAs may think it is), and it is better to use a JPA provider generated sequence id, or assign the id in the application. The main issue with the id being assigned in a trigger is that the application and object require this value back. For non-primary key values assigned through triggers it is possible to refresh the object after committing or flushing the object to obtain the values back. However, this is not possible for the id, as the id is required to refresh an object.
If you have an alternative way to select the id generated by the trigger, such as selecting the object's row using another unique field, you could issue this SQL select after the insert to obtain the id and set it back in the object. You could perform this select in a JPA @PostPersist event. Some JPA providers may not allow a query execution during an event, and they may not pick up a change to an object during an event callback, so there may be issues with doing this. Also, some JPA providers may not allow the primary key to be unassigned (null) when not using a GeneratedValue, so you may have issues. Some JPA providers have built-in support for returning values assigned in a trigger (or stored procedure) back into the object.
- TopLink / EclipseLink : Provide a ReturningPolicy that allows for any field values, including the primary key, to be returned from the database after an insert or update. This is defined through the @ReturnInsert and @ReturnUpdate annotations, or the <return-insert> and <return-update> XML elements in the eclipselink-orm.xml.
Primary Keys through Events
If the application generates its own id instead of using a JPA GeneratedValue, it is sometimes desirable to perform this id generation in a JPA event, instead of the application code having to generate and set the id. In JPA this can be done through the @PrePersist event.
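A minimal sketch of such a callback follows, written without JPA imports so that it stays self-contained. In a real entity the assignId() method would carry the @PrePersist annotation and the id field would be mapped with @Id. The Invoice class and the choice of a random UUID as the id are assumptions for illustration only.

```java
import java.util.UUID;

/** Hypothetical entity whose id is generated by the application just before it is persisted. */
final class Invoice {
    private String id; // would be the @Id attribute in a real entity

    /** In a real entity this method would be annotated with @PrePersist. */
    void assignId() {
        if (id == null) {
            // Only generate an id when the application has not set one already.
            id = UUID.randomUUID().toString();
        }
    }

    String getId() {
        return id;
    }
}
```

The null check keeps the callback from overwriting an id that the application has already set.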
No Primary Key
Sometimes your object or table has no primary key. The best solution in this case is normally to add a generated id to the object and table. If you do not have this option, sometimes there is a column or set of columns in the table that make up a unique value. You can use this unique set of columns as your id in JPA. The JPA Id does not always have to match the database table primary key constraint, nor is a primary key or a unique constraint required.
If your table truly has no unique columns, then use all of the columns as the id. Typically when this occurs the data is read-only, so even if the table allows duplicate rows with the same values, the objects will be the same anyway, so it does not matter that JPA thinks they are the same object. The issue with allowing updates and deletes is that there is no way to uniquely identify the object's row, so all of the matching rows will be updated or deleted.
If your object does not have an id, but its table does, this is fine. Make the object an Embeddable object; embeddable objects do not have ids. You will need an Entity that contains this Embeddable to persist and query it.

