Java Persistence/Persisting

From Wikibooks, open books for an open world

Persisting

JPA uses the EntityManager API for runtime usage. The EntityManager represents the application session or dialog with the database. Each request, or each client, uses its own EntityManager to access the database. The EntityManager also represents a transaction context, and in a typical stateless model a new EntityManager is created for each transaction. In a stateful model, an EntityManager may match the lifecycle of a client's session.

The EntityManager provides an API for all required persistence operations. These include the CRUD operations described in the sections below, such as persist, merge, and remove.

The EntityManager is an object-oriented API, so it does not map directly onto database SQL or DML operations. For example, to update an object you just need to read the object, change its state through its set methods, and then call commit on the transaction. The EntityManager figures out which objects you changed and performs the correct updates to the database; there is no explicit update operation in JPA.
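As a rough illustration of how changes can be detected without an explicit update call, the following plain-Java sketch models a persistence context that snapshots each object's state when it is read and compares against that snapshot at commit. It is a toy model only, not how any particular JPA provider works; ToyContext, ToyEmployee, and the EMPLOYEE and F_NAME names are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical entity used only for this sketch.
class ToyEmployee {
    final long id;
    String firstName;

    ToyEmployee(long id, String firstName) {
        this.id = id;
        this.firstName = firstName;
    }
}

// Toy stand-in for a persistence context: it remembers the state each object
// had when it was read and compares against it at commit time.
class ToyContext {
    private final Map<Long, ToyEmployee> managed = new HashMap<>();
    private final Map<Long, String> snapshots = new HashMap<>();

    // Register an object as "read" and record its original state.
    ToyEmployee read(long id, String firstNameInDatabase) {
        ToyEmployee employee = new ToyEmployee(id, firstNameInDatabase);
        managed.put(id, employee);
        snapshots.put(id, firstNameInDatabase);
        return employee;
    }

    // Produce an UPDATE only for objects whose state differs from the snapshot.
    // The table and column names are invented for illustration.
    List<String> commit() {
        List<String> statements = new ArrayList<>();
        for (ToyEmployee employee : managed.values()) {
            if (!Objects.equals(employee.firstName, snapshots.get(employee.id))) {
                statements.add("UPDATE EMPLOYEE SET F_NAME = '" + employee.firstName
                        + "' WHERE ID = " + employee.id);
                snapshots.put(employee.id, employee.firstName);
            }
        }
        return statements;
    }
}
```

In this model, reading two employees and changing only one of them through its field yields a single UPDATE at commit, and a second commit with no further changes yields none.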

Detached vs Managed

JPA defines two main states for an object in a given persistence context: managed and detached.

A managed object is one that was read in the current persistence context (EntityManager/JTA transaction). A managed object is registered with the persistence context and the persistence context will track changes to that object and maintain its object identity. If the same object is read again, in the same persistence context, or traversed through another managed object's relationship, the same identical (==) object will be returned. Calling persist on a new object will also make it become managed. Calling merge on a detached object will return the managed copy of the object. An object should never be managed by more than one persistence context. An object will be managed by its persistence context until the persistence context is cleared through clear, or the object is forced to be detached through detach. A removed object will no longer be managed after a flush or commit. On a rollback, all managed objects will become detached. In a JTA managed EntityManager all managed objects will be detached on any JTA commit or rollback.

A detached object is one that is not managed in the current persistence context. This could be an object read through a different persistence context, or an object that was cloned or serialized. A new object is also considered detached until persist is called on it. An object that was removed and then flushed or committed will become detached. An object can be managed in one persistence context and, at the same time, detached with respect to another.

A managed object should only ever reference other managed objects, and a detached object should only reference other detached objects. Avoid relating or mixing detached and managed objects. This will normally lead to issues, as your application could access two copies of the same object, causing loss of changes or stale data. Incorrectly relating managed and detached objects is probably one of the most common issues users run into in JPA.
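The identity guarantee described above can be pictured with a small plain-Java model of an identity map. This is a sketch under simplifying assumptions, not the JPA API: ToyIdentityContext and ToyPerson are hypothetical names, and "reading" an object just constructs it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this sketch.
class ToyPerson {
    final long id;
    String name;

    ToyPerson(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Toy stand-in for a persistence context that keeps one instance per Id.
class ToyIdentityContext {
    private final Map<Long, ToyPerson> identityMap = new HashMap<>();

    // Return the already-managed instance if there is one, otherwise "read" a new one.
    ToyPerson find(long id) {
        return identityMap.computeIfAbsent(id, key -> new ToyPerson(key, "name-" + key));
    }

    // An object is managed here only if this context holds that exact instance.
    boolean contains(ToyPerson person) {
        return identityMap.get(person.id) == person;
    }

    // Clearing the context detaches everything it was managing.
    void clear() {
        identityMap.clear();
    }
}
```

Two find calls on the same context return the identical (==) instance, while a second context returns its own copy, which is detached from the point of view of the first. In real code, EntityManager.contains(entity) reports whether an instance is managed by that EntityManager.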

Persist

The EntityManager.persist() operation is used to insert a new object into the database. persist does not directly insert the object into the database: it just registers it as new in the persistence context (transaction). When the transaction is committed, or if the persistence context is flushed, then the object will be inserted into the database.

If the object uses a generated Id, the Id will normally be assigned to the object when persist is called, so persist can also be used to have an object's Id assigned. The one exception is IDENTITY sequencing: in this case the Id is only assigned on commit or flush, because the database only assigns the Id on INSERT. If the object does not use a generated Id, you should normally assign its Id before calling persist.

The persist operation must be called within a transaction; otherwise an exception may be thrown (a container-managed, transaction-scoped EntityManager throws TransactionRequiredException). The persist operation is in-place, in that the object being persisted will become part of the persistence context. The state of the object at the point of the commit of the transaction will be persisted, not its state at the point of the persist call.

persist should normally only be called on new objects. It may be called on existing objects that are part of the persistence context, but only for the purpose of cascading persist to any related new objects. If persist is called on an existing object that is not part of the persistence context, an exception may be thrown, or an insert may be attempted and fail with a database constraint error, or, if no constraints are defined, duplicate data may be inserted.

persist can only be called on Entity objects, not on Embeddable objects, or collections, or non-persistent objects. Embeddable objects are automatically persisted as part of their owning Entity.

Calling persist is not always required. If you relate a new object to an existing object that is part of the persistence context, and the relationship is cascade persist, then the new object will be automatically inserted when the transaction is committed, or when the persistence context is flushed.

Example persist

EntityManager em = getEntityManager();
em.getTransaction().begin();

Employee employee = new Employee();
employee.setFirstName("Bob");
Address address = new Address();
address.setCity("Ottawa");
employee.setAddress(address);

em.persist(employee);

em.getTransaction().commit();

Cascading Persist

Calling persist on an object will also cascade the persist operation across any relationship that is marked as cascade persist. If a relationship is not cascade persist, and a related object is new, then an exception may be thrown if you do not first call persist on the related object. Intuitively you may consider marking every relationship as cascade persist to avoid having to worry about calling persist on every object, but this can also lead to issues.

One issue with marking all relationships cascade persist is performance. On each persist call all of the related objects will need to be traversed and checked if they reference any new objects. This can actually lead to O(n²) performance issues if you mark all relationships cascade persist, and persist a large new graph of objects. If you just call persist on the root object, this is ok. However, if you call persist on each object in the graph, then you will traverse the entire graph for each object in the graph, and this can lead to a major performance issue. The JPA spec should probably define persist to only apply to new objects, not already part of the persistence context, but it requires persist apply to all objects, whether new, existing, or already persisted, so can have this issue.
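To make the cost argument concrete, the sketch below counts how many objects are visited when persist is modeled as a walk over everything reachable through cascade-persist references. It is a plain-Java toy model, not JPA code: ToyNode, CascadeCost, and the single next reference standing in for a cascade-persist relationship are assumptions for illustration, and it follows the behavior described above in which already persisted objects are still traversed.

```java
// Hypothetical node in an object graph; 'next' stands for a cascade-persist relationship.
class ToyNode {
    ToyNode next;
}

class CascadeCost {
    // Models one persist call: visit the object and everything reachable
    // through cascade-persist references, returning how many objects were visited.
    static long persist(ToyNode start) {
        long visited = 0;
        for (ToyNode node = start; node != null; node = node.next) {
            visited++;
        }
        return visited;
    }

    // Build a chain of n nodes, each referencing the next, and return them in order.
    static ToyNode[] chain(int n) {
        ToyNode[] nodes = new ToyNode[n];
        for (int i = n - 1; i >= 0; i--) {
            nodes[i] = new ToyNode();
            if (i + 1 < n) {
                nodes[i].next = nodes[i + 1];
            }
        }
        return nodes;
    }

    // Persist only the root: every object is visited once.
    static long persistRootOnly(ToyNode[] nodes) {
        return nodes.length == 0 ? 0 : persist(nodes[0]);
    }

    // Persist every object: each call re-walks the rest of the chain.
    static long persistEach(ToyNode[] nodes) {
        long total = 0;
        for (ToyNode node : nodes) {
            total += persist(node);
        }
        return total;
    }
}
```

For a chain of 1000 objects, persisting only the root visits 1000 objects, while persisting each object visits 1000 + 999 + ... + 1 = 500500, which grows quadratically with the size of the graph.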

A second issue is that if you remove an object to have it deleted, if you then call persist on the object, it will resurrect the object, and it will become persistent again. This may be desired if it is intentional, but the JPA spec also requires this behavior for cascade persist. So if you remove an object, but forget to remove a reference to it from a cascade persist relationship, the remove will be ignored.

I would recommend only marking relationships that are composite or privately owned as cascade persist.

Merge

The EntityManager.merge() operation is used to merge the changes made to a detached object into the persistence context. merge does not directly update the object in the database; it merges the changes into the persistence context (transaction). When the transaction is committed, or if the persistence context is flushed, then the object will be updated in the database.

Normally merge is not required, although it is frequently misused. To update an object you simply need to read it, then change its state through its set methods, then commit the transaction. The EntityManager will figure out everything that has been changed and update the database. merge is only required when you have a detached copy of a persistent object. A detached object is one that was read through a different EntityManager (or in a different transaction in a JEE managed EntityManager), or one that was cloned or serialized. A common case is a stateless SessionBean where the object is read in one transaction, then updated in another transaction. Since the update is processed in a different transaction, with a different EntityManager, the object must first be merged. The merge operation will look up the managed object for the detached object, and copy each of the detached object's changed attributes into the managed object, as well as cascading to any related objects marked as cascade merge.

The merge operation must be called within a transaction; otherwise an exception may be thrown (a container-managed, transaction-scoped EntityManager throws TransactionRequiredException). The merge operation is not in-place, in that the object being merged will never become part of the persistence context. Any further changes must be made to the managed object returned by merge, not to the detached object.

merge is normally called on existing objects, but can also be called on new objects. If the object is new, a new copy of the object is made and registered with the persistence context; the object passed to merge is not itself persisted.
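The copy-and-return shape of merge can be sketched in plain Java as follows. This is only a toy model of the behavior described above, not the JPA API or any provider's implementation; ToyMergeContext and ToyAccount are hypothetical names, and only a single attribute is copied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this sketch.
class ToyAccount {
    final long id;
    String owner;

    ToyAccount(long id, String owner) {
        this.id = id;
        this.owner = owner;
    }
}

// Toy stand-in for a persistence context; only the shape of merge is modeled.
class ToyMergeContext {
    private final Map<Long, ToyAccount> managed = new HashMap<>();

    ToyAccount merge(ToyAccount detached) {
        // Look up the managed instance, or register a new copy if none exists yet.
        ToyAccount target = managed.computeIfAbsent(detached.id, id -> new ToyAccount(id, null));
        // Copy state from the detached object into the managed one.
        target.owner = detached.owner;
        // The argument itself never becomes managed; callers must use the return value.
        return target;
    }

    boolean contains(ToyAccount account) {
        return managed.get(account.id) == account;
    }
}
```

In this model the returned object is managed and the argument is not, so a change made to the detached object after the call has no effect on the managed copy until merge is called again.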

merge can only be called on Entity objects, not on Embeddable objects, or collections, or non-persistent objects. Embeddable objects are automatically merged as part of their owning Entity.

Example merge

EntityManager em = createEntityManager();
Employee detached = em.find(Employee.class, id);
em.close();
...
em = createEntityManager();
em.getTransaction().begin();
Employee managed = em.merge(detached);
em.getTransaction().commit();

Cascading Merge

Calling merge on an object will also cascade the merge operation across any relationship that is marked as cascade merge. Even if the relationship is not cascade merge, the reference will still be merged. If the relationship is cascade merge, the relationship and each related object will be merged. Intuitively you may consider marking every relationship as cascade merge to avoid having to worry about calling merge on every object, but this is normally a bad idea.

One issue with marking all relationships cascade merge is performance. If you have an object with a lot of relationships, then each merge call may need to traverse a large graph of objects.

Another issue arises if your detached object is corrupt in some way. For example, say you have an Employee who has a manager, but that manager has a different copy of the detached Employee object as its managedEmployee. This may cause the same object to be merged twice, or at least make it unpredictable which copy will be merged, so you may not get the changes you expect. The same is true if you didn't change an object at all but some other user did: if merge cascades to this unchanged object, it will revert the other user's changes or throw an OptimisticLockException (depending on your locking policy). This is normally not desirable.

I would recommend only marking relationships that are composite or privately owned as cascade merge.

Transient Variables

Another issue with merge is transient variables. Since merge is normally used with object serialization, if a relationship was marked as transient (Java transient, not JPA transient), then the detached object will contain null, and null will be merged into the object, even though it is not desired. This will occur even if the relationship was not cascade merge, as merge always merges the references to related objects. Normally transient is required when using serialization to avoid serializing the entire database when only a single, or small set of objects are required.
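The effect of Java transient on a serialized copy can be reproduced with standard Java serialization alone, with no JPA involved. The sketch below is a minimal illustration; Worker, Department, and TransientDemo are hypothetical classes introduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes for illustration; no JPA annotations are needed to show the effect.
class Department implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;

    Department(String name) {
        this.name = name;
    }
}

class Worker implements Serializable {
    private static final long serialVersionUID = 1L;
    String firstName;
    transient Department department; // Java transient: skipped by serialization

    Worker(String firstName, Department department) {
        this.firstName = firstName;
        this.department = department;
    }
}

class TransientDemo {
    // Serialize an object to bytes and read it back, as happens when it is sent to a client.
    static Worker roundTrip(Worker original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Worker) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }
}
```

After the round trip the copy keeps its firstName but its department field is null. If such a copy were then passed to merge, that null is what would be merged into the managed object.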

One solution is to avoid marking anything transient, and instead use LAZY relationships in JPA to limit what is serialized (lazy relationships that have not been accessed, will normally not be serialized). Another solution is to manually merge in your own code.

Some JPA providers provide extended merge operations, such as allowing a shallow merge or deep merge, or merging without merging references.

Remove

The EntityManager.remove() operation is used to delete an object from the database. remove does not directly delete the object from the database; it marks the object to be deleted in the persistence context (transaction). When the transaction is committed, or if the persistence context is flushed, then the object will be deleted from the database.

The remove operation must be called within a transaction; otherwise an exception may be thrown (a container-managed, transaction-scoped EntityManager throws TransactionRequiredException). The remove operation must be called on a managed object, not on a detached object. Generally you must first find the object before removing it, although it is possible to call EntityManager.getReference() on the object's Id and call remove on the reference. Depending on how your JPA provider optimizes getReference and remove, this may not require reading the object from the database.

remove can only be called on Entity objects, not on Embeddable objects, or collections, or non-persistent objects. Embeddable objects are automatically removed as part of their owning Entity.

Example remove

EntityManager em = getEntityManager();
em.getTransaction().begin();
Employee employee = em.find(Employee.class, id);
em.remove(employee);
em.getTransaction().commit();

Cascading Remove

Calling remove on an object will also cascade the remove operation across any relationship that is marked as cascade remove.

Note that cascade remove only affects the remove call. If you have a relationship that is cascade remove, and you remove an object from the collection or dereference an object, it will not be deleted. You must explicitly call remove on the object to have it deleted. Some JPA providers offer an extension for this behavior, and JPA 2.0 adds an orphanRemoval option on OneToMany and OneToOne mappings to provide it.

Reincarnation

Normally an object that has been removed stays removed, but in some cases you may need to bring the object back to life. This normally occurs with natural ids, not generated ones, where a new object would always get a new id. Generally the desire to reincarnate an object stems from a bad object model design, typically the desire to change the class type of an object (which cannot be done in Java, so a new object must be created). Normally the best solution is to change your object model so that your object holds a type object which defines its type, instead of using inheritance. But sometimes reincarnation is desirable.

When done in two separate transactions, this is normally fine: first you remove the object, then you persist it back. This can be more complex if you wish to remove and persist an object with the same Id in the same transaction. If you call remove on an object, then call persist on the same object, it will simply no longer be removed. If you call remove on an object, then call persist on a different object with the same Id, the behavior may depend on your JPA provider, and it probably will not work. If you call flush after calling remove, then call persist, then the object should be successfully reincarnated. Note that it will be a different row: the existing row will have been deleted, and a new row inserted. If you wish the same row to be updated, you may need to resort to using a native SQL update query.

Advanced

Refresh

The EntityManager.refresh() operation is used to refresh an object's state from the database. This will revert any non-flushed changes made in the current transaction to the object, and refresh its state to what is currently defined on the database. If a flush has occurred, it will refresh to what was flushed. Refresh must be called on a managed object, so you may first need to find the object with the active EntityManager if you have a non-managed instance.

Refresh will cascade to any relationships marked cascade refresh, although it may be done lazily depending on your fetch type, so you may need to access the relationship to trigger the refresh. refresh can only be called on Entity objects, not on Embeddable objects, or collections, or non-persistent objects. Embeddable objects are automatically refreshed as part of their owning Entity.

Refresh can be used to revert changes, or if your JPA provider supports caching, it can be used to refresh stale cached data. Sometimes it is desirable to have a Query or find operation refresh the results. Unfortunately JPA 1.0 does not define how this can be done. Some JPA providers offer query hints to allow refreshing to be enabled on a query.

TopLink / EclipseLink : Define a query hint "eclipselink.refresh" to allow refreshing to be enabled on a query.

JPA 2.0 defines a set of standard query hints for refreshing, see JPA 2.0 Cache APIs.

Example refresh

EntityManager em = getEntityManager();
em.refresh(employee);

Lock

See Read and Write Locking.

Get Reference

The EntityManager.getReference() operation is used to obtain a handle to an object without requiring it to be loaded. It is similar to the find operation, but may return a proxy or unfetched object. JPA does not require that getReference avoid loading the object, so some JPA providers may not support it and just perform a normal find operation. The object returned by getReference should appear to be a normal object. If you access any method or attribute other than its Id, it will trigger itself to be loaded from the database.

The intention of getReference is that it could be used on an insert or update operation as a stand-in for a related object, if you only have its Id and want to avoid loading the object. Note that getReference does not verify the existence of the object as find does. If the object does not exist and you try to use the unfetched object in an insert or update you may get a foreign key constraint violation, or if you access the object it may trigger an exception.

Example getReference

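EntityManager em = getEntityManager();
em.getTransaction().begin();
// Obtain a stand-in for the manager from its Id, without necessarily loading it.
Employee manager = em.getReference(Employee.class, managerId);
Employee employee = new Employee();
...
em.persist(employee);
employee.setManager(manager);
em.getTransaction().commit();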

Flush

The EntityManager.flush() operation can be used to write all changes to the database before the transaction is committed. By default JPA does not normally write changes to the database until the transaction is committed. This is normally desirable, as it avoids database access, resources, and locks until required. It also allows database writes to be ordered and batched for optimal database access, to maintain integrity constraints, and to avoid deadlocks. This means that when you call persist, merge, or remove, the database DML (INSERT, UPDATE, DELETE) is not executed until commit, or until a flush is triggered.

The flush() does not execute the actual commit: the commit still happens when an explicit commit() is requested in case of resource local transactions, or when a container managed (JTA) transaction completes.

Flush has several usages:

  • Flush changes before a query execution to enable the query to return new objects and changes made in the persistence context.
  • Insert persisted objects to ensure their Ids are assigned and accessible to the application if using IDENTITY sequencing.
  • Write all changes to the database to allow error handling of any database errors (useful when using JTA or SessionBeans).
  • To flush and clear a batch for batch processing in a single transaction.
  • Avoid constraint errors, or reincarnate an object.

Example flush

public long createOrder(Order order) throws ACMEException {
  EntityManager em = getEntityManager();
  em.persist(order);
  try {
    em.flush();
  } catch (PersistenceException exception) {
    throw new ACMEException(exception);
  }
  return order.getId();
}

Clear

The EntityManager.clear() operation can be used to clear the persistence context. This will clear all objects read, changed, persisted, or removed from the current EntityManager or transaction. Changes that have already been written to the database through flush, or any changes made to the database will not be cleared. Any object that was read or persisted through the EntityManager is detached, meaning any changes made to it will not be tracked, and it should no longer be used unless merged into the new persistence context.

clear can be used similar to a rollback to abandon changes and restart a persistence context. If a transaction commit fails, or a rollback is performed the persistence context will automatically be cleared.

clear is similar to closing the EntityManager and creating a new one, the main difference being that clear can be called while a transaction is in progress. clear can also be used to free the objects and memory consumed by the EntityManager. It is important to note that an EntityManager is responsible for tracking and managing all objects read within its persistence context. In an application managed EntityManager this includes every object read since the EntityManager was created, across every transaction the EntityManager was used for. If a long-lived EntityManager is used, this amounts to a built-in memory leak, so calling clear, or closing the EntityManager and creating a new one, is an important application design consideration. For JTA managed EntityManagers the persistence context is automatically cleared across each JTA transaction boundary.

Clearing is also important on large batch jobs, even if they occur in a single transaction. The batch job can be split into smaller batches within the same transaction, and clear can be called between batches to keep the persistence context from growing too large.

Example clear

public void processAllOpenOrders() {
  EntityManager em = getEntityManager();
  // Read only the Ids up front (JPA 2.0 typed query) so the orders can be loaded batch by batch.
  List<Long> openOrderIds = em.createQuery("SELECT o.id FROM Order o WHERE o.isOpen = true", Long.class).getResultList();
  em.getTransaction().begin();
  try {
    for (int batch = 0; batch < openOrderIds.size(); batch += 100) {
      for (int index = 0; index < 100 && (batch + index) < openOrderIds.size(); index++) {
        Long id = openOrderIds.get(batch + index);
        Order order = em.find(Order.class, id);
        order.process(em);
      }
      // Write this batch to the database, then release its objects from the persistence context.
      em.flush();
      em.clear();
    }
    em.getTransaction().commit();
  } catch (RuntimeException error) {
    if (em.getTransaction().isActive()) {
      em.getTransaction().rollback();
    }
    // Rethrow so the caller learns that the batch job failed.
    throw error;
  }
}
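The nested batching loop in the example above can also be factored into a small reusable helper. The following is one possible sketch in plain Java, independent of JPA; Batches and forEachBatch are hypothetical names, and the per-item work and the end-of-batch action are passed in by the caller.

```java
import java.util.List;
import java.util.function.Consumer;

class Batches {
    // Run 'work' on every item, invoking 'afterBatch' each time a batch of
    // 'batchSize' items is complete, and once more for a final partial batch.
    static <T> void forEachBatch(List<T> items, int batchSize,
                                 Consumer<? super T> work, Runnable afterBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int inBatch = 0;
        for (T item : items) {
            work.accept(item);
            if (++inBatch == batchSize) {
                afterBatch.run();
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            afterBatch.run();
        }
    }
}
```

With JPA, the afterBatch action would typically call em.flush() followed by em.clear(), and the work action would find and process one order, all inside a single transaction as in the example above.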

Close

The EntityManager.close() operation is used to release an application managed EntityManager's resources. JEE JTA managed EntityManagers cannot be closed, as they are managed by the JTA transaction and JEE server.

The life-cycle of an EntityManager can last a transaction, a request, or a user's session. Typically the life-cycle is per request, and the EntityManager is closed at the end of the request. The objects obtained from an EntityManager become detached when the EntityManager is closed, and any LAZY relationships may no longer be accessible if they were not accessed before the EntityManager was closed. Some JPA providers allow LAZY relationships to be accessed after close.

Example close

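public Order findOrder(long id) {
  EntityManager em = factory.createEntityManager();
  try {
    Order order = em.find(Order.class, id);
    // Touch the LAZY collection so it is loaded before the EntityManager is closed.
    order.getOrderLines().size();
    return order;
  } finally {
    // Always release the EntityManager, even if the lookup fails.
    em.close();
  }
}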

Get Delegate

The EntityManager.getDelegate() operation is used to access the JPA provider's EntityManager implementation class in a JEE managed EntityManager. A JEE managed EntityManager will be wrapped by a proxy EntityManager by the JEE server that forwards requests to the EntityManager active for the current JTA transaction. If a JPA provider specific API is desired the getDelegate() API allows the JPA implementation to be accessed to call the API.

In JEE a managed EntityManager will typically create a new EntityManager per JTA transaction. Also the behavior is somewhat undefined outside of a JTA transaction context. Outside a JTA transaction context, a JEE managed EntityManager may create a new EntityManager per method, so getDelegate() may return a temporary EntityManager or even null. Another way to access the JPA implementation is through the EntityManagerFactory, which is typically not wrapped with a proxy, but may be in some servers.

In JPA 2.0 the getDelegate() API has been superseded by the more generic unwrap() API.

Example getDelegate

public void clearCache() {
  EntityManager em = getEntityManager();
  ((JpaEntityManager)em.getDelegate()).getServerSession().getIdentityMapAccessor().initializeAllIdentityMaps();
}

Unwrap (JPA 2.0)

The EntityManager.unwrap() operation is used to access the JPA provider's EntityManager implementation class in a JEE managed EntityManager. A JEE managed EntityManager will be wrapped by a proxy EntityManager by the JEE server that forwards requests to the EntityManager active for the current JTA transaction. If a JPA provider specific API is desired the unwrap() API allows the JPA implementation to be accessed to call the API.

In JEE a managed EntityManager will typically create a new EntityManager per JTA transaction. Also the behavior is somewhat undefined outside of a JTA transaction context. Outside a JTA transaction context, a JEE managed EntityManager may create a new EntityManager per method, so unwrap() may return a temporary EntityManager. Another way to access the JPA implementation is through the EntityManagerFactory, which is typically not wrapped with a proxy, but may be in some servers.

Example unwrap

public void clearCache() {
  EntityManager em = getEntityManager();
  em.unwrap(JpaEntityManager.class).getServerSession().getIdentityMapAccessor().initializeAllIdentityMaps();
}