Java Persistence/Caching
From Wikibooks, the open-content textbooks collection
Caching
Caching is the most important performance optimization technique. There are many things that can be cached in persistence: objects, data, database connections, database statements, query results, metadata, and relationships, to name a few. Caching in object persistence normally refers to the caching of objects or their data. Caching also influences object identity: if you read an object, then read the same object again, you should get the identical object back (the same reference).
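JPA 1.0 does not define a shared (server) object cache; JPA providers may or may not support one, although most do. (JPA 2.0 later added optional shared cache configuration through the Cache interface, the @Cacheable annotation and the shared-cache-mode setting, but providers are still not required to support a shared cache.) Caching in JPA is required within a transaction or within an extended persistence context to preserve object identity, but JPA does not require that caching be supported across transactions or persistence contexts.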
There are two types of object caching. You can cache the objects themselves, including all of their structure and relationships, or you can cache their database row data. Both provide a benefit. However, caching only the row data misses a large part of the benefit. Retrieving each relationship typically involves a database query, and the bulk of the cost of reading an object is spent retrieving its relationships.
Object Identity
Object identity in Java means that if two variables (x, y) refer to the same logical object, then x == y returns true. In other words, both reference the same thing (both point to the same memory location).
In JPA, object identity is maintained within a transaction and (normally) within the same EntityManager. The exception is a JEE managed (transaction-scoped) EntityManager, where object identity is only maintained inside a transaction.
So the following is true in JPA:
    Employee employee1 = entityManager.find(Employee.class, 123);
    Employee employee2 = entityManager.find(Employee.class, 123);
    assert (employee1 == employee2);
This holds true no matter how the object is accessed:
    Employee employee1 = entityManager.find(Employee.class, 123);
    Employee employee2 = employee1.getManagedEmployees().get(0).getManager();
    assert (employee1 == employee2);
In JPA, object identity is not maintained across EntityManagers. Each EntityManager maintains its own persistence context and its own transactional state for its objects.
So the following is true in JPA:
    EntityManager entityManager1 = factory.createEntityManager();
    EntityManager entityManager2 = factory.createEntityManager();
    Employee employee1 = entityManager1.find(Employee.class, 123);
    Employee employee2 = entityManager2.find(Employee.class, 123);
    assert (employee1 != employee2);
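The identity rules above follow from each persistence context keeping its own identity map from Id to managed instance. The sketch below models that idea in plain Java. It is an illustration only, not any provider's implementation. The names SimplePersistenceContext and Employee are hypothetical, and so is the loader function, which stands in for a database read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical entity, used only for this sketch.
class Employee {
    final long id;
    String name;

    Employee(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical stand-in for one EntityManager's persistence context (its L1 cache).
class SimplePersistenceContext {
    // Identity map: at most one managed instance per Id within this context.
    private final Map<Long, Employee> identityMap = new HashMap<>();
    // Hypothetical loader standing in for "select the row and build a new object".
    private final Function<Long, Employee> loader;

    SimplePersistenceContext(Function<Long, Employee> loader) {
        this.loader = loader;
    }

    Employee find(long id) {
        // Hit: return the instance already managed by this context.
        // Miss: build a new instance and register it for later finds.
        return identityMap.computeIfAbsent(id, loader);
    }
}

class IdentityDemo {
    public static void main(String[] args) {
        // Every call to the loader builds a brand new object, like reading a row.
        Function<Long, Employee> database = id -> new Employee(id, "Employee " + id);
        SimplePersistenceContext context1 = new SimplePersistenceContext(database);
        SimplePersistenceContext context2 = new SimplePersistenceContext(database);

        System.out.println(context1.find(123) == context1.find(123)); // true: same context
        System.out.println(context1.find(123) == context2.find(123)); // false: different contexts
    }
}
```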
Object identity is normally a good thing. It avoids having your application manage multiple copies of objects, and it avoids the application changing one copy but not the other. Different EntityManagers or transactions (in JEE) don't maintain object identity because each transaction must isolate its changes from other users of the system. This is also normally a good thing. However, it does require the application to be aware of copies, detached objects and merging.
Some JPA products may have a concept of read-only objects, in which object identity may be maintained across EntityManagers through a shared object cache.
Object Cache
An object cache is where the Java objects (entities) themselves are cached. The advantage of an object cache is that the data is cached in the same format in which it is used in Java. Everything is stored at the object level, and no conversion is required when obtaining a cache hit. With JPA, the EntityManager must still copy the objects to and from the cache, as it must maintain its transaction isolation, but that is all that is required. The objects do not need to be rebuilt, and the relationships are already available.
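Here is a minimal sketch of a shared object cache. It assumes a hypothetical CachedEmployee entity and a hypothetical SharedObjectCache class; neither is a JPA or provider API. The cache stores built objects and hands out copies, so one transaction's uncommitted changes cannot leak into the shared state. For brevity the copy is shallow. A real provider would also point the copied relationships at the instances managed by the requesting EntityManager.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical entity; its relationship is held as object references.
class CachedEmployee {
    final long id;
    String name;
    List<CachedEmployee> managedEmployees = new ArrayList<>();

    CachedEmployee(long id, String name) {
        this.id = id;
        this.name = name;
    }

    // Shallow copy: the copy gets its own field values and its own list,
    // but the list still points at the same related instances.
    CachedEmployee copy() {
        CachedEmployee c = new CachedEmployee(id, name);
        c.managedEmployees = new ArrayList<>(managedEmployees);
        return c;
    }
}

// Hypothetical shared (L2) object cache, keyed by Id.
class SharedObjectCache {
    private final ConcurrentMap<Long, CachedEmployee> objects = new ConcurrentHashMap<>();

    // Cache hit: return a copy so the caller's changes stay isolated; no rebuild needed.
    CachedEmployee get(long id) {
        CachedEmployee cached = objects.get(id);
        return cached == null ? null : cached.copy();
    }

    // After commit: store a copy of the committed state.
    void put(CachedEmployee committed) {
        objects.put(committed.id, committed.copy());
    }
}
```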
With an object cache, transient data may also be cached. This may occur automatically or may require some effort. If caching transient data is not desired, you may also need to clear that data when the object gets cached.
Some JPA products allow read-only queries to access the object cache directly. Some products only allow object caching of read-only data. Obtaining a cache hit on read-only data is extremely efficient, as the object does not need to be copied. Other than the lookup, no work is required.
It is possible to create your own object cache for your read-only data by loading objects from JPA into your own object cache or a JCache implementation. The main issue, as with caching in general, is how to handle updates and stale cached data, but if the data is read-only, this may not be an issue.
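For read-only data, an application-level cache can be as simple as a concurrent map that loads each entry once. In this sketch, ReadOnlyCache is a hypothetical class. The loader function is an assumption: it stands in for a JPA read such as a call to entityManager.find. Because the data is treated as read-only, hits return the shared instance without copying.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical application-level cache for read-only reference data.
// The loader is assumed to wrap a JPA read, e.g. id -> entityManager.find(SomeEntity.class, id).
class ReadOnlyCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    ReadOnlyCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Hit: return the shared instance directly (no copy, since it is never modified).
    // Miss: load once and remember the result.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Manual escape hatch for when the "read-only" data does change.
    void invalidateAll() {
        entries.clear();
    }
}
```

ConcurrentHashMap.computeIfAbsent applies the loader at most once per absent key, even under concurrent access. The invalidateAll method is the simplest answer to the stale-data issue if the data turns out not to be strictly read-only.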
Data Cache
A data cache caches the object's data, not the objects themselves. The data is normally a representation of the object's database row. The advantage of a data cache is that it is easier to implement, as you do not have to worry about relationships, object identity, or complex memory management. The disadvantage is that it does not store the data as it is used in the application, and it does not store relationships. This means that on a cache hit, the object must still be built from the data, and the relationships must still be fetched from the database. Some products that support a data cache also support a relationship cache or query cache to allow caching of relationships.
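To illustrate the difference, the following sketch caches each row as a map of column values. The DataCache and RowEmployee classes are hypothetical, and so are the column names NAME and MANAGER_ID. Every hit still builds a new object, and only the manager's foreign key value is available. Fetching the manager itself would require another lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical entity rebuilt from cached row data.
class RowEmployee {
    final long id;
    final String name;
    final Long managerId; // foreign key value only; the manager object is not cached here

    RowEmployee(long id, String name, Long managerId) {
        this.id = id;
        this.name = name;
        this.managerId = managerId;
    }
}

// Hypothetical data cache: Id -> copy of the database row (column name -> value).
class DataCache {
    private final Map<Long, Map<String, Object>> rows = new ConcurrentHashMap<>();

    void putRow(long id, Map<String, Object> row) {
        rows.put(id, new HashMap<>(row)); // keep a private copy of the row
    }

    // A hit avoids the database read, but a new object must still be built every time.
    RowEmployee find(long id) {
        Map<String, Object> row = rows.get(id);
        if (row == null) {
            return null; // miss: a real provider would query the database here
        }
        return new RowEmployee(id, (String) row.get("NAME"), (Long) row.get("MANAGER_ID"));
    }
}
```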
Caching Relationships
Some products support a separate cache for caching relationships. This is normally required for OneToMany and ManyToMany relationships. OneToOne and ManyToOne relationships normally do not need to be cached, as they reference the related object's Id. However, an inverse OneToOne requires the relationship to be cached, as it references the foreign key, not the primary key.
For a relationship cache, the results normally store only the related objects' Ids, not the objects or their data (to avoid duplicate and stale data). The key of the relationship cache is the source object's Id and the relationship name. Sometimes the relationship is cached as part of the data cache, if the data cache stores a structure instead of a database row. When a cache hit occurs on a relationship, the related objects are looked up in the data cache one by one. A potential issue with this is that if a related object is not in the data cache, it must be selected from the database. This could result in very poor database performance, as the objects may be loaded one by one with a separate query each. Some products that support caching relationships also support batching these selects to alleviate this issue.
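The following sketch, under stated assumptions, shows a relationship cache keyed by the source Id and relationship name. It stores only target Ids and resolves them through a data cache. Any data-cache misses are collected into a single batch read instead of one select per object. RelationshipKey and RelationshipCache are hypothetical. So is the batchLoader function, which stands in for one SELECT ... WHERE ID IN (...). Cached data is reduced to a String for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical cache key: the source object's Id plus the relationship name.
final class RelationshipKey {
    final long sourceId;
    final String relationship;

    RelationshipKey(long sourceId, String relationship) {
        this.sourceId = sourceId;
        this.relationship = relationship;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RelationshipKey)) {
            return false;
        }
        RelationshipKey other = (RelationshipKey) o;
        return sourceId == other.sourceId && relationship.equals(other.relationship);
    }

    @Override
    public int hashCode() {
        return Objects.hash(sourceId, relationship);
    }
}

// Hypothetical relationship cache backed by a simple data cache.
class RelationshipCache {
    // Relationship cache: stores only the related Ids, never the objects or their data.
    private final Map<RelationshipKey, List<Long>> relationships = new HashMap<>();
    // Data cache: Id -> cached data (a String here, for brevity).
    private final Map<Long, String> dataCache = new HashMap<>();
    // Hypothetical batch read standing in for one SELECT ... WHERE ID IN (...).
    private final Function<List<Long>, Map<Long, String>> batchLoader;

    RelationshipCache(Function<List<Long>, Map<Long, String>> batchLoader) {
        this.batchLoader = batchLoader;
    }

    void putRelationship(long sourceId, String name, List<Long> targetIds) {
        relationships.put(new RelationshipKey(sourceId, name), new ArrayList<>(targetIds));
    }

    void putData(long id, String data) {
        dataCache.put(id, data);
    }

    // Returns null on a relationship-cache miss.
    List<String> get(long sourceId, String name) {
        List<Long> ids = relationships.get(new RelationshipKey(sourceId, name));
        if (ids == null) {
            return null;
        }
        // Collect data-cache misses so they are read in one batch, not one by one.
        List<Long> missing = new ArrayList<>();
        for (Long id : ids) {
            if (!dataCache.containsKey(id)) {
                missing.add(id);
            }
        }
        if (!missing.isEmpty()) {
            dataCache.putAll(batchLoader.apply(missing));
        }
        List<String> result = new ArrayList<>();
        for (Long id : ids) {
            result.add(dataCache.get(id));
        }
        return result;
    }
}
```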
Cache Types
There are many different cache types. The most common is an LRU cache, which ejects the Least Recently Used objects and maintains a fixed number of MRU (Most Recently Used) objects.
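An LRU cache can be sketched in Java with java.util.LinkedHashMap, which supports access ordering and an eviction hook (removeEldestEntry). LruCache is a hypothetical name. This version is not thread-safe, so a shared cache would need synchronization, for example via Collections.synchronizedMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: keeps at most maxSize entries and ejects the least recently used one.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCache(int maxSize) {
        super(16, 0.75f, true); // true = access order: get() moves an entry to the MRU end
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evict the LRU entry once the size limit is exceeded.
        return size() > maxSize;
    }
}
```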
Some cache types include:
- LRU - Keeps X number of recently used objects in the cache.
- Full - Caches everything read, forever. (not always the best idea if the database is large)
- Soft - Uses Java garbage collection hints to release objects from the cache when memory is low.
- Weak - Normally relevant only to object caches; keeps objects in the cache only while they are still referenced (in use) by the application.
- L1 - This refers to the transactional cache that is part of every EntityManager; this is not a shared cache.
- L2 - This is a shared cache, conceptually stored in the EntityManagerFactory, so it is accessible to all EntityManagers.
- Data - See Data Cache
- Object - See Object Cache
- Relationship - See Caching Relationships
- Read-only - A cache that only stores, or only allows, read-only objects.
- Read-write - A cache that can handle inserts, updates and deletes (non read-only).
- Transactional - A cache that can handle inserts, updates and deletes (non read-only), and obeys transactional ACID properties.
- Clustered - Typically refers to a cache that uses JMS, JGroups or some other mechanism to broadcast invalidation messages to other servers in the cluster when an object is updated or deleted.
- Replicated - Typically refers to a cache that uses JMS, JGroups or some other mechanism to broadcast objects to all servers when they are read into any server's cache.
- Distributed - Typically refers to a cache that spreads the cached objects across several servers in a cluster, and can look-up an object in another server's cache.
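A Soft cache can be sketched with java.lang.ref.SoftReference. The garbage collector may clear softly reachable values when memory is low, and a cleared entry then behaves as a miss. SoftCache is a hypothetical name. A Weak cache would follow the same pattern with WeakReference, so an entry disappears once the application no longer references the object and the garbage collector runs.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal "soft" cache: values are held through SoftReferences, so the garbage
// collector may clear them under memory pressure.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();

    void put(K key, V value) {
        entries.put(key, new SoftReference<>(value));
    }

    // Returns null on a miss, including when the GC has already cleared the value.
    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            entries.remove(key, ref); // value was collected: drop the stale entry
        }
        return value;
    }
}
```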
Query Cache
A query cache caches query results instead of objects. Object caches cache an object by its Id, so they are generally not very useful for queries that are not by Id. Some object caches support secondary indexes. Even indexed caches, though, are not very useful for queries that can return multiple objects, as you always need to access the database to ensure you have all of the objects. This is where query caches are useful: instead of storing objects by Id, they cache the query results. The cache key is based on the query name and parameters. So if you have a NamedQuery that is commonly executed, you can cache its results and only need to execute the query the first time.
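A query cache keyed on the query name plus its parameter values might look like the sketch below. QueryCache is hypothetical. So is the execute supplier, which stands in for running the real query. The sketch caches only result Ids, assuming the objects themselves are then obtained through the object cache. That way the query cache does not hold its own, possibly stale, copies of the objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical query cache: key = query name + parameter values, value = result Ids.
class QueryCache {
    private final Map<List<Object>, List<Long>> results = new HashMap<>();

    // Runs the query only on a miss; 'execute' stands in for executing the real query.
    List<Long> getResultIds(String queryName, List<Object> parameters,
                            Supplier<List<Long>> execute) {
        // List equality is element by element, so the same name and parameters give the same key.
        List<Object> key = new ArrayList<>();
        key.add(queryName);
        key.addAll(parameters);
        return results.computeIfAbsent(key,
                k -> Collections.unmodifiableList(new ArrayList<>(execute.get())));
    }
}
```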
The main issue with query caches, as with caching in general, is stale data. Query caches normally interact with an object cache to ensure the objects are at least as up to date as those in the object cache. Query caches also typically have invalidation options similar to those of object caches.
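As a sketch of two common invalidation options, the hypothetical ExpiringQueryCache below expires entries after a time-to-live. It can also be cleared explicitly after a commit that changes the queried data. The injectable clock is an assumption made so expiry can be tested; it is a LongSupplier, such as System::currentTimeMillis. The String key is assumed to already encode the query name and parameters.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical query cache with time-to-live expiry and explicit invalidation.
class ExpiringQueryCache {
    private static final class Entry {
        final List<Long> ids;
        final long cachedAt;

        Entry(List<Long> ids, long cachedAt) {
            this.ids = ids;
            this.cachedAt = cachedAt;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ExpiringQueryCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    List<Long> get(String key, Supplier<List<Long>> execute) {
        long now = clock.getAsLong();
        Entry entry = entries.get(key);
        if (entry == null || now - entry.cachedAt >= ttlMillis) {
            // Miss or expired: re-run the query and cache the fresh result.
            entry = new Entry(execute.get(), now);
            entries.put(key, entry);
        }
        return entry.ids;
    }

    // Call after a commit that inserts, updates or deletes the queried entities.
    void invalidateAll() {
        entries.clear();
    }
}
```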