Simplifying Apache Geode with Spring Data

Source code and examples for Simplifying Apache Geode with Spring Data at SpringOne Platform 2017.

This example demonstrates the use of Spring Boot and Spring Data Geode to build a simple Apache Geode client application to persist and query Customer data.

The purpose of this example is to demonstrate how easy and quick it can be to get started building a simple Spring application using Apache Geode, and scale it up.

No other data store, neither Redis nor Hazelcast, is as simple to set up or use as Apache Geode is with the power of Spring Data Geode (SDG), despite there being Spring Data modules for both Redis and Hazelcast, and despite Spring Boot providing auto-configuration support for both.

Of course, Redis is a very popular and powerful data store that complements rather than competes directly with Apache Geode, unlike Hazelcast. Apache Geode is a good replacement for Hazelcast when your application needs more reliable, performant and consistent data access at cloud-scale.

NOTE: Apache Geode is the open source core of Pivotal GemFire.

This step-by-step tutorial guides you through the application and how it works.

Tutorial

Prerequisites

This guide assumes a basic understanding of Apache Geode, IMDG / No-SQL, and Spring Data concepts.

To learn more about Apache Geode, see the User Guide.

To learn more about Spring Data and in particular, Spring Data Geode (SDG), follow the links.

There are also several examples of using Spring Data GemFire/Geode in the Guides at spring.io.

Problem

Suppose we need to create a customer service application that stores customer data and allows the user to search for customers by name.

Prototyping the customer service application

Customer class

First, we need to define an application domain object to encapsulate customer information.

@Region("Customers")
class Customer {

  Long id;

  @Indexed(from = "/Customers")
  String name;

}

TIP: The actual Customer class uses the highly convenient Project Lombok framework to simplify its definition.
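
For reference, here is a sketch of the Lombok-based definition (the static factory name matches the Customer.newCustomer(..) factory used in the application runner further below; see the actual source for the authoritative definition):

// A sketch using Lombok; @Data generates getters/setters, equals/hashCode and
// toString, while @RequiredArgsConstructor generates the newCustomer(..) factory
// for the @NonNull fields.
@Region("Customers")
@Data
@RequiredArgsConstructor(staticName = "newCustomer")
class Customer {

  @NonNull
  Long id;

  @Indexed(from = "/Customers")
  @NonNull
  String name;

}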

@Region is a Spring Data Geode (SDG) mapping annotation that identifies the Apache Geode Region where instances of Customer will be stored. If a name (e.g. "Customers") is not explicitly provided in the @Region mapping annotation, then the simple name of the class (i.e. "Customer") is used to identify the Region where customers are persisted.

NOTE: SDG provides several Region type-specific mapping annotations, giving the developer full control over the data management policy.

SDG is intelligent enough to identify the id field as the identifier for individual customers without explicitly having to annotate the identifier field or property with Spring Data's @Id annotation.

NOTE: There is no auto-generated identifier capability provided by Apache Geode nor SDG. You must set the ID before saving instances of Customer to the "Customers" Region.

Finally, you will notice that the Customer name field is annotated with @Indexed. This enables the creation of an OQL-based Index on the customer's name, thereby improving the performance of queries by name. More on this later.

CustomerRepository interface

Next, we need to define a Repository implementation to store and query Customer objects in Apache Geode.

import org.springframework.data.repository.CrudRepository;

interface CustomerRepository extends CrudRepository<Customer, Long> {
  ...
}

Our CustomerRepository extends Spring Data's CrudRepository interface, which provides basic CRUD and simple query operations backed by SDG's implementation for Apache Geode. Refer to the Javadoc link above for more details on the data access operations o.s.d.repository.CrudRepository provides out-of-the-box.

Of course, you can define additional (OQL-based) queries simply by declaring "query" methods in the CustomerRepository interface, following certain conventions, as shown below.
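
For example, the findByNameLike(..) query method used by this application is derived from the method name alone; no OQL needs to be written by hand:

interface CustomerRepository extends CrudRepository<Customer, Long> {

  // Derived query method; SDG generates OQL equivalent to:
  //   SELECT * FROM /Customers WHERE name LIKE $1
  Customer findByNameLike(String name);

}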

Spring Boot Application class

Now, we just need to define a Spring Boot application class to configure everything and run our application.

@SpringBootApplication
@ClientCacheApplication
@EnableEntityDefinedRegions(basePackageClasses = Customer.class, clientRegionShortcut = ClientRegionShortcut.LOCAL)
@EnableGemfireRepositories(basePackageClasses = CustomerRepository.class)
@EnableIndexing
public class SpringBootApacheGeodeClientApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringBootApacheGeodeClientApplication.class, args);
    }

    ...
}

This application is not very interesting at the moment, since it is not performing any data access operations using Apache Geode.

For this, we will add a simple, non-interactive runner using Spring Boot's ApplicationRunner interface, defined as a bean in the Spring context, which performs a few data access operations on the "Customers" Region using the CustomerRepository we defined above.

@Bean
ApplicationRunner runner(CustomerRepository customerRepository) {

    return args -> {

        assertThat(customerRepository.count()).isEqualTo(0);

        Customer jonDoe = Customer.newCustomer(1L, "Jon Doe");

        System.err.printf("Saving Customer [%s]...%n", jonDoe);

        jonDoe = customerRepository.save(jonDoe);

        assertCustomer(jonDoe, 1L, "Jon Doe");
        assertThat(customerRepository.count()).isEqualTo(1);

        System.err.println("Querying for Customer [SELECT * FROM /Customers WHERE name LIKE '%Doe']...");

        Customer queriedJonDoe = customerRepository.findByNameLike("%Doe");

        assertThat(queriedJonDoe).isEqualTo(jonDoe);

        System.err.printf("Customer was [%s]%n", queriedJonDoe);
    };
}

The application is ready to run!

But wait! What about configuring Apache Geode?

Well, you may have already noticed, but that was handled by four Spring Data Geode configuration annotations.

First is @ClientCacheApplication, which defines both an Apache Geode ClientCache instance and a "DEFAULT" Pool. The Pool is used to connect to a (cluster of) server(s) in a client/server topology. We'll see how this works further below.

Next is @EnableEntityDefinedRegions. This annotation functions much like the JPA entity-scan provided by Spring Boot, combined with Hibernate's auto-schema generation, but without requiring a separate tool!

Additionally, I have set the client Region data management policy to LOCAL, using Apache Geode's ClientRegionShortcut. All this means is that data will be stored locally, with the client application (at least for now).

NOTE: Apache Geode stores all of the data in a cache in Regions. You can think of Apache Geode Regions as tables in a relational database; they hold your application's data (or state). However, the data is stored as the objects themselves, as opposed to using a relational data model. There are both advantages and disadvantages to object stores, but that is the nature of most key/value stores (or Map-like (distributed) data structures), as illustrated below.
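
In fact, Apache Geode's Region interface extends java.util.Map, so Region entries can be read and written with familiar Map operations. A simple illustration (clientCache here stands in for the configured ClientCache; this example uses the Repository abstraction instead):

// Illustration only: direct, Map-style access to the "Customers" Region.
Region<Long, Customer> customers = clientCache.getRegion("/Customers");

customers.put(jonDoe.getId(), jonDoe);                 // save by key
Customer cachedJonDoe = customers.get(jonDoe.getId()); // lookup by key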

I have also used the type-safe basePackageClasses attribute to specify the package containing the entities to scan and for which client Regions will be created. The provided class definition (i.e. Customer.class) is just used to identify the starting package of the entity-scan. All class types contained in the package and all sub-packages will be searched during the scan.

Then, I have declared the @EnableGemfireRepositories annotation to enable the Spring Data (Geode) Repository infrastructure, thereby allowing a developer to create Data Access Objects (DAOs) based only on an interface definition.

Finally, I have declared the @EnableIndexing annotation to automatically define an Apache Geode OQL Index on the customer's name without having to explicitly declare an Index bean definition or use a tool to define an Apache Geode Index on our "Customers" Region.

Now, let's run the application!

Running The Application

When you run the SpringBootApacheGeodeClientApplication, you will see Spring Boot 1.5.8.RELEASE start up, Apache Geode log output during the startup sequence, and our application print out some results.

Saving Customer [Customer(id=1, name=Jon Doe)]...
Querying for Customer [SELECT * FROM /Customers WHERE name LIKE '%Doe']...
Customer was [Customer(id=1, name=Jon Doe)]

That was easy!

Next Steps

We were able to store a Customer (i.e. "Jon Doe") and retrieve this Customer by name using the application's CustomerRepository and its findByNameLike(:String):Customer (OQL-based) query method.

But, you might be thinking, "So what!" Any data store with a sufficiently robust framework (i.e. Spring Data) can do that.

Also, if this application crashes, then I will lose all my data, since this application is not "durable". And, even if this application were persistent, I don't want my data kept locally, since that would not leverage the full power of Apache Geode as an In-Memory Data Grid (IMDG): one capable of distributing data across a cluster of nodes in a replicated, highly available (i.e. redundant) and partitioned manner, while preserving strong consistency and performance (i.e. read/write throughput and latency) guarantees.

Well, that is simple to do too. :)

You have several options for configuring a (cluster of) server(s). The most obvious option is to use Gfsh, Apache Geode's Shell tool.

HINT: Gfsh is equivalent to sqlplus for all you Oracle users.

The following is an example Gfsh shell script to bootstrap a small, yet simple Apache Geode cluster consisting of 2 members, 1 of which is a server that will store our application's data.

start locator --name=LocatorOne --log-level=config
start server --name=ServerOne --log-level=config
list members
describe member --name=LocatorOne
describe member --name=ServerOne
create region --name=Customers --type=PARTITION --skip-if-exists
list regions
describe region --name=/Customers
create index --name=CustomerNameHashIdx --expression="name" --region="/Customers" --type=hash
list indexes

NOTE: This Gfsh shell script, along with several other shell scripts, is provided in ${project.home}/etc and can be executed in Gfsh using... gfsh> run --file=/absolute/file/system/path/to/${project.home}/etc/<script-name> Be sure to replace <script-name> with the name and extension of the Gfsh shell script, e.g. start-cluster.gfsh.

In order to leverage Apache Geode's client/server topology with minimal changes to our application, simply remove the clientRegionShortcut attribute declaration from the @EnableEntityDefinedRegions annotation, as shown below. Now, the client application will send all ("/Customers") Region data access operations to a server.
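
The annotation declaration then becomes:

@EnableEntityDefinedRegions(basePackageClasses = Customer.class)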

NOTE: The default client Region data management policy is PROXY, which stores no local state and sends all data access operations to the server.

TIP: It is also possible to store data locally on the client (using CACHING_PROXY, referred to as a "near-cache") as well as on the server, and for the data to be kept in-sync. The client will receive any updates for which it has registered interest, and can even be configured with "durability" so that, the next time it reconnects to the cluster, it also receives the update events it may have missed while offline. A sketch of the near-cache configuration follows.
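
A minimal sketch of the near-cache configuration (note that actually receiving updates additionally requires enabling subscriptions and registering interest, which are not shown here):

@EnableEntityDefinedRegions(basePackageClasses = Customer.class,
    clientRegionShortcut = ClientRegionShortcut.CACHING_PROXY)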

Remember our "DEFAULT" Pool. Well, by default, Apache Geode creates this "DEFAULT" Pool to connect to a server running on "localhost", listening on port "40404". Of course, this Pool is highly configurable, with the ability to set min/max connections, Socket timeouts, retry attempts, as well as connection endpoints, and many other configuration settings.

TIP: It is recommended that clients connect to 1 or more Locators in the cluster, which allows a client to seamlessly fail over, load balance, single-hop and route data requests to the appropriate servers containing the data of interest. A sketch of such a configuration follows.
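
For instance, a sketch of a Locator-based Pool configuration using @ClientCacheApplication attributes (a sketch only; verify the attribute names against the SDG version in use):

// A sketch: point the "DEFAULT" Pool at a Locator (default Locator port 10334)
// instead of the default localhost:40404 server, and tune a few Pool settings.
@ClientCacheApplication(
    locators = @ClientCacheApplication.Locator(host = "localhost", port = 10334),
    minConnections = 1,
    maxConnections = 50,
    retryAttempts = 1
)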

With 1 simple change, your application is now fully client/server capable.

Final Steps

While it is recommended that you use Gfsh to script your production configuration and deployments, Gfsh is not really a "developer" tool and, therefore, is not as convenient as using your IDE, especially during rapid application development (DEV-TEST) and prototyping.

Imagine an application with hundreds of data Regions. It is not uncommon to have hundreds or even thousands of tables in a relational database.

While you only need to create a Gfsh shell script once, you still need to handle the client configuration.

This is where Spring Data Geode separates itself from the pack and holds a powerful advantage, even over other Spring Data modules, and even though some of those modules have robust auto-configuration support in Spring Boot.

That all pales in comparison to SDG's configuration capabilities, which were specifically designed for Apache Geode, above and beyond what Spring Boot can, or even should, provide, since this functionality is very data store specific.

NOTE: In fact, Spring Boot is not providing any Apache Geode specific functionality here; there is no extra magic, no auto-configuration. It is all Spring Data Geode! There really is no SDG-specific advantage to using Spring Boot in this use case. However, it is convenient for bootstrapping the Spring container, and it gives us the ability to extend our application to be Web-based, or to integrate with countless other technologies (e.g. another data store, like Redis, using Spring Data Redis; a message bus; or Cloud-Native design patterns (e.g. microservices) using Spring Cloud; and so on and so forth). Spring Boot is truly wonderful and magical.

So, what if we want to avoid doing double the work... creating both the Gfsh shell scripts to create Regions, Indexes, etc., and the matching configuration on the client, at least during development, while we are prototyping?

What if we could do all the work from the (client) application?

Well, you can! Your application domain types already tell us everything your application needs. Why should you have to repeat that? You've already done the work by defining your domain types, Repositories, and service classes.

Any framework worth its salt should do the heavy lifting, the "plumbing", for you!

Well, now Spring Data Geode can!

Simply add the @EnableClusterConfiguration annotation to your Spring Boot application class and you are nearly there!

...
@EnableClusterConfiguration(useHttp = true)
public class SpringBootApacheGeodeClientApplication {

Wait! What? "Nearly"!?

Well, guess what? Anytime you send data over the wire, your data, i.e. your application domain object types, needs to be "serializable".

"_Oh, for sakes! Now you tell me! Now, I have to go and change all my domain types (all 1 of them ;-)!"

No worries! Most of the time, you would think that all your application domain object classes need to implement java.io.Serializable, huh? Then you have to worry about setting a serialVersionUID if the type changes, blah, blah, blah. Oh, for crying out loud, Java! What a PITA! Forget that nonsense.

Fortunately for you, Apache Geode has you covered and provides its own serialization capabilities, which are far more robust, and in certain cases more "portable", than Java Serialization.

See the Data Serialization chapter in Apache Geode's User Guide for more details.

However, unless you want to read a whole chapter on Apache Geode's data serialization features, just annotate your Spring Boot application class with the @EnablePdx annotation, and you are all set.

...
@EnablePdx
@EnableClusterConfiguration(useHttp = true)
public class SpringBootApacheGeodeClientApplication {

One advantage of Apache Geode's PDX (Portable Data eXchange) serialization framework is that Apache Geode is able to query data in PDX serialized form without first having to deserialize it. It is also capable of adding/removing fields without affecting older clients using older class definitions of those types. It has many other advantages as well.

So, what just happened here?

The @EnableClusterConfiguration annotation enables our Spring Boot client application to push the schema object definitions (both Regions and Indexes) to the server.

However, to do so, you need a full installation of Apache Geode running on the server.

NOTE: You are not required to have a full installation of Apache Geode to run servers. For instance, you can create Spring Boot based Apache Geode server applications as well, which only include the necessary JAR files (a sketch follows). However, certain features are not available in that arrangement.
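
For illustration, a minimal sketch of such a Spring-configured Apache Geode server, using SDG's @CacheServerApplication annotation (this class is not part of this example):

// A sketch, not part of this example: a Spring Boot configured Apache Geode
// server that starts a CacheServer listening for clients (default port 40404).
@SpringBootApplication
@CacheServerApplication(name = "SpringBootApacheGeodeServerApplication")
public class SpringBootApacheGeodeServerApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringBootApacheGeodeServerApplication.class, args);
    }
}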

SDG is very careful not to stomp on your existing server cluster configuration. That is, if a Region is already defined in the cluster, then SDG will not attempt to create the Region, hopefully for obvious reasons.

NOTE: There is no option to drop and recreate a Region at present. You must remove a Region manually, using Gfsh. You can do this with gfsh> destroy region --name=/Customers.

Not only does @EnableClusterConfiguration push the schema object definitions to the server(s) in the cluster, but it also does so in such a way that the schema changes are recorded and remembered.

Therefore, if you add a new server, it will have the same configuration, thereby allowing you to easily and quickly scale up your system architecture.

Just start another server in Gfsh using...

gfsh> start server --name=ServerTwo --log-level=config --disable-default-server

You can keep adding servers to the cluster to your heart's content! They will all have the same configuration.

If your entire cluster goes down, rest assured that when you bring the servers back up, they will retain their configuration.

TIP: You may also change the Region data management policy used on the server(s) in the cluster when the Regions are created. By default, the server Region data management policy is set to PARTITION. To change the data management policy (for example, to make the server Regions persistent), set the serverRegionShortcut attribute in the @EnableClusterConfiguration annotation, as shown below. Not only will the servers retain their configuration, they will now retain your data too!
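
For example:

@EnableClusterConfiguration(useHttp = true,
    serverRegionShortcut = RegionShortcut.PARTITION_PERSISTENT)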

To test out this scenario, I have provided another Gfsh shell script that creates an "empty" cluster (i.e. a cluster with a server having no Regions or Indexes). Simply run the etc/start-empty-cluster.gfsh file from Gfsh using...

gfsh> run --file=/absolute/file/system/path/to/${project.home}/etc/start-empty-cluster.gfsh

Then, run the Spring Boot application, which should be currently defined as...

@SpringBootApplication
@ClientCacheApplication
@EnableEntityDefinedRegions(basePackageClasses = Customer.class, clientRegionShortcut = ClientRegionShortcut.LOCAL)
@EnableGemfireRepositories(basePackageClasses = CustomerRepository.class)
@EnableIndexing
@EnablePdx
@EnableClusterConfiguration(useHttp = true)
public class SpringBootApacheGeodeClientApplication {
  ...
}

With 6 simple configuration-based annotations...

  • @ClientCacheApplication
  • @EnableEntityDefinedRegions(..)
  • @EnableGemfireRepositories(..)
  • @EnableIndexing
  • @EnablePdx
  • @EnableClusterConfiguration

... SDG has unleashed a profound amount of power here, eliminating much of the error-prone, boilerplate, highly redundant plumbing your applications would otherwise need just to get started.

We went from 0 to a fully cluster-capable application in very few lines of code.

The net effect is a greatly simplified getting started experience, staying true to Spring's promise and commitment of developer productivity.

And, this is just the tip of the iceberg.

Where To Go From Here

Be on the lookout for more example tutorials like this, showcasing many different, even more complex use cases.

Minimally, be sure to read "Chapter 6 - Bootstrapping Apache Geode using Spring Annotations" in the Spring Data Geode Reference Documentation.

You can also use this repository (WIP), containing several other examples, as a reference on how to use other Apache Geode features (e.g. Continuous Query) from Spring.

Conclusion

Thank you for reading this tutorial.

If you have questions or issues, please ask your questions on StackOverflow or file an Issue on GitHub.

You are also welcome to contribute PRs to this example if you see areas for improvement.
