Introduction
Hadoop has become nearly synonymous with big data. As you may already know, Hadoop is built on Java, and so is the Cassandra NoSQL database. HBase, the open-source database modeled on Google's Bigtable, is written in Java as well. Java is used everywhere, from sensors to enterprise applications. Java EE 7 has been generally available since June 2013, and many enterprises and software companies are adopting it.
As you have seen in my blog series and elsewhere, big data adoption is also on the rise.
I have not seen a blog or article that explains which Java EE 7 features are useful for big data enterprise apps, and how. This blog is an attempt in that regard. I will focus mostly on Java EE and then end with two use cases of Java EE in the context of big data applications. The goal is to make a connection between the best features of Java EE 7 and the proven features of the Hadoop framework.
JAVA EE 7 - Introduction
Java EE is a set of specifications implemented by different containers. Containers are Java EE runtime environments that provide certain services to the components they host, such as life-cycle management, dependency injection, concurrency, and so on. These components use well-defined contracts to communicate with the Java EE infrastructure and with the other components. They need to be packaged in a standard way (following a defined directory structure that can be compressed into archive files) before being deployed. Java EE is a superset of the Java SE platform, which means Java SE APIs can be used by any Java EE component. Java EE 7 consists of nearly 30 specifications and is an important milestone for the enterprise layer (CDI 1.1, Bean Validation 1.1, EJB 3.2, JPA 2.1), for the web tier (Servlet 3.1, JSF 2.2, Expression Language 3.0), and for interoperability (JAX-WS 2.2 and JAX-RS 2.0).
JAVA EE Components
The Java EE runtime environment defines four types of components that an implementation must support:
• Applets are GUI (graphic user interface) applications that are executed in a web browser. They use the rich Swing API to provide powerful user interfaces.
• Applications are programs that are executed on a client. They are typically GUIs or batch-processing programs that have access to all the facilities of the Java EE middle tier.
• Web applications (made of servlets, servlet filters, web event listeners, JSP and JSF pages) are executed in a web container and respond to HTTP requests from web clients. Servlets also support SOAP and RESTful web service endpoints. Web applications can also contain EJBs Lite.
• Enterprise applications (made of Enterprise Java Beans, Java Message Service, Java Transaction API, asynchronous calls, timer service, RMI/IIOP) are executed in an EJB container. EJBs are container-managed components for processing transactional business logic. They can be accessed locally and remotely through RMI (or HTTP for SOAP and RESTful web services).
JAVA EE Containers
The Java EE infrastructure is partitioned into logical domains called containers. Each container has a specific role, supports a set of APIs, and offers services to components (security, database access, transaction handling, naming directory, resource injection). Containers hide technical complexity and enhance portability. Depending on the kind of application you want to build, you will have to understand the capabilities and constraints of each container in order to use one or more. For example, if you need to develop a web application, you will develop a JSF tier with an EJB Lite tier and deploy them into a web container. But if you want a web application to invoke a business tier remotely and use messaging and asynchronous calls, you will need both the web and EJB containers.
• Applet containers are provided by most web browsers to execute applet components. When you develop applets, you can concentrate on the visual aspect of the application while the container gives you a secure environment. The applet container uses a sandbox security model where code executed in the “sandbox” is not allowed to “play outside the sandbox.” This means that the container prevents any code downloaded to your local computer from accessing local system resources, such as processes or files.
• Application client container (ACC) includes a set of Java classes, libraries, and other files required to bring injection, security management, and naming service to Java SE applications (Swing, batch processing, or just a class with a main() method). The ACC communicates with the EJB container using RMI-IIOP and with the web container using HTTP (e.g., for SOAP and REST web services).
• Web container provides the underlying services for managing and executing web components (servlets, EJBs Lite, JSPs, filters, listeners, JSF pages, and web services). It is responsible for instantiating, initializing, and invoking servlets and supporting the HTTP and HTTPS protocols. It is the container used to feed web pages to client browsers.
• EJB container is responsible for managing the execution of the enterprise beans (session beans and message-driven beans) containing the business logic tier of your Java EE application. It creates new instances of EJBs, manages their life cycle, and provides services such as transaction, security, concurrency, distribution, naming service, or the possibility to be invoked asynchronously.
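To make the idea of a container offering services to its components more concrete, here is a minimal sketch in plain Java SE. It is not how any particular Java EE server is implemented. The names OrderService, SimpleOrderService, and ToyContainer are hypothetical, and the "transaction" is only a log entry. The sketch uses a JDK dynamic proxy, which requires an interface; real containers interpose in their own implementation-specific ways.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical business interface; JDK dynamic proxies need one.
interface OrderService {
    String placeOrder(String item);
}

// The component only contains business logic, no transaction code.
class SimpleOrderService implements OrderService {
    public String placeOrder(String item) {
        return "ordered:" + item;
    }
}

// Toy container (an assumption for illustration): wraps a component so that
// every call is surrounded by begin/commit/rollback bookkeeping.
class ToyContainer {
    // Stands in for a real transaction manager by recording what happened.
    static final List<String> LOG = new ArrayList<>();

    static OrderService deploy(OrderService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add("begin " + method.getName());
            try {
                Object result = method.invoke(target, args);
                LOG.add("commit " + method.getName());
                return result;
            } catch (InvocationTargetException e) {
                LOG.add("rollback " + method.getName());
                throw e.getCause(); // rethrow the component's own exception
            }
        };
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] {OrderService.class},
                handler);
    }
}

class ToyContainerDemo {
    public static void main(String[] args) {
        OrderService service = ToyContainer.deploy(new SimpleOrderService());
        System.out.println(service.placeOrder("book"));
        System.out.println(ToyContainer.LOG);
    }
}
```

The caller only sees an OrderService. The bookkeeping around each call is added by the wrapper, which is the sense in which a container hides technical complexity from the component.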
Programming Model
Most of the Java EE 7 specifications use the same programming model. It’s usually a POJO (plain old Java object) with some metadata (annotations or XML) deployed into a container. Most of the time the POJO doesn’t even implement an interface or extend a superclass. Thanks to the metadata, the container knows which services to apply to this deployed component. In Java EE 7, servlets, JSF backing beans, EJBs, entities, and SOAP and REST web services are annotated classes with optional XML deployment descriptors. Listing 1 shows a JSF backing bean that turns out to be a Java class with a single class-level CDI annotation.
Listing 1. A JSF Backing Bean
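import java.util.ArrayList;
import java.util.List;
import javax.inject.Inject;
import javax.inject.Named;

@Named
public class BookController {

    @Inject
    private BookEJB bookEJB;

    private Book book = new Book();
    private List<Book> bookList = new ArrayList<Book>();

    // Persists the current book, refreshes the list, and navigates to the list page.
    public String doCreateBook() {
        book = bookEJB.createBook(book);
        bookList = bookEJB.findBooks();
        return "listBooks.xhtml";
    }

    // Getters, setters
}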
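The way a container can act on annotation metadata may be easier to see in a small sketch that runs on plain Java SE. This is an illustration under stated assumptions, not the CDI implementation: @InjectMe is a hypothetical stand-in for @Inject, and MiniContainer, GreetingService, and GreetingController are made-up names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical stand-in for javax.inject.Inject so the sketch needs no Java EE library.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface InjectMe {}

// A plain service class with no interface or superclass.
class GreetingService {
    String greet(String name) {
        return "Hello, " + name;
    }
}

// A POJO whose only link to the "container" is the annotation on its field.
class GreetingController {
    @InjectMe
    private GreetingService service;

    String welcome(String name) {
        return service.greet(name);
    }
}

// Toy container (an assumption for illustration): instantiates a component
// and fills every field marked with @InjectMe.
class MiniContainer {
    static <T> T create(Class<T> type) {
        try {
            T instance = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (field.isAnnotationPresent(InjectMe.class)) {
                    field.setAccessible(true);
                    // Build the dependency the same way so its own fields get injected too.
                    field.set(instance, create(field.getType()));
                }
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + type.getName(), e);
        }
    }
}

class MiniContainerDemo {
    public static void main(String[] args) {
        GreetingController controller = MiniContainer.create(GreetingController.class);
        System.out.println(controller.welcome("Duke")); // prints "Hello, Duke"
    }
}
```

GreetingController never creates its own GreetingService. The annotation is the only hint, and the container does the wiring, which mirrors the relationship between BookController and BookEJB in Listing 1.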
EJBs also follow the same model. As shown in Listing 2, if you need to access an EJB locally, a simple annotated class with no interface is enough. EJBs can also be deployed directly in a WAR file without being previously packaged in a JAR file. This makes EJBs the simplest transactional component that can be used from simple web applications to complex enterprise ones.
Listing 2. A Stateless EJB
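import javax.ejb.Stateless;
import javax.inject.Inject;
import javax.persistence.EntityManager;

@Stateless
public class BookEJB {

    // Assumes a CDI producer exposes the EntityManager; otherwise use @PersistenceContext.
    @Inject
    private EntityManager em;

    public Book findBookById(Long id) {
        return em.find(Book.class, id);
    }

    public Book createBook(Book book) {
        em.persist(book);
        return book;
    }
}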
RESTful web services have been making their way into modern applications. Java EE 7 attends to the needs of enterprises by improving the JAX-RS specification. As shown in Listing 3, a RESTful web service is an annotated Java class that responds to HTTP actions.
Listing 3. A RESTful Web Service
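import java.util.List;
import javax.inject.Inject;
import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;

@Path("books")
public class BookResource {

    @Inject
    private EntityManager em;

    // Returns every book as XML or JSON, depending on what the client accepts.
    @GET
    @Produces({"application/xml", "application/json"})
    public List<Book> getAllBooks() {
        TypedQuery<Book> query = em.createNamedQuery("findAllBooks", Book.class);
        return query.getResultList();
    }
}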
Java EE 7 broadens the use of annotations and enhances application portability with a standard RESTful web services client API in JAX-RS 2.0. This release also delivers improvements to Contexts and Dependency Injection (CDI), a Java standard for dependency injection and contextual life-cycle management. CDI aims to reduce boilerplate code through dependency injection, as in the @Inject fields of the listings above, and the platform now defines default resources as well. The new platform also updates the Java Message Service (JMS): version 2.0 supports annotations and CDI beans, significantly reducing the code required to send and receive messages.
BIG DATA and JAVA EE
The Java EE 7 specification emphasizes simplification, productivity, and support for a number of web standards, including HTML5, WebSocket, JSON, and a modern HTTP client API. WebSocket, which provides simultaneous two-way (full-duplex) communication over a single TCP connection, reduces the response times of HTML5 apps. JSON (JavaScript Object Notation), a text-based, human-readable data interchange format derived from JavaScript, simplifies data parsing and exchange. There are many use cases we could consider; let us focus on two.
DATA VALIDATION
Contexts and Dependency Injection has become a central and common specification across Java EE. It solves recurrent problems (injection, alternatives, stereotypes, producers . . .) that developers face in their day-to-day work. Validating data is also a common task that is spread across several, if not all, layers of today’s applications (from presentation to database). Because processing, storing, and retrieving valid data are crucial for an application, each layer defines validation rules its own way. Often the same validation logic is implemented in each layer, which proves to be time-consuming, hard to maintain, and error-prone.
To avoid duplication of these validations in each layer, developers often bundle validation logic directly into the domain model, cluttering domain classes with validation code that is, in fact, metadata about the class itself. Bean Validation solves the problem of code duplication and cluttered domain classes by allowing developers to write a constraint once, use it, and validate it in any layer. With Bean Validation, a constraint is implemented in plain Java code and then declared through an annotation (metadata). This annotation can then be used on your beans, properties, constructors, method parameters, and return values. In a very elegant yet powerful way, Bean Validation exposes a simple API so that developers can write and reuse business logic constraints.
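The "write a constraint once, declare it with an annotation, validate it anywhere" idea can be sketched in plain Java SE. The code below is a simplified stand-in, not the Bean Validation API itself (which lives in the javax.validation package). The annotations @Required and @MinValue, the SensorReading class, and ToyValidator are all hypothetical names chosen for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical constraint: the field must not be null.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Required {}

// Hypothetical constraint: a numeric field must be at least the given value.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface MinValue {
    long value();
}

// Domain class: the rules are declared as metadata, with no validation code inside.
class SensorReading {
    @Required
    String sensorId;

    @MinValue(0)
    long value;

    SensorReading(String sensorId, long value) {
        this.sensorId = sensorId;
        this.value = value;
    }
}

// Toy validator (an assumption for illustration): any layer can call it on any
// bean and gets back a list of human-readable violations.
class ToyValidator {
    static List<String> validate(Object bean) {
        List<String> violations = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            field.setAccessible(true);
            Object current;
            try {
                current = field.get(bean);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (field.isAnnotationPresent(Required.class) && current == null) {
                violations.add(field.getName() + " must not be null");
            }
            MinValue min = field.getAnnotation(MinValue.class);
            if (min != null && current instanceof Number
                    && ((Number) current).longValue() < min.value()) {
                violations.add(field.getName() + " must be >= " + min.value());
            }
        }
        return violations;
    }
}

class ToyValidatorDemo {
    public static void main(String[] args) {
        System.out.println(ToyValidator.validate(new SensorReading("s-1", 10)));
        System.out.println(ToyValidator.validate(new SensorReading(null, -5)));
    }
}
```

The same validate call could run in the presentation layer, the service layer, or just before records are handed to a big data pipeline, without the rules being rewritten in each place.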
HADOOP enhanced by JAVA EE
Now that we have valid data, we can focus on enterprise needs. Thanks to the Java EE architecture, you can develop web applications that invoke services in a web container, which in turn invoke logic contained in an EJB container, all packaged in a single .war file. Because Java EE 7 can expose such services over HTTP as SOAP or REST web services, Java works well as a first-class language for building applications on top of Hadoop.
Conclusion
Java APIs exist for writing MapReduce code and for writing Pig UDFs, and Java can be used to interface with HBase. The list goes on. It makes sense to use these proven Java APIs for the Hadoop framework together with the new features of Java EE 7. That way, you can enhance the Hadoop experience from the client, server, and service points of view.