Java 8 and JSR 335: Streams (2/2)

This post is the second in a series of two posts on Lambda Expressions and Streams. You can read the first part by clicking here.

Introduction

Today I'll write a different kind of post. This post is based on my workshop dedicated to the Java 8 Lambda Expressions and Streams features (specified in JSR 335).

The point of this post is to summarize the content of the workshop and to serve as a reference guide for everyone who attended it. For those who didn't have the chance to attend, I recommend taking a look at the presentation (available on SpeakerDeck) and reading this post while following the presentation. The source code used in the presentation is available on GitHub.

It would be awesome to share some thoughts about these new features. I believe that we are seeing a new programming pattern emerge, and in the next few years we'll be seeing a lot of new use cases for these features.

This is the second part of the post. If you haven't read the first part yet, click here!

So let's get started!

Streams

Have you ever heard about streams? A stream can be described as a special abstraction (it does not store elements itself) that lets you execute operations over a collection of elements. It allows you to sort, filter, combine and even transform the data as you like.

Furthermore, streams have another very useful benefit. Depending on the type of operation you perform over the data, streams can execute those operations in parallel, all behind the scenes and without you having to worry about Executors or anything like that.

Sometimes stream manipulation looks like the manipulation of Collections. Let's take a look at an example of iterating over a list of Strings and printing them to the console:

List<String> names = Arrays.asList("Foo", "Bar", "Gee");

// Iterating in the traditional way
for (String name : names) {
    System.out.println(name);
}

// Iterating using the stream
names.stream().forEach(System.out::println);

Did you notice the difference? I created a stream using names.stream() and then "told" the stream that I wanted to execute System.out::println (remember method references?) on each of its elements.

We call the first way imperative programming, because you need to tell the program line by line what to do and how to handle the iteration.

We call the second way declarative programming. You declare the stream, configure it to do whatever you want and then trigger the execution. No operation is executed until you trigger the execution; that's what we call lazy evaluation.

So let's take a deeper look at streams and how to use them.
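To see the lazy evaluation in action, here is a minimal sketch. The class name LazyEvaluationDemo and the log messages are my own illustration, not part of the workshop code. The lambdas write to a log only so that we can observe the order of execution; side effects like this are not something you would normally want in a real pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class LazyEvaluationDemo {

    // Returns a log of events so the order of execution can be inspected.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        List<String> names = Arrays.asList("Foo", "Bar", "Gee");

        // Configuration only: the filter lambda is not invoked here.
        Stream<String> configured = names.stream()
                .filter(name -> {
                    log.add("filtering " + name);
                    return name.startsWith("B");
                });
        log.add("pipeline configured");

        // The terminal operation triggers the execution of the pipeline.
        configured.forEach(name -> log.add("consumed " + name));
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The first entry in the log is "pipeline configured": nothing is filtered until forEach is called.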

Stream Structure and Lifecycle

Every stream has three parts:

  • Source
  • Intermediate operations
  • Terminal

The source of the stream is where the stream pulls the objects it will iterate through. In the above example our source was a Collection, but it could be an array or even an I/O resource such as a file.
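As a rough sketch of a few different sources, assuming nothing beyond the standard Java 8 library: the class name StreamSources and the sample values are made up for illustration. I'm reading from a StringReader instead of a real file so the example runs anywhere; with an actual file you would typically use Files.lines(path).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamSources {

    // Source: a Collection
    static long fromCollection() {
        List<String> names = Arrays.asList("Foo", "Bar", "Gee");
        return names.stream().count();
    }

    // Source: an array
    static long fromArray() {
        String[] names = {"Foo", "Bar"};
        return Arrays.stream(names).count();
    }

    // Source: explicit values
    static long fromValues() {
        return Stream.of("Foo").count();
    }

    // Source: an I/O resource (an in-memory reader standing in for a file)
    static long fromReader() {
        try (BufferedReader reader =
                     new BufferedReader(new StringReader("Foo\nBar\nGee\nBaz"))) {
            return reader.lines().count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromCollection());
        System.out.println(fromArray());
        System.out.println(fromValues());
        System.out.println(fromReader());
    }
}
```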

The intermediate operations of a stream are the operations that will be executed on the elements of the stream. An intermediate operation could be a filter, an ordering, a mapping, etc. The above example had no intermediate operations: forEach, the operation that printed all the elements, is a terminal operation.

The terminal of a stream is a special operation whose purpose is to end the processing of the stream. It can be a reduction that returns one single result or even a search that returns only the objects that match some condition.
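The three parts described above can be sketched in a single pipeline. The class and method names (PipelineAnatomy, shortNamesUpperCased) and the sample data are just my illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineAnatomy {

    static List<String> shortNamesUpperCased(List<String> names) {
        return names.stream()                        // source: a Collection
                .filter(name -> name.length() <= 3)  // intermediate: filter
                .map(String::toUpperCase)            // intermediate: mapping
                .sorted()                            // intermediate: ordering
                .collect(Collectors.toList());       // terminal: collect the result
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Foo", "Bar", "Gee", "Quux");
        System.out.println(shortNamesUpperCased(names)); // prints [BAR, FOO, GEE]
    }
}
```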

The lifecycle of a stream has four phases:

  • Creation
  • Configuration
  • Execution
  • Cleanup

The creation is where we use the source of the stream to create it.

The configuration is where we set up all the intermediate operations.

The execution is the invocation of the terminal operation, which pulls the elements of the stream down the pipeline and collects the result.

A stream cannot be reused: once a terminal operation has been invoked, the stream is consumed. The cleanup phase takes care of some implementation details (all behind the scenes).
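Here is a small sketch of what happens if you try to reuse a stream anyway. The class name StreamReuseDemo is hypothetical. The Javadoc only says that an implementation may throw IllegalStateException when it detects reuse; the sketch assumes the OpenJDK implementation, which does throw.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class StreamReuseDemo {

    // Returns true if the second terminal operation is rejected.
    static boolean secondTerminalOperationFails() {
        Stream<String> stream = Arrays.asList("Foo", "Bar", "Gee").stream();
        stream.count(); // the first terminal operation consumes the stream

        try {
            stream.count(); // the stream has already been operated upon
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalOperationFails());
    }
}
```

If you need to run two pipelines over the same data, create a new stream from the source each time.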

Intermediate Operations

Intermediate operations are the operations that will be applied to the elements of the stream. There are two types of intermediate operations: stateless and stateful.

Stateless operations do not need to know the history of results from the previous steps in the pipeline, nor do they keep track of how many results they have produced or seen. A good example of a stateless operation is filter(..). It just needs to look at the current element and decide whether it should be kept or not.

On the other hand, stateful operations do need to know the history of results produced in the previous steps, or need to keep track of how many results they have produced or seen. Some examples of stateful operations are: distinct(), limit(i), sorted(..), etc.

I'm not going to cover every intermediate operation here, because there are so many ways of using streams that a blog post can only scratch the surface of the possibilities. You should take a look at the source code, which contains more examples of intermediate operations.
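To contrast the two types, here is a minimal sketch. The class name IntermediateOperationsDemo, the method names and the sample numbers are assumptions of mine:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class IntermediateOperationsDemo {

    // filter and map only look at the current element: stateless.
    static List<Integer> stateless(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * 10)
                .collect(Collectors.toList());
    }

    // distinct must remember what it has seen, sorted must see every element
    // before emitting any, and limit must count what it has let through: stateful.
    static List<Integer> stateful(List<Integer> numbers) {
        return numbers.stream()
                .distinct()
                .sorted()
                .limit(3)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(5, 3, 8, 3, 2, 8, 1);
        System.out.println(stateless(numbers)); // prints [80, 20, 80]
        System.out.println(stateful(numbers));  // prints [1, 2, 3]
    }
}
```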

Terminal Operations

Now let's take a look at some terminal operations. As we said, those are the operations responsible for triggering the pipeline execution and collecting the results. We can separate terminal operations into three types:

  • Reduction
  • Mutable reduction
  • Search

A reduction operation is an operation that returns just a single result. For example, given a list of Integers named integers, the stream execution integers.stream().count() will return the number of elements in the stream. Another common example is getting the min or max of the elements of the stream.

A mutable reduction operation is an operation that returns multiple results in a container data structure, such as a Collection. Take a look at the example below; it converts a List<Integer> to a Set<Integer>:
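A short sketch of these reductions, assuming a list like the one used elsewhere in this post. The class name ReductionDemo and its methods are only for illustration. Note that min and max return an Optional, because the stream may be empty.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class ReductionDemo {

    static long count(List<Integer> integers) {
        return integers.stream().count();
    }

    static Optional<Integer> min(List<Integer> integers) {
        return integers.stream().min(Integer::compare);
    }

    static Optional<Integer> max(List<Integer> integers) {
        return integers.stream().max(Integer::compare);
    }

    // A general-purpose reduction: start at 0 and add each element.
    static int sum(List<Integer> integers) {
        return integers.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> integers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(count(integers));     // prints 5
        System.out.println(min(integers).get()); // prints 1
        System.out.println(max(integers).get()); // prints 5
        System.out.println(sum(integers));       // prints 15
    }
}
```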

List<Integer> integers = Arrays.asList(1, 2, 3, 4, 5);

Set<Integer> s = integers.stream().collect(Collectors.toSet());

Another common usage of mutable reduction is converting a Collection to a Map and vice versa.

There are more examples of the usage of terminal operations in the source code.
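As a sketch of the Collection to Map conversion and back, here is one way it could look with Collectors.toMap. The class name CollectToMapDemo, the method names and the choice of "name length" as the map value are my own. Keep in mind that this two-argument toMap throws an IllegalStateException if two elements produce the same key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CollectToMapDemo {

    // Collection -> Map: each name becomes a key, its length the value.
    static Map<String, Integer> lengthByName(List<String> names) {
        return names.stream()
                .collect(Collectors.toMap(name -> name, String::length));
    }

    // Map -> Collection: stream over the keys and collect them into a sorted List.
    static List<String> sortedKeys(Map<String, Integer> map) {
        return map.keySet().stream()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> lengths = lengthByName(Arrays.asList("Foo", "Bar", "Quux"));
        System.out.println(lengths.get("Quux"));  // prints 4
        System.out.println(sortedKeys(lengths));  // prints [Bar, Foo, Quux]
    }
}
```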

Parallelism

One detail about streams is that some operations can be executed in parallel, all behind the scenes and without user intervention. Which operations can be executed in parallel and which can't is a decision of the implementation, and the specification doesn't give a lot of details on it.

The only difference when using a parallel stream is in the creation phase. Look at the example below:

List<Integer> integers = Arrays.asList(1, 2, 3, 4, 5);

// creates a stream
Set<Integer> s1 = integers.stream().collect(Collectors.toSet());

// creates a parallel stream
Set<Integer> s2 = integers.parallelStream().collect(Collectors.toSet());
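One more sketch, to show that the sequential and the parallel versions of a pipeline are expected to produce the same result for a reduction like a sum. The class name ParallelDemo and the range 1 to 1000 are arbitrary choices of mine. Besides parallelStream(), an existing stream can also be switched to parallel mode with parallel(), which is what I use here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelDemo {

    // Sums the squares of 1..1000, sequentially or in parallel.
    static long sumOfSquares(boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, 1000);
        if (parallel) {
            range = range.parallel();
        }
        return range.mapToLong(i -> (long) i * i).sum();
    }

    // Collecting to a List keeps the encounter order, even on a parallel stream.
    static List<Integer> doubled(List<Integer> integers) {
        return integers.parallelStream()
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(false) == sumOfSquares(true)); // prints true
        System.out.println(doubled(Arrays.asList(1, 2, 3, 4, 5)));     // prints [2, 4, 6, 8, 10]
    }
}
```

Whether the parallel version is actually faster depends on the size of the data and on the cost of each operation, so it is worth measuring before switching.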

Conclusion

We saw that streams can really help us reduce the verbosity of Java and can also help with performance through parallelism. As I said in the introduction, the Lambda Expressions and Stream API JSR is one of the most exciting changes in the Java language of the year. I believe that we still have a lot to study to arrive at good usage patterns and to learn how to apply these features well. I suggest you try out these features and discuss them with other developers. If you have anything to add (or if I made any mistake in this post), feel free to leave it in the comments section.

See you!
