Course Overview
Course Overview
Hi, everyone, my name is Jose. I am very glad to present my new course about Java NIO and NIO2. I am talking to you from Paris, where I live and work as an assistant professor at the university. I'm a Java Champion and a Java Rockstar. Java non-blocking I/O, or NIO, is about adding features to Java I/O for better performance. It provides support for very large buffers and for asynchronous operations. All those topics are precisely explained with real-life use cases presented in live coding, for both disk and network access. NIO2, on the other hand, brings native access to the file system, with APIs to explore very large directory trees and to respond to file creation, deletion, and modification events. This is also precisely explained in slides and live coding sessions. Before following this course, be sure to be familiar with the Java language, including the writing of basic lambda expressions, and with basic Java I/O notions. For that, you can check the Pluralsight Java library. This course is very technical, so I put many examples in it, first explained in slides and then demoed in live coding sessions, to make things easy for you to understand.
Creating Channels with NIO
Introducing NIO and NIO2 APIs: What Are You Going to Learn?
Hello, my name is Jose. I'm very glad to welcome you to this course, Java Fundamentals: NIO and NIO2. You will start by learning about creating channels with Java NIO. Just before we start, a preamble on the APIs available in the JDK to access files and the network. In fact, there are three of them. The first one is Java I/O, introduced in 1996 in the very first version of the JDK. The second, Java NIO, was added to the JDK in 2002, in Java 4. The last one to date, Java NIO2, was introduced in 2011 in Java 7. This course covers Java NIO and NIO2. Both rely quite heavily on a good knowledge of Java I/O, which is covered in one of my other courses here on Pluralsight called Java Fundamentals: Input/Output. So, Java 4 added Java NIO to Java I/O, and Java 7 added NIO2, which is an extension of I/O and NIO. NIO is usually read as non-blocking I/O, although the official name of the API is New I/O. It deals with buffers and channels and supports asynchronous operations. This is what we are going to see about NIO in this course. Just as a side note, it was stated when NIO was released that NIO was more efficient, with better performance than pure Java I/O. This might have changed. This might not be the case anymore. It really depends on your application and your use case. Then NIO2 brought some more functionalities to both Java I/O and Java NIO: first, native access to file systems; then events, especially directory events to track file creation and deletion; and a very powerful API to explore directory structures. I will cover all these topics. This is what you are going to learn in this course.
What Do You Need to Know to Follow This Course?
What do you need to know before this course? Well, this is a Java course, so you should have a fair knowledge of the language and its main APIs, especially the Collections Framework, which is used all over the place. You should also have a good knowledge of the Java I/O API: of course the readers, the writers, the input streams and output streams, but also the notions of file and path, and the structure of the exceptions. We will also be talking about file systems, so you need some basic knowledge of what a file system and a file system structure are. We will also use networking, especially in the second module of this course, so you need to know a little bit of networking to understand that. Very basic knowledge is enough. You also need to have some basic knowledge about concurrency. In fact, from time to time, I will be mentioning that this class or this pattern is thread safe or not. So if you want to understand that, you need to know a little about concurrency. If you do not, it's not really important, but you will just miss some points here and there in this course.
Agenda and Organization of This Course
Let us quickly browse through the agenda of this course. You will begin with Java NIO, covered in two modules. First, you will learn about channels and buffers by following this module. Second, you will see how to set up asynchronous input/output operations using selectors in Java NIO. Then you will see Java NIO2, covered in three more modules. You will first learn about the file system API available in NIO2. Then you will see how to visit directory trees. NIO2 brings three patterns for that, which are much more efficient than the one we have in Java I/O. And you will learn how to listen to file system events to track file creations, deletions, and modifications.
Agenda and Organization of This Module
We will be covering buffers and channels. First, we will talk about channels: file channels, socket channels, and in-memory file channels. Then we will talk about buffers and explain how they work together with channels. We will give examples of the read and write operations on buffers. Then we will see how you can use multiple buffers, which is especially useful for fixed-size file formats. We will also see how to use charsets, which are always an issue when we are dealing with streams of characters.
Introducing the Java NIO API: Why It's Been Added to the JDK?
Before we begin, let us first introduce Java NIO and explain why this API has been added to the JDK alongside the Java I/O API. Why a new input/output API? Well, you have to understand that Java I/O was added to the JDK in 1995/1996 and NIO in 2002, that is, seven years later. In fact, during those seven years, people realized that Java I/O was, technically speaking, a good API, no problem with that, but it was lacking several features, especially for performance reasons. For instance, Java I/O writes or reads one byte or character at a time and cannot conduct bulk read or bulk write operations. Second, readers are for reading only. This is the case of the Reader class and the InputStream class. And writers are for writing only. Then, buffering is done through BufferedReader or BufferedWriter for character streams, and through BufferedInputStream and BufferedOutputStream for raw byte streams. It occurs in the JVM heap memory, that is, in a nutshell, the central memory of the Java Virtual Machine, handled by the garbage collector. This is not well adapted to very large files and very large read and write operations. Then the handling of charsets is not that great. Basically, a text file written in Latin-1 or ASCII, for instance, that is, in a charset that is not the default of the JDK, which defaults to UTF-8, would have to be read using binary operations and then converted character by character to the right charset. This is not completely transparent to the user. The Java I/O API does not offer any solution for that. And lastly, all the operations in the Java I/O API are synchronous operations. All this was not that bad in 1995, but in 2002, seven years later, people had the feeling that those functionalities were missing from Java I/O and decided to create a new I/O API to handle all this.
Understanding What NIO Provides That Is Missing in Java I/O
And this is why NIO has been built. NIO provides, first, bulk access to raw bytes, and only raw bytes, so you can leverage the functionalities of file systems and operating systems to speed up read and write operations. Then, bidirectional channels: with a single channel, that is, with a single Java object, you can both read and write data to and from the disk or to and from the network. Then, off-heap buffering. We will not give many details on this topic because it is outside the scope of this course, but in a nutshell, what you can do is create buffers outside of the central memory of the Java Virtual Machine, in portions of memory not handled by the garbage collector. So you can create very large buffers, think of multi-gigabytes or even multi-terabytes in size, without any impact on the performance of the garbage collector. Then, proper support for charsets directly inside the JDK. The JDK defines standard charset objects, that is, objects for the standard, well-known, and most widely used charsets around, and those charsets provide encode and decode methods to convert a stream of characters expressed in a given charset to another charset. And then there is the support for asynchronous operations, which was just not there in the Java I/O API. So those are the main reasons why Java NIO was introduced in the JDK in 2002.
Introducing Buffers, Channels, and Selectors
Following this, for its I/O implementation, Java NIO introduces three new concepts. The first is the concept of a buffer. A buffer can be seen as a space in memory. It can reside in the main memory of the JVM, the heap, or off-heap, which is very useful for very large buffers. And this is basically where the data resides. The second concept is the concept of a channel. The channel is where the data comes from. The channel object is an object that connects to a file or to a socket, for instance. A channel can write the buffer to a medium or can read data from that medium into a buffer. A channel only knows byte buffers, so it can only read and write bytes from files or from sockets, for instance. Afterwards, we will have to convert the content of this buffer to characters if it holds character data, or to data or objects if it holds raw data or raw objects. And the third concept is the concept of a selector. The selector has been introduced to handle asynchronous operations. So in the first module of this course we will be covering buffers and channels, play with them, and understand how they work and what we can do with them. And in the second module we will see asynchronous operations and this concept of selector. What we have to understand just now is that a write operation takes data from a buffer and writes it to a channel. This is how we can write data to a file, for instance. How this buffer is filled is the responsibility of our application, and we will see ways of filling buffers with characters, with raw bytes, with primitive data types, or with Java objects. And a read operation does the contrary. It takes data from a channel, that is, from a file for instance, and writes it into a buffer. Once the data is in the buffer we can read it and interpret it as raw bytes, data types, objects, or characters, as we need.
Understanding Channels and In-memory File Channels
Let us talk about channels. Channel is an interface, and it is implemented by several classes. The first one is the FileChannel, to access files. It has a cursor. It allows for multiple reads and writes. And it is thread safe. The second implementation provided is the DatagramChannel, to access UDP sockets. It supports multicast, since it is UDP, and it supports multiple, non-concurrent reads and writes. And the third one is the SocketChannel, along with the ServerSocketChannel, to access TCP sockets. It supports asynchronous operations and also supports multiple, non-concurrent reads and writes. FileChannel, SocketChannel, and ServerSocketChannel are in fact abstract classes, extended by concrete classes in the JDK. But these concrete implementations are hidden and should not be used directly. To create instances of channels, we are going to use factory methods. In fact, a FileChannel can be mapped to a memory array for direct access. This allows for much faster operations than accessing the disk directly. It is built on native features provided by the different operating systems, and this is the reason why the concrete implementations of FileChannel are hidden: they are different depending on the machine we are working on. It should be used with caution, because a single write in this kind of array can trigger a modification of the file that will be sent to the disk directly. This mapping supports three modes. The first one is READ_ONLY: the file is just loaded in memory and is read from this array. READ_WRITE means that the file can be modified, with the warning I just gave you. And PRIVATE means that the modifications made to this file are local to this mapping, so they will not be propagated to the disk.
Understanding Buffers and Their Main Properties
Let us talk about buffers and see how they work. First, Buffer is an abstract class extended by typed buffers. The first typed buffer, and the most important, is the ByteBuffer, since this is the only type of buffer a channel can write in or read from. So if we need to write characters to a ByteBuffer, we need to decorate it with a CharBuffer and write chars using the methods of this CharBuffer. And the same goes, for instance, for IntBuffer, which allows for the reading and writing of integers in a ByteBuffer. And then those ByteBuffers, CharBuffers, etc. are extended by concrete implementations. Those concrete implementations are hidden. We do not have direct access to them. We can only create buffers using factory methods. This is the right pattern to use. As we can guess, a buffer is an in-memory structure backed by an array of bytes. It is usually stored in the central memory of the JVM, handled by the garbage collector. But it can also be stored in the off-heap space of the JVM, thus not impacting the garbage collector. And this is very useful for very large buffers. The size of a buffer is an int, since it is backed by an array. So the size of a buffer can be as large as two gigabytes, which could have an impact on the performance of the garbage collector if it is stored in the central memory space of the JVM. A buffer object has three properties. First, a capacity, which is an int, the size of the backing array. Then a current position, which can be seen as a cursor. All the read and write operations are made to or from the current position. And the limit, which is the last position in memory seen by this buffer. So with those two indexes, cursor and limit, we can create views on buffers, which can be seen as a kind of sub-buffer inside a buffer. And a buffer always keeps track of the available space, so we know exactly what amount of data we can write in a buffer. A buffer can hold a single mark. Marking a buffer in fact does two things.
First, it sets the mark at the current position, the position we are reading from or writing to in the buffer. And second, it returns this, to be able to chain calls on this buffer. And lastly, a buffer supports four basic operations. The rewind method clears the mark and sets the current position to 0. The reset method sets the current position to the previously set mark. This is the companion method of the mark method. The flip method sets the limit to the current position and rewinds the buffer. We are going to see this method in detail in a few minutes, and also in the live coding part of this module, because it is very important. And then the clear method, which resets the position to 0 and the limit to the capacity, so the buffer is ready to be filled again; the bytes themselves are not erased. All these operations, along with the mark operation, return this, and so they can be chained for better patterns.
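The properties and operations just described can be observed on a small buffer. What follows is a minimal sketch, not code from the course: the class name BufferPropertiesDemo, the state helper, and the 16-byte capacity are assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class: traces position, limit and capacity through the basic operations.
final class BufferPropertiesDemo {

    // Formats the three properties of a buffer.
    static String state(ByteBuffer buffer) {
        return "position=" + buffer.position()
                + " limit=" + buffer.limit()
                + " capacity=" + buffer.capacity();
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);      // heap buffer of 16 bytes
        System.out.println("new:     " + state(buffer));

        buffer.putInt(10);                                // the position moves to 4
        buffer.mark();                                    // remember position 4
        buffer.putInt(20);                                // the position moves to 8
        System.out.println("written: " + state(buffer));

        buffer.reset();                                   // back to the mark, position 4
        System.out.println("reset:   " + state(buffer));

        buffer.position(8);                               // return to the end of the data
        buffer.flip();                                    // limit = 8, position = 0, mark discarded
        System.out.println("flipped: " + state(buffer));
        System.out.println("values:  " + buffer.getInt() + ", " + buffer.getInt());

        buffer.rewind();                                  // position = 0, limit unchanged
        System.out.println("rewound: " + state(buffer));

        buffer.clear();                                   // position = 0, limit = capacity
        System.out.println("cleared: " + state(buffer));
        // clear() does not erase the bytes: an absolute read still finds the first integer.
        System.out.println("first int after clear: " + buffer.getInt(0));
    }
}
```

The calls are written one per statement on purpose: on Java 8 the mark, flip, rewind, and clear methods return the Buffer supertype, so chaining a ByteBuffer-specific call such as putInt after them only compiles on Java 9 and later.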
Writing Content to a File Using Buffers and Channels
Let us see how to write content to a file using those buffers and channels. Suppose we need to write integers to a file. First, we need a buffer to write those integers in, and then a channel, which will be a file channel, to write the content of this buffer to the right file. A channel can only write from or read into a byte buffer. So first we create this byte buffer. Second, we can use one of the putInt, putChar, putDouble, etc. methods to write our data. And third, we have our file channel write the content of this byte buffer to the right file. Note that in this way we can write characters or arrays of characters, but we cannot directly write strings of characters. To write strings of characters you can use the following pattern. First, convert the byte buffer to a char buffer with the asCharBuffer method. And second, use the put method that takes a String, available on the char buffer. Let us take a look at the code. First we create a byte buffer, here of one kilobyte of data. We use for that the allocate factory method of the ByteBuffer class. We could also use the allocateDirect factory method to create the byte buffer outside of the central memory of the JVM, in the memory called the off-heap memory of the JVM. Then we write the data we need in the buffer. Here it is just one integer, written using the putInt method of the byte buffer. Then we create the file channel on a file, to write the content of this buffer to this file. We use the open factory method of the FileChannel class. It takes a path as a first argument, and then as many standard open options as we need. Here, CREATE and WRITE, since we want to create this file if it does not exist and write to it. And then we just call the write method of this file channel and pass the byte buffer as an argument of this write method. We should not forget to manually close the file channel if we are not using the try-with-resources pattern available from Java 7.
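The write pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the slide code of the course: the class name WriteIntsToFile, the writeInts helper, and the temporary file are hypothetical. The sketch also flips the buffer before writing, a step that the live coding part of this module explains.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: fills a ByteBuffer with integers and writes it through a FileChannel.
final class WriteIntsToFile {

    // Writes the given integers to the file and returns the number of bytes written.
    // Assumes the values fit in the 1 KB buffer, that is at most 256 integers.
    static int writeInts(Path file, int... values) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        for (int value : values) {
            buffer.putInt(value);                 // four bytes per integer
        }
        buffer.flip();                            // expose only what has been written

        // try-with-resources closes the channel for us
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            int written = 0;
            while (buffer.hasRemaining()) {       // a single write may be partial
                written += channel.write(buffer);
            }
            return written;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("ints", ".bin");   // placeholder location
        System.out.println(writeInts(file, 10, 20, 30) + " bytes written to " + file);
        Files.delete(file);
    }
}
```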
Reading Content from a File Using the Flip Operation
Now that we have a file with some data in it, let us try to read it using the opposite pattern. Remember that in Java NIO a read operation takes data from a channel and puts it in a buffer. So to properly read a file we need to understand how buffers work. First, we are going to create a channel on the given file to read its content. That content will be transferred into a byte buffer, since a channel can only access a byte buffer, and then our application will need to read the content of this buffer and translate it into understandable data. So, we have a file on disk with, let's say, three integers: 10, 20, and 30. We can set up a channel that will transfer this data directly into an in-memory buffer, and we will have an image of this file in this buffer. Once this is done, the cursor of this buffer will be positioned in the byte stream just after 30. Then we will have to wrap this buffer as an IntBuffer. When we do that, we get a view of the buffer which starts at the current position of the cursor. Of course this is not what we need, because if we try to read from that view we will not see any content. So we need to put the cursor at the beginning of the buffer before doing that. We can call rewind and indeed, rewind will do the trick. It will put the cursor at the beginning of the buffer. And if we call asIntBuffer at this point, we will see the full buffer, starting from the cursor to the end of the backing array. Now, the problem is that we need to know that we wrote three integers in this buffer, because if we try to read as many integers as this buffer can contain, we will probably read junk past the three integers that have been written. What we want, in fact, is not exactly this configuration but an IntBuffer that ends exactly where the cursor was before the rewind operation.
So in fact, what we want is to set a limit at the point where our cursor was before the rewind operation. And we have a method that does exactly this, which is not the rewind method but the flip method. This is just what the flip method does. It sets the limit of the buffer at the current position of the cursor and then rewinds the cursor. So the correct pattern to read the content we just wrote using this channel is the following: buffer.flip(), then buffer.asIntBuffer() to set up some kind of view inside our buffer. And once we are done reading the integers, do not forget to call the clear method to reset both the cursor and the limit before reusing the buffer.
Understanding the Pattern to Read the Content of a File
So now that we know what to do, let us write the code. First we need a FileChannel opened in read mode on our file, and a ByteBuffer of a certain size in memory. Then we read the content of the file into that byte buffer. All the content of the file will be put in the byte buffer. We call the flip method on this byte buffer. That will set the limit of the view at the current position and rewind the buffer. Then we can call asIntBuffer to be able to read the content of our buffer. And to read this content, all we need to do is to repeatedly call the get method to read the integers one by one. So this is the pattern to read the content of a file into a byte buffer, then decode the content of that buffer according to the data it contains. It is important to note the difference between rewind and flip. Rewind is just a reset of the cursor inside the buffer. Flip is a reset that also prevents reading past what has been written into the buffer. So to properly conduct read operations, most of the time it is the flip operation that is used.
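Here is one way this read pattern could look in code. It is a sketch under assumptions: the class name ReadIntsFromFile and the readInts helper are hypothetical, the file is a temporary one prepared by the main method, and the content is assumed to fit in the 1 KB buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: reads a file of integers through a FileChannel and a ByteBuffer.
final class ReadIntsFromFile {

    // Reads up to 1 KB of the file and decodes it as a sequence of integers.
    static List<Integer> readInts(Path file) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            int read;
            do {
                read = channel.read(buffer);      // -1 signals the end of the file
            } while (read != -1 && buffer.hasRemaining());
        }

        buffer.flip();                            // limit = bytes read, position = 0
        IntBuffer ints = buffer.asIntBuffer();    // view over the bytes that were read

        List<Integer> result = new ArrayList<>();
        while (ints.hasRemaining()) {
            result.add(ints.get());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Prepare a file holding three integers: 10, 20 and 30.
        Path file = Files.createTempFile("ints", ".bin");
        Files.write(file, ByteBuffer.allocate(12).putInt(10).putInt(20).putInt(30).array());

        System.out.println(readInts(file));
        Files.delete(file);
    }
}
```

Replacing flip with rewind in this sketch would make the view cover the whole 1 KB buffer, so the loop would return 256 integers, most of them zeros, instead of the three that were read.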
Understanding Scattering Read and Gathering Write Operations
Java NIO also allows for reading and writing to multiple buffers at the same time. Let us see that. The reading of a file into multiple buffers is called the scattering read operation. It consists in reading from a single channel, that is, from a single file, into an array of buffers. It is specified by the ScatteringByteChannel interface. The reading process will first fill the first buffer, then continue with the next one, and so on. So it is mostly usable when we have fixed-length file formats. The opposite is called the gathering write operation. It consists in writing from an array of buffers to a single channel. It is specified by the GatheringByteChannel interface, and the writing operation is the exact mirror of the reading operation. It starts with the first buffer, then the next one, and so on. The gather/scatter pattern is mostly useful when we are handling messages with fixed-length parts. And in this case, it is in fact extremely useful. Suppose we have a one kilobyte header followed by a four kilobyte body, ending with a 128 byte footer. Then we can set up an array of three buffers of the right size and plug a channel on those buffers. The read will fill the header buffer first, then the body, and then the footer. Here is the code to implement that. Very easy. We create three buffers of the right size, then we put them in an array of buffers, and all we have to do is just call the read method and pass this array as a parameter. This read call returns the number of bytes read as a long, since the number of bytes read may not fit in an int. The write operation works the same. First we create and populate our buffers. We put them in a single array and pass this array to the write method of our file channel object. The number of bytes written is also returned as a long, for the same reason as previously. So those are the scattering read and gathering write operations. They are simple to do with the FileChannel class.
It is a very useful pattern to handle fixed-size messages. Just remember to properly rewind the buffers when using it.
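The header, body, and footer example can be sketched like this. The buffer sizes come from the example above; everything else (the class name ScatterGatherDemo, the helper methods, the fill bytes, and the temporary file) is an assumption made for the illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical demo class: gathering write and scattering read of a fixed-layout message.
final class ScatterGatherDemo {

    // Builds a buffer of the given size, filled with one byte value and ready to be written.
    static ByteBuffer filled(int size, byte value) {
        byte[] bytes = new byte[size];
        Arrays.fill(bytes, value);
        return ByteBuffer.wrap(bytes);            // position 0, limit = size
    }

    // Gathering write: the buffers are written to the channel in array order.
    static long gatherWrite(Path file, ByteBuffer... parts) throws IOException {
        long expected = 0;
        for (ByteBuffer part : parts) {
            expected += part.remaining();
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long written = 0;
            while (written < expected) {          // a single call may write only a part
                written += channel.write(parts);
            }
            return written;
        }
    }

    // Scattering read: the first buffer is filled first, then the next one, and so on.
    static long scatterRead(Path file, ByteBuffer... parts) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long total = 0;
            long read;
            while ((read = channel.read(parts)) > 0) {   // stops at end of file or when full
                total += read;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("message", ".bin");   // placeholder location
        long written = gatherWrite(file,
                filled(1024, (byte) 'H'), filled(4096, (byte) 'B'), filled(128, (byte) 'F'));

        ByteBuffer header = ByteBuffer.allocate(1024);
        ByteBuffer body = ByteBuffer.allocate(4096);
        ByteBuffer footer = ByteBuffer.allocate(128);
        long read = scatterRead(file, header, body, footer);

        header.flip();                            // make the header readable from its start
        System.out.println(written + " bytes written, " + read + " bytes read, header starts with "
                + (char) header.get());
        Files.delete(file);
    }
}
```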
Using Mapped Byte Buffers to Map Large Files in Memory
Java NIO also introduces the notion of MappedByteBuffer. What is a MappedByteBuffer? It is simply a buffer that maps a file to memory. Think of a buffer that is able to load a file in memory, so that all the parts of your application that are reading the same file again and again will be much more efficient, since the reads will take place in memory instead of taking place on the disk. There are three modes for those MappedByteBuffers. The first one, and probably the most useful one, is the READ_ONLY mode. The second one is the READ_WRITE mode, so you can both read and modify this file through this buffer. And those modifications can be kept private if you use the PRIVATE mode. The way the MappedByteBuffer is created also allows for the buffering of a portion of a file instead of the whole file itself. How does it work? Let us see the patterns. First, we need a FileChannel opened on a given path in a given mode. FileChannel has already been covered, so we already know that. Then, we map the corresponding file to this mapped buffer using the map method of this file channel. Here we pass the constant FileChannel.MapMode.READ_ONLY to tell the API that this buffer will be read only. And then we pass two long values, a position and a size, to specify which portion of the file we want to map. Here we are mapping the whole file. Of course this code will work only if the file is not too big: a single mapping is limited to Integer.MAX_VALUE bytes, roughly two gigabytes. Note also that a mapped buffer is a direct buffer: its content lives outside of the JVM heap. And then, if this file is in fact a text file, we can use a standard charset to decode it. It will return a CharBuffer. We haven't seen this pattern yet. This is the object of the next part of this module.
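The mapping pattern could be sketched as follows, assuming a small UTF-8 text file. The class name MappedFileDemo, the readMapped helper, and the temporary file are hypothetical names introduced for this illustration.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: maps a whole text file in READ_ONLY mode and decodes it.
final class MappedFileDemo {

    // Maps the file from offset 0 to its full size, then decodes the bytes as UTF-8.
    // Assumes the file is smaller than Integer.MAX_VALUE bytes.
    static String readMapped(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            CharBuffer chars = StandardCharsets.UTF_8.decode(mapped);
            return chars.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped", ".txt");   // placeholder location
        file.toFile().deleteOnExit();
        Files.write(file, "Hello, mapped file".getBytes(StandardCharsets.UTF_8));

        System.out.println(readMapped(file));
    }
}
```

The sketch relies on deleteOnExit instead of deleting the file right away, because on some platforms a file cannot be deleted while a mapping of it is still alive.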
Introducing ByteBuffer to CharBuffer Conversion Using Charsets
Let us see now how we can use charsets to convert byte buffers into character buffers. Two types of buffers matter here, ByteBuffer and CharBuffer. To convert a ByteBuffer into a CharBuffer and vice versa, we need to specify and use a charset. And this conversion is based on the use of a decoder and an encoder. We have standard charsets supported in Java: US-ASCII; ISO-8859-1, also known as Latin-1; UTF-8, which is the only one that we should be using in our applications; and also various flavors of UTF-16. A Charset object has two methods: an encode method that takes a CharBuffer and returns a ByteBuffer, and a decode method that takes a ByteBuffer and returns a CharBuffer. That does the opposite of the encode method, of course. Using these methods is the only way to convert a CharBuffer to a ByteBuffer and vice versa, and this is what we need to read and write text files using Java NIO.
Understanding the Patterns to Convert Bytes Using Charsets
So what does the code look like? First, we need to create a FileChannel on the given text file. Let us suppose that it is encoded using Latin-1, and we allocate a ByteBuffer to read this file into. Then we need the right charset, Latin-1, from the JDK, and we can then decode this ByteBuffer using this charset. This charset, of course, has to match the encoding of the text file we are reading. And this decode method will return a CharBuffer of regular Java characters, so we can use this CharBuffer, namely its array method, to extract the backing array of this CharBuffer and pass it to the constructor of String to get a proper Java string. For the write operation, we could encode a CharBuffer, filled for instance with regular Java strings, into a UTF-8 ByteBuffer. And then, with another file channel opened on another file, write this ByteBuffer to this file to get a regular UTF-8 file. So using this Charset object, we can do both conversions between a ByteBuffer and a CharBuffer for a given encoding. A few ideas on buffers, channels, and charsets. First, a channel can only read and write ByteBuffers. This is important to note. If we are reading a text file, it will thus be read into a ByteBuffer. And then, using the encoding and decoding operations, we can convert a ByteBuffer to a CharBuffer using the right encoding, and this is the only way to do it. Encoding and decoding are only available through the Charset objects provided by the JDK.
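The two conversions can be tried without any file, on an in-memory ByteBuffer standing in for the bytes a channel would have read. This is a minimal sketch: the class name CharsetConversionDemo, its two helpers, and the sample word are assumptions made for the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes Latin-1 bytes to characters, then encodes them as UTF-8.
final class CharsetConversionDemo {

    // ByteBuffer to characters: the charset must match the encoding of the bytes.
    static String decodeLatin1(ByteBuffer bytes) {
        CharBuffer chars = StandardCharsets.ISO_8859_1.decode(bytes);
        return chars.toString();                  // only the characters up to the limit
    }

    // Characters to ByteBuffer: the result is ready to be written to a channel.
    static ByteBuffer encodeUtf8(String text) {
        return StandardCharsets.UTF_8.encode(CharBuffer.wrap(text));
    }

    public static void main(String[] args) {
        // The word "cafe" with an acute accent on the e, encoded in Latin-1: four bytes.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};

        String text = decodeLatin1(ByteBuffer.wrap(latin1));
        ByteBuffer utf8 = encodeUtf8(text);

        System.out.println(text.length() + " characters decoded from "
                + latin1.length + " Latin-1 bytes");
        System.out.println(utf8.remaining() + " bytes once encoded in UTF-8");
    }
}
```

The sketch uses toString on the decoded CharBuffer rather than its array method, because toString only covers the characters between the position and the limit, whereas the backing array may be larger than the decoded text.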
Convert NIO Objects to I/O Objects Using the Channels Factory
The Java NIO API also provides bridges to the Java I/O API through the use of the Channels factory class. What do we have in this class? In fact we have ten factory methods: methods to create channels from an input stream or an output stream; methods to create an input stream or an output stream from a channel, whether it is asynchronous or not (we will see asynchronous channels in the next module of this course); and methods to create readers and writers from a channel, given a charset, because remember that a channel holds bytes, and to decode bytes into characters we need a charset.
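As an illustration of these bridges, here is a small sketch that goes from an input stream to a channel, and from that channel to a reader. The class name ChannelsBridgeDemo, the readAll helper, and the sample text are hypothetical; the sketch uses the newReader overload that takes a charset name, which has been available since NIO was introduced.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: bridges a Java I/O stream to an NIO channel and back to a reader.
final class ChannelsBridgeDemo {

    // Wraps the channel in a reader for the given charset and reads all of its characters.
    static String readAll(ReadableByteChannel channel, Charset charset) throws IOException {
        try (BufferedReader reader =
                     new BufferedReader(Channels.newReader(channel, charset.name()))) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "bridging NIO and I/O".getBytes(StandardCharsets.UTF_8);

        // From Java I/O to NIO: an InputStream seen as a channel.
        ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(data));

        // From NIO back to Java I/O: a Reader on top of that channel.
        System.out.println(readAll(channel, StandardCharsets.UTF_8));
    }
}
```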
Live Coding: Simple Writing and Reading in a ByteBuffer
Right, now is the time for a little live coding session. Let us see some code in action. We are going to play with buffers and channels: create them, connect them together, and connect them to files. We will also use charsets, that is, read and write a text file in Latin-1 and UTF-8 and see what it gives. And then we will see how we can read and write data to files using all this, whether it is raw byte data or text data. Let us see how we can play with buffers and use a FileChannel to write the content of a buffer to a file and read it back. For that we need a ByteBuffer. Let us call it buffer. And we can create a ByteBuffer using the allocate factory method, passing to it the capacity of this buffer. Let us take a one megabyte buffer. We can write data to this buffer by using one of its put methods, and you can see that there are quite a lot of them, for all the primitive types of the Java language. Let us use putInt and write, for instance, 10, 20, and 30. Remember that a buffer has two properties, a position and a limit. Let us print out the position and the limit of this ByteBuffer and see what it gives. We can see that once we have written three integers, that is 12 bytes, the position is in fact the next available position in the buffer, which is expected. And the limit is the capacity of that buffer. Now, if we need to read this data back, we need to convert this buffer into an IntBuffer and use the get method of that IntBuffer. Let us print out the result. And we see that i = 0 is not the expected result. What I would like to have read is the 10. Why did I read 0? Well, because when I converted this ByteBuffer into an IntBuffer, I started at the current position of this buffer. So this IntBuffer does not start at position 0 of this ByteBuffer but instead at position 12. The right thing to do when I do this kind of thing is to flip this buffer: buffer.flip().
The flip operation will set the limit of this ByteBuffer to the current position we have, that is 12, and the new position to 0. So this time I should be reading 10 when getting the content of this IntBuffer. And indeed, this is the case. Something interesting to note is the position and limit of this IntBuffer once it has been created. Let us run this code. You see that the current position is 0 and the limit is 3. Remember that this is an IntBuffer. It means that in this buffer I have three integers, at position zero, position one, and position two. So this IntBuffer does not have the same characteristics as the ByteBuffer. It has its own position and limit, and it sees each element it can contain as an int occupying four bytes of the underlying ByteBuffer.
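The steps of this session can be reproduced with a sketch along these lines. The values and the one megabyte capacity follow the session; the class name FlipBeforeViewDemo and the threeInts helper are assumptions, and the exact code typed during the session may differ.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

// Hypothetical demo class: shows why the buffer must be flipped before creating a view on it.
final class FlipBeforeViewDemo {

    // A one megabyte buffer in which 10, 20 and 30 have been written.
    static ByteBuffer threeInts() {
        ByteBuffer buffer = ByteBuffer.allocate(1024 * 1024);
        buffer.putInt(10);
        buffer.putInt(20);
        buffer.putInt(30);
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = threeInts();
        System.out.println("position = " + buffer.position());   // 12 bytes written
        System.out.println("limit = " + buffer.limit());         // still the capacity

        // Without flip, the view starts at byte 12, after the data.
        IntBuffer tooLate = buffer.asIntBuffer();
        System.out.println("i = " + tooLate.get());              // reads a zero, not 10

        // With flip, the view covers exactly the three integers.
        buffer.flip();
        IntBuffer ints = buffer.asIntBuffer();
        System.out.println("view position = " + ints.position());
        System.out.println("view limit = " + ints.limit());      // counted in ints, not bytes
        System.out.println("i = " + ints.get());                 // this time, 10
    }
}
```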
Live Coding: Flipping a ByteBuffer and Writing It to a File
Let us now continue and write the content of this ByteBuffer to a file. For that, I need a FileChannel. I am going to create this FileChannel using the factory method from the FileChannel class, open. It takes a path as an argument, so let us create this path. We use for that the factory method of the Paths class, Paths with an S, and we are going to write to files/int.bin. This is my path, and I also need to pass standard open options for the opening of this channel. I want to create this file if it does not exist and I want to be able to write to it, so I need two open options for that, CREATE and WRITE. Now I will need to close this file channel once I am done, so I am going to run this in a try-with-resources pattern, catch the IOException, and print the stack trace if anything goes wrong. Printing the stack trace like that is just something I am using here for this example. Of course this code should not be used in a production environment. And all I need to do is use the write method that takes a ByteBuffer as a parameter to write the content of that buffer to the FileChannel. Now, what this FileChannel is writing is the content of the buffer between the current position and the limit. Let us get rid of this IntBuffer code, and let us see what happens if I do not call the flip method. Just to check, let us print out the size of the file that is created. I have a size method on the Files class that takes a path as an argument. Very handy. This size method throws an IOException, but my main method already throws this exception. And let us run this code. You can see that the size of the written file is not quite the size expected. I am supposed to write three integers to this file, so to have a file size of 12 bytes, and I have here a file size which is roughly one megabyte but, in fact, slightly less than that. The file size ends in 64 where the capacity of the buffer ends in 76, a difference of 12 bytes. Why so? Because what has been written is the portion of the buffer between its current position and its limit. And its current position is 12 and its limit is one megabyte. 
So before writing to this FileChannel, what I need to do is flip the buffer to set the limit to the current position of that buffer, that is 12, and the position to zero. Let us run this code and indeed, this time the size of this file is 12 bytes, which is the expected size for our problem.
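The effect of flip on the written file can be reproduced with the following sketch. It rests on a few assumptions that are not in the session: the class name FlipBeforeWrite and the writeInts helper are made up for illustration, a temporary file replaces the path used above, and TRUNCATE_EXISTING is added so that the same file can be rewritten between the two runs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FlipBeforeWrite {

    // Hypothetical helper: writes 10, 20, 30 to the file and returns the file size in bytes.
    static long writeInts(Path path, boolean flip) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(1024 * 1024); // one megabyte
        buffer.putInt(10);
        buffer.putInt(20);
        buffer.putInt(30); // position is now 12, limit is still the capacity

        if (flip) {
            buffer.flip(); // limit = 12, position = 0
        }

        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) { // assumption: start from an empty file
            // The channel writes the bytes between position and limit
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        }
        return Files.size(path);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("ints", ".bin"); // assumption: temporary file
        System.out.println("without flip: " + writeInts(path, false)); // prints 1048564
        System.out.println("with flip:    " + writeInts(path, true));  // prints 12
        Files.delete(path);
    }
}
```

Without the flip, the channel writes everything from position 12 up to the capacity, which gives 1,048,576 minus 12 bytes. With the flip, it writes exactly the 12 bytes of the three integers.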
Live Coding: Reading Back Data from the FileChannel
Let us try to read this file back. We are going to copy and paste this code because it will be roughly the same. This time we want to open our file in READ mode. Let us clear our buffer and read the content of the FileChannel into the buffer and see what it does by printing out, at this point, the position and the limit of the buffer. We have read the 12 bytes of our file. The position is 12 and the limit is the end of the buffer, which is the expected behavior, of course. So if we want to read this content back, we need first to flip the buffer, then to create an IntBuffer on our ByteBuffer with buffer.asIntBuffer, and then begin to read the content just as we did, calling intBuffer.get as many times as there are integers in this file. Now suppose we do not know the number of integers in it. We are going to read them all. Let us create a list of integers, a new ArrayList. We are going to run this code in an infinite loop. While true, we add intBuffer.get to this list. Now, let us check the documentation of this get method. It tells us that if the buffer's current position is not smaller than its limit, then we get a BufferUnderflowException. So we are going to wrap this code in a try catch of BufferUnderflowException. We are not really interested in this exception, but we know that when it is thrown, we will have read the full content of the buffer. So let us print out the number of integers we have read, size = ints.size, and print out the content just to make sure that what we have read is the correct data. Let us run this code. Indeed, we have read three integers of values 10, 20, and 30. So this is the basic pattern for reading and writing data using buffers and channels. What is different from Java I/O is that we are using a single buffer and a single channel, both for reading and writing data. The price to pay is that we need to handle the properties of the buffer ourselves and, in a certain way, its content. 
In particular, we need to make sure that the position and the limit of this buffer have the right values when we are reading from it and writing to it.
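The reading pattern described here can be sketched as follows. The class name ReadIntsBack and the readInts helper are made up for illustration, and the sketch writes its own 12-byte temporary file first so that it can run on its own.

```java
import java.io.IOException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

class ReadIntsBack {

    // Hypothetical helper: reads all the integers stored in a small binary file.
    static List<Integer> readInts(Path path) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // read() returns -1 at the end of the file and 0 once the buffer is full
            while (channel.read(buffer) > 0) {
                // keep filling the buffer
            }
        }
        buffer.flip(); // limit = number of bytes read, position = 0

        IntBuffer intBuffer = buffer.asIntBuffer();
        List<Integer> ints = new ArrayList<>();
        try {
            while (true) {
                ints.add(intBuffer.get());
            }
        } catch (BufferUnderflowException e) {
            // Expected: thrown once the position has reached the limit
        }
        return ints;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("ints", ".bin"); // assumption: temporary file
        byte[] bytes = ByteBuffer.allocate(12).putInt(10).putInt(20).putInt(30).array();
        Files.write(path, bytes);
        System.out.println(readInts(path)); // prints [10, 20, 30]
        Files.delete(path);
    }
}
```

The exception-driven loop follows the session. A loop on intBuffer.hasRemaining() would read the same integers without relying on the exception.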
Live Coding: Writing Latin-1 or UTF-8 Text to a File
Let us see how we can use charsets to correctly write Latin-1 and UTF-8 files. I have created two charset variables, one pointing to the standard charset Latin-1 provided by the JDK and the other pointing to the standard charset UTF-8, also provided by the JDK. I have a string of characters, "Hello world from José". Now, there is a special character in it, the "e" acute, which will not be encoded in the same way in Latin-1 and UTF-8. Let us run this code and see that this string of characters is of length 21. Now, to write this string of characters we need a CharBuffer. A CharBuffer can be created from a ByteBuffer using the asCharBuffer method, but it can also be directly created with the factory method allocate by passing the size of this CharBuffer. On this CharBuffer I have a put method that accepts a string of characters, so I can directly write hello in this CharBuffer. Do not forget, of course, to flip the buffer to set the limit to the end of the string of characters and reset the position to the beginning. From now on, we are ready to write this CharBuffer to a file. Now, remember that FileChannels can only handle ByteBuffers, so I need to convert this CharBuffer to a ByteBuffer containing the bytes that are going to be written to the disk. The right way to convert the CharBuffer to a ByteBuffer is to use precisely one of those charsets, so let us do that: latin1.encode takes the CharBuffer as a parameter and returns a ByteBuffer, ready to be written to the file. To do that we are going to use once more our old friend, the try-with-resources pattern. We need a path to create our FileChannel. Let us create one on files/hello-latin1, since we are creating a file encoded in Latin-1. Let us create this FileChannel object using the factory method we have already used, passing the path as an argument, and since we need to create this file, use the CREATE standard open option and the WRITE standard open option to write this file. 
Now we are ready to write the content of this buffer to our FileChannel just by calling channel.write and passing the buffer as a parameter. The channel will be automatically closed by the try-with-resources pattern as usual. Now what we can do is print out the size of the file that has been created, with Files.size, passing the path as a parameter. Let us run this code. We see that the size of the file is the same as the length of the string of characters. Let us check the content of this file. It is "Hello world from José". And if we check the encoding, we see that the encoding is ANSI, that is, Latin-1. If we try to look at this file and force the UTF-8 encoding, we see that the "e" acute has not been correctly displayed by Notepad because this is not a UTF-8 file. Let us go back to ANSI. This time it is okay. Let us run this code once again, using this time the UTF-8 charset. We are going to encode this buffer using the UTF-8 standard charset. Let us run this code again. We can see that this time the size of the file is not 21 but 22. Why? Because this "e" acute is encoded on two bytes in UTF-8 instead of one in Latin-1. Let us check the content of the file now. This time the encoding is detected as UTF-8 and the file is correctly displayed by Notepad. If I try to change the encoding to ANSI, we have this well-known buggy character that we can see all over the place, especially on websites that do not correctly support UTF-8. So this is the way of writing files using the standard charsets. Of course, if you need charsets other than Latin-1 and UTF-8, you just have to change the values of the constants we are using here.
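The two file sizes seen in this session can be reproduced with a sketch like this one. The class name WriteEncodedText and the writeText helper are made up for illustration, temporary files replace the paths used above, and TRUNCATE_EXISTING is an added option so that each run starts from an empty file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class WriteEncodedText {

    // Hypothetical helper: encodes the text with the charset, writes it, returns the file size.
    static long writeText(String text, Charset charset, Path path) throws IOException {
        CharBuffer charBuffer = CharBuffer.allocate(1024);
        charBuffer.put(text);
        charBuffer.flip(); // limit = end of the text, position = 0

        // The charset converts the characters to the bytes that go to the disk
        ByteBuffer byteBuffer = charset.encode(charBuffer);

        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) { // assumption: start from an empty file
            while (byteBuffer.hasRemaining()) {
                channel.write(byteBuffer);
            }
        }
        return Files.size(path);
    }

    public static void main(String[] args) throws IOException {
        String hello = "Hello world from Jos\u00e9"; // \u00e9 is the "e" acute
        Path path = Files.createTempFile("hello", ".txt"); // assumption: temporary file
        System.out.println("length:  " + hello.length());                                        // 21
        System.out.println("Latin-1: " + writeText(hello, StandardCharsets.ISO_8859_1, path));   // 21
        System.out.println("UTF-8:   " + writeText(hello, StandardCharsets.UTF_8, path));        // 22
        Files.delete(path);
    }
}
```

In the JDK, the Latin-1 constant of StandardCharsets is named ISO_8859_1.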
Live Coding: Reading Latin-1 or UTF-8 Text from a File
How can we read this data back from the file into our application? Well, once again we need to use those standard charsets to decode the bytes we are going to read. Let us point to the same file as previously, the UTF-8 file. Let us also reuse the buffer we just used. And let us copy and paste the try-with-resources pattern. To read the content of this file into this buffer we will need the read method, of course, and the standard open option READ to read this file. So here we have filled our buffer with the content of the file. Do not forget to call the flip method to set the limit and the position of this buffer to the right values, and then, since this is the UTF-8 file that we are reading, we are going to decode this buffer with the UTF-8 charset into the same CharBuffer as previously. How can we convert this CharBuffer to a string? Well, there is a very handy method on the CharBuffer class, which is the array method, and we can create a new string from this array, call it result, and print it out just like that. Let us run this code. Indeed, we have correctly read in our application the content of this UTF-8 file, "Hello world from José". Suppose we are not taking the right decoder here. Let us take the Latin-1 decoder. You see that here we are not reading the characters correctly. This is the typical badly read UTF-8 "e" acute, read with the Latin-1 decoder. Now, if we take the Latin-1 file decoded with the Latin-1 decoder, we have this time the correct string of characters, but once again, if we do not take the right decoder to decode this file, we will have a buggy character for the "e" acute. So this is the basic pattern to encode and decode text files using this API in Java. It relies on the use of standard charsets, which are very handy to do the conversion for us.
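The decoding pattern can be sketched as follows. The class name ReadEncodedText and the readText helper are made up for illustration, and a temporary UTF-8 file is written first so that the sketch is self-contained. One deliberate difference from the session: the string is built with charBuffer.toString() rather than from charBuffer.array(), because the backing array of a decoded buffer may be larger than the decoded text.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadEncodedText {

    // Hypothetical helper: reads a small text file and decodes it with the given charset.
    static String readText(Path path, Charset charset) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            while (channel.read(buffer) > 0) {
                // keep filling the buffer
            }
        }
        buffer.flip(); // limit = number of bytes read, position = 0

        CharBuffer charBuffer = charset.decode(buffer);
        // toString() covers exactly the characters between position and limit
        return charBuffer.toString();
    }

    public static void main(String[] args) throws IOException {
        String hello = "Hello world from Jos\u00e9"; // \u00e9 is the "e" acute
        Path path = Files.createTempFile("hello-utf8", ".txt"); // assumption: temporary file
        Files.write(path, hello.getBytes(StandardCharsets.UTF_8));

        String right = readText(path, StandardCharsets.UTF_8);
        String wrong = readText(path, StandardCharsets.ISO_8859_1);
        System.out.println(right + " has length " + right.length()); // length 21
        System.out.println(wrong + " has length " + wrong.length()); // length 22
        Files.delete(path);
    }
}
```

Decoded with the wrong charset, the two UTF-8 bytes of the "e" acute become two separate Latin-1 characters, which is why the badly decoded string is one character longer.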
Module Wrap-up
And with this I think we are done with this module so let us quickly wrap it up. What did you learn in this module? Well, you learned everything you need to know to understand the fundamentals of Java NIO. Namely, buffers and channels. We still have selectors to cover and this is going to be done in the next module. You learned how to reset, rewind, and flip buffers. Those are the three fundamental operations on buffers. You saw how to scatter and gather read and write operations supported by FileChannels and adapted to the reading and writing of fixed-sized file formats. And you also saw how to use charsets properly to encode and decode characters. Well, thank you for your attention. I hope you found this module interesting. And I hope to see you again in the next module of this course where you will learn about NIO asynchronous operations and selectors.
Setting up Asynchronous Operations with NIO
Agenda: Understanding Asynchronous Operations Using Selectors
Hello, my name is Jose. Welcome back to this Java NIO and NIO.2 course. And welcome to this module, setting up asynchronous operations with non-blocking input/output. Here is your agenda for this module. The goal is to introduce asynchronous operations using Java NIO. First, you will see the difference between synchronous and asynchronous operations. And then you will learn about selectors and the bare minimum you need to know about socket channels. Then you will see how to set up asynchronous network operations using selectors. This is the use case you will see in detail, both in slides and in the live coding session of this module.
What Does It Mean for an Operation to Be Asynchronous?
So let us first introduce asynchronous operations. What does it mean to be asynchronous? When we call the read method, whether it is on a reader from Java I/O or on a channel from Java NIO, this is said to be a synchronous operation. It means that the method returns, the read call returns when the data is read, either from the reader or from the channel to the byte buffer. This method is said to be synchronous. This is a synchronous call, meaning that it blocks until the data has been read and is there. And this is very practical, because it means that in the following code, I can safely assume that the data is indeed in the buffer. On the other hand, asynchronous is said to be non-blocking. And it behaves differently. In fact, what we do is that we trigger a read operation. Then we can continue working on something else. And when the data is there, when the data is ready to be read in our buffer, then we are called back by the system, that is, by the API. So in the meantime, we can do something else. We do not have to wait for the read call to return. So what is the difference between both systems? Suppose our application needs to read data from many sources. Take a web server, for instance, which has many incoming requests on many sockets from many clients. With a synchronous read, each read operation has to be conducted in its own thread and will block this thread until the data is available. And remember that network operations, especially, are slow compared to CPU and in-memory operations. With asynchronous reads, a single thread can handle many read operations. So this is the main difference, and this is what is supposed to bring performance, especially to web servers. From an operating system perspective, a thread is a system resource. So it can be expensive to set up, especially in 2002, when Java NIO was developed. It might not be the case today, or at least threads are much less expensive to set up today than it was 15 years ago. 
And when an operating system has lots of threads to handle, it can be expensive for a CPU to switch from one thread to another. This is called context switching. And context switching has a cost. It is not a free operation. So back in 2002, this asynchronous system was added to the JDK to enable the development of Java web servers in a more performant way. And indeed, web servers like Tomcat at that time leveraged this technical possibility offered by the JDK to speed up their operations.
Understanding Selectors to Set up Asynchronous Systems in NIO
So how are asynchronous operations handled from a technical point of view? It is based on the use of an object, a selector, that we already mentioned in the previous modules and that we are going to see now. A selector is the entry point to set up an asynchronous system. First, we need to create a channel. This has not changed. And a channel can be created on a disk, network, whatever. Then we configure this channel to be non-blocking, which is possible for selectable channels such as socket channels. This is something that we have not covered yet. And we register this channel with the selector. The asynchronous stuff will be handled by this selector object. This registration returns a registration key. From this key, we can directly access the channel and various pieces of information about the state of the channel. And we can do that with as many channels as we need. That is, a single selector can handle several channels. In fact, it can handle many channels at the same time. Then the channel will generate events, will fire events to the selector. There are four events, defined as constants in the SelectionKey class: read and write, meaning that the channel is ready for reading or writing elements, and in the case of socket channels, connect, meaning that a connection was established on this socket, and accept, meaning that the connection was accepted. The registration is configured to listen to certain events. Those events are passed as parameters during the registration process. But we can also modify those listened events directly through the key. So once the system is properly set up, we need to call the select method on the selector. This will be a blocking call until some of the channels registered with this selector have events to be consumed. And from the selector, we can get the set of the keys associated with those channels that have events to be processed.
Pattern to Set up a Selector to Read from a Socket Channel
Let us set up an asynchronous network reader. Let us see the code to do that. So first, we need to create a server socket channel, a special type of channel dedicated to listening to incoming requests on sockets. We have a factory method for that. And we configure this channel to be non-blocking just by calling configureBlocking(false). Then from this channel we can create a server socket by calling the socket method and bind it to a socket address, here on the local machine, using the given port. Then in a second step, we create a selector. We have a factory method for that. And we register this server socket channel with this selector on the event OP_ACCEPT, that is, the accept event. This register call returns a selection key that should be stored for future use, like unregistering this channel, checking for its validity, or changing the events we are listening to. This step can be repeated with as many socket channels as we have, with as many ports as we are listening to. Then when the registration process is done, we enter the event-consuming loop. All we have to do is call the select method on our selector object. Remember that this selector object can have many channels registered with it. This select call is blocking until there are events available in some of the channels. It returns n, the number of keys with available events, that is, with data ready to be consumed. Then after that, we need to call the selectedKeys method on the selector object. It will return a set of the corresponding selection keys, that is, the registration keys that generated events. And then we need to do something with those keys, probably in a for loop or in a for-each pattern, like here.
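The registration step can be sketched as follows. The class name SelectorSetup and the registerForAccept helper are made up for illustration. Two choices are assumptions made so that the sketch terminates on its own: port 0 asks the system for any free port, and selectNow, the non-blocking variant of select, stands in for the blocking select call of the event loop.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

class SelectorSetup {

    // Hypothetical helper: binds the channel and registers it for accept events.
    static SelectionKey registerForAccept(ServerSocketChannel channel,
                                          Selector selector,
                                          int port) throws IOException {
        channel.configureBlocking(false); // required before registering with a selector
        channel.socket().bind(new InetSocketAddress(port));
        return channel.register(selector, SelectionKey.OP_ACCEPT);
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel channel = ServerSocketChannel.open()) {

            SelectionKey key = registerForAccept(channel, selector, 0); // assumption: any free port

            System.out.println("key is valid: " + key.isValid());
            System.out.println("listening to accept: "
                    + (key.interestOps() == SelectionKey.OP_ACCEPT));
            System.out.println("same channel: " + (key.channel() == channel));

            // No client has connected yet, so no key has any event to report
            int n = selector.selectNow();
            System.out.println("keys with events: " + n);
        }
    }
}
```

The returned key gives access to the channel and to the events that the registration listens to, which is what makes it worth keeping.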
Setting up the Selector to Accept an Incoming Connection
So in our example, the first step is the following. An incoming request has been received on our server socket channel. So first of all, we need to make sure that the key corresponds to a connection request. And second, we need to set up a connection with a socket and listen to the events on this socket. So let us write the code to do that. Checking if the selection key indeed received an accept event is done in that way. We need to invoke the readyOps method on the key object. This returns an int, which is in fact a field of bits. And we need to check if the bit corresponding to the accept event has been set to one or not. So this is done with the SelectionKey.OP_ACCEPT bit mask and the bitwise AND operation. Be careful, this is not the Boolean AND operation. This is the bitwise AND. And we need to check if the masked bit field still has a one on the OP_ACCEPT bit. If this is the case, we have a connection request. Then we can get the corresponding channel using the key, because the key knows the channel it is linked to. And from this channel, by calling the accept method, we can open a second socket, which is a socket channel dedicated to the communication with the client that made the connection request. Of course, we are in a non-blocking world, so we are not going to create a synchronous socket. We also configure this socket channel to be non-blocking. And we can also register this new channel, using the same selector we already have, on the read event to get the incoming data from this new client.
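The bit mask check can be isolated in a few lines. The class name ReadyOpsMask and the isSet helper are made up for illustration, and the readyOps value is built by hand here instead of coming from a real key.

```java
import java.nio.channels.SelectionKey;

class ReadyOpsMask {

    // Hypothetical helper: true if the bit of the given operation is set in readyOps.
    static boolean isSet(int readyOps, int op) {
        return (readyOps & op) == op; // bitwise AND, not the Boolean &&
    }

    public static void main(String[] args) {
        // A bit field as readyOps() could return it: read and write bits set
        int readyOps = SelectionKey.OP_READ | SelectionKey.OP_WRITE;

        System.out.println(isSet(readyOps, SelectionKey.OP_READ));   // prints true
        System.out.println(isSet(readyOps, SelectionKey.OP_WRITE));  // prints true
        System.out.println(isSet(readyOps, SelectionKey.OP_ACCEPT)); // prints false
    }
}
```

The SelectionKey class also offers convenience methods such as isAcceptable() and isReadable() that run this kind of test on the ready set of a key.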
Read the Content Sent Through the Socket Using the Selector
So at this point, two channels are registered with our selector, and it will get events from both channels. The first channel listens to incoming requests, and in the meantime, we may have other incoming requests generating other channels. And a second channel has been created to handle the communication with a given client. So we need to add the handling of the read event for the second channel in our code. Let us do that. We already have code to handle the accept event. So now we need to add code to handle the read event. And we have a bit mask for it, which is SelectionKey.OP_READ. We get the socket channel that generated the read event from the key. We can then copy the content of this channel to a byte buffer that we created somewhere else. And this read method will return immediately. Why? Because if we are in this code, it means that the data from this socket channel is available. If the data was not available, then we would not be running this callback, which is invoked only if the data is available. And once the data is in the byte buffer, then we can do something with it, whatever we need: analyze the content, get the content, and act accordingly. This is where our business code, our application code, will be written. And there is some technical code to continue. First, remove the key from the selected keys. Then cancel the key, since we are done consuming the information. And probably close this channel, since this socket was created just to handle this particular communication with the client. Of course, the closing of this channel could have been made in a try-with-resources pattern, if we are in Java 7 or later, and it will unregister this channel from the selector.
Live Coding: Setting up the Asynchronous Server Socket Channel
Let us see all this in action. In this live coding session, let us replay this little asynchronous server live, with some data, and see how it works. So let us see how we can use selectors on a very simple toy example based on sockets and socket channels. First, we are going to set up a server. This server will be accepting requests from sockets. And second, a client to connect to this server, just to show that things are working properly. So to create this server, we need a server socket channel. Let us create this object. We have a factory method for this called open. Right, we are going to configure this server socket channel to be non-blocking, so we call configureBlocking and pass false as a parameter. Then from this channel, we can get a server socket. Let us call it serverSocket, obtained from serverSocketChannel.socket. And this socket, we are going to bind it, with serverSocket.bind, to an address that is encoded in Java in an InetSocketAddress object. And the port we are going to bind to is this one. Hopefully it will be available. And since we do not provide any address, it will be bound to our local address. Now, since we have an asynchronous server socket channel, we need to register it with a selector. So let us create a selector. We have a factory method for that, the Selector.open method. And we can register our server socket channel with this selector using the register method. We pass this selector as a parameter, and then the type of event we want to listen to, given by the SelectionKey.OP_ACCEPT parameter. So here we have set up our channel. And our channel is going to send this type of event to the selector we have here.
Live Coding: Accepting an Incoming Connection Request
Let us wrap the rest of the code in an infinite loop. From here, we will be saying that we are waiting for events, okay. And remember, the first step is to call the select method on this selector object. It returns the number of events received by this selector. So we can just print with format, got n events, just like that, where n is the number of events that we received from select. Remember that this select call is blocking. The application will be blocked on this method call. Now, once we have events, we can get the keys associated with those events from our selector object with the selectedKeys method. This call is non-blocking. And it returns a set of selection keys that carry the type of event that has been caught by this selector. Let us loop through those selection keys: SelectionKey key from keys. Now, we are only interested, for the moment, in SelectionKey.OP_ACCEPT. And we need to check if the type of event that this key is carrying is indeed an OP_ACCEPT event. To do that, we need to read the readyOps property of that key. This readyOps property is in fact an integer. The bits of this integer are meaningful. They all have a special role in this pattern, and we are only interested in the one called OP_ACCEPT. So we are going to use this SelectionKey.OP_ACCEPT as a bit mask on this integer that is in fact a bit field. And if the masked value here is equal to SelectionKey.OP_ACCEPT, it means that the corresponding bit of this bit field is set to one, meaning that this event is indeed an OP_ACCEPT event. So if we fall in this if block, it means that we have a connection request from a client that we need to accept. So let us tell that to the console: accepting the connection. From this, we need to get the channel on which this request arrived. The key is carrying, indeed, the channel that it is coming from. And this is our server socket channel here, of course. So let us put that in a server socket channel variable. 
In a real production environment, I could have many, many channels. So this is the right code to consider. We need the corresponding socket channel from this server socket channel. We can get this socket channel by calling the accept method of this channel. And we are going to configure this channel to be non-blocking because this is also a channel that supports asynchronous operations. This is a new channel, a special channel to communicate with the client that is requesting a connection. So now we have a new socket channel that we need to register, once again, with our selector. So let us register this socket channel with our selector. And this time we are not listening to the SelectionKey.OP_ACCEPT event. Why? Because we expect some request sent through this channel. And what we need to do is to read the content of this request. So now we are interested in the OP_READ event, so we are registering for this event. Now what should happen is that some data is going to be sent to this socket and arrive through this channel. If this is the case, we will have another event on this select call with another key. The type of this event will be OP_READ instead of OP_ACCEPT, so we will process this event in another block of this if statement, bound to the OP_READ type of event. Since we are done with this particular key, we need to remove it from the selected keys, so keys.remove(key). So this is the code that is going to handle incoming requests. We can see that it is systematically accepting the connection. In a real production environment, this might not be what we would want to do.
Live Coding: Reading the Text Content Incoming from the Socket
So now we need to handle the case where this is not an OP_ACCEPT event but a read event. Here we are accepting a connection. Here we are reading the content of the socket. So in this case, we are only interested in the OP_READ event. So we just need to replace OP_ACCEPT by OP_READ here and here. This SelectionKey.OP_READ still acts as a bit mask on this bit field. And this time we are checking for the OP_READ bit set to one. Reading the content consists of the same kind of code. This time, what we are getting is the socket channel that is responsible for this event. So the key.channel call here is going to return this socket channel there. So let us declare this variable. Remember that we are still in this NIO space, so we need a buffer to read from this channel. Let us create this buffer: ByteBuffer buffer equals ByteBuffer.allocate for one K. We are not sending big messages over the network. Let us read the content of this channel into the buffer with read(buffer). Suppose we are dealing with a text protocol. Then we will need to convert this ByteBuffer to a CharBuffer. Let us do that. First, of course, buffer.flip, just to read back what has been written from the socket. And then we need StandardCharsets.UTF_8 to decode the content of this buffer to a char buffer. Let us call it charBuffer. And now we have the content of what has been sent through this socket inside that charBuffer. Here we would put our application code, our business code. What we are going to put here is just some code to see what has been sent. So let us call the array method to get the array of chars. Hopefully this has been properly decoded from UTF-8, so we can put this in a string. Let us call it result. And let us print out what has been read from that socket. This is in the result variable. Now we can clear this buffer. This is a local variable of this block of code, so it will be destroyed anyway when we leave this block of code. Let us remove the key from the set of the keys. Call cancel on the key. 
And once we are done with this socket, call the close method on this socket channel. So this is basically the code we could write to read the content incoming from our request. We have to call this close method because we did not use the try-with-resources pattern. In fact, we did not bother at all with exceptions, since our main method already throws the IOException.
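Put together, the server of this live coding session could look like the following sketch. Several things in it are assumptions: the class and method names are made up, port 0 asks the system for any free port instead of the port used in the session, the loop stops after the first request instead of running forever, and main connects a throwaway client itself so that the sketch terminates on its own. The selected keys are also removed through an iterator, since removing elements from a set while looping over it with a for-each can throw a ConcurrentModificationException.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectorServer {

    // Opens a non-blocking server socket channel and registers it for accept events.
    static ServerSocketChannel openServer(Selector selector, int port) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.socket().bind(new InetSocketAddress(port));
        server.register(selector, SelectionKey.OP_ACCEPT);
        return server;
    }

    // Runs the event loop until one text request has been read, then returns it.
    static String serveOneRequest(Selector selector) throws IOException {
        while (true) {
            selector.select(); // blocks until at least one channel has an event
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove(); // we are handling this key now

                if ((key.readyOps() & SelectionKey.OP_ACCEPT) == SelectionKey.OP_ACCEPT) {
                    ServerSocketChannel server = (ServerSocketChannel) key.channel();
                    SocketChannel client = server.accept();
                    if (client != null) { // accept() may return null in non-blocking mode
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    }
                } else if ((key.readyOps() & SelectionKey.OP_READ) == SelectionKey.OP_READ) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buffer = ByteBuffer.allocate(1024);
                    client.read(buffer);
                    buffer.flip();
                    String result = StandardCharsets.UTF_8.decode(buffer).toString();
                    key.cancel();
                    client.close();
                    return result;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = openServer(selector, 0)) { // assumption: any free port
            int port = server.socket().getLocalPort();
            // Throwaway client, only here so that the sketch terminates by itself
            try (SocketChannel client = SocketChannel.open(
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), port))) {
                client.write(StandardCharsets.UTF_8.encode("REQUEST"));
            }
            System.out.println("Read: " + serveOneRequest(selector));
        }
    }
}
```

A real server would keep looping instead of returning after the first request, and would have to cope with requests that arrive in several chunks or that do not fit in one buffer.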
Live Coding: Writing a Basic Client to Send Data to the Server
Let us now write our client code. We need a socket channel: SocketChannel.open. And since we want to connect to a given host, we have to pass an address to this socket channel. So we need to create this address first. It is an InetSocketAddress that can be created with new. We are going to connect locally, because this is just a very simple example, on this port. This is the port we used for our server. Once we have that, we can just get the socket from the socket channel. This is the right pattern to create a socket channel and a socket bound to it. Let us see another pattern. I am going to comment out this code for a minute. In raw Java I/O code, I could write this kind of code: create a socket, connect this socket to an address, which is in fact this one, and then get the channel from the socket. And this is a socket channel. This is the right object for this socket. The problem is that if I write this kind of code, the channel returned here will be null. This object would be null. So this pattern does not work. It should not be used. It is written, in fact, in the Javadoc, okay? The socket will have a channel if and only if the channel itself was created via SocketChannel.open or ServerSocketChannel.accept, which is what we did in the previous server example. So if you create your socket using this pattern, the channel property will be null. So just don't do that. It doesn't work. We'll stick to the right pattern, of course. So to write to this socket channel, as usual, we need a char buffer. This is our buffer. Let us take a small buffer, allocate it with one kilobyte, right, and we are going to put a request in it, a very simple request: REQUEST, like that. Remember that to write the content of this char buffer to the socket channel, we need to make a byte buffer out of it. And to do that, we need a charset and to use the encode method in that case. 
So let us take a standard charset, the UTF-8 one, and let us encode our buffer with it. And directly write the content with socketChannel.write, just like that. And once this is done, we can close our socket. We do not need it anymore. Once again, this main method throws the IOException, so no explicit exception handling has been written in this code. And by the way, do not forget to flip this buffer to send REQUEST instead of sending the rest of the buffer, from the end of this string to the end of the one kilobyte. Let us try to run this code. Now our server socket channel is waiting for events. We are here in the code. And let us run the client code. This is what the server is telling us. We got one event. Then accepting the connection. Let us see where we are in the code. Accepting the connection, it means that we are here. The socket channel to get the data from the client is set up. Then back to we are waiting for events, meaning that we are there. And then another event is arriving, which is a read event with the REQUEST content as expected. So this is how we can set up, using asynchronous channels, a simple server with a simple client to connect to it.
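The client side can be sketched in the same spirit. The class name SimpleClient and the send helper are made up for illustration. As an assumption made only to keep the sketch self-contained, main does not connect to the selector-based server of the session but to a small blocking server socket channel, opened on any free local port, that reads the request back.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

class SimpleClient {

    // Hypothetical helper: connects to the address, sends the request as UTF-8, closes.
    static void send(InetSocketAddress address, String request) throws IOException {
        // SocketChannel.open(address) opens the channel and connects it
        try (SocketChannel socketChannel = SocketChannel.open(address)) {
            CharBuffer charBuffer = CharBuffer.allocate(1024);
            charBuffer.put(request);
            charBuffer.flip(); // send the request only, not the rest of the kilobyte

            ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(charBuffer);
            while (byteBuffer.hasRemaining()) {
                socketChannel.write(byteBuffer);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a throwaway blocking server on any free port stands in for the real one
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();

            send(address, "REQUEST");

            try (SocketChannel accepted = server.accept()) {
                ByteBuffer buffer = ByteBuffer.allocate(1024);
                while (accepted.read(buffer) > 0) {
                    // read until the client has closed the connection
                }
                buffer.flip();
                System.out.println("Received: " + StandardCharsets.UTF_8.decode(buffer));
            }
        }
    }
}
```

To talk to the server of the session instead, send would be called with the address and port that this server is bound to.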
Module Wrap-up
Well, I think this is the time now to wrap up this module. So what did you learn here? Well, you saw the way Java NIO brings asynchronous operations to the Java platform. The good news is that all the technical details are hidden in the API and handled for you. You learned how to use selectors in the API and how to set up an asynchronous server, very simple, very basic, but still working. And with those three notions, buffers and channels, covered in the previous module, and selectors, covered in this module, you have seen most of what you need to know to understand Java NIO code in your applications. This is the end of this module and of the part of this course about Java NIO. So thank you for watching. I hope you found this part interesting. In the next module, you will begin to learn about Java NIO.2 from Java 7.
Using FileSystems in Java NIO2
Agenda: Understanding the Use of File Systems in Java NIO2
Hello, and welcome back to this Java NIO and NIO2 course. My name is Jose. This module is about using file systems in Java NIO2. The agenda of this module is the following. As we mentioned in the introduction of this course, NIO2 was introduced in 2011 in Java 7. It provides classes to access the file system directly, and implements file-system-dependent operations not provided in Java input/output. The FileSystemProvider class, the FileSystem class, and the FileStore class are three of those classes that we are going to cover after a quick introduction on why those classes have been created, and why they have been introduced in the JDK. All this is highly operating system and file system dependent. And it is really a great change in the philosophy of the design of the JDK, since the JDK at first aimed to be completely operating system and file system independent.
Introducing the File Systems Support in Java NIO2
Let us first talk about file system support in NIO2. And let us first answer this question: why add this file system support to the JDK? The question is interesting because we already have methods to move files around, to create files, to check for the existence of files, and the same for directories. And we can get the content of a directory. We have methods for that in the File class. The answer is performance. In fact, the methods we have from Java I/O are okay, they work, but when it comes to handling very large directories with many, many files in them, think about thousands of files, they become less and less efficient. So this new API from NIO2 is directly plugged into the native file system and can handle very large directories even in its Java implementation, and we are going to see that. So the main reason is, first, performance, and second, getting more functionality from the native file system. First of all, what is a file system in NIO2? A file system in NIO2 is an abstraction of a real file system. So it is a new concept in the JDK; it did not exist before that. And it is bound to a scheme in the URI sense. The default scheme is file, as in file://. And it implements all the operations of the Files factory class: creation, copy, deletion, et cetera. By default, the JDK provides two file systems: first, the default file system, which is the classical disk file system, and also a JAR file system that can be set up in memory or directly on the disk, and that we will cover in this module.
Understanding the API and File System Providers
Technically speaking, the file system support is built on three classes. The first one is the FileSystemProvider class. This class acts as a factory for file systems. Through file system providers, we can create other file systems. This object provides methods to create, move, copy, or delete files. It also has support for directories and links, which is new in the Java I/O space. It also works with Java I/O, providing bridges to objects from those two APIs. And it also gives access to security attributes and to attributes particular to a given file system. Think about a Windows file system or a Linux file system: they do not have the same security attributes. Through the file system provider, we can get them in a native way, and this was not possible in Java I/O.
Modeling File Systems
The second class is the FileSystem class, and it is an abstraction of a file system. It can close the file system or query whether this file system is open or read-only, for instance. And it can provide technical information on this file system, like the root directories or the separator used. You know that the separator is not the same on Windows and Linux, for instance. It can also get the stores, the FileStore objects, which belong to the third class that we are going to cover in a minute. It can also create a path in that file system, and this is a very important point to note, and we will come back to it later. The file system is the object that is used to create Path instances. We saw that Path is an interface, and since it is created from a file system, we can see that the implementation of this interface will depend on the file system it is created on. And at last, a file system can be used to create a watch service. This is covered in the last module of this course.
Understanding File Store
The last of the three classes is the FileStore class. This file store is an abstraction of a file store within a file system. It can provide the name and the type of the store, and those two pieces of information were not made available through the File class of the Java I/O API. It can also provide information on the space of the store: the used space, the available space, et cetera. And it can also give access to security attributes, also in a native way. So those three classes, FileSystemProvider, FileSystem, and FileStore, will give us information on the disks connected to the machine we are on, information that was not available with the other APIs of the JDK. We have a fourth class, which is a factory class called FileSystems. In fact, it provides facade methods to the FileSystemProvider class.
Getting File Systems and Stores from the FileSystemProvider
Now that we have the big picture in mind, let us see how it works technically speaking, let us write some code. We can get all the file system providers of the default JDK installation by using the factory method installedProviders. This usually returns two elements: the default file system provider to access the disk, and the JAR or ZIP file system provider, which is the same, and which is used internally by the JVM to read JAR files. So let us get the first of those: providers.get(0) will return the first file system provider of this list. It has a getFileSystem method, which takes a URI as a parameter. In fact, it only accepts this URI: file:///, that is file://, which is the scheme of the file system, and then /, which is the root directory of this file system. We can also get a reference to the same file system by just calling the FileSystems.getDefault factory method, or by calling the FileSystems.getFileSystem factory method and providing the same URI. In fact, those three patterns will return the same file system, which is the default file system on the root directory of our disk. Then we can get, for instance, the root directories of this file system. Suppose we are on a Windows machine with three drives: C, D, and E. Then those three root directories will be the corresponding directories. We already have a method like that on the File class, listRoots, but it does not return an Iterable, it returns an array. And returning a fully populated array or list can be very costly, especially if we have many, many root directories on our file system. And this is how NIO2 has been designed. Every time we need to get a set of elements, for instance here, the root directories of our file system, but think also of the files available in a given directory, it always returns an Iterable. Why? Because an Iterable is a lazy structure. It does not hold the result. So it is much more efficient than returning a list. We can also get the file stores from this default file system, and it returns an Iterable of FileStore objects.
And from this FileStore, we can get information that was not available through the plain Java I/O API. We can get the name of the store. So here in our Windows example, it will be the name of the drive, and also the type of that drive; here NTFS, which is a file system from Windows. We will see examples of that in the live coding session on other types of drives.
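As a minimal sketch of the three lookups described in this section, the following class compares the file system obtained from the provider, from FileSystems.getDefault, and from FileSystems.getFileSystem, then collects the root directories. The class and method names are hypothetical, and the sketch looks the provider up by its scheme rather than taking the first element of the list.

```java
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.spi.FileSystemProvider;
import java.util.ArrayList;
import java.util.List;

class DefaultFileSystemLookup {

    // Finds the installed provider that handles the "file" scheme.
    static FileSystemProvider fileProvider() {
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            if ("file".equalsIgnoreCase(provider.getScheme())) {
                return provider;
            }
        }
        throw new IllegalStateException("no provider for the file scheme");
    }

    // Compares the file system returned by the three lookup patterns.
    static boolean sameFileSystem() {
        URI rootUri = URI.create("file:///");
        FileSystem fromProvider = fileProvider().getFileSystem(rootUri);
        FileSystem fromDefault = FileSystems.getDefault();
        FileSystem fromFactory = FileSystems.getFileSystem(rootUri);
        return fromProvider == fromDefault && fromDefault == fromFactory;
    }

    // Copies the Iterable of root directories into a list, only for display.
    static List<Path> rootDirectories() {
        List<Path> roots = new ArrayList<>();
        FileSystems.getDefault().getRootDirectories().forEach(roots::add);
        return roots;
    }

    public static void main(String[] args) {
        FileSystemProvider.installedProviders().forEach(System.out::println);
        System.out.println("same file system: " + sameFileSystem());
        rootDirectories().forEach(System.out::println);
    }
}
```

The printed provider names and root directories depend on the machine and on the JDK in use.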
Creating I/O and NIO Objects with the FileSystem Object
Let us now see patterns to create files and directories with the file system API. So far with Java I/O and Java NIO, we saw two ways of reading and writing files. The Java I/O way, based on input and output streams for streams of bytes, and readers and writers for streams of characters. And the Java NIO way, based on channels and buffers, asynchronous or not. Those two ways are fully supported by the file system API. The entry point is the FileSystemProvider class. For Java I/O operations, we have two methods. First, newInputStream, which takes a Path and some standard OpenOptions, and newOutputStream, which takes the same parameters. For NIO operations, we have three methods: newFileChannel, which takes the path and some options and creates a file channel, of course; newByteChannel to create a byte channel; and newAsynchronousFileChannel to create an asynchronous channel to a file. So with all those methods, we can very easily integrate some NIO2 code into existing Java I/O or Java NIO code. The file system provider API can handle the following elements: files and directories of course, and symbolic links, which is new to the JDK. We do not have any API to properly handle symbolic links in the Java I/O API. And it supports the following operations: creation, copying and moving elements around, delete and deleteIfExists, and access to security elements. As we already mentioned, the access to security elements can only be done through this API.
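Here is a small sketch of those entry points, assuming a scratch file created with Files.createTempFile. It writes text through the provider's newOutputStream and reads it back through newByteChannel. The class name and the file prefix are made up for the illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.spi.FileSystemProvider;
import java.util.EnumSet;

class ProviderStreams {

    // Writes text with a provider OutputStream, then reads it back with a provider byte channel.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("provider-demo", ".txt"); // hypothetical scratch file
            FileSystemProvider provider = file.getFileSystem().provider();

            // Java I/O style: a plain OutputStream obtained from the provider.
            try (OutputStream os = provider.newOutputStream(file,
                    StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
                os.write(text.getBytes(StandardCharsets.UTF_8));
            }

            // NIO style: a byte channel; here the options are passed as a set.
            ByteBuffer buffer = ByteBuffer.allocate((int) Files.size(file));
            try (SeekableByteChannel channel =
                         provider.newByteChannel(file, EnumSet.of(StandardOpenOption.READ))) {
                while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                    // keep reading until the buffer is full or the end of the file is reached
                }
            }
            Files.delete(file);
            return new String(buffer.array(), 0, buffer.position(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello from the file system provider"));
    }
}
```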
Creating Directories Using FileSystem or FileSystemProvider
We are now going to see three patterns to create directories. The first one is the createDirectory method declared on the fileSystem object, which takes a file as a parameter. This one just creates the directory on the given file system. The two others are declared on the fileSystemProvider. The first one takes a URI as a parameter, and since there is a scheme in a URI, we can guess that the fileSystemProvider will use that scheme to create the directory on the right file system. And the second one takes a path as a parameter. Now we already saw this path object. We have already been using it. But there is still one detail that is missing and that we are going to see now.
Pattern to Create Directories from Files, Paths, and Names
Let us take an example of the creation of a directory on a file system. And suppose we know this directory as a file. We know that from Java I/O we have a mkdir method on the File class. If we call it, this directory will be created on the default file system. Using the file system API, we can get any file system and create this directory on this file system, which can be different from the default file system. This is the pattern if we know this directory as a file, plain and simple. Now suppose we know this directory as a URI. A URI has a scheme, here file://. So a URI knows which file system it belongs to. This time, we are not going to create this directory using a given file system, but merely using the FileSystemProvider. From this scheme, the FileSystemProvider will be able to find the target file system, and it will call the right file system for the creation of this directory. And suppose now that we know this directory as a path. This path has been created using the classical factory method, get, from the Paths factory class. The fact is, the createDirectory method that takes a path as a parameter is defined on the FileSystemProvider class, not on the FileSystem class. So the question is which file system this file system provider is going to choose to create the directory in. Well, the answer is quite tricky, and lies in the way the path is created. If we check the factory method of this Paths class, the one that takes strings as parameters, we can see that it creates this path on the default file system of our system. So there are two things to note. First, a path is linked to a file system, it is bound to a particular file system. And if we create a path just with a plain string of characters, this file system is the default file system. The good news is that most of the time this is what we want. But if we always create our paths like that, we could end up creating directories on a file system that is not the one we want.
So in this pattern, the directory creation will be made on the file system this path is linked to. We just need to be very wary with this pattern, because it really depends on the way the path has been created. If we get this path through a method, for instance, we need to be careful because the corresponding directory might not be where we think it should be. Fortunately, we also have a pattern to create a path that takes a URI as a parameter. And since this URI has a scheme, it will tell the file system provider which file system to bind the path to.
Understanding Path Creation and Binding to File Systems
So we just saw a very important detail of this path object. A path is bound to a given file system. If this path has been created from a plain string of characters, then it is bound to the default file system, that is the disk, and most of the time this is what we need. But we can also create a path using a URI, and this time, this path is bound to the file system with the corresponding scheme we passed in this URI.
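The binding can be checked with a short sketch. It builds one path from a string and one from a file URI, asks each path for its file system, and creates both directories through the provider of that file system. The temporary parent directory and the directory names are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PathBinding {

    // A path built from a plain string is bound to the default file system.
    static boolean stringPathUsesDefault(String name) {
        return Paths.get(name).getFileSystem() == FileSystems.getDefault();
    }

    // Creates one directory from a string path and one from a URI path,
    // each time through the provider of the file system the path is bound to.
    static boolean createBothWays() {
        try {
            Path parent = Files.createTempDirectory("nio2-paths"); // hypothetical scratch location
            Path fromString = Paths.get(parent.toString(), "from-string");
            URI uri = parent.resolve("from-uri").toUri();           // a file: URI
            Path fromUri = Paths.get(uri);

            fromString.getFileSystem().provider().createDirectory(fromString);
            fromUri.getFileSystem().provider().createDirectory(fromUri);
            return Files.isDirectory(fromString) && Files.isDirectory(fromUri);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stringPathUsesDefault("tmp-dir"));
        System.out.println(createBothWays());
    }
}
```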
Accessing Files Attributes Using the FileSystemProvider
Let us now see how we can use a file system in a native way to access native file attributes that depend on the file system we are using. It is a possibility that is given only by NIO2 in the Java space, and it allows access to file attributes for both Windows and Unix file systems. There are three interfaces involved, and we are going to see them. Reading the attributes of a file is done through the file system provider. The first interface is called BasicFileAttributes, and it is common to all the file systems around. It has methods to get the different time information of the file or directory; more methods to check whether the element is a directory, a file, or a symbolic link, for instance; a size method to get the size of the file; and a fileKey method, which returns a unique identifier for this file. In fact, this key is used, for instance, in directory exploration to check for cycles if symbolic links are followed. This BasicFileAttributes interface is extended by a DosFileAttributes interface, with methods specific to the Windows file systems, and by another interface, PosixFileAttributes, with methods different from the DosFileAttributes methods to get the POSIX-specific file attributes. With these three interfaces, we can get native file attributes specific to a given file system. Let us see how it works in the code. First, we create a path. Obviously, we are on a Windows file system. To read the attributes of this path, we need a file system provider. A safe way to get this file system provider is to first ask for the file system this path is bound to, and then, from this file system, get the provider. In this way, we are sure that the file system provider knows this path and that the code will not throw any exception. Of course, if we are querying the POSIX file attributes on a Windows file system, it will not work. We will get an exception. So this code is specific to Windows file systems.
By the way, all this code is encapsulated in a factory method of the Files class called readAttributes. Of course, this method chooses the right file system and file system provider to get the attributes; it is a safe method.
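The pattern can be sketched as follows. The basic attributes are read through the provider of the file system the path is bound to, and the native ones are read only after checking which attribute views the file system supports, so that the same code can run on Windows or on a POSIX system. The class name and the text of the returned description are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.DosFileAttributes;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.spi.FileSystemProvider;
import java.util.Set;

class AttributeReader {

    // Reads the basic attributes through the provider of the path's own file system.
    static BasicFileAttributes basic(Path path) {
        try {
            FileSystemProvider provider = path.getFileSystem().provider();
            return provider.readAttributes(path, BasicFileAttributes.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the native attributes, choosing the interface the file system supports.
    static String describeNative(Path path) {
        Set<String> views = path.getFileSystem().supportedFileAttributeViews();
        try {
            if (views.contains("posix")) {
                PosixFileAttributes posix = Files.readAttributes(path, PosixFileAttributes.class);
                return "posix owner=" + posix.owner().getName();
            }
            if (views.contains("dos")) {
                DosFileAttributes dos = Files.readAttributes(path, DosFileAttributes.class);
                return "dos hidden=" + dos.isHidden();
            }
            return "basic only";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path here = Paths.get(".");
        BasicFileAttributes attributes = basic(here);
        System.out.println("directory: " + attributes.isDirectory());
        System.out.println("last modified: " + attributes.lastModifiedTime());
        System.out.println(describeNative(here));
    }
}
```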
Introducing the Jar File System to Handle JAR and ZIP Files
Let us now talk about the JAR file system, which is one of the two file systems provided and created by the JDK when we launch a Java application. This file system, provided by default, can be used to read and write ZIP files very easily. And it follows the same pattern as the disk file system. It supports two modes. The first one is the creation of ZIP files and the adding of content to a ZIP file. And the second one is the reading of ZIP files. A ZIP file can be written or read in two ways. First, we can copy existing files into it, or copy files from it to the outside. And second, we can write content directly in it, thus creating files in the ZIP file.
Creating a ZIP or JAR Archive File
Let us see that on some code. First, we create a ZIP file URI with a special, very precise scheme: jar:file://. Then we need to pass options at the creation of the file system. Those options are added to a HashMap of String, String. There are two possible keys: create, and encoding for text files. The create option supports, of course, two values: true and false. Encoding supports the standard charset names defined in the JDK. Then we create our file system using the newFileSystem factory method from the FileSystems factory class, by passing the ZIP file as a URI and the HashMap as the options. This creates this file system, and creates the given archive file.
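A possible version of this code, assuming the archive is created in a temporary directory rather than at a fixed location. The archive name and the choice of UTF-8 for the encoding key are picked for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class ZipCreation {

    // Creates an empty archive through the JAR/ZIP file system and returns its path on disk.
    static Path createEmptyArchive() {
        try {
            Path dir = Files.createTempDirectory("zip-demo"); // hypothetical scratch location
            Path archive = dir.resolve("archive.zip");
            // jar: followed by the file: URI of the archive
            URI zipUri = URI.create("jar:" + archive.toUri());

            Map<String, String> options = new HashMap<>();
            options.put("create", "true");    // create the archive if it does not exist
            options.put("encoding", "UTF-8"); // charset used for the entry names

            // The file system must be closed, hence the try-with-resources.
            try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, options)) {
                System.out.println("opened: " + zipFs.isOpen());
            }
            return archive;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path archive = createEmptyArchive();
        System.out.println(archive + " exists: " + Files.exists(archive));
    }
}
```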
Copying Files and Creating Directories in a ZIP Archive File
How can we add an existing file to this archive? Well, let us take a path to an existing text file, here some.txt. We just use the copy factory method from the Files factory class with this first path as a parameter, and with a path inside the ZIP file. This will create a ZIP entry, some.txt, and add the content of some.txt, of course in a compressed way. So you see that this pattern is extremely simple and extremely straightforward. It is just a matter of copying a given file from one file system, here the default disk file system, to our ZIP file system. Now suppose we want to copy this file into a directory inside our ZIP archive. First, we need to create that directory. Since our archive is seen as a file system, it is just a matter of calling the createDirectory method on the file system provider, providing the right path. To create this path, we have two methods. We can create this path directly from the ZIP file system by just providing the name of the directory. This is our first example, very simple. And we can also provide a URI to create this path. This URI is a bit more complex. It is jar:file://, the scheme of our archive, followed by the full path to the archive file, followed by an exclamation mark, and the full path of the directory within the archive. This is still a path, so we can provide this path to the createDirectory method to achieve the same result.
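Both ways of building the path can be tried in one sketch. It copies a small text file into a freshly created archive, creates a files directory from zipFs.getPath, and creates a second directory from a URI with the exclamation mark syntax. The docs directory name and the temporary location are hypothetical, and the URI variant assumes the ZIP file system is still open when the path is created.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class ZipCopy {

    // Copies a file into an archive, at the root and inside a directory.
    static boolean copyIntoArchive() {
        try {
            Path dir = Files.createTempDirectory("zip-copy"); // hypothetical scratch location
            Path source = dir.resolve("some.txt");
            Files.write(source, "some text".getBytes(StandardCharsets.UTF_8));

            URI zipUri = URI.create("jar:" + dir.resolve("archive.zip").toUri());
            Map<String, String> options = new HashMap<>();
            options.put("create", "true");

            try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, options)) {
                // Copy from the disk file system to the root of the ZIP file system.
                Files.copy(source, zipFs.getPath("some.txt"));

                // Directory created from a path obtained from the ZIP file system.
                Path filesDir = zipFs.getPath("files");
                zipFs.provider().createDirectory(filesDir);
                Files.copy(source, filesDir.resolve("some.txt"));

                // Directory created from a URI: archive URI, then "!", then the inner path.
                Path docsDir = Paths.get(URI.create(zipUri + "!/docs"));
                docsDir.getFileSystem().provider().createDirectory(docsDir);

                return Files.exists(zipFs.getPath("some.txt"))
                        && Files.exists(zipFs.getPath("files", "some.txt"))
                        && Files.isDirectory(zipFs.getPath("docs"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyIntoArchive());
    }
}
```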
Using OutputStreams or ByteChannels to Write to an Archive File
The second way of creating content within a ZIP file is by writing content directly into it. Let us see that. We first need to create an entry in this file system in the form of a path. Same as previously, we have two methods to create this path. The simple one leverages the getPath method from the ZIP file system. Here we are creating an ints.bin file. From this path and the provider of the ZIP file system, we can create a basic, classical OutputStream on this target with standard options: here, CREATE_NEW and WRITE. We can decorate this OutputStream to write data or objects as usual. And with the same kind of pattern, we can also open a byte channel to this entry from within our archive file using the newByteChannel method from the file system provider. In this call, the syntax is not exactly the same. The options are provided within a set, so we need to create a set, add the StandardOpenOptions to it, and pass this set as a parameter. So those are the two ways to add content to an archive: copying files to or from this archive file, or opening input streams, output streams, or byte channels directly within the archive.
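The byte channel variant, with its set of options, could look like the following sketch. The archive location is a temporary directory chosen for the example, and the method returns the size of the entry so that the result can be checked.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ZipChannelWrite {

    // Writes the given bytes to a new ints.bin entry and returns the size of that entry.
    static long writeEntry(byte[] content) {
        try {
            Path dir = Files.createTempDirectory("zip-channel"); // hypothetical scratch location
            URI zipUri = URI.create("jar:" + dir.resolve("archive.zip").toUri());
            Map<String, String> options = new HashMap<>();
            options.put("create", "true");

            try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, options)) {
                Path entry = zipFs.getPath("ints.bin");

                // newByteChannel takes its options as a set, not as varargs.
                Set<OpenOption> openOptions = new HashSet<>();
                openOptions.add(StandardOpenOption.CREATE_NEW);
                openOptions.add(StandardOpenOption.WRITE);

                try (SeekableByteChannel channel =
                             zipFs.provider().newByteChannel(entry, openOptions)) {
                    ByteBuffer buffer = ByteBuffer.wrap(content);
                    while (buffer.hasRemaining()) {
                        channel.write(buffer);
                    }
                }
                return Files.size(entry);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeEntry(new byte[] {1, 2, 3, 4}));
    }
}
```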
Live Coding: Pattern to Read Available File Systems
Now is the time for our live coding session. Let us see some code in action. What are we going to see here? Well, first, we are going to play with those file systems and see how we can do basic operations of file creation, file copying, and the like. We will first use the classical disk file system and see how it works. And then we will conduct some operations on the ZIP file system. Let us see the basic operations on file system providers and file systems. First, we can have access to the installed file system providers through the factory method installedProviders, available on the FileSystemProvider class. It returns a list of the installed providers. Let us print out all the elements of this list. They are not that many. System.out::println. Let us run this code. The default JDK installation has two file system providers. The first is the disk file system provider. Here, I am on a Windows machine, so this is a Windows file system provider. The second is the ZIP file system provider, used by the JVM to read its JAR files containing both the JDK classes and our application classes. Let us get a reference to the first of those two. This is the disk file system, so let us put this in a variable: windowsFS. And let us compare this file system with the file systems we get by other means. We have two other means. For instance, the FileSystems.getDefault factory method returns our first file system. And we can also create a rootURI with URI.create, passing file:// and then a further /, and from this invoke FileSystems.getFileSystem and pass this root URI as a parameter. Let us put this in another variable. In fact, of course, all those three file systems are the same object returned in three different ways. We can see that: let us print out fileSystem1 == fileSystem2. Let us run this code, and indeed, we have only one file system object in our system, which can be returned in different ways.
By the way, this variable should be called FSP, since it is a file system provider and not a file system.
Live Coding: Understanding Path Creation from FileSystems
Let us try a very basic operation on file systems, the creation of a directory. We have a createDirectory method on the file system provider object that takes the path as a parameter and some file attributes if needed. Let us create this path. We have a get method on the Paths factory class that takes a string as a parameter. So let us provide this string of characters, which is a Windows path, and let us run this code. And check that, indeed, in this directory, E:/tmp, the tmp-dir has been created. Now you need to be extremely wary with the way you create your path using this API. If I check the get method of this Paths factory class, I can see that the creation of this path is in fact bound to the default file system, which is most of the time the disk file system. This is perfect because I wanted to create a directory on the disk. So this is exactly the method I needed. There is another way of creating a path, let us comment this code out, which is the following: Paths.get. And instead of providing a string of characters, we can provide a URI. URI.create, and this time I have to provide the scheme of the URI corresponding to the file system I want to create this directory on. And since I want to create the same directory, I'm going to copy/paste this. I need to add the path as a string of characters to this scheme. And now the method used to create the path is not the same, it is the method that takes the URI. And you see that this method is going to extract the scheme from this URI. If the scheme is null, it will not create anything but throw an IllegalArgumentException. If the scheme is file, the path will be bound to the disk file system, which is the default file system. And if it is another scheme, it will try to find the right provider to create the path on. So as long as you are creating stuff on the disk, you can rely on the method that takes a string as a parameter to create your path. But the safest way to do that is to explicitly provide a URI.
In this way, you will be sure to create the right path, bound to the file system you want to operate on. Let us first delete this directory, then run this code again. And indeed, this directory has been created again. So this is something really to keep in mind. A path is always bound to a file system. The right way to create a path is to provide a URI, this is the safest way. By the way, you can also create a path from fileSystem.getPath, and this time just provide the path part. Since this path has been created from the fileSystem, it is also safe, it will be created with the right scheme directly.
Live Coding: Getting Root Directories and File Stores
Let us now comment out this code. We do not need it anymore, and carry on. From the fileSystem we have, we can get the rootDirectories. Let us put those rootDirectories in a variable. You can see that the rootDirectories are not returned in a list, but merely in an Iterable. This is for performance reasons: an Iterable is a lazy structure. It does not hold its data. So if I have many, many root directories on my file system, I will not have to store them in a huge list that will probably consume resources. We can check the elements of this Iterable. Let us just call forEach with System.out::println, and run the code. On my machine, I have five root directories: C, D, and E, which are hard drives; F, which is a USB stick; and J, which is my Blu-ray drive. Now those are just logical root directories. This does not tell me whether, for instance, I have a disc inserted in my drive. We can reach this information using NIO2. Instead of reading the root directories, we can also get the fileStores. Once again, this is also an Iterable, for the same reasons as the root directories. Let us print out some information on those fileStores. I'm going to use the same pattern, fileStores.forEach. Take a fileStore and just System.out.println. I'm going to print the type of each file store: fileStore.type, just like that. Let us see what it gives. This time it gives me the type of each file store, telling me NTFS for my hard drives, and FAT32 for the USB stick I have plugged into this machine. But now what I can see is that I have five root directories, but only four types here. Let us print out some more information: fileStore.name, along with the type. Run this code again, and now what I have is the name of each drive. My D and E drives are called Data 1 and Data 2. My USB stick is called USB-JOSE. My system drive C has no name. And I have indeed the NTFS and FAT32 types of each drive. The J drive is not mentioned in this list. Why?
Because I do not have any disk inserted in this drive. Meaning that it will not appear as a mounted file store. So you see that the RootDirectories information I have is not exactly the same as the FileStore information I have. FileStore is really about the mounted stores in my machine. RootDirectories are just declared elements that appear whether there is something to read through it or not.
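The listing of the mounted stores can be reproduced with a sketch like this one. It collects the name and type of each file store of the default file system, plus the space figures mentioned earlier in the module. The output format is made up, and the stores listed depend entirely on the machine.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.FileSystems;
import java.util.ArrayList;
import java.util.List;

class FileStoreReport {

    // Describes every mounted file store of the default file system.
    static List<String> describeStores() {
        List<String> lines = new ArrayList<>();
        for (FileStore store : FileSystems.getDefault().getFileStores()) {
            String label = store.name() + " [" + store.type() + "]";
            try {
                lines.add(label + " total=" + store.getTotalSpace()
                        + " usable=" + store.getUsableSpace());
            } catch (IOException e) {
                // Some stores refuse to report their space; keep the name and type only.
                lines.add(label + " space unavailable");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        FileSystems.getDefault().getRootDirectories().forEach(System.out::println);
        describeStores().forEach(System.out::println);
    }
}
```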
Live Coding: Creating a ZIP Archive File Using the FileSystem API
Let us now examine the way the JAR file system or the ZIP file system, which is almost the same, works with this API. For that, we are going to create a ZIP file using this NIO2 file system API instead of using the ZipOutputStream. If you have followed my other course, Java Fundamentals Input/Output, you probably saw how we could create ZIP files using the basic Java I/O API. So let us first create a URI. This will be a URI to my ZIP file. And I'm going to create it in the exact same way as usual, but providing a special URI with a special scheme to a ZIP file. This special scheme is jar:file://. And then I have to give the path to my ZIP file on the file system. Let us call it E:/tmp/archive.zip. This special scheme is recognized by the JDK as a JAR file system, and it will be used as such. So I could create a ZIP file system directly, let us call it zipFS, from the FileSystems factory class: newFileSystem with the URI of my ZIP file. But in fact, I need to pass some further options here to this factory method. Those further options are put in a map, a map of String, String; let us call it options, and let us create it as a regular HashMap. This HashMap will contain, in fact, two keys. So let us put one of them, create. That is a boolean value that can take the true or false value. This create option just means that this file is going to be created by the system. I have another key available, encoding, that can take the name of the standard charset used to create this ZIP file. This is used in case you want to put text files with a non-standard charset inside this archive file. Now this does not compile because we need to handle the IOException. And this zipFS will have to be closed at the end of our operation. So this is a good use case for our old friend the try-with-resources pattern: catch IOException, and let us print the stack trace. Once again, this is code you should not be using in production.
You need to do something smarter if you are writing a real application, like logging this exception or whatever. And in fact, this simple code is enough to create this ZIP file. Let us check that our E:/tmp directory is indeed empty, and let us run this code. Our ZIP file has been properly created as an archive.
Live Coding: Copying Existing Files in a ZIP Archive File
So what can we do now? Since this is a file system, I can use all the methods available on this file system, including, for instance, the copy method. Here I have an aesop.txt text file, which is about 200 kilobytes long. Let us create a path to this file, Paths.get, and I'm going to use the string pattern since this is a file on a local disk: files/aesop.txt. Let us create another path called target within this zipFS. Now I have two patterns to do that. Here I call zipFS.getPath and pass a string as a parameter, which is a local path within that file system: aesop-compressed.txt. And then what I need to do is Files.copy from the source to the target to copy this file from here to the inside of my archive. Let us run this code. And check the content of the directory. This information is in French, but we can see that the size of this archive is not zero, so it should have some content. Let us open it. And indeed, inside, we have our compressed file with its original size, roughly 200 kilobytes, and the packed size, the compressed one, which is about 75 kilobytes. This way of creating content in an archive using a file system is much simpler than creating a ZIP archive by hand using the Java I/O API. Suppose now that I want to put this file in a directory within this ZIP file. In fact, it's just a matter of creating a directory and then copying this file into that directory in the archive. And since this archive is a file system, I can use the file system API to do that. How can I create a directory from within a file system? For that, I need to use a fileSystemProvider and the method createDirectory on it. So what is this file system provider? This file system provider can be obtained directly from this file system. So we can replace this code with zipFS.provider to create this directory. And this directory has to be a path within this file system. So I can use this pattern again: this is a directory, zipFS.getPath with, for instance, files, and then copy this aesop-compressed within that directory.
Let us run this code. Check the content of this archive file. Indeed, I have a files directory within that archive with the aesop-compressed.txt file in that directory.
Live Coding: Creating Content in a ZIP Archive File
Now suppose that what we need is to manually create an entry in this archive file and write elements directly to it. Suppose that we have a list of integers in our application, and we want to add them directly to a file within that archive. We are going to create another directory. Let us call it binDir, with the name bin. Then a file, binFile. Let us call it ints.bin. Let us create this directory using this pattern, create binDir, and then from the same zipFS.provider, we have a set of methods to create an output stream and an input stream to read and write directly within the archive file system using Java I/O. But we also have newFileChannel, newByteChannel, and newAsynchronousFileChannel to do so using Java NIO. Here we are going to use newOutputStream on our binFile, providing the classical standard option CREATE_NEW to create a new file, and WRITE to be able to write to it. This OutputStream is a regular OutputStream that we can further decorate using a DataOutputStream: new DataOutputStream, and pass os as a parameter. This is a plain old DataOutputStream from Java I/O. So we can just write integers using it. Let us write three, for instance, then 20 and 30. Do not forget to close this DataOutputStream once you're done using it. We can now run this code and check the result. Within our archive file, we still have the files directory with the compressed text file as we had, and the new bin directory with the ints.bin file containing 12 bytes. Since we have three integers and four bytes per integer, this is indeed its expected size. So those are the two patterns built on NIO2. First, to copy existing files from within an archive file, and second, to directly create content in a file from within this file system. And you can see that acting on an archive file system is no different from acting on any file system like the disk file system.
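The size check made at the end of this session can be reproduced with a short sketch: three integers written through a DataOutputStream should give an entry of 12 bytes. The archive is created in a temporary directory, which is an assumption of the example, and the values written are arbitrary.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

class ZipIntsWriter {

    // Writes the integers to bin/ints.bin inside a new archive and returns the entry size.
    static long writeInts(int... values) {
        try {
            Path dir = Files.createTempDirectory("zip-ints"); // hypothetical scratch location
            URI zipUri = URI.create("jar:" + dir.resolve("archive.zip").toUri());
            Map<String, String> options = new HashMap<>();
            options.put("create", "true");

            try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, options)) {
                Path binDir = zipFs.getPath("bin");
                zipFs.provider().createDirectory(binDir);
                Path binFile = binDir.resolve("ints.bin");

                // A regular OutputStream from the provider, decorated as usual.
                try (DataOutputStream dos = new DataOutputStream(
                        zipFs.provider().newOutputStream(binFile,
                                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE))) {
                    for (int value : values) {
                        dos.writeInt(value); // four bytes per integer
                    }
                }
                return Files.size(binFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeInts(10, 20, 30) + " bytes");
    }
}
```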
Module Wrap-up
And with this, we have reached the end of this module. So let us quickly wrap it up. What did you learn in this module? Well you saw everything you need to know about the NIO2, the Java NIO2 File System API. You saw how to handle file systems with native elements accessible from the JDK, with a little warning still. The code you saw is supposed to run on the Windows file system. Of course, it differs a little on the Linux file system or on any other file system. You saw how to manipulate files with the file system API using NIO2. And you saw how to very easily manipulate ZIP archive files in a much easier way than with the plain Java I/O API. And with this, I would like to thank you for watching this module. And I hope to see you again in the next module of this Java NIO, NIO2 course about visiting directory trees using the NIO2 APIs.
Visiting Directory Trees
Agenda: Visiting Directory Trees with NIO2
Hello and welcome back to this Java NIO and NIO2 course. My name is Jose. In this module, you will learn how to visit directory trees using Java NIO2. Let us quickly see the agenda of this module. The goal of this module is to present the APIs from NIO2 to explore the content of directories. And there are three of them. The question remains the same. Why do we need a new API for that, since we can already do that with plain Java I/O code? The answer is simple: increased efficiency. NIO2 gives better performance than plain Java I/O code. You will see the path matchers that are used to filter paths from a directory tree, and how to filter the content of a tree with regular expressions and control over the depth of exploration. And at last, you will see how to visit a directory tree using the visitor pattern introduced in NIO2.
Writing a Directory Filtering Pattern Using Regular Expression
Let us first begin with directory streams and matchers. A directory stream is a way of analyzing the content of a directory. There is the stream keyword in it, but it has nothing to do with the input or output streams of Java I/O. And it has nothing to do with the Java 8 Stream API either. It doesn't explore the subdirectories but those directories are still part of the analysis. It can be used to get all the content of a directory. And it can also filter its content by providing a lambda expression that is a filter. Let us see some code and let us see the content of this directory named files. So first we create a path to this directory. Then we pass this path as a parameter to the newDirectoryStream factory method from the Files class. We also give a filter as a lambda expression as the second argument to this method. And it returns an object called a directory stream. Now this directory stream is not a stream from Java 8. It is a different object that has nothing to do with it. By the way, we can also write the filter as a method reference like that, leading to a very clean pattern. We can also directly pass a pattern as a string to match the file or the directory names. Here, *.java. And we can also pass a PathMatcher as a parameter, which is a special object of this API, written with a special syntax. This is useful if we need complex file name checking, since we are using regular expressions. This PathMatcher object has a matches method, so we can pass this other lambda expression, written here as a method reference. Let us take a look at this path matcher object. It allows for two kinds of regular expression, depending on the scheme we give. The first scheme is regex colon followed by a regular expression as specified in the Pattern class of the JDK. Or we can use the glob colon scheme, which is in fact a simplified version of the regex.
This simplified version of the regular expression is specified in the FileSystem.getPathMatcher method. This syntax allows for future extension. The JDK could decide to implement more regular expression systems by providing new schemes. For the moment, we only have those two.
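The two filtering styles described above could look like the following. This is a sketch under assumptions: the class and method names are hypothetical, and the matcher is applied to the file name only (path.getFileName()), because a pattern such as *.java does not match a full path that contains directory separators.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, named for this illustration only.
public class DirectoryStreamFilters {

    // Filters the entries of 'dir' with a glob string such as "*.java".
    public static List<String> listWithGlob(Path dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names); // directory order is not specified, so sort for stable output
        return names;
    }

    // Filters the entries of 'dir' with a PathMatcher built from a
    // "glob:..." or "regex:..." expression.
    public static List<String> listWithMatcher(Path dir, String syntaxAndPattern) throws IOException {
        PathMatcher matcher = dir.getFileSystem().getPathMatcher(syntaxAndPattern);
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries =
                     Files.newDirectoryStream(dir, path -> matcher.matches(path.getFileName()))) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

For instance, listWithMatcher(dir, "regex:.*\\.txt") keeps only the entries whose name ends with .txt. The directory stream holds a system resource, so it is closed with try-with-resources in both methods.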
Using a Directory Stream to Analyze the Content of a Directory
What can we do with this directory stream? Well first of all, this directory stream is an interface that extends Iterable. So it can be used in a for-each syntax just like that, for path : directoryStream, and then add some business code, some application code, to process the different paths. Why is it an Iterable and not a list? Well it is just for performance reasons. It is more efficient to expose the result as an Iterable, which precisely does not store all the results but is a lazy structure, than to create a list that could be huge if there are many, many entries in this directory. The Iterable interface also has a forEach method that we can use, for instance, to print all the entries of the directory. And if we need to create a regular stream, we can also use the spliterator method from this Iterable interface and pass it to the StreamSupport.stream factory method, providing false as a second parameter since this is not a parallel stream. Here we collect this stream in a list to create a list of all the path entries from within this directory.
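A short sketch of the last pattern, converting the directory stream to a regular stream and collecting it into a list. The class and method names are assumptions for this illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical helper class, named for this illustration only.
public class DirectoryEntries {

    // Collects the direct entries of 'dir' (files and subdirectories) into a list.
    public static List<Path> entries(Path dir) throws IOException {
        try (DirectoryStream<Path> directoryStream = Files.newDirectoryStream(dir)) {
            // 'false' asks for a sequential (non-parallel) stream
            return StreamSupport.stream(directoryStream.spliterator(), false)
                    .collect(Collectors.toList());
        }
    }
}
```

Collecting into a list gives up the laziness of the directory stream, so this is best kept for directories known to be of reasonable size.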
Exploring the Content of a Directory in a Depth-first Approach
The second pattern we have consists in walking directory trees, meaning that we are going to explore the content of a given directory and also the content of its subdirectories. Let us see that. Walking a directory tree consists in exploring all the files and the subdirectories and their content. It can be done in two ways. The first way is called a depth-first approach. And the second way a breadth-first approach. Both approaches have their pros and cons. The one that is used in the JDK is the depth-first approach. Let us see the difference between both. Suppose we have a root directory. It has some content. A first subdirectory with two files in it. Then a second subdirectory. Then a third one with a first subdirectory with two files in it, a second subdirectory with two files in it, and then another file in this third subdirectory, and then two files in the root directory. The depth-first approach will first explore the first subdirectory and its content. Then the second subdirectory and its content. Then the third subdirectory and the first subdirectory of that subdirectory, thus exploring the two files eight and nine, and it will continue with the content of the directory six, with its second subdirectory and its content. We'll end up with the rest of the content of the sixth subdirectory and the content of the root directory. This is the order of the depth-first approach and this is the approach used in the JDK. We can take a quick look at the breadth-first approach. The breadth-first approach does the opposite: it first takes the content of the root directory. Then the content of the first subdirectory, of the second one, of the third one, and this time all the content. Then it takes the first subdirectory, here nine, explores its content, and then the second subdirectory and its content.
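To make the difference concrete, here is a small sketch that produces both orders. The depth-first order comes from Files.walk, which is the JDK approach. The breadth-first traversal is not provided by the JDK: it is hand-written here with a queue, purely for comparison, and the class and method names are assumptions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, named for this illustration only.
public class TraversalOrders {

    // Depth-first: a directory's whole content is visited before its next sibling.
    public static List<Path> depthFirst(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.collect(Collectors.toList());
        }
    }

    // Breadth-first: all entries of one level are visited before the next level.
    public static List<Path> breadthFirst(Path root) throws IOException {
        List<Path> visited = new ArrayList<>();
        Deque<Path> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Path current = queue.poll();
            visited.add(current);
            // Symbolic links are not followed, like the default of Files.walk
            if (Files.isDirectory(current, LinkOption.NOFOLLOW_LINKS)) {
                try (DirectoryStream<Path> children = Files.newDirectoryStream(current)) {
                    for (Path child : children) {
                        queue.add(child); // children wait behind the current level
                    }
                }
            }
        }
        return visited;
    }
}
```

Both methods return the same set of paths. Only the order differs, and the order of the entries inside a single directory is not specified in either case.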
Using the Files.walk Pattern to Explore Directory Trees
So the Files.walk factory method from the Files factory class walks a directory tree in a depth-first approach. The parameters that we need to provide are the following. First of course the starting point as a path; it should be a directory. Then optionally the maximum depth to be explored, to limit the exploration, and then an option to decide to follow the symbolic links or not, and this parameter is also optional. By default, walking a directory tree does not follow the symbolic links. So the basic pattern just takes the starting path. Suppose we are exploring the sources directory, where we store all our Java sources. The walk method returns a stream of paths. And this time this is a regular Java 8 stream. It can take a maximum depth and can optionally take this FOLLOW_LINKS option, to tell the system to follow the symbolic links while exploring the directory tree. Now as you may know, if you do that, you might find cycles on a file system. If that is the case, if there is a cycle, the API will detect it and will throw an exception.
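As a sketch of this pattern, the following counts the Java source files under a starting directory, with a limit on the depth of exploration. The class name, the method name, and the ".java" suffix test are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper class, named for this illustration only.
public class SourceWalker {

    // Counts the regular files ending with ".java" found at most 'maxDepth'
    // levels below 'start' (depth 1 means the direct entries of 'start').
    public static long countJavaFiles(Path start, int maxDepth) throws IOException {
        // Symbolic links are not followed, since FileVisitOption.FOLLOW_LINKS is not passed
        try (Stream<Path> paths = Files.walk(start, maxDepth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(path -> path.getFileName().toString().endsWith(".java"))
                    .count();
        }
    }
}
```

The stream returned by Files.walk keeps directories open while it is consumed, so it is closed here with try-with-resources.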
Looking for Content in a Directory Tree Using Files.find
The Files.find method is an alternative to this one and it works the same, except that it takes a BiPredicate as a parameter that will be used to filter out the elements while walking, and it still returns a stream of the matching paths, a regular Java 8 stream. Here is the full pattern. We have created it using a path matcher. The find method takes the starting path, this BiPredicate from java.util.function, and whether or not to follow the symbolic links. Here, since we have not provided anything, the default behavior is to not follow those links. The nice thing is that this BiPredicate also takes the attributes of that path. It is possible to filter the paths using, for instance, date attributes or security attributes.
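A possible version of this pattern is sketched below. The class and method names are assumptions, and so is the choice to combine the path matcher with the isRegularFile attribute, which shows the second argument of the BiPredicate at work.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, named for this illustration only.
public class JavaSourceFinder {

    // Returns the regular files under 'start' whose name matches "*.java".
    public static List<Path> findJavaFiles(Path start) throws IOException {
        PathMatcher matcher = start.getFileSystem().getPathMatcher("glob:*.java");
        // The attributes are read once by the walk and handed to the predicate
        BiPredicate<Path, BasicFileAttributes> filter =
                (path, attributes) -> attributes.isRegularFile()
                        && matcher.matches(path.getFileName());
        try (Stream<Path> found = Files.find(start, Integer.MAX_VALUE, filter)) {
            return found.sorted().collect(Collectors.toList());
        }
    }
}
```

Because the attributes are passed to the predicate, filtering on them does not cost an extra call to the file system for each path.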
Understanding Weak Consistency When Exploring Directories
Just a word of warning though. Those streams are lazily built while exploring the directory tree, while walking through the directory trees. It means that they are weakly consistent with the file system. There is no lock put on the file system while exploring the directory tree. It would really be a very bad idea to do that, to lock the file system. So it means that the file system might change during the process. What can happen, for instance, is that the directory that we're currently exploring is deleted. In that case, we will have an exception. This kind of thing may happen in this process.
Introducing the Walking Tree Pattern and File Visitors
The third and last pattern provided by this API is called visiting directory trees. Now you might think that walking and visiting are kind of the same, but in fact they are not. Visiting is different from walking. Visiting a directory in fact offers more control over the process. The first thing is that it can interrupt this process. And it can be very interesting. Suppose we're looking for a file, a particular file in a directory tree. Once we have found it, we do not need to explore the directory tree any further, so we need this interruption. And it can skip elements based on filtering, whether they are directories, hidden elements, etcetera. This is not allowed when walking a directory and it is supported when visiting a directory. The method used is called walkFileTree and it's a factory method from the Files class. Besides the starting path, it takes three parameters. The first one is the file visitor that we're going to see in a minute. The second one is whether or not to follow the links. The same option as in the previous pattern. And the third one is the maximum depth of exploration. Again the same option as in the previous pattern. What is this file visitor object? A file visitor is an object used during the traversal of a tree. It can act when a directory is met, before and after this directory is visited. It can also act on every file, telling what to do with every file, and it also handles exceptions, in case something wrong is happening. From a technical point of view, this file visitor is in fact an interface with four methods. There is also an adapter class that provides default implementations for those four methods.
Setting up a FileVisitor to Walk a Directory Tree
Let us see that in the code. Suppose we have a starting directory, which is the same as the previous example with a bunch of Java source files in it. We create this file visitor object. We will see that in a minute, and we just call this walkFileTree method from the Files class, providing the starting point as the directory, and the file visitor. Let us take a look at this interface. It has four simple methods. Two for the handling of the directories: preVisitDirectory and postVisitDirectory. And you can see that the preVisitDirectory method takes the path of the directory, but also its attributes as a parameter, and the postVisitDirectory method takes an exception as a parameter in case something went wrong while visiting this directory. And there are another two methods for handling files: visitFile with the path of the file and its attributes, and visitFileFailed that takes an exception in case something went wrong while visiting this file. Now this file visitor is an interface with a parameter T. This parameter T will take the value Path in the following. Note that all these four methods return the same object, FileVisitResult. What is this object? In fact it is an enumeration with four values. The first value is CONTINUE. If a method returns CONTINUE, then the visit of the directory tree will continue. And if it returns TERMINATE, it will end the process at once. So if we need to write some code to find a special file, once we have found this file, we can return TERMINATE and it will end the process. Now we also have two other enumerated values. The first one is SKIP_SUBTREE. It is relevant when returned by the preVisitDirectory method. In that case, this directory will not be explored. And SKIP_SIBLINGS is relevant inside a directory, meaning that the rest of the directory should not be visited.
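As an illustration of SKIP_SUBTREE, here is a sketch of a visitor that lists the files of a tree while skipping every directory with a given name. It extends SimpleFileVisitor, the adapter class of the JDK, so only two of the four methods are overridden. The class name and the idea of skipping by directory name are assumptions for this example.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor, named for this illustration only.
public class SkippingVisitor extends SimpleFileVisitor<Path> {

    private final String skippedDirName;
    private final List<Path> files = new ArrayList<>();

    public SkippingVisitor(String skippedDirName) {
        this.skippedDirName = skippedDirName;
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
        Path name = dir.getFileName(); // null for a root such as "/"
        if (name != null && name.toString().equals(skippedDirName)) {
            return FileVisitResult.SKIP_SUBTREE; // the content of this directory is not explored
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        files.add(file);
        return FileVisitResult.CONTINUE;
    }

    // Lists all the files under 'start', except those under a skipped directory.
    public static List<Path> listFilesSkipping(Path start, String skippedDirName) throws IOException {
        SkippingVisitor visitor = new SkippingVisitor(skippedDirName);
        Files.walkFileTree(start, visitor);
        return visitor.files;
    }
}
```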
Using the Walking Directory Tree Pattern to Write a File Finder
Let us write a simple example of a FileFinder that will look for a given file in a directory tree. We suppose that in this class, we know the name of the file we're looking for. The implementations of the two directory methods are very straightforward. We need to explore all the directory tree, so preVisitDirectory will just return CONTINUE. And postVisitDirectory will also return CONTINUE. Now if the visitFileFailed method is called, it just means that this file could not be opened for some reason. There is not much we can do. So we just continue visiting the directory tree. The interesting method is of course the visitFile method. Here we just compare the name of the visited file to the name of the file we're looking for. If this path matches, then we save it somewhere and we return TERMINATE, thus terminating the exploration of the directory tree. If it's not the case, well, we continue to the next file. So we can see that implementing this file visitor is not very complex. And all the technical details of the directory tree exploration are handled for us by the API.
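The FileFinder described above might be written as follows. This is a sketch and not the code from the slides: the field names, the Optional return type, and the static find helper are assumptions.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.FileVisitor;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Optional;

// Sketch of a file finder; names other than FileFinder are assumptions.
public class FileFinder implements FileVisitor<Path> {

    private final String fileName;
    private Path found; // stays null until the file is found

    public FileFinder(String fileName) {
        this.fileName = fileName;
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
        return FileVisitResult.CONTINUE; // every directory is explored
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        if (file.getFileName().toString().equals(fileName)) {
            found = file;
            return FileVisitResult.TERMINATE; // stop the whole traversal at once
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) {
        return FileVisitResult.CONTINUE; // not much we can do, keep visiting
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
        return FileVisitResult.CONTINUE;
    }

    public Optional<Path> getFound() {
        return Optional.ofNullable(found);
    }

    // Walks the tree under 'start' and returns the first file named 'fileName', if any.
    public static Optional<Path> find(Path start, String fileName) throws IOException {
        FileFinder finder = new FileFinder(fileName);
        Files.walkFileTree(start, finder);
        return finder.getFound();
    }
}
```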
Live Coding: Counting Filtered Files from a Directory
Let us see all this in action in our live coding session. What are we going to see here? First, we're going to play with this directory walking API and see how it works. And we're going to set up a system to visit a directory tree and extract information from it. Let us play with those visiting and exploring directories APIs we have in Java NIO2. For that, and to create meaningful examples, we need directories with a lot of content, a lot of subdirectories and a lot of files in those subdirectories. It turns out that I have one locally on this disk, which is the directory in which I store all my pictures, my photos, so let us create a path to this directory using this URI. It is in fact in E:/Photos. The first thing we can check is: does this path really exist? We have a Files.exists factory method for that. Let us pass this path as a parameter, get the boolean and print out the result, exists equals plus exists. I think that I have one extra colon here. Let us run this code. And indeed this directory exists. Let us count the content of this directory. How can I do that? Well I can explore it using this Files.find method. It takes the path as a starting point. Then the maxDepth of exploration; since I want to explore everything, I am going to pass Integer.MAX_VALUE to it. Then it takes a matcher and some options. I am not going to give any options. The matcher is just a lambda expression that takes the given path of the file or directory that is looked at at that time, along with its attributes, and that returns a boolean. It is a BiPredicate. This method returns a stream, a regular stream from Java 8. So let us write it like that. And from that stream, I can just, for instance, count the total number of elements. Here I have not filtered anything. Everything has been taken into account. So let us run this code. I have roughly 30,000 elements. That includes subdirectories, files, etcetera. Suppose I want to count the number of files ending with the JPEG extension. This is a photo directory.
So I expect to have some. I cannot reuse this stream, because remember, a stream is an object that can be used only once. So let us copy paste those two lines, call it find2. And now this p here is the path, so I can just convert it to a string and test if it ends with the .jpeg suffix. This is a very basic way of checking if this is a JPEG image, but at least it should work. Let us run this code. I have a little more than 6,600 JPEG images in it. The second argument I have is a BasicFileAttributes argument. Let us play with this object. This object allows us to reach, for instance, the creation date of each file in my directory. See, I have this BasicFileAttributes object. Let us call it attributes for instance. Let us put it as a new object. On this object, I have a creationTime, which is a FileTime object. creationTime, okay, and from this creation time, I can call the toMillis method. The toMillis method returns the number of milliseconds since the epoch, so using this information I can compare the creationTime to a given date in the past and see the images that are older or more recent than this one. Now how can we compute the number of milliseconds from the epoch using the JDK API? Well, we have several ways. The way that will work in all the Java versions is the following. We create a GregorianCalendar with this getInstance factory method. Set it to the date we need. For instance, 2017, January the 1st at midnight, and we can convert this calendar to the right number of milliseconds using the getTimeInMillis method. Let us get rid of this code. So now we can just copy paste this code. Let us call it find3 and compare this attribute, creationTime.toMillis. And suppose that we want it to be less than the date we just entered. In that way, I will have the number of files older than the first of January 2017. So if we run this code, we see that among the 29,755 files, we only have this amount that is older than 2017.
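The last filter of this session could be sketched like this. The class and method names are assumptions, and so is the isRegularFile test, added so that directories are not counted. Note also that the value reported by creationTime depends on the file system: where the creation time is not supported, the JDK may report another timestamp such as the last modified time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Calendar;
import java.util.stream.Stream;

// Hypothetical helper class, named for this illustration only.
public class OlderThanCounter {

    // Milliseconds since the epoch for the given date at midnight, default time zone.
    public static long epochMillis(int year, int month, int day) {
        Calendar calendar = Calendar.getInstance();
        calendar.clear(); // also resets the milliseconds field
        calendar.set(year, month - 1, day, 0, 0, 0); // Calendar months start at 0
        return calendar.getTimeInMillis();
    }

    // Counts the regular files under 'start' created strictly before 'cutoffMillis'.
    public static long countCreatedBefore(Path start, long cutoffMillis) throws IOException {
        try (Stream<Path> found = Files.find(start, Integer.MAX_VALUE,
                (path, attributes) -> attributes.isRegularFile()
                        && attributes.creationTime().toMillis() < cutoffMillis)) {
            return found.count();
        }
    }
}
```

A call such as countCreatedBefore(photos, OlderThanCounter.epochMillis(2017, 1, 1)) would then count the files older than the first of January 2017, where photos is whatever starting path is being explored.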
Live Coding: Counting Directories Using a FileVisitor on a Tree
Now suppose that we want to do something more sophisticated on this directory. We would like, in one pass, to count the number of empty directories there are in it, plus count the number of file types and the number of files for each type. And we want to do that in only one pass over this directory. This is a good job for the file visitor pattern. So in a first step, we are going to create a custom file visitor, an implementation of the FileVisitor interface, and then run this visitor over our directory tree. Let us create a static class called CustomFileVisitor. It is an implementation of the FileVisitor interface with the parameter Path. Let us ask our IDE to create the implementation for us. We have the four methods that we saw in the slides: preVisitDirectory, visitFile, visitFileFailed and postVisitDirectory, that I'm going to copy paste just here. We do not have anything to do when we are exiting a directory, so let us just return FileVisitResult.CONTINUE for this method. And if we are visiting a file and this visit is a failure, there is nothing much we can do. Let us just return CONTINUE also. Now before we enter a directory, what we can do is check if this directory is empty or not. To do that, we can use the newDirectoryStream factory method from the Files class, passing the directory as a parameter. The object returned is a DirectoryStream. Now remember, this object has nothing to do with the Stream from the Stream API of Java 8. It is in fact an extension of Iterable. Now what we want to check is just if this directory is empty or not. That is, if this Iterable has objects for us or not. Remember that this Iterable is a lazy object. It is an object that does not hold the content of the directory, which would be a very bad thing to do if the directory were very big, with many files and many subdirectories in it. In fact all we need to do is to check if this directory stream is empty or not.
From this directory stream, we can create a regular stream by calling the spliterator method on this Iterable and passing it as an argument to the StreamSupport.stream factory method. The second argument is a boolean: true if we want a parallel stream, which is of course not the case here. Now this is a regular stream, a stream of paths. We do not have any isEmpty method on it. But what we can do is use a trick with the findFirst method, which will return the first element of that stream if it exists and wrap it in an Optional. And on this Optional object, I have an isPresent method. So in fact, this gives a boolean, dirIsNotEmpty. And this boolean is evaluated in a lazy way. We just check if there is one element in that stream and we do not explore that stream any further. So if this directory stream in fact points to a very large set of elements, this set of elements will not be explored. It will not be evaluated at all. So if this dirIsNotEmpty boolean is true, then we need to explore this directory. So we return FileVisitResult.CONTINUE, and if it's not the case, we do not want to visit this directory, so FileVisitResult.SKIP_SUBTREE. And we also need to increment a counter since we need to count the number of empty directories. Let us create this counter: emptyDirs++, with a field private long emptyDirs equal to zero when this object is created. So this is how we can count the number of empty directories, using this preVisitDirectory method.
Live Coding: Counting File Types Using a FileVisitor on a Tree
Let us implement the second thing we want to do: count the number of files per type. So we are going to add code in the visitFile method here. Of course, in all cases we want to continue. So we're going to return FileVisitResult.CONTINUE. Here we know that the path we're looking at is a file, and we are going to use the factory method from the Files factory class called probeContentType that takes the path as a parameter. The probeContentType method returns a string that we are going to call fileType. In fact it does its best to determine the type of the file we're dealing with. We are going to see what it returns for this file type when we run this code. Now we will need a HashMap for that. The keys will be the file types, and the values the number of files of this file type. So let us create this field, private Map of String and Long fileTypes equals new HashMap. And let us fill this fileTypes map. What I want to do is the following. If this file type is not present in the map, I want to put it in the map, associated with the value one, since I have seen one file so far with this file type. And if this file type is already in the map, I want to increment that counter. Now it turns out that the value of this map is a Long, and Long is an immutable object in Java. So I cannot really increment that Long. What I need to do exactly is to replace that Long with another Long of the same value plus one. This is exactly the job for the merge method of the Map interface. This merge method is a method from Java 8. The key is the file type. The initial value is one, and if this key value pair is already in the map, I want to sum up the two values, the second value being always one since it is in fact this value. This is what I need to do in this visitFile method. Now in this class, I have two properties, emptyDirs and fileTypes, which are going to be changed by the file visiting process, but what I need to do at the end of this process is get those values.
So I'm going to create two getters, on emptyDirs and fileTypes here, and now we can add some code in our main method. Let us create an instance of this file visitor class. Let us create a path pointing to the right directory: URI.create, and pass file colon two slashes for the scheme of this URI and then E:/Photos for the name of the directory. And from that, we need to walk this file tree from this path using the file visitor we have just created. Now we can print out the result: number of empty dirs equals fileVisitor.getEmptyDirs. And print out the file types. This is a map. So I can print out the content of the map in that way, getFileTypes, and for each key value pair, print out the result in that way: key, plus, let us draw this little arrow in ASCII art, and value. Let us run this code. So we have more than 2000 empty directories in this photos directory, and the file types are the following. And you see that this probeContentType method in fact tries to determine the media type, in the web sense, of each file. So this is the pattern we can use to visit a directory tree and to act on the directories and on the files.
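Put together, the visitor built in these two sessions could look like the sketch below. It is a reconstruction under assumptions, not the exact code typed in the session: the directory stream is closed explicitly with try-with-resources, and a null result from probeContentType, which can happen when the type cannot be determined, is counted under the made-up key "unknown".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileVisitResult;
import java.nio.file.FileVisitor;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.StreamSupport;

// Sketch of the visitor; details beyond the transcript are assumptions.
public class CustomFileVisitor implements FileVisitor<Path> {

    private long emptyDirs = 0L;
    private final Map<String, Long> fileTypes = new HashMap<>();

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
        boolean dirIsNotEmpty;
        try (DirectoryStream<Path> directoryStream = Files.newDirectoryStream(dir)) {
            // findFirst() stops at the first entry: the rest of the directory is not read
            dirIsNotEmpty = StreamSupport.stream(directoryStream.spliterator(), false)
                    .findFirst()
                    .isPresent();
        }
        if (dirIsNotEmpty) {
            return FileVisitResult.CONTINUE;
        }
        emptyDirs++;
        return FileVisitResult.SKIP_SUBTREE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        String fileType = Files.probeContentType(file); // may be null
        // Puts 1 for a new key, or replaces the current count with count + 1
        fileTypes.merge(fileType == null ? "unknown" : fileType, 1L, Long::sum);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
        return FileVisitResult.CONTINUE;
    }

    public long getEmptyDirs() {
        return emptyDirs;
    }

    public Map<String, Long> getFileTypes() {
        return fileTypes;
    }
}
```

A run would create the visitor, pass it to Files.walkFileTree with the starting path, then read getEmptyDirs and getFileTypes. The keys of the map depend on the platform, since the detection of the content type is delegated to the installed file type detectors.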
Module Wrap-up
And with this, we have reached the end of this module, so let us wrap it up. What did you learn in this module? Well you learnt the three ways of exploring directory trees, provided by the Java NIO2 API. You saw that these explorations are conducted in a lazy way, and return lazy elements, which allows for the exploration of very large directory trees in still very efficient way. Without blocking the whole file system of course which would be very bad, but with the drawback. The exploration gives only weak consistency and can crash in certain situations. And we saw examples of that. You saw different patterns for different needs. First the exploration of a single directory. Second, the walking inside a full directory tree and third the visiting of a full directory tree, which gives you more control over the process. Thank you for watching this module. I hope you found it interesting. Let us meet again for the next module. The last one of this Java NIO and NIO2 course about directory events.
Listening to Directory Events
Agenda: Listening to File Creation Deletion and Modification
Hello, my name is Jose. I am very happy to welcome you back for the last module of the Java NIO.2 course about listening to directory events. What are you going to learn in this module? Well, this module is about observing what is happening in a directory. So the question is: what can happen in a directory? Well, we can have file creation or directory creation or deletion or modification. And this is what this module is about, using the Java NIO.2 API to be able to catch those events and to react to them in a way that is reliable, portable, and performant since the API you are going to see is bound to the native APIs of the file system.
Understanding Legacy Solutions to Catch File Creations
Before we examine the solution provided by the NIO. 2 API, let us take a look at the problem and let us understand it. What we want is the following: we want to set up a system to observe the creations of files, for instance, in a given directory. We know how to read the entries from a directory. We have, in fact, several solutions for that from Java IO and NIO. 2. So we could set up a special task. That could be activated on a timer that would analyze the content of the directory on this timer. So this is, in fact, sampling, and give us the information we need. What do we need to set up such a system? Well, first we need a thread. We need to keep track of the entries, that is, to store the state of the directory each time we watch it; and we need to compare the content of that directory in the first step, in the second step, in the third step, et cetera, and generate events when we see that the file has been created or deleted. What if the rate of creations and deletions is greater than our sampling? For instance, suppose that a given file, file. log, is created then deleted between two samples or created then deleted then created again between two samples. Well, in fact, since we are just sampling the state of the directory, we will see that this file. log was not there at the beginning of the process and is not there at the end of it. So we will be missing the creation and deletion; and if it's created again, we will just generate one creation event. So this system does not work very well since we need to tune the sampling of the directory finely to have all the events; but even if we do that, we'll probably miss some events. There's nothing we can do about it. So, in fact, it is not that great. It does not work very well. This solution is, first, costly because we need to store the state of the directory. Second, it cannot guarantee that we will not be missing events. Unfortunately, until Java 7, there is no proper way to solve this problem. 
This is the best we can do, which is not that great. Java 7 introduces a new pattern called the WatchService pattern. How does it work? It still uses a special thread. I think that we cannot avoid this one; but there is no scheduler, and almost no events are missed. We will see that, in fact, some events may be missed in some cases; and it is plugged directly on the signals emitted by the native file system. Remember, NIO.2 is about interfacing with the native file system; so this system is just built on that.
Setting up a Watch Service Pattern to Listen to Events
So let us now introduce this WatchService pattern. How does it work? Setting up a watch service is a four-step process. First, we need to create this watch service, which is, in fact, an object. Second, we register this watch service to a directory; and we have a special method on the Path interface for that. Third, we need to get the returned key. This registration creates a key, and we need that key. And, fourth, we need to poll the events and analyze them. As usual, we need a starting directory. Suppose we want to track file creation and deletion in our logs directory, and we need the file system of this directory that we can safely get with a method directly on the Path interface. The watch service object is created from the file system object directly. There is a method for that, newWatchService. Then we need to register the watch service we just created to the correct path. We provide the events we want to listen to. There are three standard events: create, delete, and modify. So here in this example, we are listening to all the possible events in this directory. We get an instance of WatchKey as the return object of this register method, and this watch key is the object we need to get the events from the watch service. There is one key per directory, of course. We can listen to events on as many directories as we need.
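The first three steps could be sketched as follows. The class and method names are assumptions. The event kinds are the three standard constants of StandardWatchEventKinds.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

// Hypothetical helper class, named for this illustration only.
public class WatchRegistration {

    // Step 1: the watch service is created from the file system of the directory.
    public static WatchService newWatchServiceFor(Path dir) throws IOException {
        return dir.getFileSystem().newWatchService();
    }

    // Steps 2 and 3: register the directory for the three standard events
    // and keep the key returned by the registration.
    public static WatchKey watchAll(WatchService watchService, Path dir) throws IOException {
        return dir.register(watchService,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_DELETE,
                StandardWatchEventKinds.ENTRY_MODIFY);
    }
}
```

A WatchService is Closeable, so the caller is expected to close it when the watching is over, which also invalidates the keys it has produced.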
Understanding the Watch Key Object
How does this WatchKey object work? Well, first, it has a valid Boolean property, which is true as long as the directory this key object is bound to is accessible. So if this directory is deleted or no longer accessible for whatever reason, this valid Boolean property will be set to false. There are three methods available to poll events. The first one is the take method; it is a blocking call. As long as there are no events available, this take method will not return. There is a poll method, which is non-blocking. It will return immediately; and if there are no events available, it will return null. And a poll that takes a timeout as a parameter, which also returns null if there are no events available after the given timeout. So usually we create a loop while the key is valid; and in this example, call the take method on the watch service to get the key. Now, if there is only one directory watched for events, the key returned will always be the same; but a single watch service can be used for more than one directory. In fact, it can be used for as many directories as we want. So if we want to know which directory has generated what event, we need to get the key returned by this take method. Once we have the key, we can get the events generated for that key. We get them in a list of watch events. Then we can add our application code, our business code that will analyze the events and act accordingly; and at the end of the day, we need to call the reset method on that key to tell the system that we are done consuming the events attached to this key. If we do not call this reset method, then no more events will be added to the key; and the take method will never return that key again.
Processing Available Events from the WatchKey
What happens if too many events are generated, if too many file creations and deletions occur in our directory? This could happen on very busy systems. In that case, a special event that we cannot listen to, called OVERFLOW, will be added to the queue. If we come across such an event, it means that the queues in the watch service have overflowed and that some of the events may have been missed. So the pattern to analyze the events will look like the following. We have here a for-each loop. We first get the kind of the event by calling the kind method, and this kind is create, modify, or delete. Or it can also be overflow in case of an overflow. If there is an overflow, well, there's nothing much we can do apart from logging something and continuing with the rest of the events. And if it's not, then we can process the event normally depending on its kind: create, modify, or delete. This is where our application, our business code, will be put.
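The event analysis described above might be sketched like this. The class and method names are assumptions, and the loop in watchForever assumes that a single directory is registered, so that it can stop as soon as its key becomes invalid.

```java
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for this illustration only.
public class WatchEventDrainer {

    // Consumes the pending events of the key, returns one line per event,
    // and resets the key so that it can receive further events.
    public static List<String> drain(WatchKey key) {
        List<String> lines = new ArrayList<>();
        for (WatchEvent<?> event : key.pollEvents()) {
            WatchEvent.Kind<?> kind = event.kind();
            if (kind == StandardWatchEventKinds.OVERFLOW) {
                lines.add("OVERFLOW"); // some events may have been lost
                continue;
            }
            // For the three standard kinds, the context is the path of the
            // entry, relative to the watched directory
            Path changed = (Path) event.context();
            lines.add(kind.name() + " " + changed);
        }
        key.reset();
        return lines;
    }

    // Blocking loop for a single watched directory.
    public static void watchForever(WatchService watchService) throws InterruptedException {
        boolean valid = true;
        while (valid) {
            WatchKey key = watchService.take(); // blocks until events are available
            drain(key).forEach(System.out::println);
            valid = key.isValid(); // false once the directory is gone
        }
        System.out.println("Key is invalid");
    }
}
```

In a real application, the lines built in drain would be replaced by the business code that reacts to each kind of event.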
Live Coding: Setting up a Watch Service
This module on the watch service is very technical, so I think it will be a good thing to see it in action in this live coding session. There's only one topic in it: set up a watch service and see how it can work on the Windows file system. Let us see how we can set up a system to listen to events on a given directory. Here, we have a dir path pointing to this events directory, in which we are going to create and delete content. And the file system of that path, which we can get in a safe way by calling getFileSystem on this path directly. Now, on this file system object, we can create a new watch service object. Let us put it in a variable, watchService, and we can pass this watchService to the register method of the path object, adding the events we want to track. Those are StandardWatchEventKinds.ENTRY_CREATE. We are going to copy paste this, ENTRY_MODIFY, and ENTRY_DELETE. This returns a WatchKey object, let us call it key, that can be used to get the events generated by this watch service. We know that this key has a property valid, which is true as long as this directory exists and false if this directory is deleted or at least becomes inaccessible. So let us wrap the rest of the code in a while key is valid; and if this key becomes invalid, let us print out the message: Key is invalid. Now, what we need to do is begin to poll the events generated by this watch service. For this we have several solutions. The one we are going to use is to call the take method, which is a blocking method. As soon as there are events available, this method will return and we will be able to execute some code. Note that this take method throws the InterruptedException, which is an exception from the Java thread API. We are just throwing this exception at the main method level because we are not interested in it; but this exception comes from the fact that this watch service is running code in another thread.
And if the waiting thread is interrupted for some reason, the take method throws this InterruptedException. Now this take method returns a key object, let us call it take, which is, in our example, the same as the key object we have here. Why? Because we have only one registration on one path for this watch service, but suppose that this watch service is used to watch more than one directory, as many directories as we need. We would have one key per directory. So we need to take that key at this level. Here we would not need to do it since we have only one directory in our example. From this key, we can get the events generated by this service in a list of WatchEvent objects. Let us loop over this list. This is the type of each event. Now, as we saw in the slide, if the kind of event is StandardWatchEventKinds.OVERFLOW, it means that our system is not fast enough to cope with the number of events generated in that directory. There is nothing much we can do apart from logging some kind of message and continuing. If it's not the case, if we have a create event, let us get the path of the element that generated this event with event.context(). This context method returns a plain Java object, so we need to cast it to a path. And from this path object, we can just print the following message: file creation path and, for instance, probe the type of the path created. We can do the same for the modification, ENTRY_MODIFY, file modified, and the same for the delete event, file deleted. This is the basic pattern we can write to check for events in this directory. Now remember that once we have processed those events, we need to reset the key because if we don't, we won't be able to get any more events from that watch service for the associated directory.
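Put together, the steps of this live coding session could look like the sketch below. It is a reconstruction from the narration, not the exact code typed on screen: the class name DirectoryWatcher, the register and watch helper methods, and the command-line argument are assumptions made for illustration. Resolving the event context against the watched directory before probing its content type is also a choice made here, since the context is a relative path.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

class DirectoryWatcher {

    // Registers dir for create, modify, and delete events and returns the key.
    static WatchKey register(Path dir, WatchService watchService) throws IOException {
        return dir.register(watchService,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY,
                StandardWatchEventKinds.ENTRY_DELETE);
    }

    // Blocks and prints events until the key becomes invalid (directory deleted or inaccessible).
    static void watch(Path dir) throws IOException, InterruptedException {
        FileSystem fileSystem = dir.getFileSystem();
        try (WatchService watchService = fileSystem.newWatchService()) {
            WatchKey key = register(dir, watchService);
            while (key.isValid()) {
                // Blocking call: returns as soon as a key has pending events.
                WatchKey take = watchService.take();
                for (WatchEvent<?> event : take.pollEvents()) {
                    WatchEvent.Kind<?> kind = event.kind();
                    if (kind == StandardWatchEventKinds.OVERFLOW) {
                        System.out.println("Overflow: some events may have been missed");
                        continue;
                    }
                    // The context is a path relative to the watched directory.
                    Path path = (Path) event.context();
                    if (kind == StandardWatchEventKinds.ENTRY_CREATE) {
                        // probeContentType may return null when the type cannot be determined.
                        String type = Files.probeContentType(dir.resolve(path));
                        System.out.println("File created: " + path + " (" + type + ")");
                    } else if (kind == StandardWatchEventKinds.ENTRY_MODIFY) {
                        System.out.println("File modified: " + path);
                    } else if (kind == StandardWatchEventKinds.ENTRY_DELETE) {
                        System.out.println("File deleted: " + path);
                    }
                }
                // Without this reset, the key delivers no further events.
                take.reset();
            }
            System.out.println("Key is invalid");
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.out.println("usage: java DirectoryWatcher <directory>");
            return;
        }
        watch(Paths.get(args[0]));
    }
}
```

The watch service is opened in a try-with-resources block so that it is closed when the loop ends; the narration does not mention closing it, so treat that as an addition.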
Live Coding: Testing the Watch Service
So let us now create this directory and create some content in it. First, create the events directory. Let us visit it, run this code, cd to the events directory. And then, for instance, touch file1.txt. We can see that the event has been properly polled by our system. Let us create a JPG image. Once again, it has been caught; and, for instance, create a sub-directory. This sub-directory has been created with no MIME type. Here we have (mumbles). Let us cd to this directory and create another text file. Creating a file within a directory is indeed a modification of the directory as a file, which is normal. Let us create another sub-directory from within that directory. To generate the modification event, we need to change directory to that sub-directory. But if we create another file in that sub-directory, it is not seen as a modification of the first directory we have created. You see that those events are only bound to the events directory we have created and not to the sub-directories created within it. If we want to listen to all the events of a directory structure, we need to register the sub-directories when they are created. So let us cd again to the events directory. Touch, for instance, file1.txt once again. This is seen as a modification of this file, which is exactly the case. Let us delete this file1.txt file. This is seen as a deletion of this file and generates the events accordingly, which is the expected behavior. Now if we delete the events directory, we will have a deletion of the dir within that directory, of the JPG image within that directory, and then the message key is invalid since the directory this key was bound to has been deleted. So this is how we can deal with file events using the Java NIO.2 API. Remember, this API is directly bound to the file system and is thus extremely efficient, much more efficient than setting up a task in a timer service, for instance.
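Registering the sub-directories is not shown in the live coding session. One possible way to do it, sketched under assumptions, is to walk the tree once with Files.walkFileTree and then register each new directory when its create event arrives. The class name RecursiveRegistration, both method names, and the map from key to directory are hypothetical choices for this example.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Map;

class RecursiveRegistration {

    // Registers root and every directory below it, recording which key belongs to which directory.
    static void registerTree(Path root, WatchService watchService, Map<WatchKey, Path> keys)
            throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                WatchKey key = dir.register(watchService,
                        StandardWatchEventKinds.ENTRY_CREATE,
                        StandardWatchEventKinds.ENTRY_MODIFY,
                        StandardWatchEventKinds.ENTRY_DELETE);
                keys.put(key, dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    // To be called on an ENTRY_CREATE event, with the created path already resolved
    // against the directory of the key that delivered the event.
    static void registerIfDirectory(Path created, WatchService watchService,
                                    Map<WatchKey, Path> keys) throws IOException {
        if (Files.isDirectory(created)) {
            // Walk the new directory too: it may already contain sub-directories.
            registerTree(created, watchService, keys);
        }
    }
}
```

In the event loop, the directory of a key would be looked up with keys.get(take), and the created path obtained by resolving the event context against it. Entries created inside a new directory before it is registered can still be missed, so this remains a sketch rather than a complete solution.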
Module and Course Wrap-up
And that's it for the last module of this course. That was all about watching directories for entry events: creations, deletions, and modifications of files and sub-directories. Java 7 brings the right API for that, plugged into the native file system. It is very efficient and should not miss any event as long as your system is not too heavily loaded. And this is also the last module of this course, so let us wrap up the whole course. What you learned is basically divided in two parts. The first part is about non-blocking IO, the Java NIO API. You learned how to write and read to and from disks and networks using buffers and channels. This API also gives access to off-heap memory for very specialized applications and asynchronous operations using the selector object, which you saw in detail. Then you learned about the Java NIO.2 API from Java 7: access to the native file system in a portable way, patterns to visit very large directory trees efficiently, and a very efficient API to listen to directory events, plugged into the native file system. And that's it for this course. Thank you for watching it. You can follow me on Twitter for technical news in the Java space, and also check my GitHub account for open source Java projects I work on. Thank you for watching, and I hope to see you again in another course here on Pluralsight.