Tobson February 2016

Java 8 Streams: Read file word by word

I use Java 8 streams a lot to process files but so far always line-by-line.

What I want is a function that takes a BufferedReader br, reads a specific number of words (separated by "\\s+"), and leaves the BufferedReader at the exact position where that number of words was reached.

Right now I have a version, which reads the file linewise:

    final int[] wordCount = {20};
    br
          .lines()
          .map(l -> l.split("\\s+"))
          .flatMap(Arrays::stream)
          .filter(s -> {
              // Process s
              return --wordCount[0] == 0;
          }).findFirst();

This obviously leaves the reader positioned at the line after the one containing the 20th word.
Is there a way to get a stream that reads less than a line from the input?

EDIT
I am parsing a file where the first word contains the number of following words. I read this word and then accordingly read in the specific number of words. The file contains multiple such sections, where each section is parsed in the described function.

Having read all the helpful comments, it is clear to me that a Scanner is the right choice for this problem, and that Java 9 will have a Scanner class that provides stream features (Scanner.tokens() and Scanner.findAll()).
Using streams the way I described gives me no guarantee that the reader will be at a specific position after the terminal operation of the stream (see the API docs), which makes streams the wrong choice for parsing a structure where you parse only one section at a time and have to keep track of the position.
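
For illustration, here is a minimal sketch of what the Scanner approach could look like for the count-prefixed sections described above. The class name SectionScanner and the method readSection are made up for this example. Note that Scanner buffers its input, so the position is tracked by the Scanner rather than by the underlying BufferedReader; the sketch therefore assumes the same Scanner instance is reused for every section.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SectionScanner {
    /** Reads one section: an integer count followed by that many words. */
    static List<String> readSection(Scanner in) {
        int count = in.nextInt();              // first word holds the number of following words
        List<String> words = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            words.add(in.next());              // default delimiter is whitespace
        }
        return words;
    }

    public static void main(String[] args) {
        BufferedReader br = new BufferedReader(new StringReader("2 a b 3 c d e"));
        Scanner in = new Scanner(br);          // keep using this Scanner, not br, from here on
        while (in.hasNextInt()) {
            System.out.println(readSection(in));
        }
    }
}
```

Scanner.next() throws NoSuchElementException if a section is shorter than its declared count, which may or may not be the error handling you want.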

Answers


Tagir Valeev February 2016

Regarding your original problem: I assume your file looks like this:

5 a section of five words 3 three words
section 2 short section 7 this section contains a lot 
of words

And you want to get the output like this:

[a, section, of, five, words]
[three, words, section]
[short, section]
[this, section, contains, a, lot, of, words]

In general, the Stream API is poorly suited to such problems; a plain old loop looks like a better solution here. If you still want to see a Stream API based solution, I can suggest my StreamEx library, which provides a headTail() method that lets you easily write custom stream-transformation logic. Here's how your problem could be solved using headTail:
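
For comparison, a plain-loop version might look roughly like the sketch below. It is not part of StreamEx; the class WordReader and the method readWords are hypothetical names. It reads one character at a time, so it consumes at most one delimiter character after the last requested word and leaves the reader right there. It assumes Character.isWhitespace is an acceptable stand-in for the "\\s+" delimiter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class WordReader {
    /** Reads up to n whitespace-separated words; returns fewer if the input ends first. */
    static List<String> readWords(Reader in, int n) throws IOException {
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int c;
        while (words.size() < n && (c = in.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (word.length() > 0) {       // a delimiter ends the current word
                    words.add(word.toString());
                    word.setLength(0);
                }
            } else {
                word.append((char) c);
            }
        }
        if (word.length() > 0) {               // input ended in the middle of a word
            words.add(word.toString());
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new StringReader(
                "5 a section of five words 3 three words\nsection 2 short section"));
        List<String> header;
        // Each section starts with one word holding the number of words that follow.
        while (!(header = readWords(br, 1)).isEmpty()) {
            int count = Integer.parseInt(header.get(0));
            System.out.println(readWords(br, count));
        }
    }
}
```

Because BufferedReader already buffers the underlying source, the per-character read() calls stay cheap.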

/* Transform Stream of words like 2, a, b, 3, c, d, e to
   Stream of lists like [a, b], [c, d, e] */
public static StreamEx<List<String>> records(StreamEx<String> input) {
    return input.headTail((count, tail) -> 
        makeRecord(tail, Integer.parseInt(count), new ArrayList<>()));
}

private static StreamEx<List<String>> makeRecord(StreamEx<String> input, int count, 
                                                 List<String> buf) {
    return input.headTail((head, tail) -> {
        buf.add(head);
        return buf.size() == count 
                ? records(tail).prepend(buf)
                : makeRecord(tail, count, buf);
    });
}

Usage example:

String s = "5 a section of five words 3 three words\n"
        + "section 2 short section 7 this section contains a lot\n"
        + "of words";
Reader reader = new StringReader(s);
Stream<List<String>> stream = records 
