Stefano R. February 2016

Cache file in memory and read in parallel

I have a program (a simple log parser) that is slow because in some cases it has to do a full scan of the input file. So I am thinking of pre-caching the entire file (~100 MB) in memory and reading it with multiple threads.

With the current configuration I use a BufferedReader to do the "main read" and a RandomAccessFile to jump to a specific offset and read what I need.

I've tried this way:

..
Reader reader = null;
if (cache) {
    // caching file in memory
    br = new BufferedReader(new FileReader(file));
    buffer = new StringBuilder();
    for (String line = br.readLine(); line != null; line = br.readLine()) {
        buffer.append(line).append(CR);
    }
    br.close();
    reader = new StringReader(buffer.toString());
} else {
    reader = new FileReader(file);
}
br = new BufferedReader(reader);
for (String line = br.readLine(); line != null; line = br.readLine()) {
    offset += line.length() + 1; // the +1 is for the line separator
    matcher = Constants.PT_BEGIN_COMPOSITION.matcher(line);
    if (matcher.matches()) {
        linecount++;
        record = new Record();
        record.setCompositionCode(matcher.group(1));
        matcher = Constants.PT_PREFIX.matcher(line);
        if (matcher.matches()) {
            record.setBeginComposition(Constants.SDF_DATE.parse(matcher.group(1)));
            record.setProcessId(matcher.group(2));
            if (cache) {
                executor.submit(new PubblicationParser(buffer, offset, record));
            } else {
                executor.submit(new PubblicationParser(file, offset, record));
            }
            records.add(record);
        } else {
            br.close();
            throw new ParseException(line, 0);
        }
    }
}

In the PubblicationParser there is an init() method that chooses which custom reader to use. A RandomAccessFileReader:

if (file != null) {
    this.logReader = new RandomAccessFileReader(file, offset);
} else if (sb != nu        

Answers


sfThomas February 2016

First make sure that it is indeed the I/O that makes your application slow, and not something else (e.g. inefficient logic in your parser). For that, you could use a Java profiler (JProfiler, for example).
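
As a rough first check before reaching for a profiler, you could time a bare read of the file with no parsing at all. This is only a minimal sketch: the class name ReadTiming, the countLines helper and the ISO-8859-1 charset are my assumptions, not something from your code. If this finishes quickly on your 100 MB file, the time is probably going into the regex matching or the per-record random access rather than the sequential read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadTiming {

    /** Reads every line without parsing it and returns the number of lines. */
    static long countLines(Path file) throws IOException {
        long lines = 0;
        // ISO-8859-1 is an assumption; use the real encoding of the log file.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            while (br.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // path to the log file, passed on the command line
        long start = System.nanoTime();
        long lines = countLines(file);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines read in " + elapsedMs + " ms");
    }
}
```

Run it twice: the second run is usually served from the operating system's file cache, which gives you an idea of how much an in-memory copy could save at best.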

If it is indeed I/O, then it might be better to use some ready-made solution to load the file into memory - essentially that's what you are trying to implement yourself.

Have a look at MappedByteBuffer and ByteBuffer.
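
To give an idea of what that could look like, here is a minimal sketch that maps the whole file once and lets each worker read the line at a given offset. The class MappedLogReader and its readLineAt method are hypothetical names of mine, and I am assuming '\n' line terminators and a single-byte encoding (ISO-8859-1). Note that the offsets here are byte offsets, so counting line.length() + 1 only matches them under those same assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedLogReader {

    private final MappedByteBuffer mapped;

    MappedLogReader(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes, which is
            // plenty for a ~100 MB file. The mapping stays valid after the
            // channel is closed.
            mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Returns the line that starts at the given byte offset, without its '\n'. */
    String readLineAt(int offset) {
        // duplicate() shares the content but has its own position and limit,
        // so each call (and each thread) works on an independent view.
        ByteBuffer view = mapped.duplicate();
        int end = offset;
        while (end < view.limit() && view.get(end) != '\n') {
            end++;
        }
        byte[] bytes = new byte[end - offset];
        view.position(offset);
        view.get(bytes);
        // Assumed encoding; replace with the real charset of the log file.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

You would create one MappedLogReader up front and hand it to every task you submit to the executor, each task calling readLineAt with its own offset. The operating system pages the file in on demand, so there is no need to build a StringBuilder copy of it first.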

Post Status

Asked in February 2016
Viewed 3,365 times
Voted 10
Answered 1 times
