scheylord February 2016

Generate Lucene segments_N file

While moving Lucene index files from one server to another, I forgot to move the segments_N file (because I used the pattern *.*, which does not match a file name without an extension).

Unfortunately, I have erased the original folder, and these are the only files left in my directory:

_1rpt.fdt
_1rpt.fdx
_1rpt.fnm
_1rpt.nvd
_1rpt.nvm
_1rpt.si
_1rpt_Lucene50_0.doc
_1rpt_Lucene50_0.dvd
_1rpt_Lucene50_0.dvm
_1rpt_Lucene50_0.pos
_1rpt_Lucene50_0.tim
_1rpt_Lucene50_0.tip
write.lock

I am missing the segments_42u file, and without it I cannot even run org.apache.lucene.index.CheckIndex:

Exception in thread "main" org.apache.lucene.index.IndexNotFoundException: no segments* file found in MMapDirectory@/solr-5.3.1/nodes/node1/core/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@119d7047: files: [write.lock, _1rpt.fdt, _1rpt.fdx, _1rpt.fnm, _1rpt.nvd, _1rpt.nvm, _1rpt.si, _1rpt_Lucene50_0.doc, _1rpt_Lucene50_0.dvd, _1rpt_Lucene50_0.dvm, _1rpt_Lucene50_0.pos, _1rpt_Lucene50_0.tim, _1rpt_Lucene50_0.tip]
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:483)
at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2354)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2237)

The index is pretty huge (> 800 GB), and rebuilding it would take weeks.

Is there a way to regenerate the missing segments_N file?

Thanks a lot for your help.

Answers


scheylord February 2016

As ameertawfik suggested, I asked the question on the Lucene mailing list, and they helped me solve this issue.

Here is my solution, in case it helps someone else (add lucene-core-x.x.x.jar to the classpath):

// Declared in Lucene's own package so that SegmentInfos#commit(Directory),
// which is not public in Lucene 5.x, can be called.
package org.apache.lucene.index;

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.SimpleFSDirectory;

public class GenSegmentInfo {
    public static void main(String[] args) throws IOException {
        Codec codec = Codec.getDefault();
        Path myPath = Paths.get("/tmp/index");

        // Launch this the first time with a random segmentID value.
        // Then, with a Java debugger, get the right segment ID
        // by putting a breakpoint on CodecUtil#checkIndexHeaderID(...).
        byte[] segmentID = {88, 55, 58, 78, -21, -55, 102, 99, 123, 34, 85, -38, -70, -120, 102, -67};

        // try-with-resources closes the directory even if reading the segment fails
        try (Directory directory = new SimpleFSDirectory(myPath)) {
            // Read the per-segment metadata stored in _1rpt.si
            SegmentInfo info = codec.segmentInfoFormat().read(directory, "_1rpt",
                    segmentID, IOContext.READ);
            info.setCodec(codec);

            SegmentInfos infos = new SegmentInfos();
            // Arguments after info: delCount, delGen, fieldInfosGen, docValuesGen
            SegmentCommitInfo commit = new SegmentCommitInfo(info, 1, -1, -1, -1);
            infos.add(commit);
            // Write a new segments_N file into the index directory
            infos.commit(directory);
        }
    }
}
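
As a possible alternative to the debugger step for finding the segment ID, the ID can probably be read straight from the start of the .si file. This is only a sketch, and it assumes the Lucene 5.x index header layout: a 4-byte big-endian magic number (0x3fd76c17), the codec name as a length-prefixed string, a 4-byte version, then the 16-byte segment ID. The class name SegmentIdReader and its method are made up for this example and are not part of Lucene. It uses only the JDK, so check the layout against your Lucene version before relying on it.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class SegmentIdReader {
    // Magic number assumed to start every Lucene codec header
    static final int CODEC_MAGIC = 0x3fd76c17;
    // Assumed length of a segment ID, in bytes
    static final int ID_LENGTH = 16;

    /** Parses the assumed header layout and returns the 16-byte segment ID. */
    static byte[] readSegmentId(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);

        // DataInputStream reads big-endian ints, matching the assumed layout
        int magic = data.readInt();
        if (magic != CODEC_MAGIC) {
            throw new IOException("Unexpected magic number: 0x" + Integer.toHexString(magic));
        }

        // Codec name: a length prefix followed by the name bytes.
        // Only single-byte lengths (names shorter than 128 bytes) are handled here.
        int nameLength = data.readUnsignedByte();
        if (nameLength > 127) {
            throw new IOException("Multi-byte name length is not handled by this sketch");
        }
        byte[] name = new byte[nameLength];
        data.readFully(name);

        // Format version, not needed to recover the ID
        data.readInt();

        byte[] id = new byte[ID_LENGTH];
        data.readFully(id);
        return id;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java SegmentIdReader /path/to/index/_1rpt.si");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // Prints signed byte values that can be pasted into a Java byte[] literal
            System.out.println(Arrays.toString(readSegmentId(in)));
        }
    }
}
```

The printed values are signed bytes, so they can be copied directly into the segmentID array of GenSegmentInfo.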

Post Status

Asked in February 2016
Viewed 1,493 times
Voted 14
Answered 1 times
