user3324521 February 2016

Python: reading the lines of an entire file and efficiently storing the ones I want in lists

I have a text file that repeats the following block structure:

EL_TEXT
LAYER 6
DATATYPE 0
XY 2677000: 2316500
2677000: 2340500
2707000: 2340500
2707000: 2316500
2677000: 2316500
ENDEL

...

These blocks repeat throughout the text file with different values. They always end with ENDEL.
I would like to find all the lines that are, for example, "LAYER 6" (or "LAYER 6 \n") and store them in a list. I also want to collect the XY coordinates of those blocks as a list of XY tuples [(2677000, 2316500), ...],
so that the final result is something like
onelayer6polygon = [(2677000, 2316500), (2677000, 2340500), ...]
listofpolygonsL6 = [[(2677000, 2316500), (2677000, 2340500), ...], [...], ...]
How can I do this efficiently in Python? That is, reading only some lines of the file, and only once, advancing the file position (lseek) to skip the coordinates that do not belong to the layer I want.
As far as I understood for line in file: will read all the lines until the EOF, but I only need to read and store (and later process) the ones with the layer number I specify. Doing it with a while loop and handling the index is also not good for detecting the EOF, right?

Answers


poke February 2016

As far as I understood for line in file: will read all the lines until the EOF, but I only need to read and store (and later process) the ones with the layer number I specify.

Yes and no. Looping over the file reads it in its entirety, line by line, until you reach the end of the file. But it reads one line after another without storing the contents anywhere. So it’s up to you to skip the lines you are not interested in and to store the information you want to keep as you encounter it.

Note that you do have to read the file completely in order to iterate over its lines. There is no concept of jumping between lines; you can only seek on a byte (or character) level, so in order to know where a line ends you need to check all characters for line breaks. So unless you know that every block is always exactly X characters long, you will also have to read the blocks you are not interested in, just to find the start of the next block.
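To illustrate the exception mentioned above, here is a minimal sketch of seeking when every record has the same known size. The names RECORD_SIZE and read_record are made up for this example, and the in-memory buffer stands in for a file opened in binary mode. The file in the question does not have fixed-size blocks, so this does not apply to it directly.

```python
import io

# Assumption for this sketch: every record is exactly 8 bytes, newline included.
RECORD_SIZE = 8

def read_record(f, index):
    """Jump straight to record `index` without reading the records before it."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").rstrip("\n")

# Stand-in for a real file opened in binary mode.
data = io.BytesIO(b"LAYER 1\nLAYER 6\nLAYER 9\n")
third = read_record(data, 2)
```

As soon as record lengths vary, the offset of record N can no longer be computed, which is why the line-by-line approach is needed here.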

That being said, you usually solve this sort of task with a state machine: You read the file line by line, and as you read a line, you may choose to change the state of your machine to set it into a different “mode”.

In your case, a mode may be “within the LAYER 6 block”, so that’s what you should start with:

# `file` is assumed to be an already opened text file, e.g. file = open(path)
inLayer6Block = False
for line in file:
    # strip trailing whitespace, since every line ends with the line break
    line = line.rstrip()

    # if we see the `LAYER 6` line, we start our block
    if line == 'LAYER 6':
        inLayer6Block = True

    # if we see the `ENDEL` line, we are no longer in the block
    elif line == 'ENDEL':
        inLayer6Block = False

So all that’s left now is to add logic to handle the case in between, when inLayer6Block is true. I’ll leave this up to you for now to expand on the code above. In general, you want to have a list that stores the content
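One possible way to fill in the part between LAYER 6 and ENDEL is sketched below. It is only a sketch under a few assumptions: the function name read_layer_polygons and its parameters are invented for this example, and coordinate lines are assumed to look exactly like the ones in the question ("x: y", with an optional leading XY).

```python
def read_layer_polygons(lines, layer):
    """Collect the XY polygons of every block on the given layer."""
    target = f"LAYER {layer}"
    polygons = []     # one list of (x, y) tuples per matching block
    current = None    # coordinates of the block currently being collected
    in_block = False  # True between the target LAYER line and ENDEL

    for line in lines:
        line = line.strip()
        if line == target:
            # Start of a block we care about.
            in_block = True
            current = []
        elif line == "ENDEL":
            # End of any block; keep the coordinates only if it was ours.
            if in_block:
                polygons.append(current)
            in_block = False
            current = None
        elif in_block:
            # Inside our block: drop the optional XY prefix, then parse "x: y".
            if line.startswith("XY"):
                line = line[2:]
            if ":" in line:
                x, y = line.split(":")
                current.append((int(x), int(y)))
    return polygons


sample = """EL_TEXT
LAYER 6
DATATYPE 0
XY 2677000: 2316500
2677000: 2340500
ENDEL
EL_TEXT
LAYER 7
DATATYPE 0
XY 1: 2
ENDEL
""".splitlines()

polygons = read_layer_polygons(sample, 6)
```

Since the function only iterates over its first argument, an open file object can be passed in place of the list of lines, so the file is still read once and only the matching coordinates are kept in memory.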

Post Status

Asked in February 2016
Viewed 3,390 times
Voted 13
Answered 1 times
