shome February 2016

Read columns from a file into variables and use them to substitute values in another file

I have the following file, input.txt:

b73_chr10   w22_chr9
w22_chr7    w22_chr10
w22_chr8    w22_chr8

I have written the code below to read the first and second columns and replace each first-column value with the corresponding second-column value in output.conf. For example, I would like to change b73_chr10 to w22_chr9, w22_chr7 to w22_chr10, and w22_chr8 to w22_chr8, and so on for every line of input.txt.

value1=$(echo $line| awk -F\ '{print $1}' input.txt)
value2=$(echo $line| awk -F\ '{print $2}' input.txt)
sed -i '.bak' 's/$value1/$value2/g' output.conf 
cat output.conf

output.conf

    <rules>
    <rule>
    condition =between(b73_chr10,w22_chr1)
    color = ylgn-9-seq-7
    flow=continue
    z=9
    </rule>
    <rule>
    condition =between(w22_chr7,w22_chr2)
    color = blue
    flow=continue
    z=10
    </rule>
    <rule>
    condition =between(w22_chr8,w22_chr3)
    color = vvdblue
    flow=continue
    z=11
    </rule>
    </rules>

I tried the commands above, but they leave me with a blank file. Can anybody point out where I went wrong?

Answers


ghoti February 2016

I suspect that sed by itself is the wrong tool for this. You can, however, do what you're asking in bash alone:

#!/usr/bin/env bash

# Declare an associative array (requires bash 4)
declare -A repl=()

# Step through our replacement file, recording each pair in the array.
while read -r this that; do
  repl["$this"]="$that"
done < input.txt

# Read the input file, replacing the strings noted in the array.
# IFS= and -r preserve leading whitespace and backslashes.
while IFS= read -r line; do
  for string in "${!repl[@]}"; do
    line="${line/$string/${repl[$string]}}"
  done
  echo "$line"
done < circos.conf

This approach of course is oversimplified and therefore shouldn't be used verbatim -- you'll want to make sure you're only editing the lines that you really want to edit (verifying that they match /condition =between/ for example). Note that because this solution uses an associative array (declare -A ...), it depends on bash version 4.
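
A minimal sketch of that restriction, assuming the lines to edit are the ones containing "condition =between(" as in the question's sample. The function name translate_conditions is made up for illustration, and it still needs bash 4. It also replaces every occurrence on a matching line and matches the keys literally, which is a small change from the loop above.

```shell
#!/usr/bin/env bash
# Sketch only: apply the replacements solely to "condition =between(...)" lines.
# translate_conditions is a hypothetical helper name, not part of any tool.

translate_conditions() {
  local map_file=$1 conf_file=$2 this that line key
  declare -A repl=()

  # Load the two-column translation file into an associative array (bash 4+).
  while read -r this that; do
    [[ -n $this ]] && repl["$this"]=$that
  done < "$map_file"

  # IFS= and -r keep leading whitespace and backslashes intact.
  while IFS= read -r line; do
    if [[ $line == *"condition =between("* ]]; then
      for key in "${!repl[@]}"; do
        # Quoting "$key" makes the match literal rather than a glob pattern.
        line=${line//"$key"/${repl[$key]}}
      done
    fi
    printf '%s\n' "$line"
  done < "$conf_file"
}
```

You would call it as translate_conditions input.txt circos.conf and redirect the output to a new file. Keys that are prefixes of other names (say w22_chr1 and w22_chr10) would still collide, so treat this as a starting point only.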

If you were to solve this with awk, the same basic principle would apply:

#!/usr/bin/awk -f

# Collect the translations from the first file.
NR==FNR { repl[$1]=$2; next }

# Step through the input file, replacing as required.
{
  for ( string in repl ) {
    sub(string, repl[string])
  }
}

# And print.
1

You'd run this with the first argument being the translation file, and the second being the input file:

$ ./thisscript translations.txt circos.conf
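
One caveat: sub() treats its first argument as a regular expression and replaces only the first match on a line. A hedged variant, assuming the keys should be matched as literal text and every occurrence replaced, can use index() and substr() instead. The wrapper name translate_literal is hypothetical.

```shell
#!/bin/sh
# Sketch only: literal (non-regex) replacement of every occurrence.
# translate_literal is a hypothetical wrapper name.

translate_literal() {
  # $1 = translation file (two columns), $2 = file to rewrite (printed to stdout)
  awk '
    # First file: remember each old -> new pair, skipping blank lines.
    NR == FNR { if ($1 != "") repl[$1] = $2; next }

    # Second file: rebuild each line, copying the text between literal matches.
    {
      for (old in repl) {
        out = ""
        rest = $0
        while ((pos = index(rest, old)) > 0) {
          out = out substr(rest, 1, pos - 1) repl[old]
          rest = substr(rest, pos + length(old))
        }
        $0 = out rest
      }
      print
    }
  ' "$1" "$2"
}
```

For example, translate_literal translations.txt circos.conf > circos.conf.new. Because the keys are applied one after another, a value produced by one replacement can still be picked up by a later key.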


Walter A February 2016

Before you read the better solution(s), here is a short explanation of what you did wrong.
A fixed version of your script would be:

while read -r line; do
   value1=$(echo "$line"| awk -F" "  '{print $1}')
   value2=$(echo "$line"| awk -F" "  '{print $2}')
   sed -i "s/$value1/$value2/g" circos.conf 
done < input.txt

What are the changes here?

  • Added while read -r line; do ... done < input.txt
    Your "$line" was never initialised
  • awk with -F" " and not \;
    You have whitespace in between
  • awk without input.txt
    awk should read from the pipe, not from the file
  • sed with double quotes
    The variables must be evaluated.
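
The quoting point can be checked in isolation. The sketch below uses a sample line taken from the question and runs the same substitution once with single quotes and once with double quotes; it pipes into sed rather than editing a file, so nothing on disk changes.

```shell
#!/bin/sh
# Sketch only: compare single-quoted and double-quoted sed scripts.
value1=b73_chr10
value2=w22_chr9
sample='condition =between(b73_chr10,w22_chr1)'

# Single quotes: the shell passes the text $value1 and $value2 to sed unexpanded,
# so sed looks for the literal string "$value1" and finds nothing to replace.
single=$(printf '%s\n' "$sample" | sed 's/$value1/$value2/g')

# Double quotes: the shell expands both variables before sed runs.
double=$(printf '%s\n' "$sample" | sed "s/$value1/$value2/g")

printf 'single: %s\ndouble: %s\n' "$single" "$double"
```

The "single" line should come back unchanged, while the "double" line has b73_chr10 replaced by w22_chr9.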

What's wrong with this solution?
First, you must hope that the values in input.txt are sed-friendly (no slashes or other special characters). Second, on large files this is slow: every line of input.txt starts two awk processes and a sed pass that rewrites the whole of circos.conf. awk can do the looping itself, so avoid nesting awk inside a shell loop.
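
One way to make values sed-friendly is to backslash-escape them before building the s command. A rough sketch, assuming basic regular expressions (no -E), the / delimiter, and values without embedded newlines; the helper names escape_sed_pattern and escape_sed_replacement are made up for illustration.

```shell
#!/bin/sh
# Sketch only: escape values before interpolating them into "s/old/new/g".
# Both helper names are hypothetical.

# Escape characters that are special in a basic regular expression,
# plus the "/" delimiter: ] [ \ . * ^ $ /
escape_sed_pattern() {
  printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Escape the characters that are special in replacement text: \ & /
escape_sed_replacement() {
  printf '%s\n' "$1" | sed 's|[\&/]|\\&|g'
}

# Example with deliberately awkward values.
old=$(escape_sed_pattern 'a.c/d')
new=$(escape_sed_replacement 'x&y/z')
result=$(printf 'abc/d a.c/d\n' | sed "s/$old/$new/g")
printf '%s\n' "$result"
```

Without the escaping, the dot in a.c/d would match any character and the slashes would end the s command early.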

When input.txt is small, you might want something like:

sed -i -e 's/b73_chr10/w22_chr9/g' \
       -e 's/w22_chr7/w22_chr10/g' \
       -e 's/w22_chr8/w22_chr8/g' circos.conf

And now the comment of @alvits makes sense: put all those sed commands in a sed command file. If you can't change the format of input.txt, you can rewrite it in the script, but using an array as in @ghoti's solution is better.
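
Generating the sed command file from input.txt could look roughly like this. It is a sketch that assumes two whitespace-separated columns and sed-friendly values; the function name make_sed_script and the file name replace.sed are hypothetical.

```shell
#!/bin/sh
# Sketch only: turn each "old new" line of the translation file into an
# "s/old/new/g" command that can later be passed to sed -f.
# make_sed_script is a hypothetical helper name.

make_sed_script() {
  # $1 = translation file with two whitespace-separated columns
  awk 'NF >= 2 { printf "s/%s/%s/g\n", $1, $2 }' "$1"
}

# Typical use (file names are assumptions):
#   make_sed_script input.txt > replace.sed
#   sed -f replace.sed circos.conf > circos.conf.new
```

Writing to a new file sidesteps the -i option, whose syntax differs between GNU sed and BSD/macOS sed. sed runs the generated commands in order on every line, so a value written by an earlier command can be changed again by a later one.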
