Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split a complex file into a hash

Tags:

ruby

I am running a command line program, called Primer 3. It takes an input file and returns data to standard output. I am trying to write a Ruby script which will accept that input, and put the entries into a hash.

The results returned are below. I would like to split the data on the '=' sign, so that the has would something like this:

{:SEQUENCE_ID => "example", :SEQUENCE_TEMPLATE => "GTAGTCAGTAGACNAT..etc", :SEQUENCE_TARGET => "37,21" etc }

I would also like to lower case the keys, ie:

 {:sequence_id => "example", :sequence_template => "GTAGTCAGTAGACNAT..etc", :sequence_target => "37,21" etc }

This is my current script:

#!/usr/bin/ruby
puts 'Primer 3 hash'

primer3 = {}
while line = gets do
  name, height = line.split(/\=/)
  primer3[name] = height.to_i
end

puts primer3

It is returning this:

Primer 3 hash
{"SEQUENCE_ID"=>0, "SEQUENCE_TEMPLATE"=>0, "SEQUENCE_TARGET"=>37, "PRIMER_TASK"=>0,     "PRIMER_PICK_LEFT_PRIMER"=>1, "PRIMER_PICK_INTERNAL_OLIGO"=>1,  "PRIMER_PICK_RIGHT_PRIMER"=>1, "PRIMER_OPT_SIZE"=>18, "PRIMER_MIN_SIZE"=>15, "PRIMER_MAX_SIZE"=>21, "PRIMER_MAX_NS_ACCEPTED"=>1, "PRIMER_PRODUCT_SIZE_RANGE"=>75, "P3_FILE_FLAG"=>1, "SEQUENCE_INTERNAL_EXCLUDED_REGION"=>37, "PRIMER_EXPLAIN_FLAG"=>1, "PRIMER_THERMODYNAMIC_PARAMETERS_PATH"=>0, "PRIMER_LEFT_EXPLAIN"=>0, "PRIMER_RIGHT_EXPLAIN"=>0, "PRIMER_INTERNAL_EXPLAIN"=>0, "PRIMER_PAIR_EXPLAIN"=>0, "PRIMER_LEFT_NUM_RETURNED"=>0, "PRIMER_RIGHT_NUM_RETURNED"=>0, "PRIMER_INTERNAL_NUM_RETURNED"=>0, "PRIMER_PAIR_NUM_RETURNED"=>0, ""=>0}

Data source

SEQUENCE_ID=example
SEQUENCE_TEMPLATE=GTAGTCAGTAGACNATGACNACTGACGATGCAGACNACACACACACACACAGCACACAGGTATTAGTGGGCCATTCGATCCCGACCCAAATCGATAGCTACGATGACG
SEQUENCE_TARGET=37,21
PRIMER_TASK=pick_detection_primers
PRIMER_PICK_LEFT_PRIMER=1
PRIMER_PICK_INTERNAL_OLIGO=1
PRIMER_PICK_RIGHT_PRIMER=1
PRIMER_OPT_SIZE=18
PRIMER_MIN_SIZE=15
PRIMER_MAX_SIZE=21
PRIMER_MAX_NS_ACCEPTED=1
PRIMER_PRODUCT_SIZE_RANGE=75-100
P3_FILE_FLAG=1
SEQUENCE_INTERNAL_EXCLUDED_REGION=37,21
PRIMER_EXPLAIN_FLAG=1
PRIMER_THERMODYNAMIC_PARAMETERS_PATH=/usr/local/Cellar/primer3/2.3.4/bin/primer3_config/
PRIMER_LEFT_EXPLAIN=considered 65, too many Ns 17, low tm 48, ok 0
PRIMER_RIGHT_EXPLAIN=considered 228, low tm 159, high tm 12, high hairpin stability 22, ok 35
PRIMER_INTERNAL_EXPLAIN=considered 0, ok 0
PRIMER_PAIR_EXPLAIN=considered 0, ok 0
PRIMER_LEFT_NUM_RETURNED=0
PRIMER_RIGHT_NUM_RETURNED=0
PRIMER_INTERNAL_NUM_RETURNED=0
PRIMER_PAIR_NUM_RETURNED=0
=

$ primer3_core < example2 | ruby /Users/sean/Dropbox/bin/rb/read_primer3.rb
like image 550
port5432 Avatar asked Nov 24 '25 01:11

port5432


2 Answers

#!/usr/bin/ruby
puts 'Primer 3 hash'

primer3 = {}
while line = gets do
  key, value = line.split(/=/, 2)
  primer3[key.downcase.to_sym] = value.chomp
end

puts primer3
like image 99
Arie Xiao Avatar answered Nov 26 '25 17:11

Arie Xiao


For fun, here are a couple of purely-functional solutions. Both assume that you've already pulled your data from the file, e.g.

my_data = ARGF.read # read the file passed on the command line

This one feels sort of gross, but it is a (long) one-liner :)

hash = Hash[ my_data.lines.map{ |line|
  line.chomp.split('=',2).map.with_index{ |s,i| i==0 ? s.downcase.to_sym : s }
} ]

This one is two lines, but feels cleaner than using with_index:

keys,values = my_data.lines.map{ |line| line.chomp.split('=',2) }.transpose
hash = Hash[ keys.map(&:downcase).map(&:to_sym).zip(values) ]

Both of these are likely less efficient and certainly more memory-intense than your already-accepted answer; iterating the lines and slowly mutating your hash is the best way to go. These non-mutating variations are just a mental exercise.


Your final answer should use ARGF to allow filenames on the command line or via STDIN. I would write it like so:

#!/usr/bin/ruby

module Primer3
  def self.parse( file )
    {}.tap do |primer3|
      # Process one line at a time, without reading it all into memory first
      file.each_line do |line|  
        key, value = line.chomp.split('=', 2)
        primer3[key.downcase.to_sym] = value
      end
    end
  end
end

Primer3.parse( ARGF ) if __FILE__==$0

This way you can either call the file from the command line, with or without STDIN, or you can require this file and use the module function it defines in other code.

like image 35
Phrogz Avatar answered Nov 26 '25 17:11

Phrogz



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!