
Gsub raises "invalid byte sequence in UTF-8"

I have the following method call:

Formatting.git_log_to_html(`git log --no-merges master --pretty=full #{interval}`)

The value of interval is something like release-20130325-01..release-20130327-04.

The git_log_to_html Ruby method is the following (I am only pasting the line that raises the error):

module Formatting
  def self.git_log_to_html(git_log)
    ...
    git_log.gsub(/^commit /, "COMMIT_STARTcommit").split("COMMIT_STARTcommit").each do |commit|
    ...
  end
end

This used to work, but I have now confirmed that gsub is raising an "invalid byte sequence in UTF-8" error.

Could you help me understand why this happens and how I can fix it? :/

Here is the output of git_log:

https://dl.dropbox.com/u/42306424/output.txt

Draco asked Oct 21 '25 15:10



1 Answer

For some reason, this command:

git log --no-merges master --pretty=full #{interval}

is giving you a result that is not encoded in UTF-8. It may be that your computer is working with a different charset. Try the following:

module Formatting
  def self.git_log_to_html(git_log)
    ...
    git_log.force_encoding("UTF-8").gsub(/^commit /, "COMMIT_STARTcommit").split("COMMIT_STARTcommit").each do |commit|
    ...
  end
end

I'm not sure that will work, but it is worth a try.
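
One reason it may not be enough: force_encoding only changes the encoding label on the string and leaves the bytes untouched, so a string holding non-UTF-8 bytes stays invalid. Here is a minimal sketch to check this, assuming a made-up sample string (the real git log output is not reproduced here) that contains an ISO-8859-1 "é" byte:

```ruby
# Hypothetical sample: \xE9 is "é" in ISO-8859-1, which is not valid UTF-8 on its own.
raw = "commit abc\nAuthor: Caf\xE9\n".b   # .b returns a binary (ASCII-8BIT) copy

# Relabel the bytes as UTF-8 without converting them.
labeled = raw.dup.force_encoding("UTF-8")
puts labeled.encoding          # the label is now UTF-8
puts labeled.valid_encoding?   # false, the bytes themselves did not change

# A regexp-based gsub on the invalid string is expected to raise ArgumentError.
error_message =
  begin
    labeled.gsub(/^commit /, "COMMIT_STARTcommit")
    nil
  rescue ArgumentError => e
    e.message
  end
puts error_message.inspect
```

If valid_encoding? returns false for your git_log, the string needs a real conversion from its actual charset rather than a new label.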

If that doesn't work, you can look at Ruby's iconv library to convert the string from its actual charset to UTF-8: http://www.ruby-doc.org/stdlib-2.0/libdoc/iconv/rdoc/


Based on the file you linked in the comment, I ran:

require 'open-uri'
content = URI.open('https://dl.dropbox.com/u/42306424/output.txt').read
content.gsub(/^commit /, "COMMIT_STARTcommit").split("COMMIT_STARTcommit")

and it worked without any problems.


By the way, you can also try:

require 'iconv'

module Formatting
  def self.git_log_to_html(git_log)
    ...
    git_log = Iconv.conv 'UTF-8', 'iso8859-1', git_log
    git_log.gsub(/^commit /, "COMMIT_STARTcommit").split("COMMIT_STARTcommit").each do |commit|
    ...
  end
end

However, you should really detect the charset of the string before attempting a conversion to UTF-8.
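
Note that Iconv is no longer part of the standard library as of Ruby 2.0 (it is available as a separate gem). The built-in String#encode can do the same conversion. The following is only a sketch: to_utf8 is a hypothetical helper, the ISO-8859-1 fallback is the same guess as above and not a detected charset, and the sample string is made up.

```ruby
# Hypothetical helper, not part of the original Formatting module.
# Returns the text as UTF-8; if the bytes are not valid UTF-8, it assumes
# they are in fallback_encoding (a guess) and transcodes them.
def to_utf8(text, fallback_encoding = "ISO-8859-1")
  utf8 = text.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?

  # Reinterpret the raw bytes as the fallback charset, then convert to UTF-8.
  text.dup.force_encoding(fallback_encoding).encode("UTF-8")
end

# Made-up sample containing an ISO-8859-1 "é" byte (\xE9).
latin1_log = "commit abc\nAuthor: Caf\xE9\n".b

clean = to_utf8(latin1_log)
puts clean.valid_encoding?

# The original gsub/split chain now runs without raising.
commits = clean.gsub(/^commit /, "COMMIT_STARTcommit").split("COMMIT_STARTcommit")
puts commits.length
```

Inside git_log_to_html this would amount to calling git_log = to_utf8(git_log) before the gsub line. If the real charset is unknown and losing a few characters is acceptable, String#scrub (Ruby 2.1 and later) replaces the invalid bytes instead of converting them.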

rorra answered Oct 24 '25 08:10



