Files with Mixed and Invalid Encodings in Ruby
Recently I encountered a file that contained mostly UTF-8 characters. I could read the file and even pass it to Nokogiri without any problem. But when I wanted to preprocess the file's content with gsub, Ruby raised this exception: invalid byte sequence in UTF-8 (ArgumentError). Now what to do? There aren’t many […]
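To make the failure concrete, here is a minimal sketch of how such a string behaves. The sample bytes are made up for illustration and are not taken from the original file, and the second repair option assumes the stray bytes are ISO-8859-1, which may not hold for your data. It relies on String#valid_encoding?, String#scrub (Ruby 2.1 and later), and String#encode.

```ruby
# Hypothetical sample data: valid UTF-8 ("ï" encoded as \xC3\xAF) plus one
# stray Latin-1 byte (\xE9, "é"), which is not a valid UTF-8 sequence here.
mixed = "na\xC3\xAFve caf\xE9 au lait".dup.force_encoding(Encoding::UTF_8)

# Cheap check to run before applying any regex to the string.
puts "valid UTF-8? #{mixed.valid_encoding?}"

# Regex-based methods such as gsub refuse to work on a broken string.
begin
  mixed.gsub(/caf/, "CAF")
rescue ArgumentError => e
  puts "gsub failed: #{e.message}"
end

# Option 1: replace every invalid byte sequence with a placeholder.
replaced = mixed.scrub("?")

# Option 2 (assumption: the stray bytes are ISO-8859-1): reinterpret each
# invalid sequence as Latin-1 and transcode it to UTF-8.
repaired = mixed.scrub do |bytes|
  bytes.encode(Encoding::UTF_8, Encoding::ISO_8859_1)
end

# Both results are valid UTF-8, so gsub works on them.
puts replaced.gsub(/caf/, "CAF")
puts repaired.gsub(/caf/, "CAF")
```

Option 1 loses the offending characters but makes no guess about their origin. Option 2 keeps them, at the cost of guessing the second encoding, so it is only appropriate when you have some evidence about where the stray bytes came from.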