Can I set the default string encoding on Ruby 1.9?


This might sound minor, but it's been driving me nuts. Since releasing an application to production last Friday on Ruby 1.9, I've been having lots of minor exceptions related to character encodings. Almost all of them are some variation on:

Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8

We have an international user base, so plenty of names contain umlauts, etc. If I fix the templates to use force_encoding in a bunch of places, the error pops up in the flash message helper instead. Et cetera.

At the moment it looks like I've nailed down the ones I knew about, by patching ActiveSupport's string concatenation in one place and by setting # encoding: utf-8 at the top of every one of my source files. But the feeling that I might have to remember to do that for every file of every Ruby project I ever work on, forever, just to avoid string assignment problems, does not sit well in my stomach. I read about the -Ku switch, but everything seems to warn that it's for backwards compatibility and might go away at any time.

So my question for 1.9-experienced folks: is setting # encoding in every one of my files really necessary? Is there a reasonable way to do this globally? Or, better, is there a way to set the default encoding on non-literal values of strings that bypasses the internal/external defaults?

Thanks in advance for any suggestions.

Don't confuse file encoding with string encoding.

The purpose of the # encoding statement at the top of your files is to tell Ruby, while it reads and interprets your code, and to tell your editor, how to handle any non-ASCII characters in the file. It is only necessary if you have at least one non-ASCII character in the file, e.g. it's necessary in your config/locale files.
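To make the file-level effect concrete, here is a minimal sketch showing that the magic comment only decides which encoding the string literals in that one file get. The helper literal_encoding_with and the global $demo_literal_encoding are hypothetical names made up for this illustration; the sketch writes a throwaway file, loads it, and reports the encoding its literal received.

```ruby
require "tempfile"

# Hypothetical helper: writes a tiny Ruby file that starts with the given
# magic comment, loads it, and returns the encoding its string literal got.
def literal_encoding_with(magic_comment)
  Tempfile.create(["magic_comment_demo", ".rb"]) do |file|
    file.write("#{magic_comment}\n$demo_literal_encoding = 'abc'.encoding\n")
    file.flush
    load file.path
  end
  $demo_literal_encoding
end

puts literal_encoding_with("# encoding: utf-8")       # UTF-8
puts literal_encoding_with("# encoding: iso-8859-1")  # ISO-8859-1
```

Strings that arrive at runtime (from the database, from a form, from a socket) are not affected by this comment at all, which is why adding it everywhere does not make the runtime errors go away.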

To define the encoding in all your files at once, you can use the magic_encoding gem, which can insert the UTF-8 magic comment into all Ruby files in your app.

The error you're getting at runtime, Encoding::CompatibilityError, is an error which happens when you try to concatenate two strings with different encodings during program execution, and their encodings are incompatible.
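A small sketch that reproduces the error in isolation, assuming nothing beyond core Ruby. The variable names are made up for the example; the binary string stands in for data that arrives tagged as ASCII-8BIT even though its bytes are really UTF-8.

```ruby
# A UTF-8 string containing a non-ASCII character (u with umlaut).
utf8_name = "J\u00FCrgen"

# The same bytes, but tagged as binary (ASCII-8BIT).
binary_name = "J\u00FCrgen".dup.force_encoding(Encoding::ASCII_8BIT)

# Both sides contain non-ASCII bytes and carry different encodings,
# so Ruby refuses to concatenate them.
error = begin
  utf8_name + binary_name
  nil
rescue Encoding::CompatibilityError => e
  e
end

puts error.class
puts error.message

# If one side is pure ASCII, the two strings are compatible and the result
# takes the encoding of the side with the non-ASCII content.
greeting = "Hello, ".dup.force_encoding(Encoding::ASCII_8BIT) + utf8_name
puts greeting.encoding
```

This is why the error seems to appear at random: it only fires once both strings actually contain non-ASCII characters, e.g. when a user's name has an umlaut.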

This happens when:

  • you are using L10N strings (e.g. UTF-8), and concatenate them to e.g. an ASCII string (in your view)

  • the user types in a string in a foreign language (e.g. UTF-8), and your view tries to print it out, along with some fixed string which you pre-defined (ASCII). force_encoding will help there. There's also Encoding::primary_encoding in Ruby 1.9 to set the default encoding for new strings, and there is config.encoding in Rails, in the config/application.rb file.

  • strings come from your database, and are then combined with other strings in your view (their encodings could be either way around, and incompatible).
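A minimal sketch of the force_encoding approach for the cases listed above. The to_utf8 helper is hypothetical (it is not part of Ruby or Rails), and it assumes the incoming bytes are usually valid UTF-8 that merely carry the wrong encoding tag. Encoding.default_external and Encoding.default_internal are the standard Ruby 1.9+ process-wide defaults for data read through IO; they are shown here as the settings I am sure exist, and they do not retag strings that are already in memory.

```ruby
# Process-wide defaults for data read through IO (files, STDIN, ...).
# These do not change the encoding of existing strings.
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

# Hypothetical helper: returns a UTF-8 copy of str without touching the original.
def to_utf8(str)
  retagged = str.dup.force_encoding(Encoding::UTF_8)
  return retagged if retagged.valid_encoding?

  # The bytes are not valid UTF-8: treat them as binary and replace every
  # non-ASCII byte with "?" instead of raising later on.
  str.dup.force_encoding(Encoding::ASCII_8BIT)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

# Simulates a value that arrives tagged as binary, e.g. from a database driver.
from_db = "J\u00FCrgen".dup.force_encoding(Encoding::ASCII_8BIT)

label = "Name: " + to_utf8(from_db)
puts label
puts label.encoding
```

Retagging at the boundary where the data enters the app (database adapter, request parameters) tends to be less work than sprinkling force_encoding through the views.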

Side-note: make sure to specify a default encoding when you create your database!

    CREATE DATABASE yourproject DEFAULT CHARACTER SET utf8;

If you want to use emojis in your strings:

    CREATE DATABASE yourproject DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

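Also, all indexes on string columns which may contain emoji need to be limited to 191 characters in length, and those columns need CHARACTER SET utf8mb4 COLLATE utf8mb4_bin.

The reason is that MySQL's normal utf8 uses at most 3 bytes per character, whereas emoji need 4 bytes of storage, so fewer characters fit within the index key size limit.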
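The 191 figure can be checked with a little arithmetic. This sketch assumes the 767-byte index key limit that applies to older InnoDB row formats (newer configurations allow longer keys, so treat the constant as an assumption about your setup); the byte sizes themselves are plain UTF-8 facts.

```ruby
# UTF-8 byte lengths of a few sample characters.
puts "\u00FC".bytesize     # u with umlaut: 2
puts "\u20AC".bytesize     # euro sign: 3
puts "\u{1F600}".bytesize  # grinning-face emoji: 4

# Assumed limit: 767 bytes per indexed column (older InnoDB row formats).
index_key_limit_bytes = 767

# MySQL reserves the maximum bytes per character for every indexed character.
max_chars_utf8    = index_key_limit_bytes / 3
max_chars_utf8mb4 = index_key_limit_bytes / 4

puts max_chars_utf8     # 255
puts max_chars_utf8mb4  # 191
```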

Please check this Yehuda Katz article, which covers this in depth and explains it very well (there is specifically a section 'Incompatible Encodings'):

http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/

http://yehudakatz.com/2010/05/17/encodings-unabridged/

and:

http://zargony.com/2009/07/24/ruby-1-9-and-file-encodings

http://graysoftinc.com/character-encodings

