UnicodeEncodeError: 'latin-1' codec can't encode character

Question 1

What could be causing this error when I try to insert a foreign character into a database?

>>UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)

And how do I resolve it?

Thanks!

Question 2

Character U+201C (Left Double Quotation Mark) is not present in the Latin-1 (ISO-8859-1) encoding.

It is present in code page 1252 (Western European). This is a Windows-specific encoding that is based on ISO-8859-1 but puts extra characters into the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it is an annoying but now-standard web browser behaviour that if you serve your pages as ISO-8859-1, the browser will treat them as cp1252 instead. However, they really are two distinct encodings:

>>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
UnicodeEncodeError
>>> u'He said \u201CHello\u201D'.encode('cp1252')
'He said \x93Hello\x94'

If you are using your database only as a byte store, you can use cp1252 to encode “ and the other characters present in the Windows Western code page. Any Unicode character that is not present in cp1252 will still cause an error, though.

You can use encode(..., 'ignore') to suppress the errors by dropping the characters, but really, in this century you should be using UTF-8 in both your database and your pages. This encoding allows any character to be used. Ideally you should also tell MySQL that you are using UTF-8 strings (by setting the database connection character set and the collation on string columns), so that it can get case-insensitive comparison and sorting right.
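The trade-offs described above can be checked with a short Python 3 sketch. It is only an illustration: the sample string and the variable names are assumptions made for this example, and no database is involved.

```python
# Python 3 sketch; `text` is an illustrative sample string.
text = 'He said \u201cHello\u201d'

# Latin-1 has no slot for U+201C / U+201D, so strict encoding fails.
try:
    text.encode('iso-8859-1')
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True

# cp1252 maps the curly quotes to the bytes 0x93 and 0x94.
cp1252_bytes = text.encode('cp1252')

# errors='ignore' silently drops whatever cannot be encoded (data loss).
ignored = text.encode('iso-8859-1', 'ignore')

# UTF-8 can represent every code point and round-trips without loss.
utf8_bytes = text.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')
```

Note that the 'ignore' variant returns the text without its quotation marks, which is why it is best treated as a last resort.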

Question 3

I ran into this same issue when using the Python MySQLdb module. Since MySQL will let you store just about any binary data you want in a text field regardless of character set, I found my solution here:

Using UTF8 with Python MySQLdb

Edit: Quoting from the above URL to satisfy the request in the first comment...

"UnicodeEncodeError: 'latin-1' codec can't encode character ..."

This is because MySQLdb normally tries to encode everything to latin-1. This can be fixed by executing the following commands right after you've established the connection:

db.set_character_set('utf8')
dbc.execute('SET NAMES utf8;')
dbc.execute('SET CHARACTER SET utf8;')
dbc.execute('SET character_set_connection=utf8;')

"db" is the result of MySQLdb.connect(), and "dbc" is the result of db.cursor().

Question 4

The best solution is:

set MySQL's charset to 'utf-8'
do as this comment suggests (add use_unicode=True and charset="utf8")

db = MySQLdb.connect(host="localhost", user="root", passwd="", db="testdb", use_unicode=True, charset="utf8") - KyungHoon Kim Mar 13 '14 at 17:04

For details, see:

class Connection(_mysql.connection):

    """MySQL Database Connection Object"""

    default_cursor = cursors.Cursor

    def __init__(self, *args, **kwargs):
        """

        Create a connection to the database. It is strongly recommended
        that you only use keyword parameters. Consult the MySQL C API
        documentation for more information.

        host
          string, host to connect

        user
          string, user to connect as

        passwd
          string, password to use

        db
          string, database to use

        port
          integer, TCP/IP port to connect to

        unix_socket
          string, location of unix_socket to use

        conv
          conversion dictionary, see MySQLdb.converters

        connect_timeout
          number of seconds to wait before the connection attempt
          fails.

        compress
          if set, compression is enabled

        named_pipe
          if set, a named pipe is used to connect (Windows only)

        init_command
          command which is run once the connection is created

        read_default_file
          file from which default client values are read

        read_default_group
          configuration group to use from the default file

        cursorclass
          class object, used to create cursors (keyword only)

        use_unicode
          If True, text-like columns are returned as unicode objects
          using the connection's character set.  Otherwise, text-like
          columns are returned as strings.  columns are returned as
          normal strings. Unicode objects will always be encoded to
          the connection's character set regardless of this setting.

        charset
          If supplied, the connection character set will be changed
          to this character set (MySQL-4.1 and newer). This implies
          use_unicode=True.

        sql_mode
          If supplied, the session SQL mode will be changed to this
          setting (MySQL-4.1 and newer). For more details and legal
          values, see the MySQL documentation.

        client_flag
          integer, flags to use or 0
          (see MySQL docs or constants/CLIENTS.py)

        ssl
          dictionary or mapping, contains SSL connection parameters;
          see the MySQL documentation for more details
          (mysql_ssl_set()).  If this is set, and the client does not
          support SSL, NotSupportedError will be raised.

        local_infile
          integer, non-zero enables LOAD LOCAL INFILE; zero disables

        autocommit
          If False (default), autocommit is disabled.
          If True, autocommit is enabled.
          If None, autocommit isn't set and server default is used.

        There are a number of undocumented, non-standard methods. See the
        documentation for the MySQL C API for some hints on what they do.

        """

Question 5

I hope your database is at least UTF-8. Then you will need to run yourstring.encode('utf-8') before you try putting it into the database.

Question 6

You are trying to store the Unicode code point \u201c using the ISO-8859-1 / Latin-1 encoding, which cannot represent that code point. You might need to alter the database to use utf-8 and store the string data using an appropriate encoding, or you might want to sanitise your inputs before storing the content, for example by following something like Sam Ruby's excellent i18n guide. It discusses the problems that windows-1252 can cause and suggests how to handle them, with links to sample code!
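As a rough illustration of the "sanitise your inputs" option, the Python 3 sketch below swaps a few common "smart" punctuation characters for plain ASCII before encoding. The mapping and the name sanitise are hypothetical choices made for this example; they are not taken from the guide mentioned above, and the list is far from complete.

```python
# Hypothetical mapping: common "smart" punctuation -> plain ASCII.
SMART_PUNCTUATION = {
    '\u2018': "'",    # left single quotation mark
    '\u2019': "'",    # right single quotation mark
    '\u201c': '"',    # left double quotation mark
    '\u201d': '"',    # right double quotation mark
    '\u2013': '-',    # en dash
    '\u2014': '-',    # em dash
    '\u2026': '...',  # horizontal ellipsis
}
_TABLE = str.maketrans(SMART_PUNCTUATION)


def sanitise(text):
    """Return `text` with the mapped characters replaced by ASCII."""
    return text.translate(_TABLE)


cleaned = sanitise('\u201cHi\u201d \u2026')
# The cleaned text now fits into Latin-1 without raising an error.
latin1_bytes = cleaned.encode('latin-1')
```

This approach changes the stored text, so it only makes sense when switching the database to UTF-8 is not possible.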

Question 7

SQLAlchemy users can simply declare their field with convert_unicode=True.

Example: sqlalchemy.String(1000, convert_unicode=True)

SQLAlchemy will then accept unicode objects and return them back, handling the encoding itself.

Docs

Question 8

Use the snippet below to convert accented Latin text to plain English letters by stripping the accents.

import unicodedata
def strip_accents(text):
    return "".join(char for char in
                   unicodedata.normalize('NFKD', text)
                   if unicodedata.category(char) != 'Mn')

strip_accents('áéíñóúü')

Output:

'aeinouu'
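One caveat, shown here as a small Python 3 check: this approach only removes combining marks (Unicode category 'Mn'). The curly quote from the original question has no decomposition, so it passes through unchanged and would still fail to encode as Latin-1. The sample string is an assumption made for this example.

```python
import unicodedata


def strip_accents(text):
    # Decompose characters, then drop the combining marks (category 'Mn').
    return "".join(char for char in unicodedata.normalize('NFKD', text)
                   if unicodedata.category(char) != 'Mn')


sample = '\u201ccaf\u00e9\u201d'   # curly quotes around "café"
stripped = strip_accents(sample)   # accent removed, quotes untouched

# The curly quotes are still present, so Latin-1 encoding still fails.
try:
    stripped.encode('latin-1')
    still_fails = False
except UnicodeEncodeError:
    still_fails = True
```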

Question 9

Latin-1 (aka ISO 8859-1) is a single-octet character encoding scheme, and you can't fit \u201c (“) into a byte.

Did you mean to use UTF-8 encoding?
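The arithmetic behind this can be verified in Python 3. The variable names below are illustrative only.

```python
quote = '\u201c'

# The code point is 8220 (0x201C); a single byte holds values 0..255.
code_point = ord(quote)
fits_in_one_byte = code_point < 256

# UTF-8 is a variable-width encoding, so it uses several bytes instead.
utf8_length = len(quote.encode('utf-8'))
```

This matches the "ordinal not in range(256)" part of the error message.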

Question 10

Python: You will need to add # -*- coding: UTF-8 -*- as the first line of the Python file, and then append the following to the text you want to encode: .encode('ascii', 'xmlcharrefreplace'). This replaces every non-ASCII character with its XML numeric character reference, so the result is pure ASCII.
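A minimal Python 3 sketch of what xmlcharrefreplace produces, and how the text can be recovered later. The use of html.unescape for the reverse step is an addition for this example and is not part of the answer above.

```python
import html

text = '\u201cHello\u201d'

# Non-ASCII characters become numeric character references.
ascii_bytes = text.encode('ascii', 'xmlcharrefreplace')

# The references can be turned back into the original characters.
restored = html.unescape(ascii_bytes.decode('ascii'))
```

The stored value is markup rather than the character itself, so this suits content that will be rendered as HTML or XML.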