identifier.go 3.3 KB

// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

//go:generate go run gen.go

// Package identifier defines the contract between implementations of Encoding
// and Index by defining identifiers that uniquely identify standardized coded
// character sets (CCS) and character encoding schemes (CES), which we will
// together refer to as encodings, for which Encoding implementations provide
// converters to and from UTF-8. This package is typically only of concern to
// implementers of Indexes and Encodings.
//
// One part of the identifier is the MIB code, which is defined by IANA and
// uniquely identifies a CCS or CES. Each code is associated with data that
// references authorities and official documentation, as well as aliases and
// MIME names.
//
// Not all CESs are covered by the IANA registry. The "other" string that is
// returned by ID can be used to identify other character sets or versions of
// existing ones.
//
// It is recommended that each package that provides a set of Encodings provide
// the All and Common variables to reference all supported encodings and a
// commonly used subset. This allows Index implementations to include all
// available encodings without explicitly referencing or knowing about them.
package identifier

// Note: this package is internal, but could be made public if there is a need
// for writing third-party Indexes and Encodings.

// References:
// - http://source.icu-project.org/repos/icu/icu/trunk/source/data/mappings/convrtrs.txt
// - http://www.iana.org/assignments/character-sets/character-sets.xhtml
// - http://www.iana.org/assignments/ianacharset-mib/ianacharset-mib
// - http://www.ietf.org/rfc/rfc2978.txt
// - http://www.unicode.org/reports/tr22/
// - http://www.w3.org/TR/encoding/
// - http://www.w3.org/TR/encoding/indexes/encodings.json
// - https://encoding.spec.whatwg.org/
// - https://tools.ietf.org/html/rfc6657#section-5

// Interface can be implemented by Encodings to define the CCS or CES for which
// it implements conversions.
type Interface interface {
	// ID returns an encoding identifier. Exactly one of the mib and other
	// values should be non-zero.
	//
	// In the usual case it is only necessary to indicate the MIB code. The
	// other string can be used to specify encodings for which there is no MIB,
	// such as "x-mac-dingbat".
	//
	// The other string may only contain the characters a-z, A-Z, 0-9, - and _.
	ID() (mib MIB, other string)

	// NOTE: the restrictions on the encoding are to allow extending the syntax
	// with additional information such as versions, vendors and other variants.
}

// A MIB identifies an encoding. It is derived from the IANA MIB codes and adds
// some identifiers for some encodings that are not covered by the IANA
// standard.
//
// See http://www.iana.org/assignments/ianacharset-mib.
type MIB uint16

// These additional MIB types are not defined in IANA. They are added because
// they are common and defined within the text repo.
const (
	// Unofficial marks the start of encodings not registered by IANA.
	Unofficial MIB = 10000 + iota

	// Replacement is the WhatWG replacement encoding.
	Replacement

	// XUserDefined is the code for x-user-defined.
	XUserDefined

	// MacintoshCyrillic is the code for x-mac-cyrillic.
	MacintoshCyrillic
)
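
The ID contract documented above (exactly one of mib and other is non-zero, and other is limited to a-z, A-Z, 0-9, - and _) can be illustrated with a small standalone sketch. Because the package is internal, the sketch repeats the MIB, Interface and constant declarations locally instead of importing them. The types macCyrillic and dingbat and the helpers validOther and describe are hypothetical names introduced only for this illustration; they are not part of the package.

```go
package main

import "fmt"

// Local copies of the declarations from identifier.go, repeated so that this
// sketch compiles on its own without importing the internal package.
type MIB uint16

type Interface interface {
	ID() (mib MIB, other string)
}

const (
	Unofficial MIB = 10000 + iota
	Replacement
	XUserDefined
	MacintoshCyrillic
)

// macCyrillic is a hypothetical encoding identified by a MIB code only.
type macCyrillic struct{}

func (macCyrillic) ID() (MIB, string) { return MacintoshCyrillic, "" }

// dingbat is a hypothetical encoding with no MIB code, identified by the
// "other" string instead.
type dingbat struct{}

func (dingbat) ID() (MIB, string) { return 0, "x-mac-dingbat" }

// validOther reports whether s is non-empty and contains only the characters
// a-z, A-Z, 0-9, - and _.
func validOther(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		switch {
		case 'a' <= r && r <= 'z', 'A' <= r && r <= 'Z', '0' <= r && r <= '9', r == '-', r == '_':
		default:
			return false
		}
	}
	return true
}

// describe checks that exactly one of mib and other is set and returns a
// printable form of the identifier, or "invalid" if the contract is violated.
func describe(e Interface) string {
	mib, other := e.ID()
	switch {
	case mib != 0 && other == "":
		return fmt.Sprintf("mib=%d", mib)
	case mib == 0 && validOther(other):
		return "other=" + other
	}
	return "invalid"
}

func main() {
	fmt.Println(describe(macCyrillic{})) // mib=10003
	fmt.Println(describe(dingbat{}))     // other=x-mac-dingbat
}
```

Since Unofficial starts the iota sequence at 10000, MacintoshCyrillic evaluates to 10003, which is the value the first call prints.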