Source file src/encoding/gob/doc.go

     1	// Copyright 2009 The Go Authors. All rights reserved.
     2	// Use of this source code is governed by a BSD-style
     3	// license that can be found in the LICENSE file.
     4	
     5	/*
     6	Package gob manages streams of gobs - binary values exchanged between an
     7	Encoder (transmitter) and a Decoder (receiver). A typical use is transporting
     8	arguments and results of remote procedure calls (RPCs) such as those provided by
     9	package "net/rpc".
    10	
    11	The implementation compiles a custom codec for each data type in the stream and
    12	is most efficient when a single Encoder is used to transmit a stream of values,
    13	amortizing the cost of compilation.
    14	
    15	Basics
    16	
    17	A stream of gobs is self-describing. Each data item in the stream is preceded by
    18	a specification of its type, expressed in terms of a small set of predefined
    19	types. Pointers are not transmitted, but the things they point to are
    20	transmitted; that is, the values are flattened. Nil pointers are not permitted,
    21	as they have no value. Recursive types work fine, but
    22	recursive values (data with cycles) are problematic. This may change.
    23	
    24	To use gobs, create an Encoder and present it with a series of data items as
    25	values or addresses that can be dereferenced to values. The Encoder makes sure
    26	all type information is sent before it is needed. At the receive side, a
    27	Decoder retrieves values from the encoded stream and unpacks them into local
    28	variables.
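
As an illustration of the paragraph above, here is a minimal round trip through
an in-memory buffer. The Point type and the roundTrip helper are hypothetical
names chosen for this sketch; a bytes.Buffer stands in for a real connection.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point is a hypothetical example type; only its exported fields are sent.
type Point struct{ X, Y int }

// roundTrip encodes p into an in-memory buffer and decodes it back.
func roundTrip(p Point) (Point, error) {
	var network bytes.Buffer // stands in for a network connection
	if err := gob.NewEncoder(&network).Encode(p); err != nil {
		return Point{}, err
	}
	var q Point
	err := gob.NewDecoder(&network).Decode(&q)
	return q, err
}

func main() {
	q, err := roundTrip(Point{X: 22, Y: 33})
	fmt.Println(q, err)
}
```

In real use a single Encoder and a single Decoder would normally be kept for the
life of the connection, so that type information is sent only once.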
    29	
    30	Types and Values
    31	
    32	The source and destination values/types need not correspond exactly. For structs,
    33	fields (identified by name) that are in the source but absent from the receiving
    34	variable will be ignored. Fields that are in the receiving variable but missing
    35	from the transmitted type or value will be ignored in the destination. If a field
    36	with the same name is present in both, their types must be compatible. Both the
    37	receiver and transmitter will do all necessary indirection and dereferencing to
    38	convert between gobs and actual Go values. For instance, a gob type that is
    39	schematically,
    40	
    41		struct { A, B int }
    42	
    43	can be sent from or received into any of these Go types:
    44	
    45		struct { A, B int }	// the same
    46		*struct { A, B int }	// extra indirection of the struct
    47		struct { *A, **B int }	// extra indirection of the fields
    48		struct { A, B int64 }	// different concrete value type; see below
    49	
    50	It may also be received into any of these:
    51	
    52		struct { A, B int }	// the same
    53		struct { B, A int }	// ordering doesn't matter; matching is by name
    54		struct { A, B, C int }	// extra field (C) ignored
    55		struct { B int }	// missing field (A) ignored; data will be dropped
    56		struct { B, C int }	// missing field (A) ignored; extra field (C) ignored.
    57	
    58	Attempting to receive into these types will draw a decode error:
    59	
    60		struct { A int; B uint }	// change of signedness for B
    61		struct { A int; B float64 }	// change of type for B
    62		struct { }			// no field names in common
    63		struct { C, D int }		// no field names in common
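
The matching rules listed above can be tried out with a small sketch. The type
names Sent, Reordered and Unsigned and the transfer helper are assumptions made
for this example only.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Sent is the transmitted type; the receiving types below are hypothetical.
type Sent struct{ A, B int }

// Reordered has an extra field (C) and a different field order.
type Reordered struct{ C, B, A int }

// Unsigned changes the signedness of B, which is incompatible.
type Unsigned struct {
	A int
	B uint
}

// transfer gob-encodes src and decodes the stream into dst (a pointer).
func transfer(src, dst interface{}) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(src); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(dst)
}

func main() {
	var r Reordered
	fmt.Println(transfer(Sent{A: 1, B: 2}, &r), r) // fields matched by name

	var u Unsigned
	fmt.Println(transfer(Sent{A: 1, B: 2}, &u)) // expected to report an error
}
```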
    64	
    65	Integers are transmitted two ways: arbitrary precision signed integers or
    66	arbitrary precision unsigned integers. There is no int8, int16 etc.
    67	discrimination in the gob format; there are only signed and unsigned integers. As
    68	described below, the transmitter sends the value in a variable-length encoding;
    69	the receiver accepts the value and stores it in the destination variable.
    70	Floating-point numbers are always sent using IEEE-754 64-bit precision (see
    71	below).
    72	
    73	Signed integers may be received into any signed integer variable: int, int16, etc.;
    74	unsigned integers may be received into any unsigned integer variable; and floating
    75	point values may be received into any floating point variable. However,
    76	the destination variable must be able to represent the value or the decode
    77	operation will fail.
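
A short sketch of the range rule above: the same transmitted integer decodes
into a destination that can hold it and fails for one that cannot. The helper
name sendInt64 is an assumption for this example.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// sendInt64 encodes v and decodes the stream into dst, which must be a
// pointer to some signed integer type.
func sendInt64(v int64, dst interface{}) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(dst)
}

func main() {
	var wide int16
	fmt.Println(sendInt64(300, &wide), wide) // 300 fits in an int16

	var narrow int8
	fmt.Println(sendInt64(300, &narrow)) // 300 does not fit in an int8
}
```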
    78	
    79	Structs, arrays and slices are also supported. Structs encode and decode only
    80	exported fields. Strings and arrays of bytes are supported with a special,
    81	efficient representation (see below). When a slice is decoded, if the existing
    82	slice has capacity the slice will be extended in place; if not, a new array is
    83	allocated. Regardless, the length of the resulting slice reports the number of
    84	elements decoded.
    85	
    86	In general, if allocation is required, the decoder will allocate memory. If not,
    87	it will update the destination variables with values read from the stream. It does
    88	not initialize them first, so if the destination is a compound value such as a
    89	map, struct, or slice, the decoded values will be merged elementwise into the
    90	existing variables.
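
The merging behavior described above is easy to overlook, so here is a small
sketch of it. Config and decodeOver are hypothetical names. Because the
zero-valued Name field is not transmitted and the decoder does not clear the
destination first, the old Name should survive the decode.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Config is a hypothetical destination type for this sketch.
type Config struct {
	Name    string
	Retries int
}

// decodeOver encodes src and decodes it on top of the existing value in dst.
func decodeOver(dst *Config, src Config) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(src); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(dst)
}

func main() {
	dst := Config{Name: "old", Retries: 1}
	err := decodeOver(&dst, Config{Retries: 5}) // Name is zero, so it is not sent
	fmt.Println(dst, err)
}
```

To avoid stale data, decode into a freshly zeroed variable.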
    91	
    92	Functions and channels will not be sent in a gob. Attempting to encode such a value
    93	at the top level will fail. A struct field of chan or func type is treated exactly
    94	like an unexported field and is ignored.
    95	
    96	Gob can encode a value of any type implementing the GobEncoder or
    97	encoding.BinaryMarshaler interfaces by calling the corresponding method,
    98	in that order of preference.
    99	
   100	Gob can decode a value of any type implementing the GobDecoder or
   101	encoding.BinaryUnmarshaler interfaces by calling the corresponding method,
   102	again in that order of preference.
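
As a sketch of the two paragraphs above, the hypothetical type Secret below has
no exported fields, so gob could not transmit it by itself; it supplies
GobEncode and GobDecode methods to control its own representation.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Secret is a hypothetical type whose only field is unexported.
type Secret struct{ text string }

// GobEncode implements gob.GobEncoder.
func (s Secret) GobEncode() ([]byte, error) { return []byte(s.text), nil }

// GobDecode implements gob.GobDecoder.
func (s *Secret) GobDecode(data []byte) error {
	s.text = string(data)
	return nil
}

// roundTrip sends in through a buffer and returns the decoded copy.
func roundTrip(in Secret) (Secret, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return Secret{}, err
	}
	var out Secret
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTrip(Secret{text: "swordfish"})
	fmt.Println(out.text, err)
}
```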
   103	
   104	Encoding Details
   105	
   106	This section documents the encoding; these details are not important for most
   107	users. Details are presented bottom-up.
   108	
   109	An unsigned integer is sent one of two ways. If it is less than 128, it is sent
   110	as a byte with that value. Otherwise it is sent as a minimal-length big-endian
   111	(high byte first) byte stream holding the value, preceded by one byte holding the
   112	byte count, negated. Thus 0 is transmitted as (00), 7 is transmitted as (07) and
   113	256 is transmitted as (FE 01 00).
   114	
   115	A boolean is encoded within an unsigned integer: 0 for false, 1 for true.
   116	
   117	A signed integer, i, is encoded within an unsigned integer, u. Within u, bits 1
   118	upward contain the value; bit 0 says whether they should be complemented upon
   119	receipt. The encode algorithm looks like this:
   120	
   121		var u uint
   122		if i < 0 {
   123			u = (^uint(i) << 1) | 1 // complement i, bit 0 is 1
   124		} else {
   125			u = (uint(i) << 1) // do not complement i, bit 0 is 0
   126		}
   127		encodeUnsigned(u)
   128	
   129	The low bit is therefore analogous to a sign bit, but making it the complement bit
   130	instead guarantees that the largest negative integer is not a special case. For
   131	example, -129=^128=(^256>>1) encodes as (FE 01 01).
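
A runnable version of the mapping above, together with its inverse, might look
like the following. The names toWire and fromWire are assumptions for this
sketch; the unsigned result would then be sent with the unsigned encoding.

```go
package main

import "fmt"

// toWire maps a signed integer onto the unsigned value that is transmitted.
func toWire(i int64) uint64 {
	if i < 0 {
		return uint64(^i)<<1 | 1 // complement i, bit 0 is 1
	}
	return uint64(i) << 1 // do not complement i, bit 0 is 0
}

// fromWire inverts toWire on the receiving side.
func fromWire(u uint64) int64 {
	if u&1 != 0 {
		return ^int64(u >> 1) // bit 0 set: complement the remaining bits
	}
	return int64(u >> 1)
}

func main() {
	for _, i := range []int64{22, -65, -129} {
		u := toWire(i)
		fmt.Printf("%d -> %#x -> %d\n", i, u, fromWire(u))
	}
}
```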
   132	
   133	Floating-point numbers are always sent as a representation of a float64 value.
   134	That value is converted to a uint64 using math.Float64bits. The uint64 is then
   135	byte-reversed and sent as a regular unsigned integer. The byte-reversal means the
   136	exponent and high-precision part of the mantissa go first. Since the low bits are
   137	often zero, this can save encoding bytes. For instance, 17.0 is encoded in only
   138	three bytes (FE 31 40).
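
The byte reversal can be sketched with the standard math and math/bits
packages. The helper names floatToWire and wireToFloat are assumptions; the
returned uint64 is what would be passed on to the unsigned integer encoding.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// floatToWire returns the byte-reversed IEEE-754 bits of f.
func floatToWire(f float64) uint64 {
	return bits.ReverseBytes64(math.Float64bits(f))
}

// wireToFloat inverts floatToWire.
func wireToFloat(u uint64) float64 {
	return math.Float64frombits(bits.ReverseBytes64(u))
}

func main() {
	u := floatToWire(17.0)
	fmt.Printf("%#x %v\n", u, wireToFloat(u))
}
```

For 17.0 the reversed value is 0x3140, which needs two value bytes plus the
count byte, matching the three bytes quoted above.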
   139	
   140	Strings and slices of bytes are sent as an unsigned count followed by that many
   141	uninterpreted bytes of the value.
   142	
   143	All other slices and arrays are sent as an unsigned count followed by that many
   144	elements using the standard gob encoding for their type, recursively.
   145	
   146	Maps are sent as an unsigned count followed by that many key, element
   147	pairs. Empty but non-nil maps are sent, so if the receiver has not allocated
   148	one already, one will always be allocated on receipt unless the transmitted map
   149	is nil and not at the top level.
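
The difference between an empty map and a nil map can be observed with a sketch
like this one. Inventory and roundTrip are hypothetical names; the map is a
struct field, so it is not at the top level.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Inventory is a hypothetical type with a map field.
type Inventory struct {
	Name  string
	Items map[string]int
}

// roundTrip sends in through a buffer and decodes into a zero Inventory.
func roundTrip(in Inventory) (Inventory, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return Inventory{}, err
	}
	var out Inventory
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	empty, _ := roundTrip(Inventory{Name: "empty", Items: map[string]int{}})
	fmt.Println(empty.Items != nil) // an empty map is sent, so one is allocated

	none, _ := roundTrip(Inventory{Name: "nil"})
	fmt.Println(none.Items == nil) // a nil map field is not sent
}
```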
   150	
   151	In slices, arrays, and maps, every element is transmitted, including
   152	zero-valued elements, even if all the elements are zero.
   153	
   154	Structs are sent as a sequence of (field number, field value) pairs. The field
   155	value is sent using the standard gob encoding for its type, recursively. If a
   156	field has the zero value for its type (except for arrays; see above), it is omitted
   157	from the transmission. The field number is defined by the type of the encoded
   158	struct: the first field of the encoded type is field 0, the second is field 1,
   159	etc. When encoding a value, the field numbers are delta encoded for efficiency
   160	and the fields are always sent in order of increasing field number; the deltas are
   161	therefore unsigned. The initialization for the delta encoding sets the field
   162	number to -1, so an unsigned integer field 0 with value 7 is transmitted as unsigned
   163	delta = 1, unsigned value = 7 or (01 07). Finally, after all the fields have been
   164	sent a terminating mark denotes the end of the struct. That mark is a delta=0
   165	value, which has representation (00).
   166	
   167	Interface types are not checked for compatibility; all interface types are
   168	treated, for transmission, as members of a single "interface" type, analogous to
   169	int or []byte - in effect they're all treated as interface{}. Interface values
   170	are transmitted as a string identifying the concrete type being sent (a name
   171	that must be pre-defined by calling Register), followed by a byte count of the
   172	length of the following data (so the value can be skipped if it cannot be
   173	stored), followed by the usual encoding of the concrete (dynamic) value stored in
   174	the interface value. (A nil interface value is identified by the empty string
   175	and transmits no value.) Upon receipt, the decoder verifies that the unpacked
   176	concrete item satisfies the interface of the receiving variable.
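
A sketch of sending an interface value follows. Shape and Square are
hypothetical types. The concrete type is registered with Register, and a
pointer to the interface variable is passed to Encode so that the value is
transmitted as an interface rather than as its concrete type.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Shape is a hypothetical interface type.
type Shape interface{ Area() float64 }

// Square is a hypothetical concrete type that satisfies Shape.
type Square struct{ Side float64 }

// Area returns the area of the square.
func (s Square) Area() float64 { return s.Side * s.Side }

func init() {
	gob.Register(Square{}) // make the concrete type name known to gob
}

// roundTrip sends an interface value and receives it as an interface value.
func roundTrip(in Shape) (Shape, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(&in); err != nil {
		return nil, err
	}
	var out Shape
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTrip(Square{Side: 3})
	fmt.Println(out, err)
}
```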
   177	
   178	If a value is passed to Encode and the type is not a struct (or pointer to struct,
   179	etc.), for simplicity of processing it is represented as a struct of one field.
   180	The only visible effect of this is a zero byte (a field delta of 0) sent
   181	immediately before the value, as in the SingletonValue production of the grammar
   182	and the final worked example in this file.
   183	
   184	The representation of types is described below. When a type is defined on a given
   185	connection between an Encoder and Decoder, it is assigned a signed integer type
   186	id. When Encoder.Encode(v) is called, it makes sure there is an id assigned for
   187	the type of v and all its elements and then it sends the pair (typeid, encoded-v)
   188	where typeid is the type id of the encoded type of v and encoded-v is the gob
   189	encoding of the value v.
   190	
   191	To define a type, the encoder chooses an unused, positive type id and sends the
   192	pair (-type id, encoded-type) where encoded-type is the gob encoding of a wireType
   193	description, constructed from these types:
   194	
   195		type wireType struct {
   196			ArrayT           *arrayType
   197			SliceT           *sliceType
   198			StructT          *structType
   199			MapT             *mapType
   200			GobEncoderT      *gobEncoderType
   201			BinaryMarshalerT *gobEncoderType
   202			TextMarshalerT   *gobEncoderType
   203	
   204		}
   205		type arrayType struct {
   206			CommonType
   207			Elem typeId
   208			Len  int
   209		}
   210		type CommonType struct {
   211			Name string // the name of the struct type
   212			Id   int    // the id of the type, repeated so it's inside the type
   213		}
   214		type sliceType struct {
   215			CommonType
   216			Elem typeId
   217		}
   218		type structType struct {
   219			CommonType
   220			Field []*fieldType // the fields of the struct.
   221		}
   222		type fieldType struct {
   223			Name string // the name of the field.
   224			Id   int    // the type id of the field, which must be already defined
   225		}
   226		type mapType struct {
   227			CommonType
   228			Key  typeId
   229			Elem typeId
   230		}
   231		type gobEncoderType struct {
   232			CommonType
   233		}
   234	
   235	If there are nested type ids, the types for all inner type ids must be defined
   236	before the top-level type id is used to describe an encoded-v.
   237	
   238	For simplicity in setup, the connection is defined to understand these types a
   239	priori, as well as the basic gob types int, uint, etc. Their ids are:
   240	
   241		bool        1
   242		int         2
   243		uint        3
   244		float       4
   245		[]byte      5
   246		string      6
   247		complex     7
   248		interface   8
   249		// gap for reserved ids.
   250		WireType    16
   251		ArrayType   17
   252		CommonType  18
   253		SliceType   19
   254		StructType  20
   255		FieldType   21
   256		// 22 is slice of fieldType.
   257		MapType     23
   258	
   259	Finally, each message created by a call to Encode is preceded by an encoded
   260	unsigned integer count of the number of bytes remaining in the message. After
   261	the initial type name, interface values are wrapped the same way; in effect, the
   262	interface value acts like a recursive invocation of Encode.
   263	
   264	In summary, a gob stream looks like
   265	
   266		(byteCount (-type id, encoding of a wireType)* (type id, encoding of a value))*
   267	
   268	where * signifies zero or more repetitions and the type id of a value must
   269	be predefined or be defined before the value in the stream.
   270	
   271	Compatibility: Any future changes to the package will endeavor to maintain
   272	compatibility with streams encoded using previous versions. That is, any released
   273	version of this package should be able to decode data written with any previously
   274	released version, subject to issues such as security fixes. See the Go compatibility
   275	document for background: https://golang.org/doc/go1compat
   276	
   277	See "Gobs of data" for a design discussion of the gob wire format:
   278	https://blog.golang.org/gobs-of-data
   279	*/
   280	package gob
   281	
   282	/*
   283	Grammar:
   284	
   285	Tokens starting with a lower case letter are terminals; int(n)
   286	and uint(n) represent the signed/unsigned encodings of the value n.
   287	
   288	GobStream:
   289		DelimitedMessage*
   290	DelimitedMessage:
   291		uint(lengthOfMessage) Message
   292	Message:
   293		TypeSequence TypedValue
   294	TypeSequence:
   295		(TypeDefinition DelimitedTypeDefinition*)?
   296	DelimitedTypeDefinition:
   297		uint(lengthOfTypeDefinition) TypeDefinition
   298	TypedValue:
   299		int(typeId) Value
   300	TypeDefinition:
   301		int(-typeId) encodingOfWireType
   302	Value:
   303		SingletonValue | StructValue
   304	SingletonValue:
   305		uint(0) FieldValue
   306	FieldValue:
   307		builtinValue | ArrayValue | MapValue | SliceValue | StructValue | InterfaceValue
   308	InterfaceValue:
   309		NilInterfaceValue | NonNilInterfaceValue
   310	NilInterfaceValue:
   311		uint(0)
   312	NonNilInterfaceValue:
   313		ConcreteTypeName TypeSequence InterfaceContents
   314	ConcreteTypeName:
   315		uint(lengthOfName) [already read=n] name
   316	InterfaceContents:
   317		int(concreteTypeId) DelimitedValue
   318	DelimitedValue:
   319		uint(length) Value
   320	ArrayValue:
   321		uint(n) FieldValue*n [n elements]
   322	MapValue:
   323		uint(n) (FieldValue FieldValue)*n  [n (key, value) pairs]
   324	SliceValue:
   325		uint(n) FieldValue*n [n elements]
   326	StructValue:
   327		(uint(fieldDelta) FieldValue)*
   328	*/
   329	
   330	/*
   331	For implementers and the curious, here is an encoded example. Given
   332		type Point struct {X, Y int}
   333	and the value
   334		p := Point{22, 33}
   335	the bytes transmitted that encode p will be:
   336		1f ff 81 03 01 01 05 50 6f 69 6e 74 01 ff 82 00
   337		01 02 01 01 58 01 04 00 01 01 59 01 04 00 00 00
   338		07 ff 82 01 2c 01 42 00
   339	They are determined as follows.
   340	
   341	Since this is the first transmission of type Point, the type descriptor
   342	for Point itself must be sent before the value. This is the first type
   343	we've sent on this Encoder, so it has type id 65 (0 through 64 are
   344	reserved).
   345	
   346		1f	// This item (a type descriptor) is 31 bytes long.
   347		ff 81	// The negative of the id for the type we're defining, -65.
   348			// This is one byte (indicated by FF = -1) followed by
   349			// ^-65<<1 | 1. The low 1 bit signals to complement the
   350			// rest upon receipt.
   351	
   352		// Now we send a type descriptor, which is itself a struct (wireType).
   353		// The type of wireType itself is known (it's built in, as is the type of
   354		// all its components), so we just need to send a *value* of type wireType
   355		// that represents type "Point".
   356		// Here starts the encoding of that value.
   357		// Set the field number implicitly to -1; this is done at the beginning
   358		// of every struct, including nested structs.
   359		03	// Add 3 to field number; now 2 (wireType.structType; this is a struct).
   360			// structType starts with an embedded CommonType, which appears
   361			// as a regular structure here too.
   362		01	// add 1 to field number (now 0); start of embedded CommonType.
   363		01	// add 1 to field number (now 0, the name of the type)
   364		05	// string is (unsigned) 5 bytes long
   365		50 6f 69 6e 74	// wireType.structType.CommonType.name = "Point"
   366		01	// add 1 to field number (now 1, the id of the type)
   367		ff 82	// wireType.structType.CommonType._id = 65
   368		00	// end of embedded wiretype.structType.CommonType struct
   369		01	// add 1 to field number (now 1, the field array in wireType.structType)
   370		02	// There are two fields in the type (len(structType.field))
   371		01	// Start of first field structure; add 1 to get field number 0: field[0].name
   372		01	// 1 byte
   373		58	// structType.field[0].name = "X"
   374		01	// Add 1 to get field number 1: field[0].id
   375		04	// structType.field[0].typeId is 2 (signed int).
   376		00	// End of structType.field[0]; start structType.field[1]; set field number to -1.
   377		01	// Add 1 to get field number 0: field[1].name
   378		01	// 1 byte
   379		59	// structType.field[1].name = "Y"
   380		01	// Add 1 to get field number 1: field[1].id
   381		04	// structType.field[1].typeId is 2 (signed int).
   382		00	// End of structType.field[1]; end of structType.field.
   383		00	// end of wireType.structType structure
   384		00	// end of wireType structure
   385	
   386	Now we can send the Point value. Again the field number resets to -1:
   387	
   388		07	// this value is 7 bytes long
   389		ff 82	// the type number, 65 (1 byte (-FF) followed by 65<<1)
   390		01	// add one to field number, yielding field 0
   391		2c	// encoding of signed "22" (0x2c = 44 = 22<<1); Point.x = 22
   392		01	// add one to field number, yielding field 1
   393		42	// encoding of signed "33" (0x42 = 66 = 33<<1); Point.y = 33
   394		00	// end of structure
   395	
   396	The type encoding is long and fairly intricate but we send it only once.
   397	If p is transmitted a second time, the type is already known so the
   398	output will be just:
   399	
   400		07 ff 82 01 2c 01 42 00
   401	
   402	A single non-struct value at top level is transmitted like a field with
   403	delta tag 0. For instance, a signed integer with value 3 presented as
   404	the argument to Encode will emit:
   405	
   406		03 04 00 06
   407	
   408	Which represents:
   409	
   410		03	// this value is 3 bytes long
   411		04	// the type number, 2, represents an integer
   412		00	// tag delta 0
   413		06	// value 3
   414	
   415	*/
   416