Skip to content

Latest commit

 

History

History
234 lines (173 loc) · 7.16 KB

4.04.md

File metadata and controls

234 lines (173 loc) · 7.16 KB

Section 4.04: String Type

Let's have a look at String Type. String has some complexity to it, but let's have a look at the simplicity of String. Remember, Go has some goals: efficient compilation, efficient execution, and ease of programming.

Let's create a variable that is of type String. A string can be created with double quotes "" or with backticks ````. A string created with backticks is a raw string literal, so it will include any returns, spaces, whatever.

package main

import (
	"fmt"
)

var string_literal string

func main() {
	s := "Hello, playground"
	string_literal = `"Hello,
	
	playground"`
	fmt.Println(s)
	fmt.Printf("%T\n", s)
	fmt.Println(string_literal)
	fmt.Printf("%T\n", string_literal)
}

What the Go spec says about String Types is,

A string type represents the set of string values. A string value is a (possibly empty) sequence of bytes. Strings are immutable: once created, it is impossible to change the contents of a string. The predeclared string type is string.

So, it's a sequence of bytes. What that means is that it is a slice of bytes. A byte is a type. It is an alias for an int8. Let's use conversion to convert this string to a slice of bytes, bs := []byte(s):

package main

import (
	"fmt"
)

var string_literal string

func main() {
	s := "Hello, playground"
	fmt.Println(s)
	fmt.Printf("%T\n", s)
	bs := []byte(s)
	fmt.Println(bs)
	fmt.Printf("%T\n", bs)
}

playground

Notice that this returns the slice of bytes [72 101 108 108 111 44 32 112 108 97 121 103 114 111 117 110 100]

If we look at the ASCII coding scheme, we can see that 72 is a capital "H", and 101, 108, 108, 111 are "e", "l", "l", and "o", respectively. This shows that in this string, we have used a coding scheme to represent letters of the alphabet. Those letters are being represented by the numbers in this slice of bytes. In this case, they are being represented as decimal numbers. In programming, we also represent things in hexadecimal numbers. Hexadecimal is just another way of representing numbers; it's base 16 whereas decimal numbers are base 10.

In UTF-8, a UTF-8 code point is 1 to 4 bytes. Each code point corresponds to a character.
We have just seen that 72 corresponds to "H" in this case. And in the ASCII coding scheme, 72 corresponds to the binary 100 1000, which is 48 in hexadecimal.

To take a look deeper into the internals, and the hexadecimal, UTF-8 way, we can have a look in the fmt package documentation, particularly %x

%s	the uninterpreted bytes of the string or slice
%q	a double-quoted string safely escaped with Go syntax
%x	base 16, lower-case, two characters per byte
%X	base 16, upper-case, two characters per byte

And if we scroll down to the "Other flags:" section

+	always print a sign for numeric values;
	guarantee ASCII-only output for %q (%+q)
-	pad with spaces on the right rather than the left (left-justify the field)
#	alternate format: add leading 0 for octal (%#o), 0x for hex (%#x);
	0X for hex (%#X); suppress 0x for %p (%#p);
	for %q, print a raw (backquoted) string if strconv.CanBackquote
	returns true;
	always print a decimal point for %e, %E, %f, %F, %g and %G;
	do not remove trailing zeros for %g and %G;
	write e.g. U+0078 'x' if the character is printable for %U (%#U).
' '	(space) leave a space for elided sign in numbers (% d);
	put spaces between bytes printing strings or slices in hex (% x, % X)
0	pad with leading zeros rather than spaces;
	for numbers, this moves the padding after the sign

Particularly, the like write e.g. U+0078 'x' if the character is printable for %U (%#U).Let's have a look at some of the UTF-8 characters, using %#U:

package main

import (
	"fmt"
)

var string_literal string

func main() {
	s := "Hello, playground"
	fmt.Println(s)
	fmt.Printf("%T\n", s)
	
	bs := []byte(s)
	fmt.Println(bs)
	fmt.Printf("%T\n", bs)
	
	for i := 0; i < len(s); i++ {
		fmt.Printf("%#U ", s[i])
	}
}

playground

Another thing we can do is loop over the range of s and print out the index position and the hex value

package main

import (
	"fmt"
)

var string_literal string

func main() {
	s := "Hello, playground"
	fmt.Println(s)
	fmt.Printf("%T\n", s)
	
	bs := []byte(s)
	
	/* Show each character from string s in decimal */
	fmt.Println(bs)
	fmt.Printf("%T\n", bs)
	
	/* Show each character from string s in UTF-8 code point */
	for i := 0; i < len(s); i++ {
		fmt.Printf("%#U ", s[i])
	}
	
	fmt.Println("")
	
	/* Show each character from string s in hexadecimal */
	for i, v := range s {
		fmt.Printf("At index position %d we have hex %#x\n", i, v)
	}
}

playground

You can verify in the output with the ASCII coding scheme and see it matches up:

Binary Hex Glyph
100 1000 48 H
110 0101 65 e
110 1100 6C l
110 1100 6C l
110 1111 6F o
010 1100 2C ,
010 0000 20 space
111 0000 70 p
110 1100 6C l
110 0001 61 a
111 1001 79 y
110 0111 67 g
111 0010 72 r
110 1111 6F o
111 0101 75 u
110 1110 6E n
110 0100 64 d

Each character from the string "Hello, playground" in:

Decimal notation

[72 101 108 108 111 44 32 112 108 97 121 103 114 111 117 110 100]

UTF-8 code point

U+0048 'H' U+0065 'e' U+006C 'l' U+006C 'l' U+006F 'o' U+002C ',' U+0020 ' ' U+0070 'p' U+006C 'l' U+0061 'a' U+0079 'y' U+0067 'g' U+0072 'r' U+006F 'o' U+0075 'u' U+006E 'n' U+0064 'd'

Hexadecimal

0x48 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x70 0x6c 0x61 0x79 0x67 0x72 0x6f 0x75 0x6e 0x64

Each code point is known as a rune, which is an alias for int32. Each rune is a code point in UTF-8.

Diving Deeper

To recap, we write strings either enclosed in double quotes or as string literals in backticks. All the code you write in Go is encoded as UTF-8, but that doesn't mean that all of your strings are going to be UTF-8 code points. You could have bytes which don't correspond to a code point.

A string is a slice of bytes. It is immutable, which means its value cannot be modified. We can assign a new value:

package main

import (
	"fmt"
)

var string_literal string

func main() {
	s := "Hello, playground"
	s = "Hello, Hawaii"
	fmt.Println(s)

}

playground

So, you can assign a new value, but you cannot change the bytes of a string. So a string is an immutable slice of bytes. You can do conversion and see that slice of bytes, and see how it translates back to the ASCII UTF-8 coding scheme.

On the Golang website they have Asian characters in the example. Go ahead and use the techniques from this lesson and see that those characters are more than one byte.

package main

import "fmt"

func main() {
	fmt.Println("Hello, 世界")
}