diff --git a/book.toml b/book.toml index 87d0e77c8e..98697bf6c2 100644 --- a/book.toml +++ b/book.toml @@ -90,6 +90,7 @@ use-boolean-and = true "/types.html#type-parameters" = "types/parameters.html" "/types.html#union-types" = "types/union.html" "/types.html#unique-immutable-borrows-in-captures" = "types/closure.html#unique-immutable-borrows-in-captures" +"/types/textual.html" = "char.html" "/unsafe-blocks.html" = "unsafe-keyword.html" "/unsafe-functions.html" = "unsafe-keyword.html" diff --git a/src/SUMMARY.md b/src/SUMMARY.md index c3786707fa..36f4957fc2 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -75,7 +75,8 @@ - [Types](types.md) - [Boolean type](types/boolean.md) - [Numeric types](types/numeric.md) - - [Textual types](types/textual.md) + - [Character type](types/char.md) + - [String slice type](types/str.md) - [Never type](types/never.md) - [Tuple types](types/tuple.md) - [Array types](types/array.md) diff --git a/src/dynamically-sized-types.md b/src/dynamically-sized-types.md index 0496e16c73..e34baa2b55 100644 --- a/src/dynamically-sized-types.md +++ b/src/dynamically-sized-types.md @@ -34,7 +34,7 @@ r[dynamic-sized.struct-field] [sized]: special-types-and-traits.md#sized [Slices]: types/slice.md -[str]: types/textual.md +[str]: types/str.md [trait objects]: types/trait-object.md [Pointer types]: types/pointer.md [Variables]: variables.md diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index 51b3c7ae26..5517c9d2a5 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -130,7 +130,7 @@ r[expr.literal.char.intro] A character literal expression consists of a single [CHAR_LITERAL] token. r[expr.literal.char.type] -The expression's type is the primitive [`char`][textual types] type. +The expression's type is the primitive [`char`] type. r[expr.literal.char.no-suffix] The token must not have a suffix. 
@@ -151,7 +151,7 @@ r[expr.literal.char.single] * Otherwise the represented character is the single character that makes up the literal content. r[expr.literal.char.result] -The expression's value is the [`char`][textual types] corresponding to the represented character's [Unicode scalar value]. +The expression's value is the [`char`] corresponding to the represented character's [Unicode scalar value]. > [!NOTE] > The permitted forms of a [CHAR_LITERAL] token ensure that these rules always produce a single character. @@ -172,7 +172,7 @@ r[expr.literal.string.intro] A string literal expression consists of a single [STRING_LITERAL] or [RAW_STRING_LITERAL] token. r[expr.literal.string.type] -The expression's type is a shared reference (with `static` lifetime) to the primitive [`str`][textual types] type. +The expression's type is a shared reference (with `static` lifetime) to the primitive [`str`] type. That is, the type is `&'static str`. r[expr.literal.string.no-suffix] @@ -198,7 +198,7 @@ r[expr.literal.string.raw] * If the token is a [RAW_STRING_LITERAL], the represented string is identical to the literal content. r[expr.literal.string.result] -The expression's value is a reference to a statically allocated [`str`][textual types] containing the UTF-8 encoding of the represented string. +The expression's value is a reference to a statically allocated [`str`] containing the UTF-8 encoding of the represented string. 
Examples of string literal expressions: @@ -521,8 +521,9 @@ The expression's type is the primitive [boolean type], and its value is: [suffix]: ../tokens.md#suffixes [negation operator]: operator-expr.md#negation-operators [overflow]: operator-expr.md#overflow -[textual types]: ../types/textual.md [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value [Unicode scalar values]: http://www.unicode.org/glossary/#unicode_scalar_value +[`char`]: ../types/char.md [`f32::from_str`]: ../../core/primitive.f32.md#method.from_str [`f64::from_str`]: ../../core/primitive.f64.md#method.from_str +[`str`]: ../types/str.md diff --git a/src/names.md b/src/names.md index f5b1bda9bd..412f9ed6cd 100644 --- a/src/names.md +++ b/src/names.md @@ -75,7 +75,7 @@ The following entities are implicitly defined by the language, or are introduced r[names.implicit.primitive-types] * [Language prelude]: * [Boolean type] --- `bool` - * [Textual types] --- `char` and `str` + * Textual types --- [`char`] and [`str`] * [Integer types] --- `i8`, `i16`, `i32`, `i64`, `i128`, `u8`, `u16`, `u32`, `u64`, `u128` * [Machine-dependent integer types] --- `usize` and `isize` * [floating-point types] --- `f32` and `f64` @@ -113,6 +113,7 @@ Additionally, the crate root module does not have a name, but can be referred to [*scope*]: names/scopes.md [*visibility*]: visibility-and-privacy.md [`'static`]: keywords.md#weak-keywords +[`char`]: types/char.md [`for`]: expressions/loop-expr.md#iterator-loops [`if let`]: expressions/if-expr.md#if-let-patterns [`let` statement]: statements.md#let-statements @@ -120,6 +121,7 @@ Additionally, the crate root module does not have a name, but can be referred to [`macro_rules` declarations]: macros-by-example.md [`macro_use` attribute]: macros-by-example.md#the-macro_use-attribute [`match`]: expressions/match-expr.md +[`str`]: types/str.md [`while let`]: expressions/loop-expr.md#while-let-patterns [associated items]: items/associated-items.md [attributes]: 
attributes.md @@ -156,7 +158,6 @@ Additionally, the crate root module does not have a name, but can be referred to [Standard library prelude]: names/preludes.md#standard-library-prelude [Static item declarations]: items/static-items.md [struct]: items/structs.md -[Textual types]: types/textual.md [Tool attributes]: attributes.md#tool-attributes [tool lint attributes]: attributes/diagnostics.md#tool-lint-attributes [Trait item declarations]: items/traits.md diff --git a/src/names/namespaces.md b/src/names/namespaces.md index 7e76e60b16..1f4b5f0f01 100644 --- a/src/names/namespaces.md +++ b/src/names/namespaces.md @@ -17,7 +17,7 @@ The following is a list of namespaces, with their corresponding entities: * [Trait item declarations] * [Type aliases] * [Associated type declarations] - * Built-in types: [boolean], [numeric], and [textual] + * Built-in types: [boolean], [numeric], [`char`], and [`str`] * [Generic type parameters] * [`Self` type] * [Tool attribute modules] @@ -115,6 +115,7 @@ For example, the [`cfg` attribute] and the [`cfg` macro] are two different entit [`cfg` attribute]: ../conditional-compilation.md#the-cfg-attribute [`cfg` macro]: ../conditional-compilation.md#the-cfg-macro +[`char`]: ../types/char.md [`for`]: ../expressions/loop-expr.md#iterator-loops [`if let`]: ../expressions/if-expr.md#if-let-patterns [`let`]: ../statements.md#let-statements @@ -122,6 +123,7 @@ For example, the [`cfg` attribute] and the [`cfg` macro] are two different entit [`match`]: ../expressions/match-expr.md [`Self` constructors]: ../paths.md#self-1 [`Self` type]: ../paths.md#self-1 +[`str`]: ../types/str.md [`use` import]: ../items/use-declarations.md [`while let`]: ../expressions/loop-expr.md#while-let-patterns [Associated const declarations]: ../items/associated-items.md#associated-constants @@ -158,7 +160,6 @@ For example, the [`cfg` attribute] and the [`cfg` macro] are two different entit [Static item declarations]: ../items/static-items.md [Struct constructors]: 
../items/structs.md [Struct]: ../items/structs.md -[textual]: ../types/textual.md [Tool attribute modules]: ../attributes.md#tool-attributes [Tool attributes]: ../attributes.md#tool-attributes [Trait item declarations]: ../items/traits.md diff --git a/src/names/preludes.md b/src/names/preludes.md index d114a7d63c..0daf4a1d7a 100644 --- a/src/names/preludes.md +++ b/src/names/preludes.md @@ -122,7 +122,8 @@ It includes the following: * [Type namespace] * [Boolean type] --- `bool` - * [Textual types] --- `char` and `str` + * [`char`] + * [`str`] * [Integer types] --- `i8`, `i16`, `i32`, `i64`, `i128`, `u8`, `u16`, `u32`, `u64`, `u128` * [Machine-dependent integer types] --- `usize` and `isize` * [floating-point types] --- `f32` and `f64` @@ -223,10 +224,12 @@ r[names.preludes.no_implicit_prelude.edition2018] > [!EDITION-2018] > In the 2015 edition, the `no_implicit_prelude` attribute does not affect the [`macro_use` prelude], and all macros exported from the standard library are still included in the `macro_use` prelude. Starting in the 2018 edition, the attribute does remove the `macro_use` prelude. 
+[`char`]: ../types/char.md [`extern crate`]: ../items/extern-crates.md [`macro_use` attribute]: ../macros-by-example.md#the-macro_use-attribute [`macro_use` prelude]: #macro_use-prelude [`no_std` attribute]: #the-no_std-attribute +[`str`]: ../types/str.md [attribute]: ../attributes.md [Boolean type]: ../types/boolean.md [Built-in attributes]: ../attributes.md#built-in-attributes-index @@ -239,7 +242,6 @@ r[names.preludes.no_implicit_prelude.edition2018] [Macro namespace]: namespaces.md [name resolution]: name-resolution.md [standard library prelude]: names.preludes.std -[Textual types]: ../types/textual.md [tool attributes]: ../attributes.md#tool-attributes [Tool prelude]: #tool-prelude [Type namespace]: namespaces.md diff --git a/src/types.md b/src/types.md index de67050ad0..42dd2ca75a 100644 --- a/src/types.md +++ b/src/types.md @@ -19,7 +19,8 @@ The list of types is: * Primitive types: * [Boolean] --- `bool` * [Numeric] --- integer and float - * [Textual] --- `char` and `str` + * [`char`] + * [`str`] * [Never] --- `!` --- a type with no values * Sequence types: * [Tuple] @@ -76,7 +77,7 @@ r[type.name.sequence] r[type.name.path] * [Type paths] which can reference: - * Primitive types ([boolean], [numeric], [textual]). + * Primitive types ([boolean], [numeric], [`char`], [`str`]). * Paths to an [item] ([struct], [enum], [union], [type alias], [trait]). * [`Self` path] where `Self` is the implementing type. * Generic [type parameters]. 
@@ -149,6 +150,8 @@ enum List { let a: List = List::Cons(7, Box::new(List::Cons(13, Box::new(List::Nil)))); ``` +[`char`]: types/char.md +[`str`]: types/str.md [Array]: types/array.md [Boolean]: types/boolean.md [Closures]: types/closure.md @@ -163,7 +166,6 @@ let a: List = List::Cons(7, Box::new(List::Cons(13, Box::new(List::Nil)))); [References]: types/pointer.md#shared-references- [Slice]: types/slice.md [Struct]: types/struct.md -[Textual]: types/textual.md [Trait objects]: types/trait-object.md [Tuple]: types/tuple.md [Type paths]: paths.md#paths-in-types diff --git a/src/types/char.md b/src/types/char.md new file mode 100644 index 0000000000..c242c14e40 --- /dev/null +++ b/src/types/char.md @@ -0,0 +1,27 @@ +r[type.char] +# Character type + +r[type.char.intro] +The `char` type represents a single [Unicode scalar value] (i.e., a code point that is not a surrogate). + +> [!EXAMPLE] +> ```rust +> let c: char = 'a'; +> let emoji: char = '😀'; +> let unicode: char = '\u{1F600}'; +> ``` + +> [!NOTE] +> See [the standard library docs][`char`] for information on the impls of the `char` type. + +r[type.char.value] +A value of type `char` is represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF or 0xE000 to 0x10FFFF range. It is immediate [undefined behavior] to create a `char` that falls outside this range. + +r[type.char.layout] +`char` is guaranteed to have the same size and alignment as `u32` on all platforms. + +r[type.char.validity] +Every byte of a `char` is guaranteed to be initialized. In other words, `transmute::<char, [u8; size_of::<char>()]>(...)` is always sound -- but since some bit patterns are invalid `char`s, the inverse is not always sound. 
+ +[Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value +[undefined behavior]: ../behavior-considered-undefined.md diff --git a/src/types/str.md b/src/types/str.md new file mode 100644 index 0000000000..21f6d1c9e8 --- /dev/null +++ b/src/types/str.md @@ -0,0 +1,25 @@ +r[type.str] +# String slice type + +r[type.str.intro] +The `str` type represents a sequence of characters. + +```rust +let greeting1: &str = "Hello, world!"; +let greeting2: &str = "你好,世界"; +``` + +> [!NOTE] +> See [the standard library docs][`str`] for information on the impls of the `str` type. + +r[type.str.value] +A value of type `str` is represented in the same way as `[u8]`, a slice of 8-bit unsigned bytes. + +> [!NOTE] +> The standard library makes extra assumptions about `str`: methods working on `str` assume and ensure that the data it contains is valid UTF-8. Calling a `str` method with a non-UTF-8 buffer can cause [undefined behavior] now or in the future. + +r[type.str.unsized] +A `str` is a [dynamically sized type]. It can only be instantiated through a pointer type, such as `&str`. The layout of `&str` is the same as the layout of `&[u8]`. + +[undefined behavior]: ../behavior-considered-undefined.md +[dynamically sized type]: ../dynamically-sized-types.md diff --git a/src/types/textual.md b/src/types/textual.md index 32722e7203..05068309a7 100644 --- a/src/types/textual.md +++ b/src/types/textual.md @@ -5,26 +5,16 @@ r[type.text.intro] The types `char` and `str` hold textual data. r[type.text.char-value] -A value of type `char` is a [Unicode scalar value] (i.e. a code point that is -not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF -or 0xE000 to 0x10FFFF range. +A value of type `char` is a [Unicode scalar value] (i.e. a code point that is not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF or 0xE000 to 0x10FFFF range. 
r[type.text.char-precondition] -It is immediate [undefined behavior] to create a -`char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32 -string of length 1. +It is immediate [undefined behavior] to create a `char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32 string of length 1. r[type.text.str-value] -A value of type `str` is represented the same way as `[u8]`, a slice of -8-bit unsigned bytes. However, the Rust standard library makes extra assumptions -about `str`: methods working on `str` assume and ensure that the data in there -is valid UTF-8. Calling a `str` method with a non-UTF-8 buffer can cause -[undefined behavior] now or in the future. +A value of type `str` is represented the same way as `[u8]`, a slice of 8-bit unsigned bytes. However, the Rust standard library makes extra assumptions about `str`: methods working on `str` assume and ensure that the data in there is valid UTF-8. Calling a `str` method with a non-UTF-8 buffer can cause [undefined behavior] now or in the future. r[type.text.str-unsized] -Since `str` is a [dynamically sized type], it can only be instantiated through a -pointer type, such as `&str`. The layout of `&str` is the same as the layout of -`&[u8]`. +Since `str` is a [dynamically sized type], it can only be instantiated through a pointer type, such as `&str`. The layout of `&str` is the same as the layout of `&[u8]`. r[type.text.layout] ## Layout and bit validity @@ -33,9 +23,7 @@ r[type.layout.char-layout] `char` is guaranteed to have the same size and alignment as `u32` on all platforms. r[type.layout.char-validity] -Every byte of a `char` is guaranteed to be initialized (in other words, -`transmute::<char, [u8; size_of::<char>()]>(...)` is always sound -- but since -some bit patterns are invalid `char`s, the inverse is not always sound). 
+Every byte of a `char` is guaranteed to be initialized (in other words, `transmute::<char, [u8; size_of::<char>()]>(...)` is always sound -- but since some bit patterns are invalid `char`s, the inverse is not always sound). [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value [undefined behavior]: ../behavior-considered-undefined.md
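
The layout and validity rules touched by this change can be checked with a short program. This is only an illustrative sketch and not part of the patch: the helper names `char_to_bytes` and `checked_char` are hypothetical, and the checked conversion via `char::from_u32` is shown as one safe alternative to the unsound inverse transmute.

```rust
use std::mem::{align_of, size_of, transmute};

/// Hypothetical helper: view a `char` as its raw bytes.
/// Sound because every byte of a `char` is initialized.
fn char_to_bytes(c: char) -> [u8; size_of::<char>()] {
    unsafe { transmute::<char, [u8; size_of::<char>()]>(c) }
}

/// Hypothetical helper: the safe direction for the inverse.
/// Surrogates and values above 0x10FFFF are rejected instead of
/// producing an invalid `char`.
fn checked_char(value: u32) -> Option<char> {
    char::from_u32(value)
}

fn main() {
    // `char` has the same size and alignment as `u32`.
    assert_eq!(size_of::<char>(), size_of::<u32>());
    assert_eq!(align_of::<char>(), align_of::<u32>());

    // The bytes of 'a' decode to its scalar value, 0x61.
    assert_eq!(u32::from_ne_bytes(char_to_bytes('a')), 0x61);

    // Values outside the two valid ranges are not `char`s.
    assert_eq!(checked_char(0x1F600), Some('😀'));
    assert_eq!(checked_char(0xD800), None);
    assert_eq!(checked_char(0x11_0000), None);

    // `&str` has the same layout as `&[u8]`, and `len` counts bytes.
    let greeting: &str = "你好";
    assert_eq!(size_of::<&str>(), size_of::<&[u8]>());
    assert_eq!(greeting.len(), greeting.as_bytes().len());

    // Constructing a `str` from bytes goes through a UTF-8 check.
    assert!(std::str::from_utf8(&[0xFF, 0xFE]).is_err());
}
```

Going from `u32` or bytes back to `char` through a checked constructor keeps the out-of-range cases as `None` rather than undefined behavior.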