diff --git a/2020-global/README.md b/2020-global/README.md index f610e0d..f8cb5ae 100644 --- a/2020-global/README.md +++ b/2020-global/README.md @@ -10,32 +10,34 @@ Thanks! ## List of talks +[All collected limericks (speaker introductions)](./llogiq.md) - written and presented by Andre Bogus 'llogiq' + ### APAC Block -0. [Using Rust in Metal Fabrication](./talks/01_APAC/04-Aki.txt) - Aki -0. [Piecing together Rust: It is more than just writing code](./talks/01_APAC/05-Tarun-Pothulapti.txt) - Tarun Pothulapti -0. [Everything is serialization](./talks/01_APAC/06-Zac-Burns.txt) - Zac Burns -0. [Architect a High-performance SQL Query Engine in Rust](./talks/01_APAC/07-Jin-Mingjian.txt) - Jin Mingjian +0. [Using Rust in Metal Fabrication](./talks/01_APAC/04-Aki-published.md) - Aki +0. [Piecing together Rust: It is more than just writing code](./talks/01_APAC/05-Tarun-Pothulapti-published.md) - Tarun Pothulapti +0. [Everything is serialization](./talks/01_APAC/06-Zac-Burns-published.md) - Zac Burns +0. [Architect a High-performance SQL Query Engine in Rust](./talks/01_APAC/07-Jin-Mingjian-published.md) - Jin Mingjian ### UTC Block -0. [Learnable Programming with Rust](./talks/02_UTC/01-Nikita-Baksalyar.txt) - Nikita Baksalyar -0. [Build your own (Rust-y) robot!](./talks/02_UTC/02-Aissata-Maiga.txt) - Aïssata Maiga -0. [Rust for Safer Protocol Development](./talks/02_UTC/03-Vivian-Band.txt) - Vivian Band -0. [Rust as foundation in a polyglot development environment](./talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.txt) - Gavin Mendel-Gleason & Matthijs van Otterdijk -0. [Rust for Artists. Art for Rustaceans.](./talks/02_UTC/05-Anastasia-Opara.txt) - Anastasia Opara -0. [Miri, Undefined Behavior and Foreign Functions](./talks/02_UTC/06-Christian-Poveda.txt) - Christian Poveda -0. [RFC: Secret types in Rust](./talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.txt) - Diane Hosfelt & Daan Sprenkels +0. [Learnable Programming with Rust](./talks/02_UTC/01-Nikita-Baksalyar.md) - Nikita Baksalyar +0. [Build your own (Rust-y) robot!](./talks/02_UTC/02-Aissata-Maiga.md) - Aïssata Maiga +0. [Rust for Safer Protocol Development](./talks/02_UTC/03-Vivian-Band.md) - Vivian Band +0. [Rust as foundation in a polyglot development environment](./talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.md) - Gavin Mendel-Gleason & Matthijs van Otterdijk +0. [Rust for Artists. Art for Rustaceans.](./talks/02_UTC/05-Anastasia-Opara.md) - Anastasia Opara +0. [Miri, Undefined Behavior and Foreign Functions](./talks/02_UTC/06-Christian-Poveda.md) - Christian Poveda +0. [RFC: Secret types in Rust](./talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.md) - Diane Hosfelt & Daan Sprenkels ### LATAM Block -0. [Learning Rust with Humility and in Three Steps](./talks/03_LATAM/01-Stefan-Baerisch.txt) - Stefan Baerisch -0. [Ochre: Highly portable GPU-accelerated vector graphics](./talks/03_LATAM/02-glowcoil.txt) - glowcoil -0. [The Anatomy of Error Messages in Rust](./talks/03_LATAM/03-Sean-Chen.txt) - Sean Chen -0. [Considering Rust for scientific software](./talks/03_LATAM/04-Max-Orok.txt) - Max Orok -0. [Project Necromancy: How to Revive a Dead Rust Project](./talks/03_LATAM/05-Micah-Tigley-and-Carlo-Supina.txt) - Micah Tigley & Carlo Supina -0. [Tier 3 Means Getting Your Hands Dirty](./talks/03_LATAM/06-Andrew-Dona-Couch.txt) - Andrew Dona-Couch -0. [Rust for Freshmen](./talks/03_LATAM/07-Colton-Donnelly.txt) - Colton Donnelly +0. 
[Learning Rust with Humility and in Three Steps](./talks/03_LATAM/01-Stefan-Baerisch-published.md) - Stefan Baerisch +0. [Ochre: Highly portable GPU-accelerated vector graphics](./talks/03_LATAM/02-glowcoil-published.md) - glowcoil +0. [The Anatomy of Error Messages in Rust](./talks/03_LATAM/03-Sean-Chen-published.md) - Sean Chen +0. [Considering Rust for scientific software](./talks/03_LATAM/04-Max-Orok-published.md) - Max Orok +0. [Project Necromancy: How to Revive a Dead Rust Project](./talks/03_LATAM/05-Micah-Tigley-and-Carlo-Supina-published.md) - Micah Tigley & Carlo Supina +0. [Tier 3 Means Getting Your Hands Dirty](./talks/03_LATAM/06-Andrew-Dona-Couch-published.md) - Andrew Dona-Couch +0. [Rust for Freshmen](./talks/03_LATAM/07-Colton-Donnelly-published.md) - Colton Donnelly ## License diff --git a/2020-global/llogiq.md b/2020-global/llogiq.md new file mode 100644 index 0000000..1d726e2 --- /dev/null +++ b/2020-global/llogiq.md @@ -0,0 +1,131 @@ +**RustFest Global 2020 Speaker Introductions** + +**keen** is, as the name implicates +very keen on how he creates +with code many kind +of lisps and behind +all this are, well, some Rusty crates + +**Yousuke Onoue** will teach +a way the Rust language can reach +the web, so assemble +make JavaScript tremble +and onwards go into the breach + +**Tomohiro Kato** does try +to get a Rust-based A.I. +into chips to embed +to get stuff on the net +here's hope that the circuits won't fry + +**Aki** has the unbreakable will +to get Rust to the metal, to mill +it into shapes snappy +make customers happy +though this battle is somewhat uphill + +**Tarun** helpfully teaches Rust +to newbies who start out and must +learn the concepts, the tools +and the various rules +until their own experience they trust + +**Zac Burns** wants to serialize +some ideas that we all won't despise +into one talk to make +us see what it will take +to make code easier to realize + +**Jin Mingjian** uses Rust to enhance +some database apps' performance +as he breaks apart +the state of the art +to make hashtables and b-trees dance + +❖ + +**Nikita** makes Rust interactive +so if learning it is your directive +you won't need to fight +to see what's inside +to become a debugging detective + +**Aïssata Maiga** lets me know +how to make bots without Arduino +writing Rust to move +her bot to my groove +Sure there will be some cool stuff to see, no? + +**Daan and Diane** get us to the hype +Of keeping secrets in a type +Disallowing creation +of some optimization +that just might tell the feds what you type + +**Gavin and Matthijs** show how one might +a large project in Rust rewrite +start out small, let it grow +until stealing the show +from whatever was there before, right? 
+

**Vivian** wants us to be safe
and our code on the web to behave
use Rust to generate
code that will validate
risky inputs, no need to be brave

Miri is Rust's interpreter
And **Christian** will gladly debate'er
On how to bequeath
her the stuff underneath
so she can run until much later

**Anastasia** plays Rust like a flute
or maybe a magical lute
to then simulate
things that art may create
and this art does really compute

❖

**Stefan** gives us three steps to learn Rust
Not saying that follow you must,
but if humble you are
with Rust you'll go far
as you learn the compiler to trust

**Glowcoil** shows how vectors can act
to create a great UI, in fact
they are easy to do
on a slow GPU
and they won't fall together when stacked

**Sean Chen** wants to show the appeal
of nicely with errors to deal
seeing rustc's example
there really are ample
suggestions you really should steal

**Max Orok** shows science, not fiction
and Rust ain't no contradiction
it sure won't spill your beans,
so use it by all means
if permitted by your jurisdiction

**Carlo Supina and Micah** strive
left for dead Rust projects to revive
by making it dress
up with an ECS
now it's perfectly looking alive.

**Andrew Dona-Couch** will now go
to the farthest reach of Rust to show
if you're willing to get
your coding feet wet
Tier three has got some room to grow

**Colton Donnelly** takes Rust to school
to show freshmen the language is cool
and capable, fun
great to fail or to run
all in all it's a great teaching tool diff --git a/2020-global/talks/01_APAC/04-Aki-published.md b/2020-global/talks/01_APAC/04-Aki-published.md new file mode 100644 index 0000000..79a3f6a --- /dev/null +++ b/2020-global/talks/01_APAC/04-Aki-published.md @@ -0,0 +1,44 @@ +**Using Rust in Metal Fabrication**

**Bard:**
Aki has the unbreakable will
to get Rust to the metal, to mill
it into shapes snappy
make customers happy
though this battle is somewhat uphill


**Aki:**
All right. Thank you very much. Good day, everybody. Thank you very much for joining me today. My name is Aki. Today I would like to share some of my experiences using Rust in metal fabrication. Let's get started. Of course, I am not referring to the oxidized kind of rust, which is generally not very desirable in metal fabrication: the oxygen in the air reacts with iron to form the weaker substance we know as rust. Today I would like to explore uses of a more desirable Rust, the Rust programming language, and share my experience using it to develop enterprise software. Today's talk will be a primarily non-technical perspective on using Rust in the development of enterprise software. What is metal fabrication anyway? Well, it refers to the manufacturing processes that give us these goodies in life. Large parts of this Ferrari are made from metal: the engine blocks are carved from large blocks of metal and the hubcaps are created through metal fabrication processes. It also brings us the metal factory equipment that bottles our beloved Coca-Cola drinks and the conveyor belts that run the Amazon warehouses that bring the boxes to our front doors. And of course, our beloved gaming rigs, with the metal power supplies and the cases all made out of sheet metal bent and cut into just the right shapes.

So, let's take a look at the processes that are used to modify the raw metal into these goods that we know and love.
I did what anybody would do and asked Google image search what metal fabrication was. As you can see, a lot of metal and lots of sparks. On the right here, we see a welder. This is a process known as welding, where metal parts are melted to form one solid block. Here, with more sparks, we see a laser cutter, which uses high-intensity laser beams to cut shapes out of sheets of metal. As we can imagine, these are expensive, high-precision machines, and utilizing this manufacturing equipment well is very, very important to the success of any manufacturing operation, so the metal fabrication industry benefits quite a bit from the use of software to run the business. Software is used throughout all phases of metal fabrication, from design and simulation on the left all the way to final delivery on the right.

In design, we have 3D CAD models - computer-aided design - where the end product is designed on the computer using simulation tools for aerodynamic testing and whatnot, as well as something we software developers know as version control: the equivalent of making sure that the various iterations are tracked and the differences can be seen. Once the design is complete we need to go purchase the raw materials. This requires the exchange of money, and that requires workflows and integration with accounting systems. Once the materials are on hand, we need to make sure we utilize these expensive resources, scheduling workers and the manufacturing equipment to optimally plan production. Once the product is created, we need to deliver it, which means forklifts, tracking inventory, and knowing when it is finally delivered. Most of the information in today's talk is around my experience developing software for the right-hand three areas: purchasing, manufacturing, and logistics. These three areas are primarily about organizing that information and tracking the flow of physical goods and money.

A brief overview of the system architecture. My development team builds enterprise software using the browser as the primary interface, so our architecture looks very much like a web application with a front-end and some back-end processes. But one of the most important things that we have done is recognize that in enterprise software the domain is complex, and we achieve a higher level of efficiency by separating this out. On the left is the front-end, which is designed to optimally present data. The BFF is a TypeScript Node.js process; its role is to fetch data from wherever the data is, whether it is an API, another system, or a third-party API - whatever it is, its job is to go and get the data. And finally, the backend. This is written in Rust. Its goal is to enforce the domain model and maintain data integrity. This is the area that I would like to talk about today.

First, I would like to cover some of the technical aspects of using Rust, specifically using Rust in the domain-driven design sense. And second, our impressions of the library ecosystem. Domain-driven design is one of the most important concepts we value in the development of enterprise software; the term was coined by Eric Evans in his 2003 book, which emphasizes the importance for software developers of understanding the real-world domain and trying to model and express it in software. So, as an example: in metal fabrication we use paint to prevent rusting. Given the various paints and colors, let's try to create a data structure to express the color of paint in Rust.
RGB is a representation that we use often in web development; let's say 8-bit values can be used. In printing we may need to use CMYK, another expression of color with a different set of tuples. Coming from C++, this way of expressing typed unions in Rust has been incredibly powerful, but as with all things in the real world, things were not as easy as they seem. We found that in certain areas of the industry, our customers didn't know what color they wanted, or were not able to give us the RGB or CMYK color they wanted. They had a physical product they had made and wanted the same color as that. As a manufacturer, we needed to make sure our enterprise system would be able to handle this new way of specifying color, and Rust enums allowed us to do this: create a variant to express a physical color sample (a minimal sketch of this pattern appears at the end of this section). This was a simple way to know we needed to get the physical color sample from the customer. Handling this color is easy with a match statement. We can easily decompose each RGB or CMYK value and utilize them appropriately. And the Rust compiler, with the ability of the match statement to tell us when we are missing things, has actually saved us many, many times. Here, the commented-out line for the physical color sample in the match statement would generate a compiler error, so if somebody modifying the domain model, or the actual struct, did not realize all of the places where it was being used, we could simply ask the compiler to detect it for us.

The next aspect after enums that we have utilized extensively is using phantom data types as markers for compile-time type checking (also sketched below). We use money in enterprise software, and we use multiple currencies, so if we were to have a money struct here, of course, we would have an amount as some kind of decimal value, and the currency could be held as some kind of string. But that would require us to do runtime checks, which are, of course, sometimes necessary; when we have business logic coded in, though, we often want compile-time checking. By using phantom data types here, we can create marker types for the Japanese yen or the U.S. dollar, so when we write a statement like this, where we try to add Japanese yen to U.S. dollars, we get a friendly compile error that tells us exactly what happened. We use this not just for monetary types but also for physical types such as meters: length versus volume versus area. It doesn't make sense to add a volume to a length, for example, and using these kinds of phantom data types has helped us extensively and optimized our development process.

Next, I would like to talk about some of our impressions of the library ecosystem. The good parts: we actually found very high-quality coverage across our needs. We use RabbitMQ, Redis, Postgres and gRPC. We have listed libraries we have used and had good experiences with. As for the wishes that we had, it would have been great if we could find more production use-cases, because "production ready" can be a very subjective term, and using a real-world product as an example really adds concreteness to it. We started using tower-grpc because Linkerd was using it.

When async/await rolled around we were not sure about jumping on the bandwagon, and in the end we don't use it; we scale horizontally until we can handle the load. Some guidance as to where async/await is going would be interesting, and we are watching the community to see where things are going right now.
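A minimal sketch of the enum-and-match pattern Aki describes - the variant and function names here are illustrative, since the transcript does not reproduce the slide code:

```rust
/// The color of a paint, in whichever form the customer can provide it.
enum Color {
    /// 8-bit red/green/blue, common on the web.
    Rgb(u8, u8, u8),
    /// Cyan/magenta/yellow/key, common in printing.
    Cmyk(u8, u8, u8, u8),
    /// The customer hands us a physical sample to match.
    PhysicalColorSample,
}

fn describe(color: &Color) -> String {
    // If a variant is added to `Color` and an arm here is missing
    // (try commenting one out), the compiler rejects the match.
    match color {
        Color::Rgb(r, g, b) => format!("RGB({}, {}, {})", r, g, b),
        Color::Cmyk(c, m, y, k) => format!("CMYK({}, {}, {}, {})", c, m, y, k),
        Color::PhysicalColorSample => "match the customer's physical sample".to_string(),
    }
}

fn main() {
    println!("{}", describe(&Color::Rgb(255, 128, 0)));
}
```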
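And a sketch of the phantom-type money pattern, again with assumed names; a real system would use a decimal type for the amount:

```rust
use std::marker::PhantomData;
use std::ops::Add;

// Zero-sized marker types for currencies.
struct Jpy;
struct Usd;

/// An amount of money in currency `C`; `C` exists only at compile time.
struct Money<C> {
    amount: i64, // minor units, to keep the sketch simple
    _currency: PhantomData<C>,
}

impl<C> Money<C> {
    fn new(amount: i64) -> Self {
        Money { amount, _currency: PhantomData }
    }
}

// Addition is only defined between amounts of the *same* currency.
impl<C> Add for Money<C> {
    type Output = Money<C>;
    fn add(self, rhs: Self) -> Self::Output {
        Money::new(self.amount + rhs.amount)
    }
}

fn main() {
    let lunch = Money::<Jpy>::new(1_000);
    let coffee = Money::<Jpy>::new(500);
    let _total = lunch + coffee; // fine: yen + yen

    let _tip = Money::<Usd>::new(5);
    // let _bad = Money::<Jpy>::new(1) + Money::<Usd>::new(1); // compile error
}
```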
And finally, I would like to cover some things that are non-technical: primarily, hiring for Rust, training new engineers, how to maintain a good community within the development team and, finally, a bit about community work. Deciding what qualities to look for in a candidate when your technology stack uses Rust is incredibly challenging, because it is such a bleeding-edge technology - so new, compared to languages that have been around a lot longer, that it can be hard to figure out what to write in a job description. Here in Japan, the way we have decided to do hiring is to look for the core attributes that make somebody a good Rust engineer.

For us, what has worked for the last year or two is familiarity with a type system with traits and generics, or something resembling that, such as Java's or TypeScript's; the fundamentals of functional programming; and familiarity with computer architecture, memory management and the low-level aspects of how programs run. Of course, if somebody had all three of these aspects they would probably already have touched Rust a little bit, but we have found that many engineers from other languages have been very, very successful using Rust in the development of enterprise software with us, because of their strengths in one or more of these areas. And as with many engineers, we don't write code just to write code; we write code to make a change. Enterprise can be kind of a black box, so we have made efforts to make sure people know what we make using Rust: what is Rust being used for, and how are we optimizing the metal fabrication supply chain using Rust? And once we have new hires, training them has required new kinds of efforts, because it sometimes means relearning foundational concepts and learning to talk with the compiler. We have found that once people get used to talking to the compiler they don't need very much hand-holding; they get up to speed very, very quickly. And finally, Rust is good at some things and not as good at others, so we make sure everybody is aware of how our development team uses Rust and where we choose not to use it. In the development of the frontend it is probably much easier to use React than Rust. And as a development team, having a critical mass of Rustaceans is critically important, because it allows us to create a community within the development team and foster a culture - the ecosystem is changing constantly and the language is always evolving. We have found that fostering this kind of culture has been very, very important to the success of using Rust within our development team. And finally, the success of the Rust community contributes to the success of the business and of our development team. So as an enterprise software development corporation, it is important that we participate in the community; with the pandemic, here in Tokyo a lot of things started shutting down in March, so in April we started an online meetup called Shitamachi-rs. We believe it is important to keep the community alive and make sure there is constant innovation and that people are always engaged.

And with that, I would like to give a few closing words about using Rust in metal fabrication. Metal fabrication itself is a very, very complicated real-world domain, but Rust has been an amazing language for developing this software. The library ecosystem maturity has actually been good enough for us, though at the beginning it was a bit difficult for us to gauge.
Using Rust itself has been technically an amazing experience, and from a non-technical perspective, different kinds of efforts were required for hiring and training to maintain this kind of efficiency in our development, but the friendly and welcoming community has been absolutely wonderful. The resources are available in Japanese as well. So, as my last words: as corporations, enterprise users, commercial entities, or any development team for that matter, I think it is really, really important to share real-world use-cases of Rust and to really support the Rust community, because the success of the community gives back to the success of every project that we work on. So thank you very much for your time today. It was a pleasure.


**Moderator:**
Thanks, Aki. If you have any questions, please leave comments in the chat. Anyway, there are some questions... Aki, do your Rust structures touch or mirror your TypeScript types in your BFF layer in any way?

**Aki:**
The Rust structures are reflected in the TypeScript layer, in fact. We use gRPC in between, and we can share some of those definition files so that we can translate them to both sides easily. However, of course, the type systems are not exactly the same in Rust and TypeScript, so there is a little bit of massaging we need to do to make sure we can handle it, but Rust and TypeScript have similar notions, and that has been instrumental in increasing the efficiency of our development process.

**Moderator:**
Any other questions? OK. Thank you, Aki. That's all.

**Aki:**
All right. Thank you very much. diff --git a/2020-global/talks/01_APAC/04-Aki.txt b/2020-global/talks/01_APAC/04-Aki.txt deleted file mode 100644 index 9d64e63..0000000 --- a/2020-global/talks/01_APAC/04-Aki.txt +++ /dev/null @@ -1,7 +0,0 @@ -MODERATOR: Aki has the unbreakable will to get Rust to the metal and to mill it into shapes snappy and make customers happy though this battle is somewhat uphill.
-AKI: All right. Thank you very much. Good day, everybody. Thank you very much for joining me today. My name is Aki. Today I would like to share some of my experiences using Rust in metal fabrication. Let's get started. Of course, I am not referring to this kind of oxidized kind of Rust which is generally not very desirable in metal fabrication. The oxygen in the air reacts with iron to form the weaker substance we know as rust. Today I would like to explore uses of a more desirable rust. The Rust programming language and share my experience using it to develop enterprise software. Today's talk will be a primarily non technical perspective on using Rust in the development of enterprise software. What is metal fabrication anyways? Well, it refers to the manufacturing process that gives us these goodies in life. Large parts of this Ferari are made from metal. The engine blocks are carved from large block of metal and the hub caps are created through metal fabrication processes. It also brings us the metal factory equipment that bottles our beloved Coca-Cola drinks and the belt conveyer that run the Amazon warehouses that bring the boxes to our front doors. And of course, our beloved gaming rigs with the metal power supplies and the cases all made out of sheet metal bent and cut into just the right shapes. So, let's take a look at what the processes are that are used to modify the raw metal into the these goods that we know and love.
I did what anybody would do and asked Google image search what metal fabrication was. As you can see, a lot of metal and lots of sparks. On the right here, we see a welder. This is a process known as welder where metal parts are melted to form one solid block. Here with more sparks we see a laser cutter which uses high intensity laser beams to cut shapes out of sheets of metal. As we can imagine, these are expensive, high precision machines and using this manufacturing equipment is very, very important to the success of any manufacturing operation so the metal fabrication industry benefits quite a bit from the use of software to run the business. Software is used throughout all phases of metal fabrication from design and simulation on the left all the way to final delivery on the right. In design, we have CAD 3D model, computer-aided design, where the end product is designed on the computer using simulation tools for aero dynamic testing and whatnot as well as something we software developers know as version control, the equivalent of making sure that various iterations are tracked and the differences can be seen. Once the design is complete we need to go purchase the raw materials. This requires the exchange of money and that requires workflows and integration with counting systems. Once the materials are on hand, then we need to make sure we utilize these expensive resources, the manufacturing equipment and scheduling workers and the machines to optimally plan the manufacturing system. Most -- once it is created we need to deliver it to requiring this forklift and track inventory and know when it is finally delivered. Most of the information in today's talk is around my experience developing the right hand three and developing software for these areas. Purchasing, manufacturing, and logistics. These three areas are created primarily when we organize that information and track the flow of physical goods and the money. A brief overview of the system architecture. My development team builds enterprise software using the browser as the primary interface so our architecture looks very much like a web application with a front-end and some back-end processes. But one of the most important things that we have done is in enterprise software the Des Moines -- the domain is complex and we are achieving a higher level of efficiency separating this out. On the left is the front-end which is designed to optimaly present data. The BFF is a TypeScript node.js process its role is to fetch data from wherever the data is whether it is an API, another system, a third party API, whatever it is its job is to go and get the data. And finally, the backend. This is written in Rust. It's goal is to enforce that the main model and maintain data integrity. This is the area that I would like to talk about today. First, I would like to cover some of the technical aspects of using Rust specifically using Rust in the domain driven design sense. And second, our impressions of the library ecosystem. The main driven design is one of the most important concepts we value in the development of enterprise software and it is a coin termed by Eric evans in this 2003 book emphasizing the importance of for software developers to understand the real world domain and try to model and express it in software. So, as an example, in metal fabrication we use paint to prevent rusting, given the various paints and colors, let try to create a data structure to express the color of paint in Rust. 
RGB is a way that we use often in web development and let's say 8-bit colors can be used. Take in printing and we may need to use CMYK so another expression of color with a different set of tuples. Coming from C++, this way of expressing typed unions in Rust has been incredibly powerful, but as with all things in the real world, things were not as easy as they seem. We found in certain areas of the industry, our customers didn't know what color they wanted or were not able to give us an RBG or a CMYK color they wanted. They had a physical product they made and wanted the same color as that. As an industry and a manufacturer, we need to make sure our enterprise system would be able to handle this new way of specifying color so Rust enum allowed us to do this. Create a way to express physical color sample. This was a simple way to know we needed to get the physical color sample from the customer. Handling this color is easy with a match statement. We can easily decompose each RGC or CMYK and utilize them appropriately. And the Rust compiler and the ability for the match statement to tell us when we are missing things has actually saved us many, many times. Here, the commented outline, for physical color sample in the match statement would generate a compiler error and so if somebody modifying the domain model, or the actual struct, did not realize all of the places where it was being used, we could simply ask the compiler to detect it for us. The next aspect after enums that we have utilized extensively is using phantom data types as markers for compile data type checking. We use money in enterprise and we use multiple currencies and so if we were to have a money struct here, of course, we would have an amount as some kind of decimal value and the currency could be held as some kind of string but that would require us to do run time checks which are, of course, sometimes necessary but when we have business logic, coded in, we often want to have compile time checking. By using phantom data types here, we can create emums for a Japanese Yin or U.S. dollar so when we create a statement like this where we try to add the Japanese Yin to the U.S. dollar we would get a friendly compile error that tells us exactly what happened. We use this not just for monetary types but also for physical types such as meters, length versus volume versus area. It doesn't make sense to add volume to a length, for example, and using these kind of phantom data types has helped us extensively and optimized our development process. Next, I would like to talk about some of our impressions of the library ecosystem. The good parts. We actually found a very high-quality coverage across our needs. We use rabbitMQ, Redis and Postgres and gRPC. We have listed libraries we have used and had good experiences with them. As for the wishes that we had, it would be great -- it would have been great if we could find more production use-cases because production ready can be a very subjective term but the real world -- using a real-world product as an example really adds concreteness to it. We started using Tower-g rec pc because linkerd was using it. -When async/await rolled around we were not sure about jumping on the bandwagon and in the end we don't use it. We horizontally scale until we can handle it. Some guidance as to where the async/await is going is interesting and we are watching the community to see where things are going right now. And finally, I would like to cover some things that are non-technical. 
Primarily, hiring for Rust, new engineering training, and how to maintain a good community within the development and finally, a bit about the community work. Deciding what qualities to look for in a candidate when your technology stack uses Rust is incredibly challenging because it is such a bleeding edge technology and so new that compared to other languages that have been around for a lot longer it can be hard to figure out what to write in a job description. Here in Japan, one of the ways that we have decided to do hiring is to look for the core aspects of what are the attributes that make somebody a good Rust engineer. For us, what worked and has worked for the last year or two, is the familiarity with a type system with traits and generics or something resembling that such Java or TypeScript as well as functions of functional programming and familiar with computer architecture and management and low level aspect of how programs run. Of course, if somebody had all three of these aspects, they probably already have touched Rust a little but, but we have found many engineers from other languages have been very, very successful using Rust in the development of enterprise software with us and it was because of their strengths in one or more of these areas. And as we with many engineers, we don't write code just to write code but we write good code to make a change. Enterprise can be kind of a black box so we have made efforts to make sure people know what we make using Rust and what is Rust being used for and how are we optimizing the metal fabrication supply chain using Rust? And once we have new hires, training them has been required new kinds of efforts because sometimes relearning new foundational concepts and learning to talk with the compiler. We have found once people get used to talking to the compiler they don't need very much hand holding. They will get up to speed very, very quickly. And finally, Rust is good at some things and not as good at some other things so making sure everybody is aware of how our development team uses Rust and where we choose not to use Rust. In the development of the frontend it is probably much easier to use React than to use Rust. And as a development team, having a critical mass of Rustaceans is critically important because it allows us to create a community within the development team and fostering a culture because the ecosystem is changing constantly and the language is always evolving. We have found fostering this kind of culture has been very, very important to the success of using Rust within our development team. And finally, the success of the Rust community contributes to the success of the business and our development team. So as an enterprise software development corporation, it is important that we participate in the community and so, with the pandemic, here in Tokyo a lot of things started shutdown in March so in April we started an online Meetup called Shitamachi-rs. We believe it is important to keep the community alive and make sure there is constant innovation and that people are always engaged. And with that, I would like to give a few closing words about using Rust in metal fabrication. Metal fabrication itself is a very, very complicated realworld domain but Rust has been an amazing language for developing this software. The library ecosystem maturity has been actually good enough for us but at the beginning it was a bit difficult for us to gauge. 
Using Rust itself has been technically an amazing experience and from a non-technical perspective, different kind of efforts were required for hiring and training to maintain this kind of efficiency in our development but the friendly and welcoming committee has been absolutely wonderful. The resources are available in Japanese as well. So, as my last words, let's, as corporations and any enterprise user, or any commercial entity, or any development team for that matter, I think it is really, really important to share real-world use-cases of Rust and to really support the Rust of the community because the success of the community gives back to the success of every project that we work on. So thank you very much for your time today. It was a pleasure.
-MODERATOR: Thanks, Aki. If you have any questions, please, leave comments to the chat. Anyway, there are some questions... Aki, Do your Rust structures touch or mirror your Typescript types in your BFF layer in any ways?
-AKI: The Rust structures are reflected in the TypeScript layer, in fact. It requires -- we use GRPC in between and we can share some of those definition files so that we can do it easily and translate them to both sides. However, of course, the type systems are not exactly the same in Rust and TypeScript, so there is a little bit of messaging we need to do to pac sure we can handle it but Rust and TypeScript have similar notions and that has been instrumental in optimizing or to increase the efficiency of our development process.
-MODERATOR: Any other questions? OK. Thank you, Aki. That's all.
-AKI: All right. Thank you very much. \ No newline at end of file diff --git a/2020-global/talks/01_APAC/05-Tarun-Pothulapti-published.md b/2020-global/talks/01_APAC/05-Tarun-Pothulapti-published.md new file mode 100644 index 0000000..c4140aa --- /dev/null +++ b/2020-global/talks/01_APAC/05-Tarun-Pothulapti-published.md @@ -0,0 +1,47 @@ +**Piecing together Rust: It is more than just writing code**

**Bard:**
Tarun helpfully teaches Rust
to newbies who start out and must
learn the concepts, the tools
and the various rules
until their own experience they trust


**Tarun:**
Hello, everyone, my name is Tarun and in today's talk we will cover the basic principles and tools to get started with Rust without having to do it the hard way. First, let me introduce myself. My name is Tarun and I am an engineer at Buoyant; we are the makers of Linkerd. Previously I was an intern at CNCF. I work on Golang in my current job, but our proxy is written in Rust, and we use Rust for other reasons as well. I also try to contribute to projects in Rust, like the tracing project. Once COVID started I thought I would share my learnings so it would be easier for other folks. This talk is in three stages.

Let's first look at installation. Rust has a great installation experience. Once you get the tool you can use it in multiple ways. Rustup is a toolchain multiplexer: it installs and manages multiple Rust toolchains. All the tools that make up the Rust programming language are there. Each toolchain is like a package, and you have multiple variants. One type of variant uses the release channels: Rust has three different release cycles - stable, beta and nightly. The stable release happens every six weeks, beta happens before every stable release, and nightly is daily. We will see an example. I am using rustup toolchain install to install a specific version. We are installing this onto our local machine.
It is still not the one that will be used when you run Cargo or any other Rust tool; you use the rustup default command to make it the default. Whenever you run Cargo, rustup internally chooses the default toolchain - it is a multiplexer (the commands are sketched at the end of this section). Next, let's talk about compilation, starting with formatting and linting.

For formatting, Rust provides the rustfmt tool, which styles your Rust code according to the official Rust guidelines. Whenever you run cargo fmt it runs the tool internally and fixes all of your code to follow the same guidelines. This is useful because the whole project will be in the same format, so developers can easily read and understand it without having to figure out stuff. Next, Rust has a tool called rust-clippy which finds common mistakes and also suggests improvements. It has over 400 lints included, and these are pretty awesome; I highly suggest you have clippy in your CI, or at least in your development work, so you can find common bugs and improvements. Here I have an example where we have a variable set to 0, and inside the loop we are not updating it, so it is essentially an infinite loop (sketched below, after this section). You can run this code and it will run. But if you have clippy and run cargo clippy, it will tell you that the variable the loop condition depends on is never changed. It finds far better lints than this one; I highly recommend checking it out.

Next we will talk about the IDE experience, which is very important: it makes a developer's job easier. We have rust-analyzer, which is a relatively new tool. It is a common open-source effort that allows multiple editor frontends to use one language server, so that language services don't have to be built for each compiler or IDE. All the editors talk to the same language server using the Language Server Protocol, so one language server can be used with many IDEs. Using a language server you get rich IDE features, which is pretty awesome. Previously there was a tool in Rust doing the same job, the RLS, but it was slow, especially for bigger projects: it ran the full Cargo compile on everything and consumed the resulting JSON analysis data to figure things out, which is pretty hard. For bigger projects, language servers need to be more dynamic; rust-analyzer performs analysis only on the code in question, and it is pretty fast that way.

Next we will talk about documentation. Rust documentation is one of my favorite parts, because there is a standard, enforced way to write documentation, so it is common across many libraries. I think having this standardized tooling made it very easy to add documentation, and that is why we see a lot more docs in the Rust ecosystem. We have a tool called rustdoc. All your documentation is annotated on top of the code, so that you don't have to keep it separate, and using rustdoc you can generate a site with a UI for all of that data. The slashes are used as the syntax for doc comments.

Here we have a struct and an implementation with two functions. Each item is annotated with the three slashes. For the struct, we have a comment saying it is a human being, and inside the impl of the Person we have new and hello. For the new function we have an explanation, but also arguments and examples. All these comments are in Markdown, which makes it easier (a sketch follows this section). Once you have this code with these kinds of annotations and you run cargo doc, you get this output. This is the final HTML site that cargo doc generates, where all the type data and the comments are converted into documentation, which is pretty awesome to look at and pretty useful. Documentation is awesome.
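A short sketch of the rustup workflow described above; the version number is only an example:

```console
$ rustup toolchain install 1.48.0   # install a specific toolchain
$ rustup toolchain list             # see the installed toolchains
$ rustup default 1.48.0             # make it the default for cargo, rustc, ...
$ rustup default stable             # or follow the stable channel instead
```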
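And a minimal version of the infinite-loop example; clippy's while_immutable_condition lint flags that nothing in the body mutates the condition variable:

```rust
fn main() {
    let i = 0;
    // `i` is never changed in the loop body, so this never terminates.
    // `cargo clippy` warns about the immutable condition variable.
    while i < 10 {
        println!("still looping");
    }
}
```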
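The doc-comment layout might look roughly like this - the struct and method names follow the talk's description, and `mycrate` is a placeholder for the actual crate name:

````rust
/// A human being.
pub struct Person {
    name: String,
}

impl Person {
    /// Creates a new `Person` with the given name.
    ///
    /// # Arguments
    ///
    /// * `name` - the person's name
    ///
    /// # Examples
    ///
    /// ```
    /// use mycrate::Person;
    /// let p = Person::new("Ferris".to_string());
    /// assert_eq!(p.hello(), "Hello, Ferris!");
    /// ```
    pub fn new(name: String) -> Person {
        Person { name }
    }

    /// Returns a greeting for this person.
    pub fn hello(&self) -> String {
        format!("Hello, {}!", self.name)
    }
}
````

Running `cargo doc --open` renders these comments as an HTML page like the one described in the talk.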
On compilation itself there is not much to talk about: if you do a release build, the artifacts are present in the target folder.

Next we will talk about testing. In Rust, you have two ways of writing tests: unit tests and integration tests. Unit tests live alongside the code; integration tests you put in the tests directory, especially because you want to be able to test your code the way an external binary uses it. Cargo helps you by picking them up from the tests folder. Once you have those tests, every test is attributed with #[test], so whenever you run cargo test it recognizes the functions that are tests, runs them as a binary, and outputs the results. So, for example, here we have a module with a function called it_works. The module carries a cfg(test) attribute, so if we are not testing, it is not compiled - that is pretty good. We have a test and the test passes: whenever we run cargo test, that test is run and the output is reported. There are also arguments for running only specific tests (a sketch of this layout follows this section). Next is package management. Talking about Rust package management means talking about Cargo, and Cargo is a pretty full-fledged tool; we have been using it for various things already.

First we will talk about dependency management, though Cargo is more than dependency management. What is Cargo? Cargo allows you to manage dependencies and have repeatable builds. It does this with two files: Cargo.toml is the developer-facing part, where you edit the package, and Cargo.lock is used by the tooling to maintain the state of the project. Cargo also introduces a package layout that projects follow, and it acts like an umbrella for most operations, like testing and docs. We will see an example. We have a Cargo.toml here for an example package. Whenever we run cargo build, these dependencies are fetched and linked with the library so that our package can use them. Next we will talk about workspaces. As we saw previously, there are use-cases with multiple packages: you may have divided your library into multiple crates as it grows and gets bigger. A workspace allows you to group multiple packages; the member packages share the same common top-level Cargo.toml and lock file. Here we have the tracing workspace including all these tracing crates, and they can be binary or library crates.

Next we will talk about features. One of my favorite features in Rust is feature flags, which essentially allow you to affect the compilation. For example, say I am the owner of a library and I want to offer people a variant of my library - a variant where I don't take a dependency on the standard library. This is important because I want the users of my library to not have to take on dependencies just because I am taking a dependency. This is useful for embedded systems: libraries can offer multiple variants, for example a variant where there is no dependency on printing, etc. First, the important thing to note is that a feature is either an optional dependency or a set of other features. Now we will see an example, taken from the tracing crate. In the dependencies section, we can see we take a dependency on lazy_static, and it is optional. Now in the features section, we have two: alloc and std. The std feature enables the lazy_static dependency internally; if it weren't for that feature, we would not be taking it. Alloc doesn't take a dependency on any such library (the manifest is sketched below, after this section).
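A minimal sketch of the test module layout described above (the function names are illustrative):

```rust
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

// The whole module is compiled only when testing.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        assert_eq!(add(2, 2), 4);
    }
}
```

Run it with `cargo test`, or `cargo test it_works` to filter on the test name.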
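And a sketch of the optional-dependency and feature layout, loosely modeled on the tracing-core manifest the talk shows; the names and versions here are illustrative:

```toml
[dependencies]
# Optional: only compiled in when a feature enables it.
lazy_static = { version = "1", optional = true }

[features]
default = ["std"]
std = ["lazy_static"]  # the std variant pulls in the optional dependency
alloc = []             # the alloc variant adds no extra dependency
```

As the next paragraphs describe, a consumer then opts into one variant like this:

```toml
[dependencies]
tracing-core = { version = "0.1", default-features = false, features = ["alloc"] }
```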
You can ask me: how is my crate able to offer two variants? How is the code itself separated, essentially? This is done by using the cfg attribute. We have two implementations of an inner module. The first one, as annotated, is included when the std feature is enabled; the second one is compiled when std is not enabled. So you get one version with the dependency and one without, in a form the compiler handles for you. The consumers of this library then choose via the feature flags. In their binary crate, they depend on the tracing-core package and first disable the default features - because they don't want to opt into the defaults - and then enable the alloc feature. They want the variant of the library where there is no dependency on lazy_static. This is awesome because it allows you to have multiple variants of a package to support multiple use-cases and systems.

Next we will talk about binary management. The Cargo tool is also very extensible in itself: you can use Cargo to run external binary tools. Whenever you run Cargo with a subcommand name it does not know, it expands it into a binary named cargo-<subcommand> - for example, cargo expand becomes cargo-expand - meaning Cargo is extendable on its own. This is made easier by cargo install, which can be used to install such binaries; they are installed into ~/.cargo/bin. We will try to install the cargo-expand binary. Once the cargo-expand tool is installed, running cargo expand invokes it. The cargo-expand tool is an awesome tool; it helped me in learning Rust.

As you can see, the println! line is expanded into what the macro generates. Now let's talk about debugging. In debugging, first we will start with logging, then tracing and then GDB. First, logging. In Rust, there is a crate called log that provides a simple API with multiple log levels that you can use to emit events. It abstracts over the actual logging implementation: if your library only uses the macros from the log crate, there is no default log implementation, and it does not emit anything by itself. The consumer of the library - a binary, for example - chooses the log implementation used by itself and by all the dependencies of that crate. Essentially, the consumer decides which log implementation to use, and the libraries just use the API. The overhead for a library is pretty small, and the API to implement your own logger is very simple too (a sketch appears at the end of this transcript). Here is an example using the macros from the log crate; in the third line we are using one of them to report an action. You can use the log levels to emit events in your libraries or binaries. The logging implementation is set in the main file: you can use the set_logger function to set the logger, the implementation that all the macros send the events to. If a project is a library, you will not set this, and there is no output, because you are not able to run the library on its own. Once that library is used as a dependency in a binary project, the binary will use that function to set the log implementation.

Here is an example of some logs that were emitted using the log crate. It is a very simple set of logs. Next we will talk about tracing. As you saw, the logs are pretty simple and pretty hard to comprehend, because there is no contextual information. Tracing allows you to have contextual information. It is more than a logging library, but it provides the same simple API for consumers. It introduces a new primitive called a span (also sketched at the end of this transcript).
Executing a function can be a span, and a function span can have multiple sub-spans. This works for distributed implementations or asynchronous systems like Tokio, etc. You don't have to change a lot: you replace the log macros with the tracing ones and everything should work, as they both use essentially the same API. Here we are instrumenting the connect_to function with the instrument attribute. It will automatically enter the span once the function is called and close the span once the function has ended, and the trace events that happen inside are attached to the enclosing function spans. This produces more contextual logs, like this: we have the load function, which has multiple requests and an unknown error, and this means there is more contextual data attached to your logs, rather than a single log message where it is very hard to comprehend which request it belongs to.

Next we will talk about the debugger. The GNU project debugger (GDB) allows you to understand what is going on in a program while it is executing; it allows you to retrieve information. It has support for multiple languages, and there is rust-gdb, which is a wrapper that provides a better Rust experience. We have a similar program here that runs in a loop, essentially, and we debug it with the gdb tool. We have to compile it in debug mode to get the symbols. If you want to do the same, make sure debug symbols are enabled in your Cargo profile and you will get the symbols in your binary; then run it under the gdb command, and gdb provides an interface to interact with it (a short session is sketched at the end of this transcript). We set a breakpoint at line 8. Then we run the program using the r command. The program runs until the 8th line, breaks there, and we can print a variable - here it is three. It gives you runtime visibility into what is happening, which is very useful if you are trying to find hard bugs at runtime. GDB provides various other features; it is a pretty standard project for debugging on Linux, and all these tools should work with Rust. Thank you. Now I will take questions.


**Moderator:**
OK. Thanks, Tarun, for quite the informative talk. Yeah, it covers, I would say, a lot of areas and various topics. It looks like it should be a good overview for newbies. It was informative to me as well. We are running out of time, but I have one question. This question comes from the chat: any front-end for gdb you like to use?

**Tarun:**
I don't work on complex systems that often, and I know there are a lot of front-ends for gdb you can use, but I am not sure I can add anything here. One other thing I wanted to end my talk with: we covered a lot of things, so if you have not understood everything, please don't feel intimidated. There are good resources online for Cargo, and books for everything. Feel free to check them out to make sure your understanding is correct. There are great resources out there for sure.

**Moderator:**
Yeah, for sure, we have a lot of good documents online. OK. Thanks again for a good presentation. The next session starts in 10 minutes. See you later. OK.

**Tarun:**
Thank you, everyone.
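For reference, a minimal sketch of the log facade pattern from the talk; env_logger is one possible implementation, picked here only for illustration:

```rust
use log::{info, warn};

// Library code only uses the facade macros; it emits nothing by itself.
fn do_work() {
    info!("starting work");
    warn!("something looks odd");
}

fn main() {
    // The binary chooses the implementation; this calls log::set_logger internally.
    env_logger::init();
    do_work();
}
```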
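A similar sketch for tracing, assuming the tracing and tracing-subscriber crates; the function name follows the talk's example:

```rust
use tracing::{info, instrument};

// The attribute opens a span when the function is entered
// and closes it when the function returns.
#[instrument]
fn connect_to(addr: &str) {
    info!("connecting"); // recorded inside the `connect_to` span
}

fn main() {
    // The subscriber plays the role the logger plays for `log`.
    tracing_subscriber::fmt::init();
    connect_to("127.0.0.1:8080");
}
```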
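And the gdb session described above, sketched; `myprog` and the variable name are placeholders:

```console
$ cargo build                      # the debug profile keeps symbols
$ rust-gdb target/debug/myprog
(gdb) break 8                      # breakpoint at line 8
(gdb) run                          # runs until the breakpoint
(gdb) print i                      # inspect a variable, e.g. prints 3
```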
diff --git a/2020-global/talks/01_APAC/05-Tarun-Pothulapti.txt b/2020-global/talks/01_APAC/05-Tarun-Pothulapti.txt deleted file mode 100644 index c7dbabe..0000000 --- a/2020-global/talks/01_APAC/05-Tarun-Pothulapti.txt +++ /dev/null @@ -1,8 +0,0 @@ - - ->> And must learn the concepts, the tools and the various rules, until their own experience they trust. ->> Hello, everyone, my name is Tarun and in today's talk we will cover the basic principles and tools to get started with Rust without having to do it the hard way. First, let me introduce myself. My name is Tarun and I am an engineer at Buoyant. We have the makers of Linkerd. Previously I was an intern at CNCF. I work on Golang in my current job. Our proxy is domain Rust and we use Rust were other reasons. I also try to contribute to projects in Rust like the tracing project. Once COVID started I thought I would share my learnings so it would be easier for other folks. This talk is in three stages. Let's first see the installation. Rust has a great installation experience. Once you get to tool you can use it in multiple ways. Rustup is a toolchain multipleasure -- multiplexor. It installs and managing multiple Rust toolchains. All the tools that make the Rust programming language are there. Each tool chain is like a package and you have multiple variants. One type of variant uses the channels. Rust has three different cycles, stable, beta and nightly and these have releases. The stable release happens every six weeks, and beta happens before every stable release and the nightly is daily. We will see an example. I am using Rust tool chain to install a specific version. We are installing this into our local package. It is still not the run you will use when you run Cargo or any other Rust tool. You will use the Rust default command to make that as the default. Rust internally uses -- whenever you run Cargo it chooses the default tool chain. It is a multiplexor. Let's talk about the compilation and talk about formatting and linting. For formatting Rust provides a tool that allows you to style your Rust code on official Rust guidelines. Whenever you run Cargo it runs from the tool internally and fixes all of your code to follow the same guideline. This is useful because the project will be in the same format for the developers to easily read and understand them without having to figure out stuff. Next, Rust has a tool called rust-clippy which minds common mistakes and also find improvements. It has over 400 lint includes and these are pretty awesome and I highly suggest you to have clippy in your TAR and at least development work so you can find common bugs or common improvements. Here I have an example where we have a variable set to 0. Inside we are not updating the variable. It is mostly infinite loop. Now you can run this code and it will run. But if you have clippy and run Cargo clippy it will tell you that the variable that the depends on is not there. It finds better lints than this one. I highly recommend checking out that. Next we will talk about IDE experience which is very important it makes developer's job easier. We have rust-analyzer which is a relatively new tool. It is a common open source that allows you multiple compiler frontends to use a common language server so that the language service don't have to be built for each compile or idea. All the compilers talk to the same language server using the language server protocol. One language server can be used with many other IDEs. 
Using a language server you will get -- this is pretty awesome. Previously there was a tool in Rust doing the same job but it was slow especially for bigger ones. It runs the Cargo compeller, the full Cargo on everything. That is pretty hard right. It gets the JSON documents and then it helps you understand how to do things. As you know for bigger products, leverages are just able to be more dynamic and perform analysis only on the code and it is pretty fast that way. Next we will talk about documentation. Rust documentation is one of my favorite parts because there is a standard way enforced on how to write documentation so it is common across many libraries. I think having this standardized tool made it very easy to ad documentation and that is why we see a lot more docs in the Rust ecosystem. We have a tool called rustdoc. All your documentation is annotated on top of the code so that you don't have to do it separate. Using rustdoc you can generate a site with the UI of all the data. The dashes are used as the syntax for the doc. Here we have an implementation with two functions. Each type is annotated with the three slashes. For the first instruct, we have a comment called a human being and instead of the person, we have new and hello. For the new function we have an explanation but also arguments and examples. All these comments are for the markdown so that it is easier. Once you have this code with these types of annotations and you run Cargo docs you get this output. This is the final HTML site that Cargo docs generates with all of the type data and the comments are now converted into documentation which is pretty awesome to look at and pretty useful. Documentation is awesome. Compilation there is not much to talk about. If you do the release, they are present in the folders. Next we will talk about testing. In Rust, you have two ways of writing tests. The unit test and integration test. The unit test -- for the integration test you have to put them in the test directory especially because you want to be able to test your code like an external and binary uses. It helps you by having them in the test folder. Once you have those tests, every test is attributed so whenever you run Cargo test it recognizes functions that are test and runs them as a binary and outputs the results. So, for example, here we have a module with the function called it works. The first is a configure environment. If it is not a test it is not compiled. That is pretty good. We have a test and the test passes. Whenever we run Cargo test, that test is run and the output is recorded. There are arguments on specific tests. Next is package management. Rust package management is a pretty full-fledged package tool by Cargo -- package management is talking about Cargo. Cargo is a full-fledged tool. We have been using it for various things. First we will talk about dependency management. Cargo is more than a dependency management. What is Cargo? Cargo allows you to manage dependencies and have repeatable builds. It does this with two files. In the developer facing part, you edit the package. The Cargo is used by the compiler to maintain the state of the project. But Cargo also introduces a package layout that will follow. Cargo is like an umbrella to do most of the operations like testing docs. We will see an example. We have a Cargo file here that is for example package. Whenever we Cargo build, these packages are fetched and linked with that library so the R package can use them. Next we will talk about workspaces. 
As we saw previously there are use-cases of multiple packages. You may have divided your library into multiplies as it grows and gets better. Workspace allows you to group multiple packages. This is done by creating a package in the binary or library crate. They share the same common Cargo.toml. We have there workspace including all these tracings and they can be binary or library plates. Next we will talk about features. One of my favorite features in Rust is the compiler feature flex which allows you to effect the compilation essentially. For example, I am the owner of a library and I want to offer people a variant of my library. I want to offer a variant where I don't take a different -- standard library. This is important for me because I want the users of my library to not have to take dependencies because I am taking a dependency. I want to offer variant of my library where I am not taking a dependency. This is useful for embedded systems. Offer multiple variant of their libraries. For example, they can offer a variant where there is no dependency on printing Etc. First, the important thing is to note if feature is a package it is an optional dependency or a set of other features. Now we will see an example. We have a -- this is taken from the tracing grid. In the dependency section, we can see we take a dependency on the laser static and it is optional. Now in the feature section, we have two. Alloc and std. Lazy static internally depends on a grid and internally. Because you are taking a dependence on lazy static -- if it wasn't for the feature Alloc they are not taking that. Alloc doesn't take a dependence on any library. You can ask me how is my grid able to offer two variants? How is the code part separated, essentially? This is done by using the cfg table. We have the implementation of inner module. In the first case, where we annotated, this is included in the feature std is enabled and in the second case not. This happens when std is not enabled. Different dependency or without that dependency and in a more hard format for you. The consumers of this library will prefer that feature flag. In their binary grid, they depend on the tracing code package and the first disable the default features because they don't want to opt-into the default feature and disable them and enable the Alloc feature. They want that variant of the library and they want the variant where there is no dependency on lazy stack ge this is awesome because it allows you to have multiple variants of the package to support multiple use-cases and systems. Next we will talk about binary management. Cargo tool is also very extensible in itself. You can use Cargo to essentially run external binary tools. This is done whenever you run Cargo and a binary names. If it is not present it expands it into cargo-expand meaning Cargo is expandable on it's own. How is this possible? -- its. This is made easier by having the Cargo install which can be used to install binaries. These binaries are installed to home-Cargo. We will try to install the Cargo expand binary. Whenever we install Cargo expand tool, we look at the dash, and that tool is envoked and the Cargo expand tool is an awesome tool. This helped me in learning Rust. As you can see, the printer lint function is expanded. Now let's talk about debugging. In debugging first we will tart with logging, tracing and then GDB. First, logging. In Rust, there is a crate called log that provides a simple API with multiple log levels that you can use to emit events. 
It abstracts over the actual logging implementation. If your library is using the log grid, the macros from the lock grid, there is no default implementation log implementation. It does not emit. So the consumer of the library, like it could be a binary force would force the log implementation used by itself and other dependencies of that grid. Essentially, the consumer can decide which log implementation and the logging library will just use the API. The order for the library is pretty small. It provides a very simple API to have your own log implementation. Here is an example using the macros from the log grid and in the third line we are using to report action. You can use the log levels and emit events in your libraries or binaries essentially. The logging implementation is in the main file. You can use the setlogger function to set the logger. The logger is implementation that all the formats would send the events to. If a project is a library, you will not set this and there is no image because you are not able to run the library on itself. Once you use that library as a dependency and binary project, they will use that function to set the log implementation which is already like that. Here is an example of some logs that were instrumented using the log grid. It is a very simple set of logs. Next we will talk about tracing. As you saw, the logs are pretty simple and pretty hard to comprehend because there is no contextual information. Tracing allows you to have contextual information. It is more than logging library but provides the same simple API for consumers. It introduces a new primitive called Span. Performing a function is a Span. These spans can have -- a function span has multiple sub-spans. This provides distributed implementation or asynchronous systems like Tokyo and Etc. You don't have to change a lot. You replace the log trace with the tracing and everything should work. They both use the same API essentially. As we saw, as I mentioned, the spans, the parent spans. Here we are instrumenting the connect two function with the instrument attribute. It will expand automatically once the span is started and it will close the span once the function is ended. On the trace events that are happening inside spans diameter are essentially function spans. Now this will produce more contextual logs like this. We have the load function which has multiple requests and error unknown and this means that there is more contextual data added to your logs rather than a single log message that is very hard to comprehend on which lifecycle. Next we will talk about the debugging. GNU project debugger allows you to understand what is on in the program whiles it is executing. It allows you to retrieve information. It has a support for multiple languagesism there is the Rust-gdb which is a wrapper to provide a more easier understanding. We have the same similar program here that runs in a loop, essentially. Here we use the gdb tool to compile that. We have to compile that in debug mode to get the symbols out. If you want to do the same, you can enable debugging in the Cargo and you will get the symbols in your binary and then run that using the gdb command and wrap that and once you do that then gdb provides an interface to interact. We set a break point at line 8. Then we are running the program using the R command. Once you do that, the program is running until the 8th line and then it break the 8th line and essentially we can print here and it is three. 
It allows you to have run time capabilities on what is happening in the run time. This is very useful if you are trying to find hard bugs in runtime. Gdb provides various other features. It is a pretty standard reject -- project for debugging in Linux. All these tools should work with Rust. Thank you. Now I will take questions. ->> OK. Thanks, Tarun, for quite the informative talk. Yeah, it is covering I would say a lot of areas and various topics. It looks like it should be a good overview for the newbies. It should be informative. It was informative to me as well. We are running out of time. But I have one question. This question come from the chat. Any front end on gdb you like to use? -TARUN: I don't work on complex systems. I know there are a lot on gdb you can use but I am not sure I can add anything here. One other thing I wanted to end my talk is we covered a lot of things so if you have not understood anything, please, don't feel intimidated. There are good resources online for Cargo and books for everything. Feel free to check them out to make sure your understanding is correct. There are great resources on it for sure. -MODERATOR: Yeah, for sure, we have a lot of good documents online. OK. Thanks, again, for a good presentation. And next session starting in 10 minutes. See you later. OK. -TARUN: Thank you, everyone. \ No newline at end of file diff --git a/2020-global/talks/01_APAC/06-Zac-Burns-published.md b/2020-global/talks/01_APAC/06-Zac-Burns-published.md new file mode 100644 index 0000000..33386f8 --- /dev/null +++ b/2020-global/talks/01_APAC/06-Zac-Burns-published.md @@ -0,0 +1,97 @@ +**Everything is serialization** + +**Bard:** +Zac Burns wants to serialize +some ideas that we all won't despise +into one talk to make +us see what it will take +to make code easier to realize + + + +**Zac:** +Your computer by itself doesn't do anything of value. Here is a picture of the inside of a computer. You can't tell from the picture what the computer is doing, if it is doing anything at all. For the computer to be useful it must be a component connected to a larger system. The system that the computer is a part of includes other components attached to it: mice, keyboards, speakers, screens, and network cards send data to and from the computer through wires like in this picture. Because of these wires, and the physical separation of the system's components, the data which drives each component must be in well-specified, agreed-upon formats. At this level of abstraction of the system, we usually think of the data in terms of serialization. Serialization at this level includes many well-known formats: MP3, JSON, and HTTP, among others. + +Here is a picture of the inside of a computer sub-system comprising several components, each driven by data, sent over the wires that connect the components. The CPU, GPU, RAM and hard drive are all data-driven components and sub-systems. We don't always think of the things happening at this level of abstraction in terms of serialization, but it is serialization just the same. Here too, the physical separation of each component, aside from the wires connecting them, necessitates that the data which drives each component be serialized in well-specified, agreed-upon formats. File system formats like NTFS are serialized by the CPU and sent to the hard drive. Draw calls are buffered to be sent to the GPU for drawing. When data is fetched from RAM, the fetch instruction is bytes in a serialization format sent over wires between the CPU and RAM.
Instructions are serialized code coming from the assembly, coming from serialized MIR, coming from serialized Rust source files, coming from serialized keyboard presses, and so on. If you look into any of these sub-systems, RAM, CPU, GPU, network card, you will find the exact same setup: data-driven components connected by wires. We can take the CPU and see what it looks like on the inside. + +Here is a CPU with components for instruction decoding, branch prediction, caches and schedulers. Each component is data-driven and transfers information over purpose-built wires. At each level of abstraction of the computer system, you will find components driven by data, sent over wires in a serialization format. Unsurprisingly, the design of the serialization format, which is the design of how the components interact, has a large effect on the system as a whole. Maybe you feel this characterization of the computer as driven by serialization is reductionist. Maybe you prefer to think in abstractions. I would like to point out that abstractions cannot be implemented without the use of serialization. + +Perhaps the greatest abstraction of all time is the function call. What happens when you call a function? The first thing that happens is the arguments to the function are serialized to the stack. The order and layout of the arguments, or the file format if you will, is called the calling convention. Maybe you don't like to think in terms of implementation. Perhaps you think in high-level tasks, like serving an HTTP request. That, too, is a matter of parsing data, in this case a URL, which is a standardized serialization having a path, followed by a transform, and lastly a serialization, in this case an HTTP response. There are two serialization steps. If we were to look at what the transform step entails, we would see it breaks down further into a series of parse, transform and serialize steps. It's serialization all the way down. Your database is just a giant serialized file after all, as are the requests for the data in the database. In fact, all of the state of your entire running program is stored in memory in a complex format comprised of other, nested, simpler serialization formats. + +The first point I am trying to make today is that at every level, serialization doesn't just underlie everything we do but is in some sense the means and ends of all programs. As such, serialization should be at the front of your mind when engineering, and yet despite this, how to represent data as bytes is not a hot topic in programming circles. There is the occasional flame war about whether it is better to have a human-readable format like JSON or a high-performance format with a schema like Protobuf, but the tradeoff space goes much deeper. My experience has been that the choice of representation of the data is an essential factor in determining the system's performance, the engineering effort required to produce the system, and the system's capabilities as a whole. The reasons for this are easy to overlook so I will go over each. The first statement was that the data representation is an essential factor in determining the system's performance. This comes down to the limits of each component in the system to produce and consume data and the limits of the wires connecting these components. If the serialization format has low entropy, the throughput of data flowing through the system is limited by the wires connecting components.
Put another way, bloat in the representation of the data throttles the throughput of information. Also, data dependencies in the serialization format pause the flow of data through the system, incurring the latency cost of the wires. The system operates at peak efficiency only when the communication between components is saturated with efficiently represented data. + +The second point is that data representation is an essential factor in determining the engineering effort required to produce the system. Given input and output data representations and an algorithm sitting in between, a change to either representation necessitates a corresponding change to the algorithm; note that the inverse is not always true. A change in the algorithm does not necessarily require a change to the data. The algorithms available to us, and their limitations, including the algorithm's minimum possible complexity, are determined by the characteristics and representations of the data. The third point was that data representation is an essential factor in determining the system's capabilities as a whole. Given a representation and a time limit, the set of inputs and calculated outputs expressible within those bounds is finite. That finite set of inputs and outputs is the totality of the capabilities of the system. There is always a size limit, even when that limit is bound primarily by throughput and time. I would like to drive these points home with a series of case studies. We will look at some properties inherent to the data representations used by specific serialization formats and see how the formats either help us solve a problem or get in the way. In each example, we will also get to see how Rust gives you best-in-class tools for manipulating data across any representation. The first example will be in parsing GraphQL, the second in vertex buffers, and the third in the use of compression in the Tree-Buf format. + +First, GraphQL. Serialization formats tend to reflect the architectures of the systems that use them. Our computer systems are comprised of many formats, nested. For example, inside a TCP packet, a serialization format, you may find part of an HTTP request. The HTTP request comprises multiple serialization formats. The HTTP headers nest further serialization formats, as does the payload, the format of which is specified by the headers. The payload may nest Unicode, which may nest GraphQL, which itself nests many different subformats as defined by the spec. If you find a string in the GraphQL it may nest further. The nesting reflects the system's architecture, because many layers exist to address concerns that manifest at that particular layer of abstraction of the computer. Because nesting is the natural tendency of serialization, we need formats that allow us to nest data efficiently. We also need tools for parsing and manipulating nested data. Rust gives you these tools in abundance. You need to view slices of strings and byte arrays safely and without copying. Interpreting bytes as another type, like a string or an integer, is also important. Safe, mutable, appendable string or binary types allow us to progressively push serialized data from each format into the same buffer, rather than serializing into separate buffers and then copying each buffer into the nesting format above. Control over memory and safely parsing data is the name of the game. + +These capabilities sound basic, but a surprising number of languages don't meet the requirements.
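+
+To make those capabilities concrete, here is a minimal sketch, using only the standard library, of the tools just listed: borrowing a slice, reinterpreting bytes as an integer or a string, and appending each layer's serialized output into one shared buffer. The length-prefixed layout here is an illustrative choice, not a format from the talk.
+
+```rust
+fn main() {
+    // A length-prefixed string: 4 bytes of little-endian length, then the body.
+    let packet: &[u8] = &[0x05, 0x00, 0x00, 0x00, b'h', b'e', b'l', b'l', b'o'];
+
+    // View a slice of the byte array safely, without copying.
+    let len = u32::from_le_bytes(packet[0..4].try_into().unwrap()) as usize;
+
+    // Interpret bytes as another type (here, a UTF-8 string), still borrowing.
+    let body = std::str::from_utf8(&packet[4..4 + len]).unwrap();
+    assert_eq!(body, "hello");
+
+    // Push serialized data from each layer into the same growing buffer,
+    // rather than serializing to separate buffers and copying them around.
+    let mut out = Vec::new();
+    out.extend_from_slice(&(len as u32).to_le_bytes());
+    out.extend_from_slice(body.as_bytes());
+    assert_eq!(out, packet);
+}
+```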
Rust is the only memory-safe language that I am aware of that does, and what Rust gives you is much more. Here is a type from the GraphQL parser crate. It is an enum named Value containing all the different kinds of values in a GraphQL query, like numbers and objects and so on. Value is generic over the kind of text to parse into. One type that implements the Text trait is String, so you can parse a GraphQL query with String as the text type, and because the Value will own its data, it allows you to manipulate the GraphQL and write it back out. That capability comes with a tradeoff: the performance will be about as bad as in those other garbage-collected languages, because of all the extra allocating and copying that necessarily entails. A string reference, &str, also implements Text, so you could parse the GraphQL in a read-only mode that references the underlying text that the GraphQL was parsed from. With that, you get the best performance possible by avoiding allocations and copies, but you lose out on the ability to manipulate the data. In some cases that's OK. Rust takes this up a notch, because there is a third type from the standard library that implements Text. This type is Cow<str>. With this safe and convenient type, enabled by our friend and ally the borrow checker, we can parse the GraphQL in such a way that all of the text efficiently refers to the source except just the parts that you manipulate, and it is all specified at the call site. This is the kind of pleasantry I have come to expect from Rust dependencies. If you want to change some of the GraphQL text you can do so efficiently and safely with this type. Almost. + +I say almost because there is a fundamental limitation to GraphQL that no amount of Rust features or library APIs could overcome. Looking at the list of different values here, we see that the entries for variables, enums and objects are generic over Text. Ironically, the String variant is not. The String variant just contains the String type, requiring allocating and copying. What's going on here? The issue is in the way that GraphQL nests its serialization formats. The GraphQL string value is Unicode, but the way that GraphQL embeds strings is by putting quotes around them. With this design choice, any quotes in the string must be escaped, which inserts new data interspersed with the original data, and this comes with consequences. + +One, when encoding a value the length is not known up front and may increase, which means you can't rely on resizing the buffer once up front but instead must continually check the buffer size while encoding the value, or over-allocate by twice as much. When reading GraphQL, it is impossible to refer to the string data in place, because it needs to go through a parse step to remove the escape characters. This problem compounds if you want to nest a byte array containing another serialization format in GraphQL. There is no support for embedding bytes directly in GraphQL, so they must be encoded with base64 or something similar. That means three encode steps are necessary to nest another format: encoding the data as bytes, encoding that as a string, and finally re-encoding the escaped string. That may compound even further if you want to store GraphQL in another format. It is common to store a GraphQL query as a string embedded in JSON alongside the GraphQL variables. JSON strings are also quoted strings, meaning the same data goes through another allocation and decode step. It is common to log the JSON. Another layer, another encode step.
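+
+A sketch of the consequence on the reading side: an unescape step can only borrow from the source when nothing was escaped; otherwise it must allocate and copy. This `unescape` is a hypothetical helper for illustration, not the GraphQL parser crate's API, and it handles only a toy subset of escapes.
+
+```rust
+use std::borrow::Cow;
+
+// Hypothetical helper: borrow when possible, copy only when escapes force it.
+fn unescape(raw: &str) -> Cow<'_, str> {
+    if !raw.contains('\\') {
+        // Fast path: the value can refer directly to the source text.
+        return Cow::Borrowed(raw);
+    }
+    // Slow path: escape characters force an allocation and a copy.
+    let mut owned = String::with_capacity(raw.len());
+    let mut chars = raw.chars();
+    while let Some(c) = chars.next() {
+        if c != '\\' {
+            owned.push(c);
+        } else if let Some(next) = chars.next() {
+            owned.push(match next {
+                'n' => '\n',
+                't' => '\t',
+                other => other, // covers \" and \\
+            });
+        }
+    }
+    Cow::Owned(owned)
+}
+
+fn main() {
+    assert!(matches!(unescape("no escapes here"), Cow::Borrowed(_)));
+    assert!(matches!(unescape(r#"a \"quoted\" word"#), Cow::Owned(_)));
+}
+```
+
+Every layer of quoting repeats this borrow-or-copy decision, and once escapes are present, the copy always wins.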
Now, if we want to get at that binary data from the logs, it is just allocating and decoding the same data over and over, up through each layer for every field. It doesn't have to be this way. + +One alternate method of storing a string and other variable-length data is to prefix the data with its length. Not doing this is a familiar mistake that goes all the way back to null-terminated strings in C. The difference between the two can be the difference between decoding being a major bottleneck or instant. No amount of engineering effort spent on optimizing the pipeline that consumes the data can improve the situation, because the cost is in the representation of the data. You have to design the representation differently to overcome this. I am not saying to avoid GraphQL. I use GraphQL, and we are all on the same team. I mentioned GraphQL in this example because using a standard format to illustrate this problem is easier for me than inventing one for this cautionary tale. When you go out and design your formats, consider designing with efficient nesting in mind. + +Let's look at one more example of how we can build capabilities into a serialization format, and how Rust works with us to take advantage of those capabilities. For this case study, we are going to be sending some data to the GPU. A GPU is driven by data sent to it in a serialization format we will call vertex buffers, which contain data like the positions of the points that make up polygons, colors, and material needed for rendering. It comes in two parts: the first describes the format, and the second is a contiguous span of memory containing the structs in an array. This is a vertex buffer. The top portion is a description including the names X, Y and Z for a vertex position and R, G and B color channels. The bottom part depicts the data, with three float slots, three u8 slots and a blank slot for padding, making it all line up. These slots repeat over and over again, taking up the same amount of space each time. There is a good reason that the GPU receives data in fixed-size structs packed in contiguous arrays. The latest NVIDIA cards have a staggering 10,496 CUDA cores, and that's not even counting tensor cores. This is 10,496 boxes. It is a lot. I am even in the way of some of these. + +If you want to break up data into batches for parallelism, the most straightforward way to do that is to have fixed-size structs in contiguous arrays. You can know where any arbitrary piece of data lives and break it up into any desired size. It reflects the architecture of the system. Contrast that to sending the data to the GPU in, say, JSON. With JSON, the interpretation of every single byte in the data depends on every preceding byte. The current item's length is unknown until you search for and find a token indicating the end of that item, often a comma or a closing bracket. If we graphed the data dependencies of a JSON document, it would form a continuous chain, starting with the second byte depending on the first and the third depending on the previous two, continuing until the very last byte of the document. Consider a string in JSON. Is it a key or a value? It depends on where it is inside an object. If I hid the values of the preceding bytes in the document, it would be impossible to tell. The problem with that is that data dependencies limit parallelism. A JSON document must be processed sequentially because that is intrinsic to the format, making JSON a non-starter for a GPU.
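+
+A small sketch of the contrast just drawn, with an assumed 16-byte element size: batch boundaries in an array of fixed-size structs are pure arithmetic, with no scanning for delimiters.
+
+```rust
+fn main() {
+    const ELEMENT_SIZE: usize = 16; // bytes per vertex, fixed up front
+    let buffer = vec![0u8; ELEMENT_SIZE * 10_000]; // room for 10,000 vertices
+
+    // Split into 8 batches without reading a single byte of the data;
+    // a JSON parser would have to scan for commas and brackets instead.
+    let batch_len = (buffer.len() / ELEMENT_SIZE / 8) * ELEMENT_SIZE;
+    let batches: Vec<&[u8]> = buffer.chunks(batch_len).collect();
+    assert_eq!(batches.len(), 8);
+
+    // Every batch starts and ends exactly on an element boundary,
+    // so each one can go to an independent worker (or CUDA core).
+    for batch in &batches {
+        assert_eq!(batch.len() % ELEMENT_SIZE, 0);
+    }
+}
+```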
The data dependencies limit parallelism and add complexity to the engineering that goes into writing a parser. It is the data dependencies that make writing a correct JSON parser a challenging engineering problem in the first place. If we were to graph the vertex buffer's dependencies, the interpretation of each byte in the data is only dependent on the first few bytes, in the description of the buffer. Aside from that, all bytes are independent. By representing this as an array of fixed-size elements, we can process data independently and therefore in parallel. There are downsides to arrays of fixed-width elements. While we gain data independence, we lose the ability to use compression techniques that rely on variable-length encoding. This means you can use some kinds of lossy compression but not lossless compression. JSON can utilize both. + +In JSON, a smaller number takes fewer bytes to represent than a larger number. Integers between 0 and 9 take one byte because they only need a single character. Numbers between 10 and 99 take two bytes, and so on. Here is a depiction of that. I wouldn't ever call JSON a compression format, but in principle the building blocks of lossless compression are there. There are better ways to do this, which we will return to later. The building blocks for lossy compression are present in the form of truncating floats, writing pi as only 3.14, for instance. The format used by vertex buffers has a different set of capabilities than JSON, and that can't be worked around when consuming the data. Those capabilities are inherent to the representations themselves. If you want different capabilities, you need to change the representation. + +OK. Having established that writing the data is the problem we are trying to solve, and the characteristics the serialization format must have because of the GPU architecture, let's write a program to serialize the data. We will write the program in two languages, first in TypeScript and then in Rust. I don't do this to disparage TypeScript. Parts of TypeScript are pretty neat. Rather, I want to show you the complexity a memory-managed language adds to a problem that wasn't there to start. Without seeing the difference, it is hard to appreciate the power that Rust has over data. The function we will write is a stripped-down version of what you might need to write a single vertex to a vertex buffer for a game. Our vertex consists of only a position with three 32-bit float coordinates and a color with three u8 channels. There are likely significantly more fields you would want to pack into a vertex in a real game, but this is good for illustration. Let's start with the TypeScript code. + +If you are thinking, whoa, that is too much code to put on a slide: that's the right reaction. It is also the point I am trying to make. I am going to describe the code, but don't worry about following too closely. There is not going to be a quiz, and this is not a talk about TypeScript. Just listen enough to get a high-level feel for the concerns the code addresses and don't worry about the details. The first section defines our interfaces. Vertex, Position, and Color are unsurprising. We have this other interface, Buffer, which has a byte array and a count of how many items are written in the array. The next section is all about calculating offsets of where the data lives in the buffer.
You could hard-code these, but the comment explaining what the magic numbers were would be just as long as the code anyway, so it might as well be code, since that makes it more likely to be correct and in sync with the rest of the function. Particularly cumbersome is the line that offsets the R field. The value is a byte, but the offset is the offset of the previous field plus one, times the size of an f32 in bytes. That mixing of types accounts for a discontinuity, because later we will use two different views over the same allocation, which is profoundly unsettling. We also have to calculate each element's size both in units of bytes and floats, for similar reasons. The next thing we are going to do is to possibly resize the buffer. This part is not interesting, but the code has to be there or the program will crash when the buffer runs out of space. + +Next, we set up the views and calculate the beginning position of the data we want to write within each view, relative to the data size in each view. These offsets are different even though they point to the same place. Lastly, we can finally copy the data from our vertex into the buffer, assuming all of the previous code is correct. Phew. Now, let's take a look at the Rust program. We define the structs. We leave out the buffer interface holding the byte array and count; we aren't going to need it. Let's look at the function to write the vertex: buffer.push(vertex). That's it. Rust isn't hiding the fact that our data is represented as bytes under the hood, and it has given us control of the representation. We needed only to annotate the structs, moving all the error-prone work into the compiler. Between JavaScript and Rust, which do you think would have better performance? The difference is starker than you might think, and not just because of the extra boilerplate code, or it being JavaScript, or the casts from float to int with typed arrays, but mostly because of, again, data dependencies. This time they come in the form of pointer chases when accessing the properties of objects in TypeScript, for example element.position.x. It is slow because the serialization format used by the JavaScript runtime to represent objects introduces data dependencies. Part of what we mean by zero-cost abstractions is abstractions that don't introduce unnecessary serialization formats. Remember, because the choice of serialization format is a deciding factor in how you can approach the problem, the advantage Rust gives us of being able to choose how data is represented carries forward into every problem, not just writing vertex buffers. (A rough sketch of such a vertex writer in Rust follows below.) + +For the final case study, I would like to take some time to go into how a new experimental serialization format called Tree-Buf represents data in a way that is amenable to fast compression. Before we talk about the format, we need to talk about the nature of datasets, and we will use the game of Go. We will use a game of Go as a baseline for Tree-Buf. This movie depicts a game of Go. I haven't told you anything about how Go works, but by watching the movie you might pick up on some patterns in the data. The first pattern we might pick up on is that most of the moves are being made in certain areas of the board: many are on the sides and corners, and very little is going on in the center. Another thing you might pick up on is that a lot of the time a move is adjacent to or near the previous move. The observation that local data is related, and that the space of possible values is not evenly used, is not specific to Go.
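+
+Here is the rough shape the Rust vertex writer mentioned above might take. This is a reconstruction under assumed field types, not the speaker's actual slide code: declaring the layout once with `#[repr(C)]` moves the offset arithmetic from hand-written code into the compiler.
+
+```rust
+#[repr(C)] // fixed, C-compatible field layout, as a vertex buffer needs
+#[derive(Clone, Copy)]
+struct Vertex {
+    position: [f32; 3], // x, y, z
+    color: [u8; 3],     // r, g, b
+    _pad: u8,           // explicit padding so every vertex is 16 bytes
+}
+
+struct Buffer {
+    vertices: Vec<Vertex>,
+}
+
+impl Buffer {
+    fn push_vertex(&mut self, v: Vertex) {
+        // Growth, offsets, and byte layout are all handled for us.
+        self.vertices.push(v);
+    }
+
+    // View the whole array as raw bytes when handing it to the GPU.
+    fn as_bytes(&self) -> &[u8] {
+        // Sound here because Vertex is repr(C) and Copy, and its padding
+        // is an explicit, initialized field.
+        unsafe {
+            std::slice::from_raw_parts(
+                self.vertices.as_ptr() as *const u8,
+                self.vertices.len() * std::mem::size_of::<Vertex>(),
+            )
+        }
+    }
+}
+
+fn main() {
+    let mut buffer = Buffer { vertices: Vec::new() };
+    buffer.push_vertex(Vertex { position: [1.0, 2.0, 3.0], color: [255, 0, 128], _pad: 0 });
+    assert_eq!(buffer.vertices[0].position[1], 2.0);
+    assert_eq!(buffer.vertices[0].color[0], 255);
+    assert_eq!(buffer.as_bytes().len(), 16);
+}
+```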
If you have an image, adjacent pixels are likely to be the same, and most of the image is not far off from the color palette. We can extend this to a complex 2D polygon described by a series of points. Any given point is not likely to be randomly selected from all possible points with an even probability. No, each point is very likely to be near the previous. There are vast, vast regions of the possibility space that will not be selected at all. And so, what we observe is that datasets containing arrays are often predictable. Compression works by predicting what the data will be and assigning representations to the values, so that if the prediction is accurate, few bits can be used to represent the value. If the prediction is wrong, you have to pay more bits. The quality of the prediction is the major factor determining how well the compression works. If you could accurately predict the content of every byte in a file, you could compress that file to 0 bytes. No such prediction method exists. + +We have a dataset, a game of Go, and we want an algorithm to predict the next move in the game. To help us, we will visualize the raw data from the dataset. This scatter plot is a visual representation of the actual bytes of a Go game. The height of each dot corresponds to the value of the byte. If the game starts with a move at X coordinate 4 and Y coordinate 3, there would be a dot with height 4 followed by a dot with height 3, and so on. Our eyes can kind of pick up on some kind of clustering of the dots. They don't appear random. That the data does not appear random is a good indication that some sort of compression is possible. An algorithm to predict the value of a dot may not be apparent from looking at the scatter plot, though. + +We can see that there is probably something there. We just don't know yet what it is. It is worth taking a moment to consider how a general-purpose algorithm like DEFLATE, the algorithm used by gzip, would approach this. It searches for redundancy, on the prediction that if you have seen some sequence of bytes, you are likely to find it again later. The prediction works great for text. At least in the English language, words are constructed from syllables, so it is possible to find repetition in a text even in the absence of repeated words. In a Go game, the same coordinate on the board is seldom repeated. You can't place a stone on top of a previously played stone. Barring a few exceptions, each two-byte sequence in the file is unique. A redundancy-based method like this will produce a compressed file that is far from optimal, because the underlying prediction, that sequences of bytes repeat, would not help. This observation generalizes to many other kinds of data as well. Recall that we stated that each move is likely to be near the previous move. We could try subtracting each byte from the last, so that instead of seeing moves in absolute coordinates we will see them in relative coordinates. + +Here is a visual representation of that. This is garbage. There are points everywhere, and there seems to be no visual pattern at all. It looks random, indicating the data is difficult to predict and therefore difficult to compress. The problem with subtracting is that the X and Y coordinates in the data are independent but interleaved. When we subtract adjacent bytes, X was subtracted from Y and vice versa. Here is the same image as before. We first need to separate the data so logically related data is stored locally.
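+
+In code form, the two transforms look something like this sketch: split the interleaved moves into X and Y columns, then delta-encode each column, so that "near the previous move" becomes "a value near zero". The sample moves are made up.
+
+```rust
+// Split interleaved (x, y) moves into two columns so that subtraction
+// only ever compares an X with the previous X, and a Y with the previous Y.
+fn split_columns(moves: &[(u8, u8)]) -> (Vec<u8>, Vec<u8>) {
+    moves.iter().copied().unzip()
+}
+
+// Delta-encode one column: each output is the difference from the previous value.
+fn deltas(column: &[u8]) -> Vec<i16> {
+    let mut prev = 0i16;
+    column
+        .iter()
+        .map(|&v| {
+            let d = i16::from(v) - prev;
+            prev = i16::from(v);
+            d
+        })
+        .collect()
+}
+
+fn main() {
+    // A few adjacent-ish Go moves as (x, y), 0-based coordinates.
+    let moves = [(3, 3), (4, 3), (4, 4), (3, 4), (15, 15)];
+    let (xs, ys) = split_columns(&moves);
+
+    // Most deltas cluster around 0, -1 and 1: cheap to encode in a few bits.
+    assert_eq!(deltas(&xs), vec![3, 1, 0, -1, 12]);
+    assert_eq!(deltas(&ys), vec![3, 0, 1, 0, 11]);
+}
+```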
Instead of writing an X followed by a Y, like most serialization formats would do, let's write out all of the Xs first and then all of the Ys. Here is a visual representation of that. It looks maybe tighter than before. This indicates our data is less random. Now let's try subtracting. Here is a visual representation of that. Now we are making progress. What I want you to notice is three horizontal lines right near the center. Most of the points, about two thirds, lie on these lines. These lines correspond to the values 0, negative 1, and 1. If we wanted to write an algorithm to predict what would come next in the sequence, the algorithm could be minimal: the value is probably 0, negative 1, or 1. We can simplify this rule further and say that the number is likely to be near 0. A "small" number, which sounds familiar from when we looked at the variable-length encoding used in JSON. + +With a prediction algorithm in hand, we need to come up with a representation. We are going to write a variable-length encoding. In this graphic, we have three rows of boxes where we will describe the variable-length encoding. Each box holds a single bit. There are three boxes on the top row. The first box contains a 0; the next two boxes are blank. The 0 at the beginning is a tag bit. It indicates whether we are in the likely case, the four smallest values 0, 1, negative 1 and 2, or the unlikely case for all the other values. The first bit is taken for the tag bit, leaving two bits for storing those four values. On the second row, we have the tag bit 1 followed by 4 bits, allowing us to store the 16 less likely values. The bottom row shows 8 bits for reference, which is how many bits are in a byte. Before, we were writing each coordinate in a single byte, so with this encoding all moves will save some amount of space. It didn't have to work out that way, but we can do this because a Go board only has 19 points along each axis, which means we are not using the full range of a byte. If we did use the full range, the encoding would have to let some values extend beyond 8 bits, but indeed most datasets don't use the full range. This generalizes well to other datasets. The result is that our Go game compresses to less than half the size of writing the data out using one byte per coordinate. The prediction is more accurate while being computationally easier to produce. It is better than searching for redundancy by scanning many values. (A bit-level sketch of this encoding follows below.) + +Note this isn't the best prediction algorithm possible. If you want to get serious about compression and squeeze the file down further, you could make an even better prediction algorithm. You could write a deterministic Go AI, have it sort the possible moves from best to worst, and predict that it is more likely the player will make a good move than a bad one. This could give twice the information, but the AI would be expensive, require a lot of engineering effort, and once completed would only be able to guess the game of Go, whereas the delta compression method sounds like it might be useful for more than just Go. Let's compare the methods in a matrix. This shows GZip, delta and AI. We have the compression ratio, which is how small the file is; performance, which is how fast we can read and write the file; and difficulty, which is the level of engineering effort it takes. A checkmark goes to the best and an X to the worst, with no mark for the method in between. The delta method is at a sweet spot, not the worst in any category.
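+
+Picking up the encoding described a moment ago, before the scoring continues: a simplified bit-level sketch of the tag-bit scheme. Which sixteen values fall into the 5-bit bucket is an assumption for illustration, not Tree-Buf's actual wire format.
+
+```rust
+// Encode one delta as (packed bits, bit count), tag bit in the lowest position.
+// Four likely values cost 3 bits; sixteen less likely values cost 5 bits,
+// versus the 8 bits of writing a full byte per coordinate.
+fn encode_delta(d: i8) -> Option<(u8, u32)> {
+    const LIKELY: [i8; 4] = [0, 1, -1, 2];
+    if let Some(i) = LIKELY.iter().position(|&v| v == d) {
+        return Some(((i as u8) << 1, 3)); // tag 0 + 2-bit index
+    }
+    let index = match d {
+        -9..=-2 => (d + 9) as u8, // assumed bucket of unlikely values
+        3..=10 => (d + 5) as u8,
+        _ => return None, // a real format would add a longer escape case
+    };
+    Some(((index << 1) | 1, 5)) // tag 1 + 4-bit index
+}
+
+fn main() {
+    let deltas = [0, -1, 1, 0, 7];
+    let total_bits: u32 = deltas.iter().map(|&d| encode_delta(d).unwrap().1).sum();
+    // Four likely deltas at 3 bits plus one unlikely at 5 bits = 17 bits,
+    // against 40 bits at one byte per coordinate.
+    assert_eq!(total_bits, 17);
+}
+```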
If we were to assign a score of plus 1 for being the best at something and minus 1 for being the worst, delta compression would come out on top with a score of 1, GZip second with a score of 0, and AI last with a score of negative 1. The overall score hardly matters, though, because where GZip wins is in the difficulty category. + +You get a lot with minimum effort using something like GZIP. Effort is important for working professionals under tight deadlines. I would go so far as to say many of us code in a culture that is hostile to high-performance programming methods, and this is true whenever those gains come with any engineering cost. You are not likely to be criticized by peers for using GZIP, whereas the delta compression method requires a fair bit of custom code. What if we could move the checkmark from GZIP to the delta compression method? If we could do that, then the delta compression method would dominate GZIP, and that is the aspiration of Tree-Buf. + +If you have followed so far in understanding how the delta compression method works, you are already almost there in understanding Tree-Buf. If we forget about the details and look at the delta compression method's underlying principles, we find the essence of Tree-Buf. The first thing we did when applying our custom-designed delta compression method was to separate the X and Y coordinate storage. Tree-Buf generalizes this to the entire schema. If we were going to extend from just the X and Y coordinates to a whole tournament, it might look like this. At the top we have the root element, tournament, which is a struct type. It has three fields. If you follow it down through all of the moves of the games and their coordinates, at the bottom row there are X and Y properties, which are buffers: one holding all of the X coordinates of all the games in the tournament, and another holding all of the Y coordinates. This is a tree of buffers, hence the name. This brings locality to data that is semantically related and of the same type. This transformation is only possible if you know the schema of the data being written. The next thing we did with the compression was to apply a type-aware encoding when arranging the data. Writing the deltas and packing ints was only possible because we knew the bytes were u8s and not strings, where subtracting characters produces nonsense. + +Tree-Buf generalizes this principle and uses type-aware compression methods for the different kinds of data in the tree. Since no compression method is one-size-fits-all, it even spends some performance trying multiple techniques on a sample of the data from each buffer. The result approximates a hand-rolled file format that genuinely understands your data. What we have is fantastic performance and compression. What about ease of use and engineering effort? Is it easier to use Tree-Buf than GZIP? I claim yes. The trick is that GZIP is not by itself a serialization format. Using GZIP assumes you already have some method for writing structured data, like Protobuf or CSV or Message Pack or whatever. Using GZIP also entails introducing a second step. Writing a Tree-Buf file is one step. The Rust implementation has an API very much like Serde: you just put encode or decode attributes on your structs and call a single function. There is no extra pass through another dependency and format. It is just the same amount of work it would take to use any serialization format. Tree-Buf is easy to use. + +How does it do on compression and performance? Time to look at benchmarks.
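+
+Before the benchmarks, for reference: the derive-attribute workflow being compared to looks like this. The example uses Serde with bincode, since Tree-Buf's own attribute names aren't given in the transcript; the point is that declaring the schema on the struct is the entire integration.
+
+```rust
+use serde::{Deserialize, Serialize};
+
+#[derive(Serialize, Deserialize, Debug, PartialEq)]
+struct Move {
+    x: u8,
+    y: u8,
+}
+
+#[derive(Serialize, Deserialize, Debug, PartialEq)]
+struct Game {
+    moves: Vec<Move>,
+}
+
+fn main() -> Result<(), Box<dyn std::error::Error>> {
+    let game = Game {
+        moves: vec![Move { x: 3, y: 3 }, Move { x: 4, y: 3 }],
+    };
+    let bytes = bincode::serialize(&game)?; // writing is one step
+    let back: Game = bincode::deserialize(&bytes)?; // so is reading
+    assert_eq!(game, back);
+    Ok(())
+}
+```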
This benchmark will use real-world production data served by The Graph, a decentralized indexing and query protocol for blockchain data. For this, a GraphQL query was made to an indexer from The Graph for 1,000 recent wearable entity auctions. Each entity in the response looks something like this. There are many properties of different types. There are nested objects, arrays, and thousands of other entities like this one, with a cardinality in the data that reflects a real-world distribution of values. What we measured is relative CPU time to roundtrip the data through serialization and deserialization. Message Pack describes itself as being like JSON, but fast and small. I have chosen this format because Message Pack is smaller and faster than JSON and is self-describing like JSON, which works well for GraphQL. Tree-Buf is also self-describing, which means you could open up and read any Tree-Buf file without requiring a separate schema to interpret the data, also making it a good fit for GraphQL. The feature sets are similar enough that we can't attribute the difference in the results to a difference in capabilities, sidestepping the argument that schemas are necessary for performance. + +Here are the results. The big green box's height is how long it takes the CPU to roundtrip the Message Pack file. Its width is the size of the file in bytes. The Message Pack file is more than 17 times as large as the Tree-Buf file and takes more than twice as long to serialize and deserialize. The improvements are significant. Considering that the first thing Tree-Buf has to do is reorganize your data into a tree of buffers before starting, and reverse that transformation when reading the data, it has no right to match the speed of Message Pack, much less significantly outperform it. The answer has everything to do with data dependencies and choices made in representing the data as bytes. Everything we just covered. Let's take a look at a different dataset. For this benchmark we will consider GeoJSON for serializing a list of all the countries. The dataset includes things like each country's name, and the bulk of the data is in the polygons that describe the borders. GeoJSON is a compact format because it doesn't describe each point with redundant tags like longitude and latitude repeated over and over, but instead opts to store that data in a giant nested array to minimize overhead. Here are the results. The green box is GeoJSON and the blue box is Tree-Buf. Tree-Buf is more than 10 times as fast as GeoJSON, and it produces a file that is less than one third of the size. The red box is what we get if we opt into the lossy float compression, which allows us to specify how much precision is necessary to represent the dataset. The resulting file compresses down to less than one tenth the size without sacrificing speed. + +We have seen how the choices in the representation of data can have a significant impact on speed, size, engineering effort, and capabilities. These impacts are not restricted to the cases we have studied but affect every software engineering problem. There are many capabilities that you can design into representations that we did not explore today. Do consider serialization and representation as first-class citizens next to algorithms and code structure. If you use the proper tools to parse and manipulate the data, you will be surprised by the impact. Thank you. + + +**Moderator:** +Thank you, Zac. My understanding of serialization is much deeper now, to be honest. Yeah, thank you for such a deep presentation.
We have three minutes. No questions so far. Do you have anything to add to your presentation, or a comment? Or an additional message? + +**Zac:** +I am sorry I didn't make it as accessible as planned. I did find it a struggle to get the ideas down into something like a small package and really present them. I am sorry it wasn't as easy to follow as I hoped when I planned the talk. + +**Moderator:** +Yeah, no, no worries about that. It has a lot of case studies, so, to be honest, I need some more time to digest your presentation. I understand now how important serialization actually is in programming. That, I think, is quite an important message. I agree. + +**Zac:** +Yeah, that really is the focus of the talk. If you want to focus on that problem, and if people are interested in that kind of thing, there are other interesting presentations you could watch. I would recommend watching the data-oriented programming talk by Mike Acton. He talks about a lot of things in the same terms, so that's interesting. Definitely follow that and look into data-oriented programming. There is a lot to learn in that field. + +**Moderator:** +We have one question. Is there anything Tree-Buf is bad for? + +**Zac:** +Sure. Tree-Buf is taking advantage of being able to find predictability in data with arrays. If you want to do server-to-server communication for things which do not contain arrays, maybe something like Protobuf would be better for that. Tree-Buf tries hard not to be bad in the case where there are no arrays, but there are some fundamental tradeoffs: it is wherever Tree-Buf can optimize with arrays that it does well against other serialization formats. + +**Moderator:** +And we are running out of time. OK. + +**Zac:** +I am going to stick around in the chat, so if anyone has questions there, I will answer them. + +**Moderator:** +Thanks, again, Zac for the great presentation. And the next session is starting in 10 minutes, I believe. I will see you later. diff --git a/2020-global/talks/01_APAC/06-Zac-Burns.txt b/2020-global/talks/01_APAC/06-Zac-Burns.txt deleted file mode 100644 index 1f2c113..0000000 --- a/2020-global/talks/01_APAC/06-Zac-Burns.txt +++ /dev/null @@ -1,13 +0,0 @@ - ->> Into one talk to make or see what it will take to make coat easier to realize. -ZAC: Your computer by itself doesn't do anything of value. Here is a picture of the inside of a computer. You can't tell from the picture what the computer is doing, if it is doing anything at all. For the computer to be useful it must be a component connected to a larger system. The system that the computer is a part of includes other components attached to the computer components like mice, keyboards, speakers, screens, and network cards send data to and from the computer through wires like in this picture. Because of these wires, and the physical separation of the system's components, the data which drives each component must be and well specified agreed upon formats. At this level of abstraction of the system, we usually think of the data in terms of serialization. Serialization at this level includes many well known formats MP3, JSON, and Http among others. Here is a picture of the inside of a computer sub-system compriseing several components, each driven by data, sent over the wires that connect the components. The CPU, GPU, RAM and hard drive are all data-driven components and sub-systems.
We don't always think of the things happening at this level of abstraction in terms of serialization but it is serialization just the same. Here too the physical separation of each component aside from the wires connecting them necessitates the data which drives each component to be serialized in well specified agreed upon formats. File system formats like mtfs are serialized by the CPU and sent to the hard drive and buffered to be sent for the GPU for drawing when data is fetched from RAM, at fetch instruction is bytes and a serialization format sent over wires between the CPU and RAM. Instructions are serialized code coming from the assembly coming from serialized mirror coming from serialized Rust source files coming from serialized keyboard presses and so on. If you look into any of these sub-systems, RAM, CPU, GPU, network card, you will find the exact same setup. Data driven competents connected by wires. We can take the CPU and see what it looks like on the inside. Here is a CPU with components for instructions to coding, branch predictions, caches and scheduleers. Each component is data driven and transfers information on wires in purpose built. At each level of abstraction of the computer system, you will find components driven by data, sent over wires in serialization formit. Unsurprisingly, the design of the serialization format, which is the design of how the components interact, has large effect on the system as a whole. -- affect. Maybe you feel this characterization of the computer as driven by serialization to be reductionist. Maybe you prefer to think in abstractions. I would like to point out that abstractions cannot be implemented without the use of serialization. Perhaps the greatest abstraction of all time is the function call. What happens when you call a function? The first thing that happens is the arguments to the function are serialized to the stack. The order and layout of the argument, or the file format if you will, is called the calling convention. Maybe you don't like to think in terms of implementing. Perhaps you think in high level tasks like serving an http task. It is thought in terms of parsing data in this case an URL which is a standardized serialization having a path and followed by a transform and lastly a serialization and in this case an http response. There are two serialization steps. If we were to look at what the transform steps entails, we would see it breaks down further into a series of parse, transform and serialized steps. It's serialization all the way down. Your database is just a giant serialized file afterall, as are the requests for the data in the database. In fact, all of the state of your entire returning program is stored in memory in a complex format comprised of other nested simpler serialization formats. The first point I am trying to make today is that at every level, serialization doesn't just underlie everything we do but is in some sense the means and ends of all programs. As such, serialization should be at the front of your mind when engineering and yet despite this, how to represent data as bytes is not a hot topic in programming circles. There is the occasional flame war about whether it is bet tour have a human readable format like JSON or a high performance format with a schema like protobuff but the tradeoff space goes much deeper. 
My experience has been that the choice of representation of the data is an essential factor in determining the system's performance, the engineering effort required to produce the system, and the system's capabilities as a whole. The reasons for this are easy to overlook so I will go over each. The first statement was that the data representation is an essential factor in determining the system's performance. This comes down to the limits of each component in the system to produce and consume data and the limits of the wires connecting these components. If the serialization format has low entropy the throughput of data flowing through the system is limited by the wires connecting components. Put in another way, bloat in the representation of the data throttles the throughput of information. Also, data dependencies in the serialization format pause the flow of data through the system incurring the latency cost of the wires. The system operates at peak efficiency only when the communication between components is saturated with efficiently represented data. The second point is that data representation is an essential factor in determining the engineering effort required to produce the system. Given input and output data representations and an algorithm sitting in between, a containing to either representation necessitates a corresponding change to the algorithm and note that the inverse is not always true. A change in the algorithm does not necessarily require a change to the data. The algorithms available to us, and their limitations, including the algorithm's minimum possible complexity, are determined by the characteristics and representations of the data. The third point was that data representation is an essential factor in determining the system's capabilities as a whole. Given a representation in time limit, the set of inputs and calculated outputs expressed within those bounds is finite. That finite set of inputs and outputs is the total of the capabilities of the system. There is always a size limit. Even when that limit is bound primarily by throughput and time. I would like to drive these points home with a series of case studies. We will look at some properties inherent to the data representations used by specific serialization formats and see how the formats either help us solve a problem or get in the way. In each example, we will also get to see how Rust gives you best in-class to for manipulating data across any representation. The first example will be in parsing GraphQL and the second in buffers and the third is the use of compression in the Tree-Buf format. First, GraphQL. Serialization formats tend to reFREKTflect the architectures of the systems that use them. Our computer systems are comprised of many formats nested. For example, inside a tcp packet, a serialization format, you may find part of an http request. The http request comprises multiple serialization formats. The http headers nest further and other serialization formats being the payload, the format which is specified by the headers. The payload may nest to Unicode which may nest to GraphQL which itself nest to many different subformats as defined by the speck. If you find a string in the GraphQL it may nest further. The nesting reflects the system's architecture because many layers exist to address concerns that manifest at that particular layer of abstraction of the computer. Because testing is the natural tendency of serialization, we need formats that allow us to nest data efficiently. 
We also need tools for parsing and manipulating nest data. Rust gives you these tools in abundance. You need to view slices of strings and byte arrays safely and without copying. Interpreting bytes as another type like a string or integer is also important. Safe, mutable, appendable strings or binary types allow us to progressively push serialized data from each format into the same buffer rather than serializing in the separate buffers and then copying each buffer into the nesting format above. Moving control to memory and safely passing data is the name of the game. These capabilities are -- a surprising number of languages don't meet the requirements. Rust is the only memory safe language that I am aware of that does. What Rust gives you is much more? Here is a type from the GraphQL parser crate. It is an enum named value containing all the different kind of values in a GraphQL query like numbers and objects and so on. Value is generic over the kind of text to parse into. One type that implements the text trait this string so you can parse a GraphQL query into string as the text type and because value will own its beta it allows you to manipulate the GraphQL and write it back out. That capability comes with a tradeoff. The performance will be about as bad as those other garbage collected languages because of all the extra allocating and copying that necessarily entails. Reference to stir also implements text. So you could parse the GraphQL in a read-only mode that references to underlying text that the GraphQL was parsed from. With that, you get the best performance possible by avoiding allocations and copies but you loose out on the ability to manipulate the data. In some cases that's OK. Rust takes this up a notch because there is a third type from the standard library that implements text. This type is a cow of string. With this safe and convenient type enabled by our friend and allied the borrow checker we can parse the GraphQL in such a way that all of the text efficiently refers to the source except just the parts that you manipulate and it is all specified at the call site. This is the kind of pleasantry I have come to expect from Rust dependencies. If you want to change some of the GraphQL text you can do so efficiently and safely with this type almost. I say almost because there is a fundamental limitation to the GraphQL that no amount of Rust features or library APIs could overcome. Looking at the list of different values here, we see that the entries for variable, emum and objects are generic over text. Ironically, the string variant is not. The string variant just contains to string type requiring, allocating and copying. What's going on here? The issue is in the way that GraphQL nests its serialization formats. The GraphQL string value is Unicode but the way that GraphQL embeds strings is by putting quotes around them. With this design choice, any quotes in the string must be escaped which inserts new data interpursed with the original data and this comes with consequences. -One, when encoding a value the length is not known up front and may increase and that means you can't rely on resizing the buffer up front but instead must continually check this buffer size when coding this value or over allocate by twice as much. When reading GraphQL, it is impossible to refer to data because it needs to go through a parsed step to remove the escape character. This problem compounds if you want to nest a byte array containing another serialization format in GraphQL. 
There is no support for embedding raw bytes in GraphQL, so they must be encoded with base64 or something similar. That means three encode steps are necessary to nest another format: encoding the data as bytes, encoding that as a string, and finally re-encoding the escaped string. That may compound even further if you want to store GraphQL in another format. It is common to store a GraphQL query as a string embedded in JSON alongside the GraphQL variables. JSON strings are also quoted strings, meaning the same data go through another allocation and decode step. It is common to log the JSON. Another layer, another encode step. Now, if we want to get that binary data back out of the logs, we are just allocating and decoding the same data over and over, up through each layer for every field. It doesn't have to be this way. One alternate method of storing a string and other variable-length data is to prefix the data with its length. Not doing this is a familiar mistake that has been made ever since null-terminated strings in C. The difference between the two can be the difference between decoding being a major bottleneck or instant. No amount of engineering effort spent on optimizing the pipeline that consumes the data can improve the situation, because the cost is in the representation of the data. You have to design the representation differently to overcome this. I am not saying to avoid GraphQL. I use GraphQL, and we are all on the same team. I mentioned GraphQL in this example because using a standard format to illustrate this problem is easier for me than inventing one for this cautionary tale. When you go out and design your formats, consider designing with efficient nesting in mind. Let's look at one more example of how we can build capabilities into a serialization format, and how Rust works with us to take advantage of those capabilities. For this case study, we are going to be sending some data to the GPU. A GPU is driven by data, sent to it in a serialization format we will call vertex buffers, which contain data like the positions of the points that make up polygons, colors, and materials needed for rendering. It comes in two parts: the first describes the format, and the second is a contiguous region of memory containing the structs in an array. This is a vertex buffer. The top portion is a description, including the names X, Y, and Z for a vertex position and R, G, and B color channels. The bottom part depicts the data, with three f32 slots, three u8 slots, and a blank slot for padding, making it all line up. These slots repeat over and over again, taking up the same amount of space each time. There is a good reason that the GPU receives data in fixed-size structs spaced in contiguous arrays. The latest NVIDIA cards have a staggering 10,496 CUDA cores, and that's not even counting tensor cores. This is 10,496 boxes. It is a lot. I am even in the way of some of these. If you want to break up data into batches for parallelism, the most straightforward way to do that is to have fixed-size structs in contiguous arrays. You can know where any arbitrary piece of data lives and break it up into any desired size. It reflects the architecture of the system. Contrast that to sending the data to the GPU in, say, JSON. With JSON, the interpretation of every single byte in the data depends on every preceding byte. An item's length is unknown until you search for and find a token indicating the end of that item, often a comma or a closing bracket. 
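As an aside, here is a minimal sketch of the length-prefixed alternative mentioned above. It is my illustration, not any particular format's wire layout: a u32 length header followed by the raw bytes, so nesting needs no escaping and reading needs no scanning.

```rust
// Write a string as: 4-byte little-endian length, then the raw bytes.
fn write_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes()); // no escape pass, no re-encoding
}

// Read it back, borrowing the string directly from the input buffer.
fn read_str(input: &[u8]) -> Option<(&str, &[u8])> {
    if input.len() < 4 {
        return None;
    }
    let (header, rest) = input.split_at(4);
    let len = u32::from_le_bytes(header.try_into().ok()?) as usize;
    if rest.len() < len {
        return None;
    }
    let (body, rest) = rest.split_at(len);
    Some((std::str::from_utf8(body).ok()?, rest))
}
```

Because the length comes first, a nested format's bytes pass through unchanged at every layer, and a reader can skip a field without inspecting its contents.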
If we graphed the data dependencies of a JSON document, it would form a continuous chain, starting with the second byte depending on the first, and the third depending on the previous two, continuing until the very last byte of the document. Consider a string in JSON. Is it a key or a value? It depends on whether it is inside an object. If I hid the values of the preceding bytes in the document, it would be impossible to tell. The problem with that is that data dependencies limit parallelism. A JSON document must be processed sequentially, because that is intrinsic to the format, making JSON a non-starter for a GPU. The data dependencies limit parallelism and add complexity to the engineering that goes into writing a parser. It is the data dependencies that make writing a correct JSON parser a challenging engineering problem in the first place. If we were to graph the vertex buffer's dependencies instead, the interpretation of each byte in the data is only dependent on the first few bytes, in the description of the buffer. Aside from that, all bytes are independent. By representing this in an array of fixed-size elements, we can process data independently and therefore in parallel. There are downsides to arrays of fixed-width elements. While we gain data independence, we lose the ability to use compression techniques that rely on variable-length encoding. This means you can use some kinds of lossy compression but not lossless compression. JSON can utilize both. In JSON, a smaller number takes fewer bytes to represent than a larger number. Integers between 0 and 9 take one byte because they only need a single character. Numbers between 10 and 99 take two bytes, and so on. Here is a depiction of that. I wouldn't ever call JSON a compression format, but in principle the building blocks of lossless compression are there. There are better ways to do this, which we will return to later. The building blocks for lossy compression are present in the form of truncating floats; think of writing pi as just 3.14. The format used by vertex buffers has a different set of capabilities than JSON, and that can't be worked around when consuming the data. Those capabilities are inherent to the representations themselves. If you want different capabilities, you need to change the representation. OK. Having established that writing the data is the problem we are trying to solve, and the characteristics the serialization format must have because of the GPU architecture, let's write a program to serialize the data. We will write the program in two languages, first in TypeScript and then in Rust. I don't do this to disparage TypeScript. Parts of TypeScript are pretty neat. Rather, I want to show you the complexity a memory-managed language adds to the problem that wasn't there to start. Without seeing the difference, it is hard to appreciate the power that Rust has over data. The function we will write is a stripped-down version of what you might need to write a single vertex to a vertex buffer for a game. Our vertex consists of only a position with three 32-bit float coordinates and a color having three u8 channels. There are likely significantly more fields you would want to pack into a vertex in a real game, but this is good for illustration. Let's start with the TypeScript code. If you are thinking, whoa, that is too much code to put on a slide, that's the right reaction. It is also the point I am trying to make. I am going to describe the code, but don't worry about following too closely. 
There is not going to be a quiz, and this is not a talk about TypeScript. Just listen enough to get a high-level feel for the concerns the code addresses, and don't worry about the details. The first section defines our interfaces. Vertex, position, and color are unsurprising. We have this other interface, buffer, which has a byte array and a count of how many items are written in the array. The next section is all about calculating offsets of where the data lives in the buffer. You could hard-code these, but the comment explaining what the magic numbers were would be just as long as the code anyway, so it might as well be code, since that makes it more likely to be correct and in sync with the rest of the function. Particularly cumbersome is the line that offsets the R field. The value is a byte, but the offset is the offset of the previous field plus one, times the size of an f32 in bytes. That mixing of types accounts for a discontinuity, because later we will use two different views over the same allocation, which is profoundly unsettling. We also have to calculate each element's size, both in units of bytes and of floats, for similar reasons. The next thing we are going to do is to possibly resize the buffer. This part is not interesting, but the code has to be there or the program will crash when the buffer runs out of space. Next, we set up the views and calculate the beginning position of the data we want to write within each view, relative to the data size in each view. These offsets are different even though they point to the same place. Lastly, we can finally copy the data from our vertex into the buffer, assuming all of the previous code is correct. Phew. Now, let's take a look at the Rust program. We define the structs. We leave out the interface for the buffer that holds the byte array and count. We aren't going to need that. Let's look at the function to write the vertex. buffer.push_vertex. That's it. (A sketch of what this can look like appears after this passage.) Rust isn't hiding the fact that our data is represented as bytes under the hood, and it has given us control of the representation. We needed only to annotate the structs, and all the error-prone work moved into the compiler. Between JavaScript and Rust, which do you think would have better performance? The difference is starker than you might think, and not just because of the extra boilerplate code, or it being JavaScript, or the casts from float to int through typed arrays, but mostly because of, again, data dependencies, this time in the form of pointer chases when accessing the properties of objects in TypeScript. For example, element.position.x. It is slow because the serialization format used by the JavaScript runtime to represent objects introduces data dependencies. Part of what we mean by zero-cost abstractions is abstractions that don't introduce unnecessary serialization formats. Remember, because the choice of serialization format is a deciding factor in how you can approach a problem, the advantage Rust gives us of being able to choose how data is represented carries forward into every problem, not just writing vertex buffers. For the final case study, I would like to take some time to go into how a new experimental serialization format called Tree-Buf represents data in a way that is amenable to fast compression. Before we talk about the format, we need to talk about the nature of datasets, and we will use the game of Go; a dataset of Go games will be our baseline for Tree-Buf. This movie depicts a game of Go. 
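Here is the sketch promised above. It is my illustration of the idea, not the speaker's actual code: the layout is declared once with #[repr(C)], and the write is a plain byte append. Real code might use a crate such as bytemuck to get the byte view without writing each field out by hand.

```rust
/// Position: three 32-bit floats; color: three u8 channels. One padding
/// byte is written per vertex so each slot occupies exactly 16 bytes.
#[repr(C)]
#[derive(Clone, Copy)]
struct Vertex {
    position: [f32; 3], // x, y, z
    color: [u8; 3],     // r, g, b
}

#[derive(Default)]
struct VertexBuffer {
    bytes: Vec<u8>,
    count: usize,
}

impl VertexBuffer {
    fn push_vertex(&mut self, v: Vertex) {
        // The compiler did all the offset math; we just append bytes.
        for coord in v.position {
            self.bytes.extend_from_slice(&coord.to_le_bytes());
        }
        self.bytes.extend_from_slice(&v.color);
        self.bytes.push(0); // explicit padding byte keeps the slots aligned
        self.count += 1;
    }
}
```

All of the offset bookkeeping from the TypeScript version is gone: either the type system carries it, or it never needed to exist.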
I haven't told you anything about how Go works, but by watching the movie you might pick up on some patterns in the data. The first pattern we might pick up on is that most of the moves are being made in certain areas of the board; many are on the sides and corners. Very little is going on in the center. Another thing you might pick up on is that a lot of the time, a move is adjacent or near to the previous move. The observation that local data is related, and that much of the possibility space goes unused, is not specific to Go. If you have an image, adjacent pixels are likely to be similar, and most images are not far off from a limited color palette. We can extend this to a complex 2D polygon described by a series of points. Any given point is not likely to be randomly selected from all possible points with an even probability. No, each point is very likely to be near the previous. There are vast, vast regions of the possibility space that will not be selected at all. And so, what we observe is that datasets containing arrays are often predictable. Compression is prediction: predict what the data will be, and assign representations to the values so that if the prediction is accurate, few bits can be used to represent the value. If the prediction is wrong, you have to pay more bits. The quality of the prediction is the major factor determining how well the compression works. If you could accurately predict the content of every byte in a file, you could compress that file to 0 bytes. No such prediction method exists. We have a dataset of Go games, and we want an algorithm to predict the next move in the game. To help us, we will visualize the raw data from the dataset. This scatter plot is a visual representation of the actual bytes of a Go game. The height of each dot corresponds to the value of the byte. If the game starts with a move at X coordinate 4 and Y coordinate 3, there would be a dot with height 4 followed by a dot with height 3, and so on. Our eyes can kind of pick up on some kind of clustering of the dots. They don't appear random. That the data does not appear random is a good indication that some sort of compression is possible. An algorithm to predict the value of a dot may not be apparent from looking at the scatter plot. We can see that there is probably something there; we just don't know yet what it is. It is worth taking a moment to consider how a general-purpose algorithm approaches this, like DEFLATE, the algorithm used by gzip, which searches for redundancy in the data: if you have seen some sequence of bytes, you are likely to see it again later. That prediction works great for text. At least in the English language, words are constructed from syllables, so it is possible to find repetition in a text even in the absence of repeated words. In a Go game, the same coordinate on the board is seldom repeated. You can't place a stone on top of a previously played stone. Barring a few exceptions, each two-byte sequence in the file is unique. A redundancy-based method like this will produce a compressed file that is far from optimal, because the underlying prediction, that sequences of bytes repeat, does not help. This observation generalizes to many other kinds of data as well. Recall that we stated that each move is likely to be near the previous move. We could try subtracting each byte from the last, so that instead of seeing moves in absolute coordinates we will see them in relative coordinates. Here is a visual representation of that. This is garbage. There are points everywhere and there seems to be no visual pattern at all. 
It looks random, indicating the data is difficult to predict and therefore difficult to compress. The problem with subtracting is that the X and Y coordinates in the data are independent but interleaved. When we subtract adjacent bytes, X was subtracted from Y and vice versa. Here is the same image as before. We first need to separate the data so logically related data are stored locally. Instead of writing an X followed by a Y, like most serialization formats would do, let's write out all of the Xs first and then all of the Ys. Here is a visual representation of that. It looks maybe tighter than before. This indicates our data is less random. Now let's try subtracting. Here is a visual representation of that. Now we are making progress. What I want you to notice is three horizontal lines right near the center. Most of the points, about two thirds, lie on these lines. These lines correspond to the values 0, negative 1, and 1. If we wanted to write an algorithm to predict what would come next in the sequence, the algorithm could be minimal: the value is probably 0, negative 1, or 1. We can simplify this further and say that the number is likely to be near 0. A "small" number, which sounds familiar from when we looked at the variable-length encoding used in JSON. With a prediction algorithm in hand, we need to come up with a representation. We are going to write a variable-length encoding. In this graphic, we have three rows of boxes where we will describe the variable-length encoding. Each box holds a single bit. There are three boxes on the top row. The first box contains a 0; the next two boxes are blank. The 0 at the beginning is a tag bit. It will indicate whether we are in the likely case of the four smallest values, 0, 1, negative 1, and 2, or the unlikely case of all the other values. The first bit is taken for the tag bit, leaving two bits for storing those four values. On the second row, we have the tag bit 1 followed by 4 bits, allowing us to store the 16 less likely values. The bottom row shows 8 bits for reference, which is how many bits are in a byte. Before, we were writing each coordinate in a single byte, so with this encoding all moves will save some amount of space. It didn't have to work out that way, but we can do this because a Go board only has 19 points along each axis, which means we are not using the full range of a byte. If we did use the full range, the encoding would have to have some values extend beyond 8 bits, but indeed most datasets don't use the full range. This generalizes well to other datasets. (A sketch of this encoding appears after this passage.) The result is that our Go game compresses to less than half the size of writing the data out using one byte per coordinate. The prediction is more accurate while being computationally cheaper; it is better than searching for redundancy by scanning many values. Note this isn't the best prediction algorithm possible. If you want to get serious about compression and squeeze the file down further, you could make an even better prediction algorithm. You could write a deterministic Go AI, have it sort predictions from best to worst, and bet that the player is more likely to make a good move than a bad one. This might give twice the compression, but the AI would be expensive, require a lot of engineering effort, and once completed would only be able to predict the game of Go, whereas the delta compression method sounds like it might be useful for more than just Go. 
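Here is the sketch promised above: delta encoding plus the two-tier tag-bit code, as I read the scheme described. It is illustrative only; it is not Tree-Buf's actual wire format, and a real encoder would need a further fallback tier for deltas outside the sixteen-value range.

```rust
/// A toy bit-level writer: one bool per bit, high bit first.
#[derive(Default)]
struct BitWriter {
    bits: Vec<bool>,
}

impl BitWriter {
    fn push(&mut self, value: u8, width: u8) {
        for i in (0..width).rev() {
            self.bits.push((value >> i) & 1 == 1);
        }
    }
}

/// Delta-encode one coordinate column (all the Xs, or all the Ys).
fn encode_column(coords: &[i16], out: &mut BitWriter) {
    let mut prev = 0i16;
    for &c in coords {
        let delta = c - prev;
        prev = c;
        match delta {
            // The four most likely values: tag bit 0 + 2 payload bits.
            0 => { out.push(0, 1); out.push(0, 2); }
            1 => { out.push(0, 1); out.push(1, 2); }
            -1 => { out.push(0, 1); out.push(2, 2); }
            2 => { out.push(0, 1); out.push(3, 2); }
            // Sixteen less likely values: tag bit 1 + 4 payload bits.
            d => {
                out.push(1, 1);
                out.push((d + 8) as u8 & 0x0F, 4); // shift -8..8 into 0..16
            }
        }
    }
}
```

Every move that lands on one of the three lines in the plot costs 3 bits instead of 8, which is where the less-than-half file size comes from.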
Let's compare the methods in a matrix. It shows gzip, delta compression, and the AI. We have the compression ratio, which is how small the file is; performance, which is how fast we can read and write the file; and difficulty, which is the level of engineering experience it takes. A checkmark goes to the best and an X to the worst, with no mark for the one in between. Delta compression sits at a sweet spot, not the worst in any category. If we were to assign a score of plus 1 for being the best at something and minus 1 for being the worst, delta compression would come out on top with a score of 1, gzip second with a score of 0, and the AI last with a score of negative 1. The overall score hardly matters, though, because where gzip wins is in the difficulty category. You get a lot with minimum effort using something like gzip. Effort is important for working professionals under tight deadlines. I would go so far as to say many of us code in a culture that is hostile to high-performance programming methods, and this is true whenever those gains come with any engineering cost. You are not likely to be criticized by peers for using gzip, whereas the delta compression method requires a fair bit of custom code. What if we could move the checkmark from gzip to the delta compression method? If we could do that, then the delta compression method would dominate gzip, and that is the aspiration of Tree-Buf. If you have followed so far in understanding how the delta compression method works, you are already almost there in understanding Tree-Buf. If we forget about the details and look at the delta compression method's underlying principles, we find the essence of Tree-Buf. The first thing we did when applying our custom-designed delta compression method was to separate the X and Y coordinate storage. Tree-Buf generalizes this to the entire schema. If we were going to extend from just the X and Y coordinates to a whole tournament, it might look like this. At the top we have the root element, tournament, which is a struct type. It has three fields. If you follow it through all of the games, their moves, and the coordinates, down to the bottom row there are X and Y properties, which are buffers: one holding all the X coordinates of all the games, and another holding all the Y coordinates of all the games in the tournament. This is a tree of buffers, hence the name. (A small illustration of the idea appears after this passage.) This brings locality to data that is semantically related and of the same type. This transformation is only possible if you know the schema of the data being written. The next thing we did for the compression was to apply a type-aware encoding to the rearranged data. Writing the deltas and packing the ints was only possible because we knew the bytes were u8s and not strings, where subtracting adjacent characters produces nonsense. Tree-Buf generalizes this principle and uses type-aware compression methods for the different kinds of data in the tree. Since no compression method is one-size-fits-all, it even spends some performance trying multiple techniques on a sample of the data from each buffer. The result approximates a hand-rolled file format that genuinely understands your data. What we have is fantastic performance and compression. What about ease of use and engineering effort? I claim it is easier to use Tree-Buf than gzip. Yes. The trick is that gzip is not by itself a serialization format. Using gzip assumes you already have some method for writing structured data, like Protobuf or CSV or MessagePack or whatever. Using gzip also entails introducing a second step. Writing a Tree-Buf file is one step. 
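Here is the small illustration promised above: the row-to-column transformation at the heart of the tree of buffers, written out by hand for two fields. Tree-Buf derives the equivalent from your types; this is just to show the shape of the data before and after.

```rust
// Row-oriented, as most formats write it: x and y interleaved per move.
struct Move {
    x: u8,
    y: u8,
}

// Column-oriented, as the tree of buffers stores it: all the xs together
// and all the ys together, so related values sit adjacently and the
// type-aware tricks (like delta encoding) apply per column.
struct MoveColumns {
    xs: Vec<u8>,
    ys: Vec<u8>,
}

fn to_columns(moves: &[Move]) -> MoveColumns {
    MoveColumns {
        xs: moves.iter().map(|m| m.x).collect(),
        ys: moves.iter().map(|m| m.y).collect(),
    }
}
```

For a whole schema the same idea applies recursively: every leaf field of every struct, however deeply nested, ends up with its own buffer.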
The Rust implementation has an API very much like Serde's. You just put encode or decode attributes on your structs and call it on the data. There is no extra cost of pulling in another dependency and format; it is just the same amount of work it would take to use any serialization format. Tree-Buf is easy to use. How does it do on compression and performance? Time to look at benchmarks. This benchmark uses real-world production data served by The Graph, a decentralized indexing and query protocol for blockchain data. For this, a GraphQL query was made to an indexer from The Graph for 1,000 recent wearable entity auctions. Each entity in the response looks something like this. There are many properties of different types. There are nested objects, arrays, and thousands of other entities like this one, with a cardinality in the data that reflects a real-world distribution of values. What we measured is relative CPU time to roundtrip the data through serialization and deserialization. We compare against MessagePack, which describes itself as being like JSON, but fast and small. I have chosen this format because MessagePack is smaller and faster than JSON and is self-describing like JSON, which works well for GraphQL. Tree-Buf is also self-describing, which means you could open up and read any Tree-Buf file without requiring a separate schema to interpret the data, also making it a good fit for GraphQL. The feature sets are similar enough that we can't attribute differences in the results to a difference in capabilities, sidestepping the argument that schemas are necessary for performance. Here are the results. The big green box's height is how long it takes the CPU to roundtrip the MessagePack file. Its width is the size of the file in bytes. The MessagePack file is more than 17 times as large as the Tree-Buf file and takes more than twice as long to serialize and deserialize. The improvements are significant. Considering that the first thing Tree-Buf has to do is reorganize your data into a tree of buffers before starting, and reverse that transformation when reading the data, it has no right to match the speed of MessagePack, much less significantly outperform it. The answer has everything to do with data dependencies and choices made in representing the data as bytes: everything we just covered. Let's take a look at a different dataset. For this benchmark we will consider GeoJSON, serializing a list of all the countries. The dataset includes things like the countries' names, but the bulk of the data is in the polygons that describe the borders. GeoJSON is a compact format because it doesn't describe each point with redundant tags like longitude and latitude repeated over and over, but instead opts to store that data in a giant nested array to minimize overhead. Here are the results. The green box is GeoJSON and the blue box is Tree-Buf. Tree-Buf is more than 10 times as fast as GeoJSON, and it produces a file that is less than one third of the size. The red box is what we get if we opt into the lossy float compression, which allows us to specify how much precision is necessary to represent the dataset. The resulting file compresses down to less than one tenth the size without sacrificing speed. We have seen how the choices in the representation of data can have a significant impact on speed, size, engineering effort, and capabilities. These impacts are not restricted to the cases we have studied but affect every software engineering problem. There are many capabilities that you can design into representations that we did not explore today. 
Do consider serialization and representation as first-class citizens next to algorithms and code structure. If you use the proper tools to parse and manipulate the data, you will be surprised by the impact. Thank you. -MODERATOR: Thank you, Zac. Your talk on serialization went much deeper than I expected, to be honest, and I understand now. Yeah, thank you for such a deep presentation. We have three minutes? No questions by now. Do you have anything to add to your presentation, or a comment? Or an additional message? -ZAC: I am sorry I didn't make it as accessible as planned. I did find it a struggle to get the ideas down into a small package and present them. I am sorry it wasn't as easy to follow as I hoped when I planned the talk. -MODERATOR: Yeah, no, no worries about that. It has a lot of, I'd say, case studies, so it should be -- to be honest, I need some more time to digest your presentation, I know. I understand now how important serialization actually is in programming. That, I think, is quite an important message. I agree. ->> Yeah, that really is the focus of the talk. If you want to focus on the problem, and if people are interested in that kind of thing, there are other interesting presentations that you could watch. I would recommend the data-oriented design talk by Mike Acton. He talks about a lot of things in the same vein, so that's interesting. Definitely follow that and look into data-oriented programming. There is a lot to learn in that field. -MODERATOR: We have one question. Is there anything Tree-Buf is bad for? -ZAC: Sure. Tree-Buf takes advantage of being able to find predictability in data with arrays. If you want to do server-to-server communication for things which do not contain arrays, maybe something like Protobuf would be better for that. Tree-Buf tries hard not to be bad in the case where there are no arrays, but there are some fundamental tradeoffs: it is with arrays that Tree-Buf can optimize. Even there, we are doing pretty well compared with other serialization formats. -MODERATOR: And we are running out of time. OK. -ZAC: I am going to stick around in the chat, so if anyone has questions there, I will answer them. -MODERATOR: Thanks, again, Zac for the great presentation. And the next session is starting in 10 minutes, I believe. I will see you later. \ No newline at end of file diff --git a/2020-global/talks/01_APAC/07-Jin-Mingjian-published.md b/2020-global/talks/01_APAC/07-Jin-Mingjian-published.md new file mode 100644 index 0000000..51007e3 --- /dev/null +++ b/2020-global/talks/01_APAC/07-Jin-Mingjian-published.md @@ -0,0 +1,36 @@ +**Architect a High-performance SQL Query Engine in Rust** + +**Bard:** +Jin Mingjian uses Rust to enhance +some database apps' performance +as he breaks apart +the state of the art +to make hashtables and b-trees dance + + +**Jin:** +I hope to leave time for questions, but if not you can reach out. For related works, let me compare some projects. In Rust, there is DataFusion, which uses Apache Arrow as its data format; its approach, though, is still traditional. OK. Another is the paper "How to Architect a Query Compiler, Revisited", which is paper-only but inspires tensorbase.io. OK. Another project from that line of work uses the dedicated Weld IR for data; it has abstraction overhead and a deep binding to LLVM. Here is the architecture of TensorBase. It is built on a modern language ecosystem, namely Rust. I will talk about this today. + +Because this is RustFest I will talk a little bit more about Rust. Rust is great. 
TensorBase benefits from many parts of Rust's ecosystem. One thing I want to say is that TensorBase is a one-person project over several months. In TensorBase we decided to put performance first in the core, and we do modularization. We hope to keep a good signal-to-noise ratio so it is highly hackable. Cargo is the future of build tooling in Rust; everybody should familiarize themselves with it. It is quick tooling. OK. Procedural macros are great in Rust, but they come with their own problems; here we see one. What I suggest is to use nightly when possible. You can use nightly for features like proc_macro_diagnostic for debugging, which is important; you can see this in the source of TensorBase. C interoperability is heavily used in TensorBase, thanks to its zero overhead, but it has its own problems, for example resource management, or error handling; for the time limit we skip these. + +For concurrency, Rust is also great. Fearless concurrency in Rust, right? Share-nothing thread safety is nice in Rust, but it is maybe a little awkward when memory sharing is needed. Here we list some reasons. The main reason is that Rust lacks a memory model like Java's. Here is an example; let's take a quick look. If we want to implement a singleton like in Java or C++, we may need a lazy lock, which in fact could be avoided if we had a memory model to establish a happens-before relationship between the write and the read. Async/await is another nice feature, with a little caveat: the style is orthogonal to performance, so you have to use it correctly; if not, you may harm performance. Lifetimes are a piece of engineering excellence in Rust, but they may make code complex. What I always recommend is to dance with them rather than evade them, because for a high-performance system you need to think carefully about resource management. If you want another way, the second option is an arena allocator. We are going a little quick. + +We are back to this graph, but I want to point out that the core of TensorBase is a set of well-organized components which interact with the whole Rust system. OK. The input is just plain SQL, parsed into a parse tree. Then it is transformed into an IR, and the IR is layered. The main reason is that we want to reuse modern low-level compilation. The HIR is made for data-related optimizations which cannot be handled by low-level compilers, and we do relational algebra there as well. I want to point out that some relational algebra can be optimized by the compiler that lowers this HIR. One interesting result is that we have unified the RA operators. The traditional textbooks list many relational algebra operators, but here we unify them into four: map, union, join, sort. Here is a prettyprinted HIR. You can see how the HIR is transformed from the SQL query at the top. OK. + +The core idea here is what I call the sea of pipes, which unifies the data and control-flow dependencies in a graph of pipes. What are pipes? They are just operator-fused units of computation over data. Here is a pipe. You may have heard in textbooks about the operator-at-a-time Volcano model, which is slow and inefficient, so in TensorBase we don't use operator-level Volcano; we just fuse the operators into the pipes we want. The low-level IR is just for platform-related optimization, for example multi-cores or codegen. We have a parallelization representation for multi-cores; map-reduce and fork-join are there. We will talk more about this. And there is a linearized representation for codegen. 
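To ground the happens-before point above: Rust exposes the C++11-style atomic orderings, so the relationship the speaker wants can be written out explicitly. This sketch is my own illustration, not TensorBase code; the release/acquire pair orders the write of DATA before its read.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

// The Release store "publishes" DATA: any thread that observes
// READY == true with an Acquire load also observes the write to DATA.
fn writer() {
    DATA.store(42, Ordering::Relaxed);
    READY.store(true, Ordering::Release);
}

fn reader() -> Option<u64> {
    if READY.load(Ordering::Acquire) {
        Some(DATA.load(Ordering::Relaxed))
    } else {
        None
    }
}
```

For the singleton case specifically, today's std::sync::OnceLock (or, around 2020, the once_cell and lazy_static crates) packages this same ordering guarantee behind a safe one-time-initialization API.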
We provide human-readable mechanics to enable writing the IR in an elegant way. OK. This is the data structure of the late IR. We come to the core, which is a decentralized, self-scheduling, JIT-compilation-based kernel. Compared with the popular centralized scheduler, we use a decentralized one. There are two problems with a centralized scheduler: it is a single point, and a single point means failure and limited scalability. For lightning-fast queries, we have no time budget for initialization or coordination load; we want top performance. You may compare the compiler with popular JIT engines: it can run on almost everything, CPU and GPU, it is human-debuggable, and compilation is fast enough for OLAP. + +Here is a TensorBase-generated kernel that you can compare with the kernel source from the paper. You will see here we have a parameter for the data partitioning; this is just the number of hardware threads in the current socket. The advantage of JIT compilation is that we can embed runtime information in the code, which gives the compiler optimizations that cannot be done with AOT compilation. Benchmark time. There is too much information here, so we just give a simple summary; you can see the details later. TensorBase can do the end-to-end query 6 to 10 times faster than C++-based OLAP databases. Let's make a few points here. First, Rust is lightning fast even untuned (due to the time limit, we have not tuned the current compilation system). Second, C-based JIT compilation is lightning fast: it is much faster than compiling C++ or Rust, and it is quite enough for OLAP; we can do it in mere milliseconds. For point two, you can saturate the memory bandwidth of the cores: such runs hit 100 gigabytes per second of memory bandwidth. This is already memory bound. TensorBase needs only about 60 milliseconds to scan over the memory, and while scanning we do some work to produce the result. For point three, partial compilation is a way to make the compilation time correlate with the size of the hot kernel rather than the total size of the execution code. + +One thing I want to mention here is that the overhead is a little high. Future insights: I want to point you in some directions where we are moving. One is the storage layer. We will have our own storage layer, because the popular storage/compute separation is generically less efficient. Second, the optimizer. Our goal is to make even queries that cannot be optimized elsewhere run fast here. We know we could use a popular CBO, but what we want is data-driven, low-entropy inference here. It is a little new. Third, tiered C compilation. Maybe we want faster codegen: C compilation or interpretation can possibly be done in microseconds. We considered alternative JIT compilation choices; Cranelift is a little slow for this need. OK. + +Scheduling. OK. We may not have enough time, so we just leave some pointers for you to think about, and we can talk more later. We are nearly done here. Finally, in the next version of TensorBase we will have the main operators on single tables and will have storage layer V1. The biggest difference from the current version is that we will provide compatibility with ClickHouse, including compatibility with the ClickHouse native protocol and its on-disk storage. In the next version we also want to continue to support complex aggregations, for example, group by. 
Early results, compared with ClickHouse and based on the ClickHouse MergeTree, show we can get 6-8 times faster. OK. Finally, a recap. Abstraction overhead is everywhere; we should carefully make trade-offs between performance and features. Sometimes you can leave a little performance on the table -- unless you want more performance. Second, there is a high-performance programming paradigm in Rust: in fact, we do not need to reject unsafe; if you need control, unsafe can help. Top-performance OLAP is, for the first time, achieved through engineering in Rust, and all that was shown can be picked up from the open-source tensorbase.io. OK. Thanks. OK. Any questions? We will do it quickly because it is late. The time is limited. + + +**Moderator:** +Jin, thanks for the presentation. We have a question from the chat, which is: are there any Rust paradigms that have been getting in the way for this project? Or has it been basically positive? + +**Jin:** +I see the question. In fact, we have used the Rust paradigms in TensorBase. For my so-called high performance, we have some low-level requirements here, like the singleton; in fact, it is not exactly a singleton, because we need a lock, and that has some impact. This is something we could improve in Rust. OK. In my presentation I showed problems we could improve in Rust. For example, lifetimes are a great idea and concept, but sometimes the compiler cannot infer what we want, especially in the IR, where no lifetime annotation quite fits; we have run into the limits of lifetimes, and we have had to sort out many ways to work around the problem. We are getting better. In fact, the community is continuing to improve on this, and the ecosystem too. Sometimes you need the workaround, and sometimes you may still prefer the workaround. For example, sometimes lifetimes just do not work for us: we want to release the objects when we want. In the IR phase, when the parse phase ends, we dispose of the objects and the allocator together. So it is nice; we don't have to think about it much more. Basically my experience with Rust is positive, with Rust's engineering tooling and the language semantics we mentioned here; you can find more in the open-source repo. + +**Moderator:** +OK. Thanks for answering. We are running out of time. So thanks again for the great presentation about TensorBase. Thank you. diff --git a/2020-global/talks/01_APAC/07-Jin-Mingjian.txt b/2020-global/talks/01_APAC/07-Jin-Mingjian.txt deleted file mode 100644 index b42e452..0000000 --- a/2020-global/talks/01_APAC/07-Jin-Mingjian.txt +++ /dev/null @@ -1,6 +0,0 @@ - ->> Jin Mingjian uses Rust to enhance some database apps performance as he breaks apart the state-of-the-art to make half tables and petri stands. ->> Jin: I hope to leave time to ask questions but if not you can reach out. Related works compare some projects. In Rust, it is called a data fusion which is used Apache arrow to require data image. The problem of age is still traditional. OK. Another paper named how to architecture a query compiler revisited which is paper only but inspires tensorbase.io. OK. Another projection from the work uses dedicated weld IR for data. It has abstract overhead and deep binding to the LLVM. Here is an architecture of tensorbase. It is a graph of language systems like Rust. I will talk about this today. Because this is Rustfest I will talk a little bit more about Rust. Rust is great. TensorBase benefit from parts of Rust's ecosystem. One thing I want to say is TensorBase is a one-person project over several months. 
In the TensorBase we are forced to decide performance in the corement we do modernization. We hope to keep a good signal so it is highly hackable. For Cargo it is the future in Rust. Everybody should familiarize the tooling. It is just some quick tooling. OK. Procedure macro which is great for learning Rust but it is all in problems. Here we see. What I suggest is use the nightly as possible. You can use the nightly for the future like proc macro diagnostic for debugging which is important. If you want to see from the source of TensorBase. The C interopability, having used TensorBase, thanks to the zero overhead but it has it's own problem. For example, resource management, or the error handling but for the time limit with escape. For concurrency it is either great. Fairly confident in Rust, right? It is nice for share-nothing thread safety in Rust. But maybe it is a little awkward with memory sharing needed. Memory sharing needed here we list some reasons. The main reason is Rust likes the memory model like Java. Here is an example that we made to take a quick look. If we want to imply the singleton like in Java or C++ we may need lazy lock which in fact can be avoided if we can have a memory model to establish this before the relationship between the change and the between the write and the read. Async await is just another feature which just a little tweak. The style is orthogonal to the performance so use it correctly. You have to. If not correctly, you may harm performance. Lifetime is an engineering excellence in Rust but it maybe make code complex. What I always recommend is to dance with rather than to evade because for high performance system you need to think carefully about resources manager. If you want to the number two way is arena allocatur. We are going a little quick. We are back to this graph but I want to point to the corner in TensorBase is a safe organizeded component which will interact to the whole Rust system. OK. Input is just plain SQL in a parse tree. Then it transform to an IR which the IR is laid. The main reason is we want to reuse modern low-level compilation. R -- HIR is made for data relateded optimizations which cannot be handled by low levels compilers. And we will do relational algebra also. I want to point some relational algebra can be optimized by a compiler that loads this in HIR. One interesting great use is we have a unified RA operators. You may say in the traditional textbooks relational algebra operators but here we just unified into four operators. They are map, union, join, sort. Here is a prettyprinted HIR. You can see how the HIR can get transformed from the top circle query. OK. The core idea here is what I call the sea of pipes which unify the data and the control-flow dependencies in the graph of pipes. What is pipes? They just operator fused computing data unit. Here is just a piper. You may have heard in textbook that what kind of model it is operating model which is low and inefficient which in TensorBase so we don't use operator level volcano. We just depend on what we want to review. Low level IR is just for platform related optimization. For example, multi cores or codegen. We have parallelization representation for multi-cores. Map reduce and fork-join are there. We are talking more about this. And for linearization representation for codegen. We provide human read. The mechanics to enable and write in an elegant way to IR. OK. This the data structure in late IR. 
We come to the cer which is decentralized, self-scheduling, JIT compilation based kernel. Compared with the popular centralized schedule. We use decentralized. Now, two problems for centralized schedule. One is it is a single point. Single point has failure and single point limited in scalability. Lightning fast query, we have no time budget for you to initial the load for coordination. We want top performance. The compiler you may compare with the popular JIT engine because it can run on almost everything. CPU, GPU, and it is human debugable and fast enough compilation for OLAP. Here is a tensorbase generated kernel that you can compare with the kernel source from paper. You will see here we have max screen which is the data partition location. This is just the number of hardware inquiries in the current socket. The JIT compilation, the advantage is we can embed the runtime information in the code, which we have more optimization for compiler which cannot be done on OAT compilation. Benchmark time. Too much information here. We just give the simple summary here. You can see it later. Here with TensorBase can do the end to end inquery in the 6 to 10 times faster than C++ or OLAP database. Let's do a little point here. First, Rust is lightning fast even untuned. Because the time limit have no current compilation system. Second, C based JIT com pilation is lightning fast and it is much faster than C++ and Rust and it is quite enough for OLAP. We can do it in mere seconds. For point two, you can saturate the memory bandwidth of the in core of such runs with 100 gigabytes per second memory bandwidth. This is already memory bound. We just need the Tensor were the 60 milliseconds to scan over the memory. We just -- when we scan over the memory, we do something to get a result, and here is just you can say the sleep here is just that we can turn the query to the titleal -- title of the corner. For point three, partial compilation is a way to make the compilation time correlate to the size of the hot kernel rather than the total size of the execution codes. One thing I want to mention here is the height is a little high. -- overhead is a little high. Future insights. I want to point you in some direction where we are moving. One is storage layer. We have all storage layer because popular storage and the computer separation is genetically less efficient. Second, optimizer. Our core is forced to make queries can cannot be optimized fastest here. We may know to use the popular CPO but what we want is data driven and low entropy inference here. It is a little new. I want tiered C compilation. Maybe we want faster codegen. C compilation or interpretation can possibly be done in microseconds. We consider alternative to JIT compilation choices. The cranelift is a little slow as it ties on the ned. OK. Scheduling. OK. We may have enough time. We just leave some entry so you can think. If we can talk more. We are nearing here. Finally, the next version of TensorBase we will have main operators on single table and have storage layer V1. The biggest difference to the current version is we are provide compatibility with ClickHouse that include compatibility to ClickHouse native protocol on desktop storage. In the next version we also want to continue to superbin complex aggregations, for example, group by. Early results compared to the ClickHouse and based on the ClickHouse mergetree we can get 6-8 times faster. OK. Finally, I do a recap. Abstract overhead is everywhere. 
We should carefully make trade-offs in performance or future. Sometimes you can learn some little performance -- if you want more performance. Second is high performance programming in paradigm in Rust. You can, in fact we do not need to reject and save. If you hit control and save in a -- top performance OLAP is firstly achieved with engineering Rust and all shown can be picked up from the Open Source tensorbase.io. OK. Thanks. OK. Any questions? We do it quick because it is late. The time is limited. -MODERATOR: Jin, thanks for the presentation. We have a question from the chat which is are there any Rust paradigms that have been getting in the way for this project? Or has it been basically positive? ->> Jin: I see the question. In fact, we have used the raster paradigm in the TensorBase. My so-called high performance - we have some low level inquiries here. How singleton. In fact, it is not a singleton because we need lock. It is impacting some problems. We can improve in Rust. OK. We, in fact, in my presentation we just -- may I present here is the problem we could improve in Rust. For example, left off is a great idea/concept but sometimes when the compiler nodes select especially before the -- in IRR no code left is introduced. We have swung the limit on the lifetime. We have two sort many ways to complete the problem. We are getting better. In fact, the communicator is continuing to improve the problem on more ecosystem. Sometimes you need the workaround but sometimes you may still consider the workaround. For example, it is because sometimes we just not leave the time for us. We just unlock the object as we want. When in the IR phase, when we parse the phase, we dispose the object and the allocatur together. So it is a nice. We don't think much more. Basically my experience from roster is a positive. With the engineering Rust tooling, the semantic language here we mentioned here but you can find more on the open source repo. -MODERATOR: OK. Thanks for answering. We are running out of time. So thanks again for the great presentation about TensorBase. Thank you. And actually, this is the last session of the APAC block. I hope you enjoy really well all sessions. I have two announcements. First is Rustfest is still going. UTC block is starting in two and a half hours. It is going to start 6:00 p.m. in Japan time. That is the first announcement so stay tuned. The second announcement is we have another artist act after this session. It should be starting. Please enjoy it. OK. Thanks again for joining the conference and thanks for everything. Thank you so much for making this conference happen in the APAC block. Thank you. See you later in the UTC \ No newline at end of file diff --git a/2020-global/talks/02_UTC/01-Nikita-Baksalyar.md b/2020-global/talks/02_UTC/01-Nikita-Baksalyar.md new file mode 100644 index 0000000..9f96acf --- /dev/null +++ b/2020-global/talks/02_UTC/01-Nikita-Baksalyar.md @@ -0,0 +1,136 @@ +**Learnable Programming with Rust** + +**Bard:** +Nikita makes Rust interactive +so if learning it is your directive +you won't need to fight +to see what's inside +to become a debugging detective + + +**Nikita:** +Hi, I'm Nikita, and today I'm going to share with you a different approach to writing documentation and educating people about programming in Rust. +Let's start by defining the problem. 

What exactly is learnable programming? 
It is defined by Bret Victor, in an essay of the same name, as a set of design principles that help programmers to understand and see how their programs execute. +And some of the defining characteristics of this idea are seeing the state of a running program, and giving the programmer tools to easily play with their code. +And these things become especially important when applied to the problems of education and writing documentation, because they can help to lower the entry barriers for people who are new to programming, and, today, I'm going to show you how we can apply these ideas to Rust and systems programming in particular. +So, let's see how it looks in action. 

Let's imagine you're someone new to asynchronous programming in Rust and you want to learn more about it. +So you go to the Tokio website, and you see they have really nice tutorials. +Before you can start experimenting with asynchronous code, you need to follow a number of steps because they have prerequisites. +You need to set up a new Rust project and then add dependencies through Cargo. +Then you can take a look at the code provided in this example, and this code connects to a Redis-like server, sets a key "hello" with value "world", then gets the value for the same key and verifies that it has the expected value. +Before you can run this example, you need to make sure that a mini-Redis server is running on your machine. 

Why don't we try including all the necessary dependencies right there on the same page, so that a user can immediately run this code and see the results? This is how it might look. +I go to the same example and I click on the Run button. +This is compiled, and, on the right side, I can see a key "hello" has been set to value "world". +This gives the user immediate context and feedback about what happens with their code, and the state of the mini-Redis server can be observed right there in the browser. 

But we can take this idea a little bit further and make this code example editable, so I can change the value of the key and run this code again, and I can see that the state has also changed. +Or, I can add a new key-value pair "hey, RustFest" and you can see there is a new key now. +This approach requires a minimal set-up, so I can start exploring the Tokio API straightaway, and I believe this is something we should do to make Rust more accessible to more people. +Of course, setting up a new project is also an important skill, but the first impression, showing that this whole thing is not really that complicated, is also important, I think. 

And, well, it's not some kind of magic, because we already can make it happen. +We are able to run the Rust code in the browser by using WebAssembly, and the platform support is constantly expanded and improved in the compiler, so we can make it work quite easily. +When you click on the Run button, the code that you have in the editor is sent to the server, where it is compiled by the Rust compiler, and a module is produced as a result. +This module is then executed in your browser, and you can see the output. 

This approach can be expanded way beyond Tokio or any other single library, because we can use it to enhance documentation that is produced by Rustdoc automatically. +This can give us interactive playgrounds that execute in your web browser. +And it already kind of works like that with the standard library documentation, because if you go, for example, to the page that documents ... 
+we can click on the run button in any of the examples and you will be directed to the Rust Playground, where we can see the output. 

Of course, you can also play with this code there, which I think is neat, but what if we make it simpler by running examples in the browser on the same page, and showing the output beside the code? This will make it easier for people to just immediately see the output without switching context. +But there is a problem if we go beyond the standard library. 

So the problem is how do we deal with dependencies, and especially those dependencies which won't compile? The thing is, we don't really have access to dependencies on the Rust Playground either, because the Rust Playground has a limited set available to us, because it's an isolated environment, and that's an expected thing: the Rust Playground runs the code on their server, and they want to make sure that the code is not malicious, so they limit things like inputs and outputs, and loading external dependencies. +On the other hand, this makes it harder for us, and practically impossible, to run examples from codebases, or even from public crates, and it makes it harder for us to mentor and educate people through examples. +WebAssembly does not have this problem. 

The sandboxed environment is not someone else's machine or server, but your own web browser, and this sandbox is secured by definition. +But the main problem remains: not all Rust code compiles to WebAssembly yet. +Even when it doesn't, we can argue that this is a good thing, because if you want to add this kind of interactivity to your project, it will also incentivise you to make your code more portable, and make it compilable to a new target. 

When you write tests for code that, for example, sends network requests, usually what you want to do is to mock them, meaning your code won't really send network requests, but instead it will pretend to do so. +So, you can test the behaviour of your code in isolation from the external environment, and that's exactly the same thing that we want to do with our interactive documentation too, because we want to give the user the option to run their code independent of the external environment. +The main thing to look for here is API parity, because we want to make sure that the mock functions that we call have the exact same interface as your real code. +And you can do this manually by using feature flags, for example, or you can use one of the many Rust libraries for automatic mocking, but the main idea remains: you want to provide the same interface, both in your real code and in the mocked version (a small sketch follows at the end of this passage). +And this idea of running the Playground code in the browser can go a lot further, because we have lots of techniques for visualising data. +So, for example, on this slide, you can see some code from the documentation of an HTTP client library called reqwest. +It sends a GET request and outputs the response. 

This example demonstrates only the abstract idea of sending an HTTP request, but how do we know how it actually works? And what kind of request it actually sends? In this case, we can output the HTTP request, and it is really helpful to see what kind of request gets sent when you run this code, because it can help you learn more about HTTP, and, more than that, it can also help you debug problems, because with an interactive Playground like this, you can replace the code with your own code and observe the changes, and observe its behaviour. 
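Here is the sketch promised above, with hypothetical names of my own; it only illustrates the interface-parity idea, not any particular mocking crate. The same fetch signature exists in both builds, so a documentation example compiles unchanged whether or not real networking is available.

```rust
// Real build: would perform an actual HTTP request.
#[cfg(not(feature = "mock-network"))]
fn fetch(url: &str) -> Result<String, String> {
    // A real implementation would go through an HTTP client here.
    Err(format!("network disabled in this sketch: {url}"))
}

// Playground/WebAssembly build: pretend the request succeeded.
#[cfg(feature = "mock-network")]
fn fetch(url: &str) -> Result<String, String> {
    Ok(format!("canned response for {url}"))
}

fn main() {
    // Identical calling code in both builds -- that is the whole point.
    match fetch("https://example.org") {
        Ok(body) => println!("got: {body}"),
        Err(e) => println!("error: {e}"),
    }
}
```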
+And the good news is that a lot of Rust crates already do something like that with tracing and logging, but to use that, you also have to choose a logging library and enable it just to get the log output. 

With a WebAssembly-enabled Playground, we just need to provide our own logging library that will redirect output from the logs to the user's browser. +And the user's code can also be seen as data that we can visualise and inspect: some of the components of the Rust compiler are already available as libraries, and there are even some third-party crates that can be used for parsing Rust code. +Almost all of them can be built into WebAssembly, so we can parse the code and extract information that we can use to show hints to our user, even before they execute their code. +It can make our examples even clearer, because these hints can be shown as a user types code, and they can provide context information almost like in the case of IDEs. +On this slide, you can see the reqwest documentation again, and it encodes parameters as HTTP form data, and, as you look at this code in the editor, you can see what result this or that function returns via inline hints, because the compiler provides us with information about the function names and types that are used there, and all the kinds of expressions, and it is really straightforward to work with code as data, because all expressions are actually variants of one large enum, so you can use pattern matching to work with this enum as you normally would do in Rust. 

There is one more thing that we can do here, and this is highlighting the code execution flow. +It can be compared to how you work with a debugger, which can go through a program step by step, and while it goes through the program in this fashion, it also can show the internal state of the program at that particular point in time, and we can do the same thing for our interactive Playgrounds, because it can really help in understanding cases like, for example, ... +and on this slide we can see an example from the Rust standard library documentation that constructs iterators in a functional style, and while it makes some sense, it's hard to see what kind of output we will get when, for example, we call the filter function on the third line. 

But we can give a user the ability to go through the example line by line, while also showing what each function will do. +And we can also combine this technique with the others we have discussed so far, because this time, we have access not only to the static context information that is provided by the compiler, but also to the dynamic state at runtime. +And we can display data like local variables, or function call results, and as a user steps through the code, it becomes really easy to see how the state changes with each function call. +With asynchronous code, this approach can really make a difference in understanding how it works. 

If we treat each step as its own function call, we can do an interesting thing here. +We can record the state at each step and move it forwards and backwards in time. +Since we are talking mainly about things like short snippets of code, and examples instead of, like, large codebases, it's possible to give a user a kind of a slider to go back and forth to really see the execution flow, or they can just jump straight to a point in time that is interesting to them, just by clicking on the slider. 
+And, again, this is not some sort of magic, because even if we don't have immediate access to the WebAssembly execution engine in the browser, and we don't have fine-grained control over the execution, we can transform the code at the compilation step, and we can do that even without modifying or talking to the Rust compiler.
+
+This transformation technique is actually pretty well known, and even the Rust compiler itself uses it for asynchronous code.
+It works by basically breaking down each block into an individual function that represents a single execution step.
+
+This is known as a continuation, and it means that we can continue the execution of a function from the point that we left it at.
+Rust has an unstable feature called generators, and this is what the async/await syntax is built on.
+It works almost exactly as you see it on the slide: we have a struct that holds the current state and local variables, and each block is transformed into a separate function.
+
+So, when you want to execute this snippet, all you have to do is call these functions one by one, and the state changes.
+These functions can be called from the browser, and we are very flexible in choosing in what order we call them, and what kind of state we record.
+
+So far, we have discussed some implementation details for these ideas, but, overall, how do we actually make it work? And how do we make it accessible to everyone? There are several problems to solve here, and not all of them are technical.
+So, first of all, the Rust compiler itself cannot be built as a WebAssembly module yet, so this requires us to have a compiler service that will build arbitrary Rust code into WebAssembly for everyone.
+
+We already have something like that on the Rust Playground, so it's not that big of a deal, and, well, I tried implementing this idea in production for a systems programming course, and, surprisingly, this infrastructure is not really hard to provide, and it's not expensive, because a single CPU-optimised virtual machine can easily handle hundreds of thousands of compilation requests per day. But, still, we need to make sure that this infrastructure is easily available, and it should be possible to deploy this compilation server locally and privately.
+
+There is another problem that we have discussed briefly.
+If we start to include dependencies in our small code snippets, the compilation will take a large amount of time and resources, and the resulting module will have a huge size, easily taking up several megabytes, making it harder to scale.
+The obvious solution here is to compile these dependencies as separate WebAssembly modules instead, and link the dependencies when we need them, but this is made worse by the fact that there is no dynamic linking standard for WebAssembly.
+
+So you're basically left with the only choice of statically compiling the dependencies.
+But, technically, the linking is still possible, even though there is no standard way of doing it.
+It's possible to make two WebAssembly modules work together.
+Each module consists of functions that it exports - these would be our Rust functions - and it also has functions that are declared as imports. These imported functions are provided by the caller, and usually they're used for calling JavaScript functions from Rust code; this is what makes it possible for Rust programs to interact, for example, with the DOM and browser functions.
+
+We can use this trick, sketched below.
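+A minimal sketch of that pairing (my illustration, not the talk's code; `dep_hash` is a made-up function name, and the JavaScript glue that wires the two modules together is only described in the comments):
+
+```rust
+// Module A - the example snippet, compiled to its own .wasm file.
+// It declares the dependency function as an import; the WebAssembly
+// host (some JavaScript glue) decides what that import is wired to.
+extern "C" {
+    fn dep_hash(x: u32) -> u32;
+}
+
+#[no_mangle]
+pub extern "C" fn run_example(x: u32) -> u32 {
+    // This call leaves module A and, via the host, lands in module B.
+    unsafe { dep_hash(x) }
+}
+```
+
+```rust
+// Module B - the dependency, compiled separately to its own .wasm file.
+// It exports a function with exactly the same signature; the glue code
+// passes module A's call through to this export, copying any shared
+// data between the two modules' memories first, as described below.
+#[no_mangle]
+pub extern "C" fn dep_hash(x: u32) -> u32 {
+    x.wrapping_mul(2_654_435_769) // stand-in for the dependency's real work
+}
+```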
+When Rust module A calls some imported function, what we are going to do is call an exported function from Rust module B. This works, but there is another problem with it.
+Each module has its own separate memory, and this memory is not shared between the modules. This means that if an imported Rust function tries to access its state when it is executed, it will fail, because its own memory does not contain the expected data.
+
+What we will need to do is basically copy memory from module A to module B before we can call an imported function.
+The main disadvantage of this approach is, of course, that it is not standard, and, currently, it requires manual implementation.
+Ideally, the Rust compiler should take care of this for us, but, for now, to make it work in the general case, we will need to find a way to automatically link Rust dependencies which are compiled as WebAssembly modules.
+Now that we have covered all this, what is the conclusion? I think that it is well worth the effort to make our documentation interactive, because it will help us to bring more people into Rust. Part of the reason why JavaScript is so popular is that it is so easy to use and access.
+You don't need any complicated set-up.
+
+All you have to do is open a console in your web browser, and you can start writing and experimenting with code.
+With WebAssembly, we have a really good chance of bringing this simplicity into the Rust world.
+But we still have a long way ahead of us, because the WebAssembly ecosystem has not matured yet.
+Still, we can try our best to make it as easy as possible to add this sort of interactive element to any project, because we have tools like rustdoc which can generate documentation from the source code.
+What we need to do is improve rustdoc to automatically generate playgrounds for our documentation, and we also need a toolkit to simplify building these interactive playgrounds.
+The good news is that we don't have to start from scratch.
+The most crucial part is, actually, compiling the code into WebAssembly, and the Rust compiler has got us covered there.
+
+We just need to build the tooling around it.
+I have started a project that contains the Docker image for the compiler service and some front-end components.
+You can find the repository at the address you see on the slide, so, if you're interested, you can join the development effort and help to make this a reality.
+And that is all from me. Thanks for your attention, and I will be answering your questions in the chat room.
+
+Thank you!
diff --git a/2020-global/talks/02_UTC/01-Nikita-Baksalyar.txt b/2020-global/talks/02_UTC/01-Nikita-Baksalyar.txt
deleted file mode 100644
index b9b5e2a..0000000
--- a/2020-global/talks/02_UTC/01-Nikita-Baksalyar.txt
+++ /dev/null
@@ -1,88 +0,0 @@
-Welcome, and Learnable Programming with Rust - Nikita Baksalyar
-[Music]
-> Hello!
-> Good morning, everyone.
-> Hello, good morning.
-> So how did you all like part one? I mean, I saw there was a lot of activity on the chat, so I'm guessing people stayed up for that, and, of course, APAC enjoyed that.
-> Some people tried the snake game. You managed to crash it after 2.5 hours!
-> I mean, that is a lot of dedication, though!
-> Some lasted 2.5 hours.
-> I'm pretty proud, and it was Rust, so, hey, I crashed here. It's fixed again, so, please, crash it again! [Laughter].
-> Not the snakes, just the game.
-> Just the game.
-> All right, so we have Stefan: would you like to do a bit of an intro? We have some stuff to go through before we get started with the talks. Got to get the preparatory caffeine in first. -> Ready! All right! Here we go. -> That's just me! -> What's up! So, welcome. So this will be fun, because there are three of us, so we will keep talking all over each other. Welcome to the second block of RustFest. Welcome to the UTC, or Europe, or euro-African time zones. What is RustFest all about? It started out as the community conference in Berlin, went to Kiev, to Zrich, to - I'm getting this wrong - Paris, then Rome, then Barcelona, and now we are in the Netherlands somewhat. -> Kind of! -> Or I am! -> Jeske presenting! -> Yes, so, it's about meeting people, connecting with people. That's why we have a huge amount of chat rooms. I don't know if you've seen that, but if you're watching us on RustFest Global, there is a house icon, and then on the right side, it opens a box, and the chat, and you can enter each - for each talk, there is a chat room that you can enter. Click on it, and then Ferrous, our trustee admin bot will invite you to the room, and you can ask questions there. There are also moderators. -> Yes, there are moderators, and we also have - I mean, for this round, we have life scope of noting. If you go - *we have live sketch noting. You can follow Malwine doing live sketch notes which is super cool - not just presenting it at the end, but as far as I understand, you get to see stroke by stroke, and like the erasing and everything. I know I'm going to be watching that. I'm going to be watching that when I'm not presenting someone! -> Yes. -> You can see yourself being drawn! -> Hmm. -> That's pretty cool. -> So what - there are also buttons on the bottom. One is the snake game, the other is the sponsor, and there is a little "e" with a slash which has the live transcriptions, which I hope people that cannot hear me find on their own. -> I think it will be announced on the chat anyway, and it's been shared, so, hopefully, the people that need it can find it. I actually really prefer using - you know, it's a global conference, not everyone is a native speaker, so I find it so cool that we have this, because not everyone can understand accents, or - you know? It's so helpful to have. Thank you, White Coat for joining us! -> Thank you. All right. We talked about this, multiple communities. If you allow me this political side note, thanks to certain countries and their politics, Europe has benefited from that, since we have speakers from all continents from the very start, which is great for us. And now, it's live. All over the planet! So, RustFest isn't just for super high-tech people that work on the compiler, it's for everybody, actually. The next talk, which Jeske will present, is very beginner-friendly, in my opinion. You were going to say something afterwards? -> From the three of us and the most senior level, so very suitable, I would say, it's a good start to the early morning here in Europe. -> I mean, absolutely, and I think that's part of the reason why this is such a - like even the technology might be complex when you go deeper. The community is so wonderful. I didn't know anything about Rust when I first attended RustFest in Paris, and, like, oh, such a welcoming, amazing community, and it's so sad that we can't all be together to celebrate and hug, and just have an amazing time together, but we are all here, so, you know, definitely join the chat, tweet out stuff. 
And let us know how you're enjoying it. I don't know, take a selfie of you watching it at home in your PJs! -> I think the screens are our Twitter handle, so if you have any questions to us, just tag us, mention us, and we will respond. -> Or ask @RustFest and then the whole team can respond. -> If you have something super cool to share with everyone, they might retweet it too. -> Yes, 21 talks, all confirmed. 24 speakers, 12 artists, three teams. I think we are in eight time zones right now, so, ... - -> Thank you, day light saving time! -> Yes. Vote for normal time if you get the chance too! So these all the artists. In our block, it's Earth to Abigail after, and Aesthr. -> DJ Dibs. We are super lucky. This is so cool. I'm so excited about this, especially because it's global, we get to have artists from around the world, and that it's much easier for everyone to join. Not everyone has to drag around their whole kit. You've seen when artists attend conferences, that they have to bring everything with them, they get stopped at the airport because of the electronics, and everything. Now we just all get to share, and enjoy, so that is super cool. I'm really excited about this. I hope you all really enjoy it. This is appropriate. There are some/ing lights. -> Artists have lights that might be triggering for photosensitive people. If you are not comfortable, if it is unsafe for you, it's better to opt out. We just want you all to be okay. You could also listen in, because there will be musical performances too. -> Like minimise the tab, or something. -> Stay away from the blinky stuff! -> Pilar already said it would be that we have a wonderful community, and part of that is because we have a code of conduct, and, lucky us, we have hardly ever had to enforce it, because people think about it, and they say, "Okay, I can agree with that." It's basically be very kind, and assume the best of people. Especially keep in mind that most people like us have English as their second, third, or fourth language, so - -> For sure, be considerate. Because this is a global edition, you might be impacting people from a different culture, upbringing, different understanding of the world. -> Completely different sleep level! -> [Laughter]. That's true! If you've been watching from the beginning, oh, boy! I'm going to be extra kind! [Laughter]. We urge you please to read the code of conduct. If there are any misunderstandings, if you need any help, please reach out to us. Us three will be probably a bit busy, but for sure, the team are out on the chat. Please reach out. We're there for you. -> We don't have any - like when you scroll down in the chat system, and you can see the administer or moderator. If you text Ferrous, you may not get an answer! -> I think it would be good to read it out so we can have it in our transcription because these slides cannot be screen read, as far as I know. -> Would you mind? -> So our code of conduct: RustFest is dedicated to provide a harassment-free conference experience for everyone. Harassment includes, amongst other things, offensive comments related to gender, gender identity, and expression, age, sexual orientation, disability, physical appearance, body size, race, ethnicity, religion, technology choices, deliberate intimidation, stalking, and unwelcome sexual attention. Thank you all for listening to that. Hopefully, you've read it on the code of conduct, and you're familiarised with it. Yes, be kind to each other. Please be respectful. 
We are all here to have a good time. -> Thank you. -> I like that we have these so ingrained that we keep on going over things that we've already said! -> Wonderful. So, we go ahead. The schedule is online. On a side note, if your browser detects the wrong time zone, you can change that on the top of the page. But, on certain browsers, this dropdown menu takes quite some time to generate, so if you click and nothing happens, just wait, let's say up to 15 seconds, and then you can select your time zone, and then it will recalculate all the times to your local time zone so you don't have to do the math in your head. -> The chat that we've been talking about over and over Chen, rustch.at, which is funny if you're in the German area, but it's really fun. -> It's matrix-based, and most people have been able to log on. Otherwise drop us an email or tweet. -> So this is the - I have to drink something! -> Hydrate! So this is the APAC team who just signed off. We're so grateful to each and every one of them. You know, they've been pulling all of their weight, and more, for this edition, and they're hopefully taking it easy now. If not, you know, indulging in watching a little bit of the UTC timeline. Stefan, did you want to say something? -> No, no. Same. Thank you very much! -> Big applause to them this they did a lot of hard work. -> So this is us. -> This is us. -> Maybe we can say also, because I think we already said, that we're all on the same time zone, for sure, but Pilar, I'm in Amsterdam, so whereabouts are you? -> You can see I'm the stand-out name in the team, originally from Chile, but I'm based near Vienna. I live in - if you know Lord of the Rings, I live in the shire, in the middle of nowhere in Austria. -> You have a lot of dogs there? -> They might make an appearance every now and again. The mail is coming. So I've banned them from the streaming room! But if they behave ... maybe they can pop on later and say hi. What about you, Stefan? -> I'm currently in Switzerland, in Zrich. It's quite nice here. We have fibre optic cable to the house, so -. -> The luck of some of us. -> The luck of some of us. The internet is really fast and close here. -> That is also the reason why you chose this place, right? -> Yes, yes! Well, not just ... -> If you're going to be locked down somewhere ... -> Better have 6.5 gigabits per second. -> I think you also have to think a lot of the other members here, thank the other members here, Alexander, Jan-Erik, Flaki, and everybody else, they've all been real troupers here. -> Jan-Erik has not slept yet! -> Won't. You see him in varying levels of drowsiness. Naah, he's a superstar! -> The European time zone, that is the three of us, so we are looking forward to it. [Alarm sounds]. -> We should have finished-by-now-time. The upcoming is from Latin America. I think they're mostly on the east coast, right? -> That time zone? -> I'm not that sure. I mean, let's leave it up to them, and they will introduce themselves! -> Curious already! -> So, huge thanks to our sponsors. Thank you very much. Thanks to Coil, Embark, Parity, Mux - Mux is also part of the streaming infrastructure that you're seeing in this one. Thanks to Mozilla, Centricular, OpenSuse, Mullvad VPN, OBB, the Rail Company - Redis, TerminusDB, Nervos, Truelayer, Technolution, Iomentum, and Ferrous Systems from Berlin which you might know from the embedded Rust conference, Oxidize. Thank you very much! Now, enjoy and have fun. -> We've taken up enough time. 
It was lovely to get to set the stage for you all. You will see all three of us throughout. -> If you allow me, there may be T-shirts on the horizon! [Laughter]. -> We're very jealous of his T-shirt. -> I turned on my purple light. You all are way too ahead of me. Purple T-shirt ... the green screen. I just can't compete! We will leave it up to you. -> I think, indeed, as the process will follow, like every presenter will have a short intro, and then we will have a ... and then we will drive into the talk. The first talk is going to be Nikita Baksalyar from Scotland, push Rust developer tool kits forward. I'm especially into this, because as we already mentioned, I'm a beginner of Rust, so this is going to be a talk that is really good for that. So it is system engineering and decentralised systems. If we can start, I will speak to you afterwards. -> Technical problems. You know it! -> Nikita makes Rust interactive, so if learning it is your directive, you won't need to fight to see what is inside to become a bugging detective. -> Hi, I'm Nikita, and today I'm going to share with you a different approach to writing documentation and educating people about programming in Rust. Let's start by dephenotyping the problem. What exactly is learnable programming? It is defined by Brett Victor in essay of the same name, as a set of design principles that helps programmers to understand and see how their programs execute. And some of the defining characteristics are of this idea are seeing the state of a running program, and giving the programmer tools to easily play with their code. And these things have become available and important when they apply to the problems of education and writing documentation because they can help to lower the entry barriers for people who are new to programming, and, today, I'm going to show you how we can apply these ideas to Rust and systems programming in particular. So, let's see how it it looks in action. Let's imagine you're someone new to asynchronous programming in Rust and you want to learn more about it. So you go to the Tokio's website, and you see they have really nice tutorials. Before you can start experimenting with asynchronous code, you need to follow a number of steps because they have prerequisites. You need to set up a new Rust project and then add dependencies through Cargo. Then you can take a look at the code provided in this example, and this code connects to our Redis-like server, and sends a key to "hello with value "world" and gets the request with the same key and verifies that it has the expected volume. Before you can run this example, you need to make sure that a mini Redis is running on your machine. Why don't we try including all the necessary dependencies right there on the same page, so that a user can immediately run this code and see the results? This is how it might look. I go to the same example and I click on the Run button. This is compiled, and, on the right side, I can see a key "hello" has been set to value "world". This gives the user an immediate context and response of what happens with their code and the state of the mini Redis server can be observed right there in the browser. But we can take this idea a little bit further and make this code example editable, so I can change the value of the key and run this code again, and I can see that the state has also changed. Or, I can add a new key while you pair "hey, RustFest" and you can see there is a new key now. 
This approaches requires a minimal set-up so I can start exploring the Tokio API straightaway and I believe this is something we should do to make Rust more accessible to more people. Of course, setting up a new project is also an important skill, but, the first impression to show that this whole thing is not really that complicated is also important, I think. And, well, it's not some kind of magic, because we already can make it happen. We are able to run the Rust code in the browser by using assembly, and the platform support is constantly expanded and improved in the compiler so we can make it work quite easily. When you click on the Run button, the code that you have in the editor is sent to the server where it is compiled by the Rust compiler, and the model is produced as a result. This model is then executed in your browser, and you can see the output. This approach can be expanded way beyond Tokio or any other single library, because we can use it to enhance documentation that is produced by Rust doc automatically. This can be used in interactive playground in which you can execute in your web browser. And, while it already kind of works like that with the standard library documentation because if you go, for example, to the page that documents ... we can click on the run button in any of the examples and you will be directed to the Rust playground where we can see the output. Of course, you can also play with this code when and what I think it, but what if we make it simpler by running examples in the browser on the same page, and showing the output besides the code? This will make it easier for people to just immediately see the output without switching the context. But there is a problem if we go beyond the standard libraries. So the problem is how do we deal with dependencies, and especially those dependencies which won't compile? The thing is we don't really have access to dependencies on the Rust playground either, because Rust Playground has a limited set available to us because it's an isolated environment and that's an expected thing because the Rust Playground suits the code on their server, and they want to make sure that the code is not malicious, and they limit things like inputs and outputs, and loading the external dependencies. On the other hand, this makes it harder for us, and practically impossible to run examples from codebases, or even from public crates, and it makes it harder for us to mentor and educate people through examples. WebAssembly does not have this problem. The sandboxed environment is not someone else's machine or server, but it's your own web browser, and this sandbox is secured by definition. But the main problem remains: not all Rust code compiles down yet. Even if it don't, we can argue that this is a good thing, because if you want to add this kind of interactivity from your project, it will also incentivise you to make your code more portable, and make it compilable into a new target. When you write tests for code that, for example, sends network requests, usually what you want to do is to - meaning your code won't really send real network requests, but instead it will pretend to do so. So, you can test the behaviour of your code in isolation from the external environment, and that exactly the same thing that we want to do with our interactive documentation too, because we want to give the user the option to run their code independent of the external environment. 
The main thing so look for here is the API performance, because we want to make sure that mock functions that we call have the exact same interface as your real code. And while you can do this manually by using feature flags, for example, or you can use one of the many Rust libraries for automatic mocking, but the main idea remains, that you want to provide the same interface, both in your real code and in the mocked version. And this idea of running the Playground code in the browser can go a lot further, because we have lots of techniques for visualising data. So, for example, on this slide, you can see some code from the documentation of an HTTP client library called Request. It sends a get request and outputs the response. This example demonstrates only the abstract idea of sending an HTTP request, but how do we know how it actually works? And what kind of request it actually sends? In this case, we can output the HTTP request, and it is really helpful to see what kind of request gets sent when you send this code because it can help you learn more about HTTP, and, more than that, it can also help to you debug problems, because with an interactive Playground like this, you can replace the code with your own code and observe the changes, and observe its behaviour. And the good news is that a lot of cross libraries already do something like that with tracing, and logging, but to use that, you also have to choose a logging library, and enable it to just get the log output. With WebAssembly enabled, Playground, we just need to provide our own login library that will redirect output from the logs to a user's browser. And the user's code can also be seen as data that we can visualise and inspect, and some of the components of the Rust compiler are already available as libraries and there are even some for creates that can be used for parsing the Rust code. Almost all of them can be built in the WebAssembly so we can parse the code and extract the information that we can use to show hints to our user, even before they execute their code. It can make our examples even more clear, because these hints can be shown as a user types code, and they can provide context information almost like in the case of IDs. On this slide, you can see the request documentation again and it encodes parameters as HTTP form data, and, as you look at this code in the editor, you can see what result this or that function returns with your chosen hints because this compiler provides us with information about the function names, and types that are used there, and all the kinds of expressions, and it is really straightforward work with code as data because all expressions are actually variants of one large - so you can use pattern matching to work with this enum as you normally would do with Rust. There is one more thing that we can do this year, and this is highlighting the code execution flow. It can be compared to how you work with the debugger which can go through a program step by step, and while it goes through the program in this fashion, it also can show the internal state of the program at this particular point in time, and we can do the same thing for our interactive Playgrounds, because, it can really help in understanding the cases like, for example, ... 
and on this side we can see an example from the Rust standard library documentation to instruct the enumerators in a functional style, and while it makes some sense, it's hard to see what kind of output we will get when, for example, we call the filter function on the third line. But we can give a user this to go through the example line-by-line, while also showing what each function will do. And we can also combine this technique with the others we have discussed so far, because this time, we can have access not only to the static context information that is provided by the compiler, but also to the dynamic state at the runtime. And we can display data like local variables, or function call results, and as a user steps through the code, it becomes really easy to see how the state changes with each function call. With asynchronous code, this approach can really make a difference in understanding how it works. If we treat each step with its own function call, we can do an interesting thing here. We can record each state at each step and move it forwards and backwards in time. Since we are talking mainly about things like short snippets of code, and examples instead of, like, large codebases, it's possible to give a user a kind of a slider to go back and forth to really see the execution flow, or they can just jump straight to a point in time that is interesting to them, just by clicking on the slider. And, again, this is not some sort of magic, because even if we don't have an immediate access to the WebAssembly execution engine in the browser, and we don't have a kind of fine-grained control over the execution, we can transform through the compilation step, and we can do that even without modifying and talking to the Rust compiler. This transformation technique is actually pretty well known, and even the Rust compiler itself uses it for asynchronous code. It works by basically breaking down each block into individual function that represents a single execution step. This is known as continuation, and it means that we can continue the execution of a function from the point that we left it at. And Rust has an unstable feature called Generators, and this is used for using the Async/Await syntax. While it works almost exactly as you see it on the slide, so we have an struct that holds the variable state and local variables, and each block is transformed into a separate function. So, when you want to execute this snippet, all you have to do is to call these functions one by one, and the state changes. So these functions can be called from the browser, and we are very flexible in choosing in what order we call them, and what kind of state we record. So far, we have discussed some implementation details for these ideas, but, overarchingly, how do we actually make it work? And how to make it accessible to everyone? And there are several problems to solve here, and not all of them are technical. So, first of all, the Rust compiler itself cannot be built into the WebAssembly model yet, so this requires us to have a compiler service that will build arbitrary Rust code into the WebAssembly for everyone. 
So we already have something like that on the Rust Playground, so it's not that big of a deal, and, well, so I tried implementing this idea in production for a systems programming course, and surprisingly, this infrastructure is not really hard to provide, and it's not expensive, because a single CPU optimised virtual machine can easily handle hundreds of thousands of compilation requests per day, but, still, we need to make sure that this infrastructure is easily available, and it should be possible to deploy it, to deploy this compilation server locally and privately. There is another problem that we have discussed briefly. If we start to include dependencies in our small code snippets, the compilation will take a large amount of time, and resources, and the resulting module will have a huge site easily taking up several megabytes, making it harder to scale. While the obvious solution here is to compile these dependencies as separate assembly models instead and link the dependencies when we need them, but, this problem is made worse by the fact that there is no dynamic linking standard for the WebAssembly. So you're basically left with the only choice of statically compiling the dependencies. But, technically, the linking is still possible, even though there is no standard way of doing it. It's possible to make two of the assembly models work together. Each model consists of functions that it exports, and that would be our Rust functions, and it also has functions that are declared as imported, and these imported functions are provided by the caller, and usually they're used for calling JavaScript functions from the Rust code, and this is what makes it possible for Rust programs to interact, for example, with DOM and browser functions. We can use this trick. When Rust module A calls some imported function, what we are going to do is to call an exported function from Rust module B, and this works, but it works, but there is another problem with it. Each model has its own separate memory, and this memory is not shared between the modules, and this means that if an imported Rust function tries to access its state when it is executed, it will fail because its memory reading does not contain the expected data. What we will need to do is to basically copy memory from module A to module B before we can call an imported function. The main disadvantage of this approach is of course that it is not standard, and can be currently, it requires manual implementation. Ideally, the Rust compiler should take care of this for us, but for now, to make it work in the general case, we will need to find a way to automatically link Rust dependencies which are compiled as WebAssembly modules. Now that we have covered all this, what is the conclusion? I think that it is well worth the effort to make our documentation interactive, because, it will help us to bring more people into Rust, and part of the reason why JavaScript is so popular is that it is so easy to use it and access it. You don't need any complicated set-up. All you have to do is open a console in your web browser, and you can start writing and experimenting with code. With WebAssembly, we have a really good chance of bringing this simplicity into the Rust world. But we still have a long way ahead of us, because the WebAssembly ecosystem has not matured yet. 
-But still, we can try our best in making this as easy as possible to have add these sort of interactive elements to any project, because we have tools like Rust Doc which can generate documentation from the source code. What we need to do is to improve Rust Doc to automatically generate Playgrounds for our documentation, and we also need to have a tool kit to simplify building these interactive playgrounds. The good news is that we don't have to start from scratch. The most crucial part is, actually, the compiling of the code into WebAssembly, and the Rust compiler has got us covered there. We just need to build the tooling around it. I have started a project that contains the Docker image for the compiler service and some front-end components. You can find the repository at the address you see on the slide, so, if you're interested, you can join the development effort, and help to make this a reality. And that is all from me, and thanks for your attention, and I will be answering your questions in the chat room. Thank you!
-> Hello, everybody. Thanks for the interesting talk. I like the idea of the interactive documentation from mocking and visualisation with the help of WebAssembly. As Nikita already mentioned in his talk, he's online in the chat, so, if you have any questions, or you want to follow him, or you want to follow up on some issues like he is active in the chat right now, so, I do encourage you to ask any questions over there. Rest assured that the next talk is going to be in five minutes, so, at 1050CST, I would say, thank you all for listening in for the first talk, and I will see you in the next one. Thank you.
diff --git a/2020-global/talks/02_UTC/02-Aissata-Maiga.md b/2020-global/talks/02_UTC/02-Aissata-Maiga.md
new file mode 100644
index 0000000..1648042
--- /dev/null
+++ b/2020-global/talks/02_UTC/02-Aissata-Maiga.md
@@ -0,0 +1,314 @@
+**Build your own (Rust-y) robot!**
+
+**Bard:**
+Aïssata Maiga lets me know
+how to make bots without Arduino
+writing Rust to move
+her bot to my groove
+Sure there will be some cool stuff to see, no?
+
+
+**Aïssata:**
+Hello, I'm Aïssata Maiga, just your regular computer science student, and I live in Sweden.
+I discovered Rust this summer and fell in love with it.
+
+So, let's just start.
+This presentation will be about making a robot in Rust and working with no-std.
+It is a fun project to try for yourself or with children.
+It's also very easy.
+
+The most intimidating part is getting started and ordering stuff from the internet.
+I will show you the robot, then explain everything you need to know about every part of the code, and share a lot of mistakes I've made, and then there will be a little surprise.
+I used avr-hal, and I got a lot of help.
+
+It has great documentation: templates for how to start your own project and, of course, how to set up your cargo file, but it also has many examples for every supported board.
+For example, I'm using the Uno, and, if you go to "examples", you can see how to blink an LED, which is the "hello, world" of Arduino systems.
+It really works great, and I would recommend it wholeheartedly.
+
+The components first.
+With time, you will notice that all components are standard and pretty much the same, but the easiest way to get started is just to buy a kit.
+Many are available online, on Amazon for example.
+If you google "smart car", you will see a bunch of suppliers that you can choose from.
+The cheapest start at ten or 15 euros.
+If you want to assemble it yourself, it's the same.
+
+If you just google "assembly instructions smart car Arduino", you will find a lot of good videos, and I link to them in my repository.
+A word of caution, and a good opportunity to share my big error number one.
+In most assembly videos, and in the repo, there are schematics to follow.
+
+You must be careful to follow them and plug everything in exactly as it looks on the image, but the most important thing is to make sure that the circuit is grounded. That means that all ground cables are connected, that the circuit has a common ground.
+If not, bad things will happen, and bad things are also called "undefined behaviour", so if you're there and nothing is working and you're getting frustrated, just check if everything has a common ground.
+
+Arduino is ideal to begin this kind of project.
+The boards are relatively affordable, and there are tons of robot-making tutorials for Arduino.
+What is great with the brand is the ecosystem.
+
+All the libraries and the IDE - but we can't use those in Rust, and that is where avr-hal has you covered.
+We're using an eight-bit microcontroller.
+What the Uno and the Nano have in common is the microcontroller - it has general-purpose input and output pins - those little holes - and all the usual protocols.
+What do I mean by that? You're not programming the whole board; you are programming the microcontroller.
+
+This one.
+You can see it here on this board.
+There is Arduino and "Arduino", as we can say: when you order your kit, it will come with a board.
+It is a clone, and it works as well as a regular one for this kind of project.
+
+I will now tell you about the Servo motors, but also the timers, which are very important in this kind of project.
+I will of course show you what I mean with an animation at the end of the slide.
+So the Servo motor is a simple rotating motor, but we do not need it to go all the way to the end of the rotation, so we can think about it as a light dimmer; we use something called pulse-width modulation.
+If you have a romantic night with your partner, you need to control the light, right? This is what we are going to do.
+
+We're going to control the duty cycle.
+Duty cycles are nothing mystical.
+The duty cycle is the fraction of time where the signal is active.
+
+In other words, we're going to tell the microcontroller how long we want the signal to be active. Now, all these microcontrollers have an internal clock, and it runs at 16 megahertz.
+That defines a period of time of one divided by the frequency.
+So, one divided by 16 million is really fast: 62.5 nanoseconds.
+We can't work with that.
+
+Even if you multiply by two to the power of eight, which is the size of the timer register, it will overflow after only 16 microseconds.
+And now is a good time to share big error number two: on the microcontroller, the timers do not all have the same size, so you must make sure that you are doing the calculation with the right timer, of the right size.
+If nothing is working and your calculations are wrong, this might be the cause.
+We are going to reduce the frequency by dividing it by 1,024, and we can then work with a cycle of about 16 milliseconds.
+
+Why do we do that? Most Servo motors run at a frequency of 60 hertz, with short duty cycles.
+For example, to centre one, you need to set the signal high for 1.5 milliseconds, so 1.5 divided by 16, times the size of the register (256), is about 24 ticks. So, let's go and look at some code.
+So this is for the Servo motor.
+
+First, the magic numbers that we calculated together.
+To centre it, we define the duty as that time fraction times the register size.
+Then we declare a mutable timer and a mutable pin.
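+The slide itself is not reproduced in this transcript, but a rough sketch of mine of what these declarations could look like follows - the API names are from my memory of avr-hal around 2020 and may differ in current releases, so check the avr-hal examples:
+
+```rust
+#![no_std]
+#![no_main]
+
+// Sketch only: avr-hal / arduino-uno API names from memory, circa 2020.
+use panic_halt as _;
+use arduino_uno::prelude::*;
+use arduino_uno::pwm::{Prescaler, Timer2Pwm};
+
+#[arduino_uno::entry]
+fn main() -> ! {
+    let dp = arduino_uno::Peripherals::take().unwrap();
+    let mut pins = arduino_uno::Pins::new(dp.PORTB, dp.PORTC, dp.PORTD);
+    let mut delay = arduino_uno::Delay::new();
+
+    // Timer 2 is prescaled by 1,024, and it is hard-wired to pin d3.
+    let mut timer2 = Timer2Pwm::new(dp.TC2, Prescaler::Prescale1024);
+    let mut servo = pins.d3.into_output(&mut pins.ddr).into_pwm(&mut timer2);
+    servo.enable();
+
+    loop {
+        servo.set_duty(24); // ~1.5 ms high: centre the servo
+        delay.delay_ms(400u16);
+        servo.set_duty(16); // shorter pulse: rotate to one side
+        delay.delay_ms(400u16);
+    }
+}
+```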
+
+You notice that it is prescaled with a factor of 1,024, and the pin is d3, and then we enable it.
+So how do we know how to do that? We can go into the documentation.
+You can see here that I just follow the documentation.
+
+Please note that it is very important to choose the pins right, because they're actually hard-wired to the timers. This is my big error number three: timer 2, which I'm using, is hard-wired to pin d3.
+
+If you're going to use a timer for the Servo, you have to make sure you're using the right pin.
+If we go back to the code, here, you see that I just have a mutable delay to make the rotation not too fast, and then we just set the duty to 24 - sorry, to 16 - we wait a bit, and we centre it again, and then we wait a bit, 400 milliseconds, and then we put it to the left, and I'm going to show how that looks.
+Now the sensor.
+
+So you can think of the sensor as a bat.
+A bat sends sound waves every now and then and waits for them to return, to calculate how far it is from an obstacle.
+This is exactly how it works.
+
+We need to send sound waves about every 100 milliseconds.
+The sensor has a trigger and an echo.
+The trigger turns it on and sends the sound wave.
+
+When an obstacle is met, the sound wave will bounce off it and return as the echo, and we will measure the length of it.
+There are many details, but I will just cover the ones I will show in the code.
+
+We will use another timer, timer 1.
+This one does not need as much prescaling as timer 2, which we use for the Servo, so we will just make it 64 times slower.
+
+So, another magic number that I would like to explain is 58, which you will see in the code.
+Sound travels 340 metres - 34,000 centimetres - in one second, and it has to bounce off the obstacle and come back, so we divide by two.
+That gives 0.017 centimetres per microsecond, which is the same as one centimetre per 58 microseconds.
+
+Also, every tick is four microseconds, so you will see a multiplication by four that would otherwise look suspicious.
+You do not need to pay attention to all those details.
+I just explained the magic numbers because I know some of you are interested in them.
+This is the code for the sensor.
+
+So we are using timer 1, which is 16-bit, and we are prescaling it with a factor of 64 here.
+We declare a mutable trigger, which I connected to pin d12 and configured as an output.
+All pins are input by default, and that's why you need to configure it as an output, and then you need to declare an echo.
+I connected it to pin d11.
+
+You don't need to configure it as an input, because they're all input by default, and you don't need to make it mutable, because we are just going to monitor how long it is high.
+The commands in the comments are there to get your console working.
+
+So you can get the address of your Arduino, and then, if you type "screen" and then the tty, you can see everything showing up on the screen.
+To see things on the screen, you will need the serial interface, so we have a receiver, a sender, and a baud rate.
+This is nothing to worry about.
+It is described in the documentation.
+If you just copy-paste the declaration of the serial, it's going to work.
+
+And then, in an infinite loop, we are going to write zero to the timer, set the trigger high for ten microseconds, and set the trigger low again.
+This is going to send the sound wave.
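+As a quick aside - my own sanity check, not code from the talk - the magic numbers 4 and 58 can be verified in plain Rust on the host:
+
+```rust
+/// Convert a Timer1 reading into centimetres.
+/// With a 16 MHz clock and a /64 prescaler, one tick lasts 4 µs,
+/// and the round trip of sound covers about 1 cm per 58 µs.
+fn ticks_to_cm(ticks: u32) -> u32 {
+    ticks * 4 / 58
+}
+
+fn main() {
+    // 375 ticks * 4 µs = 1,500 µs of round trip, roughly 25 cm.
+    assert_eq!(ticks_to_cm(375), 25);
+    println!("{} cm", ticks_to_cm(375));
+}
+```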
+
+Then we have to manage an eventual hardware error: if we have waited for more than 50,000 ticks, it means that we have waited for more than 200 milliseconds, so this is probably an error, and we need to exit the loop. Since Rust allows us to name loops, we continue with the outer loop.
+If not - if we have detected something - we just write zero to the timer register, and then we monitor how long the echo stays high.
+That means we don't do anything while the echo is high.
+
+And then we take the number of ticks in the timer register, multiply it by four, because each tick is four microseconds, and divide it by the magic number 58.
+And then we wait 100 milliseconds between the sound waves; 100 milliseconds corresponds to 25,000 ticks.
+And, at last, we print on the screen how far we are from the target.
+
+Now, I want to show the motor driver.
+The first time you see a motor move because you decided it, you will be hooked.
+
+The Arduino does not have enough power to move the motors, so we connect them to the driver, and connect the driver's logical pins to the Arduino.
+That means that you will be plugging the cables for the wheels in those two.
+Those are to communicate with the battery, and this is a five-volt logical pin that I'm going to use in the demo.
+There is also an enable pin to control the speed, but this will be for Rust-y 2.0.
+
+So now it's time for the walk-through, and to talk a bit about no-std.
+Why do we have to work with no-std, that is, Rust without the standard library? On the microcontroller there is no OS, which means we need to do things ourselves, and to indicate to the compiler that we are going to work with no_std and no_main.
+We also need to tell cargo to build with the nightly toolchain, to indicate that we are going to use Nightly.
+
+To get the cargo configuration, you can again go to the documentation here; everything is explained in 0.5.
+This is what you need for your cargo file, so we can go back to the code.
+Because we are in no-std, we are going to need to import a panic handler - `panic_halt` here - and those two are the crates I'm importing to make it work.
+
+Those three crates are modules that I used to separate my code when I was refactoring, because I felt that it would be more clear, and also because I was training with Rust's data structures.
+To make it work, you will need some constants.
+
+How long do you want our bot to wait between actions, what minimum distance do you want to have between it and an obstacle, and what is an acceptable distance for making an alternative choice?
+So this macro is an attribute macro; since we are working with no-std and no main function, it marks the point of entry of the code, and the exclamation mark here is the never type, which means this function should never return.
+
+So we start by taking ownership of the peripherals.
+We take everything we have on the MCU.
+And then we collect all the pins into a single variable that we are going to use here.
+This is the general timer that has been prescaled by a factor of 64, which we are going to use mostly with the sensor, but also as a general time-checker for the whole project; and then timer 2, with pin d3, which we are going to use for the Servo.
+
+I created the Servo unit struct to practise working with Rust structures.
+You do not need to do that.
+It's going to work fine without it.
+
+But then I connected those logical pins to d7, d5, d6, and d4, and gave them long variable names to refer to each wheel.
+Then those pins can be downgraded.
+Downgraded means they can be put in a mutable array that we can send to other modules to modify them.
+But `wheels` is still the owner of those pins.
+
+And then there is the infinite loop that is going to control the robot; it is a named loop, `outer: loop`.
+It starts with the Servo unit rotated to the front, and then the wheels are going to move forward.
+We are reading the value from the sensor continuously, but if the value is smaller than the minimal distance that we decided on, then we are going to stop the wheels - I'm going to show a bit later how to stop the wheels - and then check the distance on the right.
+We are going to turn the Servo to the right, get the value there, wait between the two interactions, and then do the same for the left. The rest is just: if the value is bigger on the left than on the right, and it's an acceptable distance - there is not another obstacle there - then we're going to turn the wheels left and then continue to the outer loop, that is, go forward.
+Else, if the value on the right is better, then we are going to turn right, and then continue to the outer loop.
+Else, we're just going to go backwards.
+And turn right.
+
+Going back to show the module, I think this is the only thing that I didn't show.
+Here we can decide the constant for how long we want our car to turn. Moving forward just receives a mutable reference to the wheels.
+This type seems really, really long, but you know how you do it in Rust: when you don't know a type, you just declare some other type, the compiler will complain and give you the right one, and you can just copy-paste it.
+
+I did some unpacking here.
+I put the wheels into a new array to make sure that I wrote it correctly, and then, when you go forward, you just need to set the left and right forward-motion pins high, and the left and right backward-motion pins low.
+To turn right, you need to stop the wheels first - to stop the wheels, you just need to set all the pins low, right? I just removed that from the presentation for clarity.
+And to turn right, you have to set the left forward wheel high, and the right forward wheel low, for an amount of time, so, if you move only the left wheel, the robot is going to turn to the right.
+You need to know where to find help.
+
+Actually, if there is one thing you must get from this talk, it is where to find help.
+The Rust community is very welcoming, and one of their core values is to provide a safe, friendly, and welcoming environment.
+This is a community in which I felt safe and comfortable from day one.
+
+You can ask any question on the community forum.
+Overall, people have been providing me with technical consultancy as well as psychological support since the start of my adventures in Rust.
+When I arrived on Matrix, people realised I hadn't done my homework and sent me off to do it, but helped me anyway.
+I want to thank my mentor and the avr-hal team.
+
+That's it.
+Thank you very much for your attention.
+The whole project is on GitHub, so please don't hesitate to do whatever you want with it, and show me what you did.
+You can ask me any question you like.
+
+That's it, and thank you again, and it's time for the surprise.
+
+**Lyrics:**
+> they see me rollin'.
+> see me riding Rust-y.
+> wanna see me riding Rust-y!
+> thinking cool to ride Rust-y. See you riding Rust-y.
+> wanna see me riding Rust-y
+> showing, moving, grooving, want to see me riding Rust-y.
+> wanna see me riding Rust-y
+> now that I'm riding Rust-y
+> want to see me - want to see me riding Rust-y.
+
+
+**Pilar:**
+That was incredible! It's so good that you all could not see me during the talk, because it was just me grinning from ear to ear and clapping my hands off! I said I was excited about this talk, but, wow!
+
+Thank you so much, Aïssata. That was an incredible talk, and, yes, like, you mentioned the community at the end, and your love for it, and your being such an integral part of it, at least to me, shows so much, because it's that spirit - like you just held our hand through all of that.
+If somebody wanted to try that out, you showed them: I messed up here, and there, and, you know? Thank you so much.
+
+That was really, really great.
+And as a special treat, I mean, besides the amazing ending, Aïssata is here to join us for a live Q&A, so I'm going to add her on now, and we had a lot of questions in the chat, so we will try to get through as many of those as we can.
+
+So, hi, you're live with us now.
+
+**Aïssata:**
+Thank you very much. Fantastic.
+
+**Pilar:**
+You had to watch your own talk. I don't know how you could do that.
+Personally, I can't! Don't ask me to! What an amazing talk, really. I was so excited.
+Absolutely called dibs on introducing, and being here for this talk.
+
+**Aïssata:**
+I say something about the sound, and I jump from 340 to 34,000, which was really weird.
+It's because I was talking in metres, and then centimetres, and then I forgot to explain it? And then, oh, I don't even know what to say, and I saw that in my comments, I wrote something about the God bat.
+That was embarrassing. Like, you know ... how to be a God bat. I meant the sensor is working like a bat, and, well, let's just forget that.
+
+**Pilar:**
+I mean, to be fair, I think it was very clear. I know we are multi-cultured, and everything, and everyone might not be on the same technological or English-speaking level,
+but I thought the two things you mentioned were fairly clear - but thank you for clearing that up too.
+
+**Aïssata:**
+Speaking about the community - because, instead of thinking about the bad things, let's think about the good things - it is true that when I joined the Rust community,
+it was the only time that I used my own name and my own picture on the internet, and I never do that, because, you know, you're always afraid of mean comments, and abuse, but, yes, like from day 1, it never happened.
+It never happened, and I've never felt so welcome anywhere else. I think I made pull requests after a week.
+
+**Pilar:**
+That's amazing. I know people who have been in the industry for years and years and they're too scared to make PRs to put their stuff out there, so that is really cool. It's so great that you actually went for it, and that you felt safe and comfortable to do so. I hope, you know, that's why I love your mentoring work as well, you mentioning that and sharing so much with us, because you're encouraging other people to feel safe, go for it, also try it, and that is amazing.
+Thank you so much.
+Do you mind if I go into a couple of questions that were on the chat?
+
+**Aïssata:**
+No, please, yes. My child is here!
+
+**Pilar:**
+Don't worry about it. We are all at home and in it together.
+I think a couple of fun ones for you first.
+So, you know, there was kind of a line of thinking: why robots?
+Is it hard to start off with something that is embedded? And what is your next planned robot?
+I think people are very fascinated by the topic of embedded, and robots, so, please - I saw how excited you are about it.
+
+**Aïssata:**
+Yes, okay, so what I really wanted to convey with this talk was that it is not that hard.
+You have to have help, and, if you have the right help, it's not that difficult, and the most difficult part is really: how do I start with it? This is what I wanted to show in my talk: how do I get started?
+I'm pretty sure you have your own ideas and objectives, you have your own crazy stuff that you want to implement, talking robots, so I don't know.
+So I just wanted to show the components that I'm using and how they work, so we can put them to use.
+I don't know, maybe you want to have a fridge that comments if you open it at night.
+This is something you can do with this stuff. But, when ...
+
+**Pilar:**
+I like how you mention where to get things and how to spot fakes. Watching your talk, I want to do something too. I want to buy components.
+
+**Aïssata:**
+It's easy, and it's really cool and fun and easy. Please go for it. You have to show me. I want to see.
+
+**Pilar:**
+Thank you for that. Any future plans for more robots?
+
+**Aïssata:**
+I've brought my robot skull. As you can see, it's not done; it can't be closed.
+But this is also, you know, something you can do with whatever small board you want, and then a small sensor, and then it reacts to sound, but also to shock.
+You have to turn the battery on first, because otherwise the demo is not going to work. And you can put some LEDs into it, and program that thing with for loops.
+
+**Pilar:**
+That is so cool. Thank you for bringing it! Wow! That's amazing! I hope people are tuning into the Q&A to get to see this! That's so cool! Wow. Thank you.
+
+**Aïssata:**
+Thank you, too. I'm all sweaty, and happy!
+
+**Pilar:**
+That's just part of being in the community, and part of getting to partake in this. So, I think I wanted to jump on to more technical questions, but I think we actually have to go to the next talk - but you showed us the next project, which is super cool. Thanks so much for joining us. Thank you for everything.
+
+**Aïssata:**
+Thank you, everyone!
+
+**Pilar:**
+It's been amazing. Thank you for being here. We will see you in the chat, right?
+
+**Aïssata:**
+Yes, I will stay here. We will be in the chat.
+
+**Pilar:**
+We will see you in the chat.
diff --git a/2020-global/talks/02_UTC/02-Aissata-Maiga.txt b/2020-global/talks/02_UTC/02-Aissata-Maiga.txt
deleted file mode 100644
index 3689f92..0000000
--- a/2020-global/talks/02_UTC/02-Aissata-Maiga.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-Build your own (Rust-y) robot - Aissata Maiga
-PILAR: Hello, everyone. So, I hope you - wait, I'm going to remove this off my face. That's the talk that's coming up. I hope you all enjoyed that first talk. I thought it was a really great way to set the day. It was a good way to start the mood and get us excited for what is coming up. So, up next, as you may have seen in the chat, on the schedule, and in our little announcement bubble over here - I'm not very used to that! - is Assata Maiga. She is giving a talk which I'm so, so excited about. Excuse my robotic delivery! Assata is a computer science student at the Royal Institute of Technology, and she, like, if you've seen her activity, she's just absolutely passionate about getting people excited about code, which is something that just it is we are on the same wavelength there.
-She mentors a programming club for women, and it's so cool because it's for mothers and daughters, and you will be dazzled by her talk. I'm going to let her get down to it. I'm also going to bring on our bard to share more about Aïssata's talk. I will see you after the talk for Q&A, and I hope you enjoy it.
-> Aïssata Maiga lets me know how to make bots without Arduino, making Rust to move to the groove, sure, there will be some cool stuff to see, no?
-AÏSSATA: Hello, I'm Aïssata Maiga, just your regular computer science student, and I live in Sweden. I discovered Rust this summer and fell in love with it. So, let's just start. This presentation will be about making a robot in Rust, working with no_std. It is a fun project to try for yourself, or with children. It's also very easy: the most intimidating part is to get started and order stuff from the internet. I will show you the robot, then explain everything you need to know about every part of the code, share a lot of the mistakes I've made, and then there will be a little surprise. I used avr-hal, and I got a lot of help. It has great documentation, a lot of templates for how to start your own project - and of course how to set up your cargo file - but it also has many examples for every supported board. For example, I'm using the Uno, and, if you go to "examples", you can see how to blink an LED, which is the "hello, world" of Arduino systems - there's a sketch of it below. It really works great, and I would recommend it heartily.
-The components first. With time, you will notice that all components are standard and pretty much the same, but the easiest way to get started is just to buy a kit. Many are available online, on Amazon; if you Google "smart car", you will see a bunch of suppliers that you can choose from. The cheapest start at ten or 15 euros. For assembly it's the same: if you just Google "assembly instructions smart car Arduino", you will find a lot of good videos, and I link them in my repository. A word of caution, and a good opportunity to share my big error number one. In most assembly videos, and in the repo, there are schematics to follow. You must be careful to follow them and plug everything in as it looks on the image, but the most important thing is to make sure that the circuit is grounded - that means that all ground cables are connected, that the circuit has a common ground. If not, bad things will happen - bad things also called "undefined behaviour" - so if you're there and nothing is working and you're getting frustrated, just check whether everything has a common ground.
-Arduinos are ideal to begin this kind of project. They're relatively affordable, and there are tons of robot-making resources for Arduino. What is great with the brand is the ecosystem - all the libraries and the IDE - but we can't use that in Rust, and that is where avr-hal has you covered. We're using an eight-bit microcontroller. What the Uno and the Nano do is make it approachable: the board exposes general-purpose input and output pins - those little holes - and all the protocols. What do I mean by that? You're not programming the whole board; you are programming the microcontroller, this one that you can see here on the board. There is Arduino and "Arduino", as we can say: when you order your kit, it will come with a board that is a clone, and it works just as well as a regular one for this kind of project.
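-[A minimal blink sketch in the spirit of the avr-hal examples mentioned above, assuming the 2020-era arduino_uno crate; exact APIs may differ between versions:]
-#![no_std]
-#![no_main]
-
-use arduino_uno::prelude::*;
-use panic_halt as _; // no OS: we need a panic handler
-
-// Based on the avr-hal uno examples; function and method names are as in the
-// 2020-era crate and may have changed since.
-#[arduino_uno::entry]
-fn main() -> ! {
-    let dp = arduino_uno::Peripherals::take().unwrap();
-    let mut pins = arduino_uno::Pins::new(dp.PORTB, dp.PORTC, dp.PORTD);
-    // d13 is the on-board LED; pins are inputs by default, so make it an output.
-    let mut led = pins.d13.into_output(&mut pins.ddr);
-
-    loop {
-        led.toggle().void_unwrap();
-        arduino_uno::delay_ms(500);
-    }
-}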
-I will now tell you about the servo motors, but also the timers, which are very important in this kind of project. I will of course show you what I mean with an animation at the end of the slides. So, the servo motor is a simple rotating motor, but we do not need it to go all the way to the end of its rotation, so we can think about it like a light dimmer: we use something called pulse-width modulation. If you have a romantic night with your partner, you need to control the light, right? This is what we are going to do: we're going to control the duty cycle. Duty cycles are nothing mystical - the duty cycle is just the fraction of time where the signal is active. In other words, we're going to tell the microcontroller how long we want the signal to be active. Now, the microcontroller has an internal clock, and it runs at 16 megahertz. That defines a period of time of one divided by the frequency, and one divided by 16 million is really fast - about 62.5 nanoseconds. We can't work with that. Even if you multiply by two to the power of eight, which is the size of the timer register, it will overflow after only about 16 microseconds. And now is a good time to share big error number two: on the microcontroller, the timers do not all have the same size, so you must make sure that you are doing the calculation with the right timer, and the right size. If nothing is working and your calculations seem wrong, this might be the cause. We are going to reduce the frequency by dividing it by 1,024, and we can then work with a cycle of 16 milliseconds. Why do we do that? Most servo motors expect a frequency of 60 Hertz and short duty cycles. For example, to centre the servo, you need to set the signal high for 1.5 milliseconds, so 1.5 divided by 16, times the size of the register, is 24 ticks. So, let's go and look at some code.
-So this is for the servo motor. First, the magic numbers that we calculated together: to centre it, we define the time divided by the period, times the register size. Then we declare a mutable timer and a mutable pin. You notice that the timer is prescaled with a factor of 1,024, the pin is d3, and then we enable it. So how do we know how to do that? We can go into the documentation - you can see here that I just follow the documentation. Please note that it is very important to choose the pins right, because they're actually hard-wired, and this is my big error number three: timer 2, which I'm using, is hard-wired to pin d3. If you're going to use a timer for the servo, you have to make sure you're using the matching pin. If we go back to the code, here you see that I just have a mutable delay to make the rotation not too fast, and then we just set the duty to 24 - sorry, to 16 - we wait a bit, we centre it again and wait a bit, 400 milliseconds, and then we put it to the left, and I'm going to show you what it looks like.
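-[A sketch of the servo setup just described, assuming the 2020-era arduino_uno pwm API; the duty values are the ones derived above and depend on your servo:]
-use arduino_uno::pwm::{Prescaler, Timer2Pwm};
-
-// Timer 2 prescaled by 1024: 16 MHz / 1024, so one full 8-bit cycle is ~16 ms.
-let mut timer2 = Timer2Pwm::new(dp.TC2, Prescaler::Prescale1024);
-// Timer 2 is hard-wired to pin d3 (big error number three!).
-let mut servo = pins.d3.into_output(&mut pins.ddr).into_pwm(&mut timer2);
-servo.enable();
-
-loop {
-    servo.set_duty(24); // ~1.5 ms pulse: centre
-    arduino_uno::delay_ms(400);
-    servo.set_duty(16); // ~1.0 ms pulse: one side
-    arduino_uno::delay_ms(400);
-    servo.set_duty(32); // ~2.0 ms pulse: other side
-    arduino_uno::delay_ms(400);
-}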
-Now, the sensor. You can think of the sensor as a bat: a bat sends sound waves every now and then and waits for them to return, to calculate how far it is from an obstacle. This is exactly how the sensor works. We need to send sound waves about every 100 milliseconds. The sensor has a trigger and an echo. The trigger turns it on and sends the sound wave. When an obstacle is met, the sound wave bounces on it and returns as the echo, and we measure the length of that. There are many details, but I will just cover the ones I will show in the code. We use another timer, timer 1. This one does not need as much prescaling as timer 2, which we used for the servo, so we will just make it 64 times slower. So, another magic number that I would like to explain is the 58 that you will see in the code.
-Sound travels 340 metres in one second - that's 34,000 centimetres per second - and it has to bounce on an obstacle and come back, so we divide by two: 17,000 centimetres per second, which is 0.017 centimetres per microsecond, and that is the same as one divided by 58. Also, with this prescaling, every tick is four microseconds, so you will see a multiplication by four that would otherwise look suspicious. You do not need to pay attention to all those details; I just explain the magic numbers because I know some of you are interested in them.
-This is the code for the sensor. We are using timer 1, which is 16-bit, and we are prescaling it with a factor of 64 here. We declare a mutable trigger that I connected to pin d12 and configured as an output - all pins are inputs by default, and that's why you need to configure it as an output - and then you need to declare an echo. I connected it to pin d11. You don't need to configure it as an input, because all pins are inputs by default, and it doesn't need to be mutable, because we are just going to monitor how long it is high. The lines in the comments are commands to get your console working: you can get the address of your Arduino, and then, if you type "screen" and then the tty, you can see everything showing on the screen. To see things on the screen, you will need the serial connection, so we have a receiver, a sender, and a baud rate. This is nothing to worry about - it is described in the documentation, and if you just copy-paste the declaration of the serial, it's going to work. And then, in an infinite loop, we write zero to the timer, set the trigger high for ten microseconds, and set the trigger low again. This sends the sound wave. Then we have to manage a possible error with the hardware: if we have waited for more than 50,000 ticks, it means that we have waited for more than 200 milliseconds, so this is probably an error, and we need to exit the loop - since Rust allows us to name loops, we continue with the outer loop. If not, if we have detected something, we write zero to the timer register again, and then we monitor how long the echo is high - meaning we don't do anything while the echo is high. Then we take the number of ticks in the timer register, multiply it by four, because each tick is four microseconds, and divide by the magic number 58. Then we wait 100 milliseconds between the sound waves - 100 milliseconds corresponds to 25,000 ticks. And, at last, we print on the screen how far we are from the target.
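-[A sketch of that measurement loop, continuing from the setup in the earlier sketches; register access assumes the 2020-era avr-device API, and field names may differ:]
-// Timer 1 prescaled by 64: 16 MHz / 64, so one tick every 4 microseconds.
-dp.TC1.tccr1b.write(|w| w.cs1().prescale_64());
-
-let mut trigger = pins.d12.into_output(&mut pins.ddr);
-let echo = pins.d11; // inputs by default, and we only read it
-
-'outer: loop {
-    // Send a 10 microsecond pulse on the trigger pin.
-    dp.TC1.tcnt1.write(|w| unsafe { w.bits(0) });
-    trigger.set_high().void_unwrap();
-    arduino_uno::delay_us(10);
-    trigger.set_low().void_unwrap();
-
-    // Wait for the echo to start; bail out after 50,000 ticks (~200 ms).
-    while echo.is_low().void_unwrap() {
-        if dp.TC1.tcnt1.read().bits() >= 50_000 {
-            continue 'outer; // probably a hardware error
-        }
-    }
-
-    // Measure how long the echo stays high.
-    dp.TC1.tcnt1.write(|w| unsafe { w.bits(0) });
-    while echo.is_high().void_unwrap() {}
-    let ticks = dp.TC1.tcnt1.read().bits();
-
-    // Each tick is 4 us; distance in cm = duration in us / 58.
-    let distance_cm = ticks as u32 * 4 / 58;
-
-    // ... print distance_cm over serial, then wait ~100 ms between waves ...
-    arduino_uno::delay_ms(100);
-}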
-Now, I want to show the motor driver. The first time you see a motor move because you decided it, you will be hooked. The Arduino does not have enough power to move the motors, so we connect the motors to the driver, and connect its logical pins to the Arduino. That means you will be plugging the cables for the wheels in those two; those are to communicate with the battery, and this is a five-volt logical pin that I'm going to use in the demo. There is also an enable pin to control the speed, but that will be for Rust-y 2.0.
-So now it's time for the walk-through, and for talking a bit about no_std. Why do we have to work with no_std, which is Rust without the standard library? On the microcontroller there is no OS, which means we need to do things ourselves, and we indicate to the compiler that we are going to work with no_std and no_main. We also need to configure cargo to indicate that we are going to build with Nightly.
-To get the cargo configuration, you can again go to the documentation here - everything is explained in 0.5. This is what you need for your cargo file, so we can go back to the code. Because we are in no_std, we need to import a panic handler - panic_halt here - and those two are the crates I'm importing to make it work. Those three are modules that I used to separate my code when I was refactoring, because I felt it would be clearer, and also because I was training with Rust's data structures. To make it work, you will need some constants: how long you want your robot to wait between actions, the minimum distance you want it to keep between itself and an obstacle, and what is an acceptable distance to make an alternative choice. This macro is an attribute macro - since we are working with no_std, we have to mark the entry point of the code ourselves - and the exclamation mark here is the never type, which means nothing should ever return from this function. So we start by taking everything: we take all the peripherals we have on the MCU. Then we collect all the pins into a single variable that we are going to use here. This is the general timer that has been prescaled by a factor of 64, which we are going to use mostly with the sensor, but also as a general time-keeper for the whole project; and then timer 2, and pin d3, which we are going to use for the servo. I created a Servo unit struct - that was to practise working with Rust structures; you do not need to do that, it's going to work fine without. Then I connected those logical pins to d7, d5, d6 and d4, and gave them long variable names so I can refer to each wheel. Those pins can then be downgraded - downgraded means they can be put in a mutable array that we can send to other modules to modify them, but "wheels" is still the owner of those pins.
-And then there is the infinite loop that controls the robot - it is labelled 'outer: loop. It starts with the servo unit rotated to the front, and then the wheels move forward. We are reading the value from the sensor continuously, and if the value is smaller than the minimal distance that we decided on, then we stop the wheels - I'm going to show a bit later how to stop the wheels - and check the distance on the right: we turn the servo to the right, get the value, wait between the two interactions, and then do the same for the left. The rest is just: if the value is bigger on the left than on the right, and it's an acceptable distance - there is not another obstacle there - then we turn the wheels left and continue to the outer loop, that is, go forward. Else, if the value on the right is better, then we turn right and continue to the outer loop. Else, we just go backwards and turn right.
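-[A condensed sketch of that obstacle-avoidance loop; the constants' values, the Direction type, and the helper functions (servo_look, read_distance_cm, wheels_*) are hypothetical names for illustration, not the ones from the talk's repository:]
-const MIN_DISTANCE_CM: u32 = 20; // assumed values, for illustration only
-const ACCEPTABLE_DISTANCE_CM: u32 = 40;
-
-'outer: loop {
-    servo_look(Direction::Front);
-    wheels_forward(&mut wheels);
-
-    // Drive until an obstacle gets too close.
-    while read_distance_cm() > MIN_DISTANCE_CM {}
-    wheels_stop(&mut wheels);
-
-    // Look around and pick the clearer side.
-    servo_look(Direction::Right);
-    let right = read_distance_cm();
-    servo_look(Direction::Left);
-    let left = read_distance_cm();
-
-    if left > right && left > ACCEPTABLE_DISTANCE_CM {
-        wheels_turn_left(&mut wheels);
-        continue 'outer; // named loops make this control flow explicit
-    }
-    if right > ACCEPTABLE_DISTANCE_CM {
-        wheels_turn_right(&mut wheels);
-        continue 'outer;
-    }
-    wheels_backward(&mut wheels);
-    wheels_turn_right(&mut wheels);
-}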
-Going back to the code, I think this is the only thing that I didn't show: we can define a constant for how long we want our car to turn. And moving forward - the function just receives a mutable reference to the wheels. This type seems really, really long, but you know how you do it in Rust: when you don't know a type, you just declare it as some other type, the compiler will complain and give you the right type, and you can just copy-paste it. I did some unpacking here: I put the wheels into a new array to make sure that I wired them correctly, and then, when you go forward, you just need to set the left and right forward-motion pins high, and the backward-motion pins low. To turn right, you first need to stop the wheels - to stop the wheels, you just set all the pins low; I removed that from the presentation for clarity. And to turn right, you set the left forward wheel high and the right forward wheel low for an amount of time: if you only move the left wheel, the robot turns to the right.
-You need to know where to find help. Actually, if there is one thing you must take from this talk, it is where to find help. The Rust community is very welcoming, and one of its core values is to provide a safe, friendly, and welcoming environment. This is a community in which I have felt safe and comfortable from day one. You can ask any question on the community forum. Overall, people have been providing me with technical consultancy as well as psychological support since the start of my adventures in Rust. When I arrived on Matrix, people realised I hadn't done anything yet and sent me off to do homework, but helped me anyway. I want to thank my mentor and avr-hal. That's it. Thank you very much for your attention. The whole project is on GitHub, so please don't hesitate to do whatever you want with it, and show me what you did. You can ask me any question you like. That's it, thank you again, and it's time for the surprise.
-* they see me rollin'.
-* see me riding Rust-y.
-* wanna see me riding Rust-y!
-* thinking cool to ride Rust-y. See you riding Rust-y.
-* wanna see me riding Rust-y.
-* showing, moving, grooving, want to see me riding Rust-y.
-* wanna see me riding Rust-y.
-* now that I'm riding Rust-y.
-* want to see me - want to see me riding Rust-y.
diff --git a/2020-global/talks/02_UTC/03-Vivian-Band.md b/2020-global/talks/02_UTC/03-Vivian-Band.md
new file mode 100644
index 0000000..121ca32
--- /dev/null
+++ b/2020-global/talks/02_UTC/03-Vivian-Band.md
@@ -0,0 +1,245 @@
+**Rust for Safer Protocol Development**
+
+**Bard:**
+Vivian wants us to be safe
+and our code on the web to behave
+use Rust to generate
+code that will validate
+risky inputs, no need to be brave
+
+
+**Vivian:**
+Hello, my name is Vivian Band.
+I'm a second-year PhD student at Glasgow University studying network security.
+I was on the safer protocol development project.
+
+So, improving protocol standards: the Internet Engineering Task Force standardises network protocols.
+These are initially presented as drafts to working groups and then become official standards after further review.
+However, even after all of this peer review from several different sources, mistakes still sometimes appear in these documents.
+
+For example, in the image on the right, the ASCII diagram shows fields that are 13 bits and 19 bits long, but the text description says these should be 16 bits in length.
+Inconsistencies like this create ambiguity for people implementing protocols.
+
+What the improving protocol standards project aims to do is to provide a machine-readable ASCII format, so that these inconsistencies can be detected much more easily.
+These diagrams are minimally different from the format used in existing documents, with authors using consistent label names and certain specific stock phrases in their descriptions.
+These machine-readable documents allow us to build a typed model of protocol data.
+
+We call the custom type system developed as part of our project "network packet representation".
+Network packet representation is language-agnostic.
+
+I had used Rust earlier to implement a bare-bones version of the protocol a few years ago in my final-year undergrad project, and I was impressed by how much safety it added to the systems programming.
+Our first automatically generated libraries would be Rust files, because we wanted the resulting parsers to have a good level of type safety.
+
+Okay, so, first of all, let's take a step back and look at which types we need to describe network protocols, before we can start building parsers and parser combinators.
+
+I use a lot of analogies when learning new concepts, so I like to think of these basic types like Lego bricks.
+There are several basic types that we can identify from protocol standards documents, and we will use a TCP header to demonstrate this.
+Fields which contain raw data can be modelled as bit strings - source port is just a 16-bit unsigned integer.
+That's just raw data.
+
+Fields which could contain one or more of the same element can be modelled as an array.
+Some fields only appear under certain conditions, and rely on values from other fields within the same protocol data unit - in this case, the TCP header - to establish whether they are in use or not.
+We can call these constraint types, since they need to be checked.
+
+Some fields require information not contained within this packet, like an encryption key, or a value from another data unit somewhere in the draft.
+We can hold this information in a context data type, which can be read by other protocol data units that feature in the draft, if required.
+A field which can take on one of a limited set of possible values can be modelled as an enum, indicated in drafts with a stock phrase - "a TCP option is one of ..." - so "is one of" is the key phrase we need to use in the modified standards documents.
+
+Packets and protocol data units as a whole can be considered structure-like types, given that they contain field types as constituent members.
+One type that doesn't feature in TCP is the function data type.
+Functions are required to express transformations between different data units - in this case, encryption and decryption of packet headers.
+
+We've got seven types in total in our network packet representation system: bit strings, arrays, constraints, contexts, enums, structs, and functions.
+
+Let's get to the fun stuff.
+Automatic Rust parser generation.
+
+We've got our basic building blocks sorted out.
+How can they be combined into more complex parser combinators in Rust?
+
+Let's go back to the bit string from when we were explaining our custom types.
+We can easily generate this as a wrapper around an unsigned 16-bit integer in a Rust output file.
+Immediately after that, we can generate a nom-based parser for that type.
+This is a little bit more difficult to generate.
+
+There is a lot going on here, so we will highlight a few key details.
+The first argument for all our parser functions is an input tuple: a borrowed array of bytes, which would be an incoming packet from some source.
+Our parsers work at the bit level, so the second element of the tuple is how many bits we've read in the current byte.
+
+Our second argument is a mutable borrow of the context instance, since we might want to update it.
+Our output is a nom-specific result type containing the remaining bytes left to be parsed, an updated bit counter, and the custom type instantiated with the correct value read from the byte array.
+We also return a mutable reference to our possibly updated context.
+
+The parser function itself takes a defined number of bits from the input byte array - in this case, it will take 16 bits.
+It assigns the value of those taken bits to the custom Rust type as needed.
+
+The order in which we generate these custom types and parsers in the Rust output file is determined by a depth-first search.
+We generate a custom type and parser whenever we reach a leaf node, and generate the combinator when there are no more leaf nodes to be found for that parent.
+The overall protocol data unit is a TCP header, which is a struct type in our custom network packet representation system, so this is the root of the depth-first search tree, and it will generate the parser combinator.
+
+The first parser will be for source port, which is the 16-bit-long bit string whose parser we walked through earlier.
+Bit strings are leaf nodes, so we move to the next child, destination port.
+This is also a bit string, and therefore a leaf node, so we write a custom type and a 16-bit parser for this.
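+[A minimal sketch of the shape of such a generated type and nom-based parser - illustrative only, not the project's actual generated code, and with the context argument omitted:]
+
+```rust
+use nom::{bits::complete::take, IResult};
+
+/// Generated wrapper for the TCP source port field: a 16-bit bit string.
+pub struct SourcePort(pub u16);
+
+/// Bit-level input: the remaining bytes plus the bit offset in the current byte.
+type BitInput<'a> = (&'a [u8], usize);
+
+/// Read 16 bits from the input and wrap them in the generated type.
+fn parse_source_port(input: BitInput) -> IResult<BitInput, SourcePort> {
+    let (rest, value) = take(16usize)(input)?;
+    Ok((rest, SourcePort(value)))
+}
+```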
+The first non-bit-string we encounter in the TCP header is options, which is an array type.
+The elements which could be present in the options array are TCP options.
+TCP options is an enum type with a limited range of possible choices.
+Each of those enum variants is described in its own small ASCII diagram in another section of the same document.
+This makes each enum variant a struct type in our network packet representation system - in this case, the EOL option is a struct.
+
+The value of the field in this ASCII diagram is a bit string.
+This means we have finally reached a leaf node, and we can write a custom Rust type definition and a custom parser for it, and then a Rust type definition and a parser for its parent node, EOL option.
+We find that there are more TCP option variants, so we repeat this process for each one.
+Once we have written parsers for all of the variants, we can write the Rust type definition and parser combinator for the parent nodes - the TCP option enum and the options array.
+The last field in the packet is the payload, which we can parse as a bit string.
+
+Finally, we write the Rust type definition ... in one function call.
+We also create a context object which all parser functions have access to.
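+[A sketch of how such a generated enum and its combinator might look, continuing the previous sketch and using nom's alt combinator; the variant names and their parsers are hypothetical:]
+
+```rust
+use nom::{branch::alt, combinator::map, IResult};
+
+/// "A TCP option is one of ..." becomes a generated enum over the variants.
+pub enum TcpOption {
+    Eol(EolOption),
+    Noop(NoopOption),
+    // ... one variant per small ASCII diagram in the draft ...
+}
+
+fn parse_tcp_option(input: BitInput) -> IResult<BitInput, TcpOption> {
+    // Try each variant parser in turn until one succeeds.
+    alt((
+        map(parse_eol_option, TcpOption::Eol),
+        map(parse_noop_option, TcpOption::Noop),
+    ))(input)
+}
+```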
+So, to recap the system that we developed in this project: we have the machine-readable protocol document at stage one, with our minimal changes to ASCII diagrams and text descriptions.
+We have the custom protocol typing system developed in stage two - our network packet representation language - and in stage three we have the result of the internship: a Rust library file automatically generated from the information we have in stage two.
+
+Remember earlier when I mentioned that I think of these basic types and parsers as building blocks? To go further with that analogy as quickly as possible: a TCP header is like this Lego build.
+It is difficult to build manually without making mistakes.
+
+Our generated parser libraries are not only a manual explaining how this data should be parsed; they also allow protocol developers to build the struct with extracted values with a single function call.
+This is ideal for protocol testing.
+The picture on the left is a genuine sample of our generated TCP parser code from our modified TCP document.
+
+So, conclusions: initially, I decided on Rust as our first parser output language because I had enjoyed Rust for systems programming on a previous project.
+Using parser combinators turned out to be an ideal fit, since assigning types to network protocols and generating the parsers both use depth-first search.
+Parsers can be difficult to write manually and are prone to containing errors.
+
+Automatically generating parsers minimises the chance of some of these errors occurring - for example, the number of bits being read will always match the specification.
+The typing guarantees offered by Rust help us ensure that what is in the machine-readable specification document ends up correctly in our network packet representation system.
+If there are errors, the Rust compiler will alert us to them.
+
+The next steps: this project is still ongoing, and there are more directions that this research can go in.
+We are aiming to show our system to the IETF.
+We need to put in more work on function types, so we can create encryption and decryption functions for protocols like QUIC, which rely heavily on these.
+We would like to use the Rust libraries for protocol testing and error correction, and to support more output languages in the future.
+Resources for this project can be found at these links.
+We have a peer-reviewed publication which goes into more detail about our network packet representation typing system, and a GitHub repository containing the code for the automatic Rust parser generator.
+
+Thank you for your time, and I would be happy to answer any questions.
+
+
+**Vivian:**
+That was brilliant. Loved it! [Laughter]. Thanks so much.
+
+**Stefan:**
+Thank you.
+I know we have a 25-to-40-second delay to the stream, so, just to get ahead of time, I have two questions, if you don't mind.
+The first one is: there is a push for a native implementation of the networking types, so that the Rust standard library doesn't use libc any more but directly operates with system calls. Do you think that will affect you in any way, like in developing new types?
+
+**Vivian:**
+Potentially.
+So, the whole point of us developing the network packet representation system was to have something that was completely agnostic of any programming languages or output libraries we want to use in the actual parsers themselves, so it should be fairly easy for us to adapt to these things, I think.
+We would maybe have to consider how we convert from network packet representation to the different types featured in the output code, but that's relatively straightforward, I think.
+
+**Stefan:**
+Wonderful. So, this feeds into my other question: I guess you can use the higher-level parsers for TCP, UDP, what not, regardless of the underlying types - IPv4 versus version 6?
+
+**Vivian:**
+Yes, so what we are aiming to do is have these run through a single protocol specified in a draft.
+It's very rare that you would have an RFC that specifies multiple protocols, so if you wanted to make an IPv6 parser generator, go ahead, run it on the RFC.
+We are aiming to introduce our machine-readable ASCII format to future IETF drafts, and hopefully we will see more adoption of that, so we can see automated testing going forward.
+What we've done for the TCP example is go through an older RFC and make minimal changes to it to generate parsers, so, if you wanted to do that with other protocols, that's absolutely fine as well.
+So, again, in answer to your question - sorry, the question was about multiple protocols nested?
+
+**Stefan:**
+Yes, whether you can use the parser coming out of the RFC for IPv6, and what the -
+
+**Vivian:**
+Yes, we can use this for all sorts of different protocols. The nice thing about parser combinators is you can have a ... if you like. Maybe one day in the future.
+
+**Stefan:**
+Yes. Cool. Wonderful. There is also a question from the audience: how do you deal with non-byte-aligned structures - if, say, a five-bit word crosses the eight-bit alignment?
+
+**Vivian:**
+So - I think I had a small file for testing when I was doing the internship, about "what if this happens", and non-byte-aligned words were one of the cases.
+What we found with the bit-level parsers is that they go straight on into the next byte if the bit counter exceeds seven, so they just run forwards happily.
+We haven't found any issues with that so far. It's been very good to us.
+
+**Stefan:**
+Yes, and nom has had a release - version 6 has been out since Tuesday, I think?
+
+**Vivian:**
+Yes, I haven't had time to update to that yet - this was written on 5 - so we will see if it works with 6 and whether there is anything that needs changing.
+
+**Stefan:**
+Wonderful. If this were a physical conference, we would probably meet Geoffroy, who wrote it.
+**Vivian:**
+Sure, we would love to.
+
+**Stefan:**
+Wonderful. Do you want to add anything - something that came to mind just now?
+
+**Vivian:**
+No, I think I've said everything that I wanted to say in the presentation, mostly.
+It's mostly a proof of concept at the moment.
+I posted a link to the repository and to our paper explaining our system in the conference room chat, so if people want to take a look at our library, have a play about with it, and see how the generated Rust code looks,
+we will happily take feedback if people want to improve our parsers - I consider myself a novice at Rust.
+We used nom functions as opposed to macros so we knew what was going on. If people want to talk about how to optimise that, make it cleaner, or make other improvements, that would be great. We would love that.
+
+**Stefan:**
+Wonderful. So, to the lovely people in the stream, this is about the last chance you get to ask more questions. Has the IETF been receptive to the machine-readable diagram format?
+
+**Vivian:**
+So, the problem with the IETF is there are so many different groups that it's impossible to get a consensus for the whole organisation, so what we've got at the moment is a small side meeting at the formal description techniques side group, I think, which is aiming to ask, okay, how can we deploy this?
+Stephen and Colin Perkins, two people involved in this project, are heavily involved with the IETF, so I think they're having discussions to see how we can get this deployed.
+There have been past attempts along the lines of: okay, we can have custom tooling to do this and this, all singing and dancing - but we tried to make something relatively simple and unintrusive that could work for multiple workflows.
+
+**Stefan:**
+Cool.
+
+**Vivian:**
+So the answer is that nobody has published using it yet, but watch this space.
+
+**Stefan:**
+I guess you will be trying to investigate the correctness of the middle boxes and what-not, or maybe try to circumvent them?
+
+**Vivian:**
+Yes. So one of the examples that we are working on at the moment is QUIC - QUIC being high-profile, and a complex protocol, I think. If we can successfully parse it, and we can successfully use it for testing, then we think that's quite a good promotion, I suppose.
+
+**Stefan:**
+Definitely. Having an actually correct implementation that is done when the specification is finished ...
+
+**Vivian:**
+This was one of the main motivations. You get protocols that are becoming increasingly complex, like QUIC. It's not surprising that there will be flaws in them. Say you've got a packet generated by a C implementation and we feed it through our Rust parsers - we could potentially find those flaws. The implementation can be written in other languages; we just need the output that it generates.
+
+**Stefan:**
+So tools like cargo expand could show the generated code, and maybe check out the state machine that has been generated, to see ...
+
+**Vivian:**
+Yes.
+
+**Stefan:**
+To see if the specified behaviour makes any sense, right? Or if there are, like, obvious flaws in the -
+
+**Vivian:**
+Yes, to catch the subtle bugs. Essentially, what our parsers are testing is: is your output on the wire correct - is it doing what you think it's doing? We could maybe come up with more advanced testing, and automated error correction later on, possibly, but that's going to take some time to develop.
+
+**Stefan:**
+Yes. Looks like a long, ongoing project.
+
+**Vivian:**
+For sure. Hopefully, yes!
+
+**Stefan:**
+Wonderful.
+So, I'm currently not seeing any more questions. I hope I haven't missed any.
+
+**Vivian:**
+It seems like that's all of them.
+
+**Stefan:**
+Wonderful. Thank you again very much.
+
+**Vivian:**
+Thank you for having me.
+
+**Stefan:**
+Yes, you're welcome. So please stick around. I think I will let you go now, so you can enjoy the next act. Thank you.
diff --git a/2020-global/talks/02_UTC/03-Vivian-Band.txt b/2020-global/talks/02_UTC/03-Vivian-Band.txt
deleted file mode 100644
index 30b0388..0000000
--- a/2020-global/talks/02_UTC/03-Vivian-Band.txt
+++ /dev/null
@@ -1,40 +0,0 @@
diff --git a/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.md b/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.md
new file mode 100644
index 0000000..0f5e6e3
--- /dev/null
+++ b/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.md
@@ -0,0 +1,209 @@
+**Rust as a foundation in a polyglot development environment**
+
+**Bard:**
+Gavin and Matthijs show how one might
+a large project in Rust rewrite
+start out small, let it grow
+until stealing the show
+from whatever was there before, right?
+
+
+**Gavin:**
+That's an excellent introduction.
+
+I'm Gavin Mendel-Gleason, the CTO of TerminusDB, and I want to talk a little bit today about Rust as a foundation in a polyglot environment.
+First, I'm going to give a little outline of the talk - the motivation, the challenges, and the solution - and why we use Rust as a foundation in our environment.
+
+First, you have to know about our problem: we build an in-memory, revision-control graph database, and we have some slightly unusual features which have driven our tool chain requirements.
+Our software is a polyglot house: we have clients written in JavaScript and in Python.
+We have Rust, and we have prolog, which is somewhat unusual in the modern day.
+There is also some C involved there as well.
+
+Some of the unusual features drive our design requirements.
+We're an in-memory database, which enables faster querying.
+It's also simpler to implement - I have some experience implementing ACID databases, so I know a lot about the difficulties you can encounter when trying to page data in and out.
+
+We chose this time to leave everything in memory, for simplicity of design and for performance.
+We are, however, also ACID, so we use a backing store: we actually write everything to disk, but we keep things in memory.
+
+We also use succinct data structures, which approach the information-theoretic minimum size whilst still allowing queries over the data structure.
+This allows us to keep large graphs in memory simultaneously, but it requires a lot of bit-twiddling.
+They're relatively complicated data structures - compact, but not so transparent to the developer - so you really need to be able to do effective bit-twiddling, which, of course, is where Rust comes in.
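+[A sketch of the flavour of bit-twiddling involved: a naive rank operation on a bitvector, one of the basic succinct-structure primitives. This is illustrative, not TerminusDB's code:]
+
+```rust
+/// rank(i): how many bits are set among the first `i` bits of the bitvector.
+/// Real succinct structures answer this in O(1) using precomputed blocks;
+/// this naive version just scans, to show the bit arithmetic.
+fn rank(bits: &[u64], i: usize) -> usize {
+    let word = i / 64;
+    let offset = i % 64;
+    // Whole words before the one containing bit i.
+    let full: usize = bits[..word].iter().map(|w| w.count_ones() as usize).sum();
+    // Partial word: mask off everything at or above the offset.
+    let partial = if offset == 0 {
+        0
+    } else {
+        (bits[word] & ((1u64 << offset) - 1)).count_ones() as usize
+    };
+    full + partial
+}
+```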
We have a bunch of git-like features - revision control, push, pull, clone, and all of the things that you know from git.
We do those on databases.
So that also drives a lot of our requirements.

We have a Datalog query engine, and we also have complex schema constraint management.
So, first, why did we look into Rust in the first place? We were not initially a Rust house.
We didn't have any Rust in our development at all.
I didn't come from a Rust background, and although I have a lot of experience in different programming languages, Rust was not one of them.
Our earliest prototype was actually in Java.

It was hard to write, and it had mediocre performance, so I started prototyping in Prolog.
Because Prolog is very logical, it was extremely fast for us to write - especially the schema-checking parts - but it had poor performance; it is obviously not the best language for bit-twiddling.

Later, we moved to a C++ library called HDT, and we used that as our storage layer, which radically improved the performance of the application.

However, we had a lot of trouble with this, and it was a persistent source of pain: the C++ was crashing regularly.
This was partly because we had a requirement to be multithreaded for performance reasons - we were dealing with very, very large databases, in the billions of nodes - and the code was not re-entrant. It was supposed to be written with the intent of being re-entrant, but it wasn't in practice, and this would show up as server crashes.

It was really, really hard to find the source of these crashes, and that was a persistent source of problems for us.
Then there was a secondary problem, which is that HDT was not designed for write-transactions.
It was really designed for datasets rather than databases, so we were building orchestration logic on top of it, journalling transactions and things like that.
It just wasn't designed that way.
So we had opinions about what the interface for such a library should be - HDT wasn't it - and it also had these crashing problems whose source we were finding hard to track down.
Matthijs, off his own bat, went out and wrote a prototype in Rust of the succinct data structures that we needed to replace HDT, with a simple library around them, and it looked really very promising.
I had heard of Rust, but I had not written anything in Rust.
This drove me to take a look at it.

I know a lot of languages - I have learned OCaml, C++, Haskell, Prolog, Lisp, I've been through the gamut of all of these - and I don't try to learn a new language unless there is something peculiar about it, something you might need in your toolkit.
Rust had this kind of incredible aspect to it, which is the ability to avoid memory problems whilst still being an extremely low-level programming language.

Thread safety was one of our major headaches.
We were getting segfaults, and we were finding it difficult and time-consuming to sort them out.

This library was exhibiting none of these problems, and that was really promising.
We decided we were just going to take the plunge and rewrite the foundations of our system in Rust.

It also gave us the chance to re-engineer our data structures, simplify code, improve fitness for purpose, change the low-level primitives, and cater to write-transactions in particular. It also enabled us to make some performance enhancements that we would have liked to do earlier but were afraid to, because in C++ there is kind of a fear factor: if you add anything new, you might add something that causes a crash.

Of course, in terms of challenges, I'm sure everyone in the Rust community knows about the challenges of FFI, but I don't want to belabour the point.
We have to interact with the C stack, and this is annoying, because when we're interfacing with Rust, we're actually interfacing with it through a C FFI, and that kills some of the nice guarantees you get from Rust - but at least the unsafety is isolated to the interaction surface rather than spread throughout.
We also ended up trampolining through a light C shim, which is not the best approach.
We are evaluating a more direct approach currently.

I don't want to tell everybody we've done it all right - we've done some things right, but we can improve a lot here.
Now, what we would really like is a Rust Prolog, because then we could have a nice clean Rust FFI, and everything would be beautiful and perfect.
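To illustrate the shape of that boundary - a hypothetical sketch, not TerminusDB's actual interface; `Store`, `store_new`, `store_add` and `store_free` are made-up names - exposing a Rust store through a C ABI means opaque pointers and explicit constructor/destructor pairs in place of Rust's ownership guarantees:

```rust
/// Hypothetical sketch of a C-ABI boundary around a Rust store,
/// of the kind a Prolog (or any C-speaking) host would call into.
pub struct Store {
    triples: Vec<(u64, u64, u64)>,
}

/// Create a store and hand an opaque pointer across the FFI boundary.
#[no_mangle]
pub extern "C" fn store_new() -> *mut Store {
    Box::into_raw(Box::new(Store { triples: Vec::new() }))
}

/// Add a triple. The caller must pass a pointer obtained from `store_new`.
#[no_mangle]
pub extern "C" fn store_add(store: *mut Store, s: u64, p: u64, o: u64) {
    // Safety: we rely on the host honouring the contract above -
    // exactly the guarantee the borrow checker can no longer enforce.
    let store = unsafe { &mut *store };
    store.triples.push((s, p, o));
}

/// Destroy the store, reclaiming ownership so Rust can drop it.
#[no_mangle]
pub extern "C" fn store_free(store: *mut Store) {
    if !store.is_null() {
        unsafe { drop(Box::from_raw(store)) };
    }
}

fn main() {
    // The same functions can be exercised from Rust directly.
    let s = store_new();
    store_add(s, 1, 2, 3);
    store_free(s);
}
```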
There's some progress being made on Scryer Prolog, which has cool features that you should look at if you're interested in a Rust Prolog project.

Then there were some challenges that we ran into which I would like to go through really quickly.
We initially expected to write a lot more of the product in Rust: we started off replacing the HDT layer and then expected to write a lot more from the ground up. Essentially, we had this building, we went in, we replaced the foundations, and then we were going to start replacing the walls.
Unfortunately, developer-time constraints have favoured a different approach, so we're doing rapid prototyping in Prolog.
We essentially write the kind of feature that we are interested in there, and then, instead of immediately going to Rust from there, we actually wait. We're much more selective about what we put into Rust than we had initially imagined.

Partly this is due to the learning curve of the borrow checker's semantics: there is a difficulty in getting our developers to understand how this stuff works, and that takes some time.
There is a higher upfront cost here, and you win it back. If you're replacing C++, you win it back very quickly, because seeking out those bugs dominates in terms of time, so the upfront learning cost is nothing compared to the cost of some horrible segfault that you can't find.
But if you're replacing Prolog, the amortised costs are more important, so you have to think about where you replace it, and you have to be more careful about that.

Once you've got the knack of the borrow checker, things go a lot faster, but they're still slower than writing Prolog, because Rust is a lower-level language - which is why we use it, but it's also why we don't always use it.
So, our solution has been a late-optimisation approach. We develop the low-level primitives in Rust for our storage layer, and then we design the orchestration of these in Prolog.
When we find a performance bottleneck, we think about which unit of that orchestration to press down, and we try to find good boundaries - module boundaries, essentially - so that we can press it down into Rust to improve performance.

We have really been performance-driven on this, so the things that get pressed into Rust are the things that need performance enhancements.
We started with the storage layer in Rust and have extended this to several operations that proved to be slow in Prolog and needed to be faster.
These include things like patch application and squash operations, things of that nature.
These are larger orchestrated operations - they're not as low-level, so they have logic in them.

We have also done some bulk operations: for instance, CSV loading has been written completely in Rust as well, because, if you have hundreds of thousands of rows in your CSV, we get a ten- to twenty-times speed-up going from Prolog to Rust using the same algorithm. There's a constant per-operation cost that you can imagine multiplying out, and over hundreds of thousands of lines that becomes a really significant time sink. So CSV load has now been moved completely into Rust, and we imagine large-scale bulk operations will all have to be moved into Rust eventually.
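A rough sketch of what such a bulk path can look like, using the `csv` crate - a hypothetical example, not TerminusDB's actual loader; `load_csv`, the fact layout and `data.csv` are all illustrative - where the whole file is streamed inside Rust and only the finished batch crosses back to the orchestration layer:

```rust
use std::error::Error;

/// Hypothetical sketch: stream a whole CSV file in Rust and build
/// one batch of (row, column, value) facts for the layer above,
/// instead of crossing the language boundary once per cell.
fn load_csv(path: &str) -> Result<Vec<(u64, u64, String)>, Box<dyn Error>> {
    let mut reader = csv::Reader::from_path(path)?;
    let mut facts = Vec::new();
    for (row, record) in reader.records().enumerate() {
        let record = record?; // propagate malformed-row errors
        for (col, field) in record.iter().enumerate() {
            facts.push((row as u64, col as u64, field.to_string()));
        }
    }
    Ok(facts)
}

fn main() -> Result<(), Box<dyn Error>> {
    let facts = load_csv("data.csv")?;
    println!("loaded {} facts", facts.len());
    Ok(())
}
```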
There are some features that we know we're going to add directly to the Rust library - specific feature enhancements that we are never going to even bother trying to do in Prolog.
They generally have to do with low-level manipulation.
It would be silly to write them in Prolog; there's no point in even prototyping them there.

However, there are a lot of features that we expect will end up in Rust as we move forward. It's really going to be a slow replacement strategy, and it's not clear that we will ever replace all of the Prolog, although we may.
Even in the distant future where this product is well developed - ten years from now, and very solid - we can imagine that some of the schema checking, et cetera, will still be done in Prolog, even though it will perhaps be Prolog embedded in Rust, or using Scryer Prolog, or something along those lines.

One of the things that we ran into was an unexpected bonus - we kind of knew it was there, but we are amazingly impressed with it.
This is the unexpected bonus round.
We got data parallelism from switching to Rust at a very low cost, using Rayon, and it really blew our minds.
We had things we hardly changed at all: we had the logic written there, we used these magic incantations like into_par_iter, and everything was way, way faster.
We didn't have to think about it the hard way, and I love that, because I'm lazy!
So anything that reduces the amount of time we spend writing things while also improving performance is a huge win.
I can't impress upon people enough how awesome this is, and how much we need other people to start using it.
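The kind of change involved looks something like this - an illustrative sketch, not TerminusDB's code; `count_matching` and the triple layout are hypothetical - where swapping `iter` for Rayon's `par_iter` (or an owning `into_iter` for `into_par_iter`) parallelises a scan without restructuring the logic:

```rust
use rayon::prelude::*;

/// Count how many triples match a predicate.
/// The only change from the serial version is `iter` -> `par_iter`.
fn count_matching(triples: &[(u64, u64, u64)], pred: u64) -> usize {
    triples
        .par_iter() // was: .iter()
        .filter(|&&(_, p, _)| p == pred)
        .count()
}

fn main() {
    let triples: Vec<(u64, u64, u64)> =
        (0..1_000_000).map(|i| (i, i % 7, i * 2)).collect();
    println!("{}", count_matching(&triples, 3));
}
```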
So: the borrow checker has a cost, but huge benefits come from it - not just safety, but also, potentially, speed.
If you're interested in an open-source solution, you should give TerminusDB a try.
And that's it!


**Jeske:**
Yes, thank you so much for the talk. That was really interesting.

**Gavin:**
Thank you. Let me check the chat. I don't think there are open questions yet. I have a question: do you always build in release mode, or is the speed in debug mode also good enough?

**Matthijs:**
No, debug is definitely not fast enough.
Well, I mean, it is fast enough when we're just testing out things, and it's great sometimes to be able to use a debugger or something.
But for actual general use, also when we are developing and not working on the low-level library itself, we definitely always build in release mode, and there is a tremendous speed-up between them.

**Jeske:**
Cool. Thank you so much.
I see a lot of clapping hands in the chat right now.
Thank you for joining in.
Matthijs, is there a last thing that you would like to add, because we have a few minutes still left?

**Matthijs:**
Wow, no. [Laughter]. I don't know if I could add anything to that!

**Gavin:**
People should try Rayon - that is definitely one thing.

**Matthijs:**
Rayon was a great thing to try.
We were scared to try it, because, oh, data parallelism, scary - but it's literally just replacing a few calls, and it just works.
We got so much speed out of it. So, yes, Rust's ecosystem is just amazing. We love it.

**Jeske:**
It is a warm community, I have to say, also.

**Matthijs:**
It's really great. It's a good community.

**Jeske:**
I see a question happening: do you have any idea what hinders productivity in Rust besides the borrow checker?

**Gavin:**
Well, types just introduce extra overhead.
In Prolog, you don't have to worry about garbage collection or how you allocate things.
There are just fewer things to worry about.
It costs you later in terms of performance, but it's really helpful in terms of developer time, and for lots of things it doesn't matter what the constant-time cost is, because it's just glue.
Most software is just glue code, and, if you're just writing glue, you don't want to be worried about lots of details, I think.

**Matthijs:**
There is another thing here, comparing with Prolog.
In Prolog, you have a running instance, and then you do live recompilation of parts of that program, so there is a very short loop between writing your code and seeing it in action.
With Rust, you have to compile, and then you can run the unit tests - and I mean, it's not a big thing, but it is a thing. Having that kind of REPL experience really does help development.

**Jeske:**
Thank you. There are some questions popping up about use cases: what are the applications of TerminusDB at the moment? Can you elaborate a little bit on that?

**Gavin:**
It's things like machine learning, where you need to have revision control of your data sets, and any kind of large-scale graph manipulation - if you want to keep revisions and be able to pipeline your data, that's where we would use it.
We scale up to quite large graphs. You would be able to stick something large in there if you would like.

**Jeske:**
I think we are running out of time. Will you both be active in the chat to help around? I see, Matthijs, you're already in the chat as well.

**Matthijs:**
Yes.

**Jeske:**
We had some technical difficulties sometimes, which one does have with this online experience, I would say - it's kind of a fun experience too, I have to say.
I want to thank you both so much for your time and the interesting presentation, and please do check out the chat.
And I see that in eight minutes the next speaker will start already. For the people watching the live streams, please also stick around for that. We will be back in eight minutes, I would say. Thank you so much, again, Gavin and Matthijs.

**Gavin:**
Thanks for having us.

**Jeske:**
See you in the chat.

**Matthijs:**
Thank you for having us. I'm looking forward to the rest of the talks.

**Jeske:**
Ciao!

**Matthijs:**
Bye-bye!
diff --git a/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.txt b/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.txt deleted file mode 100644 index 2ab9a05..0000000 --- a/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.txt +++ /dev/null @@ -1,59 +0,0 @@ -Rust as a foundation in a polyglot development environment - Gavin Mendel-Gleason and Matthijs van Otterdijk. -JESKE: And we are back! -PILAR: That was a great start to the day! -JESKE: Loved it. -PILAR: I should not have had coffee this morning, between the caffeine and the adrenaline, I'm like super jittery! But it's amazing. And, like, wow, I don't - a lot of people are very sceptical still of online conferences, but, wow, the engagement in the chat, and online, it's been really, really cool. -JESKE: Really lovely to see tweets popping up, and like people are really actively participating, so thank you all for that, and just keeping mentioning it, and keep tagging everybody, and so we can also see your questions as they pop up. Stefan you wanted to introduce our next act? -STEFAN: Wait, wait ... .
One second! -PILAR: We threw you off with the hype. We were just so excited! It's just been really cool and really great to be here. Amazing speakers. Amazing audience. -STEFAN: Yes, sorry. This was the classic having a bug in the ear. I just heard the update from the next act will start on time, which is in seven-ish minutes from now. -JESKE: Perfect. -STEFAN: At 12 Europe time, UTC11. My head doesn't work any more! -JESKE: Where can people find the next act as well? -STEFAN: So, for you out there watching the stream, stay tuned, we will have the same break as before, which means a couple of minutes nothing, and then the artist Dibs with immersive Afro-beats will start. This is one of the good occasions to stand up and dance. -JESKE: I know I will. -PILAR: I didn't think of that. That's a great break activity as well, because we probably are all sitting at our computers at home, so, yes, maybe you cannot, like it won't annoy your surrounding people if you blast the conference for a couple of minutes while we have our break. -STEFAN: If you have daytime, stomping helps me to calm all the caffeine down, because, yes. -JESKE: I think having a festival feeling, right? In Amsterdam, the sun is shining, so it's a little bit like picking up again. -PILAR: Absolutely. Get your friends in on it. Get your family in on it. If I start stomping, my dogs will get the zoomies. The mail man came in while we were streaming, and we were all saved from the barks there! It's been really cool to have the live Q&A as well. Thank you so much to our speakers for being here, for providing amazing talks, like, oh. I couldn't even pick my favourite one. They're all so amazing, and there are more amazing talks coming up too. -JESKE: It's really nice they're active in the chat, and indeed online, and a few of them are here live in our "studio" so thanks for that as well, and I'm looking forward to what is coming after the break. But, I think we will give everybody a few minutes to get a coffee, or get their dancing outfit on, maybe. -STEFAN: And eat something, if you have some warm food, also it's nice. And then, after this, we will commence at 1250 local time. There will be a 20-minute break between the performance and the next talk. I think that's all. -PILAR: The replays are working again, snake game is working again unless you managed to crash it. -JESKE: Enjoy the dancing, and I will see you in 55 minutes. -PILAR: See each other then. Have a great break. Enjoy our artists. -STEFAN: See you soon! -PILAR: See you! [Break]. [Music]. -JESKE: Hello. -STEFAN: Hello. -PILAR: Welcome back, everyone. How - let us know in the chat what you felt about the amazing artist break we had. I hope you all had a - -JESKE: I did a little dance! -STEFAN: More than one! -PILAR: I couldn't help it, and it did give my dogs zoomies, but that's just part of the energy. -JESKE: Thanks for the for the music, and the nice introductions, and looking forward to a lot of nice content again. For the purposes of not running too late, I would like to dive immediately into the intro. -PILAR: We will leave you to it. -STEFAN: See you in the chat room for the talk. See you later. -JESKE: Rust as a foundation in a polyglot environment. Please join the chat room, ask questions over there, so the following speaker will be Gavin and Matthijs, with and they will be having this talk live. 
I will hand over to Gavin after the short introduction, but please also ask questions in the chat in the meantime, and will he will save them for after the talk of Gavin, and Matthijs will join to answer those questions. -BARD: Gavin and Matthijs show how one might, a large project in Rust rewrite. Start out small, let it grow, until stealing the show from whatever was there before, right? -GAVIN: That's an excellent introduction. I'm Gavin Mendel-Gleason, the CTO for TerminusDB, and I wanted to talk a little bit today about Rust as a foundation in a polyglot environment. First, I'm going to give a little outline of the talk with the motivation, challenges and solution. - and why we used Rust as a foundation in our environment. First, you have to know about our problem, we are an in-memory, revision control graph database, and we have slightly unusual features which has driven some of the tool chain requirements we have. Our software is a polyglot house, so we have clients written in JavaScript and in Python. We have Rust, and we have prolog which is somewhat unusual in the modern day. There is also C involved there as well. Some of the unusual features that drive our design requirements, so we're an in-memory database which enables faster query. It's also simpler to implement, and I have some experience in implementing on ACID databases and so I know a lot about the difficulties that you can encounter when trying to page in. We chose this time to leave it in memory for the simplicity of design and performance. We are, however, also ACID, so we use backing store. We actually write everything to disk, but we leave things in memory. We also use succinct data structures which approach the information theoretic minimum size whilst allowing query in the data structure. This allows us to get large graphs in memory simultaneously, but this requires a lot of bit-twiddling. They're relatively complicated data structures, and they're compact but not so transparent to the developer, so you really need to be able to do effective bit-twiddling, which, of course, is Rust comes in. We have a bunch of git-like features like revision control, push, pull, clone, and all of the things that you know from git. We do those on databases. So that also drives a lot of our requirements. We have a data log query engine, and we also have complex schema constraint management. So, first, why did we look into Rust in the first place? So, we were not initially a Rust house. We didn't have any Rust in our development at all. I didn't come from a Rust background and although I have a lot of experience in different programming languages, Rust was not one of those programming languages. Our earlier prototype is actually in Java. It was hard to write, and it had mediocre performance, and so I started prototyping in prolog. Because prolog was very logical, especially the schema-checking parts of it, it was extremely fast for us to write it in prolog, but however it had poor performance, so obviously it is not the best for bit-twiddling. Later, we moved to Library and C++ called HDT, and we used that as our storage layer which radically improved the performance of the application. 
However, we had a lot of trouble with this, and it was a persistent source of pain, so C++ was crashing regularly, and this is partly because we needed - we had requirements that we had to be multithreaded for performance reasons, because we were dealing with very, very large databases in the billions of nodes, and the code was not re-entrant, although it was supposed to be written with the intent of being re- entrant but it wasn't in practice and this would come up with the server crashed. It was really, really hard to find the source of these crashes, and that was a persistent source of problems for us. So then there was a secondary problem which is that HDT was not designed for write-transactions. It was really designed for datasets and not databases so we were using orchestration logic on top of it where we would journal transactions and stuff like that. It wasn't designed that way. So we had feelings about what the interface should be for a library, HDT wasn't it, and it also had these crashing problems, and we were finding it hard to find the source of them. Matthijs off his own bat went out and wrote a prototype in Rust of the succinct data structures that we needed to replace HDT and like a simple library around it, and it looked really very promising. I had heard of Rust, but I had not written anything in Rust. This drove me to take a look at Rust. I know a lot of languages have learned Kam, C++, Haskell, prolog, Lisp, I've been through the gamut of all of these, and I don't try to learn a new language unless there is something peculiar that drives it as something you might need in your tool kit. Rust had this kind of incredible aspect to it which is this ability to avoid memory problems whilst still being extremely low-level programming language. So thread safety was one of our major headaches. We were getting seg faults and we were finding it difficult to time-consuming to sort it out. This library was exhibiting none of these problems, and this was really promising. We decided we were just going to take the plunge and rewrite the foundations of our system in Rust. So, it also gave us the chance to re-engineer our data structure, simplify code, improve fitness for purpose, change the low-level primitives, and cater to write-transactions in particular, but also enabled us to do some performance enhancements that we would like to have done but were afraid to do because in C++ there is kind of a fear factor where, if you had anything new, you might add something that causes it to crash. So, of course, in terms of challenges, I'm sure everyone in the Rust community knows about challenges of FFI, but I don't want to belabour the point. We had - we had a comfortable interaction with C stack, and this is annoying, because if we're interfacing with Rust, we're actually interfacing it through a C FFI, and that kills some of the nice guarantees you get from Rust, but at least they're isolated to the interaction surface rather than completely. So, we also ended up trampolining through a light Cshim which is not the best approach. We are evaluating a more direct approach currently. I didn't want to tell everybody we've done it right, we've done some things right, but we can improve a lot here. Now, what we would really like, though, is a Rust prolog because then we could have a nice clean Rust FFI, and everything would be beautiful and perfect. There's some progress being made on Scryer prolog which has cool features that you should look at if you're interested in a Rust prolog project. 
Then some of the challenges that we ran into, I would like to go through really quickly. So we initially expected to write a lot more of the product in Rust, so we started off replacing the HDT layer, and then we expected to write a lot more from the ground up, so it's essentially like we had this building, we went in, we replaced the foundations, and then we were going to start replacing the walls, so, unfortunately, developer-time constraints has favoured a different approach for us, so we're doing rapid prototyping in prolog. We essentially rewrite the kind of feature that we are interested in there, and then instead of just immediately going to Rust from there, we actually wait, so, we're much more selective about what we put into Rust than we had initially imagined. Partly this is due to the learning curve of thorough checking semantics meaning there is a difficulty in getting our developers to understand how this stuff works, so that takes some time. And there is a higher front cost here, and you win it back, and, if you're replacing C++, you win it back very quickly. You win it back very quickly because seeking out those bugs dominates in terms of time, so that upfront learning cost is nothing compared to the cost of some horrible seg fault that you can't find. But, if you're replacing prolog, the sort of amortized costs are more important, so you have to worry about where you replace it, and you have to be more careful about that. Once you've gotten the knack of the checker, things go a lot faster but they're still writer than writing prolog, because it's a lower-level language which is why we use it, but it's also why we don't always use it. So, our solution has been a late optimisation approach, and the way that we do this is we developed the low-level primitives in Rust for our low-level storage layer, and then we designed the orchestration of these in prolog. When we find a performance bottleneck, we think about how to press that orchestration, or what unit of that orchestration, to press down, and try to find good sort of boundaries, module boundaries, essentially, so that we can press it down into Rust to improve performance. We have really been performance-driven on this, so the things that get pressed into Rust are those things that need performance enhancements. So we started with this storage layer in Rust and have extended this to several, like operations that have proved to be slow when they were in prolog and needed to be faster. These include things like, you know, patch application, and squash operations, things of that nature. So these are larger orchestrated - they're not as low-level, so they have logic in them. We also have done some bulk operations that, for instance, in csc loading has been written completely in Rust as well, because, if you have hundreds of thousands of rows in your csv, we get a ten- to 20-times speed-up going from prolog to Rust using the same algorithm because there's some kind of constant time that you can imagine expanding out, but the cost of these operations, and for hundreds of thousands of lines, that becomes a really significant time sink, so csv load has now been moved completely into Rust and we imagine large-scale bulk operations will all have to be moved into Rust eventually. So, the - so there are some features that we know we're going to add directly to the Rust library, so we have specific feature enhancements that we are never going to even bother trying to do in prolog. They generally have to do with low-level manipulation. 
It would be silly to write them. There's no point in prototyping them even there. However, there's a lot of features that we expect will end up in Rust as we move forward, and they really, it's going to be a slow replacement strategy, and it's not clear that we will ever replace all of prolog, although we may, but there is even like in the ACID future where this product is well developed, ten years from now, and very solid, we can imagine that probably some of the schema checking, et cetera, will be done in prolog, even though it will be perhaps prolog embedded in Rust, or using Scryer for prolog or something along those lines. One of the things, though, that we ran into was the unexpected bonus, and we kind of knew this was here, but are amazingly impressed with it. This is the unexpected bonus round. We got data parallelism from switching to Rust at a very low cost, using Rayon, and it really blew our minds. We had things we hardly changed at all. We had the logic written there, and we used these magic incantations into_par, and others, and everything is way, way faster, and we didn't have to think about it the hard way, and I love that, because I'm lazy! So anything that can reduce the amount of time we spend writing things while also improving performance, it's a huge win. I can't impress upon people enough how awesome this is, and how much we need other people to start using it. So the borrow-checker, there is a cost but huge benefits that come from it - not just safety, but also potentially speed. So, if you're interested in an open-source solution, you should give TerminusDB a try. And that's it! -> Yes thank you so much for the talk. That was really interesting. -GAVIN: Thank you. Let me check the chat. I don't think there are open questions yet. I have a question. You always build a release mode, or is there speed-up and debug mode also good enough? -MATTHIJS: No, debug is definitely not fast enough. Well, I mean, it is fast enough, it's fast enough when we're just testing out things, and it's great sometimes to be able to use a debugger, or something, but like an actual general use, also when we are developing and not developing the low-level library, we definitely build a release bug always, and it is a tremendous speed-up between them. -JESKE: Cool. Thank you so much. I see a lot of clapping hands in the chat right now. Thank you for joining in. Matthijs, is there a last thing that you would like to add because we have a few minutes also still left? -MATTHIJS: Wow, no. [Laughter]. I don't know if I could add anything to that! -GAVIN: People should try Rayon is definitely one thing. -MATTHIJS: Rayon was a great thing to try. We were scared to try it, because oh, data parallelism, scary, but it's literally just replacing a few calls, and it just works. We got so much speed out of it, so, yes, Rust's ecosystem is just amazing. We love it. -JESKE: There is a warming community, I have to say, also. -MATTHIJS: It's really great. It's a good community. -JESKE: I see a question happening. Do you have any idea what hinders productivity in Rust beside the borrow-checker? -GAVIN: Well, like, types just introduce extra overhead. In prolog, you don't have to worry about garbage collection or how you allocate things. It's just a few things to worry about. It costs you later in terms of performance but it's really helpful in terms of developer time, and lots of things, it doesn't matter what the constant time cost is, because it's just glue. 
Most software is just glue code, and, if you're just writing glue, you don't want to be worried about lots of details, I think. -MATTHIJS: There is another thing here, which is to compare with prolog. In prolog, you would have a running instance, and then you do live recompilation of parts of that program, so it is a very short loop between writing your code and seeing it in action. With Rust, you have to compile, and then you can run the unit tests, and I mean it's not a big thing, but it is a thing. So having that kind of repo experience, that really does help development. -JESKE: Thank you. There are some questions popping up for use cases and what applications of use of TerminusDB at the moment? Can you elaborate a little bit on that? -GAVIN: It's like machine learning where you need to have revision control of your data sets and there is any kind of large-scale graph manipulation if you want to - if you want to keep revisions, and be able to pipeline your data, that's where we would use it. We scale up to quite large graphs. You would be able to stick something large in there if you would like. -JESKE: I think we are running out of time. Will you both be active in the chat to help around? I see already Matthijs you're in the chat as well. -MATTHIJS: Yes. -JESKE: We had some technical difficulties sometimes which one does with this online experience, I would say, also, it's kind of fun experiences now, I have to say. I want to thank you both so much for your time, and interesting presentation, and please do check out the chat. And then I see that in eight minutes would be will he start the next speaker already. Please also, for the people watching their live streams, stick around for that. We will be back in eight minutes, I would say. Thank you so much, again, Gavin and Matthijs. -GAVIN: Thanks for having us. -JESKE: See you in the chat. -MATTHIJS: Thank you for having us. I'm looking forward for the rest of the talks. -JESKE: Ciao! -MATTHIJS: Bye-bye! \ No newline at end of file diff --git a/2020-global/talks/02_UTC/05-Anastasia-Opara.md b/2020-global/talks/02_UTC/05-Anastasia-Opara.md new file mode 100644 index 0000000..21f0430 --- /dev/null +++ b/2020-global/talks/02_UTC/05-Anastasia-Opara.md @@ -0,0 +1,211 @@
**Rust for Artists, Art for Rustaceans**

**Bard:**
Anastasia plays Rust like a flute
or maybe a magical lute
to then simulate
things that art may create
and this art does really compute


**Anastasia:**
Hi, and welcome to Rust for Artists, Art for Rustaceans.

This talk suffers from a bit of a multiple personality disorder: it talks about similarities between art, programming and science, while at the same time showing you complicated drawing algorithms, and it also has practical tips.
If you're only interested in the Rust and algorithm parts, go get yourself a coffee and come back in about ten minutes, since we are going to go through the meta part first.
Without further ado, let's start with introductions.

My name is Anastasia Opara, and I'm a procedural artist at Embark Studios, a Stockholm-based game development company.
You might be wondering: what is a procedural artist? Procedural art's distinguishing feature is that it is executed by a computer following a set of procedures designed in code - that's why it is procedural.
As a procedural artist at Embark, I spend most of my time meditating on workflows in games, from player and developer perspectives.
There is head scratching and head banging on the wall, which is fun, exhausting, and sometimes outright terrifying, but never boring.

To give you an example, I would like to show you two of the recent projects we did at Embark.
These will be tl;dr versions; we have separate talks about both of them if you want to learn more.

The first one is texture synthesis, where I got introduced to Rust.
It's an example-based algorithm for image generation, meaning you can give it an example image and it will generate more similar-looking images.
You can also do things like guided generation, style transfer, filling in missing content, and simple geometry generation.

The second project is called Kittiwake, which is a game-like environment where we explore a feeling of co-creation with an example-based algorithm embodied in this little creature.
You create a small scene, like a dialogue, and Kitti tries to mimic the way you create by analysing how you place things and using it as an example.
One of the key similarities between these projects is that both of them rely heavily on performing some kind of search.
In both of them, given an example - an object arrangement in Kittiwake, or pixels in texture synthesis - we search for a new configuration that looks perceptually similar to the example while not being a copy.
You can think of the search process in a simplified way: we try a bunch of configurations, and we present you with the most promising one.
From the user perspective, it can happen so fast that it might not seem like a search.

For example, in texture synthesis, you can see pixels appearing from possible neighbourhoods in the example.
In example-based placement, you can see objects magically moving around until they settle into positions that satisfy the example.
Even though the notion of search in these projects can be argued to be purely algorithmic, it's closer to a genuine art process than we might initially think.
It is easy to perceive the final artwork as the outcome of a linear, pre-calculated path - as if an artist just sits down and does art.

However, if we dig deeper, we will discover there is always an underlying network of trial and error.
For example, Picasso's famous painting Les Demoiselles d'Avignon is the result of hundreds of sketches.
You can see how the earlier sketches were in a different style than the final work.
We can argue that of course this work required a lot of exploration because of how stylised it is, and that therefore the search was purely about finding the stylisation, about re-imagining what we see into something completely different.
However, even when painting from reference, with the goal of copying reality, it is never a passive observation but an active interpretation and engineering of visual forms, which together construct a representation.

For example, if we look at digital photo studies, we can see a process of searching for textures, colours and forms that conjure a perceptual response similar to the target photograph.
It is a dynamic problem-solving of simplifying the object of depiction while keeping its perceptual essence.
Any painting, if you look close enough, is just an amorphous jumble of brushstrokes, but they magically come together and make you believe that what they represent is real.
That, I believe, is what art is: a search for a representation that conveys a target experience. And the human ability to comprehend the similarity between a representation and the thing it aims to represent is an astonishing example of the abstract thinking that comes to us so naturally.
Through the lens of representations, art can invite us to perceive the same object in different ways.

Let's consider this sheep, for example.
As artists, we might emphasise the way the wool curls repeat the pattern that trees make when they sway - or we might adopt a different perspective,
and explore the sheep not the way we see it, but invite the viewer to experience the concept of a sheep through its hoof marks as it walks on the canvas.
Both works aim to capture the sheep, but the outcomes, the representations, differ drastically.
Art is not alone in its pursuit of constructing things into representations.

In programming, we are often faced with the challenge of translating the language of our thoughts into the language of implementation.
If we were asked to represent the sheep in code, we might adopt an inheritance pattern of thinking and inherit it from an animal class, or we might say that animal is just one of its traits and there are other traits we are interested in, such as adorable or fluffy.

In mathematics, we can transform and map it into a new space.
Even when dealing with data, we are faced with the choice of a model, a representation, that will explain it.
In the end, these are all representations.
They don't change the way the sheep is, but they change the way we think about it.
And quite often there is no one representation that just works, and it becomes an iterative search for the pieces needed to design a new representation.
The pieces we manipulate might be different, but the process of search in art, programming and science is similar.
That's why art, programming and science are actually much closer to each other than we usually portray them.

If science helps us to reason about the external world and deconstruct a problem into a processing flow, I think art is about looking inward: a self-introspection and observation of one's perceptions.
Art is not just about the recreation of reality, even if you do choose to make that your focus; it's an invitation to co-experience something from your perspective, something that used to be bodiless until you invented a representation for it. And from that perspective, I genuinely think anyone can be an artist.

Today, there are plenty of art media to choose from, and code is a particularly fascinating one, as it invites us to convey not just the final destination of the art process but the author's workflow, the search itself.
It invites us artists to reverse-engineer our own thought process and deconstruct it into an algorithmic form,
pulling the process out of the background and putting the experience of the search forward as THE main art piece.
And that is what computational drawing aims to capture.

Computational drawing is my hobby project, designed to imitate traditional drawing from reference by searching for a deconstruction of the target into discrete brushstrokes, inviting the viewer to experience the becoming of the work, from its rough stages to its final details.
And just as many paintings are not a faithful representation of reality, computational drawing is not meant to be a recreation of an artist.
It was originally inspired by the many implementations of genetic algorithms available on the internet.
A genetic algorithm is a search algorithm loosely inspired by natural selection.
But what was more interesting, in my opinion, was the way these projects framed the objective: to represent a target image within a budget of 50 polygons, or a few hundred, like the Eiffel Tower.

It was the summer of 2017; I had just finished my studies, and I had no idea about search algorithms.
Seeing the genetic algorithm examples really triggered something in me.
The process, as brute-force as it was, reminded me of my own experience during life-drawing classes: having to translate what I see into a set of discrete motions with a pencil or a charcoal.
When that personal experience combined with discovering an algorithmic representation I could use, I just had to try it out.
In the end, I modified the search quite a bit, which actually made it redundant to frame it as a genetic algorithm - I will touch upon that later in the presentation.
So, this is a result from 2017.

At the time I was just learning Python, so this was written in Python.
And it was very slow to calculate - a couple of hours, almost up to a day, for one image.
It was ugly, and I never really showed it to anyone except a couple of friends.
I thought it was unsophisticated, not worth sharing, so it just collected virtual dust on my hard drive - until the summer of 2020, when, thanks to Covid, there was a lot of free time, and I went through my old hard drive and rediscovered it.
And the reception was beyond my expectations.
Which motivated me to clean it up a bit and open source it.
And it was super rewarding to see people getting inspired and trying out their own versions.

While cleaning up the Python code, I started having a lot of new ideas coming from the experience I had accumulated.
So I decided to start anew, and, of course, for my sanity, I rewrote the whole thing in Rust - and, yes, we are finally getting to the part where I talk about Rust! How does the algorithm work? Let's imagine we can draw a single brushstroke, which is parameterised by its position, scale, rotation, and value: there are functions for rotation as well as scale, and to change the value, we can simply access the pixel data.

So now imagine we drew this brushstroke on a canvas. Just like we thought of scale as a parameter of the brush, we can think of the brush as a parameter of the canvas.
Representing our brush configuration as one dimension is just a mental shortcut - in fact, that one dimension encapsulates five: scale, rotation, value, and position in x and y.
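In code, that five-parameter brushstroke might look something like this - a hypothetical sketch of the representation, not the project's actual source; `Brush` and `Canvas` are illustrative names:

```rust
/// Hypothetical sketch: one brushstroke as a point in a
/// five-dimensional parameter space.
#[derive(Clone, Copy)]
struct Brush {
    x: f32,        // position on the canvas
    y: f32,
    scale: f32,    // size of the stroke
    rotation: f32, // orientation in radians
    value: f32,    // greyscale value, 0.0 (black) to 1.0 (white)
}

/// A canvas appearance is then just a configuration of N brushes:
/// one point in a 5·N-dimensional space.
type Canvas = Vec<Brush>;

fn main() {
    let canvas: Canvas = vec![
        Brush { x: 0.5, y: 0.5, scale: 0.3, rotation: 0.0, value: 0.8 };
        100
    ];
    println!("{} brushes, 5 parameters each", canvas.len());
}
```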
Now suppose we added a second brush, extending our brush space to be two-dimensional; this new space encodes both brushes and thus the appearance our canvas would have.
So far, it might seem like quite a redundant transformation.
Cool - but it gets more conceptually interesting as we add more brushes, when it becomes messier to visualise.

Let's imagine this 2D space is a space defined by 100 brushes, and a dot in the space represents a particular canvas appearance defined by how our 100 brushes are configured.
To move in the space, all we have to do is change our canvas.
If we just take a stroll and aimlessly wander around in the space, we might discover that with just 100 brushstrokes we can depict a lot of interesting stuff, but also a lot of random stuff.
In fact, the proportion of interesting stuff to random stuff is insanely low.
It is very unlikely that we will just stumble upon a good painting.
So, the question becomes: how do we steer the search towards something interesting? We provide a target that guides our search to a space containing similar images, and the way we define similarity is simply the difference between the pixels of the target and the pixels of the drawn image.
So, if after mutating a brushstroke our pixel difference is smaller, that means we are moving towards a space with more similar images, and we should keep the mutation - and then we just do it again, again, and again.
Here, you can see the beginning of a search as brushes move around, trying to position themselves in a way that looks more like the target. If we let this guided search continue long enough, we will eventually reach a canvas configuration whose brushstroke arrangement looks similar to the target.
In general, that is pretty much it.
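The core loop is a simple greedy search. Here is a self-contained toy version of the idea - with a deliberately silly one-pixel "renderer" standing in for real brushstroke rasterisation; `render`, `pixel_difference` and `search` are illustrative, not the project's API - that keeps a mutation only when the pixel difference shrinks:

```rust
use rand::Rng;

// Same five-parameter representation as in the previous sketch.
#[derive(Clone, Copy)]
struct Brush {
    x: f32,
    y: f32,
    scale: f32,
    rotation: f32,
    value: f32,
}

type Image = Vec<f32>;

/// Toy rasteriser: splats each brush as a single pixel so the example
/// stays self-contained; scale and rotation are ignored here for brevity.
fn render(canvas: &[Brush], w: usize, h: usize) -> Image {
    let mut img = vec![0.0; w * h];
    for b in canvas {
        let px = ((b.x * w as f32) as usize).min(w - 1);
        let py = ((b.y * h as f32) as usize).min(h - 1);
        img[py * w + px] = b.value;
    }
    img
}

/// Similarity is simply the summed per-pixel difference to the target.
fn pixel_difference(a: &Image, b: &Image) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y).abs()).sum()
}

/// Greedy search: mutate one brush at a time, keep only improvements.
fn search(canvas: &mut [Brush], target: &Image, w: usize, h: usize, steps: usize) {
    let mut rng = rand::thread_rng();
    let mut best = pixel_difference(&render(canvas, w, h), target);
    for _ in 0..steps {
        let i = rng.gen_range(0..canvas.len());
        let saved = canvas[i];
        // Nudge one aspect of the stroke; here, jitter its position.
        canvas[i].x = (saved.x + rng.gen_range(-0.05..0.05f32)).rem_euclid(1.0);
        canvas[i].y = (saved.y + rng.gen_range(-0.05..0.05f32)).rem_euclid(1.0);
        let score = pixel_difference(&render(canvas, w, h), target);
        if score < best {
            best = score; // closer to the target: keep the mutation
        } else {
            canvas[i] = saved; // worse: roll it back
        }
    }
}

fn main() {
    let (w, h) = (64, 64);
    let target: Image = vec![0.5; w * h];
    let mut canvas = vec![
        Brush { x: 0.5, y: 0.5, scale: 0.1, rotation: 0.0, value: 0.9 };
        100
    ];
    search(&mut canvas, &target, w, h, 10_000);
    println!("search finished over {} brushes", canvas.len());
}
```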
There are many ways one might implement this search: it could be a genetic algorithm, gradient descent, simulated annealing.
I will show you how I approached it, drawing from my art education and incorporating that into the search.
I was greatly inspired by my fine-art classes. Especially when doing oil still lifes, we were never taught to solve all the detail frequencies at once.
Most of the time, you would paint yourself into a corner you couldn't solve, so instead you deconstruct the object of depiction into big shapes first.
Only once you've got those do you go into the details.

I wanted to incorporate the same kind of wisdom into the way my algorithm does the search.
Therefore, I broke it down into multiple stages, and each stage only solves a particular level of detail, starting with very big brushstrokes, forcing a generalisation of the shape, and then applying more and more detail.
Each of those stages is a completely separate search, so when the first stage is done, the second stage has to draw on top of what has been drawn before.
Here, you can see the search process happening for the different stages.
When you see a sudden jump, that's when new brushes are added, and the algorithm is using them to better approximate the target.
During every stage, the algorithm needs to place 100 brushes, and it has to do so in 10,000 search steps.
10,000 steps might seem like a lot, but if you need to place 100 brushes, that is 8,000 parameters - and remember, the algorithm cannot remove brushes.
It is forced to place strokes even if they are not perfect.
One of the reasons to limit the step number is to encourage happy accidents, mistakes, and imperfections.

If something goes bad, it can be fixed in later stages, giving a perception of continuous problem-solving in the brush layout itself.
As brushstrokes become smaller, I use a sampling mask to guide the brush placement towards places of higher frequencies.
I do so to preserve the loose brushwork while giving a perception of deliberate intent.

We're not just splattering a uniform brush texture; there is a specific thought process manifested in the way the brushes have non-uniform sizes and visually interact with each other.
There is an expression - don't overwork the painting - meaning don't kill the playfulness by overdoing it.
That's what I'm trying to avoid by having the sampling mask.
You may notice the sampling mask is generated based on edges, and edges play a very important role in drawing.

If you have had drawing classes, you probably recognise this.
This kind of plate is used as homework to copy and learn from: how to deconstruct a 3D shape into simplified contours.

As humans, we are sensitive to sharp transitions between darker and lighter elements, and a small deviation can make something look wrong.
Therefore, one of the new additions in the Rust version was using contours to guide the search.
This is done by comparing the edges of the target versus the drawn image, and computing their distance.

Here's a comparison of using versus not using edges to guide the search.
Notice how much better defined the face is, and how it looks perceptually closer to the original.
It's subtle, but I think it gives an extra push towards believability - that there is an artist's thought process behind each brush placement.
Since we need to perform edge detection on every iteration, it needs to be fast.

Here are some comparisons of how long different edge-detection implementations available in Rust crates take.
In the end, I went with a custom implementation that uses chunks to make it parallel.
And here is a time and quality comparison for Canny versus Sobel versus no edge detection.
Edge detection takes a huge bulk of the generation time: going from five minutes without it to 20 minutes, and even to 1.5 hours with Canny.
The facial features are captured so much more precisely with Canny or Sobel.
Canny gives a crisper result, but comes at a four-times-slower generation cost.
In case you are interested in how the parallel Sobel is done, here is the code - feel free to pause when this goes online.
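The slide itself isn't reproduced in this transcript, but a chunk-parallel Sobel filter in the spirit of the description - a reconstruction, not the actual slide code - can hand each output row to Rayon as an independent chunk:

```rust
use rayon::prelude::*;

/// Parallel Sobel edge detection over a greyscale image.
/// Each output row depends only on the input, so the rows can be
/// processed as independent parallel chunks.
fn sobel(input: &[f32], width: usize, height: usize) -> Vec<f32> {
    assert!(width >= 3 && height >= 3);
    let mut output = vec![0.0f32; width * height];
    // Skip the one-pixel border so the 3x3 kernel stays in bounds.
    output[width..width * (height - 1)]
        .par_chunks_mut(width)
        .enumerate()
        .for_each(|(row, out_row)| {
            let y = row + 1; // account for the skipped first row
            for x in 1..width - 1 {
                let px = |dx: isize, dy: isize| -> f32 {
                    input[(y as isize + dy) as usize * width
                        + (x as isize + dx) as usize]
                };
                // 3x3 Sobel kernels: horizontal and vertical gradients.
                let gx = -px(-1, -1) - 2.0 * px(-1, 0) - px(-1, 1)
                    + px(1, -1) + 2.0 * px(1, 0) + px(1, 1);
                let gy = -px(-1, -1) - 2.0 * px(0, -1) - px(1, -1)
                    + px(-1, 1) + 2.0 * px(0, 1) + px(1, 1);
                out_row[x] = (gx * gx + gy * gy).sqrt();
            }
        });
    output
}

fn main() {
    // A vertical step edge down the middle of an 8x8 image.
    let (w, h) = (8, 8);
    let mut img = vec![0.0f32; w * h];
    for y in 0..h {
        for x in w / 2..w {
            img[y * w + x] = 1.0;
        }
    }
    let edges = sobel(&img, w, h);
    println!("edge strength at the step: {}", edges[4 * w + w / 2]);
}
```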
Another way I'm using edges is to drive the brushstrokes' orientation.
Brushstrokes follow along the edges and don't cross them, because that would violate the perceptual contour border.
To guide the brushstrokes, I generate an image-gradient field.
The stronger the direction, the more influence it has on a brushstroke that might be placed in that region.

For example, here there is almost no gradient information, and therefore the brushstroke may take any orientation.
Here, closer to the edges, there is a strong directionality, indicated by the length of the lines, and the brushstrokes placed here are more likely to follow along the field's direction.
The reason why I made this always probabilistic is that I don't want to exclude any happy accidents.
Perhaps a brushstroke placed completely perpendicular might actually be a very good solution - in the end, the pixel and edge difference is what matters.
Computational drawing is still very much a work in progress, and I hope to open source it once it's done.
One thing I still haven't gotten to is figuring out a good strategy for searching for a colour solution.
That is still on my to-do list.

Right now, I'm sampling colours directly from the target image.
At the moment, the algorithm is running on the CPU.
The code is parallelised, but drawing on the CPU is quite slow - and, as it happens, Embark has recently announced its Rust on GPU project.
I'm really looking forward to its development, so please go and contribute, so I can do the paintings on the GPU!

We are reaching the end of the presentation now, so let's summarise:
we have talked about art as a search process for new representations, and how representations can invite us to view the same object from different perspectives.
Art, science, and programming are similar in that regard.

We discussed how code as an art medium invites us to convey our search process in an algorithmic form, making it the main art piece,
and how computational drawing tries to capture that search in the context of traditional drawing from reference.

And lastly, we have covered the algorithm details, as well as how we can use our artistic intent to guide the search by translating it into code.
And if you're an artist and you are interested in getting into Rust, I really recommend that you stop considering and just do it.

First of all, the ecosystem is great, the package management is heavenly, and if you just want to get started ASAP, learning about ownership is all you really need - which is literally just reading the first four chapters of the Rust Book.
It's very rare that I find myself needing advanced features when making these kinds of art tools.

I also can't recommend enough getting Rust Analyzer; it will show you types and give you tips - it is absolutely amazing.
When I prototype, I often write very messy code, and it's a breath of fresh air to have the language guarding me against stupid mistakes,
and Clippy shouting at me for making a variable and never using it! Having the confidence that if my code compiles, it works -
that really frees my brain to focus exclusively on the algorithm design and logic flow, and I don't envision myself prototyping in any other language now.

Before we wrap up, a quick shout out to Thomas and Maik for dealing with my Rust programming on a daily basis.
A lot of what I know about Rust, I learned from them, and I would like to thank Embark for giving me the time to pursue this project.
I would like to thank you for listening.
I hope it was useful and you learned something new. If you have any questions, write a comment if you're watching this offline, or post a message in the chat if you're watching it live.

Thank you.
Have an awesome remainder of RustFest.
diff --git a/2020-global/talks/02_UTC/05-Anastasia-Opara.txt b/2020-global/talks/02_UTC/05-Anastasia-Opara.txt deleted file mode 100644 index d583d54..0000000 --- a/2020-global/talks/02_UTC/05-Anastasia-Opara.txt +++ /dev/null @@ -1,5 +0,0 @@ -Rust for Artists, Art for Rustaceans - Anastasia Opara -PILAR: Welcome back, everyone. I hope you've all been having a great day. I'm trying to not be over hyped! I'm going to wear myself out! The next talk is again something very near and dear to my heart. You might sense how excited we all are for all these talks. Our next speaker is Anastasia Opara. Yes, and, so to preface this talk, if there is anything cool that has come from tech, it is that we can express ourselves creatively through it, and that is what Anastasia Opara is here for. She comes from a family of artists, continuing the family tradition by swapping out a paintbrush for code. She is a procedural artist at Embark, and her passion is to enable people to blur the lines between algorithmic language and art language. So, it's going to be a fantastic talk. I'm going hand it over to our Bard, and I hope you enjoy. -BARD: Anastasia plays Rust like a flute, or magical lute, to simulate things that art may create, and art really does compute! -ANASTASIA: Hi, and reck to Rust for artists, art for Rustaceans. This talk covers from multiple personality disorder like talk about similarities between art, programming, and science, and at the same time showing you complicated drawing algorithms and also has practical tips. If you're only interested in the Rust and algorithm parts, go get yourself a coffee and come back in about ten minutes, since we are going to go through the meta part first. Without further ado, let's start with introductions.
My name is Anastasia Opara, and I'm a procedural artist at Embark Studios, a Stockholm-based game development company. You might be wondering what is a procedural artist? Procedural arts' distinguishing feature is that it is executed by a computer following a set of design procedures designed in code - that's why it is procedural. As a procedural artist at Embark, I spend most of my time meditating on workflow in games from player and developer perspectives. There is head scratching and more head banging on the wall, which is fun, exhausting, and sometimes outright terrifying, but never boring. To give you an example, I would like to show you two of the recent projects we did at Embark. It will be tl;dr versions, and we have separate talks about both of them if you want to learn more. The first one is texture synthesis where I got introduced to Rust. It's an example-based algorithm for image generation, meaning you can give it an example image and it will generate more similar-looking images. You can also do things like guided generations, style transfer, fill in missing content and simple geometry generation. The second project is called Kittiwake which is a game-like environment where we explore a feeling of co-creation with an example-based algorithm which is embodied into this little creature. You create a small screen, like a dialogue, and Kittis tries to minimise the way you create by analysing how you place things and using it as an example. One of the key similarities between these projects is that both of them heavily rely on performing some kind of search. Both of them give an example like object arrangement in Kittiwake, or pixel image synthesis, we search for a new configuration that looks perceptually similar to the example while not being a copy. You can think of a search process in a simplified way, that is we try a bunch of configurations, we present you with the most promising one. From the user perspective, it can happen so fast that is might not seem like a search. For example, in texture synthesis, you can see pixels appearing from possible neighbourhoods in the example. In the example-based placement you can see objects magically moving around until they settle into the positions that satisfy the example. Even though the notion of search in this project can be argued purely algorithmical, it's closer to a genuine art process than we might initially think. It is easy to perceive the final artwork as the outcome of a linear pre-calculated path, like an artist just sits down and does art. However, if we dig deeper, we will discover there is always an underlying network of trial and error. For example, Picasso's famous painting The Ladies of Avignon is the result of hundreds of sketches. You can see how earlier sketches were in a different style than the final work. We can argue that of course this worked required a lot of exploration because of how stylised it is, therefore the search was purely about finding the stylisation and re imagining what we see into something completely different. However, even when painting from reference, with a goal of copying reality, it is never a passive observation but active interpretation, and engineering of usual forms, which together construct a presentation. For example, if we look at digital photo studies, we can see a process of searching for textures, colours, forms, that conjure a similar perceptual response to the target photograph. 
It is a dynamic problem-solving of simplifying the object of depiction while keeping its perceptual essence. Any painting, if you look close enough, is just an amorphous jumble of brushstrokes, but they magically come together and make you believe that what they represent is real. That, I believe, is what art is: a search for a representation that conveys a target experience. And the human ability to comprehend similarity between a representation and the thing it aims to represent is an astonishing example of abstract thinking that comes to us so naturally.

Through the lens of representations, art can invite us to perceive the same object in different ways. Let's consider this sheep, for example. As artists, we might emphasise the way the wool curls echo the pattern that trees make when they sway - or we might adopt a different perspective, and explore the sheep not the way we see it, but invite the viewer to experience the concept of a sheep through its hoof marks as it walks across the canvas. Both works aim to capture the sheep, but the outcomes, the representations, differ drastically.

Art is not alone in its pursuit of constructing things into representations. In programming, we are often faced with the challenge of translating the language of our thoughts into the language of implementation. If we were asked to represent the sheep in code, we might adopt an inheritance pattern of thinking and inherit it from an animal class, or we might say that animal is just one of its traits, and there are other traits we are interested in, such as adorable or fluffy. In mathematics, we can transform and map it into a new space. Even when dealing with data, we are faced with the choice of a model, a representation, that will explain it. In the end, these are all representations. They don't change the way the sheep is, but they change the way we think about it.

And quite often, there is no one representation that just works, and it becomes an iterative search for the pieces needed to design a new representation. The pieces we manipulate might be different, but the process of search in art, programming, and science is similar. That's why art, programming, and science are actually much closer to each other than we usually portray them. If science helps us to reason about the external world and deconstruct a problem into a processing flow, I think art is about looking inward: a self-introspection and observation of one's perceptions. Art is not just about the recreation of reality, even if you do choose to make it your focus; it's an invitation to co-experience something from your perspective, something that used to be bodiless until you invented a representation for it. And from that perspective, I genuinely think anyone can be an artist.

Today, there are plenty of art media to choose from, and code is a particularly fascinating one, as it invites us to convey not just the final destination of the art process, but the author's workflow, the search itself. It invites us artists to reverse-engineer our own thought process and deconstruct it into an algorithmic form, pulling the search out of the background and making the experience of the search THE main art piece. And that is what computational drawing aims to capture. Computational drawing is my hobby project, designed to imitate traditional drawing from reference by searching for a deconstruction of the target into discrete brushstrokes, inviting the viewer to experience the becoming of the work, from its rough stages to its final details.
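To make the sheep-in-code aside concrete, here is a minimal Rust sketch of the trait-based view she describes; the trait and type names are illustrative, not from her slides:

```rust
// One possible representation: "animal" is just one trait among many,
// and the sheep opts into whichever aspects we care about.
trait Animal {
    fn name(&self) -> &'static str;
}
trait Fluffy {
    fn fluffiness(&self) -> f32;
}
trait Adorable {}

struct Sheep;

impl Animal for Sheep {
    fn name(&self) -> &'static str {
        "sheep"
    }
}
impl Fluffy for Sheep {
    fn fluffiness(&self) -> f32 {
        0.9 // very fluffy indeed
    }
}
impl Adorable for Sheep {}
```

None of these representations changes the sheep; each changes which questions the code can ask about it.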
And just like many paintings are not a faithful representation of reality, so is computational drawing not meant to be a recreation of an artist. It was originally inspired by the many implementations of genetic algorithms available on the internet. A genetic algorithm is a search algorithm loosely inspired by natural selection. But what was more interesting, in my opinion, was those projects' objective: to represent a target image within a budget of 50 polygons, or a few hundred, like the Eiffel Tower one. It was the summer of 2017, I had just finished my studies, and I had no idea about search algorithms. Seeing the genetic algorithm projects really triggered something in me. The process, as brute-force as it was, reminded me of my own experience during life-drawing classes: having to translate what I see into a set of discrete motions with a pencil or a charcoal. That personal experience, combined with discovering an algorithmic representation I could use, meant I just had to try it out. In the end, I modified the search quite a bit, which actually made it redundant to frame it as a genetic algorithm, and I will touch upon that later in the presentation.

So, this is a result from 2017. At the time I was just learning Python, so this was written in Python. And it was very slow to calculate - from a couple of hours to almost a day for one image. It was ugly, and I never really showed it to anyone except a couple of friends. I thought it was unsophisticated, not worth sharing, so it just collected virtual dust on my hard drive, until the summer of 2020 when, thanks to Covid, there was a lot of free time, and I went through my old hard drive, rediscovered it, and shared it. And the reception was beyond my expectations, which motivated me to clean it up a bit and open source it. It was super rewarding to see people getting inspired and trying out their own versions. And while cleaning up the Python code, I started having a lot of new ideas coming from the experience I had accumulated. So I decided to start anew, and, of course, for my sanity, I rewrote the whole thing in Rust. And, yes, we are finally getting to the part where I talk about Rust!

How does the algorithm work? Let's imagine we can draw a single brushstroke, which is parameterised by its scale, rotation, and value. There are functions for rotation as well as scale, and to change the value, we can simply access the pixel data. Now imagine we drew this brushstroke on a canvas. Just as we thought of scale and rotation as parameters of the brush, we can think of the brush as a parameter of the canvas. Representing our brush configuration as one dimension is just a mental shortcut; in fact, that one dimension encapsulates five: scale, rotation, value, and position in x and y. Now suppose we added a second brush, extending our brush space to be two-dimensional; this new space encodes both brushes, and thus the appearance our canvas would have. So far, it might seem like quite a redundant transformation, but it gets more conceptually interesting as we add more brushes - it just becomes messier to visualise. Let's imagine this 2D space is actually a space defined by 100 brushes, and a dot in the space represents a particular canvas appearance defined by how our 100 brushes are configured. To move in the space, all we have to do is change our brush configuration, and with it, our canvas. If we just take a stroll and aimlessly wander around in the space, we might discover that with just 100 brushstrokes we can depict a lot of interesting stuff, but also a lot of random stuff.
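Her code for this version isn't public yet, but the five-parameter brush she describes might be sketched in Rust like this; the struct layout, the mutation step sizes, and the use of the `rand` crate are illustrative assumptions:

```rust
use rand::Rng; // assumes the `rand` crate

/// One brushstroke: five parameters, so a canvas of N strokes
/// is a point in a 5N-dimensional search space.
#[derive(Clone)]
struct Brush {
    x: f32,
    y: f32,
    scale: f32,
    rotation: f32,
    value: f32, // greyscale intensity in 0..=1
}

impl Brush {
    /// Take one small step in brush space by nudging a single parameter.
    fn mutate(&mut self, rng: &mut impl Rng) {
        match rng.gen_range(0..5) {
            0 => self.x += rng.gen_range(-1.0..1.0),
            1 => self.y += rng.gen_range(-1.0..1.0),
            2 => self.scale = (self.scale + rng.gen_range(-0.1..0.1)).max(0.01),
            3 => self.rotation += rng.gen_range(-0.3..0.3),
            _ => self.value = (self.value + rng.gen_range(-0.1..0.1)).clamp(0.0, 1.0),
        }
    }
}
```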
In fact, the proportion of interesting stuff to random stuff is insanely low. It is very unlikely that we will just stumble upon a good painting. So, the question becomes: how do we steer the search towards something interesting? We provide a target that guides our search to a region of the space containing similar images, and the way we can define similarity is simply the difference between the pixels of the target and the pixels of the drawn image. So, if after mutating a brushstroke our pixel difference is smaller, that means we are moving towards a region with more similar images and we should keep the mutation. And then we just do it again, again, and again. Here, you can see the beginning of a search, as brushes move around trying to position themselves in such a way that looks more like the target. And if we continue this guided search long enough, we will eventually reach a canvas configuration whose brushstroke arrangement looks similar to the target. In general, that is pretty much it. There are many ways one might implement this search: it can be a genetic algorithm, gradient descent, simulated annealing. I will show you how I approached it, taking lessons from my art education and incorporating them into the search.

I was greatly inspired by fine-art classes. Especially when doing oil still lifes, we were never taught to solve all the detail frequencies at once. Most of the time, you will get yourself into a corner you can't solve, so you deconstruct the object of depiction into big shapes first. Only once you've got them do you go into details. I wanted to incorporate the same kind of wisdom into the way my algorithm would do the search. Therefore, I broke it down into multiple stages, and each stage only solves a particular level of detail, starting with very big brushstrokes, forcing the search to generalise the shapes, and then applying more details. Each of those stages is a completely separate search, so when the first stage is done, the second stage just draws on top of what has been drawn before. Here, you can see the search process happening for different stages. When you see a sudden jump, that's when new brushes are added and the algorithm is using them to better approximate the target. During every stage, the algorithm needs to place 100 brushes, and it has to do so in 10,000 search steps. 10,000 steps might seem like a lot, but if you need to place 100 brushes, that is 8,000 parameters, and remember, the algorithm cannot remove them. It is forced to place strokes even if their placement is not perfect. One of the reasons to limit the step number is to encourage happy accidents, mistakes, and imperfections. If something goes bad, it can be fixed in later stages, giving a perception of continuous problem-solving in the brush layout itself.
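The accept/reject search she describes is essentially greedy hill climbing on the pixel difference. A condensed sketch, building on the `Brush` type above; the `Image`, `render`, and `difference` helpers are stubs standing in for her real rasteriser and metric:

```rust
use rand::Rng;

struct Image; // placeholder for a real bitmap type

fn render(_strokes: &[Brush]) -> Image {
    Image // stub: would rasterise all strokes onto a canvas
}

fn difference(_a: &Image, _b: &Image) -> f32 {
    0.0 // stub: would sum per-pixel (and per-edge) differences
}

/// Greedy search: mutate one stroke at a time, keep the change only if
/// the rendered canvas moved closer to the target.
fn optimise(strokes: &mut Vec<Brush>, target: &Image, steps: usize, rng: &mut impl Rng) {
    let mut best = difference(&render(strokes), target);
    for _ in 0..steps {
        let i = rng.gen_range(0..strokes.len());
        let backup = strokes[i].clone();
        strokes[i].mutate(rng);
        let score = difference(&render(strokes), target);
        if score < best {
            best = score; // keep the happy accident
        } else {
            strokes[i] = backup; // revert and try something else
        }
    }
}
```

Each of her stages would then be one `optimise` call with its own brush size, drawing on top of the previous stage's result.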
As brushstrokes become smaller, I use a sampling mask to guide the brush placement towards places of higher frequencies. I do so to preserve the loose brushwork while giving a perception of deliberate intent. We're not just splattering a uniform brush texture; there is a specific thought process manifested in the way the brushes have non-uniform sizes and visually interact with each other. There is an expression, "don't overwork the painting", meaning don't kill the playfulness, and that's what I'm trying to avoid by having the sampling mask. You may notice the sampling mask is generated based on edges, and edges play a very important role in drawing. If you have had drawing classes, you probably recognise this: it is used as homework to copy and learn from, on how to deconstruct a 3D shape into simplified contours.

As humans, we are sensitive to sharp transitions between darker and lighter elements, and a small deviation can make something look wrong. Therefore, one of the new additions in the Rust version was using contours to guide the search. This is done by comparing the edges of the target versus the drawn image, and computing their distance. Here's a comparison of using versus not using edges to guide the search. Notice how much better defined the face is, and how it looks perceptually closer to the original. It's subtle, but I think it gives an extra push towards believability, that there is an artist's thought process behind each brush placement. Since we need to perform this every iteration, it needs to be fast. Here are some comparisons of how long different edge-detection implementations available in different Rust crates take. In the end, I went with a custom implementation that uses chunks to make it parallel. And here is a time and quality comparison for Canny versus Sobel versus no edge detection. Edge detection takes a huge bulk of the generation time, going from five minutes to 20, and even to 1.5 hours with Canny. But the facial features are captured so much more precisely with Canny or Sobel. Canny gives a crisper result, but comes at a four-times-slower generation cost. In case anyone is interested in how the parallel Sobel is done, here is the code - feel free to pause when this goes online.
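Her parallel Sobel isn't published yet; a rough sketch of the chunked, row-parallel idea, using the Rayon crate she credits later (the row-major greyscale layout is an assumption):

```rust
use rayon::prelude::*; // assumes the `rayon` crate

/// Sobel gradient magnitude over a greyscale image stored row-major,
/// parallelised by handing each output row to a Rayon worker.
fn sobel(src: &[f32], w: usize, h: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; w * h];
    out.par_chunks_mut(w).enumerate().for_each(|(y, row)| {
        if y == 0 || y == h - 1 {
            return; // leave border rows at zero
        }
        for x in 1..w - 1 {
            // Sample the 3x3 neighbourhood around (x, y).
            let p = |dx: isize, dy: isize| {
                src[(y as isize + dy) as usize * w + (x as isize + dx) as usize]
            };
            // Horizontal and vertical Sobel kernels.
            let gx = p(1, -1) + 2.0 * p(1, 0) + p(1, 1)
                - p(-1, -1) - 2.0 * p(-1, 0) - p(-1, 1);
            let gy = p(-1, 1) + 2.0 * p(0, 1) + p(1, 1)
                - p(-1, -1) - 2.0 * p(0, -1) - p(1, -1);
            row[x] = (gx * gx + gy * gy).sqrt();
        }
    });
    out
}
```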
Another way I'm using edges is to drive the brushstrokes' orientation. Brushstrokes follow along the edges and don't cross them, because that would violate the perceptual contour border. To guide the brushstrokes, I generate an image-gradient field. The stronger the direction, the more influence it has on a brushstroke that might be placed in that region. For example, here, there is almost no gradient information, and therefore the brushstroke might have any orientation. Here, closer to the edges, there is a strong directionality, indicated by the length of the lines, and the brushstrokes placed here are more likely to follow along the field's direction. The reason why I made it probabilistic is that I don't want to exclude any happy accidents. Perhaps a brushstroke placed completely perpendicular might actually be a very good solution; in the end, the pixel and edge difference is what matters. Computational drawing is still very much a work in progress, and I hope to open source it once it's done. One thing I still haven't gotten to is figuring out a good strategy for searching for a colour solution; that is still on my to-do list. Right now, I'm sampling colours directly from the target image. At the moment, the algorithm is running on the CPU. The code is parallelised, but drawing on the CPU is quite slow, and it just so happens that Embark recently announced its Rust on GPU project. I'm really looking forward to its development, so please go and contribute, so I can do the paintings on the GPU! We are reaching the end of the presentation now, so let's summarise: we have talked about art as a search process for new representations, and how representations can invite us to view the same object from different perspectives. Art, science, and programming are similar in that regard. We discussed how code as an art medium invites us to convey our search process in an algorithmic form, making it the main art piece, and how computational drawing tries to capture that search in the context of traditional drawing from reference. And lastly, we have covered the algorithm details, as well as how we can use our artistic intent to guide the search by translating it into code.
-PILAR: Thank you so much, Anastasia. Wow. That was incredible. A lot of people were commenting on how natural everything looked, and were asking some really, really great questions, so, if you have any questions for Anastasia, she's over at the chat right now. Please go. It was everything, wasn't it? It was, you know, Bob Ross wholesome, and I'm blown away. It was so cool to see things I absolutely love mesh so well and blend so beautifully. And Anastasia also asked to please do a shout-out to the Rayon crate. She did do this in the recording, but, yes, just wanted to let you all know - it helped a lot in the project. That's it for this talk. Stay tuned for our next great speaker.
\ No newline at end of file
diff --git a/2020-global/talks/02_UTC/06-Christian-Poveda.md b/2020-global/talks/02_UTC/06-Christian-Poveda.md new file mode 100644 index 0000000..fa16b0e --- /dev/null @@ -0,0 +1,466 @@

**Miri, Undefined Behaviour and Foreign Functions**

**Bard:**
Miri is Rust's interpreter
And Christian will gladly debate'er
On how to bequeath
her the stuff underneath
so she can run until much later


**Christian:**
Thank you.
This talk is called Miri, Undefined Behaviour and Foreign Functions.
So let me introduce myself.
I'm Christian Poveda.
I'm Colombian.
I'm a PhD student.

Occasionally, I contribute to the Rust compiler project.
I don't work on it full-time, I just do it when I have free time.

So, first of all, I want to say why I want to give this talk, why I think it is important.
The first thing is that unsafe is a controversial topic in our community,
but, at the same time, it's something super special that Rust needs in order to work and be able to do the awesome stuff that it already does.
So, basically, every program you have is unsafe in one way or another, even if you don't know it.
It is important to be aware of the implications of what happens when you misuse unsafe, or when someone else does.

I also want to show you a super cool tool that can help you write better code,
because it embodies the philosophy the Rust community has of building reliable software while, at the same time, having a super helpful community with tools that help you build it.

This talk will have four parts, basically.
First, I'm going to show you a bit of what is safe and what is unsafe,
what undefined behaviour is and how all of this works in Rust,
and then we're going to talk about Miri, which is the super cool tool I just mentioned.
Then I'm going to talk a bit about foreign functions.
And if you think this is interesting, I can give you some ideas at the end on how you can contribute to all of this.

So, let's begin by talking about unsafe Rust and undefined behaviour.
Before even talking about undefined behaviour, I think it's super important to discuss why people use unsafe Rust in the first place.

There are two main reasons.
The first one is that some people use unsafe because they're invested in performance:
they want their programs to run super fast,
so they take it on themselves to ensure their programs run correctly,
even if those programs don't have any runtime checks, because not doing a lot of checks lets you squeeze out a little bit of performance.
And there is a lot of controversy around this one.
People say yes, performance matters, but safety first.
There are a lot of trade-offs you can make there.

But the second reason is a little less controversial, in the sense that in many projects we have in Rust,
you need to interact with other languages, or with your operating system, or with a bunch of resources that aren't written in Rust themselves,
so most likely you will have to interact with a C or C++ library, or a crate that interacts with a C or C++ library,
and those don't have the same guarantees that Rust has about safety, soundness, and so on.
All the functions that interact with C libraries are unsafe too.

So, now we can discuss what unsafe can do.
Inside unsafe functions or unsafe blocks, there's not much extra you can do, actually.
You can do only five things, no more.
You can dereference raw pointers, and you can call functions that are marked as unsafe,
so if a function is marked unsafe, you need an unsafe block or an unsafe function to call it.

There are some traits that are marked as unsafe too.
If you want to implement those traits, like `Send` from the standard library, you have to use unsafe.
If you want to mutate statics, because you're sure that the program needs some sort of mutable global state, even though some people don't like it, you can use unsafe to do that.
And you can use unsafe to access fields of unions.
Unions are like enumerations, but they don't have a discriminant to distinguish each variant,
so you can literally join two types into a single one and use a value of that type as any of the possible variants at the same time; you need unsafe to access those fields.
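A minimal sketch of that last point: a union has no discriminant, so the compiler cannot know which field is currently valid, and reading one is on us (the type here is made up for illustration):

```rust
// `u32` and `f32` share the same four bytes; no tag says which is "live".
union IntOrFloat {
    int: u32,
    float: f32,
}

fn main() {
    let v = IntOrFloat { int: 0x3F80_0000 };
    // Reading a union field is one of the five unsafe powers: we are
    // asserting that reinterpreting these bits as an f32 is what we want.
    let f = unsafe { v.float };
    println!("{f}"); // prints 1, since 0x3F800000 is 1.0 in IEEE 754
}
```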
However, for the purposes of this talk, we are going to focus on the first two, because those are the ones most of us are likely to have been exposed to at some point.
And the first one, dereferencing raw pointers, is worth discussing for a moment.
What are raw pointers?
Many of you, if you have already used Rust, know that we have references, `&` shared references and `&mut` mutable references, and these have two siblings.
They are called raw pointers.
We have `*const` and `*mut`, and they exist because they don't follow the same rules as references.
They don't have the same liveness constraints.

For example, if you have some data, and you create a raw pointer to it, and then you drop the data, it goes out of scope, it's deleted.
You can still have the raw pointer, even though it is pointing to something that doesn't exist any more.
There is something else: you can also offset those pointers using integers.
If you have a pointer to a particular memory area, you can treat it like an integer, add to it, and offset it, so you can read a part of memory that maybe you're not supposed to.
For those two reasons, raw pointers might have a lot of problems and might misbehave in several ways.
You can have null pointers that don't point to anything, really.
They can be dangling:

a dangling pointer points to something that doesn't belong to us,
so if you have a vector, and you use a pointer derived from inside the vector to access something outside the vector, that is a dangling pointer.
Also, if you have a pointer that you offset, but you didn't do it correctly,
so for example you offset it by 16 bits when your values are 64 bits wide, you will end up reading in between values: that's an unaligned pointer.

So, those are raw pointers.
You can do a lot of messy stuff with them.
We haven't said exactly why that is wrong yet; we will go into that later.

But let me show you an example of how to use these raw pointers, and how to use unsafe, and so on.
Here in my terminal, I have this tiny crate.
It has a single struct called `ByteArray` which has a mutable pointer to `u8`.
You can think of this type like a slice or, if you want, like a vector, but we only have two simple functions.
We can only read stuff from it.
We cannot grow it or make it smaller.
We can just read stuff from it.

This is the usual pattern in this kind of API: you have these two functions, the unsafe, unchecked version of a function, and then the safe version of it.
Here, we have the unsafe function called `get_unchecked`.
It receives an index, takes the pointer, casts it to a `usize`, adds the index to it, casts that integer back to a pointer, and then dereferences it.
Actually, most of this code is not required to be inside an unsafe function.
The only thing that is unsafe is reading from the pointer, using the dereference star operator.
So you can manipulate raw pointers however you want, but to dereference them, you have to use unsafe.
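The demo crate itself isn't in this repository, but from his description it might look roughly like this; the `zeroes` constructor and the deliberate leak are assumptions made to keep the sketch self-contained:

```rust
/// A fixed-size byte buffer behind a raw pointer, as in the demo.
pub struct ByteArray {
    ptr: *mut u8,
    len: usize,
}

impl ByteArray {
    /// Allocate `len` zeroed bytes (leaked here for simplicity;
    /// a real crate would manage the allocation and implement Drop).
    pub fn zeroes(len: usize) -> Self {
        let mut buf = vec![0u8; len].into_boxed_slice();
        let ptr = buf.as_mut_ptr();
        std::mem::forget(buf);
        ByteArray { ptr, len }
    }

    /// Read without any bounds check.
    ///
    /// # Safety
    /// `index` must be in bounds, or the read is undefined behaviour.
    pub unsafe fn get_unchecked(&self, index: usize) -> u8 {
        // Cast the pointer to an integer, offset it, cast it back...
        let addr = self.ptr as usize + index;
        let ptr = addr as *const u8;
        // ...and only this dereference actually requires `unsafe`.
        *ptr
    }

    /// Safe counterpart: bounds-check first, then defer to the unchecked read.
    pub fn get(&self, index: usize) -> Option<u8> {
        if index < self.len {
            Some(unsafe { self.get_unchecked(index) })
        } else {
            None
        }
    }
}
```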
Then we have the safe counterpart of this function, where we guarantee that,
if the index you're reading is out of bounds of the length of this array, then we return `None`, and if we are sure that we are in bounds, then we return `Some` of the value read by `get_unchecked`.

Now, let's say this is a crate in the Rust ecosystem, published on crates.io, and someone uses it.
They might just do something like this:
they import our library, build an array with a constructor function that I didn't show, called `zeroes`,
and, because they need to go super fast with this thing, they use unsafe and just call `get_unchecked`.
If we run this, it returns zero.
It works as intended.
Some of you might be asking what happens if you call this function with an out-of-bounds index.
We will get to that later.
Yes, that's the demo.

And the big question now is, well, actually, what can go wrong when you use unsafe?
You might have heard answers like: if you're using it wrong, you're causing undefined behaviour, and undefined behaviour is super bad; anything can happen when you trigger undefined behaviour.

So let's discuss undefined behaviour a little bit.
The Rust compiler was written under certain assumptions about the programs we write:
our programs need to meet certain conditions so the compiler can actually compile them into what we want.
If we break any of these rules, we say we are causing undefined behaviour.

As Stefan said, this is a way of talking about things that are not specified in a clear way:
if the compiler is trusting something to hold and you're breaking that rule, then you're causing undefined behaviour.
There is something super important here: undefined behaviour is different in each language.
C has a lot of rules for undefined behaviour, and Rust's rules are not the same.

For example, what Stefan told you about adding to an integer and going out of range, adding so much that the result no longer fits:
that's not undefined behaviour in Rust, but it is undefined behaviour in C for signed integers,
because both compilers were built with different guarantees in mind.

Actually, the list of rules that matter when we are dealing with undefined behaviour is a little bit tricky, so I'm just going to mention some of them.
Your program has undefined behaviour if you dereference a pointer that is dangling or unaligned.
Also, if you try to produce a value that is invalid for its type.
Take Booleans, for example: when you look at the actual memory, Booleans are represented by bytes.
They take one byte exactly, so you have a one or a zero, but a byte has eight bits, so there are a lot of other values that could fit.
One is true, zero is false, but if you take a three and you try to put that into a Boolean, doing that is undefined behaviour, because three is not a valid Boolean.
Code handling that Boolean would not know what to do if it sees a three instead of a one or a zero.
Causing that is also undefined behaviour, and there are lots more rules that need to be taken into account here.

So what happens if you break these rules? Basically, Rust cannot work correctly.
We lose the guarantee that Rust gives us of producing programs that do what we want them to do.
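The "three is not a Boolean" example can be reproduced in two lines; this compiles fine, and it is exactly the kind of instant undefined behaviour Miri exists to catch:

```rust
fn main() {
    // `bool` may only hold the bit patterns 0 and 1; conjuring a `bool`
    // out of a 3 is undefined behaviour the moment it happens.
    let b: bool = unsafe { std::mem::transmute(3u8) };
    println!("{b}"); // anything at all may happen from here on
}
```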
Rust can no longer compile that program correctly, so what this means is that, in the best case, your program might not run: maybe it crashes with a segmentation fault, a memory-out-of-bounds error, or something like that.
In the worst case, it might run, but not as you intended, and then that program might do anything.
For that reason, it's pretty common to see this kind of psychedelic image with unicorns and a lot of colourful stuff when people discuss undefined behaviour, because when we deal with undefined behaviour, we lose track of what our program is doing at the most basic level.
We don't even know any more.
So there is good news for us in the Rust community.

If we are using safe Rust, if we promise never, ever, ever to use unsafe, we don't have to worry about undefined behaviour, because undefined behaviour should not be possible in safe Rust.
And if you do use unsafe but are super sure you're not causing undefined behaviour, you get the performance benefits, or you can interact with C libraries correctly; that is also good.
There is also not-so-good news, and it concerns a super important part of our ecosystem: even if we're not causing undefined behaviour ourselves, someone else in our dependencies might be.

Here, I have interesting statistics about this.
24% of all the crates on crates.io use unsafe directly.
And of those crates, 74% do unsafe calls to functions that are in the same crate.
So are our crates using unsafe to call functions in the standard library, or in other crates?
If you want more information about these metrics, you can Google, or use your favourite web-search engine, to look for the paper "How do programmers use unsafe Rust?".
My point is that unsafe is everywhere; not because people aren't good at doing their job, but because we actually need it.
It's everywhere.

I also have good news.
There is a tool that we can use to detect undefined behaviour in our programs, called Miri.
If you want to take a look at the Miri repository now or later, this is the URL.
You can find all the code there.

So, what is Miri? It is a virtual machine for Rust programs.
Miri doesn't compile your program, it interprets it, in the same sense that the JVM interprets Java byte code, or the Python interpreter runs Python.
Miri is like that, but for Rust.
And it has a super cool feature that none of those other interpreters has: it can detect almost all cases of undefined behaviour while running code.

What is also interesting is that the code used in Miri is used in the engine that does compile-time function evaluation, so, if you have any constants in your programs, or a const function, part of this code is used to evaluate those constants.
But here we are talking about Miri as a standalone tool outside the compiler that can interpret your programs.

So how do you use Miri? You need a nightly version of the compiler to do this, so you have to install the nightly toolchain; you can do this by running `rustup toolchain install nightly`.
Then you can install the Miri component, also with rustup.
After Miri installs - it takes a while compiling - you can run binaries, you can run your whole program with Miri if you want, or you can run just your test suite if you have tests.
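The setup he describes boils down to a few commands, spelled here as they are documented today:

```sh
rustup toolchain install nightly
rustup +nightly component add miri

cargo +nightly miri run    # interpret the whole binary under Miri
cargo +nightly miri test   # interpret just the test suite
```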
Let's do a demo with the same code I was showing you before.
Again, we have this super tiny program using an external crate, let's say.
And maybe the person writing this program doesn't know about the guarantees that the crate requires to be sure that these functions don't cause undefined behaviour.
You might be tempted to do something like: can I read the 11th position of an array with ten elements? Who is stopping me? The compiler is not complaining.
It works.
It actually returns zero.
That looks like a perfectly good value, because it returns the same as before.
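Putting the demo together on top of the earlier `ByteArray` sketch, the downstream user code might look like this (hypothetical, matching his description):

```rust
fn main() {
    let arr = ByteArray::zeroes(10);
    // Reading one past the end compiles, and natively may even "work"...
    let value = unsafe { arr.get_unchecked(10) };
    // ...but running it under `cargo +nightly miri run` reports it as
    // undefined behaviour: an out-of-bounds pointer dereference.
    println!("{value}");
}
```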
If you run this with Miri, you will find this super cool error that says: undefined behaviour.
A pointer to an allocation was dereferenced after the allocation got freed.
It points to the part of the code that causes this undefined behaviour, the pointer dereference.
You can see more information and so on.

What is happening in Miri's execution is that this function is creating a pointer that is dangling.
You created a pointer that is outside the actual range of the vector, so, when the vector gets deleted - and it is deleted after everyone has used it - you still have this pointer pointing to nothing.
But if we go back to the perfect case where we didn't have any undefined behaviour, we can just do `cargo miri run`, and Miri won't complain and will return the same as your regular program.
So that is how we can use Miri to detect undefined behaviour.

But now I want to show you a little bit of how Miri actually works.
To talk about how Miri works, we have to dig into how the Rust compiler works.
This is a super high-level overview of the Rust compilation pipeline, the list of stages that a program goes through when it is getting compiled.

We always start from source code, our .rs file, and end up with machine code, in a binary, or a dynamic library, something like that.
And what happens in the middle are four stages.

The first one is parsing.
Rust reads the text of your source code and parses it to produce an abstract syntax tree, or AST.
Then this AST is transformed to produce a high-level intermediate representation, or HIR.
This stage is where type checking happens, so a lot of the type information lives here.

Then the HIR is lowered to another representation, the mid-level intermediate representation, or MIR.
This is where borrow checking happens.
After that, we start interacting with LLVM, the compiler backend that Rust uses; the LLVM project has its own intermediate representation, so we lower MIR to LLVM IR.
And finally, LLVM does the code generation to produce your binary file, or your library, and so on.

Miri works almost the same way.
The only difference is that the code-generation stages don't run, so we never get to talk to LLVM.
What happens is that Miri lets the compiler run until you have the MIR of your program, and then it interprets that.
Where the JVM has byte code, Rust has MIR when running Miri.
That's why Miri is called Miri: it's an MIR interpreter.

Here is something super important: Miri cannot interpret programs that aren't Rust programs.
So if you have a C library that you call from your Rust program, Miri can't interpret that in any way.
That library doesn't have Rust's syntax; the compiler doesn't even understand that program.
You can't interpret that.

And there are more limitations, actually.
Miri is not perfect.
It's not a silver bullet for your undefined-behaviour problems.

One limitation is that Miri is slow, so, if your test or program is performance-sensitive, it can take a while to run it, even if you can do it at all.
This happens because Miri has to do a lot of runtime checks on your pointers and on how memory is managed, to be able to tell you when undefined behaviour is happening.
The other important point is that Miri can only detect undefined behaviour as it is happening: if an execution doesn't actually trigger the undefined behaviour, Miri won't catch it.
Miri cannot detect data races yet.
And, again, Miri can only interpret Rust programs.
This one is super important.

You might be wondering why this matters. It is because, well, you know, programs don't run in isolation.
We tend to access files, get resources over the network, interact with databases; we need to use the primitives of our operating system.
And the mechanism that Rust uses to interact with all of those is foreign functions.
That is what this last part is about: foreign functions.

Some of us might think that we don't need foreign functions at all.
Maybe we have never used external functions in our projects.
But I'm sure everyone, or almost everyone, has interacted with the standard library to do standard operations, reading files, whatever, and that means somehow you're using foreign functions.
For example, this is the stack trace when you call `File::open`, the standard library function for opening files.

There are six functions here.
The first two are Rust functions in the standard library; they are platform-independent.
Then we have four functions that are specific to Unix-like platforms, so those only run on Linux and macOS.
And then we have this `open64` function at the end.
The odd thing about this `open64` function is that it's not a Rust function: it's the Linux C library call used to open a file.
So this is a foreign function written in C.
It is an unsafe function, and Miri cannot interpret it.
So what happens if, anywhere in this process, we have undefined behaviour?
Can we run this at all?
Miri cannot interpret the `open64` function.
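For illustration, declaring such a foreign function from Rust looks roughly like this; the real declaration lives in the standard library's platform code and the `libc` crate, not here:

```rust
use std::os::raw::{c_char, c_int};

extern "C" {
    // The Linux C library call that ultimately backs `File::open`.
    // Rust cannot check what the C side does, so calling it is unsafe,
    // and Miri has no MIR for its body to interpret.
    fn open64(path: *const c_char, oflag: c_int, ...) -> c_int;
}
```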
The good news is that Miri can actually run your program, in a particularly interesting way.
Yes, Miri cannot interpret your foreign function, but it can intercept the call.
When your program calls `open64`, Miri sees a call to a foreign function it doesn't know, one that is not a Rust function, and contributors can write whatever code they want to emulate that function.
We call the code that emulates a function a shim.
And if a shim needs to interact with the operating system, or with any of the resources that the standard library provides, we use the standard library for that.
So it is funny: the standard library uses foreign functions, but Miri uses the standard library to emulate some of those foreign functions.

Let me show you.
We are still using our example with the `ByteArray` crate.
This time we have a user that reads the index it wants to use from a config file.
It uses `File::open`, so, eventually, it will use `open64`.
And we are doing the same as before: we read something with unsafe and print it.
If you try to run this with Miri, we will get an error, but it's not because we are causing undefined behaviour.
We get: `open` is not available when isolation is enabled.

This is the `open64` function I was talking about:
`open` is not available; please pass the flag to disable this isolation.
So if we do that, and set the Miri flags to disable isolation, we can actually run the program.

In this case, it seems the config file is making us cause undefined behaviour.
It says the memory access failed: the pointer must be in bounds at offset 11, but it is outside the bounds of the allocation, which has size 10.
It seems like someone is reading the 11th position of an array with ten elements.
Well, really it is position 10, for the 11th element, if you want to think in zero-based indexes.
And that's the whole problem.

So, yes, we can use Miri to detect undefined behaviour even in programs that use foreign functions.
That's super cool.
And, actually, beyond handling files, Miri can do a lot of stuff.
You can manage directories, delete files, create symbolic links, you can spawn threads, use locks and atomics, you can get the current time - so you can run clocks inside your Rust program running under Miri - and you can handle environment variables.
Each of those operations is possible because someone decided to write a shim for that specific foreign function.

And this has a super cool side effect - well, not really a side effect, some people worked hard to get this working - which comes from the fact that the standard library works across many platforms.
It means that you can emulate foreign functions even if you are not on the platform the program is going to be compiled for.
So, for example, if you have a program that is supposed to run on Windows, but you don't have a Windows machine, you can use Miri to interpret that program as if it were a Windows program.

Let me show you.
Here we have another user of our library.
This time, it is using environment variables to set the size and the index it wants to read.
Miri can emulate an environment inside it, so we can use the size environment variable to set the size of the array.
We set the index environment variable too, and we run with isolation disabled.
I'm using a Linux machine, and I'm going to run it for a target that is Windows.
I don't have Windows installed here.
And it works.

If I want to run it on anything else, I can do it.
I'm running it on my regular Linux target, and it is working.
This is super cool, because you can use Miri in a situation where you're not sure the code you wrote specifically for Windows works correctly.
Even if you're not using unsafe, you just want to be sure your program runs as intended.
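With a current Miri, the two invocations he demonstrates look roughly like this (the flag spelling has varied across Miri versions):

```sh
# Let shims reach the host machine (files, environment variables, ...):
MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri run

# Interpret the program as if it were built for another platform:
MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri run --target x86_64-pc-windows-msvc
```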
And, yes, basically, that's the hard content of my talk, and I want to spend the last few minutes talking about contributing to Miri.
If anything in this talk caught your attention, I encourage you to contribute to Miri, for many reasons.
My personal reason is that I always wanted to work on compilers, because I find them super interesting, I really like Rust, and I didn't know where to start.
So I found Miri.

I said: okay, I could implement maybe some foreign functions for opening files or whatever; it sounds not too hard.
It took a while until I understood the Miri-specific stuff, but it helped me a lot to understand how the compiler works and to get involved in other stuff that I wouldn't have been able to do otherwise.
And even if you don't feel comfortable contributing yet, you can help this project by just using it.

Maybe you want to use it because you actually write unsafe code, and you want to be sure you're not causing undefined behaviour.
Or some of your dependencies use unsafe, and you want to be sure that they don't cause any undefined behaviour.
So there is that.

You can save yourself, and a lot of others, many, many hours of debugging, and learn how undefined behaviour works along the way.
Maybe you're expecting that Miri catches something, and it doesn't, or maybe it is the other way round: you think your program is correct, and Miri is complaining.
You can open an issue and contact the contributors to discuss it, or the unsafe working group also.
There is something super important here, though: this is not an obligation you have to the community.
If your program runs really slowly in Miri, that's fine.
You don't owe anyone Miri coverage.

But if you're super interested in contributing to Miri, writing shims is a super easy way to start.
It is super cool, because what you actually have to learn is your platform's specification of how that foreign function should work, and the stuff you need to learn about Miri itself is really small.
You don't need to know how Miri works completely to do that.
I don't know how Miri works completely.
I use some little parts here and there, and I implemented a lot of shims because I like doing it.
Even if you don't need a particular shim, maybe someone else needs it, and then you're not just testing for undefined behaviour, you're helping everyone write better and safer code, because a lot of people use these things.
If you want some specifics: for Windows, many of the shims haven't been implemented yet, and that is fine, because you can cross-interpret as if you were on Windows while on Linux.
For example, if I go back to the program that opens the file, and I try to run it with the Windows target, it will fail, but it will fail because this function, `CreateFileW`, hasn't been implemented yet.
Maybe one of you wants to do it.
There's a bunch of stuff that hasn't been implemented yet.

That's all, so, thank you for your time.
I hope you found this interesting, and I think we can do some questions now if you want.


**Stefan:**
Yes, so the question: about the 11th element, Miri once said the allocation was freed, and once that the access was out of bounds, so I guess this is the question:
how far can it track stuff, right?

**Christian:**
Yes, this is not clear to me, actually.
Sometimes this program fails because, when Miri interprets it, it frees the memory for the array before you read the pointer,
so it complains about memory being freed; and sometimes the array has not been deleted yet, it hasn't been dropped,
so it complains about an out-of-bounds access, even though the array is still there.
The good news is that either of those is undefined behaviour.
Miri tries to be as deterministic as it can, but when you disable isolation,
for example, it's really hard to be deterministic, because if you change your file, that might change how everything works internally.

**Stefan:**
So there is a second question:
when Miri's engine is used to execute const code during compilation, does it run in a fast mode with less validation, and how do I assess the difference?

**Christian:**
I don't know all the details.
In that mode, Miri runs without a lot of the validations it runs when you're running it standalone.
In the current version, it's faster than what I showed you, but that's because it does fewer checks.
Let's say the engine is the same: the same engine, but in a different car.

**Stefan:**
We don't have dynamic evaluations in const eval.

**Christian:**
There is a flag in Rust for that, something about "unleashing" Miri.
You can run, let's say, unrestricted constant evaluation, and most of the time it breaks the compiler, but, yes, you can actually run whatever you want using Miri inside the compiler.
But that is super experimental.

**Stefan:**
So long-term, one could have a VM, like a fully functioning VM, in Miri?

**Christian:**
In principle, yes, but there are a lot of questions.
For example, you read a file, and you use the file to, I don't know, create some const, or define a generic type, and that makes your compilation unsound, because every time you read the file,
it might change; or the same with random-number generators.

**Stefan:**
Maybe I can introduce my own question here:
do you think, in a very distant future, it will be possible to include Miri in a binary, to have Rust as a scripting language inside your Rust program?

**Christian:**
Oh, wow. I have no idea!
I remember I read that someone was writing an interpreter so you could use Rust like a REPL.
I don't know what happened with that project.

**Stefan:**
Was this the Python-like thing?

**Christian**:
No, it was a little bit different, because it didn't run Rust but MIR: you had to write the MIR of your program together with the Rust code.

**Stefan:**
Okay. Interesting. Another question from the audience:
would it be possible to do this kind of analysis on general LLVM IR?

**Christian:**
I'm tempted to say yes, you could.
The thing is that you don't have a lot of the type information you have when you're interpreting the MIR.
In the MIR, you have a lifetime for every single value.
I don't know if you can do that in LLVM IR.
In principle, yes, you could build, for example, a stack model for LLVM IR, but the information you would need to build it isn't all there.

**Stefan:**
You would have to add a lot of metadata, because some types may be gone in LLVM IR.

**Christian:**
Yes, it's harder, but I believe it's possible to do that.

**Stefan:**
Is there anything you would like to show off, like a final use case, or an idea - hey, if someone is bored, maybe give this project a shot?

**Christian:**
Yes, actually, let me ...
let me open a new Firefox window here.
If you're bored and you want to do something inside Miri, we have a lot of issues here.
And we have this label, the shims label: there are a lot of tiny problems there.
For example, Miri doesn't support custom allocators, and in the latest version, `Box` now allows for a custom allocator,
so it is super important now to have a way for Miri to use a custom allocator, to test with different allocators, for example.
If you're bored, you can grab any of those issues.

**Stefan:**
Cool.
I'm looking forward to a stable `Box` with custom allocator support.
That will be very interesting.
Wonderful.
I think we have reached the end.
I don't see any more questions.
It was very well received, and a great talk.
Thank you again.
Ferrous thanks you as well.

**Christian:**
Thanks so much.
diff --git a/2020-global/talks/02_UTC/06-Christian-Poveda.txt b/2020-global/talks/02_UTC/06-Christian-Poveda.txt deleted file mode 100644 index a18c4a6..0000000 --- a/2020-global/talks/02_UTC/06-Christian-Poveda.txt +++ /dev/null @@ -1,29 +0,0 @@
-Miri, Undefined Behaviour and Foreign Functions - Christian Poveda.
-STEFAN: Hello. Welcome to the second-last talk of the Africa-Euro time zone. Joining me now is Christian. He is doing his PhD on refinement types. He is in Spain, in Madrid, originally from Colombia, and I think he will go into more detail himself. I'm also very happy that we will learn about undefined behaviour. For those of you who don't know what undefined behaviour is: it's a very old term, coming from C, and it covers things that the standard does not specify. One example: if you have a number and you count it up, what happens when you reach the end of the range that number can hold? Does it wrap around, or does it do something else? Can the compiler optimise around it? Stuff like that is undefined behaviour. I think the rest is up to you. Don't forget to go to the chat system: on the right side there's the chat, press 17, and you will see views of the room, so you can ask questions there.
People say yes, performance matters, but safety's first. There are a lot of trade-offs you can do there. But the second reason is a little less controversial in the sense that many projects we have in Rust, you need to interact with other languages, or with your operating system, or with a bunch of resources that aren't Brian themselves in Rust, so most likely you will have to interact with a C, or C++ library, or create that interacts with a C++ library, and that doesn't have the same guarantees that it has about safety, and having sound programs and so on. All those functions that interact with C libraries are unsafe too. So, now we can discuss what unsafe can do. Inside unsafe functions, or unsafe blocks, when you have the two, there's not much you can do, actually. You can do only five things, not any more. You can de-reference raw pointers, you can call functions that are marked as unsafe, so, if you have a function that is called unsafe and in general the name of the function, you need unsafe to call it. You have to do an unsafe block or function. There are some traits that are marked as unsafe too. If you want to implement those traits like send from the standard library, you have to use unsafe. If you want to mutate the statics because you're sure that the program needs some sort of mutable global state, even though some people don't like it, you can use unsafe to do that. You can use unsafe to access fields of unions. Unions are like enumerations, but they don't have the consistent back to distinguish each variant, so you can literally join two types in a single one, and use every value of type as any of the possible variance at the same time, so you need unsafe to access those fields. However, for the purposes of this, we are going to focus on the first two, because those are like the more likely - one of the more demon, likely we've been exposed to this at one point. And, the first one is the de-referencing raw pointers is worth discussing at the moment. What are raw pointers? Many of you, if you have already used Rust, you know that we have references, we have ampersand mute and sand mutable references, and these are like two brothers or sisters, siblings, whatever. They are called raw pointers. We have star const and star mut, and they exist because they don't follow the same rules as references. They don't have this liveness constraints. For example, you have some data, and you create a pointer to it, and you drop the data, it goes out over the scope, it's deleted. You can have the raw pointer to it even though it is pointing to something that doesn't even exist any more. So for these reasons, there is something else, and you can also offset those pointers using integers, and, if you have a pointer to a particular memory area, you can add it like an integer, and you can offset it so you can read a part of the memory, and maybe you're not supposed to. For those two reasons, those pointers might have a lot of problems and might misbehave in several reasons. You can have null pointers that don't point to anything, really. They can be dangling. There are pointers that, let's say, are pointing to something that doesn't belong to us, so, if you're inside a vector, and you saw a pointer from inside the vector to access something outside the sector, that is a - *vector, that is a dangling pointer. 
Also, if you have a pointer that you offset it a bit but you didn't do it correctly, so, for example, you have a pointer between, I don't know, you use 64s, you use 16 bits instead of 64 bits, you will end up reading, like, in between values - that's an unaligned pointer. So, those are real pointers. You can do a lot of messy stuff with them. We're not sure why that is wrong really, right now. We will go into that later. But let me show you an example of how to use these raw pointers, and how to use unsafe, and so on. Here in my terminal, I have this tiny crate. It has a single struct called ByteArray which has a mutable pointer to u8. You can think of this type like a slide, or if you want, like a vector, but we are we only have two simple functions. We can only read stuff from it. We cannot grow it or make it smaller. We can just read stuff from it. Usually what happens is the system, like you have these two functions, you have like the unsafe unchecked version of a function, and then you have the you have the safe version of it. Here, we have the unsafe function called get_unchecked. It receives an index, it takes this pointer, casts it to a new size, and then adds the index to it and casts that integer back to a pointer, and offsets a pointer by adding index to it and then we reference it. Actually, all of this code, all of these three lines are not required to be done inside an onsite function. The only thing that is unsafe is reading from the pointer, calling the reference star operator. So you can use raw pointers however you want, but the reference then, you have to use unsafe. Then we have like the safe counterpoint of this function, so we guarantee that, if the index you're reading is out of bounds from the length of this array, then we would return none, and if we are sure that we are in bounds, then we return "some", and then do a get_unchecked function. When you run this, for example, let's say this is a crate in the Rust ecosystem, using crates.io. They might just do something like this. They just import our library, by type, colour function that I didn't show, but it's called zeroes. They might need to use unsafe, because they need to go super fast with this thing. They will just use "get_unchecked", and, if we run this, it returns zero. It works as intended. Some did you might be asking if you do this, you call this function with a ... index. We will get to that later. Yes, that's the demo. And the big question now is, well, actually, what can go wrong when you use unsafe? You might have answers if you're using it wrong, you're causing undefined behaviour, or undefined behaviour is super bad. Anything can happen when you deliver undefined behaviour. Let's discuss a little bit undefined behaviour. Let's say the Rust compiler was written under the same assumption how programs work, about the programs we write, we write programs that need to meet certain conditions so the compiler can actually compile them into what we want. If we break any of these rules, we say we are calling undefined behaviour. As Stefan said, this is like a way of saying if there is something that is not specified in a clear way, if the compiler is trusting that to happen and you're breaking that rule, then you're causing undefined behaviour. There is something super important in that undefined behaviour is different in each language. C has a lot of rules for undefined behaviour, and those rules are not the same. 
For example, what Stefan told you about adding to an integer and overflowing it, because the number is too big to fit - that's not undefined behaviour in Rust, but it is undefined behaviour in C, because both compilers were built with different guarantees in mind. Actually, the list of rules that matter when we are dealing with undefined behaviour is a little bit tricky, so I'm just going to mention some of them. Your program has undefined behaviour if you de-reference a pointer that is dangling or unaligned. Also, if you produce a value that is invalid for its type. Take Booleans, for example: when you look at the actual memory, Booleans are represented by bytes - they take exactly one byte - so you have a one for true and a zero for false. But a byte has eight bits, so there are a lot of other values you could put there. If you take a three and try to put that into a Boolean, that is undefined behaviour, because three is not a valid Boolean: the program would not know what to do if it sees a three instead of a one or a zero. Causing that is also undefined behaviour, and there are lots more rules that need to be taken into account here. So what happens if you break these rules? Basically, Rust cannot work correctly. We lose the guarantee that Rust has of producing programs that do what we want them to do; Rust can no longer compile that program correctly. What this means is that, in the best case, your program might not run - maybe the process crashes with a memory-out-of-bounds error, or something like that. In the worst case, it might run, but not as you intended it to, and then the program might do anything. For that reason, it's pretty common to see these psychedelic images with unicorns and a lot of colourful stuff when people discuss undefined behaviour: when we deal with undefined behaviour, we lose track of what our program is doing at the most basic level. We just don't know any more. So there is good news for us in the Rust community. If we are using safe Rust, if we promise never, ever, ever to use unsafe, we don't have to worry about undefined behaviour, because undefined behaviour should not be possible in safe Rust. And if you use unsafe but you are super sure you're not causing undefined behaviour, you get the performance benefits, or you can interact with C libraries correctly, and that is also good. But there is also not-so-good news, and it concerns a super important part of our ecosystem: even if we're not causing undefined behaviour ourselves, someone else in our dependencies might be. Here, I have some interesting statistics about this: 24% of all the crates on crates.io use unsafe directly. And of those crates, 74% make unsafe calls to functions that are in the same crate - so our crates are not just using unsafe to call functions in the standard library or in other crates. If you want more information about these metrics, you can Google, or use your favourite web-search engine, to look for the paper "How do programmers use unsafe Rust?". My point is that unsafe is everywhere - not because people aren't good at doing their jobs, but because we actually need it. It's everywhere. But I also have good news: there is a tool that we can use to detect undefined behaviour in our programs, called Miri. If you want to take a look at the Miri repository now or later, this is the URL.
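As a tiny, self-contained illustration of the invalid-value rule (my example, not the speaker's):

```rust
fn main() {
    // Undefined behaviour: 3 is not a valid bit pattern for `bool`,
    // which must be exactly 0 or 1.
    let b: bool = unsafe { std::mem::transmute::<u8, bool>(3) };
    println!("{}", b); // from here on, anything can happen
}
```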
You can find all the code in that repository. So, what is Miri? It is a virtual machine for Rust programs. Miri doesn't compile your program; it interprets it, in the same sense that the JVM interprets Java byte code, or the Python interpreter runs Python. Miri is like that, but for Rust. And it has a super cool feature that none of those other interpreters has: it can detect almost all cases of undefined behaviour while running your code. What is also interesting is that some of the code used in Miri is used in the engine that does compile-time function evaluation, so, if you have any constants in your program, or a const function, part of this code is used to evaluate that constant. But here we are talking about Miri as a standalone tool, outside the compiler, that can interpret your programs. So, how do you use Miri? You need a nightly version of the compiler to do this, so you have to install the nightly toolchain, which you can do by running `rustup toolchain install nightly`. Then you can install the Miri component - `rustup +nightly component add miri` - and, after Miri installs (it takes a while compiling), you can run binaries: you can run your whole program with `cargo +nightly miri run` if you want, or just your test suite with `cargo +nightly miri test` if you have tests. Let's do a demo with the same code I was showing you before. Again, we have this super tiny program using an external crate, let's say, and maybe the person writing this program doesn't know about the guarantees the crate requires to be sure that these functions don't cause undefined behaviour. You might be tempted to do something like: can I read the 11th position of an array with ten elements? Who is stopping me? The compiler is not complaining. It works; it actually returns zero, and that is a perfectly plausible value, because it returns the same as before. But if you run this with Miri, you will find this super cool error that says: undefined behaviour, pointer to allocation was de-referenced after the allocation got freed. It points to the part of the code that causes this undefined behaviour, which is the pointer de-reference, and you can see more information and so on. What is happening in Miri's execution is that this function is creating a pointer that is dangling: you created a pointer that is outside the actual range of the vector, so, when the vector gets deleted - because it is deleted after everyone has used it - you still have this pointer pointing to nothing. But, for example, if we go back to the perfect case, where we didn't have any undefined behaviour, we can just do `cargo miri run`, and Miri won't complain and will return the same as your regular program. So that is how we can use Miri to detect undefined behaviour. But now I want to show you a little bit of how Miri actually works. To talk about how Miri works, we have to dig into how the Rust compiler works. This is a super high-level overview of the Rust compilation pipeline, the steps a program follows when it is getting compiled. We always start with source code, our .rs file, and end up with machine code - a binary, or a dynamic library, something like that. What happens in the middle are four stages. The first one is parsing: Rust reads the text of your source code and parses it to produce an abstract syntax tree, or AST. Then this AST is transformed to produce a high-level intermediate representation, or HIR.
This stage is where the type checking happens, so a lot of the work on types happens here. Then the HIR is lowered to another representation, the mid-level intermediate representation, or MIR. This is where the borrow-checking happens. And after that, we start interacting with LLVM - that is the compiler backend that Rust uses - and the LLVM project has its own intermediate representation, so we lower MIR to LLVM IR, and finally LLVM does the code generation to produce your binary file, or your library, and so on. Miri works almost the same way. The only difference is that the code generation stages don't run, so we don't get to talk to LLVM. What happens is that Miri lets the compiler run until you have the MIR of your program, and then it interprets that. Where the JVM has byte code, Rust has MIR when running Miri. That's why Miri is called Miri: it's an MIR Interpreter. And here is something super important: Miri cannot interpret programs that aren't Rust programs. If you have a C library that you use from your Rust program, Miri can't interpret it in any way - that program doesn't have the same syntax, the compiler doesn't even understand it - so Miri cannot interpret it. And there are more limitations, actually. Miri is not perfect; it's not a silver bullet for your undefined-behaviour problems. One limitation is that Miri is slow, so, if your test suite or program is performance-sensitive, it can take a while to run it, if you can run it at all. This happens because Miri has to do a lot of runtime checks on your pointers, and on how memory is managed, to be able to tell you when undefined behaviour is happening. The other important point is that Miri can only detect undefined behaviour as it is happening: if the undefined behaviour doesn't occur during the run, Miri won't be of use. Miri cannot detect data races yet. And, again, Miri can only interpret Rust programs - this one is super important. You might be wondering why this matters, and it is because, well, programs don't run in isolation. We tend to access files, get resources over the network, interact with databases, use the primitives of our operating system, whatever. And the mechanism that Rust uses to interact with all of those is foreign functions. That is what this last part of the talk is about: foreign functions. Some of us might think that we don't need foreign functions at all - maybe we have never used external functions in our projects. But I'm sure everyone, or almost everyone, has interacted with the standard library to do standard operations, reading files, whatever, and that means that, somewhere underneath, you're using foreign functions. For example, this is the stack trace when you call File::open, the standard-library function for opening files. There are six functions here. The first two are Rust functions in the standard library; they are platform-independent. Then we have four functions that are specific to Unix-like platforms, so those only run on Linux or MacOS. And then we have this open64 function at the end. The interesting part about the open64 function is that it's not a Rust function: it's a Linux system function used to open a file. So this is a foreign function, written in C. It is an unsafe function, and Miri cannot interpret it. So what happens if, anywhere in this process, we have undefined behaviour? Can Miri run this program at all, given that it cannot interpret the open64 function?
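For reference, this is roughly what such a binding looks like on the Rust side - a simplified sketch (the real open64 is variadic, Linux-only, and reached through std's internals rather than declared like this):

```rust
use std::os::raw::{c_char, c_int};

extern "C" {
    // A foreign function: its body lives in the platform's C library.
    // Miri cannot interpret it, so it must either intercept the call
    // with a shim or refuse to continue.
    fn open64(path: *const c_char, oflag: c_int) -> c_int;
}

fn main() {
    let path = b"/etc/hostname\0";
    // Calling any foreign function is unsafe; 0 is O_RDONLY on Linux.
    let fd = unsafe { open64(path.as_ptr() as *const c_char, 0) };
    println!("fd = {}", fd);
}
```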
The good news is that Miri can actually run your program, in a particularly interesting way. Yes, Miri cannot interpret your foreign function, but it can intercept the call. So, when you're running your program and something calls open64, Miri sees someone calling open64 - a foreign function it doesn't know, that's not a Rust function - and then contributors can write whatever code they want to emulate that function. We call the code that emulates a foreign function a shim. And if a shim needs to interact with the operating system, or with any of the resources that the standard library provides, we use the standard library for that. So it is funny, because the standard library uses foreign functions, but Miri uses the standard library to emulate some of those foreign functions. Let me show you. We are still in our example with the ByteArray crate. We have a user that reads the index it wants to use from a config file. It uses File::open, so, eventually, it will use open64. And we are doing the same as before: we're just printing something, using unsafe. If we try to run this with Miri, we will get an error, but not because we are causing undefined behaviour. We get: 'open' not available when isolation is enabled - this is the open64 function I was talking about - please pass the flag to disable this isolation. So if we do that, and set the Miri flags to `-Zmiri-disable-isolation`, we can actually run it. In this case, it seems the config file is making us cause undefined behaviour. It says: memory access failed, pointer must be in bounds at offset 11, but it is outside the bounds of the allocation, which has size 10. It seems like someone is reading the 11th position of an array with ten elements - yes, it is really position 10 for the 11th element, if you want to think in zero-based indexes. And that's the whole problem. So, yes, we can use Miri to detect undefined behaviour even in programs that use foreign functions. That's super cool. And, actually, beyond handling files, the shims can do a lot of stuff. You can manage directories, delete files, create symbolic links; you can spawn threads and use locks and atomics; you can get the current time, so you can run clocks inside your Rust program running in Miri; and you can handle environment variables. Each of those operations is possible because someone decided to write a shim for that specific foreign function. And this has a super cool side effect - well, not really a side effect, some people worked hard to get this working - which comes from the fact that the std library works across many platforms, so you are not tied to any one platform's way of, say, opening files. This means that you can emulate foreign functions even if you are not on the platform the program is going to be compiled for. So, for example, if you have a program that is supposed to run on Windows, but you don't have a Windows machine, you can use Miri to interpret that program as if it were a Windows program. Let me show you. Here we have another user of our library. This time, it uses environment variables to set the size of the array and the index it wants to read. Miri can emulate an environment inside itself, so we can use the size environment variable to set the size of the array, and we set the index to 1, because we want this run to succeed, and we disable the isolation. I'm using a Linux machine, but I'm going to run it for a target that is Windows - I don't have Windows installed here. And it works. And if I want to run it on any other target, I can do that too; a sketch of this demo follows below.
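Here is a sketch of roughly what that demo program could look like. The crate name and environment-variable names are reconstructions; `-Zmiri-disable-isolation` and cross-target runs are documented Miri features:

```rust
// Hypothetical demo program; `byte_array` stands in for the crate from the
// talk (see the ByteArray reconstruction earlier). Run it, cross-interpreting
// for Windows from a Linux machine, with:
//   MIRIFLAGS="-Zmiri-disable-isolation" \
//     cargo +nightly miri run --target x86_64-pc-windows-msvc
use byte_array::ByteArray;

fn main() {
    // Miri emulates the environment, so these reads go through shims.
    let size: usize = std::env::var("SIZE").unwrap().parse().unwrap();
    let index: usize = std::env::var("INDEX").unwrap().parse().unwrap();

    let array = ByteArray::zeros(size);
    // With INDEX < SIZE this prints 0; with INDEX >= SIZE, Miri reports
    // undefined behaviour at this dereference.
    let value = unsafe { array.get_unchecked(index) };
    println!("{}", value);
}
```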
I'm running it on my regular Linux target, and it is working. This is super cool, because you can use Miri in situations where you're not sure the code you wrote specifically for Windows works correctly - even if you're not using unsafe, you just want to be sure your program runs as intended. And, yes, that's basically the hard content of my talk, so I want to spend the last few minutes talking about contributing to Miri. If any of this caught your attention, I encourage you to contribute to Miri, for many reasons. My personal reason is that I always wanted to work on compilers, because I find them super interesting, and I really like Rust, but I didn't know where to start. Then I found Miri and said, okay, I could implement maybe some foreign functions for opening files, whatever - it sounds not too hard. It took a while until I understood the Miri-specific stuff, but it helped me a lot to understand how the compiler works and to get involved in other things that I wouldn't have been able to do otherwise. And even if you don't feel comfortable contributing yet, you can help this project by just using it. Maybe you want to use it because you actually write unsafe code and you want to be sure you're not causing undefined behaviour, or because some of your dependencies use unsafe and you want to be sure that they don't cause any undefined behaviour either. You can save yourself, and many others, a lot of headaches debugging, and learn how undefined behaviour works along the way. Maybe you're expecting Miri to catch something, and it doesn't, or maybe it is the other way round: you think your program is correct, and Miri is complaining. You can open an issue, contact the contributors to discuss it, or talk to the Unsafe Working Group as well. And there is something super important here: this is not an obligation you have to the community. If your program runs really slowly in Miri, that's fine; you don't owe anyone Miri coverage. But if you're interested in contributing to Miri, writing shims is a super easy way to start. If you want to try it yourself, it is super cool, because what you mostly have to learn is your platform's specification: how is that foreign function supposed to work? The stuff that you need to learn about Miri itself is really small - you don't need to know how Miri works completely to do this. I don't know how all of Miri works: I use some little parts here and there, and I implemented a lot of things because I liked it. Even if you don't need a particular shim, maybe someone else needs it, so you're not just testing for undefined behaviour, you're helping everyone write better and safer code, because a lot of people use these things. If you want some specifics: for Windows, many of the shims haven't been implemented yet, and that is fine, because you can cross-interpret as if you were on Windows while actually on Linux. For example, if I go back to the program that opens the file and try to run it with the Windows target, it will fail, but it fails because this function, CreateFileW, hasn't been implemented yet. Maybe one of you wants to do it! There's a bunch of stuff that hasn't been implemented yet. That's all, so thank you for your time. I hope you found this interesting, and I think we can do some questions now if you want. -STEFAN: We have some questions. I will quickly adjust the audio. So, the first one - oh, no, before we forget: we have an intro for you. I'm sorry, I forgot to share it.
Here is your introduction. -BARD: Miri is Rust's interpreter, and Christian will gladly debate her, on how to bequeath the stuff underneath, so she can run until much later. -> That's really cool. -STEFAN: Glad you like it. Where were we? Yes, the question: about the 11th element, Miri sometimes said the allocation was freed and sometimes that the access was out of bounds, so I guess the question is, how far can it track stuff, right? -> Yes, this is not entirely clear to me, actually. Sometimes this program fails because, when Miri interprets it, it frees the memory for the array before you read the pointer, so it complains about memory being freed; and sometimes the array has not been deleted yet, it hasn't been dropped, so it complains about an out-of-bounds access, even though the array is still there. The good news is that both of those are undefined behaviour. Miri tries to be as deterministic as it can, but, when you disable the isolation, for example, it's really hard to be deterministic, because a change to your file might change how everything works internally. -STEFAN: There is a second question: when Miri's engine is used to execute const code during compilation, does it run in a fast mode with less validation, and how do I assess the difference? -> Inside the compiler, Miri runs without a lot of the validations it does when you're running it standalone. In the current version, it's faster than what I showed you, but that's because it does fewer checks. Let's say it's the same engine, but in a different car. -STEFAN: We don't have dynamic evaluations in const eval. -> There is a flag in Rust - the 'unleash' one - with which you can run, let's say, unrestricted constant evaluation. Most of the time, it breaks the compiler, but, yes, you can actually run whatever you want using Miri inside the compiler. But that is super experimental. -STEFAN: So, long-term, one could have a VM, like a fully functioning VM, in Miri? -> In principle, yes, but there are a lot of questions - for example, if you read a file, and you use the file to, I don't know, create some const, or define a type, that makes your compilation unsound, because every time you read the file it might change; or using random-number generators. -STEFAN: Maybe I can introduce my own question here: do you think, in a very distant future, it will be possible to have Miri included in a binary, to have Rust as a scripting language inside your Rust program? -> Oh, wow. I have no idea! I remember reading that someone was writing an interpreter so you could use it like a REPL. I don't know what happened with that project. -STEFAN: Was this the Python-like thing? -> No, it was a little bit different, because it didn't run Rust but MIR: you had to write the MIR of your program together with the Rust code. -STEFAN: Okay. Interesting. Another question from the audience: would it be possible to do this kind of analysis on general LLVM IR? -> I'm tempted to say yes, you could. The thing is that you don't have a lot of the type information that you have when you're interpreting the MIR - in the MIR, you have a lifetime for every single value, and I don't know if you can do that in LLVM IR. In principle, yes, you could build, for example, a stack model for LLVM IR; the question is whether you can actually build it. -STEFAN: You would have to add a lot of metadata, because the types maybe ...
-> Yes, it's harder, but I believe it's possible to do that. -STEFAN: Is there anything you would like to show off - a final use case, or an idea, like, hey, if someone is bored, maybe give this project a shot? -> Yes, actually, let me open a new Firefox window here. If you're bored and you want to do something inside Miri, we have a lot of issues here, and we have this label, the shims label, with a lot of tiny problems. For example, Miri doesn't support custom allocators, and in the latest version the standard library is gaining support for custom allocators, so it is super important now to have a way to use a custom allocator in Miri, to test with different allocators, for example. If you're bored, you can grab any of those issues. -STEFAN: Cool. I'm looking forward to a stable Box with custom allocator support. That will be very interesting. Wonderful. I think we have reached the end; I don't see any more questions. It was very well received - great talk. Thank you again. Ferrous thanks you as well. -> Thanks so much. -STEFAN: Will you be in the chat afterwards? -> Yes, I will hang around a little bit in the chat. -STEFAN: Wonderful. So, thanks, everyone, for listening, and the final talk will commence in ten-ish minutes. There will be some announcements before and after, so stick around. Also, we have two artists coming up after the last talk. Right, thank you, everybody. Bye. \ No newline at end of file diff --git a/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.md b/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.md new file mode 100644 index 0000000..95e5ecd --- /dev/null +++ b/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.md @@ -0,0 +1,235 @@ +**RFC: Secret Types in Rust** + +**Bard:** +Daan and Diane get us to the hype +Of keeping secrets in a type +Disallowing creation +of some optimization +that just might tell the feds what you type + + +**Daan:** +Hello, everybody. +I'm here with Diane Hosfelt, and we will be talking about secret types in Rust. +So the main gist of this talk: some of you may know that cryptographic engineers tend to write a lot of their code in assembly, +and there is a good reason for that, and I will explain why that is - but, as a cryptographic engineer, or aspiring cryptographic engineer, I want to write it in Rust instead. +Because of some of the compilation quirks in Rust, that's not always a good idea, so I will also explain what needs to be done to make Rust a programming language we can use for cryptographic code. +Both Diane and I are here at the conference and in the chat, so feel free to ask any questions at the end of the talk, or put them in the chat during the talk, and we will take care of them. + +**Diane:** +Hi, I'm Diane Hosfelt, and this is Batman. +Before we get started, I have a short disclaimer. +All of this work was done while I was a Mozilla employee and it in no way reflects Apple's views. + +**Daan:** +First, we will talk about how timing side channels work, what they are, and why they are dangerous, +and then we will talk about why Rust is currently not suitable for writing code that actually prevents these channels. +We will look at a couple of hacks that we can use to prevent some of these channels in Rust, +but then we will go more in depth and look at the RFC on secret types, to see how we could make Rust truly suitable for such code. + +So, first, ...
+ +**Diane:** +A side channel is any attack based on information gained from the implementation of a cryptosystem, not a weakness in the system itself. +In this case, we are concerned about timing side channels, which occur when attackers analyse the time taken to execute a cryptographic algorithm, which can be seen as an implicit output. +Imagine it takes less time to execute part of the code when a bit is zero than it does when a bit is one. +That difference is measurable, and it can lead to key recovery attacks. +These attacks are a threat in the post-Spectre world, primarily used to attack secrets that are long-lived and extremely valuable if compromised, +where each bit compromised provides incremental value and where concealing the compromise is desirable. +The fix is constant-time code or, to be more precise, data-invariant code, where the time it takes to execute the code doesn't depend on the input. + +**Daan:** +Let me explain to you why, at this point, it's really hard for us to guarantee that compiled code is constant time. +This story will be true for basically any programming language that is compiled - there are some exceptions - +but we are at a Rust conference, so let's focus on Rust. + +So the main problem here is that compilers are, in some sense, problematic. +They are allowed to optimise anything they feel does not change the program. +And the behaviour - the runtime of a program, stuff like that - is not considered to change the program in the view of a compiler, so the compiler might actually optimise things in ways that we don't think should be possible. +For example, there is this thing that LLVM can do, which is to eliminate conditional moves. +Let me show you an example of this. +Okay. + +So what you see here on the left is this nice little CMOV function I have written: if the conditional value is true, it should move the value in B into A, +and if the conditional value is false, then A should just keep its value, and B is simply dropped, by the way. +But the important thing here is that the conditional value is secret. + +We don't want to leak any information about the secret value, so the runtime of this function should always be the same duration, regardless of the value of this conditional. +So the first thing we do is generate a mask from the conditional value: the mask will be either all ones, if the conditional value is true, +or, if the conditional value is false, all zeros. +And then we use this mask. + +So the first line here ANDs the mask with A and XORs the result into A: if the mask is all ones - so if the conditional was true - this XORs A with itself and sets the value in A to zero. +Then, again only if the conditional value was true, the second line ANDs the mask with B and XORs the result into A, +so A gets the value of B. +And if the conditional value was false, the mask is all zeros, +so both of these AND operations produce zero, both XORs are no-ops, and A keeps the same value. + +What we see when LLVM compiles this - when Rust compiles it - is that the compiler is smart. +The compiler sees that the behaviour of this function completely depends on this conditional value, +so the first thing it does is check whether the conditional value is actually zero - is it false?
+And if it sees that the conditional value is true, it jumps past this instruction, so it skips it completely. +And if the conditional value was false, it just executes the instruction and moves on to the next one, and that's it. + +Basically, in one of the two cases, an instruction is skipped, and the important thing to see here is that, depending on the value of the conditional, the runtime of the algorithm changes. So here we have a case where the compiler introduced a timing difference, which is a side channel. +The interesting thing is that if we only look at the source code in Rust, it looks like code that could, or should, be compiled to something constant-time. +We have these bitwise operations, and you don't even see them in the compiled code, because LLVM is smart enough to see that they're not needed. +And this is actually a pretty big danger for us. +So that is what we mean when we say compilers are problematic. + +**Diane:** +Obviously, we're at RustFest, so we've all bought into Rust, but the question remains: if we can do secret-invariant programming with assembly, why do we need to do it in Rust at all? +Writing cryptographic code in high-level languages like Rust is attractive for numerous reasons. + +First, such languages are generally more readable and accessible to developers and reviewers, leading to higher-quality, more secure code. +Second, it allows the integration of cryptographic code with the rest of an application without the use of FFI. +Finally, we are motivated to have a reference implementation for algorithms that is portable to architectures that might not be supported by highly optimised assembly implementations. + +**Daan:** +So, then, why do we focus on Rust? If we can't write secure code in it, why do we want to use Rust in the first place? + +Obviously, everybody here has their own idea of why they would use Rust, and in our case it's kind of the same. +We want to use Rust because we have all these types, and all these nice checks in the compiler, which make it easier to write secure code. +And we want to utilise these checks and these tools as much as possible, because writing plain assembly is really hard and super error-prone. And there's the other thing: if you only write assembly, then you've written assembly for, say, an Intel processor. + +When you want to run the same code on an ARM processor, you have to rewrite the whole thing. +We don't want to do that, because it also allows you to make more mistakes, and we want our crypto code to be really secure, so we would like to use a high-level language if that is at all possible. +And it is not all that bad: there are some ways Rust code can be made side-channel resistant. +A couple of these exist already: in Rust, you can make newtype-style wrappers around integers - a struct that just contains some integer type - and implement operations on it that are presumably constant time. + +There are two main examples in the wild. +The first one is the subtle crate: if you ever need to do some stuff in constant time, use this crate. +This is the current state of the art that we have. +This is probably what you should use - we don't have anything better at the moment. The other example that I would like to mention is the secret-integers crate, which is a bit more academic in nature.
+What it does is ask: what if we replaced all of the integer types with constant-time integer types - would that work? +The secret-integers crate provides side-channel resistance at the language level: on the language level, you're sure that your code looks like something that should be side-channel resistant, but it does not actually prevent the compiler optimisations we discussed. +The subtle crate does address those, and that's why I recommend that crate. + +Both of those crates are only best effort; they don't fix all of the compiler optimisation issues. +So that is the language level. +We can also look at the compiler level: what do we need to do in a compiler to actually get this right? It turns out we need to add some kind of optimisation barrier for the secret data. + +Let me go back to the example really quickly. +It turns out that the problem here is that LLVM is able to completely analyse through this mask variable - this mask variable is secret, because it directly depends on the conditional value, which we said was secret. +And because LLVM can just analyse through this mask variable, it can do this really nice optimisation of adding a branch and then eliminating all these bitwise operations. +So we need to add an optimisation barrier to this mask variable. + +There are a couple of ways to add optimisation barriers. The first is to add an empty assembly directive: +we construct an assembly directive which takes this mask value as an input and also returns this mask value as an output. +LLVM is not able to reason about what happens inside an assembly directive. + +We know that nothing happens inside this assembly directive, but LLVM cannot know that. +Because of this, it has to keep the mask value completely intact, and it is not able to optimise through that variable. +The downside is that the assembly directive doesn't work on stable Rust: for the asm directive, you need a nightly Rust version to compile, so that is not really optimal. + +The other trick that we can use is to do a volatile read of the secret data, +which guarantees that at some point this mask value will have existed on the stack, +and because of that, LLVM is not able to optimise through this read. + +Both tricks work in, say, 90% of the cases. +They do not have a 100% success rate for all our cases. +I won't go into why that is at this moment, but it's important to know that they don't always work. +They're best-effort tricks. + +The most important part is that although these tricks might work at the moment, they are not guarantees, +and the compiler might change in the future. +Perhaps in five years the compiler will actually be able to look into this assembly directive, see that nothing happens, +and eliminate that assembly directive completely - we don't have any guarantee that this kind of thing won't happen in the future. +So it might be that, in a couple of years, a completely secure version of some software is actually insecure with a new compiler version, which I find very scary. +So, yes, we would like to have guarantees, and we don't want to have just hacks. +For the next part, I will give the floor to Diane, and she will talk about how we can use secret types in Rust to make our lives a little bit better.
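+Pulling Daan's examples together, here is a sketch of the cmov idea with a volatile-read barrier. The slide code itself isn't in the transcript, so this is a reconstruction, not the speakers' exact code:
+
+```rust
+use std::ptr;
+
+/// Conditionally move `b` into `a` without branching on the secret `cond`.
+fn cmov(cond: bool, a: &mut u32, b: u32) {
+    // All ones if `cond` is true, all zeros if it is false.
+    let mask = (cond as u32).wrapping_neg();
+    // Best-effort optimisation barrier: a volatile read makes it harder for
+    // LLVM to analyse through `mask` and turn the masking back into a branch.
+    let mask = unsafe { ptr::read_volatile(&mask) };
+    *a ^= mask & *a; // zeroes `a` when the mask is all ones; no-op otherwise
+    *a ^= mask & b; // sets `a` to `b` when the mask is all ones; no-op otherwise
+}
+
+fn main() {
+    let mut a = 1;
+    cmov(true, &mut a, 7);
+    assert_eq!(a, 7);
+    cmov(false, &mut a, 9);
+    assert_eq!(a, 7); // unchanged: the conditional was false
+}
+```
+
+As the talk stresses, this is a hack, not a guarantee: a future compiler may still see through the volatile read or an empty asm directive.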
+ +**Diane:** +Why aren't these language-level protections good enough? +It comes down to the compiler and the instructions themselves: +it turns out that general-purpose instructions on various platforms can take a variable number of cycles, +so, for us to truly have secret-independent runtimes, we need to eliminate the problematic instructions. +This can only be done at the compiler level. + +Enter RFC 2859. +This defines secret primitive types and the instructions that should be defined for each type. +For all the secret types, we implement all of the normal acceptable operations. +When we know that a value is safe to reveal, we can use the declassify operation to turn it back into a public value. + +For example, a secret key may be an array of secret u8 bytes, and we keep it secure by disallowing any Rust code that would result in insecure binary code. +For example, we don't allow indexing based on secrets, we don't allow using a secret bool in if statements, +we don't allow division, which is not a constant-time operation, and we don't allow printing of secret values. +And we say that every time we combine a public value with a secret value, the result is also secret. + +To give you an example of how this would work, here's a mock-up error message of what would happen if we broke one of these rules. +Here, the programmer chose to branch on a secret_bool. +In this case, the compiler should give us an error, because that is not allowed. + +There are two parts to this problem: an LLVM part and a Rust part. +There has been some work in the LLVM realm to propose an RFC similar to this one, which we worked on together at Hacks 2020. +We're not sure what the status of that work is at the moment, but what LLVM needs to do is make sure that our constant-time Rust code is also compiled safely. +LLVM needs to guarantee that what we wrote down in the code stays safe in the emitted binary: that means no branching on secrets, no indexing with secret indices, and no variable-time instructions. +At the moment, zeroing memory is out of scope, but once we have this information about public and secret values, we've laid the groundwork to support that as well. + +Thank you so much for your attention. +If you have any questions, feel free to ask us. +While this is a recorded talk, we are currently present and ready to answer questions. + + +**Pilar:** +All right. +Thank you so much, Daan and Diane. +We're lucky enough to have you both here for a Q&A. +All right, so you've been joined by your friend too! [Laughter]. + +**Diane:** +Batman came back. + +**Pilar:** +Entirely ignored during the day. +We do have a couple of questions from the audience, which is great. +The first one: this is all very complex - how do you discover these kinds of problems, and how do you even begin to think of a solution? +Very broad, but I think it would be great to hear your insight on this. + +**Diane:** +There are verification tools you can use that can determine whether, on different inputs, there are different runtimes, so that is one of the ways you can determine if a program has non-secret-independent runtimes. That's part of it. Daan? + +**Daan:** +Yes, the way we discover these kinds of issues is that, at some point, I sometimes have to write a piece of assembly, +and the first thing I do before I write it is just program it in C or Rust and see what the compiler tells me to do, +and those are the moments where I stumble on these things: "Wait, if I would do this, this would not be secure."
+And that's when I first discovered this for myself, so, yes. + +**Pilar:** +Cool. You gave us an insight there: go with what the compiler says first, and then you can discover things. +Be curious about what the compiler tells you, not just like, all right. +Someone has asked: is there a working group working on a solution? + +**Diane:** +There isn't a working group. +There is just the RFC, which has been a little bit stale, because, you know, life gets busy. +So if anyone's interested in commenting on the RFC, and trying to help me bring it back to life, you know, that is definitely welcome. + +**Pilar:** +If there is interest in a working group, then, yes, maybe someone will hop on from the audience. + +**Diane:** +That would be great. +One of the things that needs to happen, on both the Rust side and the LLVM side, is that we are eventually going to have to do some implementation work. +You know, it's not enough just to define what has to happen. We have to implement these instructions on the secret types, so that will actually be a lot of work. + +**Pilar:** +So we have very little time left, but there was a lot of chatter in the chat room, so, I guess people can find you in there, and we can get a few more questions in. +There were lots of questions, and we just didn't have enough time, but thank you so much for joining us. +It was great to have you here. +She's asleep. She's melted into a puddle! + +**Diane:** +Say bye to your new friend. + +**Pilar:** +See you both. Thank you for joining us. + +**Diane:** +Thanks so much! diff --git a/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.txt b/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.txt deleted file mode 100644 index 9f47895..0000000 --- a/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.txt +++ /dev/null @@ -1,51 +0,0 @@ -RFC - Secret Types in Rust - Diane Hosfelt and Daan Sprenkels. -PILAR: Welcome back, everyone. I would say that I'm disappointed that it's our last talk of the day, but I can reassure you that it's an amazing one, and I mean, there's a whole lot still coming up, so we will be together for many hours to come, and it's already been a great day, so it's a bit much to ask for more! So, our last talk of the UTC block, our last - it is by Daan Sprenkels, and Diane Hosfelt. I hope I've pronounced those correctly, as someone with a difficult, in the meantime, I'm like ... Daan is a PhD student, and he considers himself an aspiring cryptographic engineer. Diane Hosfelt is a privacy and security researcher in Pittsburgh, has an enormous love for cats and castles, cats in castles, and Rust, and I can think of no two better people to tell us and educate us why our binaries are insecure, and why we need secret types. I let our Bard give us our last limerick of the UTC block. Let's take it away. -BARD: Daan and Diane get to us the hype of keeping secrets in a type, thus allowing creation of some optimisation that just might tell the FEDs what you type. -> Hello, everybody. I'm here with Diane Hosfelt, and we will be talking about secret types in Rust. So the main gist of this talk will be that some of you may know that cryptographic engineers tend to write a lot of their code in assembly, and there is a good reason for that, and I will explain why that is, but, as a cryptographic engineer, or aspiring cryptographic engineer, I have to write it in Rust instead.
Because of some of the compilation quirks in Rust, that's not always a good idea, and what needs to be done to make Rust programming language we can use for cryptographic code. Both Diane and me are here in the at a conference and in the chat, so feel free to ask any questions at the end of the talk, or put them in the chat during the talk, and we will take care of them. -> Hi, I'm Diane Hosfelt, and this is Batman. Before we get started, I have a short disclaimer. All of this work was done while I was a Mozilla employee and it in no way reflects Apple's views. -> First, we will talk about timing side channels work, what they are, why are they dangerous, and then we will talk about how Rust is not suitable to write code that is actually - that actually prevents these channels. We will look at a couple of hacks that we could use to prevent some of these channels in Rust, but then we will go more in depth and look at the RSC on secret types to see how we could make Rust for suitable for such code. So, first, ... -> A side channel is any attack based on information gained from the implementation of a crypto system, not a weakness in the system itself. In this case, we are concerned about timing side chapels which occur when attackers analyse the tame taken to execute a cryptographic engineer algorithm - *a cryptographic algorithm which can be seen as an implicit output. Imagine it takes less time to execute part of the code when a bit is zero than when it does when a bit is one. That difference is measurable, and it lead to key recovery attacks. These attacks are a threat in the post-spectre world, primarily used to attack secrets that are long-lived and extremely valuable if compromised, where each bit compromised provides incremental value and the confidential shalt of compromise is desirable. The fix is constant time code, or to be more precise, data invariant code, with the time it takes to execute the code doesn't depend on the input. -> Let me explain to you why at this point it's really hard for us to guarantee that the compiler is constant time. So this is - this story will be true for basically any programming language that is compiled. There are some exceptions. But we are at a Rust conference, so let's focus on Rust. So the main problem here is that compilers are in some sense problematic. They are allowed to optimise thinking they feel does not change the program. And the behaviour, like, or the runtime of a program, or stuff like that is not considered to change the program in the view of a compiler, so, the compiler might actually optimise stuff that we don't think would be - should be possible. And so, for example, there is this thing that LVM could do which is eliminate any conditional moves that may load. Let me show you an example of this. Okay. So what you see here on the left is I have written this nice little CMOR function, so if this conditional value is true, what it should do is that it should move the value in B into A. And if this conditional value is false, then A should just remain the same value and B should just be dropped by the way. But the important thing here is that the conditional value is secret. We don't want to leak any information about the secret value, so the runtime of this function should be always the same length, the same duration, like depending on the value of this conditional value. 
So what we do first is we generate a mask from its conditional value and the value that will come out of this mask will be something like either only once, or if the conditional value is true, sorry, if the conditional value was false, it will be a mask of only zeros. And then we will use this mask. So the first line here, what this does is, if this mask is only once - so if the conditional was true - then it will x, or the value in A - this will set the value in A to zero. Then only if this conditional value was true, it will x again with b. A will become - A will get the value of B. And then, if this conditional value was false and then this mask would be all zeros, then both of these end operations will make sure that this value with zero, and this value was zero, and both of these operations would be a no, and then A keeps the same value. What we see when LLVM compiles this, Rust compiles it, that the compiler is smart. The compiler sees, this behaviour of this function completely depends on this conditional value, so, first what it does is that it checks if this conditional value is actually zero, so is it false? And if it is - if it sees that the conditional value is true, it jumps to this instruction, so it skips this complete instruction. And if this conditional value was false, then it just does instruction and then moves under the instruction, and that's it. Basically, in one of the two cases, it's skipped its instruction, and the important thing to see here is that depending on this value, depending on the value of the conditional, the runtime of the algorithm changes, and so here we have a case where the compiler introduced a side channel which would be a side channel. The interesting thing is that if we only look at the source code in Rust, it looks like something that, like, it looks like code that feels completely like this could be, or should be implemented in constant time. We have these operations, and you don't even see them in the compiled code because LLVM is smart enough to see that they're not needed. And this is actually a pretty big danger for us. So that is what we mean when we say compilers are problematic. -> Obviously, we're at RustFest, so we've all bought into Rust, but the question remains if we can do secret invariant programming with assembly, why do we need to do it in Rust at all? Writing cryptographic in high-level languages like Rust is attractive for numerous reasons. First, they're generally more rateable and accessible to developers and reviewers, leading to higher quality, more secure code. Second, it allows the integration of cryptographic code with the rest of an application without the use of FFI. Finally, we are motivated to have a reference implementation for algorithms that as portable architectures might not be supported by highly optimised assemble implementations. -> So, then why do we focus on Rust? Why don't we just, if we can't write secure code, why do we want to use Rust in the first place? That is obviously everybody here has their idea of why they would use Rust, and in our case, it's kind of the same. We want to use Rust. The reason we want to use Rust is we have all these types, and all these nice checks in the compiler that allow us to make our code that is easier to write secure code. 
And we want to utilise these checks and these tools as much as possible because writing just plain assembly is really hard and super error-prone, and then there's the other thing that if we only write assembly, then you've written an assembly for an Intel processor. When you want to run the same code on an ARM processor, you have to rewrite the whole code. We don't want to do that, because it also allows you to make more mistakes, and we want our crypto code to be really secure, so we would like to use a high-level language if that is at all possible. So it is not all that bad. So there is some way how Rust can be in a wayside-channel resistant. And this, like, a couple of these, so, in Rust been make these new-type style references around integers, a struct that only has some integer type and implement some operations on that that are presumably in constant time. There are two main examples in the wild. The first one is the subtle crate which if ever you need to do some stuff in constant time, use this crate. This is the current state of the art that we have. This is probably what you should use, and we don't have anything better at the moment, and the other example that I would like to mention is the secret-integers crate which is a bit more academic of nature. What it does is looks at what if we would replace all of the integer types that is constant time integer type, would that work, and what the secret-integer crate provides side-channel resistance on the language level, so, on the language level, you're sure that your code looks like something that should be side channel resistant, but it does not actually prevent these compiler optimisations. The subtle crate does that, and that's why I recommend that crate. Both of those crates, they are only a best effort, they're only best effort, and they don't guarantee all of the - they don't fix all of the compiler optimisation issues. So, it is the language level. We can also look at like more at the compiler level, what do we need to do in a compiler to actually do it right? It turns out we need to add some kind of optimisation barrier for the secret data. Let me go back to the example really quickly. So it turns out that the problem here is that LLVM seems to be able to completely eliminate this mask variable, so this mask variable is secret, because it directly depends on this conditional value which we said was secret. And then because LLVM can just analyse through this mask variable, it can do this really nice optimisation of adding a branch, and then just eliminating all these bitwise operations. We need to add an optimisation barrier to this mask variable. And there are a couple of ways that we can add optimisation barriers, and the first example is that we can add an optimisation barrier which adding an empty assembly directive. We construct an assembly directive which dates this mask value as an input and also takes this mask value as an output. Then LLVM is not able to reason about what happens inside of an assembly directive. We know that nothing happens inside an assembly directive, but LLVM cannot reason about that. Because it will actually keep this mask value completely intact and it will not be able to optimise through that variable, and so, the assembly directive doesn't work on stable Rust because, for the assembly derive, you need to have a nightly Rust version to compile, so that is not really optimal. 
And so the other trick that we can use is to do a volatile read of secret data, and what this does is guarantees that at some point this mask value would have existed on the stack, and because of that LLVMs are not able to optimise through this read. Both tricks kind of work in 90% of the cases. They do not have like 100% success rate for all our cases. I won't go into why that is at this moment, but it's important to know that they don't always work. They're best-effort tricks. The most important part is that although these tricks might work at the moment, they are not guarantees, and the compiler might change in the future, so perhaps in five years, the compiler is actually able to look into this assembly directive and see that nothing happens, and it might eliminate that assembly directive completely and we don't know, we don't have any guarantee that this kind of stuff won't happen in the future, so it might be that, in a couple of years, a completely secure version of some software now might actually be insecure with a new compiler version which I find very scary. So, yes, we like to have guarantees, and we don't want to have just hacks. So, for the next part, I will give the floor to Diane, and she will be talking how we can use secret types in Rust to make our lives a little bit better. -> Why aren't these language-level protections good enough? The compiler and instructions - it turns out that the general purpose instructions on various platforms take a variable number of cycles, so for us to truly have secret independent runtimes, we need to eliminate the problematic instructions. This can only be done at a compiler level. Enter RF, this, 2859. This defines secret primitive types and the instructions that should be defined for each type. For all the extra types, we implement all of the normal acceptable operations. When we know that a value is safe, we can use the declassify version to put it back to a public value. For example, a secret key may be an array of secret u8 bytes and keep us secure by disallowing any Rust code that would result in insecure binary code. For example, we don't allow indexing based on secrets, we don't allow using secret boll and if statements, and we don't allow division which is a non-constant time algorithm, and we don't allow printing of secret values, and we say that every time we combine a public value with a secret value, it is also a secret. To give you an example of how this would work, here's a mock-up error message of what would happen if we broke one of these rules. Here, the programmer chose to branch on a secret_bool. In this case, the compiler should give us an error because that is not allowed. There are two parts to this problem: an LLVM part and a Rust part. There has been some work in that LLVM realm to propose a similar RFC to this one that what we've worked together on at Hacks 2020. We're not sure what the status of that work is at the moment, but what LLVM needs to do is to make sure that our constant time Rust code is also compiled safely, so LLVM needs to make sure to guarantee that what we wrote down in the code is safe in the emitted binary, that means no branching on secrets, no branching with secret indices, and no variable time instructions. At the moment, zeroing memory is out of scope, but when we have this information about public and secret values, then we've laid the groundwork to support that as well. Thank you so much for your attention. If you have any questions, feel free to ask us. 
While this is a recorded talk, we are currently present and ready to answer questions. -PILAR: All right. Thank you so much, Daan and Diane. We're lucky enough to have you both here for a Q&A. All right, so you've been joined by your friend too! [Laughter]. -> Batman came back. -PILAR: Entirely ignored during the day. We do have a couple of questions from the audience, which is great. The first one we've got is that this is all very complex. And how do you discover these kinds of problems, and how do you even begin to think of a solution? Very broad, but I think it would be great to hear your insight on this. -> So there are tools that you can use, verification tools, that can determine if on different inputs, there are different runtimes, so that is one of the ways that you can determine is if a program has non-secret independent runtimes. For part of it. Daan? -> Yes, the way we discover these kinds of issues is, like, at some point, sometimes, write a piece of assembly, and the first thing I do before I write it, is just program it in C or Rust and see what the compiler tells me to do, and then these are the moments that I stumble on, these, "Wait, if I would do this, this would not be secure." And that's when I first discovered this for myself, so, yes. -PILAR: Cool. You said that you gave us an insight you go with what the compiler says first and then you can discover it. Be curious about what the compiler tells you, not just like, all right. Someone has asked if there is a working group working on it on a solution? -> There isn't a working group. There is just the RFC, which has been a little bit stale, because, you know, life gets busy. So if anyone's interested in commenting on the RFC, and trying to help me bring it back to life, you know, that is definitely welcome. -PILAR: If there is interesting for a working group, then, yes, someone will hop on from the audience. -> That would be great. One of the things that needs to happen on the Rust side and on the LLVM side, we are going to have to eventually do some implementation work. You know, it's not enough just to define what has to happen. We have to implement these instructions on the secret types, so that will actually be a lot of work. -PILAR: So we have very little time left, but there was a lot of chatter in the chat room, so, I guess people can find you in there, and we can get a few more questions. There were lots of questions, and we just didn't have enough time, but thank you so much for joining us. It was great to have you here. She's asleep. She's melted into a puddle! -> Say bye to your new friend. -PILAR: See you both. Thank you for joining us. -> Thanks so much! -PILAR: So I will have our co-MCs. What a great day, right? -STEFAN: Yes. -JESKE: Such a great day. -PILAR: Mine has also fallen out of excitement. -STEFAN: I felt the need to be cute when you brought out your animals. -PILAR: Yes, we should probably wrap up. That was our last talk for the day, or at least the UTC block. There is more coming up in the LATAM block. -JESKE: Please stay on board for the upcoming stuff if you can. And also for the next block. -PILAR: If feeling a bit like Fiona, take a little nap. Come back re-energised. -STEFAN: We have two artists coming up. -JESKE: Our beautiful next performer Earth to Abigail create music with computer code, voice and various. I'm excited for that. She integrates electronic soundscapes into her song writing. 
I think it will be beautiful, and also relaxing, but also with a little bit of warmth in it for this day.

**Pilar:**
And the other artist following up after that is... I can't quite pronounce the name: Aesthr. It's just written interestingly, but that's really cool. Aesthr. So, yes. The description I got, which sounds really cool, is that Aesthr is piloting a spaceship station made from wires, transistors, and a little witchcraft. We are in for a huge treat. We should thank all our lovely speakers, our amazing sponsors, all of you for being so great during the day, our sketchnoter, Malwine, and our captioner, Andrew. Thank you so much.

**Jeske:**
Thank you, everybody, for tuning in to the UTC time block. Everybody can check the recordings afterwards, and we wish everybody in the next block a lot of fun as well.

**Pilar:**
Thanks again to our sponsors as well.

**Stefan:**
Yes. I think we have the LATAM team coming up. We wish them all the best fun.

**Jeske:**
I had a lot of fun today. You two?

**Pilar:**
It was really great.

**Stefan:**
I hope we can keep this going. If I may, I have this idea: the next time, maybe we're allowed to meet in a hall again, right? So we do this with the 24 hours, we do this again, but next time we have two walls on each side of each building, like three buildings around the planet, and the walls project the camera feed from the next venue in that direction.

**Pilar:**
So it feels like we're all in one building. That sounds tiring but also really great.

**Jeske:**
We will do it somewhere where you can bring your dogs.

**Pilar:**
I have three. This is just the calm one.

**Jeske:**
I have zero, so that will be fine. I will adopt one for that day!

**Pilar:**
You can borrow one of mine! You too, Stefan. Three dogs for three emcees.

**Stefan:**
Perfect.

**Pilar:**
Thank you both to you two as well. It's been really great emceeing with you.

**Stefan:**
I think we will hand over now.

**Jeske:**
Stefan, you're the technician of the three of us. Thank you, everybody, and we will see you, hopefully.

**Pilar:**
See you at the LATAM block [Spanish spoken]. Ciao!
\ No newline at end of file
diff --git a/2020-global/talks/03_LATAM/01-Stefan-Baerisch-published.md b/2020-global/talks/03_LATAM/01-Stefan-Baerisch-published.md new file mode 100644 index 0000000..652b9f1 --- /dev/null +++ b/2020-global/talks/03_LATAM/01-Stefan-Baerisch-published.md @@ -0,0 +1,109 @@

**Learning Rust with Humility and in Three Steps**

**Bard:**
Stefan gives us three steps to learn Rust
Not saying that follow you must,
but if humble you are
with Rust you'll go far
as you learn the compiler to trust


**Stefan:**
Okay. Can you hear me now?

**Stefan (@dns2utf8):**
Yes.

**Stefan:**
You should be able to hear me now. Hello? Okay. So, as long as I'm not hearing anything, just assume...

**Stefan:**
Okay. So, can you hear me now? Because we had some technical difficulties.

**Stefan (@dns2utf8):**
Yes.

**Stefan:**
Okay.

**Stefan (@dns2utf8):**
Do you mind restarting quickly?

**Stefan:**
Yeah, sure. Sorry for that. But you see, we have more to learn, even if it's only the mute button. Okay. Rust with humility, in three easy ideas. So, what I tried to do, learning Rust, was to learn it by writing desktop applications. Because, honestly, I like desktop applications. And I didn't start out with Rust directly, but I went through different technologies, which gave me the opportunity to, well, see what made it difficult to learn different things.
And so, the first idea of learning Rust... of learning anything, really, is: why do you think it might be hard? Why do you feel it's hard? Know your challenge. So, the background was basically that I had my desktop application, and I started out with Qt. Qt is a C++ framework, and it's rather big. So, I had to learn, or refresh, a lot of C++, which is tricky, and I had to learn Qt, which is extremely large as frameworks go. And I also had to deal with quite a lot of surprises. I had to learn a build system, and I had to learn how different compilers and different platforms worked. Which took a lot of time. The next attempt was to switch over to web technology and Electron. Which was not hard to learn but, again, was much to learn. And a few things were, once again, unexpected.

The dependencies between the many packages that I used were really hard to figure out, and signing the code to submit it to app stores was, well, not really nice. So finally I switched over to another architecture, which worked. And that was based on, well, Swift, JavaScript and Rust in the backend.

So, what I came up with was basically VueJS, which I had to learn, Swift, which I hadn't used before, and Rust, which I hadn't used before. So, well, many different things. And finally, I realized that Rust, although in certain parts the hardest one to learn, was also quite quick to learn, because there was not too much to learn. And everything that was there was rather, well, transparent and expected. There were few surprises.

So, if you struggle with a language, the first tip, the first idea is: be aware that in learning Rust, as with any technology, you have to look at different things. You have to wonder: is it just hard to learn? Are there some concepts that I don't know yet? Or is there also much to learn? Do I have to learn a lot of libraries, a lot of language content, a lot of syntax? Are there a lot of surrounding things I need to learn: package manager, build systems, et cetera? Each language has different degrees of difficulty, different things that you have to keep in mind. And the thing with Rust is that the language itself, the concepts, et cetera, are quite challenging to learn, especially if you haven't worked with a systems programming language. But once you have done it, it gets easier quite quickly.

So, it is worthwhile not to get discouraged if you run into this steep learning curve at the beginning, because everything else will be easier, and it will be easier to get along. However, you can do yourself a large favor and try not to do too much at the beginning. Because when you learn the concepts of a language like Rust, you will have to face many new things just as you start, and you'll have to have some basic understanding of the model, and for example of the syntax, in order to accomplish even the first steps with the language. That means: give yourself some time. And once you start learning, focus on the language. Take a first step there, and then slowly work yourself up to the ecosystem, learn the different libraries, learn the context of the language. Pick a simple application domain. And try not to change too much at once.

Rust is extremely powerful, so there can be many things that you would like to learn that would potentially be useful for your project. But give yourself time and focus on this first step, and, well, learning should be easier. When I did my little project, I basically started out with a word-counting application.
Word counts, et cetera. And only then did I slowly move on to more work and more complex things that I needed for my application backend. And even there, well, I chose the most boring and maybe even inefficient architecture that could possibly work.

Related to that is the second idea: practice. I have... well, I'm gonna say it: the personal weakness of reading a lot about languages before I use them, especially if I have the impression that there is a new and interesting concept in the language. And this can be difficult in languages where you can potentially be overwhelmed if you try too much at once. I live here in Munich, where we have quite a lot of mountains, and I have always seen the difference between thinking about a hiking trip on a map and seeing it in real life. Because in real life, things are always slightly more complex. Which means that if you chose for your project all the features that you could potentially want, and if you wanted to use the most elegant syntax, not repeat yourself, and also write quite good idiomatic code, then on paper it seems quite likely that it will work out. But in practice, you run into small things that hold you back. You might have a problem with one particular syntax feature that you haven't grasped as well as you would like to, and this will hold you back.

So, it's often, at least for somebody like me, a better approach to do very, very small things and practice them than to move on to something else again. This will give you, well, a lot of certainty, a lot of practice, when you finally move on to harder and more complex topics. So, what did I do?

The first things that I learned were just ownership and basic syntax. Once I had the roughest idea of what a Copy type was, I just moved on to pushing structs around and then using clone a lot. And after I did that, I started to work more with references, and then slowly worked my way over to smart pointers; a sketch of that progression follows below. So, in essence, I just tried to get myself to the same knowledge level that I would have in other languages, for example Go, Java or Python, where I do more programming. And that gave me a rough idea of how I would have to move along. Only then did I look at my application domain. There was this little backend I had, and I sort of asked: which features do I actually need? Which collections would be useful? How are those collections used?

I think I need multithreading, and maybe a library to handle some internal requests. And after I added each of the features, I practiced a little bit, did some refinement, made some changes, and in this way I was always able to slowly move along. So, I always went through this planning-and-doing cycle, which allowed me to, well, get some things to work.

Which brings me to the third and, for me at least, the most important idea in learning something that is... not is, but can be, challenging like Rust: be humble. I think we as software developers, architects, we always like to, well, to be honest, feel clever. To feel good about ourselves because we have grasped and mastered the features of our language, because we write good code: the most idiomatic code, the safest code, the most performant code. And sometimes, in order to get to this point where we, well, write the best code we could possibly think of, we challenge ourselves too much. We start running and basically decide that within two months we want to run a marathon. So, take it slow. And Rust actually allows you to start slow.
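As a minimal illustration of that progression (our example, not Stefan's actual code), here is the same word count written twice: first clone-heavy, then borrowing once ownership has clicked, with a smart pointer appearing only where shared ownership is genuinely needed.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Step 1: clone freely. Wasteful, but it compiles and teaches ownership.
fn count_v1(words: Vec<String>) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for w in &words {
        *counts.entry(w.clone()).or_insert(0) += 1;
    }
    counts
}

// Step 2: borrow the input and key the map by &str, no cloning needed.
fn count_v2(words: &[String]) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for w in words {
        *counts.entry(w.as_str()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let words: Vec<String> =
        ["humble", "steps", "humble"].iter().map(|s| s.to_string()).collect();
    assert_eq!(count_v1(words.clone())["humble"], 2);
    assert_eq!(count_v2(&words)["humble"], 2);
    // Step 3: reach for a smart pointer only when ownership must be shared.
    let shared = Rc::new(words);
    assert_eq!(count_v2(&shared)["steps"], 1);
}
```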
It allows you to be even bored with what you're working on, and then slowly move on to more ambitious things. This gives you the confidence that you will need to move on with your personal project, without being slightly overwhelmed all the time, and without staring at a wall of compiler errors that you don't really understand because you decided to use the most advanced parts of the trait system and the type system.

This also has the advantage that you can actually build higher in terms of knowledge. My personal idea about programming languages, and about learning in general, has always been that you build concepts on the easier concepts that you have grasped before. So, in order to get a good idea about, for example, ownership, I think it helps to think not only about ownership but, while it's not strictly necessary, also about how pointer management and memory management work, for example, in C, or what reference counting means in other languages like Python.

And this gives you a broad basis into which you can put all the things that you need to do into context. You not only know, okay, I have seen in this book that I need to do this in the following way, and know the rough concepts; you can also relate them to other concepts that you have learned before. This can only happen if you can be humble and take small steps, which ultimately allows you to ascend faster. If you always work with the most difficult thing that you can possibly grasp, and write the most clever code you can possibly write, it will be quite hard for you to debug this code, or refactor this code, or even come back to this code if you, for example, move on to something else for a couple of months.

So, doing simple things, and doing them often, ultimately leads to mastery quicker, I would argue, than if you try to challenge yourself too much. There's an old saying that shortcuts make for long delays, and I think this applies to learning programming languages. Well, I can remember that when I started to learn Rust, the first program I wanted to write was a word counter. I always use a little tool, wherever I go, that goes over a couple of hundred files on my file system, counts the words, and gives me the most frequent ones. Which usually takes a couple of minutes or maybe an hour to write in other programming languages. In Rust, it took a little bit longer, mostly because I started out assuming I could just translate one-to-one from the C++ program. I couldn't. I had to take these baby steps to slowly ramp up. And I tried to just allow myself to be less clever with my code. And I came up with six things. I think some of you might disagree, but even if you disagree, maybe just think about them.

So, first of all, simple code is okay. You don't necessarily need to do pattern matching or anything fancy if you can just go for if/else. You may end up with screens upon screens of not-so-nice if/else statements, but then you end up with a program that you can refactor. It is okay to limit yourself. Rust makes the design decision to leave many useful things out of the standard library. There are many good crates for error handling, for better error types, for better concurrency abstractions, et cetera. And I always have to stop myself from looking at crates.io for the most suitable and interesting crate, and instead write a simple solution. Even if the simple solution means my error value is a string.
And the code is maybe less nice, less idiomatic than it could be. But it allows me to focus on the basics and at least know what I don't know about my code.

Inefficient code is okay. Rust can produce very fast programs, and, with async, also very fast servers. The thing is, especially with async, things get complicated quickly. Allow the first version of your code to be inefficient. Do some memory copies, do some clones. Write a simple server. And later on, if you realize that you need something faster, you can always move there.

Now to change tone a little bit: unsafe code is not okay. First of all, it's unlikely that you will succeed if you try to get around the borrow checker by going for unsafe code, and it makes things harder. Because one of the nice things about Rust is that you don't have to worry too much about shooting yourself in the foot, or making, well, mistakes that you wouldn't even know about. For example, I once wrote a Rust program where I had to reallocate some memory, and doing that is much harder than it would be in Go. But thinking about this particular corner case: if I did this in Go, I would most likely do a copy where I wouldn't really want a copy. The safety benefits of Rust help you learn. The compiler is your tutor, in a way.

Boring code is okay. There are many nice macros that can help us, and macro programming is quite interesting. Nightly has many features. I would not necessarily go there at the beginning. Stay in the boring old stable world and move along. And, to repeat myself, small steps in your code are okay. If you find yourself writing boring short functions, you're quite likely on the right track. Yeah. And that's almost all of the talk. So, the summary is quite easy. There's always this idea, in computer science and in management and everywhere, of improvement loops: doing something, then improving it. Well, almost embarrassingly, it's the same for learning a programming language. The only thing that's different with Rust is that you should keep your steps small. You have learned something, you're motivated to learn something more, you write some code, you look at the code. You refactor it and think, okay, which additional feature would I want to use? Maybe now I allow myself to use this particular crate that I will then learn about.

And in this way you can stay motivated. You don't overburden yourself, and you make slow progress. Things that start slow can become quite quick. For practical things, if some of you are just getting started with Rust, or have, well, done some initial learning, I can give you some hints from my side. I'm a reader, so I learn mostly by reading and somewhat by doing. Ironically, I don't work particularly well with videos.

But what worked for me was starting with the Rust book, which is quite nice because it explains a lot. Then you have some code, then you do some exercises, and then you go over to Rust by Example, where you have more code. By then, maybe you don't need so many explanations anymore, and you can just read the code, take the code, change the code, and do some more experiments. And maybe start with some relatively simple things: LeetCode or similar exercises, or just porting a simple command-line tool, something that is, well, limited in scope and gives you a feeling of success when it runs. And after that, what I find quite interesting is to just select some books and then go to their GitHub repositories.
Often you find the example code online even if you don't want to buy the book. Then just go over the example code, read it, and wonder: okay, what are they doing here? Which features are they using? And maybe play around with this example code a little more.

This maybe then could be a stepping stone towards your more complex project. Or even move on to an existing application and see how they work with their Rust source code. Maybe just download it, make some changes, and see if it still compiles.

Okay. Yeah. That's my personal experience with writing my first Rust program. So, a couple of... I think it's about... solid lines of Rust code, moving messages around a desktop application. It took me longer than I initially would have thought, but it was fun, and I would say it ultimately felt good writing Rust, because even if I was at times quite challenged learning it, I always felt that the code I was writing would actually work, and that I would actually be able to change it in a couple of months' time. So, that's all that I have to say for now. Thanks for listening in. If there are any questions, I would be happy to answer.

**Inaki:**
Thank you, Stefan, for that very validating talk. As I'm sure everyone will agree, it's very rough at the beginning for everyone. Very, very good pointers.


**Stefan:**
Thanks. It's not so much what you have to refrain from... that's what I think. I write a lot of Python, and with Python, it's always, let's say, programming-language easy mode, because you have, well, a garbage collector, you have reference counting, you have a simple syntax. And Python is extremely good at hiding complex libraries from the user. Rust has to expose some of these complexities, and that can be different. But what I would really like to express is: don't be discouraged. There is this bump at the beginning where you have to learn the concepts. Unfortunately, there's no way around it. Things will get easier. Rust will get easier.

**Inaki:**
We have one question from the public. Based on the experience you've had with previous languages, what was the most difficult thing to grasp?

**Stefan:**
Actually, actix-web. And this is not so much... it's two things. actix-web uses quite advanced traits in how you work with the library, and it's moving quite fast. So, I ended up finding a lot of diverging or outdated documentation, and it was quite a bit of trial and error. And since it leans on the trait system, I couldn't just go and see, okay, this is how this is supposed to work; I had to slowly work myself up. And yeah, that's actually the one case where the compiler couldn't help me. The compiler is extremely good, especially with error messages: okay, ownership is not working, this needs to be cloned because it's used somewhere else, or a trait is missing. But there I got a lot of trait errors that, at the beginning, I couldn't really understand.

That's one of the reasons why I would advise starting with the standard library, where you have more stability and simpler code, even if it means that the performance and the features are not necessarily optimal.

**Inaki:**
And what would you say is the best point on the learning curve to start interacting with existing projects, in your experience?

**Stefan:**
I think when you have some skin in the project. If you look at a project and they start throwing around that you have to work with this particular trait, you can say: okay, I may not have grasped this yet.
But now, because I read that chapter in the book, I can think it through: I can figure out how it works. The point is that you shouldn't be reading a sentence and feeling that you have to look up every second word. And beyond that, it really depends on the project that you want to work with. What I've found to be a good middle point is, well, example source code from books. Because especially when you get into the later features, you can just say: okay, I know this concept, I know this concept, I know this concept. But it's not code that I've written before, or that I have done my exercises in.

And then you can maybe, well, start on command-line tools. I basically looked at applications directly. Maybe just go to GitHub and search for a command-line tool: an easy application domain, nothing too challenging. And see if you understand it or not. And if you don't understand it, move on to something else, or dig in. Whatever it ends up being, it depends on what keeps you motivated. Don't get frustrated. That's the most important thing.

**Inaki:**
Yeah. The most important. If you do get stuck, what place do you turn to for help? Any particular website or place?

**Stefan:**
Not really. Rust by Example, for example, is quite nice as a book. I still find the old O'Reilly book quite nice; they have a subscription service. And Mastering Rust, the second edition, is quite good, and the source code is also on GitHub, which at least allows you to search by keyword and see how things are used in context. It depends so much. If you're struggling with a library, like actix-web in my case, it's not the concepts, it's just how the library is used; you need example code to make it work. If you're struggling with concepts, those usually pop up from book to book, and I try to work with the simplest possible example that I can make up and, well, play around with it, either on the Rust Playground or in local code, and see if I can make it work.

**Inaki:**
Great. All righty, I think that's about it for questions.
diff --git a/2020-global/talks/03_LATAM/01-Stefan-Baerisch.txt deleted file mode 100644 index fb633ef..0000000 --- a/2020-global/talks/03_LATAM/01-Stefan-Baerisch.txt +++ /dev/null @@ -1,61 +0,0 @@ - Inaki: All righty, then. Welcome, everyone, to the third block of RustFest. For the Americas time zones. I hope you have been having an excellent time with the previous talks. It's been... it's been intense, hasn't it? So, before we continue with this block with a bunch of talks and new artists, I would like to go over a few things first. Because some of you in the Americas in this time zone may not have seen the presentation of the other ones. So, let's just review. For those of you new to RustFest, what is this conference about? - What is it all about? Well, it's about communities coming together. As Stefan mentioned in the introduction to the UTC block, RustFest started out as a community conference in Berlin and has grown and moved around for several years now. This year for obvious reasons it has gone global. So, since it's about meeting people and connecting with other Rustations, RustFest decided to reach out to other communities and make this thing you are experiencing today and since yesterday. So, for those of you just tuning in, as they say, you have the web page, the home page for the event. You can log in there and watch. You have the links to enter different chat rooms.
If you see below, there's a series of buttons with which you have goodies from the sponsors and you can see the live transcriptions and sketches. And you can also play the wonderful game that Stefan has made. - So, there's a lot to interact with. And we'll come back to that later. One thing I wanted to mention is that there's this... so, history has given us English as today's lingua franca, or the common tongue. So, to give everyone a voice and the same reach, RustFest has taken on to all the native talks, make sure they have been translated. And those that are in English, to transcribe them so we can all stand on a common ground. And at the same time, encourage everyone to speak in their own voice. - RustFest is also about bringing different communities and people together from different technical backgrounds. So, it's not just from this side or that side with different objectives. What we've learned doing Rust over these years is that diversity brings strength and improves us in many different ways. I think this year's RustFest is like one of the best examples of that with everything we have achieved. - So, we have a bunk of talks, 21 talks given by 4 different speakers. You've probably seen the wonderful content prepared by artists, 12 artists in between the talks. This has been the work of three different teams across three time zones, or lots of time zones, actually, in much more than three. In just one globe. All this to bring more than 20 hours of content. Which we believe have been really good and are really good. - We would like to thank and recognize the artists which we have been seeing. Juan Hansen, DahliaFai & Jay, DJ Dibs, that was a sick set. Earth to Abigail, this time is Linalab && !ME. And we would like to welcome Malwine and our resident bar, logic. Just before we go on, some of the content prepared by artists is pretty intense. So, we want to give out a trigger warning for flashing lights. Some of the visual arts might be triggering for epileptics or those with seizures. I'm sure everyone has been wonderful in the chat rooms and just as a reminder we have a Code of conduct. It's at this address. If you need help, mail us at that address, or talk directly to us through the Matrix chat in the moderator's room. Or just ping us. - I'm sure everyone has read it already, but just as a quick reminder to jog your memory. RustFest Global is dedicated to providing a harassment free conference experience for everyone. Harassment includes, amongst other things, offensive comments related to gender, gender identity, expression, age, sexual orientation, disability, physical appearance, body size, race, ethnicity, religion, technology choices, deliberate intimidation, stalking and any other unwelcome sexual attention. - So, the general rule of thumb is, assume the best intent in others. Be nice to each other. Treat with respect. Ask questions. Do participate. And just keep in mind that everyone is different and that's great! We really are better for it. - So, another quick reminder, if you're wondering in which order the talks are, here's the schedule. You can also enter the chat through this address. And I would like to give a bit of recognition to the team that made all this possible. None of this would have been possible without all the collaborators across the three teams. Thank you to the APAC team, Asia Pacific. Intense work there. It's so rewarding to cross that cultural bridge and have contact with you. I would like to especially thank the UTC team. 
It's difficult to transmit the extent of what they've done because, you know, this is just another conference for many of you. But I would like to bring attention and remind ourselves that it's not only just a usual conference work, but also putting together all this platform in just a few short months. And also, inviting other communities and enabling other communities and helping them grow. All that... everything that has been done for this conference, for this RustFest will have an impact far beyond this year or Europe or any geographical region. So, that's something that should be reminded and a hearty congratulations to everyone. - And I would like to thank my teammates on the Americas team. And, of course, all the sponsors that each one with what they've given has laid a brick in bringing... in building this. So, Coil, Embark, Parity, MUX, Mozilla, Centricular, OpenSuse, Mullvad VPN, OBB, Red Sift, TerminusDB, Nervos, TrueLayer, Tweede Golf, Technolution, IOmentum, Traverse Research, and Ferrous Systems. And thank you, you have made all this possible. With all this recap, we are ready to start the next talk which will begin shortly. - Inaki: So, for our first talk, Learning Rust with Humility in Three Steps, I giveto you, Stefan Baerisch who is a freelance software engineer and a project manager. He likes simple working software. And he will be giving us some hints on how to learn Rust with humility, given its reputation for being hard to learn. - [speaker on mute] - >> Hi, sorry to interrupt, we seem to have a microphone problem. Also, we have... Stefan Baerisch, can you please unmute yourself? - Stefan: Okay. Can you hear me now? - >> Yes. - Stefan: You should be able to hear me now. Hello? Okay. So, as long as I'm not hearing anything, just assume... - >> Stefan gives us three steps to learn Rust. Not saying that follow you must, but if humble you are, with Rust, you'll go far as you learn the compiler to trust. - Stefan: Okay. So, can you hear me now? Because we had some technical difficulties. - >> Yes. - Stefan: Okay. - >> Do you mind restarting quickly? - Stefan: Yeah, sure. Sorry for that. But you see, we have more to learn, if it's only the mute button. Okay. Rust with humility in three easy ideas. So, what I tried to do, learning Rust, was to learn at the desktop applications. Because honestly, I like desktop applications. And I didn't start out with Rust directly, but I went through different technologies which gave me the opportunity to, well, see how... what made it difficult to learn different things. - And so, the first idea of learning Rust... learning anything, really, is why do you think it might be hard? Why do you feel it's hard? Know your challenge. So, the background was basically I had my desktop application, I started out with Qt. Qt is a C++ framework and it's rather big. So, I had to learn, or refresh a lot of C++ which is tricky, and I had to learn Qt, which is extremely large as a framework goes. And I also had to deal with quite a lot of surprises. So, I had to learn a build system. And I had to learn how different compilers and different platforms worked. Which took a lot of time. Next attempt was to switch over to web technology and Electron. Which was not hard to learn, which, again, was much to learn. And a few things were, once again, unexpected. - So, dependencies between the many other packages that I used were really hard to figure out. And signing the code to submit it to app stores was, well, not really nice. Which is... 
finally I switched over to another architecture which worked. And that was based on, well, Swift, JavaScript and Rust in the backend. - So, what I came up with is basically VueJS which I had to learn. Swift, which I didn't use before, and Rust, which I didn't use before. So, well, many different things. And finally, I realized that Rust, so it was in certain parts the hardest one to learn because of making sync. Was also quite quick to learn because it was not too much to learn. And everything that was there was rather, well, transparent and expected. There were few surprises. - So, if you struggle with a language, the first tip, the first idea is, be aware that in learning Rust as in each technology, you have to look at different things. You have to wonder, is it just hard to learn? So, are there some concepts that I don't know yet? Or is this also much to learn? Do I have to learn a lot of libraries, a lot of language contents, a lot of syntax? Are there a lot of things I need to learn package manager, build systems, et cetera? Each language has different degrees of difficulties. Different things that you have to keep in mind. And the thing with Rust is that the language itself, the concepts, et cetera, are quite challenging to learn. Especially if you haven't worked with a system programming language. But it is... once you have done it, it gets easier quite quickly. - So, it is worthwhile not to get discouraged if you run into this steep learning curve at the beginning. Because everything else will be easier and it will be easier to get along. However, you can do yourself run large favor and try not to do too much at the beginning. Because when you learn concepts with a language like Rust, you will have to face many new things just when you start. And you'll have to have some basic understandings of the model, for example, for the syntax, in order to accomplish even the first steps with the language. That means, give yourself some time. And once you start learning, focus on the language. Go over these. First a step here. And then slowly work yourself up with the ecosystem, learn the different libraries. Learn the context of the language. So, simple application domain. And try not to change too much. - Rust is extremely powerful. So, there can be many things that you would like to learn that would potentially be useful for your project. But give yourself time and focus on this first step. And, well, learning should be easier. When I did my little project, I basically started out with doing word counting application. Had a word X, et cetera. And only then slowly moved on to more work and more complex things that I needed for my application backend. And even there was, well, I choose the most boring and maybe even inefficient architectures that could possibly work. - Related to that is the second idea, practice. I have seen... well, I'm gonna say it. The personal weakness to read a lot about languages before I use them. Especially if I have the impression that they are where we when a new and interesting concept in the language. And this can be a difficult it can be difficult in languages where you can potentially be overwhelmed if you tried too much at once. I'm living here in Munich, we have quite a lot of mountains. And I always saw that the difference is between trying to think about a hiking trip on a map and seeing it in real life. Because in real life, things are always slightly more complex. 
Which means that if you chose for your project all the features that you could potentially want. And if you wanted to use the most elegant syntax and don't repeat yourself. And also, in quite good idiomatic code. Then it's quite likely that it will work out. But in practice, you run into slight things that hold you back. Might have a problem with one particular syntax feature that you haven't grasped as much as you would like to. And this will hold you back. - So, it's often at least for somebody like me, a better approach to do very, very small things and practice them than do something else again. This will enable you to have, well, a lot of certainty, a lot of practice, when you finally move on to harder and more complex topics. So, what did I do? - The first things that hi learned is just ownership and basic syntax. Once I had the roughest idea about what copy type was. I just moved into just pushingso structs around and then using clone a lot. And after I did that I started to work more with references. And then slowly worked over to smart pointers. So, in essence, I just tried to get myself to the same knowledge level that I would have in other languages. For example, Go, Java, Python, where I do more programming. And that gave me a rough idea on how I would have to move along. Only then did I look at my application domain. There was this little backend for this time I had. And I sort of said, which features do I actually need? Which collections would be useful? How are those collections used? - I think I need multithreading and maybe a library. And to handle some internal requests. And after I edit each of the features, practiced a little bit, did some perfection, did some changes, and in this way, I was always able to slowly move along. So, I always went through this planning doing phase which allowed me to, well, get some things to work. - Which brings me to the third and for me, at least, the most important idea in learning something that is... not is but can be challenging like Rust. Be humble. I think we as software developers, architects, we always like to not to be honest, to feel clever. To feel good about ourselves because we have grasped and mastered the features of our language because we write good code. The most idiomatic code, the safe code, all the most performant code. And sometimes in order to get to this point where we, well, write the best code we could possibly think about, we challenge ourselves too much. We start running and basically decide that within 2 months you want to run a marathon. So, take it slow. And Rust actually allows you to start slow. - It allows you to be even bored with what you're working on. And then slowly move to more ambitious things. Which gives you the confidence that you will need to move on with your personal project without being slightly overwhelmed all the time. Or without staring at a wall of compiler errors that you don't really understand because you decided to use the most advanced... the most advanced user system traits and the type system. - This also has... has an advantage that you can actually build higher in terms of knowledge. My personal idea of programming languages, about learning in general, was always that you always build the concepts on the easier concepts that you have grasped before. 
So, in order to get a good idea, for example, about ownership, I think it helps not only to think about ownership, but while it's not necessary to have somebody about pointer management, memory management works, for example, in C or what reference counting means in other languages like Python. - And this gives you doing many differences and gives you this broad basis that you can put all the things that you need to do into context. You're not only know, okay, I have seen in this book that I need to do this in the following way and know the rough concepts. But you can also relate it to other concepts that you ever related before. This can only happen if you can be humble and do small little steps which ultimately allow you to ascend faster. If you always work with the most difficult thing that you can possibly grasp, and this is the most clever code you can possibly write, it will be quite hard for you to debug this code or refactor this code or even to come back to this code if you, for example, move to something else for a couple of months. - So, doing simple things anddoing them often ultimately leads to mastering I would argue quicker than if you tried to challenge yourself too much. There's an old saying that shortcuts make for long delays. And I think this applies to learning programming languages. Well, I can only remember that when I started to learn Rust, the first program I wanted to write was a word counter. I always use a little thing wherever I go of a couple of hundred files on my file systems and counts of words and give me the most frequent ones. Which is usually a couple of minutes or maybe an hour in different programming languages. In Rust, it took a little bit longer. Mostly because I started out assuming I could just assume one to one from the C++ program. I couldn't. I had to do these baby steps to slowly ramp up. And I tried to just allow myself to be less clever with my code. And I came up with six things. I think some of you might disagree. But even if you disagree, maybe just think about it. - So, first of all, simple code is okay. You don't necessarily need to do pattern matching or complex programs or nice patching if you can just go for refills. You may end up with screens on screens of not nice if/else statements, but then you can end up with a program that you can refactor. It is okay to limit yourself. So, Rust makes the design decision to have many things that are useful but are not in the standard library. They have many good grades for error handling, for better error types, better concurrency, abstractions, et cetera. And I always have to stop myself looking at crate io and look for the most suitable and interesting crate andwrite a simple solution. Even if the simple solution means my error value is a string. And the code is maybe less nice, less idiomatic than it could be. Because it allows me to focus on some basics and at least know what I don't know about my code. - Inefficient code is okay. Rust can produce very fast programs. And with support, also very fast servers. The thing with especially sync, things get complicated quickly. Allow your first version of your code to be inefficient. Do some memory copies, do some clones. Write in the same server. And later on, if you realize that you need something faster, you can always move there. - Now to change tone a little bit. Unsafe code is not okay. So, first of all, it's unlikely that you will succeed in tweaking the checker going forward on unsafe code, and it makes things harder. 
Because one of the nice things about Rust is you don't have to worry too much about shooting yourself in the foot or to make, well, mistakes that you wouldn't know about it. For example when I wrote about a Rust program what I worked with them and had to reallocate some memory. The cadence is much harder than it would be in Go. But thinking about it in this particular corner case, when I would do this in Go, my... yeah, I would most likely do a copy where I wouldn't really want a copy. The safety benefits of Rust help you learn. The compiler is your tutor in a way. - Boring code is okay. Many nice macros. That can help us. Macro programming is quite interesting. Nightly has many features, meet nightly. I would not necessarily go there at the beginning. Stay with the boring old stable world and move along. And to repeat myself, small steps in your code are okay. So, if you find yourself writing boring short functions, you're quite likely on the right track. Yeah. And that's almost all of the talk. So, the summary is quite easy. There's always this idea of computer science and management and everywhere of improvement loops. Doing something to improve it. Well, almost embarrassingly, it's the same as learning programming language. The only thing that's different with Rust is you should keep your steps. You have learned something, you're motivated to learn something more, write some code, look at the code. Refactor it, think, okay, which additional feature could I... would I want to use? Maybe how I do allow myself to use this particular this one crate that I will then learn about. - And research, you can stay motivated. You don't overburden yourself and you make slow progress. Things that start slow concern quite quick. For practical things, if some of you are just getting started with Rust or have, well, done some initial learning, I can just give you some hints from me. I'm a reader. So, I learn mostly by reading and some but doing. Ironically, I don't work particularly well with videos. - But for me, what brought me first was starting with the Rust book. Which was quite nice because it explains a lot. Then you have some code. Then do some exercises and then go back to Rust by example where you have more code. So, maybe you don't need so much... so many explanations anymore. And you can just read the code, take the code, change the code and do some more experiments. And maybe start. Do some relatively simple things. So, lead code or presenter code or just porting your commandline.a simple commandline tool is something that is, well, limited in scope and that gives you a feeling of success when you run. And after that, what I find quite interesting is just select some books and then go to the GitHub repository. Often you find there example code online even if you don't want to buy the book. And then just go over the example of code. Read the example code and wonder, okay, what are they doing here? Which features are they using? And maybe play around with this example code a little more. - This maybe then could be a step forward for your complex project. Or even to move to an existing application and see how they are working with on Rust source code. Maybe just download it, make some changes and see if it still compiles. - Okay. Yeah. That's my personal experience with writing my first Rust program. So, a couple of... I think it's about solid lines of Rust code. Working messages around a desktop application. 
It took me longer than I initially would have thought, but it was fun and I would say ultimately it felt good writing Rust because even if I at times was quite challenged learning it, I always felt that the code that I was writing would actually work and that I would actually be able to change it in a couple of months time. So, that's all that I have to say now. Thanks for listening in. If there were any questions, I would be happy to answer. - Inaki: Thank you, Stefan, for that very validating talk. As I'm sure everyone will agree, it's very rough at the beginning for everyone. Very, very good pointers. - Stefan: Thanks. It's not so much, I think what you have to refrain from, and that's what I think, I write a lot of Python. And with Python, it's always... say programming language is easy mode because you have, well, you have a garbage collector, you have a reference counting, you have a syntax. And Python is extremely good at hiding complex libraries from the user. Rust has to expose some of these complexities. And just can be different. But I think what I really like to express, don't be discouraged. There is this bump at the beginning where you have to learn the concept. Unfortunately, there's no way around it. things will get easier. Rust will get easier. - Inaki: We have one question from the public. Based on the experience you've had with previous languages, what was the most difficult thing to grasp? - Stefan: Actually, access web. This is not so much... two things. Access tries to use what advanced traits in how you would use the library. And they are moving quite fast. So, I ended up with finding a lot of different or outdated documentation that it was quite a bit of trial and error. And since I'm on the trait system, I couldn't just go and see, okay, this is how this is supposed to work. But I had to slowly work myself up. And yeah. That's actually the one reason where the compiler couldn't help me. The compiler and extremely good, especially with error messages. Okay, ownership is not working. This needs other to be cloned or it's somewhere else. Or the trait's missing. There I got a lot of trait, but I couldn't really at the beginning understand it. - That's why it changes one of the things and why I would advise to start with the standard where you have more stability, simpler code. Even if it means that the performance and the features are not necessarily optimal. - Inaki: And what would you say is the best point on the learning curve to start interacting with existing projects in your experience? - Stefan: I think when you have a skin in the project. If you start a project and they start throwing about you have to work with this particular trait or you have... now, okay. I may not have grasped this now. But I now because I read this chapter in the book and I can sync. I can figure out how it works. So, is that you don't read a sentence and you have the feeling that you have to look up every second word. And then really depends on the project that you want to work with. I've really found it a good middle point is, well, example source code from books. Because especially if you go into the later features, you can just say, okay. I know this concept. I know this concept, I know this concept. But it's not the code that I've written before or that I have written my exercises in. - And then you can maybe use, well, start on commandline tools. Maybe. I basically looked at the applications, the direct application. 
Maybe just go to GitHub and Google for commandline tool, look for commandline tool. Easy application domain, nothing too challenge. And see if you understand it or not. And if you don't understand it, move on to something else. Or dig in. Whatever ends up it depends on what keeps you motivated. Don't get frustrated. That's the most important thing. - Inaki: Yeah. The most important. If you do get stuck, what place do you turn for help? Any particular website or place? - Stefan: Not really. I usually go... Rust, for example, is Rust by Example is quite nice in terms of book. I still find the old O'Reilly book quite nice. They have a subscription service. But they have the Mastering Rust, the second one, is quite good. And the source code is also on GitHub. Which at least allows you to search by keyword and see how this is used in context. It depends so much. If you're struggling with a library, like my access, for example, it's not that it's related, I just didn't use a library. You need example code to make it work. If you're struggling with concepts that usually pop from book to book and try to work with the simplest possible example that I can make up and, well, play around with either Rust Playground or local code and see if you can make it work. - Inaki: All right. I think that's about it for questions. Since we got off to a rough start, we'll send you off with logic's rhyme for you. Thank you very much, again, Stefan. - >> Thank you. - >> Stefan gives us three steps to learn Rust. Not saying that follow you must, but if humble you are, with Rust, you'll go far as you learn the compiler to trust. - Inaki: All righty, then. Until next time. - Stefan: Yeah. Have a good day. -
diff --git a/2020-global/talks/03_LATAM/02-glowcoil-published.md b/2020-global/talks/03_LATAM/02-glowcoil-published.md new file mode 100644 index 0000000..a8a0659 --- /dev/null +++ b/2020-global/talks/03_LATAM/02-glowcoil-published.md @@ -0,0 +1,53 @@

**Ochre: Highly portable GPU accelerated vector graphics**

**Bard:**
Glowcoil shows how vectors can act
to create a great UI, in fact
they are easy to do
on a slow GPU
and they won't fall together when stacked

**Glowcoil:**
Hi, I'm Micah Johnston. I also go by the username glowcoil. Today I'm presenting a project I have been working on called Ochre, which is a GPU-accelerated vector graphics and text rendering library for Rust. The primary use case I've intended Ochre for is UI rendering. So, first, I'm gonna try to answer the question of why I'm making a vector graphics renderer in the first place. I would make the claim that vector graphics is by far the dominant type of representation for graphical content in user interfaces today. We use it for our font formats, as well as emoji. And HTML and CSS basically comprise a vector format, and a pretty massive number of UIs use HTML, CSS and the browser rendering engine today.

And then beyond that, most OS platform UI toolkits and cross-platform UI toolkits pretty much all use vector graphics at this point. And that's for some important reasons. First of all, it's just more space efficient to store the vector form of something than the image form, especially at higher resolutions. But beyond that, it's a resolution-independent format. So, suppose you want your app to run on both a retina MacBook and an older 1080p monitor: if you're using images, you have to export a separate image for the retina MacBook, whereas a vector can be rendered from the same source at different pixel densities.
That's a powerful benefit. Beyond that, it's just a good toolkit if you have anything that's layout-dependent based on the size of your window, or if you're doing something like a waveform visualizer or a line graph in a data visualization program. Vector graphics is a really good toolkit for procedural visualizations like that.

So, we would like to have a vector graphics renderer for our UIs. Now I will try to answer the question: why does it need to be GPU accelerated? There are two trends over the past 15 or 20 years that are the reason I would say GPU acceleration is important for UIs. First, the resolutions of screens are going up. This is a visualization of iPhone sizes over time. It's a drastic increase, and it doesn't include the latest iPhone, which is something like 2700x1100 pixels. That's a lot of pixels, and we need to render every frame. If you want your app to run at a smooth 60 frames per second, you have to render a lot more pixels every second, and that means using more computing power.

At the same time, we don't want to drain the battery of a laptop or a mobile phone, and you need to hit that 60-frames-per-second deadline if you want a smooth app. So, the other important trend is that GPUs have become really ubiquitous in consumer hardware. You can get a lot more computation done, both per second and per watt, with a GPU than with a single consumer CPU core. This lets you be more efficient and hit your frame deadlines, and also be more power efficient, so that you don't drain the battery or use too much power just for rendering something as simple as a UI, because presumably you would like to use the rest of your CPU for the application itself. Your app is presumably not just a UI.

So, because of increasing resolutions... and, you should note, refresh rates are also starting to become an issue, because 120 hertz and 144 hertz monitors are starting to enter the market, so that's even more pixels you have to paint per second. GPUs are really good for highly parallel tasks, and a lot of aspects of 2D rendering happen to be highly parallel, since for many tasks you do the same operation per pixel. So, it's a good fit for GPUs, and the set of GPUs available in consumer hardware is a good fit for UIs. As kind of evidence of this, both macOS and Windows have been using GPU hardware to accelerate basically the step of taking the windows you have open and painting them onto one frame on your monitor, an operation called compositing. The Mac has been doing that since 2004 and Windows since 2007, with Vista. In addition to that, browsers are also increasingly taking advantage of the GPU.

So, there's evidence that it is a good idea to use GPU acceleration to render UIs, for both efficiency and power efficiency reasons. So, hopefully I've convinced you that a GPU-accelerated vector renderer is a desirable thing to have for a UI. But using the GPU comes with a catch. That is that you can't write a single program and run it on every GPU out in the wild. There is a big variety of manufacturers, and then of APIs, and platforms give you some subset of the APIs, but not all of them. So, I have this table here showing which APIs are available on which operating systems.

And there isn't really an API that has full coverage of all the operating systems you might want to target. OpenGL looks promising, but it's officially deprecated on Apple platforms.
And even before that, only an older version with more limited features was supported. And this table actually looks a bit rosier than the real situation, because Vulkan and Metal are only available on reasonably recent hardware; a ten-year-old desktop may not be able to run them. So once you have picked a GPU API, you have to look at which users you are willing to target and which users you are going to exclude from your application. Portability is a hard question. You have to figure out how you're going to make your application use different APIs on different platforms if you do want to be cross-platform.

Probably the hardest part of this is that when you write code that runs on the GPU, you use what's called a shading language, and each of those APIs has a different shading language. So, either you're going to have to rewrite your shaders for each platform, or you're going to have to figure out some cross-compilation setup, where your build system includes a shader compiler, which increases complexity, and you have to negotiate platform-specific features while you're doing so. It's a big headache. The approach I have taken with Ochre is just to choose a small subset, the smallest subset of GPU features that is going to be available pretty much anywhere, that still lets us leverage the GPU's performance advantages and do the UI graphics that we would like to do.

To get into a little more detail about what a vector renderer has to do, there are two aspects. The first aspect is that you take the shapes... for instance, a font glyph, a letter in a font, basically defines a solid region bounded by a curve, and the renderer has to determine which pixels are inside or outside that curve and then fill them in with the appropriate color, whether that's a solid color or a gradient.

And the other aspect is taking multiple shapes like that and compositing them in order, using what's called the painter's algorithm. It's called that because later things that you draw paint over earlier things. The second step, compositing, on the left here: GPUs are really good at it. As I mentioned before, this is what operating systems and browsers have been making a lot of use of GPUs for, for a long time now. It's a good thing to do with GPUs; they're good at it, and it's much more power efficient than doing it on a CPU. On the other hand, the operation which I will call painting, determining the inside and outside of a shape like this, doesn't come naturally to GPUs, since they kind of only natively speak triangles. So, you have to somehow translate these shapes into triangles, in one way or another, for the GPU to understand.

So, that's the hard part, and there are a lot of different approaches to it. I have this kind of spectrum here, from renderers that use more CPU to renderers that use more GPU. This is a huge oversimplification, and it's super subjective; in different ways, you could argue that they should be in different orders. So, this is just intended to give a broad overview. On one end we have doing the rendering entirely on the CPU, and then there's tessellation, which breaks shapes apart into triangles and shovels them over to the GPU to be rendered. It's robust, it's simple, and it does work well for performance. But there are some big downsides, namely that GPUs aren't capable of rendering triangles with the kind of anti-aliasing that you need for small text.
So, that's the hard part, and there are a lot of different approaches to it. I have this kind of spectrum here, from renderers that use more CPU to renderers that use more GPU. This is a huge oversimplification, it's super subjective, and you could argue in different ways that they should be in a different order, so this is just intended to give a broad overview. On one end, we have doing the rendering entirely on the CPU. Then there's tessellation, which busts shapes apart into triangles and shovels them over to the GPU to be rendered. It's robust, it's simple, and it does work well for performance. But there are some big downsides, namely that GPUs aren't capable of rendering triangles with the kind of anti-aliasing that you need for small text. So you have to take a hybrid approach, rendering text on the CPU and maybe the bigger shapes on the GPU.

And there's another approach called stencil and cover. It uses a feature of the GPU called the stencil buffer: rather than doing the mathematically hard work of breaking a curve into triangles, it picks a point and draws a fan of triangles out from that point, in such a way that they all cancel each other out, leaving only the points inside the shape filled and the points outside the shape empty. And it has the same disadvantages as tessellation; it's a tradeoff, not a pure win. There's an example of a library that does that, called NanoVG. It uses the stencil and cover approach and takes a hybrid approach to text rendering: text on the CPU and shapes on the GPU. It's similar to Ochre in that it's a minimal library focused on portability and on working in as many situations as possible.

Moving further down, there are more complicated approaches such as Pathfinder, which is a Rust library written by Patrick Walton. It was probably the number one inspiration for Ochre; I'm not sure I would have written Ochre without Pathfinder, so I have to give big acknowledgments to Patrick for that. It's a refinement of the stencil and cover approach that does a lot of the work on the CPU, so you only have to do work near the edges rather than over the big interior parts of the shape. It's a CPU/GPU hybrid, doing work in both places, and it can outperform CPU rasterizers like Cairo. But being a hybrid approach like that, it's not pure GPU. The next item on my list here is a vector textures architecture.

It works kind of differently: you have a CPU pre-process, and then you can render the result many times on the GPU, from many angles, for instance in a 3D scene. So, it offloads a lot of the work to the GPU. But it has performance tradeoffs more appropriate to a game than a UI, because the end-to-end render time, from loading a font or generating a scene from scratch to processing it, uploading it, and rendering it on the GPU, is sometimes not even really faster than full-on CPU rasterization. So, I don't consider this a good approach for UIs. But it is a good approach for other situations, such as games.

And then finally, last on the list, we have pure GPU compute renderers, which upload the scene as a data structure to the GPU and use modern GPU compute features to render it from scratch there. You'll notice Pathfinder also appears here, because in recent months Patrick Walton has developed another renderer for Pathfinder that uses GPU compute. And there's another Rust library, piet-gpu by Raph Levien. They achieve impressive performance with a high-end GPU; they scale up to really using the GPU. But there's kind of a central tradeoff here: the better you use the GPU, the better you scale up with bigger GPUs, but the harder it becomes to run on older hardware and to port between different APIs. And this is, as I understand it, why Pathfinder has both renderers: the non-compute renderer is trying to achieve more portability. Ochre is trying to strike a balance on this tradeoff, an attempt to land closer to tessellation but also get higher quality, so you can render text with it.
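One step is common to nearly all of these approaches, wherever they sit on the spectrum: curved outlines get flattened into straight segments at some point, because the GPU only understands triangles. A minimal sketch of that shared first step, assuming a fixed subdivision count for simplicity (real renderers pick it adaptively from the curve's flatness):

```rust
/// Flatten a quadratic Bézier curve (control points p0, p1, p2) into a polyline.
/// `steps` must be at least 1; production code would choose it adaptively.
fn flatten_quad(p0: [f32; 2], p1: [f32; 2], p2: [f32; 2], steps: u32) -> Vec<[f32; 2]> {
    (0..=steps)
        .map(|i| {
            let t = i as f32 / steps as f32;
            let u = 1.0 - t;
            // Evaluate the Bézier polynomial at parameter t.
            [
                u * u * p0[0] + 2.0 * u * t * p1[0] + t * t * p2[0],
                u * u * p0[1] + 2.0 * u * t * p1[1] + t * t * p2[1],
            ]
        })
        .collect()
}
```

The resulting polyline is what then gets triangulated (tessellation), fanned out from a point (stencil and cover), or scan-converted on the CPU.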
Let me go over a little bit of my earlier process that led me to the current state of Ochre. The first thing I worked on is called Gouache; I was working on it for about a year, starting about a year and a half ago. It works like the vector textures architecture I mentioned earlier, so it has the tradeoffs I mentioned: it's better suited to a 3D game, where you render the same thing many times from different angles. I put a lot of work into it and then decided it wasn't the appropriate approach for UIs, so I've kind of shelved it for now. But Gouache will return eventually. Meanwhile, I was searching for something that struck that tradeoff better for UIs.

And this was the first thing I got really excited about. I call it sparse scanline rendering. The way it works is that it only renders, on the CPU, the pixels that are intersected by the outlines, and then it uploads those, as horizontal lines, to the GPU. I like to call it the "GL_LINES go brrrr" architecture. You can see on this diagram: anything that's not a pixel intersected by the curve doesn't have to be processed on the CPU, and gets filled in by the GPU, which is good at simple, highly parallel tasks like filling in a bunch of solid pixels.

So, yeah, you can see here: this is what the horizontal lines look like that get uploaded to the GPU. You can think of this architecture as run-length encoding the image. You have to upload less data, and you compute less data in the first place, because you inherently skip all the work inside the solid spans. And this had much better performance than I expected, given how kind of weird the design is and also how simple it is. For complex scenes, like the tiger scene this is a clipping of, I was getting times 10x as fast as Cairo, which is a single threaded CPU renderer. That was really promising.

But it has some downsides. Basically, it doesn't handle humongous solid spans very well. If you're rendering a full screen rectangle, the GPU gets stressed out by how many lines you're trying to shovel through it, and it ends up about 5x slower than just drawing it as two triangles. So I tweaked this approach and ended up closer to Pathfinder; it's similar to what Pathfinder does, except I do more on the CPU, whereas Pathfinder does more on the GPU. Basically, I break the scene down into edge tiles and spans: rather than edge pixels and solid spans, I have 8x8 edge tiles, and then solid 8xN spans in the middle. I pack the tiles into an atlas texture, upload that to the GPU, and render it all using triangles to make up those rectangles. This is an example texture atlas, this is what it looks like for the tiger: all of those little 8x8 chunks put in order in the atlas.

So, that's how Ochre works. Now I'm gonna get a little bit more into how it works from an API design standpoint. Basically, I wanted Ochre to be usable as a component, rather than kind of taking over the design of your program if you use it. So, when it builds these tiles and spans, what it does is just build that data for you, and then you can take that data and upload it to the GPU yourself. Whether you're on DirectX or OpenGL or Metal, whether you have an existing game engine you want to use, or you're using some proprietary console API that I couldn't have foreseen or added to Ochre as an API, you can just add the support yourself very straightforwardly, using some simple operations. And it will still use the GPU efficiently, while all the work to build the data is done for you by Ochre; a sketch of what consuming that data could look like follows below.
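Here is what consuming that kind of output could look like. The type and method names are hypothetical stand-ins, not Ochre's actual API; the point is the shape of the data (an atlas of 8x8 anti-aliased edge tiles plus solid spans) and the fact that drawing it only needs operations every graphics API has.

```rust
/// Hypothetical output of the CPU-side rasterizer: everything the GPU needs.
struct TileData {
    atlas: Vec<u8>,         // 8x8 alpha tiles packed into one atlas texture
    atlas_size: (u32, u32),
    tiles: Vec<Tile>,       // anti-aliased edge tiles along shape outlines
    spans: Vec<Span>,       // solid 8xN interior runs (run-length encoded rows)
}

struct Tile { x: i32, y: i32, atlas_x: u32, atlas_y: u32, color: [u8; 4] }
struct Span { x: i32, y: i32, width: u32, color: [u8; 4] }

/// The backend trait you would implement once per platform API
/// (OpenGL, Metal, DirectX, a game engine, ...).
trait Gpu {
    fn upload_texture(&mut self, pixels: &[u8], size: (u32, u32));
    fn draw_textured_rect(&mut self, x: i32, y: i32, w: u32, h: u32, ax: u32, ay: u32, color: [u8; 4]);
    fn draw_solid_rect(&mut self, x: i32, y: i32, w: u32, h: u32, color: [u8; 4]);
}

fn draw(data: &TileData, gpu: &mut impl Gpu) {
    // One texture upload per frame...
    gpu.upload_texture(&data.atlas, data.atlas_size);
    // ...then everything is plain textured or solid rectangles, which any
    // GPU API from the last twenty years can draw efficiently.
    for t in &data.tiles {
        gpu.draw_textured_rect(t.x, t.y, 8, 8, t.atlas_x, t.atlas_y, t.color);
    }
    for s in &data.spans {
        gpu.draw_solid_rect(s.x, s.y, s.width, 8, s.color);
    }
}
```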
I like to think of this as humble library design, where you respect what the user wants to do. The user knows best their platform and their performance constraints. You just make a library that can serve as a component alongside the other components of an application, rather than kind of trying to take over and insisting on things being done a certain way. I think that's the best way to be portable, because it allows many different situations to make use of your library.

And I guess one more shoutout for inspiration here goes to the Dear ImGui library, a UI rendering library for C++ which takes a very similar approach and has been used in very diverse scenarios, including the codebase for the Large Hadron Collider. Being humble like this gets you a long way. Anyway, that's how Ochre works, and that's why I made it the way I made it. It's on GitHub, and I'm hoping to release it on crates.io. Thank you for listening.

diff --git a/2020-global/talks/03_LATAM/02-glowcoil.txt b/2020-global/talks/03_LATAM/02-glowcoil.txt
deleted file mode 100644
index d78975c..0000000
--- a/2020-global/talks/03_LATAM/02-glowcoil.txt
+++ /dev/null
diff --git a/2020-global/talks/03_LATAM/03-Sean-Chen-published.md b/2020-global/talks/03_LATAM/03-Sean-Chen-published.md
new file mode 100644
index 0000000..a46f2a5
--- /dev/null
+++ b/2020-global/talks/03_LATAM/03-Sean-Chen-published.md
@@ -0,0 +1,120 @@

**The Anatomy of Error Messages**

**Bard:**
Sean Chen wants to show the appeal
of nicely with errors to deal
seeing rustc's example
there really are ample
suggestions you really should steal


**Sean Chen:**
Oh, man, those are great. Welcome to my talk, titled The Anatomy of Error Messages. So, I know they gave a little bit of a recap for me, but just to reiterate a little bit about me: I teach computer science for my day job, over Zoom, so I guess I do this kind of thing a lot. Anyways, a shameless plug: I also produce a podcast called the Humans of Open Source, where I talk to people who work in open source software, especially people who work in Rust open source. So, if you're interested in hearing any of those discussions, shameless plug for my podcast.

And then, the last piece of relevant information here is that I co-shepherd the Rust Error Handling Working Group, where we basically work to lobby and advocate for improving Rust's error handling ecosystem. And this last role is what helped pose this question for me: how are Rust's error messages so helpful? For me especially, I don't think I would have stuck with Rust as long as I have without Rust's error messages being as helpful and straightforward and unintimidating as they are.

So, I really think Rust's error messages are one of its killer features, even though sometimes I also think they're not given as much credit as they deserve; they're just not in the limelight as much, I suppose. And if I had to rank different programming languages on a tier list by the quality of their error messages, I would probably do it something like this, where I would put Rust in S tier.
And I think the only other language that I would say is in the same class would be Elm, because the Elm community also cares a lot about having their error messages be really helpful and unintimidating. And then everything else goes down from there, something like this.

By the way, this is just my opinion. And I will say, one thing I've gotten a lot of practice with, teaching over Zoom every day, is that I've gotten really good at imagining that my audience always laughs at my jokes. So, that all being said: the norm with error messages. Here's an example of a C++ error message. The norm with error messages for developers, for programmers, is that they're something we have to decipher, right? We have to figure out how to read them. A lot of us have gotten pretty good at this, gotten a lot of practice with the algorithm in our heads: when an error message appears, there's a set of steps we have to follow to decipher what it's actually saying. This is, I suppose, a little bit of an egregious example. But it's not just C++; you see this in lots of other languages as well. Here's a Python example. It doesn't exactly tell you straight up what the problem is, right? You kind of have to look at it and try to figure out what the interpreter is telling you. And then, of course, in JavaScript, I wanted to bring this one up in particular because you get the infamous "undefined is not a function" error message, which is so unhelpful.

Right. So, the norm with error messages in a lot of programming language ecosystems is that they leave a lot to be desired, and, again, as programmers, we have to learn to decipher them. Especially for me, as someone who teaches neophyte programmers: error messages are very intimidating for newcomers, for people trying to get into programming for the first time. And that was especially true for me. I can remember when I first started to learn programming, in JavaScript, which was the first language I really tried to learn.

When an error message popped up, I was so intimidated that I wouldn't even read it. I was just that overwhelmed every time I saw an error message. And I would actually go and poke someone who I thought knew what they were doing and ask them: hey, can you decipher this for me, because I'm too scared to read it?

All this to say, we don't have perfect error messages in Rust either. This is a Rust error message that is a little bit obtuse and also leaves something to be desired. Obviously, there's still work to be done in the Rust ecosystem around error messages, even around rustc. But most of the time you're going to get something a little bit more sane, a little bit more straightforward, and hopefully a little bit less intimidating. Something very straightforward, right? It prints a lot of helpful ASCII art: here is the offending line, here is the location, here's what I, rustc, think is wrong, and here is a good suggested fix.
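As a point of reference, here is a small reconstructed example of that kind of diagnostic (my own, not the one from the slide), with rustc's output shown as comments:

```rust
// A deliberately wrong line of Rust...
fn main() {
    let x: u32 = "hello";
}

// ...and what rustc prints for it:
//
// error[E0308]: mismatched types
//  --> src/main.rs:3:18
//   |
// 3 |     let x: u32 = "hello";
//   |            ---   ^^^^^^^ expected `u32`, found `&str`
//   |            |
//   |            expected due to this
```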
So, the question that I wanted to pose during this talk is about this delta between the best-in-class error messages and kind of everyone else: is it a result of culture within those ecosystems, or is it a question of technology? And what I mean by that is: is there just some crazy architecture or technology or algorithmic trick going on in the Rust compiler that makes these error messages possible, or makes them easier to create when something goes wrong in the compilation process inside of rustc?

So, if you buy this dichotomy that I'm presenting, the nice thing is that we can do a little bit of spelunking inside of rustc and uncover, by process of elimination, whether this is a question of technology or a question of culture.

To start off, we can distill error messages down into this nice standard format. Up at the top here you can see what's called the level, which tells you: is this an error? It could also be a warning or a lint, for example. Then you have the error code, which you might also hear called the error index; there's this nice error index documentation classifying errors by that code. rustc doesn't give you an error code for every single error message, but for the ones that it does, you can take the code and look it up in the error index to get some more context, some more information on the type of error that you're seeing.

And then, of course, you have the main error message, the location, the code in question, all the nice ASCII art pointing straight at the code in question, and then notes, as well as any sub-diagnostics that try to be helpful and provide you with a little bit more context on what the error in your code is. So there's a nice standard format. And the type that deals with all of this at the end of the day, inside of rustc itself, is this Diagnostic type. We can see there's a nice one-to-one mapping from everything we just saw in that standard error message format.

So, how are error messages surfaced in rustc? First off, we have to talk a little bit about how rustc compiles and runs your code. There are actually multiple phases the Rust compiler runs your code through. First is the parsing phase: it takes the source code and parses it into some internal representation. And there are certain types of errors that are caught within the parsing phase. So, we can look at an example like this.

We're just trying to collect some numbers into a vector of unsigned 32-bit integers. If you stare at this code hard enough, you might realize: oh, we need a turbofish, or rather, we didn't write it correctly in this case. We're collecting, and we use a turbofish to note what kind of collection we want to collect into, but we didn't use the correct syntax. rustc says: you forgot the turbofish, and my suggestion is to go ahead and add those angle brackets. At the end of the day, inside the Rust compiler, there's functionality that specifically looks for this case and, if it applies, spits out this error message, having determined that it is the most relevant error message for fixing the code that you're trying to run.
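The slide isn't reproduced here, so the following is a reconstruction of the kind of code that trips this check; the exact wording rustc prints may differ, and the file of course won't compile with the first version present, since that's the point.

```rust
fn main() {
    // Wrong: generic arguments after `::` but without surrounding angle
    // brackets. rustc specifically recognizes this mistake, roughly:
    //   error: generic parameters without surrounding angle brackets
    //   help: surround the type parameters with angle brackets
    let nums = (0..10).collect::Vec<u32>();

    // Right: the "turbofish" `::<...>` tells `collect` what to collect into.
    let nums = (0..10).collect::<Vec<u32>>();
}
```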
And specifically in the case of this error message, again, it happens during the parsing phase. If we follow that trail down into rustc, we can see that inside of this parse-suffix function, which runs during parsing, down here there's a function called check_turbofish_missing_angle_brackets. So, A+ naming on that function. And here is the actual body of the function itself. What it tries to do is take whatever comes after the two colons and parse it as a valid expression. If it sees that it is actually a correct expression, an expression that makes sense, then it surfaces the error: "this makes sense to me if I put in angle brackets, so that's what the problem is; you didn't put angle brackets in, and I'm going to suggest that you put them in." This function will also specifically check for an extra leading angle bracket. But interestingly, it won't check for an extra trailing angle bracket; that surfaces a different error message altogether.

This is just one example of a type of error surfaced during parsing. When you're parsing, there's one data structure responsible for the entire parsing phase, called the parse session here. Up at the top of this type, we can see this span diagnostic field, which is responsible for holding on to all of the different error messages, or diagnostics I should say, that crop up during the parsing session specifically.

There are other phases of the compilation process where, of course, other sorts of error messages can crop up. One example we'll look at here is mutability. There's a separate phase after parsing, when rustc is running through your code, that specifically checks for mutability. This actually happens during the phase where rustc is validating the borrow checker rules. I'm not sure why it makes sense that mutability is checked alongside the borrow checking rules, but that's how it works.

So, with something like this, we initialize a string and then try to insert into, basically mutate, that string. But we forgot, of course, to denote that the string is supposed to be mutable. And since we forgot, that's exactly what rustc tells us. Again, there is a function inside of rustc that specifically handles this error class, and we can find it here, called report_mutability_error. This happens during the borrow checking phase, and I definitely cannot take a screenshot of the entirety of this function; it was something like 434 lines of code, so it was pretty big.

But, of course, during the borrow checking phase, rustc also checks lifetimes. It wants to ensure that you don't dangle any references, and all that good stuff that the borrow checker is famous for. So, consider an example like this, where we're trying to push a couple of references onto some vector, but doing that inside of a closure: we create those references inside the closure, then push them onto a vector that lives outside of it. That's gonna get rustc to yell at us, saying: hey, these references don't live long enough, because they get dropped at the end of this closure.

This particular class of error is then handled by this function, again with A+ naming, called report_borrowed_value_does_not_live_long_enough. These two errors both happen in the borrow checking phase, which at the end of the day is governed by this borrow checker context struct, which is, again, pretty big. But we can see down here this errors buffer, which is basically holding on to all of the diagnostics that are created during this particular phase.
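Again, the slides aren't reproduced here, so these are reconstructions of the two kinds of example being described, with the resulting diagnostics paraphrased in comments rather than quoted verbatim. First, the forgotten `mut`:

```rust
fn main() {
    let s = String::new(); // forgot `mut` here
    s.push_str("hello");
    // error[E0596]: cannot borrow `s` as mutable, as it is not declared as mutable
    // help: consider changing this to be mutable: `mut s`
}
```

And the lifetime case, with references created inside a closure escaping to a vector outside it:

```rust
fn main() {
    let mut refs: Vec<&String> = Vec::new();
    let mut closure = || {
        let s = String::from("hello");
        refs.push(&s);
        // error[E0597]: `s` does not live long enough
        // note: `s` is dropped at the end of the closure body,
        //       while still borrowed by `refs`
    };
    closure();
}
```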
So, to step back a little bit and try to make sense of all these different ways in which diagnostics are surfaced: when I was doing research for this talk, one word kept cropping up in my head while I was looking through this stuff and digging through rustc. The word I would attribute all of this to is eagerness. And what I mean by that, well, we can look at it both in the programming context and in the more general sense of what eagerness means. In the programming context, eagerness is the opposite of laziness: every chance we get to do a thing, we do it. With laziness, on the opposite end of the spectrum, we only do a thing at the last minute, when we can no longer get away with not doing it.

In the programming context, eagerness is showcased in how diagnostics are constructed in rustc. It turns out that every time something could go wrong, as in, some diagnostic could be addressing some error in the compilation process, rustc will go ahead and start creating one. And one of the methods on the Diagnostic type is this cancel method. What's actually going on is: any chance rustc gets to start creating a diagnostic, because something might be going wrong in the compilation process, it will do so. But at the end of the day, rustc only wants to show you the relevant errors, so when, while compiling, it sees diagnostics that turn out not to be relevant to the task at hand, it cancels those.

We can also think of this in the more general sense of the word "eager": someone wants to help you, someone is eager to lend you their support. That's exactly what this makes me think of.
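That build-eagerly-then-cancel description maps onto a pattern like the following sketch. This is a drastically simplified illustration of the idea with hypothetical types, not rustc's actual Diagnostic machinery:

```rust
struct Diagnostic {
    message: String,
    cancelled: bool,
}

impl Diagnostic {
    /// Eager: start building a diagnostic the moment something *might* be wrong.
    fn build(message: &str) -> Self {
        Diagnostic { message: message.to_string(), cancelled: false }
    }

    /// If later analysis shows this diagnostic isn't the relevant one, drop it.
    fn cancel(&mut self) {
        self.cancelled = true;
    }

    /// Only diagnostics that survive to the end are shown to the user.
    fn emit(self) {
        if !self.cancelled {
            eprintln!("error: {}", self.message);
        }
    }
}

fn main() {
    let mut diag = Diagnostic::build("this might be a missing turbofish");
    let parse_recovered = true; // ...further analysis decides it's irrelevant
    if parse_recovered {
        diag.cancel();
    }
    diag.emit(); // prints nothing, because the diagnostic was cancelled
}
```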
And this goes back to the question we had coming into this talk: was it culture or technology? To me, this really speaks to culture. rustc, and the developers who work on rustc, are in that way eager to help you. They're eager to provide helpful context, to provide helpful errors to make your job and your workflow as a developer easier. If you have to spend less time deciphering messages, that's more time you can spend on the code you're writing. So it is culture and technology, but I think technology takes a bit of a backseat. Because even though all the stuff we just saw going through the code examples is cool, and probably ingenious in a sense, at the end of the day I don't think it is more complex or crazier or more ingenious than anything else inside of rustc.

So, in my mind, it really is a question of culture. And more specifically, I think the culture of the community is what informs the technology we have in this case.

And it's been really interesting to see some of the research around culture and error messages, and the feedback loop between the two. One thing I found interesting was a research paper from 2011 by some researchers working with Racket. They did a bunch of research with students, looking into how helpful better error messages actually are. The takeaway there, no real surprise: better error messages led to a better learning experience and a smoother learning curve for new students getting into programming.

And so, this was done in 2011, but I think the real takeaway of that research was that it's not a question of technology; the technology is all there to make it happen. Really, it's a question of making error messages a higher priority in your language ecosystem. Rust definitely made that a priority early on, as a very conscious choice on the part of the early Rust core developers.

To look at some other languages: here is a blog post by Evan, the creator of Elm, again the other programming language I would consider to have S-tier error messages. Evan wrote a very thorough and useful blog post on this same topic, but in an Elm context. And the thing I found really interesting in it is that he says he recently took a couple of weeks to really focus on improving the error messages in Elm. So he took a couple of weeks, which is to say, it is a time investment, but not an exorbitant one, and really sat down and deliberately thought about how to make Elm's error messages really good, really helpful. If you've never seen an Elm error message before, its format is pretty similar, I would say, to Rust's error messages: they have ASCII art in there as well, and they color code the error messages to make them as straightforward and unintimidating as possible.

And at the end of the day, doing that really helps new people trying to get into your language. But at the same time, I would say it's also really useful for seasoned developers working in the language, because it lowers the mental overhead when you encounter an error message. You don't have to run through that whole mental algorithm to decipher what the error message is saying; it's just right there, and you can address it and go on with your day. It's super great, and makes for a much better workflow, I think.

Some other languages I'll quickly mention are Swift and TypeScript. These languages are doing some interesting things as well. They're taking a slightly different approach, but that's mostly because Swift and TypeScript both have really nice integrations with IDEs, Swift with Xcode and TypeScript with VS Code, and there's some really cool IDE stuff they can do there. That's a nice, streamlined way to surface error messages.
So, I think I've ranked these in the A tier on my tier list. These two languages in particular have, I would say, also embraced this notion of improving the culture around error messages, and they're doing some cool things there as well.

But ultimately, I think it would be super great if this culture of trying to be more helpful to developers, and to new learners of a programming language, made its way into other language ecosystems as well. Hopefully we'll eventually get to a point where we have something that looks more like this. And overall, at the end of the day, I think that would be super great for everybody.

And yeah, that's my talk. I hope you found it insightful. Here are some references as well: the paper I talked about, Evan's blog post that I mentioned, and the rustc dev guide, which was definitely the most helpful resource when I was doing research into all of this.


**Inaki:**
Great. Thank you, Sean.

**Sean:**
Yep.

**Inaki:**
Quick question. Anything you would change about Rust's diagnostic interface? Diagnostics with a capital D.

**Sean:**
I would say that, with the format it presents in the terminal, that's probably the best you can do there. In some interesting conversations I've had with people who work on this more, I know they've had some pretty cool ideas, such as: if we had better integrations into some kind of IDE, then we would be able to do some of the things that Swift and TypeScript are doing. I don't know if Rust Analyzer is working on that, but it would be great. And it's been great seeing, if you go back and look at some of the earlier pull requests, how people have worked to improve Rust's error messages. I was thinking of some names and I'm totally blanking, sorry. There have been some pretty interesting ideas that contributors to Rust have thought of before for how to improve error handling in rustc specifically; I couldn't name any off the top of my head right now, but there are some really interesting ideas there. Some didn't end up gaining traction, unfortunately. But yeah.

**Inaki:**
What about any missing error messages? Either in rustc or Clippy, or maybe uplifted from Clippy to rustc? Like, for example, the recent C string pointer lint.

**Sean:**
Yeah. That's a little hard for me to say, to be honest, especially the point about Clippy. Because even though I think it would be interesting to fold Clippy into rustc, at the same time I do know there's this very deliberate philosophy in Rust where we want most things to be in libraries, and not folded directly into the standard library or into the compiler itself.

So, yeah, I would think the current way it's done right now with Clippy is probably what adheres to that philosophy best in this case.

**Inaki:**
Do you feel that the design decisions of Rust with respect to error locality help the messages, compared to, for example, another language like TypeScript?

**Sean:**
Yeah, I think so. I would think so. Sorry, I don't have more to say on that.

**Inaki:**
And last question. One second. I had it around here somewhere.
If there were one area of improvement when it comes to compiler diagnostics in Rust, what would that be?

**Sean:**
As far as compiler diagnostics, again, I feel like those are mostly in a pretty good place, and I say that because I know there's concerted effort that continually goes into working on them. I think where it would be more helpful would actually be to devote more time and attention to improving error messages in libraries. Some of the stuff that we do in the Rust Error Handling Working Group targets that specifically: disseminating this culture of improving error messages outside of rustc. Because, again, even though this was a really cool spelunking tour, for the most part I don't worry too much about the state of error handling in rustc itself. People on the Core team really care about that; I think it's probably in as good a spot as it's going to get.

**Inaki:**
Cool. Thank you so much for your talk and your answers.

**Sean:**
My pleasure.

**Inaki:**
It was really interesting. So...

**Sean:**
Yeah. Thank you so much. And thanks so much to everyone who put on this wonderful conference and gave me the opportunity to give this talk. It was great.

**Inaki:**
All right.

diff --git a/2020-global/talks/03_LATAM/03-Sean-Chen.txt b/2020-global/talks/03_LATAM/03-Sean-Chen.txt
deleted file mode 100644
index 15b99ca..0000000
--- a/2020-global/talks/03_LATAM/03-Sean-Chen.txt
+++ /dev/null
Even though sometimes I also think they're a little bit... what's the word I'm looking for? They're not given as much credit as... as usual. They're just not in the limelight as much, I suppose. And if I had to go ahead and kind of rank different programming languages kind of on a tier list of the quality of their error messages, I would probably go ahead and do it something like this where I would put Rust in S tier. And I think kind of the only other language that I would also say is kind of in the same class would be Elm. Because I know Elm also... the community there also cares a lot about having their error messages be really helpful and unintimidating. And then kind of everything else going down. Something like this. - By the way, this is just my opinion. And I will also say one thing that I think I've gotten a lot of practice with teaching over Zoom every day is I've gotten really good at imagining that my audience always laughs at my jokes. So, that being all said, the norm here with error messages. Here's an example of a C++ error message. But the norm with error messages for developers, for programmers, is there something we have to decipher, right? We have to figure out how to read them, right? A lot of us have gotten pretty good and gotten a lot of practice with the algorithm in our heads with an error message. There's a set of steps we have to follow. I have to decipher what this is actually saying. Of this is a little bit of, I suppose, an egregious example. But it's not just C++. You see this in lots of other languages as well. Here's a Python example. Doesn't exactly tell you straight up what the problem is, right? You kind of have to look at this and try to figure out what the interpreter is telling you. And then, of course, in JavaScript, I just wanted to bring this one up in particular because you get the infamous undefined is not a function error message that is so unhelpful. - Right. So, the form with a lot of error messages and a lot of different programming language ecosystems is just like they're... they leave a lot to be desired, right? And, again, as programmers, we have to kind of learn to decipher them. And especially for me, as someone who teaches neophyte programmers, like error messages are very intimidating for newcomers. For people trying to get into programming for the first time. And especially even more me. I can remember when I first started to learn programming, which was in JavaScript, that was kind of like my first language that I really tried to learn. - But when an error message popped up, I was just so intimidated that I wouldn't even read it, right? I was just kind of like that overwhelmed by every time I saw an error message. And I would actually just go and poke someone who I thought knew what they were doing and ask them, like, hey, can you decipher this for me because I'm too scared to read it. - all this to say, you don't have perfect error messages in Rust. This is a Rust error message that is a little bit obtuse, also leaves something to be desired. Obviously, there's still work to be done in the Rust ecosystem around Rust error messages, even around Rust C. Most of the time you're going to get something a little bit more sane, a little bit more straightforward. hopefully a little bit less intimidating where, you know, you... where it's very straightforward, right? And it points a lot of helpful ASCII art, here is the offending line, here is the location. Here's what I, Rust C, think is wrong. And here is a good suggested fix. 
- So, the question that I kind of wanted to pose during this talk is this delta between the best in class error messages and kind of everyone else, I suppose, is... is this like... is this a result of culture within those ecosystems? Or is it a question of technology? And I guess what I mean by that is, is there just some crazy kind of like architecture or technology or algorithmic trick that is going on in like the Rust compiler that makes these error messages possible or makes them easier to be, you know, created when something goes wrong in the compilation process inside of Rust C? - So, you kind of buy this dichotomy that I'm presenting. The nice thing from there is we can kind of go ahead and do a little bit of spelunking inside of Rust C to kind of uncover by process of elimination like is this a question of technology? Or is it a question of culture? - So, to start off, if we can kind of distill down error messages into kind of this nice standard format where up at the top here you can see is what's called the level where it tells you, is this an error? Or it could be a warning or could be a lint, for example. You have the error code. Which you might hear it called like the error index. So, there's like this nice error index documentation kind of classifying the error by this index. You can go ahead and like take this error code if Rust C gives it to you. It doesn't give you an error code for every single error message. But for the ones that it does give you, you can go ahead and take that and basically look that up inside of the error index to get some more context, some more information on this type of error that you're seeing. - And then, of course, you have the main error message, the location, the code in question, all the nice ASCII art that's pointing straight at the code in question. And then notes as well as any sub diagnostics to try to be helpful and provide you with a little bit more context as to what... what the error is in your code. There's in nice kind of standard format. And the type or the structure that deals with this at the end of the day inside of rustc itself is this diagnostic type. And so, again, we can see there's a nice kind of one to one mapping of everything we just saw in that kind of standard error message format. - So, the way error messages are kind of surfaced in rustc. First off, we have to talk a little bit about how Rust C compiles the code and runs it. There's actually multiple phases to which the Rust compiler is running your code. First off is the parsing phase. Takes the source code and needs to go ahead and parse that into some internal representation. That's called the parsing phase. And there are certain types of errors that are caught within the parsing phase. So, we can go ahead and look at an example like this. - Just trying to go ahead and collect some numbers into a vector of unsigned 32 bit integers. If you stare at this code hard enough, you might realize, oh, we need a turbofish. Or we didn't correctly write it in this case. So, we're going ahead and collecting, and we use a turbofish to note what kind of collection we want to collect into. We use a turbofish to that. And we didn't use the correct syntax. Rust C says you forgot the turbofish. My suggestion is to go ahead and add those. 
At the end of the day, inside the Rust compiler, there's functionality that specifically looks for this case, spits out this error message, and determines that it is the most relevant error message to fix the code you're trying to run. - And specifically in the case of this error message, again, it happens during the parsing phase. If we follow that trail down into rustc, we can see, inside this parsing function down here, a function that's called check_turbofish_missing_angle_brackets. So, A+ on the function name. And here is the actual body of the function itself. What it basically tries to do is take whatever comes after the two colons and parse it as a valid expression. And if it sees that it actually is a correct expression, an expression that makes sense, then it will surface the error of: oh, this makes sense to me if I put in angle brackets, so that's what the problem is; you didn't put angle brackets in, and I'm going to suggest that you put those in. This function will also specifically check for an extra leading angle bracket. But interestingly, it won't check whether you have an extra trailing angle bracket. That actually surfaces a different error message altogether. - This is just one example of a type of error that's surfaced during parsing. And when rustc is parsing, there's one data structure that is responsible for the entire parsing phase, which is called the parse session here. Up at the top of this type, this span_diagnostic field is what's responsible for holding on to all of the different error messages, or diagnostics, I should say, that crop up during the parsing session specifically. - There are, of course, other phases of the compilation process where other sorts of error messages can crop up. One example that we'll look at here is mutability. There's a separate phase after parsing where rustc specifically checks for mutability. This actually happens during the phase where rustc is validating the borrow checker rules. I'm not sure why mutability is checked together with the borrow checking rules, but that's how it works. - So, with something like this, we initialize a string and then try to insert into, basically mutate, that string. But we forgot, of course, to denote that the string is supposed to be mutable. And that's exactly what rustc tells us. So, again, there is a function inside of rustc that specifically handles this error class. We can find that function here, called report_mutability_error. Again, this happens during the borrow checking phase, and I definitely cannot take a screenshot of the entirety of this function: it was something like 434 lines of code, so it was pretty big. - But during the borrow checking phase, rustc of course also checks lifetimes. It wants to ensure that you don't dangle any references, and all that good stuff that the borrow checker is famous for.
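Minimal reconstructions of these two borrow-check-phase errors (the slides' code isn't in the transcript, so the names and values here are made up):

```rust
fn main() {
    // Forgotten `mut`: if this were `let s = ...`, rustc would report
    // error[E0596]: cannot borrow `s` as mutable, as it is not declared
    // as mutable, and suggest changing the binding to `let mut s`.
    let mut s = String::from("hello");
    s.push_str(", world");
    println!("{}", s);

    // Dangling reference: a reference to a value created inside a
    // closure cannot be pushed into a vector that outlives the closure.
    // Uncommenting the `push` yields error[E0597]: `local` does not
    // live long enough, because `local` is dropped at the end of the
    // closure body.
    let mut refs: Vec<&String> = Vec::new();
    let grab = || {
        let local = String::from("temporary");
        // refs.push(&local);
        drop(local);
    };
    grab();
    refs.clear(); // `refs` stays empty (and mutable) in this sketch
}
```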
So, in an example like this, we're trying to push a couple of references to some vector, but we create those references inside of a closure. Pushing those references to a vector that lives outside of our closure gets rustc to yell at us, saying: hey, these references don't live long enough, because they get dropped at the end of this closure. - This particular class of error is then handled by this function, again with A+ naming, called report_borrowed_value_does_not_live_long_enough. These two errors both happen in the borrow checking phase, which at the end of the day is governed by this borrow checker context struct, which is, again, pretty big. But we can see down here this errors buffer, which is basically holding on to all of the diagnostics that are created during this particular phase. - So, to step back a little bit and try to make sense of all these different ways in which diagnostics are surfaced: when I was doing research for this talk, one word kept cropping up in my head while I was looking through this stuff and digging through rustc. The word that I would attribute all of this to is eagerness. And what I mean by that is, we can look at this both in the programming context and in the more general context of what eagerness means. If we think of eagerness in the programming context, it's the opposite of laziness. Which is to say: every chance we get to do a thing, we're going to do it. Whereas with laziness, on the opposite end of the spectrum, we're only going to do a thing at the last minute, when we can no longer get away with not doing it anymore. - In the programming context, eagerness is showcased in how diagnostics are constructed in rustc. Because it turns out that every time something could go wrong, as in some kind of diagnostic could be addressing some error in the compilation process, rustc will go ahead and start creating that diagnostic. And one of the methods on the diagnostic type is this cancel method. So what's actually going on is: any chance rustc gets to start creating a diagnostic, because something might be going wrong in the compilation process, it will do so. But at the end of the day, rustc only wants to show you the relevant errors. - So when it is compiling and sees errors that might not be relevant to the task at hand, it can cancel those. We can also think of that in the more general sense of what the word "eager" means: someone wants to help you, someone is eager to lend you their support. And that's exactly what this makes me think of. And so, this speaks to, again, the question that we had coming into this talk, which was: is this culture or technology? To me, this really speaks to culture. - rustc and the developers who work on rustc are, in that way, eager to help you. They're eager to provide helpful context, provide helpful errors, to make your job, your workflow as a developer, easier. If you have to spend less time deciphering messages, that's more time you can spend on the code you are writing. It is culture and technology, but I think technology takes a little bit of a backseat. Because even though all the stuff we just saw going through the code examples is cool, and it's probably ingenious in a sense.
- But at the end of the day, I don't think it is more complex or crazier or more ingenious than anything else inside of rustc. So, in my mind, it really is a question of culture. And more specifically, I think the culture of the community is what informs the technology that we have going on in this case. - It's been really interesting seeing some other research specifically around culture and error messages and how the two have this feedback loop. One thing I found interesting was a research paper done in 2011 by some researchers who looked at Racket. They did a bunch of research on students, looking into how helpful better error messages are, basically. And the takeaway there, no surprise really, was that better error messages led to a better learning experience and a smoother learning curve for students, for new students getting into programming, in this case. - So, this was done in 2011, but I think really the takeaway of this particular research was: it's not a question of technology. The technology is all there to actually make it happen. Really, it is just a question of making error messages a higher priority in your language ecosystem. And Rust early on definitely made that a priority, as a very conscious choice on the part of the early Rust Core developers. - To look at some other languages: here is a blog post by the creator of Elm, again the other programming language that I would consider to have S-tier error messages. Evan wrote a very thorough and useful blog post specifically on the same thing, but in an Elm context. And the thing I found really interesting from this particular blog post is that he says: I recently took a couple of weeks to really focus on improving the error messages in Elm. So he took a couple of weeks, which is to say, yes, it is a time investment, but it wasn't an exorbitant time investment. And he really sat down and deliberately thought about: how can we make error messages in Elm really good, really helpful? If you've never seen an Elm error message before, it is, in format, pretty similar to Rust's error messages. They have ASCII art in there as well, and they color-code the error messages to make them as straightforward and unintimidating as possible. - And at the end of the day, doing that really helps new people trying to get into your language. But at the same time, it's also really useful for even seasoned developers working in your language, because it lowers that mental overhead when you encounter an error message. You don't have to go through that whole mental algorithm to decipher what the error message is saying. It's just right there, and you can address it and go on with your day. It's super great and makes for a much better workflow, I think. - Some other languages I'll also quickly mention are Swift and TypeScript. These languages are doing some interesting things as well. They're taking a slightly different approach, I would say, but that's mostly because Swift and TypeScript both have really nice integrations with IDEs.
So, you know, Swift with Xcode and TypeScript with VSCode: there's some really cool IDE stuff they can do there, and that's a nice and streamlined way to surface error messages. I think I ranked these in the A tier in my tier list. And these two languages in particular have also embraced this notion of improving the culture around error messages, and they're doing some cool things there as well. - But ultimately, I think it would be super great if this culture of trying to be more helpful to developers, and more helpful to new learners of a programming language, made its way into other language ecosystems as well. Hopefully we'll eventually get to a point where we have something that looks more like this. And overall, at the end of the day, I think that would be super great for everybody. - And yeah. That's my talk. I hope you found it insightful. And just some references as well: the paper I talked about is here, as well as Evan's blog post that I mentioned, and the rustc dev guide, which was definitely the most helpful resource when I was doing research into all of this. - Inaki: Great. Thank you, Sean. - Sean: Yep. - Inaki: Quick question. Anything you would change about Rust's diagnostic interface? Diagnostics with a capital D. - Sean: I would say that at this point, with the format it presents in the terminal, that's probably the best you can do there. In some interesting conversations I've had with people who work on this more, I know they've had some pretty cool ideas, such as: if we had better integrations into some kind of IDE, then we would be able to do some of the things that Swift and TypeScript are doing. I don't know if Rust Analyzer is working on that, but it would be great. And it's been great seeing, if you go back and look at them, some of the earlier pull requests for how to improve Rust's error messages. I was thinking of some names and I'm totally blanking, sorry. But there have been some pretty interesting ideas that contributors to Rust have thought of for how to improve error handling in rustc specifically. I couldn't name any off the top of my head right now, but there are some really interesting ideas there. Some didn't end up gaining traction, unfortunately. - Inaki: What about any missing error messages? Either in rustc or Clippy, or maybe uplifted from Clippy to rustc? Like, for example, the recent C string pointer lint. - Sean: Yeah. That's a little hard for me to say, to be honest. Especially the point about Clippy. Because even though I think it would be interesting to fold Clippy into rustc, at the same time I do also know there's this very deliberate philosophy in Rust of: we want most things to be in libraries, and not folded directly into the standard library or into the compiler itself. - So, yeah. I would think the current way it's done right now with Clippy is probably what adheres to that philosophy best. - Inaki: Do you feel that the design decisions of Rust with respect to error locality help the messages, compared to, for example, another language like TypeScript? - Sean: Yeah. I think so.
I would think so. Sorry, I don't have more to say on that. - Inaki: And last question. One second, I had it around here somewhere. If there were one area of improvement when it comes to compiler diagnostics in Rust, what would that be? - Sean: As far as compiler diagnostics, again, I feel like those are mostly in a pretty good place. And I say that because I know there's concerted effort that continually goes into working on those. I think where it would be better or more helpful would actually be to devote more time and attention to improving error messages in libraries. And actually, some of the stuff we do on the Rust error handling working group targets that specifically: disseminating this culture of improving error messages outside of rustc. Because, again, even though this was a really cool spelunking tour, for the most part I don't worry too much about the state of error handling in rustc itself. People on the Core team really care about that. I think that's probably in as good a spot as it's going to get. - Inaki: Cool. Thank you so much for your talk and your answers. - Sean: My pleasure. - Inaki: It was really interesting. So... - Sean: Yeah. Thank you so much. And thanks so much to everyone who put on this wonderful conference and gave me the opportunity to give this talk. It was great. - Inaki: All right. For everyone else watching, we'll now take a longer break for brunch, lunch or tea, whatever it is where you are. We'll be back in 50 minutes, 45 minutes, maybe. Before that, I would like to present our artist for this break. Monrea is a DJ brewing in the underground scene in Nairobi and beyond, using Sonic Pi to open up a whole world of music experimentation, and hoping to create a platform called Byte dedicated to the Sonic Pi community. So, thank you for joining her performance. This should really blast! diff --git a/2020-global/talks/03_LATAM/04-Max-Orok-published.md b/2020-global/talks/03_LATAM/04-Max-Orok-published.md new file mode 100644 index 0000000..07a9453 --- /dev/null +++ b/2020-global/talks/03_LATAM/04-Max-Orok-published.md @@ -0,0 +1,99 @@ +**Considering Rust for scientific software** + +**Bard:** +Max Orok shows science, not fiction +and Rust ain't no contradiction +it sure won't spill your beans, +so use it by all means +if permitted by your jurisdiction + + +**Max:** +Hello, everyone. My name is Max Orok. This is a talk called Considering Rust for scientific software. This talk is for people who are interested in Rust or maybe looking for an alternative language for their programming or research. We're gonna be talking about the current scientific computing ecosystem, but also Rust's place in this ecosystem and where it can help researchers write good code. + +So, I'm a mechanical engineering master's student at the University of Ottawa, and I'm working as a radiation modeling researcher. My primary tool here is actually C++, but I have been using Rust for about a year and a half. I'm also a contract software developer for a company called Mevex, which is a Canadian linear accelerator manufacturer. + +So, scientific software is sort of an interesting case where you have these very strict requirements, but often times it's written by a very small team, maybe even one person or a couple people. And they have very limited time and resources, because usually they have other commitments; maybe they're professors or students or researchers.
And this is actually a case where correctness of the software is very important. It's possible that scientific papers will be published based on the results. And it's important, more so than in other fields, that we try and make sure our programs are as bug-free as possible. + +Especially when research is on the line based on these results. But it's also an area where the performance of programs is very important. Because if your program takes forever to run, it usually means that you're going to have a lot more trouble iterating or maybe running a different kind of analysis. And it's usually a very good thing when your program runs quickly, because that means a lot of people can do their jobs a lot quicker, especially if requirements change and you have to redo a bunch of analysis work. + +So, the final thing with scientific software is that the developers usually have other jobs, and writing software might just be part of someone's job. They don't actually consider themselves to be expert software engineers. These might be people who are primarily physicists or chemists and biologists, or engineers as well. And for a lot of people, programs are just a means to an end, and sometimes good software engineering practices are thrown out the window. Sometimes compilation of a program, or interpretation, is the first and last unit test that it gets. And there's also this idea that if it works, don't touch it. Which I think is a negative idea: we should have the ability to refactor our programs to add new features or to improve their performance. This is an idea I think we need to combat, especially when we build our own software for people to use. + +So, working in the radiation field and going through engineering school, there's the standard case study of the Therac-25 and the issues with it. The Therac-25 was a radiation therapy device manufactured by Atomic Energy of Canada Limited, and it was part of six major accidents between 1985 and 1987. And I don't want to minimize the issues with this project. + +There were a number of complex factors that went into the problems that the Therac-25 had. There were management issues and project oversight issues, and allegedly there was only one developer who wrote the entire software for this machine. But also, investigators did find that data races, or concurrency bugs, in the Therac-25 control software contributed to the accidents. And I think this just goes to show a little bit that software bugs do have real world consequences. Usually it's not this serious, you know? Usually we just have to rerun our code to do another analysis job. + +But it is the case that software does affect real people, and we have to be careful to try and avoid bugs as much as possible. So, moving on to the existing scientific landscape we have: Python is sort of the lingua franca, the language that everybody speaks. And I think this is a very good thing, because for a lot of new programmers, especially today, their first language is Python. And it's important that they're able to write software in a language they're comfortable with. + +But this also brings some problems, because Python is actually usually quite a slow language. So, when people need performance, they start to reach for languages like C and C++. And these are sort of the bedrock systems programming languages that support Python.
And here I'm skipping over a lot of other languages. For things written in FORTRAN and Julia: I think all of these languages are very important and they definitely have their place, but I'm not going to talk about them specifically here. + +So, an issue I have with the current landscape of scientific computing is the move from Python, which is a lot of people's first language, to something like C++, which is a more performance-oriented, expert-level programming language. This should be a natural step, because many popular Python libraries depend on C++ as a backend language: they're actually written mostly in C++ and wrapped up nicely in Python for people to use. And researcher time is usually very precious, so a lot of people want to know how to speed up their code or get better performance. Sometimes this is actually a very difficult thing to do in Python, and it's necessary to move to another language like C++. + +But unfortunately right now, this is a very difficult transition step. There are a lot of factors going on here; the two languages are very different and have different goals. But it is a problem, because I've definitely seen people leave projects because they don't feel they're up to the task, or maybe they just abandon their efforts and keep using Python. And I think here Rust really starts to shine as a viable alternative to C++, because you can achieve the same or very similar performance, but with a kinder, more gentle systems programming language, explicitly designed for non-expert users, which is what a lot of scientific software developers identify as. So, I think a very important use case or possibility for Rust is as an alternative backend implementation language, to achieve certain performance goals. Then, of course, right away there are maybe some important reasons not to use Rust. Given the comparative age of all the languages, Rust is relatively young: it's only 5 years old, or only 5 years since the 1.0 release, while Python is 30 years old and C++ is around 40, and it's likely they're going to be around a lot longer also. Rust also has this notion of there being a bit of a learning curve associated with it. + +But I think it's easier to get up and running with good code in Rust than in other languages. The compiler does a good job of guiding you away from unsafe ways of doing things and more into a correct way of doing things. And especially for beginners, I think this is very helpful. I know that in the first few months of my writing C++, it certainly wasn't very good, and I was making all sorts of out-of-bounds errors and other issues that just wouldn't happen in Rust. + +Another issue, of course, is that you already have a large codebase written in another language. And the saying is that a lot of times the right tool for the job is the one you're already using, and I think this is definitely the case. I don't think people should be rewriting their projects completely. But I would say, maybe if there's a new component and you're looking for an alternative language, Rust is a really good choice. Another point might be that there's an important library that you depend on that's actually missing on the Rust side of things. And, you know, this is definitely a valid concern. + +The Rust ecosystem is smaller than that of Python, of course. Python's is enormous.
It's also smaller than that of C++, just because Rust is younger. But there are ways to access Python and C++ code from Rust as well. And finally, you have things like concerns about a single vendor. There really is only one viable Rust compiler right now, even though there's ongoing work to add Rust support to GCC. But I will say that the Rust team has done a very good job of supporting the Rust compiler on a lot of platforms: the three major operating systems, of course, but also a variety of other platforms, and you can definitely run it on a lot of systems. So, for me, though, Rust is exciting because it really aligns with my goals as a researcher. I want to write the fastest code I can, with as few bugs as possible, and I want both of those things at once. It's a bit of a vague goal, but Rust really helps me here, because entire classes of bugs are eliminated compared to an unsafe systems programming language. This means I can actually focus my time on developing a better algorithm, or actually doing some other work, and not have to worry about bugs as much as I would in another language. + +The other thing is that it's a productive, modern programming language with a lot of developer conveniences, but it also has performance competitive with languages like C and C++. And, probably the most important point for me: it's a language explicitly designed for non-expert users. I think other languages cater to different audiences. In some regards, C++ really cares about its expert developers. Rust does too, but it also spends a lot of time making sure that the language is suitable for non-expert programmers, which is often the case for scientific researchers, who maybe do not identify as experts. + +And finally, there's built-in documentation and testing. This is an area where I don't really want to spend any time wrangling external tools or fixing issues with them. Also, having an integrated package manager in Rust is a game changer, because personally I consider time spent writing build system code to be a necessary evil, and I want to minimize as much of that as possible. So, Rust's first-class dependency management is really important to me, and it lets me focus my time on more important things. + +So, jumping right in to some of the Rust features I find very useful for writing numerical code. I want to preface this by saying that it's not about any one feature; I think it's the sum total of these features which is important. You can get analogs of these features using different flags for C and C++, but the way they're all baked into the language and on by default is really important: you don't need to know which special flags to pass to your compiler, these are all turned on right away. Right off the bat, we have no implicit conversions between primitive types. At the top, we're dividing two integers and getting a floating point number out, and Rust is stopping us, saying there are mismatched types here: we expected a floating point number, f64, but found integers. + +This is a very common beginner mistake, and it's nice that it's caught right away here. It's not the most complex bug, but having it caught and addressed right away is a big deal. This can be a little bit noisy sometimes.
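A minimal sketch of the conversion rules being described here and just below (the slide code isn't in the transcript; the variable names are made up):

```rust
use std::convert::TryInto; // in the prelude since the 2021 edition

fn main() {
    // No implicit conversions: integer division yields an integer,
    // so assigning the result to an f64 is a type error.
    let total: i32 = 10;
    let count: i32 = 4;
    // let mean: f64 = total / count; // error[E0308]: mismatched types
    let mean = total as f64 / count as f64; // explicit, intentional conversion
    println!("mean = {}", mean);

    // Fallible conversions are explicit too: u32 -> usize must be
    // spelled out, and we choose what happens if the value wouldn't
    // fit (say, on a 16-bit platform). Here we stop execution.
    let index: u32 = 7;
    let index: usize = index.try_into().expect("index does not fit in usize");
    println!("index = {}", index);
}
```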
Here we're trying to convert a 32-bit unsigned integer into the platform's size of integer, and Rust is not happy here either, because it wants us to do an explicit conversion, where we try to convert the number but stop execution if it wouldn't fit. + +And so, this is a little bit noisy up front, but it also catches real bugs, especially if we are running on something like a 16-bit platform, where this would certainly be a bug. Having these things caught up front is really important, because the more things you catch at compile time, the less you have to worry about at runtime. And this is a theme within Rust, and it's something the type system really helps with. There's also this notion of very safe defaults for a lot of operations, and it's part of what contributes to Rust being a memory safe language. As much as possible, it's not going to let you do unsafe operations, and oftentimes the convenient thing, the thing people default to, is the safe method. There are ways of saying: I actually know what I want to do here, I want to do this specifically. But for the most part, I think safe defaults are a good choice, especially for beginners. + +Here we have an example with a vector with three elements, and we're going to try to access the tenth element. And, of course, this is a bug. The natural, default way of using brackets to access the element is the safe way, and we see here that we actually get a panic, which is Rust's way of winding down the system, stopping everything and exiting. So, we have a panic, and it says the length of this vector is 3 and we are trying to get the tenth element. Of course this is a bug. But right now a lot of performance-oriented developers are saying: okay, sometimes I know that my index is correct, and I don't want to pay the cost of bounds checking. Fine. Okay. Let's go ahead and do that. + +So, Rust also has these opt-in low-level control features where we can do the quick, or maybe the performance-oriented, thing, but we have to tell people that we're doing it. Rust's way of doing this is using these unsafe blocks whenever there's a potentially memory-unsafe operation going on. + +So, it's the same sort of example: we have a vector with three elements and we're trying to get the tenth one. But right away, it's a lot noisier. We have this unsafe block: okay, something unsafe is potentially happening here. It's the programmer's way of saying: okay, compiler, get out of my way, I really want to do this. But for people reviewing your code, it's very helpful, because you can go right to the unsafe blocks; the area to review shrinks, because you look at the unsafe blocks and check whether they're okay. And here, of course, this is not okay: we're trying to get the tenth element of a vector with only three. + +And, of course, this is gonna give us a garbage answer. The Rust documentation does a really good job of saying: this is actually not recommended, use it with caution. And this unsafe block is the visual equivalent of that. It's saying: something potentially dangerous is happening here, so be extra careful when you're using it. Having this opt-in low-level control is what sets Rust apart from a lot of other memory safe languages, because a lot of times you really do know what you're doing, and Rust will say: go ahead, no problem.
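A sketch of the indexing examples as described (reconstructed, not the slides' exact code):

```rust
fn main() {
    let v = vec![1, 2, 3];

    // Safe default: bounds-checked indexing. This line would panic with
    // "index out of bounds: the len is 3 but the index is 9":
    // let x = v[9];

    // Opt-in low-level control: skip the bounds check. The unsafe block
    // flags this for reviewers. With an out-of-range index this is
    // undefined behavior (a garbage answer at best), so don't do this:
    // let y = unsafe { *v.get_unchecked(9) };

    // An in-range use of the same API is fine, though still worth a
    // close look in review:
    let z = unsafe { *v.get_unchecked(1) };
    println!("{}", z + v[0]);
}
```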
But like I said, the unsafe block is very helpful here, because it reduces the onus on the code reviewer, or yourself, to look for where potentially dangerous things are happening. And I think another feature of Rust that is really good for numerical programmers especially is that floating point numbers are treated with a lot of caution. There are entire books written on handling floating point numbers correctly, and I think this is a good choice in a lot of cases. + +Here we have some potentially surprising code where we're adding 0.1 to itself three times. If that's equal to 0.3, we're going to print "Got 0.3"; otherwise, we're going to print "Got something else." This is a bit of a common beginner mistake: you don't really want to trust floating point numbers to be exactly equal, and this prints "Got something else", because 0.1 added to itself three times is slightly different from 0.3. And Rust is doing a good job here, saying that floating point types cannot be used in patterns, because this is not a very good way of doing things and there are better ways of achieving the same result. This is going to be a hard error in later versions of the compiler. + +And this is the sort of bug that may not be obvious right away, so having it caught at compile time is a big deal. Now, sometimes this caution can be a little bit annoying. The default way of sorting floating point numbers doesn't actually work: if you try to sort this vector of floating point numbers, you'll get an error that a trait bound is not satisfied. There's a reason for this. Generally the reason is that the NaN (not a number) and infinity values make it tricky to have a total ordering, because a NaN value is not even equal to itself. And, of course, you can still sort floating point numbers in Rust; there's a standard way of doing it, and it's in the Rust Cookbook as well. Personally, if writing a little bit more code at the source saves me from bugs later on, that's a tradeoff I'm comfortable making and would like to make in my code. + +But another thing Rust does really well: it's actually quite a good prototyping and debugging language, especially given that it's also a low-level programming language. Here we have a custom data structure called CoolData, and we have these vectors of floating point numbers in it. When you're writing code and prototyping, you really want to print out the value of your data often and see what's happening to it during execution. And Rust does a really good job here. + +So, we can add this one line to our code, and it says, essentially: give me a debug representation of my structure. Then we can call this debug method and have a really nice representation of our data printed out. And this is great for prototyping, because I just want to see what's happening and step through my code. It's a very useful thing, I'm using it all the time, and it's a very common pattern for people to use. So, another thing that really helps with writing scientific code is that the integrated testing in Rust's package manager means, I think, that tests are going to be much more likely to be written. We have some math expression here, and we're testing it against a known value.
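A sketch of what this and the documentation tests described next might look like together (the function, crate name, and values are hypothetical):

```rust
/// Converts a dose in gray to centigray.
///
/// The example below is a documentation test: `cargo test` compiles and
/// runs it, so the example can never silently go out of date.
///
/// ```
/// use my_crate::gray_to_centigray; // hypothetical crate name
/// assert_eq!(gray_to_centigray(2.0), 200.0);
/// ```
pub fn gray_to_centigray(dose_gy: f64) -> f64 {
    dose_gy * 100.0
}

#[cfg(test)]
mod tests {
    use super::*;

    // An ordinary unit test, run with `cargo test` and no external
    // framework. Floats are compared with a tolerance rather than `==`.
    #[test]
    fn matches_known_value() {
        let result = gray_to_centigray(0.003);
        assert!((result - 0.3).abs() < 1e-12);
    }
}
```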
+ +And without any external tools, we can write a unit test and check it right away. This really removes a lot of the friction around testing, especially compared to other languages where you might need an external framework. And removing friction means that people are going to do it a lot more; it becomes an easier tool to reach for. I find myself writing unit tests much more frequently in Rust than I do in C++, where it's a little bit more tricky. And in particular, I think documentation tests are really a killer feature for scientific code, because for a lot of scientific code you need a lot of examples, and this is a way to make sure that your examples compile even if you change your code. So, here we have some documentation tests, with the same example as before. + +But this will actually be published as part of our documentation. Having this ability to write example code but also use it as documentation is a really big deal, because you can really do two things at once. And this will also ensure that your example code doesn't go out of date, which may be a big deal if you're refactoring your project. So, taken together, Rust's safety guarantees and the fundamentals of the language have a large qualitative impact on what kind of code we're capable of writing. And I'm just gonna use the example of data races, which are these concurrency issues in multithreaded code. + +So, in Safe Rust, you are actually guaranteed an absence of these data races. That is a simple yes-or-no answer. Whereas with a language like C++, we go to the C++ Core Guidelines, and the best that we can get from C++ today is a maybe: maybe your code doesn't have a data race, or maybe it does. And for me, as someone who just wants to write the code, this is ten times, an order of magnitude, worse than the simple yes-or-no answer that Rust gives us. And it makes a difference in what kind of code we're comfortable writing. + +And so, the thing that sets Rust apart is that software engineering best practices are built into the language and core tools. I think that choosing Rust is going to have the biggest impact on small, resource-constrained teams who don't identify as expert software developers. Rust's place in scientific computing is as a language with the speed and power of C++, but one that is also a systems language explicitly designed for non-experts and designed to lower barriers. It's a companion and complement language to C and C++. There are many tradeoffs between these languages, I see myself using them all in the future, and there's no one correct choice here. But Rust's foundational values help us to write good software. Thank you so much for listening. + + +**Yuli:** +Hi. Thank you. That was a great talk. + +**Max:** +Hello. Thanks. Thanks for all your help. Thanks for the intro. + +**Yuli:** +We have a couple questions. For example, here: what do you think is a natural next step to speed up Python code? + +**Max:** +Yes. I think there are definitely a lot of alternatives here. Personally, I haven't done a lot of Cython myself. But having the Rust ecosystem is also a really important thing. Having these examples of different ways to do things, or being able to pull a lot of dependencies into your project and experiment with them, is, I think, also a really important feature that Rust offers as a language.
And it is maybe a little bit of a fresh start for some people. + +**Yuli:** +Okay. Good. Okay. I have another question: what are your favorite ways to integrate Rust with Python? If any? + +**Max:** +Yes, I've definitely played around a bit with the PyO3 project. It's a really nice way to integrate Rust into Python, just by exposing your Rust code as a Python module. Or you can also call Python code from Rust as well. + +**Yuli:** +Okay. Well, we don't have more time, sorry. But we can continue with the Q&A in the chat. If anyone has another question, you can answer it in the chat. Thank you so much. + +**Max:** +Thank you. Thank you very much. diff --git a/2020-global/talks/03_LATAM/04-Max-Orok.txt b/2020-global/talks/03_LATAM/04-Max-Orok.txt deleted file mode 100644 index 888a931..0000000 --- a/2020-global/talks/03_LATAM/04-Max-Orok.txt +++ /dev/null @@ -1,47 +0,0 @@
Because if your program takes forever to run, it usually means that you're going to have a lot more trouble sort of iterating or maybe running a different kind of analysis. And it's usually a very good thing when your program runs quickly. Because, you know, that means a lot of people can do their jobs a lot quicker. Especially if requirements change and you have to redo a bunch of analysis work. - So, sort of the final thing with scientific software is that the developers usually have other jobs. And writing software might just be part of someone's job. And they don't actually consider themselves to be expert software engineers. So, these might be people who are primarily physicists or chemists and biologists. Or engineers as well. And for a lot of people, programs are just a means to an end. And sometimes sort of good software engineering practices are thrown out the window. So, sometimes compilation of a program or interpretation is the first and last unit test that it gets. And there's also this idea that if it works, don't touch it. Which I think is sort of a negative idea and I think we should be able to have the ability to refactor or programs to add new features or to improve the performance. And this is sort of an idea I think we need to combat. Especially when we build our own software for people to use. - So, working in the radiation field and sort of going to engineering school, there's the standard case study of the Therac 25. And sort of the issues with it. So with the Therac 25 was a radiation therapy device manufactured by app atomic Energy of Canada limited. And it was part of six major accidents between 1985 and 1987. And I don't want to minimize the issues with this project. - So, there were a number of sort of complex factors that went into the problems that the Therac 25 had. You know, there were management issues or, you know, project oversight issues. And allegedly there was only one developer who did the entire software for this machine. But also, investigators did find that data races, or concurrency bugs, in the Therac 25 control software contributed to the accidents. And I think this just goes to show a little bit that software bugs do have real world consequences. And usually it's not this serious. You know? Usually we just have to rerun our code to do another analysis job. - But it is the case that software does affect real people. And we have to be careful to try and avoid bugs as much as possible. So, moving on to the... the existing scientific landscape we have, Python is sort of the lingua franca, or the language that everybody speaks. And I think this is a very good thing because a lot of new programmers, especially today, their first language is Python. And it's important that they're able to write software in a language they're comfortable with. - But this also brings some problems because Python is actually usually quite slow of a language. So, when people need performance, they start to reach for languages like C and C++. And these are sort of the bedrock systems programming languages that support Python. And here I'm sort of skipping over a lot of other languages. So, for things written in FORTRAN and Julia, I think all of these languages are very important and they definitely have their place. But I'm not going to talk about them specifically here. 
- So, an issue I have with the current landscape of sort of scientific computing is that moving from Python, which is a lot of people's first language, to something like C++, which is, you know, sort of a more performance oriented expert level programming language, this should be a natural step because many popular Python libraries depend on C++ as sort of a backend language. And they're actually written mostly in C++ and sort of wrapped up nicely in Python for people to use. And researcher time is usually very precious. So, a lot of people want to know how to speed up their code or get better performance. And sometimes this is actually a very difficult thing to do in Python. It's necessary to move to another language like C++. - But unfortunately right now, this is a very difficult transition step. And, you know, there's a lot of factors going on here. And, you know, the two languages are very different, have different goals. But it is a problem because, you know, I've definitely seen people leave project because they don't feel they're up to the task. Or maybe they just abandoned their efforts and sort of keep using Python. And I think here Rust really starts to shine as a viable alternative to C++ because you can achieve the same or very similar performance, but with a kinder, sort of more gentle systems programming language. Explicitly designed for non expert users. And that's what a lot of software developers identify as. So, I think it's sort of a very important use case or possibility for Rust is sort of an alternative backend implementation language. To awe CHIEF certain performance goals. And then, of course, you know, right away maybe there are some important reasons not to use Rust. Given the comparative age of all the languages, Rust is relatively young. It's only 5 years old, and only 5 years since the 1.0 release, and Python is 30 years old and C++ is around 40. It's likely they're going to be around a lot longer also. Rust has this notion of there being a lit of a learning curve associated with it. - But I think it's easier to get up and running good code in Rust than in other languages. The compiler does a good job of guiding you away from sort of unsafe ways of doing things. And sort of more into a correct way of doing things. And especially for beginners, I think this is very helpful. So, I know that the first few months of my writing C++, it certainly wasn't very good, and I was making all sorts of out of the bounds errors and other issues that just wouldn't happen in Rust. - Another issue, of course, is that you already have a large codebase written in another language. And the saying is that a lot of times the right tool for the job is the one you're already using. And I think this is definitely the case. And I don't think people should be rewriting their projects completely. But I would say, you know, maybe if there's a... if there's a new component and you're sort of looking for an alternative language, I think Rust is a really good choice for this. Another point might be that there's an important library that you depend on that's actually missing on the Rust side of things. And, you know, this is definitely a valid concern. - Rust ecosystem is smaller than that of Python, of course. Python's is enormous. And that of C++ just because it's younger. But there are ways to access Python and C++ code in Rust as well. And finally, you have things like concerns about a single vendor. 
So, there really is only one viable Rust compiler right now even though there's work ongoing to add it to GCC. But I will say that the Rust team has done a very good job of supporting the Rust compiler on a lot of platforms. Of course, the three major operating systems. But also a variety of other platforms and you can definitely run it on a lot of systems. So, for me, though, Rust is exciting because it really aligns with my goals as a researcher. I want to write the fastest code as I can with as few bugs as possible. And I want both of those things at once. And it's a bit of a vague goal. But Rust here really helps me because entire classes of bugs are eliminated compared to another sort of unsafe systems programming language. And this means I can actually focus my time on developing a better algorithm or actually doing some other work and not having to worry about bugs as much as I would in another language. - The other thing is that it's a productive modern programming language with a lot of developer conveniences. But it also has competitive performance to languages like C and C++. And I think this is probably the most important point for me is that it's a language explicitly designed for non expert users. And I think other languages cater to different audiences. So, I think in some regards C++ really cares about its expert developers. And Rust does too. But it also spends a lot of time making sure that the language is sort of suitable for non expert programmers which is often the case for scientific researchers. Who maybe do not identify as experts. - And finally, there's built in documentation and testing. And this is sort of an area where I don't really want to spend any time sort of wrangling external tools or, you know, fixing issues with them. And also, having sort of an integrated package manager in Rust is a game changer. Because personally I consider time spent writing build system code to be a necessary evil, and I want to minimize as much of that as possible. So, Rust's sort of first class dependency management is really important to me and it sort of... it lets me do... focus my time on more important things. - So, sort of jumping right in, just to some of the Rust features I find very useful for writing numerical code. I want to sort of preface this by saying that more than one feature... I think it's the sum total of these features which is important. So, you can sort of get analogs of these features using different flags for C and C++. But it's sort of how they're all baked into the language and they're on by default, which is really important. You don't need to know which special flags to pass to your compiler. These are all turned on right away. Right off the bat, we have no implicit conversions between primitive types. At the top, we're dividing two integers and getting a floating number out. And Rust is stopping us and saying there's mismatched types here. And we expected a floating point number, F64, but we found integers. - This is a very common beginner mistake. And it's nice that it's caught right away here. It's not the most complex bug. But having it caught and sort of addressed right away is a big deal. This can be a little bit noisy sometimes. Here we're trying to convert between a 32 bit unsigned integer and convert it into the platform's size of integer and Rust is not happy here either because it wants us to do an explicit conversion where we try and... we try and convert the number, but if it wouldn't fit, we stopped execution. 
- And so, this is, you know, a little bit noisy up front. But this also catches real bugs. Especially if we are running on something like a 16 bit platform where this would certainly be a bug. So, having these things sort of caught up front is really important because the more things that you catch or compile time, the less you have to worry about at runtime. And this is sort of a theme within Rust and it's something that the type system really helps with. So, there's also this notion of very safe defaults to a lot of operations. And it's what's sort of contributes to Rust being a memory safe language. So, as much as possible, it's not going to let you do unsafe operations. And oftentimes, the convenient thing, or the thing people default to, is the safe method. There are ways of saying I actually know what I want to do here, I want to do this specifically. But for the most part, I think safe defaults are a good choice. Especially for beginners. - Here we have an example with a vector with three elements and we're going to try to access the tenth element. And, of course, this is a bug. And sort of the natural default way of using these brackets to access the element is sort of the safe default way. And we see here that we actually get a panic, which is sort of like Rust's way of, you know, winding down the system and stopping everything and exiting. So, we have panic, and it says the length of this vector is 3, we are trying to get the tenth element. Of course this is a bug. But right now a lot of performance oriented developers are saying, okay. Sometimes I know that my index is correct, and I don't want to pay the cost of bounds checking. Fine. Okay. Let's go ahead and do that. - So, Rust also has these opt in low level control features where we can, you know, we can do the... sort of the quick or maybe the performance oriented thing, but we have to tell people that we're doing it. And Rust's way of doing this is is using these unsafe blocks. If there's a potentially memory unsafe operation going on. - So, it's the same sort of examples. We have vector with three elements and we're trying to get the tenth one. But right away, it's a lot noisier. We have this unsafe block. Okay. Something unsafe is potentially happening here. You know? It's sort of... programmer's way of saying, okay, compiler, get out of my way, I really want to do this. But for people reviewing your code, it's very helpful. Because you can right away go to the unsafe block and the reviewing is shrunk because you look at the unsafe blocks and see if they're okay. And here, of course, this is not okay. We're trying to get the tenth element of a vector with only three. - And, of course, this is gonna give us a garbage answer. And Rust documentation does a really good job of saying, this is actually not recommended. And use it with caution. And this is, you know... this unsafe block is sort of the visual equivalent of that. It's saying something potentially dangerous is happening here. And just be extra careful when you're using it. And having this... this opt in low level control is what sets Rust apart from a lot of other memory safe languages. Because a lot of times you really do know what you're doing. And Rust will say, you know, go ahead, no problem. - But like I said, the unsafe block is sort of very helpful here. Because it reduces the... the onus on the code reviewer or yourself to look at where potentially dangerous things are happening. 
And I think another feature of Rust that really sort of is good for numerical programmers especially is that floating point numbers are treated with a lot of caution. And, you know, there are entire books written on handling floating point numbers correctly. And I think this is a good choice in a lot of cases. - Here we have some potentially surprising code where we're adding .1 to itself three times. And if that's equal to .3, we're going to print out got.3. But otherwise, we're going to print "Got something else." And so, this is a bit of a common beginner mistake. You don't really want to trust one point numbers, this is got something else. It's slightly different than .3 when .1 is added to itself. And Rust is doing a good job and saying that floating point types cannot be used in patterns because this is not a very good way of doing things. And there's better ways of achieving the same result. This is going to be an error in later versions of the compiler. - And this is sort of a bug where it may not be obvious right away, but having it caught at compile time is a big deal. And so, sometimes this can be a little bit annoying. So, the sort of default way of sorting floating point numbers doesn't actually work. So, if you're trying to sort this vector of floating point numbers, you'll come up with an error. There's a boundary that's not satisfied. There's a reason for this. Generally the reason is not a number, or the infinity values might be tricky to have a total ordering because the nan or non number value is not equal to itself. There's all sorts of these here. And, of course, you can sort floating point numbers in Rust. There's a standard way of doing it and it's in the Rust Cookbook as well. Myself personally, I prefer, if I have to do a little bit more code at the source, and which saves me from bugs later on, this is a tradeoff that I'm comfortable making and I would like to make in my code. - But another thing that Rust does really well is actually, it's quite a good prototyping language or debugging language. Especially given that it's also a low level programming language. So, here we have a custom data structure called cool data. And we have these vectors of floating point numbers in it. But we also, you know, when you're writing code and prototyping, you really want to print out the value of your Ada often and sort of see what's going on to it, you know, what's happening during execution. And Rust does a really good job here. - So, we can add this one line to our code, and it says, essentially, give me a debug representation of my structure. And then we can call this debug method and have printed out a really nice representation of our data. And this is great for prototyping. Because, you know, I just want to see what's happening and I want to sort of step through my code. And it's a very useful thing. And I'm using this all the time. And it's a very common pattern for people to use. So, another thing that makes writing scientific code very... sort of really helps it, is that integrated testing in Rust's package manager means that I think tests are going to be much more likely to be written. We have some sort of math expression here and we're testing it against this sort of known value. - And without any external tools, we can write a unit test and check it right away. And this really removes a lot of the friction around testing. Especially compared to other languages where you might need an external framework. 
And removing friction means that people are gonna do it a lot more, because it's an easier tool to reach for. I find myself writing unit tests much more frequently in Rust than I do in C++, where it's a little bit more tricky. And in particular, I think documentation tests are really a killer feature for scientific code. Because for a lot of scientific code, you need a lot of examples. And this is a way to make sure that your examples compile even if you change your code. So, here we have some documentation tests, with the same example as before.

But this will actually be published as part of our documentation. And having this ability to write example code but also use it as documentation is a really big deal, because you can really do two things at once. And this will also ensure that your example code doesn't go out of date. Which may be a big deal if you're refactoring your project. So, taken together, Rust's safety guarantees and the fundamentals of the language have a large qualitative impact on what kind of code we're capable of writing. And I'm just gonna use the example of data races, which are these concurrency issues in multithreaded code.

So, in Safe Rust, you are actually guaranteed an absence of these data races. And this is a simple yes or no answer. Whereas in a language like C++, we go to the C++ Core Guidelines, and the best that we can get from C++ today is this maybe. You know, maybe your code doesn't have a data race. Or maybe it does. And for me, as someone who just wants to write the code, this is an order of magnitude worse than the simple yes or no answer that Rust gives us. And it's a difference in what kind of code we're comfortable writing.

And so, the thing that sets Rust apart is that software engineering best practices are built into the language and core tools. And I think that choosing Rust is going to have the biggest impact on small, resource constrained teams who don't identify as expert software developers. Rust's place in scientific computing is as a language with the speed and power of C++, but it's also a systems language explicitly designed for non experts, and it's designed to lower barriers. It's a companion and complement language to C and C++. So, there are many tradeoffs between these languages. I see myself using them all in the future. And there's no one correct choice here. But Rust's foundational values help us to write good software. Thank you so much for listening.

**Yuli:**
Hi. Thank you. That was a great talk.

**Max:**
Hello. Thanks. Thanks for all your help. Thanks for the intro.

**Yuli:**
We have a couple of questions. For example, here: what do you think about Rust as a natural next step to speed up Python code?

**Max:**
Yes. I think there are definitely a lot of alternatives here. Personally, I haven't done a lot of Cython myself. But having the Rust ecosystem is also a really important thing. And having these examples of different ways to do things, or being able to pull a lot of dependencies into your project and experiment with them, I think is also a really important feature that Rust offers as a language. And it is maybe a little bit of a fresh start for some people.

**Yuli:**
Okay. Good. Okay. I have another question. What are your favorite ways to integrate Rust with Python? If any?

**Max:**
Yes, I've definitely played around a bit with the PyO3 project, the PyOxidizer project.
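As a rough illustration of the approach Max mentions, a minimal PyO3 sketch might look like the following; the module and function names are hypothetical, and the API shown is the pyo3 0.13-era style:

```rust
use pyo3::prelude::*;
use pyo3::wrap_pyfunction;

/// A plain Rust function we want to call from Python.
/// Returns NaN for empty input, since it divides by the length.
#[pyfunction]
fn mean(values: Vec<f64>) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

/// Exposes the function above to Python as a module named `rustmath`,
/// usable as `import rustmath; rustmath.mean([1.0, 2.0])`.
#[pymodule]
fn rustmath(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(mean, m)?)?;
    Ok(())
}
```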
This is a really nice way... you can integrate Rust into Python just by exposing it as a Python module. Or you can also use Python code from Rust as well.

**Yuli:**
Okay. Well, we don't have more time, sorry. But we can continue with the Q&A in the chat. If anyone has another question, we can answer it in the chat. Thank you so much.

**Max:**
Thank you. Thank you very much.

diff --git a/2020-global/talks/03_LATAM/05-Micah-Tigley-and-Carlo-Supina-published.md b/2020-global/talks/03_LATAM/05-Micah-Tigley-and-Carlo-Supina-published.md new file mode 100644 index 0000000..366c05b --- /dev/null +++ b/2020-global/talks/03_LATAM/05-Micah-Tigley-and-Carlo-Supina-published.md @@ -0,0 +1,108 @@

**Project Necromancy: How to Revive a Dead Rust Project**

**Bard:**
Carlo Supina and Micah strive
left for dead Rust projects to revive
by making it dress
up with an ECS
now it's perfectly looking alive.

**Carlo:**
Hi, everyone, welcome to our talk on how to revive a dead Rust project. My name is Carlo, and I will be presenting together with Micah, talking about a 2D shooter game made with the Amethyst engine. I'm a welder, pursuing AWS certification. And I run micronote, where I talk about microcontrollers with Python. In my free time, I enjoy tinkering with electronics, designing and printing 3D parts. I dove into embedded Rust with a remote controlled rover. And then, wanting more conventional experience, I started working on Kibrarian, for managing schematics and footprints for KiCad, an electronics suite. I then started working on space_shooter_rs, made with the Amethyst engine.

**Micah:**
Hey, I'm Micah, a software engineer at Mozilla. My background is focused on front end web development. Currently I work on the Firefox web browser, on the UI. And I like to write blog posts about experiences in software development. They are a way to deliver complex topics in a way that is simple and easy to follow. I got started with Rust two years ago as an on and off hobby. I have contributed to open source Rust projects and a few toy projects. I started learning about game development using the Amethyst game engine, and about a common game development architecture called ECS and how it was used by Amethyst. We'll be explaining what ECS is later in this talk, so don't worry about knowing what it is right now. I decided to share my learnings at RustConf this year, which helped solidify my knowledge about the topic, and I write about it in blog posts. And now to Carlo, who will explain the origin of space_shooter_rs.

**Carlo:**
In 2019, I was honing my Rust skills and I decided to make a game. I found arewegameyet.rs, which collects games created with the Rust language. Here I discovered a game engine, Amethyst, that used an unfamiliar architecture, ECS. I read through the Amethyst book, followed tutorials, and started working on my own game, space_shooter_rs. It's a space shooter game with enemies coming from the top of the screen. It was chosen to be an official game for the Amethyst engine. It was initially a project for learning Rust, which meant it contained mistakes that a person new to Rust would make. And it was unorganized due to me being new to the ECS architecture of Amethyst. The game contained large components with redundant data. It was difficult for me to continue contributing to, because with only a couple of hours to work on a feature, I would need to reorient myself to working on something new.
For this reason, it was difficult for others to contribute to, too. I was aware that refactoring was needed, but wasn't sure where to start. Because of this, the project slowed to a halt until 2020, when I watched Micah's RustConf talk about her experiences with Amethyst. This talk inspired me to work on the game again, but I ended up encountering the same problems. This is when I invited Micah to collaborate on space_shooter_rs. She had learned industry best practices through her work at Mozilla. And when I reached out, we talked through whether I had time, whether I had a plan, and what needed to be worked on.

**Micah:**
I was interested in collaborating because I wanted to expand my knowledge by working on an existing game. At first, I wasn't sure if I would be able to help out meaningfully, since I was still learning Rust and Amethyst. And I wasn't sure if I would make good suggestions on how to improve the existing codebase, given my experience. But despite the concerns I was having, Carlo knew the code and could guide me through the problems the project was having. It was clear there were areas of Amethyst Carlo had explored that I hadn't, so I knew I would be learning new concepts as well. And another nice thing about the collaboration: we were coming into this knowing we could learn something from each other. And since it was just the two of us, we could make mistakes and learn from them. But as someone not familiar with the code, it was hard to know where to begin with refactoring the codebase.

In this section of our talk, we're going to provide some background for the space_shooter game. A little bit of knowledge will provide context for the refactoring decisions we made and lead us into discussions of how we refactored the ECS code. And finally, we'll explain the collaboration strategies we used to plan and execute the refactors we discussed. And now over to Carlo, who will be giving a summary of space_shooter.

**Carlo:**
Here's some background. The goal of the game is to survive levels by destroying enemies, collecting consumables, buying items and defeating the final boss. It's inspired by The Binding of Isaac, with items that synergize, randomly generated levels and satisfying controls with physics. Currently, there are three enemy types. Each has a distinct behavior, for unique challenges. There are 13 unique items, purchased by the player, which when acquired change the rules of the game to benefit the player. This could range from changing the speed of the player, to changing the damage the player does to enemies, or even changing the prices of items in the store. The game has four consumable drops. Consumables are dropped by enemies when destroyed; so far, health and money, which in this game is represented by bright green space rocks. The money is used to purchase items from the store. Items are listed for a price. When purchased by the player, the item drops from the top and the player collects it. There are animations, 3D backgrounds and a work in progress boss in the game. And now to Micah, to talk about the entity structure.

**Micah:**
We have been throwing around the acronym ECS. It's short for entity component system. It's a common pattern used in game development that makes it easy to compose objects in a game, because components can be arbitrarily added to entities. This makes development much more data driven, since entities are defined by the components that are attached to them. So, in summary, an entity, often represented with a single ID, can be composed of a number of components.
And a component acts like a container for data that can describe an aspect of an object. And finally, a system is a piece of logic that can operate on one or more entities in a game. I find it easier to understand what ECS is through a series of examples.

This slide shows a screenshot of the game with five circled objects. These are entities. They are sprite renders for the enemy, spaceship, drone ship, and enemy projectile. Entities are represented with unique IDs, which is why they have labels one through five. Attaching names such as spaceship or enemy makes it easier to describe their purpose to someone else. Next, let's look at the characteristics entities require to function in a game. Entities have a number of components attached to them. These components define pieces of data that help describe behaviors we expect an entity to have. And it helps to think of these collections as grouped together under one ID. I have grouped them together with the entity they are associated with. Entity 1 has SpriteRender, attack, health and movement components attached. And entity 2 has similar components as well. If we needed to access the health component for the player entity, we would need an expressive way to do this. One way is to have a player tag attached to the entity, to differentiate the health component associated with the player entity from the others. Of course, having something like a health component attached to an entity doesn't do much on its own. We need a way for the game to update the values of a health component when damage is dealt to the entity that it's attached to. And this is where the system part of ECS comes in. But before we talk about the system part of ECS, we should address component storages. This is how Amethyst stores and updates collections of components. Giving components their own storages allows for faster access to the data needed when updating an entity's component state. This is important when a system needs to operate on hundreds of components at a time. Because of this, it's important that component storages are responsible for containing and managing components of one type. And every component in a storage has an entity ID that it is associated with. Now that we have briefly explained component storages, we can quickly go over the system part of ECS.

In this diagram, we have an example of an animation system implemented using the Amethyst game engine (a sketch of such a system follows below). The system is a way to know which sprite to display in each game cycle. This shows that the animation system needs to read from a time resource. Resources in Amethyst are containers of data not associated with an entity. The other two storages it reads from are the animation and SpriteRender component storages. Using these resources, the animation system writes to an entity's SpriteRender component. In particular, a SpriteRender component can tell Amethyst which sprite should be drawn. This is related to the time resource, resulting in the animation sequence. The animation component serves as the data the entity should be using when determining what to render the SpriteRender component with. And now to the problems that we were having. One was that the spaceship and enemy components had a lot of redundant data. We took a step back and examined the basics of ECS. We wanted to avoid components that were bloated, contained redundant data, and as a result were difficult to use. And we found these problems made some of the existing systems overly complex.
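Before turning to those problems, here is a hedged sketch of what a system like the animation system described above could look like in Amethyst's ECS. The `Animation` component and its fields are illustrative, not the actual space_shooter_rs code; `Time` and `SpriteRender` are Amethyst types (0.15-era APIs):

```rust
use amethyst::core::timing::Time;
use amethyst::ecs::{Component, DenseVecStorage, Join, Read, ReadStorage, System, WriteStorage};
use amethyst::renderer::SpriteRender;

/// Illustrative animation data attached to an entity.
pub struct Animation {
    pub frames: usize,      // number of sprites in the sequence
    pub frame_seconds: f32, // how long each frame is shown
}

impl Component for Animation {
    type Storage = DenseVecStorage<Self>;
}

pub struct AnimationSystem;

impl<'s> System<'s> for AnimationSystem {
    // Reads the Time resource and the Animation storage,
    // writes to the SpriteRender storage.
    type SystemData = (
        Read<'s, Time>,
        ReadStorage<'s, Animation>,
        WriteStorage<'s, SpriteRender>,
    );

    fn run(&mut self, (time, animations, mut sprites): Self::SystemData) {
        let elapsed = time.absolute_time_seconds() as f32;
        for (anim, sprite) in (&animations, &mut sprites).join() {
            // Pick which sprite to draw based on elapsed time.
            sprite.sprite_number =
                (elapsed / anim.frame_seconds) as usize % anim.frames;
        }
    }
}
```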
On the first two points: having components that were bloated with redundant data made it difficult to keep track of what data a component was responsible for. This caused some scenarios where it wasn't clear what a system was supposed to do. The problem also made components difficult to reuse, since the properties were specific to the component they were associated with. This meant that modifying the health stat of an entity would need to specifically access the component containing it, rather than a generic health component. To break components down in a way that makes them reusable and concise, it helps to define a set of requirements. We should define what the expected behavior of an entity should be. This would help us conceptualize how the entity's components work together, rather than thinking of the entity as a monolithic whole. To illustrate the point, we can list the expected behaviors of the spaceship, enemy and item entities as components. It might seem like a lot of components for only three entities. But keep in mind that every one is a small piece of data describing one piece of functionality for an entity, so it's easier to have a holistic view of the entity's behavior. If we look at the spaceship and enemy entities, they have a number of similar behaviors: motion, animation and health. These are pieces of data in the form of their own generic components. Having components reusable between entities is what ECS strives for. You might also notice that the spaceship, enemy and item entities each have a component with their same name. For the space_shooter game, we made the decision that functionality specific to an entity should be contained in that entity-specific component. Now that we have behaviors defined, let's examine the older revision of the spaceship and enemy components. In particular, take a look at the motion component as one of the required behaviors for these two. In these code snippets, spaceship and enemy both carry the data required for the motion behavior. But we wanted to avoid repeating this data, and instead extract it into its own generic motion component that can be attached to any entity that requires motion. The images here highlight the properties that define motion behavior, and we can now move them into their own component. This motion component applies to entity motion within a 2 dimensional space. Motion in space_shooter refers to the acceleration, deceleration and maximum speed values. We also simplified some of the properties to be represented as vectors, such as velocity, which stores the X and Y values. Now to Carlo, to explain how these components allow for more flexible systems.
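A hedged sketch of the extracted component, with illustrative field names rather than the actual space_shooter_rs definitions; any entity that needs motion can have this single generic component attached:

```rust
use amethyst::ecs::{Component, DenseVecStorage};

/// Generic 2D motion data, shared by the spaceship, enemies, items, etc.
#[derive(Debug)]
pub struct Motion {
    pub velocity: [f32; 2], // x and y velocity stored as one vector
    pub acceleration: f32,  // how quickly the entity speeds up
    pub deceleration: f32,  // how quickly it slows down
    pub max_speed: f32,     // cap on the entity's speed
}

impl Component for Motion {
    type Storage = DenseVecStorage<Self>;
}
```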
The solution is through the use of an event channel. Event channels function as communication lines between systems. A good analogy for event channels is a radio broadcast. A system that needs to send data to other systems can initialize an event with the data that needs to be sent. The event is then written to the event channel in the source system. This event channel can be thought of as a radio tower broadcasting a message. Other systems can tune into this channel by setting up an event reader to look for events of a certain type. Then the system can read the data from the event message and use it. Going back to the frequency augmenter item: when it's collected in the spaceship item collision system, an event is initialized and sent over the event channel to the spaceship system, where the attributes can be added to the spaceship that collected it. This can be extended for items that affect any system in the game (see the sketch at the end of this section). In this next section, we'll cover how we collaborated to implement the refactors we discussed previously. One of the most important parts of trying to revive the space_shooter project was to establish a workflow for the both of us. We needed a way to discuss and prioritize tasks to get the code into a better place. We'll be exploring four approaches: collaborative coding practices, writing documentation together, communication, and weekly meetings.

The first factor that made our collaboration successful is our collaborative coding practices. When I started this in 2019, GitHub was just a place to safely store and distribute files. I was using branches and pull requests, but not to the extent that I needed to. I didn't know to, because there were few contributors besides me. Part of the reason I reached out to Micah to collaborate on the project was that I knew she likely had a lot more experience using these tools through her work at Mozilla. After a week, I trusted her enough to give her maintainer privileges so she didn't have to go through me, allowing her to manage issues, branches, pull requests and more in the repository. I followed her lead and learned for myself how to use the collaborative tools. My last point is that even though we communicate through direct communication lines, we still make sure to do public code reviews through GitHub, even if it is just a small change. That makes our decisions transparent, which is good for the growth of the project, and it establishes a formal process for reviewing code. Now over to Micah, to talk about the importance of documentation for the project.

**Micah:**
Since we were both learning new things as we refactored space_shooter, we wanted to work on documentation as a way to record thought processes. This included tasks like updating the README to be an entry point: anyone who wanted to learn about the game or contribute could go to the mdBook, where more information is found. mdBook is a Rust tool that builds a modern online book from markdown files. We decided to create a book for space_shooter as a central location for documentation. Right now the most important content is about helping people make contributions to the project. We needed to have a contributing guide for those interested in adding code. We needed to have a code of conduct to establish a safe environment to contribute and learn. While documentation about the architecture and code has been the main focus, Carlo has been actively working to provide contribution guidelines around adding new items to the game, as well as artwork. The goal is to have more than one way to contribute: code, artwork, ideas for items and even documentation.
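Returning briefly to the event-channel pattern Carlo described, here is a hedged sketch using the shrev crate, which provided Amethyst's event channels at the time; the event type and its field are illustrative, not the actual space_shooter_rs code:

```rust
use shrev::{EventChannel, ReaderId};

/// Illustrative event carrying an item's effect.
#[derive(Debug)]
pub struct ItemEffectEvent {
    pub fire_rate_increase: f32, // e.g. the frequency augmenter's effect
}

fn main() {
    let mut channel: EventChannel<ItemEffectEvent> = EventChannel::new();

    // The receiving system "tunes in" by registering a reader up front.
    let mut reader: ReaderId<ItemEffectEvent> = channel.register_reader();

    // The collision system "broadcasts" an event when an item is collected.
    channel.single_write(ItemEffectEvent { fire_rate_increase: 0.5 });

    // The spaceship system later reads everything sent since its last read.
    for event in channel.read(&mut reader) {
        println!("apply effect: {:?}", event);
    }
}
```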
The online book for space_shooter is a work in progress, but anyone interested in previewing it can check it out at Amethyst.GitHub.io/space_shooter. And now to weekly meetings.

**Carlo:**
Weekly meetings were critical to the progress so far. We established short term goals, for example refactoring a bloated component into smaller components. Then long term goals; an example is adding a ... to the game, and what kind of components and systems it would require. And the last are the larger project goals that relate to space_shooter_rs as a project. What kind of documentation do we have? And do we plan on selling a version of the game? And then there's sharing ideas. Our main guiding principle is that while some ideas are certainly bad, all ideas are worth sharing. This is important because we can't be afraid of sharing ideas if we want to make sure that we are on the same page with the project. Some examples are character abilities, items and the structure of the game.

**Micah:**
Having a way to track discussions around refactoring decisions was one of the best ways to capture our thought processes. We did this by having discussions on GitHub issues and code reviews. If we had any questions or ideas about an issue we were working on, we would post them to the issue they were relevant to. Ideally, these discussions would involve any pre implementation work, such as clarifying the problem before writing any code. Then, once a pull request had been submitted as a potential fix for that issue, we could move the implementation discussions there. While these discussions can be done privately, having the information easily available allows us to revisit the reasoning for why we decided to make certain implementation decisions. This can make it easier when documentation is created to address that section of the code, or if a new architecture problem arises as a result of fixing that one issue. At the end of the day, making these discussions public on GitHub helps create an environment that encourages open discussion and questions with others. Sometimes discussions around implementation details can be less relevant when addressing an issue, which is when we discussed project ideas through direct messaging. It's difficult to define a balance between what's relevant to the task at hand and unnecessary information. In general, project ideas not ready for the public are kept in direct messages. These are features that are not relevant to the current goal, such as future project ideas and implementation ideas that are off topic. And now to Carlo, who will explain how creating informal documentation helped with communicating the state of the space_shooter project, to help onboard me onto the project.

**Carlo:**
I made two informal documents when Micah joined: current state and ideal state. The current state document described the game as it was, and the ideal state described the game as I thought it should be after the refactor. The intent of the documents was for Micah to use them as a reference while getting familiar with the game, while at the same time explaining my current intentions for the game. As Micah got more comfortable, we were able to use the ideal state document to have in depth discussions about what the ideal state of the game should look like. This turned into a more informal ideas document where we throw in ideas for discussion as we think of them. It draws inspiration from other games.
After a few weeks of consistent collaboration on this project, I started working on a few more formal documents to put the longer term ideas in my head onto paper.

The first document was a large flowchart explaining my ideas for what progression through the game looks like: the progression of levels, unlocking characters and bosses. And the next were the MVPs for the project. I say MVPs because I wrote two, based on the two different scopes for the project. The first MVP was for space_shooter as a showcase game. It includes how many items, characters, bosses and levels we want in the game. The general rule for this MVP was that things could only be in the showcase game to the point where they made sense for showing off the Amethyst engine. The second MVP was for space_shooter as a fully released game. The only difference between this MVP and the showcase game MVP is the general rule for the document: more content to make the game as fun as possible, rather than just showing off the engine. This could be a story, or even a secret ending. And now to conclude the talk.

**Micah:**
In summary, the main takeaways on the collaboration strategies we used: engaging regularly in open discussion is beneficial to capturing project progress. Documentation is important for solidifying knowledge about project architecture decisions. And sharing ideas regularly keeps everyone on the same page. This is what we found to be the most helpful for reviving a dead Rust project. If you're interested in learning Rust or want to contribute to an open source game, then we would be happy to help. Working on space_shooter aims to be a fun and informative learning experience. Whether it's with code, art, and/or documentation, feel free to reach out and we would be excited to meet you. And thank you for attending our talk. We are now open to taking any questions.

**Inaki:**
Great talk!

**Carlo:**
Glad you liked it.

**Inaki:**
So, first question is, why is item and spaceship collision a separate system from spaceship and enemy collision?

**Micah:**
So, we organized our systems to listen for specific event types. Like in this example, spaceship to enemy and spaceship to item were their own event types. And because of that, we made the decision to create separate systems for them, so it was easier to maintain the logic that was specific to those particular event types. But yeah.

**Inaki:**
Gotcha. Do you feel there is a difference between starting a game... a game design with ECS, and moving one from something else to ECS?

**Carlo:**
Yeah. I can take this one. Yes. So, I have... I've tried a few smaller games. I've done like a game jam in the past. And yeah, ECS was very unique to me because of the way it organizes the data. It's completely different. So, I think the most developed one I have done in the past is something with PyGame, and I was sort of using the object oriented tools there to best represent the data in that way, and it's very easy to get disorganized. And ECS is very good at constraining the data and systems into being organized, which is really nice. But as you can see, we needed to refactor anyway. So, you can still mess it up.

**Inaki:**
Of course, always. Always, always, always. It seems like ECS is a popular solution for games. But are there any situations where you would not use it?

**Carlo:**
I can't think of any. But I don't think... we're still learning ECS. And I... yeah.
I can't think of a situation where I wouldn't. But I don't think I know enough to speak fully to that.

**Inaki:**
Also, it seems that the Amethyst game engine in recent versions, I'm not quite sure if it's released yet or not, they're moving on to a new ECS system. Can you comment on that, or do you know anything about that?

**Carlo:**
I know that, yeah, they're switching from a library called specs to legion. And I don't think I'm qualified to talk about it. But yeah, if you are interested in learning about that, you should join the Amethyst Discord. That's where we learn a lot about it. It's something we will probably need to do some refactoring to adjust for in the future. But, yeah, join the Amethyst Discord to learn more about that.

**Inaki:**
Another question. A user asks whether you've had a look at the Bevy engine, which is also quite recent, and whether you can comment on it from the user perspective, as an Amethyst user.

**Carlo:**
I can't comment on it, because I haven't tried it yet. It does look interesting. I know it uses the same design philosophy. And I know that there is a post that was made that Amethyst is working with Bevy. And so, yeah, I think it's a great engine that people should try too.

**Inaki:**
Great.

**Carlo:**
We're not in competition.

**Inaki:**
All righty, then. I think we're out of questions. So, just one second, see if anyone came in. Oh, no, we're good. Micah, Carlo, thank you so much.

**Carlo:**
Yep. Great to be here.

**Inaki:**
Until next time.

**Carlo:**
Yep, see ya!

diff --git a/2020-global/talks/03_LATAM/05-Micah-Tigley-and-Carlo-Supina.txt b/2020-global/talks/03_LATAM/05-Micah-Tigley-and-Carlo-Supina.txt deleted file mode 100644 index c9ad288..0000000 --- a/2020-global/talks/03_LATAM/05-Micah-Tigley-and-Carlo-Supina.txt +++ /dev/null @@ -1,43 +0,0 @@
diff --git a/2020-global/talks/03_LATAM/06-Andrew-Dona-Couch-publish.md b/2020-global/talks/03_LATAM/06-Andrew-Dona-Couch-publish.md new file mode 100644 index 0000000..c854e12 --- /dev/null +++ b/2020-global/talks/03_LATAM/06-Andrew-Dona-Couch-publish.md @@ -0,0 +1,230 @@

**Tier 3 Means Getting Your Hands Dirty**

**Bard:**
Andrew Dona-Couch will now go
to the farthest reach of Rust to show
if you're willing to get
your coding feet wet
Tier three has got some room to grow

**Andrew:**
Good afternoon. Welcome to RustFest Global 2020. My name's Andrew Dona-Couch. This talk is Tier 3 Means Getting Your Hands Dirty. I'm a freelance software developer, I'm a big fan of Rust, and I love playing around with hardware. And recently, hardware and Rust. I'm on social media, generally as @couchand. As previously mentioned, this talk is about my experiences finding and fixing a bug within the Rust compiler. Well, within the Rust compiler, within a library used by Rust called LLVM. And this bug only comes up on the AVR platform.

So, what's AVR? AVR is a series of 8 bit microcontrollers. These are tiny computers. They are an older series. They're cheap and very hacker friendly. And support for AVR was mainlined into the Rust compiler this past summer, into Rust and LLVM, thanks to a significant amount of effort by a gentleman named Dylan McKay. And you can see the AVR line ranges here...

❖

**Andrew:**
My apologies. I...

**Stefan (@dns2utf8):**
No problem.

**Andrew:**
My cat chewed through my power cord, and so it was not charging. All right.

**Stefan (@dns2utf8):**
The risks of feline.

**Andrew:**
Right.

**Stefan (@dns2utf8):**
Would you mind going back a couple of slides? Because we lost you quite some time ago.

**Andrew:**
Sure. Yep!

**Stefan (@dns2utf8):**
Yes. All right. We can't hear you at the moment.

**Andrew:**
How about now? My apologies for all of that. All right. So, I'm at a strange angle at the moment. Hopefully we can make this work. So, just to briefly go over these ground rules again: I'm going to try to show everybody respect today, and hopefully I succeed at that. We'll also be talking about things that I'm a relative amateur about, so bear with me. And most importantly, I'm going to be oversimplifying, because I would like to cover a lot of material. And due to those technical issues, we're of course now a few minutes behind our plan.

All right. So, with that in mind, what are these simplified models that I'm going to be talking about? The first example is a black box model. We'll be talking about some complex system. Take this Rube Goldberg self operating napkin, and we're going to put this system inside of a box. And the reason we do that is because we want to analyze this complex system not in terms of the details of the inside, the implementation, but rather in terms of its inputs and its outputs. Its relationship to the outside world. That's a bit abstract, so let's look at a couple of examples.

We can think of a computer as being a box. And the input to our computer is some kind of program. And the output, we'll say... hand waving a bit here... that we have two outputs: computation and effects on the world. All right. This is a technical conference, so let's look inside the box, and we find that there are four additional boxes. We'll talk about each of these four in turn a little bit later. But right now I'll give you just a brief sense of what they are.
So, on the left we see we have a set of registers and some memory. And on the right, our two boxes are called processor and peripherals.

So, just to get a vague idea of what each of these are, we'll use the metaphor of an old school paper-pushing office. And you have someone sitting at a desk moving paperwork around. And that person is the brains of the operation, and that's analogous to the processor. The desk in front of them, covered in the paperwork they're currently working on, is something like the registers. The memory is a bit like a file cabinet that holds the paperwork that they need to have access to, but don't need immediately at hand. And finally, the peripherals in this metaphor are something like an intercom sitting on top of the desk, where if we need to connect with the outside world, if we need to interface with the outside world, ask for a cup of coffee, we can push the button on the intercom and request support.

All right. So, let's connect our arrows from earlier. We said that the processor is the brains of the computer, so the computation can be thought of as coming from there. The peripherals are the interface to the outside world, so we can think of our effects on the world as coming from those peripherals. And then the program, we will load that into memory, and then we will go ahead and start executing from there.

All right. Let's look at another example of a black box. This one is a compiler. And we can think of a compiler as a box. And the program is the input to the compiler. And since this is a Rust conference, our compiler presumably is rustc, the Rust compiler. And our program is the Rust source code for our program. All right.

And we can look at the output of our compiler. And now we've reached a really interesting point. What is the output of this compiler? So, again, let's not define it based on its intrinsic properties, but rather use another simplifying model. So, here we'll define this output by how we use it. And we'll see that we use it by passing it as the input to our computer box from previous. And we already said that input is a program. So, we've now determined what this compiler box is.

The compiler is something that takes a program as input and produces a program as output. Now, this is a bit confusing, so we'll be more specific about this. We have two different representations of the same program. And the input representation, we already called it source code. And the output representation is the one that's meant for the machine, the computer machine. And so, we call it machine code. All right.

So, again, this is a technical conference. Let's look inside this compiler. And we see two more boxes. We have a rustc frontend and an LLVM backend. Now, rustc we know is the Rust compiler. So, that term is not new. But LLVM might be new. So, what is this LLVM thing? LLVM is a library that contains all of the common parts of compilers. Things that are shared among compilers no matter what language they're for: optimizations, code generation for the machine code, and so forth. As suggested by that previous slide, our back end for the Rust compiler is LLVM. But the LLVM library serves as the back end for many other languages, including Julia and Swift and many others.

And notably, the Clang compiler, which is part of the LLVM project, is the C compiler on the Macintosh. Okay. So, looking at our compiler model again.
And let's once again connect our arrows for our input and output. And extend them inwards and connect the source code input to the rustc front end. And the source code is the input to our front end. And the output of our LLVM backend is our machine code.

And that leaves one big gap. What is it that's between these two boxes? Well, it's another representation of a program. And because it is intermediate between the source and machine code, the LLVM authors have decided to call it intermediate representation, frequently abbreviated IR. So, we'll frequently see the term LLVM IR. That's the common currency between the rustc front end, or the front end for any compiler that might be using LLVM, and the LLVM back end.

All right. It's about time that I tell you about my project. So, I was working on something that is more or less the hello, world, of embedded. The "Hello, World" of embedded is sometimes called Blinky. And the idea here is you take a microcontroller. So, here we have a breadboard with an ATtiny 85 microcontroller. That's a particular AVR microcontroller. And we will go ahead and connect it to an LED. An LED is like a tiny light.

So, we have our tiny computer wired up to our tiny light, and we write a program for the microcontroller. And that program will make the light turn on and off. And on and off. And on and off. And in embedded, you tend to write things as an infinite loop. So, we're going to turn on and off our light forever.

Well, I expect that, like me, for most of you the bulk of your software development experience is on the desktop or server. Perhaps modern mobile, or a web context. And there are a few important things to know about writing software for an embedded context that make it slightly different from those other environments.

So, the most important difference is that in an embedded context you have extremely limited resources. Now, we generally probably know what that means in terms of memory, processing power and so forth. But let's look back at our computer model and see what this limitation in resources implies for each of these components. So, first we'll look at the processor. And inside our processor we have (again, hand waving quite a bit here) some math-related stuff that's going to be doing arithmetic, perhaps, and some program control stuff that lets us do loops and conditional jumps and moving around in our program. And the big difference in embedded is that it's slower. And in some cases, it's less capable. For instance, the AVR devices that I'm generally developing for don't have floating point numbers.

So, all arithmetic has to be done on integers. All right? Let's look at another component of our computer model, the peripherals. So, I said previously that the peripherals are the interface to the outside world. And let's look at a couple examples. We can consider a video streaming application, where you might need to have access to a networking peripheral that you can use to fetch a video from some service. And then once you have the video, you want to show it to a user, and so you would use some sort of video display hardware, which would be provided by a peripheral. Now, in an embedded context, you certainly could have a networking peripheral, although it's not universal. You might have a video peripheral, although it's somewhat rare.

You tend to have access to peripherals that are much more limited. For instance, you could ask the computer to count to a number.
And then let you know when it gets to that number. And this is a bit like when my kids are in the other room and I'm working on the dishes. And my kids want me to come in and play with them. And I need to finish up what I'm doing, so I ask them to count to 20 and let me know when they get to 20. And we can do the same thing with our microcontroller, our little computer: if we need to delay and perform some computation at a later point in time, we can ask the computer to count to a number for us and let us know when it gets to it.

All right. Let's take a look at the memory real briefly. So, we're going to be talking a bit more about how the stack works later. But let's sort of get a vague idea of what is inside this memory. So, I have this little photo collage that I made up, and hopefully it is not completely misleading. But broadly speaking, we have three parts of our memory. So, as we said previously, we're going to load our program into memory. And so, the brick wall we see at the bottom is our program that's been loaded. And any static variables in the program would be there. And then we're going to have a heap, potentially. This is where the program stores things that it may need later, but doesn't need right now. It can throw them on the heap and then go back and look for them later.

And finally, we have a stack. And there are really two models that might be useful for thinking about the stack. So, the first model is more of an operational model. And this is where you're thinking of the stack as sort of nested contexts for the program. So, we have perhaps a pile of file folders on our desk. Now, we're departing a bit from our desk metaphor earlier, because here the desk is not the registers. We're using a slightly different metaphor.

So, perhaps I'm working on a client. I have the file folder open, and I have some pages. And then my boss comes in and is holding the file folder for a client that's more critical. And we open that up and set it down on top of the pile of papers on my desk. And we look at that one. And then the boss' boss comes in with another client's file folder, and we open that up and put it at the very top of the stack. And then when we finish the boss' boss' client, we close that file and take it away, and resume work on my boss' client. And then when we finish that up, do that work, my boss leaves, and then I can resume doing what I was doing before I got interrupted.

And that's sort of the nested context view of the stack. The somewhat more physical view of the stack I like to think of as a stalactite growing from the roof of a cavern. And the reason I think of it this way is that on most computers, the stack begins at the very top of the memory, the very highest address, and grows downwards, down into memory. So, it's a bit like a stalactite hanging from the roof of a cavern, where as we add additional context, our stalactite grows down. And then as we resume previous context, we shrink it back up.

And the most important difference for the memory in an embedded context is that it's significantly restricted. You have much less memory in most embedded devices.
For instance, the ATtiny 10 that you saw in the very first picture of the AVR slides, that has 1 kilobyte of program memory. There are similar devices that have only 512 bytes of program memory. You must write a program that fits in 512 bytes. That's a significant limitation when developing.

For completeness, let's look inside the registers and talk about what the types of registers are. So, the main class of registers are these general purpose registers. And this is where we store the state that we're currently working on, as I mentioned previously in the desk metaphor. And we can sort of think of the general purpose registers as a pile of Etch A Sketches on a table. And any time that we need to ask the processor to do work for us, we have to write down the numbers on the Etch A Sketches and give them to the processor. So, if we want to provide some numbers for the processor to work on, we need to go to our table and pick up an Etch A Sketch, and ideally find a blank one.

But if we don't find a blank one, we pick up one that's in use. We write down somewhere whatever number is on it, and then we erase it. And then we put the value of our number on it. And we need to write it down because we probably were using that previously. And so, when we're done with whatever we're doing immediately, we want to restore that Etch A Sketch to its previous state. And so, we want to know what was on it before we erased it, so that we can put it back.

And this is a process that's referred to as clobbering the registers. So, when you're using a register that's already in use, you clobber it. And if it's important for you to maintain what was in it previously, then you have to save and restore that register. All right. We also have a few special purpose registers. And there are lots of these, but there are only two that we will talk about today.

The first one is called the status register. This tells you what the processor has been up to recently. For instance, if you do some math and the result is zero, the status register will tell you that the last math you did, the result was zero. Another important special purpose register we'll talk about later today is the frame pointer. The frame pointer always points at the current function's stack frame. It points at where on the stack we can find the local variables for the current function.

And like I said, there are many other special purpose registers that we won't be talking about today. All right. So, that is the first difference. We took that opportunity to also expand our computer model a bit. But remember, the first significant difference for developing in an embedded context is that we have these very tight resource constraints. All right. The second difference is that in an embedded context, we're working in a no_std environment. What this means is that you don't have access to the standard library. You do have access to the Core library, the Rust Core library. And this is the parts of the Standard library that don't require memory allocations and don't require platform support. So, with the Core library, we don't have access to collections like a vector or a HashMap. And we don't have access to operating system things like a file system. But we still do have a lot of the basic functionality.
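As a small illustration of that point (a sketch for this transcript, not code from the talk): a `#![no_std]` crate pulls in only the core library, and a lot of everyday Rust still works without any allocation or operating system.

```rust
#![no_std]

// Only the core library is available here: no Vec, no HashMap,
// no files. But Option, Result, iterators and so on all work.
use core::cmp::min;

/// Smallest value in a slice, with no allocation anywhere.
fn smallest(values: &[u8]) -> Option<u8> {
    values.iter().copied().reduce(min)
}
```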
In addition to the no_std context, we're frequently working in a panic-abort model, because unwinding the stack in this context is very expensive in terms of memory use and processor time. So, in general, in developing for an embedded context, we abort on panic rather than unwinding the stack. This leads to the third difference, which is limitations on debugging and observability. If we can't unwind a panic, then when a panic occurs, we have a lot fewer clues about what caused the panic to happen.

But there are other debugging and observability limitations. So, I'm vaguely aware that there are these professional in-circuit debugger systems. There's a standard called JTAG that I'm aware exists. But I've never used these myself. My impression, which might be a naive impression, is that they're probably expensive, hard to set up, and probably Windows-only systems. So, I never really looked into them.

So, what that means is, in general, when I'm developing for an embedded context, I'm using a much more naive debugging methodology. Now, I tend to like to write my software using lots of unit tests, making sure that I have a really good mental model of everything that's going on. But when I'm developing for embedded, I often start working on a project like this meme shows: my code doesn't work, and I have no idea why; then I changed something about it and it works, and I have no idea why. And it takes much longer when developing embedded for me to really understand what it is that's making it work and not work.

A third option that I've recently started looking at for debugging in an embedded context is simulating. Because these embedded devices are so limited, it's actually quite easy to run a program on your desktop computer that simulates the entirety of the embedded device. There's a wonderful one for AVR, an open source project called simavr. And it allows you to run a simulation of your AVR program. And it can produce these trace files as output, which you can load into GTKWave or other trace visualizers. And here we can see a trace from a program I was simulating a few weeks ago, running on an AVR device.

Something else that's interesting about simavr: we can use the trace files as inputs. The trace files can represent memory inside the processor, but they can also represent the state of pins on our chips. So, for instance, my ATtiny 85 that I love has 8 pins. Two of them are power. One of them is a reset pin. So, we have 5 general purpose pins to use. And so, you can see the state of those five pins. And if one of them is an input, you could provide a trace file that has those inputs.

All right. So, that's the debugging and observability. Somewhat related to that is that in an embedded context, your compiler is a cross compiler. Normally, you're running the Rust compiler on the same device that the program you're compiling runs on. But in an embedded context, you compile it on a host machine, like your desktop computer, and then you send that to the embedded device and run it on the embedded device. And there's a whole host of nuance to this. But it leads to things being a little trickier.

Okay. So, let's get back to my project. It's a simple real-world example for a button handling library. The details are not relevant, I'm not going to get into it. But this is the button handling library that I'm writing an example for. And the thing to note here is that it uses embedded-HAL, a hardware abstraction layer for the embedded context. We can write code for an embedded context that doesn't need to know the details of what hardware it will run on.
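The idea, roughly sketched (this is not the speaker's library, just an illustration of coding against embedded-hal 0.2 traits): logic written against traits like `InputPin` and `OutputPin` compiles unchanged for AVR, ARM, or any other chip whose HAL crate provides pins implementing them.

```rust
use embedded_hal::digital::v2::{InputPin, OutputPin};

// Hardware-agnostic logic: nothing here names a concrete chip.
// Any pin types implementing the embedded-hal traits will do.
fn blink_while_pressed<B, L>(button: &B, led: &mut L)
where
    B: InputPin,
    L: OutputPin,
{
    // The button wiring is assumed active-low here.
    if button.is_low().unwrap_or(false) {
        let _ = led.set_high(); // LED on while held
    } else {
        let _ = led.set_low();
    }
}
```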
Okay. So, we'll go back to Blinky. We're wiring up a microcontroller and connecting it to an LED. But because I'm writing this example for a button handling library, I'll add a button. And the idea is that I can write an example where, when I push the button, it will blink the light on and off. All right. So, I take my existing Blinky example, which I've run, and I know that it works. And I add in a little bit of support for the button part. And I send that to my microcontroller, and I try it out. And... nothing. It doesn't work. Okay. It's a very simple example. It's not much code. But still, there's a 95% chance that it's my fault. 95% chance the bug is in the code you just wrote. But in fact I'm using unsafe for this code, because I'm using static mutable variables. So, it's actually more like a 99% chance that the bug's in my code. So, what are some other things that I'm thinking about?

Well, AVR is Tier 3. So, Tier 3 means that it's supported by the Rust compiler, but it's not officially supported: it's not built and tested automatically by the continuous integration server. And critically, as you can see from the docs here, it may not work. So, everything's a bit up in the air when you're running on Tier 3. I'm also, as I mentioned previously, using static mut, static mutable variables. And that's always unsafe. I don't know about you, but one reason I like writing Rust is that I can almost always ignore unsafe code and not have to think about it.

So, I'm not entirely confident that I'm using my unsafe code correctly here. One other thing to keep in mind is that AVR interrupt service routines are explicitly experimental. So, the code that I've written uses an interrupt service routine. This is experimental. There's a tracking issue in Rust that has effectively no progress made on it. And so, that means I have to compile with Rust nightly and add a feature flag, and all of those sort of add confounding factors to the debugging process.

Okay. I said that AVR interrupts are experimental. I said I'm using interrupts. But what is an interrupt? An interrupt is a lot like a function call. But instead of one piece of code calling another function, it's a bit like the world is what's calling your function. And, of course, as we saw in the model previously, the world here is the peripherals. So, we have a function call. We have a function on the left, A, and the function on the right, B. And A does some things and calls into B. And when B is done, it returns back to where A was running. And what does this look like in terms of what we were talking about with the stack? Well, there's this thing called a calling convention. And the calling convention says some of the working registers need to be saved by the function that's doing the calling.

And so, we're gonna do that first. We're gonna save the working registers that we need to save. And then we jump over into the function. And then the first thing inside the function is that we may need to save the other working registers. These are the callee-saved registers. Which registers are saved in step 1 or step 3 is determined by the calling convention. But the important thing to note here is that some are done in 1 and some are done in 3. Okay.

So, let's look at the same thing for an interrupt. We have function A on the left and an interrupt service routine on the right. And function A is just doing some local things.
But the world says, oh, wait. Let's call in to our interrupt service routine. Perhaps the computer finished counting to 20, or perhaps finished counting to 256. Or perhaps we've received a byte on a network interface. But something in the world, something in a peripheral, has determined that we need to service this interrupt. So, we jump into our interrupt service routine. And then when it's done, we return back to wherever it was that we were interrupted.

All right. So, what is different in our calling convention for an interrupt service routine as opposed to a regular function? Well, the first thing is that we're not jumping into a function on step 2, we're jumping into an interrupt service routine. And the second thing is, because the function that's getting interrupted doesn't know that it's going to be interrupted, it can't perform step 1 first. The function that's being interrupted doesn't know that it needs to save the working registers. So, we need to move step 1 down into step 2. And so, now the first thing we do inside our interrupt service routine needs to be to save those working registers that would otherwise be saved by the caller.

Okay. So, I have some code and it's not working, and I think it should. And so, I'm making a bunch of changes to try to get it to work or not work. And some of the things that make the bug appear and disappear: moving my interrupt service routine code into main. Well, this maybe is a clue, but it doesn't provide a lot of info, because we know ISRs are experimental. And interrupts in general make it a bit hard to identify what's going on.

Also, if I change an `#[inline]` annotation to `#[inline(never)]`, that makes the bug disappear. Now, this is curious. I've heard of there being compiler bugs related to inlining. But that would mean it's not my bug, it's not a bug in code I wrote. That implies it's a bug in the compiler. And it's never a bug in the compiler. So, that's very curious. Another thing that can make the bug disappear is adding a map error to unit before I unwrap. So, this is basically: I built a conveyor belt, and any errors coming down, I just throw into the trash. But it's important to note that this error handler never actually gets called. This map error never actually gets called. So, I'm adding code that doesn't get called, and that changes the observable behavior of the program.

All right. Let's dig into that just a bit more. So, here I have an example of my broken code. It's an interrupt. We call it PCINT0 because it's an interrupt on the pin change. If our pin changes value, it will jump into our interrupt. And so, on line 3 we toggle our LED. And then on line 4, we unwrap. Well, why is it that toggle returns a result? This goes back to embedded-HAL. Embedded-HAL has a number of traits that different pieces of hardware can implement. And in this case, we're talking about an output pin trait, and it returns a result because on some platforms, attempting to set an output pin can fail. But not on AVR. On AVR, the error type used for this trait is infallible, it's void. And so, this result can never be the error case. We know statically the result is never the error case. So, the unwrap never actually gets called. Or the error case of the unwrap never gets called.

All right. So, like I said earlier, we can make this work by mapping our error to unit.
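The shape of the three variants, re-sketched for illustration (this is not the talk's exact code: the generic function stands in for the body of the `PCINT0` handler, and only embedded-hal 0.2's `ToggleableOutputPin` trait is assumed):

```rust
use embedded_hal::digital::v2::ToggleableOutputPin;

// Stand-in for the PCINT0 interrupt handler's body. On AVR the
// pin's Error type can never occur, so the unwrap's panic path
// is statically dead code, and yet it changed behavior.
fn pcint0_body<P>(led: &mut P)
where
    P: ToggleableOutputPin,
    P::Error: core::fmt::Debug, // unwrap() needs Debug on the error
{
    // 1. Broken at the time: the program hangs.
    led.toggle().unwrap();

    // 2. Worked: throw the (impossible) error away before unwrapping.
    // led.toggle().map_err(|_| ()).unwrap();

    // 3. Also worked: panic explicitly instead of via unwrap.
    // if led.toggle().is_err() { panic!() }
}
```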
Now, it's not entirely clear why throwing away the original error and replacing it with an empty error would make this broken code work, particularly because I know statically that the error case never actually happens. We can also replace that with a panic. If we explicitly panic, rather than panicking when we unwrap an error case, then it works.

So, I don't, myself, see a semantic difference here between this panic version and the original broken version. So, I'm gonna say: I found a bug. Found a bug in the compiler. We need to minimize our reproduction of the bug. So, we're gonna remove all of our external references. We're gonna remove, of course, the crate that I was writing example code for. We're gonna remove any other crates that I make use of. I'm going to remove references to the core library where possible, because I'm trying to eliminate anything non-essential.

And finally, I'm gonna remove the memory shenanigans around my unsafe static mutable, because I want to eliminate anything that could possibly distract from the bug. And I do this by copying my working version of the code to a file called `a-working`, and the broken version to a file called `a-broken`. And I make the same change to both of them, compile them and send them, and make sure that the working one is still working and the broken one is still broken. And I make incremental changes like that repeatedly until I get to the point where I have what I think is a minimal reproduction. And the minimal difference here looks a lot like my minimal difference before. But I've removed all of the core library code and the crates that were in use. And just by changing a reference to an error into a reference to unit, I can make the bug appear and disappear.

And so, I've confirmed a bug. I have a minimal repro. So, I file a Rust issue. And I sit for a little bit. And nobody comments on it. I guess people have other things to do. So, I say, I'm gonna go ahead and dig into this. Now, I'm vaguely aware that Rust uses LLVM. And I've done some messing around with LLVM in the past. So, I think, let's take a look at the LLVM IR that's generated for this code.

And so, I use this incantation, `RUSTFLAGS="--emit=llvm-ir"`. And the Rust compiler dutifully complies and emits LLVM IR. And it's not important to understand this in detail. But we can see what the broken code's difference is. And the main difference is that we have this one alloca. We have an alloca, and that makes the code fail. What is this alloca that we see? We're reserving space on the stack for a local variable.

So, we said previously, our stack has a stack frame for each function. And if that function has a local variable, we need to reserve space on the stack frame for that local variable. And that's what an LLVM alloca instruction does. But why does reserving space break my code? Break my example?

So, we need to keep digging. So, let's take a look at some assembler. Assembler is a textual representation of the machine code we were talking about earlier. It's as close as you can get to really understanding exactly what the machine sees without reading the binary itself. And the Rust compiler has a flag to emit assembler as well, `--emit=asm`, which looks similar to the LLVM flag. We ask it to emit assembler, and the Rust compiler dutifully complies, and we get this code, where the broken version is significantly different from the working one. There are a lot more significant differences this time.

And they come in three main sections.
So, there's this first section, where we push some things onto the stack and do some in and out; we'll dig into exactly what this is doing a little bit later. And then a second and third section with pops and an out. Let's walk through this. But first we need a little bit more information on how to read this AVR assembler. So, the push statement, like I alluded to a moment ago: we're pushing a register value onto the stack. We take the value in the register and put it on the stack to save it for later. And when we want to get it back, we pop it. That takes it off of the stack and puts it back in our register. We have an in operation, which will take a value from one of the special registers and put it into a general purpose register. And we have an out command that will take a value from a general register and put it into a special register. So, push and pop take register values to and from the stack. And in and out take register values to and from special registers, for the purpose of this talk. They do other things too.

We can also clear a register. And we can disable interrupts with the CLI instruction. So, disabling interrupts tells the machine to not interrupt us, so that we can run some code without being interrupted. We also call that clearing the interrupt flag. And it's worth noting that on AVR that interrupt flag is in the status register we were talking about earlier. All right. One other important concept before we dig in here. We have a prologue and an epilogue for every function. And these bookend the body of the function. And they provide that calling convention that I described earlier.

And it's important that these fragments mirror each other, because they tend to use the stack to implement their saving and restoring of registers. They need to mirror each other. Well, what exactly does it mean to mirror each other? Well, here we see the prologue and the epilogue of our working code. All right. So, let's read this from the outside in. So, we're gonna start with a push and pop on register zero. If we're starting from the top of the function, on line 2 we push register zero onto our stack.

And then we push register 1 onto the stack. And then we're going to perform this sequence. So, on line 4, the constant 63 refers to the special register, the status register. So, we're going to read in the status register, and we're gonna push it onto the stack. So, now our stack has register zero's prior value, register 1's prior value and the status register's prior value. And then we push register 24 onto the stack. All right. The reason that 24 is down below, and r0, r1 and the status register are up above, is because registers zero and 1 are the caller-saved registers. And the only reason we are saving them here is because we're in an interrupt. If we were in a regular function, this prologue would start with line 7.

Okay. And then we run lines 8 through 10, which are the body of the function. And then we perform the epilogue. We pop 24 off of the stack. We pop the status value off of the stack, and we use an out command to put it back in. Now our status special register has its prior value. And then we pop register 1 and register zero, such that at the end of this interrupt we have now restored all of our registers, our special and general registers, and we can return. All right.
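Before looking at the broken version, here is a toy model of why the prologue and epilogue must mirror each other (a sketch for this transcript, run on the host, not AVR code): the stack is last-in, first-out, so restores have to happen in exactly the reverse order of the saves.

```rust
// Toy model of the working interrupt's save/restore symmetry.
fn interrupt_save_restore() {
    let mut stack: Vec<u8> = Vec::new();
    let (r0, r1, sreg, r24) = (10u8, 11, 12, 13);

    // Prologue: push r0, r1, SREG, then r24.
    for value in [r0, r1, sreg, r24] {
        stack.push(value);
    }

    // ... body of the interrupt service routine ...

    // Epilogue: pop in exactly the reverse order, or the wrong
    // values end up back in the wrong registers.
    assert_eq!(stack.pop(), Some(r24));
    assert_eq!(stack.pop(), Some(sreg));
    assert_eq!(stack.pop(), Some(r1));
    assert_eq!(stack.pop(), Some(r0));
}
```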
What's different about the broken code? Well, on the broken code, we have the same sequence at the start of the prologue: we push register zero, 1, the status register and 24. And then we have a few more callee-saved registers that we'll push onto the stack: 25, 28, 29. Because our broken version of the interrupt service routine clobbers three additional registers. Okay. And then towards the end, we'll pop registers 29, 28 and 25 off the stack.

And finally, we have the epilogue from our working code. And this epilogue pops 24, it pops our status value, and it pops registers 1 and zero. But note that sequence is interrupted by this other sequence. So, that's a bit mysterious. And we note that this is the sequence to adjust the frame pointer. So, before we walk through that, let's walk through the corresponding sequence from the prologue.

So, first we're going to read in from 61 and 62, and we're gonna store that in registers 28 and 29. We said previously that we clobber 28 and 29; here is where we do that. 61 and 62 are the addresses of the special registers for the frame pointer. So, this is what we're supposed to do here: we read the frame pointer into registers 28 and 29. And then we subtract 1 from it on line 13. We subtract 1 from the frame pointer. And then we go ahead and we send out that updated version of the frame pointer. So, subtracting 1 is what's allocating space on the stack. And then on 16 and 18 we are going to put our updated version of the frame pointer into the frame pointer special register.

All right. We note that little sequence is interrupted by this section: 14, 15 and 17. And this is a miniature version of saving and restoring the status register. So, on 14 we save the status register into register zero. On 15, we clear interrupts, so that we can perform our update of the frame pointer without being interrupted. And then on 17, we restore the status register, restoring the value of the interrupt flag.

Something curious to note: 17 comes before 18, because you get one extra instruction for free when you re-enable interrupts. All right. And then what is it supposed to do to restore the frame pointer in the epilogue? Well, we'll go ahead and add one to our updated frame pointer in registers 28 and 29. And that's going to restore that value to what it was prior to entering our interrupt service routine.

And then we'll output that value into our frame pointer. And at the end, our frame pointer will have its original value. And you can see, we have the same little status register save and interrupt clearing.

All right. That's what it's supposed to do. But we note that we break symmetry here. In the prologue, we push 28 and 29 and then we read in our frame pointer. In the epilogue, we pop 28 and 29 and then we send out to our frame pointer. So, we have a push and an in, followed by a pop and an out. But if this were symmetric, if this mirrored correctly, it should be push, in... out, pop. The epilogue is in the wrong order. Let's see what that actually means. So, what is it actually doing? Well, as we saw previously, first we push the 28 and 29 registers onto the stack. And we read in the frame pointer on lines 11 and 12, and subtract 1 from it, and send that back out to the frame pointer.

So, our prologue is fine. But then in our epilogue, well, first, on 22 and 23, we pop the saved values back into registers 28 and 29. And then on line 28 we add one to that value. And then on 31 and 32, we put that value into our frame pointer register.
And we note that's not the prior value of the frame pointer register. That value is a completely unrelated value, based on the previous value of some unrelated registers. So, we have now confirmed we have a bug in LLVM. So, I file an issue. And here is a screenshot of the issue in LLVM's bug repository.

And I sit on it for a bit. And I wonder, who's gonna fix it? Well, Hermes Conrad, one of my favorite characters from Futurama: if you want it, do it.

I'm going to breeze through this. Don't get overwhelmed. This is C++ code, but we're mostly concerned with the comments. So, we see that we have special epilogue code with registers 1, zero and the status register. That sounds familiar. And then we see this early exit if there's no need to restore the frame pointer. And I recall: if we don't need to restore a frame pointer, the code works. If we do need to restore the frame pointer, the code doesn't work. So, this triggers something in my mind.

And then we see that we're gonna skip the callee-save pop instructions and then insert our frame pointer restore code. All right. Let's match this up quickly to what we saw in our assembler. We saw the special epilogue code emitted here. And we see that's the same as this special epilogue, restoring the status register and registers 1 and zero. And we see we restore the frame pointer by doing this arithmetic. And that matches up to the sequence we walked through a few minutes ago.

So, now that gets to this bit in the middle. This question of: where do we insert the frame pointer restoration? Well, we're gonna do a loop. Here MBBI starts at the end of the function, and as long as we haven't reached the beginning of the function, we step backwards through the function. And we check: if the op code, if the instruction, is a pop, then we continue. If it's not a pop, then we break out of our loop. Well, what does that look like in our broken code? Here's the broken code before we insert our frame pointer restoration. And we start on 29. That's a pop, so we keep going. We see 28, that's still a pop, keep going. We see 27. 27 is not a pop. So, we insert our frame pointer restoration code there.

And that's what leads to our frame pointer being restored later than it should be. And we see lines 22 and 23 really need to be after the frame pointer restoration. So, I can go ahead and make a fix now that I've figured it out. It's actually quite straightforward once I worked through all of that. I pull out a function to restore the status register from this special epilogue code. And in the case of an early exit, restore the status register then. And otherwise, restore the status register at the very end.

And I contribute that to LLVM. First write the fix. But, oh, I probably need to write a test to make sure that it works. And before that, I need to compile LLVM, which is itself the subject of perhaps a full talk. And then I submit the patch to LLVM. Here is a screenshot of the Phabricator interface that LLVM uses. And fortunately Dylan McKay had the time to review my patch and committed it. And so, I appreciate that. Thanks again, Dylan.

So, it's fixed! The bug has been fixed in LLVM, and now I want to contribute it to Rust. So, Rust keeps a fork of LLVM. So, we cherry-pick the fix into that fork and then need to update the Rust compiler. And after a couple PRs get landed, finally the Rust bug has been fixed.

Hooray! All right.
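To make the failure mode concrete, here is the gist of that backward scan, re-sketched in Rust for illustration only (the real code is C++ in LLVM's AVR frame lowering, and all of these names are invented):

```rust
// Instruction kinds that matter for the scan (illustrative only).
#[derive(PartialEq)]
enum Insn {
    Pop,   // pop a register off the stack
    Out,   // write a special register, e.g. restoring SREG
    Other, // anything else
}

// Walk backwards from the end of the epilogue, skipping pops, and
// return the index where the frame-pointer restore gets inserted.
fn restore_insert_index(epilogue: &[Insn]) -> usize {
    let mut i = epilogue.len();
    while i > 0 && epilogue[i - 1] == Insn::Pop {
        i -= 1;
    }
    // Bug: the special r0/r1/SREG epilogue contains an `out`, which
    // stops the scan early. The restore therefore lands *after* the
    // earlier pops of r28/r29, so it reads registers that have
    // already been popped (clobbered), exactly the broken ordering
    // seen in the assembler.
    i
}
```

The fix described in the talk amounts to pulling the status-register restore out of that special epilogue, so the scan, and with it the frame-pointer restore, ends up on the right side of the callee-save pops.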
So, what are my next steps? There are several other outstanding AVR issues, including, as you can see, several that relate to AVR interrupts. And now that I've worked through stepping through the assembler that's generated, and working through the code that generates that assembler, I feel a little bit of a responsibility to take a look at these bugs. I haven't had time to yet. But I hope to soon.

Well, that was a whirlwind. But thank you very much for listening, and for your patience with my technical difficulties at the beginning. Hopefully we have a couple minutes to take a few questions, if anyone would like to hear anything more. Thank you.


**Inaki:**
Andrew. Thank you for that incredible talk, which can only be called epic.

**Andrew:**
Thanks.

**Inaki:**
That was an epic. Wow. So, AVR may be Tier 3, but your patience, man, is like God tier. Not only all you've done, but also handling all the tech issues and talking through them, you know, I can't believe... so, thank you so much for your patience, actually. This is gravy for us. So, I do have a few questions.
First, AVR is quite a new target, right?

**Andrew:**
That's right.

**Inaki:**
How are you finding it, and have you tried things like STM32 targets?

**Andrew:**
I have. I've messed around a little bit with the other embedded targets. I haven't done the STM32. For anyone in the audience not familiar, that's the target that the Rust embedded intro book talks through, using a board called the Discovery board. On my list of too many things to do, I have the goal of picking up one of those Discovery boards and working through that. But I haven't had the opportunity to do that. Yeah, I have done a little bit of ARM development. ARM is another embedded platform. I've done a little bit with Rust.
But honestly, I've done very little Rust embedded development at this point. Most of my embedded experience is with C. Programming with C I've never liked, and it's always been frustrating. I'm very grateful that the Rust embedded community and the Rust compiler contributors are working so hard to make Rust a viable option for embedded. Because there's a lot of potential there.

**Inaki:**
Absolutely. Do you know of a good reference for AVR assembly?

**Andrew:**
The AVR documentation generally is pretty good. Admittedly, it's often hard to find the right PDFs. But once you find them, they tend to be pretty solid. AVR is actually a very limited platform. It's much more limited than, for instance, ARM. And so, the assembler reference is quite complete. So, if you search for the AVR assembler reference guide... and I could drop a link to it in the slides when I release those. But that guide is quite complete.
The other resource that I found to be incredibly helpful is a forum called AVR Freaks. This is a forum where a bunch of people who love programming for AVR answer all kinds of questions. So, almost any question that I have has already been answered on that forum, in one post or another. And so, coming up with search results on AVR Freaks is fantastic. And then the third resource I would suggest is the avr-libc documentation, which also contains a lot of nuggets that are very useful for illuminating how things actually work on the AVR platform.

**Inaki:**
Cool. This is an interesting question.
Do you think that making types of the standard library less dependent on the global allocator would make your job easier in any way?

**Andrew:**
Certainly, yeah. That's a great point. I have not yet experimented with using the allocator. So, previously in my talk I was talking about how you don't have access to the standard library, but you do have access to the core library. And there's a middle ground there, where you have the alloc library that can give you access to collections like vectors and so forth. And you can theoretically compile that for an embedded context. I don't do that. I work on the ATtinys, which are extremely limited, and it's almost always worth doing analysis ahead of time to make sure that you don't run out of memory. And that analysis is significantly harder to do if you're using the heap.
So, in my programs on embedded, almost never do I think about reaching for the heap. Because it seems like it's going to create a lot more problems than it would solve. I think on other embedded devices it probably is much more relevant. Particularly, for instance, ARM, obviously. On more capable platforms, I think using an allocator makes a lot of sense. Also, not related to AVR, but the other context I do a lot of my development in is high performance web development. And that's a place where being able to use, for instance, a slab allocator on a per-object basis would be incredibly valuable.
But, again, you know, I'm getting into the weeds for something not related to this talk. I do think the user experience benefit of having the standard library based on a global allocator probably outweighs the technical benefits for these niche use cases.

**Inaki:**
Cool, cool. There are a few more questions, but we're really running out of time. So, maybe you could answer them in chat or later on. So, once again, thank you so much, Andrew, for that epic talk.

**Andrew:**
Yeah, glad to. Thanks, everyone. Have a good day.

diff --git a/2020-global/talks/03_LATAM/06-Andrew-Dona-Couch.txt b/2020-global/talks/03_LATAM/06-Andrew-Dona-Couch.txt deleted file mode 100644 index ac2a984..0000000 --- a/2020-global/talks/03_LATAM/06-Andrew-Dona-Couch.txt +++ /dev/null
And previously mentioned, this talk is about my experiences finding and fixing a bug within the Rust compiler. Within the Rust compiler, within a library used by Rust called LLVM. And this bug only comes up on the AVR platform. - So, what's AVR? AVR is a series of 8 bit microcontrollers. These are tiny computers. They are an older series. They're cheap and very hacker friendly. And support for AVR was mainlined into the Rust compiler this past summer. Into Rust andLLVM thanks to a significant amount of effort by a gentleman, Dylan McKay. And you can see the AVR line ranges here. [lost audio]... - >> Sorry for the interruption. We seem to have a sound problem. - Inaki: Hi, everyone. Sorry. Give us a minute while we clear this up. - Andrew: Like this, we can take a complex system like this self operating napkin from Rube Goldberg. And... [no sound] - >> Sorry, this fix hasn't worked. We're working on it. - Andrew: Testing. - >> Yes, this works somewhat. Nice debugging. - Andrew: Testing. - >> So, this might be a good moment to mention the snake game again. So, while you wait, you can go down and then the button down on the far left is the snake game. So, you can play multiplayer snake with each other while we try to fix the setup here. - Andrew: Okay. Testing. - >> Yes, Andrew, we can hear you. - Andrew: My apologies. I... - >> No problem. - Andrew: My cat chewed through my power cord and so, it was not charging. All right. - >> The risks of feline. - Andrew: Right. - >> Would you mind going back a couple of slides because we lost you quite some time ago. - Andrew: Sure. Yep! - >> Yes. All right. We can't hear you at the moment. - Andrew: How about now? My apologies for all of that. All right. So, I'm at a strange angle at the moment. Hopefully we can make this work. So, just to briefly go over these ground rules again, I'm going to try to show everybody respect today. Hopefully I succeed at that. We'll also be talking about things that I'm relatively an amateur about. And so, bear with me. And most importantly, I'm going to be oversimplifying because I would like to cover a lot of material. And due to those technical issues, we're, of course, now a few minutes behind from our... from our plan. - All right. So, with that in mind, what are these simplified models that I'm going to be talking about? The first example is a black box model. We'll be talking about some complex system. Take this Rube Goldberg self operating napkin, and we're going to put this system inside of a box. And the reason we do that it because we want to analyze this complex system, no terms of the details of the inside, the implementation, but rather in terms of its inputs and its outputs. Its relationship to the outside world. That's a bit abstract. So, let's look at a couple examples. - We can think of a computer as being a box. And the input to our computer is some kind of program. And the output will say... hand waving a bit here... but that we have two outputs. Computation and effects on the world. All right. This is a technical conference. So, let's look inside the box and we find that there are four additional boxes. There's... we'll talk about each of these four in turn a little bit later. But right now I'll give you just a brief sense of what they... what they are. So, on the left we see we have a registered... we have a set of registers and some memory. And on the right our two boxes are called processer and peripherals. 
- So, just to get a vague idea what have each of these are, we'll use the metaphor of an old school paper pushing office. And you have someone sitting at a desk moving paperwork around. And that person is the brains of the operation and that's analogous to the promoter. The desk in front of them, that's covered in the paper work they're currently working on is something like the registers. The memory is a bit like a file cabinet that holds the paperwork that they need, they need to have access to, but they don't need it immediately at hand. And finally, the peripherals in this metaphor are something like an intercom sitting on top of the desk where if we need to connect with the outside world, if we need to interface with the outside world, ask for a cup of coffee, we can push the button on the intercom and request support. - All right. So, let's connect our arrows from earlier. We said that the computation comes from... we said the process is the brains of the computer. The computation can be thought of as coming from there. The peripherals are the interface to the outside world. We can think of our effects on the world as coming on those peripherals. And then program will... we will load that into memory and then we will go ahead and start executing from there. - All right. Let's look at another example of a black box. This one is a compiler. And we can think of a compiler as a box. And the program is the input to the compiler. And since this is a Rust conference, our compiler presumably is rustc, the Rust compiler. And our program is the Rust source code for... the Rust source code for our program. All right. - And we can look at the output of our compiler. And now we've reached a really interesting point. What is the output of this compiler? So, again, let's not define it based on its intrinsic properties, but rather use another simplifying model. So, here we'll define this output by how we use it. And we'll see that we use it by passing it as the input to our computer box from previous. And we already said that input is a program. So, we've now determined that this compiler is a box. - The compiler is something that takes a program as input and produces a program as output. Now, this is a bit confusing, so, we'll be more specific about this. We have two different representations of the same program. And the input representation, we already called it source code. And the output representation is the one that's meant for the machine, the computer machine. And so, we call it machine code. All right. - So, again, this is a technical conference. Let's look inside this compiler. And we see two more boxes. We have a rustc frontend and an LLVM backend. Now, rustc we know is the Rust compiler. So, that term is not new. But LLVM might be new. So, what is this LLVM thing? LLVM is a library that it contains all of the common parts of compilers. Things that are shared among compilers no matter what language they're for. Optimizations, code generation for the machine code and so forth. As suggested by that previous slide, our back end for the Rust compiler is LLVM. But the LLVM library serves as the back end for many other languages, including Julia and Swift and many others. - And notably, the... the Clang clang compiler, which is part of the project is the C compiler on the Macintosh. Okay. So, looking at our compiler model again. And let's once again connect our arrows for our input and output. And extend them inwards and connect the source code input to the rustc front end. 
And the source code is the input to our front end. And the output of our LLVM backend is our machine code. - And that leaves one big gap. What is it that's between these two boxes? Well, it's another representation of a program. And because it is intermediate between the source and machine code, the LLVM authors have decided to call it intermediate representation. It's frequently abbreviated IR. So, we'll see frequently the term LLVM IR. That's this common currency between the rustc front end or the front end for any compiler that might be using LLVM, and the LLVM back end. - All right. It's about time that I tell you about my project. So, I was working on something that is more or less the hello, world, of embedded. That the "Hello, World" of embedded is sometimes called Blinky. And the idea here is you take a microcontroller. So, here we have a bread board with a ATtiny 85 microcontroller. That's a particular AVR microcontroller. And we will go ahead and connect it to an LED. An LED is like a tiny light. - So, we have our tiny computer wired up to our tiny light and we write a program for the microcontroller. And that program will make the light turn on and off. And on and off. And on and off. And in embedded, you tend to write things as an infinite loop. So, we're going to turn on and off our light forever. - Well, I expect that, like me for most of you, the bulk of your software development experience is on the desktop or server. Perhaps modern mobile. Or a web context. And there are a few important things to know about writing software for an embedded context that make it slightly different from those other environments. - So, the most important difference is that in an embedded context you have extremely limited resources. Now, we sort of generally probably know what that means in terms of memory, processing power and so forth. But let's look at, again, back at our computer model and see what... what is this limitation in resources imply for each of these components. So, first we'll look at the processer. And inside our processer, we have, again, hand waving quite a bit here some math related stuff that's going to be doing arithmetic perhaps. And some program control stuff that lets us do loops and conditional jumps and moving around in our program. And the big difference in embedded is that it's slower. And in some cases, it's less capable. For instance, the AVR devices that I'm generally developing for don't have floating point numbers. - So, all arithmetic has to be done on integers. All right? Let's look at another component of our computer model, the peripherals. So, I said previously that the peripherals are the interface to the outside world. and let's look at a couple examples. We can consider a video streaming application where you might need to have access to a networking peripheral that you can use to fetch a video from some service. And then once you have the video, you want to show it to a user and so you would use some sort of vastly improve hardware which would be provided by a peripheral. Now, in an embedded context, you certainly could have a networking peripheral, although it's not universal. You might have a video peripheral, although it's somewhat rare. - You tend to have access to peripherals that are much more limited. For instance, you could ask the computer to count to a number. And then let you know when it gets to that number. And this is a bit like when my kids are in the other room and I'm working on the dishes. 
And my kids want me to come in and play with them. And I need them to finish up what I'm doing and ask them to count to 20 and let me know when I get to 20. And we can do the same thing for our microcontroller for our little computer if we need to delay and perform some computation at a later point in time. We can ask the computer to count to a number for us andlet us know when it gets to it. - All right. Let's take a look at the memory real briefly. So, we're going to be talking a bit more about how the stack will work. But let's sort of get a vague idea of what it is inside this memory. So, I have this little photo collage that I made up. And hopefully it is not completely misleading. But we have three broadly speaking... three parts of our memory. So, as we said previously, we're going to put our program into memory. We're gonna load program into memory. And so, that's... we see at the bottom, the brick wall at the bottom is our program that's been loaded. And any static variables, any static... static variables in the program would be there. And then we're going to have a heap potentially. And this is a bit like a program. This is where the program stores things that it may need later, but it doesn't need right now. It can throw them on the heap and then go back and look for them later. - And finally, we have a stack. And there are really two models that might be useful for thinking about the stack. So, the first model is a bit... is more of an operational model. And this is where you are... you're thinking of the stack as sort of nested context for the program. So, we have... perhaps a pile of file folders on our desk. Now, we're departing a bit from our desk metaphor earlier. Because here the desk is not the registers. But using a slightly different metaphor. - So, I'll have... perhaps I'm working on a client. I have my... the file folder open. And I have some pages. And then my boss comes in and is holding the file folder for a client that's more critical. And we open that up and set it down on top of the pile of papers on my desk. And we look at that one. And then the boss' boss comes in with another client's file folder and we open that up and put it at the very top of the stack. And then when we finish, the boss' boss' client, we close that file and take it away. And resume work on my boss' client. And then when we finish that up, do that work, and my boss leaves and then I can resume doing what I was doing before I got interrupted. - And that's sort of the nested context view of the stack. The somewhat more physical view of the stack I like to think of as a stalactite. Growing from the roof of a cavern. And the reason I think of it this way is that in most... on most computers, the stack begins at the very top of the memory... memory. The very highest address and grows downwards. Grows down into memory. And so, it's a bit like a stalactite hanging from the roof of a cavern. - And yeah. So, it's a bit like a stalactite hanging from the roof of a cavern where as we add additional context, our stalactite grows down. And then as we resume previous context, we shrink it back up. - And the most important difference for the memory in an embedded context is that it's much... that there's... that it's significantly restricted. You have much less memory in most embedded devices. For instance, the ATtiny 10 that you saw on the very first picture of the AVR slides, that has 1 kilobyte of program memory. There are similar devices that have only 512 bytes of program memory. 
You must write your program to fit in 512 bytes, and that's a significant limitation when developing.
 - For completeness, let's look inside the registers and talk about what the types of registers are. The main class of registers are the general purpose registers. This is where we store the state that we're currently working on, as I mentioned previously in the desk metaphor. We can think of the general purpose registers as a pile of Etch A Sketches on a table. Any time we need to ask the processor to do work for us, we have to write the numbers down on the Etch A Sketches and give them to the processor. So, if we want to provide some numbers for the processor to work on, we go to our table, pick up an Etch A Sketch, and ideally find a blank one.
 - But if we don't find a blank one, we pick up one that's in use, write down somewhere whatever number is on it, and erase it. Then we put the value of our number on it. We need to write the old number down because we were probably using it previously, and when we're done with whatever we're doing immediately, we want to restore that Etch A Sketch to its previous state. So, we want to know what was on it before we erased it, so that we can put it back.
 - This is the process that's referred to as clobbering the registers. When you use a register that's already in use, you clobber it. And if it's important to maintain what was in it previously, then you have to save and restore that register. All right. We also have a few special purpose registers. There are lots of these, but there are only two that we'll talk about today.
 - The first one is called the status register. This tells you what the processor has been up to recently. For instance, if you do some math and the result is zero, the status register will tell you that the last math you did resulted in zero. The other important special purpose register we'll talk about later today is the frame pointer. The frame pointer always points at the current function's stack frame; it points at where on the stack we can find the local variables for the current function.
 - And like I said, there are many other special purpose registers that we won't be talking about today. All right. So, that was the first difference, and we took the opportunity to expand our computer model a bit along the way. But remember: the first significant difference when developing embedded is that we have these very tight resource constraints. The second difference is that in an embedded context we're working in a no_std environment. What this means is that you don't have access to the standard library. You do have access to the Core library, the Rust Core library, which is the part of the standard library that doesn't require memory allocation and doesn't require platform support. So, with the Core library we don't have access to collections like a vector or a HashMap, and we don't have access to operating system things like a file system.
 - But we still have a lot of the basic functionality. In addition to the no_std context, we are frequently working in a panic-abort model, because unwinding the stack on panic is very expensive in terms of memory use and processor time. So, in general, when developing for an embedded context, we abort on panic rather than unwinding the stack.
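As a concrete illustration of that no_std, abort-on-panic setup, here is a minimal sketch; the spin loop is a placeholder for a target-specific abort:

```rust
#![no_std]

use core::panic::PanicInfo;

// With no_std there is no unwinding machinery, so we supply our own panic
// handler. It never returns; a real device might reset the chip instead.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```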
This leads to the third difference, which is limitations on debugging and observability. If we can't unwind on panic, then when a panic occurs we have a lot fewer clues about what caused it.
 - But there are other debugging and observability limitations. I'm vaguely aware that there are these professional in-circuit debugger systems; there's a standard called JTAG that I know exists. But I've never used these myself. My impression, which might be a naive impression, is that they're probably expensive, hard to set up, and probably Windows-only systems. So, I never really looked into them.
 - What that means is that, in general, when I'm developing for an embedded context, I'm using a much more naive debugging methodology. Now, I tend to like writing my software with lots of unit tests, making sure that I have a really good mental model of everything that's going on. But when I'm developing for embedded, I often start working on a project like this meme shows: my code doesn't work, I have no idea why, and I change something about it.
 - And then it works, and I have no idea why. When developing embedded, it takes much longer for me to really understand what it is that's making things work and not work. A third option that I've recently started looking at for debugging in an embedded context is simulation. Because these embedded devices are so limited, it's actually quite easy to run a program on your desktop computer that simulates the entirety of the embedded device. There's a wonderful open source simulator for AVR called simavr.
 - It allows you to run a simulation of your AVR program, and it can produce trace files as output, which you can load into GTKWave or other trace visualizers. Here we can see a trace from a program I was simulating a few weeks ago, running on a simulated AVR device.
 - Something else that's interesting about simavr is that we can use trace files as inputs. The trace files can represent memory inside the processor, but they can also represent the state of the pins on our chip. For instance, my beloved ATtiny85 has 8 pins. Two of them are power and one of them is a reset pin, so we have 5 general purpose pins to use. You can see the state of those five pins, and if one of them is an input, you can provide a trace file that supplies those inputs. All right. So, that's debugging and observability. Somewhat related to that: in an embedded context, your compiler is a cross compiler. Normally, you run the Rust compiler on the same kind of device that the compiled program will run on. But in an embedded context, you compile on a host machine, like your desktop computer, and then you send the result to the embedded device and run it there. And there's a whole host of nuance to this.
 - But it leads to things being a little trickier. Okay. So, let's get back to my project. It's a real-world example for a simple button handling library. The details of the library are not relevant, so I'm not going to get into them. But the thing to note here is that it uses embedded-hal, a hardware abstraction layer for the embedded context. With it, we can write code for an embedded context that doesn't need to know the details of what hardware it will run on. Okay.
 - So, we go back to Blinky: we wire up a microcontroller and connect it to an LED.
But because I'm writing this example for a button handling library, I'll add a button. The idea is that I can write an example where pushing the button blinks the light on and off. All right. So, I take my existing Blinky example, which I've run and know works, and I add in a little bit of support for the button part. I send that to my microcontroller and try it out. And... nothing. It doesn't work. Okay. It's a very simple example, it's not much code, but still, there's a 95% chance that it's my fault: 95% of the time, the bug is in the code you just wrote. And in fact I'm using unsafe in this code, because I'm using static mutable variables, so it's actually more like a 99% chance that the bug is in my code. So, what are some other things I'm thinking about?
 - Well, AVR is Tier 3. Tier 3 means the target is supported by the Rust compiler, but it's not built and tested automatically by the continuous integration servers. And critically, as you can see from the docs here, it may not work. Everything's a bit up in the air when you're running on Tier 3. I'm also, as I mentioned previously, using static mut, static mutable variables, and that's always unsafe. I don't know about you, but one reason I like writing Rust is that I can almost always ignore unsafe code and not have to think about it.
 - So, I'm not entirely confident that I'm using my unsafe code correctly here. One other thing to keep in mind is that AVR interrupt service routines are explicitly experimental, and the code I've written uses an interrupt service routine. There's a tracking issue in Rust with effectively no progress made on it. That means I have to compile with Rust nightly and add a feature flag, and all of that adds confounding factors to the debugging process.
 - Okay. I said that AVR interrupts are experimental, and I said I'm using interrupts. But what is an interrupt? An interrupt is a lot like a function call, but instead of one piece of code calling another function, it's a bit like the world is what's calling your function. And, of course, as we saw in the model previously, the world here is the peripherals. So, consider a function call: we have a function on the left, A, and a function on the right, B. A does some things and calls into B, and when B is done, it returns back to where A was running. What does this look like in terms of what we were talking about with the stack? Well, there's this thing called a calling convention, and the calling convention says that some of the working registers need to be saved by the function that is doing the calling.
 - So, we do that first: step 1, we save the working registers that we need to save. Then, step 2, we jump over into the function. And the first thing inside the function, step 3, is that we may need to save the other working registers; these are the callee-saved registers. Which registers are saved in step 1 and which in step 3 is determined by the calling convention, but the important thing to note is that some are saved in step 1 and some in step 3. Okay.
 - So, let's look at the same thing for an interrupt. We have function A on the left and an interrupt service routine on the right. Function A is just doing some local things, but the world says: oh, wait, let's call into our interrupt service routine. Perhaps the computer finished counting to 20. Or perhaps it finished counting to 256.
And so, now we've jumped into our interrupt service routine. Or perhaps we've received a byte on a network interface; either way, something in the world, something in a peripheral, has determined that we need to service this interrupt. So, we go into the interrupt service routine, and when it's done, we return back to wherever it was that we were interrupted.
 - All right. So, what's different about the calling convention for an interrupt service routine as opposed to a regular function? The first thing is that in step 2 we're not jumping into a function, we're jumping into an interrupt service routine. And the second thing is that, because the function getting interrupted doesn't know it's going to be interrupted, it can't perform step 1 first. The function being interrupted doesn't know that it needs to save the working registers. So, we need to move step 1 down into step 2: the first thing we do inside our interrupt service routine now needs to be saving those working registers that would otherwise be saved by the caller.
 - Okay. So, I have some code that's not working, and I think it should. And I'm making a bunch of changes to try to get it to work or not work. One of the things that makes the bug appear and disappear is moving my interrupt service routine code into main. This is maybe a clue, but it doesn't provide a lot of info, because we know ISRs are experimental, and with interrupts it's generally a bit hard to identify what's going on.
 - Also, if I change an inline annotation to inline(never), that makes the bug disappear. Now, this is curious. I've heard of there being compiler bugs related to inlining. But that would mean it's not my bug, not a bug in code I wrote; it implies a bug in the compiler. And it's never a bug in the compiler. So, that's very curious. Another thing that can make the bug disappear is adding a map_err to unit before I unwrap. This is basically building a conveyor belt where any errors coming down just get thrown into the trash. But it's important to note that this map_err never actually gets called. So, I'm adding code that doesn't get called, and that changes the observable behavior of the program.
 - All right. Let's dig into that just a bit more. Here I have an example of my broken code. It's an interrupt handler; we call it PCINT0 because it's the pin-change interrupt: if our pin changes value, the processor jumps into our interrupt. On line 3 we toggle our LED, and on line 4 we unwrap. Well, why does toggle return a Result? This goes back to embedded-hal. embedded-hal has a number of traits that different pieces of hardware can implement; in this case, we're talking about an output pin trait, and it returns a Result because on some platforms, attempting to set an output pin can fail. But not on AVR. On AVR, the error type used for this trait is Infallible; it's void. So, this Result can never be the error case. We know statically that the Result is never the error case, so the error branch of the unwrap never actually gets executed.
 - All right. So, like I said earlier, we can make this work by mapping our error to unit. Now, it's not entirely clear why throwing away the original error and replacing it with an empty error would make this broken code work, particularly because I know statically that the error case never actually happens.
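For reference, here is a reconstruction of the shape of that broken handler, pieced together from the description above; the `avr_device::interrupt` attribute and the static `LED` handle are my assumptions about the surrounding setup, not the code from the slides:

```rust
// Assumed elsewhere: static mut LED: Option<LedPin> = None; with LedPin
// being the HAL's concrete pin type for this board.
#[avr_device::interrupt(attiny85)]
fn PCINT0() {
    // SAFETY: assumes nothing else touches LED while this ISR can fire.
    let led = unsafe { LED.as_mut().unwrap() };

    // On AVR, toggle() returns Result<(), Infallible>, so the error branch
    // can never run -- and yet this line miscompiled...
    led.toggle().unwrap();

    // ...while this never-executed map_err made the bug disappear:
    // led.toggle().map_err(|_| ()).unwrap();
}
```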
We can also replace that with a panic: if we explicitly panic, rather than panicking through the unwrap's error case, then it works.
 - I don't myself see a semantic difference between this panic version and the original broken version. So, I'm going to say it: I found a bug. I found a bug in the compiler. Now we need to minimize our reproduction of the bug. We remove all of our external references: of course the crate that I was writing the example code for, and any other crates that I make use of. I also remove references to the core library where possible, because I'm trying to eliminate anything non-essential.
 - And finally, I remove the memory shenanigans around my unsafe static mut, because I want to eliminate anything that could possibly distract from the bug. I do this by copying the working version of the code to a file called a-working, and the broken version to a file called a-broken. I make the same change to both of them, compile them, send them to the device, and make sure that the working one still works and the broken one is still broken. I repeat these incremental changes until I have what I think is a minimal reproduction. And the minimal difference looks a lot like my minimal difference from before, but with all of the core library code and third-party crates removed: just by changing a reference to an error type into a reference to unit, I can make the bug appear and disappear.
 - So, I've confirmed a bug, and I have a minimal repro. I file a Rust issue. And I sit on it for a little bit, and nobody comments on it; I guess people have other things to do. So, I say: I'm going to go ahead and dig into this myself. Now, I'm vaguely aware that Rust uses LLVM, and I've done some messing around with LLVM in the past. So, I think: let's take a look at the LLVM IR that's generated for this code.
 - I use the incantation `RUSTFLAGS="--emit=llvm-ir"`, and the Rust compiler dutifully complies and emits LLVM IR. It's not important to understand this in detail, but we can see the broken code's difference, and the main difference is this one alloca. We have an alloca, and that makes the code fail. What is this alloca that we see? It's reserving space on the stack for a local variable.
 - As we said previously, our stack has a stack frame for each function, and if a function has a local variable, we need to reserve space in its stack frame for that local variable. That's what an LLVM alloca instruction does. But why does reserving space break my code? Break my example?
 - We need to keep digging. So, let's take a look at some assembler. Assembler is a textual representation of the machine code we were talking about earlier; it's as close as you can get to understanding exactly what the machine sees without reading the binary itself. The Rust compiler has a flag to emit assembler as well, `--emit=asm`, which looks similar to the LLVM IR flag. We ask it to emit assembler, the Rust compiler dutifully complies, and we get this code, where the differences in the broken version are a lot more significant.
 - They come in three main sections: a first section where we push some things onto the stack and do some ins and outs (we'll dig into exactly what this is doing a little bit later), and then a second and a third section with pops and outs. Let's walk through this.
But first we need a little bit more information about how to read this AVR assembler. The push instruction, like I alluded to a moment ago, pushes a register value onto the stack: we take the value in a register and put it on the stack to save it for later. When we want it back, we pop it, which takes it off the stack and puts it back into our register. We have an in operation, which takes a value from one of the special registers and puts it into a general purpose register, and an out operation, which takes a value from a general register and puts it into a special register. So, push and pop move register values to and from the stack, and in and out move register values to and from special registers. (For the purposes of this talk, anyway; they do other things too.)
 - We can also clear a register, and we can disable interrupts with the CLI instruction. Disabling interrupts tells the machine not to interrupt us, so that we can run some code without being interrupted. We also call that clearing the interrupt flag, and it's worth noting that on AVR that interrupt flag lives in the status register we were talking about earlier. All right. One other important concept before we dig in: every function has a prologue and an epilogue. These bookend the body of the function, and they implement the calling convention I described earlier.
 - It's important that these fragments mirror each other, because they tend to use the stack to implement the saving and restoring of registers. But what exactly does it mean to mirror each other? Here we see the prologue and the epilogue of our working code. Let's read it from the outside in. We start with a push and pop on register 0: at the top of the function, on line 2, we push register 0 onto the stack.
 - Then we push register 1 onto the stack. Then we perform this sequence: on line 4, the constant 63 refers to a special register, the status register. We read in the status register and push it onto the stack. So, now our stack holds register 0's prior value, register 1's prior value, and the status register's prior value. Then we push register 24 onto the stack. The reason that 24 is down below while r0, r1 and the status register are up above is that registers 0 and 1 are caller-saved registers; the only reason we're saving them here is that we're in an interrupt. If this were a regular function, the prologue would start at line 7.
 - Then come lines 8 through 10, which are the body of the function, and then the epilogue. We pop 24 off of the stack. We pop the status value off the stack and use an out instruction to put it back, so now our status register has its prior value. Then we pop register 1 and register 0, such that at the end of this interrupt we have restored all of our registers, special and general, and we can return. All right.
 - What's different about the broken code? The broken code has the same sequence at the start of the prologue: we push registers 0 and 1, the status register, and 24. But then we have a few more callee-saved registers that we push onto the stack: 25, 28 and 29.
Because our broken version of the interrupt service routine clobbers three additional registers. Okay. And then, towards the end, we pop registers 29, 28 and 25 off the stack.
 - And finally, we have the epilogue we know from our working code: it pops 24, pops the status value, and pops registers 1 and 0. But note that this sequence is interrupted by another sequence. That's a bit mysterious. This interloper is the sequence that adjusts the frame pointer. Before we walk through it, let's walk through the corresponding sequence from the prologue.
 - First, we read in from 61 and 62 and store the result in registers 28 and 29. We said previously that we clobber 28 and 29; here is where we do that. 61 and 62 are the addresses of the special registers that hold the frame pointer. So, we read the frame pointer into registers 28 and 29, and then we subtract 1 from it; subtracting 1 is what allocates space on the stack. Then, on lines 16 and 18, we put our updated version of the frame pointer back into the frame pointer special registers.
 - We note that this little sequence is itself interrupted by lines 14, 15 and 17, which are a miniature version of saving and restoring the status register. On 14 we save the status register into register 0. On 15 we clear interrupts, so that we can write out the frame pointer without being interrupted. And on 17 we restore the status register, restoring the value of the interrupt flag.
 - Something curious to note: 17 comes before 18, because you get one extra instruction for free when you re-enable interrupts. All right. And what is the epilogue supposed to do to restore the frame pointer? Well, we add 1 back to our updated frame pointer in registers 28 and 29, which restores that value to what it was prior to entering our interrupt service routine.
 - Then we output that value into our frame pointer, and at the end, our frame pointer has its original value. And you can see we have the same little status-register-and-interrupt-clearing dance.
 - All right. That's what it's supposed to do. But we note that we break symmetry here. In the prologue, we push 28 and 29 and then we read in our frame pointer. In the epilogue, we pop 28 and 29 and then we send out to our frame pointer. So, we have a push and an in, followed by a pop and an out. But if this were symmetric, if this mirrored correctly, it should be: push, in... out, pop. The epilogue is in the wrong order. Let's see what that actually means. What is it actually doing? Well, as we saw previously, first we push the 28 and 29 registers onto the stack, then we read in the frame pointer on lines 11 and 12, subtract 1 from it, and send that back out to the frame pointer.
 - So, our prologue is fine. But then in our epilogue, first, on lines 22 and 23, we pop our saved values back into registers 28 and 29. Then, on line 28, we add 1 to that value. And then, on lines 31 and 32, we put that value into our frame pointer register. And we note that's not the prior value of the frame pointer; it's a completely unrelated value, based on the previous value of some unrelated registers.
So, we have now confirmed that we have a bug in LLVM. I file an issue; here is a screenshot of the issue in LLVM's bug repository.
 - And I sit on it for a bit, and I wonder: who's going to fix it? Well, in the words of Hermes Conrad, one of my favorite characters from Futurama: if you want it, do it.
 - I'm going to breeze through this; don't get overwhelmed. This is C++ code, but we're mostly concerned with the comments. We see that we have special epilogue code dealing with registers 1 and 0 and the status register. That sounds familiar. Then we see an early exit if there's no need to restore the frame pointer. And I recall: if we don't need to restore the frame pointer, the code works; if we do need to restore it, the code doesn't work. So, this triggers something in my mind.
 - Then we see that we're going to skip the callee-save pop instructions and insert our frame pointer restore code. All right. Let's quickly match this up to what we saw in our assembler. Here is where we emit the special epilogue code, and that's the same as the special epilogue restoring the status register and registers 1 and 0. And we see that we restore the frame pointer by doing this arithmetic, which matches the sequence we walked through a few minutes ago.
 - So, that brings us to this bit in the middle: the question of where we insert the frame pointer restoration. Well, we do a loop. Here MBBI starts at the end of the function, and as long as we haven't reached the beginning of the function, we step backwards through it. And we check: if the instruction is a pop, we continue; if it's not a pop, we break out of the loop. So, what does that look like in our broken code? Here's the broken code before we insert the frame pointer restoration. We start on 29: that's a pop, so we keep going. We see 28: still a pop, keep going. We see 27: 27 is not a pop. So, we insert our frame pointer restoration code there.
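The scan just described boils down to something like the following sketch (in Rust rather than LLVM's actual C++, with names invented for illustration):

```rust
// A toy model of the epilogue instructions we care about.
enum Instr {
    Pop(u8), // pop a value off the stack into a register
    Other,   // anything else (outs, arithmetic, ...)
}

// Walk backwards from the end of the epilogue, skipping pop instructions,
// and return the index where the frame pointer restore code gets inserted.
fn restore_insertion_point(epilogue: &[Instr]) -> usize {
    let mut i = epilogue.len();
    while i > 0 && matches!(epilogue[i - 1], Instr::Pop(_)) {
        i -= 1;
    }
    // In the broken epilogue this lands *after* the pops of r28 and r29,
    // so the restore code reads registers that were already overwritten.
    i
}
```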
And that's what leads to our frame pointer being restored later than it should be. We can see that lines 22 and 23 really need to come after the frame pointer restoration. So, now that I've figured it out, I can go ahead and make a fix, and it's actually quite straightforward once I had worked through all of that: I pull the restoring of the status register out of this special epilogue code into its own function; in the case of an early exit, we restore the status register right there, and otherwise we restore it at the very end.
 - And I contribute that to LLVM. First I write the fix. But, oh, I probably need to write a test to make sure that it works. And before that, I need to compile LLVM, which is itself the subject of perhaps a full talk. Then I submit the patch to LLVM; here is a screenshot of the Phabricator interface that LLVM uses. Fortunately, Dylan McKay had the time to review my patch and commit it, and I appreciate that. Thanks again, Dylan.
 - So, it's fixed! The bug has been fixed in LLVM, and now I want to get that fix into Rust. Rust keeps a fork of LLVM, so we cherry-pick the fix into that fork and then update the Rust compiler. And after a couple of PRs land, the Rust bug has finally been fixed.
 - Hooray! All right. So, what are my next steps? There are several other outstanding AVR issues, including, as you can see, several that relate to AVR interrupts. And now that I've worked through stepping through the assembler that's generated, and through the code that generates that assembler, I feel a little bit of a responsibility to take a look at those bugs. I haven't had time to yet, but I hope to soon.
 - Well, that was a whirlwind. Thank you very much for listening, and for your patience with my technical difficulties at the beginning. Hopefully we have a couple of minutes to take a few questions, if anyone would like to hear anything more. Thank you.
 - Inaki: Andrew. Thank you for that incredible talk, which can only be called epic.
 - Andrew: Thanks.
 - Inaki: That was an epic. Wow. So, AVR may be Tier 3, but your patience, man, is God tier. Not only with all you've done, but also with handling all the tech issues while talking. So, thank you so much for your patience, actually. For us, this is all gravy. So, I do have a few questions.
 - First, AVR is quite a new target, right?
 - Andrew: That's right.
 - Inaki: How are you finding it, and have you tried things like STM32 targets?
 - Andrew: I have messed around a little bit with the other embedded targets, though I haven't done the STM32. For anyone in the audience not familiar, that's the target the Rust embedded intro book works through, using a board called the Discovery board. On my list of too many things to do, I have the goal of picking up one of those Discovery boards and working through that, but I haven't had the opportunity. I have done a little bit of ARM development, ARM being another embedded platform, and a little of it with Rust.
 - But honestly, I've done very little of that Rust development at this point; most of my embedded experience is with C. And programming in C is something I've never liked; it's always been frustrating. I'm very grateful that the Rust embedded community and the Rust compiler contributors are working so hard to make Rust a viable option for embedded, because there's a lot of potential there.
 - Inaki: Absolutely. Do you know of a good reference for AVR assembly?
 - Andrew: The AVR documentation generally is pretty good. It's often hard to find the right PDFs directly, but once you find them, they tend to be pretty solid. AVR is actually a very limited platform, much more limited than, for instance, ARM, so the assembler reference is quite complete. If you search for the AVR assembler reference guide, you'll find it; I can drop a link to it in the slides when I release those.
 - The other resource that I've found incredibly helpful is a forum called AVR Freaks, where a bunch of people who love programming for AVR answer all kinds of questions. Almost any question I have has already been answered on that forum in one post or another, so turning up AVR Freaks in your search results is fantastic. And the third resource I would suggest is the avr-libc documentation, which also contains a lot of nuggets that are very useful for illuminating how things actually work on the AVR platform.
 - Inaki: Cool. This is an interesting question: do you think that making the types of the standard library less dependent on the global allocator would make your job easier in any way?
 - Andrew: Certainly, yeah. That's a great point. I have not yet experimented with using an allocator.
You can... so, previously in my talk I said that you don't have access to the standard library, but you do have access to the core library. And there's a middle ground there: the alloc library, which can give you access to collections like vectors and so forth, and which you can theoretically compile for an embedded context. I don't do that. I work on the ATtinys, which are extremely limited, and there it's almost always worth doing analysis ahead of time to make sure that you don't run out of memory. That analysis is significantly harder to do if you're using the heap.
 - So, in my embedded programs I almost never even think about reaching for the heap, because it seems like it would create a lot more problems than it would solve. On other embedded devices it's probably much more relevant: on ARM, obviously, and on other more capable platforms, I think using an allocator makes a lot of sense. Also, not related to AVR, but the other context I do a lot of my development in is high-performance web development, and that's a place where being able to use, for instance, a slab allocator on a per-object basis would be incredibly valuable.
 - But, again, I'm getting into the weeds on something not related to this talk. I do see that the user-experience benefit of having the standard library based on a global allocator probably outweighs the technical benefits for these niche use cases.
 - Inaki: Cool, cool. There are a few more questions, but we're really running out of time, so maybe you could answer them in the chat or later on. Once again, thank you so much, Andrew, for that epic talk.
 - Andrew: Yeah, glad to. Thanks, everyone. Have a good day.

diff --git a/2020-global/talks/03_LATAM/07-Colton-Donnelly-published.md b/2020-global/talks/03_LATAM/07-Colton-Donnelly-published.md
new file mode 100644
index 0000000..df32a0c
--- /dev/null
+++ b/2020-global/talks/03_LATAM/07-Colton-Donnelly-published.md
@@ -0,0 +1,130 @@
+**Rust for Freshmen**
+
+**Bard:**
+Colton Donnelly takes Rust to school
+to show freshmen the language is cool
+and capable, fun
+great to fail or to run
+all in all it's a great teaching tool
+
+
+**Colton:**
+Hello, and welcome to RustFest Global 2020. I'm Colton Donnelly, and this is Rust for Freshmen. First, a little bit of background: I'm a fourth year student at Northeastern University studying computer science and business administration. I first started learning Rust in the spring of 2019, while I was taking computer systems, a course taught in C. After I heard about Rust's memory guarantees, I decided to look into it. When I asked my instructors what they thought about it, my professor gave a very encouraging response. So, I looked further into the language and started writing projects in it. I even joined the community, and earlier this year I became an editor for This Week in Rust.
+
+My goal here is to design a freshman-level computer science course taught in Rust. I want to enable students with Rust's powerful tooling and ecosystem, things they can take advantage of for the rest of their careers. Rust's error messages are better than any other language's; they give students a great way to get feedback about their code and know exactly what's going wrong with it.
+
+For testing and documentation, Rust supports these as first-class citizens, which is great, because students don't have to worry about adding extra dependencies or extra libraries in order to get their code working. I also want to avoid teaching some out-of-scope topics. Stack and heap allocation and threads are taught in later courses, so they're not really in scope for this course. Lifetimes are difficult for every Rust programmer to learn, so not having to learn about them gives students a lot of time to focus on the other parts of Rust that are even more important in these early days. Macros are great at reducing redundant code, but unfortunately they hide a lot of implementation details, so I want to reduce the amount of macros we use as much as possible. I also want to limit the use of Cargo, since dependencies and compiler configuration options are not really necessary for students first learning how to code.
+
+I'm going to base this course on Fundamentals of Computer Science II, affectionately called Fundies 2 by the students. Fundies 1 is taught in Racket, while Fundies 2 is taught in Java and introduces students to object-oriented programming. The instructors want students to walk away from the courses with good habits. Naming conventions, documentation and unit testing are important no matter what language you're using. But encapsulation is something that students haven't encountered before, and so they must learn how to respect it in object-oriented programming.
+
+The instructors don't introduce null yet, so I'm not really worried about that for now. Homework is typically submitted as library files: a testing library runs all of the students' unit tests, while an animation library executes their code.
+
+The first thing you want to do when you introduce students to a new programming language is to teach them the very basics: there are numbers, booleans and strings. For numbers in Java, the instructors usually teach students to use int, with double used where floating point precision is needed. For integers, we're going to use the i32 type, and f32 for doubles. This is simply to keep a little bit of consistency, even though double's exact translation would be an f64. For data types, we're going to use C-style structs, and for implementation blocks, we're going to keep it classic with constructors and methods. We're also going to allow students to use free functions, something they can't currently do in Java, since functions provide a lot of great functionality for programs. And we have to introduce students to references, since Rust is very dependent on manual pass-by-reference.
+
+So, this is a great first step for teaching students how to read Rust programs. We have the Dog struct, which has fields using all of the different basic data types we talked about earlier. There's the constructor, which constructs the Dog, and we also have a method that accepts the Dog by reference and returns a String.
+
+Now we want to introduce students to abstraction. In Fundies 2, this is done with Java interfaces, so in Rust we're going to use traits to introduce students to the idea of defining shared behavior.
+
+We now want to add the Cat struct. In order to reduce the amount of inconsistent behavior between Dog and Cat, we consolidate all of their behavior into a trait: we create the Animal trait, which has the sound method that Dog will now implement instead of its bark method. Cat, of course, also implements it, as the sketch below shows.
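+
+A minimal sketch of the kind of code described above might look like this; the exact field and method names are my assumptions, not the ones from the slides:
+
+```rust
+// Basic data types on a C-style struct.
+struct Dog {
+    name: String,  // string
+    age: i32,      // integer
+    weight: f32,   // "double"
+    is_good: bool, // boolean
+}
+
+// Shared behavior, consolidated into a trait.
+trait Animal {
+    fn sound(&self) -> String;
+}
+
+impl Dog {
+    // The constructor that constructs the Dog.
+    fn new(name: String, age: i32, weight: f32) -> Dog {
+        Dog { name, age, weight, is_good: true }
+    }
+}
+
+impl Animal for Dog {
+    // Accepts the Dog by reference and returns a String.
+    fn sound(&self) -> String {
+        format!("{} says woof!", self.name)
+    }
+}
+
+struct Cat {
+    name: String,
+}
+
+impl Animal for Cat {
+    fn sound(&self) -> String {
+        format!("{} says meow!", self.name)
+    }
+}
+```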
+
+The next thing the instructors usually introduce is Java abstract classes. Rust doesn't have a one-to-one equivalent, so we're going to break this up into two parts. Abstract classes allow for defining default behavior, not just shared behavior, between types. In Rust, this is actually available to us within traits: we define a default speak implementation right in the declaration of the Animal trait. And, of course, down below you can see I add unit tests to make sure everything works properly, something that students will have to get used to.
+
+The next part of abstract classes is defining shared data. Again, Rust doesn't have a great way of handling this, so we have to implement an object that lets us do it ourselves. We introduce the AnimalShared struct, which Dog and Cat can both contain instances of. This AnimalShared object exposes name and weight as public data fields, since Dog and Cat will be the only ones able to access them anyway. This lets students reduce the amount of redundant code while keeping this common data.
+
+Next, we use dynamic dispatch to let students act on shared behavior between different types. As you can see, this is a very simple dynamic dispatch example, but it's enough to show that it's possible: the tell_animal_to_speak function dynamically dispatches on an Animal reference and returns a String. Again, I check that all of this works properly with my unit tests.
+
+Next, we're going to look at generics, since they allow programmers to contain different kinds of data in their types. I have a problem here, though. Typically, when the instructors introduce generic types, they do it by reimplementing the functional, monadic list: the IList interface and the Cons and Empty classes that implement it, which effectively serves as a linked list. That won't work in Rust. Rust actually has monads, and this is not the proper way to define them, so we must obey Rust's rules for defining similar things.
+
+There's also the issue of recursive data definitions, which you can see in the ConsList class. This is something Rust doesn't allow as written, so we must find a way around it.
+
+So, in my implementation, I used an enum to define a list, which now works as a monad: a first and a rest for the Cons, while Empty carries no data. We make Cons and Empty functions available to students, since that's what they're used to coming from Racket. We also have a count method, which iterates over all of the items in the list and counts how many there are. As you can see, it uses a match expression, something I was not anticipating having to teach students.
+
+Racket does have match expressions, but they aren't introduced during Fundies 1, so students aren't already familiar with the idea. This is another thing we'd have to teach students to use, and to use properly.
+
+We also must use heap allocation. As you can see, I get around the recursive data definition by using a smart pointer for the rest of the list in my Cons. This is, again, something I didn't want to teach students, so it's something we'll have to find a workaround for later.
+
+Next, we want to introduce function objects, to add to the existing behavior of types. With our list, this is easy to demonstrate: we now have the filter method, which accepts a predicate. As you can see at the very top of the file, a predicate is a type alias for a function that accepts a reference and returns a boolean. In the filter method, we do the same thing as in count, except that we first check whether the predicate is satisfied by the first item in the list.
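+
+Putting those pieces together, here's a sketch of what that enum-based list might look like; the names are assumed, and Rc stands in for the smart pointer mentioned above:
+
+```rust
+use std::rc::Rc;
+
+// A predicate is a type alias for a function that takes a reference
+// and returns a boolean.
+type Predicate<T> = fn(&T) -> bool;
+
+// The recursive data definition, made legal by a smart pointer.
+enum List<T> {
+    Cons(T, Rc<List<T>>),
+    Empty,
+}
+
+impl<T: Clone> List<T> {
+    // Count every item in the list, using a match expression.
+    fn count(&self) -> usize {
+        match self {
+            List::Cons(_, rest) => 1 + rest.count(),
+            List::Empty => 0,
+        }
+    }
+
+    // Keep only the items that satisfy the predicate.
+    fn filter(&self, pred: Predicate<T>) -> List<T> {
+        match self {
+            List::Cons(first, rest) if pred(first) => {
+                List::Cons(first.clone(), Rc::new(rest.filter(pred)))
+            }
+            List::Cons(_, rest) => rest.filter(pred),
+            List::Empty => List::Empty,
+        }
+    }
+}
+```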
+
+Next, we're going to look at error handling. This is the next part of the instructors' plan, and it's a great way to introduce students to Rust's Result type. Currently, the instructors use Java exceptions to teach students how to properly handle errors. Exceptions, of course, are one of the things Rust decided not to adopt, one of the decisions that sets Rust apart from a lot of other low-level languages.
+
+The instructors teach students how to throw exceptions, when to throw them, which exceptions to use, and how to catch them. With Rust, all of this goes away: you really only have to act on the Result type, checking whether it's an Ok or an error, and returning the error if needed. This has the further advantage of giving students a better understanding of Rust's monadic style, so that they can better understand how to use the Option type later on, when we decide to introduce nonexistent data. So, as you can see, we have the AnimalShared implementation from earlier, and we also have the Dog implementation from earlier. For both of these constructors, we changed the return type to a Result: for the AnimalShared constructor, the Ok case is an AnimalShared object, while the error is a String.
+
+The Dog constructor is nearly identical: it returns a Dog in the Ok case and a String in the error case. In the AnimalShared implementation, we have to check that the weight is non-negative; animals simply cannot have negative weight. In Dog, we check whether the AnimalShared constructor returned an Ok, and if it was an error, we return that error. We also check that the dog's age in dog years is valid, and return an error if not. From this, students get a great understanding of how to properly handle errors. And since Rust doesn't really allow any workarounds for error handling, students must always handle it, which builds a great habit for them as they move on in their careers.
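+
+Here's a sketch of what those Result-returning constructors could look like; the field names and error messages are illustrative assumptions:
+
+```rust
+pub struct AnimalShared {
+    pub name: String,
+    pub weight: f32,
+}
+
+impl AnimalShared {
+    fn new(name: String, weight: f32) -> Result<AnimalShared, String> {
+        // Animals simply cannot have negative weight.
+        if weight < 0.0 {
+            return Err(String::from("weight must be non-negative"));
+        }
+        Ok(AnimalShared { name, weight })
+    }
+}
+
+struct Dog {
+    shared: AnimalShared,
+    age: i32,
+}
+
+impl Dog {
+    fn new(name: String, weight: f32, age: i32) -> Result<Dog, String> {
+        // Check whether the AnimalShared constructor returned an Ok;
+        // if it was an error, return that error.
+        let shared = match AnimalShared::new(name, weight) {
+            Ok(shared) => shared,
+            Err(err) => return Err(err),
+        };
+        // The age in dog years must be valid, too.
+        if age < 0 {
+            return Err(String::from("age in dog years must be non-negative"));
+        }
+        Ok(Dog { shared, age })
+    }
+}
+```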
+
+Lastly, we're going to look at sameness of data. In the course, students are taught that data is the same when the comparison is reflexive, symmetric and transitive. Here is what an example might look like in Java: we have the abstract Animal class, which has the sameDog and sameCat methods used to check whether instances of those types are the same. In the Dog class, we have the sameAnimal method, inherited from Animal, and you can see the instanceof keyword being used. This is a problem: it simply does not have an equivalent in Rust. You cannot check the concrete type of an object; in fact, Rust erases the type of an object when you upcast it into a trait object. So, this part simply does not exist, which could be a problem.
+
+Except I'm not sure it is. It could in fact be an advantage, and it could help students build better habits. Programmers should only be able to use what they need, and they should avoid downcasting for specific behavior whenever possible. In fact, they should only be checking equality between two objects when they know those two objects are of the same type. This introduces a lot of good habits and ensures that students walk away with a solid understanding of how to properly handle sameness of data.
+
+Now let's look at how to fix our problems. Earlier we talked about the monadic behavior of Rust. I didn't want to introduce enums, since they aren't even introduced for Java in Fundies 2, but if we want to talk about the monadic parts of Rust, enums are what we'll have to introduce.
+
+We also looked at the sameness of trait objects; it's not exactly clear whether that's a good or a bad thing. Lastly, we looked at the recursive data definitions, where the only easy workaround was to introduce heap allocation, something I did not want to bring into the course. How can we fix this? I think with a simple crate that looks like this.
+
+But this crate has its own problem: it has too much magic in and of itself. As of right now, in Fundies 2, students don't have to depend on external libraries for anything except the animations and the unit testing. There are also some honorable mentions, topics that I didn't get to cover in this presentation, such as loops, mutability and algorithms. These shouldn't be too hard to replicate in Rust, so I'm going to leave them alone.
+
+Another problem I didn't mention is the animation library. Currently, Rust has an immature GUI ecosystem, which means a lot of work has to be done before we can have an animation library sufficient for the course. There is the possibility of using WebAssembly and having students run their Rust code in the browser; that's very flexible and could actually work for a course like this. There's also the problem of Cargo: right now, Cargo is the easiest way to run Rust code.
+
+Without Cargo, you don't really have anything else that students would be able to pick up quickly. You could use the rustc CLI directly (for example, something like `rustc --test homework.rs` to build a test binary), but it has a lot of extra options and is ultimately too complicated for students just starting out. So, there might be room for a custom tool that runs all of the compilation and testing for students without them having to learn anything extra.
+
+And lastly, there are some topics that I'm leaving out for future classes, such as lifetimes. Lifetimes could be introduced in the next course, while threads are definitely introduced in a later course. Unsafe is also something students should learn how to use in Rust, but they should learn how to use it properly; I think it would best be taught alongside the thread lessons.
+
+So, is this course viable? Ultimately, it has a lot of advantages and a lot of disadvantages. On the advantages side, it forces a lot of good habits in students simply by using Rust as the language of choice, and all of the habits students learn with Rust are applicable to other programming languages, giving them a lot of skills they can use throughout the industry.
+
+Furthermore, future concepts will reinforce good habits in students: the Option type, lifetimes, mutability and thread safety are all part of Rust, either in the standard library, in the core library, or as language features, and students will be able to take advantage of them.
+
+And lastly, a helpful compiler gives students a lot of great feedback about what exactly is going wrong in their programs. Unfortunately, there are still some disadvantages to this course. A lot of work must be done to make sure that the student experience remains great. The animation library, for example, needs a lot of work to make sure it runs as efficiently and smoothly as the Java implementation does. We also have to figure out whether or not we're going to replace Cargo, since students don't really need to learn how to manage dependencies, but they will still have to use the animation library to run their code.
+
+Enums are also unavoidable, simply because Rust does them differently: enums in other object-oriented programming languages don't have the same kind of values as Rust's. In those languages, for example, you can't have one enum variant with an integer data field and another with a boolean data field. Lastly, right now Fundies 2 takes advantage of Java's pass-by-reference defaults in order to avoid having to teach students how to properly manage the scoping of variables. This is something we must work around in Rust. But, again, this is a good habit students must develop anyway, so it could actually be another advantage.
+
+Ultimately, this was a great way to think about what an introductory computer science course taught in Rust might look like. Thank you, guys, so much for listening, and I really appreciate you being here.
+
+
+**Inaki:**
+Excellent. All right. Very interesting. We've also often considered this here in Latin America; actually, there are several universities teaching Rust in several courses, but none in the basic courses. So, for example, like you were saying, concurrency and threading are taught in advanced courses, but Rust isn't used as a first programming language. So, a lot to think about.
+
+**Colton:**
+Yeah. I do think a lot of that kind of stuff is a little bit more complicated for students to learn, especially in their first year of computer science, and the instructors do a great job of making sure they only learn it later on in their education. I think that's a good plan.
+
+**Inaki:**
+Right. So, one of our questions is: would Rust be a better fit for such a course, or do you think we should introduce things first through a more basic language like Go, to ease into resource management?
+
+**Colton:**
+I mean, I also write Go, and I think it's a great language. Ultimately, though, there are still some issues with Go. There's a linter that makes sure people check their error returns, but if you don't run the linter, you may well not be checking that the error is nil. With Rust's monadic Option and Result types, I think you have a great way to make sure students are always building that habit, and they don't really have any workarounds.
+
+**Inaki:**
+Right. Another suggestion/question was: why not teach students how memory actually works from the start?
Why shield them from the complications, and from understanding the difference between the stack and the heap, for example?
+
+**Colton:**
+So, right now in the curriculum for the computer science students, that's simply how it is. You don't really learn the ideas of the stack and the heap until computer systems, the course I was talking about. And I think that's good, especially when you get into things like threading and manual stack-versus-heap management. Right now, the introductory courses are taught in Racket and Java, and you don't really have to worry about the stack versus the heap there. So, I think it's good to wait until later to worry about that. But unfortunately, because of Rust's still-manual memory management, it's something that has to be worked around.
+
+**Inaki:**
+Right. Not to add my own commentary, but I guess it's the difference between two schools: one teaches programming from the machine up, and the other from programming concepts down.
+
+**Colton:**
+Right. That's exactly the approach the instructors at my school take. They want to make sure the students have the fundamental ideas of computer science first: implementing algorithms, making sure their code and all of their logic work properly. Then, eventually, they get deeper into the machine-level ideas.
+
+**Inaki:**
+Cool. What missing Rust feature do you think would most improve teachability?
+
+**Colton:**
+I think it's the manual memory management; that's the biggest problem, and it's what ultimately hurts the teachability of Rust from the start. You can get around lifetimes, and you can get around reference passing. But with something like the recursive data definitions I showed, it's very hard to avoid teaching about the stack versus the heap, when, again, that's currently taught later in the curriculum. So, I think that's the part that hurts teachability the most.
+
+**Inaki:**
+Right. Memory: can't live with it, can't live without it.
+
+**Colton:**
+Seriously.
+
+**Inaki:**
+All right. Well, Colton, thank you so much for your talk.
+
+**Colton:**
+Thank you so much. I really appreciated being here.
+
+**Inaki:**
+Likewise. And if you want to answer more questions, you can hang around in the chat.
+
+**Colton:**
+All right. Sounds good. Thank you, guys.
+
+**Inaki:**
+Until later!

diff --git a/2020-global/talks/03_LATAM/07-Colton-Donnelly.txt b/2020-global/talks/03_LATAM/07-Colton-Donnelly.txt
deleted file mode 100644
index fd95636..0000000
--- a/2020-global/talks/03_LATAM/07-Colton-Donnelly.txt
+++ /dev/null
@@ -1,86 +0,0 @@
Colton Donnelly and this is Rust for Freshmen. First, a little bit of background, I'm a fourth year student at Northeastern University studying computer science and business administration. I first started learning Rust in the spring of 2019 while I was taking computer systems, a course taught in C. After I heard about Rust memory guarantees, I decided to look into it. Whenasked my instructors what they thought about it, my professor gave a very encouraging response. And so, I looked more into the language and started writing projects in it. I even joined the community and earlier this year I became an editor for this week in Rust. - My goal here is to design a freshman level computer science course taught in Rust. I want to enable students with Rust's powerful tooling and ecosystem. Things that they can take advantage of for the rest of their career. Rust's error messages are better than any other language's error messages. They give students a great way to have feedback about their code and know exactly what's going wrong with it. - For testing and documentation, Rust supports these as first class citizens. Which is great. Because students don't have to worry about adding extra dependencies or adding extra libraries in order to get their code working. I want to avoid teaching some out of scope topics. Stack and heap allocations and threads are things taught in later courses so they're not really in scope for this course. Lifetimes are difficult for every Rust to learn. So, students not having to learn about these gives them a lot of time to focus on the other parts of Rust that are even more important in these early days. Macros are great at reducing redundant code. But unfortunately, they hide a lot of implementation details. And so, I want to reduce the amount of macros that we use as much as possible. I also want to limit the use of cargo since dependencies and compiler configuration options are not really necessary for students first learning how to code. - I'm going to be basing this course on fundamentals of computer science II. Affectionately called fundies2 by the students. It's taught in racket, and then 2 is taught in Java and introduces students to object oriented program. The instructors want students to walk away from the courses with good habits. Naming conventions, documentation and unit testing are important no matter what language you're using. But encapsulation is something that students haven't encountered before. And so, they must learn how to respect it in object oriented programming. - Instructors don't introduce null yet, so I'm not really worried about that for now. Homework is typically submitted as library files. A testing library runs all of the students' unit tests while an animation library executes their code. - The first thing you want to do when you introduce students to a new programming language is to teach them... teach them the very basics. There are numbers, booleans and strings. For numbers in Java, the instructors usually teach students to use int while double is used for floating point precision near terms. For integers, we're going to be using the i32 type. While we're using f32 for doubles. This is simply to keep a little bit of consistency even though double's exact translation would be an f64. For data types, we're going to use C style structs. And for implementation blocks, we're going to keep it classic with instructors and methods. 
- Now we want to introduce students to abstraction. In fundies2, this is done with Java interfaces, so in Rust we're going to use traits to introduce students to the idea of defining shared behavior.
- We now want to add the Cat struct. In order to reduce the amount of inconsistent behavior between Dog and Cat, we're going to consolidate all of their behavior into a trait. We create the Animal trait, which has the sound method, which Dog will now implement instead of the bark method. Cat, of course, also implements it.
- The next thing the instructors usually introduce is Java abstract classes. Rust doesn't have a one-to-one equivalent of these, so we're going to break them up into two parts. Abstract classes allow for defining default behavior, not just shared behavior, between types. In Rust, this is available to us within traits: we define the default speak implementation in the declaration of the Animal trait. And, of course, down below you can see I added unit tests to make sure everything works properly, something students will have to get used to.
- The next part of abstract classes is defining shared data. Again, Rust doesn't have a great way of handling this, so we have to implement our own object that lets us do it ourselves. We introduce the AnimalShared struct, which Dog and Cat can both contain instances of. This AnimalShared object contains name and weight as public data fields, since Dog and Cat will be the only ones able to access them anyway. This allows students to reduce the amount of redundant code and share this common data.
- Next, we're going to use dynamic dispatch to allow students to act on shared behavior between different types. As you can see, this is a very simple dynamic dispatch example, but it's enough to show that it's possible. You have the tell-animal-to-speak function, which dynamically dispatches on an Animal reference and returns a String. Again, I check that all of this is done properly with my unit tests.
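Again a hedged reconstruction rather than the talk's actual code: the trait, the default method, the shared-data struct, and the dynamically dispatching function described above might fit together like this (all names inferred from the description):

```rust
/// Shared data that Dog and Cat both contain, standing in for the
/// fields of a Java abstract class.
struct AnimalShared {
    pub name: String,
    pub weight: f32,
}

/// Shared behavior, standing in for a Java interface.
trait Animal {
    fn sound(&self) -> String;

    /// Default behavior, the trait-based half of an abstract class.
    fn speak(&self) -> String {
        format!("The animal says: {}", self.sound())
    }
}

struct Dog {
    shared: AnimalShared,
}

struct Cat {
    shared: AnimalShared,
}

impl Animal for Dog {
    fn sound(&self) -> String {
        String::from("woof")
    }
}

impl Animal for Cat {
    fn sound(&self) -> String {
        String::from("meow")
    }
}

/// Dynamic dispatch on a trait-object reference.
fn tell_animal_to_speak(animal: &dyn Animal) -> String {
    animal.speak()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn dispatch_works_for_both_types() {
        let dog = Dog { shared: AnimalShared { name: String::from("Rex"), weight: 12.5 } };
        let cat = Cat { shared: AnimalShared { name: String::from("Mia"), weight: 4.2 } };
        assert_eq!(tell_animal_to_speak(&dog), "The animal says: woof");
        assert_eq!(tell_animal_to_speak(&cat), "The animal says: meow");
    }
}
```

Note that the default `speak` body can only call other trait methods; it cannot reach the `shared` fields, which is exactly the gap the `AnimalShared` struct papers over.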
- Next, we're going to look at generics, since they allow programmers to contain different kinds of data in their types. I have a problem with this, though. Typically, when the instructors introduce generic types, they do it by reimplementing the functional monadic list: we have the IList interface and the Cons and Empty classes that implement it. This effectively serves as a linked list. That won't work in Rust. Rust actually has its own monads, and this is not the proper way to define them, so we must obey Rust's rules for defining things that are similar.
- There's also the issue of recursive data definitions, which you can see in the ConsList class. This is something Rust doesn't allow without indirection, so we must find a way around it.
- So, in my implementation, I used an enum to define a list, which now works as the monadic list: first and rest for Cons, while Empty doesn't have any data. We give students cons and empty functions, since that's what they're used to coming from Racket. We also have the count method, which iterates over all of the items in the list and counts how many there are. As you can see, we use a match expression, something I was not anticipating having to teach students.
- Racket does have match expressions, but they aren't introduced during fundies1, so students aren't already familiar with the idea. So this is another thing we have to teach students to use, and to use properly.
- We also must use heap allocation. As you can see, I get around the recursive data definition by using a smart pointer for the rest of the list in my Cons. This is, again, something I didn't want to teach students, so we'll have to find a workaround for it later.
- Next, we want to introduce function objects to add to the existing behavior of types. With the list, this is easy to demonstrate. We now have the filter method, which accepts a predicate; as you can see at the very top of the file, that's a type alias for a function that accepts a reference and returns a boolean. In the filter method, we do the same thing as count, except we first check whether the predicate is satisfied by the first element of the list.
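Piecing the description together, the list might look roughly like this. The transcript only says "a smart pointer", so the choice of `Box` is an assumption, as is specializing the element type to `i32` instead of making the list generic:

```rust
/// A predicate is a type alias for a function that takes an element
/// by reference and returns a boolean.
type Predicate = fn(&i32) -> bool;

/// A Racket-style list as an enum: Cons holds the first element and
/// the rest of the list, while Empty holds no data. The Box is the
/// smart pointer that breaks the recursive data definition.
enum List {
    Cons(i32, Box<List>),
    Empty,
}

/// Helper functions matching what students know from Racket.
fn cons(first: i32, rest: List) -> List {
    List::Cons(first, Box::new(rest))
}

fn empty() -> List {
    List::Empty
}

impl List {
    /// Count the items by matching on the two cases.
    fn count(&self) -> usize {
        match self {
            List::Cons(_, rest) => 1 + rest.count(),
            List::Empty => 0,
        }
    }

    /// Keep only the elements that satisfy the predicate.
    fn filter(self, pred: Predicate) -> List {
        match self {
            List::Cons(first, rest) => {
                if pred(&first) {
                    cons(first, rest.filter(pred))
                } else {
                    rest.filter(pred)
                }
            }
            List::Empty => empty(),
        }
    }
}

fn main() {
    let list = cons(1, cons(2, cons(3, empty())));
    let evens = list.filter(|n| n % 2 == 0);
    println!("{}", evens.count()); // prints 1
}
```

Using free `cons` and `empty` functions rather than the enum variants directly keeps the surface close to what students already know from Racket.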
- Next, we're going to look at error handling. This is the next part of the instructors' plan, and it's a great way to introduce students to Rust's Result type. Currently, the instructors use Java exceptions to teach students how to handle errors properly. Exceptions are, of course, one of the things Rust decided not to adopt, one of the decisions that set Rust apart from a lot of other low-level languages.
- Instructors teach students how to throw exceptions, when to throw exceptions, which exceptions to use, and how to catch them. With Rust, all of this goes away. You only have to act on the Result type, checking whether it's an Ok or an Err, and returning the error if needed. It also has the future advantage of giving students a better understanding of Rust's monadic style, so they can better understand how to use the Option type later on, when we introduce non-existent data. So, as you can see, we have the AnimalShared implementation from earlier, and we also have the Dog implementation from earlier. For both of these constructors, we changed the return type to a Result: for the AnimalShared constructor, the Ok case is an AnimalShared object, while the Err case is a String.
- The Dog constructor is nearly identical: it returns a Dog in the Ok case and a String in the Err case. In the AnimalShared implementation, we check that the weight is non-negative; animals simply cannot weigh a negative amount. In Dog, we check whether the AnimalShared constructor returned an Ok, and if it was an Err, we return the error. We also check that the dog's age in dog years is valid, and return an error if not. Students will use this to gain a solid understanding of how to handle errors properly.
- And since Rust doesn't really allow any workarounds for error handling, students must always handle it, which builds a great habit for students as they move on in their careers.
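A sketch of what those fallible constructors could look like: the validation rules (non-negative weight, valid age in dog years) come from the description above, while the error messages and parameter names are invented:

```rust
struct AnimalShared {
    pub name: String,
    pub weight: f32,
}

impl AnimalShared {
    /// Ok carries the constructed object; Err carries a message.
    fn new(name: String, weight: f32) -> Result<AnimalShared, String> {
        if weight < 0.0 {
            return Err(String::from("animals cannot have a negative weight"));
        }
        Ok(AnimalShared { name, weight })
    }
}

struct Dog {
    shared: AnimalShared,
    age_in_dog_years: i32,
}

impl Dog {
    /// Nearly identical shape: Dog in the Ok case, String in the Err case.
    fn new(name: String, weight: f32, age_in_dog_years: i32) -> Result<Dog, String> {
        // Check whether the AnimalShared constructor returned Ok,
        // and return the error if it did not.
        let shared = match AnimalShared::new(name, weight) {
            Ok(shared) => shared,
            Err(error) => return Err(error),
        };
        if age_in_dog_years < 0 {
            return Err(String::from("a dog's age in dog years must be valid"));
        }
        Ok(Dog { shared, age_in_dog_years })
    }
}

fn main() {
    match Dog::new(String::from("Rex"), -1.0, 21) {
        Ok(_) => println!("built a dog"),
        Err(error) => println!("error: {}", error),
    }
}
```

The explicit `match` (rather than the `?` operator) mirrors what the transcript describes: students check Ok or Err themselves and return the error if needed.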
- Lastly, we're going to look at sameness of data. In the course, students are taught that data is the same when the relation is reflexive, symmetric, and transitive. This is what an example might look like: we have the abstract Animal class, which has the sameDog and sameCat methods used to check that instances of those types are the same. In the Dog class, we have the sameAnimal method, which is inherited from the Animal interface, and you see the instanceof keyword being used. This is a problem. It simply does not have an equivalent in Rust. You cannot check the type of an object; in fact, Rust erases the type of an object when you upcast it into a trait object. So this part simply does not exist, which could be a problem.
- Except I'm not sure it is. It could in fact be an advantage, and it could help students build better habits. Programmers should only be able to use what they need, and they should avoid downcasting for specific behavior whenever possible. In fact, they should only be checking equality between two objects when they know those two objects are of the same type. This instills a lot of good habits in students and ensures they walk away with great skills and an understanding of how to properly handle the sameness of data. Now we're going to look at how to fix our problems. Earlier we talked about the monadic behavior of Rust. I didn't want to introduce the enum, since enums aren't even introduced for Java in fundies2. But if we want to talk about the monadic implementations in Rust, this is what we'll have to introduce.
- We also looked at the sameness of trait objects; it's not exactly clear whether that's a good or a bad thing. Lastly, we saw the recursive data definitions, and the only easy way to work around them was to introduce heap allocation, something I did not want to bring into the course. How can we fix this? I think with a simple crate that looks like this.
- But this crate has its own problem: it has too much magic in and of itself. As of right now, in fundies2, students don't have to depend on external libraries for anything except the animations and unit testing. There are also some honorable mentions, topics that I didn't get to cover in this presentation, such as loops, mutability, and algorithms. These shouldn't be too hard to replicate in Rust, so I'm going to leave them alone.
- Another problem that I didn't mention is the animation library. Currently, Rust has an immature GUI ecosystem, which means a lot of work has to be done before we have an animation library sufficient for the course. There is the possibility of using WebAssembly and having students run their Rust code in the browser. This, of course, is very flexible and could actually be used in a course like this. There's also the problem of Cargo. Right now, Cargo is the easiest way to run Rust code.
- Without Cargo, you don't really have anything else that students could pick up quickly. You could use the rustc CLI, but that has a lot of extra options, and it's ultimately too complicated for students to pick up.
- So, there might be room for a custom tool, which would run all of the compilation and testing for students without them having to learn anything extra.
- And lastly, there are some topics that I'm leaving out for future classes, such as lifetimes. Lifetimes could be introduced in the next course, while threads are definitely introduced in a later course. Unsafe is also something students should learn in Rust, but they should only learn how to use it properly; I think it would fit best alongside the thread lessons.
- So, is this course viable? Ultimately, it has a lot of advantages and a lot of disadvantages. On the advantages side, it instills good habits in students simply by using Rust as the language of choice. And all of the habits that students learn with Rust are applicable to other programming languages, giving them skills they can use throughout the industry.
- Furthermore, future courses will reinforce good habits: students will have the Option type, lifetimes, mutability, thread safety, and all of these things that Rust includes, whether in the standard library, the core library, or as language features, which students will be able to take advantage of.
- And lastly, a helpful compiler gives students great feedback about exactly what is going wrong in their programs. Unfortunately, there are still some disadvantages to this course. A lot of work must be done to make sure the student experience is still great. The animation library, again, needs a lot of work to run as efficiently and smoothly as the Java implementation does. We also have to figure out whether we're going to replace Cargo, since students don't really need to learn how to manage dependencies, but they will still have to use the animation library to run their code.
- Enums are also unavoidable, simply because Rust does them differently. Enums in other object-oriented programming languages don't carry the same kind of values as Rust's: for example, you can't have one enum variant with an integer data field and another with a boolean data field (the sketch below shows what this looks like in Rust). Lastly, fundies2 currently takes advantage of Java's pass-by-reference default in order to avoid having to teach students how to properly manage the scoping of variables. This is something we must work around in Rust. But, again, this is a good habit students must develop anyway, so it could actually be another advantage.
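To make the enum point concrete, here is the kind of variant-specific data Rust allows and Java-style enums don't; the `Answer` type is purely illustrative:

```rust
/// Each variant carries its own kind of data; Java-style enums
/// have no equivalent of this. `Answer` is an invented example.
enum Answer {
    Count(i32),  // one variant with an integer data field
    Flag(bool),  // another with a boolean data field
    Nothing,     // and one with no data at all
}

fn describe(answer: &Answer) -> String {
    match answer {
        Answer::Count(n) => format!("a count of {}", n),
        Answer::Flag(b) => format!("a flag set to {}", b),
        Answer::Nothing => String::from("no data"),
    }
}

fn main() {
    println!("{}", describe(&Answer::Count(3)));
    println!("{}", describe(&Answer::Flag(true)));
    println!("{}", describe(&Answer::Nothing));
}
```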
- Ultimately, this was a great way to think about what an introductory computer science course taught in Rust might look like. Thank you all so much for listening, and I really appreciate you being here.
- Inaki: Excellent. All right. Very interesting. We've also often considered this here in Latin America. Actually, there are several universities teaching Rust in various courses, but none in basic courses. So, for example, like you were saying, concurrency and threading should be in advanced courses, and they are taught in advanced courses, but Rust is not taught as a first programming language. So, a lot to think about.
- Colton: Yeah. I do think a lot of that kind of stuff is a bit more complicated for students to learn, especially in their first year of computer science, and the instructors do a great job of making sure they only learn it later in their education. I think that's a good plan.
- Inaki: Right. So, one of our questions is, would Rust be a better fit for such a course? Or do you think we should introduce things first through a more basic language like Go, to ease into resource management?
- Colton: I mean, I also write Go, and I think it's a great language. Ultimately, there are still some issues with Go. You can have the linter make sure people check their error returns, but without the linter there's still the possibility that you aren't checking whether that error is nil. With Rust's monadic Option and Result types, I think there's a great way to make sure students are always building that habit, and they don't really have any workarounds.
- Inaki: Right. Another suggestion/question was, why not teach students how memory actually works from the start? Why shield them from the complications and not understand the difference between stack and heap, for example?
- Colton: So, right now in the curriculum for the computer science students, that's simply how it is. You don't really learn the ideas of stack and heap until computer systems, the course I was talking about. And I think that's good, especially once you get into things like threading and systems-style programming and managing the stack versus the heap manually. Right now the introductory courses are taught in Racket and Java, and you don't really have to worry about the stack versus the heap there. So, I think it's good to still wait until later to worry about stack versus heap.
- But unfortunately, because Rust's memory management is still manual, that's something that has to be worked around.
- Inaki: Right. Not to add my own commentary, but I guess it's the difference between two schools: one is teaching programming from the machine up, and the other from programming concepts down.
- Colton: Right. That's exactly the approach the instructors at my school take. They want to make sure that the students have the fundamental ideas of computer science first: implementing algorithms, making sure that their code works properly, all of their logic. Then eventually they get deeper into the machine-level ideas.
- Inaki: Cool. What missing Rust feature do you think would most improve teachability?
- Colton: It's the manual memory management, I think. That's the biggest problem, and I think that's what ultimately hurts the teachability of Rust from the start. You can get around lifetimes, you can get around reference passing. But take, for example, the recursive data definitions I showed: with those, it's very hard to avoid teaching about the stack versus the heap, when right now, again, that's taught later in the curriculum.
- So, I think that's the part that hurts the teachability most.
- Inaki: Right. Memory: can't live with it, can't live without it.
- Colton: Seriously.
- Inaki: All right. Well, Colton, thank you so much for your talk.
- Colton: Thank you so much. I really appreciated being here.
- Inaki: Likewise. And if you want to answer more questions, you can hang around in the chat.
- Colton: All right. Sounds good. Thank you, guys.
- Inaki: Until later!
- >> Hello. We have done it. All the talks. Three days! Not three days. But it feels like three days.
- Inaki: Feels like three days. That was a lot.
- Stefan: I'm very happy with the result we got. And as is tradition with RustFest events, we have some final slides. But don't worry: since Linalab && !ME are queuing up as we speak, we'll keep these brief.
- Also, Inaki, if you see that I missed something, just interrupt me.
- So, first of all, huge thanks to the APAC team. I'm not sure if anyone is online currently; it's about their morning again. Still baffling. Thank you again to the UTC team, which I am part of. I just see... oh, we have so many chats. Maybe Tomohide will join from Japan.
- Inaki: In a normal RustFest we would all get on stage for a picture.
- Stefan: Absolutely, and do a family photo, usually. Most of us remember these, taken from high above over the whole crowd.
- Inaki: Exactly.
- Stefan: But maybe some other time. Maybe if technology advances some more.
- Inaki: Maybe if we advance some more.
- Stefan: Yeah, it's becoming open source, I've heard. Anyhow. Huge thanks to the LATAM team. You have seen, well... what have you seen? Oh, my brain. It's over. We had Inaki and Yuli doing the emceeing.
- Inaki: Tomas has been a great moderator.
- Stefan: Thank you also to Rafaela, who handled the how-to-join-the-chat-room instructions.
- Inaki: A big shout-out to our captioners, our sketchnote artists, our speakers, all the artists, all the attendees.
- Stefan: And one last time to all our sponsors. Yeah. Shall we read them out again?
- Inaki: Yes. And before we do that, just one slight reminder. This is not for any particular sponsor, but bear in mind that these companies have been kind enough to take an interest in a Rust conference, and many are looking for Rust hires. So at the very least you can, you know, look them up and see what's up.
- Stefan: All right. So, we have Coil, Embark, Parity, MUX, Mozilla, Centricular, openSUSE, Mullvad VPN, OBB, Red Sift, TerminusDB, Nervos, TrueLayer, Tweede Golf, Technolution, IOmentum, Traverse Research, and Ferrous Systems. Thank you very much. And speaking of people who gave us money: thank you to everybody who bought a ticket to the conference. It helps us a lot and gives us the flexibility to pay for all the infrastructure we need. And Tomohide has joined. Hello.
- Tomohide: Hello.
- Stefan: Would you like to say something? How was the first block?
- Tomohide: So, thank you for the day. Yeah.
- Stefan: Cool.
- Tomohide: Okay. Yeah. So, that is everything.
- Stefan: Yeah, I'm very tired. But I'm also very happy.
- Tomohide: Yeah. Me too.
- Stefan: All right. So, now the anticipation is growing, right?
- Inaki: Exactly. Our final act. For our closing act, we would like you all to welcome Linalab && !ME. They are two artists who crossed paths to explore the analog side of music: sound and visual landscapes, getting as far away from digital methods as possible, generated in real time and giving the show an improvisational character. This should blow your minds.
- Stefan: If you are curious what we do next year, email us. The teams have different addresses. Just text us and we can make it happen on any continent. Stay tuned; I think switching the stream takes about 5 minutes. That's the perfect time to get yourself a glass of water.
- Inaki: Until next time.
- Stefan: Until next time, bye!
- Tomohide: Thank you.