Hi, TheMLGuy here! Today I want to share what are, in my opinion, five of the coolest and most promising applications of AI in real products and technologies!
Topics for today:
- AlphaFold
- AlphaZero
- GitHub Copilot
- Netflix’s Recommendation System
- OpenAI Five
AlphaFold
Proteins are molecules made up of long chains of amino acids. They do much of the work inside the cells of living beings and are essential for the structure, function and regulation of the body’s tissues and organs.
Science currently knows of over 200 million proteins, with numerous new ones discovered every year. Each has a distinct 3D shape that determines its function, yet accurately determining a protein’s structure from its amino-acid sequence alone remains a challenge for scientists.
AlphaFold is an AI system designed to predict protein shapes with atomic accuracy, turning amino-acid sequences into 3D models. It holds great scientific promise because it can dramatically speed up protein folding - determining the 3D structure of a protein’s chain, specifically its biologically functional “folded” form - a process that has traditionally been slow and very expensive.
The AlphaFold Protein Structure Database brings together decades of arduous work by scientists who determined protein structures the traditional way. The database, created in partnership with EMBL’s European Bioinformatics Institute, can be accessed for free. It covers over 200 million structures - almost all catalogued proteins known to science - including proteins found in the human body and in organisms such as fruit flies and mice that are commonly used in biological research for discovering and developing medicine.
This AI system has the potential to predict the structures of millions of still-unknown proteins, speeding up the development of new medicines, helping cure diseases and expanding the knowledge of scientists all over the world.
AlphaZero
AlphaZero is a self-taught system that has mastered games as demanding as chess, shogi and Go, beating world-champion programs and professional players in each one. Surprisingly, it has no built-in knowledge of strategy: its training begins with random play constrained only by the basic game rules, and it learns from each game it plays until it develops into an excellent player.
Unlike traditional chess engines such as the world computer chess champion Stockfish and IBM’s ground-breaking Deep Blue, or shogi programs that rely on thousands of rules and heuristics hand-crafted by strong human players, AlphaZero uses a deep neural network and general-purpose algorithms that are given only the fundamental rules of the games.
To master each game, the untrained neural network plays against itself a great number of times in a trial-and-error process known as reinforcement learning (a training method that learns the behaviour maximising reward in an environment). The trained network is then used to guide a Monte-Carlo Tree Search - an algorithm that determines the best move by repeatedly selecting, expanding, simulating and updating nodes in a search tree until a final decision emerges.
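Those four Monte-Carlo Tree Search phases can be illustrated with a minimal search on a toy game. This is a bare-bones sketch using random rollouts in place of a trained network (AlphaZero instead uses the network’s value and policy estimates); the game of Nim here is just a stand-in:

```python
import math
import random

class Node:
    """Search-tree node for toy Nim (take 1 or 2 stones; whoever takes
    the last stone wins). `wins` is counted from the perspective of the
    player who moved INTO this node."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}  # move -> Node
        self.visits = 0
        self.wins = 0.0

def moves(state):
    return [m for m in (1, 2) if m <= state]

def ucb(child, parent_visits, c=1.4):
    # Upper Confidence Bound: balances exploitation and exploration.
    if child.visits == 0:
        return float("inf")
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout(state):
    # Random play; returns 1 if the player to move from `state` wins.
    turn = 0
    while True:
        state -= random.choice(moves(state))
        if state == 0:
            return 1 if turn == 0 else 0
        turn ^= 1

def mcts(root_state, iters=3000):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while node.state > 0 and len(node.children) == len(moves(node.state)):
            node = max(node.children.values(), key=lambda ch: ucb(ch, node.visits))
        # 2. Expansion: add one untried move.
        if node.state > 0:
            m = random.choice([m for m in moves(node.state) if m not in node.children])
            node.children[m] = Node(node.state - m, node)
            node = node.children[m]
        # 3. Simulation: value for the player who just moved into `node`.
        result = 1 if node.state == 0 else 1 - rollout(node.state)
        # 4. Backpropagation: flip the perspective at each level.
        while node is not None:
            node.visits += 1
            node.wins += result
            result = 1 - result
            node = node.parent
    # The recommended move is the most-visited child of the root.
    return max(root.children, key=lambda m: root.children[m].visits)
```

From 4 stones the winning move is to take 1, leaving the opponent the losing position of 3 stones, and the search converges on exactly that move.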
The training time AlphaZero requires depends on how sophisticated the game is: around 9 hours for chess, 12 hours for shogi and 13 days for Go.
AlphaZero showed great performance, winning 155 of 1,000 chess games against the world-champion engine Stockfish and losing only 6. It also defeated the 2017 CSA world champion version of Elmo, winning 91.2% of their shogi games, and achieved a 61% win rate against AlphaGo Zero in Go.
One of the most captivating things about AlphaZero is its playing style, which professional players such as Garry Kasparov (chess) and Yoshiharu Habu (shogi) have described as dynamic and unique, revealing new possibilities for their games.
The purpose of AlphaZero goes beyond playing games: it highlights the potential of AI to solve real-life problems by demonstrating that a single intelligent system can learn and discover new knowledge across a range of settings.
Developing an algorithm able to master specific skills to a high standard is still a challenge for AI researchers, because even slight modifications of a task can cause a system to fail. In spite of this, the progress made so far is encouraging and keeps specialists pursuing their goal of building general-purpose learning systems that could help solve complex scientific problems. AlphaFold, spotlighted above, is one such system built to tackle a scientific problem.
GitHub Copilot
GitHub Copilot is an artificial intelligence tool that uses OpenAI Codex to assist users of various code editors by autocompleting individual lines and whole functions.
It is trained on billions of lines of publicly available source code and turns natural-language prompts into coding suggestions in dozens of programming languages, aiming to ease the writing of boilerplate and repetitive patterns.
To get value from GitHub Copilot, it is sufficient to write a comment describing the logic you need in your program, and it will suggest a solution that you can accept, reject or edit. It can also help you pick up new languages and frameworks, reducing the time spent wandering through the internet and documentation.
The tool performs best when code is divided into small functions with meaningful parameter names and good docstrings and comments.
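In practice the workflow looks like this: you write a descriptive comment (and perhaps a signature), and Copilot proposes a body. The function below is a hypothetical illustration of the kind of suggestion it might produce for such a prompt, not actual Copilot output:

```python
from collections import Counter

# Prompt written by the developer:
# Return the n most common words in a text, ignoring case.
def most_common_words(text: str, n: int) -> list[str]:
    # A body of the kind Copilot might suggest from the comment above.
    words = text.lower().split()
    return [word for word, _ in Counter(words).most_common(n)]
```

You would then accept, reject or edit the suggestion right in the editor.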
More details about GitHub Copilot’s features can be found at https://github.com/features/copilot.
Netflix’s Recommendation System
Netflix is a well-known streaming platform for movies and TV shows. It started in the late 1990s as a subscription-based service posting DVDs to people’s homes in the US and by 2020 had grown into a global streaming service with over 182 million subscribers.
An important feature of the platform is its recommendation system, which accounts for about 80% of stream time and aims to enhance the user experience and thereby increase retention. Although the specifics of each model’s architecture are not public, Netflix uses a number of rankers, listed below:
- Personalised Video Ranking (PVR) - a general-purpose algorithm that filters the catalogue by specific criteria (genre, TV shows by country of production etc.), combined with other features such as popularity;
- Top-N Video Ranker - looks at the head of the catalogue rankings and recommends the top-ranked titles;
- Trending Now Ranker - detects temporal trends, which are good indicators of near-term interest and may last from a couple of minutes to a few days (Halloween, the Christmas holidays, Valentine’s Day etc.), making it a powerful predictor;
- Continue Watching Ranker - checks the items a user has started but not completed, for example movies that were not watched to the end or TV series with unwatched episodes;
- Video-Video Similarity Ranker - uses an item-item similarity matrix to return the catalogue items most similar to those already watched.
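A minimal sketch of the item-item idea behind the Video-Video Similarity Ranker, using cosine similarity over toy play data (the titles and the play matrix are made up for illustration):

```python
from math import sqrt

# Toy user-item play matrix: rows = users, columns = titles.
titles = ["Dark", "Ozark", "Narcos", "The Crown"]
plays = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

def cosine(a, b):
    # Cosine similarity between two play-count vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def column(matrix, j):
    return [row[j] for row in matrix]

def most_similar(title):
    # Compare the title's column against every other column.
    j = titles.index(title)
    scores = [(cosine(column(plays, j), column(plays, k)), titles[k])
              for k in range(len(titles)) if k != j]
    return max(scores)[1]
```

A production system would compute this matrix offline over millions of users, but the ranking principle is the same.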
The algorithms above feed the row generation process: finding candidates, selecting evidence (confirming that a candidate matches the criteria), filtering and deduplication, ranking, formatting and finally choosing the required candidates. After this, page generation determines what content will be shown to the user.
Netflix wants to give its users the best possible experience: accurately predicting what they would like to watch in the current session, without forgetting the videos they stopped halfway, while also providing something fresh and keeping the page stable for long-time users who have their own way of navigating it. To achieve all of this, Netflix has come up with different approaches:
- Row-based - assesses each row independently and ranks the rows by score using existing recommendation or learning-to-rank approaches. The drawback of this method is its lack of diversity.
- Stage-wise - improves on the row-based approach by selecting rows sequentially and re-scoring the remaining candidates at each step, taking into account their relationship to the rows already placed and the items they contain.
- Machine Learning - trains a model on historical data to construct a scoring function. It is based on homepages Netflix previously built for its users, including the items present on each page, the user’s interactions and the videos they played.
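The stage-wise idea can be sketched as a greedy loop that re-scores the remaining candidate rows after each pick, penalising rows whose titles the page already shows. The row names, scores and penalty form here are all hypothetical:

```python
# Candidate rows: base relevance score plus the set of titles each contains.
candidates = {
    "Trending Now": (0.9, {"Dark", "Ozark", "Narcos"}),
    "Because You Watched Dark": (0.8, {"Dark", "1899", "Ragnarok"}),
    "Top Picks": (0.85, {"Ozark", "Narcos", "The Crown"}),
}

def stagewise_select(candidates, n_rows, diversity=0.3):
    chosen, shown = [], set()
    pool = dict(candidates)
    for _ in range(n_rows):
        def adjusted(name):
            score, titles = pool[name]
            # Penalise the fraction of this row's titles already on the page.
            overlap = len(titles & shown) / len(titles)
            return score - diversity * overlap
        best = max(pool, key=adjusted)
        chosen.append(best)
        shown |= pool.pop(best)[1]
    return chosen
```

Note how "Top Picks" outscores "Because You Watched Dark" in isolation, yet loses the second slot once "Trending Now" has already put two of its titles on the page.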
There are many features and ways of representing a specific homepage row to an algorithm, but the main goal is to generate pages that will be attractive to the user, scored using page-level metrics.
OpenAI Five
OpenAI Five is a system of five neural networks that plays the five-on-five video game Dota 2. The goal of the project is to surpass human skill in complex video games, with eventual applications beyond games.
It learned by playing the equivalent of over 10 thousand years of games against itself. Training used a scaled-up version of Proximal Policy Optimization (an effective reinforcement learning approach for continuous control tasks) and a separate LSTM (a network well suited to organising, processing and making predictions from time-series data) for each Dota 2 hero, which learned recognisable strategies - all without any human gameplay data.
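At the heart of Proximal Policy Optimization is a clipped surrogate objective that stops the updated policy from drifting too far from the one that gathered the data. A minimal per-step sketch of that objective (OpenAI Five's real implementation is a large-scale distributed version, so treat this as a conceptual illustration):

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Per-step PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio: new_policy_prob / old_policy_prob for the taken action.
    advantage: estimated advantage of that action.
    """
    clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped * advantage)
```

Taking the minimum means the policy gains nothing from pushing the probability ratio beyond the clip range, which keeps each update conservative.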
To acquire the capabilities needed to play a real-time strategy game such as Dota 2, the AI has to cope with the following challenges:
- Long time horizons - Dota 2 runs at 30 frames per second for an average of 45 minutes, producing roughly 80 thousand ticks per game. The system acts on every 4th frame, yielding about 20 thousand moves, in contrast with classic games such as chess, which usually lasts no more than 40 moves, and Go, which ends before 150 moves;
- Partially observed state - units and buildings can see only the area around them, while the rest of the map is covered in fog, so decisions must be made from incomplete data while modelling what the opponent might be doing;
- High-dimensional, continuous action space - each Dota hero can take many actions targeting another unit or a position on the terrain; OpenAI Five discretises the space into about 170 thousand possible actions per hero;
- High-dimensional, continuous observation space - the game is played on a vast continuous map holding 10 heroes, dozens of buildings, NPCs and other game features such as runes, trees and items. The model observes the game state via Valve’s Bot API as about 20 thousand numbers representing all the information a human player could access.
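The numbers in the long-time-horizons point follow directly from the frame rate and the average game length:

```python
fps = 30                          # Dota 2 simulation rate
game_minutes = 45                 # average game length
ticks = fps * 60 * game_minutes   # total ticks per game
decisions = ticks // 4            # the system acts on every 4th frame
print(ticks, decisions)           # 81000 20250 - roughly the 80k/20k quoted
```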
It should also be mentioned that Dota 2’s rules are very complex: the game has been actively developed for many years, its logic is implemented in between one hundred thousand and a million lines of code, and regular updates keep changing the environment’s semantics.
As mentioned above, OpenAI Five learns using Proximal Policy Optimization, starting from random parameters, which allows a natural exploration of the game’s environment. It also draws on unit data from the scripted baseline. The system is trained to maximise the exponentially decayed sum of future rewards.
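An "exponentially decayed sum of future rewards" is the standard discounted return, which a small helper makes concrete (the discount factor here is illustrative, not OpenAI Five's actual value):

```python
def discounted_return(rewards, gamma=0.99):
    """Exponentially decayed sum of future rewards: sum(gamma**t * r_t)."""
    total = 0.0
    # Work backwards so each step folds in the decayed tail after it.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With `gamma` below 1, rewards far in the future count for less, which is what keeps 45-minute games tractable for the learner.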
Each of the five networks contains a single-layer, 1024-unit LSTM that observes the current game state and emits actions through several action heads, each computed independently and carrying its own semantic meaning.
In terms of coordination, teamwork in OpenAI Five is managed by a hyperparameter called “team spirit” that ranges from 0 to 1 and weights how much each hero should value its individual reward versus the average of the team’s rewards.
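One plausible reading of that team-spirit blend, written out as code - the exact formula is not given in this post, so treat this as an assumption:

```python
def blended_rewards(individual, team_spirit):
    """Blend each hero's own reward with the team mean.

    team_spirit = 0: heroes care only about their own reward;
    team_spirit = 1: every hero optimises the team's average reward.
    """
    mean = sum(individual) / len(individual)
    return [(1 - team_spirit) * r + team_spirit * mean for r in individual]
```

Annealing team spirit upward during training would let heroes first learn their own mechanics and only later optimise for the team.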
Differences between OpenAI Five and humans:
The system has access to the same information as human players, but unlike humans, who must manually check their inventory, position and health, OpenAI Five receives all of this data instantly.
Another difference lies in frame-perfect timing, which is trivial for the system but achievable only by highly skilled players; OpenAI Five’s average reaction time is around 80 ms, faster than a human’s.
The team behind OpenAI Five observed some interesting things during development:
- Binary rewards can give good performance - the 1v1 model used a shaped reward, including bonuses for last hits, kills and the like. When the team ran an experiment rewarding the agent only for winning or losing, training was slower and plateaued somewhat in the middle, unlike the smooth learning curves usually seen, but still reached good performance.
- Creep blocking can be learned from scratch - creep blocking (a Dota 2 tactic of constantly repositioning one’s hero so the first creep of the wave gets blocked repeatedly) was originally trained for 1v1 with traditional RL and a dedicated “creep block” reward. One team member left a 2v2 model training for a couple of days to see how longer training would boost performance, and was surprised to find the model had learned to creep block on its own.
- Wins are possible even with bugs - the system showed that it can beat strong players while still containing serious bugs, though the team will of course keep fixing them.
Perfecting OpenAI Five, the team reached a great milestone in 2019: their AI system won back-to-back games against Dota 2 world champions OG at the Finals, becoming the first AI to defeat world champions in an esports game.
Thanks for reading!