Update README.md
williamshen-nz authored Sep 6, 2019
1 parent 84a20de · commit 978ac92
Showing 1 changed file with 2 additions and 2 deletions.
README.md: 4 changes (2 additions, 2 deletions)
@@ -2,7 +2,7 @@

Research project for COMP3770 at the Australian National University.

-Published at the Symposium on Combinatorial Search 2019 as "[Guiding Search with Generalized Policies for Probabilistic Planning
+Published at the Symposium on Combinatorial Search (SoCS) 2019 as "[Guiding Search with Generalized Policies for Probabilistic Planning
](https://aaai.org/ocs/index.php/SOCS/SOCS19/paper/view/18334)"

## Abstract
@@ -11,7 +11,7 @@ Planning is the essential ability of an intelligent agent to solve the problem o

Monte-Carlo Tree Search (MCTS) is a state-space search algorithm for optimal decision making that relies on performing Monte-Carlo simulations to incrementally build a search tree, and estimate the values of each state. MCTS can often achieve state-of-the-art performance when combined with domain-specific knowledge. However, without this knowledge, MCTS requires a large number of simulations in order to obtain reliable estimates in the search tree.

-The Action Schema Network (ASNets) [Toyer et al., 2018](https://github.com/qxcv/asnets) is a very recent contribution in planning that uses deep learning and neural networks to learn generalized policies for planning problems. ASNets are well suited to problems where the ``local knowledge of the environment can help to avoid certain traps''. However, like most machine learning algorithms, an ASNet may fail to generalize to problems that it was not trained on. For example, this could be due to a poor choice of hyperparameters that lead to an undertrained or overtrained network.
+The Action Schema Network (ASNets) \[[Toyer et al., 2018](https://github.com/qxcv/asnets)\] is a very recent contribution in planning that uses deep learning and neural networks to learn generalized policies for planning problems. ASNets are well suited to problems where the "local knowledge of the environment can help to avoid certain traps". However, like most machine learning algorithms, an ASNet may fail to generalize to problems that it was not trained on. For example, this could be due to a poor choice of hyperparameters that lead to an undertrained or overtrained network.

This research project is concerned with investigating how we can improve upon the policy learned by an ASNet by combining it with MCTS. Our project has three key contributions. The first contribution is an ingredient-based framework for MCTS that allows us to specify different flavors of MCTS -- including those which use the policy learned by an ASNet. Our second contribution is two new methods which allow us to use ASNets to perform simulations in MCTS, and hence directly affect the estimated values of states in the search tree. Our third and final contribution is two new methods for using ASNets in the selection phase of MCTS. This allows us to bias the navigation of the search space towards what an ASNet believes is promising.
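The abstract above summarizes MCTS as incrementally building a search tree from Monte-Carlo simulations. A minimal sketch of that four-phase loop (selection, expansion, simulation, backpropagation) with a UCB1 selection rule follows; the toy number-line domain and all names (`Node`, `ucb1`, `mcts`) are hypothetical illustrations for this page, not code from the repository.

```python
import math
import random

# Hypothetical toy domain: walk on a number line, reward for reaching +5.
# Purely illustrative; not one of the project's planning domains.
ACTIONS = [-1, +1]

def step(state, action):
    return state + action

def is_terminal(state, depth):
    return abs(state) >= 5 or depth >= 20

def reward(state):
    return 1.0 if state >= 5 else 0.0

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}           # action -> Node
        self.visits, self.value = 0, 0.0

def ucb1(parent, child, c=1.4):
    # Classic UCB1: exploit the mean return, explore rarely-visited children.
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def mcts(root_state, n_simulations=1000):
    root = Node(root_state)
    for _ in range(n_simulations):
        node, depth = root, 0
        # 1. Selection: descend while the node is fully expanded.
        while len(node.children) == len(ACTIONS) and not is_terminal(node.state, depth):
            node = max(node.children.values(), key=lambda ch: ucb1(node, ch))
            depth += 1
        # 2. Expansion: add one untried action.
        if not is_terminal(node.state, depth):
            action = random.choice([a for a in ACTIONS if a not in node.children])
            node.children[action] = Node(step(node.state, action), parent=node)
            node = node.children[action]
            depth += 1
        # 3. Simulation: random rollout from the new node to a terminal state.
        state = node.state
        while not is_terminal(state, depth):
            state = step(state, random.choice(ACTIONS))
            depth += 1
        ret = reward(state)
        # 4. Backpropagation: update value estimates along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += ret
            node = node.parent
    # Act with the most-visited root child, a common robust choice.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts(0))  # estimated best first action from the initial state
```

The random rollout in step 3 is exactly where the abstract notes MCTS needs many simulations without domain knowledge, and where an ASNet policy could be substituted.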

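The third contribution, using ASNets in the selection phase, amounts to biasing selection toward actions the learned policy prefers. A PUCT-style rule in the AlphaZero family is one standard way to express such a bias; the sketch below assumes that framing rather than reproducing the paper's actual method, and `policy_prob` is a stand-in for the probability an ASNet would assign to an action.

```python
import math

def puct_score(parent_visits, child_visits, child_value, policy_prob, c_puct=1.0):
    """PUCT-style score: exploit the value estimate, but bias exploration
    toward actions a learned policy (e.g. an ASNet) considers promising.
    Generic AlphaZero-style rule, used here only to illustrate
    policy-biased selection; not claimed to be the SoCS 2019 formula."""
    exploit = child_value / child_visits if child_visits else 0.0
    explore = c_puct * policy_prob * math.sqrt(parent_visits) / (1 + child_visits)
    return exploit + explore

# An unvisited action with a high policy prior is tried early...
print(puct_score(parent_visits=10, child_visits=0, child_value=0.0, policy_prob=0.8))
# ...while a visited action with a low prior leans on its value estimate.
print(puct_score(parent_visits=10, child_visits=5, child_value=3.0, policy_prob=0.1))
```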
