From 2627f263bc363ee48a41164393f33d697d09bff9 Mon Sep 17 00:00:00 2001
From: ilyasu123
Date: Mon, 4 Jul 2016 10:57:32 -0700
Subject: [PATCH] Revert "Comedian neural network research request"
---
 _requests_for_research/funnybot.html  | 77 ---------------------------
 _requests_for_research/funnybot.html~ | 77 ---------------------------
 2 files changed, 154 deletions(-)
 delete mode 100644 _requests_for_research/funnybot.html
 delete mode 100644 _requests_for_research/funnybot.html~

diff --git a/_requests_for_research/funnybot.html b/_requests_for_research/funnybot.html
deleted file mode 100644
index f6bc7ff..0000000
--- a/_requests_for_research/funnybot.html
+++ /dev/null
@@ -1,77 +0,0 @@
---
title: Comedian language model
summary: ''
difficulty: 2 # out of 3
---

Train a language model capable of generating jokes from one of several predefined categories. This request can be solved as follows:

First, obtain a large corpus of jokes in raw text format that will later be used to train the language model. For initial tests you can use the following existing datasets:

Pun of the Day dataset: ~2,500 puns; 16000 One Liners dataset: ~16,000 one-line jokes; see [Yang, Diyi, et al. "Humor recognition and humor anchor extraction."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.696.1901&rep=rep1&type=pdf)

[Jester datasets](http://eigentaste.berkeley.edu/dataset/): contain around 150 jokes and ratings from a large number of users.

Train a [language model](https://arxiv.org/abs/1602.02410) on the jokes datasets, similarly to [this post](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). See whether models trained on the above datasets produce reasonable results.
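
As a concrete starting point, here is a minimal character-level LSTM language model in PyTorch, in the spirit of char-rnn. This is an illustrative sketch only: the corpus file `jokes.txt`, the single-layer LSTM, and all hyperparameters are assumptions, not part of the original request.

```python
# Minimal character-level LSTM language model (illustrative sketch).
import torch
import torch.nn as nn

text = open('jokes.txt').read()                     # assumed: the joke corpus as plain text
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

class CharLM(nn.Module):
    def __init__(self, vocab, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state

model = CharLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
seq_len, batch = 128, 32

for step in range(10000):
    # sample random contiguous windows; the target is the next character at every position
    idx = torch.randint(0, len(data) - seq_len - 1, (batch,))
    x = torch.stack([data[i:i + seq_len] for i in idx])
    y = torch.stack([data[i + 1:i + seq_len + 1] for i in idx])
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```

Word-level models and the larger architectures from the linked paper are natural variations; the point is simply to check whether samples from a model trained on the above datasets start to look joke-like.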

Most likely, more data than is available in the above datasets is required to obtain a reasonable language model. Collect such additional data by scraping sites like https://www.reddit.com/r/jokes, http://funtweets.com/, http://funnytweeter.com/, and similar sites. Please make sure to obey each website's policies with respect to web scraping! You can also download reddit comments (not only jokes) using [this torrent](https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/).
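
For the /r/jokes data specifically, one lightweight option is reddit's official API via the `praw` package rather than raw HTML scraping. The sketch below is only an assumption about what such a collector could look like; the credentials, the post limit, and the output file name are placeholders.

```python
# Illustrative /r/Jokes collection via the official reddit API (praw).
import json
import praw

reddit = praw.Reddit(client_id='YOUR_ID',            # placeholder credentials
                     client_secret='YOUR_SECRET',
                     user_agent='joke-corpus-builder/0.1')

with open('reddit_jokes.jsonl', 'w') as f:
    for post in reddit.subreddit('Jokes').top(limit=1000):
        joke = {'setup': post.title,
                'punchline': post.selftext,
                'score': post.score,
                'flair': post.link_flair_text}        # coarse category label, when present
        f.write(json.dumps(joke) + '\n')
```

Storing the score and flair alongside the text is cheap and becomes useful for the category-conditioning and RL steps described later.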

To further increase the amount of text the language model is trained on, pretrain a language model on a large corpus of regular text, for example on the [One Billion Words dataset](http://arxiv.org/abs/1312.3005), which you can download [here](http://www.statmt.org/lm-benchmark/). This way your language model will already be initialized with knowledge of the general structure of the English language; then fine-tune the pretrained model on the jokes corpus. One of the outcomes of this research request is to determine whether pretraining helps with joke generation.
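
Fine-tuning here simply means continuing training of the pretrained weights on the much smaller jokes corpus, usually with a reduced learning rate. A minimal sketch, continuing from the earlier example (so `CharLM`, `chars`, `stoi`, and `loss_fn` are assumed to be defined) and with a placeholder checkpoint name:

```python
# Illustrative pretrain-then-fine-tune step; checkpoint and file names are placeholders.
# Note: chars/stoi must be the vocabulary used during pretraining, not one rebuilt from the jokes.
import torch

model = CharLM(len(chars))
model.load_state_dict(torch.load('lm1b_pretrained.pt'))   # weights from One Billion Words pretraining
opt = torch.optim.Adam(model.parameters(), lr=1e-4)        # lower learning rate for fine-tuning

jokes = open('jokes.txt').read()
jokes_data = torch.tensor([stoi[c] for c in jokes if c in stoi], dtype=torch.long)

for step in range(2000):
    idx = torch.randint(0, len(jokes_data) - 129, (32,))
    x = torch.stack([jokes_data[i:i + 128] for i in idx])
    y = torch.stack([jokes_data[i + 1:i + 129] for i in idx])
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```

Comparing samples and held-out perplexity of this model against the jokes-only model from the previous step answers the question of whether pretraining helps.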

People are all different, and so are their tastes in jokes, so some might prefer a certain category of jokes over others. Modify the language model used in the previous steps so that it can be configured to generate jokes from a certain category only. To do so, train a language model on jokes from https://www.reddit.com/r/jokes using both the joke text and a one-hot encoded label of the joke, such that the language model can be made to generate jokes of a certain type by fixing the corresponding one-hot encoded input label. For the other datasets, jokes can be labeled using a text classifier trained to detect the reddit label from the joke text; a sketch of both steps is given below.
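
One simple way to add the category input (an illustrative design choice, not the only one) is to concatenate a one-hot category vector to the character embedding at every time step:

```python
# Illustrative category-conditioned variant of the character-level LSTM.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalCharLM(nn.Module):
    def __init__(self, vocab, n_categories, hidden=512):
        super().__init__()
        self.n_categories = n_categories
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden + n_categories, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x, category, state=None):
        # category: (batch,) integer labels, repeated as a one-hot vector at every time step
        onehot = F.one_hot(category, self.n_categories).float()
        onehot = onehot.unsqueeze(1).expand(-1, x.size(1), -1)
        h, state = self.lstm(torch.cat([self.embed(x), onehot], dim=-1), state)
        return self.out(h), state

# At sampling time, fixing `category` to the desired label steers generation toward that joke type.
```

To propagate the reddit categories to the other corpora, a simple bag-of-words classifier is a reasonable first baseline; the variable names below are hypothetical and stand for lists built from the scraped data.

```python
# Illustrative labeling of the unlabeled jokes with a classifier trained on the reddit labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reddit_joke_texts, reddit_joke_labels)         # hypothetical: scraped text and flair labels
predicted_labels = clf.predict(other_dataset_jokes)    # e.g. Pun of the Day / 16000 One Liners text
```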

The expected outcome of this research request is to determine whether a reasonable language model, as described in the previous paragraphs, can be built with current language modelling approaches.

Related literature:
[Petrovic, Sasa, and David Matthews. "Unsupervised joke generation from big data." ACL (2). 2013.](http://homepages.inf.ed.ac.uk/s0894589/petrovic13unsupervised.pdf)

A potential follow-up, once this research request is solved, is to create a setup in which the resulting neural network can post the jokes it generates somewhere online, receive a reliable score as feedback, and improve itself accordingly using RL.
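
As a rough illustration of that follow-up (purely an assumed setup, reusing the hypothetical `ConditionalCharLM` from the earlier sketch), the online score can be treated as a scalar reward for a sampled joke and used with a REINFORCE-style update:

```python
# Illustrative REINFORCE update: the reward is the (normalized) score a posted joke received online.
import torch

def reinforce_step(model, opt, category, reward, max_len=200):
    """Sample one joke, then scale its log-likelihood gradient by the reward."""
    x = torch.zeros(1, 1, dtype=torch.long)             # assumed start-of-joke token id 0
    log_prob, state = 0.0, None
    for _ in range(max_len):
        logits, state = model(x, category, state)
        dist = torch.distributions.Categorical(logits=logits[:, -1])
        ch = dist.sample()
        log_prob = log_prob + dist.log_prob(ch)
        x = ch.unsqueeze(1)
    loss = -(reward * log_prob).sum()                    # higher-scored jokes become more likely
    opt.zero_grad(); loss.backward(); opt.step()
```

In practice one would average over many samples and subtract a baseline to reduce variance; obtaining reliable score feedback (e.g. upvotes) is itself the hard part of the setup.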
