{
"name": "Huginn-newsroom-scenarios",
"tagline": "Documentation and common scenarios for newsroom usage of https://github.com/cantino/huginn/",
"body": "# Huginn for Newsrooms\r\n\r\n[Huginn](https://github.com/cantino/huginn/) is a system for building agents that can automate many common web tasks. Think of it as an open source, more flexible version of IFTTT or a successor to Yahoo Pipes. It can let users quickly create simple bots or schedule and repeat routine tasks.\r\n\r\nAt The New York Times we've been [using it](https://source.opennews.org/en-US/articles/open-source-bot-factory/) for a few years to automate a variety of routine tasks. A few tasks it could be used for: monitor a webpage and alert you when it changes; alert a Slack channel when the lead story on the homepage is changed; email a reporter when their story is published by the CMS; regularly scrape a database and save it's results; and more.\r\n\r\n* [How to Install](#how-to-install)\r\n* [Creating an account](#creating-an-account)\r\n* [Import a scenario](#list-of-scenarios)\r\n* [Configure any credentials necessary](#user-credentials)\r\n* [Go back to each agent and edit as necessary](#modify-agents-after-importing)\r\n* [Picking CSS selectors from page](#more-on-css-selectors)\r\n\r\n### Creating an Account\r\n\r\nThere's no central Huginn website, each installation is maintained separately. If someone has already set-up Huginn for you, use the invitation code and URL they've provided you.\r\n\r\nIf you have just want to try it out, there is a demo public install maintained by @albertsun at https://mighty-retreat-48764.herokuapp.com which you can register for with the invitation code `try-huginn`.\r\n\r\nIf you are slightly comfortable with the command line, have a Heroku account, you can run your own [Huginn installation](#how-to-install).\r\n\r\n### Basic Concepts\r\n\r\nHuginn is built around the concept of \"Agents\". Each Agent does one discrete, very simple task. To accomplish more complicated things requires combining agents into workflows. 
For example, one agent checks a webpage on a schedule, and another agent sends an email with the results from the first agent.\r\n\r\nWhen you first create your account, it will have no agents. To get started, the easiest way will be to import and modify a scenario. Scenarios are pre-arranged sets of agents already configured for a larger task. Select \"Scenarios\" from the top menu and then \"Import Scenario\".\r\n\r\n### List of Scenarios\r\n\r\nYou can find Scenarios provided by other Huginn users, or use one of the several scenarios we currently provide as gists. To use one of these, grab the JSON from the Scenario Definition link and either modify and upload it, or paste the link to use it directly.\r\n\r\n[Scenario Definitions](https://gist.github.com/albertsun/7e5cffc84a450c7d587f05f9f5b6938e)\r\n\r\nThese are currently configured with arbitrary existing websites, which you should replace with your own.\r\n\r\n* Watch a website for changes, then send an email \r\n `WebsiteAgent -> EmailAgent` \r\n [Scenario Definition](https://raw.githubusercontent.com/albertsun/huginn-newsroom-scenarios/master/scenarios/website-email-notifier.json) \r\n\r\n* Watch a website for changes, then send a Slack notification \r\n `WebsiteAgent -> SlackAgent` \r\n [Scenario Definition](https://raw.githubusercontent.com/albertsun/huginn-newsroom-scenarios/master/scenarios/website-slack-notifier.json) \r\n\r\n* Scrape a website and save its text to [Stevedore](https://github.com/newsdev/stevedore) \r\n `WebsiteAgent -> PostAgent` \r\n [Scenario Definition](https://raw.githubusercontent.com/albertsun/huginn-newsroom-scenarios/master/scenarios/website-to-stevedore.json) \r\n\r\n* Follow an RSS feed, and send the full text of its items to [Stevedore](https://github.com/newsdev/stevedore) \r\n `RssAgent -> WebsiteAgent -> PostAgent` \r\n [Scenario Definition](https://raw.githubusercontent.com/albertsun/huginn-newsroom-scenarios/master/scenarios/rss-full-text-scrape-to-stevedore.json) \r\n\r\n* Filter the NYTimes 
Timeswire API and email stories matching a regex \r\n `WebsiteAgent -> TriggerAgent -> EmailAgent` \r\n [Scenario Definition](https://raw.githubusercontent.com/albertsun/huginn-newsroom-scenarios/master/scenarios/timeswire-story-filter-email.json) \r\n\r\n* _TK_ alert a Slack channel when the lead story on the homepage is changed\r\n* _TK_ email a reporter when their story is published by the CMS\r\n* _TK_ detect a spike in twitter mentions of a story URL and send a notification\r\n\r\n\r\n### Importing Scenarios\r\n\r\nIf you're comfortable editing JSON directly you may find it easier to modify the scenario before uploading, otherwise you can modify the agents through the web UI after importing.\r\n\r\nHere are the agents for the Website Email Notifier scenario.\r\n\r\n```\r\n{\r\n \"type\": \"Agents::EmailAgent\",\r\n \"name\": \"Website Email Notifier\",\r\n \"disabled\": false,\r\n \"guid\": \"9c61eccfd02f22c23d8fe461bbc46805\",\r\n \"options\": {\r\n \"subject\": \"The Website Changed\",\r\n \"headline\": \"Your notification:\",\r\n \"expected_receive_period_in_days\": \"2\"\r\n },\r\n \"propagate_immediately\": true\r\n}\r\n```\r\n\r\nIn the EmailAgent, the options to change would be `subject` for the subject line of the email and `headline`, for what headline to include in the body of the email.\r\n\r\n```\r\n{\r\n \"type\": \"Agents::WebsiteAgent\",\r\n \"name\": \"Website Scraper\",\r\n \"disabled\": false,\r\n \"guid\": \"f820bf7d25c0fc527280a0c05d905b1f\",\r\n \"options\": {\r\n \"expected_update_period_in_days\": \"2\",\r\n \"url\": \"https://example.com/some-url\",\r\n \"type\": \"html\",\r\n \"mode\": \"on_change\",\r\n \"extract\": {\r\n \"text\": {\r\n \"css\": \"div#css-selector\",\r\n \"value\": \"normalize-space(.)\"\r\n }\r\n }\r\n },\r\n \"schedule\": \"every_30m\",\r\n \"keep_events_for\": 0,\r\n \"propagate_immediately\": false\r\n}\r\n```\r\n\r\nThen in the WebsiteAgent, you would set the `url` option to the page you want to scrape, and then 
the `extract` hash to specify which elements on the page should be selected. Use a CSS selector in `css` and an XPath expression in `value` to select the value from the selected elements. If you just want full text, leave the value as `normalize-space(.)`. Set `schedule` to specify how frequently the website should be checked. Valid options are listed [here](https://github.com/cantino/huginn/wiki/Creating-a-new-agent#scheduling).\r\n\r\n### User Credentials\r\n\r\nIn the options for some agents, you'll see a tag like this `{% credential slack_webhook %}`. In the [Liquid templating language](https://github.com/cantino/huginn/wiki/Formatting-Events-using-Liquid) that Huginn uses, this means the agent will use a user credential. It's a more secure way to configure agents with secret variables like passwords and authentication tokens.\r\n\r\nTo set up credentials for agents you've created or imported, click \"Credentials\" in the top menu, and then \"New Credential\".\r\n\r\nIf an agent uses the `{% credential slack_webhook %}` credential, for example, you'll want to create a webhook integration in Slack, copy the webhook URL from there, and create a credential like so.\r\n\r\n```\r\nCredential name: slack_webhook\r\nCredential value: https://hooks.slack.com/services/TKTK...YOURVALUE...TKTK\r\n```\r\n\r\n### Modify Agents after Importing\r\n\r\nAfter importing the agents, you can continue modifying them and adding new agents as you see fit. 
Go through and replace whatever URLs and options are needed to fit the site you want to scrape or create more complicated configurations.\r\n\r\nThe [Huginn wiki](https://github.com/cantino/huginn/wiki/) has more examples of agent configurations. You can make more extensive configuration changes by following the documentation there, along with the guide to the [Liquid event templating language](https://github.com/cantino/huginn/wiki/Formatting-Events-using-Liquid), which allows variables to be inserted.\r\n\r\nFor example, an Email Agent will by default send emails to whatever email address you registered with. If you want to send email to a different address, edit the agent and add a `recipients` option with the email address (or multiple addresses as an array) which should receive the email.\r\n\r\nEvery agent has a section of documentation on its create or edit page explaining the options it uses.\r\n\r\n### More on CSS Selectors\r\n\r\nOne of the most common and trickiest tasks in using Huginn is finding how to select the proper parts of pages for the WebsiteAgent. This requires some comfort with and understanding of the DOM tree to be able to pick out the right nodes.\r\n\r\nThere's more documentation in the Huginn [WebsiteAgent](https://github.com/cantino/huginn/blob/master/app/models/agents/website_agent.rb#L38) page. One recommended way to find the right selectors is the [SelectorGadget](http://selectorgadget.com/) bookmarklet. 
You can also use Firebug or your browser's built-in developer tools.\r\n\r\n\r\n## How to Install\r\n\r\nIf you don't already have an instance of Huginn you can use, the easiest way to deploy is with the one-click [Heroku installer](https://github.com/cantino/huginn/#heroku).\r\n\r\n[![Deploy](https://www.herokucdn.com/deploy/button.png)](https://heroku.com/deploy?template=https://github.com/cantino/huginn)\r\n\r\n* Click to deploy the app\r\n* Run `bin/setup_heroku` from the command line to configure an admin user and domain\r\n* Either use the generated invite code or set a new one from\r\n* Set up a SendGrid add-on to be able to send email\r\n\r\n[Full Huginn Heroku documentation](https://github.com/cantino/huginn/blob/master/doc/heroku/install.md)\r\n\r\nThe main Huginn documentation is written for developers, and there are many other [installation options](https://github.com/cantino/huginn/#getting-started).\r\n",
"note": "Don't delete this file! It's used internally to help with page regeneration."
}