Merge pull request #38 from aaronpk/library-refactor
Refactors into a library that can be used separately from the API
aaronpk authored Apr 29, 2017
2 parents 2a3d7b4 + 78e3e16 commit 11977e6
Showing 39 changed files with 2,225 additions and 1,204 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,4 +1,5 @@
.DS_Store
config.php
vendor/
XRay-*.json
php_errors.log
XRay-*.json
22 changes: 18 additions & 4 deletions LICENSE.txt
@@ -1,7 +1,21 @@
Copyright 2016 by Aaron Parecki
MIT License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Copyright (c) 2017 Aaron Parecki

http://www.apache.org/licenses/LICENSE-2.0
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
96 changes: 91 additions & 5 deletions README.md
@@ -9,13 +9,66 @@ XRay parses structured content from a URL.
The contents of the URL are checked in the following order:

* A silo URL from one of the following websites:
** Instagram
** Twitter
** (more coming soon)
* h-entry, h-event, h-card
* Instagram
* Twitter
* GitHub
* XKCD
* (more coming soon)
* Microformats
* h-card
* h-entry
* h-event
* h-review
* h-recipe
* h-product

## Library

XRay can be used as a library in your PHP project. The easiest way to install it and its dependencies is via composer.

```
composer require p3k/xray
```

Basic usage:

```php
$xray = new p3k\XRay();
$parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/');
```

If you already have an HTML or JSON document you want to parse, you can pass it as a string in the second parameter.

```php
$xray = new p3k\XRay();
$html = '<html>....</html>';
$parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/', $html);
```

In both cases, you can add an additional parameter to configure various options of how XRay will behave. Below is a list of the options.

## Parse API
* `timeout` - The timeout in seconds to wait for any HTTP requests
* `max_redirects` - The maximum number of redirects to follow
* `include_original` - Will also return the full document fetched
* `target` - Specify a target URL, and XRay will first check if that URL is on the page, and only if it is, will continue to parse the page. This is useful when you're using XRay to verify an incoming webmention.

Additionally, the following parameters are supported when making requests that use the Twitter or GitHub API. See the authentication section below for details.

```php
$xray = new p3k\XRay();

$parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/', [
  'timeout' => 30
]);

$parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/', $html, [
  'target' => 'http://example.com/'
]);
```

## API

XRay can also be used as an API to provide its parsing capabilities over an HTTP service.

To parse a page and return structured data for the contents of the page, simply pass a url to the parse route.

@@ -33,6 +86,26 @@ In both cases, the response will be a JSON object containing a key of "type". If

You can also make a POST request with the same parameter names.

If you already have an HTML or JSON document you want to parse, you can include that in the parameter `body`. This POST request would look like the below:

```
POST /parse
Content-type: application/x-www-form-urlencoded

url=https://aaronparecki.com/2016/01/16/11/
&body=<html>....</html>
```

or for Twitter/GitHub where you might have JSON,

```
POST /parse
Content-type: application/x-www-form-urlencoded

url=https://github.com/aaronpk/XRay
&body={"repo":......}
```
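
The request examples above show the `url` and `body` values unencoded for readability. In a real `application/x-www-form-urlencoded` request, both values need to be percent-encoded, since an HTML or JSON document will usually contain characters such as `&`, `=`, and `<`. A minimal sketch of building such a payload in PHP; the helper name `build_parse_request_body` and the sample document are assumptions for illustration and are not part of XRay:

```php
<?php
// Hypothetical helper (not part of XRay): builds the form-encoded
// request body for a POST to the /parse route.
function build_parse_request_body(string $url, string $document): string {
    // http_build_query() percent-encodes reserved characters in each value,
    // so "&" or "=" inside the document cannot be mistaken for field separators.
    return http_build_query([
        'url'  => $url,
        'body' => $document,
    ]);
}

// Sample document made up for this example.
$document = '<html><body>a & b = c</body></html>';

$payload = build_parse_request_body(
    'https://aaronparecki.com/2016/01/16/11/',
    $document
);

echo $payload, "\n";
```

The resulting string can be sent as the request body by whichever HTTP client you already use.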

### Authentication

If the URL you are fetching requires authentication, include the access token in the parameter "token", and it will be included in an "Authorization" header when fetching the URL. (It is recommended to use a POST request in this case, to avoid the access token potentially being logged as part of the query string.) This is useful for [Private Webmention](https://indieweb.org/Private-Webmention) verification.
@@ -57,6 +130,13 @@ You should only send Twitter credentials when the URL you are trying to parse is
* twitter_access_token_secret - Your Twitter secret access token


### GitHub Authentication

XRay uses the GitHub API to fetch GitHub URLs, which provides higher rate limits when used with authentication. You can pass a GitHub access token along with the request and XRay will use it when making requests to the API.

* github_access_token - A GitHub access token


### Error Response

```json
@@ -119,8 +199,14 @@ The primary object on the page is returned in the `data` property. This will ind
If a property supports multiple values, it will always be returned as an array. The following properties support multiple values:

* in-reply-to
* like-of
* repost-of
* bookmark-of
* syndication
* photo (of entry, not of a card)
* video
* audio
* category
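
Because these properties are always arrays, a consumer can loop over them without first checking whether the value is a string or a list. A minimal sketch, assuming `$data` is the decoded `data` object as a PHP array; the sample values and the `values_of` helper are hypothetical, and treating a missing property as an empty list is an assumption made for this example:

```php
<?php
// Hypothetical sample of a decoded "data" object; the values are made up.
$data = [
    'category' => ['indieweb'], // a single value is still wrapped in an array
    'photo'    => [
        'https://example.com/a.jpg',
        'https://example.com/b.jpg',
    ],
];

// Hypothetical helper: returns the values of a multi-value property,
// or an empty array when the property is not present in $data.
function values_of(array $data, string $property): array {
    return $data[$property] ?? [];
}

foreach (values_of($data, 'photo') as $photoUrl) {
    echo $photoUrl, "\n";
}
```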

The content will be an object that always contains a "text" property and may contain an "html" property if the source document published HTML content. The "text" property must always be HTML escaped before displaying it as HTML, as it may include unescaped characters such as `<` and `>`.
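
One way to do that escaping in PHP is with the built-in `htmlspecialchars()`. The sketch below assumes the "content" object has been decoded into a PHP array; the sample text and the helper name `text_to_html` are made up for illustration:

```php
<?php
// Hypothetical sample of a "content" object that only has a "text" property.
$content = ['text' => 'a < b & c > d'];

// Hypothetical helper: escapes a plain-text value so it can be embedded in HTML.
function text_to_html(string $text): string {
    // ENT_QUOTES also escapes single and double quotes, which matters
    // if the value is ever placed inside an HTML attribute.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo '<p>', text_to_html($content['text']), '</p>', "\n";
// prints: <p>a &lt; b &amp; c &gt; d</p>
```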

41 changes: 22 additions & 19 deletions composer.json
Original file line number Diff line number Diff line change
@@ -1,36 +1,39 @@
{
"name": "p3k/xray",
"type": "library",
"license": "MIT",
"homepage": "https://github.com/aaronpk/XRay",
"description": "X-Ray returns structured data from any URL",
"require": {
"league/plates": "3.*",
"league/route": "1.*",
"mf2/mf2": "~0.3",
"ezyang/htmlpurifier": "4.*",
"indieweb/link-rel-parser": "0.1.*",
"dg/twitter-php": "^3.6",
"dg/twitter-php": "3.6.*",
"p3k/timezone": "*",
"cebe/markdown": "~1.1.1"
"p3k/http": "0.1.*",
"cebe/markdown": "1.1.*"
},
"autoload": {
"psr-4": {
"p3k\\XRay\\": "lib/XRay"
},
"files": [
"lib/helpers.php",
"controllers/Main.php",
"controllers/Parse.php",
"controllers/Token.php",
"controllers/Rels.php",
"controllers/Certbot.php",
"lib/HTTPCurl.php",
"lib/HTTPStream.php",
"lib/HTTP.php",
"lib/Formats/Mf2.php",
"lib/Formats/Instagram.php",
"lib/Formats/GitHub.php",
"lib/Formats/Twitter.php",
"lib/Formats/XKCD.php",
"lib/Formats/HTMLPurifier_AttrDef_HTML_Microformats2.php"
"lib/XRay.php"
]
},
"require-dev": {
"league/plates": "3.*",
"league/route": "1.*",
"phpunit/phpunit": "4.8.*"
},
"autoload-dev": {
"files": [
"lib/HTTPTest.php"
"controllers/Main.php",
"controllers/Parse.php",
"controllers/Token.php",
"controllers/Rels.php",
"controllers/Certbot.php"
]
}
}