scraper.rs

A scraper intended for use with Philomena. This tool exposes a simple HTTP endpoint that returns the URLs of the image files belonging to a social media post.

Compiling and Installing

Check out scraper.rs and run "cargo build".

Then run "cargo test" to verify that the scraper is working (this requires a Tumblr API key).

"cargo run" builds and runs the current source code; use "cargo build --release" to produce a release build.

Configuration

For configuration, see .env.example.

A Tumblr API key is required.
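Because the Tumblr API key is mandatory, a deployment or wrapper script may want to fail fast when it is missing instead of discovering the problem on the first Tumblr request. The sketch below assumes the settings from .env.example end up as environment variables, and uses `TUMBLR_API_KEY` purely as a hypothetical variable name; check .env.example for the real one. The lookup is passed in as a closure so the check can be exercised without touching the process environment.

```rust
/// Look up a required setting and fail with a clear message when it is
/// missing or blank. The setting name (e.g. "TUMBLR_API_KEY") is a
/// hypothetical example; the real names are listed in .env.example.
fn require_setting<F>(lookup: F, name: &str) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    match lookup(name) {
        // Treat whitespace-only values the same as a missing value.
        Some(value) if !value.trim().is_empty() => Ok(value),
        _ => Err(format!("missing required setting {name}")),
    }
}

/// Same check against the real process environment via std::env::var.
fn require_env(name: &str) -> Result<String, String> {
    require_setting(|key| std::env::var(key).ok(), name)
}
```

Calling `require_env("TUMBLR_API_KEY")` at startup (again, an assumed name) turns a missing key into one explicit error message.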

Scrapers

The available scrapers are:

Service	Status	Notes
DeviantArt	Beta	Does not work for images flagged as mature
Twitter	Unsupported	Due to API requirements, the Twitter scraper is becoming hard to support
Nitter	Production	Only supports officially listed instances
Tumblr	Production	Does not scrape text posts
Raw	Production	Valid for gif, jpeg, png, svg, webm
Philomena	Production	Works for a select number of boorus
Buzzly.Art	Unsupported	Actively broken
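As an illustration of the Raw row, a client could cheaply pre-filter URLs whose path ends in one of the listed extensions before sending them to the scraper. This is a hypothetical helper, not how scraper.rs itself decides; the real Raw scraper may well inspect the fetched file's content type instead. It matches only the extensions literally listed in the table.

```rust
/// Extensions listed for the Raw scraper in the table above.
const RAW_EXTENSIONS: [&str; 5] = ["gif", "jpeg", "png", "svg", "webm"];

/// Hypothetical pre-check: does the URL's path end in a listed extension?
/// Query strings and fragments are ignored; matching is case-insensitive.
fn looks_like_raw_image(url: &str) -> bool {
    // Keep only the part before any "?query" or "#fragment".
    let path = url.split(|c: char| c == '?' || c == '#').next().unwrap_or(url);
    // The file name is whatever follows the last '/'.
    let file = path.rsplit('/').next().unwrap_or(path);
    // The extension is whatever follows the last '.' in the file name.
    match file.rsplit_once('.') {
        Some((_, ext)) => RAW_EXTENSIONS.iter().any(|e| e.eq_ignore_ascii_case(ext)),
        None => false,
    }
}
```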

API

Send requests to <domain>/images/scrape. scraper.rs accepts POST requests and, optionally, GET requests.

For a GET request, put the URL-encoded target URL in the "url" query parameter. For a POST request, send a JSON object whose "url" attribute is set to the target URL.
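For the GET form, the target URL must be percent-encoded so that its own "?", "&" and "=" characters are not mistaken for parts of the scraper's query string. The following dependency-free sketch encodes every byte outside the RFC 3986 unreserved set; `scrape_get_url` is a hypothetical helper name and the base URL is a placeholder for your own deployment.

```rust
/// Percent-encode a query-parameter value: RFC 3986 unreserved characters
/// pass through unchanged, every other byte becomes %XX (uppercase hex).
fn percent_encode(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

/// Hypothetical helper: build the GET form of a scrape request.
/// `base` is the scraper deployment, e.g. "https://www.example.com".
fn scrape_get_url(base: &str, target: &str) -> String {
    format!(
        "{}/images/scrape?url={}",
        base.trim_end_matches('/'),
        percent_encode(target)
    )
}
```

For example, `scrape_get_url("https://www.example.com", "https://x.com/p?id=1&y=2")` yields `https://www.example.com/images/scrape?url=https%3A%2F%2Fx.com%2Fp%3Fid%3D1%26y%3D2`.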

Example:

POST www.example.com/images/scrape
{
    "url": "some-tumblr-blog.com/my-post-id-/image"
}
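The POST body in the example is a single-field JSON object. A real client would most likely build it with a JSON library such as serde_json; the dependency-free sketch below (hypothetical helper `scrape_post_body`) shows the escaping that is needed so a URL containing quotes, backslashes or control characters still produces valid JSON.

```rust
/// Hypothetical helper: build the POST body {"url":"..."} for a target URL,
/// escaping characters that are not allowed raw inside a JSON string.
fn scrape_post_body(target: &str) -> String {
    let mut escaped = String::with_capacity(target.len());
    for c in target.chars() {
        match c {
            '"' => escaped.push_str("\\\""),
            '\\' => escaped.push_str("\\\\"),
            '\n' => escaped.push_str("\\n"),
            '\r' => escaped.push_str("\\r"),
            '\t' => escaped.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => escaped.push_str(&format!("\\u{:04x}", c as u32)),
            c => escaped.push(c),
        }
    }
    format!("{{\"url\":\"{escaped}\"}}")
}
```

`scrape_post_body("some-tumblr-blog.com/my-post-id-/image")` reproduces the example body above in compact form.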

If the request is accepted, you will receive a scrape response with a 200 status code. If the "errors" field is populated, ignore the remainder of the object. The "errors" field is an array of strings describing the error path.

Example of an error:

{"errors":["URL invalid"]}
{"errors":["Twitter parser failed","invalid api response","API request is not 200 code"]}
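On the client side, the rule "if errors is populated, ignore everything else" can be enforced in one place. This sketch uses a hypothetical, already-deserialized `ScrapeResponse` whose success fields are collapsed into a list of image URLs for brevity; treating an empty "errors" array as "not populated" and joining the error path with " -> " are both assumptions made for the example.

```rust
/// Hypothetical client-side view of a scrape response; success fields are
/// reduced to the image URLs to keep the example short.
struct ScrapeResponse {
    errors: Option<Vec<String>>,
    image_urls: Vec<String>,
}

/// If "errors" is populated, ignore the rest of the object and return the
/// error path joined for display; otherwise return the image URLs.
fn image_urls(resp: &ScrapeResponse) -> Result<&[String], String> {
    match &resp.errors {
        Some(errors) if !errors.is_empty() => Err(errors.join(" -> ")),
        _ => Ok(resp.image_urls.as_slice()),
    }
}
```

For the second error example above, this returns `Err("Twitter parser failed -> invalid api response -> API request is not 200 code")`.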

Otherwise, the response looks like this:

{
    "source_url":"https://twitter.com/user/status/1000000000000000000",
    "author_name":"user",
    "description":"My tweet\nhas some images I made",
    "images":[
        {
            "url":"https://pbs.twimg.com/media/EpiHor000000000.jpg",
            "camo_url":"https://pbs.twimg.com/media/EpiHor000000000.jpg"
        },
        {
            "url":"https://pbs.twimg.com/media/EpiHor000000001.jpg",
            "camo_url":"https://pbs.twimg.com/media/EpiHor000000001.jpg"
        },
        {
            "url":"https://pbs.twimg.com/media/EpiHor000000002.jpg",
            "camo_url":"https://pbs.twimg.com/media/EpiHor000000002.jpg"
        },
        {
            "url":"https://pbs.twimg.com/media/EpiHor000000003.jpg",
            "camo_url":"https://pbs.twimg.com/media/EpiHor000000003.jpg"
        }
    ]
}
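A client consuming this response might mirror it with plain structs whose field names match the JSON keys. The sketch below is hypothetical and dependency-free; with serde you would typically add `#[derive(Deserialize)]`. The README does not say whether any field can be null or absent, so `Option<String>` may be the safer choice for fields such as `description`.

```rust
/// One entry of the "images" array.
#[derive(Debug, Clone, PartialEq)]
struct ScrapedImage {
    url: String,
    camo_url: String,
}

/// Hypothetical mirror of a successful scrape response.
#[derive(Debug, Clone, PartialEq)]
struct ScrapeResult {
    source_url: String,
    author_name: String,
    description: String,
    images: Vec<ScrapedImage>,
}

impl ScrapeResult {
    /// The "url" of each image, in the order the response lists them.
    fn urls(&self) -> Vec<&str> {
        self.images.iter().map(|img| img.url.as_str()).collect()
    }
}
```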
