GateNLP/ultimate-sitemap-parser

Ultimate Sitemap Parser

Ultimate Sitemap Parser (USP) is a performant and robust Python library for parsing and crawling sitemaps.

Features

Installation

pip install ultimate-sitemap-parser

or using Anaconda:

conda install -c conda-forge ultimate-sitemap-parser

Usage

from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage('https://www.example.org/')

for page in tree.all_pages():
    print(page.url)

sitemap_tree_for_homepage() returns a tree of AbstractSitemap subclass objects that represents the sitemap hierarchy found on the website; the documentation includes a reference of the AbstractSitemap subclasses. AbstractSitemap.all_pages() returns a generator, so pages can be iterated efficiently without loading the entire tree into memory.
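
Because all_pages() is a generator, it can be consumed lazily and cut short. The sketch below is one way to take advantage of that, not part of USP itself: unique_urls() and print_first_urls() are hypothetical helper names, and the only assumptions about USP are the ones shown in the usage above (sitemap_tree_for_homepage(), all_pages(), and the url attribute on each page). It skips repeated URLs, which can occur when the same page is listed in more than one sitemap, and stops after an optional limit.

```python
def unique_urls(pages, limit=None):
    """Lazily yield each distinct page URL once, in first-seen order.

    `pages` is any iterable of objects with a `url` attribute, such as the
    generator returned by all_pages(). Iteration stops as soon as `limit`
    distinct URLs have been yielded (no limit when `limit` is None).
    """
    seen = set()
    for page in pages:
        if limit is not None and len(seen) >= limit:
            return
        if page.url not in seen:
            seen.add(page.url)
            yield page.url


def print_first_urls(homepage_url, limit=10):
    """Hypothetical usage helper: fetches sitemaps over the network."""
    # Imported here so unique_urls() stays usable without USP installed.
    from usp.tree import sitemap_tree_for_homepage

    tree = sitemap_tree_for_homepage(homepage_url)
    for url in unique_urls(tree.all_pages(), limit=limit):
        print(url)
```

Calling print_first_urls('https://www.example.org/') would print up to ten distinct URLs. Note that the set of seen URLs grows with the number of distinct pages, so the helper trades some memory for deduplication.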

For more examples and details, see the documentation.