Module `harvest_and_collect` {#id}

Sub-modules

harvest_and_collect.connect_to_arxiv
harvest_and_collect.db_connexion
harvest_and_collect.main

Module `harvest_and_collect.connect_to_arxiv` {#id}

This module provides classes to connect to and harvest records from the ArXiv database.

Classes:

ArXivRecord : Represents a single record from the ArXiv database.

ArXivHarvester : Handles the connection to the ArXiv database and fetches records.

The ArXivRecord class parses XML data from a single ArXiv record into a Python object. It extracts the header and metadata from the record and checks if the record is valid.
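As a rough illustration of this kind of parsing, the sketch below splits a record element into its header fields and metadata. It assumes the records follow the OAI-PMH layout (which is where resumption tokens come from). The helper name `parse_record`, the returned dictionary, the validity rule, and the sample identifier are all hypothetical and are not taken from the actual `ArXivRecord` implementation.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# XML namespace used by OAI-PMH responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_record(record_xml: ET.Element) -> Dict[str, Any]:
    """Hypothetical helper: extract header fields and metadata from a <record>."""
    header = record_xml.find("oai:header", OAI_NS)
    metadata = record_xml.find("oai:metadata", OAI_NS)

    identifier = None
    datestamp = None
    deleted = False
    if header is not None:  # compare with None: an Element's truth value is unreliable
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        deleted = header.get("status") == "deleted"

    return {
        "identifier": identifier,
        "datestamp": datestamp,
        "metadata": metadata,
        # Assumed rule: valid means header and metadata exist and the record is not deleted.
        "valid": header is not None and metadata is not None and not deleted,
    }


# Made-up sample record, for illustration only.
SAMPLE = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:arXiv.org:0000.00000</identifier>
    <datestamp>2020-01-01</datestamp>
  </header>
  <metadata><title>Example title</title></metadata>
</record>
""".strip()

DELETED = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header status="deleted">
    <identifier>oai:arXiv.org:0000.00001</identifier>
  </header>
</record>
""".strip()

parsed = parse_record(ET.fromstring(SAMPLE))
parsed_deleted = parse_record(ET.fromstring(DELETED))
```

Treating deleted or metadata-less records as invalid lets downstream code skip them with a single check.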

The ArXivHarvester class connects to the ArXiv database and fetches records. It handles HTTP exceptions and retries failed requests. It also handles pagination by using the resumption token provided by the ArXiv API.
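The retry and pagination behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the module's code: `harvest`, `FetchError`, the `fetch_page` callable, and the retry parameters are hypothetical names, and the network call is replaced by an injected function so the example runs offline.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# One page of results: (records, resumption_token). The token is None on the last page.
Page = Tuple[List[str], Optional[str]]


class FetchError(Exception):
    """Stand-in for an HTTP failure (hypothetical, not the module's CustomHTTPException)."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def harvest(
    fetch_page: Callable[[Optional[str]], Page],
    max_retries: int = 3,
    delay_seconds: float = 0.0,
) -> Iterator[str]:
    """Yield records page by page, following the resumption token."""
    token: Optional[str] = None
    while True:
        for attempt in range(1, max_retries + 1):
            try:
                # On failure the token is left unchanged, so the same page is retried.
                records, token = fetch_page(token)
                break
            except FetchError:
                if attempt == max_retries:
                    raise
                time.sleep(delay_seconds)
        yield from records
        if not token:  # no token means the server has no more pages
            return


# Offline demo: two pages, with one simulated failure on the second request.
PAGES = {None: (["a", "b"], "t1"), "t1": (["c"], None)}
calls = {"n": 0}


def fake_fetch(token: Optional[str]) -> Page:
    calls["n"] += 1
    if calls["n"] == 2:
        raise FetchError(503)
    return PAGES[token]


harvested = list(harvest(fake_fetch))
```

Passing the fetch function in as a parameter keeps the pagination logic testable without contacting the remote service.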

Classes

Class `ArXivHarvester` {#id}

class ArXivHarvester(
    **kwargs
)

A class to handle the connection to the ArXiv database and fetch records.

Raises: ArXivHarvester.CustomHTTPException : Custom HTTP exception that forwards the status code and the resumption token, if any.

Yields: next_record() : Yields the next record from the fetched records.

Class variables

Variable `CustomHTTPException` {#id}

Custom HTTP exception that forwards the status code and the resumption token, if any.

Methods

Method `next_record` {#id}

def next_record(
    self
) ‑> Generator[harvest_and_collect.connect_to_arxiv.ArXivRecord, Any, None]

A generator method that yields the next record from the fetched records.

This method continuously yields records from the fetched records list. If the list is empty, it fetches a new batch of records from the ArXiv database. If there are still no records after fetching, it stops the generator.

Yields: ArXivRecord : The next record from the fetched records.

Raises: CustomHTTPException : If an HTTP error occurs while fetching new records.
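The buffer-and-refill behaviour described for `next_record` can be pictured with the small sketch below. The class `BufferedSource`, its attributes, and the `fetch_batch` callable are hypothetical stand-ins chosen for illustration. The real harvester fetches batches over HTTP and may be structured differently.

```python
from collections import deque
from typing import Callable, Deque, Iterator, List


class BufferedSource:
    """Hypothetical stand-in for a harvester that refills an internal buffer."""

    def __init__(self, fetch_batch: Callable[[], List[str]]) -> None:
        self._fetch_batch = fetch_batch
        self._buffer: Deque[str] = deque()

    def next_record(self) -> Iterator[str]:
        while True:
            if not self._buffer:
                # Buffer exhausted: ask for a new batch.
                self._buffer.extend(self._fetch_batch())
            if not self._buffer:
                # Still empty after fetching: nothing left, stop the generator.
                return
            yield self._buffer.popleft()


# Offline demo: three batches, the last one empty.
batches = iter([["r1", "r2"], ["r3"], []])
source = BufferedSource(lambda: next(batches))
collected = list(source.next_record())
```

A caller simply iterates, for example `for record in source.next_record(): ...`, and the loop ends when a fetch returns no records.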

Class `ArXivRecord` {#id}

class ArXivRecord(
    record_xml: xml.etree.ElementTree.Element
)

A class to represent a single record from the ArXiv database.

Module `harvest_and_collect.db_connexion` {#id}

Using the data from a harvester, add records to the database.

Classes

Class `GraphDBConnexion` {#id}

class GraphDBConnexion(
    uri: str
)

Handles the database connection and provides functions to easily add records to it.

Methods

Method `add_record` {#id}

def add_record(
    self,
    record: harvest_and_collect.connect_to_arxiv.ArXivRecord
) ‑> None

Adds a record to the database.

This method checks if the record is valid. If it is, it opens a new session with the database and executes a write transaction using the _record_tx method.

Args: record : ArXivRecord : The record to be added to the database.

Raises: neo4j.exceptions.ServiceUnavailable : If the database is not available.
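The validate-then-write flow described for `add_record` might look roughly like the sketch below. It is written against any object with the shape of a Neo4j Python driver (`driver.session()` used as a context manager, `session.execute_write(...)`, and `tx.run(...)` as in the 5.x driver), so it runs here with small fakes and no database. The `Paper` label, the property names, the dictionary-shaped record, and the free functions `record_tx` and `add_record` are assumptions, not the actual `GraphDBConnexion` code.

```python
from typing import Any, Dict, List, Tuple


def record_tx(tx: Any, identifier: str, title: str) -> None:
    """Transaction function: MERGE avoids duplicating a node with the same identifier."""
    tx.run(
        "MERGE (p:Paper {identifier: $identifier}) SET p.title = $title",
        identifier=identifier,
        title=title,
    )


def add_record(driver: Any, record: Dict[str, Any]) -> None:
    """Write the record only if it is marked valid (assumed dict-shaped record)."""
    if not record.get("valid"):
        return
    with driver.session() as session:
        session.execute_write(record_tx, record["identifier"], record["title"])


# Minimal fakes that mimic the driver calls used above, for an offline demo.
class FakeTx:
    def __init__(self, log: List[Tuple[str, Dict[str, Any]]]) -> None:
        self.log = log

    def run(self, query: str, **params: Any) -> None:
        self.log.append((query, params))


class FakeSession:
    def __init__(self, log: List[Tuple[str, Dict[str, Any]]]) -> None:
        self.log = log

    def __enter__(self) -> "FakeSession":
        return self

    def __exit__(self, *exc: Any) -> bool:
        return False

    def execute_write(self, func: Any, *args: Any, **kwargs: Any) -> Any:
        return func(FakeTx(self.log), *args, **kwargs)


class FakeDriver:
    def __init__(self) -> None:
        self.log: List[Tuple[str, Dict[str, Any]]] = []

    def session(self) -> FakeSession:
        return FakeSession(self.log)


driver = FakeDriver()
add_record(driver, {"valid": True, "identifier": "x", "title": "T"})
add_record(driver, {"valid": False, "identifier": "y", "title": "U"})  # skipped
```

With a real database, the fake would be replaced by a driver created with `neo4j.GraphDatabase.driver(uri)`.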

Method `clean_database` {#id}

def clean_database(
    self
) ‑> None

Deletes all nodes and relationships from the database.
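For reference, a common Cypher statement for wiping a graph is `MATCH (n) DETACH DELETE n`, which removes every node together with its relationships. The sketch below assumes that statement is what is wanted; the source does not show the query the method actually runs. The function takes a session-like object and is exercised with a hypothetical fake so it runs without a database.

```python
from typing import Any, List

# DETACH DELETE removes each matched node and any relationships attached to it.
DELETE_ALL = "MATCH (n) DETACH DELETE n"


def clean_database(session: Any) -> None:
    """Run the delete-all statement on a session-like object exposing run()."""
    session.run(DELETE_ALL)


class RecordingSession:
    """Hypothetical fake that records the queries it receives."""

    def __init__(self) -> None:
        self.queries: List[str] = []

    def run(self, query: str) -> None:
        self.queries.append(query)


recording = RecordingSession()
clean_database(recording)
```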

Module `harvest_and_collect.main` {#id}

This module provides the main entry point for the ArXiv harvesting application.

The main function in this module sets up a connection to the ArXiv database and the Neo4j database, then fetches records from the ArXiv database and adds them to the Neo4j database. It also handles command line arguments for running the application in mock mode, specifying the Neo4j URI, and specifying the resumption token for the ArXiv database.

Functions

Function `main` {#id}

def main(
    mock=False,
    neo4j_uri='neo4j://localhost:7687',
    resumption_token=None
)
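One plausible way to map command line arguments onto these three parameters is sketched below with `argparse`. The flag names `--mock`, `--neo4j-uri`, and `--resumption-token` and the helper `parse_args` are assumptions. Only the parameter names and defaults come from the signature above.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI parser mirroring the parameters of main()."""
    parser = argparse.ArgumentParser(description="Harvest ArXiv records into Neo4j.")
    parser.add_argument(
        "--mock", action="store_true", help="run in mock mode"
    )
    parser.add_argument(
        "--neo4j-uri", default="neo4j://localhost:7687", help="URI of the Neo4j database"
    )
    parser.add_argument(
        "--resumption-token", default=None, help="token to resume a previous harvest"
    )
    return parser.parse_args(argv)


defaults = parse_args([])
resumed = parse_args(["--mock", "--resumption-token", "abc"])
```

The parsed values could then be forwarded as `main(mock=args.mock, neo4j_uri=args.neo4j_uri, resumption_token=args.resumption_token)`.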

Generated by pdoc 0.10.0 (https://pdoc3.github.io).
