This module provides classes to connect to and harvest records from the ArXiv database.
Classes
-----=
ArXivRecord
: Represents a single record from the ArXiv database.
ArXivHarvester
: Handles the connection to the ArXiv database and fetches records.
The ArXivRecord class parses XML data from a single ArXiv record into a Python object. It extracts the header and metadata from the record and checks if the record is valid.
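The parsing step described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: `ParsedRecord`, `parse_record`, and the validity rule (an identifier plus a metadata block) are assumptions, and the record is assumed to arrive wrapped in the standard OAI-PMH 2.0 namespace.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Namespace of the OAI-PMH 2.0 envelope that wraps each record.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


@dataclass
class ParsedRecord:
    """Hypothetical stand-in for ArXivRecord, not the module's real class."""

    identifier: Optional[str]
    datestamp: Optional[str]
    metadata: Optional[ET.Element]

    @property
    def is_valid(self) -> bool:
        # Assumed rule: a usable record has an identifier and a metadata block.
        return bool(self.identifier) and self.metadata is not None


def parse_record(record_xml: ET.Element) -> ParsedRecord:
    """Split a <record> element into its header fields and metadata."""
    header = record_xml.find("oai:header", OAI_NS)
    metadata = record_xml.find("oai:metadata", OAI_NS)
    if header is None:
        return ParsedRecord(None, None, metadata)
    return ParsedRecord(
        identifier=header.findtext("oai:identifier", namespaces=OAI_NS),
        datestamp=header.findtext("oai:datestamp", namespaces=OAI_NS),
        metadata=metadata,
    )
```

Under this rule, a record whose header is flagged as deleted, and which therefore carries no metadata element, would be reported as invalid.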
The ArXivHarvester class connects to the ArXiv database and fetches records. It handles HTTP exceptions and retries failed requests. It also handles pagination by using the resumption token provided by the ArXiv API.
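The retry and pagination behaviour described above might look roughly like the sketch below. Everything here is illustrative: `HarvestHTTPError` is a hypothetical analogue of `CustomHTTPException`, the function names, retry count, and delay are assumptions, the `arXiv` metadata prefix is assumed, and the endpoint URL is left to the caller. The transport is injectable so the loop can be exercised without network access.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 XML namespace


class HarvestHTTPError(Exception):
    """Hypothetical analogue of CustomHTTPException: keeps status and token."""

    def __init__(self, status_code: int, resumption_token: Optional[str]):
        super().__init__(f"HTTP {status_code} (resumption token: {resumption_token})")
        self.status_code = status_code
        self.resumption_token = resumption_token


def http_get(url: str) -> bytes:
    """Default transport: a plain GET via urllib."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_with_retry(url, get=http_get, retries=3, delay=1.0) -> bytes:
    """Call get(url), retrying on HTTP errors for up to `retries` attempts."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return get(url)
        except urllib.error.HTTPError:
            if attempt == retries:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)
    raise AssertionError("unreachable")


def list_records(base_url, metadata_prefix="arXiv", token=None, get=http_get, delay=1.0):
    """Yield every <record> element, following resumption tokens page by page."""
    while True:
        if token is None:
            params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        else:
            # In OAI-PMH a resumption token is an exclusive argument.
            params = {"verb": "ListRecords", "resumptionToken": token}
        url = f"{base_url}?{urllib.parse.urlencode(params)}"
        try:
            body = fetch_with_retry(url, get=get, delay=delay)
        except urllib.error.HTTPError as err:
            # Forward the status code and the token so the harvest can resume.
            raise HarvestHTTPError(err.code, token) from err
        root = ET.fromstring(body)
        yield from root.iter(f"{OAI}record")
        element = root.find(f".//{OAI}resumptionToken")
        token = (element.text or "").strip() if element is not None else ""
        if not token:
            return  # an absent or empty token marks the last page
```

Carrying the token on the exception lets a caller restart a failed harvest from the page that failed instead of from the beginning.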
class ArXivHarvester( **kwargs )
A class to handle the connection to the ArXiv database and fetch records.
Raises
-----=
ArXivHarvester.CustomHTTPException
: Custom HTTP exception that forwards the status code and the resumption token, if any.
Yields
-----=
next_record()
: Yields the next record from the fetched records.
Custom HTTP exception that forwards the status code and the resumption token, if any.
def next_record( self ) ‑> Generator[harvest_and_collect.connect_to_arxiv.ArXivRecord, Any, None]
A generator method that yields the next record from the fetched records.
This method continuously yields records from the fetched records list. If the list is empty, it fetches a new batch of records from the ArXiv database. If there are still no records after fetching, it stops the generator.
Yields: ArXivRecord : The next record from the fetched records.
Raises: CustomHTTPException : If an HTTP error occurs while fetching new records.
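The refill-on-empty behaviour of `next_record` can be illustrated with a standalone generator. This is a sketch under assumptions: the real method works on the harvester's internal state, whereas here a hypothetical `fetch_batch` callable stands in for the request that fetches a new batch.

```python
from collections import deque
from typing import Callable, Deque, Generator, List, TypeVar

T = TypeVar("T")


def next_record(fetch_batch: Callable[[], List[T]]) -> Generator[T, None, None]:
    """Yield buffered records one by one, refilling the buffer when it runs dry."""
    buffer: Deque[T] = deque()
    while True:
        if not buffer:
            # Buffer exhausted: ask for a new batch.
            buffer.extend(fetch_batch())
            if not buffer:
                return  # still nothing after fetching, so stop the generator
        yield buffer.popleft()
```

Because the fetch happens lazily inside the generator, any exception raised by `fetch_batch` surfaces at the consumer's `next()` call, not when the generator is created.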
class ArXivRecord( record_xml: xml.etree.ElementTree.Element )
A class to represent a single record from the ArXiv database.
Adds records to the database using the data from a harvester.
class GraphDBConnexion( uri: str )
Handles the database connection and provides functions to easily add records to it.
def add_record( self, record: harvest_and_collect.connect_to_arxiv.ArXivRecord ) ‑> None
Adds a record to the database.
This method checks if the record is valid. If it is, it opens a new session with the database and executes a write transaction using the _record_tx method.
Args
-----=
record
: ArXivRecord
: The record to be added to the database.
Raises
-----=
neo4j.exceptions.ServiceUnavailable
: If the database is not available.
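The validity check followed by a write transaction can be sketched as below. This is not the class's actual code: the record attributes (`is_valid`, `identifier`, `title`), the `Paper` label, and the Cypher query are assumptions made for illustration. The `driver` argument is expected to behave like a Neo4j driver, whose sessions expose `execute_write` in version 5 of the Python driver.

```python
from typing import Any


def _record_tx(tx: Any, identifier: str, title: str) -> None:
    """Transaction function: create or update one node for the record."""
    # MERGE makes the write idempotent if the same record is harvested twice.
    # The label and properties are illustrative, not the real schema.
    tx.run(
        "MERGE (p:Paper {identifier: $identifier}) SET p.title = $title",
        identifier=identifier,
        title=title,
    )


def add_record(driver: Any, record: Any) -> None:
    """Write the record in a single managed transaction if it is valid."""
    if not record.is_valid:
        return  # invalid records are skipped without opening a session
    with driver.session() as session:
        session.execute_write(_record_tx, record.identifier, record.title)
```

Using a managed write transaction leaves commit, rollback, and retries on transient errors to the driver.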
def clean_database( self ) ‑> None
Deletes all nodes and relationships from the database.
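One common way to empty a Neo4j database is a single `DETACH DELETE` over every node, which also removes the attached relationships. The sketch below assumes a driver-like object passed in as `driver`. The helper name `_clean_tx` is hypothetical, and the real method may issue a different query.

```python
from typing import Any


def _clean_tx(tx: Any) -> None:
    """Transaction function: delete every node along with its relationships."""
    tx.run("MATCH (n) DETACH DELETE n")


def clean_database(driver: Any) -> None:
    """Remove all nodes and relationships in one write transaction."""
    with driver.session() as session:
        session.execute_write(_clean_tx)
```

On very large graphs a single transaction like this can exhaust memory, so deleting in batches may be preferable.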
This module provides the main entry point for the ArXiv harvesting application.
The main function in this module sets up a connection to the ArXiv database and the Neo4j database, then fetches records from the ArXiv database and adds them to the Neo4j database. It also handles command line arguments for running the application in mock mode, specifying the Neo4j URI, and specifying the resumption token for the ArXiv database.
def main( mock=False, neo4j_uri='neo4j://localhost:7687', resumption_token=None )
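The command line handling mentioned above could be wired to `main` along these lines. The flag names `--mock`, `--neo4j-uri`, and `--resumption-token` are assumptions, since the documentation does not spell them out. The defaults mirror the `main` signature.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the three options that main() accepts (flag names are assumed)."""
    parser = argparse.ArgumentParser(description="Harvest ArXiv records into Neo4j.")
    parser.add_argument("--mock", action="store_true",
                        help="run in mock mode")
    parser.add_argument("--neo4j-uri", default="neo4j://localhost:7687",
                        help="URI of the Neo4j database")
    parser.add_argument("--resumption-token", default=None,
                        help="resumption token to continue an earlier harvest")
    return parser.parse_args(argv)
```

A caller would then forward the parsed values, for example `main(mock=args.mock, neo4j_uri=args.neo4j_uri, resumption_token=args.resumption_token)`.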
Generated by pdoc 0.10.0 (https://pdoc3.github.io).