Scientific Paper Summarization using Document AI and Vertex AI

DEPRECATED: Go to github.com/GoogleCloudPlatform/generative-ai/language/use-cases/document-summarization

Training Data

ScisummNet - Scientific Article Summarization Dataset

  • Google Cloud Storage Bucket: gs://cloud-samples-data/documentai/ScisummNet
    • pdf - Original PDF files of papers from ACL Anthology
    • summary_txt - Human-written summaries of papers
    • json - Document.json files produced by the Document AI OCR Processor
    • full_txt - Full OCR-extracted text of each paper, taken from the Document.json files
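
The relationship between the json and full_txt folders can be sketched as below. This is a minimal illustration, not the code used to build the dataset: it assumes each Document.json follows the Document AI `Document` JSON layout, where the OCR text sits in a top-level `text` field. The helper names `subfolder_uri` and `extract_full_text` are hypothetical, and the sample JSON string is made up for demonstration.

```python
import json

# Bucket prefix and subfolder names as listed above.
BUCKET_PREFIX = "gs://cloud-samples-data/documentai/ScisummNet"
SUBFOLDERS = ("pdf", "summary_txt", "json", "full_txt")


def subfolder_uri(name: str) -> str:
    """Return the gs:// URI of one dataset subfolder (hypothetical helper)."""
    if name not in SUBFOLDERS:
        raise ValueError(f"unknown subfolder: {name!r}")
    return f"{BUCKET_PREFIX}/{name}/"


def extract_full_text(document_json: str) -> str:
    """Return the OCR text from a Document.json string.

    Assumes the Document AI Document layout, which stores the full
    extracted text in the top-level "text" field. Returns an empty
    string if the field is absent.
    """
    document = json.loads(document_json)
    return document.get("text", "")


# Made-up, heavily trimmed stand-in for a real Document.json file.
sample_document_json = json.dumps(
    {"text": "Title of the paper\nAbstract ...", "pages": []}
)

full_text = extract_full_text(sample_document_json)
print(subfolder_uri("full_txt"))
print(full_text)
```

To browse the real files, a command such as `gsutil ls gs://cloud-samples-data/documentai/ScisummNet/json/` should list the folder contents, assuming the bucket is still publicly readable.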