Added a simple JSON filtering app to filter paper data from huge json #19

Open
wants to merge 4 commits into
base: main
33 changes: 33 additions & 0 deletions json_stripper/README.MD
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# JSON Papers Processor

A TypeScript application that reads a JSON file containing academic papers, filters them based on a specified cutoff date, and outputs a simplified version of the papers to a new JSON file.

## Features

- Reads a JSON file containing paper information.
- Filters out papers whose first version was created on or before a specified cutoff date.
- Outputs a simplified version of each remaining paper (`id`, `authors`, `title`, `doi`, `categories`) to `leanPapers.json`.
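
The filter-and-simplify step can be sketched as a pair of array operations. This is a minimal illustration rather than the code in this PR: `FullPaper`, `SlimPaper`, and `filterAndSimplify` are hypothetical names, and the sample records are invented for the example.

```typescript
// Hypothetical minimal shapes; real dataset records carry more fields.
interface PaperVersion {
  version: string;
  created: string;
}

interface FullPaper {
  id: string;
  authors: string;
  title: string;
  doi: string | null;
  categories: string;
  versions: PaperVersion[];
}

type SlimPaper = Pick<FullPaper, "id" | "authors" | "title" | "doi" | "categories">;

// Keep papers whose first version was created strictly after the cutoff,
// then project each one down to the five retained fields.
function filterAndSimplify(papers: FullPaper[], cutoff: Date): SlimPaper[] {
  return papers
    .filter(
      (p) =>
        p.versions.length > 0 &&
        new Date(p.versions[0].created).getTime() > cutoff.getTime()
    )
    .map(({ id, authors, title, doi, categories }) => ({
      id,
      authors,
      title,
      doi,
      categories,
    }));
}

// Invented sample data, one record on each side of the cutoff.
const sample: FullPaper[] = [
  {
    id: "a/1",
    authors: "A. Author",
    title: "Old paper",
    doi: null,
    categories: "x",
    versions: [{ version: "v1", created: "Mon, 1 Jan 1990 00:00:00 GMT" }],
  },
  {
    id: "a/2",
    authors: "B. Author",
    title: "Newer paper",
    doi: null,
    categories: "x",
    versions: [{ version: "v1", created: "Fri, 23 Aug 1996 09:39:49 GMT" }],
  },
];

console.log(filterAndSimplify(sample, new Date("1996-06-30")).map((p) => p.id));
```

Keeping the step as a pure function makes it easy to test without touching the file system.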

## Requirements

- Node.js (version 18 or higher)
- npm (Node package manager)

## Installation

Install the necessary dependencies:

`npm install`

## Run the script

`npm run readjson -- <input-file.json> "<cutoff-date>"`

### Parameters

- `<input-file.json>`: The path to the input JSON file containing the papers. Defaults to `test.json` if omitted.
- `<cutoff-date>`: The cutoff date in ISO format (e.g., `"2024-06-30"`). Defaults to `1996-06-30` if omitted.
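
An unparseable cutoff date produces an `Invalid Date`, and comparisons against it are always false, so validating the argument up front is worthwhile. The sketch below shows one way to do that; `parseCutoffDate` is a hypothetical helper, not part of this PR, and it assumes the cutoff is always given as `YYYY-MM-DD`.

```typescript
// Hypothetical helper: parse a YYYY-MM-DD cutoff argument, or use a fallback.
function parseCutoffDate(arg: string | undefined, fallback: Date): Date {
  if (arg === undefined || arg === "") {
    return fallback;
  }
  // Accept only plain ISO calendar dates to avoid engine-specific parsing.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(arg)) {
    throw new Error(`Cutoff date must look like YYYY-MM-DD, got "${arg}"`);
  }
  // Date-only ISO strings are interpreted as midnight UTC.
  const parsed = new Date(arg);
  if (Number.isNaN(parsed.getTime())) {
    throw new Error(`Cutoff date could not be parsed: "${arg}"`);
  }
  return parsed;
}

const defaultCutoff = new Date("1996-06-30");
console.log(parseCutoffDate("2024-06-30", defaultCutoff).toISOString());
console.log(parseCutoffDate(undefined, defaultCutoff).toISOString());
```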

### Example

`npm run readjson -- test.json "1990-06-30"`
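
All six records in the bundled `test.json` were created in August or September 1996, so this cutoff keeps every one of them. As a rough illustration of what `leanPapers.json` then contains, the snippet below serializes the retained fields of the first test record in the same way the script does (`JSON.stringify` with two-space indentation). The variable names are placeholders for the example.

```typescript
// Retained fields of the first record in test.json (field names follow LeanPaper).
const leanExample = {
  id: "supr-con/9608007",
  authors: "Francesca Federici, Andrei A. Varlamov",
  title:
    "The Fluctuation Induced Pseudogap in the Infrared Optical Conductivity\n of High Temperature Superconductors",
  doi: "10.1103/PhysRevB.55.6070",
  categories: "supr-con cond-mat.supr-con",
};

// The output file is a single pretty-printed JSON array of such objects.
const serialized = JSON.stringify([leanExample], null, 2);
console.log(serialized);
```

Note that the output is one JSON array, whereas the input holds one JSON object per record.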
19 changes: 19 additions & 0 deletions json_stripper/package.json
@@ -0,0 +1,19 @@
{
  "name": "json_stripper",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "readjson": "ts-node src/stripJson.ts"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "typescript": "^5.0.4"
  },
  "devDependencies": {
    "@types/node": "^22.7.0",
    "env-cmd": "^10.1.0",
    "ts-node": "^10.9.1"
  }
}
82 changes: 82 additions & 0 deletions json_stripper/src/stripJson.ts
@@ -0,0 +1,82 @@
import * as fs from 'fs';
import * as readline from 'readline';
import { LeanPaper, Paper } from './types';

const outputFilename = 'leanPapers.json';
const DEFAULT_CUTOFF_DATE = new Date('1996-06-30');

let buffer = '';
const fullPapers: Paper[] = [];

let filename = process.argv[2];
const cutoffDateParam = process.argv[3];
let cutoffDate: Date;

if (!filename) {
  console.error('No filename given, using test file: test.json');
  filename = 'test.json';
}

if (!cutoffDateParam) {
  console.error('No cutoff date given, using default cutoff date: 1996-06-30');
  cutoffDate = DEFAULT_CUTOFF_DATE;
} else {
  cutoffDate = new Date(cutoffDateParam);
  // An unparseable date would silently filter out every paper, so fail early
  if (Number.isNaN(cutoffDate.getTime())) {
    console.error(`Invalid cutoff date: ${cutoffDateParam}`);
    process.exit(1);
  }
}

console.log(`Reading file: ${filename}`);
console.log(`Cutoff date: ${cutoffDate.toISOString()}`);

const readStream = fs.createReadStream(filename, { encoding: 'utf-8' });
readStream.on('error', (err) => {
  console.error(`Error reading ${filename}:`, err);
  process.exit(1);
});

// No output stream is needed: the interface only splits the input into lines
const rl = readline.createInterface({
  input: readStream,
  crlfDelay: Infinity,
});

rl.on('line', (line: string) => {
  buffer += line.trim(); // Add each line to the buffer

  let parsedObject: Paper | undefined;
  try {
    // Try parsing the accumulated buffer
    parsedObject = JSON.parse(buffer);
  } catch (err) {
    // The record is incomplete: keep buffering until it parses
    return;
  }

  // Reset the buffer after a successful parse
  buffer = '';

  // Filter on the creation date of the first version; skip records without one
  // (checked outside the try block so a malformed record cannot stall the buffer)
  const created = parsedObject?.versions?.[0]?.created;
  if (created && new Date(created) > cutoffDate) {
    fullPapers.push(parsedObject as Paper);
  }
});

rl.on('close', () => {
  console.log('Finished processing the file.');

  // Keep only the fields needed downstream
  const leanPapers: LeanPaper[] = fullPapers.map((paper) => ({
    id: paper.id,
    authors: paper.authors,
    title: paper.title,
    doi: paper.doi,
    categories: paper.categories,
  }));

  // Report how many papers passed the filter
  console.log('Lean papers:', leanPapers.length);

  fs.writeFile(outputFilename, JSON.stringify(leanPapers, null, 2), (err) => {
    if (err) {
      console.error('Error writing to output file:', err);
    } else {
      console.log(`Lean papers written to ${outputFilename}`);
    }
  });
});
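
The buffering strategy used in `stripJson.ts` (accumulate lines until `JSON.parse` succeeds) can be exercised in isolation. The sketch below is an illustration under that assumption, not code from this PR; `parseBufferedLines` is a hypothetical helper that takes the lines as an array instead of reading a stream.

```typescript
// Hypothetical helper mirroring the buffering idea: feed lines in order and
// emit a value each time the accumulated text parses as JSON.
function parseBufferedLines(lines: string[]): unknown[] {
  const parsed: unknown[] = [];
  let buffer = "";

  for (const line of lines) {
    buffer += line;

    // Ignore blank input instead of carrying it into the next record.
    if (buffer.trim() === "") {
      buffer = "";
      continue;
    }

    try {
      parsed.push(JSON.parse(buffer));
      buffer = ""; // Complete record: start a fresh buffer.
    } catch {
      // Incomplete record so far: keep buffering.
    }
  }

  return parsed;
}

// The second record is split across two lines, as can happen in the input file.
console.log(parseBufferedLines(['{"a":1}', '{"b":', "2}", ""]));
```

One limitation of this approach is that a record that never becomes valid JSON keeps the buffer growing and hides every record after it, so a size limit or a warning on leftover buffer content may be worth adding.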
37 changes: 37 additions & 0 deletions json_stripper/src/types.ts
@@ -0,0 +1,37 @@
export interface Version {
  version: string;
  created: string;
}

// Parsed author entry as stored in the dataset:
// [lastName, firstName, middleName, affiliation (optional)]
export type Author = string[];

// Field names match the keys used in the input JSON records
export interface Paper {
  id: string;
  submitter: string | null;
  authors: string;
  title: string;
  comments: string;
  'journal-ref': string | null;
  doi: string;
  'report-no': string | null;
  categories: string;
  license: string | null;
  abstract: string;
  versions: Version[];
  update_date: string;
  authors_parsed: Author[];
}

export interface LeanPaper {
  id: string;
  authors: string;
  title: string;
  doi: string;
  categories: string;
}
6 changes: 6 additions & 0 deletions json_stripper/test.json
@@ -0,0 +1,6 @@
{"id":"supr-con/9608007","submitter":null,"authors":"Francesca Federici, Andrei A. Varlamov","title":"The Fluctuation Induced Pseudogap in the Infrared Optical Conductivity\n of High Temperature Superconductors","comments":"8 pages, 4 eps figures, Submitted to Phys. Rev. B","journal-ref":null,"doi":"10.1103/PhysRevB.55.6070","report-no":null,"categories":"supr-con cond-mat.supr-con","license":null,"abstract":" We study the effect of fluctuations on the {\\bf ac} conductivity of a layered\nsuperconductor both for $c$-axis and $ab$-plane electromagnetic wave\npolarizations. The fluctuation contributions of different physical nature and\nsigns (paraconductivity, Maki-Thompson anomalous contribution, one-electron\ndensity of states renormalization) are found to be suppressed by the external\nfield at different characterisitic frequencies ($ \\omega_{\\rm AL}\\sim T-T_{\\rm\nc}$, $\\omega_{\\rm MT} \\sim \\max\\{ T-T_{\\rm c}, \\tau_{\\varphi}^{-1}\\}$, $\n\\omega_{\\rm DOS} \\sim \\min\\{T,\\tau ^{-1}\\}$ for the $2D$ case). As a result the\nappearance of the nonmonotonic frequency dependence (pseudogap) in the infrared\noptical conductivity of HTS film is predicted. The effect has to be especially\npronounced in the case of the electromagnetic field polarization along\n$c$-axis.\n","versions":[{"version":"v1","created":"Fri, 23 Aug 1996 09:39:49 GMT"},{"version":"v2","created":"Wed, 28 Aug 1996 11:54:46 GMT"}],"update_date":"2009-10-30","authors_parsed":[["Federici","Francesca",""],["Varlamov","Andrei A.",""]]}
{"id":"supr-con/9608008","submitter":"Ruslan Prozorov","authors":"R. Prozorov, M. Konczykowski, B. Schmidt, Y. Yeshurun, A. Shaulov, C.\n Villard, G. Koren","title":"On the origin of the irreversibility line in thin YBaCuO7 films with and\n without columnar defects","comments":"19 pages, LaTex, 6 PostScript figures; Author's Homepage:\n http://www.biu.ac.il:80/~prozorr","journal-ref":null,"doi":"10.1103/PhysRevB.54.15530","report-no":null,"categories":"supr-con cond-mat.supr-con","license":null,"abstract":" We report on measurements of the angular dependence of the irreversibility\ntemperature $T_{irr}(\\theta) $ in $YBa_2Cu_3O_{7-\\delta }$ thin films, defined\nby the onset of a third harmonic signal and measured by a miniature Hall probe.\nFrom the functional form of $T_{irr}(\\theta)$ we conclude that the origin of\nthe irreversibility line in unirradiated films is a dynamic crossover from an\nunpinned to a pinned vortex liquid. In irradiated films the irreversibility\ntemperature is determined by the trapping angle.\n","versions":[{"version":"v1","created":"Mon, 26 Aug 1996 15:08:35 GMT"}],"update_date":"2009-10-30","authors_parsed":[["Prozorov","R.",""],["Konczykowski","M.",""],["Schmidt","B.",""],["Yeshurun","Y.",""],["Shaulov","A.",""],["Villard","C.",""],["Koren","G.",""]]}
{"id":"supr-con/9609001","submitter":"Durga P. Choudhury","authors":"Durga P. Choudhury, Balam A. Willemsen, John S. Derov and S. Sridhar\n (Physics Department, Northeastern University and Rome Laboratory, Hanscom\n AFB.)","title":"Nonlinear Response of HTSC Thin Film Microwave Resonators in an Applied\n DC Magnetic Field","comments":"4 pages, LaTeX type, Uses IEEE style files, 600 dpi PostScript file\n with color figures available at http://sagar.physics.neu.edu/preprints.html\n Submitted to IEEE Transactions on Applied Superconductivity","journal-ref":null,"doi":"10.1109/77.620744","report-no":null,"categories":"supr-con cond-mat.supr-con","license":null,"abstract":" The non-linear microwave surface impedance of patterned YBCO thin films, was\nmeasured using a suspended line resonator in the presence of a perpendicular DC\nmagnetic field of magnitude comparable to that of the microwave field.\nSignature of the virgin state was found to be absent even for relatively low\nmicrowave power levels. The microwave loss was initially found to decrease for\nsmall applied DC field before increasing again. Also, non-linearities inherent\nin the sample were found to be substantially suppressed at low powers at these\napplied fields. These two features together can lead to significant improvement\nin device performance.\n","versions":[{"version":"v1","created":"Sat, 31 Aug 1996 17:34:38 GMT"}],"update_date":"2016-11-18","authors_parsed":[["Choudhury","Durga P.","","Physics Department, Northeastern University and Rome Laboratory, Hanscom\n AFB."],["Willemsen","Balam A.","","Physics Department, Northeastern University and Rome Laboratory, Hanscom\n AFB."],["Derov","John S.","","Physics Department, Northeastern University and Rome Laboratory, Hanscom\n AFB."],["Sridhar","S.","","Physics Department, Northeastern University and Rome Laboratory, Hanscom\n AFB."]]}
{"id":"supr-con/9609002","submitter":"Durga P. Choudhury","authors":"Balam A. Willemsen, J. S. Derov and S.Sridhar (Physics Department,\n Northeastern University and Rome Laboratory, Hanscom AFB)","title":"Critical State Flux Penetration and Linear Microwave Vortex Response in\n YBa_2Cu_3O_{7-x} Films","comments":"20 pages, LaTeX type, Uses REVTeX style files, Submitted to Physical\n Review B, 600 dpi PostScript file with high resolution figures available at\n http://sagar.physics.neu.edu/preprints.html","journal-ref":null,"doi":"10.1103/PhysRevB.56.11989","report-no":null,"categories":"supr-con cond-mat.supr-con","license":null,"abstract":" The vortex contribution to the dc field (H) dependent microwave surface\nimpedance Z_s = R_s+iX_s of YBa_2Cu_3O_{7-x} thin films was measured using\nsuspended patterned resonators. Z_s(H) is shown to be a direct measure of the\nflux density B(H) enabling a very precise test of models of flux penetration.\nThree regimes of field-dependent behavior were observed: (1) Initial flux\npenetration occurs on very low field scales H_i(4.2K) 100Oe, (2) At moderate\nfields the flux penetration into the virgin state is in excellent agreement\nwith calculations based upon the field-induced Bean critical state for thin\nfilm geometry, parametrized by a field scale H_s(4.2K) J_c*d 0.5T, (3) for very\nhigh fields H >>H_s, the flux density is uniform and the measurements enable\ndirect determination of vortex parameters such as pinning force constants\n\\alpha_p and vortex viscosity \\eta. However hysteresis loops are in\ndisagreement with the thin film Bean model, and instead are governed by the low\nfield scale H_i, rather than by H_s. 
Geometric barriers are insufficient to\naccount for the observed results.\n","versions":[{"version":"v1","created":"Tue, 3 Sep 1996 14:08:26 GMT"}],"update_date":"2009-10-30","authors_parsed":[["Willemsen","Balam A.","","Physics Department,\n Northeastern University and Rome Laboratory, Hanscom AFB"],["Derov","J. S.","","Physics Department,\n Northeastern University and Rome Laboratory, Hanscom AFB"],["Sridhar","S.","","Physics Department,\n Northeastern University and Rome Laboratory, Hanscom AFB"]]}
{"id":"supr-con/9609003","submitter":"Hasegawa Yasumasa","authors":"Yasumasa Hasegawa (Himeji Institute of Technology)","title":"Density of States and NMR Relaxation Rate in Anisotropic\n Superconductivity with Intersecting Line Nodes","comments":"7 pages, 4 PostScript Figures, LaTeX, to appear in J. Phys. Soc. Jpn","journal-ref":null,"doi":"10.1143/JPSJ.65.3131","report-no":null,"categories":"supr-con cond-mat.supr-con","license":null,"abstract":" We show that the density of states in an anisotropic superconductor with\nintersecting line nodes in the gap function is proportional to $E log (\\alpha\n\\Delta_0 /E)$ for $|E| << \\Delta_0$, where $\\Delta_0$ is the maximum value of\nthe gap function and $\\alpha$ is constant, while it is proportional to $E$ if\nthe line nodes do not intersect. As a result, a logarithmic correction appears\nin the temperature dependence of the NMR relaxation rate and the specific heat,\nwhich can be observed experimentally. By comparing with those for the heavy\nfermion superconductors, we can obtain information about the symmetry of the\ngap function.\n","versions":[{"version":"v1","created":"Wed, 18 Sep 1996 07:57:29 GMT"}],"update_date":"2009-10-30","authors_parsed":[["Hasegawa","Yasumasa","","Himeji Institute of Technology"]]}
{"id":"supr-con/9609004","submitter":"Masanori Ichioka","authors":"Naoki Enomoto, Masanori Ichioka and Kazushige Machida (Okayama Univ.)","title":"Ginzburg Landau theory for d-wave pairing and fourfold symmetric vortex\n core structure","comments":"12 pages including 8 eps figs, LaTeX with jpsj.sty & epsfig","journal-ref":"J. Phys. Soc. Jpn. 66, 204 (1997).","doi":"10.1143/JPSJ.66.204","report-no":null,"categories":"supr-con cond-mat.supr-con","license":null,"abstract":" The Ginzburg Landau theory for d_{x^2-y^2}-wave superconductors is\nconstructed, by starting from the Gor'kov equation with including correction\nterms up to the next order of ln(T_c/T). Some of the non-local correction terms\nare found to break the cylindrical symmetry and lead to the fourfold symmetric\ncore structure, reflecting the internal degree of freedom in the pair\npotential. Using this extended Ginzburg Landau theory, we investigate the\nfourfold symmetric structure of the pair potential, current and magnetic field\naround an isolated single vortex, and clarify concretely how the vortex core\nstructure deviates from the cylindrical symmetry in the d_{x^2-y^2}-wave\nsuperconductors.\n","versions":[{"version":"v1","created":"Wed, 25 Sep 1996 14:17:09 GMT"}],"update_date":"2009-10-30","authors_parsed":[["Enomoto","Naoki","","Okayama Univ."],["Ichioka","Masanori","","Okayama Univ."],["Machida","Kazushige","","Okayama Univ."]]}
15 changes: 15 additions & 0 deletions json_stripper/tsconfig.json
@@ -0,0 +1,15 @@
{
  "compilerOptions": {
    "module": "commonjs",
    "target": "ES2020",
    "sourceMap": true,
    "esModuleInterop": true,
    "moduleResolution": "node",
    "lib": ["ES2020"],
    "types": ["node"]
  },
  "exclude": [
    "node_modules"
  ],
  "include": ["src/**/*.ts"]
}
4 changes: 3 additions & 1 deletion walrus_upload/walrus_upload_tracker.py
@@ -19,7 +19,7 @@ def upload_pdf_to_walrus(pdf_file_path):
pdf_size = os.path.getsize(pdf_file_path)

# MacOS Operating System
# Part 1. Upload the file to the Walrus service
# Part 1. Upload the file to the Walrus service
store_json_command = f"""{{ "config" : "{PATH_TO_WALRUS_CONFIG}",
"command" : {{ "store" :
{{ "file" : "{pdf_file_path}", "epochs" : 2 }}}}
@@ -85,6 +85,7 @@ def upload_pdf_to_walrus(pdf_file_path):
print(f"Error uploading {pdf_file_path}: {str(e)}")
return f"Error uploading {pdf_file_path}: {str(e)}\n"


def process_pdfs(start_index=0, end_index=10, output_file="upload_log.txt"):
# Get all the PDF files in the folder
pdf_files = [f for f in os.listdir(PATH_TO_PDFS) if f.endswith('.pdf')]
@@ -107,6 +108,7 @@ def process_pdfs(start_index=0, end_index=10, output_file="upload_log.txt"):

print(f"Upload logs exported to {output_file}")


if __name__ == "__main__":
# Specify the range of files to upload
start_index = 41 # Starting from the 42nd file (indexing starts at 0)