Commit
refactor: relaxation of grammar & fault-tolerant parser
The lexer has become difficult to manage. The definitions and priorities of tokens were conflicting. For example, I defined these two tokens:

- A "name" token is a sequence of <a-z+><,><a-z+> (e.g. Doe,John).
- A "value" token is a sequence of any character.

Because the "name" token is defined before the "value" token, the parser would fail for a tag expecting a "value" token whose content happens to match the "name" pattern, e.g.:

    AB  - foo,bar
          ^
          failure: expected "value" token but saw a "name" token instead

In addition, some vendors seem to produce RIS files that, to the best of my knowledge, aren't compatible with the RIS specification, which admittedly is sometimes confusing. For example, I have seen this kind of RIS file:

    TY  - JOUR
    KW  - foo
    ER  -  

If your editor doesn't show whitespace, the error is that the ER tag contains an extra space:

    ER<SPACE><SPACE>-<SPACE><SPACE>

whereas the RIS spec says that it should be:

    ER<SPACE><SPACE>-<SPACE>

In this particular case it seemed unnecessary to have the parser fail. It will definitely not be obvious to the user what the error is, and it is so easy for the parser to just ignore that extra space.

I have therefore decided that the grammar and the lexer will simply facilitate the parsing of RIS files and act less as validating agents.

BREAKING CHANGE: The parser will not fail unless the RIS file doesn't follow the basic specification:

    <TAG><SPACE><SPACE>-<SPACE><CONTENT>

For example, the parser used to fail for this:

    RP  - FOOBAR

The expected values for the "RP" tag are:

- IN FILE
- NOT IN FILE
- ON REQUEST (mm/dd/yyyy)

Now the parser will simply take the content as is and will make best-effort attempts to make sense of the data.

In a nutshell, do not use the parser as a validation tool anymore.
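To make the relaxed approach concrete, here is a minimal Python sketch of tolerant, line-level parsing. It is not the project's grammar or lexer: the `TAG_LINE` pattern, the `parse_lines` function, and the rule that a non-tag line continues the previous value are all assumptions made for illustration. It only checks the basic `<TAG><SPACE><SPACE>-<SPACE><CONTENT>` shape, keeps the content as is, and ignores stray trailing spaces.

```python
import re

# Hypothetical pattern (an assumption, not the project's grammar):
# a two-character tag, two spaces, a dash, then an optional space and content.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9]) {2}-(?: (.*))?$")


def parse_lines(text):
    """Split RIS text into (tag, content) pairs without validating content."""
    entries = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines carry no information
        match = TAG_LINE.match(raw)
        if match:
            tag = match.group(1)
            content = (match.group(2) or "").strip()
            # The content is never re-tokenized, so "foo,bar" stays a value.
            entries.append((tag, content))
        elif entries:
            # Assumption: a line without a tag continues the previous value.
            tag, content = entries[-1]
            entries[-1] = (tag, (content + " " + raw.strip()).strip())
        else:
            # Only input that never matches the basic shape is rejected.
            raise ValueError("expected '<TAG>  - <CONTENT>', got: %r" % raw)
    return entries


sample = "TY  - JOUR\nAB  - foo,bar\nRP  - FOOBAR\nER  -  \n"
for tag, content in parse_lines(sample):
    print(tag, repr(content))
```

In this sketch the `ER` line with an extra trailing space and the unexpected `RP` value are both accepted; deciding what the content means is left to a later, best-effort step.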
1 parent c5eded6, commit 3b8a16f
Showing 10 changed files with 158 additions and 120 deletions.
@@ -0,0 +1,51 @@
[
  {
    "A1": [
      {
        "first_name": "Jacek",
        "last_name": "Borysow",
        "suffix": ""
      },
      {
        "first_name": "Lothar",
        "last_name": "Frommhold",
        "suffix": ""
      },
      {
        "first_name": "George",
        "last_name": "Birnbaum",
        "suffix": ""
      }
    ],
    "DO": "10.1086/166112",
    "JO": "The Astrophysical Journal",
    "KW": [
      "Absorption Spectra",
      "Helium",
      "Hydrogen",
      "Planetary Atmospheres",
      "Planetary Radiation",
      "Cool Stars",
      "Far Infrared Radiation",
      "Molecular Collisions",
      "Molecular Rotation",
      "Atomic and Molecular Physics",
      "LABORATORY SPECTRA",
      "MOLECULAR PROCESSES",
      "PLANETS: SPECTRA"
    ],
    "N2": "The zeroth, first, and second spectral moments of the rototranslational collision-induced absorption (RT CIA) spectra of hydrogen-helium mixtures are calculated from the fundamental theory, for temperatures from 40 to 3000 K. With the help of simple analytical functions of three parameters and the information given, the RT CIA spectra of H2-He pairs can be generated on computers of small capacity, with rms deviations from exact quantum profiles of not more than a few percent. Such representations of the CIA spectra are of interest for work related to the atmospheres of the outer planets and cool stars. The theoretical spectra are in close agreement with existing laboratory measurements at various temperatures from about 77 to 3000 K.",
    "RP": {
      "status": "NOT IN FILE"
    },
    "SN": "0004-637X",
    "SP": "509",
    "T1": "Collision-induced Rototranslational Absorption Spectra of H 2-He Pairs at Temperatures from 40 to 3000 K",
    "TY": "JOUR",
    "UR": [
      "https://ui.adsabs.harvard.edu/abs/1988ApJ...326..509B"
    ],
    "VL": "326",
    "Y1": "1988/03/1"
  }
]
@@ -0,0 +1,39 @@
TY  - JOUR
T1  - Collision-induced Rototranslational Absorption Spectra of H 2-He Pairs at Temperatures from 40 to 3000 K
A1  - Borysow, Jacek
A1  - Frommhold, Lothar
A1  - Birnbaum, George
JO  - The Astrophysical Journal
VL  - 326
Y1  - 1988/03/1
SP  - 509
KW  - Absorption Spectra
KW  - Helium
KW  - Hydrogen
KW  - Planetary Atmospheres
KW  - Planetary Radiation
KW  - Cool Stars
KW  - Far Infrared Radiation
KW  - Molecular Collisions
KW  - Molecular Rotation
KW  - Atomic and Molecular Physics
KW  - LABORATORY SPECTRA
KW  - MOLECULAR PROCESSES
KW  - PLANETS: SPECTRA
UR  - https://ui.adsabs.harvard.edu/abs/1988ApJ...326..509B
N2  - The zeroth, first, and second spectral moments of the rototranslational
collision-induced absorption (RT CIA) spectra of hydrogen-helium
mixtures are calculated from the fundamental theory, for temperatures
from 40 to 3000 K. With the help of simple analytical functions of three
parameters and the information given, the RT CIA spectra of H2-He pairs
can be generated on computers of small capacity, with rms deviations
from exact quantum profiles of not more than a few percent. Such
representations of the CIA spectra are of interest for work related to
the atmospheres of the outer planets and cool stars. The theoretical
spectra are in close agreement with existing laboratory measurements at
various temperatures from about 77 to 3000 K.
DO  - 10.1086/166112
SN  - 0004-637X
ER  - 
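The JSON file above suggests the shape a parsed record takes: authors become objects with `first_name`, `last_name` and `suffix`, while repeated tags such as `KW` and `UR` become lists. The following Python sketch shows one way such a mapping could work. The function names, the `LIST_TAGS` set, and the "Last, First, Suffix" splitting rule are assumptions for illustration and are not taken from the project's code. The sketch also does not reproduce the `RP` entry, which has no corresponding line in the RIS file shown here.

```python
# Assumption: these tags accumulate into lists, as KW and UR do in the fixture.
LIST_TAGS = {"KW", "UR"}


def split_author(value):
    """Split "Last, First, Suffix" into the name fields seen in the fixture."""
    parts = [part.strip() for part in value.split(",")]
    parts += [""] * (3 - len(parts))  # pad missing first name or suffix
    return {"first_name": parts[1], "last_name": parts[0], "suffix": parts[2]}


def build_record(entries):
    """Fold (tag, content) pairs into one record, stopping at the ER tag."""
    record = {}
    for tag, content in entries:
        if tag == "ER":
            break
        if tag == "A1":
            record.setdefault("A1", []).append(split_author(content))
        elif tag in LIST_TAGS:
            record.setdefault(tag, []).append(content)
        else:
            # Everything else is taken as is, with no validation.
            record[tag] = content
    return record


entries = [
    ("TY", "JOUR"),
    ("A1", "Borysow, Jacek"),
    ("KW", "Helium"),
    ("KW", "Hydrogen"),
    ("VL", "326"),
    ("ER", ""),
]
print(build_record(entries))
```

Keeping unknown tags as plain strings mirrors the commit's stated goal: the parser facilitates reading the data and leaves validation to the caller.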