Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[.] appearing in (scrubbed?) content fields #3

Open
jeremydouglass opened this issue May 26, 2019 · 0 comments
Open

[.] appearing in (scrubbed?) content fields #3

jeremydouglass opened this issue May 26, 2019 · 0 comments

Comments

@jeremydouglass
Copy link
Contributor

An example is in the content_scrubbed of a Reddit comment:

Maybe you mispoke, and that[.] fine, but don’t please don’t get pissy at me for taking what you wrote at face value. ... The first modern American research university was Johns Hopkins, which is based closely on the Humboldtian model. Humboldt[.] idea for the university put a big focus on research, required all students to have a strong foundation in the humanities (philosophy, history, literature, the arts).

All of the Reddit data is in content_scrubbed -- we need to find examples in the LexisNexis data to determine whether this was introduced by out scrubbing, or by some python unicode issue, or what (and however so, where and how it should be fixed).

We want to watch out for applying a replace fix blindly, as that might be breaking the editorial [.] in legitimate examples:

This recalls Lincoln's thoughts on "Four-score and seven[.]"

Related discussion:
https://we1s.ryver.com/index.html#posts/2109151

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant