Merge pull request #1637 from UUDigitalHumanitieslab/feature/entity-d…
…ocumentation Feature/entity documentation
Showing 25 changed files with 256 additions and 56 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# Named Entities | ||
I-Analyzer can display named entities in documents of corpora that have been enriched with named entity annotations. | ||
|
||
## Prerequisites | ||
To display a corpus enriched with named entities, install the [Annotated Text plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/8.6/mapper-annotated-text.html) for Elasticsearch, following the instructions on the linked page. | ||
|
||
### Named entity fields | ||
To determine whether named entities are available for a corpus, the application checks whether the corpus contains fields with names ending in `:ner`. | ||
|
||
If the main content field is called `speech`, the field containing named entity annotations should be called `speech:ner`. This field should have the following Elasticsearch mapping: | ||
```python | ||
{ | ||
'type': 'annotated_text' | ||
} | ||
``` | ||
|
||
Moreover, an enriched corpus should contain the following keyword fields: | ||
- `ner:person` | ||
- `ner:location` | ||
- `ner:organization` | ||
- `ner:miscellaneous` | ||
These fields are intended for searching and filtering, which is not yet implemented. | ||
|
||
## Enriching a corpus with named entities | ||
To enrich a corpus with named entities, we recommend using the [TextMiNER](https://github.com/CentreForDigitalHumanities/TextMiNER) library. Given an existing index and a field name, the library analyzes the content of that field with the BERT-based named entity recognition models provided by [flair](https://github.com/flairNLP/flair). It then adds the named entities to the `annotated_text` field and the keyword fields, as outlined above. |
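The naming convention in the documentation above can be illustrated with a minimal sketch. It assumes field names are available as a plain array of strings; the helper names `nerFieldName`, `hasNamedEntities`, and `missingKeywordFields` are hypothetical and are not part of I-Analyzer. Only the `:ner` suffix and the four `ner:*` keyword field names are taken from the documentation.

```typescript
// Hypothetical helpers; only the ':ner' suffix and the 'ner:*' field names
// come from the documentation in this commit.
const NER_SUFFIX = ':ner';
const NER_KEYWORD_FIELDS = [
    'ner:person',
    'ner:location',
    'ner:organization',
    'ner:miscellaneous',
];

/** Name of the annotated_text field that belongs to a content field. */
function nerFieldName(contentField: string): string {
    return `${contentField}${NER_SUFFIX}`;
}

/** True if at least one field name ends with ':ner'. */
function hasNamedEntities(fieldNames: string[]): boolean {
    return fieldNames.some(
        name => name.slice(-NER_SUFFIX.length) === NER_SUFFIX
    );
}

/** Documented keyword fields that are absent from the given field names. */
function missingKeywordFields(fieldNames: string[]): string[] {
    return NER_KEYWORD_FIELDS.filter(name => fieldNames.indexOf(name) === -1);
}
```

For a corpus whose main content field is `speech`, `nerFieldName('speech')` yields `speech:ner`, and `hasNamedEntities` would report named entities as available once that field is present.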
6 changes: 6 additions & 0 deletions
frontend/src/app/document/entity-toggle/entity-toggle.component.html
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
<button class="button" type="button" (click)="showNamedEntityDocumentation()"> | ||
<span class="icon"> | ||
<fa-icon [icon]="actionIcons.helpAlt" aria-label="help"></fa-icon> | ||
</span> | ||
</button> | ||
<em [id]="toggleLabel">Show named entities<ia-toggle (toggled)="toggleNER.emit($event)" [toggleLabel]="toggleLabel"></ia-toggle></em> |
8 changes: 8 additions & 0 deletions
frontend/src/app/document/entity-toggle/entity-toggle.component.scss
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
button { | ||
float: left; | ||
margin-right: 1em; | ||
} | ||
em { | ||
position: absolute; | ||
margin-top: .3em; | ||
} |
21 changes: 21 additions & 0 deletions
frontend/src/app/document/entity-toggle/entity-toggle.component.spec.ts
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
import { ComponentFixture, TestBed } from '@angular/core/testing'; | ||
|
||
import { commonTestBed } from '../../common-test-bed'; | ||
import { EntityToggleComponent } from './entity-toggle.component'; | ||
|
||
describe('EntityToggleComponent', () => { | ||
let component: EntityToggleComponent; | ||
let fixture: ComponentFixture<EntityToggleComponent>; | ||
|
||
beforeEach(async () => { | ||
await commonTestBed().testingModule.compileComponents(); | ||
|
||
fixture = TestBed.createComponent(EntityToggleComponent); | ||
component = fixture.componentInstance; | ||
fixture.detectChanges(); | ||
}); | ||
|
||
it('should create', () => { | ||
expect(component).toBeTruthy(); | ||
}); | ||
}); |
24 changes: 24 additions & 0 deletions
frontend/src/app/document/entity-toggle/entity-toggle.component.ts
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
import { Component, output } from '@angular/core'; | ||
|
||
import { actionIcons } from '../../shared/icons'; | ||
import { DialogService } from '../../services'; | ||
|
||
@Component({ | ||
selector: 'ia-entity-toggle', | ||
imports: [], | ||
templateUrl: './entity-toggle.component.html', | ||
styleUrl: './entity-toggle.component.scss' | ||
}) | ||
export class EntityToggleComponent { | ||
actionIcons = actionIcons; | ||
toggleNER = output<boolean>(); | ||
toggleLabel: string; | ||
|
||
constructor(private dialogService: DialogService) { | ||
this.toggleLabel = 'ner-toggle'; | ||
} | ||
|
||
public showNamedEntityDocumentation() { | ||
this.dialogService.showManualPage('namedentities'); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
import { FieldEntities } from '../../models'; | ||
import { EntityPipe } from './entity.pipe'; | ||
|
||
describe('EntityPipe', () => { | ||
const mockInput: Array<FieldEntities> = [ | ||
{text: 'Nobody expects the ', entity: 'flat'}, | ||
{text: 'Spanish Inquisition', entity: 'organization'}, | ||
{text: '!', entity: 'flat'} | ||
]; | ||
|
||
it('creates an instance', () => { | ||
const pipe = new EntityPipe(); | ||
expect(pipe).toBeTruthy(); | ||
}); | ||
|
||
it('adds mark tags to named entity annotations', ()=> { | ||
const pipe = new EntityPipe(); | ||
const output = pipe.transform(mockInput.slice(1,2)); | ||
expect(output).toContain('<mark '); | ||
expect(output).toContain('</mark>'); | ||
expect(output).toContain('<svg '); | ||
expect(output).toContain('</svg>'); | ||
}); | ||
|
||
it('does not change Field Entities of `flat` type', () => { | ||
const pipe = new EntityPipe(); | ||
const output = pipe.transform(mockInput.slice(0,1)); | ||
expect(output).toEqual(mockInput[0].text); | ||
}) | ||
|
||
it('concatenates highlighted and non-annotated text', () => { | ||
const pipe = new EntityPipe(); | ||
const output = pipe.transform(mockInput); | ||
expect(typeof output).toBe('string'); | ||
}) | ||
}); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
import { Pipe, PipeTransform } from '@angular/core'; | ||
import { icon } from '@fortawesome/fontawesome-svg-core'; | ||
|
||
import { entityIcons } from '../icons'; | ||
import { FieldEntities } from '../../models'; | ||
|
||
@Pipe({ | ||
name: 'entity' | ||
}) | ||
export class EntityPipe implements PipeTransform { | ||
/** | ||
* a pipe to transform a list of FieldEntities into flat text and entities | ||
* wrapped in <mark> tags, with icons indicating the type of named entity. | ||
* Note that this pipe needs to be followed by the | paragraph or | safeHtml pipe; | ||
* otherwise, the icons will be removed due to sanitization | ||
* @param entityArray: list of FieldEntities | ||
* @returns string of mixed text and html. | ||
*/ | ||
|
||
transform(entityArray: Array<FieldEntities>): string { | ||
const output = entityArray.map(ent => { | ||
if (ent.entity === 'flat') { | ||
return ent.text; | ||
} | ||
else { | ||
const iconName = entityIcons[ent.entity]; | ||
return `<mark class="entity-${ent.entity}" title="Named Entity ${ent.entity}">${ent.text} ${icon(iconName as any).html}</mark>`; | ||
} | ||
}); | ||
return output.join(''); | ||
} | ||
|
||
} |
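To try out the transformation in the pipe above outside Angular, a framework-free sketch may help. It makes two assumptions: the `FieldEntities` interface below is a stand-in for the model imported from `../../models`, whose definition is not shown in this diff, and `iconHtml` is a hypothetical placeholder for the FontAwesome `icon(...).html` call, so the SVG markup it returns is purely illustrative.

```typescript
// Stand-in for the FieldEntities model (assumed shape, based on the spec file).
interface FieldEntities {
    text: string;
    entity: string;
}

// Hypothetical placeholder for the FontAwesome icon lookup used in the pipe.
function iconHtml(entity: string): string {
    return `<svg class="icon-${entity}"></svg>`;
}

// Leaves 'flat' segments untouched and wraps every other segment in <mark>.
function markEntities(entityArray: FieldEntities[]): string {
    return entityArray
        .map(ent => ent.entity === 'flat'
            ? ent.text
            : `<mark class="entity-${ent.entity}" title="Named Entity ${ent.entity}">${ent.text} ${iconHtml(ent.entity)}</mark>`)
        .join('');
}

const marked = markEntities([
    { text: 'Nobody expects the ', entity: 'flat' },
    { text: 'Spanish Inquisition', entity: 'organization' },
    { text: '!', entity: 'flat' },
]);
console.log(marked);
```

As the pipe's doc comment notes, the resulting string still has to pass through the `paragraph` or `safeHtml` pipe in a template, because Angular's sanitization would otherwise remove the icons.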
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,47 @@ | ||
import { TestBed } from '@angular/core/testing'; | ||
import { DomSanitizer } from '@angular/platform-browser'; | ||
|
||
|
||
import { ParagraphPipe } from './paragraph.pipe'; | ||
|
||
describe('ParagraphPipe', () => { | ||
it('create an instance', () => { | ||
const pipe = new ParagraphPipe(); | ||
expect(pipe).toBeTruthy(); | ||
}); | ||
let pipe: ParagraphPipe; | ||
|
||
beforeEach(() => { | ||
TestBed.configureTestingModule({ | ||
providers: [ | ||
ParagraphPipe, | ||
{ provide: DomSanitizer, useValue: { | ||
bypassSecurityTrustHtml: (input) => input | ||
} | ||
} | ||
] | ||
}); | ||
pipe = TestBed.inject(ParagraphPipe); | ||
}) | ||
|
||
it('creates an instance', () => { | ||
expect(pipe).toBeTruthy(); | ||
}); | ||
|
||
it('does not alter text without linebreaks', () => { | ||
const input = 'Some text. And some more text. And even more.'; | ||
const output = pipe.transform(input); | ||
expect(output).toEqual(input); | ||
}); | ||
|
||
it('wraps text with linebreaks in paragraph tags', () => { | ||
const input = 'Some text.\nAnd some more text.\nAnd even more.'; | ||
const output = pipe.transform(input); | ||
const expected = '<p>Some text.</p><p>And some more text.</p><p>And even more.</p>' | ||
expect(output).toEqual(expected); | ||
}); | ||
|
||
it('ignores multiple linebreaks', () => { | ||
const input = '\nSome text.\n\n\nAnd some more text.\n\n'; | ||
const output = pipe.transform(input); | ||
const expected = '<p>Some text.</p><p>And some more text.</p>' | ||
expect(output).toEqual(expected); | ||
}); | ||
|
||
}); |
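The behaviour these tests describe can be sketched as a plain function. This is an assumption inferred from the three test cases only, not the actual `ParagraphPipe` implementation, which is not part of this diff; the name `toParagraphs` is hypothetical, and the sketch leaves out the `DomSanitizer.bypassSecurityTrustHtml` step that the test setup mocks.

```typescript
// Hypothetical sketch inferred from the spec; not the real ParagraphPipe.
function toParagraphs(content: string): string {
    // Split on runs of line breaks and drop empty leading/trailing pieces.
    const paragraphs = content.split(/\n+/).filter(p => p.length > 0);
    // Text without multiple paragraphs is returned unchanged.
    if (paragraphs.length < 2) {
        return content;
    }
    // Otherwise wrap each paragraph in <p> tags.
    return paragraphs.map(p => `<p>${p}</p>`).join('');
}
```

How input containing a single paragraph with surrounding line breaks should be handled is not covered by the tests; the sketch returns such input unchanged.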