Testing figures template. #171

Merged: 4 commits, Dec 22, 2023

balhoff
Member

@balhoff balhoff commented Aug 14, 2023

Currently seeing error:

jim (figures)$ poetry run ontogpt extract -t figure.FigureCaption -i caption.txt -o caption.yaml
Configuration file exists at /Users/jim/Library/Preferences/pypoetry, reusing this directory.

Consider moving TOML configuration files to /Users/jim/Library/Application Support/pypoetry, as support for the legacy directory will be removed in an upcoming release.
ERROR:root:HuggingFace Hub API key not found. See README.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/jim/Library/Caches/pypoetry/virtualenvs/ontogpt-T28sWqJT-py3.11/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jim/Library/Caches/pypoetry/virtualenvs/ontogpt-T28sWqJT-py3.11/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/jim/Library/Caches/pypoetry/virtualenvs/ontogpt-T28sWqJT-py3.11/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jim/Library/Caches/pypoetry/virtualenvs/ontogpt-T28sWqJT-py3.11/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jim/Library/Caches/pypoetry/virtualenvs/ontogpt-T28sWqJT-py3.11/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jim/Documents/Source/ontogpt/src/ontogpt/cli.py", line 298, in extract
    results = ke.extract_from_text(text, target_class_def)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jim/Documents/Source/ontogpt/src/ontogpt/engines/spires_engine.py", line 91, in extract_from_text
    extracted_object = self.parse_completion_payload(raw_text, cls, object=object)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jim/Documents/Source/ontogpt/src/ontogpt/engines/spires_engine.py", line 529, in parse_completion_payload
    self._auto_add_ids(raw, cls)
  File "/Users/jim/Documents/Source/ontogpt/src/ontogpt/engines/spires_engine.py", line 542, in _auto_add_ids
    if slot.range == "uriorcurie" or self.range == "uri":
                                     ^^^^^^^^^^
AttributeError: 'SPIRESEngine' object has no attribute 'range'
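
The last frame points at what looks like a typo in `_auto_add_ids`: the second comparison reads `self.range` where `slot.range` was presumably intended. A minimal sketch of the failure and the likely correction follows. `Slot` and `Engine` are hypothetical stand-ins, not OntoGPT's real classes, and the corrected condition is an assumption about the intent rather than the actual upstream fix.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical stand-in for a schema slot definition."""

    name: str
    range: str


class Engine:
    """Hypothetical stand-in for the engine; it has no `range` attribute."""

    def is_identifier_slot_buggy(self, slot: Slot) -> bool:
        # Mirrors the failing line: `self.range` is evaluated whenever the
        # first comparison is False, which raises AttributeError.
        return slot.range == "uriorcurie" or self.range == "uri"

    def is_identifier_slot_fixed(self, slot: Slot) -> bool:
        # Presumed intent: test the slot's range in both comparisons.
        return slot.range in ("uriorcurie", "uri")


engine = Engine()
slot = Slot(name="subpanel", range="SubPanel")

try:
    engine.is_identifier_slot_buggy(slot)
    buggy_raised = False
except AttributeError:
    buggy_raised = True

fixed_result = engine.is_identifier_slot_fixed(slot)
```

Because `or` short-circuits, the buggy form only fails for slots whose range is something other than `uriorcurie`, so it would go unnoticed with schemas that never reach the second comparison.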

@balhoff balhoff marked this pull request as draft August 14, 2023 22:39
@balhoff
Member Author

balhoff commented Aug 14, 2023

Here is caption.txt:

Fig. 3. Morphological characters. A–D. Head in dorsal view. A. Gerbelius nr. confluens. B. Voconia decorata sp. nov. C. Voconia pallidipes Stål, 1866. D. Voconia schoutedeni (Villiers, 1964) comb. nov. E–G. Head in lateral view. E. Voconia wegneri (Miller, 1954) comb. nov. F. Voconia dolichocephala sp. nov. G. Gerbelius typicus Distant, 1903. H. Voconia loki sp. nov., head and pronotum in dorsal view. I–J. Prosternum in ventrolateral view. I. Voconia mexicana sp. nov. J. Voconia bracata sp. nov. K–L. Pronotum in dorsal view. K. Voconia conradti (Jeannel, 1917) comb. nov. L. Voconia tuberculata sp. nov.

@cmungall
Member

  • I can replicate this
  • You are running into a bug that was fixed after you branched a3e5452

However, the bug is only triggered by certain odd schema configurations. In this case, I don't think you want to be subclassing NamedEntity, which has a special meaning for OntoGPT (sorry for the out-of-band secrets).

@cmungall
Member

After removing the two is_a declarations, I get:

input_text: |
  Fig. 3. Morphological characters. A–D. Head in dorsal view. A. Gerbelius nr. confluens. B. Voconia decorata sp. nov. C. Voconia pallidipes Stål, 1866. D. Voconia schoutedeni (Villiers, 1964) comb. nov. E–G. Head in lateral view. E. Voconia wegneri (Miller, 1954) comb. nov. F. Voconia dolichocephala sp. nov. G. Gerbelius typicus Distant, 1903. H. Voconia loki sp. nov., head and pronotum in dorsal view. I–J. Prosternum in ventrolateral view. I. Voconia mexicana sp. nov. J. Voconia bracata sp. nov. K–L. Pronotum in dorsal view. K. Voconia conradti (Jeannel, 1917) comb. nov. L. Voconia tuberculata sp. nov.
raw_completion_output: |-
  title: Morphological characters
  subpanel: A–D. Head in dorsal view.
  subpanel: E–G. Head in lateral view.
  subpanel: H. Voconia loki sp. nov., head and pronotum in dorsal view.
  subpanel: I–J. Prosternum in ventrolateral view.
  subpanel: K–L. Pronotum in dorsal view.
prompt: |+
  Split the following piece of text into fields in the following format:

  id: <The identifier for this figure subpanel>
  text: <The text associated with this figure subpanel>
  info: <any information from the overall figure caption that applies to that subpanel (which may be duplicated across other subpanels).>


  Text:
  K–L. Pronotum in dorsal view.

  ===

extracted_object:
  title: Morphological characters
  subpanel:
    - id: K-L
      text: Pronotum in dorsal view.
      info: None

This is disappointing (only the last subpanel, K–L, made it into extracted_object), but at least it works!
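
One plausible reason only the K–L subpanel survives: the raw completion repeats the `subpanel:` key on separate lines, and a parser that stores one value per key keeps only the last. The sketch below illustrates that mechanism under this assumption. It is not OntoGPT's actual parser, and `parse_fields` is a hypothetical helper.

```python
def parse_fields(completion: str) -> dict[str, str]:
    """Parse 'key: value' lines into a dict; a repeated key overwrites."""
    fields: dict[str, str] = {}
    for line in completion.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


# Abbreviated from the raw_completion_output shown above.
raw = """\
title: Morphological characters
subpanel: A–D. Head in dorsal view.
subpanel: E–G. Head in lateral view.
subpanel: K–L. Pronotum in dorsal view.
"""

parsed = parse_fields(raw)
```

Under this reading, `parsed["subpanel"]` holds only the K–L line, which matches the single follow-up prompt shown in the output.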

@cmungall
Member

I get much better results with a hint:

      subpanel:
        description: a subpanel of the figure
        annotations:
          prompt: >-
            a semicolon separated list of descriptions of every panel in the text. Keep the panel id and text together.
            for example: "1A: A side view of the foo; 1B: A frontal view of the foo"
        multivalued: true
        range: SubPanel
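
The hint asks the model for a single semicolon-separated line, which is straightforward to split back into records. A minimal sketch, assuming each item looks like `<id>: <text>` as in the hint's own example; `split_panels` is a hypothetical helper, not part of OntoGPT:

```python
def split_panels(response: str) -> list[dict[str, str]]:
    """Split '<id>: <text>; <id>: <text>' into a list of records."""
    panels = []
    for item in response.split(";"):
        item = item.strip()
        if not item:
            continue
        panel_id, sep, text = item.partition(":")
        if sep:
            panels.append({"id": panel_id.strip(), "text": text.strip()})
        else:
            # No id separator found; keep the text and leave the id empty.
            panels.append({"id": "", "text": item})
    return panels


# The example string is taken from the hint above.
example = "1A: A side view of the foo; 1B: A frontal view of the foo"
panels = split_panels(example)
```

This naive split would break on captions whose text itself contains semicolons, and a colon before the real separator would shift the id boundary, so it is only a starting point.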

results:

extracted_object:
  title: Morphological characters
  subpanel:
    - id: A
      text: Head in dorsal view of Gerbelius nr. confluens
      info: None
    - id: B
      text: Head in dorsal view of Voconia decorata sp. nov.
      info: None
    - text: C
      info: Head in dorsal view of Voconia pallidipes Stål, 1866
    - id: E
      text: Head in lateral view of Voconia wegneri (Miller, 1954) comb. nov.
      info: None
    - id: F
      text: Head in lateral view of Voconia dolichocephala sp. nov.
      info: None
    - id: G
      text: Head in lateral view of Gerbelius typicus Distant, 1903
      info: None
    - id: N/A
      text: 'H: Head and pronotum in dorsal view of Voconia loki sp. nov.'
      info: N/A
    - id: I
      text: Prosternum in ventrolateral view of Voconia mexicana sp. nov.
      info: None
    - id: J
      text: Prosternum in ventrolateral view of Voconia bracata sp. nov.
      info: None
    - id: K
      text: Pronotum in dorsal view of Voconia conradti (Jeannel, 1917) comb. nov.
      info: None
    - id: L
      text: Pronotum in dorsal view of Voconia tuberculata sp. nov.
      info: None
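
The output above is better but not clean: there is no entry for panel D, the C entry has no `id`, and H carries `id: N/A` with the letter folded into `text`. A quick check along these lines can flag such records. The expected ids A through L are read off the caption, the well-formed records are reduced to their ids for brevity, and `find_problems` is a hypothetical helper:

```python
import string


def find_problems(panels, expected_ids):
    """Return (ids with no matching record, records with an unexpected id)."""
    seen = {p.get("id") for p in panels}
    missing = [i for i in expected_ids if i not in seen]
    malformed = [p for p in panels if p.get("id") not in expected_ids]
    return missing, malformed


# Records in the order shown above; the two irregular ones are kept verbatim.
panels = (
    [{"id": i} for i in "AB"]
    + [{"text": "C", "info": "Head in dorsal view of Voconia pallidipes Stål, 1866"}]
    + [{"id": i} for i in "EFG"]
    + [{"id": "N/A", "text": "H: Head and pronotum in dorsal view of Voconia loki sp. nov."}]
    + [{"id": i} for i in "IJKL"]
)

expected = list(string.ascii_uppercase[:12])  # "A" through "L"
missing, malformed = find_problems(panels, expected)
```

Here `missing` lists C, D, and H: C and H exist but were parsed without a usable id, while D is absent altogether.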

@caufieldjh
Member

Merging this to retain the template, though it will likely need to be rebuilt (for pydantic classes) before use

@caufieldjh caufieldjh marked this pull request as ready for review December 22, 2023 19:43
@caufieldjh caufieldjh merged commit 4d1f1cc into monarch-initiative:main Dec 22, 2023
2 checks passed

3 participants