[Bug]: ValueError: unexpected '{' in field name #1657

Open
3 tasks done
NathanAP opened this issue Jan 23, 2025 · 0 comments
Labels
bug: Something isn't working
triage: Default label assignment, indicates new issue needs reviewed by a maintainer

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

I'm trying to use GraphRAG_API.build_index to build my indexes, but I get the following error:

[2025-01-23 19:21:28,418: INFO/ForkPoolWorker-1] found text files from input, found [('Ata_Reuniao_Condominio_2.pdf__0.txt', {})]
[2025-01-23 19:21:28,427: INFO/ForkPoolWorker-1] Found 1 files, loading 1
[2025-01-23 19:21:28,457: INFO/ForkPoolWorker-1] Final # of rows loaded: 1
[2025-01-23 19:21:28,551: INFO/ForkPoolWorker-1] reading table from storage: input.parquet
[2025-01-23 19:21:29,097: INFO/ForkPoolWorker-1] reading table from storage: input.parquet
[2025-01-23 19:21:29,104: INFO/ForkPoolWorker-1] reading table from storage: create_base_text_units.parquet
[2025-01-23 19:21:29,134: INFO/ForkPoolWorker-1] reading table from storage: create_base_text_units.parquet
[2025-01-23 19:21:29,405: ERROR/ForkPoolWorker-1] error extracting graph
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/graphrag/index/operations/extract_entities/graph_extractor.py", line 127, in __call__
    result = await self._process_document(text, prompt_variables)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/graphrag/index/operations/extract_entities/graph_extractor.py", line 156, in _process_document
    self._extraction_prompt.format(**{
ValueError: unexpected '{' in field name
[2025-01-23 19:21:29,407: INFO/ForkPoolWorker-1] Entity Extraction Error details={'doc_index': 0, 'text': 'Ata de Reunião do Condomínio Residencial Monte Aurora\n\nData: 04 de dezembro de 2024\n\nHorário: 20:00\n\nLocal: Auditório Privativo do Condomínio\n\n1. Abertura da Reunião\n\nA  reunião  foi  iniciada  pelo  síndico,  Sr.  Eduardo  Fontana,  pontualmente  às\n\n20:00.  Estiveram  presentes  45  moradores,  representando  75%  das  unidades,\n\nalém de representantes da administradora de condomínios LuxGest.\n\n2. Pautas Discutidas\n\n2.1 Instalação de Painéis Solares\n\nFoi apresentada a proposta de instalação de painéis solares nas áreas comuns,\n\ncom  o  objetivo  de  reduzir  os  custos  de  energia  elétrica  e  promover\n\nsustentabilidade.\n\nO  projeto  foi  aprovado  por  unanimidade,  com  um  orçamento  estimado  de  R$\n\n150.000,00, a ser financiado em 12 parcelas.\n\n2.2 Ampliação da Academia\n\nOs condôminos discutiram a necessidade de modernizar e ampliar a academia.\n\nUm  arquiteto  será  contratado  para  projetar  a  reforma,  e  os  custos  serão\n\napresentados na próxima reunião.\n\n2.3 Segurança e Monitoramento\n\nDiante  de  recentes  relatos  de  tentativas  de  invasão  na  região,  foi  aprovada  a\n\n\x0cimplementação  de  novas  câmeras  de  segurança  e  a  contratação  de  uma\n\nempresa de monitoramento 24 horas.\n\nO  investimento  inicial  será  de  R$  25.000,00,  com  uma  taxa  mensal  de  R$\n\n3.500,00.\n\n2.4 Regras para Uso do Spa e Sauna\n\nMoradores sugeriram melhorias nas regras de agendamento e higiene do spa e\n\nsauna.  Decidiu-se  criar  um  sistema  online  de  reservas  e  exigir  o  uso\n\nobrigatório de toalhas para proteção dos assentos.\n\n3. Encerramento\n\nO síndico agradeceu a presença e participação de todos e encerrou a reunião\n\nàs 22:15.\n\nA próxima reunião foi marcada para o dia 10 de março de 2025.\n\nAssinam a presente ata'}
[2025-01-23 19:21:29,411: ERROR/ForkPoolWorker-1] error extracting graph
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/graphrag/index/operations/extract_entities/graph_extractor.py", line 127, in __call__
    result = await self._process_document(text, prompt_variables)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/graphrag/index/operations/extract_entities/graph_extractor.py", line 156, in _process_document
    self._extraction_prompt.format(**{
ValueError: unexpected '{' in field name
[2025-01-23 19:21:29,412: INFO/ForkPoolWorker-1] Entity Extraction Error details={'doc_index': 0, 'text': 'um  sistema  online  de  reservas  e  exigir  o  uso\n\nobrigatório de toalhas para proteção dos assentos.\n\n3. Encerramento\n\nO síndico agradeceu a presença e participação de todos e encerrou a reunião\n\nàs 22:15.\n\nA próxima reunião foi marcada para o dia 10 de março de 2025.\n\nAssinam a presente ata:\n\n__________________________        __________________________\n\nSíndico  Eduardo  Fontana                      Moradora  Mariana  Torres,  Secretária  da\n\nReunião'}
[2025-01-23 19:21:29,416: ERROR/ForkPoolWorker-1] error running workflow extract_graph
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/graphrag/index/run/run_workflows.py", line 166, in _run_workflows
    result = await run_workflow(
             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/graphrag/index/workflows/extract_graph.py", line 45, in run_workflow
    base_entity_nodes, base_relationship_edges = await extract_graph(
                                                 ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/graphrag/index/flows/extract_graph.py", line 33, in extract_graph
    entities, relationships = await extract_entities(
                              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/graphrag/index/operations/extract_entities/extract_entities.py", line 136, in extract_entities
    entities = _merge_entities(entity_dfs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/graphrag/index/operations/extract_entities/extract_entities.py", line 168, in _merge_entities
    all_entities.groupby(["title", "type"], sort=False)
  File "/usr/local/lib/python3.12/site-packages/pandas/core/frame.py", line 9183, in groupby
    return DataFrameGroupBy(
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pandas/core/groupby/groupby.py", line 1329, in __init__
    grouper, exclusions, obj = get_grouper(
                               ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pandas/core/groupby/grouper.py", line 1043, in get_grouper
    raise KeyError(gpr)
KeyError: 'title'
[2025-01-23 19:21:29,425: INFO/ForkPoolWorker-1] Error running pipeline! details=None
[2025-01-23 19:21:29,506: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: exitcode 0 Job: 0.')
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
    raise WorkerLostError(
billiard.einfo.ExceptionWithTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
    raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: exitcode 0 Job: 0.
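
For context, the ValueError in the traceback is raised by Python's built-in str.format, which graph_extractor.py calls on the extraction prompt. Below is a minimal sketch of how a template can trigger it. It assumes, without confirmation from the logs, that the prompt contains a brace that is not part of a plain {name} placeholder. The template strings and the try_format helper are hypothetical and are not taken from GraphRAG.

```python
def try_format(template: str, **variables: str) -> str:
    """Format the template, returning the error text instead of raising."""
    try:
        return template.format(**variables)
    except (ValueError, KeyError, IndexError) as exc:
        return f"{type(exc).__name__}: {exc}"


# Hypothetical templates; the placeholder names are illustrative assumptions.
good = "Entity types: {entity_types}\nText: {input_text}"
# A '{' inside a replacement field is rejected by str.format.
bad = "Entity types: {entity_types}\nText: { {input_text} }"
# Doubled braces are emitted as literal braces, so JSON-like text survives.
escaped = 'Example output: {{"name": "X"}}\nText: {input_text}'

for name, template in [("good", good), ("bad", bad), ("escaped", escaped)]:
    print(name, "->", try_format(template, entity_types="person", input_text="hi"))
```

If the prompt file contains literal JSON or other braces, doubling them ({{ and }}) makes str.format treat them as plain text.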

Steps to reproduce

I'm not sure whether this matters, but this is how I call the API:

graphrag_config = create_graphrag_config(
  values=settings,
  root_dir=source_path,
)
index_result: List[PipelineRunResult] = await GraphRAG_API.build_index(config=graphrag_config)

Please let me know if I'm doing anything wrong.

Expected Behavior

My documents should be indexed.

GraphRAG Config Used

async_mode: threaded
basic_search:
  prompt: prompts/basic_search_system_prompt.txt
cache:
  base_dir: cache
  type: file
chunks:
  group_by_columns:
  - id
  overlap: 0
  size: 600
claim_extraction:
  description: Any claims or facts that could be relevant to information discovery.
  enabled: false
  max_gleanings: 1
  prompt: prompts/claim_extraction.txt
cluster_graph:
  max_cluster_size: 10
community_reports:
  max_input_length: 8000
  max_length: 2000
  prompt: prompts/community_report.txt
drift_search:
  prompt: prompts/drift_search_system_prompt.txt
  reduce_prompt: prompts/drift_search_reduce_prompt.txt
embed_graph:
  enabled: false
embeddings:
  async_mode: threaded
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    model: text-embedding-3-small
    type: openai_embedding
  vector_store:
    collection_name: default
    db_uri: output/lancedb
    overwrite: true
    type: lancedb
encoding_model: cl100k_base
entity_extraction:
  entity_types:
  - organization
  - person
  - geo
  - event
  max_gleanings: 1
  prompt: prompts/entity_extraction.txt
global_search:
  knowledge_prompt: prompts/global_search_knowledge_system_prompt.txt
  map_prompt: prompts/global_search_map_system_prompt.txt
  reduce_prompt: prompts/global_search_reduce_system_prompt.txt
input:
  base_dir: input
  file_encoding: utf-8
  file_pattern: .*\.txt$
  file_type: text
  type: file
llm:
  api_key: ${GRAPHRAG_API_KEY}
  model: gpt-4o-mini
  model_supports_json: true
  type: openai_chat
local_search:
  prompt: prompts/local_search_system_prompt.txt
parallelization:
  stagger: 0.3
reporting:
  base_dir: logs
  type: file
skip_workflows: []
snapshots:
  embeddings: false
  graphml: false
  transient: false
storage:
  base_dir: output
  type: file
summarize_descriptions:
  max_length: 500
  prompt: prompts/summarize_descriptions.txt
umap:
  enabled: false
update_index_storage: null
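
Since entity_extraction.prompt in the config above points at prompts/entity_extraction.txt, one way to narrow this down is to parse that file's text with string.Formatter before indexing. This is a sketch using only the standard library. check_prompt_template is a hypothetical helper, and the sample strings stand in for the real prompt file content, which is not shown in this issue.

```python
from string import Formatter
from typing import List, Optional, Tuple


def check_prompt_template(template: str) -> Tuple[List[str], Optional[str]]:
    """Return the placeholder names found and an error message, if any.

    Formatter().parse walks the template with the same parser that
    str.format uses, so malformed fields surface here as ValueError.
    """
    fields: List[str] = []
    try:
        for _literal, field_name, _spec, _conversion in Formatter().parse(template):
            if field_name is not None:
                fields.append(field_name)
    except ValueError as exc:
        return fields, str(exc)
    return fields, None


# Illustrative stand-ins for the prompt text (assumed, not the real file).
ok_fields, ok_error = check_prompt_template("Types: {entity_types} Text: {input_text}")
bad_fields, bad_error = check_prompt_template("Types: {entity_types} Text: { {input_text} }")
print(ok_fields, ok_error)
print(bad_fields, bad_error)
```

To check the actual file, the text could be loaded with pathlib.Path("prompts/entity_extraction.txt").read_text(encoding="utf-8") and passed to the helper; any placeholder name it reports that the caller does not supply would also fail at format time.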

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: 1.2.0
  • Operating System: Linux
  • Python Version: 3.12
  • Related Issues: I didn't find any
NathanAP added the bug and triage labels on Jan 23, 2025