I have searched the existing issues and this bug is not already filed.
My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
I want to run GraphRAG with a locally deployed model, but it seems to always validate the api_key against the official OpenAI endpoint regardless of whether I provide an api_base: the logs below show the indexing run POSTing to https://api.openai.com/v1/chat/completions and failing with a 401, even though api_base is set to http://localhost:8005/v1.
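For comparison, calling the same local endpoint directly through the OpenAI Python SDK with a placeholder key works, which is roughly what I would expect GraphRAG to do internally when api_base is set. A minimal sketch (the port and model name are taken from my config below; the assumption that the local server ignores the key value is mine):

from openai import OpenAI

# Point the client at the locally deployed model instead of api.openai.com.
# Port and model name match the llm block in settings.yaml.
client = OpenAI(
    base_url="http://localhost:8005/v1",
    api_key="none",  # placeholder; assumed to be ignored by the local server
)

resp = client.chat.completions.create(
    model="llama-3-70b-instruct-awq",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp.choices[0].message.content)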
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

encoding_model: cl100k_base # this needs to be matched to your model!

llm:
  api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: openai_chat # or azure_openai_chat
  model: llama-3-70b-instruct-awq
  model_supports_json: true # recommended if this is available for your model.
  # audience: "https://cognitiveservices.azure.com/.default"
  api_base: http://localhost:8005/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>

parallelization:
  stagger: 0.3
  # num_threads: 50

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store:
    type: lancedb
    db_uri: 'output/lancedb'
    container_name: default
    overwrite: true
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nv-embed-v2
    api_base: http://localhost:9997/v1
    # api_version: 2024-02-15-preview
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # or blob
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "logs"

storage:
  type: file # or blob
  base_dir: "output"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
update_index_storage:
  # type: file # or blob
  # base_dir: "update_output"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  enabled: false
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  embeddings: false
  transient: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"
Logs and screenshots
15:37:40,41 graphrag.cli.index INFO Logging enabled at /home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024/logs/indexing-engine.log
15:37:40,42 graphrag.cli.index INFO Starting pipeline run for: 20241208-153740, dry_run=False
15:37:40,43 graphrag.cli.index INFO Using default configuration: {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_chat",
"encoding_model": "cl100k_base",
"model": "llama-3-70b-instruct-awq",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:8005/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"root_dir": "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024",
"reporting": {
"type": "file",
"base_dir": "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024/logs",
"storage_account_blob_url": null
},
"storage": {
"type": "file",
"base_dir": "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024/output",
"storage_account_blob_url": null
},
"update_index_storage": null,
"cache": {
"type": "file",
"base_dir": "cache",
"storage_account_blob_url": null
},
"input": {
"type": "file",
"file_type": "text",
"base_dir": "input",
"storage_account_blob_url": null,
"encoding": "utf-8",
"file_pattern": ".\.txt$",
"file_filter": null,
"source_column": null,
"timestamp_column": null,
"timestamp_format": null,
"text_column": "text",
"title_column": null,
"document_attribute_columns": []
},
"embed_graph": {
"enabled": false,
"num_walks": 10,
"walk_length": 40,
"window_size": 2,
"iterations": 3,
"random_seed": 597832,
"strategy": null
},
"embeddings": {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_embedding",
"encoding_model": "cl100k_base",
"model": "nv-embed-v2",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0,
"top_p": 1,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:9997/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": null,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"batch_size": 16,
"batch_max_tokens": 8191,
"target": "required",
"skip": [],
"vector_store": {
"type": "lancedb",
"db_uri": "output/lancedb",
"container_name": "==== REDACTED ====",
"overwrite": true
},
"strategy": null
},
"chunks": {
"size": 1200,
"overlap": 100,
"group_by_columns": [
"id"
],
"strategy": null,
"encoding_model": null
},
"snapshots": {
"embeddings": false,
"graphml": false,
"transient": false
},
"entity_extraction": {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_chat",
"encoding_model": "cl100k_base",
"model": "llama-3-70b-instruct-awq",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:8005/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": "prompts/entity_extraction.txt",
"entity_types": [
"organization",
"person",
"geo",
"event"
],
"max_gleanings": 1,
"strategy": null,
"encoding_model": null
},
"summarize_descriptions": {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_chat",
"encoding_model": "cl100k_base",
"model": "llama-3-70b-instruct-awq",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:8005/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": "prompts/summarize_descriptions.txt",
"max_length": 500,
"strategy": null
},
"community_reports": {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_chat",
"encoding_model": "cl100k_base",
"model": "llama-3-70b-instruct-awq",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:8005/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": "prompts/community_report.txt",
"max_length": 2000,
"max_input_length": 8000,
"strategy": null
},
"claim_extraction": {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_chat",
"encoding_model": "cl100k_base",
"model": "llama-3-70b-instruct-awq",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:8005/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"enabled": false,
"prompt": "prompts/claim_extraction.txt",
"description": "Any claims or facts that could be relevant to information discovery.",
"max_gleanings": 1,
"strategy": null,
"encoding_model": null
},
"cluster_graph": {
"max_cluster_size": 10,
"strategy": null
},
"umap": {
"enabled": false
},
"local_search": {
"prompt": "prompts/local_search_system_prompt.txt",
"text_unit_prop": 0.5,
"community_prop": 0.1,
"conversation_history_max_turns": 5,
"top_k_entities": 10,
"top_k_relationships": 10,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"max_tokens": 12000,
"llm_max_tokens": 2000
},
"global_search": {
"map_prompt": "prompts/global_search_map_system_prompt.txt",
"reduce_prompt": "prompts/global_search_reduce_system_prompt.txt",
"knowledge_prompt": "prompts/global_search_knowledge_system_prompt.txt",
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"max_tokens": 12000,
"data_max_tokens": 12000,
"map_max_tokens": 1000,
"reduce_max_tokens": 2000,
"concurrency": 32,
"dynamic_search_llm": "gpt-4o-mini",
"dynamic_search_threshold": 1,
"dynamic_search_keep_parent": false,
"dynamic_search_num_repeats": 1,
"dynamic_search_use_summary": false,
"dynamic_search_concurrent_coroutines": 16,
"dynamic_search_max_level": 2
},
"drift_search": {
"prompt": "prompts/drift_search_system_prompt.txt",
"temperature": 0.0,
"top_p": 1.0,
"n": 3,
"max_tokens": 12000,
"data_max_tokens": 12000,
"concurrency": 32,
"drift_k_followups": 20,
"primer_folds": 5,
"primer_llm_max_tokens": 12000,
"n_depth": 3,
"local_search_text_unit_prop": 0.9,
"local_search_community_prop": 0.1,
"local_search_top_k_mapped_entities": 10,
"local_search_top_k_relationships": 10,
"local_search_max_data_tokens": 12000,
"local_search_temperature": 0.0,
"local_search_top_p": 1.0,
"local_search_n": 1,
"local_search_llm_max_gen_tokens": 2000
},
"encoding_model": "cl100k_base",
"skip_workflows": []
}
15:37:40,44 graphrag.index.create_pipeline_config INFO skipping workflows
15:37:40,44 graphrag.index.run.run INFO Running pipeline
15:37:40,44 graphrag.storage.file_pipeline_storage INFO Creating file storage at /home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024/output
15:37:40,44 graphrag.index.input.factory INFO loading input from root_dir=input
15:37:40,44 graphrag.index.input.factory INFO using file storage for input
15:37:40,45 graphrag.storage.file_pipeline_storage INFO search /home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024/input for files matching .*\.txt$
15:37:40,45 graphrag.index.input.text INFO found text files from input, found [('contexts.txt', {})]
15:37:40,47 graphrag.index.input.text INFO Found 1 files, loading 1
15:37:40,47 graphrag.index.workflows.load INFO Workflow Run Order: ['create_base_text_units', 'create_final_documents', 'create_base_entity_graph', 'create_final_entities', 'create_final_relationships', 'create_final_nodes', 'create_final_communities', 'create_final_text_units', 'create_final_community_reports', 'generate_text_embeddings']
15:37:40,48 graphrag.index.run.run INFO Final # of rows loaded: 1
15:37:40,105 graphrag.index.run.workflow INFO dependencies for create_base_text_units: []
15:37:40,109 datashaper.workflow.workflow INFO executing verb create_base_text_units
15:37:40,748 graphrag.index.run.workflow INFO dependencies for create_final_documents: ['create_base_text_units']
15:37:40,749 graphrag.index.run.workflow WARNING Dependency table create_base_text_units not found in storage: it may be a runtime-only in-memory table. If you see further errors, this may be an actual problem.
15:37:40,753 datashaper.workflow.workflow INFO executing verb create_final_documents
15:37:40,760 graphrag.index.exporter INFO exporting parquet table create_final_documents.parquet
15:37:40,831 graphrag.index.run.workflow INFO dependencies for create_base_entity_graph: ['create_base_text_units']
15:37:40,832 graphrag.index.run.workflow WARNING Dependency table create_base_text_units not found in storage: it may be a runtime-only in-memory table. If you see further errors, this may be an actual problem.
15:37:40,837 datashaper.workflow.workflow INFO executing verb create_base_entity_graph
15:37:42,57 httpx INFO HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 401 Unauthorized"
15:37:42,61 graphrag.callbacks.file_workflow_callbacks INFO Error Invoking LLM details={'prompt': '\n-Goal-\nGiven a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.\n \n-Steps-\n1. Identify all entities. For each identified entity, extract the following information:\n- entity_name: Name of the entity, capitalized\n- entity_type: One of the following types: [organization,person,geo,event]\n- entity_description: Comprehensive description of the entity's attributes and activities\nFormat each entity as ("entity"<|><entity_name><|><entity_type><|><entity_description>)\n \n2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are clearly related to each other.\nFor each pair of related entities, extract the following information:\n- source_entity: name of the source entity, as identified in step 1\n- target_entity: name of the target entity, as identified in step 1\n- relationship_description: explanation as to why you think the source entity and the target entity are related to each other\n- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity\n Format each relationship as ("relationship"<|><source_entity><|><target_entity><|><relationship_description><|><relationship_strength>)\n \n3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use ## as the list delimiter.\n \n4. When finished, output <|COMPLETE|>\n \n######################\n-Examples-\n######################\nExample 1:\nEntity_types: ORGANIZATION,PERSON\nText:\nThe Verdantis's Central Institution is scheduled to meet on Monday and Thursday, with the institution planning to release its latest policy decision on Thursday at 1:30 p.m. PDT, followed by a press conference where Central Institution Chair Martin Smith will take questions. Investors expect the Market Strategy Committee to hold its benchmark interest rate steady in a range of 3.5%-3.75%.\n######################\nOutput:\n("entity"<|>CENTRAL INSTITUTION<|>ORGANIZATION<|>The Central Institution is the Federal Reserve of Verdantis, which is setting interest rates on Monday and Thursday)\n##\n("entity"<|>MARTIN SMITH<|>PERSON<|>Martin Smith is the chair of the Central Institution)\n##\n("entity"<|>MARKET STRATEGY COMMITTEE<|>ORGANIZATION<|>The Central Institution committee makes key decisions about interest rates and the growth of Verdantis's money supply)\n##\n("relationship"<|>MARTIN SMITH<|>CENTRAL INSTITUTION<|>Martin Smith is the Chair of the Central Institution and will answer questions at a press conference<|>9)\n<|COMPLETE|>\n\n######################\nExample 2:\nEntity_types: ORGANIZATION\nText:\nTechGlobal's (TG) stock skyrocketed in its opening day on the Global Exchange Thursday. But IPO experts warn that the semiconductor corporation's debut on the public markets isn't indicative of how other newly listed companies may perform.\n\nTechGlobal, a formerly public company, was taken private by Vision Holdings in 2014. 
The well-established chip designer says it powers 85% of premium smartphones.\n######################\nOutput:\n("entity"<|>TECHGLOBAL<|>ORGANIZATION<|>TechGlobal is a stock now listed on the Global Exchange which powers 85% of premium smartphones)\n##\n("entity"<|>VISION HOLDINGS<|>ORGANIZATION<|>Vision Holdings is a firm that previously owned TechGlobal)\n##\n("relationship"<|>TECHGLOBAL<|>VISION HOLDINGS<|>Vision Holdings formerly owned TechGlobal from 2014 until present<|>5)\n<|COMPLETE|>\n\n######################\nExample 3:\nEntity_types: ORGANIZATION,GEO,PERSON\nText:\nFive Aurelians jailed for 8 years in Firuzabad and widely regarded as hostages are on their way home to Aurelia.\n\nThe swap orchestrated by Quintara was finalized when $8bn of Firuzi funds were transferred to financial institutions in Krohaara, the capital of Quintara.\n\nThe exchange initiated in Firuzabad's capital, Tiruzia, led to the four men and one woman, who are also Firuzi nationals, boarding a chartered flight to Krohaara.\n\nThey were welcomed by senior Aurelian officials and are now on their way to Aurelia's capital, Cashion.\n\nThe Aurelians include 39-year-old businessman Samuel Namara, who has been held in Tiruzia's Alhamia Prison, as well as journalist Durke Bataglani, 59, and environmentalist Meggie Tazbah, 53, who also holds Bratinas nationality.\n######################\nOutput:\n("entity"<|>FIRUZABAD<|>GEO<|>Firuzabad held Aurelians as hostages)\n##\n("entity"<|>AURELIA<|>GEO<|>Country seeking to release hostages)\n##\n("entity"<|>QUINTARA<|>GEO<|>Country that negotiated a swap of money in exchange for hostages)\n##\n##\n("entity"<|>TIRUZIA<|>GEO<|>Capital of Firuzabad where the Aurelians were being held)\n##\n("entity"<|>KROHAARA<|>GEO<|>Capital city in Quintara)\n##\n("entity"<|>CASHION<|>GEO<|>Capital city in Aurelia)\n##\n("entity"<|>SAMUEL NAMARA<|>PERSON<|>Aurelian who spent time in Tiruzia's Alhamia Prison)\n##\n("entity"<|>ALHAMIA PRISON<|>GEO<|>Prison in Tiruzia)\n##\n("entity"<|>DURKE BATAGLANI<|>PERSON<|>Aurelian journalist who was held hostage)\n##\n("entity"<|>MEGGIE TAZBAH<|>PERSON<|>Bratinas national and environmentalist who was held hostage)\n##\n("relationship"<|>FIRUZABAD<|>AURELIA<|>Firuzabad negotiated a hostage exchange with Aurelia<|>2)\n##\n("relationship"<|>QUINTARA<|>AURELIA<|>Quintara brokered the hostage exchange between Firuzabad and Aurelia<|>2)\n##\n("relationship"<|>QUINTARA<|>FIRUZABAD<|>Quintara brokered the hostage exchange between Firuzabad and Aurelia<|>2)\n##\n("relationship"<|>SAMUEL NAMARA<|>ALHAMIA PRISON<|>Samuel Namara was a prisoner at Alhamia prison<|>8)\n##\n("relationship"<|>SAMUEL NAMARA<|>MEGGIE TAZBAH<|>Samuel Namara and Meggie Tazbah were exchanged in the same hostage release<|>2)\n##\n("relationship"<|>SAMUEL NAMARA<|>DURKE BATAGLANI<|>Samuel Namara and Durke Bataglani were exchanged in the same hostage release<|>2)\n##\n("relationship"<|>MEGGIE TAZBAH<|>DURKE BATAGLANI<|>Meggie Tazbah and Durke Bataglani were exchanged in the same hostage release<|>2)\n##\n("relationship"<|>SAMUEL NAMARA<|>FIRUZABAD<|>Samuel Namara was a hostage in Firuzabad<|>2)\n##\n("relationship"<|>MEGGIE TAZBAH<|>FIRUZABAD<|>Meggie Tazbah was a hostage in Firuzabad<|>2)\n##\n("relationship"<|>DURKE BATAGLANI<|>FIRUZABAD<|>Durke Bataglani was a hostage in Firuzabad<|>2)\n<|COMPLETE|>\n\n######################\n-Real Data-\n######################\nEntity_types: organization,person,geo,event\nText: Information 1:\nPublication date: 2020-10-01\nTitle: Is the coronavirus airborne? 
| FAQ - Covid-19 - NJ.gov\nContent:\nThis is similar to what was found for SARS and MERS, which some researchers consider likely to be spread via airborne transmission. One study estimates that a person infected with the COVID-19 virus who speaks loudly for one minute produces at least 1,000 virus-containing droplets that remain airborne for more than 8 minutes. Furthermore, the Centers for Disease Control and Prevention recommend airborne precautions for the care of COVID-19 suspected or confirmed patients.\nMeasles and tuberculosis are examples of respiratory diseases that remain infectious in the air for long time periods. The measles virus can live for up to two hours in the air where an infected person coughs or sneezes. Tuberculosis can live in the air for up to six hours. Under experimental conditions, researchers found that the COVID-19 virus stayed viable in the air for three hours.\nUnder experimental conditions, researchers found that the COVID-19 virus stayed viable in the air for three hours. The researchers estimate that in most real-world situations, the virus would remain suspended in the air for about 30 minutes, before settling onto surfaces. This is similar to what was found for SARS and MERS, which some researchers consider likely to be spread via airborne transmission. One study estimates that a person infected with the COVID-19 virus who speaks loudly for one minute produces at least 1,000 virus-containing droplets that remain airborne for more than 8 minutes.\nSee an error?Let us know! ... Yes, COVID-19 can spread via airborne transmission.\n######################\nOutput:', 'kwargs': {}}
15:37:42,61 graphrag.index.graph.extractors.graph.graph_extractor ERROR error extracting graph
Traceback (most recent call last):
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/graph/extractors/graph/graph_extractor.py", line 127, in call
result = await self._process_document(text, prompt_variables)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/graph/extractors/graph/graph_extractor.py", line 155, in _process_document
response = await self._llm(
^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/openai/llm/chat.py", line 83, in call
return await self._text_chat_llm(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/openai/llm/features/tools_parsing.py", line 120, in call
return await self._delegate(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/base/base.py", line 112, in call
return await self._invoke(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/base/base.py", line 128, in _invoke
return await self._decorated_target(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/json.py", line 71, in invoke
return await delegate(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/retryer.py", line 109, in invoke
result = await execute_with_retry()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/retryer.py", line 93, in execute_with_retry
async for a in AsyncRetrying(
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/asyncio/init.py", line 166, in anext
do = await self.iter(retry_state=self._retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/asyncio/init.py", line 153, in iter
result = await action(retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/init.py", line 398, in
self._add_action_func(lambda rs: rs.outcome.result())
^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/retryer.py", line 101, in execute_with_retry
return await attempt()
^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/retryer.py", line 78, in attempt
return await delegate(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/rate_limiter.py", line 70, in invoke
result = await delegate(prompt, **args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/json.py", line 71, in invoke
return await delegate(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/base/base.py", line 152, in _decorator_target
output = await self._execute_llm(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/openai/llm/chat_text.py", line 155, in _execute_llm
completion = await self._call_completion_or_cache(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/openai/llm/chat_text.py", line 127, in _call_completion_or_cache
return await self._cache.get_or_insert(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/cache_interactor.py", line 50, in get_or_insert
entry = await func()
^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 1661, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1843, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1537, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1638, in _request
raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: none. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
15:37:42,63 graphrag.callbacks.file_workflow_callbacks INFO Entity Extraction Error details={'doc_index': 0, 'text': 'Information 1:\nPublication date: 2020-10-01\nTitle: Is the coronavirus airborne? | FAQ - Covid-19 - NJ.gov\nContent:\nThis is similar to what was found for SARS and MERS, which some researchers consider likely to be spread via airborne transmission. One study estimates that a person infected with the COVID-19 virus who speaks loudly for one minute produces at least 1,000 virus-containing droplets that remain airborne for more than 8 minutes. Furthermore, the Centers for Disease Control and Prevention recommend airborne precautions for the care of COVID-19 suspected or confirmed patients.\nMeasles and tuberculosis are examples of respiratory diseases that remain infectious in the air for long time periods. The measles virus can live for up to two hours in the air where an infected person coughs or sneezes. Tuberculosis can live in the air for up to six hours. Under experimental conditions, researchers found that the COVID-19 virus stayed viable in the air for three hours.\nUnder experimental conditions, researchers found that the COVID-19 virus stayed viable in the air for three hours. The researchers estimate that in most real-world situations, the virus would remain suspended in the air for about 30 minutes, before settling onto surfaces. This is similar to what was found for SARS and MERS, which some researchers consider likely to be spread via airborne transmission. One study estimates that a person infected with the COVID-19 virus who speaks loudly for one minute produces at least 1,000 virus-containing droplets that remain airborne for more than 8 minutes.\nSee an error?Let us know! ... Yes, COVID-19 can spread via airborne transmission.'}
15:37:42,66 datashaper.workflow.workflow ERROR Error executing verb "create_base_entity_graph" in create_base_entity_graph: 'name'
Traceback (most recent call last):
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
result = await result
^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/workflows/v1/subflows/create_base_entity_graph.py", line 47, in create_base_entity_graph
await create_base_entity_graph_flow(
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/flows/create_base_entity_graph.py", line 58, in create_base_entity_graph
merged_entities = _merge_entities(entity_dfs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/flows/create_base_entity_graph.py", line 119, in _merge_entities
all_entities.groupby(["name", "type"], sort=False)
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py", line 9183, in groupby
return DataFrameGroupBy(
^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1329, in init
grouper, exclusions, obj = get_grouper(
^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/groupby/grouper.py", line 1043, in get_grouper
raise KeyError(gpr)
KeyError: 'name'
15:37:42,68 graphrag.callbacks.file_workflow_callbacks INFO Error executing verb "create_base_entity_graph" in create_base_entity_graph: 'name' details=None
15:37:42,68 graphrag.index.run.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/run/run.py", line 260, in run_pipeline
result = await _process_workflow(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/run/workflow.py", line 103, in _process_workflow
result = await workflow.run(context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 369, in run
timing = await self._execute_verb(node, context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
result = await result
^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/workflows/v1/subflows/create_base_entity_graph.py", line 47, in create_base_entity_graph
await create_base_entity_graph_flow(
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/flows/create_base_entity_graph.py", line 58, in create_base_entity_graph
merged_entities = _merge_entities(entity_dfs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/flows/create_base_entity_graph.py", line 119, in _merge_entities
all_entities.groupby(["name", "type"], sort=False)
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py", line 9183, in groupby
return DataFrameGroupBy(
^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1329, in init
grouper, exclusions, obj = get_grouper(
^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/groupby/grouper.py", line 1043, in get_grouper
raise KeyError(gpr)
KeyError: 'name'
Additional Information
GraphRAG Version: 0.9.0
Operating System: Ubuntu 22.04
Python Version: 3.11.10
Related Issues:
Do you need to file an issue?
Describe the bug
I want to run graphrag with a locally deployed model, but it seems to always verify the validity of the api_key regardless of whether I provide the api_base or not.
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
Logs and screenshots
15:37:40,41 graphrag.cli.index INFO Logging enabled at /home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024/logs/indexing-engine.log
15:37:40,42 graphrag.cli.index INFO Starting pipeline run for: 20241208-153740, dry_run=False
15:37:40,43 graphrag.cli.index INFO Using default configuration: {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_chat",
"encoding_model": "cl100k_base",
"model": "llama-3-70b-instruct-awq",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:8005/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"root_dir": "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024",
"reporting": {
"type": "file",
"base_dir": "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024/logs",
"storage_account_blob_url": null
},
"storage": {
"type": "file",
"base_dir": "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024/output",
"storage_account_blob_url": null
},
"update_index_storage": null,
"cache": {
"type": "file",
"base_dir": "cache",
"storage_account_blob_url": null
},
"input": {
"type": "file",
"file_type": "text",
"base_dir": "input",
"storage_account_blob_url": null,
"encoding": "utf-8",
"file_pattern": ".\.txt$",
"file_filter": null,
"source_column": null,
"timestamp_column": null,
"timestamp_format": null,
"text_column": "text",
"title_column": null,
"document_attribute_columns": []
},
"embed_graph": {
"enabled": false,
"num_walks": 10,
"walk_length": 40,
"window_size": 2,
"iterations": 3,
"random_seed": 597832,
"strategy": null
},
"embeddings": {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_embedding",
"encoding_model": "cl100k_base",
"model": "nv-embed-v2",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0,
"top_p": 1,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:9997/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": null,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"batch_size": 16,
"batch_max_tokens": 8191,
"target": "required",
"skip": [],
"vector_store": {
"type": "lancedb",
"db_uri": "output/lancedb",
"container_name": "==== REDACTED ====",
"overwrite": true
},
"strategy": null
},
"chunks": {
"size": 1200,
"overlap": 100,
"group_by_columns": [
"id"
],
"strategy": null,
"encoding_model": null
},
"snapshots": {
"embeddings": false,
"graphml": false,
"transient": false
},
"entity_extraction": {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_chat",
"encoding_model": "cl100k_base",
"model": "llama-3-70b-instruct-awq",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:8005/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": "prompts/entity_extraction.txt",
"entity_types": [
"organization",
"person",
"geo",
"event"
],
"max_gleanings": 1,
"strategy": null,
"encoding_model": null
},
"summarize_descriptions": {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_chat",
"encoding_model": "cl100k_base",
"model": "llama-3-70b-instruct-awq",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:8005/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": "prompts/summarize_descriptions.txt",
"max_length": 500,
"strategy": null
},
"community_reports": {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_chat",
"encoding_model": "cl100k_base",
"model": "llama-3-70b-instruct-awq",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:8005/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": "prompts/community_report.txt",
"max_length": 2000,
"max_input_length": 8000,
"strategy": null
},
"claim_extraction": {
"llm": {
"api_key": "==== REDACTED ====",
"type": "openai_chat",
"encoding_model": "cl100k_base",
"model": "llama-3-70b-instruct-awq",
"embeddings_model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"request_timeout": 180.0,
"api_base": "http://localhost:8005/v1",
"api_version": null,
"proxy": null,
"audience": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25,
"responses": null
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"enabled": false,
"prompt": "prompts/claim_extraction.txt",
"description": "Any claims or facts that could be relevant to information discovery.",
"max_gleanings": 1,
"strategy": null,
"encoding_model": null
},
"cluster_graph": {
"max_cluster_size": 10,
"strategy": null
},
"umap": {
"enabled": false
},
"local_search": {
"prompt": "prompts/local_search_system_prompt.txt",
"text_unit_prop": 0.5,
"community_prop": 0.1,
"conversation_history_max_turns": 5,
"top_k_entities": 10,
"top_k_relationships": 10,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"max_tokens": 12000,
"llm_max_tokens": 2000
},
"global_search": {
"map_prompt": "prompts/global_search_map_system_prompt.txt",
"reduce_prompt": "prompts/global_search_reduce_system_prompt.txt",
"knowledge_prompt": "prompts/global_search_knowledge_system_prompt.txt",
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"max_tokens": 12000,
"data_max_tokens": 12000,
"map_max_tokens": 1000,
"reduce_max_tokens": 2000,
"concurrency": 32,
"dynamic_search_llm": "gpt-4o-mini",
"dynamic_search_threshold": 1,
"dynamic_search_keep_parent": false,
"dynamic_search_num_repeats": 1,
"dynamic_search_use_summary": false,
"dynamic_search_concurrent_coroutines": 16,
"dynamic_search_max_level": 2
},
"drift_search": {
"prompt": "prompts/drift_search_system_prompt.txt",
"temperature": 0.0,
"top_p": 1.0,
"n": 3,
"max_tokens": 12000,
"data_max_tokens": 12000,
"concurrency": 32,
"drift_k_followups": 20,
"primer_folds": 5,
"primer_llm_max_tokens": 12000,
"n_depth": 3,
"local_search_text_unit_prop": 0.9,
"local_search_community_prop": 0.1,
"local_search_top_k_mapped_entities": 10,
"local_search_top_k_relationships": 10,
"local_search_max_data_tokens": 12000,
"local_search_temperature": 0.0,
"local_search_top_p": 1.0,
"local_search_n": 1,
"local_search_llm_max_gen_tokens": 2000
},
"encoding_model": "cl100k_base",
"skip_workflows": []
}
15:37:40,44 graphrag.index.create_pipeline_config INFO skipping workflows
15:37:40,44 graphrag.index.run.run INFO Running pipeline
15:37:40,44 graphrag.storage.file_pipeline_storage INFO Creating file storage at /home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024/output
15:37:40,44 graphrag.index.input.factory INFO loading input from root_dir=input
15:37:40,44 graphrag.index.input.factory INFO using file storage for input
15:37:40,45 graphrag.storage.file_pipeline_storage INFO search /home/hanlv/workspace/code/research/infodemic/LLM/graphrag/my_work/COVMIS2024/input for files matching ..txt$
15:37:40,45 graphrag.index.input.text INFO found text files from input, found [('contexts.txt', {})]
15:37:40,47 graphrag.index.input.text INFO Found 1 files, loading 1
15:37:40,47 graphrag.index.workflows.load INFO Workflow Run Order: ['create_base_text_units', 'create_final_documents', 'create_base_entity_graph', 'create_final_entities', 'create_final_relationships', 'create_final_nodes', 'create_final_communities', 'create_final_text_units', 'create_final_community_reports', 'generate_text_embeddings']
15:37:40,48 graphrag.index.run.run INFO Final # of rows loaded: 1
15:37:40,105 graphrag.index.run.workflow INFO dependencies for create_base_text_units: []
15:37:40,109 datashaper.workflow.workflow INFO executing verb create_base_text_units
15:37:40,748 graphrag.index.run.workflow INFO dependencies for create_final_documents: ['create_base_text_units']
15:37:40,749 graphrag.index.run.workflow WARNING Dependency table create_base_text_units not found in storage: it may be a runtime-only in-memory table. If you see further errors, this may be an actual problem.
15:37:40,753 datashaper.workflow.workflow INFO executing verb create_final_documents
15:37:40,760 graphrag.index.exporter INFO exporting parquet table create_final_documents.parquet
15:37:40,831 graphrag.index.run.workflow INFO dependencies for create_base_entity_graph: ['create_base_text_units']
15:37:40,832 graphrag.index.run.workflow WARNING Dependency table create_base_text_units not found in storage: it may be a runtime-only in-memory table. If you see further errors, this may be an actual problem.
15:37:40,837 datashaper.workflow.workflow INFO executing verb create_base_entity_graph
15:37:42,57 httpx INFO HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 401 Unauthorized"
15:37:42,61 graphrag.callbacks.file_workflow_callbacks INFO Error Invoking LLM details={'prompt': '\n-Goal-\nGiven a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.\n \n-Steps-\n1. Identify all entities. For each identified entity, extract the following information:\n- entity_name: Name of the entity, capitalized\n- entity_type: One of the following types: [organization,person,geo,event]\n- entity_description: Comprehensive description of the entity's attributes and activities\nFormat each entity as ("entity"<|><entity_name><|><entity_type><|><entity_description>)\n \n2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are clearly related to each other.\nFor each pair of related entities, extract the following information:\n- source_entity: name of the source entity, as identified in step 1\n- target_entity: name of the target entity, as identified in step 1\n- relationship_description: explanation as to why you think the source entity and the target entity are related to each other\n- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity\n Format each relationship as ("relationship"<|><source_entity><|><target_entity><|><relationship_description><|><relationship_strength>)\n \n3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use ## as the list delimiter.\n \n4. When finished, output <|COMPLETE|>\n \n######################\n-Examples-\n######################\nExample 1:\nEntity_types: ORGANIZATION,PERSON\nText:\nThe Verdantis's Central Institution is scheduled to meet on Monday and Thursday, with the institution planning to release its latest policy decision on Thursday at 1:30 p.m. PDT, followed by a press conference where Central Institution Chair Martin Smith will take questions. Investors expect the Market Strategy Committee to hold its benchmark interest rate steady in a range of 3.5%-3.75%.\n######################\nOutput:\n("entity"<|>CENTRAL INSTITUTION<|>ORGANIZATION<|>The Central Institution is the Federal Reserve of Verdantis, which is setting interest rates on Monday and Thursday)\n##\n("entity"<|>MARTIN SMITH<|>PERSON<|>Martin Smith is the chair of the Central Institution)\n##\n("entity"<|>MARKET STRATEGY COMMITTEE<|>ORGANIZATION<|>The Central Institution committee makes key decisions about interest rates and the growth of Verdantis's money supply)\n##\n("relationship"<|>MARTIN SMITH<|>CENTRAL INSTITUTION<|>Martin Smith is the Chair of the Central Institution and will answer questions at a press conference<|>9)\n<|COMPLETE|>\n\n######################\nExample 2:\nEntity_types: ORGANIZATION\nText:\nTechGlobal's (TG) stock skyrocketed in its opening day on the Global Exchange Thursday. But IPO experts warn that the semiconductor corporation's debut on the public markets isn't indicative of how other newly listed companies may perform.\n\nTechGlobal, a formerly public company, was taken private by Vision Holdings in 2014. 
The well-established chip designer says it powers 85% of premium smartphones.\n######################\nOutput:\n("entity"<|>TECHGLOBAL<|>ORGANIZATION<|>TechGlobal is a stock now listed on the Global Exchange which powers 85% of premium smartphones)\n##\n("entity"<|>VISION HOLDINGS<|>ORGANIZATION<|>Vision Holdings is a firm that previously owned TechGlobal)\n##\n("relationship"<|>TECHGLOBAL<|>VISION HOLDINGS<|>Vision Holdings formerly owned TechGlobal from 2014 until present<|>5)\n<|COMPLETE|>\n\n######################\nExample 3:\nEntity_types: ORGANIZATION,GEO,PERSON\nText:\nFive Aurelians jailed for 8 years in Firuzabad and widely regarded as hostages are on their way home to Aurelia.\n\nThe swap orchestrated by Quintara was finalized when $8bn of Firuzi funds were transferred to financial institutions in Krohaara, the capital of Quintara.\n\nThe exchange initiated in Firuzabad's capital, Tiruzia, led to the four men and one woman, who are also Firuzi nationals, boarding a chartered flight to Krohaara.\n\nThey were welcomed by senior Aurelian officials and are now on their way to Aurelia's capital, Cashion.\n\nThe Aurelians include 39-year-old businessman Samuel Namara, who has been held in Tiruzia's Alhamia Prison, as well as journalist Durke Bataglani, 59, and environmentalist Meggie Tazbah, 53, who also holds Bratinas nationality.\n######################\nOutput:\n("entity"<|>FIRUZABAD<|>GEO<|>Firuzabad held Aurelians as hostages)\n##\n("entity"<|>AURELIA<|>GEO<|>Country seeking to release hostages)\n##\n("entity"<|>QUINTARA<|>GEO<|>Country that negotiated a swap of money in exchange for hostages)\n##\n##\n("entity"<|>TIRUZIA<|>GEO<|>Capital of Firuzabad where the Aurelians were being held)\n##\n("entity"<|>KROHAARA<|>GEO<|>Capital city in Quintara)\n##\n("entity"<|>CASHION<|>GEO<|>Capital city in Aurelia)\n##\n("entity"<|>SAMUEL NAMARA<|>PERSON<|>Aurelian who spent time in Tiruzia's Alhamia Prison)\n##\n("entity"<|>ALHAMIA PRISON<|>GEO<|>Prison in Tiruzia)\n##\n("entity"<|>DURKE BATAGLANI<|>PERSON<|>Aurelian journalist who was held hostage)\n##\n("entity"<|>MEGGIE TAZBAH<|>PERSON<|>Bratinas national and environmentalist who was held hostage)\n##\n("relationship"<|>FIRUZABAD<|>AURELIA<|>Firuzabad negotiated a hostage exchange with Aurelia<|>2)\n##\n("relationship"<|>QUINTARA<|>AURELIA<|>Quintara brokered the hostage exchange between Firuzabad and Aurelia<|>2)\n##\n("relationship"<|>QUINTARA<|>FIRUZABAD<|>Quintara brokered the hostage exchange between Firuzabad and Aurelia<|>2)\n##\n("relationship"<|>SAMUEL NAMARA<|>ALHAMIA PRISON<|>Samuel Namara was a prisoner at Alhamia prison<|>8)\n##\n("relationship"<|>SAMUEL NAMARA<|>MEGGIE TAZBAH<|>Samuel Namara and Meggie Tazbah were exchanged in the same hostage release<|>2)\n##\n("relationship"<|>SAMUEL NAMARA<|>DURKE BATAGLANI<|>Samuel Namara and Durke Bataglani were exchanged in the same hostage release<|>2)\n##\n("relationship"<|>MEGGIE TAZBAH<|>DURKE BATAGLANI<|>Meggie Tazbah and Durke Bataglani were exchanged in the same hostage release<|>2)\n##\n("relationship"<|>SAMUEL NAMARA<|>FIRUZABAD<|>Samuel Namara was a hostage in Firuzabad<|>2)\n##\n("relationship"<|>MEGGIE TAZBAH<|>FIRUZABAD<|>Meggie Tazbah was a hostage in Firuzabad<|>2)\n##\n("relationship"<|>DURKE BATAGLANI<|>FIRUZABAD<|>Durke Bataglani was a hostage in Firuzabad<|>2)\n<|COMPLETE|>\n\n######################\n-Real Data-\n######################\nEntity_types: organization,person,geo,event\nText: Information 1:\nPublication date: 2020-10-01\nTitle: Is the coronavirus airborne? 
| FAQ - Covid-19 - NJ.gov\nContent:\nThis is similar to what was found for SARS and MERS, which some researchers consider likely to be spread via airborne transmission. One study estimates that a person infected with the COVID-19 virus who speaks loudly for one minute produces at least 1,000 virus-containing droplets that remain airborne for more than 8 minutes. Furthermore, the Centers for Disease Control and Prevention recommend airborne precautions for the care of COVID-19 suspected or confirmed patients.\nMeasles and tuberculosis are examples of respiratory diseases that remain infectious in the air for long time periods. The measles virus can live for up to two hours in the air where an infected person coughs or sneezes. Tuberculosis can live in the air for up to six hours. Under experimental conditions, researchers found that the COVID-19 virus stayed viable in the air for three hours.\nUnder experimental conditions, researchers found that the COVID-19 virus stayed viable in the air for three hours. The researchers estimate that in most real-world situations, the virus would remain suspended in the air for about 30 minutes, before settling onto surfaces. This is similar to what was found for SARS and MERS, which some researchers consider likely to be spread via airborne transmission. One study estimates that a person infected with the COVID-19 virus who speaks loudly for one minute produces at least 1,000 virus-containing droplets that remain airborne for more than 8 minutes.\nSee an error?Let us know! ... Yes, COVID-19 can spread via airborne transmission.\n######################\nOutput:', 'kwargs': {}}
15:37:42,61 graphrag.index.graph.extractors.graph.graph_extractor ERROR error extracting graph
Traceback (most recent call last):
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/graph/extractors/graph/graph_extractor.py", line 127, in call
result = await self._process_document(text, prompt_variables)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/graph/extractors/graph/graph_extractor.py", line 155, in _process_document
response = await self._llm(
^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/openai/llm/chat.py", line 83, in call
return await self._text_chat_llm(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/openai/llm/features/tools_parsing.py", line 120, in call
return await self._delegate(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/base/base.py", line 112, in call
return await self._invoke(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/base/base.py", line 128, in _invoke
return await self._decorated_target(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/json.py", line 71, in invoke
return await delegate(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/retryer.py", line 109, in invoke
result = await execute_with_retry()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/retryer.py", line 93, in execute_with_retry
async for a in AsyncRetrying(
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/asyncio/init.py", line 166, in anext
do = await self.iter(retry_state=self._retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/asyncio/init.py", line 153, in iter
result = await action(retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/tenacity/init.py", line 398, in
self._add_action_func(lambda rs: rs.outcome.result())
^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/retryer.py", line 101, in execute_with_retry
return await attempt()
^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/retryer.py", line 78, in attempt
return await delegate(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/rate_limiter.py", line 70, in invoke
result = await delegate(prompt, **args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/json.py", line 71, in invoke
return await delegate(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/base/base.py", line 152, in _decorator_target
output = await self._execute_llm(prompt, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/openai/llm/chat_text.py", line 155, in _execute_llm
completion = await self._call_completion_or_cache(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/openai/llm/chat_text.py", line 127, in _call_completion_or_cache
return await self._cache.get_or_insert(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/fnllm/services/cache_interactor.py", line 50, in get_or_insert
entry = await func()
^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 1661, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1843, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1537, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1638, in _request
raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: none. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
15:37:42,63 graphrag.callbacks.file_workflow_callbacks INFO Entity Extraction Error details={'doc_index': 0, 'text': 'Information 1:\nPublication date: 2020-10-01\nTitle: Is the coronavirus airborne? | FAQ - Covid-19 - NJ.gov\nContent:\nThis is similar to what was found for SARS and MERS, which some researchers consider likely to be spread via airborne transmission. One study estimates that a person infected with the COVID-19 virus who speaks loudly for one minute produces at least 1,000 virus-containing droplets that remain airborne for more than 8 minutes. Furthermore, the Centers for Disease Control and Prevention recommend airborne precautions for the care of COVID-19 suspected or confirmed patients.\nMeasles and tuberculosis are examples of respiratory diseases that remain infectious in the air for long time periods. The measles virus can live for up to two hours in the air where an infected person coughs or sneezes. Tuberculosis can live in the air for up to six hours. Under experimental conditions, researchers found that the COVID-19 virus stayed viable in the air for three hours.\nUnder experimental conditions, researchers found that the COVID-19 virus stayed viable in the air for three hours. The researchers estimate that in most real-world situations, the virus would remain suspended in the air for about 30 minutes, before settling onto surfaces. This is similar to what was found for SARS and MERS, which some researchers consider likely to be spread via airborne transmission. One study estimates that a person infected with the COVID-19 virus who speaks loudly for one minute produces at least 1,000 virus-containing droplets that remain airborne for more than 8 minutes.\nSee an error?Let us know! ... Yes, COVID-19 can spread via airborne transmission.'}
15:37:42,66 datashaper.workflow.workflow ERROR Error executing verb "create_base_entity_graph" in create_base_entity_graph: 'name'
Traceback (most recent call last):
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
result = await result
^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/workflows/v1/subflows/create_base_entity_graph.py", line 47, in create_base_entity_graph
await create_base_entity_graph_flow(
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/flows/create_base_entity_graph.py", line 58, in create_base_entity_graph
merged_entities = _merge_entities(entity_dfs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/flows/create_base_entity_graph.py", line 119, in _merge_entities
all_entities.groupby(["name", "type"], sort=False)
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py", line 9183, in groupby
return DataFrameGroupBy(
^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1329, in init
grouper, exclusions, obj = get_grouper(
^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/groupby/grouper.py", line 1043, in get_grouper
raise KeyError(gpr)
KeyError: 'name'
15:37:42,68 graphrag.callbacks.file_workflow_callbacks INFO Error executing verb "create_base_entity_graph" in create_base_entity_graph: 'name' details=None
15:37:42,68 graphrag.index.run.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/run/run.py", line 260, in run_pipeline
result = await _process_workflow(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/run/workflow.py", line 103, in _process_workflow
result = await workflow.run(context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 369, in run
timing = await self._execute_verb(node, context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
result = await result
^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/workflows/v1/subflows/create_base_entity_graph.py", line 47, in create_base_entity_graph
await create_base_entity_graph_flow(
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/flows/create_base_entity_graph.py", line 58, in create_base_entity_graph
merged_entities = _merge_entities(entity_dfs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hanlv/workspace/code/research/infodemic/LLM/graphrag/graphrag/index/flows/create_base_entity_graph.py", line 119, in _merge_entities
all_entities.groupby(["name", "type"], sort=False)
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py", line 9183, in groupby
return DataFrameGroupBy(
^^^^^^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1329, in init
grouper, exclusions, obj = get_grouper(
^^^^^^^^^^^^
File "/home/hanlv/miniconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/groupby/grouper.py", line 1043, in get_grouper
raise KeyError(gpr)
KeyError: 'name'
Additional Information
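For context on the two tracebacks above: the 401 AuthenticationError appears to be the root cause. The error message shows the key resolved to the literal string "none", which suggests GRAPHRAG_API_KEY was never picked up from the .env file. Because the chat request is rejected, entity extraction produces no rows, the merged entity DataFrame has no "name" column, and the groupby in _merge_entities then fails with KeyError: 'name'.

As a quick sanity check (a minimal sketch outside of GraphRAG, assuming the local server behind api_base is OpenAI-compatible and ignores the key value), the underlying openai async client can be pointed at the endpoint directly with a placeholder key:

import asyncio
from openai import AsyncOpenAI

async def main():
    # base_url should match the api_base in settings.yaml; the key is a dummy
    # value, since most local OpenAI-compatible servers ignore it but the
    # client still requires some non-empty string to be set.
    client = AsyncOpenAI(
        base_url="http://localhost:8005/v1",
        api_key="sk-placeholder",
    )
    resp = await client.chat.completions.create(
        model="llama-3-70b-instruct-awq",  # model name served by the local endpoint
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())

If this call succeeds while graphrag index still fails with 401, the indexing request is presumably not reaching the configured api_base; if it also fails, setting GRAPHRAG_API_KEY to any non-empty dummy value in the generated .env file may be enough to get past the key check.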