Skip to content

Running Cypress e2e tests on Stage #4889

Running Cypress e2e tests on Stage

Running Cypress e2e tests on Stage #4889

GitHub Actions / JUnit Test Report failed Jan 23, 2025 in 0s

194 tests run, 192 passed, 1 skipped, 1 failed.

Annotations

Check failure on line 320 in nuclia_e2e/nuclia_e2e/tests/test_kb.py

See this annotation in the file changed.

@github-actions github-actions / JUnit Test Report

test_kb.test_kb[aws-us-east-2-1]

AssertionError: assert 'climate change' in 'not enough data to answer this.'
 +  where 'not enough data to answer this.' = <built-in method lower of str object at 0x7f48d4d8bcd0>()
 +    where <built-in method lower of str object at 0x7f48d4d8bcd0> = 'Not enough data to answer this.'.lower
 +      where 'Not enough data to answer this.' = <built-in method decode of bytes object at 0x7f48d4ca17f0>()
 +        where <built-in method decode of bytes object at 0x7f48d4ca17f0> = b'Not enough data to answer this.'.decode
 +          where b'Not enough data to answer this.' = AskAnswer(answer=b'Not enough data to answer this.', object=None, learning_id='5a74efb9235a49d885a2c68422c9cbcf', rela..._best_matches=None, status='no_context', prompt_context=None, relations=None, predict_request=None, error_details=None).answer
Raw output
request = <FixtureRequest for <Function test_kb[aws-us-east-2-1]>>
regional_api_config = {'name': 'aws-us-east-2-1', 'permanent_account_id': '8c7db65c-3b7e-4140-8165-d37bb4e6e9b8', 'permanent_account_slug': ...tlWnqbD5r2JCf-AIlivtganDzgqtQXkdKVPgWOfbn0gTv0wuYFvlwGJ2Xu2np1wk2JtYZNZdFAnVFuDG4elz1eIxpiajKH0SggdFtVOiy5EfFAPI', ...}
clean_kb_test = None

    @pytest.mark.asyncio_cooperative
    async def test_kb(request, regional_api_config, clean_kb_test):
        """
        Test a chain of operations that simulates a normal use of a knowledgebox, just concentrated
        in time.
    
        These tests are not individual tests in order to be able to test stuff with newly created
        knowledgebox, without creating a lot of individual kb's for more atomic tests, just to avoid
    wasting our resources. The value of doing that on a new kb each time, is being able to catch
    any error that may not be caught by using a preexisting kb with older parameters.
    
        A part of this tests is sequential, as it is important to guarantee the state before moving on
        while other parts can be run concurrently, hence the use of `gather` in some points
        """
    
        def logger(msg):
            print(f"{request.node.name} ::: {msg}")
    
        zone = regional_api_config["zone_slug"]
        account = regional_api_config["permanent_account_id"]
        auth = get_auth()
    
    # Creates a brand new kb that will be used throughout this test
        kbid = await run_test_kb_creation(regional_api_config, logger)
    
        # Configures a nucliadb client defaulting to a specific kb, to be used
        # to override all the sdk endpoints that automagically creates the client
        # as this is incompatible with the cooperative tests
        async_ndb = get_async_kb_ndb_client(zone, account, kbid, auth._config.token)
        sync_ndb = get_sync_kb_ndb_client(zone, account, kbid, auth._config.token)
    
        # Import a preexisting export containing several resources (coming from the financial-news kb)
        # and wait for the resources to be completely imported
        await run_test_import_kb(regional_api_config, async_ndb, logger)
    
    # Create a labeller configuration, with the goal of testing two things:
    # - labelling of existing resources (the ones imported)
    # - labelling of new resources (will be created later)
        await run_test_create_da_labeller(regional_api_config, sync_ndb, logger)
    
        # Upload a new resource and validate that is correctly processed and stored in nuclia
        await run_test_upload_and_process(regional_api_config, async_ndb, logger)
    
        # Wait for both labeller task results to be consolidated in nucliadb while we also run semantic search
    # These /find and /ask requests are crafted so they trigger all the existing calls to predict features
    # We wait until find succeeds to run the ask tests to maximize the chances that all indexes will be
    # available and so minimize the llm costs retrying
        await asyncio.gather(
            run_test_check_da_labeller_output(regional_api_config, sync_ndb, logger),
            run_test_find(regional_api_config, async_ndb, logger),
        )
>       await run_test_ask(regional_api_config, async_ndb, logger)

nuclia_e2e/nuclia_e2e/tests/test_kb.py:470: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/backoff/_async.py:151: in retry
    ret = await target(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

regional_api_config = {'name': 'aws-us-east-2-1', 'permanent_account_id': '8c7db65c-3b7e-4140-8165-d37bb4e6e9b8', 'permanent_account_slug': ...tlWnqbD5r2JCf-AIlivtganDzgqtQXkdKVPgWOfbn0gTv0wuYFvlwGJ2Xu2np1wk2JtYZNZdFAnVFuDG4elz1eIxpiajKH0SggdFtVOiy5EfFAPI', ...}
ndb = <nuclia.lib.kb.AsyncNucliaDBClient object at 0x7f48d4c49360>
logger = <function test_kb.<locals>.logger at 0x7f48d4d00c10>

    @backoff.on_exception(backoff.constant, AssertionError, max_tries=5, interval=5)
    async def run_test_ask(regional_api_config, ndb, logger):
        kb = AsyncNucliaKB()
    
        ask_result = await kb.search.ask(
            ndb=ndb,
            autofilter=True,
            rephrase=True,
            reranker="predict",
            features=["keyword", "semantic", "relations"],
            query=TEST_CHOCO_QUESTION,
            model="chatgpt-azure-4o-mini",
            prompt=dedent(
                """
                Answer the following question based **only** on the provided context. Do **not** use any outside
                knowledge. If the context does not provide enough information to fully answer the question, reply
                with: “Not enough data to answer this.”
                Don't be too picky. please try to answer if possible, even if it requires to make a bit of a
                deduction.
                [START OF CONTEXT]
                {context}
                [END OF CONTEXT]
                Question: {question}
                # Notes
                - **Do not** mention the source of the context in any case
                """
            ),
        )
>       assert "climate change" in ask_result.answer.decode().lower()
E       AssertionError: assert 'climate change' in 'not enough data to answer this.'
E        +  where 'not enough data to answer this.' = <built-in method lower of str object at 0x7f48d4d8bcd0>()
E        +    where <built-in method lower of str object at 0x7f48d4d8bcd0> = 'Not enough data to answer this.'.lower
E        +      where 'Not enough data to answer this.' = <built-in method decode of bytes object at 0x7f48d4ca17f0>()
E        +        where <built-in method decode of bytes object at 0x7f48d4ca17f0> = b'Not enough data to answer this.'.decode
E        +          where b'Not enough data to answer this.' = AskAnswer(answer=b'Not enough data to answer this.', object=None, learning_id='5a74efb9235a49d885a2c68422c9cbcf', rela..._best_matches=None, status='no_context', prompt_context=None, relations=None, predict_request=None, error_details=None).answer

nuclia_e2e/nuclia_e2e/tests/test_kb.py:320: AssertionError