Indexing with AWS-S3 Bucket is failing #560

Open
ApsaraDhanasekar11 opened this issue Nov 1, 2021 · 2 comments

Comments

@ApsaraDhanasekar11

ApsaraDhanasekar11 commented Nov 1, 2021

Hi, I tried to create an index on a different backend (an AWS S3 bucket), using the s3leveldown module as the DB store option. The index is created, but queries through the _SEARCH and QUERY methods return the wrong result set. For example, I initialise the DB with the S3 bucket and use the PUT method to add two documents whose text is "Final is the file name" and "what is the version". This is how the index entries are created:

First indexed entry: { key: 'description:file#0.60', value: [ '1635744247556-1-1' ] }
Second indexed entry: { key: 'description:version#0.50', value: [ '1635744285856-1-1' ] }

I can see the entries above in my store when I do a createReadStream. But when my search keyword is "version", I expect only the second indexed document, and instead I get both the first and the second. I tried both the _SEARCH and QUERY methods, and both return the same extra results.
I used the test example below as a reference: https://github.com/fergiemcdowall/search-index/blob/master/test/src/memdown-test.js
Can someone advise on the correct approach for using other backend store options such as Amazon S3?

@fergiemcdowall
Owner

Thanks for the bug report @ApsaraDhanasekar11. I think I understand your problem, but there are too many variables to reproduce it accurately.

Could you include a standalone script/test that demonstrates the issue?

@ApsaraDhanasekar11
Author

Hi @fergiemcdowall, thanks for replying. Please find the example code below; any help with the details would be appreciated. Thanks!

const levelup = require('levelup');
const si = require('search-index');
const s3leveldown = require('s3leveldown');

// bucketName and S3Client are defined elsewhere; this runs inside an async function.
const s3Store = await levelup(s3leveldown(bucketName, S3Client));

const idx = await si({
  db: s3Store,
  storeVectors: true
});

const data = [
  {
    _id: 'a',
    description: 'Use template to list'
  },
  {
    _id: 'b',
    description: 'All versions and updates'
  },
  {
    _id: 'c',
    description: 'Final is the file name'
  }
];

const result = await idx.PUT(data, {
  storeVectors: true
});

// result is:
[
  { _id: 'a', operation: 'PUT', status: 'CREATED' },
  { _id: 'b', operation: 'PUT', status: 'CREATED' },
  { _id: 'c', operation: 'PUT', status: 'CREATED' }
]

// ****** The above code creates the index as below::
{ key: 'description:file#1.00', value: [ 'c' ] }
{ key: 'description:final#1.00', value: [ 'c' ] }
{ key: 'description:list#1.00', value: [ 'a' ] }
{ key: 'description:name#1.00', value: [ 'c' ] }
{ key: 'description:template#1.00', value: [ 'a' ] }
{ key: 'description:updates#1.00', value: [ 'b' ] }
{ key: 'description:use#1.00', value: [ 'a' ] }
{ key: 'description:versions#1.00', value: [ 'b' ] }
{ key: '○DOCUMENT_COUNT○', value: 3 }
{
  key: '○DOC_RAW○a○',
  value: { _id: 'a', description: 'Use template to list' }
}
{
  key: '○DOC_RAW○b○',
  value: { _id: 'b', description: 'All versions and updates' }
}
{
  key: '○DOC_RAW○c○',
  value: { _id: 'c', description: '"Final is the file name' }
}
{
  key: '○DOC○a○',
  value: {
    _id: 'a',
    description: [ 'list#1.00', 'template#1.00', 'to#1.00', 'use#1.00' ]
  }
}
{
  key: '○DOC○b○',
  value: {
    _id: 'b',
    description: [ 'all#1.00', 'and#1.00', 'updates#1.00', 'versions#1.00' ]
  }
}
{
  key: '○DOC○c○',
  value: {
    _id: 'c',
    description: [ 'file#1.00', 'final#1.00', 'is#1.00', 'name#1.00', 'the#1.00' ]
  }
}
{ key: '○FIELD○description○', value: 'description' }
//// ************************* ///////

// ****** For the search/query: ******* //
const queryResult = await idx.QUERY({
  GET: {
    FIELD: ['description'],
    VALUE: {
      GTE: 'versions',
      LTE: 'versions'
    }
  }
});
// Also tried other options: QUERY with GET and with SEARCH, plus _SEARCH and _GET.

But the result was:
{
  RESULT: [
    {
      _id: 'c',
      _match: [
        'description:file#1.00',
        'description:final#1.00',
        'description:name#1.00'
      ]
    },
    {
      _id: 'a',
      _match: [
        'description:list#1.00',
        'description:template#1.00',
        'description:use#1.00'
      ]
    },
    {
      _id: 'b',
      _match: [ 'description:updates#1.00', 'description:versions#1.00' ]
    }
  ],
  RESULT_LENGTH: 3
}
This returns matches for every word that sorts alphabetically up to "versions" (everything from a to v, since "versions" begins with v), not just for "versions" itself.
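
To make the expected behaviour concrete, here is a minimal sketch in plain JavaScript of an inclusive range scan over the index keys from the dump above. The rangeScan helper and the '\uffff' upper-bound sentinel are assumptions made for illustration; they are not taken from search-index or s3leveldown. The second call shows one pattern that would be consistent with the result above (a store that ignores the lower bound), offered as a possibility to test rather than a confirmed diagnosis.

```javascript
// Index keys copied from the dump above, already in lexicographic order.
const indexKeys = [
  'description:file#1.00',
  'description:final#1.00',
  'description:list#1.00',
  'description:name#1.00',
  'description:template#1.00',
  'description:updates#1.00',
  'description:use#1.00',
  'description:versions#1.00'
];

// Hypothetical helper: inclusive range scan, where a missing bound means "unbounded".
function rangeScan(keys, { gte, lte }) {
  return keys.filter(
    (key) =>
      (gte === undefined || key >= gte) &&
      (lte === undefined || key <= lte)
  );
}

// Assumed bounds for the token "versions"; the sentinel is illustrative only.
const lower = 'description:versions';
const upper = 'description:versions' + '\uffff';

// A store that honours both bounds yields only the "versions" key.
const honoured = rangeScan(indexKeys, { gte: lower, lte: upper });

// A store that drops the lower bound yields every key up to "versions".
const lowerBoundIgnored = rangeScan(indexKeys, { lte: upper });

console.log(honoured);
console.log(lowerBoundIgnored.length);
```

If the backend behaved like the first call, only document 'b' would come back for this query.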

While tracing the process flow, I noticed that the GET function internally uses the db.createReadStream method, which should filter the data according to the keywords passed in GTE and LTE. That filtering appears to fail here: instead of the requested range, the query brings back the entire result set up to the first character of the keyword (in alphabetical order).
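
One way to isolate this from search-index is to check the store directly. The sketch below assumes a levelup-style store whose createReadStream(options) accepts gte and lte and emits { key, value } entries. collectKeys, outOfRange and makeLeakyStore are hypothetical names introduced here, and makeLeakyStore is only a stand-in that deliberately ignores its range options so the check has something to report. In a real test, the s3Store from the script above would take its place.

```javascript
const { Readable } = require('stream');

// Collect every key that a levelup-style store emits for the given range options.
function collectKeys(db, range) {
  return new Promise((resolve, reject) => {
    const keys = [];
    db.createReadStream(range)
      .on('data', (entry) => keys.push(String(entry.key))) // keys may be Buffers
      .on('error', reject)
      .on('end', () => resolve(keys));
  });
}

// Return the keys that fall outside the inclusive [gte, lte] range.
function outOfRange(keys, { gte, lte }) {
  return keys.filter((key) => key < gte || key > lte);
}

// Stand-in store for demonstration only: it ignores the range it is given.
function makeLeakyStore(allKeys) {
  return {
    createReadStream() {
      return Readable.from(allKeys.map((key) => ({ key, value: [] })));
    }
  };
}

async function main() {
  // The upper-bound sentinel is an assumption chosen for illustration.
  const range = { gte: 'description:versions', lte: 'description:versions\uffff' };
  const store = makeLeakyStore([
    'description:file#1.00',
    'description:use#1.00',
    'description:versions#1.00'
  ]);
  const leaked = outOfRange(await collectKeys(store, range), range);
  // A non-empty list means the store returned keys outside the requested range.
  console.log(leaked);
  return leaked;
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

If the same check reports leaked keys against the S3-backed store but not against memdown, that would point at the backend's range handling rather than at the query itself.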
