Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

odd join behavior #572

Open
sggraham32 opened this issue Dec 4, 2012 · 1 comment
Open

odd join behavior #572

sggraham32 opened this issue Dec 4, 2012 · 1 comment

Comments

@sggraham32
Copy link

Set up: include the attached file in a webapp subdirectory called testfeed that was included in the ql.io discuss forum in a post by me with the same subject line.

I initially wrote a simple join between the atom:feed and the mapping info, using a script variable to contain the map:

mapping = [
{
"name" : "A",
"user" : "engineering"
},
{
"name" : "B",
"user" : "pre-sales"
}
];

create table test.feed on select get from "http://localhost:8080/testfeed/test.xml" resultset "atom:feed.atom:entry";

select test.atom:content.vsc:Thing.vsc:displayName as t:displayName,
umap.name as umap_name,
test.atom:content.vsc:Thing.inst:p1 as t:p1,
umap.user as user

from test.feed as test, mapping as umap
where umap.name = test.atom:content.vsc:Thing.vsc:displayName;

The response is what I expected:
[
{
"t:displayName": "A",
"umap_name": "A",
"t:p1": "A property1",
"user": "engineering"
},
{
"t:displayName": "B",
"umap_name": "B",
"t:p1": "B property1",
"user": "pre-sales"
}
]

When I converted the script to instead pull the mapping file in from an http call (to the same data contained in a file in the /testfeed webapp called user-map.json), I get different join results:
create table mapping on select get from "http://localhost:8080/testfeed/user-map.json" ;

The exact same select statement seems to create a union instead of a join:

[

{
    "t:displayName": "A",
    "umap_name": "A",
    "t:p1": "A property1",
    "user": "engineering"
},
{
    "t:displayName": "A",
    "umap_name": "B",
    "t:p1": "A property1",
    "user": "pre-sales"
},
{
    "t:displayName": "B",
    "umap_name": "A",
    "t:p1": "B property1",
    "user": "engineering"
},
{
    "t:displayName": "B",
    "umap_name": "B",
    "t:p1": "B property1",
    "user": "pre-sales"
},
{
    "t:displayName": "C",
    "umap_name": "A",
    "t:p1": "C property1",
    "user": "engineering"
},
{
    "t:displayName": "C",
    "umap_name": "B",
    "t:p1": "C property1",
    "user": "pre-sales"
}

]

The data is really the same in both situations, I would expect the result of the select statement to be the same too.

@mbellmbell
Copy link

While you're waiting for a real answer, maybe try assigning each select statement to a variable and then doing the join between those variables. My (very limited) understanding is that the ql.io query optimizer will query sourceA once and then query sourceB once per row from sourceA. If sourceB returns all rows (which it does in your case), then you'll end up with a cross-product.

We had luck with:

A = select * from tableA
B = select * from tableB

select A.x, B.y from A, B where A.x=B.x

Plus it's way faster because there are only two network requests... the join (iterative subqueries) is all done in memory. Note that if you have > 50 rows in the first table it iterates over then you need to change maxNestedRequests to a larger number (that defines the number of subqueries executed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants