Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

predict sql submit predict job as maxcompute udf script #605

Merged
merged 18 commits into from
Aug 1, 2019

Conversation

Yancey1989
Copy link
Collaborator

@Yancey1989 Yancey1989 commented Jul 26, 2019

Related issue: #604

An example SQL can be like:

SELECT alps_inference(
            concat_ws(",",user_id,l)
            ,1
            ,"deep_id"
            ,COALESCE(deep_id, "")
        ) AS (user_info, score)
FROM alifin_jtest_dev.predict_data
PREDICT predict_result
WITH OSS_ID=a, OSS_KEY=b
USING sqlflow_model;

TODO: sql-machine-learning/gomaxcompute#46

@Yancey1989
Copy link
Collaborator Author

Yancey1989 commented Jul 27, 2019

Execute the ODPS query failed:

response error: 400, <?xml version="1.0" encoding="UTF-8"?>
<Error>
	<Code>ParseError</Code>
	<Message><![CDATA[Please add put { "odps.sql.submit.mode" : "script"} for multi-statement query in settings.]]></Message>
	<RequestId>5D3BE689B8C6891C0879BE45</RequestId>
	<HostId>odps.aliyun.com</HostId>
</Error>

the gomaxcompute need to support the settings interface.

This PR uses odpscmd to submit the ODPS UDF script.

@Yancey1989 Yancey1989 changed the title [WIP]predict sql generate odps udf script predict sql generate odps udf script Jul 29, 2019
@Yancey1989 Yancey1989 changed the title predict sql generate odps udf script predict sql submit predict job as maxcompute udf script Jul 29, 2019
# 7. Load sqlflow Jupyter magic command automatically. c.f. https://stackoverflow.com/a/32683001.
# 7. Install odpscmd for submitting alps predict job with odps udf script
# TODO(Yancey1989): using gomaxcompute instead of the odpscmd command-line tool.
wget -q http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/119096/cn_zh/1557995455961/odpscmd_public.zip
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure odpscmd a public available program.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the link from Aliyun, I think it's a public program.

sql/codegen_alps.go Outdated Show resolved Hide resolved
@@ -40,6 +43,7 @@ type alpsFiller struct {
ModelDir string
ScratchDir string
PredictOutputTable string
PredictInputModel string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can reuse ModelDir ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fixed it.

sql/sql.y Outdated
$$.model = $4
}
: PREDICT IDENT { $$.into = $2 }
| PREDICT WITH attrs { $$.predAttrs = $3 }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why not PREDICT IDENT WITH attrs ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it.

sql/sql.y Outdated
}
: PREDICT IDENT { $$.into = $2 }
| PREDICT WITH attrs { $$.predAttrs = $3 }
| predict_clause USING IDENT { $$.model = $3 }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this line mean we can write multiple USING?

Copy link
Collaborator Author

@Yancey1989 Yancey1989 Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, and found some same syntax bug, such as SELECT xx FROM a FROM b, SELECT xxx from table TRAIN model COLUMN a,b COLUMN c,d, I issued a bug: #616 , and will fix it in another pr.

tonyyang-svail
tonyyang-svail previously approved these changes Jul 31, 2019
Copy link
Collaborator

@tonyyang-svail tonyyang-svail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me.

modelDir := fmt.Sprintf("oss://arks-model/%s/%s.tar.gz", session.UserId, pr.predictClause.model)

return &alpsFiller{
IsTraining: true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why predict should set IsTraining: true,?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

func alpsPred(w *PipeWriter, pr *extendedSelect, db *DB, cwd string, session *pb.Session) error {
f, err := newALPSPredictFiller(pr)
fname := "alps_pre.odps"
filepath := filepath.Join(cwd, fname)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer clean up this file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@Yancey1989 Yancey1989 merged commit a290673 into sql-machine-learning:develop Aug 1, 2019
@Yancey1989 Yancey1989 deleted the alps_predict_udf branch August 1, 2019 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants