Sql edits to basic_instruct and classic sql-eval #129

Merged
rishsriv merged 4 commits into main from wendy/sql_review on May 8, 2024

@wendy-aw (Contributor) commented on May 7, 2024

More refinements to the SQL queries. Explanations are in the commit messages.

@wongjingping (Collaborator) left a comment

Thank you for this really detailed PR!

How does the ratio of publications to journals change over the years? Return the annual numbers of publications and journals as well.,"SELECT publication.year, COUNT(DISTINCT publication.pid) AS num_publications, COUNT(DISTINCT publication.jid) AS num_journals, CAST(COUNT(DISTINCT publication.pid) AS FLOAT) / NULLIF(COUNT(DISTINCT publication.jid), 0) AS ratio FROM publication GROUP BY publication.year ORDER BY publication.year;",academic,ratio,
What is the ratio of publications presented in conferences to publications published in journals?,"SELECT CAST(COUNT(DISTINCT CASE WHEN cid IS NOT NULL THEN pid END) AS FLOAT) / NULLIF(COUNT(DISTINCT CASE WHEN jid IS NOT NULL THEN pid END), 0) AS ratio FROM publication;",academic,ratio,
What is the ratio of the total number of publications to the total number of keywords within each domain ID? Show all domain IDs.,"SELECT domain_publication.did, CAST(COUNT(DISTINCT domain_publication.pid) AS FLOAT) / NULLIF(COUNT(DISTINCT domain_keyword.kid), 0) AS publication_to_keyword_ratio FROM domain_publication LEFT JOIN domain_keyword ON domain_publication.did = domain_keyword.did GROUP BY domain_publication.did ORDER BY publication_to_keyword_ratio DESC NULLS LAST;SELECT domain_publication.did, CAST(COUNT(DISTINCT domain_publication.pid) AS FLOAT) / NULLIF(COUNT(DISTINCT domain_keyword.kid), 0) AS publication_to_keyword_ratio FROM domain_keyword LEFT JOIN domain_publication ON domain_publication.did = domain_keyword.did GROUP BY domain_publication.did ORDER BY publication_to_keyword_ratio DESC NULLS LAST;SELECT d.did, COALESCE(CAST(COUNT(DISTINCT dp.pid) AS FLOAT) / NULLIF(COUNT(DISTINCT dk.kid), 0), 0) AS publication_to_keyword_ratio FROM domain d LEFT JOIN domain_publication dp ON d.did = dp.did LEFT JOIN domain_keyword dk ON d.did = dk.did GROUP BY d.did ORDER BY publication_to_keyword_ratio DESC NULLS LAST;",academic,ratio,
How does the ratio of publications to journals change over the years? Return the annual numbers of publications and journals as well.,"SELECT p.year, COUNT(DISTINCT p.pid) AS num_publications, COUNT(DISTINCT j.jid) AS num_journals, CAST(COUNT(DISTINCT p.pid) AS FLOAT) / NULLIF(COUNT(DISTINCT j.jid), 0) AS ratio FROM publication p LEFT JOIN journal j ON p.jid = j.jid GROUP BY p.year ORDER BY p.year;",academic,ratio,
Collaborator

I ran both the before and after queries, and they do return the same result. Just to confirm, is that expected? I also realised that a jid that is in journal but not in publication won't be able to show up here, since it wouldn't have a year value from publication.

Contributor Author

Oh yes, that's true. So in effect it's the same as the original query without the JOIN. Thanks for pointing this out; I'll just revert to the old one.
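A minimal sketch of the point above, using SQLite's in-memory database as a stand-in for Postgres and hypothetical toy rows: journal 30 has no publications, and publication 4 has no journal. Assuming every non-NULL publication.jid exists in journal, the LEFT JOIN changes nothing, and a journal-only jid never appears because it has no year to group by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy data: journal 30 has no publications,
# and publication 4 has no journal (jid is NULL).
conn.executescript("""
CREATE TABLE journal (jid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE publication (pid INTEGER PRIMARY KEY, year INTEGER, jid INTEGER);
INSERT INTO journal VALUES (10, 'J10'), (20, 'J20'), (30, 'J30');
INSERT INTO publication VALUES (1, 2020, 10), (2, 2020, 20), (3, 2021, 10), (4, 2021, NULL);
""")

# Original query: count jids straight from publication.
no_join = conn.execute("""
SELECT publication.year,
       COUNT(DISTINCT publication.pid) AS num_publications,
       COUNT(DISTINCT publication.jid) AS num_journals,
       CAST(COUNT(DISTINCT publication.pid) AS FLOAT)
         / NULLIF(COUNT(DISTINCT publication.jid), 0) AS ratio
FROM publication GROUP BY publication.year ORDER BY publication.year
""").fetchall()

# Proposed query: LEFT JOIN to journal first.
with_join = conn.execute("""
SELECT p.year,
       COUNT(DISTINCT p.pid) AS num_publications,
       COUNT(DISTINCT j.jid) AS num_journals,
       CAST(COUNT(DISTINCT p.pid) AS FLOAT)
         / NULLIF(COUNT(DISTINCT j.jid), 0) AS ratio
FROM publication p LEFT JOIN journal j ON p.jid = j.jid
GROUP BY p.year ORDER BY p.year
""").fetchall()

print(no_join)               # journal 30 appears in neither result
print(no_join == with_join)  # same rows either way
```

COUNT(DISTINCT ...) already skips the NULL jid of publication 4, which is why the join adds nothing here.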

@@ -44,14 +44,14 @@ How many courses are offered for each semester id?,"SELECT course_offering.semes
"What is the total number of students enrolled in each course, ordered from highest to lowest?","SELECT {course.course_id, course.name, course.number}, SUM(course.num_enrolled) AS total_students FROM course GROUP BY {} ORDER BY total_students DESC NULLS LAST;",advising,order_by,
"What is the total number of credits earned by each student, ordered from highest to lowest? Give the student id and the total number of credits.","SELECT student.student_id, student.total_credit FROM student ORDER BY student.total_credit DESC NULLS LAST;",advising,order_by,
"What is the name of the instructor who has taught the most courses, and how many courses have they taught?","SELECT instructor.name, count(offering_instructor.offering_id) AS num_courses FROM offering_instructor JOIN instructor ON offering_instructor.instructor_id = instructor.instructor_id GROUP BY instructor.name ORDER BY num_courses DESC LIMIT 1;",advising,order_by,
What is the ratio of the total number of students enrolled in courses with exams to the total number of students enrolled in courses without exams?,"WITH exams AS (SELECT DISTINCT sr.student_id FROM public.student_record sr JOIN public.course_offering co ON sr.offering_id = co.offering_id WHERE co.has_final_exam = TRUE ), no_exams AS (SELECT DISTINCT sr.student_id FROM public.student_record sr JOIN public.course_offering co ON sr.offering_id = co.offering_id WHERE co.has_final_exam = FALSE ) SELECT (SELECT COUNT(student_id) FROM exams)::FLOAT / (SELECT COUNT(student_id) FROM no_exams) AS ratio;",advising,ratio,
What is the ratio of the total number of students enrolled in courses with exams to the total number of students enrolled in courses without exams?,"SELECT SUM(CASE WHEN c.has_exams THEN c.num_enrolled ELSE 0 END)::FLOAT / SUM(CASE WHEN NOT c.has_exams THEN c.num_enrolled ELSE 0 END) AS ratio FROM course c;",advising,ratio,
Collaborator

Ooh, thanks for pointing out the slightly ambiguous reference to final_exam vs. exam!
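For reference, a sketch of how the revised query behaves on a hypothetical course table, run in SQLite (booleans stored as 0/1, and Postgres's `::FLOAT` cast written as `CAST(... AS REAL)`). It reads the course-level has_exams flag and sums num_enrolled, rather than counting distinct students from student_record in offerings where has_final_exam is true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy course table; SQLite has no BOOLEAN type, so 1 = TRUE, 0 = FALSE.
conn.executescript("""
CREATE TABLE course (course_id INTEGER PRIMARY KEY, has_exams INTEGER, num_enrolled INTEGER);
INSERT INTO course VALUES (1, 1, 120), (2, 1, 30), (3, 0, 50), (4, 0, 25);
""")

# Same logic as the revised query: enrolment in courses with exams
# divided by enrolment in courses without exams.
(ratio,) = conn.execute("""
SELECT CAST(SUM(CASE WHEN c.has_exams THEN c.num_enrolled ELSE 0 END) AS REAL)
       / SUM(CASE WHEN NOT c.has_exams THEN c.num_enrolled ELSE 0 END) AS ratio
FROM course c
""").fetchone()
print(ratio)  # (120 + 30) / (50 + 25)
```

One design note: unlike the NULLIF-guarded ratio queries elsewhere in the dataset, this one has no guard, so in Postgres a table where every course has exams would raise a division-by-zero error, whereas SQLite would return NULL.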

What's the earliest flight departure time in the day in HH:MM?,"SELECT to_char(to_timestamp(departure_time)::TIME, 'HH24:MI') AS earliest_departure_time FROM flight ORDER BY earliest_departure_time LIMIT 1;",atis,date_functions,
What's the difference in time in days between today and the earliest flight departure?,"SELECT date_part('day', CURRENT_DATE - to_timestamp(departure_time)) AS difference_in_days FROM flight ORDER BY departure_time LIMIT 1;SELECT (CURRENT_DATE - TO_TIMESTAMP(MIN(f.departure_time))) AS days_difference FROM flight f;",atis,date_functions,
What is the total cost of round-trip fares for each airline code?,"SELECT fare.fare_airline, SUM(fare.round_trip_cost) AS total_round_trip_cost FROM fare GROUP BY fare.fare_airline ORDER BY total_round_trip_cost DESC;",atis,group_by,
"What is the average cost of round-trip fares from Los Angeles (LAX) to Chicago (ORD) for each airline, sorted in descending order by average cost?","SELECT fare.fare_airline, AVG(fare.round_trip_cost) AS average_cost FROM fare WHERE fare.from_airport = 'LAX' AND fare.to_airport = 'ORD' GROUP BY fare.fare_airline ORDER BY average_cost DESC NULLS LAST;SELECT airline.airline_name, AVG(fare.round_trip_cost) AS avg_round_trip_cost FROM fare JOIN airline ON fare.fare_airline = airline.airline_code WHERE fare.from_airport = 'LAX' AND fare.to_airport = 'ORD' GROUP BY airline.airline_name ORDER BY avg_round_trip_cost DESC;",atis,group_by,
"What is the average cost of a one-way trip for each fare id, sorted in ascending order of the cost?","SELECT fare.fare_id, AVG(fare.one_direction_cost) AS average_cost FROM fare GROUP BY fare.fare_id ORDER BY average_cost ASC NULLS LAST;",atis,group_by,
"What is the average cost of a one-way trip for each airport pair in the fare table?","SELECT f.from_airport, f.to_airport, AVG(f.one_direction_cost) AS average_cost FROM fare f GROUP BY f.from_airport, f.to_airport ORDER BY f.from_airport, f.to_airport NULLS LAST;",atis,group_by,
Collaborator

Change makes sense, thanks for updating!
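The revised one-way-fare question groups by airport pair instead of fare_id. Assuming fare_id is unique in fare, the original grouping puts exactly one row in each group, so the "average" is just that row's cost; grouping by (from_airport, to_airport) averages over several fares. A sketch with hypothetical rows, using SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fares; fare_id is assumed to be a unique key.
conn.executescript("""
CREATE TABLE fare (fare_id INTEGER PRIMARY KEY, from_airport TEXT, to_airport TEXT,
                   one_direction_cost REAL);
INSERT INTO fare VALUES (1, 'LAX', 'ORD', 200), (2, 'LAX', 'ORD', 300), (3, 'JFK', 'SFO', 400);
""")

# One row per fare_id, so AVG just echoes each row's cost.
per_fare = conn.execute(
    "SELECT fare_id, AVG(one_direction_cost) FROM fare GROUP BY fare_id ORDER BY fare_id"
).fetchall()

# Several fares per airport pair, so AVG actually aggregates.
per_pair = conn.execute(
    "SELECT from_airport, to_airport, AVG(one_direction_cost) FROM fare "
    "GROUP BY from_airport, to_airport ORDER BY from_airport, to_airport"
).fetchall()

print(per_fare)
print(per_pair)
```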

Which street has the most number of restaurants?,SELECT DISTINCT location.street_name FROM LOCATION WHERE street_name = (SELECT street_name FROM LOCATION GROUP BY 1 ORDER BY COUNT(restaurant_id) DESC LIMIT 1);,restaurants,order_by,
Which restaurants serve Italian cuisine or are located in New York? Order the results by the restaurant name.,SELECT restaurant.name FROM restaurant JOIN LOCATION ON restaurant.id = location.restaurant_id WHERE restaurant.food_type ILIKE '%Italian%' OR location.city_name ILIKE '%New York%' ORDER BY restaurant.name NULLS LAST;,restaurants,order_by,
Which street has the most number of restaurants?,SELECT street_name FROM location GROUP BY street_name ORDER BY COUNT(restaurant_id) DESC LIMIT 1;,restaurants,order_by,
Which restaurants serve Italian cuisine or are located in New York? Order the results by the restaurant name.,SELECT name FROM restaurant WHERE food_type ILIKE '%Italian%' OR city_name ILIKE ‘%New York%’ ORDER BY name NULLS LAST;,restaurants,order_by,
Collaborator

I think you used a different quote character instead of the usual straight single quote ('):

restaurants=# SELECT name FROM restaurant WHERE food_type ILIKE '%Italian%' OR city_name ILIKE ‘%New York%’ ORDER BY name NULLS LAST;
ERROR:  syntax error at or near "York"
LINE 1: ...d_type ILIKE '%Italian%' OR city_name ILIKE ‘%New York%’ ORD...

Contributor Author

Phew, good catch!
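Typographic quotes like these are easy to paste in from a document editor. A hypothetical pre-flight check (not part of sql-eval) could flag them before a query reaches the database; Postgres does not treat ‘ and ’ as string delimiters, which is what produces the syntax error shown above.

```python
# Hypothetical lint helper; not part of the sql-eval repository.
CURLY_QUOTES = {"\u2018", "\u2019", "\u201c", "\u201d"}  # ‘ ’ “ ”


def find_curly_quotes(sql: str) -> list[tuple[int, str]]:
    """Return (index, character) for every typographic quote in `sql`."""
    return [(i, ch) for i, ch in enumerate(sql) if ch in CURLY_QUOTES]


def straighten_single_quotes(sql: str) -> str:
    """Replace ‘ and ’ with the ASCII single quote used for SQL string literals.

    Curly double quotes are only reported, never rewritten: in SQL a straight
    double quote delimits an identifier, not a string, so guessing could
    change the query's meaning.
    """
    return sql.replace("\u2018", "'").replace("\u2019", "'")


query = "SELECT name FROM restaurant WHERE city_name ILIKE \u2018%New York%\u2019;"
print(find_curly_quotes(query))
print(straighten_single_quotes(query))
```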

@rishsriv (Member) left a comment

Thank you for the eagle-eyed fixes and improvements, covering logic, query simplification, and formatting!

Just one comment about a GROUP BY clause in basic_instruct that needs addressing. Feel free to address it and then merge!

@@ -1,6 +1,6 @@
db_name,query_category,question,query
broker,basic_join_date_group_order_limit,"What are the top 5 countries by total transaction amount in the past 30 days, inclusive of 30 days ago? Return the country name, number of transactions and total transaction amount.","SELECT c.sbCustCountry, COUNT(t.sbTxId) AS num_transactions, SUM(t.sbTxAmount) AS total_amount FROM sbCustomer c JOIN sbTransaction t ON c.sbCustId = t.sbTxCustId WHERE t.sbTxDateTime >= CURRENT_DATE - INTERVAL '30 days' GROUP BY c.sbCustCountry ORDER BY total_amount DESC LIMIT 5"
broker,basic_join_date_group_order_limit,"How many distinct customers made each type of transaction between Jan 1, 2023 and Mar 31, 2023 (inclusive of start and end dates)? Return the transaction type, number of distinct customers and average number of shares, for the top 3 transaction types by number of customers.","SELECT t.sbTxType, COUNT(DISTINCT c.sbCustId) AS num_customers, AVG(t.sbTxShares) AS avg_shares FROM sbTransaction t JOIN sbCustomer c ON t.sbTxCustId = c.sbCustId WHERE t.sbTxDateTime BETWEEN '2023-01-01' AND '2023-03-31' GROUP BY t.sbTxType ORDER BY num_customers DESC LIMIT 3"
broker,basic_join_date_group_order_limit,"How many distinct customers made each type of transaction between Jan 1, 2023 and Mar 31, 2023 (inclusive of start and end dates)? Return the transaction type, number of distinct customers and average number of shares, for the top 3 transaction types by number of customers.","SELECT t.sbTxType, COUNT(DISTINCT t.sbTxCustId) AS num_customers, AVG(t.sbTxShares) AS avg_shares FROM sbTransaction t WHERE t.sbTxDateTime BETWEEN '2023-01-01' AND '2023-03-31 23:59:59' GROUP BY t.sbTxType ORDER BY num_customers DESC LIMIT 3"
Member

Thanks for the fix!
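Assuming sbTxDateTime is a timestamp column, Postgres reads a bare date literal such as '2023-03-31' as midnight, so the original BETWEEN dropped everything after midnight on 31 March. The sketch below reproduces this in SQLite with hypothetical rows stored as ISO-8601 text (which sorts chronologically), writing the midnight cast out explicitly. It also shows a half-open range (< '2023-04-01'), a common alternative that additionally covers fractional seconds after 23:59:59, since Postgres timestamps carry microsecond precision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transactions; ISO-8601 text compares in chronological order.
conn.executescript("""
CREATE TABLE sbTransaction (sbTxId INTEGER PRIMARY KEY, sbTxDateTime TEXT);
INSERT INTO sbTransaction VALUES
  (1, '2023-01-01 00:00:00'),
  (2, '2023-03-31 00:00:00'),
  (3, '2023-03-31 14:30:00'),
  (4, '2023-03-31 23:59:59.500'),
  (5, '2023-04-01 00:00:00');
""")


def between(lo: str, hi: str) -> list[int]:
    """IDs with lo <= sbTxDateTime <= hi (inclusive at both ends, like BETWEEN)."""
    rows = conn.execute(
        "SELECT sbTxId FROM sbTransaction WHERE sbTxDateTime BETWEEN ? AND ? ORDER BY sbTxId",
        (lo, hi),
    )
    return [r[0] for r in rows]


def half_open(lo: str, hi: str) -> list[int]:
    """IDs with lo <= sbTxDateTime < hi."""
    rows = conn.execute(
        "SELECT sbTxId FROM sbTransaction "
        "WHERE sbTxDateTime >= ? AND sbTxDateTime < ? ORDER BY sbTxId",
        (lo, hi),
    )
    return [r[0] for r in rows]


old = between("2023-01-01 00:00:00", "2023-03-31 00:00:00")    # bare '2023-03-31' means midnight
new = between("2023-01-01 00:00:00", "2023-03-31 23:59:59")    # end bound used in this PR
alt = half_open("2023-01-01 00:00:00", "2023-04-01 00:00:00")  # next day, exclusive
print(old, new, alt)
```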

@@ -10,7 +10,7 @@ broker,basic_group_order_limit,What are the top 5 countries by number of custome
broker,basic_left_join,Return the customer ID and name of customers who have not made any transactions.,"SELECT c.sbCustId, c.sbCustName FROM sbCustomer c LEFT JOIN sbTransaction t ON c.sbCustId = t.sbTxCustId WHERE t.sbTxCustId IS NULL"
broker,basic_left_join,Return the ticker ID and symbol of tickers that do not have any daily price records.,"SELECT tk.sbTickerId, tk.sbTickerSymbol FROM sbTicker tk LEFT JOIN sbDailyPrice dp ON tk.sbTickerId = dp.sbDpTickerId WHERE dp.sbDpTickerId IS NULL"
car_dealership,basic_join_date_group_order_limit,"Who were the top 3 sales representatives by total revenue in the past 3 months, inclusive of today's date? Return their first name, last name, total number of sales and total revenue.","SELECT c.first_name, c.last_name, COUNT(s.id) AS total_sales, SUM(s.sale_price) AS total_revenue FROM sales s JOIN salespersons c ON s.salesperson_id = c.id WHERE s.sale_date >= CURRENT_DATE - INTERVAL '3 months' GROUP BY c.first_name, c.last_name ORDER BY total_revenue DESC LIMIT 3"
car_dealership,basic_join_date_group_order_limit,"Return the top 5 salespersons by number of sales in the past 30 days? Return their first and last name, total sales count and total revenue amount.","SELECT sp.first_name, sp.last_name, COUNT(s.id) AS total_sales, SUM(s.sale_price) AS total_revenue FROM sales s JOIN salespersons sp ON s.salesperson_id = sp.id WHERE s.sale_date >= CURRENT_DATE - INTERVAL '30 days' GROUP BY sp.first_name, sp.last_name ORDER BY total_sales DESC LIMIT 5"
car_dealership,basic_join_date_group_order_limit,"Return the top 5 salespersons by number of sales in the past 30 days? Return their first and last name, total sales count and total revenue amount.","SELECT sp.first_name, sp.last_name, COUNT(s.id) AS total_sales, SUM(s.sale_price) AS total_revenue FROM sales s JOIN salespersons sp ON s.salesperson_id = sp.id WHERE s.sale_date >= CURRENT_DATE - INTERVAL '30 days' GROUP BY sp.first_name, sp.last_name, sp.id ORDER BY total_sales DESC LIMIT 5"
Member

Just checking: do we need the additional sp.id in the GROUP BY, since it's not in the SELECT list?

Contributor Author

I think it's safer to group by id, since there could technically be salespersons who share the same first_name and last_name?

Member

Aight, sounds good!
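A sketch of the duplicate-name case, using SQLite and hypothetical data in which salespersons 1 and 2 are different people who are both named Alex Lee. Grouping only by name merges their sales into one row; adding sp.id to the GROUP BY keeps them apart even though id is not selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: ids 1 and 2 are different people with the same name.
conn.executescript("""
CREATE TABLE salespersons (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE sales (id INTEGER PRIMARY KEY, salesperson_id INTEGER, sale_price REAL);
INSERT INTO salespersons VALUES (1, 'Alex', 'Lee'), (2, 'Alex', 'Lee'), (3, 'Sam', 'Park');
INSERT INTO sales VALUES
  (1, 1, 100), (2, 1, 100),
  (3, 2, 100), (4, 2, 100),
  (5, 3, 100), (6, 3, 100), (7, 3, 100);
""")

QUERY = """
SELECT sp.first_name, sp.last_name, COUNT(s.id) AS total_sales, SUM(s.sale_price) AS total_revenue
FROM sales s JOIN salespersons sp ON s.salesperson_id = sp.id
GROUP BY {group_by} ORDER BY total_sales DESC
"""

# Name-only grouping: the two Alex Lees collapse into one apparent top seller.
by_name = conn.execute(QUERY.format(group_by="sp.first_name, sp.last_name")).fetchall()
# Including sp.id: each person keeps their own row.
by_id = conn.execute(QUERY.format(group_by="sp.first_name, sp.last_name, sp.id")).fetchall()
print(by_name)
print(by_id)
```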

@@ -34,7 +34,7 @@ ewallet,basic_join_date_group_order_limit,"How many distinct active users sent m
ewallet,basic_join_group_order_limit,"What are the top 3 most frequently used coupon codes? Return the coupon code, total number of redemptions, and total amount redeemed.","SELECT c.code AS coupon_code, COUNT(t.txid) AS redemption_count, SUM(t.amount) AS total_discount FROM consumer_div.coupons c JOIN consumer_div.wallet_transactions_daily t ON c.cid = t.coupon_id GROUP BY c.code ORDER BY redemption_count DESC LIMIT 3"
ewallet,basic_join_group_order_limit,"Which are the top 5 countries by total transaction amount sent by users, sender_type = 0? Return the country, number of distinct users who sent, and total transaction amount.","SELECT u.country, COUNT(DISTINCT t.sender_id) AS user_count, SUM(t.amount) AS total_amount FROM consumer_div.users u JOIN consumer_div.wallet_transactions_daily t ON u.uid = t.sender_id WHERE t.sender_type = 0 GROUP BY u.country ORDER BY total_amount DESC LIMIT 5"
ewallet,basic_join_distinct,Return the distinct list of merchant IDs that have received money from a transaction. Include all transaction types in the results you return.,SELECT DISTINCT m.mid AS merchant_id FROM consumer_div.merchants m JOIN consumer_div.wallet_transactions_daily t ON m.mid = t.receiver_id WHERE t.receiver_type = 1
ewallet,basic_join_distinct,Return the distinct list of user IDs who have received transaction notifications.,SELECT DISTINCT u.uid AS user_id FROM consumer_div.users u JOIN consumer_div.notifications n ON u.uid = n.user_id WHERE n.type = 'transaction'
ewallet,basic_join_distinct,Return the distinct list of user IDs who have received transaction notifications.,SELECT DISTINCT user_id FROM consumer_div.notifications WHERE type = 'transaction'
Member

Oh yay, thanks for simplifying this!
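A sketch of why the simplification is safe, assuming notifications.user_id always refers to an existing users.uid (for example via a foreign key); the rows are hypothetical and SQLite stands in for Postgres. The join only filters out notifications whose user_id has no matching user, so without such orphans both forms return the same IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; every notifications.user_id exists in users.
conn.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY);
CREATE TABLE notifications (id INTEGER PRIMARY KEY,
                            user_id INTEGER REFERENCES users(uid), type TEXT);
INSERT INTO users VALUES (1), (2), (3);
INSERT INTO notifications VALUES
  (1, 1, 'transaction'), (2, 1, 'transaction'), (3, 2, 'promotion'), (4, 3, 'transaction');
""")

# Original form: join back to users.
with_join = conn.execute("""
SELECT DISTINCT u.uid AS user_id
FROM users u JOIN notifications n ON u.uid = n.user_id
WHERE n.type = 'transaction' ORDER BY 1
""").fetchall()

# Simplified form: read user_id straight from notifications.
no_join = conn.execute("""
SELECT DISTINCT user_id FROM notifications WHERE type = 'transaction' ORDER BY 1
""").fetchall()

print(with_join == no_join)
```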

@rishsriv merged commit a73cecb into main on May 8, 2024
2 checks passed
@rishsriv deleted the wendy/sql_review branch on May 8, 2024 at 03:23
3 participants