# Usage 1) Specify only the column name
# Use this for top-level values in the container, where there is no risk of duplicate matches
query_only_key = ['count', 'date']  # key names used as columns
for r in diselect(sample_from_json, query_only_key):
    print(r)
# results
# {'count': 1, 'date': '2022-5-31'}
# Usage 2) Extract nested values
# A query is a tuple of the parent keys leading to the target 'terminal' value
# If only a few parent generations are given, duplicate matches may occur
# An exception is raised when a duplicate match occurs
query_deep_path = [('city', 'names', 'en'), ('country', 'names', 'en')]  # 'en' is the key of the terminal value
for r in diselect(sample_from_json, query_deep_path):
    print(r)
# results
# {('city', 'names', 'en'): 'Songpa-gu', ('country', 'names', 'en'): 'South Korea'}
# {('city', 'names', 'en'): 'Songpa-gu2', ('country', 'names', 'en'): 'South Korea2'}
# Usage 3) Alias a query to a column name
# Map each query to a usable column name
query_aliases = {
    ('city', 'names', 'en'): 'city_name',
    ('country', 'names', 'en'): 'country_name',
    ('subdivisions', 'names', 'en'): 'subdivision_names'
}
# or
query_aliases = [
    {('city', 'names', 'en'): 'city_name'},
    {('country', 'names', 'en'): 'country_name'},
    {('subdivisions', 'names', 'en'): 'subdivision_names'}
]
for r in diselect(sample_from_json, query_aliases):
    print(r)
# results:
# {'city_name': 'Songpa-gu', 'country_name': 'South Korea', 'subdivision_names': ['Seoul', 'Hangang']}
# {'city_name': 'Songpa-gu2', 'country_name': 'South Korea2', 'subdivision_names': ['Seoul2', 'Hangang2']}
# The multiple child values of subdivision_names have been coalesced into a list, e.g. ['Seoul', 'Hangang']
# Usage 4) Join listed child values
# Pass a tuple of (alias, function) as the value
query_aliases_and_join_children = {
    ('city', 'names', 'en'): 'city_name',
    ('country', 'names', 'en'): 'country_name',
    ('subdivisions', 'names', 'en'): ('subdivision_names', ','.join),  # alias, join function
}
for r in diselect(sample_from_json, query_aliases_and_join_children):
    print(r)
# results
# {'city_name': 'Songpa-gu', 'country_name': 'South Korea', 'subdivision_names': 'Seoul,Hangang'}
# {'city_name': 'Songpa-gu2', 'country_name': 'South Korea2', 'subdivision_names': 'Seoul2,Hangang2'}
# Seoul and Hangang have been joined with the separator ','
The shortest non-overlapping path to the value item (it need not be the full path)
The path lists the parent keys leading to the target 'terminal' value (the target must be a scalar value such as str or int)
A more detailed path (... great-grandparent, grandparent, parent) is better for avoiding duplicate matches
You can mix dict and tuple
The column order of the output matches the order of the query
alias: column name representing the query
apply: function to be applied to value
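The matching rule described above can be pictured as suffix matching on key paths. The following is a minimal sketch of that idea, not diselect's actual implementation: `iter_paths` and `select_one` are hypothetical helper names, and `record` is made-up data shaped like the examples.

```python
def iter_paths(node, prefix=()):
    """Yield (key_path, scalar_value) pairs; list indices are not part of the path."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_paths(child, prefix + (key,))
    elif isinstance(node, list):
        for child in node:
            yield from iter_paths(child, prefix)
    else:
        yield prefix, node


def select_one(record, query):
    """Hypothetical helper: build one row from {path: alias or (alias, apply)}."""
    row = {}
    for path, spec in query.items():
        alias, apply = spec if isinstance(spec, tuple) else (spec, None)
        # A query matches every key path that ends with it (need not be the full path).
        values = [v for p, v in iter_paths(record) if p[-len(path):] == path]
        # A single match stays scalar; several matches are collected into a list.
        value = values[0] if len(values) == 1 else values
        row[alias] = apply(value) if apply else value
    return row


record = {  # made-up record, for illustration only
    'city': {'geoname_id': 1, 'names': {'en': 'Songpa-gu'}},
    'subdivisions': [{'names': {'en': 'Seoul'}}, {'names': {'en': 'Hangang'}}],
}
query = {
    ('city', 'names', 'en'): 'city_name',
    ('subdivisions', 'names', 'en'): ('subdivision_names', ','.join),
}
print(select_one(record, query))
```

Because the row is filled while iterating over the query, its keys come out in query order, which mirrors the ordering rule stated above.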
3. Caution
If no key path in the container matches a query, a warning is emitted and that column is left out of the result.
If a query matches more than one path, an exception is raised and a more detailed query is required.
Consider the data structure of the container. Query results are aggregated by the top-level keys under which the matches are found.
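The first two cautions could be modelled as follows. This is a hedged sketch only: `resolve`, the warning text and the use of `ValueError` are assumptions made for illustration, and the library may report these conditions differently.

```python
import warnings


def resolve(paths, query):
    """Return the single key path in `paths` that ends with `query`, or None."""
    matches = [p for p in paths if p[-len(query):] == query]
    if not matches:
        # No match: warn and let the caller leave the column out of the result.
        warnings.warn(f"no path matches query {query!r}")
        return None
    if len(matches) > 1:
        # Ambiguous: the caller must add more parent keys to the query.
        raise ValueError(f"query {query!r} matches several paths: {matches}")
    return matches[0]


# Made-up key paths, shaped like the example data.
paths = [('city', 'names', 'en'), ('country', 'names', 'en'), ('country', 'iso_code')]

print(resolve(paths, ('iso_code',)))             # unique, so a short path is enough
print(resolve(paths, ('city', 'names', 'en')))   # needs the grandparent key to be unique
try:
    resolve(paths, ('names', 'en'))              # matches both city and country
except ValueError as exc:
    print('error:', exc)
```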
# In the example data, date and count are single values under top-level keys.
# 'count': 1,
# 'date': '2022-5-31',
# 'data_list': [ ...
# data_list, however, holds multiple row values.
# Querying both kinds of data at the same time leads to unpredictable behavior.
greedy_query = [
    # query for top-level single context values
    'count', 'date',
    # query for row values
    {
        ('city', 'names', 'en'): 'city_name',
        ('continent', 'code'): 'continent_code',
        ('continent', 'names', 'en'): 'continent_name',
        ('country', 'iso_code'): 'country_code',
        ('country', 'names', 'en'): 'country_name',
        ('location', 'time_zone'): 'timezone',
        ('subdivisions', 'names', 'en'): ('subdivision_name', ','),
    }
]
for r in diselect(sample_from_json, greedy_query):
    print(r)
# results
# {'count': 1, 'date': '2022-5-31', 'city_name': ['Songpa-gu', 'Songpa-gu2'], 'continent_code': ['AS', 'AS2'], 'continent_name': ['Asia', 'Asia2'], 'country_code': ['KR', 'KR2'], 'country_name': ['South Korea', 'South Korea2'], 'timezone': ['Asia/Seoul', 'Asia/Seoul2'], 'subdivision_name': 'Seoul,Hangang,Seoul2,Hangang2'}
# The data is organized vertically under the top-level keys count and date. Maybe this is what you want.
# This can be used as a trick to get a column-oriented dataset.
#
# Tip: separate the queries by structure to get both
query_context = ['count', 'date']
query_list = {
    ('city', 'names', 'en'): 'city_name',
    ('continent', 'code'): 'continent_code',
    ('continent', 'names', 'en'): 'continent_name',
    ('country', 'iso_code'): 'country_code',
    ('country', 'names', 'en'): 'country_name',
    ('location', 'time_zone'): 'timezone',
    ('subdivisions', 'names', 'en'): ('subdivision_name', ','),
}
[context_data] = list(diselect(sample_from_json, query_context))  # expect exactly one row
count = context_data['count']
date = context_data['date']
# or, simpler and better, index directly when the values are easy to access
count = sample_from_json['count']
date = sample_from_json['date']
data_list = list(diselect(sample_from_json, query_list))  # many rows
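If flat records are wanted after separating the two queries, the context values can be merged back into each row. A minimal sketch, using hand-written stand-ins for `context_data` and `data_list` (the literals below are illustrative and were not produced by running the library):

```python
# Stand-ins for the values returned by the two separate queries.
context_data = {'count': 1, 'date': '2022-5-31'}
data_list = [
    {'city_name': 'Songpa-gu', 'country_name': 'South Korea'},
    {'city_name': 'Songpa-gu2', 'country_name': 'South Korea2'},
]

# Repeat the single-valued context in every row to get flat records.
records = [{**context_data, **row} for row in data_list]
for record in records:
    print(record)
```

Keeping the context first in the merge puts `count` and `date` at the front of each record; if a row ever reused one of those keys, the row value would win.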