available_uriel_languages is not correct #12

Open
matus-pikuliak opened this issue Jan 1, 2023 · 1 comment

matus-pikuliak commented Jan 1, 2023

I've noticed that available_uriel_languages does not work properly. I am not sure why it is filtered according to the fam features, but the filtering itself seems buggy. The resulting mask has only ~3.5k elements, even though we have 7k languages with URIEL features. All of the elements in the mask are True. Additionally, the number 3.5k is exactly equal to the number of language families. I suspect that the reduction in np.all might be done along the wrong axis.

I have sidestepped the issue for now by using data directly from the feature_predictions.npz file.
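
For anyone who wants to do the same, here is a minimal sketch of reading language codes straight from an .npz archive. It assumes the archive exposes a "langs" array, as the family file does in the function quoted in this issue; I have not confirmed the keys of feature_predictions.npz here. The helper name langs_from_npz and the demo file are hypothetical and only exist so the sketch runs on its own.

```python
import os
import tempfile

import numpy as np


def langs_from_npz(path):
    """Hypothetical helper: return the set of language codes in an .npz file.

    Assumes the archive contains a "langs" array (an assumption, based on
    the keys used by the function quoted in this issue).
    """
    with np.load(path) as archive:
        return set(archive["langs"].tolist())


# Demo on a throwaway archive so the sketch runs without the package data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_features.npz")
    np.savez(
        demo_path,
        langs=np.array(["aaa", "bbb", "ccc"]),
        data=np.zeros((3, 2, 1)),
    )
    demo_langs = langs_from_npz(demo_path)

print(sorted(demo_langs))
```

To try it on the real data, pass the path to the packaged file instead of the demo archive.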

def available_uriel_languages():
    avail = set()
    #for feature_set in FEATURE_SETS_DICT:
    for feature_set in ["fam"]:
        filename, source, prefix = FEATURE_SETS_DICT[feature_set]
        filename = pkg_resources.resource_filename(__name__, os.path.join('data', filename))
        feature_database = np.load(filename)
        mask = np.all(feature_database["data"] != -1.0, axis=0)
        langs = [feature_database["langs"][i] for i,m in enumerate(mask) if np.sum(m)>0]
        for l in langs:
            avail.add(l)
    return avail
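
The suspected axis problem can be reproduced on a toy array. The sketch below assumes the "data" array is laid out as (languages, features, sources) and that -1.0 marks a missing value; the toy language codes and shapes are made up for illustration. The second mask is only one possible reading of what "available" should mean, not necessarily the intended behaviour.

```python
import numpy as np

# Hypothetical toy stand-ins for feature_database["langs"] and
# feature_database["data"]. Assumption: data is laid out as
# (languages, features, sources) and -1.0 marks a missing value.
langs = np.array(["aaa", "bbb", "ccc", "ddd", "eee", "fff"])
data = np.zeros((len(langs), 3, 1))  # 6 languages, 3 features, 1 source
data[1, 0, 0] = 1.0

# Same reduction as in the quoted function: axis=0 collapses the language
# axis, so the mask has one row per feature, not one per language.
mask = np.all(data != -1.0, axis=0)
selected = [langs[i] for i, m in enumerate(mask) if np.sum(m) > 0]
print(mask.shape, selected)

# One possible alternative (an assumption about the intended meaning):
# keep a language if at least one of its values is not missing.
per_lang_mask = np.any(data != -1.0, axis=(1, 2))
selected_per_lang = langs[per_lang_mask].tolist()
print(per_lang_mask.shape, selected_per_lang)
```

With this toy input the axis=0 mask has as many rows as there are features, and the list comprehension returns only the first few language codes, which matches the symptom described above.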

antonisa (Owner) commented

Huh, thanks! I'll look into incorporating this.
