Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spanish Implementation #29

Open
wants to merge 58 commits into
base: master
Choose a base branch
from
Open

Conversation

Laifsyn
Copy link

@Laifsyn Laifsyn commented Apr 7, 2024

Submitting as draft because
1-) I'm unclear how pull requests actually works
2-) I haven't properly made an Integration Test because I found it extremely un-ergonomic to constantly build the Num2Words settings after every .to_word() call

I also submitted a .devcontainer configuration because I found it extremely helpful to be able to test stuffs in a virtual environment specially so when I didn't have available a beefy-enough computer (I might've forgotten to add Rust Docs Viewer in the extensions though)

@Laifsyn
Copy link
Author

Laifsyn commented Apr 12, 2024

Alright, this should be about it. (except for mod.rs because I was too lazy to undo rustfmt on that. Shame on me)
But everything should be the most minimal for the spanish implementation feature.

Rustfmt should be easily be doable in another pull request by just re-adding the rustfmt.toml file and running rustfmt against the project.

the undonde derives could be additionally done in another PR.

and optionally fix clippy warnings in another PR

Copy link
Author

@Laifsyn Laifsyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misread the commit content. This commit should've been add missing docs

@Ballasi
Copy link
Owner

Ballasi commented Apr 12, 2024

Sorry for making you go through all of this trouble! and thank you very much for taking the time to work on it, it's very kind :)

I'll see the PR more in depth this weekend! If you're not planning on redoing the derives or rustfmt, I'll take on that job once this is merged.

@Laifsyn
Copy link
Author

Laifsyn commented Apr 12, 2024

I was thinking about figuring what rustfmt config was the one that onelined the chained methods and see if it was alright to disable it.

The reason is because if the method chaining gets one-lined, then the type-hint for each method return on the chain will not be hinted by rust-analyser.

but anyways, the chainings tends to be short more often than not, so the loss of the type-hint which can only be seen in an IDE isn't that relevant

@Laifsyn
Copy link
Author

Laifsyn commented Apr 14, 2024

Hi, I was thinking about making a PR Benching with Criterion with another branch for benchmarks. Mainly I want to see what issues can arises when I expect merge conflicts( expecting so because I had to add some derives to do the benchmark)

The reason was to see how the spanish implementation fared because I did a fix to the unneeded 0 spam that the split_thousands() method created. (Ref: link)

Copy link
Owner

@Ballasi Ballasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again for taking the time to participate! I'm sorry I am relatively slow, I've got a lot going on lately.

Feel free to take a look at the comments I've put :)

tests/lang_es_test.rs Show resolved Hide resolved
README.md Show resolved Hide resolved
@@ -25,6 +25,7 @@ AVAILABLE LANGUAGES:
fr: French (France and Canada)
fr_BE: French (Belgium and the Democratic Republic of the Congo)
fr_CH: French (Swiss Confederation and Aosta Valley)
es: Spanish
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto the pokemon?

src/lang/es.rs Show resolved Hide resolved

let word = to_words(BigFloat::from(21.01), Outputs::Currency, &[]);
assert_eq!(word.unwrap(), "veintiún US dollars con un centavo");
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, compared the results from es.rs to python num2words and see differences

I don't mind much the differences in megas as it depends a lot on the source. however mil trillones makes me wonder, is this what should actually be the number?

10000000000000
diez billones
diez trillones

100000000000000
cien billones
cien trillones

1000000000000000000
un trillón
un quintillón

1000000000000000000000 <=== here
mil trillones
un sextillón

10000000000000000000000000
diez cuatrillones
diez septillones

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's different?. Is Mil Trillones from Python?

I followed the standard of Each thousands changes the Millard because It was a bit confusing that a Billion*1000 is called the equivalent in spanish of "thousand billions" instead of "trillion"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://es.wikipedia.org/wiki/Anexo:Nombres_de_los_n%C3%BAmeros_en_espa%C3%B1ol
So according to this source, it starts to jump by 1_000_000 instead of 1_000 after the trillion

As summary, English's Sextillion is 10^21
Spanish's Sextillion (according to this particular page in wiki) is 10^36

I guess it would still best to discuss with someone more familiarized with spanish, because if this really is the spanish standard, I'm pretty sure almost none will get spanish's quadrillon right (Because it actually is "A Million Trillion")

tests/lang_es_test.rs Show resolved Hide resolved
based on [Real Academia Española](https://www.rae.es/dpd/ordinales) standards, the TENS of 2 have no space between itself and its units.
`"la primera y a la segunda decena se pueden escribir en una o en dos palabras, siendo hoy mayoritaria y siempre preferible la grafía univerbal"`
From Ballasi#29 (comment),
fix ordinal representation based on the rules defined in 2.b && 2.d @
https://www.rae.es/dpd/ordinales
@Laifsyn
Copy link
Author

Laifsyn commented Apr 15, 2024

Alright. That should be it

@Laifsyn
Copy link
Author

Laifsyn commented Apr 15, 2024

Oh, btw. I'm working on translating the currencies. Might submit a commit by the end of the day

Fix the wrong semantics that was used.
Spanish semantics follows this pattern of for each big milliard (Million, Billion, Trillion) grows at a pace of `1 000 000`^n

This means that 1 million is `1 000 000`^1
1 billion => `1 000 000`^2 [1 000_000 000_000]
1 trillion => `1 000 000`^3 [1 000_000 000_000 000_000]
1 quadrillion => `1 000 000`^4 [......]
...... etc

This semantic differs from english's
1 billion => `1 000`^3 [1 000 000_000] 
1 trillion => `1 000 000`^4 [1 000_000 000_000]
1 quadrillion => `1 000 000`^4 [......]
Laifsyn added a commit to Laifsyn/num2words_spanish that referenced this pull request Apr 17, 2024
* Fix Tens edge case for Ordinals

based on [Real Academia Española](https://www.rae.es/dpd/ordinales) standards, the TENS of 2 have no space between itself and its units.
`"la primera y a la segunda decena se pueden escribir en una o en dos palabras, siendo hoy mayoritaria y siempre preferible la grafía univerbal"`

* Fix Ordinals Representation
From Ballasi#29 (comment),
fix ordinal representation based on the rules defined in 2.b && 2.d @
https://www.rae.es/dpd/ordinales

* Fix Tests & remove comments

* Currency Translation for spanish

* Cardinal Fix - Mistakenly used English semantics (#1)

Fix the wrong semantics that was used.
Spanish semantics follows this pattern of for each big milliard (Million, Billion, Trillion) grows at a pace of `1 000 000`^n

This means that 1 million is `1 000 000`^1
1 billion => `1 000 000`^2 [1 000_000 000_000]
1 trillion => `1 000 000`^3 [1 000_000 000_000 000_000]
1 quadrillion => `1 000 000`^4 [......]
...... etc

This semantic differs from english's
1 billion => `1 000`^3 [1 000 000_000] 
1 trillion => `1 000 000`^4 [1 000_000 000_000]
1 quadrillion => `1 000 000`^4 [......]
@Laifsyn
Copy link
Author

Laifsyn commented Apr 25, 2024

I have a question if you can get the answer

I'll be leaving it as the first meanwhile, but should it be like the second case?

assert_eq!(
    ordinal(124_001_091_000_000_000_001),
    "ciento veinticuatrotrillonésimos milnoventa y unobillonésimos primeros"
//  "ciento veinticuatrotrillonésimos mil noventa y unobillonésimos primeros"
);

* Staging changes for ordinal Fix

Some data to take into consideration
Todo: Fix accent on monomorphized words i.e. VigesimoSegundo

Si el ordinal se escribe en dos palabras, el primer elemento mantiene la tilde que le corresponde como palabra independiente: vigésimo segundo, vigésima cuarta, trigésimo octavo, cuadragésima quinta; pero, si se escribe en una sola palabra, el compuesto resultante, al ser ser una voz llana terminada en vocal, debe escribirse sin tilde, pues no le corresponde llevarla según las reglas de acentuación (v. cap. II, § 3.4.5.1.1): vigesimosegundo (no ⊗‍vigésimosegundo).

Los ordinales complejos escritos en una sola palabra solo presentan variación de género y número en el segundo componente: vigesimoprimero, vigesimoprimera, vigesimoprimeros, vigesimoprimeras; pero, si se escriben en dos palabras, ambos componentes son variables: vigésimo primero, vigésima primera, vigésimos primeros, vigésimas primeras. No se consideran correctas las grafías en dos palabras si se mantiene invariable el primer componente: ⊗‍vigésimo segundos, ⊗‍vigésimo cuarta, ⊗‍vigésimo octavas.

* temporarily add rustfmt back

* Finish Ordinal Fix.

Reference: https://www.rae.es/dpd/ordinales

* remove accent on composites of `Vigesimo`

* remove rustfmt for merge
@Laifsyn
Copy link
Author

Laifsyn commented May 26, 2024

I couldn't redo the Cardinal implementation.
Performance gains were dirt low. at about x1.02-x1.25 faster, and also the implementation looked a bit ugly. So I have no changes to the current implementation of Spanish.

@PabloGmz96
Copy link

Hey guys! Native Spanish speaker here. Is there anything that you need help with? I'd love to use this crate in a project that I'm currently developing

@Laifsyn
Copy link
Author

Laifsyn commented Jun 16, 2024

Hey guys! Native Spanish speaker here. Is there anything that you need help with? I'd love to use this crate in a project that I'm currently developing

  • I would like opinions on whether to support a flavor for "Semantically Incorrect" milliards.
    according to RAE, 10E9(1,000,000,000) is called "Billón", whereas in English, 10E9 is called "Trillion". Choosing to support semantically Incorrect cardinals would mean that spanish's "Billón" would be english's trillion milliards like "billón" and "trillón" in spanish are just the literal translation of english's.

  • Maybe check currency, in case some currency got mistranslated.

  • Support converting too big of numbers into nEm expression, i.e. 20.23E20 => veinte coma veintitrés por 10 a la 20

Priority close to 0:

  • Cleanup tests & Integration Tests: Because some test cases overlaps other tests. Also I personally think they could look cleaner with some work.

Otherwise, everything should be fine. Other than big numbers which can lose precision

@PabloGmz96
Copy link

PabloGmz96 commented Jun 20, 2024

Hey @Laifsyn! I just review the "Semantically Incorrect" numbers that you mention and as far as I know almost the complete Latin-america region use those big numbers exactly in the same way as in English, so a direct translation of them should be precise enough. Later versions of this crate could include some other forms of Spanish cardinals numbers but knowing that they are rarely used this should be not a priority. But let me ask some friends in Colombia and Argentina (Countries that use particularly high number prices in the every day life) to confirm that.

In the other hand, let me test the USD and PESOS translation, those are the currencies that I'm familiar with.

And last but not least, I just want to say a big thank you to make the Spanish implementation possible!

@PabloGmz96
Copy link

PabloGmz96 commented Jun 20, 2024

Just a quick update... When using Spanish as language, the output shows the array that the code use to translate the numbers to words.
image

@Laifsyn
Copy link
Author

Laifsyn commented Jun 20, 2024

Just a quick update... When using Spanish as language, the output shows the array that the code use to translate the numbers to words. image

Thanks for finding out! I might have forgotten to test the actual binary before I uploaded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants