Not accurate source language autodetection  #74

@joeperpetua

Hi!
First of all, I wanted to say that I love the project; I have been using it for a while now.

I came across some odd behavior that maybe you could check or explain to me (I looked through the source code of the relevant functions but did not see anything that could be causing this).

It seems that source language autodetection is off for short, single-word inputs. I reproduced it with Spanish, but I don't know whether it happens in other languages too.
If you give it the words "casa" or "hola", for instance, it detects the source language as English instead of Spanish.

For example, using the base translator:

Python 3.11.1 (tags/v3.11.1:a7a450f, Dec  6 2022, 19:58:39) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import translatepy
>>> translatepy.Translator().language("casa")
LanguageResult(service=Google, source=casa, result=eng)

Then I called the translators explicitly (Reverso and Google in this case) and then the base translator again, and this time it detected Spanish correctly (I guess because of the cache, but I may be wrong):

Python 3.11.1 (tags/v3.11.1:a7a450f, Dec  6 2022, 19:58:39) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import translatepy
>>> translatepy.translators.reverso.ReversoTranslate().language("casa")
LanguageResult(service=Reverso, source=casa, result=spa)
>>> translatepy.translators.google.GoogleTranslate().language("casa")
LanguageResult(service=Google, source=casa, result=spa)
>>> translatepy.Translate().language("casa")
LanguageResult(service=Google, source=casa, result=spa)

Interestingly, later in the same session, the base translator's translate() method got the detection wrong again:

>>> translatepy.Translate().translate("casa", "en")
TranslationResult(service=Google, source=casa, source_language=eng, destination_language=eng, result=casa)

Any idea why this could be happening? I guess the workaround for now would be to run the GoogleTranslate().language() method first, and then pass its result to the Translator().translate() method to get accurate results, like so:

>>> lang = translatepy.translators.google.GoogleTranslate().language("casa")
>>> translatepy.Translate().translate("casa", "en", lang.result)
TranslationResult(service=Google, source=casa, source_language=spa, destination_language=eng, result=house)
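
The workaround above could be wrapped in a small helper. This is only a sketch: the name `translate_with_detected_source` is made up for illustration and is not part of translatepy. It assumes nothing beyond the `language()` and `translate()` calls shown in the session above, and it takes the detector and translator as arguments so either can be swapped out.

```python
def translate_with_detected_source(text, destination, detector, translator):
    """Detect the source language explicitly, then pass it to the translator.

    `detector` is assumed to expose language(text) returning an object with a
    `.result` attribute; `translator` is assumed to expose
    translate(text, destination, source), as in the session above.
    """
    detected = detector.language(text).result
    return translator.translate(text, destination, detected)


# Hypothetical usage with translatepy (needs the package and network access):
#   import translatepy
#   translate_with_detected_source(
#       "casa", "en",
#       translatepy.translators.google.GoogleTranslate(),
#       translatepy.Translate(),
#   )
```

Passing the source language explicitly skips the base translator's own autodetection, which is the step that appears to go wrong for short words.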

Anyway, I wanted to ask about this and see if there is any reasoning behind it.
Sorry for the long message, and thanks in advance!
