lagom-nlp.bsky.social
We are the Leuven AI Group of Multilingual NLP (LAGoM NLP), a research lab in the Department of Computer Science at KU Leuven, led by @mdlhx
37 posts
490 followers
148 following
comment in response to
post
✅
We look at the role of English in this evaluation: it can be, and often is, used as an interface to boost task performance, or as a natural language in which to evaluate language understanding. We recommend moving away from task performance as the main goal and focusing instead on language understanding.
milanlp.bsky.social is having the same issue; maybe take a look at this GitHub issue: github.com/bluesky-soci...
Moreover, we advocate for a shift in perspective from seeking a general definition of data quality towards a more language- and task-specific one. Ultimately, we aim for this study to serve as a guide to using Wikipedia for pretraining in a multilingual setting.
We evaluate the downstream impact of quality filtering on Wikipedia by training tiny monolingual models on each Wikipedia, and find that data-quality pruning is an effective means of resource-efficient training without hurting performance, especially for low-resource languages (LRLs).
We subject non-English Wikipedias to common quality-filtering techniques such as script filtering, MinHash deduplication, and heuristic filtering, which reveal widespread issues, including a high percentage of one-line articles and duplicate articles.
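The thread doesn't show the lab's pipeline; as a rough, self-contained sketch of how MinHash-based near-duplicate detection works in general (all function names and parameters here are invented for illustration, not the paper's code):

```python
import hashlib

def shingles(text, n=3):
    """Character n-gram shingles of a document (whitespace-normalized)."""
    text = " ".join(text.split())
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

def minhash_signature(doc, num_perm=64):
    """MinHash signature: for each of num_perm seeded hash functions,
    keep the minimum hash value over the document's shingles."""
    sig = []
    for seed in range(num_perm):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8,
                                salt=seed.to_bytes(8, "big")).digest(),
                "big")
            for s in shingles(doc)
        ))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

In a deduplication pass, article pairs whose estimated Jaccard similarity exceeds some threshold (e.g. 0.8) would be flagged as near-duplicates; production systems typically add LSH bucketing to avoid comparing all pairs.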
In this paper we critically examine the notion of Wikipedia as a 'high quality' resource, particularly in the pretraining setting.
It's still not working somehow: if I search for your handle in the search bar, your profile doesn't show up. I don't know if this is a bug or a setting on your side that isn't configured correctly?
I just tried to add you to the list and somehow couldn't find you. I suspect this might just be too soon after account creation? I'll try again later, maybe tomorrow
Just did 😁
Never mind, it did work
Trying to, but it doesn't seem to be working from my phone; I'll do this from a laptop later today or tomorrow if it hasn't worked
welcome!
go.bsky.app/LKGekew here we go!
why not! one more!
I wanted to do this, but I'm not finding enough accounts yet. I also have @amsterdamnlp.bsky.social @ukplab.bsky.social @colt-upf.bsky.social but I need two more
@mdlhx.bsky.social will virtually present our work on zero-shot POS tagging at the Multilingual Representation Learning (MRL) workshop poster session on Saturday, 16 Nov
Anthology link: aclanthology.org/2024.mrl-1.9/
Kushal Tatariya will present our work on interpreting PIXEL (Pixology): Session 09, Interpretability and Analysis of Models for NLP
Nov 13 (Wed) 16:00-17:30
Anthology link: aclanthology.org/2024.emnlp-m...
Wessel Poelman and Esther Ploeger will present our work on typological diversity: Session 11, Multilinguality and Language Diversity
Nov 14 (Thu) 10:30-12:00.
Anthology link: aclanthology.org/2024.emnlp-m...
Furthermore, we show that skewed language selection can paint an unfair picture of multilingual model performance. We hope that this work motivates more systematic approaches to language sampling in NLP, potentially inspired by existing methods from linguistic typology.
We approximate this diversity by measuring average language distance and the absolute inclusion of typological feature values, and find great variation across papers.
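The thread doesn't specify the distance metric; as a hypothetical illustration of "average language distance," one could take normalized Hamming distance over typological feature vectors, averaged across all language pairs in a sample. The feature values below are invented toy data, not real WALS entries:

```python
from itertools import combinations

# Toy WALS-style binary feature vectors (invented values, for illustration only).
feats = {
    "English": (1, 0, 1, 0),
    "Dutch":   (1, 0, 1, 1),
    "Hindi":   (0, 1, 0, 1),
}

def hamming(a, b):
    """Normalized Hamming distance: fraction of differing feature values."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_pairwise_distance(langs):
    """Average distance over all unordered language pairs in the sample."""
    pairs = list(combinations(langs, 2))
    return sum(hamming(feats[a], feats[b]) for a, b in pairs) / len(pairs)
```

A sample of closely related European languages would score low on this measure, while a sample spanning families would score high, which is the intuition behind auditing "typologically diverse" language selections.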
We find that there are no set definitions or criteria for making claims about typological diversity in NLP. In practice, across the papers making such claims, languages spoken in Europe are overrepresented.
Spoiler: we find that PLMs are indeed more influenced by Hindi words when predicting negative emotions, and by English words when predicting positive ones. Moreover, the PLMs may overgeneralise this pattern to examples where it does not apply.
We use LIME and token-level language identification to examine the effect of language on emotion prediction across three PLMs fine-tuned on a Hinglish emotion classification dataset.
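As a hedged sketch of the general idea (not the paper's setup): substitute simple leave-one-out attribution for LIME, use a toy lexicon for token-level language ID, and aggregate token importances per language. Everything here — the lexicons, the scorer, and the helper names — is invented for illustration:

```python
# Toy Hindi/English token lists standing in for a real language-ID component.
HINDI = {"nahi", "bura", "yaar"}
ENGLISH = {"good", "happy", "bad"}

def lang_id(token):
    """Lexicon-based token-level language ID (toy stand-in)."""
    if token in HINDI:
        return "hi"
    if token in ENGLISH:
        return "en"
    return "other"

def toy_negative_score(tokens):
    """Invented classifier: probability-like score for the 'negative' label."""
    neg = {"nahi", "bura", "bad"}
    return sum(t in neg for t in tokens) / max(len(tokens), 1)

def per_language_attribution(tokens, score_fn):
    """Leave-one-out attribution: a token's importance is the score drop
    when it is removed; importances are then summed per identified language."""
    base = score_fn(tokens)
    totals = {}
    for i, tok in enumerate(tokens):
        importance = base - score_fn(tokens[:i] + tokens[i + 1:])
        lang = lang_id(tok)
        totals[lang] = totals.get(lang, 0.0) + importance
    return totals
```

With a real model, `score_fn` would be the PLM's class probability and LIME's locally weighted linear surrogate would replace the leave-one-out step, but the per-language aggregation works the same way.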
TLDR: In this paper, we leverage sociolinguistic theories to see what pre-trained language models learn when predicting emotion for code-mixed data: Hinglish speakers switch to Hindi to express negative emotions and to English for positive ones.