Add a feature that augments text representations with WALS-derived typological feature sets using a RoBERTa encoder, to improve downstream tasks (typology prediction, low-resource transfer, linguistic probing).
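One way to picture the augmentation is to one-hot encode each typological feature and append it to the text embedding. The sketch below is a minimal illustration under stated assumptions, not the feature's actual implementation: `augment_embedding`, `one_hot`, and the toy three-dimensional embedding are hypothetical, and the value list is only modelled on the WALS word-order feature 81A.

```python
from typing import Dict, List, Sequence

# Illustrative value list modelled on WALS feature 81A (order of subject,
# object, and verb). Treat it as an assumption, not an authoritative export.
WORD_ORDER_VALUES = ["SOV", "SVO", "VSO", "VOS", "OVS", "OSV", "No dominant order"]


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Return a one-hot vector; an unknown or missing value maps to all zeros."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def augment_embedding(
    text_embedding: Sequence[float],
    wals_features: Dict[str, str],
    feature_vocab: Dict[str, Sequence[str]],
) -> List[float]:
    """Append one-hot typological features to a text embedding."""
    augmented = list(text_embedding)
    # Iterate in sorted feature order so the layout is stable across languages.
    for feature_id in sorted(feature_vocab):
        value = wals_features.get(feature_id, "")
        augmented.extend(one_hot(value, feature_vocab[feature_id]))
    return augmented


vocab = {"81A": WORD_ORDER_VALUES}
toy_embedding = [0.1, -0.2, 0.3]  # stand-in for a 768-dimensional RoBERTa vector
augmented = augment_embedding(toy_embedding, {"81A": "SOV"}, vocab)
print(len(augmented))  # 3 embedding dims + 7 one-hot dims = 10
```

Languages with no recorded value for a feature get an all-zero block, which keeps the vector length fixed without inventing a value.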
# Pseudo-script: update_sets.sh
python update_wals.py --interactions data/new_clicks.csv --output wals_factors_latest.npy
python update_roberta.py --text_data data/new_descriptions.json --output ./roberta_finetuned
python merge_sets.py --wals wals_factors_latest.npy --roberta ./roberta_finetuned --output hybrid_embeddings.parquet
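The merge step is not shown in the script, so the following is only a sketch of what a script like `merge_sets.py` might do, assuming both inputs can be loaded as ID-to-vector mappings. The function name `merge_embeddings` and the toy IDs are hypothetical. It keeps only IDs present in both inputs and concatenates their vectors.

```python
from typing import Dict, List


def merge_embeddings(
    wals_factors: Dict[str, List[float]],
    roberta_vectors: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Concatenate the two vectors for every ID present in both inputs."""
    # Sorting the shared IDs keeps the output order deterministic.
    shared_ids = sorted(wals_factors.keys() & roberta_vectors.keys())
    return {i: wals_factors[i] + roberta_vectors[i] for i in shared_ids}


wals = {"item_a": [0.1, 0.2], "item_b": [0.3, 0.4]}
roberta = {"item_b": [1.0, 2.0, 3.0], "item_c": [4.0, 5.0, 6.0]}
hybrid = merge_embeddings(wals, roberta)
print(hybrid)  # {'item_b': [0.3, 0.4, 1.0, 2.0, 3.0]}
```

An inner join drops IDs that appear in only one source. Whether to drop them or pad the missing half with zeros is a design choice the original script does not specify.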
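import torch.nn as nn


class HybridRecoModel(nn.Module):
    def __init__(self, wals_factors_dim=50, roberta_dim=768):
        super().__init__()
        # Project both inputs into a shared 128-dimensional space.
        self.wals_proj = nn.Linear(wals_factors_dim, 128)
        self.roberta_proj = nn.Linear(roberta_dim, 128)

    def forward(self, wals_factors, roberta_embedding):
        # torch.nn has no DotProduct module, so the score is computed here
        # as a plain dot product of the two projections (assumed intent).
        w = self.wals_proj(wals_factors)
        r = self.roberta_proj(roberta_embedding)
        return (w * r).sum(dim=-1)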
trainer.train()
: Specifically designed to test whether a model can predict a language's identity or grammatical features from sentence embeddings alone.

📈 Why This Matters
Importance in NLP Research Language Identity