Upload folder using huggingface_hub


Files changed (13)

1_Pooling/config.json +10 -0
README.md +171 -0
config.json +23 -0
config_sentence_transformers.json +14 -0
config_setfit.json +15 -0
model.safetensors +3 -0
model_head.pkl +3 -0
modules.json +20 -0
sentence_bert_config.json +4 -0
special_tokens_map.json +51 -0
tokenizer.json +0 -0
tokenizer_config.json +73 -0
vocab.txt +0 -0

1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+    "word_embedding_dimension": 768,
+    "pooling_mode_cls_token": false,
+    "pooling_mode_mean_tokens": true,
+    "pooling_mode_max_tokens": false,
+    "pooling_mode_mean_sqrt_len_tokens": false,
+    "pooling_mode_weightedmean_tokens": false,
+    "pooling_mode_lasttoken": false,
+    "include_prompt": true
+}
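
This pooling config enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of the token embeddings at non-padding positions (attention mask = 1). The pure-Python sketch below illustrates that averaging with toy 2-dimensional vectors instead of the model's 768 dimensions (`word_embedding_dimension`). It is an illustration under those assumptions, not the sentence-transformers implementation; `mean_pool` is a hypothetical helper.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1; padding positions are ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    # Guard against division by zero for an all-padding input
    count = max(count, 1)
    return [s / count for s in summed]


# Toy example: 3 tokens with 2-dim embeddings; the last token is padding
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padded third token does not affect the result, which is why mean pooling needs the attention mask rather than a plain average over all positions.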

README.md ADDED

	@@ -0,0 +1,171 @@

+---
+base_model: sentence-transformers/all-mpnet-base-v2
+language:
+  - en
+license: apache-2.0
+tags:
+  - noneconomic-attributes
+  - mention-classification
+  - mpnet-base-v2
+  - setfit
+  - multi-label-classification
+model-index:
+- name: all-mpnet-base-v2_noneconomic-attributes-classifier
+  results:
+  - task:
+      type: multi-label-classification
+      name: Multi-label classification
+    metrics:
+    - type: _tba_
+      value: -1.0
+    dataset:
+      type: custom
+      name: custom human-labeled multi-label annotation dataset
+---
+# Group mention non-economic attributes classifier
+A multi-label classifier for detecting **non-economic attribute** categories referred to in a social group mention, trained with `setfit` based on the light-weight [`sentence-transformers/all-mpnet-base-v2`](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) sentence embedding model.
+The non-economic attributes classified are:
+| attribute                 | definition                                                                                                                                                                                                        |
+|:--------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| age                       | People referred to based on or categorized according to their age, generation, or cohort such as children, young people, old people, future generations.                                                          |
+| family                    | People referred to based on or categorized according to their familial role such as fathers, mothers, parents.                                                                                                    |
+| gender/sexuality          | People referred to based on or categorized according to their gender or sexuality such as men, women, or LGBTQI+ people.                                                                                          |
+| place/location            | People referred to based on or categorized according to their place or location such as people from rural areas, urban centers, the Global South, or the Global North.                                           |
+| nationality               | People referred to based on or categorized according to their nationality such as natives or immigrants.                                                                                                          |
+| ethnicity                 | People referred to based on or categorized according to their ethnicity such as people of color or ethnic minorities.                                                                                             |
+| religion                  | People referred to based on or categorized according to their religion or belief such as Christians, Jews, Muslims, etc.                                                                                          |
+| health                    | People referred to based on or categorized according to their health condition or relation to aspects of health such as disabled/handicapped people or chronically sick people.                                   |
+| crime                     | People referred to based on or categorized according to their relation to crime such as offenders/criminals or victims.                                                                                           |
+| shared values/mentalities | People referred to based on or categorized according to their shared values or mentalities such as people with a growth mindset, meritocratic values, environmental or peace mentalities, or values oriented toward a more equal society. |
+## Model Details
+### Model Description
+A multi-label SetFit classifier that assigns non-economic attribute categories to social group mentions.
+- **Developed by:** Hauke Licht
+- **Model type:** mpnet
+- **Language(s) (NLP):** English
+- **License:** apache-2.0
+- **Finetuned from model:** sentence-transformers/all-mpnet-base-v2
+- **Funded by:** The *Deutsche Forschungsgemeinschaft* (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 2126/1 – 390838866
+### Model Sources
+- **Repository:** _tba_
+- **Paper:** _tba_
+- **Demo:** [More Information Needed]
+## Uses
+### Bias, Risks, and Limitations
+- Evaluation of the classifier on held-out data shows that it still makes errors.
+- The model has been finetuned only on human-labeled social group mentions recorded in sentences sampled from the manifestos of European parties (mostly far-right and Green parties). Applying the classifier in other domains can lead to higher error rates.
+- The data used to finetune the model come from human annotators. Human annotators can be biased, and factors like gender and social background can affect their annotation judgments. This may lead to bias in the detection of specific social groups.
+#### Recommendations
+- Users who want to apply the model outside its training data domain should evaluate its performance on data from the target domain.
+- Users who want to apply the model outside its training data domain should continue finetuning this model on labeled data from the target domain.
+### How to Get Started with the Model
+Use the code below to get started with the model.
+## Usage
+You can use the model with the [`setfit` Python library](https://github.com/huggingface/setfit) (>=1.1.0):
+*Note:* It is recommended to use transformers version >=4.5.5,<=5.0.0 and sentence-transformers version >=4.0.1,<=5.1.0 for compatibility.
+### Classification
+```python
+import torch
+from setfit import SetFitModel
+model_name = "hauke-licht/all-mpnet-base-v2_noneconomic-attributes-classifier"
+device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
+classifier = SetFitModel.from_pretrained(model_name)
+classifier.to(device)
+# Example mentions
+mentions = ["young families in rural areas", "Muslim immigrants", "people with chronic illnesses"]
+# Get multi-hot predictions (one 0/1 column per label)
+predictions = classifier.predict(mentions)
+print(predictions)
+# Map each prediction row to the names of its active labels
+predicted_labels = [
+    [
+        classifier.id2label[l]
+        for l, p in enumerate(pred) if p == 1
+    ]
+    for pred in predictions
+]
+print(predicted_labels)
+```
+### Mention embedding
+```python
+import torch
+from sentence_transformers import SentenceTransformer
+model_name = "hauke-licht/all-mpnet-base-v2_noneconomic-attributes-classifier"
+device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
+# Load the sentence transformer component of the pre-trained classifier
+model = SentenceTransformer(model_name, device=device)
+# Example mentions
+mentions = ["young families in rural areas", "Muslim immigrants", "people with chronic illnesses"]
+# Compute mention embeddings
+embeddings = model.encode(mentions)
+```
+## Training Details
+### Training Data
+The train, dev, and test splits used for model finetuning and evaluation will be made available on GitHub upon publication of the associated research paper.
+### Training Procedure
+#### Training Hyperparameters
+- number of epochs (body, head): (1, 4)
+- train batch sizes (body, head): (32, 4)
+- body training max steps: 75
+- head learning rate: 0.010
+- L2 weight: 0.01
+- warmup proportion: 0.15
+## Evaluation
+### Testing Data, Factors & Metrics
+#### Testing Data
+The train, dev, and test splits used for model finetuning and evaluation will be made available on GitHub upon publication of the associated research paper.
+## Citation
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Model Card Contact
+[email protected]
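
The README's training hyperparameters combine a 75-step body training budget with a warmup proportion of 0.15. The sketch below works through the resulting warmup length under two assumptions the model card does not confirm: that warmup steps are computed as ceil(max steps × warmup proportion), and that the learning rate follows a linear warmup-then-decay schedule (the default `lr_scheduler_type` in `transformers`). `linear_schedule_multiplier` is a hypothetical helper, not a `setfit` API.

```python
import math

MAX_BODY_STEPS = 75       # body training max steps, from the model card
WARMUP_PROPORTION = 0.15  # warmup proportion, from the model card

# Assumption: warmup length = ceil(total steps * warmup proportion)
warmup_steps = math.ceil(MAX_BODY_STEPS * WARMUP_PROPORTION)


def linear_schedule_multiplier(step: int, warmup: int, total: int) -> float:
    """Hypothetical helper: LR multiplier under linear warmup followed by linear decay to 0."""
    if step < warmup:
        return step / max(1, warmup)
    return max(0.0, (total - step) / max(1, total - warmup))


print(warmup_steps)  # 12
print([round(linear_schedule_multiplier(s, warmup_steps, MAX_BODY_STEPS), 3) for s in (0, 6, 12, 75)])
# [0.0, 0.5, 1.0, 0.0]
```

If the warmup length were truncated instead of rounded up, 75 × 0.15 = 11.25 would give 11 warmup steps rather than 12.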

config.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "architectures": [
+    "MPNetModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "dtype": "float32",
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 514,
+  "model_type": "mpnet",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "relative_attention_num_buckets": 32,
+  "transformers_version": "4.57.1",
+  "vocab_size": 30527
+}

config_sentence_transformers.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "__version__": {
+    "sentence_transformers": "5.1.0",
+    "transformers": "4.57.1",
+    "pytorch": "2.6.0+cu124"
+  },
+  "model_type": "SentenceTransformer",
+  "prompts": {
+    "query": "",
+    "document": ""
+  },
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}

config_setfit.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "normalize_embeddings": true,
+  "labels": [
+    "noneconomic__age",
+    "noneconomic__crime",
+    "noneconomic__ethnicity",
+    "noneconomic__family",
+    "noneconomic__gender_sexuality",
+    "noneconomic__health",
+    "noneconomic__nationality",
+    "noneconomic__place_location",
+    "noneconomic__religion",
+    "noneconomic__shared_values_mentalities"
+  ]
+}
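
The `labels` list above fixes the order of the classifier's output columns. Assuming, as the README's `classifier.id2label[l]` mapping implies, that column `i` of a multi-hot prediction row corresponds to `labels[i]`, decoding a row into label names works as in this sketch. The prediction row is made up for illustration, and `decode` and `display_name` are hypothetical helpers.

```python
from typing import Dict, List

# Label order copied from config_setfit.json
LABELS = [
    "noneconomic__age",
    "noneconomic__crime",
    "noneconomic__ethnicity",
    "noneconomic__family",
    "noneconomic__gender_sexuality",
    "noneconomic__health",
    "noneconomic__nationality",
    "noneconomic__place_location",
    "noneconomic__religion",
    "noneconomic__shared_values_mentalities",
]
ID2LABEL: Dict[int, str] = dict(enumerate(LABELS))


def decode(multi_hot: List[int]) -> List[str]:
    """Return the label names whose prediction column equals 1."""
    return [ID2LABEL[i] for i, flag in enumerate(multi_hot) if flag == 1]


def display_name(label: str) -> str:
    """Strip the 'noneconomic__' prefix for readability."""
    return label.split("__", 1)[1]


# Made-up prediction row: columns 0 (age) and 6 (nationality) are active
row = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print([display_name(l) for l in decode(row)])  # ['age', 'nationality']
```

Because the task is multi-label, a row may activate several columns or none at all; an all-zero row decodes to an empty list.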

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:67ef1e1b6b0940ab69d33374d5cdb21c399541dac6ef2c9a7bde020d11ff81b8
+size 437967672
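
The LFS pointer records the size of `model.safetensors` as 437,967,672 bytes. Since `config.json` lists `float32` weights (4 bytes each), dividing by 4 gives a rough upper bound on the parameter count of the sentence-transformer body. The file also holds a small header, so the true count is slightly lower. As one concrete component, the sketch also computes the size of the word-embedding matrix from `vocab_size` and `hidden_size` in `config.json`.

```python
SAFETENSORS_BYTES = 437_967_672  # "size" in the model.safetensors LFS pointer
BYTES_PER_FLOAT32 = 4            # config.json lists "dtype": "float32"

# Upper bound: the file also contains a header, so the true parameter count is slightly lower
approx_params = SAFETENSORS_BYTES // BYTES_PER_FLOAT32
print(f"~{approx_params / 1e6:.1f}M parameters")  # ~109.5M parameters

# Word-embedding matrix: one hidden_size vector per vocabulary entry
vocab_size, hidden_size = 30527, 768
print(vocab_size * hidden_size)  # 23444736
```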

model_head.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3ffcd1d585bd83325c4f31de778facd9479fbeac5471259bd472db5209eb1145
+size 32691

modules.json ADDED

	@@ -0,0 +1,20 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
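
`modules.json` chains three modules: Transformer, then Pooling, then Normalize. `config_setfit.json` sets `normalize_embeddings` to true, and `config_sentence_transformers.json` names cosine as the similarity function. Because the final embeddings have unit length, their plain dot product equals their cosine similarity. The pure-Python sketch below shows this with toy 2-dimensional vectors. It is not the sentence-transformers implementation, and `l2_normalize` and `cosine_similarity` are hypothetical helpers.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (the effect of a normalization step)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a, b = [3.0, 4.0], [4.0, 3.0]
unit_a, unit_b = l2_normalize(a), l2_normalize(b)
# After normalization, the plain dot product equals the cosine similarity
dot_of_units = sum(x * y for x, y in zip(unit_a, unit_b))
print(round(dot_of_units, 6), round(cosine_similarity(a, b), 6))  # 0.96 0.96
```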

sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+    "max_seq_length": 384,
+    "do_lower_case": false
+}
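
`sentence_bert_config.json` caps inputs at 384 tokens, and `tokenizer_config.json` sets `truncation_side` to `"right"`. The sketch below assumes a single mention is encoded RoBERTa-style as `<s>` + tokens + `</s>`, using `bos_token_id` 0 and `eos_token_id` 2 from `config.json`. Under that assumption, the two special tokens count against the limit, so at most 382 content tokens survive. `truncate_with_specials` is a hypothetical helper, not a tokenizer API.

```python
from typing import List

BOS_ID = 0            # "bos_token_id" in config.json (<s>)
EOS_ID = 2            # "eos_token_id" in config.json (</s>)
MAX_SEQ_LENGTH = 384  # "max_seq_length" in sentence_bert_config.json


def truncate_with_specials(content_ids: List[int], max_len: int = MAX_SEQ_LENGTH) -> List[int]:
    """Right-truncate content tokens so that <s> ... </s> fits within max_len."""
    budget = max_len - 2  # reserve two positions for the special tokens
    return [BOS_ID] + content_ids[:budget] + [EOS_ID]


# 500 made-up content token ids, longer than the limit
ids = truncate_with_specials(list(range(1000, 1500)))
print(len(ids), ids[0], ids[-1])  # 384 0 2
```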

special_tokens_map.json ADDED

	@@ -0,0 +1,51 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,73 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "104": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "30526": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<s>",
+  "do_lower_case": true,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "max_length": 128,
+  "model_max_length": 384,
+  "pad_to_multiple_of": null,
+  "pad_token": "<pad>",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "</s>",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "MPNetTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}

vocab.txt ADDED

The diff for this file is too large to render.