---
license: apache-2.0
tags:
- text-to-json
- t5
- seq2seq
- text-generation
- json-conversion
- machine-learning
- nlp
base_model: t5-small
model_name: MD2JSON-T5-V1
version: V1
author: yahyakhoder
---

# MD2JSON-T5-V1: Text-to-JSON Converter with T5

This model uses the **T5 (Text-to-Text Transfer Transformer)** architecture, fine-tuned from `t5-small`, to convert structured text strings into valid JSON objects.

## Description

The **MD2JSON-T5-V1** model is trained to interpret text in which each key is prefixed with `#` and separated from its value by a colon (e.g., `#firstname: John`), and to convert that text into a valid JSON object. Values can be strings, numbers, booleans, arrays, or nested objects, as the examples below show. The model can be used wherever text in this format needs to be converted to JSON.

### Example Input:

- Input:

```text
#firstname: John
#lastname: Doe
#age: 30
#married: true
#hobbies: ["gaming", "running"]
#address: {"city": "Berlin", "zipcode": 10115}
#url: "https://example.com"
```

- Generated JSON Output:

```json
{
  "firstname": "John",
  "lastname": "Doe",
  "age": 30,
  "married": true,
  "hobbies": ["gaming", "running"],
  "address": {
    "city": "Berlin",
    "zipcode": 10115
  },
  "url": "https://example.com"
}
```

### Another Example:

- Input:

```text
#name: Charlie
#age: 29
#isStudent: true
#skills: ["Java", "Machine Learning"]
#profile: {"github": "charlie29", "linkedin": "charlie-linkedin"}
#height: 172.3
```

- Generated JSON Output:

```json
{
  "name": "Charlie",
  "age": 29,
  "isStudent": true,
  "skills": ["Java", "Machine Learning"],
  "profile": {
    "github": "charlie29",
    "linkedin": "charlie-linkedin"
  },
  "height": 172.3
}
```

## Load the Model

To use the model and perform inference, follow the steps below:

### Install Dependencies

```bash
pip install torch transformers datasets
```

### Run Inference

```python
import json

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and model
model_name = "yahyakhoder/MD2JSON-T5-V1"  # Replace with your Hugging Face model path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Example input
input_text = """#firstname: John
#lastname: Doe
#age: 30
#married: true
#hobbies: ["gaming", "running"]
#address: {"city": "Berlin", "zipcode": 10115}
#url: "https://example.com"
"""

# Tokenize and generate the output
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True, max_length=256)
outputs = model.generate(**inputs, max_length=256, num_beams=4, early_stopping=True)

# Decode the generated tokens and parse them as JSON
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
try:
    output_json = json.loads(result)
    print(json.dumps(output_json, indent=2, ensure_ascii=False))
except json.JSONDecodeError as err:
    # The model output is not guaranteed to be valid JSON; show it for debugging
    print(f"Error during JSON conversion: {err}")
    print(f"Raw model output: {result}")
```
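
### Preparing Input from a Dictionary

The `#key: value` format described in the Description section can be generated programmatically when the source data already lives in a Python dictionary. The sketch below is only an illustration: `to_md_input` is a hypothetical helper, not part of this model or of `transformers`. It assumes one pair per line, writes plain strings bare (as in `#firstname: John`), and JSON-encodes everything else, including strings that contain a colon (as in the quoted `#url` value). These rules are inferred from the examples in this card and may not cover every input the model was trained on.

```python
import json


def to_md_input(record: dict) -> str:
    """Render a dict as '#key: value' lines (hypothetical helper, format inferred from the examples)."""
    lines = []
    for key, value in record.items():
        if isinstance(value, str) and ":" not in value:
            # Plain strings are written bare, e.g. '#firstname: John'
            rendered = value
        else:
            # Numbers, booleans, lists, dicts, and strings containing ':' are JSON-encoded
            rendered = json.dumps(value, ensure_ascii=False)
        lines.append(f"#{key}: {rendered}")
    return "\n".join(lines) + "\n"


sample = {
    "firstname": "John",
    "age": 30,
    "married": True,
    "hobbies": ["gaming", "running"],
    "address": {"city": "Berlin", "zipcode": 10115},
    "url": "https://example.com",
}
print(to_md_input(sample))
```

The returned string can be passed to the tokenizer in place of a hand-written `input_text`.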