Getting Started with Hugging Face

The Hub for Open-Source AI

Hugging Face has become the go-to platform for sharing, discovering, and using pre-trained machine learning models. Its open-source Transformers library makes it possible to run state-of-the-art NLP, vision, and audio models with just a few lines of code.

Installation and Setup

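pip install transformers datasets torch accelerate

The accelerate package is required by the Trainer class used in the fine-tuning section.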
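Some argument names used in this guide, such as the evaluation-schedule option of TrainingArguments, have changed between library releases, so it helps to confirm which versions you actually have. Below is a minimal sketch that uses only the standard library's importlib.metadata. The package list is an assumption based on this guide's dependencies; adjust it to match what you installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names assumed from this guide's install command.
    for pkg, ver in installed_versions(
        ["transformers", "datasets", "torch", "accelerate"]
    ).items():
        print(f"{pkg}: {ver or 'not installed'}")
```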

Quick Start: Text Classification

The pipeline API is the fastest way to get started. It bundles model loading, tokenization, inference, and post-processing into a single function call.

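from transformers import pipeline

# Create a sentiment-analysis pipeline. Pinning the model ID avoids the
# "No model was supplied" warning and keeps results reproducible.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)

results = classifier([
    "I absolutely love this new update!",
    "This is the worst experience I've ever had.",
    "The weather is okay, nothing special."
])

for item in results:
    print(f"{item['label']}: {item['score']:.4f}")
# Example output (exact scores vary with model and library versions):
# POSITIVE: 0.9998
# NEGATIVE: 0.9987
# POSITIVE: 0.5842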
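For a single-label classifier like this one, the pipeline's last step converts the model's raw output scores (logits) into probabilities with a softmax. It then maps the highest-scoring index to a name through the model config's id2label table. The sketch below reproduces only that final step in plain Python. The logits and the two-entry id2label mapping are made-up illustrative values, not output from a real model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, id2label):
    """Return the top label and its probability, shaped like a pipeline result."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Hypothetical logits for one sentence and a binary sentiment label map.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(postprocess([-2.1, 3.4], id2label))
```

A binary sentiment model has no neutral class, so a neutral sentence still receives one of the two labels, often with a noticeably lower score.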

Named Entity Recognition

Named entity recognition (NER) pulls structured information, such as the people, organizations, and places mentioned, out of unstructured text:

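from transformers import pipeline

# aggregation_strategy="simple" merges word pieces into whole entities
# (it replaces the deprecated grouped_entities=True flag).
ner = pipeline("ner", aggregation_strategy="simple")
text = "Elon Musk founded SpaceX in 2002. He is also the CEO of Tesla."

entities = ner(text)
for entity in entities:
    print(f"  {entity['word']}: {entity['entity_group']} ({entity['score']:.4f})")
# The default model is trained on CoNLL-2003, whose labels are PER, ORG, LOC
# and MISC, so dates such as "2002" are not tagged.
# Typical entity groups (illustrative; scores omitted):
#  Elon Musk: PER
#  SpaceX: ORG
#  Tesla: ORG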
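Token-classification models predict one tag per token, usually in a BIO-style scheme: B- starts an entity, I- continues it, and O marks tokens outside any entity. Entity grouping merges those per-token predictions into whole entities. The sketch below shows the basic idea on hypothetical whole-word tokens and scores. The library's real implementation also handles subword pieces and character offsets, so treat this as an illustration of the concept rather than its exact algorithm.

```python
def group_entities(tokens, tags, scores):
    """Merge BIO-tagged tokens into entity spans with an averaged score.

    tokens: list of words; tags: e.g. "B-PER", "I-PER", "O"; scores: per-token floats.
    """
    entities = []
    current = None  # entity being built: {"words": [...], "type": str, "scores": [...]}

    for word, tag, score in zip(tokens, tags, scores):
        if tag == "O":
            current = None  # an O tag closes any open entity
            continue
        prefix, _, ent_type = tag.partition("-")
        # Open a new entity on "B-", or when the type differs from the open one.
        if prefix == "B" or current is None or current["type"] != ent_type:
            current = {"words": [], "type": ent_type, "scores": []}
            entities.append(current)
        current["words"].append(word)
        current["scores"].append(score)

    return [
        {
            "word": " ".join(e["words"]),
            "entity_group": e["type"],
            "score": sum(e["scores"]) / len(e["scores"]),
        }
        for e in entities
    ]


# Hypothetical per-token predictions for part of the sentence used above.
tokens = ["Elon", "Musk", "founded", "SpaceX", "in", "2002", "."]
tags = ["B-PER", "I-PER", "O", "B-ORG", "O", "O", "O"]
scores = [0.99, 0.98, 0.97, 0.95, 0.99, 0.99, 0.99]
for ent in group_entities(tokens, tags, scores):
    print(ent)
```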

Fine-Tuning a Model

Pre-trained models work well out of the box, but fine-tuning them on labeled examples from your own domain typically improves accuracy on that domain's data.

from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

# Load a pre-trained model and tokenizer
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

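# Prepare your dataset: train.csv and test.csv need a "text" column and an
# integer "label" column (0 or 1, matching num_labels=2 above).
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})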

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    eval_strategy="epoch",  # called evaluation_strategy in older releases
)

# Train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)

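trainer.train()
trainer.save_model("./fine-tuned-model")
# Save the tokenizer alongside the weights so the directory can be reloaded,
# e.g. with pipeline("text-classification", model="./fine-tuned-model").
tokenizer.save_pretrained("./fine-tuned-model")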
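TrainingArguments defaults to a linear learning-rate schedule (lr_scheduler_type="linear") with a peak learning_rate of 5e-5. The rate climbs from 0 to the peak over warmup_steps, then decays linearly back to 0 by the final step. Because warmup_steps=500 is a fixed count, it is worth checking it against the total number of optimizer steps your dataset produces. The sketch below does that arithmetic. The 2,000-example dataset size is hypothetical, and the step count assumes a single device, no gradient accumulation, and the default of keeping the final partial batch.

```python
import math


def total_training_steps(num_examples, batch_size, epochs):
    """Optimizer steps for single-device training without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


def linear_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at a step under linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical: 2,000 training rows, with the batch size and epochs used above.
total = total_training_steps(num_examples=2000, batch_size=16, epochs=3)
print(f"total optimizer steps: {total}")  # 125 steps/epoch * 3 epochs
for step in (0, 100, 250, total):
    lr = linear_warmup_lr(step, peak_lr=5e-5, warmup_steps=500, total_steps=total)
    print(f"step {step:>4}: lr = {lr:.2e}")
```

With these numbers, training ends at step 375, before the 500-step warmup finishes, so the learning rate never reaches its peak. For small datasets, a smaller warmup_steps value or the ratio-based warmup_ratio argument may fit better.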

The Hugging Face Hub

At the time of writing, the Hugging Face Hub hosts over 900,000 models across NLP, computer vision, audio, and multimodal tasks. You can:

Search for models by task, language, or license
Upload your own models and datasets
Run inference directly in the browser
Collaborate with the community

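from huggingface_hub import login, list_models

# Logging in is only needed for private or gated models, or for uploads.
# login() with no argument prompts for a token; alternatively, set the
# HF_TOKEN environment variable instead of hard-coding a token in code.
login()

# Browse the five most-downloaded text-classification models.
# limit=5 stops after five results instead of paging through every match.
models = list_models(task="text-classification", sort="downloads", direction=-1, limit=5)
for model in models:
    print(model.id)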
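list_models returns a lazy iterator that fetches results from the Hub page by page as you consume it. Materializing it with list() before slicing therefore downloads every matching entry, whereas itertools.islice or the limit argument stops early. The sketch below makes that visible with a hypothetical stand-in generator, fake_paged_results, which is not part of huggingface_hub and records each simulated page request.

```python
from itertools import islice


def fake_paged_results(total_items, page_size, fetch_log):
    """Hypothetical stand-in for a paginated API: fetches one page at a time, lazily."""
    for start in range(0, total_items, page_size):
        fetch_log.append(start)  # record each simulated network request
        for i in range(start, min(start + page_size, total_items)):
            yield f"model-{i}"


eager_log, lazy_log = [], []
first_eager = list(fake_paged_results(10_000, 100, eager_log))[:5]
first_lazy = list(islice(fake_paged_results(10_000, 100, lazy_log), 5))

print(first_eager == first_lazy)            # True: same five items
print(len(eager_log), "vs", len(lazy_log))  # pages fetched: 100 vs 1
```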

Beyond NLP: Vision and Audio

The same pipeline API also handles vision and audio tasks:

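from transformers import pipeline

# Image classification: accepts a local path, a URL, or a PIL image and
# returns the top labels with scores. Requires Pillow.
image_classifier = pipeline("image-classification")
predictions = image_classifier("dog.jpg")
for pred in predictions:
    print(f"{pred['label']}: {pred['score']:.4f}")

# Speech recognition: decoding an audio file from disk requires ffmpeg.
transcriber = pipeline("automatic-speech-recognition")
result = transcriber("audio.wav")
print(result["text"])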
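Speech models are trained on audio at a specific sampling rate. For example, wav2vec2-style English models expect 16 kHz mono input. When given a file path, the pipeline decodes and resamples the audio to the model's expected rate. To see what you are feeding it, you can inspect a WAV file with Python's built-in wave module. The 16 kHz default below is an assumption matching wav2vec2-style models, so check your model's feature extractor for its actual sampling rate.

```python
import wave


def describe_wav(path, expected_rate=16_000):
    """Return basic WAV properties and whether they already match mono, expected-rate input."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "channels": channels,
        "sample_rate": rate,
        "duration_s": frames / rate,
        # True when no resampling or down-mixing should be needed.
        "ready": channels == 1 and rate == expected_rate,
    }
```

For example, describe_wav("audio.wav") on a 44.1 kHz stereo recording reports "ready": False, a hint that the pipeline will need to resample and down-mix it.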

Conclusion

Hugging Face has democratized access to cutting-edge AI. Whether you're a beginner trying your first inference or a researcher fine-tuning a large model, the platform provides the tools and community to get started quickly. Explore the Hub, experiment with different models, and contribute back to the open-source AI ecosystem.
