Attention Is All You Need
Vaswani et al. • The foundational paper introducing the Transformer architecture that revolutionized NLP and became the basis for models like GPT and BERT.
Key Takeaway: Self-attention mechanisms can replace recurrence entirely, enabling parallel computation and capturing long-range dependencies more effectively.
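A minimal single-head sketch of the scaled dot-product attention the paper builds on; tensor sizes here are illustrative and not the paper's configuration:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # pairwise similarity, scaled by sqrt(d_k)
    weights = F.softmax(scores, dim=-1)            # attention distribution over positions
    return weights @ v                             # weighted sum of value vectors

# toy usage: batch of 1, sequence of 4 tokens, model dim 8
x = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention: queries, keys, values all come from x
print(out.shape)  # torch.Size([1, 4, 8])
```

Because every position attends to every other position in one matrix multiply, the whole sequence is processed in parallel rather than step by step as in an RNN.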
Show and Tell: A Neural Image Caption Generator
Vinyals et al. • Introduced the encoder-decoder framework for image captioning using CNN features and LSTM language models.
Key Takeaway: CNN-LSTM architecture effectively bridges visual understanding and language generation. Implemented this in my Advanced Image Captioning project.
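A PyTorch sketch of the encoder-decoder idea: a pooled CNN feature vector is projected into the word-embedding space and fed to the LSTM as its first input step. All layer sizes and the vocabulary size are placeholders, not the paper's settings:

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Show-and-Tell style decoder: a CNN image embedding seeds an LSTM
    that generates the caption one token at a time."""
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512, feat_dim=2048):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)  # project CNN features into embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feats, captions):
        # img_feats: (B, feat_dim) pooled CNN features; captions: (B, T) token ids
        img_emb = self.img_proj(img_feats).unsqueeze(1)  # (B, 1, embed_dim)
        word_emb = self.embed(captions)                  # (B, T, embed_dim)
        inputs = torch.cat([img_emb, word_emb], dim=1)   # image acts as the first "word"
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                          # (B, T+1, vocab_size) next-token logits

feats = torch.randn(2, 2048)             # e.g. pooled features from a CNN encoder
caps = torch.randint(0, 10000, (2, 12))  # token ids for a batch of captions
print(CaptionDecoder()(feats, caps).shape)  # torch.Size([2, 13, 10000])
```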
Neural Machine Translation by Jointly Learning to Align and Translate
Bahdanau et al. • Introduced the attention mechanism for sequence-to-sequence models, allowing the decoder to focus on relevant parts of the input.
Key Takeaway: Attention lets the model dynamically weight input features, removing the fixed-length encoding bottleneck of earlier seq2seq models. Applied Bahdanau attention in my image captioning work.
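A sketch of the additive (Bahdanau) scoring function, score(s, h) = vᵀ tanh(Wₛs + Wₕh), in PyTorch; the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive attention: score each encoder output against the decoder state,
    then take a softmax-weighted sum as the context vector."""
    def __init__(self, dec_dim=512, enc_dim=512, attn_dim=256):
        super().__init__()
        self.W_dec = nn.Linear(dec_dim, attn_dim)
        self.W_enc = nn.Linear(enc_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, dec_state, enc_outputs):
        # dec_state: (B, dec_dim); enc_outputs: (B, T, enc_dim)
        scores = self.v(torch.tanh(
            self.W_dec(dec_state).unsqueeze(1) + self.W_enc(enc_outputs)))  # (B, T, 1)
        weights = torch.softmax(scores, dim=1)        # attention over source positions
        context = (weights * enc_outputs).sum(dim=1)  # (B, enc_dim) context vector
        return context, weights.squeeze(-1)

ctx, attn = BahdanauAttention()(torch.randn(2, 512), torch.randn(2, 10, 512))
print(ctx.shape, attn.shape)  # torch.Size([2, 512]) torch.Size([2, 10])
```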
You Only Look Once: Unified, Real-Time Object Detection
Redmon et al. • Pioneering work that frames object detection as a single regression problem, enabling real-time detection speeds.
Key Takeaway: Single-shot detection enables real-time applications. Used YOLOv8 in MARG for traffic vehicle detection at 30+ FPS.
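For reference, a minimal sketch of running a pretrained YOLOv8 model, assuming the ultralytics package API; the weights file, image name, and confidence threshold are placeholders:

```python
# pip install ultralytics
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                       # small pretrained model, COCO classes
results = model("traffic_frame.jpg", conf=0.4)   # one forward pass per image

for box in results[0].boxes:
    cls_id = int(box.cls)                        # predicted class index
    print(model.names[cls_id], float(box.conf), box.xyxy.tolist())
```

Because detection is a single regression pass rather than a proposal-plus-classification pipeline, the same loop runs comfortably on video streams.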
Deep Residual Learning for Image Recognition
He et al. • Introduced skip connections that make very deep networks trainable by letting layers learn residual functions, addressing the degradation problem that plagues plain deep stacks.
Key Takeaway: Residual connections are fundamental to deep network training. Used ResNet-101 as the encoder backbone in my image captioning project.
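The core building block is easy to sketch in PyTorch; the channel count here is illustrative:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: the output is F(x) + x, so the layers only have to
    learn a correction to the identity mapping."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # skip connection: identity path keeps gradients flowing

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock()(x).shape)  # torch.Size([1, 64, 32, 32])
```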
Language Models are Few-Shot Learners (GPT-3)
Brown et al. • Demonstrated that scaling language models enables impressive few-shot and zero-shot learning capabilities.
Key Takeaway: Powerful few-shot capabilities can emerge from scale and in-context learning alone, without task-specific fine-tuning. This understanding informs my work on AI agents and LLM applications.
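A minimal illustration of few-shot prompting: the task is specified entirely through examples in the prompt, with no gradient updates. The reviews, labels, and downstream LLM client are made up for illustration:

```python
# Build a few-shot ("in-context") prompt for sentiment classification.
examples = [
    ("The movie was fantastic.", "positive"),
    ("I want my money back.", "negative"),
]
query = "The plot dragged but the acting was great."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # send this prompt to any large language model; no fine-tuning required
```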