Primary literature · systems research

Labs, platforms, vector stores & agentic IDEs—mapped to papers

Tie each landmark PDF to what teams ship: frontier labs and silicon, SageMaker / Vertex-style training planes, embedding databases behind RAG evals, and Cursor-style agents that operationalize tool-use research.

Frontier labs · alignment · silicon

Where primary literature meets today’s builders

These are the teams, publications, and stacks researchers cite when arguing about scaling, safety, hardware, and deployment, so your reading map stays grounded in systems that actually ship.

Google

DeepMind · Gemini · research @ scale

  • Gemini
  • DeepMind
  • AlphaFold-class R&D

Meta

FAIR · open weights · PyTorch lineage

  • Llama
  • PyTorch
  • FAIR papers

OpenAI

Frontier APIs · reasoning evals

  • GPT
  • o-series
  • human preference RLHF

Anthropic

Interpretability · Constitutional AI (RLAIF)

  • Claude
  • Constitutional AI
  • safety bench

Microsoft

Research · Azure AI science

  • Phi / Orca lineage
  • Copilot science
  • Azure AI

NVIDIA

CUDA graphs · inference · NeMo research

  • CUDA
  • Tensor Cores / MLPerf
  • NeMo

Intel

Accelerators · edge AI science

  • Gaudi
  • OpenVINO
  • oneAPI AI

Apple

MLX · on-device ML research

  • MLX
  • Core ML
  • Apple Intelligence R&D

IBM

IBM Research · enterprise AI science

  • watsonx.ai
  • Granite models
  • AI governance

Agent research

Tools · planners · grounded retrieval

  • ReAct / tool papers
  • RAG evals
  • orchestration

Platforms · benchmarks · reproducibility

The training & experiment stacks behind modern papers

Vertex AI, SageMaker, and open frameworks are how teams turn ablation studies into repeatable numbers, and they map onto the methods sections you read: distributed training, tracking, registry, serving.

Google Cloud

Vertex AI · managed training & batch

  • Vertex AI
  • BigQuery ML
  • TPU pods

AWS

SageMaker · Bedrock research sandboxes

  • SageMaker
  • Bedrock
  • ParallelCluster

Azure

Azure ML · enterprise MLOps

  • Azure ML
  • Fabric notebooks
  • ONNX Runtime

Databricks

Lakehouse · MLflow lineage

  • MLflow
  • Delta Lake
  • Mosaic training

Docker

Reproducible envs · containerized training

  • CUDA images
  • Compose stacks
  • CI GPU runners

MongoDB

Atlas · vectors · AI workload data

  • Atlas Vector Search
  • RAG stores
  • aggregation pipelines

PyTorch

Dynamic graphs · distributed research code

  • torch.compile
  • FSDP
  • TorchTune

TensorFlow

Graphs · TPU / XLA pathways

  • Keras 3
  • tf.data
  • TF Serving

Hugging Face

Open weights · PEFT · leaderboards

  • Transformers
  • PEFT / LoRA
  • Open LLM Leaderboard

Weights & Biases

Experiment tracking · sweeps · model registry

  • W&B Sweeps
  • Registry
  • Artifacts

Vector databases · retrieval memory

Stores behind embeddings, hybrid search & RAG

ANN indexes, hybrid filters, and replication: these are the stacks RAG papers implicitly benchmark when they claim retrieval-augmented gains.
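To make the retrieval step concrete, here is a minimal sketch of what an ANN index approximates: exact cosine top-k over a handful of vectors, with a metadata filter applied first. The record layout (`id`, `vector`, `meta`), the `top_k` helper, and the toy data are assumptions for illustration only, not the API of any store listed here.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, records, k=2, where=None):
    """Exact (brute-force) nearest neighbours; `where` is an equality filter on metadata."""
    where = where or {}
    # Filter first, then score only the surviving candidates.
    candidates = [
        r for r in records
        if all(r["meta"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda r: cosine(query, r["vector"]), reverse=True)
    return [r["id"] for r in ranked[:k]]

# Hypothetical toy corpus: 2-dimensional "embeddings" with one metadata field.
DOCS = [
    {"id": "a", "vector": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.9, 0.1], "meta": {"lang": "de"}},
    {"id": "c", "vector": [0.0, 1.0], "meta": {"lang": "en"}},
]

print(top_k([1.0, 0.0], DOCS, k=2))                        # unfiltered
print(top_k([1.0, 0.0], DOCS, k=2, where={"lang": "en"}))  # filtered to English
```

A production index trades this exact scan for an approximate structure such as HNSW, which is why recall against a brute-force baseline like this one is a common sanity check.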

Pinecone

Managed vectors · namespaces · metadata filters

  • Serverless & pod-based indexes
  • Hybrid sparse+dense
  • Metadata pruning

Weaviate

GraphQL vectors · modular retrieval

  • HNSW / PQ
  • Multi-tenancy
  • Generative stack

Qdrant

Rust core · filtering-heavy RAG

  • Scalar filtering
  • Quantization
  • Distributed HA

Milvus

Open vectors · billion-scale ANN

  • GPU index build
  • Tiered storage
  • Attu ops UI

Chroma

Embedded UX · rapid RAG prototypes

  • Collections API
  • Persistence modes
  • Embedding adapters

PostgreSQL

pgvector · relational + embeddings

  • pgvector HNSW
  • Hybrid SQL+RAG
  • RDS / Aurora

Redis

RediSearch · low-latency vectors

  • Vector similarity
  • JSON search
  • Enterprise HA

Elasticsearch

dense_vector · lexical + semantic

  • dense_vector mapping
  • RRF fusion
  • ES|QL analytics
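The RRF fusion item above refers to reciprocal rank fusion, which merges a lexical ranking and a semantic ranking using only rank positions. The sketch below assumes the commonly used constant k = 60 and hypothetical document ids; it illustrates the formula score(d) = Σ 1 / (k + rank(d)) and is not Elasticsearch's implementation.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Fuse several ranked id lists with reciprocal rank fusion.

    Each document scores the sum of 1 / (k + rank) over the lists it appears in,
    with ranks starting at 1. Ties are broken by id so the order is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from a BM25 query and a vector query over the same corpus.
lexical = ["d1", "d2", "d3"]
semantic = ["d3", "d1", "d4"]

fused = rrf([lexical, semantic])
print(fused)
```

Because only ranks are used, the two retrievers' raw scores never need to be normalized against each other, which is the main practical appeal of this fusion method.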

OpenSearch

Open lineage · k-NN serving

  • k-NN replication
  • Lexical fusion
  • Managed AOSS

Vespa

Ranking · tensors · hybrid serving

  • Tensor ranking
  • BM25 + ANN
  • Large-scale inference

Agentic IDEs · coding agents

Editors where models drive repos, tests & terminals

Cursor-style agents and Windsurf-class flows put tool-use papers into practice: multi-file edits, terminal commands, and PR-aware refactors, read alongside each landmark PDF.
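As a rough picture of the loop these agents run, the sketch below interleaves a thought, a tool call, and an observation until the policy decides to finish, in the spirit of ReAct. Everything here is hypothetical: `scripted_policy` stands in for a language model, `calculator` is a toy tool, and the transcript format is an assumption, not how any product listed here works.

```python
def calculator(expression):
    """Toy tool (hypothetical): adds two integers written as 'a+b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))

TOOLS = {"calculator": calculator}

def scripted_policy(transcript):
    """Stand-in for a model: returns (thought, action, argument) given the transcript."""
    observations = [text for kind, text in transcript if kind == "observation"]
    if not observations:
        return ("I should add the two numbers.", "calculator", "2+3")
    return ("The tool returned the sum.", "finish", observations[-1])

def run_agent(policy, tools, max_steps=5):
    """Alternate thought -> action -> observation until 'finish' or the step budget ends."""
    transcript = []
    for _ in range(max_steps):
        thought, action, argument = policy(transcript)
        transcript.append(("thought", thought))
        if action == "finish":
            return argument, transcript
        observation = tools[action](argument)
        transcript.append(("action", f"{action}[{argument}]"))
        transcript.append(("observation", observation))
    return None, transcript  # budget exhausted without an answer

answer, trace = run_agent(scripted_policy, TOOLS)
for kind, text in trace:
    print(f"{kind}: {text}")
```

The step budget matters in practice: without `max_steps`, a policy that never emits `finish` would loop forever.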

Cursor

Composer · codebase-wide agent edits

  • Composer agent
  • @docs grounding
  • Privacy tiers

Windsurf

Cascade · Codeium IDE lineage

  • Cascade flows
  • Repo-aware edits
  • Terminal agents

GitHub Copilot

Copilot Chat · workspace agents

  • Copilot Chat
  • CLI agent
  • PR summaries

VS Code

Extensions hub · Copilot host IDE

  • Remote SSH
  • Dev Containers
  • Copilot UX

Codeium

Autocomplete · IDE-native reasoning

  • Repo embeddings
  • Inline chat
  • JetBrains + VS

JetBrains

AI Assistant · Junie · Fleet

  • AI Assistant
  • Junie tasks
  • Kotlin/Python stacks

Replit

Agent · Ghostwriter · hosted shells

  • Replit Agent
  • Ghostwriter
  • Always-on runtimes

Neovim

Keyboard-first · LSP + AI plugins

  • Copilot.vim lineage
  • LSP hubs
  • Research scripting

Google Colab

Notebook GPUs · Gemini coding UX

  • TPU/GPU runtimes
  • Gemini cells
  • BigQuery hooks

Gitpod

Cloud workspaces · prebuild parity

  • Prebuilds
  • Parallel sandboxes
  • Dockerfile CI sync

Deep AI ML Research Lab

Where AI research meets systems and products

Architectures, training dynamics, evaluation, and deployment—plus how today’s stacks combine RAG, tools, and agentic reasoning. Papers are anchors; the through-line is practitioner-grade AI research literacy.

  • Models & representation
  • Training & scaling
  • Evaluation & robustness
  • Systems & deployment

Reading journeys

Paths through primary AI literature

These are intentional sequences—not a generic playlist. Follow one path to build a coherent mental model: how ideas cite, critique, and replace each other in modern ML and AI systems research.

5 papers · ~10 hrs suggested

From sequence models to modern LLMs

Encoder–decoder intuition → self-attention → pre-training at scale.
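For orientation, the self-attention step in this path can be sketched in a few lines. This is a minimal single-head, pure-Python illustration of scaled dot-product attention, softmax(QKᵀ / √d_k) V, as defined in Attention Is All You Need. The function names and toy inputs are assumptions for illustration; real implementations use batched tensor libraries with masking and multiple heads.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for lists of row vectors (no masking, one head)."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return outputs

# Toy example: two identical keys, so the output is the mean of the two values.
out = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 0.0]])
print(out)
```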

Evidence layer · tiered

16 landmark papers — beginner → professional

Filters aren’t gates—they’re pacing guides. Move up when experiment sections feel familiar and limitations spark ideas instead of confusion.

5 papers

Beginner · foundations

Readable landmarks that anchor intuition: CNNs, sequences, games, early generative ideas, and chain-of-thought prompting.

  1. 01 beginner
    Deep learning & vision

    ImageNet Classification with Deep Convolutional Neural Networks

    Krizhevsky, Sutskever & Hinton · NeurIPS 2012

    Vision Foundations
  2. 02 beginner
    Generative modeling

    Generative Adversarial Nets

    Goodfellow et al. · NeurIPS 2014

    Generative Foundations
  3. 03 beginner
    Sequence modeling

    Sequence to Sequence Learning with Neural Networks

    Sutskever, Vinyals & Le · NeurIPS 2014

    NLP Sequences
  4. 04 beginner
    Reinforcement learning

    Playing Atari with Deep Reinforcement Learning

    Mnih et al. · 2013 (NIPS Deep Learning Workshop)

    RL Agents
  5. 14 beginner
    Prompting & reasoning

    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    Wei et al. · NeurIPS 2022

    Reasoning Prompting LLMs

7 papers

Intermediate · systems thinking

Architecture depth, transformers, diffusion, RAG, multimodal alignment, and efficient tuning.

  1. 05 intermediate
    Very deep networks

    Deep Residual Learning for Image Recognition

    He et al. · CVPR 2016

    Vision Architectures
  2. 07 intermediate
    Transformers

    Attention Is All You Need

    Vaswani et al. · NeurIPS 2017

    Transformers NLP

    Core skill — read methods first
  3. 08 intermediate
    Language understanding

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Devlin et al. · NAACL 2019

    NLP Pre-training
  4. 10 intermediate
    Generative diffusion

    Denoising Diffusion Probabilistic Models

    Ho, Jain & Abbeel · NeurIPS 2020

    Generative Diffusion
  5. 11 intermediate
    Knowledge-grounded NLP

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    Lewis et al. · NeurIPS 2020

    RAG Knowledge LLMs
  6. 12 intermediate
    Vision–language

    Learning Transferable Visual Models From Natural Language Supervision

    Radford et al. · ICML 2021

    Multimodal Vision Representation
  7. 13 intermediate
    Parameter-efficient tuning

    LoRA: Low-Rank Adaptation of Large Language Models

    Hu et al. · ICLR 2022

    Efficiency Fine-tuning LLMs

4 papers

Professional · scale & agents

Large-scale LM behaviors, RL under uncertainty, and production-grade agent/tool stacks.

  1. 06 professional
    Games & planning

    Mastering the game of Go with deep neural networks and tree search

    Silver et al. · Nature 2016

    RL Planning
  2. 09 professional
    Large language models

    Language Models are Few-Shot Learners

    Brown et al. · NeurIPS 2020

    LLMs Scaling
  3. 15 professional
    Language agents

    ReAct: Synergizing Reasoning and Acting in Language Models

    Yao et al. · ICLR 2023

    Agents Agentic AI Reasoning
  4. 16 professional
    Tool-augmented LMs

    Toolformer: Language Models Can Teach Themselves to Use Tools

    Schick et al. · NeurIPS 2023

    Agents Tools LLMs

From research insight to production

Where AI research becomes products

Clear hypotheses, solid metrics, and reproducible pipelines matter as much in the lab as in production. Below are example domains where research-grade ML meets real users, compliance, and scale.

Assembly line — research → AI products

Healthcare & life sciences

Clinical & imaging AI

Triage, radiology assistants, and pathway support, always with audit logs, calibration checks, and human-in-the-loop review grounded in published benchmarks.

Research → validation → deployment
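One way to make the calibration check mentioned above concrete is expected calibration error (ECE): bin predictions by confidence and compare average confidence with observed accuracy in each bin. The sketch below is a simplified variant with equal-width, half-open bins; the function name, bin count, and toy inputs are assumptions for illustration, not a clinical validation procedure.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy across confidence bins.

    confidences: predicted probabilities in [0, 1] for the chosen class.
    correct:     1 if the prediction was right, 0 otherwise (same length).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins [i/n_bins, (i+1)/n_bins); a confidence of 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(hit for _, hit in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# Toy case: the model says 90% four times but is right only three times.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
print(round(overconfident, 3))
```

A value near zero means stated confidence tracks observed accuracy; the toy case above is overconfident by the gap between 0.90 and 0.75.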

Finance & markets

Risk, fraud & forecasting

Sequence models and robust ensembles for credit, trading analytics, and anomaly detection—with stress tests and drift monitoring tied to reproducible ablations.

Research → validation → deployment

Education

Learning & assessment

Adaptive practice, feedback generation, and integrity tooling—built from cited methods, fairness review, and clear metrics instead of opaque black boxes.

Research → validation → deployment

Document intelligence

OCR, forms & knowledge extraction

Layouts, tables, and long-form PDFs into structured data—combining vision encoders and language models with traceable spans for compliance reviews.

Research → validation → deployment

Tax & accounting

Classification & line-item mapping

Hierarchical labels, entity linking, and jurisdiction-aware rules engines—trained on curated corpora with explicit error analysis on edge cases.

Research → validation → deployment

Legal & compliance

Clause mining & policy QA

Retrieval over corpora plus grounded generation—citations to source passages, versioned prompts, and evaluation sets that mirror real reviewer workflows.

Research → validation → deployment

Retail & operations

Demand & vision in the field

Forecasting stacks and shelf or warehouse vision—closed-loop evaluation on held-out seasons and geos, not just offline accuracy slides.

Research → validation → deployment

Public & civic systems

Allocation & anomaly monitoring

Transparent scoring and monitoring for services and infrastructure—documentation and bias checks treated as part of the product, not an afterthought.

Research → validation → deployment